Prompting Experiments

Barrett Burnworth

ā˜ļø What's up?

Bookmark for later

https://plasticlabs.ai/blog/Theory-of-Mind-is-All-You-Need/

I have the SK book creator example running. It works, and I like the way it interacts. It is a React frontend that communicates with an API running on Azure Functions. It uses OpenAI: it takes your input and generates a list of book ideas, you select one of the ideas, and it then generates chapters. Semantic Kernel uses 'skills': your input is sent to a skill, and the skill is sent to OpenAI as a prompt. I am creating new skills based on the prompts for CWT, like the one below. The first step is to get the app to accept the input and feed it my custom prompts. The second step will be to parse the returned data. Third, send the results to another service to generate the sounds, music, or images.
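To make that flow concrete, here is a rough, standalone Python sketch of the same shape. It is not the actual book-creator code or the Semantic Kernel API: a "skill" here is just a prompt template with an {{$input}} slot that the scene gets substituted into before being sent to OpenAI. The helper names are mine, and the call uses the pre-1.0 openai package.

```python
# Rough sketch of the skill flow (not the real SK book-creator code):
# substitute the scene into a prompt template, send it to OpenAI, get text back.
import openai  # pre-1.0 openai package

# Abbreviated stand-in for the full CWT image-brainstorm prompt shown below.
IMAGE_BRAINSTORM_TEMPLATE = (
    'Brainstorm images that would be appropriate for the following scene: "{{$input}}"\n'
    'Return it in JSON using the following format: '
    '[{ "Image": 1, "content": "content of the image" }]'
)

def run_skill(template: str, scene: str) -> str:
    """Fill the {{$input}} slot with the scene and ask the model, SK-style."""
    prompt = template.replace("{{$input}}", scene)
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response["choices"][0]["message"]["content"]

scene = (
    "Sitting at my desk with the lamp on, it is dark outside, turned my record "
    "player on. I have a ton of incense and light one. I sit down and draw at my desk."
)
raw_json = run_skill(IMAGE_BRAINSTORM_TEMPLATE, scene)
```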

The prompts were in one long post, but they have been separated into individual posts; the post was getting long and I did not like that. All prompts are based on this scene, a friend’s description of his night a few weeks back:

Sitting at my desk with the lamp on, it is dark outside, turned my record player on. I have a ton of incense and light one. I sit down and draw at my desk.

Image Generation

Prompt - DALL-E 2

Again, prompting GPT to generate image prompts of the scene.

Brainstorm images that would be appropriate for the following scene: "Sitting at my desk with the lamp on, it is dark outside, turned my record player on. I have a ton of incense and light one. I sit down and draw at my desk."

Must: be chronological in order of what happens in the scene

Must: describe a scenario - the event/activity/object being depicted

Must: a style - a description of the artist to mimic, the mood to capture, the artistic technique to emulate

Must: be in the style of images you might find in a comic book, graphic novel, or flipbook

Must: brainstorm ideas and create a list.

Must: use a numbered list.

Must: only one list.

Should: no more than 5 items.

Should: at least 3 items.

Return it in JSON using the following format: [{ "Image": 1, "content": "content of the image" }]
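Since step two of the pipeline is parsing this output, here is a minimal sketch of what that might look like, assuming the model actually returns an array in the format above. The regex fallback handles replies where the model wraps the array in extra prose; parse_image_prompts is a name I made up.

```python
import json
import re

def parse_image_prompts(raw: str) -> list[dict]:
    """Parse the model's reply into [{"Image": n, "content": "..."}] entries."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Fall back to grabbing the first [...] block if the model added extra text.
        match = re.search(r"\[.*\]", raw, re.DOTALL)
        if not match:
            raise ValueError("No JSON array found in model output")
        return json.loads(match.group(0))

# prompts = parse_image_prompts(raw_json)
# for p in prompts:
#     print(p["Image"], p["content"][:60])
```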


I would prefer Midjourney… I'm not so hot on Stable Diffusion. I haven’t looked at other image generation models recently enough to know another open model to try.

Result

Image 1

[{"Image": 1, "content": "The scene is depicted from a low angle, capturing the desk with a dimly lit lamp in the foreground. The record player sits on the corner of the desk, with a vinyl spinning on it. The room is enveloped in darkness, visible through the window behind the desk. The character is seen sitting at the desk, engrossed in drawing, with a pencil in hand and various art supplies scattered around. The air is filled with the aroma of burning incense, creating a serene and contemplative atmosphere. The style emulates the work of a skilled comic book artist, with detailed line work, shading, and textures to enhance the mood and capture the character’s focus and artistic expression."},

(generated image)

Image 2

{"Image": 2, "content": "This image focuses on the record player, which occupies the center of the panel. The stylized turntable is shown spinning, emitting sound waves depicted as vibrant, colorful lines. The lamp on the desk casts a warm glow on the surrounding area, creating a cozy ambiance. The dark outside is hinted at through a window in the background, suggesting the passage of time. The character sits nearby, engrossed in their drawing, with a sketchbook and pencils strewn across the desk. The art style mimics a graphic novel, with bold lines, expressive characters, and dynamic panel composition, capturing the character’s immersion in both music and art."},

(generated image)

Image 3

{"Image": 3, "content": "In this image, the focus is on the character at their desk. The room is enveloped in darkness, with only the desk area illuminated by the lamp. The character is shown sitting in a contemplative posture, engrossed in their drawing. The desk is cluttered with art supplies, including brushes, paint tubes, and sketchbooks, emphasizing the character’s dedication to their craft. Incense smoke gracefully rises from a holder nearby, adding a mystical touch to the scene. The art style emulates the work of a renowned comic book artist, with intricate linework, realistic shading, and attention to facial expressions, capturing the character’s intense focus and passion for art."}]

(generated image)

Thoughts

This one will take more effort to make a cohesive scene across images. I’m not sure that will always be doable, especially if the writer wants different images within one scene. I could require certain context in each image description, but, as with the problem just mentioned, that might not be desired depending on the scene. Another thing I have thought about is letting the user manage the context for each generator in a slideout, although that seems strange because I am using the written scene to infer context. I’m also not sure whether feeding the first image plus the image description back into the model will work.

Also, DALL-E’s prompt box cut each image description short by about a sentence and a half. So, either find another model that takes longer input or shorten the descriptions.
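If I stick with DALL-E for now, a crude workaround would be trimming each description before submitting it. A minimal sketch follows; the 400-character cap is an assumption about the web UI's limit, not a documented number, so adjust it to whatever the prompt box actually allows.

```python
def trim_to_limit(description: str, max_chars: int = 400) -> str:
    """Cut a generated image description at the last sentence boundary under the cap."""
    if len(description) <= max_chars:
        return description
    cut = description[:max_chars]
    # Prefer ending on a complete sentence rather than mid-word.
    last_period = cut.rfind(". ")
    return cut[: last_period + 1] if last_period > 0 else cut
```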

I used Facebook’s MusicGen HF Space for the music samples, but still used the prompt templates from Google’s MusicLM.

I am slowly becoming less enamored with the CWT workflow… it seems so piecemeal. However, I am still plugging away at putting some things together to really see how it works. One thing it does make me realize is that, as a writer, more context would help set the mood. The descriptions the AI generates might be totally different from what you as the writer have in mind for the scene. In that light, it might push the writer to be more descriptive or to change their language.