"Goal-Oriented AI"

Barrett Burnworth

☝️ What's up?

Exploring Semantic Kernel

image

After using autonomous agents like BabyAGI and Auto-GPT, it feels like these things currently waste way too many cycles trying to refine their output. Planning/coordination, prompting, and chaining is still the most efficient way to get a desired result.

“One more (of many more), I’ve found that providing hand-crafted mini agents as tools works better than simply providing tools in the traditional sense.

As a simple example, every time you search, you want to scrape. So combine them into one chained tool vs each as separate.

This last example takes some load off of the LLM, and handles it with coding logic.

These kind of trade offs can be limiting at times, but also help w cost and speed.

Seems more art than science. Lots of nuance and variables to balance.”

From @yoheinakajima via Twitter, on the questions of “Where are we with autonomous agents? What can they do? Should we use one?”
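The search-then-scrape idea from the quote above could be sketched as a single chained tool. This is a hypothetical sketch: `web_search` and `fetch_page` are stand-ins for whatever search API and HTTP client you actually use; the point is that the LLM makes one tool call and plain code handles both steps.

```python
# Hypothetical "mini agent as a tool": search and scrape combined,
# so the LLM never has to coordinate the two steps itself.

def web_search(query: str) -> list[str]:
    # Placeholder: a real implementation would call a search API
    # and return result URLs.
    return [f"https://example.com/result-for-{query.replace(' ', '-')}"]

def fetch_page(url: str) -> str:
    # Placeholder: a real implementation would fetch the page
    # and strip it down to readable text.
    return f"scraped text of {url}"

def search_and_scrape(query: str, max_results: int = 3) -> list[dict]:
    """One chained tool: every search result comes back already scraped."""
    pages = []
    for url in web_search(query)[:max_results]:
        pages.append({"url": url, "text": fetch_page(url)})
    return pages
```

This is the trade-off the quote describes: the chained tool is less flexible than exposing search and scrape separately, but it saves LLM round-trips, which helps with cost and speed.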

So, for now, I will take this path for the CWT work. However, connecting all of this together is the next task. Semantic Kernel is Microsoft’s approach to LangChain. I like the simplicity that it brings to connecting things up. I like John Maeda as well, so that is another reason I am leaning towards it.

image

According to Maeda, the approach here is “goal-oriented AI.” He described Skills as “the core building blocks in SK” and noted that they can be both simple (“Please summarize this piece of text for me”) or complex (“Please summarize everything I need to know for today and build a plan for how I need to get done what I need to accomplish”).

“Memory increases the capability of a Skill tremendously by allowing you to tie the Skill’s capability to your historical data that can be persisted and accessed at any time,” he said, while Connectors “are customizable resources that enable external data access.” He added that connectors help address one of the primary criticisms of LLMs — that they are pre-trained, and so “essentially frozen in time.” 1
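To make the Skill + Memory idea concrete, here is a conceptual sketch in plain Python. This is not Semantic Kernel’s actual API — the class and method names are invented for illustration. It just shows the shape of the idea: a skill is essentially a prompt template, and memory lets it pull persisted historical data into the prompt before the LLM is called.

```python
# Conceptual sketch only -- not Semantic Kernel's real API.
# A "Skill" renders a prompt; "Memory" supplies persisted context.

from dataclasses import dataclass, field

@dataclass
class Memory:
    facts: list[str] = field(default_factory=list)

    def save(self, fact: str) -> None:
        self.facts.append(fact)

    def recall(self, query: str) -> list[str]:
        # A real memory store would do vector similarity search here.
        return [f for f in self.facts if query.lower() in f.lower()]

@dataclass
class Skill:
    prompt_template: str  # e.g. "Plan my day. Known events: {context}. Request: {input}"
    memory: Memory

    def render(self, user_input: str, query: str) -> str:
        context = "; ".join(self.memory.recall(query))
        return self.prompt_template.format(context=context, input=user_input)

# Usage: the "complex" skill from Maeda's example needs historical data.
mem = Memory()
mem.save("Meeting with design team at 10am")
skill = Skill("Plan my day. Known events: {context}. Request: {input}", mem)
prompt = skill.render("Build today's plan", query="meeting")
```

The rendered prompt now carries the recalled history, which is what lets a simple summarization skill grow into the “build a plan for my day” kind of skill.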

Some resources, links and more Maeda quotes below.

Quotes

“Traditionally, computer science education has been about achieving structured outputs from well-structured syntax,” he replied, “but actually the more flexible mindset of data scientists or even creative artists can be assistive in order to navigate this new world.”

“The software world has been dominated by people who can literally speak machine,” he replied. “So, it’s an interesting turn of events that this new kind of programming is much closer to natural language. If your goal is to produce writing as an output, then there’s certainly room for many language-proficient English teachers to have an impact. That said, to be a productive ‘prompt engineer’ still requires you to have the ability to think like an engineer. There’s a reason why the engineering field emerged as a discipline; it’s always attracted those who love to build machinery. In the future, we can count on prompts that are engineered by developers to have qualities we both need and want — like reliability and efficiency. That won’t change. The difference is that developers will be able to pair up with AIs to create even more reliable and efficient systems than ever before.”

Why the word Kernel in the name?

“It’s a tip of the hat to that all-time enabler of computational productivity, the UNIX kernel,” he replied. “For those of your readers who remember when the UNIX kernel emerged, I think we all were a little confused by commands comprised of two characters ‘ls’, ‘cd’, ‘ps’ etc. But the big ‘a-ha’ was when we piped commands with the ‘|’ symbol and suddenly the light came on. The UNIX kernel’s simplicity as a landmark user experience for developers has been the north star for SK during its evolution. And we definitely don’t feel we have it right yet. That’s why we released it as open source. So that we can learn in the open as a community, and hopefully together build the right user experience for developers who are excited as we are by this new shift from syntax to semantics.” 1

John’s “Kitchen Sink” talk to the Onetug.net group:

Bookmarks


https://promptperfect.jina.ai/

https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/

https://github.com/microsoft/guidance

https://lilianweng.github.io/posts/2023-06-23-agent/

Vector Database

Qdrant

Reading: https://qdrant.tech/documentation/concepts/collections/

Pinecone

Reading: https://www.pinecone.io/learn/vector-database/

A camera that takes no photograph, but records geolocation and describes the scene, then generates an image based on the description. 😍

https://bjoernkarmann.dk/project/paragraphica

Paragraphica is a camera that uses location data and artificial intelligence to visualize a “photo” of a specific place and moment. The camera exists both as a physical prototype and a virtual camera that you can try.

https://www.rephrase.ai/

https://www.blings.io/ (ooh - mp5(?) JSON with layers… nothing in docs or specs on what this is, really, but is intriguing for CWT)

https://zapier.com/blog/best-ai-video-generator/#invideo

Mind to Image

Mind reading hats on the way! https://medarc-ai.github.io/mindeye/

Text to Image Personalization

https://skybox.blockadelabs.com/

Start with an image; personalize it to a new image based on the prompt + starter image

image

Text to Video

https://www.kapwing.com/ai-video-generator

https://imagen.research.google/video/

https://fliki.ai/ - Blog Posts to Videos

Video Diffusion Model

https://video-diffusion.github.io/

Video to Video

https://stable-diffusion-art.com/video-to-video/

Sound

Narration

https://google-research.github.io/seanet/soundstorm/examples/

https://github.com/lucidrains/soundstorm-pytorch

Jarvis

Jarvis is probably close to what I’m thinking with automated CWT tooling. But I’m not completely certain I want to trust everything to AI. Automated AI agents seem to have to reason about their actions over multiple cycles to refine their output. It feels like more granular control will be less expensive. Orchestration tools like LangChain or Semantic Kernel would probably be my first choice over agents. @mojombo’s specialty is seeing simple, obvious solutions to technical problems like this. ::waiting:: for him to get interested in AI tooling connectivity :grinning:

https://github.com/microsoft/JARVIS

image

Semantic Kernel

image https://learn.microsoft.com/en-us/semantic-kernel/media/kernel-flow.png

Carvana Generates Highly Personalized Videos for Customers Using AI


Each ‘Joyride’ video relates the unique origin story of how an owner ‘met’ their car

https://adage.com/article/digital-marketing-ad-tech-news/carvana-gifts-ai-generated-video-histories-car-owning-customers/2493301

From twitter, showing a tiktok by ‘ElectronicBert’

Worth watching this whole video, pretty wild showcase of real world generative video.

Carvana created 1.3 million hyper-personalized videos.

Script, voice and video personalized to the specific user and their vehicle.

Customized based on details from each user’s purchase (names, dates, addresses), and created all of the videos using this information, each starting from a two-minute video script.
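At that scale, the personalization step is presumably template-driven. A minimal sketch of the idea, with invented field names, might fill a script template from each purchase record before handing the result to voice and video generation:

```python
# Hypothetical sketch of per-customer script personalization.
# Field names ("name", "date", "year", "model", "city") are invented
# for illustration; a real pipeline would map them from purchase records.

SCRIPT_TEMPLATE = (
    "Hi {name}! On {date}, a {year} {model} began its journey "
    "to {city}, and that's how you two met."
)

def personalize_script(record: dict) -> str:
    """Render one customer's 'Joyride' origin-story script."""
    return SCRIPT_TEMPLATE.format(**record)

script = personalize_script({
    "name": "Alex", "date": "June 3", "year": "2019",
    "model": "Honda Civic", "city": "Austin",
})
```

Running this over 1.3 million records is cheap; the expensive part is the generative voice and video layered on top of each rendered script.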


Cinematic Writing Tool Landscape Map


Posting a simple Wardley map. I often feel that Wardley maps illustrate an obvious situation. However, a primary goal is to get everyone on common ground by looking at the same things. With only verbal discussion, people can easily lose context, their minds can wander, and they can drift in focus. With a graph, position sometimes has no meaning and is often vague because there is no anchor point. With charts there is no space to maneuver. A map like this shows the landscape of a product or idea and the adjacent space to maneuver within, and is anchored to a common point, usually the User. Positional changes affect the map completely. Just like a military map, it is essential to show everyone the landscape within which a situation is unfolding, so that discussion and strategy stem from the same point.

Lastly, even if things on a Wardley map are not true, or are up for debate (which they are!), it helps to know where others on your team believe things stand. It helps because it gets everyone talking and thinking about the space around the map. It helps to move pieces around, which challenges the strategy, because position does matter. For instance, moving something along the evolution axis impacts project management style, among other things. Product visibility to the customer (the Y-axis) helps you know what the customer will see, or not see, and can also influence decisions or direction. With a map, discussion and challenge are the goal.

Discussion

As AI Tooling evolves, it opens up new areas for those tools to be used. One area will be writing tools. The map could have shaded areas, but for now the colors represent old/new.

  • The old, established writing tools are blue. These are ubiquitous, and are in a stable, basically unchanging state. These are a commodity: pen and paper, blog platforms, journals, etc.

  • The new, evolving writing tools are red. These are evolving and in a custom state. They are still finding their market, and the market will change as the tools evolve. This is where generative AI is opening the landscape.

CWTs (Cinematic Writing Tools) fall into the Genesis/Custom state. Very visible to the customer. Very new. Still evolving. Generative AI is part of them, but does not encompass everything yet. Many of the pieces are there, ready to be connected. How they will evolve, and what role they will play, remains to be seen. If there is competition, evolution will happen. As with many evolving products, the markets will change as the product evolves. Initial potential markets are: Journaling Apps, Writing Apps, Directing/Storyboard Apps. These are just the obvious areas that I know about.

image

Post-717

Barrett Burnworth

☝️ What's up?

Looking for music soundtrack generation also. MusicLM from google https://google-research.github.io/seanet/musiclm/examples/

So the idea is to create layers of audio. Have an underlying soundtrack, have the scene related sounds, and then possibly narration overlaid.

Example prompt from the above link:

The main soundtrack of an arcade game. It is fast-paced and upbeat, with a catchy electric guitar riff. The music is repetitive and easy to remember, but with unexpected sounds, like cymbal crashes or drum rolls.

This can be provided one-shot to the scene generator as an example output.
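The layered-audio idea above (soundtrack bed, scene sounds, narration overlaid) can be sketched as summing scaled sample arrays. This assumes the generated layers have already been decoded to PCM samples at a common sample rate; the sine waves below are just stand-ins for real generated audio.

```python
# Sketch of mixing audio layers: soundtrack + scene sounds + narration.
# Each layer is a (samples, gain) pair; shorter layers are zero-padded
# to the longest one, and the mix is clipped to [-1, 1].

import numpy as np

def mix_layers(layers: list[tuple[np.ndarray, float]]) -> np.ndarray:
    length = max(len(samples) for samples, _ in layers)
    mix = np.zeros(length)
    for samples, gain in layers:
        mix[: len(samples)] += gain * samples  # pad-by-omission for short layers
    return np.clip(mix, -1.0, 1.0)

sr = 16_000  # assumed common sample rate
t = np.linspace(0, 1, sr, endpoint=False)
soundtrack = 0.5 * np.sin(2 * np.pi * 220 * t)        # music bed (stand-in)
scene = 0.3 * np.sin(2 * np.pi * 880 * t[: sr // 2])  # shorter scene sound
narration = np.zeros(sr)                              # narration placeholder
track = mix_layers([(soundtrack, 0.6), (scene, 0.8), (narration, 1.0)])
```

Per-layer gains are the knob here: ducking the soundtrack under narration, for example, is just lowering its gain over the narrated span.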

Also, bookmarking this: https://vcai.mpi-inf.mpg.de/projects/DragGAN/ because it looks amazing!