Agentic AI – 2
Daniel Warfield writes: “It’s quickly becoming apparent that, while LLMs are exciting, they’re not a silver bullet. AI needs clever designers to wrangle it into a focused and powerful offering so it can actually be useful to consumers. Agentic systems seem to be the shining north star towards building successful LLM-powered applications… Imagine if, instead of asking a language model to give you some output immediately, you asked a language model to do things like this: ‘You’ve been given a complex question, think about what to do next’, ‘You have access to a few tools, think about which one you can use to best assist the user’, ‘You just output some information. Was it correct? Would you like to revisit that idea or move on?’ Essentially, Agents create a framework which allows a language model to reason about its previous output and decide to use tools to seek external sources of information.”
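The loop Warfield describes can be sketched in a few lines of Python. This is a minimal illustration and not any framework's API. `Action`, `TOOLS`, `scripted_model`, and `run_agent` are hypothetical names, and `scripted_model` is a hard-coded stand-in for a real LLM call so the example runs offline. What matters is the control flow: at each step the model chooses between calling a tool and answering, and every tool result goes into a history that the model sees on its next turn.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    """One decision from the model: call a tool, or give the final answer."""
    kind: str       # "tool" or "answer"
    name: str = ""  # tool name when kind == "tool"
    arg: str = ""   # tool input, or the answer text when kind == "answer"


# Hypothetical tools: plain functions the agent may call by name.
TOOLS: dict[str, Callable[[str], str]] = {
    "add": lambda expr: str(sum(int(x) for x in expr.split("+"))),
    "lookup": lambda key: {"capital of France": "Paris"}.get(key, "not found"),
}


def scripted_model(question: str, history: list[str]) -> Action:
    """Stand-in for an LLM call (hypothetical): picks the next step from the history."""
    if not history:
        # Nothing gathered yet, so decide to use the calculator tool.
        return Action(kind="tool", name="add", arg="17+25")
    # A tool result is available: read it back and answer.
    last_result = history[-1].split(" -> ")[-1]
    return Action(kind="answer", arg=f"The answer is {last_result}")


def run_agent(
    question: str,
    model: Callable[[str, list[str]], Action],
    tools: dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> tuple[str, list[str]]:
    """Ask the model what to do next, run any tool it picks, and feed the result back."""
    history: list[str] = []
    for _ in range(max_steps):
        action = model(question, history)
        if action.kind == "answer":
            return action.arg, history
        tool = tools.get(action.name)
        # An unknown tool becomes an error observation the model can react to.
        observation = tool(action.arg) if tool else f"error: unknown tool {action.name!r}"
        history.append(f"{action.name}({action.arg}) -> {observation}")
    return "No answer within the step budget", history


answer, history = run_agent("What is 17 + 25?", scripted_model, TOOLS)
print(history)  # ['add(17+25) -> 42']
print(answer)   # The answer is 42
```

In a real system, `scripted_model` would be replaced by a prompt that shows the model the question, the tool descriptions, and the history, and then parses its reply into an `Action`.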
Andrew Ng has been discussing this in his newsletters.
March 6, 2024: “Although today’s research agents, whose tasks are mainly to gather and synthesize information, are still in an early phase of development, I expect to see rapid improvements. ChatGPT, Bing Chat, and Gemini can already browse the web, but their online research tends to be limited; this helps them get back to users quickly. But I look forward to the next generation of agents that can spend minutes or perhaps hours doing deep research before getting back to you with an output. Such algorithms will be able to generate much better answers than models that fetch only one or two pages before returning an answer.”
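One way to picture the gap between fetching "only one or two pages" and doing deeper research is a research loop with an explicit page budget. The sketch below assumes a tiny in-memory stand-in for the web. `FAKE_WEB`, `fetch`, and `research` are hypothetical names and not a real browsing API. A real agent would also let the model decide which links are worth following and when it has read enough.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
FAKE_WEB: dict[str, tuple[str, list[str]]] = {
    "a": ("Agents can call tools.", ["b", "c"]),
    "b": ("Reflection improves drafts.", ["d"]),
    "c": ("Planning splits a goal into steps.", []),
    "d": ("Multi-agent systems split up tasks.", []),
}


def fetch(url: str) -> tuple[str, list[str]]:
    """Stand-in for downloading a page (hypothetical); unknown URLs return nothing."""
    return FAKE_WEB.get(url, ("", []))


def research(start_url: str, max_pages: int) -> list[str]:
    """Breadth-first research: keep reading linked pages until the budget runs out."""
    notes: list[str] = []
    queue = deque([start_url])
    seen = {start_url}
    pages_read = 0
    while queue and pages_read < max_pages:
        url = queue.popleft()
        text, links = fetch(url)
        pages_read += 1
        if text:
            notes.append(text)
        for link in links:
            if link not in seen:  # avoid re-reading the same page
                seen.add(link)
                queue.append(link)
    return notes


print(len(research("a", max_pages=2)))   # 2: a quick, shallow pass
print(len(research("a", max_pages=10)))  # 4: every reachable page was read
```

The budget is the only knob in this sketch. Whether reading more pages actually produces better answers depends on how well the model filters and synthesizes what it reads, and the sketch does not attempt that.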
March 20, 2024: “Today, we mostly use LLMs in zero-shot mode, prompting a model to generate final output token by token without revising its work. This is akin to asking someone to compose an essay from start to finish, typing straight through with no backspacing allowed, and expecting a high-quality result. Despite the difficulty, LLMs do amazingly well at this task! With an agent workflow, however, we can ask the LLM to iterate over a document many times. For example, it might take a sequence of steps such as: plan an outline, decide what, if any, web searches are needed to gather more information, write a first draft, read over the first draft to spot unjustified arguments or extraneous information, revise the draft taking into account any weaknesses spotted, and so on. This iterative process is critical for most human writers to write good text. With AI, such an iterative workflow yields much better results than writing in a single pass.”
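The draft, critique, and revise part of that workflow can be sketched as a loop that stops once the reviewer finds nothing left to fix. In this sketch, `write_draft`, `critique`, and `revise` are hypothetical stand-ins for separate LLM calls. They are reduced to simple string rules so the example runs deterministically. The outline and web-search steps Andrew mentions are left out.

```python
def write_draft(topic: str) -> list[str]:
    """Stand-in for a first-draft LLM call (hypothetical): returns a list of sentences."""
    return [
        f"{topic} is useful.",
        f"{topic} is always better than a single pass.",  # overclaim to be caught
        "Unrelated aside about the weather.",             # extraneous sentence
    ]


def critique(draft: list[str]) -> list[int]:
    """Stand-in reviewer (hypothetical rule): flag overclaims and off-topic sentences."""
    return [i for i, s in enumerate(draft) if "always" in s or "weather" in s]


def revise(draft: list[str], flagged: list[int], topic: str) -> list[str]:
    """Stand-in reviser (hypothetical rule): soften flagged on-topic claims, drop the rest."""
    revised: list[str] = []
    for i, sentence in enumerate(draft):
        if i not in flagged:
            revised.append(sentence)
        elif topic in sentence:
            revised.append(sentence.replace("always", "often"))
        # Flagged sentences that are off-topic are dropped.
    return revised


def iterative_write(topic: str, max_passes: int = 3) -> tuple[list[str], int]:
    """Draft once, then alternate critique and revision until the critic is satisfied."""
    draft = write_draft(topic)
    for passes in range(1, max_passes + 1):
        problems = critique(draft)
        if not problems:
            return draft, passes
        draft = revise(draft, problems, topic)
    return draft, max_passes


final, passes = iterative_write("Agentic AI")
print(final)   # ['Agentic AI is useful.', 'Agentic AI is often better than a single pass.']
print(passes)  # 2
```

A single-pass workflow would simply return `write_draft(topic)`. The loop spends extra model calls in exchange for a chance to catch the problems the critic flags.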
Andrew shared a framework for categorising design patterns for building agents:
- Reflection: The LLM examines its own work to come up with ways to improve it.
- Tool Use: The LLM is given tools such as web search, code execution, or any other function to help it gather information, take action, or process data.
- Planning: The LLM comes up with, and executes, a multistep plan to achieve a goal (for example, writing an outline for an essay, then doing online research, then writing a draft, and so on).
- Multi-agent collaboration: Multiple AI agents work together, splitting up tasks and discussing and debating ideas, to come up with better solutions than a single agent would.
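Planning and multi-agent collaboration can be combined in a small sketch. A planner breaks the goal into steps, and each step goes to an agent with a specific role that can read what earlier agents produced. `planner`, `AGENTS`, and `run_plan` are hypothetical names, and each agent is a one-line stand-in for an LLM call with a role-specific prompt. The sketch only shows task splitting. The discussion and debate described above would need agents that reply to one another over several rounds.

```python
from typing import Callable

# A step in the plan: (role of the agent that should do it, task description).
Step = tuple[str, str]


def planner(goal: str) -> list[Step]:
    """Stand-in for an LLM planning call (hypothetical): a fixed three-step plan."""
    return [
        ("researcher", f"collect facts about {goal}"),
        ("writer", f"draft an essay about {goal}"),
        ("critic", "check that the draft uses the research"),
    ]


# Hypothetical role-specialised agents. Each receives its task plus the outputs
# of the agents that ran before it, keyed by role.
AGENTS: dict[str, Callable[[str, dict[str, str]], str]] = {
    "researcher": lambda task, prior: f"notes for '{task}'",
    "writer": lambda task, prior: f"draft based on {prior['researcher']}",
    "critic": lambda task, prior: (
        "approved" if "notes" in prior.get("writer", "") else "revise: cite the research"
    ),
}


def run_plan(goal: str) -> dict[str, str]:
    """Execute the plan step by step, routing each step to the agent for its role."""
    outputs: dict[str, str] = {}
    for role, task in planner(goal):
        outputs[role] = AGENTS[role](task, outputs)
    return outputs


for role, output in run_plan("agentic workflows").items():
    print(f"{role}: {output}")
```

Keeping each agent's output under its role name lets later agents, such as the critic, inspect exactly the work they are meant to review.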
This video has more, and Kitty (Sijia) Shen has written a summary.