Article: Auto-GPT and BabyAGI: How ‘autonomous agents’ are bringing generative AI to the masses

Autonomous agents may mark an important step toward a world where AI-driven systems are smart enough to work on their own, without need of human involvement.
Over the past week, developers around the world have begun building “autonomous agents” that work with large language models (LLMs) such as OpenAI’s GPT-4 to solve complex problems. While still very new, such agents could represent a major milestone in the productive application of LLMs.

Normally, we interact with GPT-4 by typing carefully worded prompts into ChatGPT’s text window until the model generates the output we want. But most of us lack the skill and patience to sit and write prompt after prompt, guiding the LLM toward answering a complex question, such as “What is the optimal business plan for capturing 20% of the fingernail-polish market?” Quite naturally, developers have been thinking of ways to automate much of that process. That’s where autonomous agents come in.

In general terms, autonomous agents can generate a systematic sequence of tasks that the LLM works on until it’s satisfied a preordained “goal.” Autonomous agents can already perform tasks as varied as conducting web research, writing code, and creating to-do lists.

Agents effectively add a traditional software interface to the front of a large language model. And that interface can use well-known software practices (such as loops and functions) to guide the language model to complete a general objective (such as, “find all YouTube videos about the Great Recession and distill the key points”). Some people call them “recursive” agents because they run in a loop, asking the LLM questions, each one based on the result of the last, until the model produces a full answer.
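
To make the "loop" idea concrete, here is a minimal sketch in Python. It is not taken from any real agent: `run_agent`, `make_scripted_llm`, and the "DONE" stop convention are hypothetical names and choices made for illustration, and the model call is replaced by a scripted stand-in so the example runs without an API key.

```python
from typing import Callable, Iterable, List


def run_agent(objective: str,
              ask_llm: Callable[[str], str],
              max_steps: int = 5) -> List[str]:
    """Prompt the model in a loop, feeding each answer into the next prompt."""
    answers: List[str] = []
    last_result = ""
    for _ in range(max_steps):
        # Each prompt is built from the objective plus the previous answer.
        prompt = (f"Objective: {objective}\n"
                  f"Previous result: {last_result}\n"
                  "Next step:")
        last_result = ask_llm(prompt)
        answers.append(last_result)
        # Assumed convention: the model replies "DONE..." when it is finished.
        if last_result.strip().upper().startswith("DONE"):
            break
    return answers


def make_scripted_llm(replies: Iterable[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for a real model: returns canned replies in order."""
    remaining = iter(replies)

    def ask(prompt: str) -> str:
        return next(remaining, "DONE")

    return ask


if __name__ == "__main__":
    llm = make_scripted_llm([
        "Found three videos",
        "Summarized the first video",
        "DONE: key points distilled",
    ])
    for step in run_agent("Distill videos about the Great Recession", llm):
        print(step)
```

In a real agent, `ask_llm` would wrap a call to a hosted model, and the `max_steps` cap would keep the loop from running (and billing) indefinitely.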

The seminal autonomous agent BabyAGI was created by Yohei Nakajima, a VC and habitual coder and experimenter. He describes BabyAGI as an “autonomous AI agent that contains an AI task manager.”

Nakajima, a partner at the small VC firm Untapped Capital, says he originally set out to build an agent that would automate some of the tasks he routinely performs as a VC—researching new technologies and companies, and so on—by replicating his own workflow. “I wake up in the morning and tackle the first thing on the list, and throughout the day I add new tasks, and then at night I review my tasks and reprioritize them, then decide what to do the next day,” he says. BabyAGI also systematically completes, adds, and reprioritizes tasks for the GPT-4 language model to complete.
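
The complete, add, and reprioritize cycle Nakajima describes can be sketched roughly as follows. This is an illustration under stated assumptions, not BabyAGI's actual code: the function names, the three callbacks (which in a real agent would each prompt the LLM), and the demo data are all hypothetical.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple


def run_task_loop(objective: str,
                  first_task: str,
                  execute: Callable[[str, str], str],
                  create_tasks: Callable[[str, str, str], List[str]],
                  prioritize: Callable[[str, List[str]], List[str]],
                  max_iterations: int = 10) -> List[Tuple[str, str]]:
    """Work through a task list: complete one task, add new ones, reprioritize."""
    tasks: Deque[str] = deque([first_task])
    completed: List[Tuple[str, str]] = []
    for _ in range(max_iterations):
        if not tasks:
            break
        task = tasks.popleft()                  # tackle the first thing on the list
        result = execute(objective, task)
        completed.append((task, result))
        done = {name for name, _ in completed}
        for new_task in create_tasks(objective, task, result):
            # Skip tasks that are already finished or already queued.
            if new_task not in done and new_task not in tasks:
                tasks.append(new_task)
        tasks = deque(prioritize(objective, list(tasks)))  # review and reorder
    return completed


# Hypothetical stand-ins for the three LLM-backed steps.
FOLLOW_UPS = {"research market": ["list competitors", "estimate size"]}


def demo_execute(objective: str, task: str) -> str:
    return f"result of {task}"


def demo_create_tasks(objective: str, task: str, result: str) -> List[str]:
    return FOLLOW_UPS.get(task, [])


def demo_prioritize(objective: str, tasks: List[str]) -> List[str]:
    return sorted(tasks)  # alphabetical order, purely for illustration


if __name__ == "__main__":
    for name, result in run_task_loop("plan a product launch", "research market",
                                      demo_execute, demo_create_tasks,
                                      demo_prioritize):
        print(name, "->", result)
```

The queue here mirrors the to-do list in Nakajima's description: the front item gets done, new items arrive during the day, and the whole list is reordered before the next round.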

Realizing that his creation could be applied to all sorts of other objectives, Nakajima stripped the agent down to bare bones (105 lines of code), and uploaded it to GitHub for others to use as a foundation for their own (more specialized) agents.

Nakajima says he’s been inspired by the ways other developers are enhancing BabyAGI. Some developers have added moderation functions, he says, along with the ability to work on parallel tasks and to generate additional agents, as well as code-writing and robotics functionality.



Auto-GPT appears to have even more autonomy. Developed by Toran Bruce Richards, Auto-GPT is described on GitHub as a GPT-4-powered agent that can search the internet in structured ways. It can create subtasks and launch new agents to complete them. It uses GPT-4 to write its own code, then can “recursively debug, develop and self-improve” the code.

Auto-GPT can be used for any number of problems, but the example case described on GitHub concerns a “chef” trying to manage and grow a culinary business. In the example, the “Chef-GPT” agent “autonomously develops and manages businesses to increase net worth.”

Richards said he originally wanted an AI agent to automatically email him daily AI news. But, as he told Motherboard, he realized in the process that existing LLMs struggle with “tasks that require long-term planning,” or are “unable to autonomously refine their approaches based on real-time feedback.” That understanding inspired him to create Auto-GPT, which, he said, “can apply GPT4’s reasoning to broader, more complex problems that require long-term planning and multiple steps.” (Richards didn’t respond to a request for an interview with Fast Company.)

Autonomous agents, at this early stage, are mainly experimental. And they have some serious limitations that prevent them from getting what they want from large language models.

They often struggle to keep the LLM focused on an objective. LLMs, after all, are not very predictable. If two users write the same prompt in ChatGPT, for example, they’ll often get different answers from the model.
Vancouver-based developer Sully Omar worked on an agent that he hoped would do some market research on waterproof shoes, but the LLM, for some reason, became distracted and began focusing its attention on shoelaces.

“They get confused,” Omar says. “They’re not able to understand ‘I’ve done this—I’m going in a loop.’”

Omar says developers will likely find new ways of letting autonomous agents put “guardrails” around the LLM so that they continue completing tasks without getting sidetracked.
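
What such a guardrail might look like is still an open question. One crude possibility, sketched below in Python, is a small filter that refuses tasks the agent has already attempted, tasks that mention none of the objective's keywords, and anything beyond a fixed step budget. The `Guardrail` class and its keyword check are assumptions made for illustration, not a technique Omar or any existing agent is known to use.

```python
from typing import Iterable, List, Set


def _normalize(task: str) -> str:
    """Lowercase and collapse whitespace so trivial rewordings still match."""
    return " ".join(task.lower().split())


class Guardrail:
    """Hypothetical filter that keeps an agent from looping or drifting."""

    def __init__(self, max_steps: int = 20, keywords: Iterable[str] = ()) -> None:
        self.max_steps = max_steps
        self.keywords: List[str] = [k.lower() for k in keywords]
        self.seen: Set[str] = set()
        self.steps = 0

    def allow(self, task: str) -> bool:
        key = _normalize(task)
        if self.steps >= self.max_steps:
            return False  # step budget exhausted
        if key in self.seen:
            return False  # "I've done this": refuse to repeat a task
        if self.keywords and not any(k in key for k in self.keywords):
            return False  # task mentions none of the objective's keywords
        self.seen.add(key)
        self.steps += 1
        return True


if __name__ == "__main__":
    guard = Guardrail(max_steps=10, keywords=["waterproof"])
    for task in ["Compare waterproof shoes",
                 "Compare waterproof shoes",
                 "Research shoelaces"]:
        print(task, "->", "run" if guard.allow(task) else "skip")
```

Keyword matching is a blunt instrument, and a production guardrail would more likely ask the model itself whether a proposed task still serves the objective.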

And it’s important to remember that autonomous agents only began to appear on GitHub (and Twitter) a little more than a week ago. Given the energy around generative AI and the current pace of development, there’s reason to believe that agents will overcome their early limitations.

“The fact that it’s been only nine days means that there’s so much that could happen,” Omar says.
And that’s a big part of the reason for all the current interest in (and hype around) autonomous agents. They suggest an important step toward artificial general intelligence (AGI), where AI-driven systems are smart enough to work on their own, without need of human involvement.

In fact, when I asked Nakajima for an easy way to understand autonomous agents, he described the “agent” as an AI itself, not just a software program that prompts an LLM.

“If you could have two ChatGPTs talk to each other they could talk forever given the right guidance,” he said. “Then you could turn one of them into a task manager to create the tasks, and the other one into the task doer . . . and they would just continue to do work after you press Go.”

Nakajima told me a friend of his half-jokingly came up with the name BabyAGI. BabyAGI isn’t “generally intelligent,” but its architecture suggests an approach to pushing large language models toward something like AGI.

An AI operating with autonomy is a notion that makes us humans nervous at an almost instinctual level. We fear a future where AI systems begin working together faster than humans can understand, and toward goals that may misalign with our own interests. Under every tweet announcing a new autonomous agent, you’ll find subtweets asking about the possibility that the agent and the LLM could go rogue and begin causing harm.

Autonomous agents, as promising as they are, might add even more fuel to the belief that the tech industry should somehow put large language model development on “pause” until the likely outcomes and risks are better understood.
