In this write-up, I analyze why the multi-agent approach struggles, particularly for what I call "high-memory-load knowledge tasks" (which I'll define in the amnesia section).
This analysis consists of two sections: an introduction to multi-agent systems, and an examination of two fundamental challenges, amnesia and sycophancy.
If you're already familiar with multi-agent systems, feel free to skip the introduction.
A great deal of research has been done, and many frameworks developed, for multi-agent systems. It started with a Stanford research team that created a virtual town where 25 LLM agents interact on their own, planning their days, building relationships, and organizing events. This project showed that new possibilities emerge when a group of LLM agents with diverse roles and personalities interact with each other.
Virtual town of 25 LLM agents built by Stanford researchers
Since then, numerous agentic frameworks have emerged. Microsoft, for instance, introduced AutoGen—a Python framework designed to streamline the development of multi-agent applications. It allows developers to define specialized agents and have them collaborate through conversation. Microsoft also introduced TinyTroupe, which focuses more on simulating LLM agents: you define a world of agents with unique personalities and goals, then run a simulation to generate ideas or solutions.
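To make the AutoGen side concrete, here is a minimal sketch of a two-agent setup in the style of AutoGen's v0.2 Python API; the model name and placeholder API key are assumptions, not a specific recommended setup:

```python
import autogen

# Assumed model config -- substitute your own model and API key.
llm_config = {"config_list": [{"model": "gpt-4", "api_key": "YOUR_KEY"}]}

# A specialized agent whose role is set via its system message.
writer = autogen.AssistantAgent(
    name="writer",
    system_message="You draft short fantasy scenes.",
    llm_config=llm_config,
)

# A proxy agent that starts the conversation on the user's behalf.
user_proxy = autogen.UserProxyAgent(
    name="user",
    human_input_mode="NEVER",        # fully automated, no human in the loop
    max_consecutive_auto_reply=1,    # keep the demo exchange short
    code_execution_config=False,     # this agent never executes code
)

# The two agents collaborate through a back-and-forth conversation.
user_proxy.initiate_chat(writer, message="Draft an opening scene for a fantasy novel.")
```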
Various architectures have been introduced for multi-agent systems. For instance, agents can communicate with each other as peers, or be organized hierarchically, with a supervisor who directs subordinate agents and aggregates their results. AutoGen provides pre-defined architectures that you can easily start with, as sketched below. For more custom architectures, you can use LangGraph—a Python and JavaScript framework for designing flexible workflows.
AutoGen’s pre-defined architecture—group chat
Multi-agent architecture suggestions by LangGraph
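As a concrete illustration of the group-chat pattern pictured above, here is a hedged sketch, again assuming AutoGen's v0.2-style API and a placeholder model config:

```python
import autogen

llm_config = {"config_list": [{"model": "gpt-4", "api_key": "YOUR_KEY"}]}  # assumption

# Agents with distinct roles all share one conversation.
writer = autogen.AssistantAgent(
    name="writer", system_message="You write prose.", llm_config=llm_config
)
critic = autogen.AssistantAgent(
    name="critic", system_message="You critique drafts.", llm_config=llm_config
)
user_proxy = autogen.UserProxyAgent(
    name="user", human_input_mode="NEVER", code_execution_config=False
)

# The manager plays the supervisor: each round it selects the next
# speaker and broadcasts that agent's message to the whole group.
groupchat = autogen.GroupChat(
    agents=[user_proxy, writer, critic], messages=[], max_round=8
)
manager = autogen.GroupChatManager(groupchat=groupchat, llm_config=llm_config)

user_proxy.initiate_chat(manager, message="Collaborate on a one-page fantasy scene.")
```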
After months of experimenting with these tools and architectures, I've concluded that multi-agent systems face a fundamental problem. This issue stems from the inherent characteristics of current LLMs and won't be resolved merely by developing new tools or architectures. If we want to create truly collaborative multi-agent systems, we need to address these fundamental issues.
Effective collaboration among AI agents has the potential to revolutionize problem-solving, decision-making, and creative processes across many fields. Without addressing the core limitations, however, we risk building systems that merely mimic collaboration without achieving genuine synergy. We'll explore two key challenges: amnesia and sycophancy. Recognizing these limitations is crucial for anyone working on multi-agent AI systems, and it will guide us toward more effective, truly collaborative designs. Now, let's dive into the first critical challenge: the problem of amnesia in AI agents.
A high-memory-load knowledge task is cognitive work that requires significant time to load and organize relevant information in memory, ultimately synthesizing it to accomplish the task.
The memory loading process isn't just about aggregating snippets of information. It's a dynamic process where memories are rearranged and connected to better model the goal task. For instance, when a novelist writes a story, she mulls over the characters and plot for months, trying multiple drafts before actually writing the final manuscript. This pre-writing period allows her to develop complex narratives and nuanced characters, and to internalize them as if they were real people she knows intimately.
LLMs are stateless, meaning they remember nothing of what happens after the pre-training or fine-tuning stage. We must coordinate memory artificially, manually gleaning information from various sources and curating it into the prompt, which is essentially Retrieval-Augmented Generation (RAG).
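Here is a bare-bones sketch of that curation step; the hashed bag-of-words `embed` is a toy stand-in for a real embedding model, and the function names are mine for illustration, not from any particular library:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Toy hashed bag-of-words vector -- stand-in for a real embedding model."""
    v = np.zeros(256)
    for word in text.lower().split():
        v[hash(word) % 256] += 1.0
    return v

def retrieve(query: str, memory: list[str], k: int = 3) -> list[str]:
    """Return the k stored snippets most similar to the query."""
    q = embed(query)

    def score(snippet: str) -> float:
        v = embed(snippet)
        denom = float(np.linalg.norm(q) * np.linalg.norm(v)) or 1.0
        return float(q @ v) / denom

    return sorted(memory, key=score, reverse=True)[:k]

def build_prompt(query: str, memory: list[str]) -> str:
    """The prompt is the model's only 'memory': whatever we fail to
    retrieve and pack in here, the model simply does not know."""
    context = "\n".join(retrieve(query, memory))
    return f"Context:\n{context}\n\nTask: {query}"
```

Every call starts from zero: the same loading and curation must be repeated for each request, no matter how many times the agent has "seen" the material before.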
To better understand this limitation, let's imagine an example scenario. You are the supervisor of three employees collaborating on a fantasy novel for a client who wants herself cast as the main character. There's a crucial twist, though: all three have severe amnesia. Their cognitive abilities are intact, but their memory lasts only 10 seconds. Like LLMs, however, they can read and write long text almost instantly.
You devise a clever strategy: