Bolt is a web app where you can create websites with an LLM. It worked so well that the product went viral, achieving $8 million in annual recurring revenue within two months of launch.

Fortunately, they open-sourced the entire codebase, and I looked into it to understand what made it so effective. Here's what I found:

System prompt with a simple chat message

Rather than implementing complex agentic loops or workflows, the system relies on a carefully designed 3,000-token system prompt.

The system prompt is meticulously structured with XML tags and contains several components. I won't go over every one of them. Instead, let's look at the example input and output that are also included in the system prompt.

Here, we see a user query asking Bolt to build a website with a bouncing ball using ReactJS. The assistant responds with several <boltAction>s that execute sequentially. For example, the first action has type file, meaning it creates a file at package.json with the specified content. Actions can also be shell commands, like npm run dev, as shown in the example's final action.
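To make the format concrete, here's a heavily abridged sketch of what that example response looks like. It's paraphrased from my reading of the prompt rather than copied, so the wrapping <boltArtifact> tag, attribute names, and file contents should be taken as illustrative:

```xml
<boltArtifact id="bouncing-ball" title="Bouncing Ball in React">
  <boltAction type="file" filePath="package.json">
    { "name": "bouncing-ball", "scripts": { "dev": "vite" }, "dependencies": { "react": "^18.2.0" } }
  </boltAction>
  <boltAction type="file" filePath="src/App.jsx">
    // React component that renders and animates the ball
  </boltAction>
  <boltAction type="shell">
    npm run dev
  </boltAction>
</boltArtifact>
```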

The way Bolt executes these actions is simple but brilliant: they stream the LLM's output and run it through a custom parser that detects actions in real time. Whenever the parser identifies a complete pair of opening and closing boltAction XML tags, it immediately executes that action.
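Bolt's real parser is more robust, but the core idea fits in a few lines. Here's a minimal sketch of that streaming approach (my own illustration, not their code; executeAction stands in for whatever writes files or runs shell commands):

```ts
type BoltAction = { attrs: string; content: string };

// Minimal sketch: accumulate streamed chunks and execute each
// <boltAction> as soon as its closing tag has arrived.
class ActionStreamParser {
  private buffer = "";
  private cursor = 0; // everything before this index has already been handled

  constructor(private executeAction: (action: BoltAction) => void) {}

  push(chunk: string) {
    this.buffer += chunk;
    // Look for the next complete <boltAction ...>...</boltAction> element.
    const pattern = /<boltAction([^>]*)>([\s\S]*?)<\/boltAction>/g;
    pattern.lastIndex = this.cursor;
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(this.buffer)) !== null) {
      this.executeAction({ attrs: match[1].trim(), content: match[2].trim() });
      this.cursor = pattern.lastIndex; // never re-execute a completed action
    }
  }
}

// Usage: feed chunks as they stream in from the model.
const parser = new ActionStreamParser((action) => {
  console.log("run action:", action.attrs);
});
parser.push('<boltAction type="shell">npm run');
parser.push(" dev</boltAction>"); // the action fires here, once the tag is complete
```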

No task decomposition or structured outputs

This approach differs significantly from my usual method of orchestrating LLM calls. Typically, I'd break the task down into subtasks and use LLMs for atomic, sequential operations. For instance, I'd split Bolt's process into planning and execution phases. In the planning phase, I'd use structured output or function calling to get a list of subtasks from an LLM, e.g. ["write package.json with dependencies …", "write index.html...", "...", ...]. That plan would then go through an agentic loop in which an LLM reviews and refines it until everything looks good. Only once the final plan is ready would I execute it with an LLM.
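For contrast, here's roughly what that plan-then-execute pattern looks like in code. This is a sketch of the conventional approach I'm describing, not anything from Bolt, and callLLM plus the schema are hypothetical placeholders for whatever structured-output API you'd use:

```ts
// Hypothetical stand-in for a structured-output / function-calling API:
// one call that returns JSON following the given schema.
async function callLLM(prompt: string, jsonSchema?: object): Promise<any> {
  throw new Error("replace with a real LLM client call");
}

// Schema asking the model for a flat list of subtasks.
const planSchema = {
  type: "object",
  properties: { subtasks: { type: "array", items: { type: "string" } } },
};

async function buildWebsite(userRequest: string) {
  // Phase 1: planning via structured output.
  let plan: string[] =
    (await callLLM(`Plan the files and commands needed for: ${userRequest}`, planSchema)).subtasks;

  // Agentic review loop: another LLM call checks the plan and refines it.
  for (let attempt = 0; attempt < 3; attempt++) {
    const reviewed: string[] =
      (await callLLM(`Review and fix this plan: ${JSON.stringify(plan)}`, planSchema)).subtasks;
    if (JSON.stringify(reviewed) === JSON.stringify(plan)) break; // reviewer made no changes
    plan = reviewed;
  }

  // Phase 2: execution, one LLM call per subtask.
  for (const subtask of plan) {
    const result = await callLLM(`Carry out this step and return the file or command:\n${subtask}`);
    // apply `result` to the workspace (write the file, run the command, ...)
  }
}
```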

The Bolt team took the opposite approach. First, they allowed Claude 3.5 Sonnet to generate the entire solution in one continuous output rather than breaking tasks down. To encourage high-quality actions from the LLM, they simply included this line in the system prompt: “CRITICAL: Think HOLISTICALLY and COMPREHENSIVELY”. Second, instead of using function calling or structured output, they treated the output string as a canvas where the LLM could generate either plain text messages or actions for the parser to detect. This streamlined approach fully leverages the frontier model's reasoning and planning capabilities.

No RAG

Another notable aspect is that the Bolt team didn't use RAG or any context management system beyond tracking user file modifications and loading error messages. Instead, they relied solely on message history—including all previous messages whenever a user responded—which effectively leveraged Claude Sonnet's 200k token context window. Since the conversation history contained all LLM-generated code within AI messages, the model had everything it needed at hand. This streamlined approach freed Bolt engineers from context curation work while avoiding the performance issues common in unoptimized RAG systems.
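In practice, this means "context management" is just an append-only array of messages that gets resent in full on every turn. A rough sketch of that loop (my own illustration; callClaude is a hypothetical stand-in for the API call):

```ts
type Message = { role: "system" | "user" | "assistant"; content: string };

// Hypothetical stand-in for the actual Claude API call.
async function callClaude(messages: Message[]): Promise<string> {
  throw new Error("replace with a real Anthropic client call");
}

const SYSTEM_PROMPT = "<Bolt's ~3,000-token system prompt goes here>";
const messages: Message[] = [{ role: "system", content: SYSTEM_PROMPT }];

// Every turn: append the user message and resend the entire history.
// All previously generated code already sits in earlier assistant messages,
// so there is nothing to retrieve or summarize.
async function handleUserTurn(userInput: string): Promise<string> {
  messages.push({ role: "user", content: userInput });
  const reply = await callClaude(messages);
  messages.push({ role: "assistant", content: reply });
  return reply;
}
```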

So, why does Bolt work so well?

Bolt’s success can be broken down into 3 aspects.

Simplicity: They successfully built a simple LLM application. They boldly avoided pre-defined workflows and agentic flows, which can get complicated quite quickly, and instead used a single LLM call. They kept all information in one message chain, removing the finicky problem of managing context, at the cost of larger LLM calls and potential context-limit issues. And they distilled what a full-stack LLM agent needs into just three actions (create file, edit file, execute shell command), which keeps the output structure simple for the LLM to follow and for the parser to handle.

Betting on future LLMs: Models are improving rapidly, and Bolt built their product with the assumption that next-generation LLMs would be more capable than current ones. While many understand this concept in theory, executing this vision successfully is far more challenging. It's tempting to overcomplicate the system to make it work. More importantly, it's difficult to create the right abstractions that allow next-generation models to reach their full potential. Bolt's team achieved this with their simple action types, as mentioned earlier.

Owning the environment: Bolt was developed by StackBlitz, a company that has been building web IDEs for many years. One of their key innovations is WebContainer technology, which enables server-side environments like Node.js to run completely within the browser sandbox. As StackBlitz's CEO explains, "If you're running Cursor on anyone's machine, you don't know what you're dealing with—there could be OS issues, config problems, or countless other variables. But with WebContainer, since we built it from scratch, we can run projects in a unified environment and instrument it at every level—process, runtime, etc. This allows us to capture detailed error information that would be extremely difficult to obtain reliably across different operating systems."