This project, which I completed during my winter break, was my first experience developing an application that leverages LLMs.

This blog post consists of four independent sections. Feel free to skip around.

1. Motivation

Why did I want to build a food diary app using LLMs?

At first, I thought it would be a simple project: integrating an LLM would just be a matter of adding API calls to the app. However, I ran into unexpected challenges and gained valuable insights along the way, which I'll share in this post.

I have never been satisfied with the recipe and diet logging apps currently available, mainly because of the tricky challenges that food-related applications face: they need to cover thousands of ingredients, brands, and unit systems. For example, I have to look up "What is the gram equivalent of 1 tablespoon of creatine?" because the app only has nutritional information for creatine in grams. I also find myself manually typing in the nutrition facts for my esoteric Korean cracker because the app has no information on it.

I thought it would be convenient if AI assistants could handle simple tasks like these. I tested ChatGPT with various conversion questions, and it answered most of them correctly, with only minor errors.

An AI-first food diary does not require a complex input UI

This got me thinking about broader applications of AI. If LLMs can understand the text version of a recipe, why build a constrained system around input boxes? Why not let users write recipes in a plain text box and have an LLM convert them into JSON so the numeric data can be stored on the server? If this works, the app could handle every possible unit and ingredient. Moreover, it would be faster to build and maintain than a traditional hard-coded application, since all the complex logic would be delegated to a neural network.

Hard-coded logging UI (MyFitnessPal)
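To make the "plain text in, JSON out" idea more concrete, here is a minimal sketch of what the app side could look like. It assumes a hypothetical schema (`name`, `quantity`, `unit` per ingredient) and a stand-in `fake_llm` function in place of a real model call; the point is that the app only defines the target format and validates whatever the model returns.

```python
import json
from dataclasses import dataclass


# Hypothetical target schema: one entry per ingredient.
@dataclass
class Ingredient:
    name: str
    quantity: float
    unit: str


PROMPT = (
    "Convert the recipe below into a JSON list of objects with the keys "
    '"name", "quantity" (a number) and "unit". Return only JSON.\n\n'
    "Recipe:\n{recipe}"
)


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; ignores the prompt and returns canned JSON."""
    return (
        '[{"name": "creatine", "quantity": 1, "unit": "tbsp"}, '
        '{"name": "oat milk", "quantity": 250, "unit": "ml"}]'
    )


def parse_ingredients(raw: str) -> list[Ingredient]:
    """Validate the model output before it is stored on the server."""
    items = json.loads(raw)
    if not isinstance(items, list):
        raise ValueError("expected a JSON list")
    return [
        Ingredient(
            name=str(item["name"]),
            quantity=float(item["quantity"]),
            unit=str(item["unit"]),
        )
        for item in items
    ]


recipe_text = "1 tbsp creatine stirred into 250 ml oat milk"
ingredients = parse_ingredients(fake_llm(PROMPT.format(recipe=recipe_text)))
print(ingredients)
```

In a real app, `fake_llm` would be replaced by an API request. Keeping the validation on the app side means malformed model output is rejected rather than silently corrupting the stored nutrition data.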

This idea may sound familiar if you have read about "Software 2.0," proposed by Andrej Karpathy. In that article, Karpathy describes a paradigm shift in software development that emphasizes machine learning models rather than relying solely on explicitly programmed instructions. Using GPT to handle the intricacies of recipe conversion and nutritional information extraction fits this shift from rule-based programming to neural networks. With AI as the intermediary, developers can avoid hard-coding conversion logic for every ingredient and unit, and instead let LLMs interpret natural-language input and adapt dynamically based on patterns and correlations learned from data.
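For contrast, here is a minimal sketch of the rule-based ("Software 1.0") approach to the tablespoon question from earlier: every unit needs a conversion factor, and every ingredient needs its own density before a volume can become grams. Only the US tablespoon and teaspoon sizes and the density of water below are real values; the table names and `to_grams` are hypothetical, for illustration only.

```python
# Volume units expressed in millilitres (US customary sizes).
ML_PER_UNIT = {
    "ml": 1.0,
    "tsp": 4.92892159375,
    "tbsp": 14.78676478125,
}

# Density in grams per millilitre; every ingredient must be added by hand.
GRAMS_PER_ML = {
    "water": 1.0,
}


def to_grams(ingredient: str, quantity: float, unit: str) -> float:
    """Convert a volume measure to grams using hard-coded lookup tables."""
    if unit not in ML_PER_UNIT:
        raise ValueError(f"unknown unit: {unit}")
    if ingredient not in GRAMS_PER_ML:
        raise ValueError(f"no density on file for: {ingredient}")
    return quantity * ML_PER_UNIT[unit] * GRAMS_PER_ML[ingredient]


print(round(to_grams("water", 1, "tbsp"), 2))  # 14.79
```

Calling `to_grams("creatine", 1, "tbsp")` raises an error until someone looks up and enters a density for creatine, which is exactly the per-ingredient maintenance burden the LLM-based approach tries to avoid.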

2. UI/UX

So I started building the app, and the first challenge was finding the best user interface for this specific use case. Unlike traditional food diary apps, the AI-first approach is built around an empty text editor and an AI assistant.

After doing some research and watching a conference hosted by Latent Space last year, I decided to use Notion AI's interface as inspiration. Notion AI offers two ways to interact with AI: a chat button in the bottom-right corner and an inline command box for generating content from selected text. Having used Notion AI for several months, I've found its interface intuitive, so I adopted it as a baseline for my app.

I used BlockNote, an open-source library for building a Notion-like text editor. Its block-based structure proved especially useful for an AI-first editor because it confines LLM modifications to specific blocks: the AI assistant can modify targeted sections without affecting the rest of the page. As a result, each task can focus on the relevant blocks and make precise changes while the overall page structure stays intact.
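The benefit of blocks can be sketched with a simple data structure: the document is a list of blocks with IDs, and an LLM edit may only replace the text of blocks it was given. This is a simplified stand-in, not BlockNote's actual document format or API; `Block`, `apply_edits`, and the ID scheme are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Block:
    id: str
    text: str


def apply_edits(doc: list[Block], edits: dict[str, str], allowed: set[str]) -> list[Block]:
    """Return a new document in which only the allowed blocks can change."""
    outside = set(edits) - allowed
    if outside:
        raise ValueError(f"edit touches blocks outside the selection: {sorted(outside)}")
    # Untouched blocks are reused as-is, so the page structure is preserved.
    return [replace(b, text=edits[b.id]) if b.id in edits else b for b in doc]


doc = [
    Block("title", "Banana bread"),
    Block("ing-1", "100 g walnuts"),
    Block("step-1", "Bake for 50 minutes."),
]
new_doc = apply_edits(doc, {"ing-1": "100 g almonds"}, allowed={"ing-1"})
for block in new_doc:
    print(block.id, "->", block.text)
```

Rejecting edits outside the selection, instead of silently applying them, is one way to enforce that the model cannot wander into unrelated parts of the page.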

3. Prompt Engineering

I used LangChain for LLM orchestration. The first chain I built turned messy scraped web data into well-summarized recipe text. This was straightforward: it worked well without any complicated prompt engineering, since LLMs are good at summarizing and paraphrasing text.
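Conceptually, such a chain is a fixed pipeline: clean the scraped page, fill a prompt template, and call the model. The sketch below mimics that shape in plain Python rather than using LangChain's API; `strip_html`, `SUMMARY_PROMPT`, `summarize_recipe`, and the echoing `fake_llm` are assumptions for illustration.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def strip_html(raw_html: str) -> str:
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()
    return "\n".join(parser.parts)


SUMMARY_PROMPT = (
    "Rewrite the following scraped web page as a clean recipe with an "
    "ingredient list and numbered steps. Drop ads and stories.\n\n{page}"
)


def summarize_recipe(raw_html: str, llm) -> str:
    """Chain: HTML -> plain text -> prompt -> LLM output."""
    return llm(SUMMARY_PROMPT.format(page=strip_html(raw_html)))


def fake_llm(prompt: str) -> str:
    return prompt  # placeholder: echoes the prompt instead of calling a model


html = "<html><script>track()</script><h1>Pancakes</h1><p>2 eggs</p></html>"
print(summarize_recipe(html, fake_llm))
```

Passing the model in as the `llm` argument keeps each step swappable, which is roughly the property an orchestration library provides at larger scale.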

However, performing tasks entered by the user was challenging. The intended chain worked as follows: when a user enters a task such as "Change walnut to almond," GPT-3.5 needs to select the relevant blocks from the document and then generate updated text for them based on the task.
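The two-step chain can be outlined with both LLM calls replaced by stand-in functions, so that only the control flow remains: the selection step returns block IDs, and only those blocks are passed to the rewrite step. `select_blocks`, `rewrite_block`, `run_task`, and their keyword-matching logic are hypothetical; in the app, each step would be a GPT-3.5 prompt.

```python
def select_blocks(task: str, blocks: dict[str, str]) -> list[str]:
    """Step 1 stand-in: pick the blocks relevant to the task.

    A real implementation would ask the LLM; this stub just looks for the
    word being replaced (the text between "Change " and " to ").
    """
    target = task.removeprefix("Change ").split(" to ")[0].lower()
    return [bid for bid, text in blocks.items() if target in text.lower()]


def rewrite_block(task: str, text: str) -> str:
    """Step 2 stand-in: rewrite one block with a naive string replacement."""
    old, new = task.removeprefix("Change ").split(" to ")
    return text.replace(old, new)


def run_task(task: str, blocks: dict[str, str]) -> dict[str, str]:
    """Select relevant blocks, then rewrite only those."""
    updated = dict(blocks)
    for bid in select_blocks(task, blocks):
        updated[bid] = rewrite_block(task, blocks[bid])
    return updated


blocks = {
    "title": "Banana bread",
    "ing-1": "100 g walnuts",
    "step-1": "Fold the walnuts into the batter.",
    "step-2": "Bake for 50 minutes.",
}
print(run_task("Change walnut to almond", blocks))
```

Splitting selection from rewriting keeps each prompt small: the rewrite step never sees blocks the selection step judged irrelevant.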

For the selection step, I initially tried writing a single prompt with detailed instructions. However, I soon realized that few-shot learning, which means providing a few examples instead of explaining every detail, would be easier. An AI engineer at Notion AI also mentioned that few-shot learning can be more beneficial when the instruction is not straightforward [1]. My few-shot prompt looks like this: