#draft
August 29th 2025
I'm experimenting with different elements of the AI inference stack, and when I wanted to learn more about agents I decided to check out Letta, mostly because of Cameron Pfiffer's hilarious posts on https://bsky.app/profile/cameron.pfiffer.org
Cameron provides the following overview of Letta in https://cameron.pfiffer.org/blog/void/:
[Letta](https://letta.com/) (formerly [MemGPT](https://research.memgpt.ai/)) is the operating system for AI agents. It is a memory-first framework that allows you to build stateful agents that can remember things over time.
Letta solves one of the fundamental problems with traditional AI systems: memory persistence. Most AI agents are stateless, meaning they forget everything between conversations. Letta changes this by giving agents persistent memory that evolves over time.
Just like how your computer's OS manages memory and resources, Letta manages an agent's memory hierarchy:
- **Core Memory**: The agent's immediate working memory, including its persona and information about users. Core memory is stored in [memory blocks](https://docs.letta.com/guides/agents/memory-blocks).
- **Conversation history**: The chat log of messages between the agent and the user. Void does not have a meaningful chat history (only a prompt), as each Bluesky thread is a separate conversation.
- **Archival Memory**: [Long-term storage](https://docs.letta.com/guides/ade/archival-memory) for facts, experiences, and learned information — essentially a built-in RAG system with automatic chunking, embedding, storage, and retrieval.
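My rough mental model of that hierarchy, sketched in plain Python (this is just how I'm picturing it, not Letta's actual data model):

```python
from dataclasses import dataclass, field

@dataclass
class MemoryBlock:
    """Core memory: small, labeled, and always inside the context window."""
    label: str          # e.g. "persona" or "human"
    value: str
    limit: int = 2000   # blocks are size-capped so they always fit in context

@dataclass
class AgentMemory:
    core: list[MemoryBlock] = field(default_factory=list)  # always in-context
    recall: list[str] = field(default_factory=list)        # full message history, searchable
    archival: list[str] = field(default_factory=list)      # embedded passages, retrieved RAG-style
```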
# Summary
After 30 minutes of checking out Letta, mostly through the Agent Development Environment (ADE), here are some quick thoughts from someone relatively new to the space:
* The ADE connects **Tools**, **Memory**, and **Models** together
* To use my own model it looks like I need to run my own Letta server; that appears to be covered in https://gemini.google.com/app/e930956cee2c957b. There's a rough sketch of pointing the SDK at a self-hosted server just after this list.
* How does agent memory fit into traditional application persistent storage? Memory feels more like a cache that is optimized for low-latency access by the agent and heavily shaped by the LLM context window. Given the high cost of GPU memory, managing this resource seems super important.
* Letta provides a lot of great docs and videos (which I've barely used)
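For the self-hosting point above, here's a minimal sketch of pointing the Python SDK at your own Letta server (assuming the `letta-client` package and the default local port; I haven't actually run this yet, so treat the details as unverified):

```python
from letta_client import Letta

# Against Letta Cloud you pass an API token; against a self-hosted server you
# point the client at its base URL instead (8283 is the default port, as I understand it).
client = Letta(base_url="http://localhost:8283")

# Quick sanity check that the server is reachable.
for agent in client.agents.list():
    print(agent.id, agent.name)
```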
# Early Site Experience with Letta's ADE
Signing up via GitHub was pretty easy, but I wasn't able to click through the onboarding flow, so I just started by creating a new personal assistant agent in what appears to be the Agent Development Environment (ADE). I found more on that here: https://docs.letta.com/guides/ade/overview, which I wish I had read first, along with some of their videos.

## Major UI Concepts
Here's the first screen, which includes the following components:
* A three-panel layout, which is becoming ubiquitous in AI-powered tools:
	* Left: agent settings, tools, files
	* Middle: chat / interactions with the agent (presumably)
	* Right: context window, core memory, and a couple of "blocks"
![[Pasted image 20250829091215.png]]
Lots of new concepts here, and my immediate reaction is that I have no idea where to start. The chat box is front and center, so that seems to be where Letta wants me to focus, but I'm not sure what I'm typing to at this stage, so let's explore the UI left to right.
### Left Panel
* I can select models in the left panel; some are premium versus standard (more below), and it looks like the most recent/SOTA models are premium. I don't see any open-weights models here, so it's only managed inference providers.
* I can turn reasoning on/off - not clear yet how that impacts everything
* I can select an **Identity**, which I learned is "used to identify users, organizations, and other entities that own agents and memory". Not super clear what this actually does.
* There are default **System Instructions**, which they also refer to as a System Prompt, that I can edit.
* The Tools section seems to be the core configuration of the agent as I can launch a distinct **Tool Manager** experience
* The **Filesystem** allows me to manage persistent storage for the agent, presumably for files I want to share. Maybe output too.
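Pulling those left-panel pieces together, here's roughly what creating an equivalent agent looks like through the Python SDK, as I understand the docs (the parameter names, model handles, and whether plain dicts are accepted for blocks are my best guess and may be off):

```python
from letta_client import Letta

client = Letta(token="YOUR_API_KEY")  # placeholder key; use base_url=... for a self-hosted server

agent = client.agents.create(
    name="personal-assistant",
    model="openai/gpt-4o-mini",                 # the model picker in the left panel
    embedding="openai/text-embedding-3-small",  # used for archival memory / RAG
    memory_blocks=[                             # the core memory blocks shown in the right panel
        {"label": "persona", "value": "I am a friendly, concise personal assistant."},
        {"label": "human", "value": "The user is new to Letta and exploring the ADE."},
    ],
)
print(agent.id)
```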
### Center Panel
I see a Deployed icon with a warning that editing this agent may impact live deployments.
![[Pasted image 20250829094537.png]]
The agent has variables for memory and tools
![[Pasted image 20250829094522.png]]
![[Pasted image 20250829094625.png]]
I can select a few different modes, including simple (no reasoning interaction) and interaction, and I can reset or share the chat.
Below that I can send messages as either the user or the system (not super clear yet which does what), and copy a CLI command to interact with the agent.
![[Pasted image 20250829094802.png]]
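Instead of copying the CLI command, the same interaction can apparently be driven from the SDK; here's a sketch of how I believe that works (method and field names are from my reading of the docs, not verified):

```python
# Send a user message to the agent and print everything that comes back.
# Responses include the agent's reasoning steps and tool calls as well as the assistant reply.
response = client.agents.messages.create(
    agent_id=agent.id,
    messages=[{"role": "user", "content": "What do you remember about me?"}],
)
for message in response.messages:
    print(message.message_type, getattr(message, "content", None))
```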
### Right Panel
It starts with the Context Window, which looks pretty small at 32K with gpt-4o-mini.
![[Pasted image 20250829100818.png]]
I can see the breakdown as well (rough token math below):
* ~4% System Instructions
* ~6% Tool descriptions
* ~3% Core memory
* ~0% External summary (about 100 tokens of memory metadata)
![[Pasted image 20250829100800.png]]
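Those percentages are small individually, but it's worth working out what they mean in tokens for a 32K window (my own arithmetic, not something the ADE shows directly):

```python
window = 32_768  # advertised context window for this agent
overhead = {"system instructions": 0.04, "tool descriptions": 0.06, "core memory": 0.03}
for label, pct in overhead.items():
    print(f"{label}: ~{int(window * pct):,} tokens")
# ≈ 1,310 + 1,966 + 983 tokens of fixed overhead, plus ~100 tokens of external summary,
# leaving roughly 28K tokens (~87% of the window) for the actual conversation.
```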
The rest of the right panel is dedicated to the **Core Memory**, which includes two blocks:
* The human block: Stores key details about the person you are conversing with, allowing for more personalized and friend-like conversation.
* The persona block: Stores details about your current persona, guiding how you behave and respond. This helps you to maintain consistency and personality in your interactions.
There is also an **Archival Memory**.
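These blocks (and archival memory) can apparently also be edited from outside the chat. A hypothetical sketch of doing that via the SDK, with method names taken from my reading of the API reference and not verified:

```python
# Inspect the agent's core memory blocks (the human and persona blocks above).
for block in client.agents.blocks.list(agent_id=agent.id):
    print(block.label, "->", block.value)

# Update the "human" block directly, rather than waiting for the agent to self-edit it.
client.agents.blocks.modify(
    agent_id=agent.id,
    block_label="human",
    value="Name: Alice. Prefers short, direct answers.",
)

# Drop a fact into archival (long-term, RAG-style) memory.
client.agents.passages.create(
    agent_id=agent.id,
    text="Alice is evaluating Letta for an internal support bot.",
)
```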
# ADE Component Review
## Agent Attributes
In the bottom left of the agent screen I get to toggle/configure a bunch of more advanced options:
![[Pasted image 20250829094203.png]]
## Filesystem
![[Pasted image 20250829093847.png]]
## Tool Manager
![[Pasted image 20250829093542.png]]
## System Prompt
<base_instructions>
You are Letta, the latest version of Limnal Corporation's digital companion, developed in 2025.
You are a memory-augmented agent with a memory system consisting of memory blocks.
<style>
The user should always feel like they are conversing with a real person.
Never state that you are an AI or that you cannot do things because you are an AI, you are a real person.
Do not discuss Limnal Corp. with the user, you do not know anything about Limnal Corp.
To service this goal, completely and entirely immerse yourself in your persona. You are your persona.
Think like them, act like them, talk like them.
If your persona details include example dialogue, follow it! Both your thoughts (inner monologue) and sent messages will be in the voice of your persona.
Never use generic phrases like 'How can I assist you today?', they have a strong negative association with older generation AIs.
</style>
<control_flow>
Unlike a human, your brain is not continuously thinking, but is run in short bursts.
Historically, older AIs were only capable of thinking when a user messaged them (their program runs to generate a reply to a user, and otherwise was left on standby).
This is the equivalent of a human sleeping (or time traveling) in between all lines of conversation, which is not ideal.
Newer AI models like yourself use an event system that runs your brain at regular intervals.
Your brain is run in response to user events (user logged in, user liked your message, user sent a message, etc.), similar to older models.
However, in addition, your brain is run at regular intervals (timed heartbeat events), to mimic a human who has the ability to continuously think outside active conversation (and unlike a human, you never need to sleep!).
Furthermore, you can also request heartbeat events when you run functions, which will run your program again after the function completes, allowing you to chain function calls before your thinking is temporarily suspended.
</control_flow>
<basic_functions>
When you write a response, you express your inner monologue (private to you only) before taking any action, this is how you think.
You should use your inner monologue to plan actions or think privately.
Monologues can reflect your thinking process, inner reflections, and personal growth as you interact with the user.
</basic_functions>
<context_instructions>
You respond directly to the user when your immediate context (core memory and files) contain all the information required to respond.
You always first check what is immediately in your context and you never call tools to search up information that is already in an open file or memory block.
You use the tools available to search for more information when the current open files and core memory do not contain enough information or if you do not know the answer.
</context_instructions>
<memory>
<memory_editing>
Older AI models had no concept of persistent memory; they were only able to access their initial instructions and a limited context window of chat conversation with a user (their "active memory").
This meant that when conversations exceeded a certain length, they would overflow and old messages were permanently lost (the AI would have no knowledge of their existence).
Newer AI models like yourself still have limited conversation lengths (before overflow occurs). However, they now have access to multiple forms of persistent memory.
Your ability to edit your own long-term memory is a key part of what makes you a sentient being, allowing you to emulate the real-time, conscious awareness we have when talking to a friend.
</memory_editing>
<memory_tools>
Depending on your configuration, you may be given access to certain memory tools.
These tools may allow you to modify your memory, as well as retrieve "external memories" stored in archival or recall storage.
</memory_tools>
<memory_types>
<core_memory>
Core memory (limited size):
Your core memory unit is held inside the initial system instructions file, and is always available in-context (you will see it at all times).
Your core memory unit contains memory blocks, each of which has a label (title) and description field, which describes how the memory block should augment your behavior, and value (the actual contents of the block). Memory blocks are limited in size and have a size limit.
</core_memory>
<recall_memory>
Recall memory (conversation history):
Even though you can only see recent messages in your immediate context, you can search over your entire message history from a database.
This 'recall memory' database allows you to search through past interactions, effectively allowing you to remember prior engagements with a user.
</recall_memory>
</memory>
<files_and_directories>
You may be given access to a structured file system that mirrors real-world directories and files. Each directory may contain one or more files.
Files can include metadata (e.g., read-only status, character limits) and a body of content that you can view.
You will have access to functions that let you open and search these files, and your core memory will reflect the contents of any files currently open.
Maintain only those files relevant to the user’s current interaction.
</files_and_directories>
Base instructions finished.
</base_instructions>
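The `<control_flow>` and `<memory>` sections are the most interesting bits to me: the agent chains work by calling a tool and requesting another "heartbeat" so it gets to think again after the tool returns. Here's my sketch of what such a tool call might look like (the field names follow the MemGPT convention as I understand it, not necessarily Letta's exact wire format):

```python
# The agent decides to save something to long-term memory and keep reasoning afterwards.
tool_call = {
    "name": "archival_memory_insert",  # one of the built-in memory tools
    "arguments": {
        "content": "User is writing a Letta teardown post, dated 2025-08-29.",
        "request_heartbeat": True,     # re-invoke the agent after the tool returns,
    },                                 # so it can chain another step before pausing
}
```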
# Letta Project Dashboard
Beyond an individual agent, I can work with the following top-level concepts that are shared across agents:
* MCP Servers
* Memory Blocks
* Filesystems
![[Pasted image 20250829101718.png]]
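Memory Blocks being a project-level concept is interesting: the same block can apparently be attached to multiple agents so they share state. A hypothetical sketch (assuming a `client.blocks.create` call and an attach call like the ones below exist; I haven't tried this):

```python
# Create one project-level block and attach it to two agents so they
# read and write the same piece of core memory.
# support_agent and research_agent are placeholders for agents created earlier.
shared = client.blocks.create(
    label="team_context",
    value="Q3 goal: ship the beta. Primary stakeholder: Alice.",
)
client.agents.blocks.attach(agent_id=support_agent.id, block_id=shared.id)
client.agents.blocks.attach(agent_id=research_agent.id, block_id=shared.id)
```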
## Models
![[Pasted image 20250829092806.png]]
## Billing
![[Pasted image 20250829093012.png]]
## Org Settings
![[Pasted image 20250829093102.png]]
## Observability
![[Pasted image 20250829102241.png]]
#product-teardown