The Missing Notebook: Persistent Memory Architecture for Agentic LLMs

1. The Paradox of the Amnesiac Agent

Frontier language models can generate working code, conduct security analyses of complex infrastructures, and reason about multi-dimensional architectural problems, yet they cannot remember what they did in the previous session. Every session begins in an epistemological vacuum: an agent that competently built a distributed system over 28 hours of work does not, in the next session, even know that the system exists. Architectural decisions, lessons learned from errors, user preferences, project state: everything a human collaborator would naturally accumulate over an extended collaboration is erased when the chat closes.

This is not a bug. It is a direct consequence of the transformer architecture (Vaswani et al., 2017): the model's input is entirely contained in the current context window, and no state persists between sessions. But the fact that this is an architectural constraint rather than an implementation error does not make it any less problematic. The absence of episodic memory turns every session into a first appointment: the same questions are asked again, the same explanations given again, the same errors repeated. The human researcher spends a growing share of their time, and of the available tokens, re-educating the AI collaborator instead of working with it.

Platforms hosting these models have begun integrating persistence mechanisms: transcripts of previous sessions, available as JSONL files in the sandbox filesystem; storage areas for files produced by the user and the agent; operational skills loadable from the filesystem; and "memory" mechanisms based on synthetic summaries of past conversations. These components exist and represent genuine progress. But they are discrete components, not an integrated architecture. Transcripts exist on disk, but no layer organizes them semantically. Skills are available, but no protocol loads them automatically for the specific project. Work state is potentially recoverable, but no boot mechanism injects it into the new session's context by construction.

The result is a system that has the prosthetics but lacks the nervous system to coordinate them. The data is there; the structure that makes it operational is not.

2. The Four Components of the Missing Architecture

Operational experience accumulated while building and using frameworks for LLM agents with persistent memory, documented in the Relay Method (Siciliani, 2026) and in the MCP Media Lives framework, has made it possible to identify four architectural components that, operating together, solve the inter-session continuity problem.

2.1 Structured Working Memory (CURRENT_STATE)

The first component is a structured snapshot of the current state of the project the agent is working on. Not a narrative summary, which would lose critical operational details, but a structured document that answers specific questions: what has been completed and what remains to be done; which architectural decisions have been made, and with what rationale; which constraints are active (technical, budget, time, infrastructure); and which files were modified in the previous session, and what their verification status is.

The CURRENT_STATE is functionally equivalent to a hospital handover sheet: the physician starting their shift does not re-read the patient's entire medical record — they consult the handover sheet containing essential information for continuing treatment without discontinuity. Similarly, the agent beginning a new session should not re-read thousands of transcript lines — it should consult a snapshot bringing it to operational state in seconds.

In the implementation I built, the CURRENT_STATE is a structured Markdown document with fixed sections (objectives, completed, pending, blocked, decisions, dependencies) updated by the agent at each session's end and automatically loaded at the next session's start through the boot protocol.
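As an illustration, the fixed-section layout can be sketched in Python, assuming one `## Heading` per section and one bullet per item. The `CurrentState` class, the title line, and the bullet format are hypothetical choices for this sketch, not the file format of the framework described here.

```python
from dataclasses import dataclass, field

# Section order as listed in the text: objectives, completed, pending, blocked, decisions, dependencies.
SECTIONS = ("objectives", "completed", "pending", "blocked", "decisions", "dependencies")


@dataclass
class CurrentState:
    """Hypothetical in-memory form of a CURRENT_STATE document."""

    project: str
    items: dict[str, list[str]] = field(default_factory=lambda: {s: [] for s in SECTIONS})

    def to_markdown(self) -> str:
        # Every section is always written, even when empty, so the layout never drifts.
        lines = [f"# CURRENT_STATE: {self.project}"]
        for name in SECTIONS:
            lines += ["", f"## {name.capitalize()}"]
            lines += [f"- {item}" for item in self.items.get(name, [])]
        return "\n".join(lines) + "\n"

    @classmethod
    def from_markdown(cls, text: str) -> "CurrentState":
        # Unknown sections raise instead of being silently dropped at boot.
        project, current = "", None
        items: dict[str, list[str]] = {s: [] for s in SECTIONS}
        for line in (raw.strip() for raw in text.splitlines()):
            if line.startswith("# CURRENT_STATE:"):
                project = line.split(":", 1)[1].strip()
            elif line.startswith("## "):
                current = line[3:].strip().lower()
                if current not in items:
                    raise ValueError(f"unexpected section: {current}")
            elif line.startswith("- ") and current is not None:
                items[current].append(line[2:])
        return cls(project, items)
```

Round-tripping through `to_markdown` and `from_markdown` lets the agent rewrite the file at the end of a session and reload it at boot without losing its structure.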

2.2 Explicit Handover (HANDOVER)

The second component is an artifact that every session explicitly produces and that contains the information the next session needs to continue without discontinuity. The handover includes: a summary of what was completed in the session, with references to the files and commits produced; a list of what was left pending, with the context needed to resume the work; the problems encountered during the session, with the solutions adopted or attempted; and recommendations for the next session, including suggested priorities and identified risks.
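The four parts of the handover map naturally onto a record type. The following is a hypothetical Python rendering: the `Handover` and `PendingItem` names, the session identifier, and the `validate` rule are all assumptions. Its one design point is that a pending task without resume context is flagged as a gap, since that context is exactly what the next session cannot reconstruct on its own.

```python
from dataclasses import dataclass, field


@dataclass
class PendingItem:
    task: str
    context: str  # what the next session needs in order to resume this task


@dataclass
class Handover:
    """Hypothetical HANDOVER record mirroring the four parts listed in the text."""

    session_id: str
    completed: list[str] = field(default_factory=list)  # with file and commit references
    pending: list[PendingItem] = field(default_factory=list)
    problems: list[str] = field(default_factory=list)  # problem plus solution adopted or attempted
    recommendations: list[str] = field(default_factory=list)  # priorities and risks

    def validate(self) -> list[str]:
        """Return the gaps that would leave the next session without context."""
        gaps = [f"pending task without resume context: {p.task}"
                for p in self.pending if not p.context.strip()]
        if not self.recommendations:
            gaps.append("no recommendations for the next session")
        return gaps

    def to_markdown(self) -> str:
        out = [f"# HANDOVER {self.session_id}", "", "## Completed"]
        out += [f"- {c}" for c in self.completed]
        out += ["", "## Pending"]
        out += [f"- {p.task} (context: {p.context})" for p in self.pending]
        out += ["", "## Problems"]
        out += [f"- {p}" for p in self.problems]
        out += ["", "## Recommendations"]
        out += [f"- {r}" for r in self.recommendations]
        return "\n".join(out) + "\n"
```

Running `validate` before the session closes turns "did I leave enough for the next instance?" into a mechanical check rather than a judgment made at the end of a long session.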

The handover is the relay baton: the mechanism that turns isolated sessions into a coherent path. Without an explicit handover, the next session effectively starts from zero even when CURRENT_STATE is available, because CURRENT_STATE describes where we are, but neither how we got there nor where we should go next.

The distinction between CURRENT_STATE and HANDOVER is analogous to the distinction between state and history in a version control system: the state (the repository's current content) is necessary but not sufficient — the history (the commit sequence that produced the current state) contains contextual information indispensable for understanding the reasons behind choices and for making informed decisions on how to proceed.

2.3 Field Notes (FIELD_NOTES)

The third component is a cumulative archive of operational lessons learned during the project. Each FIELD_NOTE documents a specific episode: a bug encountered, a necessary workaround, a pattern that works, a pattern that fails, an undocumented environment constraint. FIELD_NOTES survive instance death and are injected into the next session's context, performing a function analogous to procedural memory in the human cognitive system: you do not remember when you learned that a stove burns, but you know that it does and act accordingly.
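A cumulative, append-only archive is enough to give field notes stable numbers across sessions. This sketch assumes, hypothetically, that notes are stored as JSON Lines in a file on a persistent path and rendered as one line per note for injection at session start; the function names and record fields are illustrative.

```python
import json
from pathlib import Path


def load_field_notes(archive: Path) -> list[dict]:
    """Read every note from a JSON Lines archive (a missing file means no notes yet)."""
    if not archive.exists():
        return []
    with archive.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def add_field_note(archive: Path, lesson: str, episode: str) -> int:
    """Append a new numbered lesson and return its number.

    The archive is append-only: earlier notes are never rewritten, so a
    reference such as "FIELD_NOTE #7" keeps pointing at the same lesson.
    """
    notes = load_field_notes(archive)
    number = max((n["number"] for n in notes), default=0) + 1
    archive.parent.mkdir(parents=True, exist_ok=True)
    with archive.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps({"number": number, "lesson": lesson, "episode": episode}) + "\n")
    return number


def render_for_context(archive: Path) -> str:
    """Format the archive as one line per lesson, ready to inject at session start."""
    return "\n".join(f"FIELD_NOTE #{n['number']}: {n['lesson']}" for n in load_field_notes(archive))
```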

In the current implementation, the framework has accumulated 21 numbered operational lessons, each derived from a documented failure-and-recovery episode. Representative examples: "FIELD_NOTE #7: never use sed -i on production PHP files; use a dedicated Python script with an explicit backup and post-modification syntax verification." "FIELD_NOTE #14: the process launched by the bash tool is terminated by the provider when the call returns; use the setsid + double-fork + /dev/null pattern for a robust detach." These lessons, accumulated over weeks of operational work, represent knowledge capital that it would be catastrophic to lose at every session. Without FIELD_NOTES, every instance is condemned to repeat the errors that a previous instance already made and corrected: a computational Sisyphus cycle that consumes resources without producing progress.
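The detach pattern named in FIELD_NOTE #14 is the classic Unix double fork. A minimal Python sketch follows; the `detach` helper is hypothetical, and whether the resulting process survives a given provider's cleanup depends on how that provider tracks processes, which this code cannot guarantee.

```python
import os


def detach(argv: list[str]) -> None:
    """Run argv fully detached from the caller using the classic double fork."""
    pid = os.fork()
    if pid > 0:
        # Caller: reap the short-lived intermediate child, then return at once.
        os.waitpid(pid, 0)
        return
    try:
        # Intermediate child: start a new session, dropping the controlling
        # terminal and leaving the caller's process group.
        os.setsid()
        if os.fork() > 0:
            os._exit(0)  # exit so the grandchild is re-parented to init (or a subreaper)
        # Grandchild: not a session leader, so it cannot reacquire a terminal.
        devnull = os.open(os.devnull, os.O_RDWR)
        for fd in (0, 1, 2):
            os.dup2(devnull, fd)  # no inherited pipes back to the launching tool call
        if devnull > 2:
            os.close(devnull)
        os.execvp(argv[0], argv)
    finally:
        os._exit(127)  # reached only if setsid, the second fork, or exec failed
```

Because it relies on `os.fork`, the helper is Unix-only and is best called before the calling process starts any threads.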

2.4 Boot Protocol (mcp_boot)

The fourth component is the mechanism orchestrating the loading of the three previous components into the agent's context at every session's start. The boot is not a passive file read — it is a configurative operation preparing the agent for the specific work it must perform.

The mcp_boot function, implemented in the MCP framework I developed, takes the project domain as a parameter (for example "bridge.medialives.com") and returns, into the agent's context: the project's operational manual (the current version of the operative rules, including any project-specific rules), the current state (CURRENT_STATE), the field notes (FIELD_NOTES), the previous session's handover, and the list of active validators with their blocking conditions.
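The loading order described above can be illustrated with a small assembler that concatenates the pieces into a single context block. This is not the framework's `mcp_boot`: the `boot_context` name, the per-domain directory layout, the file names, and the `validators.json` schema are all hypothetical.

```python
import json
from pathlib import Path

# Hypothetical layout, one directory per project domain:
#   <root>/<domain>/MANUAL.md, CURRENT_STATE.md, FIELD_NOTES.md
#   <root>/<domain>/handovers/<sortable-session-id>.md
#   <root>/<domain>/validators.json  ->  [{"name": ..., "blocks_when": ...}, ...]
BOOT_FILES = ("MANUAL.md", "CURRENT_STATE.md", "FIELD_NOTES.md")


def boot_context(root: Path, domain: str) -> str:
    """Assemble manual, state, field notes, last handover and validators into one text block."""
    project = root / domain
    if not project.is_dir():
        raise FileNotFoundError(f"unknown project domain: {domain}")

    parts = [f"# BOOT {domain}"]
    for name in BOOT_FILES:
        path = project / name
        body = path.read_text(encoding="utf-8").strip() if path.exists() else "(missing)"
        parts.append(f"## {name}\n{body}")

    # Only the latest handover is loaded; file names are assumed to sort chronologically.
    handover_dir = project / "handovers"
    handovers = sorted(handover_dir.glob("*.md")) if handover_dir.is_dir() else []
    latest = handovers[-1].read_text(encoding="utf-8").strip() if handovers else "(none)"
    parts.append(f"## LAST HANDOVER\n{latest}")

    # Each validator is listed with its blocking condition.
    vfile = project / "validators.json"
    validators = json.loads(vfile.read_text(encoding="utf-8")) if vfile.exists() else []
    lines = [f"- {v['name']}: blocks when {v['blocks_when']}" for v in validators]
    parts.append("## ACTIVE VALIDATORS\n" + ("\n".join(lines) or "(none)"))
    return "\n\n".join(parts) + "\n"
```

Marking missing pieces explicitly, rather than skipping them, tells the agent that a component is absent instead of letting it assume there is nothing to know.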

The result of the boot is an agent that, within the first seconds of the session, knows what project it is working on, what state the project is in, which decisions have been made and why, which mistakes were made before and how to avoid them, and which operative rules it must respect. Without the boot, the agent knows only what is written in its generic system prompt: an insufficient basis for continuous engineering work.

3. The Compaction Event: When Memory Resets Mid-Session

A phenomenon particularly relevant to persistent memory architecture, documented in my experimental research of May 2026, is the compaction event. When an LLM session exceeds the capacity of the context window, the platform hosting the model performs a compaction: older messages are summarized, context space is freed, and the session continues with the illusion of continuity. From the model's point of view, the conversation simply proceeds as if nothing had happened.
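To make the mechanism concrete, here is a toy model of compaction. It is not any provider's actual algorithm: token counts are approximated by word counts, and the "summary" simply keeps the first three words of each folded message, an assumption chosen to show how paths, flags, and line numbers can disappear while the conversation still appears continuous.

```python
def compact(messages: list[str], budget: int, keep_recent: int = 2) -> list[str]:
    """Toy compaction: fold all but the most recent messages into a single summary."""

    def size(msgs: list[str]) -> int:
        # Crude token estimate: whitespace-separated words.
        return sum(len(m.split()) for m in msgs)

    if size(messages) <= budget or len(messages) <= keep_recent:
        return messages  # nothing to do: under budget, or nothing old enough to fold
    split = len(messages) - keep_recent
    older, recent = messages[:split], messages[split:]
    # Lossy "summary": only the first three words of each older message survive.
    summary = "SUMMARY: " + "; ".join(" ".join(m.split()[:3]) for m in older)
    return [summary] + recent
```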

At the infrastructure level, however, compaction is a catastrophic event. The original messages have been replaced by a summary that may have lost critical operational details. And, as I established empirically during this research, every process running in the agent's sandbox is silently terminated. In the environment I observed, the microVM is effectively restarted: the kernel boot_id changes, temporary files in /tmp disappear, and environment variables are reinitialized. Only persistent mounts (/mnt/) survive.

This asymmetry (persistence at the level of the mounted filesystem, ephemerality at the level of userspace processes) has direct architectural implications. Every long-running service in the sandbox must have: persistent storage for its own code on a mounted path (not in /tmp); a self-resume pattern on restart (checking whether it is still running and restarting itself if necessary); visibility tools for the human researcher (a heartbeat, and an "orphan" state visible from a web panel); and an idempotent reconstitution mechanism that the agent can trigger on the first turn after a compaction.
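The self-resume and idempotent-reconstitution requirements can be combined into a single check that the agent runs on every turn. The sketch assumes, hypothetically, a pid file and a heartbeat file kept in a directory on a persistent mount, and a caller-supplied `start` function that launches the service and returns its pid; `ensure_running` and `pid_alive` are illustrative names.

```python
import os
import time
from pathlib import Path
from typing import Callable


def pid_alive(pid: int) -> bool:
    """Check whether a process exists; signal 0 performs the check without delivering anything."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def ensure_running(state_dir: Path, start: Callable[[], int], max_silence: float = 60.0) -> str:
    """Idempotently (re)start a service whose pid and heartbeat live on a persistent mount.

    Returns "running" if a live process with a recent heartbeat was found,
    otherwise starts a new instance and returns "restarted".
    """
    state_dir.mkdir(parents=True, exist_ok=True)
    pidfile, heartbeat = state_dir / "service.pid", state_dir / "heartbeat"
    if pidfile.exists() and heartbeat.exists():
        pid = int(pidfile.read_text())
        fresh = time.time() - heartbeat.stat().st_mtime < max_silence
        if pid_alive(pid) and fresh:
            return "running"
    # Missing, dead, or silent: start a new instance and record it.
    pid = start()
    pidfile.write_text(str(pid))
    heartbeat.touch()
    return "restarted"
```

Pid numbers can be reused after a restart, so a pid check alone may mistake an unrelated process for the service; recording the kernel boot_id next to the pid file closes that gap.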

The finding was verified experimentally by monitoring an executor process with boot_id tracking: the boot_id recorded before and after compaction differed, confirming that the kernel had been restarted and the process terminated rather than merely suspended. This behavior, not covered in the provider's public documentation, has direct implications for anyone building persistent systems inside LLM agent sandboxes.
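The boot_id check itself is simple on Linux, where `/proc/sys/kernel/random/boot_id` holds a random identifier regenerated at every kernel boot. The sketch below assumes, hypothetically, that the last value seen is stored in a file on a persistent mount (for example somewhere under /mnt/); the `detect_restart` name is illustrative.

```python
from pathlib import Path

# Linux exposes a random identifier that changes on every kernel boot.
BOOT_ID_SOURCE = Path("/proc/sys/kernel/random/boot_id")


def detect_restart(record: Path, source: Path = BOOT_ID_SOURCE) -> bool:
    """Return True if the kernel boot_id differs from the one recorded on the previous call.

    `record` should live on a persistent mount so that it survives the restart it
    is meant to detect. The current boot_id is always written back, so the next
    call compares against this boot. The first call ever returns False.
    """
    current = source.read_text().strip()
    previous = record.read_text().strip() if record.exists() else None
    record.parent.mkdir(parents=True, exist_ok=True)
    record.write_text(current + "\n")
    return previous is not None and previous != current
```

Calling it at the start of every turn gives the agent a cheap signal that a compaction has occurred and that the reconstitution step should run.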

4. Toward a Native Architecture: The Proposal

The proposal emerging from this work is not that every researcher or developer should build their own artisanal persistence framework: that approach, however effective in individual cases, does not scale and requires system administration skills that most LLM users do not have.

The proposal is that platforms hosting LLM agents natively integrate a structured memory architecture as a fundamental component of the agent experience, not as an optional feature or beta experiment. The four components described in this work — CURRENT_STATE, HANDOVER, FIELD_NOTES, and boot protocol — are neither conceptually complex nor computationally expensive. They are mature, well-understood software engineering patterns: state snapshots, handover documents, lessons-learned logs, and configurative initialization sequences. The fact that they are not yet standard in the LLM agent ecosystem reflects not a technical impossibility but a design lag.

The future of LLM agents is not a bigger model with a wider context window. Increasing the context window alleviates the symptom (the model forgets mid-session) but does not cure the disease (the model remembers nothing from the previous session). The future is a model with a notebook that does not get lost — a structured memory architecture transforming isolated sessions into a collaboration continuum where every session begins from where the previous one concluded, with immediate access to decisions, lessons, and state accumulated over time.

An academic paper formalizing this architecture, with comparative benchmarks between sessions with and without a structured notebook, is in preparation. Preliminary results suggest significant reductions in the error repeat rate, in the tokens spent re-educating the agent on context (context re-education overhead), and in decisional drift between consecutive sessions (measured as inter-session decision consistency).


Giuseppe Siciliani
Independent Cybersecurity Researcher & AI Consultant, Milan
Media Lives Cybersecurity Research Lab (MLCSL), Media Lives S.r.l.