
Five Agent Memory Types in LangGraph: A Deep Code Walkthrough (Part 2)

DEV Community · by Seenivasa Ramadurai · April 3, 2026 · 31 min read


In Part-1 [https://dev.to/sreeni5018/the-5-types-of-ai-agent-memory-every-developer-needs-to-know-part-1-52fn] we covered the five memory types, why the LLM is stateless by design, and why memory is always an infrastructure concern. This post is the how. Same five types, but now we wire each one up with LangGraph, dissect every line of code, flag the gotchas, and leave you with a single working script you can run today.

Before We Write a Single Line: Two Things You Must Understand

  • The Context Window Is the Only Reality. Repeat this like a mantra: the model only knows what is in the context window at inference time. Every token (your message, retrieved facts, conversation history, tool results, system instructions) has to be physically present in that window at the moment of the call. If it is not there, the model does not know it exists. Your memory infrastructure's entire job is to decide what goes in, when, and in what form.

  • Checkpointer ≠ Store. This confusion breaks designs. LangGraph gives you two distinct persistence hooks, and mixing them up is the most common architecture mistake beginners make.

The practical consequence: if you store a user preference in the checkpointer (i.e., in state["messages"]), it vanishes the moment you start a new thread_id. If you store it in the store, it is there regardless of which thread the user returns on. Choose deliberately.
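The scoping rule is easy to see with two plain dicts standing in for the two backends (a toy model, not LangGraph's actual checkpointer or store implementations):

```python
# Toy model of LangGraph's two persistence scopes (illustrative only).
checkpointer = {}  # thread_id -> message list      (per-thread scope)
store = {}         # (user, key) -> value           (cross-thread scope)

def turn(thread_id: str, message: str) -> list[str]:
    """Append a message to this thread's checkpointed history."""
    history = checkpointer.setdefault(thread_id, [])
    history.append(message)
    return history

# Thread A: the preference is saved in BOTH places.
turn("thread-a", "I prefer bullet answers")
store[("demo-user", "style")] = "bullets"

# Thread B: a new thread_id means the checkpointed history is empty...
assert checkpointer.get("thread-b") is None
# ...but the store record is still visible across threads.
assert store[("demo-user", "style")] == "bullets"
```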

For local production setups you typically use SQLite for both, as two separate files:

  • SqliteSaver → durable per-thread checkpoint history
  • SqliteStore → durable cross-thread LTM/episodic records

The demos below use InMemory* backends so you can run them with zero setup. That is a teaching choice, not a recommendation for production.

Environment Setup

```bash
pip install langgraph langchain-openai langchain-community faiss-cpu python-dotenv

export OPENAI_API_KEY=sk-...
export OPENAI_CHAT_MODEL=gpt-4o-mini  # optional, this is the default
```

macOS note: If you have PyTorch installed alongside FAISS, two OpenMP runtimes may be loaded and Python will abort on import. The fix is one line: os.environ.setdefault("KMP_DUPLICATE_LIB_OK", "TRUE") — set it before importing FAISS. The full script at the end does this automatically.

```python
from __future__ import annotations

import os
os.environ.setdefault("KMP_DUPLICATE_LIB_OK", "TRUE")  # Must be before FAISS import

import operator
import sys
import uuid
from pathlib import Path
from typing import Annotated, TypedDict

from dotenv import load_dotenv
from langchain_core.documents import Document
from langchain_core.messages import AIMessage, BaseMessage, HumanMessage, SystemMessage
from langchain_core.tools import tool
from langchain_community.vectorstores import FAISS
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langgraph.checkpoint.memory import InMemorySaver
from langgraph.config import get_store
from langgraph.graph import END, START, MessagesState, StateGraph
from langgraph.graph.message import add_messages
from langgraph.prebuilt import ToolNode, tools_condition
from langgraph.store.memory import InMemoryStore
```

Memory Type 1: Short-Term Memory (STM) — The Conversation Buffer

What it is

Short-term memory (STM) is the rolling transcript of the current conversation. It is what allows the model to understand "make it shorter" without you specifying what "it" refers to. Every prior message in the session is assembled into the context window on each subsequent call.

```python
def demo_short_term_memory(llm: ChatOpenAI) -> None:
    """
    Short-term memory = this thread's message list, restored by the checkpointer.

    The same thread_id on each invoke reloads prior turns into state["messages"]
    so the model sees continuity without you manually merging history.
    """

    def chat(state: MessagesState) -> dict:
        # state["messages"] already contains ALL prior turns for this thread_id,
        # restored from the checkpoint. We pass the full list to the LLM.
        return {"messages": [llm.invoke(state["messages"])]}

    graph = StateGraph(MessagesState)
    graph.add_node("model", chat)
    graph.add_edge(START, "model")
    graph.add_edge("model", END)

    # Compile with a checkpointer. Without this, state is not saved between invokes.
    app = graph.compile(checkpointer=InMemorySaver())

    tid = "session-stm-demo"
    cfg: dict = {"configurable": {"thread_id": tid}}

    # First turn: store the codename.
    app.invoke({"messages": [HumanMessage("My codename for this session is Bluejay.")]}, cfg)

    # Second turn: only the new message is passed in.
    # The checkpointer reloads the first turn automatically.
    out = app.invoke({"messages": [HumanMessage("What codename did I give?")]}, cfg)
    print("[STM] Last reply:", out["messages"][-1].content)
```

Line-by-line breakdown

def chat(state: MessagesState) -> dict: This is the only node in the graph. MessagesState is a TypedDict with one key: messages. By the time this function executes on the second invoke, state["messages"] already contains the full history: the original "My codename…" message, the model's reply to it, and the new "What codename…" message. The checkpointer loaded the prior checkpoint and the add_messages reducer merged the new input on top.

app = graph.compile(checkpointer=InMemorySaver()) This is the critical line. Without checkpointer=, each invoke starts with an empty state. With it, LangGraph saves a snapshot after every node completes and restores it at the start of the next invoke for the same thread_id.

cfg: dict = {"configurable": {"thread_id": tid}} This config dict is how you identify which conversation thread this call belongs to. The same thread_id = same checkpoint = continuity. A different thread_id = blank slate. This is intentional — you support multiple concurrent users by giving each a unique thread_id.

app.invoke({"messages": [HumanMessage("What codename did I give?")]}, cfg) Notice we only pass the new message. We do not rebuild the history manually. The checkpointer and the add_messages reducer do that for us.

The token budget problem and how to handle it

STM has one fundamental weakness: as the conversation grows, the context window fills up. For production systems you have two standard strategies:

  • Truncation — drop the oldest messages once you exceed a token threshold. Simple, but the model loses early context.

  • Summarization — periodically ask the LLM to write a running summary of the conversation so far, then replace the old messages with that summary. More expensive, but preserves the gist.

LangGraph does not do this automatically for you. You would add a summarization node that fires conditionally when len(state["messages"]) exceeds a threshold.
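As a minimal, framework-free sketch of the truncation strategy (the len() // 4 token estimate is a rough placeholder heuristic, not a real tokenizer):

```python
def estimate_tokens(messages: list[str]) -> int:
    # Rough heuristic: ~4 characters per token. Use a real tokenizer
    # (e.g. tiktoken) in production.
    return sum(len(m) // 4 for m in messages)

def maybe_truncate(messages: list[str], budget: int) -> list[str]:
    # Drop the oldest messages until the estimate fits the budget.
    while len(messages) > 1 and estimate_tokens(messages) > budget:
        messages = messages[1:]
    return messages

history = ["a" * 120, "b" * 120, "c" * 120]  # ~30 "tokens" each
trimmed = maybe_truncate(history, budget=70)
print(len(trimmed))  # -> 2 (the oldest message was dropped)
```

In LangGraph this check would live in a node that runs before the model call and returns a trimmed (or summarized) messages update.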

Production upgrade

Swap InMemorySaver() for SqliteSaver.from_conn_string("checkpoints.db") and thread history survives process restarts. Swap for AsyncPostgresSaver for a cloud-deployed, multi-instance setup.

Memory Type 2: Long-Term Memory (LTM) — Cross-Thread Persistence

What it is

Long-term memory (LTM) solves the problem that checkpoints can't: persistence across different thread_id values. When a user returns next week in a new session (new thread_id), their preferences, constraints, and facts should still be available. That requires the store.

```python
def demo_long_term_memory(llm: ChatOpenAI) -> None:
    """
    Long-term memory = LangGraph Store: keyed data shared across thread_ids.

    Checkpoints reset per thread; store.put / get survives that boundary.
    """

    def remember_node(state: MessagesState) -> dict:
        # get_store() is injected by LangGraph at runtime because the graph
        # was compiled with store=. Do not pass the store as a function argument.
        store = get_store()

        # Namespace is a tuple of strings — like a file path for your data.
        # ("users", "demo-user", "facts") scopes this record to one user.
        ns = ("users", "demo-user", "facts")

        last = state["messages"][-1]
        text = last.content if isinstance(last.content, str) else str(last.content)

        if text.lower().startswith("remember:"):
            # Extract the fact and store it under key "profile" in this namespace.
            fact = text.split(":", 1)[1].strip()
            store.put(ns, "profile", {"text": fact})
            return {"messages": [AIMessage(content=f"Stored: {fact}")]}

        # For any other query, retrieve the stored fact and inject it as context.
        item = store.get(ns, "profile")
        fact = item.value.get("text", "") if item else ""

        # The retrieved fact goes into a SystemMessage so it conditions the reply
        # without appearing as part of the user's message.
        msg = llm.invoke([
            SystemMessage(content=f"Stored user fact (long-term): {fact or 'none'}"),
            HumanMessage(content=text),
        ])
        return {"messages": [msg]}

    graph = StateGraph(MessagesState)
    graph.add_node("agent", remember_node)
    graph.add_edge(START, "agent")
    graph.add_edge("agent", END)

    store = InMemoryStore()
    app = graph.compile(checkpointer=InMemorySaver(), store=store)

    # Thread A: store the user's preference.
    app.invoke(
        {"messages": [HumanMessage("Remember: I always want concise bullet answers.")]},
        {"configurable": {"thread_id": "ltm-a"}},
    )

    # Thread B: completely different thread_id. No shared checkpoint history.
    # But store.get still finds the preference stored under the same namespace.
    out = app.invoke(
        {"messages": [HumanMessage("What style do I prefer?")]},
        {"configurable": {"thread_id": "ltm-b"}},
    )
    print("[LTM] Reply on a different thread_id:", out["messages"][-1].content)
```

Line-by-line breakdown

store = get_store()

This is not a traditional module-level lookup: get_store() is called inside the node function at runtime. LangGraph's execution engine makes the compiled store available via this call. Using the store object from the outer scope directly inside a node happens to work in this simple example, but get_store() is the correct pattern for production because it handles async contexts and subgraph injection correctly.

ns = ("users", "demo-user", "facts") Namespaces are tuples of strings. Think of them as a path in a key-value hierarchy. You could have ("users", user_id, "facts") for facts, ("users", user_id, "episodes") for events, and ("global", "config") for shared config. The store does not enforce any schema — the structure is entirely yours.

store.put(ns, "profile", {"text": fact}) Three arguments: namespace tuple, key string, value dict. The value must be JSON-serializable. Here we use a single "profile" key which gets overwritten each time. For multi-fact storage you'd use a unique key per fact (perhaps the fact's text, hashed, or a UUID).

item = store.get(ns, "profile")

Returns an Item object (or None if the key does not exist). The dict you stored is at item.value. Always check for None before accessing .value: a missing key returns None, not an exception.

The SystemMessage injection pattern Retrieved LTM facts almost always go into a SystemMessage, not a HumanMessage. This is intentional: you are giving the model background context before it reads the user's actual query. Putting it in the system prompt keeps it conceptually separate from the conversation.

What "vector-based LTM" looks like

In the demo, retrieval is a direct key lookup: store.get(ns, "profile"). In production you typically want semantic retrieval — given the user's current query, find the most relevant stored facts, not all of them. The pattern is:

  • On write: embed the fact text, store embedding + text + metadata.

  • On read: embed the current query, run similarity search, inject top-k results.

LangGraph's SqliteStore and InMemoryStore both support a search(namespace, query=..., limit=k) call when an embedding function is configured. For larger scale, swap the store backend for Pinecone, Weaviate, or ChromaDB with the same put/get/search interface pattern.
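To make the write/read pattern concrete without an embedding API, here is a toy version where the "embedding" is just a bag-of-words vector and retrieval is cosine similarity; a real system would use an embedding model and a vector store:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for a real embedding model: bag-of-words counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity over sparse word-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# On write: store each fact's text together with its "embedding".
facts = [
    "user prefers concise bullet answers",
    "user works in the pacific timezone",
    "user codename is bluejay",
]
index = [(f, embed(f)) for f in facts]

# On read: embed the current query, rank stored facts, inject the top hit.
query = embed("what style of answers does the user prefer")
top = max(index, key=lambda item: cosine(query, item[1]))
print(top[0])  # -> user prefers concise bullet answers
```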

Production upgrade

Replace InMemoryStore() with SqliteStore.from_conn_string("ltm.db") for local durability, or use a cloud vector store for multi-instance deployments.

Memory Type 3: Working Memory — The Reasoning Scratchpad

What it is

Working memory is the temporary state that accumulates across multiple nodes within a single graph run. When an agent needs to research five things before answering one question, intermediate results need somewhere to live between steps. That place is an extra field in the graph state, cleared when the run ends.

The code

```python
class WorkingState(TypedDict):
    """
    Custom state schema: messages + a scratchpad notes list.

    The Annotated[list[str], operator.add] declaration tells LangGraph:
    when multiple nodes return a 'notes' key, concatenate the lists
    rather than replacing the field. This is the 'reducer' pattern.
    """
    messages: Annotated[list[BaseMessage], add_messages]
    notes: Annotated[list[str], operator.add]


def research_step(_: WorkingState) -> dict:
    """
    Simulated research/tool step. In a real agent this would call APIs,
    databases, or search tools. Returns a partial state update — only
    the 'notes' field.
    """
    return {"notes": ["Competitor A monthly price = $49", "Competitor B monthly price = $39"]}


def demo_working_memory(llm: ChatOpenAI) -> None:
    """Working memory: research node fills notes, answer node reads them in one run."""

    def answer_from_notes(state: WorkingState) -> dict:
        # By the time this node runs, state["notes"] contains everything
        # appended by research_step (and any other upstream nodes).
        notes = "\n".join(state["notes"])
        msg = llm.invoke([
            SystemMessage(
                content="Answer using only the working notes below.\n## Working notes\n" + notes
            ),
            HumanMessage(content="Which competitor is cheaper and by how much?"),
        ])
        return {"messages": [msg]}

    graph = StateGraph(WorkingState)
    graph.add_node("research", research_step)
    graph.add_node("answer", answer_from_notes)
    graph.add_edge(START, "research")
    graph.add_edge("research", "answer")
    graph.add_edge("answer", END)

    # No checkpointer needed for working memory.
    # The scratchpad lives only for the duration of this single invoke call.
    app = graph.compile()
    out = app.invoke({"messages": [], "notes": []})
    print("[Working] Final:", out["messages"][-1].content)
```

Line-by-line breakdown

notes: Annotated[list[str], operator.add]

This is the key architectural decision. Without the operator.add reducer, if two nodes both return {"notes": [...]}, the second write would overwrite the first. With operator.add, LangGraph calls operator.add(current_notes, new_notes) — which for lists is concatenation. Multiple research nodes can all write notes and they accumulate correctly.
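The reducer is literally Python's operator.add, so you can verify the merge semantics in isolation; this is what LangGraph effectively does with two 'notes' updates:

```python
import operator

# What LangGraph effectively does when two nodes both return a "notes" update:
current = ["Competitor A monthly price = $49"]
update = ["Competitor B monthly price = $39"]

merged = operator.add(current, update)  # list concatenation
print(merged)

# Without a reducer, the default behavior is last-write-wins: the
# update would simply replace `current` instead of extending it.
assert merged == current + update
```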

graph.add_edge(START, "research") and graph.add_edge("research", "answer")

This creates a sequential two-step pipeline. The research node runs first and populates notes. Then answer runs and reads the accumulated notes. This is a simple linear chain — real agents might have fan-out (multiple parallel research nodes) feeding into a single synthesis node.

app = graph.compile() (no checkpointer)

Working memory is intentionally ephemeral. You do not need a checkpointer for it. Adding one would checkpoint the scratchpad state, which is sometimes useful for debugging but not necessary for the pattern to work.

app.invoke({"messages": [], "notes": []}) Both fields must be initialized. If you omit "notes": [], LangGraph will error because the state schema declares notes as required. The initial empty list is the starting point for the operator.add reducer.

The multi-node fan-out pattern

The real power of working memory emerges when you parallelize:

START → [research_a, research_b, research_c] → synthesize → END

Each research node appends to notes. Because all three use operator.add, their results accumulate in whatever order they complete. The synthesize node sees all of them. You would wire this with:

```python
graph.add_edge(START, "research_a")
graph.add_edge(START, "research_b")
graph.add_edge(START, "research_c")
graph.add_edge("research_a", "synthesize")
graph.add_edge("research_b", "synthesize")
graph.add_edge("research_c", "synthesize")
```

Working memory vs. long-term memory: the key difference

| | Working Memory | Long-Term Memory |
| --- | --- | --- |
| Lifespan | One invoke call | Indefinitely, across sessions |
| Storage | Graph state (in-process) | Store backend (in-memory or durable) |
| Purpose | Accumulate intermediate results | Persist user facts and preferences |
| Cleared when | invoke returns | Explicitly deleted, or never |

Memory Type 4: Episodic Memory — The Event Log

What it is

Episodic memory stores what happened, not just what is true. Long-term memory holds preferences ("I like bullet points"). Episodic memory holds events ("Last Tuesday we reviewed three quotes and chose Plan B"). It is the agent's diary — structured, timestamped, queryable.

The code

```python
def demo_episodic_memory() -> None:
    """
    Episodic memory = append-only events (task, outcome, ...), recalled by search.

    In production: add timestamps, semantic search over episode summaries,
    and filters by date range, task type, or user ID.
    """
    store = InMemoryStore()

    # Namespace: scoped to this user's episode log.
    ns = ("users", "demo-user", "episodes")

    # Each episode gets a UUID so records are uniquely addressable.
    # If the same event needs to be updated later (e.g., outcome changed),
    # use the same key. For append-only logs, always generate a fresh UUID.
    eid = str(uuid.uuid4())

    store.put(
        ns,
        eid,
        {
            "task": "pricing_review",
            "outcome": "Chose plan B after comparing three quotes",
            # In production, add: "timestamp": datetime.utcnow().isoformat()
            # and embed the outcome text for semantic search.
        },
    )

    # Retrieve recent episodes. In production, filter by timestamp or
    # use store.search(ns, query="pricing decision", limit=5) for semantic recall.
    results = store.search(ns, limit=5)
    print("[Episodic] Stored episodes:", [r.value for r in results])
```

Line-by-line breakdown

eid = str(uuid.uuid4())

Each episode is a separate record with a unique key. This is the append-only pattern: you never overwrite an existing episode, you always create a new one. If you need to mark an episode as completed or update its outcome, you can use the same UUID as the key (the put call will overwrite it). The choice depends on whether you want a full audit trail or just the latest state of each event.

store.put(ns, eid, {...})

The value dict can contain any JSON-serializable data. In production, you would always include a timestamp so you can filter by date range. You might also store the full conversation summary, the user who triggered it, the tool calls made, and structured outcomes.

store.search(ns, limit=5)

Without a query parameter, search returns the most recently written records up to limit. With a query string and an embedding function configured on the store, it performs semantic similarity search over stored records. The toy demo uses simple listing; real recall would look like:

```python
# Production-style episodic recall (pseudocode):
results = store.search(
    ns,
    query="what pricing decisions did we make?",
    limit=5,
)
```

The r.value access

store.search returns a list of SearchItem objects. Each has .key, .namespace, and .value (the dict you stored). Filter and process them however you need before injecting into context.
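Stripped of LangGraph, the append-only pattern with timestamps can be prototyped with a plain dict; the record shape below mirrors the demo, and the log_episode helper name is illustrative:

```python
import uuid
from datetime import datetime, timezone

episodes: dict[tuple, dict] = {}  # (namespace, key) -> record
NS = ("users", "demo-user", "episodes")

def log_episode(task: str, outcome: str) -> str:
    """Append-only: every event gets a fresh UUID key and is never overwritten."""
    eid = str(uuid.uuid4())
    episodes[(NS, eid)] = {
        "task": task,
        "outcome": outcome,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    return eid

log_episode("pricing_review", "Chose plan B after comparing three quotes")
log_episode("pricing_review", "Renewed plan B at the same rate")

# Recall: filter by task type, newest first — roughly what a store.search
# call with filters would give you.
recent = sorted(
    (r for (ns, _), r in episodes.items() if ns == NS and r["task"] == "pricing_review"),
    key=lambda r: r["timestamp"],
    reverse=True,
)
print(len(recent))  # -> 2
```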

Connecting episodic memory to the conversation

The episodic demo is intentionally standalone — it shows the storage pattern without a full graph. In a real agent, you'd write episodes in an after-action node that fires after every task completes, and you'd surface them in a context-building node at the start of each new session:

START → retrieve_episodes → main_agent → [task] → log_episode → END

Memory Type 5: Semantic Memory — Retrieval-Augmented Generation (RAG)

What it is?

Semantic memory is your agent's domain knowledge layer grounded in a corpus of verified text, retrieved dynamically rather than hallucinated from training weights. The pattern is: embed a query, find the most relevant document chunks, inject those chunks as tool output, let the model answer from the retrieved evidence.

```python
def build_kb() -> FAISS:
    """
    Build a small FAISS vector index over profile documents.

    In production: load from PDFs, databases, or a web crawl. Use a persistent
    vector store (Pinecone, Weaviate, ChromaDB) instead of FAISS so the index
    survives process restarts.
    """
    return FAISS.from_documents(
        [
            Document(
                page_content=(
                    "Seenivasa Ramadurai works at Provizient. He architects cloud-native software — "
                    "microservices, gRPC, REST — and delivers GenAI, LLMs, and agentic patterns."
                )
            ),
            Document(
                page_content=(
                    "At Provizient, skills include C#, Python, Java, Scala, TypeScript; LLMs, RAG, "
                    "orchestration; ML and MLOps; vector databases; APIs; Kubernetes and Docker."
                )
            ),
        ],
        OpenAIEmbeddings(),
    )


def bind_tools(model: ChatOpenAI, tools: list):
    """
    Node factory: bind a list of tools to the LLM and return a graph node function.

    bind_tools() tells the model what tools are available and how to call them.
    The model's response may be a plain AIMessage OR an AIMessage with
    tool_calls populated.
    """
    bound = model.bind_tools(tools)

    def node(state: MessagesState) -> dict:
        # Pass the full message history (including any prior tool results) to the model.
        return {"messages": [bound.invoke(state["messages"])]}

    return node


def demo_semantic_memory(llm: ChatOpenAI) -> None:
    """
    Semantic memory: model calls a KB search tool, ToolNode executes it,
    results are appended to messages, model reads them and answers.
    This is the standard ReAct (Reason + Act) loop.
    """
    kb = build_kb()

    @tool
    def profile_kb_search(query: str) -> str:
        """
        Retrieve top-k chunks from the profile knowledge base.

        The docstring is shown to the LLM as the tool description — write it
        clearly so the model knows when and how to use this tool.
        """
        docs = kb.similarity_search(query, k=2)
        return "\n".join(d.page_content for d in docs)

    tools = [profile_kb_search]
    graph = StateGraph(MessagesState)

    # Two nodes: the LLM agent and the tool executor.
    graph.add_node("agent", bind_tools(llm, tools))
    graph.add_node("tools", ToolNode(tools))

    graph.add_edge(START, "agent")

    # Conditional routing: if the agent emitted tool calls → run ToolNode.
    # If the agent emitted a final answer → END.
    graph.add_conditional_edges(
        "agent", tools_condition, {"tools": "tools", END: END}
    )

    # After ToolNode runs, go back to the agent so it can read the tool results.
    graph.add_edge("tools", "agent")

    # No checkpointer needed for this demo, but you'd add one in production.
    app = graph.compile()

    out = app.invoke({
        "messages": [
            HumanMessage(
                "Which company does Seenivasa work for, and what are some of his skills? "
                "Use the knowledge tool."
            )
        ]
    })
    print("[Semantic] Last message:", out["messages"][-1].content)
```

Line-by-line breakdown

FAISS.from_documents([...], OpenAIEmbeddings())

FAISS (Facebook AI Similarity Search) builds an in-memory vector index. OpenAIEmbeddings() calls text-embedding-ada-002 (or the latest embedding model) to convert each document chunk into a vector. from_documents is a class method that handles both embedding and indexing in one call. For production, replace FAISS with a persistent vector store — FAISS is RAM-only and rebuilds from scratch on every process start.

@tool decorator

The @tool decorator from langchain_core.tools does three things: (1) wraps the Python function so it can be called by ToolNode, (2) extracts the function signature to build a JSON schema for the tool parameters, and (3) uses the docstring as the tool description sent to the LLM. Write clear docstrings — the model reads them to decide which tool to call and when.

model.bind_tools(tools)

This attaches the tool definitions to the model in the format required by the OpenAI function-calling API. When you call bound.invoke(messages), the model can now return an AIMessage with a populated tool_calls list in addition to (or instead of) plain text content.

tools_condition

This is a prebuilt LangGraph router function. It inspects the last message in state: if it has tool_calls, it returns "tools"; otherwise it returns END (the "__end__" sentinel). The conditional edge uses this to route traffic. The {"tools": "tools", END: END} dict maps those return values to node names.
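The routing decision itself is tiny. A hand-rolled equivalent (illustrative, not LangGraph's actual source; the AIMsg dataclass is a stand-in for AIMessage) just checks the last message for pending tool calls:

```python
from dataclasses import dataclass, field

@dataclass
class AIMsg:
    # Minimal stand-in for an LLM response message.
    content: str
    tool_calls: list = field(default_factory=list)

def route_after_agent(messages: list) -> str:
    """Mimics tools_condition: pending tool calls -> run tools, else finish."""
    last = messages[-1]
    if getattr(last, "tool_calls", None):
        return "tools"
    return "__end__"  # LangGraph's END sentinel value

print(route_after_agent([AIMsg("thinking", tool_calls=[{"name": "profile_kb_search"}])]))  # -> tools
print(route_after_agent([AIMsg("final answer")]))  # -> __end__
```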

graph.add_edge("tools", "agent")

After ToolNode executes the tool call and appends the result as a ToolMessage to state, control returns to the agent. The agent now sees the tool result in its message history and generates a final answer. This loop continues until the agent produces a response with no tool calls.

The execution flow, step by step

User: "Which company does Seenivasa work for?"

  1. agent node runs:
  • LLM sees the question + tool definition

  • LLM responds: AIMessage(tool_calls=[{name: "profile_kb_search", args: {query: "Seenivasa company"}}])

  • tools_condition sees tool_calls → routes to "tools"

  2. tools node runs:
  • ToolNode calls profile_kb_search("Seenivasa company")

  • FAISS returns the two most similar chunks

  • Result appended as ToolMessage to state["messages"]

  • Edge sends control back to "agent"

  3. agent node runs again:
  • LLM now sees: original question + tool call + tool result

  • LLM produces a final AIMessage with no tool_calls

  • tools_condition sees no tool_calls → routes to END

  4. Graph returns state["messages"][-1].content = the grounded answer

Why not just put knowledge in the system prompt?

For small knowledge bases, you could. For anything non-trivial:

  • System prompts have token limits

  • You pay for all tokens even if most are irrelevant

  • RAG retrieves only what's relevant to the current query

  • You can update the knowledge base without redeploying the agent

The Complete, Runnable Script

Copy this file, set OPENAI_API_KEY, and run it. All five memory patterns execute sequentially.

```python
"""
Five agent memory patterns with LangGraph (Part 2 companion script).

Memory types demonstrated:
  - Short-term : MessagesState + InMemorySaver + stable thread_id
  - Long-term  : InMemoryStore + get_store() across different thread_ids
  - Working    : Custom WorkingState with notes merged via operator.add
  - Episodic   : Append-only store rows + search (toy recall)
  - Semantic   : FAISS + @tool + ReAct loop (ToolNode / tools_condition)

All demos use InMemory* backends (zero setup required). For production:
swap InMemorySaver → SqliteSaver, InMemoryStore → SqliteStore.

Dependencies:
    pip install langgraph langchain-openai langchain-community faiss-cpu python-dotenv

Environment:
    OPENAI_API_KEY    (required)
    OPENAI_CHAT_MODEL (optional, defaults to gpt-4o-mini)
"""
```

```python
from __future__ import annotations

import os

# Set before any FAISS import to prevent OpenMP duplicate library crash on macOS.
os.environ.setdefault("KMP_DUPLICATE_LIB_OK", "TRUE")

import operator
import sys
import uuid
from pathlib import Path
from typing import Annotated, TypedDict

from dotenv import load_dotenv
from langchain_core.documents import Document
from langchain_core.messages import AIMessage, BaseMessage, HumanMessage, SystemMessage
from langchain_core.tools import tool
from langchain_community.vectorstores import FAISS
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langgraph.checkpoint.memory import InMemorySaver
from langgraph.config import get_store
from langgraph.graph import END, START, MessagesState, StateGraph
from langgraph.graph.message import add_messages
from langgraph.prebuilt import ToolNode, tools_condition
from langgraph.store.memory import InMemoryStore

_ROOT = Path(__file__).resolve().parent
load_dotenv(_ROOT / ".env")

CHAT_MODEL = os.getenv("OPENAI_CHAT_MODEL", "gpt-4o-mini")


def require_api_key() -> None:
    """Exit with a clear message if the OpenAI key is missing."""
    if not os.getenv("OPENAI_API_KEY"):
        print(
            "ERROR: Set OPENAI_API_KEY in the environment or in a .env file next to this script.",
            file=sys.stderr,
        )
        sys.exit(1)
```

# 1. SHORT-TERM MEMORY

```python
def demo_short_term_memory(llm: ChatOpenAI) -> None:
    """STM: conversation buffer restored per thread_id via checkpointer."""

    def chat(state: MessagesState) -> dict:
        return {"messages": [llm.invoke(state["messages"])]}

    graph = StateGraph(MessagesState)
    graph.add_node("model", chat)
    graph.add_edge(START, "model")
    graph.add_edge("model", END)
    app = graph.compile(checkpointer=InMemorySaver())

    tid = "session-stm-demo"
    cfg: dict = {"configurable": {"thread_id": tid}}

    app.invoke({"messages": [HumanMessage("My codename for this session is Bluejay.")]}, cfg)
    out = app.invoke({"messages": [HumanMessage("What codename did I give?")]}, cfg)
    print("[STM] Last reply:", out["messages"][-1].content)
```

python

2. LONG-TERM MEMORY

```python
def demo_long_term_memory(llm: ChatOpenAI) -> None:
    """LTM: LangGraph Store persists facts across different thread_ids."""

    def remember_node(state: MessagesState) -> dict:
        store = get_store()
        ns = ("users", "demo-user", "facts")
        last = state["messages"][-1]
        text = last.content if isinstance(last.content, str) else str(last.content)

        if text.lower().startswith("remember:"):
            fact = text.split(":", 1)[1].strip()
            store.put(ns, "profile", {"text": fact})
            return {"messages": [AIMessage(content=f"Stored: {fact}")]}

        item = store.get(ns, "profile")
        fact = item.value.get("text", "") if item else ""

        msg = llm.invoke([
            SystemMessage(content=f"Stored user fact (long-term): {fact or 'none'}"),
            HumanMessage(content=text),
        ])
        return {"messages": [msg]}

    graph = StateGraph(MessagesState)
    graph.add_node("agent", remember_node)
    graph.add_edge(START, "agent")
    graph.add_edge("agent", END)

    store = InMemoryStore()
    app = graph.compile(checkpointer=InMemorySaver(), store=store)

    app.invoke(
        {"messages": [HumanMessage("Remember: I always want concise bullet answers.")]},
        {"configurable": {"thread_id": "ltm-a"}},
    )
    out = app.invoke(
        {"messages": [HumanMessage("What style do I prefer?")]},
        {"configurable": {"thread_id": "ltm-b"}},
    )
    print("[LTM] Reply on a different thread_id:", out["messages"][-1].content)
```

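What makes this long-term rather than short-term is the addressing scheme: the store keys facts by a user-scoped namespace tuple plus a key, not by `thread_id`. A toy sketch of that idea (the `MiniStore` class is illustrative; the real `InMemoryStore` has a richer API):

```python
# Sketch of the namespaced key-value idea behind the long-term store.
# MiniStore is a hypothetical stand-in for LangGraph's BaseStore.
class MiniStore:
    def __init__(self):
        self._data = {}  # (namespace, key) -> value

    def put(self, namespace, key, value):
        self._data[(namespace, key)] = value

    def get(self, namespace, key):
        return self._data.get((namespace, key))

store = MiniStore()
ns = ("users", "demo-user", "facts")

# Written while handling thread "ltm-a" ...
store.put(ns, "profile", {"text": "concise bullet answers"})

# ... and readable while handling thread "ltm-b", because the lookup
# key is the user namespace, not the conversation thread.
fact = store.get(ns, "profile")
print(fact["text"])  # concise bullet answers
```

Any node in any thread that knows the namespace can recall the fact, which is exactly what the `ltm-b` invocation exercises.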

# 3. WORKING MEMORY

```python
class WorkingState(TypedDict):
    """State with a scratchpad: notes lists from all nodes are concatenated."""
    messages: Annotated[list[BaseMessage], add_messages]
    notes: Annotated[list[str], operator.add]

def research_step(_: WorkingState) -> dict:
    """Simulated research node — returns structured data into working memory."""
    return {"notes": ["Competitor A monthly price = $49", "Competitor B monthly price = $39"]}

def demo_working_memory(llm: ChatOpenAI) -> None:
    """Working memory: research node fills notes, answer node reads them."""

    def answer_from_notes(state: WorkingState) -> dict:
        notes = "\n".join(state["notes"])
        msg = llm.invoke([
            SystemMessage(
                content="Answer using only the working notes below.\n## Working notes\n" + notes
            ),
            HumanMessage(content="Which competitor is cheaper and by how much?"),
        ])
        return {"messages": [msg]}

    graph = StateGraph(WorkingState)
    graph.add_node("research", research_step)
    graph.add_node("answer", answer_from_notes)
    graph.add_edge(START, "research")
    graph.add_edge("research", "answer")
    graph.add_edge("answer", END)
    app = graph.compile()
    out = app.invoke({"messages": [], "notes": []})
    print("[Working] Final:", out["messages"][-1].content)
```

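The `Annotated[list[str], operator.add]` reducer is what turns `notes` into a shared scratchpad: each node returns only its partial update, and the framework folds it into the running state by list concatenation. A stdlib-only sketch of that merge step (the loop is illustrative, not LangGraph's scheduler):

```python
import operator

# Sketch of how an Annotated reducer merges node outputs into state:
# "notes" uses operator.add, i.e. plain list concatenation.
state = {"notes": []}
node_updates = [
    {"notes": ["Competitor A monthly price = $49"]},
    {"notes": ["Competitor B monthly price = $39"]},
]
for update in node_updates:
    state["notes"] = operator.add(state["notes"], update["notes"])

print(state["notes"])
# ['Competitor A monthly price = $49', 'Competitor B monthly price = $39']
```

Because merging is additive, parallel branches can each contribute notes without overwriting one another, and the downstream `answer` node sees the union.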

# 4. EPISODIC MEMORY

```python
def demo_episodic_memory() -> None:
    """Episodic memory: one logged event written to store, recalled via search."""
    store = InMemoryStore()
    ns = ("users", "demo-user", "episodes")
    eid = str(uuid.uuid4())
    store.put(
        ns,
        eid,
        {
            "task": "pricing_review",
            "outcome": "Chose plan B after comparing three quotes",
        },
    )
    results = store.search(ns, limit=5)
    print("[Episodic] Stored episodes:", [r.value for r in results])
```

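Episodic recall is just listing a namespace and inspecting stored events. A plain-dict sketch of that pattern, filtering past episodes by task (the `episodes` dict and task names are made up for illustration):

```python
import uuid

# Illustrative sketch of episodic recall: list logged events and
# filter by an attribute, the plain-dict equivalent of store.search.
episodes = {}
eid = str(uuid.uuid4())
episodes[eid] = {
    "task": "pricing_review",
    "outcome": "Chose plan B after comparing three quotes",
}
episodes[str(uuid.uuid4())] = {"task": "onboarding", "outcome": "Sent welcome email"}

pricing = [e for e in episodes.values() if e["task"] == "pricing_review"]
print(pricing[0]["outcome"])  # Chose plan B after comparing three quotes
```

An agent can consult such episodes before repeating a task ("last time we reviewed pricing, we chose plan B"), which is the behavior the demo's `store.search` call enables.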

# 5. SEMANTIC MEMORY (RAG)

```python
def build_kb() -> FAISS:
    """Build an in-memory FAISS index over profile document chunks."""
    return FAISS.from_documents(
        [
            Document(
                page_content=(
                    "Seenivasa Ramadurai works at Provizient. He architects cloud-native software — "
                    "microservices, gRPC, REST — and delivers GenAI, LLMs, and agentic patterns."
                )
            ),
            Document(
                page_content=(
                    "At Provizient, skills include C#, Python, Java, Scala, TypeScript; LLMs, RAG, "
                    "orchestration; ML and MLOps; vector databases; APIs; Kubernetes and Docker."
                )
            ),
        ],
        OpenAIEmbeddings(),
    )

def bind_tools(model: ChatOpenAI, tools: list):
    """Node factory: bind tools to the LLM and return a graph node function."""
    bound = model.bind_tools(tools)

    def node(state: MessagesState) -> dict:
        return {"messages": [bound.invoke(state["messages"])]}

    return node

def demo_semantic_memory(llm: ChatOpenAI) -> None:
    """Semantic memory: ReAct loop with FAISS retrieval tool."""
    kb = build_kb()

    @tool
    def profile_kb_search(query: str) -> str:
        """Retrieve top-k chunks from the profile knowledge base."""
        docs = kb.similarity_search(query, k=2)
        return "\n".join(d.page_content for d in docs)

    tools = [profile_kb_search]
    graph = StateGraph(MessagesState)
    graph.add_node("agent", bind_tools(llm, tools))
    graph.add_node("tools", ToolNode(tools))
    graph.add_edge(START, "agent")
    graph.add_conditional_edges(
        "agent", tools_condition, {"tools": "tools", END: END}
    )
    graph.add_edge("tools", "agent")
    app = graph.compile()

    out = app.invoke({
        "messages": [
            HumanMessage(
                "Which company does Seenivasa work for, and what are some of his skills? "
                "Use the knowledge tool."
            )
        ]
    })
    print("[Semantic] Last message:", out["messages"][-1].content)
```

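Under the hood, `similarity_search` ranks chunks by vector similarity between the query embedding and each chunk embedding. A toy stdlib sketch of that retrieval step using cosine similarity (the vectors here are invented for illustration; in the demo, `OpenAIEmbeddings` produces them and FAISS does the indexing):

```python
import math

# Toy sketch of the retrieval step behind similarity_search:
# rank chunks by cosine similarity of their embedding vectors.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Made-up 3-d "embeddings" standing in for real model output.
chunks = {
    "Seenivasa works at Provizient.": [0.9, 0.1, 0.0],
    "Skills: C#, Python, RAG, Kubernetes.": [0.1, 0.9, 0.2],
}
query_vec = [0.85, 0.15, 0.05]  # pretend embedding of "Which company?"

ranked = sorted(chunks, key=lambda c: cosine(chunks[c], query_vec), reverse=True)
print(ranked[0])  # Seenivasa works at Provizient.
```

The ReAct loop then feeds the top-ranked chunks back to the model as tool output, grounding the final answer in the knowledge base rather than the model's weights.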

# ENTRY POINT

```python
def main() -> None:
    """Run all five memory demos in sequence."""
    require_api_key()
    llm = ChatOpenAI(model=CHAT_MODEL, temperature=0)

    print("\n=== 1. SHORT-TERM MEMORY ===")
    demo_short_term_memory(llm)

    print("\n=== 2. LONG-TERM MEMORY ===")
    demo_long_term_memory(llm)

    print("\n=== 3. WORKING MEMORY ===")
    demo_working_memory(llm)

    print("\n=== 4. EPISODIC MEMORY ===")
    demo_episodic_memory()

    print("\n=== 5. SEMANTIC MEMORY ===")
    demo_semantic_memory(llm)

if __name__ == "__main__":
    main()
```


Thanks Sreeni Ramadorai
