2025, Dec 01 11:00

Why Gemini Answers Differently with LangGraph MemorySaver: Prompt Phrasing, Thread IDs, and Reliable Agent Context

Learn why LangGraph MemorySaver isn't broken with gemini-2.5-flash even when similarly worded prompts get different answers, and how to fix it with a stable thread_id, different prompt wording, or a switch to llama3.2

LangGraph Memory That “Forgets”? Why Gemini Answers Differently To Similar Prompts

When you build a simple agent with LangGraph and add MemorySaver, you expect prior turns to stick. Yet with gemini-2.5-flash the agent might greet Bob in the first turn and then answer “I don’t know your name” to “What’s my name?” in the second. It looks like a memory failure, but it isn’t.

The minimal agent that seems to forget

The following snippet runs a two-turn interaction. The first turn tells the model a name. The second asks for it back. The flow uses MemorySaver and a fixed thread_id to persist state across turns.

import os
from langchain_tavily import TavilySearch
from langchain.chat_models import init_chat_model
from langgraph.checkpoint.memory import MemorySaver
from langgraph.prebuilt import create_react_agent
from langchain_core.messages import HumanMessage
os.environ.get('TAVILY_API_KEY')  # assumes TAVILY_API_KEY is already set in the environment
finder = TavilySearch(max_results=2)  # note: the parameter is max_results, not max_result
skillset = [finder]
os.environ['HTTP_PROXY'] = 'http://127.0.0.1:7890'   # local proxy from the original setup; remove if not needed
os.environ['HTTPS_PROXY'] = 'http://127.0.0.1:7890'
os.environ.get('GOOGLE_API_KEY')  # assumes GOOGLE_API_KEY is already set in the environment
llm_client = init_chat_model('gemini-2.5-flash', model_provider='google-genai')
checkpoint_store = MemorySaver()  # in-memory checkpointer that persists state per thread_id
agent_runner = create_react_agent(llm_client, skillset, checkpointer=checkpoint_store)
run_cfg = {'configurable': {'thread_id': 'agent003'}}
# First turn: introduce the name
for event in agent_runner.stream({'messages': [HumanMessage('Hi! I am Bob!')]}, run_cfg, stream_mode='values'):
    event['messages'][-1].pretty_print()
# Second turn: manually collect the saved history, append the new question, and run again
snapshot = checkpoint_store.get(run_cfg)
prior_msgs = snapshot['channel_values']['messages']
followup_msg = HumanMessage("What's my name?")
payload = {'messages': prior_msgs + [followup_msg]}
for event in agent_runner.stream(payload, run_cfg, stream_mode='values'):
    event['messages'][-1].pretty_print()

In practice, the second turn often comes back with something like “I do not know your name. You can tell it to me if you wish!” That is surprising, because the greeting in the first turn clearly contained the name.

What’s actually happening

Multiple runs with the same setup show different outcomes depending on how the second question is phrased. With the same gemini-2.5-flash model, these behaviors were observed:

“What’s my name?” → “I don’t have memory of past conversations.”

“Do you know my name?” → “Yes, your name is Bob.”

“Do you remember my name?” → “Yes, I do, Bob!”

The state is being saved; the difference is how the model interprets the prompt. Gemini, like most LLMs, does not maintain structured internal memory. When prompted with “What’s my name?”, it can treat the question as a pure knowledge recall task and answer that it doesn’t know. When asked “Do you know my name?” or “Do you remember my name?”, it looks at the immediate chat history that’s been fed back in the same request and extracts “Bob.”

So the issue is not LangGraph’s memory mechanism; it is how Gemini interprets the question. The agent example in the official LangGraph tutorial uses anthropic:claude-3-5-sonnet-latest, which behaves differently from Gemini models under the same setup.
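
You can verify this directly: the checkpoint already holds the full exchange regardless of how the model answers. A minimal sketch, reusing checkpoint_store and run_cfg from the snippet above:

# Minimal sketch: inspect what MemorySaver has stored for thread 'agent003'
snapshot = checkpoint_store.get(run_cfg)
for msg in snapshot['channel_values']['messages']:
    # every checkpointed message (HumanMessage, AIMessage, tool messages) is still here
    print(type(msg).__name__, '->', msg.content)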

The practical fix: rely on the checkpoint and adjust the prompt

If you keep the same thread_id, LangGraph’s MemorySaver supplies the saved conversation to the model on the next call, so you do not need to fetch and concatenate the history yourself for the second turn. With Gemini, phrasing the follow-up as “Do you know my name?” or “Do you remember my name?” leads it to consult the chat history provided in the request and answer correctly.

import os
from langchain_tavily import TavilySearch
from langchain.chat_models import init_chat_model
from langgraph.checkpoint.memory import MemorySaver
from langgraph.prebuilt import create_react_agent
from langchain_core.messages import HumanMessage
os.environ.get('TAVILY_API_KEY')  # assumes TAVILY_API_KEY is already set in the environment
finder = TavilySearch(max_results=2)
skillset = [finder]
os.environ['HTTP_PROXY'] = 'http://127.0.0.1:7890'   # local proxy from the original setup; remove if not needed
os.environ['HTTPS_PROXY'] = 'http://127.0.0.1:7890'
os.environ.get('GOOGLE_API_KEY')  # assumes GOOGLE_API_KEY is already set in the environment
llm_client = init_chat_model('gemini-2.5-flash', model_provider='google-genai')
checkpoint_store = MemorySaver()
agent_runner = create_react_agent(llm_client, skillset, checkpointer=checkpoint_store)
run_cfg = {'configurable': {'thread_id': 'agent003'}}
# Turn 1
for event in agent_runner.stream({'messages': [HumanMessage('Hi! I am Bob!')]}, run_cfg, stream_mode='values'):
    event['messages'][-1].pretty_print()
# Turn 2: no manual history stitching; the shared thread_id lets MemorySaver supply the prior turn
for event in agent_runner.stream({'messages': [HumanMessage('Do you know my name?')]}, run_cfg, stream_mode='values'):
    event['messages'][-1].pretty_print()

Alternatively, using a different model can change how the same prompt is handled. With llama3.2:latest via Ollama, “what’s my name?” returns the expected “Your name is Bob!” using the exact same agent pattern.

import os
from langchain_tavily import TavilySearch
from langgraph.checkpoint.memory import MemorySaver
from langgraph.prebuilt import create_react_agent
from langchain_core.messages import HumanMessage
from langchain_ollama import ChatOllama
from dotenv import load_dotenv
load_dotenv()  # loads TAVILY_API_KEY (and any other keys) from a local .env file
os.environ.get('TAVILY_API_KEY')
finder = TavilySearch(max_results=2)
skillset = [finder]
llm_local = ChatOllama(model="llama3.2:latest", temperature=0)  # local model served by Ollama
checkpoint_store = MemorySaver()
agent_runner = create_react_agent(llm_local, skillset, checkpointer=checkpoint_store)
run_cfg = {"configurable": {"thread_id": "agent003"}}
# Turn 1
for event in agent_runner.stream({"messages": [HumanMessage("Hi! I am Bob!")]}, run_cfg, stream_mode="values"):
    event["messages"][-1].pretty_print()
# Turn 2: same thread_id, and the direct phrasing works with this model
for event in agent_runner.stream({"messages": [HumanMessage("what's my name?")]}, run_cfg, stream_mode="values"):
    event["messages"][-1].pretty_print()

Why this matters

Agent memory in LangGraph works by checkpointing the state and replaying it to your model on the next turn. Whether the model uses that context the way you expect depends on prompt wording and model behavior: Gemini can respond differently to semantically similar questions even when the same conversation history is present, and the official tutorial’s model and your runtime model may not behave the same way under identical orchestration.
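
To see the thread scoping at work, ask the same follow-up on a fresh thread: with no prior turns checkpointed under it, there is no history for the model to consult. A minimal sketch, reusing agent_runner and HumanMessage from the examples above (the thread name 'agent004' is arbitrary):

# Minimal sketch: checkpointed state is scoped per thread_id
fresh_cfg = {'configurable': {'thread_id': 'agent004'}}  # new thread, no saved turns
for event in agent_runner.stream({'messages': [HumanMessage('Do you know my name?')]}, fresh_cfg, stream_mode='values'):
    event['messages'][-1].pretty_print()  # nothing to recall on this thread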

Takeaways

Keep the thread_id stable so MemorySaver can restore state on each call. Import create_react_agent from langgraph.prebuilt and wire the checkpointer in. With gemini-2.5-flash, prefer follow-ups like “Do you know my name?” or “Do you remember my name?” if you want the model to consult the current chat history. If you need “What’s my name?” to work as-is, consider swapping to a model that answers that phrasing from the provided history, such as llama3.2:latest in the setup shown above.