LangGraph Deep Dive: State Machines, Tools, and Human-in-the-Loop

LangGraph solves a specific problem that other agent frameworks don't: cycles. Most orchestration tools are DAGs (directed acyclic graphs). They flow in one direction. But real agents need loops. They need to call an LLM, check the result, maybe call a tool, check again, decide if more information is needed, loop back, and eventually exit.

That loop is the hard part. LangGraph makes it straightforward by representing your agent as a state machine. Nodes do work. Edges define transitions. State tracks everything the agent knows. Conditional edges let the agent decide where to go next based on what it's learned.

This guide teaches LangGraph from first principles. We'll start with the core concepts, build increasingly complex examples, and finish with a complete research agent that searches the web, evaluates its findings, and iterates until it has enough information to answer. Working code at every step.

Core Concepts: State, Nodes, Edges

LangGraph models agent workflows as graphs. Three components:

State: A dictionary that holds everything the agent knows. Every node reads from and writes to this shared state.

Nodes: Functions that do work. Call an LLM, execute a tool, process data. Each node receives the current state and returns updates.

Edges: Connections between nodes. They define which node runs next. Edges can be unconditional (always follow this path) or conditional (decide based on state).

Let's see the simplest possible example:

from langgraph.graph import StateGraph, START, END
from typing import TypedDict

# 1. Define state schema
class AgentState(TypedDict):
    messages: list
    
# 2. Define a node (just a function)
def greet(state: AgentState) -> dict:
    return {"messages": state["messages"] + ["Hello!"]}

# 3. Build the graph
graph = StateGraph(AgentState)
graph.add_node("greet", greet)
graph.add_edge(START, "greet")
graph.add_edge("greet", END)

# 4. Compile and run
app = graph.compile()
result = app.invoke({"messages": []})
print(result)
# {'messages': ['Hello!']}

That's the complete structure. Everything else builds on this pattern.

Understanding State

State is the memory of your agent. It persists across nodes and gets updated as the agent works. You define its shape with TypedDict:

from typing import TypedDict, List, Optional

class ResearchState(TypedDict):
    query: str                      # User's question
    search_queries: List[str]       # Generated search terms
    search_results: List[dict]      # Raw results from web
    sources: List[str]              # Cited URLs
    answer: Optional[str]           # Final response
    iteration: int                  # Loop counter

Every field you need to track goes here. Nodes read what they need, compute something, and return updates.
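For example, a node that collects citation URLs reads search_results and returns only the keys it changed (record_sources is a hypothetical helper, and a plain dict stands in for the typed state):

```python
def record_sources(state: dict) -> dict:
    """Hypothetical node: read search_results, return only the updated keys."""
    urls = [r["url"] for r in state["search_results"] if r.get("url")]
    # Return just the fields this node changes -- not the whole state.
    return {"sources": sorted(set(urls)), "iteration": state["iteration"] + 1}

state = {
    "query": "What is LangGraph?",
    "search_results": [
        {"url": "https://example.com/a", "content": "..."},
        {"url": "https://example.com/a", "content": "..."},  # duplicate URL
    ],
    "sources": [],
    "iteration": 0,
}
updates = record_sources(state)
# updates == {"sources": ["https://example.com/a"], "iteration": 1}
```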

State Updates: Overwrite vs Append

By default, returning a key overwrites it:

def update_answer(state):
    return {"answer": "New answer"}  # Replaces previous value

For lists where you want to append (like message history), use the Annotated pattern with operator.add:

from typing import Annotated
import operator

class ChatState(TypedDict):
    messages: Annotated[list, operator.add]  # Appends instead of replaces

Now returning {"messages": [new_message]} adds to the list instead of replacing it.
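LangGraph applies the reducer for you; conceptually, the merge amounts to calling operator.add on the old and new values. A plain-Python sketch of that behavior (apply_update is an illustrative stand-in, not LangGraph's API):

```python
import operator

def apply_update(state: dict, update: dict) -> dict:
    """Simplified sketch of reducer-based merging: each updated key is
    combined with operator.add instead of overwriting the old value."""
    merged = dict(state)
    for key, value in update.items():
        merged[key] = operator.add(merged[key], value)
    return merged

state = {"messages": ["hi"]}
state = apply_update(state, {"messages": ["hello back"]})
# state["messages"] == ["hi", "hello back"]
```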

The add_messages Helper

For chat-style agents, LangGraph provides add_messages:

from langgraph.graph.message import add_messages
from typing import Annotated

class State(TypedDict):
    messages: Annotated[list, add_messages]

This handles deduplication, ordering, and the specific format LangChain messages expect.

Nodes: Where Work Happens

A node is a function that:

  1. Receives the current state
  2. Does some computation
  3. Returns a dictionary of state updates

For example:

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o")

def call_llm(state: AgentState) -> dict:
    """Node that calls the LLM"""
    messages = state["messages"]
    response = llm.invoke(messages)
    return {"messages": [response]}

def check_result(state: AgentState) -> dict:
    """Node that evaluates the response"""
    last_message = state["messages"][-1]
    # Some evaluation logic (assumes AgentState also declares needs_more_info: bool)
    return {"needs_more_info": len(last_message.content) < 100}

Nodes can do anything: LLM calls, API requests, database queries, calculations. The only requirement is they take state and return updates.

Adding Nodes to the Graph

graph = StateGraph(AgentState)
graph.add_node("call_llm", call_llm)
graph.add_node("check_result", check_result)
graph.add_node("search_web", search_web)

Node names are strings. Use them to define edges.

Edges: Control Flow

Edges define how the agent moves between nodes.

Unconditional Edges

Always go from A to B:

graph.add_edge("call_llm", "check_result")  # After LLM, always check
graph.add_edge(START, "call_llm")           # Start at call_llm
graph.add_edge("format_answer", END)        # End after formatting

Conditional Edges

Choose the next node based on state:

def route_after_check(state: AgentState) -> str:
    """Decide where to go based on state"""
    if state.get("needs_more_info"):
        return "search_web"
    return "format_answer"

graph.add_conditional_edges(
    "check_result",         # Source node
    route_after_check,      # Routing function
    {
        "search_web": "search_web",      # If function returns "search_web"
        "format_answer": "format_answer"  # If function returns "format_answer"
    }
)

The routing function receives state and returns a string matching one of the destination keys. This is how agents make decisions.
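Because a routing function is just a function of state, you can sanity-check its branches directly, no graph required:

```python
def route_after_check(state: dict) -> str:
    """Decide where to go based on state"""
    if state.get("needs_more_info"):
        return "search_web"
    return "format_answer"

# Each returned string must match a key in the conditional-edge mapping.
assert route_after_check({"needs_more_info": True}) == "search_web"
assert route_after_check({"needs_more_info": False}) == "format_answer"
assert route_after_check({}) == "format_answer"  # missing key -> default path
```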

The Complete Pattern

from langgraph.graph import StateGraph, START, END

graph = StateGraph(AgentState)

# Add all nodes
graph.add_node("generate_query", generate_query)
graph.add_node("search", search)
graph.add_node("evaluate", evaluate)
graph.add_node("answer", answer)

# Entry point
graph.add_edge(START, "generate_query")

# Unconditional edges
graph.add_edge("generate_query", "search")
graph.add_edge("search", "evaluate")

# Conditional edge (the loop!)
graph.add_conditional_edges(
    "evaluate",
    should_continue,
    {
        "continue": "generate_query",  # Loop back
        "finish": "answer"             # Exit
    }
)

# Exit point
graph.add_edge("answer", END)

# Compile
app = graph.compile()

This creates a loop: generate query → search → evaluate → (continue? loop back : answer).
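If it helps to see the control flow without any framework, here is a tiny hand-rolled simulation of that loop (illustrative only; LangGraph's execution engine does this for you):

```python
def run_loop(state: dict, max_steps: int = 20) -> dict:
    node = "generate_query"  # the edge from START
    for _ in range(max_steps):  # hard cap, like a recursion limit
        if node == "generate_query":
            state["queries"].append(f"query {state['iteration']}")
            node = "search"
        elif node == "search":
            state["results"].append(f"results for {state['queries'][-1]}")
            node = "evaluate"
        elif node == "evaluate":
            state["iteration"] += 1
            # should_continue: loop back until we have "enough"
            node = "generate_query" if state["iteration"] < 2 else "answer"
        elif node == "answer":
            state["answer"] = f"synthesized from {len(state['results'])} results"
            return state  # the edge to END
    return state

final = run_loop({"queries": [], "results": [], "iteration": 0})
# final["iteration"] == 2 and final["answer"] is set
```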

Building a ReAct Agent

The ReAct pattern (Reasoning + Acting) is the most common agent architecture. The agent:

  1. Thinks about what to do
  2. Decides whether to use a tool
  3. If yes, executes the tool
  4. Loops back to think about the result
  5. Eventually responds

Define Tools

from langchain_core.tools import tool

@tool
def search_web(query: str) -> str:
    """Search the web for information."""
    # In production, use Tavily, SerpAPI, etc.
    return f"Search results for: {query}"

@tool
def get_weather(location: str) -> str:
    """Get current weather for a location."""
    return f"Weather in {location}: 72°F, sunny"

tools = [search_web, get_weather]

Bind Tools to LLM

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o")
llm_with_tools = llm.bind_tools(tools)

Define the Agent Node

def agent(state: AgentState) -> dict:
    """The reasoning node - decides what to do next"""
    messages = state["messages"]
    response = llm_with_tools.invoke(messages)
    return {"messages": [response]}

Define the Tool Executor Node

from langgraph.prebuilt import ToolNode

tool_node = ToolNode(tools)

ToolNode handles parsing tool calls from the LLM response and executing them.

Define the Router

def should_use_tool(state: AgentState) -> str:
    """Check if the last message has tool calls"""
    last_message = state["messages"][-1]
    if last_message.tool_calls:
        return "tools"
    return "end"

Wire It Together

from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages
from typing import Annotated, TypedDict

class AgentState(TypedDict):
    messages: Annotated[list, add_messages]

# Build graph
graph = StateGraph(AgentState)
graph.add_node("agent", agent)
graph.add_node("tools", tool_node)

# Edges
graph.add_edge(START, "agent")
graph.add_conditional_edges(
    "agent",
    should_use_tool,
    {"tools": "tools", "end": END}
)
graph.add_edge("tools", "agent")  # After tool, back to agent

app = graph.compile()

Run It

result = app.invoke({
    "messages": [{"role": "user", "content": "What's the weather in Tokyo?"}]
})

for message in result["messages"]:
    print(f"{message.type}: {message.content}")

The agent will:

  1. Receive the question
  2. Decide to call get_weather
  3. Execute the tool
  4. Loop back with the result
  5. Formulate the final answer
  6. Exit

The Prebuilt ReAct Agent

LangGraph provides create_react_agent for common cases:

from langgraph.prebuilt import create_react_agent

app = create_react_agent(llm, tools)

result = app.invoke({
    "messages": [{"role": "user", "content": "Search for LangGraph tutorials"}]
})

This handles all the wiring automatically. Use it when the default ReAct pattern fits. Build custom graphs when you need more control.

Persistence: Memory That Survives

By default, state only exists during a single invocation. For multi-turn conversations or long-running tasks, you need persistence.

Checkpointers

A checkpointer saves state after every node execution:

from langgraph.checkpoint.memory import MemorySaver

checkpointer = MemorySaver()  # In-memory (dev only)
app = graph.compile(checkpointer=checkpointer)

For production:

from langgraph.checkpoint.postgres import PostgresSaver  # pip install langgraph-checkpoint-postgres

with PostgresSaver.from_conn_string("postgresql://...") as checkpointer:
    checkpointer.setup()  # creates the required tables on first run
    app = graph.compile(checkpointer=checkpointer)

Thread IDs

Each conversation gets a thread_id. Pass it in the config:

config = {"configurable": {"thread_id": "user-123-conversation-1"}}

# First message
result = app.invoke(
    {"messages": [{"role": "user", "content": "Hi, I'm Alice"}]},
    config
)

# Second message - agent remembers the first
result = app.invoke(
    {"messages": [{"role": "user", "content": "What's my name?"}]},
    config
)
# Agent knows it's Alice because state was persisted

Time Travel

With checkpoints, you can inspect or revert to any previous state:

# Get all checkpoints for a thread
checkpoints = list(checkpointer.list(config))

# Load a specific checkpoint (the checkpoint_id goes inside the config)
old_state = checkpointer.get(
    {"configurable": {"thread_id": "...", "checkpoint_id": "abc123"}}
)

# Resume from that point
result = app.invoke(
    {"messages": [{"role": "user", "content": "Try something different"}]},
    {"configurable": {"thread_id": "...", "checkpoint_id": "abc123"}}
)

This is invaluable for debugging and error recovery.

Human-in-the-Loop

Real agents need human oversight. LangGraph provides two mechanisms: breakpoints and the interrupt function.

Static Breakpoints

Pause before or after specific nodes:

app = graph.compile(
    checkpointer=checkpointer,
    interrupt_before=["execute_dangerous_action"],
    interrupt_after=["generate_plan"]
)

The graph pauses, saves state, and waits. You inspect, approve, and resume:

config = {"configurable": {"thread_id": "task-1"}}

# Run until interrupt
result = app.invoke({"messages": [...]}, config)
# Graph pauses after "generate_plan"

# Check what the agent wants to do
print(result["plan"])

# Resume execution
result = app.invoke(None, config)  # None continues from checkpoint

Dynamic Interrupts with interrupt()

For more control, pause from within a node:

from langgraph.types import interrupt, Command

def review_action(state: AgentState) -> dict:
    """Pause and ask human for approval"""
    proposed_action = state["proposed_action"]
    
    # This pauses execution and returns to the caller
    human_decision = interrupt({
        "question": f"Approve this action? {proposed_action}",
        "options": ["approve", "reject", "edit"]
    })
    
    if human_decision == "reject":
        return {"status": "cancelled"}
    
    return {"approved": True}

Resume with Command:

from langgraph.types import Command

# Initial run (pauses at interrupt)
result = app.invoke({"messages": [...]}, config)

# Check what it's asking
print(result["__interrupt__"])  # Shows the interrupt payload

# Resume with human decision
result = app.invoke(
    Command(resume="approve"),  # Pass the decision
    config
)

The key insight: interrupt() saves the entire execution state. The agent can wait minutes, hours, or days. When you resume, it continues exactly where it left off.

Practical Example: Approve Before Write

def write_file(state: AgentState) -> dict:
    """Write to file with human approval"""
    filename = state["filename"]
    content = state["content"]
    
    # Ask for approval
    decision = interrupt({
        "action": "write_file",
        "filename": filename,
        "content_preview": content[:500],
        "message": "Approve this file write?"
    })
    
    if decision != "approve":
        return {"status": "write_cancelled"}
    
    # Proceed with write
    with open(filename, "w") as f:
        f.write(content)
    
    return {"status": "file_written", "path": filename}

This pattern works for any risky operation: database writes, API calls, emails, payments.

Building a Research Agent

Let's build something real: an agent that researches a topic by searching the web, evaluating results, and iterating until it has enough information.

State Definition

from typing import TypedDict, List, Optional, Annotated
from langgraph.graph.message import add_messages

class ResearchState(TypedDict):
    messages: Annotated[list, add_messages]
    query: str                           # Original question
    search_queries: List[str]            # Generated search terms
    search_results: List[dict]           # Results from web searches
    sources: List[str]                   # URLs for citations
    gaps: List[str]                      # Identified knowledge gaps
    iteration: int                       # Current loop count
    max_iterations: int                  # Limit to prevent infinite loops
    final_answer: Optional[str]
    _sufficient: bool                    # Set by the evaluator, read by the router

Node 1: Generate Search Queries

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o")

def generate_queries(state: ResearchState) -> dict:
    """Generate search queries based on the question and any gaps"""
    query = state["query"]
    gaps = state.get("gaps", [])
    gaps_text = f"Knowledge gaps to address: {gaps}" if gaps else ""

    prompt = f"""Generate 3 search queries to research this question:
Question: {query}

{gaps_text}

Return queries as a JSON list of strings."""
    
    response = llm.invoke([{"role": "user", "content": prompt}])
    
    # Parse response (in production, use structured output)
    import json
    queries = json.loads(response.content)
    
    return {"search_queries": queries}

Node 2: Search the Web

from langchain_community.tools import TavilySearchResults

search_tool = TavilySearchResults(max_results=3)

def search_web(state: ResearchState) -> dict:
    """Execute searches for all queries"""
    queries = state["search_queries"]
    all_results = []
    sources = []
    
    for query in queries:
        results = search_tool.invoke(query)
        for r in results:
            all_results.append({
                "query": query,
                "title": r.get("title", ""),
                "content": r.get("content", ""),
                "url": r.get("url", "")
            })
            if r.get("url"):
                sources.append(r["url"])
    
    return {
        "search_results": state.get("search_results", []) + all_results,
        "sources": list(set(state.get("sources", []) + sources))
    }

Node 3: Evaluate and Reflect

def evaluate_results(state: ResearchState) -> dict:
    """Analyze results and identify gaps"""
    query = state["query"]
    results = state["search_results"]
    iteration = state.get("iteration", 0)
    
    results_text = "\n\n".join([
        f"Source: {r['url']}\n{r['content'][:500]}"
        for r in results[-9:]  # Last 9 results (3 queries × 3 results)
    ])
    
    prompt = f"""Evaluate if we have enough information to answer this question:

Question: {query}

Search Results:
{results_text}

Respond with JSON:
{{
    "sufficient": true/false,
    "gaps": ["list of missing information if not sufficient"],
    "summary": "brief summary of what we know"
}}"""
    
    response = llm.invoke([{"role": "user", "content": prompt}])
    
    import json
    evaluation = json.loads(response.content)
    
    return {
        "gaps": evaluation.get("gaps", []),
        "iteration": iteration + 1,
        "_sufficient": evaluation["sufficient"]  # Used for routing
    }

Node 4: Generate Final Answer

def generate_answer(state: ResearchState) -> dict:
    """Synthesize research into final answer"""
    query = state["query"]
    results = state["search_results"]
    sources = state["sources"]
    
    results_text = "\n\n".join([
        f"[{i+1}] {r['content'][:1000]}"
        for i, r in enumerate(results[:12])
    ])
    
    prompt = f"""Based on this research, answer the question.

Question: {query}

Research:
{results_text}

Requirements:
- Provide a comprehensive answer
- Include citations using [1], [2], etc.
- Be factual and balanced"""
    
    response = llm.invoke([{"role": "user", "content": prompt}])
    
    # Add source list
    source_list = "\n\nSources:\n" + "\n".join([
        f"[{i+1}] {url}" for i, url in enumerate(sources[:12])
    ])
    
    return {"final_answer": response.content + source_list}

Routing Function

def should_continue_research(state: ResearchState) -> str:
    """Decide whether to continue searching or finalize"""
    # Check if we have enough info
    if state.get("_sufficient", False):
        return "answer"
    
    # Check iteration limit
    if state.get("iteration", 0) >= state.get("max_iterations", 3):
        return "answer"
    
    # Check if we have gaps to address
    if state.get("gaps"):
        return "search_more"
    
    return "answer"
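The router's precedence (sufficient first, then the iteration cap, then gaps) is easy to verify in isolation; the function is repeated here so the snippet stands alone:

```python
def should_continue_research(state: dict) -> str:
    """Decide whether to continue searching or finalize"""
    if state.get("_sufficient", False):
        return "answer"
    if state.get("iteration", 0) >= state.get("max_iterations", 3):
        return "answer"
    if state.get("gaps"):
        return "search_more"
    return "answer"

# Sufficiency wins even when gaps remain:
assert should_continue_research({"_sufficient": True, "gaps": ["x"]}) == "answer"
# The iteration cap forces an exit:
assert should_continue_research({"iteration": 3, "gaps": ["x"]}) == "answer"
# Otherwise, open gaps trigger another search pass:
assert should_continue_research({"iteration": 1, "gaps": ["x"]}) == "search_more"
assert should_continue_research({"iteration": 1, "gaps": []}) == "answer"
```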

Assemble the Graph

from langgraph.graph import StateGraph, START, END
from langgraph.checkpoint.memory import MemorySaver

# Build graph
graph = StateGraph(ResearchState)

# Add nodes
graph.add_node("generate_queries", generate_queries)
graph.add_node("search", search_web)
graph.add_node("evaluate", evaluate_results)
graph.add_node("answer", generate_answer)

# Entry
graph.add_edge(START, "generate_queries")

# Flow
graph.add_edge("generate_queries", "search")
graph.add_edge("search", "evaluate")

# Conditional: continue or finish
graph.add_conditional_edges(
    "evaluate",
    should_continue_research,
    {
        "search_more": "generate_queries",  # Loop back
        "answer": "answer"                   # Exit to answer
    }
)

# Exit
graph.add_edge("answer", END)

# Compile with persistence
checkpointer = MemorySaver()
app = graph.compile(checkpointer=checkpointer)

Run the Research Agent

config = {"configurable": {"thread_id": "research-1"}}

result = app.invoke({
    "messages": [],
    "query": "What are the latest developments in quantum computing?",
    "search_queries": [],
    "search_results": [],
    "sources": [],
    "gaps": [],
    "iteration": 0,
    "max_iterations": 3,
    "final_answer": None
}, config)

print(result["final_answer"])

The agent will:

  1. Generate initial search queries
  2. Search the web
  3. Evaluate if the results are sufficient
  4. If not, identify gaps and search again
  5. Loop up to 3 times
  6. Generate final answer with citations

Visualizing the Graph

from IPython.display import Image, display

display(Image(app.get_graph().draw_mermaid_png()))

This outputs a flowchart showing nodes and edges, helpful for debugging.

Streaming

For long-running agents, stream intermediate results:

# Stream all state updates
for chunk in app.stream(initial_state, config):
    print(chunk)

# Stream specific events
for event in app.stream(initial_state, config, stream_mode="updates"):
    node_name = list(event.keys())[0]
    node_output = event[node_name]
    print(f"{node_name}: {node_output}")

This lets you show progress to users as the agent works.

Production Patterns

Error Handling

Wrap node logic in try/except and update state with error info:

def search_with_retry(state: ResearchState) -> dict:
    """Search with error handling"""
    try:
        results = search_tool.invoke(state["search_queries"][0])
        return {"search_results": results, "error": None}
    except Exception as e:
        return {
            "search_results": [],
            "error": str(e),
            "retry_count": state.get("retry_count", 0) + 1
        }

Add conditional edges for error recovery:

def route_after_search(state):
    if state.get("error") and state.get("retry_count", 0) < 3:
        return "retry"
    elif state.get("error"):
        return "fallback"
    return "continue"
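As with the other routers, the recovery logic can be exercised on its own (the function is repeated so the snippet is self-contained):

```python
def route_after_search(state: dict) -> str:
    """Retry on error until the retry budget is spent, then fall back."""
    if state.get("error") and state.get("retry_count", 0) < 3:
        return "retry"
    elif state.get("error"):
        return "fallback"
    return "continue"

assert route_after_search({"error": "timeout", "retry_count": 1}) == "retry"
assert route_after_search({"error": "timeout", "retry_count": 3}) == "fallback"
assert route_after_search({}) == "continue"  # no error: proceed normally
```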

Parallel Execution

LangGraph can run independent nodes in parallel. When using conditional edges with Send, you can spawn multiple branches:

from langgraph.types import Send

def distribute_searches(state):
    """Create parallel search tasks"""
    queries = state["search_queries"]
    return [Send("search_single", {"query": q}) for q in queries]

graph.add_conditional_edges("generate_queries", distribute_searches)

Each Send creates a parallel branch. Results merge back into state; any key written by multiple branches needs a reducer (like operator.add) so concurrent updates combine instead of colliding.

Subgraphs

For complex agents, compose smaller graphs:

# Define a subgraph
search_graph = StateGraph(SearchState)
# ... add nodes and edges ...
search_app = search_graph.compile()

# Use it as a node in parent graph
def search_node(state):
    result = search_app.invoke({"query": state["query"]})
    return {"search_results": result["results"]}

parent_graph.add_node("search", search_node)

This keeps code modular and testable.

Common Pitfalls

1. Forgetting the Checkpointer for Interrupts

Interrupts require persistence. Without a checkpointer, the state is lost:

# WRONG - interrupts won't work
app = graph.compile()

# RIGHT
app = graph.compile(checkpointer=MemorySaver())

2. Infinite Loops

Always include exit conditions:

def should_continue(state):
    if state["iteration"] >= state["max_iterations"]:
        return "end"  # Force exit
    # ... rest of logic

3. State Schema Mismatches

Nodes must return keys that exist in your state schema:

class State(TypedDict):
    messages: list
    
def bad_node(state):
    return {"invalid_key": "value"}  # Will fail!

4. Not Handling Tool Errors

Tools can fail. Handle it:

def execute_tool(state):
    try:
        result = tool.invoke(state["tool_input"])
        return {"tool_result": result}
    except Exception as e:
        return {"tool_error": str(e), "tool_result": None}

When to Use LangGraph

Good fit:

  • Agents that need loops (search → evaluate → search again)
  • Multi-step workflows with branching
  • Human-in-the-loop requirements
  • Long-running tasks needing persistence
  • Complex tool orchestration

Maybe overkill:

  • Simple LLM calls without tools
  • Linear pipelines (A → B → C)
  • Stateless request/response patterns

For simpler cases, LangChain's basic chains or direct API calls might be cleaner. LangGraph shines when your agent logic is genuinely complex.

Multi-Agent Systems

When one agent isn't enough, LangGraph supports multi-agent architectures.

Supervisor Pattern

One agent coordinates multiple specialist agents:

from typing import Literal
from langgraph.graph import END, StateGraph
from langgraph.types import Command

class SupervisorState(TypedDict):
    messages: Annotated[list, add_messages]
    next_agent: str

def supervisor(state: SupervisorState) -> Command:
    """Decide which agent should handle the task"""
    messages = state["messages"]
    
    prompt = f"""You are a supervisor managing these agents:
- researcher: searches the web for information
- analyst: analyzes data and provides insights
- writer: writes reports and summaries

Based on the conversation, which agent should act next?
Or respond FINISH if the task is complete.

Respond with just the agent name or FINISH."""
    
    response = llm.invoke([{"role": "system", "content": prompt}] + messages)
    next_agent = response.content.strip().lower()
    
    if next_agent == "finish":
        return Command(goto=END)
    
    return Command(goto=next_agent)

# Build the graph
graph = StateGraph(SupervisorState)

graph.add_node("supervisor", supervisor)
graph.add_node("researcher", researcher_agent)
graph.add_node("analyst", analyst_agent)
graph.add_node("writer", writer_agent)

# All agents report back to supervisor
graph.add_edge("researcher", "supervisor")
graph.add_edge("analyst", "supervisor")
graph.add_edge("writer", "supervisor")

# Supervisor routes to agents
graph.add_edge(START, "supervisor")

Hierarchical Teams

For complex tasks, nest supervisors:

Top Supervisor
├── Research Team Supervisor
│   ├── Web Researcher
│   └── Document Analyst
└── Writing Team Supervisor
    ├── Copywriter
    └── Editor

Each team is its own subgraph. The top supervisor delegates to team supervisors.

Message Passing Between Agents

Agents communicate through the shared state:

class MultiAgentState(TypedDict):
    messages: Annotated[list, add_messages]
    research_findings: List[str]
    analysis_results: dict
    draft: str
    feedback: List[str]

def researcher(state):
    # Do research
    findings = perform_research(state["messages"][-1].content)
    return {"research_findings": findings}

def analyst(state):
    # Analyze the research
    findings = state["research_findings"]
    analysis = analyze(findings)
    return {"analysis_results": analysis}

def writer(state):
    # Write based on analysis
    analysis = state["analysis_results"]
    draft = write_report(analysis)
    return {"draft": draft}

Each agent reads from what previous agents wrote and adds its contribution.
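To make the hand-off concrete, here is a stubbed, framework-free version of that pipeline (perform_research, analyze, and write_report are replaced with trivial stand-ins, and the supervisor is reduced to a fixed sequence):

```python
def researcher(state: dict) -> dict:
    findings = [f"fact about {state['topic']}"]  # stand-in for real research
    return {"research_findings": findings}

def analyst(state: dict) -> dict:
    findings = state["research_findings"]  # reads what the researcher wrote
    return {"analysis_results": {"n_findings": len(findings)}}

def writer(state: dict) -> dict:
    analysis = state["analysis_results"]  # reads what the analyst wrote
    return {"draft": f"Report covering {analysis['n_findings']} finding(s)"}

# The supervisor pattern amounts to: run an agent, fold its updates into
# the shared state, repeat. Here the routing is a fixed order for illustration.
state = {"topic": "LangGraph"}
for agent in (researcher, analyst, writer):
    state.update(agent(state))
# state["draft"] == "Report covering 1 finding(s)"
```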

Advanced Tool Patterns

Tools That Modify State

Sometimes tools need to update agent state directly:

from langgraph.types import Command

def counter_tool(state):
    """Tool that increments a counter in state"""
    current = state.get("tool_call_count", 0)
    
    # Return Command to update state AND specify next node
    return Command(
        update={"tool_call_count": current + 1},
        goto="agent"  # Go back to agent after
    )

Dynamic Tool Loading

Load tools based on context:

def select_tools(state):
    """Dynamically select which tools to make available"""
    topic = state.get("topic", "general")
    
    if topic == "coding":
        return [code_executor, linter, git_tool]
    elif topic == "research":
        return [web_search, arxiv_search, wiki_search]
    else:
        return [web_search, calculator]

def agent_with_dynamic_tools(state):
    tools = select_tools(state)
    llm_with_tools = llm.bind_tools(tools)
    response = llm_with_tools.invoke(state["messages"])
    return {"messages": [response]}

Tool Approval Workflow

Require human approval for sensitive tools:

def execute_tool_with_approval(state):
    tool_call = state["pending_tool_call"]
    
    # Check if this tool needs approval
    sensitive_tools = ["delete_file", "send_email", "make_payment"]
    
    if tool_call["name"] in sensitive_tools:
        # Interrupt for approval
        decision = interrupt({
            "tool": tool_call["name"],
            "args": tool_call["args"],
            "message": "This action requires approval"
        })
        
        if decision != "approve":
            return {"tool_result": "Action cancelled by user"}
    
    # Execute the tool
    result = execute(tool_call)
    return {"tool_result": result}

Debugging and Observability

LangSmith Integration

Track every step of your agent:

import os
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "your-key"

# Your graph runs as normal, but everything is traced
result = app.invoke({"messages": [...]})

LangSmith shows:

  • Every node execution
  • Token usage per LLM call
  • Tool inputs and outputs
  • State changes at each step
  • Latency breakdown

Inspecting State

Check state at any point:

# Get current state for a thread
state = app.get_state(config)
print(state.values)  # Current state values
print(state.next)    # Next node(s) to execute

# Get state history
for state in app.get_state_history(config):
    print(f"Step {state.step}: {state.next}")

Custom Logging

Add logging inside nodes:

import logging

logger = logging.getLogger("research_agent")

def search_web(state):
    logger.info(f"Searching for: {state['search_queries']}")
    
    results = perform_search(state["search_queries"])
    
    logger.info(f"Found {len(results)} results")
    logger.debug(f"Results: {results}")
    
    return {"search_results": results}

Performance Optimization

Caching

Cache expensive operations:

from functools import lru_cache

@lru_cache(maxsize=100)
def cached_search(query: str) -> str:
    return search_tool.invoke(query)

def search_node(state):
    results = []
    for query in state["queries"]:
        result = cached_search(query)  # Uses cache if available
        results.append(result)
    return {"results": results}

Batching LLM Calls

When possible, batch multiple calls:

def evaluate_multiple(state):
    """Evaluate all results in one LLM call instead of many"""
    results = state["search_results"]
    
    prompt = f"""Evaluate each of these search results:

{json.dumps(results, indent=2)}

For each result, provide:
- relevance (1-10)
- key_facts extracted

Return as JSON array."""
    
    response = llm.invoke([{"role": "user", "content": prompt}])
    evaluations = json.loads(response.content)
    
    return {"evaluations": evaluations}

Limiting Context Size

For long conversations, trim history:

from langchain_core.messages import trim_messages

def agent(state):
    messages = state["messages"]
    
    # Keep only recent messages + system message
    trimmed = trim_messages(
        messages,
        max_tokens=4000,
        strategy="last",
        token_counter=llm,
        include_system=True
    )
    
    response = llm.invoke(trimmed)
    return {"messages": [response]}

Testing LangGraph Agents

Unit Testing Nodes

Test nodes in isolation:

import pytest
from unittest.mock import patch

def test_generate_queries():
    state = {
        "query": "What is quantum computing?",
        "gaps": []
    }
    
    result = generate_queries(state)
    
    assert "search_queries" in result
    assert len(result["search_queries"]) >= 1
    assert all(isinstance(q, str) for q in result["search_queries"])

def test_evaluate_with_sufficient_info():
    state = {
        "query": "Simple question",
        "search_results": [{"content": "Complete answer..."}],
        "iteration": 0
    }
    
    result = evaluate_results(state)
    
    assert result["_sufficient"] == True

Integration Testing

Test the full graph:

def test_research_agent_completes():
    app = build_research_agent()
    
    result = app.invoke({
        "query": "What is Python?",
        "max_iterations": 2,
        # ... other initial state
    })
    
    assert result["final_answer"] is not None
    assert len(result["sources"]) > 0
    assert result["iteration"] <= 2

def test_research_agent_handles_no_results():
    # Mock search to return empty
    with patch.object(search_tool, "invoke", return_value=[]):
        result = app.invoke({...})
        
    # Should still produce an answer (possibly "not found")
    assert result["final_answer"] is not None

Testing Interrupts

def test_interrupt_pauses_execution():
    checkpointer = MemorySaver()
    app = build_agent_with_approval(checkpointer=checkpointer)
    
    config = {"configurable": {"thread_id": "test-1"}}
    
    # Run until interrupt
    result = app.invoke(
        {"task": "delete important file"},
        config
    )
    
    # Should be interrupted
    assert "__interrupt__" in result
    
    # Resume with rejection
    result = app.invoke(Command(resume="reject"), config)
    
    # File should not be deleted
    assert result["status"] == "cancelled"

Summary

LangGraph models agents as state machines:

  • State: Everything the agent knows, updated as it works
  • Nodes: Functions that do work and update state
  • Edges: Transitions between nodes, conditional or fixed
  • Checkpointers: Persistence for long-running tasks
  • Interrupts: Human oversight at any point

The power comes from conditional edges creating loops. The agent can reason, act, observe, and decide whether to continue or exit. This matches how humans actually solve problems: try something, check if it worked, adjust, repeat.

Start with the prebuilt create_react_agent for simple tool use. Graduate to custom graphs when you need specific control over the flow. Add persistence when conversations span multiple sessions. Add interrupts when humans need to stay in the loop.

For teams building production agents that need reliable evaluation, Prem Studio provides fine-tuning and evaluation tools that integrate with agentic workflows built on any framework.
