LangChain vs LlamaIndex (2026): Complete Production RAG Comparison
LangChain vs LlamaIndex 2026. RAG pipelines, agent frameworks, LangSmith vs Langfuse, breaking changes, and a no-BS decision guide for production teams.
Choosing between LangChain and LlamaIndex used to be straightforward. LangChain for complex workflows, LlamaIndex for retrieval. In 2026, that framing is outdated.
LangChain has effectively become LangGraph for anything production-facing. LlamaIndex added Workflows and now handles complex multi-step agents. Both frameworks have converged on overlapping territory, and the comparison every article gives you ("LangChain = orchestration, LlamaIndex = data") misses how the landscape actually looks today.
This guide goes deeper. Architecture differences, RAG pipeline comparisons with working code, agent frameworks, observability tooling, real user complaints, and a decision framework based on what you are actually building.
Quick Snapshot: Where Each Framework Stands in 2026
| | LangChain / LangGraph | LlamaIndex |
|---|---|---|
| GitHub stars | 119K (LangChain) | 44K |
| Primary identity | Orchestration + agents | Data framework + RAG |
| Production agent layer | LangGraph | LlamaIndex Workflows |
| Observability | LangSmith (first-party) | Langfuse, Arize Phoenix (integrations) |
| RAG support | Via integrations | Native, purpose-built |
| Learning curve | Steeper | Lower for RAG use cases |
| Versioning stability | History of breaking changes | More stable upgrade path |
| Enterprise adoption | Uber, LinkedIn, Replit | Growing, document-heavy verticals |
| License | MIT | MIT |
Both are free and open source. Paid tiers exist for LangSmith (LangChain's observability) and LlamaCloud (LlamaIndex's managed platform).
How Each Framework Thinks About the Problem
Understanding the core mental model of each framework matters more than any feature checklist.
LangChain: Start from the workflow
LangChain was built around the idea that building with LLMs is fundamentally a workflow problem. You have inputs, you want outputs, and in between there are models, tools, memory, and external data sources that need to be composed into a pipeline.
The original LangChain gave you chains: composable sequences of operations. You pipe a prompt through a model, through a parser, into a tool, and so on. This works well for prototyping and simple pipelines, but it breaks down for stateful, multi-step agent work where execution branches, loops back, or waits for human input.
That is why LangGraph exists. LangGraph replaces the original chains-and-agents model with a directed graph where nodes are functions and edges represent state transitions. It gives you explicit state management, persistent checkpoints, human-in-the-loop interrupts, and time-travel debugging. The LangChain team now positions LangGraph as the correct way to build any production agent system.
When you read "LangChain" in most comparisons, they mean the original library. When you are building anything serious today, you mean LangGraph.
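The chain idea itself is easy to see in miniature. Below is a plain-Python sketch of the composition model (illustrative only; in real LangChain, LCEL composes Runnable objects with the same `|` operator):

```python
class Step:
    """Minimal stand-in for a LangChain Runnable: a callable that composes with |."""
    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        # Piping two steps yields a new step that runs them in sequence
        return Step(lambda x: other.fn(self.fn(x)))

    def invoke(self, x):
        return self.fn(x)

# prompt -> model -> parser, mirroring an LCEL chain
prompt = Step(lambda q: f"Answer concisely: {q}")
model = Step(lambda p: f"MODEL({p})")  # stand-in for an LLM call
parser = Step(lambda s: s.strip())

chain = prompt | model | parser
print(chain.invoke("What is RAG?"))  # → MODEL(Answer concisely: What is RAG?)
```

This linear structure is exactly what breaks down once execution needs to branch, loop, or wait, which is the gap LangGraph fills.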
LlamaIndex: Start from the data
LlamaIndex was built around the idea that the core challenge is connecting LLMs to your data reliably. Everything follows from that premise.
You ingest documents, chunk them into nodes, build indices, and query them. The framework handles the retrieval engineering: chunking strategy, embedding, vector storage, re-ranking, context window assembly. The query engine abstraction does a lot of work that you would otherwise build yourself.
LlamaIndex added agents and Workflows on top of this retrieval core. The agent system is data-centric by design: agents route queries to query engines as tools rather than treating retrieval as one capability among many.
Workflows (their async-first orchestration layer) lets you build multi-step pipelines with explicit event passing between steps. Default workflows are stateless, which is different from LangGraph's built-in state graph model.
Architecture Deep Dive
LangChain / LangGraph Architecture
The current LangChain stack has three layers:
LangChain core handles the primitives: chat models, prompts, output parsers, document loaders, retrievers, vector stores. It is where integrations live. Over 500 integrations with LLMs, vector databases, tools, and data sources.
LangGraph handles orchestration. You define a state schema, then nodes (Python functions) that read and write to state. Edges between nodes are conditional or unconditional. The graph compiles to a runtime that manages state persistence, checkpointing, and streaming.
LangSmith handles observability. Every LLM call, tool invocation, and chain execution is traced automatically. You get latency, token usage, and cost tracking out of the box, plus a testing and evaluation suite.
```python
# LangGraph: Basic stateful RAG agent
from typing import List, TypedDict

from langchain_community.vectorstores import Chroma
from langchain_core.messages import AIMessage, HumanMessage
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langgraph.graph import END, StateGraph

# Define state schema
class AgentState(TypedDict):
    messages: List
    context: str
    answer: str

# Initialize components
llm = ChatOpenAI(model="gpt-4o-mini")
vectorstore = Chroma(
    embedding_function=OpenAIEmbeddings(),
    persist_directory="./chroma_db",
)
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})

# Define nodes
def retrieve(state: AgentState) -> AgentState:
    query = state["messages"][-1].content
    docs = retriever.invoke(query)
    context = "\n\n".join(d.page_content for d in docs)
    return {"context": context}

def generate(state: AgentState) -> AgentState:
    prompt = f"""Answer based on this context:
{state['context']}

Question: {state['messages'][-1].content}"""
    response = llm.invoke([HumanMessage(content=prompt)])
    return {
        "answer": response.content,
        "messages": state["messages"] + [AIMessage(content=response.content)],
    }

# Build graph
workflow = StateGraph(AgentState)
workflow.add_node("retrieve", retrieve)
workflow.add_node("generate", generate)
workflow.set_entry_point("retrieve")
workflow.add_edge("retrieve", "generate")
workflow.add_edge("generate", END)
app = workflow.compile()

# Run
result = app.invoke({
    "messages": [HumanMessage(content="What is our Q3 revenue?")],
    "context": "",
    "answer": "",
})
print(result["answer"])
```
LlamaIndex Architecture
LlamaIndex is built around five core abstractions:
Data connectors (LlamaHub) ingest from 300+ sources: file systems, databases, S3, APIs, web pages. They produce Document objects.
Node parsers chunk documents into Node objects with configurable strategies: fixed size, sentence splitting, semantic chunking, hierarchical chunking.
Indices organize nodes for retrieval. The main types are VectorStoreIndex (semantic search), SummaryIndex (sequential summarization), KeywordTableIndex (keyword matching), and KnowledgeGraphIndex (graph-based).
Query engines sit on top of indices and handle the full retrieval-to-response pipeline. They run the retrieval, re-rank results, assemble context, and pass to the LLM.
Workflows (added in 2024) provide the async orchestration layer for multi-step pipelines.
```python
# LlamaIndex: Basic RAG pipeline
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

# Configure settings
Settings.llm = OpenAI(model="gpt-4o-mini")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
Settings.node_parser = SentenceSplitter(chunk_size=512, chunk_overlap=50)

# Load and index documents
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Query
query_engine = index.as_query_engine(similarity_top_k=4)
response = query_engine.query("What is our Q3 revenue?")
print(response)
```
The LlamaIndex version is noticeably shorter for the same basic RAG task. That gap grows as you add more data sources, but it narrows again when you need complex stateful behavior.
RAG Capabilities: Where the Real Difference Lives
Both frameworks support retrieval-augmented generation. The difference is in how much retrieval-specific functionality comes built in versus how much you wire together yourself.
Indexing and Chunking
LlamaIndex has more built-in chunking strategies and index types. Hierarchical chunking (where parent nodes contain summaries of child chunks) is a first-class feature that takes a few lines to configure. Hybrid search combining vector similarity and BM25 keyword matching is built in.
With LangChain, you pick individual components and assemble them. You choose a text splitter, a vector store, a retriever, and a re-ranker separately. This gives more control but requires more code for the same result.
```python
# LlamaIndex: Hierarchical chunking with auto-merging retrieval
from llama_index.core import VectorStoreIndex
from llama_index.core.node_parser import HierarchicalNodeParser, get_leaf_nodes
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import AutoMergingRetriever

# Create hierarchical nodes: 2048 -> 512 -> 128 token chunks
node_parser = HierarchicalNodeParser.from_defaults(
    chunk_sizes=[2048, 512, 128]
)
nodes = node_parser.get_nodes_from_documents(documents)
leaf_nodes = get_leaf_nodes(nodes)

# Build index on leaf nodes only; store all nodes so parents can be looked up
index = VectorStoreIndex(leaf_nodes)
index.storage_context.docstore.add_documents(nodes)

# Auto-merging retriever replaces small chunks with parent context when relevant
base_retriever = index.as_retriever(similarity_top_k=12)
retriever = AutoMergingRetriever(base_retriever, index.storage_context)

query_engine = RetrieverQueryEngine.from_args(retriever)
response = query_engine.query("Summarize the contract terms")
```
```python
# LangChain: Hybrid search with re-ranking
from langchain.retrievers import ContextualCompressionRetriever, EnsembleRetriever
from langchain_community.cross_encoders import HuggingFaceCrossEncoder
from langchain_community.document_compressors import CrossEncoderReranker
from langchain_community.retrievers import BM25Retriever

# Set up vector retriever (vectorstore built as in the earlier example)
vector_retriever = vectorstore.as_retriever(search_kwargs={"k": 10})

# Set up BM25 keyword retriever
bm25_retriever = BM25Retriever.from_documents(documents)
bm25_retriever.k = 10

# Combine with ensemble (60% vector, 40% keyword)
ensemble_retriever = EnsembleRetriever(
    retrievers=[vector_retriever, bm25_retriever],
    weights=[0.6, 0.4],
)

# Add cross-encoder re-ranking
model = HuggingFaceCrossEncoder(model_name="BAAI/bge-reranker-base")
compressor = CrossEncoderReranker(model=model, top_n=4)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=ensemble_retriever,
)
```
Both work. The LangChain version requires assembling more pieces. The upside is that each piece is swappable independently.
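The fusion step inside EnsembleRetriever is weighted Reciprocal Rank Fusion, which is simple enough to sketch in plain Python (illustrative document IDs; `k=60` is the conventional RRF smoothing constant):

```python
def weighted_rrf(rankings, weights, k=60):
    """Fuse ranked doc-id lists with weighted Reciprocal Rank Fusion."""
    scores = {}
    for ranked, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranked):
            # Earlier ranks contribute more; k damps the tail
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["d1", "d3", "d2"]  # ranked by vector similarity
bm25_hits = ["d2", "d1", "d4"]    # ranked by BM25
print(weighted_rrf([vector_hits, bm25_hits], weights=[0.6, 0.4]))
# → ['d1', 'd2', 'd3', 'd4']
```

Documents that rank high in both lists win, which is why hybrid retrieval catches keyword-heavy queries that pure vector search misses.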
Advanced RAG: Sub-question decomposition
LlamaIndex has built-in sub-question decomposition. Complex questions get broken into simpler sub-questions, each answered against relevant indices, then synthesized.
```python
# LlamaIndex: Sub-question query engine
from llama_index.core import VectorStoreIndex
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.tools import QueryEngineTool

# Build separate indices per data source
financial_index = VectorStoreIndex.from_documents(financial_docs)
legal_index = VectorStoreIndex.from_documents(legal_docs)

# Wrap as tools with descriptions
tools = [
    QueryEngineTool.from_defaults(
        query_engine=financial_index.as_query_engine(),
        name="financial_data",
        description="Contains quarterly earnings, revenue, and financial forecasts",
    ),
    QueryEngineTool.from_defaults(
        query_engine=legal_index.as_query_engine(),
        name="legal_docs",
        description="Contains contracts, compliance documents, and legal filings",
    ),
]

# Sub-question engine automatically decomposes complex queries
query_engine = SubQuestionQueryEngine.from_defaults(query_engine_tools=tools)
response = query_engine.query(
    "How do our Q3 revenue figures compare to the revenue thresholds in our partnership contract?"
)
```
This works because LlamaIndex's query engine model treats retrieval as the primary unit of work. In LangChain, you would build similar behavior using a LangGraph agent with tool-calling, which gives more flexibility but requires more code.
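Stripped of the framework, the control flow that a sub-question engine automates fits in a few lines. A sketch with stub functions standing in for the LLM decomposer and the query engines (all names here are hypothetical):

```python
def answer_complex(question, decompose, tools, synthesize):
    """Sketch of sub-question decomposition: split, route, answer, combine."""
    partials = []
    for sub in decompose(question):            # e.g. one LLM call to split the query
        answer = tools[sub["source"]](sub["text"])  # route to the matching index
        partials.append((sub["text"], answer))
    return synthesize(question, partials)      # final LLM synthesis over sub-answers

# Stub components standing in for the LLM and per-source query engines
decompose = lambda q: [
    {"source": "financial_data", "text": "What was Q3 revenue?"},
    {"source": "legal_docs", "text": "What revenue threshold does the contract set?"},
]
tools = {
    "financial_data": lambda q: "$12M",
    "legal_docs": lambda q: "$10M minimum",
}
synthesize = lambda q, parts: f"{len(parts)} sub-answers combined"

print(answer_complex("How does Q3 revenue compare to the contract threshold?",
                     decompose, tools, synthesize))  # → 2 sub-answers combined
```

The framework version earns its keep by handling the decomposition prompt, parallel sub-queries, and synthesis for you; this sketch just makes the loop visible.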
Agent Frameworks: LangGraph vs LlamaIndex Workflows
This is where the 2026 comparison gets interesting. Both frameworks now have serious multi-step agent support.
LangGraph
LangGraph represents agent state as a typed dictionary that flows through a directed graph. Nodes read and write state. Edges are conditional based on state values.
Key capabilities:
- Built-in persistence via checkpointers (SQLite, PostgreSQL, Redis)
- Human-in-the-loop interrupts with `interrupt()`
- Time-travel: roll back and replay from any checkpoint
- Streaming tokens and intermediate state from any node
- Multi-agent patterns: supervisor, swarm, hierarchical
```python
# LangGraph: Multi-agent with human-in-the-loop
import operator
from typing import Annotated, TypedDict

from langgraph.checkpoint.sqlite import SqliteSaver
from langgraph.graph import END, StateGraph
from langgraph.types import interrupt

class State(TypedDict):
    messages: Annotated[list, operator.add]
    requires_approval: bool
    approved: bool

def research_agent(state: State) -> State:
    # Agent does research, flags if result needs approval
    result = llm.invoke(state["messages"])
    needs_approval = "financial" in result.content.lower()
    return {
        "messages": [result],
        "requires_approval": needs_approval,
    }

def human_approval(state: State) -> State:
    if state["requires_approval"]:
        # Pause execution and wait for human input
        decision = interrupt(
            {"message": "This response involves financial data. Approve?"}
        )
        return {"approved": decision["approved"]}
    return {"approved": True}

def route_after_approval(state: State) -> str:
    return "send_response" if state["approved"] else "revise"

# Compile with persistence
checkpointer = SqliteSaver.from_conn_string(":memory:")
workflow = StateGraph(State)
workflow.add_node("research", research_agent)
workflow.add_node("approval", human_approval)
workflow.add_node("send_response", lambda s: s)
workflow.add_node("revise", lambda s: s)
workflow.set_entry_point("research")
workflow.add_edge("research", "approval")
workflow.add_conditional_edges("approval", route_after_approval)
workflow.add_edge("send_response", END)
workflow.add_edge("revise", "research")

# interrupt() inside the node handles the pause, so no interrupt_before is needed
app = workflow.compile(checkpointer=checkpointer)
```
LlamaIndex Workflows
LlamaIndex Workflows use an event-driven model. Steps emit events, and other steps listen for them. The whole thing is async by design.
```python
# LlamaIndex Workflows: Multi-step RAG pipeline
from llama_index.core.workflow import (
    Context,
    Event,
    StartEvent,
    StopEvent,
    Workflow,
    step,
)

class RetrievalEvent(Event):
    query: str
    chunks: list

class AnalysisEvent(Event):
    query: str
    chunks: list
    analysis: str

class RAGWorkflow(Workflow):
    @step
    async def retrieve(self, ctx: Context, ev: StartEvent) -> RetrievalEvent:
        query = ev.query
        retriever = index.as_retriever(similarity_top_k=6)
        nodes = await retriever.aretrieve(query)
        chunks = [n.text for n in nodes]
        return RetrievalEvent(query=query, chunks=chunks)

    @step
    async def analyze(self, ctx: Context, ev: RetrievalEvent) -> AnalysisEvent:
        context = "\n\n".join(ev.chunks)
        prompt = f"Analyze this context for the query: {ev.query}\n\nContext: {context}"
        response = await llm.acomplete(prompt)
        return AnalysisEvent(
            query=ev.query,
            chunks=ev.chunks,
            analysis=str(response),
        )

    @step
    async def synthesize(self, ctx: Context, ev: AnalysisEvent) -> StopEvent:
        prompt = f"""Based on this analysis: {ev.analysis}

Provide a final answer to: {ev.query}"""
        response = await llm.acomplete(prompt)
        return StopEvent(result=str(response))

# Run workflow (inside an async context, e.g. under asyncio.run)
workflow = RAGWorkflow(timeout=60)
result = await workflow.run(query="What are the key risks in our Q3 report?")
print(result)
```
LangGraph vs LlamaIndex Workflows: the real difference
LangGraph is better for complex stateful systems where you need persistence, time-travel debugging, and human-in-the-loop. The graph model makes control flow explicit and auditable. LangSmith traces make debugging straightforward.
LlamaIndex Workflows are better for data-intensive pipelines where retrieval is the core operation. The async-first design means concurrent retrieval from multiple sources happens naturally. The event model is cleaner for pipelines that are mostly sequential.
The honest LangGraph weakness: frequent breaking changes between versions. LangGraph v0.2 renamed core constants and changed import paths, requiring teams to rewrite code. Teams report spending more time on migrations than on feature work. If you pin versions and run contract tests, this is manageable. If you do not, upgrades are painful.
The honest LlamaIndex Workflows weakness: stateless by default. You manage state explicitly via the Context object. For long-running agents with complex state that needs to persist across sessions, LangGraph's checkpointing model is more mature.
Observability: How You Debug What Goes Wrong
Both frameworks take different approaches to observability, and this difference matters in production.
LangChain + LangSmith
LangSmith is LangChain's first-party tracing and evaluation platform. Set the API key and every LLM call, tool invocation, and chain step gets traced automatically with zero code changes.
```python
import os

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "your-key"
os.environ["LANGCHAIN_PROJECT"] = "production-rag"

# All subsequent LangChain calls are traced automatically
result = app.invoke({"messages": [HumanMessage(content="What is our policy on refunds?")]})
```
LangSmith shows the full execution trace: every node in the graph, every LLM call with input/output, token counts, latency, and cost. You can create evaluation datasets from production traces and run automated evals against new versions.
The caveat: LangSmith had a security vulnerability in June 2025 that exposed API keys. It has been patched. For sensitive deployments, self-hosting the LangSmith server is recommended.
LlamaIndex + Third-Party Observability
LlamaIndex uses a callback-based approach that integrates with multiple third-party tools. There is no single first-party observability product like LangSmith.
The main integrations are Langfuse, Arize Phoenix, and Weights & Biases. Langfuse is the most popular for LLM tracing.
```python
# LlamaIndex + Langfuse tracing
from langfuse.llama_index import LlamaIndexCallbackHandler
from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager

langfuse_handler = LlamaIndexCallbackHandler(
    public_key="your-public-key",
    secret_key="your-secret-key",
    host="https://cloud.langfuse.com",
)
Settings.callback_manager = CallbackManager([langfuse_handler])

# All queries and retrieval operations are now traced in Langfuse
response = query_engine.query("What does our SLA say about uptime?")
```
LlamaIndex also has built-in RAG evaluation metrics (faithfulness and relevancy) that you do not get out of the box with LangChain:
```python
from llama_index.core.evaluation import FaithfulnessEvaluator, RelevancyEvaluator

faithfulness_eval = FaithfulnessEvaluator()
relevancy_eval = RelevancyEvaluator()

response = query_engine.query("What are the contract termination terms?")

# Check if the answer is grounded in the retrieved context (run in an async context)
faith_result = await faithfulness_eval.aevaluate_response(response=response)
rel_result = await relevancy_eval.aevaluate_response(
    query="What are the contract termination terms?",
    response=response,
)

print(f"Faithful: {faith_result.passing}, Score: {faith_result.score}")
print(f"Relevant: {rel_result.passing}, Score: {rel_result.score}")
```
If your team already runs LangSmith and likes the integrated experience, it is a genuine advantage for LangChain. If you want best-of-breed observability tools and do not want vendor lock-in, LlamaIndex's integration approach gives more flexibility.
Using Both Together
Many production stacks use both frameworks. LlamaIndex handles data ingestion and retrieval. LangChain/LangGraph handles orchestration, memory, and multi-step workflows.
```python
# Hybrid stack: LlamaIndex retriever inside a LangGraph agent
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Build LlamaIndex retriever
documents = SimpleDirectoryReader("./contracts").load_data()
index = VectorStoreIndex.from_documents(documents)
llamaindex_query_engine = index.as_query_engine(similarity_top_k=5)

# Wrap as LangChain tool
@tool
def search_contracts(query: str) -> str:
    """Search through company contracts and legal documents."""
    response = llamaindex_query_engine.query(query)
    return str(response)

@tool
def search_financials(query: str) -> str:
    """Search through financial reports and earnings data."""
    # Separate index for financial data, built the same way as the contracts index
    response = financial_query_engine.query(query)
    return str(response)

# LangGraph agent uses LlamaIndex tools
llm = ChatOpenAI(model="gpt-4o")
agent = create_react_agent(
    llm,
    tools=[search_contracts, search_financials],
)

result = agent.invoke({
    "messages": [("user", "Does our Q3 revenue meet the minimum in our investor agreement?")]
})
print(result["messages"][-1].content)
```
This pattern works well for teams that need both strong retrieval (LlamaIndex) and complex multi-step reasoning with persistence (LangGraph). The tradeoff is two frameworks to maintain and a larger dependency surface.
Self-Hosted Models with Both Frameworks
Both frameworks work with self-hosted models via Ollama, vLLM, or any OpenAI-compatible endpoint. This matters for teams running private AI infrastructure where data cannot leave the network.
```python
# LangChain with self-hosted vLLM
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="meta-llama/Llama-3.1-8B-Instruct",
    openai_api_base="http://localhost:8000/v1",
    openai_api_key="none",  # vLLM does not require a key
)
```

```python
# LlamaIndex with self-hosted vLLM
from llama_index.core import Settings
from llama_index.llms.openai_like import OpenAILike

llm = OpenAILike(
    model="meta-llama/Llama-3.1-8B-Instruct",
    api_base="http://localhost:8000/v1",
    api_key="none",
    is_chat_model=True,
)
Settings.llm = llm
```
Both integrations are straightforward. For enterprise teams running fine-tuned models, the same pattern applies: point the api_base at your inference endpoint and the framework handles the rest.
The one thing worth noting: when you are running your own fine-tuned model, LlamaIndex's retrieval quality benefits are still there regardless of what model you use for generation. Fine-tuning a model improves generation quality. Good retrieval engineering improves what context the model sees. Both matter, and they are independent.
Production Readiness: Honest Assessment
LangChain / LangGraph
Strengths in production:
- LangGraph's explicit state management catches bugs at compile time rather than runtime
- LangSmith makes root-cause analysis fast when something goes wrong
- 800+ companies in production including Uber and LinkedIn
- Broad ecosystem means most integrations already exist
Real weaknesses:
- LangGraph has had significant breaking changes between versions. A developer on Reddit: "Instead of spending hours going through the rabbit holes in these frameworks, I found out an ugly hard coded way is faster to implement. Consider the break changes in LangChain through 0.1, 0.2, 0.3." Pin your versions and lock your CI.
- Documentation is split across LangChain, LangGraph, and LangSmith. Navigating it for something that touches all three takes longer than it should.
- Complex chains can result in many LLM calls, which drives up costs quickly. You need to monitor this actively.
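A back-of-the-envelope calculation makes the cost risk concrete. The per-million-token prices below are illustrative placeholders, not current rates:

```python
def chain_cost(calls, in_tokens, out_tokens, price_in_per_m=0.15, price_out_per_m=0.60):
    """Rough USD cost per request for a chain making `calls` LLM calls."""
    per_call = in_tokens * price_in_per_m + out_tokens * price_out_per_m
    return calls * per_call / 1_000_000

# A single RAG call vs. a 6-step agent loop (~2K input / 300 output tokens per call)
print(round(chain_cost(1, 2000, 300), 5))  # → 0.00048
print(round(chain_cost(6, 2000, 300), 5))  # → 0.00288
```

Six calls is not exotic for an agent loop with retrieval, routing, and a synthesis pass, so per-request cost scales linearly with chain depth. Trace it, do not guess.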
LlamaIndex
Strengths in production:
- More stable versioning. Fewer breaking changes between releases.
- RAG pipelines take less code to get right
- Built-in faithfulness and relevancy evaluation
- Async-first design makes concurrent retrieval from multiple sources clean
Real weaknesses:
- No equivalent to LangSmith. You are assembling observability from third-party tools.
- Multi-agent state management is less mature than LangGraph for complex cases
- Smaller ecosystem means some integrations require custom work
- LlamaCloud (managed platform) is still maturing
Decision Framework: Which One to Use
Work through these questions in order:
Is retrieval your core problem? You have large document collections, need accurate search across them, and the main value you are delivering is answering questions from that data. Contract Q&A, enterprise search, technical documentation, legal research.
Use LlamaIndex. You will get to working RAG faster with better out-of-the-box retrieval quality.
Do you need complex stateful agents with persistence and human-in-the-loop? Your system has multiple agents that need to hand off to each other, maintain state across sessions, pause for human approval, and recover from failures without losing progress.
Use LangGraph. The state persistence and checkpointing model is more mature than LlamaIndex Workflows for this pattern.
Are you already deep in the LangChain ecosystem? You use LangSmith, your team knows LCEL, and migration costs are real.
Stay with LangGraph. The consistency is worth more than switching costs.
Do you need both strong retrieval and complex orchestration? Build a hybrid stack. LlamaIndex handles ingestion, indexing, and retrieval. LangGraph handles orchestration, memory, and multi-step agent logic. Use LlamaIndex query engines as tools inside a LangGraph agent.
Is your team small and moving fast? LlamaIndex for RAG gets to a working product in less time. The smaller API surface and purpose-built retrieval abstractions reduce the number of decisions you need to make.
Is version stability important? (It should be for any production system) LlamaIndex has a better track record here. If you use LangGraph, pin your versions, lock them in CI, and read the changelog before every upgrade.
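In practice, pinning means installing the whole stack from a constraints file so it moves as a unit. A sketch (version strings are placeholders; substitute the exact patch versions you tested):

```
# constraints.txt -- pin the LangChain stack together
langchain==<tested-version>
langchain-core==<tested-version>
langchain-community==<tested-version>
langgraph==<tested-version>
```

Installing with `pip install -c constraints.txt ...` in CI keeps an unpinned transitive dependency from pulling in a breaking release between builds.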
Quick Comparison: Same Use Case, Both Frameworks
Here is a basic RAG chatbot in both frameworks, using a self-hosted model. This makes the code volume difference concrete.
```python
# LlamaIndex: RAG chatbot with memory
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.chat_engine import CondensePlusContextChatEngine
from llama_index.core.memory import ChatMemoryBuffer

documents = SimpleDirectoryReader("./docs").load_data()
index = VectorStoreIndex.from_documents(documents)

memory = ChatMemoryBuffer.from_defaults(token_limit=3000)
chat_engine = CondensePlusContextChatEngine.from_defaults(
    index.as_retriever(similarity_top_k=4),
    memory=memory,
    verbose=True,
)

# Multi-turn conversation
response1 = chat_engine.chat("What are our refund terms?")
print(response1.response)

response2 = chat_engine.chat("How does that compare to our competitor's policy?")
print(response2.response)  # Maintains context from previous turn
```
```python
# LangChain: RAG chatbot with memory
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer based on this context:\n\n{context}"),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{question}"),
])

def get_context(question):
    docs = retriever.invoke(question)
    return "\n\n".join(d.page_content for d in docs)

chain = (
    {
        "context": lambda x: get_context(x["question"]),
        "question": lambda x: x["question"],
        "history": lambda x: x.get("history", []),
    }
    | prompt
    | llm
    | StrOutputParser()
)

store = {}

def get_session_history(session_id):
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]

chain_with_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="question",
    history_messages_key="history",
)

response1 = chain_with_history.invoke(
    {"question": "What are our refund terms?"},
    config={"configurable": {"session_id": "user_123"}},
)
```
The LlamaIndex version is shorter for this specific task. The LangChain version is more explicit about each piece, which makes it easier to swap components.
FAQ
Is LangChain dead / declining?
No. The LangChain ecosystem saw 220% growth in GitHub stars and 300% increase in downloads between Q1 2024 and Q1 2025. LangGraph reached 1.0 stability in October 2025. The narrative that LangChain is being abandoned is not supported by the data. What is true is that the original chains-and-agents approach has been superseded by LangGraph.
Can LlamaIndex build agents?
Yes. LlamaIndex Workflows and the FunctionCallingAgentWorker both support multi-step agents. The agent model is data-centric: agents treat query engines as tools. For most RAG-focused agent use cases this is sufficient. For complex stateful multi-agent systems, LangGraph is more mature.
Which has better RAG out of the box?
LlamaIndex. Its retrieval-specific abstractions (hierarchical chunking, auto-merging, sub-question decomposition, built-in re-ranking) require less custom code to configure correctly. LangChain can match on retrieval quality but requires assembling more components.
How do they work with fine-tuned models?
Both integrate cleanly with any OpenAI-compatible endpoint. Point api_base at your inference server. The framework's retrieval and orchestration logic is model-agnostic. Fine-tuning your model improves generation quality; the RAG framework improves what context the model sees. Both improvements are independent and additive.
Which is easier to learn?
LlamaIndex for RAG-focused work. LangChain/LangGraph for orchestration-focused work. LlamaIndex's core concepts (documents, nodes, indices, query engines) are more intuitive for developers whose primary problem is retrieval. LangGraph's graph model requires understanding directed graphs and state machines, which has a steeper initial learning curve.
Should I use both?
Many production teams do. LlamaIndex for retrieval, LangGraph for orchestration. The integration is clean: wrap a LlamaIndex query engine as a LangChain tool. The tradeoff is a larger dependency surface and two frameworks to keep up to date.
What about evaluation?
LlamaIndex has built-in faithfulness and relevancy evaluators. LangSmith (LangChain) has a more comprehensive testing and evaluation suite but requires the LangSmith platform. For RAG-specific evaluation, LlamaIndex's approach is simpler to start with. For broader production monitoring and A/B testing of prompts and models, LangSmith is more complete.