LLM Function Calling: Complete Implementation Guide (2026)
Function calling turns LLMs into action-takers. JSON schemas, parallel execution, streaming, error handling. Working code for OpenAI, Anthropic, and open-source models.
Function calling transforms LLMs from text generators into action-takers. Instead of describing what could be done, the model specifies exactly which function to call and with what arguments. Your code executes the function, returns the result, and the model incorporates it into its response.
This capability powers every serious AI agent. Code interpreters, web search, database queries, API integrations, file operations. Without function calling, an LLM can only talk about doing things. With it, the model becomes the reasoning layer in a system that actually does things.
This guide covers implementation from first principles. You'll learn how function calling works at the protocol level, how to define tool schemas that models understand, and how to build reliable execution loops. We'll cover OpenAI, Anthropic, and open-source implementations, then move into advanced patterns: parallel execution, streaming with tools, error handling, and multi-step orchestration.
How Function Calling Works
The model doesn't execute functions. It generates structured output describing which function to call and what arguments to pass. Your application parses this output, executes the function, and feeds the result back to the model.
The flow:
- You send a prompt plus tool definitions (JSON schemas describing available functions)
- The model decides whether to respond directly or request a tool call
- If it requests a tool call, you execute the function locally
- You send the result back to the model
- The model generates a final response incorporating the tool result
This loop can repeat. Complex tasks might involve 5-10 tool calls before the model has enough information to answer.
```python
# Simplified flow
response = model.generate(prompt, tools=tool_definitions)

while response.wants_tool_call:
    tool_name = response.tool_call.name
    tool_args = response.tool_call.arguments
    result = execute_tool(tool_name, tool_args)
    response = model.generate(
        messages=[*previous_messages, tool_result(result)],
        tools=tool_definitions
    )

return response.text
```
The model outputs structured data, typically JSON, conforming to a schema you defined. Modern APIs can enforce schema compliance at the generation level (constrained decoding), so with strict mode enabled the output is guaranteed to be valid JSON matching your schema.
Tool Definition: JSON Schema Fundamentals
Every function needs a schema describing its name, purpose, and parameters. The model reads this schema to understand when and how to use the tool.
Basic Structure
```json
{
  "type": "function",
  "function": {
    "name": "get_weather",
    "description": "Get current weather for a location",
    "parameters": {
      "type": "object",
      "properties": {
        "location": {
          "type": "string",
          "description": "City and country, e.g. 'London, UK'"
        },
        "unit": {
          "type": "string",
          "enum": ["celsius", "fahrenheit"],
          "description": "Temperature unit"
        }
      },
      "required": ["location"],
      "additionalProperties": false
    },
    "strict": true
  }
}
```
Key elements:
- name: Function identifier. Use snake_case. Be specific: `search_customer_orders` beats `search`.
- description: When and why to use this tool. The model relies heavily on this text.
- parameters: JSON Schema defining accepted inputs. Include descriptions for each property.
- required: Which parameters must be provided.
- strict: Enforces exact schema compliance. Always enable this.
Writing Effective Descriptions
Descriptions matter more than you'd expect. Research from Gorilla and ToolAlpaca found that precise descriptions improve parameter accuracy by 30%+.
Bad description:
"description": "Search function"
Good description:
"description": "Search for products in the catalog. Use when the user asks to find, look up, or browse products. Returns product IDs, names, prices, and availability. Supports filtering by category, price range, and brand."
Include:
- When to use the tool (trigger conditions)
- What it returns (output format)
- Constraints or limitations
- Examples of valid inputs
Parameter Constraints
Use JSON Schema features to constrain inputs:
```json
{
  "type": "object",
  "properties": {
    "quantity": {
      "type": "integer",
      "minimum": 1,
      "maximum": 100,
      "description": "Number of items (1-100)"
    },
    "status": {
      "type": "string",
      "enum": ["pending", "shipped", "delivered"],
      "description": "Filter by order status"
    },
    "date_range": {
      "type": "object",
      "properties": {
        "start": { "type": "string", "format": "date" },
        "end": { "type": "string", "format": "date" }
      },
      "required": ["start", "end"]
    }
  }
}
```
Enums, min/max values, and nested objects all help the model generate correct parameters.
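Even so, it pays to double-check model-generated arguments before executing anything. As a minimal sketch (a hypothetical `check_args` helper covering only required fields, enums, and integer bounds; a real system would use a full JSON Schema validator), client-side validation might look like:

```python
def check_args(schema: dict, args: dict) -> list[str]:
    """Return a list of problems with model-generated arguments.

    Minimal sketch: checks required fields, enum membership, and
    integer minimum/maximum. Not a full JSON Schema validator.
    """
    problems = []
    props = schema.get("properties", {})
    for key in schema.get("required", []):
        if key not in args:
            problems.append(f"missing required field: {key}")
    for key, value in args.items():
        spec = props.get(key)
        if spec is None:
            problems.append(f"unexpected field: {key}")
            continue
        if "enum" in spec and value not in spec["enum"]:
            problems.append(f"{key}: {value!r} not in {spec['enum']}")
        if spec.get("type") == "integer":
            if "minimum" in spec and value < spec["minimum"]:
                problems.append(f"{key}: {value} below minimum {spec['minimum']}")
            if "maximum" in spec and value > spec["maximum"]:
                problems.append(f"{key}: {value} above maximum {spec['maximum']}")
    return problems
```

An empty list means the arguments passed; anything else can be returned to the model as an error result so it can correct itself.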
OpenAI Implementation
OpenAI's function calling uses the tools parameter in chat completions.
Basic Example
```python
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "get_stock_price",
        "description": "Get the current stock price for a ticker symbol",
        "parameters": {
            "type": "object",
            "properties": {
                "symbol": {
                    "type": "string",
                    "description": "Stock ticker symbol, e.g. AAPL, GOOGL"
                }
            },
            "required": ["symbol"],
            "additionalProperties": False
        },
        "strict": True
    }
}]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's Apple's stock price?"}],
    tools=tools
)

# Check if the model wants to call a tool
if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    print(f"Function: {tool_call.function.name}")
    print(f"Arguments: {tool_call.function.arguments}")
```
Complete Execution Loop
```python
import json

def get_stock_price(symbol: str) -> dict:
    # Your actual implementation
    return {"symbol": symbol, "price": 178.50, "currency": "USD"}

def execute_tool(name: str, args: dict) -> str:
    if name == "get_stock_price":
        result = get_stock_price(**args)
        return json.dumps(result)
    raise ValueError(f"Unknown tool: {name}")

def run_conversation(user_message: str):
    messages = [{"role": "user", "content": user_message}]

    while True:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=tools
        )
        assistant_message = response.choices[0].message
        messages.append(assistant_message)

        # No tool calls = final response
        if not assistant_message.tool_calls:
            return assistant_message.content

        # Execute each tool call
        for tool_call in assistant_message.tool_calls:
            args = json.loads(tool_call.function.arguments)
            result = execute_tool(tool_call.function.name, args)
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": result
            })
```
Controlling Tool Choice
```python
# Let the model decide (default)
tool_choice="auto"

# Force a specific tool
tool_choice={"type": "function", "function": {"name": "get_stock_price"}}

# Prevent tool use entirely
tool_choice="none"

# Require at least one tool call
tool_choice="required"
```
Structured Outputs with Strict Mode
Setting strict: True guarantees the model's output matches your schema exactly. No missing required fields, no invalid enum values, no extra properties.
```python
tools = [{
    "type": "function",
    "function": {
        "name": "create_order",
        "strict": True,  # Enforces schema compliance
        "parameters": {
            "type": "object",
            "properties": {
                "product_id": {"type": "string"},
                "quantity": {"type": "integer", "minimum": 1}
            },
            "required": ["product_id", "quantity"],
            "additionalProperties": False  # Required for strict mode
        }
    }
}]
```
Strict mode requires additionalProperties: false on all objects in your schema.
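Setting that flag on every nested object by hand is error-prone, so it can help to do it programmatically. A minimal sketch (the `make_strict` helper name is ours, not an SDK function) that walks a schema and marks every object:

```python
import copy

def make_strict(schema: dict) -> dict:
    """Recursively set additionalProperties: false on every object schema.

    Returns a deep copy so the original schema is left untouched, and
    preserves any value already set explicitly.
    """
    schema = copy.deepcopy(schema)

    def walk(node):
        if isinstance(node, dict):
            if node.get("type") == "object":
                node.setdefault("additionalProperties", False)
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(schema)
    return schema
```

Run your `parameters` schemas through a helper like this before registering tools, and strict mode stops rejecting schemas over a forgotten nested object.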
Anthropic Implementation
Claude uses a similar pattern with slightly different structure.
Basic Example
```python
import anthropic

client = anthropic.Anthropic()

tools = [{
    "name": "get_weather",
    "description": "Get current weather for a location",
    "input_schema": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "City and country, e.g. 'Paris, France'"
            }
        },
        "required": ["location"]
    }
}]

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}]
)
```
Handling Tool Use Responses
Claude's response contains content blocks of different types:
```python
for block in response.content:
    if block.type == "text":
        print(f"Text: {block.text}")
    elif block.type == "tool_use":
        print(f"Tool: {block.name}")
        print(f"Input: {block.input}")
        print(f"ID: {block.id}")
```
Complete Loop
```python
def run_claude_conversation(user_message: str):
    messages = [{"role": "user", "content": user_message}]

    while True:
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=4096,
            tools=tools,
            messages=messages
        )

        # Process tool calls
        if response.stop_reason == "tool_use":
            # Add assistant's response to messages
            messages.append({
                "role": "assistant",
                "content": response.content
            })

            # Execute tools and collect results
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    result = execute_tool(block.name, block.input)
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": result
                    })

            # Add tool results
            messages.append({
                "role": "user",
                "content": tool_results
            })
            continue

        # Any other stop reason (end_turn, max_tokens): extract final text
        # so an unexpected stop reason can't leave the loop spinning forever
        return "".join(
            block.text for block in response.content if block.type == "text"
        )
```
Key Differences from OpenAI
| Aspect | OpenAI | Anthropic |
|---|---|---|
| Schema key | `parameters` | `input_schema` |
| Tool call indicator | `tool_calls` array | `tool_use` content blocks |
| Stop reason | `finish_reason: "tool_calls"` | `stop_reason: "tool_use"` |
| Tool result role | `role: "tool"` | `role: "user"` with `tool_result` type |
Both APIs are converging toward similar patterns, but you'll need provider-specific handling.
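One way to contain that handling is a small provider-neutral representation of a tool call. A sketch under the assumption that the attribute shapes match the official Python SDKs (the `ToolCall` dataclass and converter names are ours):

```python
import json
from dataclasses import dataclass

@dataclass
class ToolCall:
    """Provider-neutral view of one requested tool call."""
    id: str
    name: str
    arguments: dict

def from_openai(message) -> list[ToolCall]:
    """Convert an OpenAI assistant message; arguments arrive as a JSON string."""
    return [
        ToolCall(tc.id, tc.function.name, json.loads(tc.function.arguments))
        for tc in (message.tool_calls or [])
    ]

def from_anthropic(response) -> list[ToolCall]:
    """Convert an Anthropic response; tool_use blocks carry already-parsed input."""
    return [
        ToolCall(block.id, block.name, block.input)
        for block in response.content
        if block.type == "tool_use"
    ]
```

Downstream execution code then only deals with `ToolCall` objects; the provider differences stay in the two converters.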
Open-Source Models
Open-source models vary significantly in function calling capability. Some are trained specifically for tool use; others need prompting tricks.
Models with Native Support
Strong function calling:
- Llama 3.1/3.3 70B (native tool use format)
- Qwen 2.5 Instruct (Hermes-style format)
- Mistral Large (native function calling)
- Hermes 2 Pro (purpose-built for tools)
- Functionary (specialized for function calling)
Usable with prompting:
- DeepSeek V3 (JSON output works, native support evolving)
- Gemma 3 (Python-style function definitions work better than JSON)
vLLM Implementation
vLLM provides OpenAI-compatible tool calling for supported models:
```python
from openai import OpenAI

# Point to a local vLLM server
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-needed"
)

tools = [{
    "type": "function",
    "function": {
        "name": "search_database",
        "description": "Search the product database",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string"}
            },
            "required": ["query"]
        }
    }
}]

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-70B-Instruct",
    messages=[{"role": "user", "content": "Find laptops under $1000"}],
    tools=tools,
    tool_choice="auto"
)
```
vLLM handles the chat template and tool format automatically for supported models.
Ollama
Ollama supports function calling for compatible models:
```python
import ollama

response = ollama.chat(
    model="llama3.1",
    messages=[{"role": "user", "content": "What's 25 * 47?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "calculate",
            "description": "Perform arithmetic calculations",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {"type": "string"}
                },
                "required": ["expression"]
            }
        }
    }]
)

if response.message.tool_calls:
    for tool in response.message.tool_calls:
        print(tool.function.name, tool.function.arguments)
```
Note: Streaming with tool calls has known issues in Ollama. Use stream=False for reliability.
Model-Specific Formats
Different models expect different prompt formats for tools.

Hermes-style (used by Qwen, Hermes 2):

```
<|im_start|>system
You are a helpful assistant with access to the following functions:
{"name": "get_weather", "parameters": {"type": "object", "properties": {"location": {"type": "string"}}}}
<|im_end|>
```
Llama 3.1 native format is handled automatically by vLLM and Ollama when using the tools parameter.
Parallel Function Calling
Models can request multiple function calls simultaneously when the calls are independent.
```python
# User: "What's the weather in London and Paris?"
# Model response contains two tool calls:
#   1. get_weather(location="London, UK")
#   2. get_weather(location="Paris, France")
```
Handling Parallel Calls
```python
import asyncio
import json

async def execute_tools_parallel(tool_calls):
    tasks = []
    for call in tool_calls:
        args = json.loads(call.function.arguments)
        task = asyncio.create_task(
            async_execute_tool(call.function.name, args)
        )
        tasks.append((call.id, task))

    results = []
    for call_id, task in tasks:
        result = await task
        results.append({
            "role": "tool",
            "tool_call_id": call_id,
            "content": json.dumps(result)
        })
    return results
```
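The loop above assumes an `async_execute_tool` coroutine. If your tool implementations are ordinary synchronous functions, `asyncio.to_thread` gives a minimal adapter that still lets independent calls overlap (the `SYNC_TOOLS` registry and its weather stub are illustrative stand-ins):

```python
import asyncio

# Hypothetical registry of synchronous tool implementations, keyed by name
SYNC_TOOLS = {
    "get_weather": lambda location: {"location": location, "temp_c": 18},
}

async def async_execute_tool(name: str, args: dict) -> dict:
    """Run a blocking tool function in a worker thread so calls can overlap."""
    func = SYNC_TOOLS.get(name)
    if func is None:
        return {"error": f"Unknown tool: {name}"}
    return await asyncio.to_thread(func, **args)
```

Because each call runs in its own worker thread, `asyncio.gather` over several of these coroutines finishes in roughly the time of the slowest call rather than the sum.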
Performance Impact
Parallel calls reduce latency dramatically. If each external API call takes 200ms:
| Calls | Sequential | Parallel |
|---|---|---|
| 2 | 400ms | 200ms |
| 5 | 1000ms | 200ms |
| 10 | 2000ms | 200ms |
The parallel approach also reduces the number of model inference passes, cutting token costs.
Disabling Parallel Calls
Some scenarios require sequential execution (when call B depends on call A's result):
```python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools,
    parallel_tool_calls=False  # Force one tool at a time
)
```
Streaming with Tools
Streaming tool calls lets you show progress as the model generates function arguments.
OpenAI Streaming
```python
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Search for recent AI papers"}],
    tools=tools,
    stream=True
)

tool_calls = {}

for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.tool_calls:
        for tc in delta.tool_calls:
            idx = tc.index
            if idx not in tool_calls:
                tool_calls[idx] = {
                    "id": tc.id,
                    "name": tc.function.name,
                    "arguments": ""
                }
            if tc.function.arguments:
                tool_calls[idx]["arguments"] += tc.function.arguments
                # Show progress
                print(f"Building args: {tool_calls[idx]['arguments']}")
```
When Streaming Helps
Streaming function arguments is useful when:
- Arguments are large (complex search queries, code generation)
- You want to show the user what the model is doing
- You need to start preparing resources before the call completes
For simple function calls with small arguments, non-streaming is often simpler.
Error Handling
Tool execution fails. Networks timeout. APIs return errors. Your system needs to handle these gracefully.
Basic Error Pattern
```python
from pydantic import ValidationError

def execute_tool_safely(name: str, args: dict) -> dict:
    try:
        if name == "search_database":
            return search_database(**args)
        elif name == "get_weather":
            return get_weather(**args)
        else:
            return {"error": f"Unknown tool: {name}"}
    except TimeoutError:
        return {"error": "Request timed out. Try again."}
    except ValidationError as e:
        return {"error": f"Invalid parameters: {e}"}
    except Exception as e:
        return {"error": f"Tool execution failed: {str(e)}"}
```
Returning Errors to the Model
The model can often recover from errors if you explain what went wrong:
# Instead of crashing, return error as tool result
tool_result = {
"role": "tool",
"tool_call_id": call_id,
"content": json.dumps({
"error": "Database connection failed",
"suggestion": "Try a different search term or retry"
})
}
The model receives this error, understands the tool failed, and can either retry with different parameters or explain the issue to the user.
Retry Logic
```python
async def execute_with_retry(name: str, args: dict, max_retries: int = 3):
    for attempt in range(max_retries):
        try:
            return await async_execute_tool(name, args)
        except (TimeoutError, ConnectionError) as e:
            if attempt == max_retries - 1:
                return {"error": f"Failed after {max_retries} attempts: {e}"}
            await asyncio.sleep(2 ** attempt)  # Exponential backoff
```
Schema Validation Failures
Even with strict mode, you should validate tool outputs:
```python
from pydantic import BaseModel, ValidationError

class WeatherResult(BaseModel):
    temperature: float
    conditions: str
    humidity: int

def get_weather(location: str) -> dict:
    raw_result = call_weather_api(location)
    try:
        validated = WeatherResult(**raw_result)
        return validated.model_dump()
    except ValidationError as e:
        return {"error": f"API returned invalid data: {e}"}
```
Multi-Step Tool Chains
Complex tasks require multiple dependent tool calls. The model calls tool A, uses the result to decide how to call tool B, and so on.
Example: Research Task
User: "Find the top 3 AI papers from last month and summarize their key findings."

```
Step 1: search_papers(query="AI", date_range="last_month", limit=10)
        → Returns list of papers with IDs
Step 2: get_paper_details(paper_id="arxiv:2401.12345")
        → Returns full abstract, authors, citations
Step 3: get_paper_details(paper_id="arxiv:2401.67890")
        → Returns second paper's details
Step 4: get_paper_details(paper_id="arxiv:2401.11111")
        → Returns third paper's details
Final:  Model synthesizes findings into a summary
```
Context Management
Each tool call and result consumes context. A chain of 5-8 calls with verbose results can use 30-50% of the context window.
Mitigation strategies:
- Summarize verbose results: Extract only fields needed for next steps
- Clear completed steps: In multi-turn, keep summaries not full results
- Split long chains: Handle in phases, each with 2-3 calls
```python
def compress_tool_result(result: dict, keep_fields: list) -> dict:
    """Extract only necessary fields from verbose API responses"""
    return {k: result[k] for k in keep_fields if k in result}
```
Agentic Patterns
For complex agent workflows, consider frameworks that handle orchestration:
- ReAct pattern: Reasoning → Action → Observation loop
- LangGraph: Graph-based agent orchestration
- CrewAI: Multi-agent collaboration
These abstract the tool loop and add planning, memory, and coordination between multiple agents.
Best Practices
Tool Design
- Single responsibility: Each tool does one thing well
- Clear boundaries: Obvious when to use tool A vs tool B
- Descriptive names: `search_customer_orders`, not `search`
- Constrained parameters: Use enums, min/max, required fields
- Useful descriptions: Include examples, edge cases, return format
Schema Organization
For systems with many tools, use namespace prefixes. Note that OpenAI restricts tool names to letters, digits, underscores, and dashes, so prefer underscore-separated prefixes like `crm_search_contacts` over dotted names:

```python
tools = [
    {"name": "crm_search_contacts", ...},
    {"name": "crm_update_contact", ...},
    {"name": "billing_get_invoice", ...},
    {"name": "billing_create_charge", ...}
]
```
Namespaces help the model distinguish between similar tools in different domains.
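Namespaces also give you a natural dispatch and permission boundary on the execution side. A minimal sketch, assuming underscore-separated prefixes (some providers restrict tool names to `[a-zA-Z0-9_-]`) and with hypothetical stand-in handlers:

```python
# Hypothetical handler registry keyed by namespaced tool name
HANDLERS = {
    "crm_search_contacts": lambda **kw: {"contacts": [], "query": kw},
    "billing_get_invoice": lambda **kw: {"invoice_id": kw.get("invoice_id")},
}

# Namespaces this agent is allowed to touch
ALLOWED_PREFIXES = {"crm", "billing"}

def dispatch(tool_name: str, args: dict) -> dict:
    """Route a namespaced tool call; the prefix is everything before the first underscore."""
    prefix, _, _ = tool_name.partition("_")
    if prefix not in ALLOWED_PREFIXES:
        return {"error": f"Namespace not permitted: {prefix}"}
    handler = HANDLERS.get(tool_name)
    if handler is None:
        return {"error": f"Unknown tool: {tool_name}"}
    return handler(**args)
```

Checking the prefix before the lookup means one allowlist can gate a whole domain of tools, which is easier to audit than per-tool permissions.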
Limiting Tool Count
Models degrade with too many tools. Guidelines:
| Tools | Model Performance |
|---|---|
| 1-10 | Excellent |
| 10-30 | Good with clear descriptions |
| 30-50 | Degraded, consider tool search |
| 50+ | Use dynamic tool loading |
For large tool sets, implement tool search: the model first calls a search tool to find relevant tools, then those tools are loaded for use.
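The meta-tool itself can be sketched as follows: expose a single `search_tools` function to the model, then include only the returned tools' schemas in the next request. The catalog, tool names, and naive word-overlap scoring here are illustrative; production systems typically rank tool descriptions with embeddings instead:

```python
# Hypothetical catalog mapping every tool name to its description and schema
TOOL_CATALOG = {
    "search_customer_orders": {
        "description": "Search orders by customer, date, or status",
        "schema": {"type": "object", "properties": {"customer_id": {"type": "string"}}},
    },
    "create_refund": {
        "description": "Issue a refund for a delivered order",
        "schema": {"type": "object", "properties": {"order_id": {"type": "string"}}},
    },
}

def search_tools(query: str, limit: int = 5) -> list[str]:
    """Return names of tools whose descriptions share keywords with the query."""
    words = set(query.lower().split())
    scored = []
    for name, info in TOOL_CATALOG.items():
        overlap = len(words & set(info["description"].lower().split()))
        if overlap:
            scored.append((overlap, name))
    scored.sort(reverse=True)  # highest keyword overlap first
    return [name for _, name in scored[:limit]]
```

The model calls `search_tools` once, your loop loads the matching schemas into the `tools` parameter, and the full catalog never has to fit in context.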
Security
Never let the model construct arbitrary code or database queries. Treat tool arguments as untrusted input:
```python
# BAD: SQL injection risk
def search_users(query: str):
    return db.execute(f"SELECT * FROM users WHERE name LIKE '%{query}%'")

# GOOD: Parameterized query
def search_users(query: str):
    return db.execute(
        "SELECT * FROM users WHERE name LIKE ?",
        [f"%{query}%"]
    )
```
Validate all inputs. Limit tool permissions. Log tool calls for audit.
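The audit-logging piece can be as small as a decorator. A sketch using the standard `logging` module (the logger name, log fields, and the weather stub are our choices, not a fixed convention):

```python
import functools
import json
import logging
import time

logger = logging.getLogger("tool_audit")

def audited(func):
    """Wrap a tool function so every call and outcome lands in the audit log."""
    @functools.wraps(func)
    def wrapper(**kwargs):
        start = time.monotonic()
        try:
            result = func(**kwargs)
            logger.info(
                "tool=%s args=%s status=ok duration_ms=%.0f",
                func.__name__, json.dumps(kwargs),
                (time.monotonic() - start) * 1000,
            )
            return result
        except Exception as exc:
            logger.warning(
                "tool=%s args=%s error=%s",
                func.__name__, json.dumps(kwargs), exc,
            )
            raise
    return wrapper

@audited
def get_weather(location: str) -> dict:
    # Stand-in implementation for the sketch
    return {"location": location, "temp_c": 18}
```

Decorate every tool once and you get a uniform trail of who-called-what-with-which-arguments, which is usually the first thing you need when a model misbehaves.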
Provider Comparison
| Feature | OpenAI | Anthropic | vLLM (open-source) |
|---|---|---|---|
| Strict schema enforcement | Yes | Yes | Model-dependent |
| Parallel tool calls | Yes | Yes (Sonnet 4+) | Yes |
| Streaming tool calls | Yes | Yes | Yes |
| Tool choice control | auto/none/required/specific | auto/any/specific | auto/none/specific |
| Built-in tools | web_search, code_interpreter | web_search, code_execution | None |
For teams building production systems, PremAI provides a unified API across providers with consistent tool calling behavior, plus fine-tuning capabilities to improve tool use accuracy on your specific functions.
Quick Reference
OpenAI Tool Schema
```json
{
  "type": "function",
  "function": {
    "name": "tool_name",
    "description": "What the tool does",
    "parameters": {
      "type": "object",
      "properties": {...},
      "required": [...],
      "additionalProperties": false
    },
    "strict": true
  }
}
```
Anthropic Tool Schema
```json
{
  "name": "tool_name",
  "description": "What the tool does",
  "input_schema": {
    "type": "object",
    "properties": {...},
    "required": [...]
  }
}
```
Execution Loop Checklist
- [ ] Define tools with clear descriptions
- [ ] Enable strict mode where available
- [ ] Handle tool_calls in response
- [ ] Execute tools with error handling
- [ ] Return results in provider-specific format
- [ ] Loop until stop_reason indicates completion
- [ ] Implement timeout for runaway loops
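The last checklist item deserves a concrete shape. A minimal sketch of a capped execution loop, assuming OpenAI-style message objects and passing the executor in explicitly (the `run_with_guard` name and the cap of 10 are our choices):

```python
import json

def run_with_guard(client, messages: list, tools: list,
                   execute_tool, max_iters: int = 10) -> str:
    """Tool loop with a hard iteration cap so a runaway cycle can't spin forever."""
    for _ in range(max_iters):
        response = client.chat.completions.create(
            model="gpt-4o", messages=messages, tools=tools
        )
        message = response.choices[0].message
        messages.append(message)

        if not message.tool_calls:
            return message.content  # final answer

        for call in message.tool_calls:
            args = json.loads(call.function.arguments)
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": execute_tool(call.function.name, args),
            })

    raise RuntimeError(f"Stopped after {max_iters} tool iterations")
```

A wall-clock timeout around the whole call is a useful second layer, but the iteration cap alone already stops the most common failure mode: the model re-requesting the same tool forever.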
Summary
Function calling extends LLMs from text generation into action. The core pattern is simple: define tools with JSON schemas, check if the model wants to call them, execute locally, return results.
Implementation complexity comes from:
- Provider differences in schema format and response structure
- Parallel execution and dependency management
- Streaming tool calls for responsive UIs
- Error handling and recovery
- Context management in multi-step chains
Start simple. One tool, one execution loop, no streaming. Get that working reliably. Add parallel calls when you have independent operations. Add streaming when responsiveness matters. Build multi-step chains when tasks require it.
For enterprise deployments needing consistent behavior across providers, Prem Studio handles the provider abstraction and lets you fine-tune models specifically for your tool definitions, improving accuracy on your exact function schemas.