LLM Function Calling: Complete Implementation Guide (2026)
Function calling turns LLMs into action-takers. JSON schemas, parallel execution, streaming, error handling. Working code for OpenAI, Anthropic, and open-source models.
Function calling transforms LLMs from text generators into action-takers. Instead of describing what could be done, the model specifies exactly which function to call and with what arguments. Your code executes the function, returns the result, and the model incorporates it into its response.
This capability powers every serious AI agent. Code interpreters, web search, database queries, API integrations, file operations. Without function calling, an LLM can only talk about doing things. With it, the model becomes the reasoning layer in a system that actually does things.
This guide covers implementation from first principles. You'll learn how function calling works at the protocol level, how to define tool schemas that models understand, and how to build reliable execution loops. We'll cover OpenAI, Anthropic, and open-source implementations, then move into advanced patterns: parallel execution, streaming with tools, error handling, and multi-step orchestration.
How Function Calling Works
The model doesn't execute functions. It generates structured output describing which function to call and what arguments to pass. Your application parses this output, executes the function, and feeds the result back to the model.
The flow:
- You send a prompt plus tool definitions (JSON schemas describing available functions)
- The model decides whether to respond directly or request a tool call
- If it requests a tool call, you execute the function locally
- You send the result back to the model
- The model generates a final response incorporating the tool result
This loop can repeat. Complex tasks might involve 5-10 tool calls before the model has enough information to answer.
```python
# Simplified flow
response = model.generate(prompt, tools=tool_definitions)

while response.wants_tool_call:
    tool_name = response.tool_call.name
    tool_args = response.tool_call.arguments
    result = execute_tool(tool_name, tool_args)
    response = model.generate(
        messages=[*previous_messages, tool_result(result)],
        tools=tool_definitions
    )

return response.text
```
The model outputs structured data, typically JSON, conforming to a schema you defined. Modern APIs can enforce schema compliance at the generation level (constrained decoding), so with strict mode enabled the output is guaranteed to be valid JSON matching your schema.
Tool Definition: JSON Schema Fundamentals
Every function needs a schema describing its name, purpose, and parameters. The model reads this schema to understand when and how to use the tool.
Basic Structure
```json
{
  "type": "function",
  "function": {
    "name": "get_weather",
    "description": "Get current weather for a location",
    "parameters": {
      "type": "object",
      "properties": {
        "location": {
          "type": "string",
          "description": "City and country, e.g. 'London, UK'"
        },
        "unit": {
          "type": "string",
          "enum": ["celsius", "fahrenheit"],
          "description": "Temperature unit"
        }
      },
      "required": ["location"],
      "additionalProperties": false
    },
    "strict": true
  }
}
```
Key elements:
- name: Function identifier. Use snake_case. Be specific: `search_customer_orders` beats `search`.
- description: When and why to use this tool. The model relies heavily on this text.
- parameters: JSON Schema defining accepted inputs. Include descriptions for each property.
- required: Which parameters must be provided.
- strict: Enforces exact schema compliance. Always enable this.
Writing Effective Descriptions
Descriptions matter more than you'd expect. Research from Gorilla and ToolAlpaca found that precise descriptions improve parameter accuracy by 30%+.
Bad description:
"description": "Search function"
Good description:
"description": "Search for products in the catalog. Use when the user asks to find, look up, or browse products. Returns product IDs, names, prices, and availability. Supports filtering by category, price range, and brand."
Include:
- When to use the tool (trigger conditions)
- What it returns (output format)
- Constraints or limitations
- Examples of valid inputs
Parameter Constraints
Use JSON Schema features to constrain inputs:
```json
{
  "type": "object",
  "properties": {
    "quantity": {
      "type": "integer",
      "minimum": 1,
      "maximum": 100,
      "description": "Number of items (1-100)"
    },
    "status": {
      "type": "string",
      "enum": ["pending", "shipped", "delivered"],
      "description": "Filter by order status"
    },
    "date_range": {
      "type": "object",
      "properties": {
        "start": { "type": "string", "format": "date" },
        "end": { "type": "string", "format": "date" }
      },
      "required": ["start", "end"]
    }
  }
}
```
Enums, min/max values, and nested objects all help the model generate correct parameters.
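Even so, it pays to double-check model-generated arguments before executing anything. As a minimal sketch (a hypothetical `check_args` helper covering only required fields, enums, and integer bounds; a real system would use a full JSON Schema validator), client-side validation might look like:

```python
def check_args(schema: dict, args: dict) -> list[str]:
    """Return a list of problems with model-generated arguments.

    Minimal sketch: checks required fields, enum membership, and
    integer minimum/maximum. Not a full JSON Schema validator.
    """
    problems = []
    props = schema.get("properties", {})
    for key in schema.get("required", []):
        if key not in args:
            problems.append(f"missing required field: {key}")
    for key, value in args.items():
        spec = props.get(key)
        if spec is None:
            problems.append(f"unexpected field: {key}")
            continue
        if "enum" in spec and value not in spec["enum"]:
            problems.append(f"{key}: {value!r} not in {spec['enum']}")
        if spec.get("type") == "integer":
            if "minimum" in spec and value < spec["minimum"]:
                problems.append(f"{key}: {value} below minimum {spec['minimum']}")
            if "maximum" in spec and value > spec["maximum"]:
                problems.append(f"{key}: {value} above maximum {spec['maximum']}")
    return problems
```

An empty list means the arguments passed; anything else can be returned to the model as an error result so it can correct itself.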
OpenAI Implementation
OpenAI's function calling uses the tools parameter in chat completions.
Basic Example
```python
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "get_stock_price",
        "description": "Get the current stock price for a ticker symbol",
        "parameters": {
            "type": "object",
            "properties": {
                "symbol": {
                    "type": "string",
                    "description": "Stock ticker symbol, e.g. AAPL, GOOGL"
                }
            },
            "required": ["symbol"],
            "additionalProperties": False
        },
        "strict": True
    }
}]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's Apple's stock price?"}],
    tools=tools
)

# Check if the model wants to call a tool
if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    print(f"Function: {tool_call.function.name}")
    print(f"Arguments: {tool_call.function.arguments}")
```
Complete Execution Loop
```python
import json

def get_stock_price(symbol: str) -> dict:
    # Your actual implementation
    return {"symbol": symbol, "price": 178.50, "currency": "USD"}

def execute_tool(name: str, args: dict) -> str:
    if name == "get_stock_price":
        result = get_stock_price(**args)
        return json.dumps(result)
    raise ValueError(f"Unknown tool: {name}")

def run_conversation(user_message: str):
    messages = [{"role": "user", "content": user_message}]

    while True:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=tools
        )
        assistant_message = response.choices[0].message
        messages.append(assistant_message)

        # No tool calls = final response
        if not assistant_message.tool_calls:
            return assistant_message.content

        # Execute each tool call
        for tool_call in assistant_message.tool_calls:
            args = json.loads(tool_call.function.arguments)
            result = execute_tool(tool_call.function.name, args)
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": result
            })
```
Controlling Tool Choice
```python
# Let the model decide (default)
tool_choice="auto"

# Force a specific tool
tool_choice={"type": "function", "function": {"name": "get_stock_price"}}

# Prevent tool use entirely
tool_choice="none"

# Require at least one tool call
tool_choice="required"
```
Structured Outputs with Strict Mode
Setting strict: True guarantees the model's output matches your schema exactly. No missing required fields, no invalid enum values, no extra properties.
```python
tools = [{
    "type": "function",
    "function": {
        "name": "create_order",
        "strict": True,  # Enforces schema compliance
        "parameters": {
            "type": "object",
            "properties": {
                "product_id": {"type": "string"},
                "quantity": {"type": "integer", "minimum": 1}
            },
            "required": ["product_id", "quantity"],
            "additionalProperties": False  # Required for strict mode
        }
    }
}]
```
Strict mode requires additionalProperties: false on all objects in your schema.
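Setting that flag on every nested object by hand is error-prone, so it can help to do it programmatically. A minimal sketch (the `make_strict` helper name is ours, not an SDK function) that walks a schema and marks every object:

```python
import copy

def make_strict(schema: dict) -> dict:
    """Recursively set additionalProperties: false on every object schema.

    Returns a deep copy so the original schema is left untouched, and
    preserves any value already set explicitly.
    """
    schema = copy.deepcopy(schema)

    def walk(node):
        if isinstance(node, dict):
            if node.get("type") == "object":
                node.setdefault("additionalProperties", False)
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(schema)
    return schema
```

Run your `parameters` schemas through a helper like this before registering tools, and strict mode stops rejecting schemas over a forgotten nested object.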
Anthropic Implementation
Claude uses a similar pattern with slightly different structure.
Basic Example
```python
import anthropic

client = anthropic.Anthropic()

tools = [{
    "name": "get_weather",
    "description": "Get current weather for a location",
    "input_schema": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "City and country, e.g. 'Paris, France'"
            }
        },
        "required": ["location"]
    }
}]

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}]
)
```
Handling Tool Use Responses
Claude's response contains content blocks of different types:
```python
for block in response.content:
    if block.type == "text":
        print(f"Text: {block.text}")
    elif block.type == "tool_use":
        print(f"Tool: {block.name}")
        print(f"Input: {block.input}")
        print(f"ID: {block.id}")
```
Complete Loop
```python
def run_claude_conversation(user_message: str):
    messages = [{"role": "user", "content": user_message}]

    while True:
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=4096,
            tools=tools,
            messages=messages
        )

        # Process tool calls
        if response.stop_reason == "tool_use":
            # Add assistant's response to messages
            messages.append({
                "role": "assistant",
                "content": response.content
            })

            # Execute tools and collect results
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    result = execute_tool(block.name, block.input)
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": result
                    })

            # Add tool results
            messages.append({
                "role": "user",
                "content": tool_results
            })
            continue

        # Any other stop reason (end_turn, max_tokens): extract final text
        # so an unexpected stop reason can't leave the loop spinning forever
        return "".join(
            block.text for block in response.content if block.type == "text"
        )
```
Key Differences from OpenAI
| Aspect | OpenAI | Anthropic |
|---|---|---|
| Schema key | `parameters` | `input_schema` |
| Tool call indicator | `tool_calls` array | `tool_use` content blocks |
| Stop reason | `finish_reason: "tool_calls"` | `stop_reason: "tool_use"` |
| Tool result role | `role: "tool"` | `role: "user"` with `tool_result` type |
Both APIs are converging toward similar patterns, but you'll need provider-specific handling.
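One way to contain that handling is a small provider-neutral representation of a tool call. A sketch under the assumption that the attribute shapes match the official Python SDKs (the `ToolCall` dataclass and converter names are ours):

```python
import json
from dataclasses import dataclass

@dataclass
class ToolCall:
    """Provider-neutral view of one requested tool call."""
    id: str
    name: str
    arguments: dict

def from_openai(message) -> list[ToolCall]:
    """Convert an OpenAI assistant message; arguments arrive as a JSON string."""
    return [
        ToolCall(tc.id, tc.function.name, json.loads(tc.function.arguments))
        for tc in (message.tool_calls or [])
    ]

def from_anthropic(response) -> list[ToolCall]:
    """Convert an Anthropic response; tool_use blocks carry already-parsed input."""
    return [
        ToolCall(block.id, block.name, block.input)
        for block in response.content
        if block.type == "tool_use"
    ]
```

Downstream execution code then only deals with `ToolCall` objects; the provider differences stay in the two converters.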
Open-Source Models
Open-source models vary significantly in function calling capability. Some are trained specifically for tool use; others need prompting tricks.
Models with Native Support
Strong function calling:
- Llama 3.1/3.3 70B (native tool use format)
- Qwen 2.5 Instruct (Hermes-style format)
- Mistral Large (native function calling)
- Hermes 2 Pro (purpose-built for tools)
- Functionary (specialized for function calling)
Usable with prompting:
- DeepSeek V3 (JSON output works, native support evolving)
- Gemma 3 (Python-style function definitions work better than JSON)
vLLM Implementation
vLLM provides OpenAI-compatible tool calling for supported models:
```python
from openai import OpenAI

# Point to a local vLLM server
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-needed"
)

tools = [{
    "type": "function",
    "function": {
        "name": "search_database",
        "description": "Search the product database",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string"}
            },
            "required": ["query"]
        }
    }
}]

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-70B-Instruct",
    messages=[{"role": "user", "content": "Find laptops under $1000"}],
    tools=tools,
    tool_choice="auto"
)
```
vLLM handles the chat template and tool format automatically for supported models.
Ollama
Ollama supports function calling for compatible models:
```python
import ollama

response = ollama.chat(
    model="llama3.1",
    messages=[{"role": "user", "content": "What's 25 * 47?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "calculate",
            "description": "Perform arithmetic calculations",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {"type": "string"}
                },
                "required": ["expression"]
            }
        }
    }]
)

if response.message.tool_calls:
    for tool in response.message.tool_calls:
        print(tool.function.name, tool.function.arguments)
```
Note: Streaming with tool calls has known issues in Ollama. Use stream=False for reliability.
Model-Specific Formats
Different models expect different prompt formats for tools.

Hermes-style (used by Qwen, Hermes 2):

```
<|im_start|>system
You are a helpful assistant with access to the following functions:
{"name": "get_weather", "parameters": {"type": "object", "properties": {"location": {"type": "string"}}}}
<|im_end|>
```
Llama 3.1 native format is handled automatically by vLLM and Ollama when using the tools parameter.
Parallel Function Calling
Models can request multiple function calls simultaneously when the calls are independent.
```python
# User: "What's the weather in London and Paris?"
# Model response contains two tool calls:
#   1. get_weather(location="London, UK")
#   2. get_weather(location="Paris, France")
```
Handling Parallel Calls
```python
import asyncio
import json

async def execute_tools_parallel(tool_calls):
    tasks = []
    for call in tool_calls:
        args = json.loads(call.function.arguments)
        task = asyncio.create_task(
            async_execute_tool(call.function.name, args)
        )
        tasks.append((call.id, task))

    results = []
    for call_id, task in tasks:
        result = await task
        results.append({
            "role": "tool",
            "tool_call_id": call_id,
            "content": json.dumps(result)
        })
    return results
```
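The loop above assumes an `async_execute_tool` coroutine. If your tool implementations are ordinary synchronous functions, `asyncio.to_thread` gives a minimal adapter that still lets independent calls overlap (the `SYNC_TOOLS` registry and its weather stub are illustrative stand-ins):

```python
import asyncio

# Hypothetical registry of synchronous tool implementations, keyed by name
SYNC_TOOLS = {
    "get_weather": lambda location: {"location": location, "temp_c": 18},
}

async def async_execute_tool(name: str, args: dict) -> dict:
    """Run a blocking tool function in a worker thread so calls can overlap."""
    func = SYNC_TOOLS.get(name)
    if func is None:
        return {"error": f"Unknown tool: {name}"}
    return await asyncio.to_thread(func, **args)
```

Because each call runs in its own worker thread, `asyncio.gather` over several of these coroutines finishes in roughly the time of the slowest call rather than the sum.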
Performance Impact
Parallel calls reduce latency dramatically. If each external API call takes 200ms:
| Calls | Sequential | Parallel |
|---|---|---|
| 2 | 400ms | 200ms |
| 5 | 1000ms | 200ms |
| 10 | 2000ms | 200ms |
The parallel approach also reduces the number of model inference passes, cutting token costs.
Disabling Parallel Calls
Some scenarios require sequential execution (when call B depends on call A's result):
```python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools,
    parallel_tool_calls=False  # Force one tool at a time
)
```
Streaming with Tools
Streaming tool calls lets you show progress as the model generates function arguments.
OpenAI Streaming
```python
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Search for recent AI papers"}],
    tools=tools,
    stream=True
)

tool_calls = {}

for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.tool_calls:
        for tc in delta.tool_calls:
            idx = tc.index
            if idx not in tool_calls:
                tool_calls[idx] = {
                    "id": tc.id,
                    "name": tc.function.name,
                    "arguments": ""
                }
            if tc.function.arguments:
                tool_calls[idx]["arguments"] += tc.function.arguments
                # Show progress
                print(f"Building args: {tool_calls[idx]['arguments']}")
```
When Streaming Helps
Streaming function arguments is useful when:
- Arguments are large (complex search queries, code generation)
- You want to show the user what the model is doing
- You need to start preparing resources before the call completes
For simple function calls with small arguments, non-streaming is often simpler.
Error Handling
Tool execution fails. Networks timeout. APIs return errors. Your system needs to handle these gracefully.
Basic Error Pattern
```python
from pydantic import ValidationError

def execute_tool_safely(name: str, args: dict) -> dict:
    try:
        if name == "search_database":
            return search_database(**args)
        elif name == "get_weather":
            return get_weather(**args)
        else:
            return {"error": f"Unknown tool: {name}"}
    except TimeoutError:
        return {"error": "Request timed out. Try again."}
    except ValidationError as e:
        return {"error": f"Invalid parameters: {e}"}
    except Exception as e:
        return {"error": f"Tool execution failed: {str(e)}"}
```
Returning Errors to the Model
The model can often recover from errors if you explain what went wrong:
# Instead of crashing, return error as tool result
tool_result = {
"role": "tool",
"tool_call_id": call_id,
"content": json.dumps({
"error": "Database connection failed",
"suggestion": "Try a different search term or retry"
})
}
The model receives this error, understands the tool failed, and can either retry with different parameters or explain the issue to the user.
Retry Logic
```python
async def execute_with_retry(name: str, args: dict, max_retries: int = 3):
    for attempt in range(max_retries):
        try:
            return await async_execute_tool(name, args)
        except (TimeoutError, ConnectionError) as e:
            if attempt == max_retries - 1:
                return {"error": f"Failed after {max_retries} attempts: {e}"}
            await asyncio.sleep(2 ** attempt)  # Exponential backoff
```
Schema Validation Failures
Even with strict mode, you should validate tool outputs:
```python
from pydantic import BaseModel, ValidationError

class WeatherResult(BaseModel):
    temperature: float
    conditions: str
    humidity: int

def get_weather(location: str) -> dict:
    raw_result = call_weather_api(location)
    try:
        validated = WeatherResult(**raw_result)
        return validated.model_dump()
    except ValidationError as e:
        return {"error": f"API returned invalid data: {e}"}
```
Multi-Step Tool Chains
Complex tasks require multiple dependent tool calls. The model calls tool A, uses the result to decide how to call tool B, and so on.
Example: Research Task
User: "Find the top 3 AI papers from last month and summarize their key findings."

```
Step 1: search_papers(query="AI", date_range="last_month", limit=10)
        → Returns list of papers with IDs
Step 2: get_paper_details(paper_id="arxiv:2401.12345")
        → Returns full abstract, authors, citations
Step 3: get_paper_details(paper_id="arxiv:2401.67890")
        → Returns second paper's details
Step 4: get_paper_details(paper_id="arxiv:2401.11111")
        → Returns third paper's details
Final:  Model synthesizes findings into a summary
```
Context Management
Each tool call and result consumes context. A chain of 5-8 calls with verbose results can use 30-50% of the context window.
Mitigation strategies:
- Summarize verbose results: Extract only fields needed for next steps
- Clear completed steps: In multi-turn, keep summaries not full results
- Split long chains: Handle in phases, each with 2-3 calls
```python
def compress_tool_result(result: dict, keep_fields: list) -> dict:
    """Extract only necessary fields from verbose API responses"""
    return {k: result[k] for k in keep_fields if k in result}
```
Agentic Patterns
For complex agent workflows, consider frameworks that handle orchestration:
- ReAct pattern: Reasoning → Action → Observation loop
- LangGraph: Graph-based agent orchestration
- CrewAI: Multi-agent collaboration
These abstract the tool loop and add planning, memory, and coordination between multiple agents.
Best Practices
Tool Design
- Single responsibility: Each tool does one thing well
- Clear boundaries: Obvious when to use tool A vs tool B
- Descriptive names: `search_customer_orders`, not `search`
- Constrained parameters: Use enums, min/max, required fields
- Useful descriptions: Include examples, edge cases, return format
Schema Organization
For systems with many tools, use namespace prefixes. Note that OpenAI restricts tool names to letters, digits, underscores, and dashes, so prefer underscore-separated prefixes like `crm_search_contacts` over dotted names:

```python
tools = [
    {"name": "crm_search_contacts", ...},
    {"name": "crm_update_contact", ...},
    {"name": "billing_get_invoice", ...},
    {"name": "billing_create_charge", ...}
]
```
Namespaces help the model distinguish between similar tools in different domains.
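Namespaces also give you a natural dispatch and permission boundary on the execution side. A minimal sketch, assuming underscore-separated prefixes (some providers restrict tool names to `[a-zA-Z0-9_-]`) and with hypothetical stand-in handlers:

```python
# Hypothetical handler registry keyed by namespaced tool name
HANDLERS = {
    "crm_search_contacts": lambda **kw: {"contacts": [], "query": kw},
    "billing_get_invoice": lambda **kw: {"invoice_id": kw.get("invoice_id")},
}

# Namespaces this agent is allowed to touch
ALLOWED_PREFIXES = {"crm", "billing"}

def dispatch(tool_name: str, args: dict) -> dict:
    """Route a namespaced tool call; the prefix is everything before the first underscore."""
    prefix, _, _ = tool_name.partition("_")
    if prefix not in ALLOWED_PREFIXES:
        return {"error": f"Namespace not permitted: {prefix}"}
    handler = HANDLERS.get(tool_name)
    if handler is None:
        return {"error": f"Unknown tool: {tool_name}"}
    return handler(**args)
```

Checking the prefix before the lookup means one allowlist can gate a whole domain of tools, which is easier to audit than per-tool permissions.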
Limiting Tool Count
Models degrade with too many tools. Guidelines:
| Tools | Model Performance |
|---|---|
| 1-10 | Excellent |
| 10-30 | Good with clear descriptions |
| 30-50 | Degraded, consider tool search |
| 50+ | Use dynamic tool loading |
For large tool sets, implement tool search: the model first calls a search tool to find relevant tools, then those tools are loaded for use.
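The meta-tool itself can be sketched as follows: expose a single `search_tools` function to the model, then include only the returned tools' schemas in the next request. The catalog, tool names, and naive word-overlap scoring here are illustrative; production systems typically rank tool descriptions with embeddings instead:

```python
# Hypothetical catalog mapping every tool name to its description and schema
TOOL_CATALOG = {
    "search_customer_orders": {
        "description": "Search orders by customer, date, or status",
        "schema": {"type": "object", "properties": {"customer_id": {"type": "string"}}},
    },
    "create_refund": {
        "description": "Issue a refund for a delivered order",
        "schema": {"type": "object", "properties": {"order_id": {"type": "string"}}},
    },
}

def search_tools(query: str, limit: int = 5) -> list[str]:
    """Return names of tools whose descriptions share keywords with the query."""
    words = set(query.lower().split())
    scored = []
    for name, info in TOOL_CATALOG.items():
        overlap = len(words & set(info["description"].lower().split()))
        if overlap:
            scored.append((overlap, name))
    scored.sort(reverse=True)  # highest keyword overlap first
    return [name for _, name in scored[:limit]]
```

The model calls `search_tools` once, your loop loads the matching schemas into the `tools` parameter, and the full catalog never has to fit in context.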
Security
Never let the model construct arbitrary code or database queries. Treat tool arguments as untrusted input:
```python
# BAD: SQL injection risk
def search_users(query: str):
    return db.execute(f"SELECT * FROM users WHERE name LIKE '%{query}%'")

# GOOD: Parameterized query
def search_users(query: str):
    return db.execute(
        "SELECT * FROM users WHERE name LIKE ?",
        [f"%{query}%"]
    )
```
Validate all inputs. Limit tool permissions. Log tool calls for audit.
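The audit-logging piece can be as small as a decorator. A sketch using the standard `logging` module (the logger name, log fields, and the weather stub are our choices, not a fixed convention):

```python
import functools
import json
import logging
import time

logger = logging.getLogger("tool_audit")

def audited(func):
    """Wrap a tool function so every call and outcome lands in the audit log."""
    @functools.wraps(func)
    def wrapper(**kwargs):
        start = time.monotonic()
        try:
            result = func(**kwargs)
            logger.info(
                "tool=%s args=%s status=ok duration_ms=%.0f",
                func.__name__, json.dumps(kwargs),
                (time.monotonic() - start) * 1000,
            )
            return result
        except Exception as exc:
            logger.warning(
                "tool=%s args=%s error=%s",
                func.__name__, json.dumps(kwargs), exc,
            )
            raise
    return wrapper

@audited
def get_weather(location: str) -> dict:
    # Stand-in implementation for the sketch
    return {"location": location, "temp_c": 18}
```

Decorate every tool once and you get a uniform trail of who-called-what-with-which-arguments, which is usually the first thing you need when a model misbehaves.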
Provider Comparison
| Feature | OpenAI | Anthropic | vLLM (open-source) |
|---|---|---|---|
| Strict schema enforcement | Yes | Yes | Model-dependent |
| Parallel tool calls | Yes | Yes (Sonnet 4+) | Yes |
| Streaming tool calls | Yes | Yes | Yes |
| Tool choice control | auto/none/required/specific | auto/any/specific | auto/none/specific |
| Built-in tools | web_search, code_interpreter | web_search, code_execution | None |
For teams building production systems, PremAI provides a unified API across providers with consistent tool calling behavior, plus fine-tuning capabilities to improve tool use accuracy on your specific functions.
Quick Reference
OpenAI Tool Schema
```json
{
  "type": "function",
  "function": {
    "name": "tool_name",
    "description": "What the tool does",
    "parameters": {
      "type": "object",
      "properties": {...},
      "required": [...],
      "additionalProperties": false
    },
    "strict": true
  }
}
```
Anthropic Tool Schema
```json
{
  "name": "tool_name",
  "description": "What the tool does",
  "input_schema": {
    "type": "object",
    "properties": {...},
    "required": [...]
  }
}
```
Execution Loop Checklist
- [ ] Define tools with clear descriptions
- [ ] Enable strict mode where available
- [ ] Handle tool_calls in response
- [ ] Execute tools with error handling
- [ ] Return results in provider-specific format
- [ ] Loop until stop_reason indicates completion
- [ ] Implement timeout for runaway loops
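The last checklist item deserves a concrete shape. A minimal sketch of a capped execution loop, assuming OpenAI-style message objects and passing the executor in explicitly (the `run_with_guard` name and the cap of 10 are our choices):

```python
import json

def run_with_guard(client, messages: list, tools: list,
                   execute_tool, max_iters: int = 10) -> str:
    """Tool loop with a hard iteration cap so a runaway cycle can't spin forever."""
    for _ in range(max_iters):
        response = client.chat.completions.create(
            model="gpt-4o", messages=messages, tools=tools
        )
        message = response.choices[0].message
        messages.append(message)

        if not message.tool_calls:
            return message.content  # final answer

        for call in message.tool_calls:
            args = json.loads(call.function.arguments)
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": execute_tool(call.function.name, args),
            })

    raise RuntimeError(f"Stopped after {max_iters} tool iterations")
```

A wall-clock timeout around the whole call is a useful second layer, but the iteration cap alone already stops the most common failure mode: the model re-requesting the same tool forever.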
Summary
Function calling extends LLMs from text generation into action. The core pattern is simple: define tools with JSON schemas, check if the model wants to call them, execute locally, return results.
Implementation complexity comes from:
- Provider differences in schema format and response structure
- Parallel execution and dependency management
- Streaming tool calls for responsive UIs
- Error handling and recovery
- Context management in multi-step chains
Start simple. One tool, one execution loop, no streaming. Get that working reliably. Add parallel calls when you have independent operations. Add streaming when responsiveness matters. Build multi-step chains when tasks require it.
For enterprise deployments needing consistent behavior across providers, Prem Studio handles the provider abstraction and lets you fine-tune models specifically for your tool definitions, improving accuracy on your exact function schemas.