LLM Function Calling: Complete Implementation Guide (2026)

Function calling turns LLMs into action-takers. JSON schemas, parallel execution, streaming, error handling. Working code for OpenAI, Anthropic, and open-source models.


Function calling transforms LLMs from text generators into action-takers. Instead of describing what could be done, the model specifies exactly which function to call and with what arguments. Your code executes the function, returns the result, and the model incorporates it into its response.

This capability powers every serious AI agent. Code interpreters, web search, database queries, API integrations, file operations. Without function calling, an LLM can only talk about doing things. With it, the model becomes the reasoning layer in a system that actually does things.

This guide covers implementation from first principles. You'll learn how function calling works at the protocol level, how to define tool schemas that models understand, and how to build reliable execution loops. We'll cover OpenAI, Anthropic, and open-source implementations, then move into advanced patterns: parallel execution, streaming with tools, error handling, and multi-step orchestration.

How Function Calling Works

The model doesn't execute functions. It generates structured output describing which function to call and what arguments to pass. Your application parses this output, executes the function, and feeds the result back to the model.

The flow:

  1. You send a prompt plus tool definitions (JSON schemas describing available functions)
  2. The model decides whether to respond directly or request a tool call
  3. If it requests a tool call, you execute the function locally
  4. You send the result back to the model
  5. The model generates a final response incorporating the tool result

This loop can repeat. Complex tasks might involve 5-10 tool calls before the model has enough information to answer.

# Simplified flow
response = model.generate(prompt, tools=tool_definitions)

while response.wants_tool_call:
    tool_name = response.tool_call.name
    tool_args = response.tool_call.arguments
    
    result = execute_tool(tool_name, tool_args)
    
    response = model.generate(
        messages=[*previous_messages, tool_result(result)],
        tools=tool_definitions
    )

return response.text

The model outputs structured data, typically JSON, conforming to a schema you defined. Modern APIs can enforce schema compliance at the generation level: with strict mode enabled, the output is guaranteed to be valid JSON matching your schema.

Tool Definition: JSON Schema Fundamentals

Every function needs a schema describing its name, purpose, and parameters. The model reads this schema to understand when and how to use the tool.

Basic Structure

{
  "type": "function",
  "function": {
    "name": "get_weather",
    "description": "Get current weather for a location",
    "parameters": {
      "type": "object",
      "properties": {
        "location": {
          "type": "string",
          "description": "City and country, e.g. 'London, UK'"
        },
        "unit": {
          "type": "string",
          "enum": ["celsius", "fahrenheit"],
          "description": "Temperature unit"
        }
      },
      "required": ["location"],
      "additionalProperties": false
    },
    "strict": true
  }
}

Key elements:

  • name: Function identifier. Use snake_case. Be specific: search_customer_orders beats search.
  • description: When and why to use this tool. The model relies heavily on this text.
  • parameters: JSON Schema defining accepted inputs. Include descriptions for each property.
  • required: Which parameters must be provided.
  • strict: Enforces exact schema compliance. Always enable this.

Writing Effective Descriptions

Descriptions matter more than you'd expect. Research from Gorilla and ToolAlpaca found that precise descriptions improve parameter accuracy by 30%+.

Bad description:

"description": "Search function"

Good description:

"description": "Search for products in the catalog. Use when the user asks to find, look up, or browse products. Returns product IDs, names, prices, and availability. Supports filtering by category, price range, and brand."

Include:

  • When to use the tool (trigger conditions)
  • What it returns (output format)
  • Constraints or limitations
  • Examples of valid inputs

Parameter Constraints

Use JSON Schema features to constrain inputs:

{
  "type": "object",
  "properties": {
    "quantity": {
      "type": "integer",
      "minimum": 1,
      "maximum": 100,
      "description": "Number of items (1-100)"
    },
    "status": {
      "type": "string",
      "enum": ["pending", "shipped", "delivered"],
      "description": "Filter by order status"
    },
    "date_range": {
      "type": "object",
      "properties": {
        "start": { "type": "string", "format": "date" },
        "end": { "type": "string", "format": "date" }
      },
      "required": ["start", "end"]
    }
  }
}

Enums, min/max values, and nested objects all help the model generate correct parameters.
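When strict mode isn't available (common with open-source models), the same constraints can be enforced client-side before executing the call. A minimal sketch; `check_args` is a hypothetical helper covering only the enum/min/max/required features used above:

```python
def check_args(args: dict, schema: dict) -> list[str]:
    """Check model-generated arguments against enum/min/max/required constraints."""
    errors = []
    props = schema.get("properties", {})
    for field in schema.get("required", []):
        if field not in args:
            errors.append(f"missing required field: {field}")
    for key, value in args.items():
        spec = props.get(key, {})
        if "enum" in spec and value not in spec["enum"]:
            errors.append(f"{key}: {value!r} is not one of {spec['enum']}")
        if "minimum" in spec and isinstance(value, (int, float)) and value < spec["minimum"]:
            errors.append(f"{key}: {value} is below minimum {spec['minimum']}")
        if "maximum" in spec and isinstance(value, (int, float)) and value > spec["maximum"]:
            errors.append(f"{key}: {value} is above maximum {spec['maximum']}")
    return errors

schema = {
    "type": "object",
    "properties": {
        "quantity": {"type": "integer", "minimum": 1, "maximum": 100},
        "status": {"type": "string", "enum": ["pending", "shipped", "delivered"]},
    },
    "required": ["quantity"],
}

assert check_args({"quantity": 5, "status": "shipped"}, schema) == []
assert len(check_args({"quantity": 0, "status": "lost"}, schema)) == 2
```

A full validator (the jsonschema package, for instance) covers more of the spec; the point is that constraints in the schema are cheap to re-check before you touch a real system.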

OpenAI Implementation

OpenAI's function calling uses the tools parameter in chat completions.

Basic Example

from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "get_stock_price",
        "description": "Get the current stock price for a ticker symbol",
        "parameters": {
            "type": "object",
            "properties": {
                "symbol": {
                    "type": "string",
                    "description": "Stock ticker symbol, e.g. AAPL, GOOGL"
                }
            },
            "required": ["symbol"],
            "additionalProperties": False
        },
        "strict": True
    }
}]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's Apple's stock price?"}],
    tools=tools
)

# Check if model wants to call a tool
if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    print(f"Function: {tool_call.function.name}")
    print(f"Arguments: {tool_call.function.arguments}")

Complete Execution Loop

import json

def get_stock_price(symbol: str) -> dict:
    # Your actual implementation
    return {"symbol": symbol, "price": 178.50, "currency": "USD"}

def execute_tool(name: str, args: dict) -> str:
    if name == "get_stock_price":
        result = get_stock_price(**args)
        return json.dumps(result)
    raise ValueError(f"Unknown tool: {name}")

def run_conversation(user_message: str):
    messages = [{"role": "user", "content": user_message}]
    
    while True:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=tools
        )
        
        assistant_message = response.choices[0].message
        messages.append(assistant_message)
        
        # No tool calls = final response
        if not assistant_message.tool_calls:
            return assistant_message.content
        
        # Execute each tool call
        for tool_call in assistant_message.tool_calls:
            args = json.loads(tool_call.function.arguments)
            result = execute_tool(tool_call.function.name, args)
            
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": result
            })

Controlling Tool Choice

# Let model decide (default)
tool_choice="auto"

# Force a specific tool
tool_choice={"type": "function", "function": {"name": "get_stock_price"}}

# Prevent tool use entirely
tool_choice="none"

# Require at least one tool call
tool_choice="required"

Structured Outputs with Strict Mode

Setting strict: True guarantees the model's output matches your schema exactly. No missing required fields, no invalid enum values, no extra properties.

tools = [{
    "type": "function",
    "function": {
        "name": "create_order",
        "strict": True,  # Enforces schema compliance
        "parameters": {
            "type": "object",
            "properties": {
                "product_id": {"type": "string"},
                "quantity": {"type": "integer", "minimum": 1}
            },
            "required": ["product_id", "quantity"],
            "additionalProperties": False  # Required for strict mode
        }
    }
}]

Strict mode requires additionalProperties: false on all objects in your schema.
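It's easy to forget the flag on nested objects. A small sanity check (a hypothetical helper, not part of any SDK) that walks a schema and reports objects missing it:

```python
def find_loose_objects(schema: dict, path: str = "$") -> list[str]:
    """Return JSON paths of objects missing additionalProperties: false."""
    loose = []
    if schema.get("type") == "object":
        if schema.get("additionalProperties") is not False:
            loose.append(path)
        for name, sub in schema.get("properties", {}).items():
            loose.extend(find_loose_objects(sub, f"{path}.{name}"))
    elif schema.get("type") == "array" and "items" in schema:
        loose.extend(find_loose_objects(schema["items"], f"{path}[]"))
    return loose

schema = {
    "type": "object",
    "additionalProperties": False,
    "properties": {
        "product_id": {"type": "string"},
        "options": {  # nested object missing the flag
            "type": "object",
            "properties": {"gift_wrap": {"type": "boolean"}},
        },
    },
}

assert find_loose_objects(schema) == ["$.options"]
```

Running this over your tool definitions at startup catches strict-mode rejections before they surface as API errors.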

Anthropic Implementation

Claude uses a similar pattern with slightly different structure.

Basic Example

import anthropic

client = anthropic.Anthropic()

tools = [{
    "name": "get_weather",
    "description": "Get current weather for a location",
    "input_schema": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "City and country, e.g. 'Paris, France'"
            }
        },
        "required": ["location"]
    }
}]

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}]
)

Handling Tool Use Responses

Claude's response contains content blocks of different types:

for block in response.content:
    if block.type == "text":
        print(f"Text: {block.text}")
    elif block.type == "tool_use":
        print(f"Tool: {block.name}")
        print(f"Input: {block.input}")
        print(f"ID: {block.id}")

Complete Loop

def run_claude_conversation(user_message: str):
    messages = [{"role": "user", "content": user_message}]
    
    while True:
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=4096,
            tools=tools,
            messages=messages
        )
        
        # Final response: join any text blocks and return
        if response.stop_reason == "end_turn":
            return "".join(
                block.text for block in response.content if block.type == "text"
            )

        # Anything other than tool_use means the loop can't make progress
        if response.stop_reason != "tool_use":
            raise RuntimeError(f"Unexpected stop reason: {response.stop_reason}")

        # Add assistant's response to messages
        messages.append({
            "role": "assistant",
            "content": response.content
        })

        # Execute tools and collect results
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                result = execute_tool(block.name, block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": result
                })

        # Add tool results
        messages.append({
            "role": "user",
            "content": tool_results
        })

Key Differences from OpenAI

| Aspect | OpenAI | Anthropic |
| --- | --- | --- |
| Schema key | parameters | input_schema |
| Tool call indicator | tool_calls array | tool_use content blocks |
| Stop reason | finish_reason: "tool_calls" | stop_reason: "tool_use" |
| Tool result role | role: "tool" | role: "user" with tool_result type |

Both APIs are converging toward similar patterns, but you'll need provider-specific handling.
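The schema difference is mechanical, so one tool definition can serve both providers. A sketch of a converter (assuming the OpenAI nesting shown above):

```python
def openai_to_anthropic(tool: dict) -> dict:
    """Re-shape an OpenAI tool definition into Anthropic's flat format."""
    fn = tool["function"]
    return {
        "name": fn["name"],
        "description": fn.get("description", ""),
        "input_schema": fn["parameters"],
    }

openai_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
}

claude_tool = openai_to_anthropic(openai_tool)
assert claude_tool["name"] == "get_weather"
assert "input_schema" in claude_tool
```

Maintain tools in one canonical format and convert at the API boundary; the response-handling differences still need provider-specific code.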

Open-Source Models

Open-source models vary significantly in function calling capability. Some are trained specifically for tool use; others need prompting tricks.

Models with Native Support

Strong function calling:

  • Llama 3.1/3.3 70B (native tool use format)
  • Qwen 2.5 Instruct (Hermes-style format)
  • Mistral Large (native function calling)
  • Hermes 2 Pro (purpose-built for tools)
  • Functionary (specialized for function calling)

Usable with prompting:

  • DeepSeek V3 (JSON output works, native support evolving)
  • Gemma 3 (Python-style function definitions work better than JSON)

vLLM Implementation

vLLM provides OpenAI-compatible tool calling for supported models:

from openai import OpenAI

# Point to local vLLM server
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-needed"
)

tools = [{
    "type": "function",
    "function": {
        "name": "search_database",
        "description": "Search the product database",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string"}
            },
            "required": ["query"]
        }
    }
}]

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-70B-Instruct",
    messages=[{"role": "user", "content": "Find laptops under $1000"}],
    tools=tools,
    tool_choice="auto"
)

vLLM handles the chat template and tool format automatically for supported models.

Ollama

Ollama supports function calling for compatible models:

import ollama

response = ollama.chat(
    model="llama3.1",
    messages=[{"role": "user", "content": "What's 25 * 47?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "calculate",
            "description": "Perform arithmetic calculations",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {"type": "string"}
                },
                "required": ["expression"]
            }
        }
    }]
)

if response.message.tool_calls:
    for tool in response.message.tool_calls:
        print(tool.function.name, tool.function.arguments)

Note: Streaming with tool calls has known issues in Ollama. Use stream=False for reliability.

Model-Specific Formats

Different models expect different prompt formats for tools. Hermes-style (used by Qwen, Hermes 2):

<|im_start|>system
You are a helpful assistant with access to the following functions:
{"name": "get_weather", "parameters": {"type": "object", "properties": {"location": {"type": "string"}}}}
<|im_end|>

Llama 3.1 native format is handled automatically by vLLM and Ollama when using the tools parameter.

Parallel Function Calling

Models can request multiple function calls simultaneously when the calls are independent.

# User: "What's the weather in London and Paris?"

# Model response contains two tool calls:
# 1. get_weather(location="London, UK")
# 2. get_weather(location="Paris, France")

Handling Parallel Calls

import asyncio

async def execute_tools_parallel(tool_calls):
    tasks = []
    for call in tool_calls:
        args = json.loads(call.function.arguments)
        task = asyncio.create_task(
            async_execute_tool(call.function.name, args)
        )
        tasks.append((call.id, task))
    
    results = []
    for call_id, task in tasks:
        result = await task
        results.append({
            "role": "tool",
            "tool_call_id": call_id,
            "content": json.dumps(result)
        })
    return results

Performance Impact

Parallel calls reduce latency dramatically. If each external API call takes 200ms:

| Calls | Sequential | Parallel |
| --- | --- | --- |
| 2 | 400ms | 200ms |
| 5 | 1000ms | 200ms |
| 10 | 2000ms | 200ms |

The parallel approach also reduces the number of model inference passes, cutting token costs.

Disabling Parallel Calls

Some scenarios require sequential execution (when call B depends on call A's result):

response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools,
    parallel_tool_calls=False  # Force one tool at a time
)

Streaming with Tools

Streaming tool calls lets you show progress as the model generates function arguments.

OpenAI Streaming

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Search for recent AI papers"}],
    tools=tools,
    stream=True
)

tool_calls = {}

for chunk in stream:
    delta = chunk.choices[0].delta
    
    if delta.tool_calls:
        for tc in delta.tool_calls:
            idx = tc.index
            if idx not in tool_calls:
                tool_calls[idx] = {
                    "id": tc.id,
                    "name": tc.function.name,
                    "arguments": ""
                }
            if tc.function.arguments:
                tool_calls[idx]["arguments"] += tc.function.arguments
                # Show progress
                print(f"Building args: {tool_calls[idx]['arguments']}")

When Streaming Helps

Streaming function arguments is useful when:

  • Arguments are large (complex search queries, code generation)
  • You want to show the user what the model is doing
  • You need to start preparing resources before the call completes

For simple function calls with small arguments, non-streaming is often simpler.

Error Handling

Tool execution fails. Networks timeout. APIs return errors. Your system needs to handle these gracefully.

Basic Error Pattern

def execute_tool_safely(name: str, args: dict) -> dict:
    try:
        if name == "search_database":
            return search_database(**args)
        elif name == "get_weather":
            return get_weather(**args)
        else:
            return {"error": f"Unknown tool: {name}"}
    except TimeoutError:
        return {"error": "Request timed out. Try again."}
    except ValidationError as e:
        return {"error": f"Invalid parameters: {e}"}
    except Exception as e:
        return {"error": f"Tool execution failed: {str(e)}"}

Returning Errors to the Model

The model can often recover from errors if you explain what went wrong:

# Instead of crashing, return error as tool result
tool_result = {
    "role": "tool",
    "tool_call_id": call_id,
    "content": json.dumps({
        "error": "Database connection failed",
        "suggestion": "Try a different search term or retry"
    })
}

The model receives this error, understands the tool failed, and can either retry with different parameters or explain the issue to the user.

Retry Logic

async def execute_with_retry(name: str, args: dict, max_retries: int = 3):
    for attempt in range(max_retries):
        try:
            return await async_execute_tool(name, args)
        except (TimeoutError, ConnectionError) as e:
            if attempt == max_retries - 1:
                return {"error": f"Failed after {max_retries} attempts: {e}"}
            await asyncio.sleep(2 ** attempt)  # Exponential backoff

Schema Validation Failures

Even with strict mode, you should validate tool outputs:

from pydantic import BaseModel, ValidationError

class WeatherResult(BaseModel):
    temperature: float
    conditions: str
    humidity: int

def get_weather(location: str) -> dict:
    raw_result = call_weather_api(location)
    try:
        validated = WeatherResult(**raw_result)
        return validated.model_dump()
    except ValidationError as e:
        return {"error": f"API returned invalid data: {e}"}

Multi-Step Tool Chains

Complex tasks require multiple dependent tool calls. The model calls tool A, uses the result to decide how to call tool B, and so on.

Example: Research Task

User: "Find the top 3 AI papers from last month and summarize their key findings."

Step 1: search_papers(query="AI", date_range="last_month", limit=10)
        → Returns list of papers with IDs

Step 2: get_paper_details(paper_id="arxiv:2401.12345")
        → Returns full abstract, authors, citations

Step 3: get_paper_details(paper_id="arxiv:2401.67890")
        → Returns second paper details

Step 4: get_paper_details(paper_id="arxiv:2401.11111")
        → Returns third paper details

Final: Model synthesizes findings into summary

Context Management

Each tool call and result consumes context. A chain of 5-8 calls with verbose results can use 30-50% of the context window.

Mitigation strategies:

  1. Summarize verbose results: Extract only fields needed for next steps
  2. Clear completed steps: In multi-turn, keep summaries not full results
  3. Split long chains: Handle in phases, each with 2-3 calls

def compress_tool_result(result: dict, keep_fields: list) -> dict:
    """Extract only necessary fields from verbose API responses"""
    return {k: result[k] for k in keep_fields if k in result}

Agentic Patterns

For complex agent workflows, consider frameworks that handle orchestration:

  • ReAct pattern: Reasoning → Action → Observation loop
  • LangGraph: Graph-based agent orchestration
  • CrewAI: Multi-agent collaboration

These abstract the tool loop and add planning, memory, and coordination between multiple agents.

Best Practices

Tool Design

  1. Single responsibility: Each tool does one thing well
  2. Clear boundaries: Obvious when to use tool A vs tool B
  3. Descriptive names: search_customer_orders not search
  4. Constrained parameters: Use enums, min/max, required fields
  5. Useful descriptions: Include examples, edge cases, return format

Schema Organization

For systems with many tools, use namespaces:

tools = [
    {"name": "crm.search_contacts", ...},
    {"name": "crm.update_contact", ...},
    {"name": "billing.get_invoice", ...},
    {"name": "billing.create_charge", ...}
]

Namespaces help the model distinguish between similar tools in different domains.
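A dispatch table keeps namespaced tools organized on the execution side too. A minimal sketch (the registry and decorator are hypothetical, not from any framework):

```python
TOOL_REGISTRY = {}

def tool(name: str):
    """Decorator that registers a handler under a namespaced tool name."""
    def wrap(fn):
        TOOL_REGISTRY[name] = fn
        return fn
    return wrap

@tool("crm.search_contacts")
def search_contacts(query: str) -> dict:
    return {"matches": [], "query": query}  # stub implementation

def dispatch(name: str, args: dict) -> dict:
    handler = TOOL_REGISTRY.get(name)
    if handler is None:
        return {"error": f"Unknown tool: {name}"}
    return handler(**args)

assert dispatch("crm.search_contacts", {"query": "Ada"}) == {"matches": [], "query": "Ada"}
assert "error" in dispatch("billing.get_invoice", {})
```

The registry also makes it trivial to generate the tools list you send to the API from the same source of truth that executes the calls.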

Limiting Tool Count

Models degrade with too many tools. Guidelines:

| Tools | Model Performance |
| --- | --- |
| 1-10 | Excellent |
| 10-30 | Good with clear descriptions |
| 30-50 | Degraded, consider tool search |
| 50+ | Use dynamic tool loading |

For large tool sets, implement tool search: the model first calls a search tool to find relevant tools, then those tools are loaded for use.
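A first cut at tool search doesn't need embeddings; keyword overlap against names and descriptions is often enough to prune the set before the request. A naive sketch:

```python
def select_tools(user_message: str, all_tools: list, limit: int = 10) -> list:
    """Rank tools by word overlap between the message and each tool's name/description."""
    words = set(user_message.lower().split())

    def score(tool):
        fn = tool["function"]
        text = fn["name"].replace("_", " ") + " " + fn.get("description", "")
        return len(words & set(text.lower().split()))

    return sorted(all_tools, key=score, reverse=True)[:limit]

tools = [
    {"function": {"name": "get_weather", "description": "Get current weather for a location"}},
    {"function": {"name": "get_stock_price", "description": "Get the current stock price"}},
]

selected = select_tools("weather in Tokyo", tools, limit=1)
assert selected[0]["function"]["name"] == "get_weather"
```

In production you'd likely replace the scoring function with embedding similarity, but the shape of the solution stays the same: narrow the tool list, then send only the survivors.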

Security

Never let the model construct arbitrary code or database queries. Treat tool arguments as untrusted input:

# BAD: SQL injection risk
def search_users(query: str):
    return db.execute(f"SELECT * FROM users WHERE name LIKE '%{query}%'")

# GOOD: Parameterized query
def search_users(query: str):
    return db.execute(
        "SELECT * FROM users WHERE name LIKE ?",
        [f"%{query}%"]
    )

Validate all inputs. Limit tool permissions. Log tool calls for audit.

Provider Comparison

| Feature | OpenAI | Anthropic | vLLM (open-source) |
| --- | --- | --- | --- |
| Strict schema enforcement | Yes | Yes | Model-dependent |
| Parallel tool calls | Yes | Yes (Sonnet 4+) | Yes |
| Streaming tool calls | Yes | Yes | Yes |
| Tool choice control | auto/none/required/specific | auto/any/specific | auto/none/specific |
| Built-in tools | web_search, code_interpreter | web_search, code_execution | None |

For teams building production systems, PremAI provides a unified API across providers with consistent tool calling behavior, plus fine-tuning capabilities to improve tool use accuracy on your specific functions.

Quick Reference

OpenAI Tool Schema

{
    "type": "function",
    "function": {
        "name": "tool_name",
        "description": "What the tool does",
        "parameters": {
            "type": "object",
            "properties": {...},
            "required": [...],
            "additionalProperties": False
        },
        "strict": True
    }
}

Anthropic Tool Schema

{
    "name": "tool_name",
    "description": "What the tool does",
    "input_schema": {
        "type": "object",
        "properties": {...},
        "required": [...]
    }
}

Execution Loop Checklist

  • [ ] Define tools with clear descriptions
  • [ ] Enable strict mode where available
  • [ ] Handle tool_calls in response
  • [ ] Execute tools with error handling
  • [ ] Return results in provider-specific format
  • [ ] Loop until stop_reason indicates completion
  • [ ] Implement timeout for runaway loops
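The last checklist item deserves emphasis: a hard iteration cap is the simplest defense against a model that keeps requesting tools. A generic sketch of the guard (the step/finished callables stand in for your provider-specific loop body):

```python
MAX_TOOL_ITERATIONS = 10

def run_guarded_loop(step, finished):
    """Run the tool loop at most MAX_TOOL_ITERATIONS times, then bail out."""
    state = None
    for _ in range(MAX_TOOL_ITERATIONS):
        state = step(state)
        if finished(state):
            return state
    raise RuntimeError(f"Tool loop exceeded {MAX_TOOL_ITERATIONS} iterations")

# Toy example: the "model" finishes after three steps
result = run_guarded_loop(
    step=lambda s: (s or 0) + 1,
    finished=lambda s: s >= 3,
)
assert result == 3
```

Pair the iteration cap with a wall-clock timeout on each tool execution so one hung call can't stall the whole loop.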

Summary

Function calling extends LLMs from text generation into action. The core pattern is simple: define tools with JSON schemas, check if the model wants to call them, execute locally, return results.

Implementation complexity comes from:

  • Provider differences in schema format and response structure
  • Parallel execution and dependency management
  • Streaming tool calls for responsive UIs
  • Error handling and recovery
  • Context management in multi-step chains

Start simple. One tool, one execution loop, no streaming. Get that working reliably. Add parallel calls when you have independent operations. Add streaming when responsiveness matters. Build multi-step chains when tasks require it.

For enterprise deployments needing consistent behavior across providers, Prem Studio handles the provider abstraction and lets you fine-tune models specifically for your tool definitions, improving accuracy on your exact function schemas.
