PremAI Python SDK Quickstart: Complete Guide (2026)
Most AI SDKs make you choose: easy development or data privacy. Use OpenAI's SDK and your data flows through their servers. Self-host everything and you're writing infrastructure code instead of features.
PremAI's Python SDK offers a different path: an OpenAI-like development experience where your data stays in your infrastructure. Same simplicity, complete control.
This guide is comprehensive. We'll cover everything from basic installation to production-ready implementations: chat completions, streaming, RAG with repositories, embeddings, fine-tuning, and integrations with LangChain and LlamaIndex. By the end, you'll have working code for real applications.
Why PremAI SDK vs Other AI SDKs
The AI SDK landscape expanded significantly in 2025-2026:
| SDK | Primary Use Case | Data Privacy | Key Feature |
|---|---|---|---|
| PremAI | Private enterprise AI | Your cloud | Fine-tuning + RAG built-in |
| OpenAI | General purpose | OpenAI servers | GPT-5.2, Agents SDK |
| Anthropic Claude | Coding, reasoning | Anthropic servers | Claude Agent SDK |
| Together AI | Open models | Together servers | Fast fine-tuning |
| Fireworks AI | Low latency | Fireworks servers | Sub-100ms inference |
Why PremAI is different:
- Your infrastructure: Deploys in your AWS/GCP/Azure
- No data retention: Data deleted after inference
- Model portability: Export fine-tuned weights
- 50+ models: Single API for Llama, Mistral, Claude, GPT, DeepSeek
- Built-in RAG: No separate vector database needed
2026 SDK Landscape Changes:
- OpenAI released Agents SDK (open-source, provider-agnostic)
- Anthropic released Claude Agent SDK (Python/TypeScript)
- Together AI SDK v2.0 with TypeScript-like typing
- All major SDKs now use httpx and Pydantic
Installation and Setup
Basic Installation
pip install premai
⚠️ SDK Version Note
IMPORTANT: PremAI has two SDK packages with different APIs:
| Package | Import | Status |
|---|---|---|
| premai (PyPI) | from premai import PremAI | Current, recommended |
| prem-python-sdk (GitHub) | from premai import Prem | Legacy, still supported |
This guide covers both. Check your version:
import premai
print(f"PremAI SDK version: {premai.__version__}")
For versions 1.x+, use the PremAI class. For older versions, use the Prem class.
Always check the official documentation for your installed version.
Verify Installation
import premai
print(f"PremAI SDK version: {premai.__version__}")
# Test connection (adjust import based on your version)
from premai import Prem # or PremAI for newer versions
client = Prem(api_key="your-api-key")
print("Connection successful!")
Requirements
- Python 3.8+ (3.9+ for newest SDK)
- PremAI account (sign up at premai.io)
- API key from your PremAI dashboard
- Project ID (created in dashboard)
Authentication Methods
Method 1: Environment Variable (Recommended for Production)
# Set environment variable
export PREMAI_API_KEY="your-api-key-here"
from premai import Prem
# Client automatically reads PREMAI_API_KEY
client = Prem()
Why this is recommended:
- API key never appears in code
- Easy to manage across environments
- Works with container orchestration
- Compatible with secret managers
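If the same code runs against several environments, one lightweight convention is a per-environment variable that falls back to the generic one. This is a sketch of that convention only; the PREMAI_API_KEY_DEV / PREMAI_API_KEY_STAGING names are illustrative and not part of the SDK:

```python
import os

def resolve_api_key(env: str = "production") -> str:
    """Pick an API key per deployment environment, falling back to the
    generic PREMAI_API_KEY variable when no env-specific one is set."""
    specific = os.environ.get(f"PREMAI_API_KEY_{env.upper()}")
    generic = os.environ.get("PREMAI_API_KEY")
    key = specific or generic
    if not key:
        raise ValueError(f"No API key found for environment '{env}'")
    return key.strip()  # guard against stray whitespace from copy-paste
```

The resolved key can then be passed to the client constructor as usual.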
Method 2: Direct Initialization
from premai import Prem
client = Prem(api_key="your-api-key-here")
Use when:
- Quick testing
- Notebooks and prototyping
- Keys loaded from secret manager at runtime
Security Best Practices
import os
from premai import Prem
# Never do this
# client = Prem(api_key="sk-abc123...") # Key in code!
# Do this instead
api_key = os.environ.get("PREMAI_API_KEY")
if not api_key:
raise ValueError("PREMAI_API_KEY environment variable not set")
client = Prem(api_key=api_key)
Understanding Projects in PremAI
What is a Project?
A project is your workspace in PremAI. Each project has:
- Default model configuration - Which model to use when none specified
- System prompt - Default instructions for all conversations
- Connected repositories - Document collections for RAG
- Usage tracking - Separate metrics per project
- Team access - Who can use this project
Creating a Project
Projects are created in the PremAI dashboard:
- Log in to app.premai.io
- Click "New Project"
- Configure default model
- Set system prompt (optional)
- Note your Project ID
Using Projects
from premai import Prem
client = Prem(api_key="your-api-key")
# Use project settings (model, system prompt from dashboard)
response = client.chat.completions.create(
project_id="your-project-id",
messages=[{"role": "user", "content": "Hello!"}]
)
# Override project defaults
response = client.chat.completions.create(
project_id="your-project-id",
model="llama-3.1-70b-instruct", # Override default model
system_prompt="You are a pirate.", # Override system prompt
messages=[{"role": "user", "content": "Hello!"}]
)
Multiple Projects Pattern
from premai import Prem
client = Prem(api_key="your-api-key")
# Different projects for different use cases
PROJECTS = {
"customer_support": "proj-cs-123",
"code_assistant": "proj-code-456",
"document_qa": "proj-docs-789"
}
def chat(project_name: str, message: str):
return client.chat.completions.create(
project_id=PROJECTS[project_name],
messages=[{"role": "user", "content": message}]
)
Chat Completions: Complete Guide
Basic Chat Completion
from premai import Prem
client = Prem(api_key="your-api-key")
response = client.chat.completions.create(
project_id="your-project-id",
messages=[
{"role": "user", "content": "What is Python?"}
]
)
# Access the response
print(response.choices[0].message.content)
Response Structure
response = client.chat.completions.create(...)
# Full response structure
print(response.id) # Unique response ID
print(response.model) # Model used
print(response.choices[0].message.role) # "assistant"
print(response.choices[0].message.content) # The response text
print(response.choices[0].finish_reason) # "stop", "length", etc.
print(response.usage.prompt_tokens) # Input tokens
print(response.usage.completion_tokens) # Output tokens
print(response.usage.total_tokens) # Total tokens
System Prompts
⚠️ IMPORTANT: The PremAI SDK does NOT support "role": "system" in the messages array. You MUST use the system_prompt parameter instead.
# ✅ CORRECT: Use system_prompt parameter
response = client.chat.completions.create(
project_id="your-project-id",
system_prompt="You are a helpful Python tutor. Explain concepts with code examples.",
messages=[{"role": "user", "content": "What are decorators?"}]
)
# ❌ INCORRECT: This will NOT work as expected
# response = client.chat.completions.create(
# project_id="your-project-id",
# messages=[
# {"role": "system", "content": "You are a helpful Python tutor."}, # NOT SUPPORTED
# {"role": "user", "content": "What are decorators?"}
# ]
# )
Multi-Turn Conversations
# Maintain conversation history
conversation = []
def chat(user_message: str) -> str:
conversation.append({"role": "user", "content": user_message})
response = client.chat.completions.create(
project_id="your-project-id",
messages=conversation
)
assistant_message = response.choices[0].message.content
conversation.append({"role": "assistant", "content": assistant_message})
return assistant_message
# Usage
print(chat("I want to learn Python"))
print(chat("What should I start with?"))
print(chat("Show me an example"))
Specifying Models
# Available models (check dashboard for current list)
models = [
"llama-3.1-8b-instruct",
"llama-3.1-70b-instruct",
"llama-3.3-70b-instruct",
"mistral-large",
"claude-3-5-sonnet",
"gpt-4o"
]
response = client.chat.completions.create(
project_id="your-project-id",
model="llama-3.3-70b-instruct",
messages=[{"role": "user", "content": "Hello!"}]
)
Generation Parameters
response = client.chat.completions.create(
project_id="your-project-id",
messages=[{"role": "user", "content": "Write a creative story opening"}],
# Creativity control
temperature=0.8, # 0-2, higher = more random
top_p=0.9, # Nucleus sampling threshold
# Length control
max_tokens=1000, # Maximum response length
# Determinism
seed=42 # For reproducible outputs
)
Parameter Guide
| Parameter | Range | Default | Effect |
|---|---|---|---|
| temperature | 0-2 | 1.0 | Randomness. 0=deterministic, 2=very random |
| top_p | 0-1 | 1.0 | Nucleus sampling. Lower = more focused |
| max_tokens | 1-∞ | Model limit | Maximum response length |
| seed | integer | None | Reproducible outputs |
Use Case Examples
Factual Q&A (low temperature):
response = client.chat.completions.create(
project_id="your-project-id",
messages=[{"role": "user", "content": "What is the capital of France?"}],
temperature=0.1,
max_tokens=100
)
Creative writing (high temperature):
response = client.chat.completions.create(
project_id="your-project-id",
messages=[{"role": "user", "content": "Write a poem about coding"}],
temperature=0.9,
max_tokens=500
)
Code generation (moderate temperature):
response = client.chat.completions.create(
project_id="your-project-id",
messages=[{"role": "user", "content": "Write a Python function to sort a list"}],
temperature=0.3,
max_tokens=500
)
Streaming Responses
Basic Streaming
from premai import Prem
client = Prem(api_key="your-api-key")
stream = client.chat.completions.create(
project_id="your-project-id",
messages=[{"role": "user", "content": "Explain machine learning in detail"}],
stream=True
)
for chunk in stream:
    content = chunk.choices[0].delta.get("content")
    if content:
        print(content, end="", flush=True)
print()  # Newline at end
Collecting Streamed Response
def stream_and_collect(messages: list) -> str:
"""Stream response while also collecting the full text."""
stream = client.chat.completions.create(
project_id="your-project-id",
messages=messages,
stream=True
)
full_response = ""
for chunk in stream:
content = chunk.choices[0].delta.get("content")
if content:
print(content, end="", flush=True)
full_response += content
print()
return full_response
# Usage
response = stream_and_collect([
{"role": "user", "content": "Write a short story"}
])
print(f"\nTotal length: {len(response)} characters")
Async Streaming
import asyncio
from premai import AsyncPrem # or AsyncPremAI for newer versions
async def stream_response(prompt: str):
client = AsyncPrem(api_key="your-api-key")
stream = await client.chat.completions.create(
project_id="your-project-id",
messages=[{"role": "user", "content": prompt}],
stream=True
)
async for chunk in stream:
    content = chunk.choices[0].delta.get("content")
    if content:
        print(content, end="", flush=True)
print()
# Run
asyncio.run(stream_response("Explain quantum computing"))
RAG with Repositories
Understanding Repositories
Repositories are document collections for retrieval-augmented generation (RAG). When you query with a repository ID, PremAI:
- Converts your query to an embedding
- Searches the repository for relevant chunks
- Includes retrieved context in the LLM prompt
- Generates a grounded response
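All four steps run server-side, but the retrieval logic is easy to picture. Here is a toy, self-contained sketch of steps 1-3, using tiny hand-made vectors in place of real embeddings (nothing here calls the PremAI API):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, chunks, limit=2, threshold=0.5):
    """Steps 2-3: rank chunks by similarity, keep top matches above a threshold."""
    scored = sorted(
        ((cosine(query_vec, vec), text) for text, vec in chunks),
        reverse=True,
    )
    return [(s, t) for s, t in scored[:limit] if s >= threshold]

# Toy "repository": (chunk text, fake embedding) pairs
chunks = [
    ("Install with pip install premai", [0.9, 0.1, 0.0]),
    ("Our office dog is named Rex", [0.0, 0.1, 0.9]),
    ("Upgrade with pip install -U premai", [0.8, 0.2, 0.1]),
]
query_vec = [1.0, 0.0, 0.0]  # pretend this embeds "how do I install?"

context = retrieve(query_vec, chunks)
# Step 3: the retrieved text becomes context in the LLM prompt
prompt_context = "\n".join(text for _, text in context)
```

The `repositories` parameter in the real API (shown below in Querying with RAG) controls the same knobs: `limit` and `similarity_threshold`.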
Creating a Repository
# Create repository
repository = client.repositories.create(
name="product-documentation",
description="Company product docs and FAQs",
organization="your-org-name"
)
print(f"Created repository: {repository.id}")
Uploading Documents
Note: The document upload API uses the singular repository.document, not repositories.documents.
# Single document
client.repository.document.create(
repository_id="repo-123",
file="./docs/user-guide.pdf"
)
# Multiple documents
documents = [
"./docs/installation.md",
"./docs/api-reference.pdf",
"./docs/troubleshooting.txt"
]
for doc_path in documents:
result = client.repository.document.create(
repository_id="repo-123",
file=doc_path
)
print(f"Uploaded: {doc_path} -> {result.document_id}")
Supported Document Formats
- PDF (.pdf)
- Word (.docx)
- Text (.txt)
- Markdown (.md)
- HTML (.html)
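Before uploading a whole folder, it can help to filter down to the formats the service accepts. A small helper sketch; the upload loop assumes the repository.document.create call shown above:

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt", ".md", ".html"}

def collect_supported(directory: str) -> list[str]:
    """Return paths under `directory` whose extension PremAI can ingest."""
    return sorted(
        str(p) for p in Path(directory).rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )

# Upload loop (sketch, using the client and repository from the examples above):
# for doc_path in collect_supported("./docs"):
#     client.repository.document.create(repository_id="repo-123", file=doc_path)
```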
Querying with RAG
response = client.chat.completions.create(
project_id="your-project-id",
messages=[
{"role": "user", "content": "How do I install the software?"}
],
repositories={
"ids": ["repo-123"], # Repository IDs
"limit": 5, # Number of chunks to retrieve
"similarity_threshold": 0.7 # Minimum relevance score
}
)
print(response.choices[0].message.content)
Accessing Retrieved Context
response = client.chat.completions.create(
project_id="your-project-id",
messages=[{"role": "user", "content": "What are the system requirements?"}],
repositories={"ids": ["repo-123"]}
)
# Main response
print("Answer:", response.choices[0].message.content)
# Retrieved documents
if response.document_chunks:
print("\nSources:")
for chunk in response.document_chunks:
print(f"- {chunk.document_name}")
print(f" Relevance: {chunk.similarity_score:.2f}")
print(f" Content: {chunk.content[:200]}...")
print()
RAG Best Practices
1. Use appropriate thresholds:
# Start with 0.7, adjust based on results
# Lower threshold = more results, potentially less relevant
# Higher threshold = fewer results, more relevant
2. Combine with system prompts:
response = client.chat.completions.create(
project_id="your-project-id",
messages=[{"role": "user", "content": "What's the refund policy?"}],
system_prompt="""Answer based ONLY on the provided context.
If the answer isn't in the context, say "I don't have that information."
Always cite your sources.""",
repositories={"ids": ["repo-policies"], "limit": 3}
)
For more RAG details, see the datasets documentation.
Embeddings
Basic Embedding Generation
response = client.embeddings.create(
project_id="your-project-id",
model="text-embedding-3-large",
input="What is machine learning?"
)
embedding = response.data[0].embedding
print(f"Embedding dimension: {len(embedding)}")
print(f"First 5 values: {embedding[:5]}")
Batch Embeddings
texts = [
"Machine learning is a subset of AI",
"Deep learning uses neural networks",
"Natural language processing handles text",
"Computer vision processes images"
]
response = client.embeddings.create(
project_id="your-project-id",
model="text-embedding-3-large",
input=texts
)
embeddings = [item.embedding for item in response.data]
print(f"Generated {len(embeddings)} embeddings")
Available Embedding Models
| Model | Dimensions | Use Case |
|---|---|---|
| text-embedding-3-large | 3072 | Best quality |
| text-embedding-3-small | 1536 | Good balance |
| text-embedding-ada-002 | 1536 | Legacy compatibility |
Similarity Search
import numpy as np
def cosine_similarity(a: list, b: list) -> float:
a = np.array(a)
b = np.array(b)
return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
# Create embeddings
query = "How to train a model?"
documents = [
"Fine-tuning involves training on specific data",
"The weather is nice today",
"Machine learning models learn from data"
]
query_embedding = client.embeddings.create(
project_id="your-project-id",
input=query
).data[0].embedding
doc_embeddings = client.embeddings.create(
project_id="your-project-id",
input=documents
).data
# Find most similar
similarities = []
for i, doc_emb in enumerate(doc_embeddings):
sim = cosine_similarity(query_embedding, doc_emb.embedding)
similarities.append((sim, documents[i]))
# Sort by similarity
similarities.sort(reverse=True)
for sim, doc in similarities:
print(f"{sim:.3f}: {doc}")
Fine-Tuning
Preparing Training Data
Training data should be in JSONL format:
{"messages": [{"role": "user", "content": "What is your return policy?"}, {"role": "assistant", "content": "We offer a 30-day money-back guarantee on all products."}]}
{"messages": [{"role": "user", "content": "How do I track my order?"}, {"role": "assistant", "content": "Log into your account and click 'Order History' to see tracking information."}]}
{"messages": [{"role": "user", "content": "Do you ship internationally?"}, {"role": "assistant", "content": "Yes, we ship to over 100 countries. Shipping costs vary by location."}]}
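Malformed lines are a common cause of failed fine-tuning jobs, so it's worth linting the file locally first. A minimal validator sketch (the expected schema here is inferred from the examples above):

```python
import json

def validate_training_file(path: str) -> list[str]:
    """Return a list of problems found in a JSONL fine-tuning file (empty = OK)."""
    errors = []
    with open(path) as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as e:
                errors.append(f"line {lineno}: invalid JSON ({e.msg})")
                continue
            messages = record.get("messages")
            if not isinstance(messages, list) or not messages:
                errors.append(f"line {lineno}: missing 'messages' list")
                continue
            for m in messages:
                if (not isinstance(m, dict)
                        or m.get("role") not in {"user", "assistant"}
                        or not m.get("content")):
                    errors.append(f"line {lineno}: bad message entry: {m}")
    return errors
```

Run it before uploading; an empty list means every line parsed and matched the format above.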
Creating a Fine-Tuning Job
# Upload training data
dataset = client.datasets.create(
name="customer-support-v2",
file_path="./training_data.jsonl"
)
print(f"Dataset created: {dataset.id}")
# Start fine-tuning
job = client.finetuning.create(
base_model="llama-3.1-8b-instruct",
dataset_id=dataset.id,
method="lora", # Options: "lora", "qlora", "full"
hyperparameters={
"learning_rate": 2e-4,
"num_epochs": 3,
"batch_size": 8,
"lora_r": 64,
"lora_alpha": 128,
"lora_dropout": 0.05
}
)
print(f"Fine-tuning job started: {job.id}")
Monitoring Progress
import time
while True:
job = client.finetuning.get(job_id=job.id)
print(f"Status: {job.status}")
print(f"Progress: {job.progress}%")
if job.current_loss:
print(f"Current loss: {job.current_loss:.4f}")
if job.status in ["completed", "failed"]:
break
time.sleep(60) # Check every minute
if job.status == "completed":
print(f"Training complete! Model ID: {job.model_id}")
else:
print(f"Training failed: {job.error}")
Using Fine-Tuned Models
response = client.chat.completions.create(
project_id="your-project-id",
model=f"ft:{job.model_id}",
messages=[
{"role": "user", "content": "What's your return policy?"}
]
)
print(response.choices[0].message.content)
Fine-Tuning Best Practices
- Data quality > quantity: 100 excellent examples beat 10,000 mediocre ones
- Consistent formatting: Use the same conversation format throughout
- Start with LoRA: Less expensive, often sufficient
- Hold out test data: Keep 10-20% for evaluation
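Holding out test data can be as simple as a reproducible shuffled split of the JSONL file. A sketch:

```python
import random

def split_dataset(path: str, test_fraction: float = 0.15, seed: int = 42):
    """Split a JSONL file into (train_lines, test_lines), shuffled reproducibly."""
    with open(path) as f:
        lines = [l for l in f if l.strip()]
    rng = random.Random(seed)  # fixed seed -> same split every run
    rng.shuffle(lines)
    cut = int(len(lines) * (1 - test_fraction))
    return lines[:cut], lines[cut:]

# Write the two parts back out, then upload only the train file:
# train, test = split_dataset("./training_data.jsonl")
# open("train.jsonl", "w").writelines(train)
# open("test.jsonl", "w").writelines(test)
```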
For detailed guides, see fine-tuning documentation.
LangChain Integration
Installation
pip install langchain langchain-community
Basic Usage
from langchain_community.chat_models import ChatPremAI
from langchain_core.messages import HumanMessage, SystemMessage
chat = ChatPremAI(
project_id="your-project-id",
premai_api_key="your-api-key"
)
response = chat.invoke([
HumanMessage(content="What is Python?")
])
print(response.content)
Note: For system prompts with LangChain, you can use SystemMessage, but be aware that this may override the system prompt configured in LaunchPad.
With Chains
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
prompt = ChatPromptTemplate.from_messages([
("system", "You are a {role}. Be helpful and concise."),
("user", "{question}")
])
chain = prompt | chat | StrOutputParser()
result = chain.invoke({
"role": "Python expert",
"question": "How do list comprehensions work?"
})
print(result)
RAG with LangChain
from langchain_community.chat_models import ChatPremAI
chat = ChatPremAI(
project_id="your-project-id",
premai_api_key="your-api-key",
repositories={"ids": ["repo-123"]}
)
response = chat.invoke([
HumanMessage(content="What does the documentation say about installation?")
])
print(response.content)
Streaming with LangChain
for chunk in chat.stream([HumanMessage(content="Tell me a story")]):
print(chunk.content, end="", flush=True)
LlamaIndex Integration
Installation
pip install llama-index llama-index-llms-premai
Basic Usage
from llama_index.llms.premai import PremAI
llm = PremAI(
project_id="your-project-id",
api_key="your-api-key"
)
response = llm.complete("What is machine learning?")
print(response.text)
Chat Mode
from llama_index.core.llms import ChatMessage
messages = [
ChatMessage(role="user", content="What is Python?")
]
response = llm.chat(messages)
print(response.message.content)
Building RAG with LlamaIndex
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.premai import PremAI
from llama_index.core import Settings
# Set PremAI as the LLM
Settings.llm = PremAI(
project_id="your-project-id",
api_key="your-api-key"
)
# Load and index documents
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
# Query
query_engine = index.as_query_engine()
response = query_engine.query("What are the key findings?")
print(response)
Error Handling and Best Practices
Exception Types
⚠️ SDK Version Note: Exception class names vary by SDK version. Check the official documentation for your version.
For newer SDK versions (premai on PyPI):
import premai
from premai import PremAI
client = PremAI(api_key="your-api-key")
try:
response = client.chat.completions.create(
project_id="your-project-id",
messages=[{"role": "user", "content": "Hello"}]
)
except premai.APIConnectionError as e:
print("The server could not be reached")
print(e.__cause__) # Underlying exception
except premai.RateLimitError as e:
print("Rate limited. Back off and retry.")
except premai.APIStatusError as e:
print(f"API error: {e.status_code}")
print(e.response)
For older SDK versions (Prem class):
from premai import Prem
client = Prem(api_key="your-api-key")
try:
response = client.chat.completions.create(
project_id="your-project-id",
messages=[{"role": "user", "content": "Hello"}]
)
except Exception as e:
# Check exception type based on your SDK version
print(f"Error: {type(e).__name__}: {e}")
Retry Logic
import time
def chat_with_retry(messages, max_retries=3, backoff=2):
for attempt in range(max_retries):
try:
return client.chat.completions.create(
project_id="your-project-id",
messages=messages
)
except Exception as e:
error_name = type(e).__name__
if "RateLimit" in error_name:
wait_time = getattr(e, 'retry_after', backoff ** attempt)
if attempt < max_retries - 1:
print(f"Rate limited. Waiting {wait_time}s...")
time.sleep(wait_time)
else:
raise
elif "ServerError" in error_name or "APIStatusError" in error_name:
if attempt < max_retries - 1:
time.sleep(backoff ** attempt)
else:
raise
else:
raise
Production Patterns
Connection Pooling
from premai import Prem
# Reuse client across your application
client = Prem(api_key="your-api-key")
def get_response(message: str) -> str:
return client.chat.completions.create(
project_id="your-project-id",
messages=[{"role": "user", "content": message}]
).choices[0].message.content
# DON'T create new clients for each request
Async for High Throughput
import asyncio
from premai import AsyncPrem # or AsyncPremAI
async def process_batch(messages: list[str]) -> list[str]:
client = AsyncPrem(api_key="your-api-key")
tasks = [
client.chat.completions.create(
project_id="your-project-id",
messages=[{"role": "user", "content": msg}]
)
for msg in messages
]
responses = await asyncio.gather(*tasks)
return [r.choices[0].message.content for r in responses]
# Process 100 messages concurrently
results = asyncio.run(process_batch(["message"] * 100))
Conversation Management
def manage_history(messages: list, max_tokens: int = 4000) -> list:
"""Keep conversation within token limits."""
# Rough estimation: 4 chars per token
total_chars = sum(len(m["content"]) for m in messages)
while total_chars > max_tokens * 4 and len(messages) > 2:
# Remove oldest user/assistant pair
messages = messages[2:]
total_chars = sum(len(m["content"]) for m in messages)
return messages
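To see the truncation in action, here is the same helper with a quick self-contained demonstration. Note that 4 characters per token is a rough heuristic, not the model's actual tokenizer:

```python
def manage_history(messages, max_tokens=4000):
    """Drop the oldest user/assistant pair until the rough token estimate fits."""
    total_chars = sum(len(m["content"]) for m in messages)
    while total_chars > max_tokens * 4 and len(messages) > 2:
        messages = messages[2:]
        total_chars = sum(len(m["content"]) for m in messages)
    return messages

# Build a long fake conversation: 10 turns of 1000 characters each side.
history = []
for i in range(10):
    history.append({"role": "user", "content": "x" * 1000})
    history.append({"role": "assistant", "content": "y" * 1000})

# Budget of 1000 tokens ~= 4000 characters: only the last 4 messages fit.
trimmed = manage_history(history, max_tokens=1000)
```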
Complete Examples
Customer Support Bot
import os
from premai import Prem
class CustomerSupportBot:
def __init__(self):
self.client = Prem(api_key=os.environ["PREMAI_API_KEY"])
self.project_id = os.environ["PREMAI_PROJECT_ID"]
self.repository_id = os.environ["PREMAI_REPO_ID"]
self.history = []
def chat(self, message: str) -> dict:
self.history.append({"role": "user", "content": message})
try:
response = self.client.chat.completions.create(
project_id=self.project_id,
messages=self.history,
system_prompt="""You are a helpful customer support agent.
Answer based on the provided documentation.
If unsure, offer to connect with a human agent.""",
repositories={
"ids": [self.repository_id],
"limit": 3,
"similarity_threshold": 0.7
},
temperature=0.3,
max_tokens=500
)
answer = response.choices[0].message.content
self.history.append({"role": "assistant", "content": answer})
sources = []
if response.document_chunks:
sources = [c.document_name for c in response.document_chunks]
return {"answer": answer, "sources": sources}
except Exception as e:
return {"answer": f"Error: {e}", "sources": []}
def reset(self):
self.history = []
# Usage
if __name__ == "__main__":
bot = CustomerSupportBot()
while True:
user_input = input("You: ").strip()
if user_input.lower() == "quit":
break
if user_input.lower() == "reset":
bot.reset()
print("Conversation reset.")
continue
result = bot.chat(user_input)
print(f"Bot: {result['answer']}")
if result['sources']:
print(f"Sources: {', '.join(result['sources'])}")
print()
Document Q&A API
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from premai import Prem
app = FastAPI()
client = Prem()
class Question(BaseModel):
query: str
repository_ids: list[str]
class Answer(BaseModel):
answer: str
sources: list[str]
@app.post("/ask", response_model=Answer)
async def ask_question(question: Question):
try:
response = client.chat.completions.create(
project_id="your-project-id",
messages=[{"role": "user", "content": question.query}],
repositories={"ids": question.repository_ids},
temperature=0.2
)
sources = []
if response.document_chunks:
sources = list(set(c.document_name for c in response.document_chunks))
return Answer(
answer=response.choices[0].message.content,
sources=sources
)
except Exception as e:
raise HTTPException(status_code=500, detail=str(e))
Troubleshooting
Common Issues
"Authentication failed"
- Verify API key is correct
- Check environment variable is set
- Ensure no whitespace in key
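Stray whitespace or quote characters pasted along with the key are a frequent culprit. A small diagnostic sketch (pure Python, no SDK calls):

```python
import os

def check_api_key(var: str = "PREMAI_API_KEY") -> list[str]:
    """Return human-readable problems with the key in the environment (empty = looks fine)."""
    problems = []
    key = os.environ.get(var)
    if key is None:
        problems.append(f"{var} is not set")
        return problems
    stripped = key.strip()
    if key != stripped:
        problems.append("key has leading/trailing whitespace")
    if stripped.startswith(('"', "'")) or stripped.endswith(('"', "'")):
        problems.append("key includes quote characters (remove them)")
    if not stripped:
        problems.append("key is empty")
    return problems
```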
"Project not found"
- Verify project ID in dashboard
- Check you have access to the project
"Rate limit exceeded"
- Implement retry with backoff
- Check your plan limits
- Contact support for increases
"Model not available"
- Check model name spelling
- Verify model is enabled for your project
- Some models require enterprise plans
Frequently Asked Questions
Where does my data go?
PremAI deploys in your cloud account (AWS, GCP, Azure). Data never leaves your infrastructure. See our private AI platform guide.
Is this compatible with OpenAI's SDK?
The API design follows OpenAI conventions for easy migration. Main differences:
- Use the project_id parameter
- Import from premai
- Use the system_prompt parameter (not a system role in messages)
- Repository feature for RAG
What models are available?
50+ models including Llama 3.3, Mistral, Claude, GPT-4. Check your dashboard for current availability.
How do I handle long conversations?
Truncate or summarize old messages to stay within context limits. See the conversation management pattern above.
Next Steps
Additional Resources
- PremAI Documentation
- Private AI Platform Guide
- Fine-Tuning AI Models
- Self-Hosted LLM Guide
- Private LLM Deployment
Last updated February 2026. SDK features may change. See official documentation for latest information.