PremAI Python SDK Quickstart: Complete Guide (2026)

Most AI SDKs make you choose: easy development or data privacy. Use OpenAI's SDK and your data flows through their servers. Self-host everything and you're writing infrastructure code instead of features.

PremAI's Python SDK offers a different path: an OpenAI-like development experience where your data stays in your infrastructure. Same simplicity, complete control.

This guide is comprehensive. We'll cover everything from basic installation to production-ready implementations: chat completions, streaming, RAG with repositories, embeddings, fine-tuning, and integrations with LangChain and LlamaIndex. By the end, you'll have working code for real applications.

Why PremAI SDK vs Other AI SDKs

The AI SDK landscape expanded significantly in 2025-2026:

| SDK | Primary Use Case | Data Privacy | Key Feature |
|---|---|---|---|
| PremAI | Private enterprise AI | Your cloud | Fine-tuning + RAG built-in |
| OpenAI | General purpose | OpenAI servers | GPT-5.2, Agents SDK |
| Anthropic Claude | Coding, reasoning | Anthropic servers | Claude Agent SDK |
| Together AI | Open models | Together servers | Fast fine-tuning |
| Fireworks AI | Low latency | Fireworks servers | Sub-100ms inference |

Why PremAI is different:

  • Your infrastructure: Deploys in your AWS/GCP/Azure
  • No data retention: Data deleted after inference
  • Model portability: Export fine-tuned weights
  • 50+ models: Single API for Llama, Mistral, Claude, GPT, DeepSeek
  • Built-in RAG: No separate vector database needed

2026 SDK Landscape Changes:

  • OpenAI released Agents SDK (open-source, provider-agnostic)
  • Anthropic released Claude Agent SDK (Python/TypeScript)
  • Together AI SDK v2.0 with TypeScript-like typing
  • All major SDKs now use httpx and Pydantic

Installation and Setup

Basic Installation

pip install premai

⚠️ SDK Version Note

IMPORTANT: PremAI has two SDK packages with different APIs:

| Package | Import | Status |
|---|---|---|
| premai (PyPI) | from premai import PremAI | Current, recommended |
| prem-python-sdk (GitHub) | from premai import Prem | Legacy, still supported |

This guide covers both. Check your version:

import premai
print(f"PremAI SDK version: {premai.__version__}")

For versions 1.x+: use the PremAI class.
For older versions: use the Prem class.

Always check the official documentation for your installed version.

Verify Installation

import premai
print(f"PremAI SDK version: {premai.__version__}")

# Test connection (adjust import based on your version)
from premai import Prem  # or PremAI for newer versions
client = Prem(api_key="your-api-key")
print("Connection successful!")

Requirements

  • Python 3.8+ (3.9+ for newest SDK)
  • PremAI account (sign up at premai.io)
  • API key from your PremAI dashboard
  • Project ID (created in dashboard)

Authentication Methods

Method 1: Environment Variable (Recommended)

# In your shell:
export PREMAI_API_KEY="your-api-key-here"

# In Python, the client automatically reads PREMAI_API_KEY
from premai import Prem

client = Prem()

Why this is recommended:

  • API key never appears in code
  • Easy to manage across environments
  • Works with container orchestration
  • Compatible with secret managers

Method 2: Direct Initialization

from premai import Prem

client = Prem(api_key="your-api-key-here")

Use when:

  • Quick testing
  • Notebooks and prototyping
  • Keys loaded from secret manager at runtime

Security Best Practices

import os
from premai import Prem

# Never do this
# client = Prem(api_key="sk-abc123...")  # Key in code!

# Do this instead
api_key = os.environ.get("PREMAI_API_KEY")
if not api_key:
    raise ValueError("PREMAI_API_KEY environment variable not set")

client = Prem(api_key=api_key)

Understanding Projects in PremAI

What is a Project?

A project is your workspace in PremAI. Each project has:

  • Default model configuration - Which model to use when none specified
  • System prompt - Default instructions for all conversations
  • Connected repositories - Document collections for RAG
  • Usage tracking - Separate metrics per project
  • Team access - Who can use this project

Creating a Project

Projects are created in the PremAI dashboard:

  1. Log in to app.premai.io
  2. Click "New Project"
  3. Configure default model
  4. Set system prompt (optional)
  5. Note your Project ID

Using Projects

from premai import Prem

client = Prem(api_key="your-api-key")

# Use project settings (model, system prompt from dashboard)
response = client.chat.completions.create(
    project_id="your-project-id",
    messages=[{"role": "user", "content": "Hello!"}]
)

# Override project defaults
response = client.chat.completions.create(
    project_id="your-project-id",
    model="llama-3.1-70b-instruct",  # Override default model
    system_prompt="You are a pirate.",  # Override system prompt
    messages=[{"role": "user", "content": "Hello!"}]
)

Multiple Projects Pattern

from premai import Prem

client = Prem(api_key="your-api-key")

# Different projects for different use cases
PROJECTS = {
    "customer_support": "proj-cs-123",
    "code_assistant": "proj-code-456",
    "document_qa": "proj-docs-789"
}

def chat(project_name: str, message: str):
    return client.chat.completions.create(
        project_id=PROJECTS[project_name],
        messages=[{"role": "user", "content": message}]
    )

Chat Completions: Complete Guide

Basic Chat Completion

from premai import Prem

client = Prem(api_key="your-api-key")

response = client.chat.completions.create(
    project_id="your-project-id",
    messages=[
        {"role": "user", "content": "What is Python?"}
    ]
)

# Access the response
print(response.choices[0].message.content)

Response Structure

response = client.chat.completions.create(...)

# Full response structure
print(response.id)                          # Unique response ID
print(response.model)                       # Model used
print(response.choices[0].message.role)     # "assistant"
print(response.choices[0].message.content)  # The response text
print(response.choices[0].finish_reason)    # "stop", "length", etc.
print(response.usage.prompt_tokens)         # Input tokens
print(response.usage.completion_tokens)     # Output tokens
print(response.usage.total_tokens)          # Total tokens

System Prompts

⚠️ IMPORTANT: The PremAI SDK does NOT support "role": "system" in the messages array. You MUST use the system_prompt parameter instead.

# ✅ CORRECT: Use system_prompt parameter
response = client.chat.completions.create(
    project_id="your-project-id",
    system_prompt="You are a helpful Python tutor. Explain concepts with code examples.",
    messages=[{"role": "user", "content": "What are decorators?"}]
)

# ❌ INCORRECT: This will NOT work as expected
# response = client.chat.completions.create(
#     project_id="your-project-id",
#     messages=[
#         {"role": "system", "content": "You are a helpful Python tutor."},  # NOT SUPPORTED
#         {"role": "user", "content": "What are decorators?"}
#     ]
# )
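If you are migrating OpenAI-style message lists, a small helper can pull the system message out into the separate system_prompt argument. This is a hypothetical convenience function, not part of the SDK:

```python
def split_system_prompt(messages):
    """Separate an OpenAI-style system message from the conversation,
    returning (system_prompt, remaining_messages) for PremAI calls."""
    system_prompt = None
    remaining = []
    for msg in messages:
        if msg["role"] == "system" and system_prompt is None:
            system_prompt = msg["content"]
        else:
            remaining.append(msg)
    return system_prompt, remaining

system_prompt, messages = split_system_prompt([
    {"role": "system", "content": "You are a helpful Python tutor."},
    {"role": "user", "content": "What are decorators?"},
])
# system_prompt -> "You are a helpful Python tutor."
# messages      -> [{"role": "user", "content": "What are decorators?"}]
```

Pass the two results as system_prompt= and messages= in the call shown above.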

Multi-Turn Conversations

# Maintain conversation history
conversation = []

def chat(user_message: str) -> str:
    conversation.append({"role": "user", "content": user_message})
    
    response = client.chat.completions.create(
        project_id="your-project-id",
        messages=conversation
    )
    
    assistant_message = response.choices[0].message.content
    conversation.append({"role": "assistant", "content": assistant_message})
    
    return assistant_message

# Usage
print(chat("I want to learn Python"))
print(chat("What should I start with?"))
print(chat("Show me an example"))

Specifying Models

# Available models (check dashboard for current list)
models = [
    "llama-3.1-8b-instruct",
    "llama-3.1-70b-instruct",
    "llama-3.3-70b-instruct",
    "mistral-large",
    "claude-3-5-sonnet",
    "gpt-4o"
]

response = client.chat.completions.create(
    project_id="your-project-id",
    model="llama-3.3-70b-instruct",
    messages=[{"role": "user", "content": "Hello!"}]
)

Generation Parameters

response = client.chat.completions.create(
    project_id="your-project-id",
    messages=[{"role": "user", "content": "Write a creative story opening"}],
    
    # Creativity control
    temperature=0.8,       # 0-2, higher = more random
    top_p=0.9,             # Nucleus sampling threshold
    
    # Length control
    max_tokens=1000,       # Maximum response length
    
    # Determinism
    seed=42                # For reproducible outputs
)

Parameter Guide

| Parameter | Range | Default | Effect |
|---|---|---|---|
| temperature | 0-2 | 1.0 | Randomness. 0 = deterministic, 2 = very random |
| top_p | 0-1 | 1.0 | Nucleus sampling. Lower = more focused |
| max_tokens | 1-∞ | Model limit | Maximum response length |
| seed | integer | None | Reproducible outputs |

Use Case Examples

Factual Q&A (low temperature):

response = client.chat.completions.create(
    project_id="your-project-id",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    temperature=0.1,
    max_tokens=100
)

Creative writing (high temperature):

response = client.chat.completions.create(
    project_id="your-project-id",
    messages=[{"role": "user", "content": "Write a poem about coding"}],
    temperature=0.9,
    max_tokens=500
)

Code generation (moderate temperature):

response = client.chat.completions.create(
    project_id="your-project-id",
    messages=[{"role": "user", "content": "Write a Python function to sort a list"}],
    temperature=0.3,
    max_tokens=500
)

Streaming Responses

Basic Streaming

from premai import Prem

client = Prem(api_key="your-api-key")

stream = client.chat.completions.create(
    project_id="your-project-id",
    messages=[{"role": "user", "content": "Explain machine learning in detail"}],
    stream=True
)

for chunk in stream:
    content = chunk.choices[0].delta.get("content")
    if content:
        print(content, end="", flush=True)

print()  # Newline at end

Collecting Streamed Response

def stream_and_collect(messages: list) -> str:
    """Stream response while also collecting the full text."""
    stream = client.chat.completions.create(
        project_id="your-project-id",
        messages=messages,
        stream=True
    )
    
    full_response = ""
    for chunk in stream:
        content = chunk.choices[0].delta.get("content")
        if content:
            print(content, end="", flush=True)
            full_response += content
    
    print()
    return full_response

# Usage
response = stream_and_collect([
    {"role": "user", "content": "Write a short story"}
])
print(f"\nTotal length: {len(response)} characters")

Async Streaming

import asyncio
from premai import AsyncPrem  # or AsyncPremAI for newer versions

async def stream_response(prompt: str):
    client = AsyncPrem(api_key="your-api-key")
    
    stream = await client.chat.completions.create(
        project_id="your-project-id",
        messages=[{"role": "user", "content": prompt}],
        stream=True
    )
    
    async for chunk in stream:
        content = chunk.choices[0].delta.get("content")
        if content:
            print(content, end="", flush=True)
    
    print()

# Run
asyncio.run(stream_response("Explain quantum computing"))

RAG with Repositories

Understanding Repositories

Repositories are document collections for retrieval-augmented generation (RAG). When you query with a repository ID, PremAI:

  1. Converts your query to an embedding
  2. Searches the repository for relevant chunks
  3. Includes retrieved context in the LLM prompt
  4. Generates a grounded response
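Steps 1-3 can be sketched locally with toy vectors. This is an illustration of the retrieval idea only, not PremAI's internals:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Steps 1-2: embed the query, then score every stored chunk (made-up 3-d vectors)
chunks = {
    "Install with pip install acme": [0.9, 0.1, 0.0],
    "Refunds take 5 business days":  [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.1]  # pretend embedding of "How do I install it?"

# Step 3: keep the best-matching chunk to prepend as context
ranked = sorted(chunks, key=lambda c: cosine(query_vec, chunks[c]), reverse=True)
context = ranked[0]
# Step 4: the LLM then answers grounded in `context`
```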

Creating a Repository

# Create repository
repository = client.repositories.create(
    name="product-documentation",
    description="Company product docs and FAQs",
    organization="your-org-name"
)

print(f"Created repository: {repository.id}")

Uploading Documents

Note: The document upload API uses the singular repository.document, not repositories.documents.

# Single document
client.repository.document.create(
    repository_id="repo-123",
    file="./docs/user-guide.pdf"
)

# Multiple documents
documents = [
    "./docs/installation.md",
    "./docs/api-reference.pdf",
    "./docs/troubleshooting.txt"
]

for doc_path in documents:
    result = client.repository.document.create(
        repository_id="repo-123",
        file=doc_path
    )
    print(f"Uploaded: {doc_path} -> {result.document_id}")

Supported Document Formats

  • PDF (.pdf)
  • Word (.docx)
  • Text (.txt)
  • Markdown (.md)
  • HTML (.html)
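Before batch uploads, it can help to skip files the repository will reject. A small hypothetical helper (the SDK does not provide this):

```python
from pathlib import Path

# Formats listed above
SUPPORTED = {".pdf", ".docx", ".txt", ".md", ".html"}

def uploadable(paths):
    """Keep only files whose extension is in a supported format."""
    return [p for p in paths if Path(p).suffix.lower() in SUPPORTED]

files = ["guide.pdf", "notes.MD", "diagram.png", "faq.txt"]
# uploadable(files) -> ["guide.pdf", "notes.MD", "faq.txt"]
```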

Querying with RAG

response = client.chat.completions.create(
    project_id="your-project-id",
    messages=[
        {"role": "user", "content": "How do I install the software?"}
    ],
    repositories={
        "ids": ["repo-123"],          # Repository IDs
        "limit": 5,                    # Number of chunks to retrieve
        "similarity_threshold": 0.7   # Minimum relevance score
    }
)

print(response.choices[0].message.content)

Accessing Retrieved Context

response = client.chat.completions.create(
    project_id="your-project-id",
    messages=[{"role": "user", "content": "What are the system requirements?"}],
    repositories={"ids": ["repo-123"]}
)

# Main response
print("Answer:", response.choices[0].message.content)

# Retrieved documents
if response.document_chunks:
    print("\nSources:")
    for chunk in response.document_chunks:
        print(f"- {chunk.document_name}")
        print(f"  Relevance: {chunk.similarity_score:.2f}")
        print(f"  Content: {chunk.content[:200]}...")
        print()

RAG Best Practices

1. Use appropriate thresholds:

Start with 0.7 and adjust based on results. A lower threshold returns more chunks that are potentially less relevant; a higher threshold returns fewer, more relevant chunks.
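The trade-off can be seen with made-up scores (a toy illustration, not SDK code):

```python
def filter_chunks(chunks, threshold):
    """Keep chunks at or above the similarity threshold."""
    return [c for c in chunks if c["similarity_score"] >= threshold]

scored = [
    {"content": "Install steps", "similarity_score": 0.91},
    {"content": "Changelog",     "similarity_score": 0.72},
    {"content": "Party photos",  "similarity_score": 0.35},
]
# threshold 0.7 keeps two chunks; 0.9 keeps only the most relevant one
```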

2. Combine with system prompts:

response = client.chat.completions.create(
    project_id="your-project-id",
    messages=[{"role": "user", "content": "What's the refund policy?"}],
    system_prompt="""Answer based ONLY on the provided context.
    If the answer isn't in the context, say "I don't have that information."
    Always cite your sources.""",
    repositories={"ids": ["repo-policies"], "limit": 3}
)

For more RAG details, see the datasets documentation.


Embeddings

Basic Embedding Generation

response = client.embeddings.create(
    project_id="your-project-id",
    model="text-embedding-3-large",
    input="What is machine learning?"
)

embedding = response.data[0].embedding
print(f"Embedding dimension: {len(embedding)}")
print(f"First 5 values: {embedding[:5]}")

Batch Embeddings

texts = [
    "Machine learning is a subset of AI",
    "Deep learning uses neural networks",
    "Natural language processing handles text",
    "Computer vision processes images"
]

response = client.embeddings.create(
    project_id="your-project-id",
    model="text-embedding-3-large",
    input=texts
)

embeddings = [item.embedding for item in response.data]
print(f"Generated {len(embeddings)} embeddings")

Available Embedding Models

| Model | Dimensions | Use Case |
|---|---|---|
| text-embedding-3-large | 3072 | Best quality |
| text-embedding-3-small | 1536 | Good balance |
| text-embedding-ada-002 | 1536 | Legacy compatibility |

Semantic Similarity Search

import numpy as np

def cosine_similarity(a: list, b: list) -> float:
    a = np.array(a)
    b = np.array(b)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Create embeddings
query = "How to train a model?"
documents = [
    "Fine-tuning involves training on specific data",
    "The weather is nice today",
    "Machine learning models learn from data"
]

query_embedding = client.embeddings.create(
    project_id="your-project-id",
    input=query
).data[0].embedding

doc_embeddings = client.embeddings.create(
    project_id="your-project-id",
    input=documents
).data

# Find most similar
similarities = []
for i, doc_emb in enumerate(doc_embeddings):
    sim = cosine_similarity(query_embedding, doc_emb.embedding)
    similarities.append((sim, documents[i]))

# Sort by similarity
similarities.sort(reverse=True)
for sim, doc in similarities:
    print(f"{sim:.3f}: {doc}")

Fine-Tuning

Preparing Training Data

Training data should be in JSONL format:

{"messages": [{"role": "user", "content": "What is your return policy?"}, {"role": "assistant", "content": "We offer a 30-day money-back guarantee on all products."}]}
{"messages": [{"role": "user", "content": "How do I track my order?"}, {"role": "assistant", "content": "Log into your account and click 'Order History' to see tracking information."}]}
{"messages": [{"role": "user", "content": "Do you ship internationally?"}, {"role": "assistant", "content": "Yes, we ship to over 100 countries. Shipping costs vary by location."}]}

Creating a Fine-Tuning Job

# Upload training data
dataset = client.datasets.create(
    name="customer-support-v2",
    file_path="./training_data.jsonl"
)

print(f"Dataset created: {dataset.id}")

# Start fine-tuning
job = client.finetuning.create(
    base_model="llama-3.1-8b-instruct",
    dataset_id=dataset.id,
    method="lora",  # Options: "lora", "qlora", "full"
    hyperparameters={
        "learning_rate": 2e-4,
        "num_epochs": 3,
        "batch_size": 8,
        "lora_r": 64,
        "lora_alpha": 128,
        "lora_dropout": 0.05
    }
)

print(f"Fine-tuning job started: {job.id}")

Monitoring Progress

import time

while True:
    job = client.finetuning.get(job_id=job.id)
    
    print(f"Status: {job.status}")
    print(f"Progress: {job.progress}%")
    
    if job.current_loss:
        print(f"Current loss: {job.current_loss:.4f}")
    
    if job.status in ["completed", "failed"]:
        break
    
    time.sleep(60)  # Check every minute

if job.status == "completed":
    print(f"Training complete! Model ID: {job.model_id}")
else:
    print(f"Training failed: {job.error}")

Using Fine-Tuned Models

response = client.chat.completions.create(
    project_id="your-project-id",
    model=f"ft:{job.model_id}",
    messages=[
        {"role": "user", "content": "What's your return policy?"}
    ]
)

print(response.choices[0].message.content)

Fine-Tuning Best Practices

  • Data quality > quantity: 100 excellent examples beat 10,000 mediocre ones
  • Consistent formatting: Use the same conversation format throughout
  • Start with LoRA: Less expensive, often sufficient
  • Hold out test data: Keep 10-20% for evaluation
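Holding out test data can be as simple as a seeded shuffle and split over your training examples. A minimal sketch (hypothetical helper, done before uploading the dataset):

```python
import random

def train_test_split(examples, test_fraction=0.15, seed=42):
    """Hold out a fraction of examples for evaluation after fine-tuning."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    cut = max(1, int(len(shuffled) * test_fraction))
    return shuffled[cut:], shuffled[:cut]

train, test = train_test_split(list(range(100)))
# len(train) == 85, len(test) == 15
```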

For detailed guides, see fine-tuning documentation.


LangChain Integration

Installation

pip install langchain langchain-community

Basic Usage

from langchain_community.chat_models import ChatPremAI
from langchain_core.messages import HumanMessage, SystemMessage

chat = ChatPremAI(
    project_id="your-project-id",
    premai_api_key="your-api-key"
)

response = chat.invoke([
    HumanMessage(content="What is Python?")
])

print(response.content)

Note: For system prompts with LangChain, you can use SystemMessage, but be aware this may override your LaunchPad system prompt.

With Chains

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a {role}. Be helpful and concise."),
    ("user", "{question}")
])

chain = prompt | chat | StrOutputParser()

result = chain.invoke({
    "role": "Python expert",
    "question": "How do list comprehensions work?"
})

print(result)

RAG with LangChain

from langchain_community.chat_models import ChatPremAI

chat = ChatPremAI(
    project_id="your-project-id",
    premai_api_key="your-api-key",
    repositories={"ids": ["repo-123"]}
)

response = chat.invoke([
    HumanMessage(content="What does the documentation say about installation?")
])

print(response.content)

Streaming with LangChain

for chunk in chat.stream([HumanMessage(content="Tell me a story")]):
    print(chunk.content, end="", flush=True)

LlamaIndex Integration

Installation

pip install llama-index llama-index-llms-premai

Basic Usage

from llama_index.llms.premai import PremAI

llm = PremAI(
    project_id="your-project-id",
    api_key="your-api-key"
)

response = llm.complete("What is machine learning?")
print(response.text)

Chat Mode

from llama_index.core.llms import ChatMessage

messages = [
    ChatMessage(role="user", content="What is Python?")
]

response = llm.chat(messages)
print(response.message.content)

Building RAG with LlamaIndex

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.premai import PremAI
from llama_index.core import Settings

# Set PremAI as the LLM
Settings.llm = PremAI(
    project_id="your-project-id",
    api_key="your-api-key"
)

# Load and index documents
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Query
query_engine = index.as_query_engine()
response = query_engine.query("What are the key findings?")
print(response)

Error Handling and Best Practices

Exception Types

⚠️ SDK Version Note: Exception class names vary by SDK version. Check the official documentation for your version.

For newer SDK versions (premai on PyPI):

import premai
from premai import PremAI

client = PremAI(api_key="your-api-key")

try:
    response = client.chat.completions.create(
        project_id="your-project-id",
        messages=[{"role": "user", "content": "Hello"}]
    )
except premai.APIConnectionError as e:
    print("The server could not be reached")
    print(e.__cause__)  # Underlying exception
except premai.RateLimitError as e:
    print("Rate limited. Back off and retry.")
except premai.APIStatusError as e:
    print(f"API error: {e.status_code}")
    print(e.response)

For older SDK versions (Prem class):

from premai import Prem

client = Prem(api_key="your-api-key")

try:
    response = client.chat.completions.create(
        project_id="your-project-id",
        messages=[{"role": "user", "content": "Hello"}]
    )
except Exception as e:
    # Check exception type based on your SDK version
    print(f"Error: {type(e).__name__}: {e}")

Retry Logic

import time

def chat_with_retry(messages, max_retries=3, backoff=2):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                project_id="your-project-id",
                messages=messages
            )
        except Exception as e:
            error_name = type(e).__name__
            if "RateLimit" in error_name:
                wait_time = getattr(e, 'retry_after', backoff ** attempt)
                if attempt < max_retries - 1:
                    print(f"Rate limited. Waiting {wait_time}s...")
                    time.sleep(wait_time)
                else:
                    raise
            elif "ServerError" in error_name or "APIStatusError" in error_name:
                if attempt < max_retries - 1:
                    time.sleep(backoff ** attempt)
                else:
                    raise
            else:
                raise

Production Patterns

Connection Pooling

from premai import Prem

# Reuse client across your application
client = Prem(api_key="your-api-key")

def get_response(message: str) -> str:
    return client.chat.completions.create(
        project_id="your-project-id",
        messages=[{"role": "user", "content": message}]
    ).choices[0].message.content

# DON'T create new clients for each request

Async for High Throughput

import asyncio
from premai import AsyncPrem  # or AsyncPremAI

async def process_batch(messages: list[str]) -> list[str]:
    client = AsyncPrem(api_key="your-api-key")
    
    tasks = [
        client.chat.completions.create(
            project_id="your-project-id",
            messages=[{"role": "user", "content": msg}]
        )
        for msg in messages
    ]
    
    responses = await asyncio.gather(*tasks)
    return [r.choices[0].message.content for r in responses]

# Process 100 messages concurrently
results = asyncio.run(process_batch(["message"] * 100))

Conversation Management

def manage_history(messages: list, max_tokens: int = 4000) -> list:
    """Keep conversation within token limits."""
    # Rough estimation: 4 chars per token
    total_chars = sum(len(m["content"]) for m in messages)
    
    while total_chars > max_tokens * 4 and len(messages) > 2:
        # Remove oldest user/assistant pair
        messages = messages[2:]
        total_chars = sum(len(m["content"]) for m in messages)
    
    return messages

Complete Examples

Customer Support Bot

import os
from premai import Prem

class CustomerSupportBot:
    def __init__(self):
        self.client = Prem(api_key=os.environ["PREMAI_API_KEY"])
        self.project_id = os.environ["PREMAI_PROJECT_ID"]
        self.repository_id = os.environ["PREMAI_REPO_ID"]
        self.history = []
    
    def chat(self, message: str) -> dict:
        self.history.append({"role": "user", "content": message})
        
        try:
            response = self.client.chat.completions.create(
                project_id=self.project_id,
                messages=self.history,
                system_prompt="""You are a helpful customer support agent.
                Answer based on the provided documentation.
                If unsure, offer to connect with a human agent.""",
                repositories={
                    "ids": [self.repository_id],
                    "limit": 3,
                    "similarity_threshold": 0.7
                },
                temperature=0.3,
                max_tokens=500
            )
            
            answer = response.choices[0].message.content
            self.history.append({"role": "assistant", "content": answer})
            
            sources = []
            if response.document_chunks:
                sources = [c.document_name for c in response.document_chunks]
            
            return {"answer": answer, "sources": sources}
        
        except Exception as e:
            return {"answer": f"Error: {e}", "sources": []}
    
    def reset(self):
        self.history = []

# Usage
if __name__ == "__main__":
    bot = CustomerSupportBot()
    
    while True:
        user_input = input("You: ").strip()
        if user_input.lower() == "quit":
            break
        if user_input.lower() == "reset":
            bot.reset()
            print("Conversation reset.")
            continue
        
        result = bot.chat(user_input)
        print(f"Bot: {result['answer']}")
        if result['sources']:
            print(f"Sources: {', '.join(result['sources'])}")
        print()

Document Q&A API

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from premai import Prem

app = FastAPI()
client = Prem()  # reads PREMAI_API_KEY from the environment

class Question(BaseModel):
    query: str
    repository_ids: list[str]

class Answer(BaseModel):
    answer: str
    sources: list[str]

@app.post("/ask", response_model=Answer)
async def ask_question(question: Question):
    try:
        response = client.chat.completions.create(
            project_id="your-project-id",
            messages=[{"role": "user", "content": question.query}],
            repositories={"ids": question.repository_ids},
            temperature=0.2
        )
        
        sources = []
        if response.document_chunks:
            sources = list(set(c.document_name for c in response.document_chunks))
        
        return Answer(
            answer=response.choices[0].message.content,
            sources=sources
        )
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

Troubleshooting

Common Issues

"Authentication failed"

  • Verify API key is correct
  • Check environment variable is set
  • Ensure no whitespace in key
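Stray whitespace from copy-pasting is easy to guard against when loading the key. A small sketch (hypothetical helper):

```python
import os

def load_api_key(var="PREMAI_API_KEY"):
    """Read the key from the environment, stripping stray whitespace,
    a common cause of authentication failures."""
    key = os.environ.get(var, "").strip()
    if not key:
        raise ValueError(f"{var} environment variable not set")
    return key
```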

"Project not found"

  • Verify project ID in dashboard
  • Check you have access to the project

"Rate limit exceeded"

  • Implement retry with backoff
  • Check your plan limits
  • Contact support for increases

"Model not available"

  • Check model name spelling
  • Verify model is enabled for your project
  • Some models require enterprise plans

Frequently Asked Questions

Where does my data go?

PremAI deploys in your cloud account (AWS, GCP, Azure). Data never leaves your infrastructure. See our private AI platform guide.

Is this compatible with OpenAI's SDK?

The API design follows OpenAI conventions for easy migration. Main differences:

  • Use project_id parameter
  • Import from premai
  • Use system_prompt parameter (not system role in messages)
  • Repository feature for RAG

What models are available?

50+ models including Llama 3.3, Mistral, Claude, GPT-4. Check your dashboard for current availability.

How do I handle long conversations?

Truncate or summarize old messages to stay within context limits. See the conversation management pattern above.

Last updated February 2026. SDK features may change. See official documentation for latest information.
