PremAI Python SDK Quickstart: Complete Guide (2026)
Most AI SDKs make you choose: easy development or data privacy. Use OpenAI's SDK and your data flows through their servers. Self-host everything and you're writing infrastructure code instead of features.
PremAI's Python SDK offers a different path: an OpenAI-like development experience where your data stays in your infrastructure. Same simplicity, complete control.
This guide is comprehensive. We'll cover everything from basic installation to production-ready implementations: chat completions, streaming, RAG with repositories, embeddings, fine-tuning, and integrations with LangChain and LlamaIndex. By the end, you'll have working code for real applications.
Why PremAI SDK vs Other AI SDKs
The AI SDK landscape expanded significantly in 2025-2026:
| SDK | Primary Use Case | Data Privacy | Key Feature |
|---|---|---|---|
| PremAI | Private enterprise AI | Your cloud | Fine-tuning + RAG built-in |
| OpenAI | General purpose | OpenAI servers | GPT-5.2, Agents SDK |
| Anthropic Claude | Coding, reasoning | Anthropic servers | Claude Agent SDK |
| Together AI | Open models | Together servers | Fast fine-tuning |
| Fireworks AI | Low latency | Fireworks servers | Sub-100ms inference |
Why PremAI is different:
- Your infrastructure: Deploys in your AWS/GCP/Azure
- No data retention: Data deleted after inference
- Model portability: Export fine-tuned weights
- 50+ models: Single API for Llama, Mistral, Claude, GPT, DeepSeek
- Built-in RAG: No separate vector database needed
2026 SDK Landscape Changes:
- OpenAI released Agents SDK (open-source, provider-agnostic)
- Anthropic released Claude Agent SDK (Python/TypeScript)
- Together AI SDK v2.0 with TypeScript-like typing
- All major SDKs now use httpx and Pydantic
Installation and Setup
Basic Installation
pip install premai
⚠️ SDK Version Note
IMPORTANT: PremAI has two SDK packages with different APIs:
| Package | Import | Status |
|---|---|---|
| premai (PyPI) | from premai import PremAI | Current, recommended |
| prem-python-sdk (GitHub) | from premai import Prem | Legacy, still supported |
This guide covers both. Check your version:
import premai
print(f"PremAI SDK version: {premai.__version__}")
For versions 1.x+, use the PremAI class. For older versions, use the Prem class.
Always check the official documentation for your installed version.
Verify Installation
import premai
print(f"PremAI SDK version: {premai.__version__}")
# Test connection (adjust import based on your version)
from premai import Prem # or PremAI for newer versions
client = Prem(api_key="your-api-key")
print("Connection successful!")
Requirements
- Python 3.8+ (3.9+ for newest SDK)
- PremAI account (sign up at premai.io)
- API key from your PremAI dashboard
- Project ID (created in dashboard)
Authentication Methods
Method 1: Environment Variable (Recommended for Production)
# Set environment variable
export PREMAI_API_KEY="your-api-key-here"
from premai import Prem
# Client automatically reads PREMAI_API_KEY
client = Prem()
Why this is recommended:
- API key never appears in code
- Easy to manage across environments
- Works with container orchestration
- Compatible with secret managers
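If the same code runs against several environments, one lightweight convention is a per-environment variable that falls back to the generic one. This is a sketch of that convention only; the PREMAI_API_KEY_DEV / PREMAI_API_KEY_STAGING names are illustrative and not part of the SDK:

```python
import os

def resolve_api_key(env: str = "production") -> str:
    """Pick an API key per deployment environment, falling back to the
    generic PREMAI_API_KEY variable when no env-specific one is set."""
    specific = os.environ.get(f"PREMAI_API_KEY_{env.upper()}")
    generic = os.environ.get("PREMAI_API_KEY")
    key = specific or generic
    if not key:
        raise ValueError(f"No API key found for environment '{env}'")
    return key.strip()  # guard against stray whitespace from copy-paste
```

The resolved key can then be passed to the client constructor as usual.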
Method 2: Direct Initialization
from premai import Prem
client = Prem(api_key="your-api-key-here")
Use when:
- Quick testing
- Notebooks and prototyping
- Keys loaded from secret manager at runtime
Security Best Practices
import os
from premai import Prem
# Never do this
# client = Prem(api_key="sk-abc123...") # Key in code!
# Do this instead
api_key = os.environ.get("PREMAI_API_KEY")
if not api_key:
raise ValueError("PREMAI_API_KEY environment variable not set")
client = Prem(api_key=api_key)
Understanding Projects in PremAI
What is a Project?
A project is your workspace in PremAI. Each project has:
- Default model configuration - Which model to use when none specified
- System prompt - Default instructions for all conversations
- Connected repositories - Document collections for RAG
- Usage tracking - Separate metrics per project
- Team access - Who can use this project
Creating a Project
Projects are created in the PremAI dashboard:
- Log in to app.premai.io
- Click "New Project"
- Configure default model
- Set system prompt (optional)
- Note your Project ID
Using Projects
from premai import Prem
client = Prem(api_key="your-api-key")
# Use project settings (model, system prompt from dashboard)
response = client.chat.completions.create(
project_id="your-project-id",
messages=[{"role": "user", "content": "Hello!"}]
)
# Override project defaults
response = client.chat.completions.create(
project_id="your-project-id",
model="llama-3.1-70b-instruct", # Override default model
system_prompt="You are a pirate.", # Override system prompt
messages=[{"role": "user", "content": "Hello!"}]
)
Multiple Projects Pattern
from premai import Prem
client = Prem(api_key="your-api-key")
# Different projects for different use cases
PROJECTS = {
"customer_support": "proj-cs-123",
"code_assistant": "proj-code-456",
"document_qa": "proj-docs-789"
}
def chat(project_name: str, message: str):
return client.chat.completions.create(
project_id=PROJECTS[project_name],
messages=[{"role": "user", "content": message}]
)
Chat Completions: Complete Guide
Basic Chat Completion
from premai import Prem
client = Prem(api_key="your-api-key")
response = client.chat.completions.create(
project_id="your-project-id",
messages=[
{"role": "user", "content": "What is Python?"}
]
)
# Access the response
print(response.choices[0].message.content)
Response Structure
response = client.chat.completions.create(...)
# Full response structure
print(response.id) # Unique response ID
print(response.model) # Model used
print(response.choices[0].message.role) # "assistant"
print(response.choices[0].message.content) # The response text
print(response.choices[0].finish_reason) # "stop", "length", etc.
print(response.usage.prompt_tokens) # Input tokens
print(response.usage.completion_tokens) # Output tokens
print(response.usage.total_tokens) # Total tokens
System Prompts
⚠️ IMPORTANT: The PremAI SDK does NOT support "role": "system" in the messages array. You MUST use the system_prompt parameter instead.
# ✅ CORRECT: Use system_prompt parameter
response = client.chat.completions.create(
project_id="your-project-id",
system_prompt="You are a helpful Python tutor. Explain concepts with code examples.",
messages=[{"role": "user", "content": "What are decorators?"}]
)
# ❌ INCORRECT: This will NOT work as expected
# response = client.chat.completions.create(
# project_id="your-project-id",
# messages=[
# {"role": "system", "content": "You are a helpful Python tutor."}, # NOT SUPPORTED
# {"role": "user", "content": "What are decorators?"}
# ]
# )
Multi-Turn Conversations
# Maintain conversation history
conversation = []
def chat(user_message: str) -> str:
conversation.append({"role": "user", "content": user_message})
response = client.chat.completions.create(
project_id="your-project-id",
messages=conversation
)
assistant_message = response.choices[0].message.content
conversation.append({"role": "assistant", "content": assistant_message})
return assistant_message
# Usage
print(chat("I want to learn Python"))
print(chat("What should I start with?"))
print(chat("Show me an example"))
Specifying Models
# Available models (check dashboard for current list)
models = [
"llama-3.1-8b-instruct",
"llama-3.1-70b-instruct",
"llama-3.3-70b-instruct",
"mistral-large",
"claude-3-5-sonnet",
"gpt-4o"
]
response = client.chat.completions.create(
project_id="your-project-id",
model="llama-3.3-70b-instruct",
messages=[{"role": "user", "content": "Hello!"}]
)
Generation Parameters
response = client.chat.completions.create(
project_id="your-project-id",
messages=[{"role": "user", "content": "Write a creative story opening"}],
# Creativity control
temperature=0.8, # 0-2, higher = more random
top_p=0.9, # Nucleus sampling threshold
# Length control
max_tokens=1000, # Maximum response length
# Determinism
seed=42 # For reproducible outputs
)
Parameter Guide
| Parameter | Range | Default | Effect |
|---|---|---|---|
| temperature | 0-2 | 1.0 | Randomness. 0=deterministic, 2=very random |
| top_p | 0-1 | 1.0 | Nucleus sampling. Lower = more focused |
| max_tokens | 1-∞ | Model limit | Maximum response length |
| seed | integer | None | Reproducible outputs |
Use Case Examples
Factual Q&A (low temperature):
response = client.chat.completions.create(
project_id="your-project-id",
messages=[{"role": "user", "content": "What is the capital of France?"}],
temperature=0.1,
max_tokens=100
)
Creative writing (high temperature):
response = client.chat.completions.create(
project_id="your-project-id",
messages=[{"role": "user", "content": "Write a poem about coding"}],
temperature=0.9,
max_tokens=500
)
Code generation (moderate temperature):
response = client.chat.completions.create(
project_id="your-project-id",
messages=[{"role": "user", "content": "Write a Python function to sort a list"}],
temperature=0.3,
max_tokens=500
)
Streaming Responses
Basic Streaming
from premai import Prem
client = Prem(api_key="your-api-key")
stream = client.chat.completions.create(
project_id="your-project-id",
messages=[{"role": "user", "content": "Explain machine learning in detail"}],
stream=True
)
for chunk in stream:
    content = chunk.choices[0].delta.get("content")
    if content:
        print(content, end="", flush=True)
print()  # Newline at end
Collecting Streamed Response
def stream_and_collect(messages: list) -> str:
"""Stream response while also collecting the full text."""
stream = client.chat.completions.create(
project_id="your-project-id",
messages=messages,
stream=True
)
full_response = ""
for chunk in stream:
content = chunk.choices[0].delta.get("content")
if content:
print(content, end="", flush=True)
full_response += content
print()
return full_response
# Usage
response = stream_and_collect([
{"role": "user", "content": "Write a short story"}
])
print(f"\nTotal length: {len(response)} characters")
Async Streaming
import asyncio
from premai import AsyncPrem # or AsyncPremAI for newer versions
async def stream_response(prompt: str):
client = AsyncPrem(api_key="your-api-key")
stream = await client.chat.completions.create(
project_id="your-project-id",
messages=[{"role": "user", "content": prompt}],
stream=True
)
async for chunk in stream:
    content = chunk.choices[0].delta.get("content")
    if content:
        print(content, end="", flush=True)
print()
# Run
asyncio.run(stream_response("Explain quantum computing"))
RAG with Repositories
Understanding Repositories
Repositories are document collections for retrieval-augmented generation (RAG). When you query with a repository ID, PremAI:
- Converts your query to an embedding
- Searches the repository for relevant chunks
- Includes retrieved context in the LLM prompt
- Generates a grounded response
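All four steps run server-side, but the retrieval logic is easy to picture. Here is a toy, self-contained sketch of steps 1-3, using tiny hand-made vectors in place of real embeddings (nothing here calls the PremAI API):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, chunks, limit=2, threshold=0.5):
    """Steps 2-3: rank chunks by similarity, keep top matches above a threshold."""
    scored = sorted(
        ((cosine(query_vec, vec), text) for text, vec in chunks),
        reverse=True,
    )
    return [(s, t) for s, t in scored[:limit] if s >= threshold]

# Toy "repository": (chunk text, fake embedding) pairs
chunks = [
    ("Install with pip install premai", [0.9, 0.1, 0.0]),
    ("Our office dog is named Rex", [0.0, 0.1, 0.9]),
    ("Upgrade with pip install -U premai", [0.8, 0.2, 0.1]),
]
query_vec = [1.0, 0.0, 0.0]  # pretend this embeds "how do I install?"

context = retrieve(query_vec, chunks)
# Step 3: the retrieved text becomes context in the LLM prompt
prompt_context = "\n".join(text for _, text in context)
```

The `repositories` parameter in the real API (shown below in Querying with RAG) controls the same knobs: `limit` and `similarity_threshold`.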
Creating a Repository
# Create repository
repository = client.repositories.create(
name="product-documentation",
description="Company product docs and FAQs",
organization="your-org-name"
)
print(f"Created repository: {repository.id}")
Uploading Documents
Note: The document upload API uses the singular repository.document, not repositories.documents.
# Single document
client.repository.document.create(
repository_id="repo-123",
file="./docs/user-guide.pdf"
)
# Multiple documents
documents = [
"./docs/installation.md",
"./docs/api-reference.pdf",
"./docs/troubleshooting.txt"
]
for doc_path in documents:
result = client.repository.document.create(
repository_id="repo-123",
file=doc_path
)
print(f"Uploaded: {doc_path} -> {result.document_id}")
Supported Document Formats
- PDF (.pdf)
- Word (.docx)
- Text (.txt)
- Markdown (.md)
- HTML (.html)
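Before uploading a whole folder, it can help to filter down to the formats the service accepts. A small helper sketch; the upload loop assumes the repository.document.create call shown above:

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt", ".md", ".html"}

def collect_supported(directory: str) -> list[str]:
    """Return paths under `directory` whose extension PremAI can ingest."""
    return sorted(
        str(p) for p in Path(directory).rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )

# Upload loop (sketch, using the client and repository from the examples above):
# for doc_path in collect_supported("./docs"):
#     client.repository.document.create(repository_id="repo-123", file=doc_path)
```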
Querying with RAG
response = client.chat.completions.create(
project_id="your-project-id",
messages=[
{"role": "user", "content": "How do I install the software?"}
],
repositories={
"ids": ["repo-123"], # Repository IDs
"limit": 5, # Number of chunks to retrieve
"similarity_threshold": 0.7 # Minimum relevance score
}
)
print(response.choices[0].message.content)
Accessing Retrieved Context
response = client.chat.completions.create(
project_id="your-project-id",
messages=[{"role": "user", "content": "What are the system requirements?"}],
repositories={"ids": ["repo-123"]}
)
# Main response
print("Answer:", response.choices[0].message.content)
# Retrieved documents
if response.document_chunks:
print("\nSources:")
for chunk in response.document_chunks:
print(f"- {chunk.document_name}")
print(f" Relevance: {chunk.similarity_score:.2f}")
print(f" Content: {chunk.content[:200]}...")
print()
RAG Best Practices
1. Use appropriate thresholds:
# Start with 0.7, adjust based on results
# Lower threshold = more results, potentially less relevant
# Higher threshold = fewer results, more relevant
2. Combine with system prompts:
response = client.chat.completions.create(
project_id="your-project-id",
messages=[{"role": "user", "content": "What's the refund policy?"}],
system_prompt="""Answer based ONLY on the provided context.
If the answer isn't in the context, say "I don't have that information."
Always cite your sources.""",
repositories={"ids": ["repo-policies"], "limit": 3}
)
For more RAG details, see the datasets documentation.
Embeddings
Basic Embedding Generation
response = client.embeddings.create(
project_id="your-project-id",
model="text-embedding-3-large",
input="What is machine learning?"
)
embedding = response.data[0].embedding
print(f"Embedding dimension: {len(embedding)}")
print(f"First 5 values: {embedding[:5]}")
Batch Embeddings
texts = [
"Machine learning is a subset of AI",
"Deep learning uses neural networks",
"Natural language processing handles text",
"Computer vision processes images"
]
response = client.embeddings.create(
project_id="your-project-id",
model="text-embedding-3-large",
input=texts
)
embeddings = [item.embedding for item in response.data]
print(f"Generated {len(embeddings)} embeddings")
Available Embedding Models
| Model | Dimensions | Use Case |
|---|---|---|
| text-embedding-3-large | 3072 | Best quality |
| text-embedding-3-small | 1536 | Good balance |
| text-embedding-ada-002 | 1536 | Legacy compatibility |
Similarity Search
import numpy as np
def cosine_similarity(a: list, b: list) -> float:
a = np.array(a)
b = np.array(b)
return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
# Create embeddings
query = "How to train a model?"
documents = [
"Fine-tuning involves training on specific data",
"The weather is nice today",
"Machine learning models learn from data"
]
query_embedding = client.embeddings.create(
project_id="your-project-id",
input=query
).data[0].embedding
doc_embeddings = client.embeddings.create(
project_id="your-project-id",
input=documents
).data
# Find most similar
similarities = []
for i, doc_emb in enumerate(doc_embeddings):
sim = cosine_similarity(query_embedding, doc_emb.embedding)
similarities.append((sim, documents[i]))
# Sort by similarity
similarities.sort(reverse=True)
for sim, doc in similarities:
print(f"{sim:.3f}: {doc}")
Fine-Tuning
Preparing Training Data
Training data should be in JSONL format:
{"messages": [{"role": "user", "content": "What is your return policy?"}, {"role": "assistant", "content": "We offer a 30-day money-back guarantee on all products."}]}
{"messages": [{"role": "user", "content": "How do I track my order?"}, {"role": "assistant", "content": "Log into your account and click 'Order History' to see tracking information."}]}
{"messages": [{"role": "user", "content": "Do you ship internationally?"}, {"role": "assistant", "content": "Yes, we ship to over 100 countries. Shipping costs vary by location."}]}
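Malformed lines are a common cause of failed fine-tuning jobs, so it's worth linting the file locally first. A minimal validator sketch (the expected schema here is inferred from the examples above):

```python
import json

def validate_training_file(path: str) -> list[str]:
    """Return a list of problems found in a JSONL fine-tuning file (empty = OK)."""
    errors = []
    with open(path) as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as e:
                errors.append(f"line {lineno}: invalid JSON ({e.msg})")
                continue
            messages = record.get("messages")
            if not isinstance(messages, list) or not messages:
                errors.append(f"line {lineno}: missing 'messages' list")
                continue
            for m in messages:
                if (not isinstance(m, dict)
                        or m.get("role") not in {"user", "assistant"}
                        or not m.get("content")):
                    errors.append(f"line {lineno}: bad message entry: {m}")
    return errors
```

Run it before uploading; an empty list means every line parsed and matched the format above.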
Creating a Fine-Tuning Job
# Upload training data
dataset = client.datasets.create(
name="customer-support-v2",
file_path="./training_data.jsonl"
)
print(f"Dataset created: {dataset.id}")
# Start fine-tuning
job = client.finetuning.create(
base_model="llama-3.1-8b-instruct",
dataset_id=dataset.id,
method="lora", # Options: "lora", "qlora", "full"
hyperparameters={
"learning_rate": 2e-4,
"num_epochs": 3,
"batch_size": 8,
"lora_r": 64,
"lora_alpha": 128,
"lora_dropout": 0.05
}
)
print(f"Fine-tuning job started: {job.id}")
Monitoring Progress
import time
while True:
job = client.finetuning.get(job_id=job.id)
print(f"Status: {job.status}")
print(f"Progress: {job.progress}%")
if job.current_loss:
print(f"Current loss: {job.current_loss:.4f}")
if job.status in ["completed", "failed"]:
break
time.sleep(60) # Check every minute
if job.status == "completed":
print(f"Training complete! Model ID: {job.model_id}")
else:
print(f"Training failed: {job.error}")
Using Fine-Tuned Models
response = client.chat.completions.create(
project_id="your-project-id",
model=f"ft:{job.model_id}",
messages=[
{"role": "user", "content": "What's your return policy?"}
]
)
print(response.choices[0].message.content)
Fine-Tuning Best Practices
- Data quality > quantity: 100 excellent examples beat 10,000 mediocre ones
- Consistent formatting: Use the same conversation format throughout
- Start with LoRA: Less expensive, often sufficient
- Hold out test data: Keep 10-20% for evaluation
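Holding out test data can be as simple as a reproducible shuffled split of the JSONL file. A sketch:

```python
import random

def split_dataset(path: str, test_fraction: float = 0.15, seed: int = 42):
    """Split a JSONL file into (train_lines, test_lines), shuffled reproducibly."""
    with open(path) as f:
        lines = [l for l in f if l.strip()]
    rng = random.Random(seed)  # fixed seed -> same split every run
    rng.shuffle(lines)
    cut = int(len(lines) * (1 - test_fraction))
    return lines[:cut], lines[cut:]

# Write the two parts back out, then upload only the train file:
# train, test = split_dataset("./training_data.jsonl")
# open("train.jsonl", "w").writelines(train)
# open("test.jsonl", "w").writelines(test)
```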
For detailed guides, see fine-tuning documentation.
LangChain Integration
Installation
pip install langchain langchain-community
Basic Usage
from langchain_community.chat_models import ChatPremAI
from langchain_core.messages import HumanMessage, SystemMessage
chat = ChatPremAI(
project_id="your-project-id",
premai_api_key="your-api-key"
)
response = chat.invoke([
HumanMessage(content="What is Python?")
])
print(response.content)
Note: For system prompts with LangChain, you can use SystemMessage, but be aware that this may override the system prompt configured in LaunchPad.
With Chains
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
prompt = ChatPromptTemplate.from_messages([
("system", "You are a {role}. Be helpful and concise."),
("user", "{question}")
])
chain = prompt | chat | StrOutputParser()
result = chain.invoke({
"role": "Python expert",
"question": "How do list comprehensions work?"
})
print(result)
RAG with LangChain
from langchain_community.chat_models import ChatPremAI
chat = ChatPremAI(
project_id="your-project-id",
premai_api_key="your-api-key",
repositories={"ids": ["repo-123"]}
)
response = chat.invoke([
HumanMessage(content="What does the documentation say about installation?")
])
print(response.content)
Streaming with LangChain
for chunk in chat.stream([HumanMessage(content="Tell me a story")]):
print(chunk.content, end="", flush=True)
LlamaIndex Integration
Installation
pip install llama-index llama-index-llms-premai
Basic Usage
from llama_index.llms.premai import PremAI
llm = PremAI(
project_id="your-project-id",
api_key="your-api-key"
)
response = llm.complete("What is machine learning?")
print(response.text)
Chat Mode
from llama_index.core.llms import ChatMessage
messages = [
ChatMessage(role="user", content="What is Python?")
]
response = llm.chat(messages)
print(response.message.content)
Building RAG with LlamaIndex
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.premai import PremAI
from llama_index.core import Settings
# Set PremAI as the LLM
Settings.llm = PremAI(
project_id="your-project-id",
api_key="your-api-key"
)
# Load and index documents
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
# Query
query_engine = index.as_query_engine()
response = query_engine.query("What are the key findings?")
print(response)
Error Handling and Best Practices
Exception Types
⚠️ SDK Version Note: Exception class names vary by SDK version. Check the official documentation for your version.
For newer SDK versions (premai on PyPI):
import premai
from premai import PremAI
client = PremAI(api_key="your-api-key")
try:
response = client.chat.completions.create(
project_id="your-project-id",
messages=[{"role": "user", "content": "Hello"}]
)
except premai.APIConnectionError as e:
print("The server could not be reached")
print(e.__cause__) # Underlying exception
except premai.RateLimitError as e:
print("Rate limited. Back off and retry.")
except premai.APIStatusError as e:
print(f"API error: {e.status_code}")
print(e.response)
For older SDK versions (Prem class):
from premai import Prem
client = Prem(api_key="your-api-key")
try:
response = client.chat.completions.create(
project_id="your-project-id",
messages=[{"role": "user", "content": "Hello"}]
)
except Exception as e:
# Check exception type based on your SDK version
print(f"Error: {type(e).__name__}: {e}")
Retry Logic
import time
def chat_with_retry(messages, max_retries=3, backoff=2):
for attempt in range(max_retries):
try:
return client.chat.completions.create(
project_id="your-project-id",
messages=messages
)
except Exception as e:
error_name = type(e).__name__
if "RateLimit" in error_name:
wait_time = getattr(e, 'retry_after', backoff ** attempt)
if attempt < max_retries - 1:
print(f"Rate limited. Waiting {wait_time}s...")
time.sleep(wait_time)
else:
raise
elif "ServerError" in error_name or "APIStatusError" in error_name:
if attempt < max_retries - 1:
time.sleep(backoff ** attempt)
else:
raise
else:
raise
Production Patterns
Connection Pooling
from premai import Prem
# Reuse client across your application
client = Prem(api_key="your-api-key")
def get_response(message: str) -> str:
return client.chat.completions.create(
project_id="your-project-id",
messages=[{"role": "user", "content": message}]
).choices[0].message.content
# DON'T create new clients for each request
Async for High Throughput
import asyncio
from premai import AsyncPrem # or AsyncPremAI
async def process_batch(messages: list[str]) -> list[str]:
client = AsyncPrem(api_key="your-api-key")
tasks = [
client.chat.completions.create(
project_id="your-project-id",
messages=[{"role": "user", "content": msg}]
)
for msg in messages
]
responses = await asyncio.gather(*tasks)
return [r.choices[0].message.content for r in responses]
# Process 100 messages concurrently
results = asyncio.run(process_batch(["message"] * 100))
Conversation Management
def manage_history(messages: list, max_tokens: int = 4000) -> list:
"""Keep conversation within token limits."""
# Rough estimation: 4 chars per token
total_chars = sum(len(m["content"]) for m in messages)
while total_chars > max_tokens * 4 and len(messages) > 2:
# Remove oldest user/assistant pair
messages = messages[2:]
total_chars = sum(len(m["content"]) for m in messages)
return messages
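To see the truncation in action, here is the same helper with a quick self-contained demonstration. Note that 4 characters per token is a rough heuristic, not the model's actual tokenizer:

```python
def manage_history(messages, max_tokens=4000):
    """Drop the oldest user/assistant pair until the rough token estimate fits."""
    total_chars = sum(len(m["content"]) for m in messages)
    while total_chars > max_tokens * 4 and len(messages) > 2:
        messages = messages[2:]
        total_chars = sum(len(m["content"]) for m in messages)
    return messages

# Build a long fake conversation: 10 turns of 1000 characters each side.
history = []
for i in range(10):
    history.append({"role": "user", "content": "x" * 1000})
    history.append({"role": "assistant", "content": "y" * 1000})

# Budget of 1000 tokens ~= 4000 characters: only the last 4 messages fit.
trimmed = manage_history(history, max_tokens=1000)
```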
Complete Examples
Customer Support Bot
import os
from premai import Prem
class CustomerSupportBot:
def __init__(self):
self.client = Prem(api_key=os.environ["PREMAI_API_KEY"])
self.project_id = os.environ["PREMAI_PROJECT_ID"]
self.repository_id = os.environ["PREMAI_REPO_ID"]
self.history = []
def chat(self, message: str) -> dict:
self.history.append({"role": "user", "content": message})
try:
response = self.client.chat.completions.create(
project_id=self.project_id,
messages=self.history,
system_prompt="""You are a helpful customer support agent.
Answer based on the provided documentation.
If unsure, offer to connect with a human agent.""",
repositories={
"ids": [self.repository_id],
"limit": 3,
"similarity_threshold": 0.7
},
temperature=0.3,
max_tokens=500
)
answer = response.choices[0].message.content
self.history.append({"role": "assistant", "content": answer})
sources = []
if response.document_chunks:
sources = [c.document_name for c in response.document_chunks]
return {"answer": answer, "sources": sources}
except Exception as e:
return {"answer": f"Error: {e}", "sources": []}
def reset(self):
self.history = []
# Usage
if __name__ == "__main__":
bot = CustomerSupportBot()
while True:
user_input = input("You: ").strip()
if user_input.lower() == "quit":
break
if user_input.lower() == "reset":
bot.reset()
print("Conversation reset.")
continue
result = bot.chat(user_input)
print(f"Bot: {result['answer']}")
if result['sources']:
print(f"Sources: {', '.join(result['sources'])}")
print()
Document Q&A API
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from premai import Prem
app = FastAPI()
client = Prem()
class Question(BaseModel):
query: str
repository_ids: list[str]
class Answer(BaseModel):
answer: str
sources: list[str]
@app.post("/ask", response_model=Answer)
async def ask_question(question: Question):
try:
response = client.chat.completions.create(
project_id="your-project-id",
messages=[{"role": "user", "content": question.query}],
repositories={"ids": question.repository_ids},
temperature=0.2
)
sources = []
if response.document_chunks:
sources = list(set(c.document_name for c in response.document_chunks))
return Answer(
answer=response.choices[0].message.content,
sources=sources
)
except Exception as e:
raise HTTPException(status_code=500, detail=str(e))
Troubleshooting
Common Issues
"Authentication failed"
- Verify API key is correct
- Check environment variable is set
- Ensure no whitespace in key
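Stray whitespace or quote characters pasted along with the key are a frequent culprit. A small diagnostic sketch (pure Python, no SDK calls):

```python
import os

def check_api_key(var: str = "PREMAI_API_KEY") -> list[str]:
    """Return human-readable problems with the key in the environment (empty = looks fine)."""
    problems = []
    key = os.environ.get(var)
    if key is None:
        problems.append(f"{var} is not set")
        return problems
    stripped = key.strip()
    if key != stripped:
        problems.append("key has leading/trailing whitespace")
    if stripped.startswith(('"', "'")) or stripped.endswith(('"', "'")):
        problems.append("key includes quote characters (remove them)")
    if not stripped:
        problems.append("key is empty")
    return problems
```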
"Project not found"
- Verify project ID in dashboard
- Check you have access to the project
"Rate limit exceeded"
- Implement retry with backoff
- Check your plan limits
- Contact support for increases
"Model not available"
- Check model name spelling
- Verify model is enabled for your project
- Some models require enterprise plans
Frequently Asked Questions
Where does my data go?
PremAI deploys in your cloud account (AWS, GCP, Azure). Data never leaves your infrastructure. See our private AI platform guide.
Is this compatible with OpenAI's SDK?
The API design follows OpenAI conventions for easy migration. Main differences:
- Use the project_id parameter
- Import from premai
- Use the system_prompt parameter (not a system role in messages)
- Repository feature for RAG
What models are available?
50+ models including Llama 3.3, Mistral, Claude, GPT-4. Check your dashboard for current availability.
How do I handle long conversations?
Truncate or summarize old messages to stay within context limits. See the conversation management pattern above.
Next Steps
Additional Resources
- PremAI Documentation
- Private AI Platform Guide
- Fine-Tuning AI Models
- Self-Hosted LLM Guide
- Private LLM Deployment
Last updated February 2026. SDK features may change. See official documentation for latest information.