Prem
Introducing Benchmarks v2

Prem's Benchmarks v2 is an open-source project evaluating 13+ LLM inference engines, including vLLM and TensorRT-LLM, across precisions like float32, float16, int4, and int8. It helps the open-source community and enterprises understand LLM inference performance.
02 May 2024 12 min read
RAG Strategies
RAGs

The article "RAG Strategies" explores Retrieval-Augmented Generation (RAG) methods, detailing Naive RAG, Advanced RAG, and Modular RAG approaches. It introduces RAFT, a fine-tuning technique, and discusses optimizing large language models for RAG tasks.
18 Apr 2024 15 min read
Finetuning with LoRA and variants

The article "Finetuning with LoRA and Variants" discusses Low-Rank Adaptation (LoRA), a technique that efficiently fine-tunes large language models by adding a small number of trainable parameters, reducing computational costs. It also explores LoRA+ and other innovations enhancing model adaptation.
16 Apr 2024 9 min read
Mixture of Experts - Part 2

The article "Mixture of Experts - Part 2" explores the resurgence of Mixture of Experts (MoE) architectures in NLP, highlighting models like Mixtral 8x7B and DeepMind's Mixture-of-Depths. It discusses MoE's evolution since 1991, focusing on scalability and efficiency in large language models.
09 Apr 2024 11 min read
Model Alignment Process
LLMs

The article discusses methods to align large language models (LLMs) with human preferences, focusing on techniques like Reinforcement Learning from Human Feedback (RLHF) and Direct Preference Optimization (DPO). It also introduces non-RL methods such as Kahneman-Tversky Optimization (KTO).
28 Mar 2024 11 min read
Serverless Deployment of Mistral 7B v0.2 using Runpod

The article provides a step-by-step guide to deploying the Mistral 7B v0.2 model on RunPod's serverless GPU cloud infrastructure. It covers setting up the environment, writing deployment scripts, and configuring the Docker environment for efficient AI application scaling.
26 Mar 2024 12 min read
Serverless Deployment with Google Gemma using Beam Cloud

Deploy Google Gemma 2B on Beam Cloud using FastAPI for serverless inference. This guide covers model setup, Hugging Face token authentication, autoscaling, and seamless deployment. Learn how Beam Cloud simplifies LLM hosting with scalable infrastructure.
22 Mar 2024 8 min read
Serverless Deployment of Mistral 7B with Modal Labs and HuggingFace

Learn how to deploy Mistral-7B-Instruct serverlessly using Modal Labs for cost-efficient, scalable AI inference. This guide covers serverless benefits, cost savings, cold starts, and a step-by-step deployment process with Hugging Face Transformers.
21 Mar 2024 9 min read
SLM Journey Unveiled

Prem’s "SLM Journey Unveiled" details training a 1B parameter Small Language Model with 8K context length. It covers dataset challenges, Distributed Data Parallelism (DDP) with Ray, and optimization techniques for data partitioning and gradient synchronization.
20 Mar 2024 9 min read
Providers Empirical Testing
LLMs

Prem's study evaluates how different providers implement Large Language Models (LLMs), affecting model quality. Using various prompts, the study compares providers like Anyscale, Replicate, FireworksAI, and Together, assessing parameters such as verbosity, correctness, and processing speed.
18 Mar 2024 6 min read
LLM Datasets and Contamination
LLMs

Prem AI's post addresses dataset contamination in LLM training, where overlap between training and test sets inflates performance. It explores detection methods, data curation best practices, and ethical concerns around mislabeled or duplicated content.
12 Mar 2024 14 min read
Model Merging

Discover LLM model merging, a technique to combine multiple Large Language Models (LLMs) into a single, more powerful model without extra training. Explore top merging methods like Linear, SLERP, Task Arithmetic, TIES, and DARE, with YAML configs and practical use cases.
27 Feb 2024 12 min read
Emergent Capabilities of LLMs
LLMs

The article explores how LLMs develop emergent abilities like few-shot prompting and chain-of-thought reasoning as they scale, highlighting key research findings and debates on predictability, potential misuse, and understanding their limitations.
20 Feb 2024 12 min read
Evaluation of LLMs - Part 2
LLMs

The article explores using large language models (LLMs) as evaluators, addressing concerns about accuracy and inherent biases. It highlights the need for scalable meta-evaluation schemes and discusses fine-tuned evaluation models like Prometheus 13B, which aligns closely with human evaluators.
11 Feb 2024 7 min read
Evaluation of LLMs - Part 1
LLMs

The article "Evaluation of LLMs - Part 1" delves into the rapid development of Large Language Models (LLMs) and the necessity for robust evaluation strategies. It examines traditional n-gram-based metrics like BLEU and ROUGE, discussing their roles and limitations in assessing LLM performance.
30 Jan 2024 14 min read
Startup Grants Program: Unlock Your AI Potential With Prem!

Prem has launched a Startup Grants Program to support AI innovators. Selected startups receive six months of free access to APIs from models like OpenAI and Anthropic, along with complimentary fine-tuning services. This initiative aims to empower visionary AI projects.
24 Jan 2024 2 min read
Mamba Simplified - Part 2 - S4 and Mamba

This article delves into State Space Models, focusing on the S4 and Mamba architectures. It discusses their mathematical foundations, including differential equations and convolutions, and examines how these models balance parallelizable training with efficient inference in sequence modeling.
23 Jan 2024 21 min read
Mamba Simplified - Part 1 - Essential Pre-Requisites
LLMs

This article introduces foundational concepts like derivatives, differential equations, and neural networks, essential for understanding Mamba, a sequence modeling architecture based on State Space Models (SSMs). It serves as a refresher to grasp Mamba's architecture effectively.
13 Jan 2024 11 min read
The Tiny LLM Revolution - Part 1
LLMs

This article examines the emergence of Small Language Models (SLMs), discussing the impact of high-quality data on their capabilities. It highlights studies like TinyStories and Microsoft's Phi-series, exploring how SLMs can achieve performance comparable to larger models.
04 Jan 2024 14 min read
LLM360, A true Open Source LLM
News

LLM360 introduces true open-source AI with its 7B parameter models, Amber and CrystalCoder, providing full transparency by sharing training datasets, preprocessing code, model configurations, intermediate checkpoints, and evaluation metrics, fostering collaborative and reproducible AI research.
22 Dec 2023 11 min read
The Synthetic Data Revolution
Data Privacy

This article delves into the emergence of synthetic data in AI, discussing its generation methods, applications across various data types, and its significance in overcoming data scarcity and privacy challenges, ultimately contributing to the pursuit of Artificial General Intelligence (AGI).
19 Dec 2023 13 min read
MoEs comeback in GenAI with Mixtral
MoEs

This article takes a deep dive into Mixture of Experts models, spotlighting Mistral's latest release. Learn how this architecture enhances AI scalability, efficiency, and performance, paving the way for next-gen AI systems that balance resource optimization with powerful capabilities.
13 Dec 2023 9 min read
RAGs are cool, but what about their privacy?
RAGs

This article explores privacy concerns in Retrieval-Augmented Generation (RAG) applications, highlighting data protection challenges and offering actionable solutions to ensure secure and compliant AI systems while leveraging the benefits of RAG.
12 Dec 2023 6 min read