Skip to main content

Models: DeepSeek-First with LiteLLM

OfficePlane routes all LLM calls through LiteLLM, a thin proxy library that exposes a unified interface across providers. Model names are passed as a single string in provider/model-name format, making it straightforward to swap providers without touching call sites.

Why LiteLLM

Before LiteLLM, switching from one provider to another required updating SDK imports, authentication patterns, and response parsing in every file that called an LLM. LiteLLM collapses this to a single string:

from litellm import acompletion

response = await acompletion(
model="deepseek/deepseek-chat-v4-flash", # or "openai/gpt-4o", "anthropic/claude-3-5-sonnet", ...
messages=[{"role": "user", "content": prompt}],
)

Changing the model is an environment variable change, not a code change.

Two Tiers: Flash and Pro

Skills declare which tier they need in SKILL.md. The runtime resolves the tier to a concrete model string using model_for_tier().

TierDefault ModelUse Cases
flashdeepseek/deepseek-chat-v4-flashIngestion, bulk generation, summarization, translation, most editing tasks
prodeepseek/deepseek-chat-v4-proMulti-step reasoning, compliance analysis, fact-checking, complex restructuring

The flash tier is the default for new skills. Upgrade to pro only when the task requires sustained multi-step reasoning, because pro is ~8x slower and significantly more expensive per token.

Environment Variables

VariableDefaultDescription
OFFICEPLANE_AGENT_MODEL_FLASHdeepseek/deepseek-chat-v4-flashModel used for flash-tier skills
OFFICEPLANE_AGENT_MODEL_PROdeepseek/deepseek-chat-v4-proModel used for pro-tier skills
OFFICEPLANE_INGESTION_MODELdeepseek/deepseek-chat-v4-flashModel used for the vision ingestion pipeline

To override for a deployment, set these in .env or pass them to Docker:

OFFICEPLANE_AGENT_MODEL_FLASH=openai/gpt-4o-mini
OFFICEPLANE_AGENT_MODEL_PRO=openai/gpt-4o
OFFICEPLANE_INGESTION_MODEL=google/gemini-2.0-flash

Any model string supported by LiteLLM is valid. Run litellm --list-models to see all available options.

model_for_tier() Factory

# src/officeplane/agent/model_config.py

import os

_FLASH = os.getenv("OFFICEPLANE_AGENT_MODEL_FLASH", "deepseek/deepseek-chat-v4-flash")
_PRO = os.getenv("OFFICEPLANE_AGENT_MODEL_PRO", "deepseek/deepseek-chat-v4-pro")

def model_for_tier(tier: str) -> str:
"""Return the LiteLLM model string for the given tier.

Args:
tier: "flash" or "pro"

Returns:
LiteLLM provider/model-name string.
"""
if tier == "pro":
return _PRO
return _FLASH

Skills call this at dispatch time:

model = model_for_tier(skill.tier)  # "flash" or "pro" from SKILL.md frontmatter
response = await acompletion(model=model, messages=messages)

Embeddings

Document chunks are embedded and stored in PostgreSQL with the pgvector extension (1536 dimensions, inner-product similarity).

PropertyCurrentPending
EncoderOpenAI text-embedding-ada-002Google embedding-001
Dimensions1536768
Cost$0.0001 / 1K tokensFree (Gemini quota)

The pending migration to embedding-001 will reduce storage by ~50% and eliminate embedding costs. Existing 1536-dim vectors will be re-embedded during a one-time migration job; a compatibility flag in the chunks table (embedding_model) tracks which encoder produced each row.

Semantic search uses pgvector's <=> cosine distance operator:

SELECT id, content, embedding <=> $1 AS distance
FROM chunks
ORDER BY distance
LIMIT 10;