Model Roles¶
Lerim separates orchestration (the OpenAI Agents SDK lead agent), DSPy extraction, and DSPy summarization. A `[roles.codex]` block is also parsed into config (e.g. for the Cloud UI); it is not consumed by `LerimOAIAgent` at runtime today.
Runtime roles¶
| Role | Runtime | Purpose | Default model |
|---|---|---|---|
| `lead` | OpenAI Agents SDK | Orchestrates sync/maintain/ask via tools (`write_memory`, `extract_pipeline`, `memory_search`, …); only path that writes memory | See `default.toml` |
| `extract` | DSPy | Extracts decision and learning candidates from session transcripts | See `default.toml` |
| `summarize` | DSPy | Generates structured session summaries from transcripts | See `default.toml` |
Config-only role¶
| Role | Purpose |
|---|---|
| `codex` | Parsed into `Config` (e.g. for Lerim Cloud). Reserved for future use; not wired into the lead runtime. |
```mermaid
flowchart LR
    Transcript[SessionTranscript] --> Ext[extract_DSPy]
    Transcript --> Sum[summarize_DSPy]
    Ext --> Lead[lead_OAI_agent]
    Sum --> Lead
    Lead --> Tools[SDK_tools]
```
Tools include DSPy pipeline tools invoked by the lead agent and filesystem/search helpers, as defined in `src/lerim/runtime/oai_tools.py`.
Role configuration¶
Each role is configured under `[roles.<name>]` in your TOML config.
```toml
[roles.lead]
provider = "minimax"
model = "MiniMax-M2.5"
api_base = ""
fallback_models = ["zai:glm-4.7"]
timeout_seconds = 300
max_iterations = 10
openrouter_provider_order = []
thinking = true
```
The lead agent is the only component allowed to write memory files. It orchestrates the full sync, maintain, and ask flows, and it uses `LitellmModel` via the OpenAI Agents SDK to support non-OpenAI providers.
```toml
[roles.extract]
provider = "minimax"
model = "MiniMax-M2.5"
api_base = ""
fallback_models = ["zai:glm-4.5-air"]
timeout_seconds = 180
max_window_tokens = 300000
window_overlap_tokens = 5000
openrouter_provider_order = []
thinking = true
max_workers = 4
```
Extraction runs through `dspy.ChainOfThought` with transcript windowing. Large transcripts are split into overlapping windows of `max_window_tokens`, processed independently, then merged in a final call.
```toml
[roles.summarize]
provider = "minimax"
model = "MiniMax-M2.5"
api_base = ""
fallback_models = ["zai:glm-4.5-air"]
timeout_seconds = 180
max_window_tokens = 300000
window_overlap_tokens = 5000
openrouter_provider_order = []
thinking = true
max_workers = 4
```
Summarization uses the same windowed `ChainOfThought` approach as extraction, producing structured summaries with frontmatter.
Non-OpenAI providers and ResponsesProxy¶
The OpenAI Agents SDK natively targets the OpenAI Responses API. To support non-OpenAI providers (MiniMax, Z.AI, Ollama, etc.), Lerim uses a `ResponsesProxy` that translates Responses API calls into standard Chat Completions API calls via LiteLLM. This is transparent: you configure providers the same way regardless of backend.
Switching providers¶
You can point any role at a different provider by changing `provider` and `model`.
Use OpenAI directly¶
Requires `OPENAI_API_KEY` in your environment.
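For instance, a minimal override for the lead role might look like this (the model name is illustrative; any OpenAI model identifier works):

```toml
[roles.lead]
provider = "openai"
model = "gpt-4.1"  # illustrative model choice
api_base = ""      # empty = use the default OpenAI endpoint
```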
Use Z.AI (Coding Plan)¶
Requires `ZAI_API_KEY` in your environment.
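A sketch of the role override, using a GLM model that appears in the fallback chains elsewhere in this page:

```toml
[roles.lead]
provider = "zai"
model = "glm-4.7"  # illustrative; pick the GLM model your plan covers
```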
Use Anthropic via OpenRouter¶
Requires `OPENROUTER_API_KEY` in your environment. OpenRouter proxies the request to Anthropic.
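A sketch using the full OpenRouter slug; the `openrouter_provider_order` value is an assumption based on OpenRouter's provider-routing names and is optional:

```toml
[roles.lead]
provider = "openrouter"
model = "anthropic/claude-sonnet-4-5-20250929"  # full OpenRouter slug
openrouter_provider_order = ["Anthropic"]       # assumed routing name; optional
```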
Use Ollama (local models)¶
No API key required. Make sure Ollama is running locally (`ollama serve` or the macOS background service). Lerim automatically loads models into RAM before each sync/maintain cycle and unloads them immediately after, so the model only uses memory during active processing. Disable this with `auto_unload = false` in `[providers]`.
Override the `api_base` per-role or set the default in `[providers]`:
```toml
[providers]
ollama = "http://127.0.0.1:11434"
auto_unload = true # free model RAM between cycles (default)
```
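A per-role override might look like this; the model tag is illustrative, not a recommendation:

```toml
[roles.extract]
provider = "ollama"
model = "qwen3:8b"                  # illustrative local model tag
api_base = "http://127.0.0.1:11434" # optional; empty = use [providers].ollama
max_workers = 1                     # recommended for local models
```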
If Lerim runs in Docker and Ollama runs on the host, point the base URL at `host.docker.internal` instead.
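For example, assuming Ollama listens on its default port:

```toml
[providers]
ollama = "http://host.docker.internal:11434"
```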
Use vllm-mlx (Apple Silicon local models)¶
No API key required. Requires `vllm-mlx` running locally (`pip install vllm-mlx`); start its server before pointing Lerim at it.
Override the default base URL per-role or set it in `[providers]`.
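A sketch of the `[providers]` entry; the port is an assumption (vLLM's conventional default of 8000), so adjust it to whatever your `vllm-mlx` server actually binds:

```toml
[providers]
mlx = "http://127.0.0.1:8000"  # assumed default port; match your server
```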
**Cost optimization.** Use a cheaper/faster model for `extract` and `summarize` (high-volume DSPy tasks) and a more capable model for `lead` (orchestration and reasoning).
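One such split, using model names that already appear on this page (the pairing itself is illustrative):

```toml
[roles.lead]
provider = "openrouter"
model = "anthropic/claude-sonnet-4-5-20250929"  # capable model for orchestration

[roles.extract]
provider = "zai"
model = "glm-4.5-air"  # cheaper model for high-volume extraction

[roles.summarize]
provider = "zai"
model = "glm-4.5-air"  # cheaper model for high-volume summarization
```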
Common options¶
All roles share these configuration keys:
| Option | Description |
|---|---|
| `provider` | Backend: `minimax`, `zai`, `openrouter`, `openai`, `ollama`, `mlx`, `opencode_go`, … |
| `model` | Model identifier (for OpenRouter, use the full slug, e.g. `anthropic/claude-sonnet-4-5-20250929`) |
| `api_base` | Custom API endpoint. Empty = use the default from `[providers]` |
| `fallback_models` | Ordered fallback chain: `"model"` (same provider) or `"provider:model"` |
| `timeout_seconds` | HTTP request timeout in seconds |
| `thinking` | Enable model reasoning (default: `true`; set `false` for non-reasoning models) |
The orchestration role (`lead`) also has: `max_iterations`.
DSPy roles (`extract`, `summarize`) also have: `max_window_tokens`, `window_overlap_tokens`, and `max_workers` (default: 4; set 1 for local models).
Fallback models¶
When a primary model fails, Lerim tries each fallback in order:
```toml
[roles.extract]
provider = "minimax"
model = "MiniMax-M2.5"
fallback_models = ["zai:glm-4.5-air", "openai:gpt-4.1-mini"]
```

Each entry is either:

- `"model-slug"`: uses the same provider as the role
- `"provider:model-slug"`: uses a different provider (requires that provider's API key)
API key resolution¶
| Provider | Environment variable |
|---|---|
| `minimax` | `MINIMAX_API_KEY` |
| `zai` | `ZAI_API_KEY` |
| `openrouter` | `OPENROUTER_API_KEY` |
| `openai` | `OPENAI_API_KEY` |
| `ollama` | (none required) |
| `mlx` | (none required) |
**Missing keys.** If the required API key for a role's provider is not set, Lerim raises an error at startup. There is no silent fallback.