Capacity and SLOs
These targets describe the intended operating envelope for the current open-source runtime on one machine, not a hosted multi-tenant service.
Baseline assumptions
- Storage is on a local SSD, not a network filesystem.
- One background runtime owns routine ingest and curation.
- SQLite runs with WAL enabled so readers can continue during short writes.
- Embedding model download and first-time warmup are outside steady-state SLOs.
- LLM answer generation time is tracked separately from local retrieval time.
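The WAL assumption can be checked when a connection is opened. The following is a minimal sketch using Python's standard `sqlite3` module; the helper name `open_wal_connection` and the throwaway demo file are illustrative assumptions, not part of the runtime.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_wal_connection(db_path: Path) -> sqlite3.Connection:
    """Open a SQLite connection and switch the database to WAL mode."""
    conn = sqlite3.connect(db_path)
    # The pragma returns the journal mode actually in effect. WAL needs a
    # real file, so an in-memory database would report "memory" instead.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    if mode.lower() != "wal":
        conn.close()
        raise RuntimeError(f"expected WAL journal mode, got {mode!r}")
    return conn


# Demonstration against a throwaway file, not the real database.
demo_dir = tempfile.mkdtemp()
demo_conn = open_wal_connection(Path(demo_dir) / "demo.sqlite3")
demo_mode = demo_conn.execute("PRAGMA journal_mode").fetchone()[0]
demo_conn.close()
```

WAL mode is persistent for a database file once set, so a check like this mainly guards against the file living somewhere WAL cannot work.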
SLO targets
| Area | Target |
|---|---|
| `answer` retrieval latency | p50 under 500 ms, p95 under 2 s for a warm local index up to 100k records/project. |
| `answer` end-to-end latency | p95 under 8 s for normal questions when the answer model is available locally or through a responsive API. |
| Ingest throughput | Drain normal agent-session traces at 5-20 MB/minute, bounded mostly by extraction/model time rather than SQLite writes. |
| Queue recovery | After a crash or restart, queued ingest work resumes without manual cleanup; duplicate processing must be idempotent. |
| Index freshness | FTS and embedding indexes are derived from canonical records and should normally have zero missing rows after writes settle. |
| Curation | Routine compaction, supersession, and index refresh should finish in minutes for an active project and must not block reads. |
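As an illustration of how the retrieval latency targets could be checked against measured samples, here is a small sketch. It assumes the nearest-rank percentile definition and hypothetical helper names; the document does not say how the runtime collects or aggregates latencies.

```python
def percentile(samples_ms: list[float], pct: int) -> float:
    """Nearest-rank percentile for an integer pct in 1..100."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    # Integer ceiling of pct * n / 100, avoiding floating-point rounding.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def retrieval_slo_met(samples_ms: list[float]) -> bool:
    """True when p50 < 500 ms and p95 < 2 s, per the retrieval target."""
    return percentile(samples_ms, 50) < 500 and percentile(samples_ms, 95) < 2000


# 90 fast lookups plus 10 slow ones that still fit under the p95 target.
within_target = [100.0] * 90 + [1500.0] * 10
# Same shape, but the slow tail now exceeds 2 s.
over_target = [100.0] * 90 + [2500.0] * 10
```

Samples should come from a warm index only, since model download and first-time warmup are outside the steady-state targets.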
Search index freshness contract
The `records` table is the source of truth. `records_fts` and
`record_embeddings` are derived indexes used by answer retrieval. Search quality
depends on those derived indexes staying complete, so freshness is an explicit
operational contract:
- Every committed record write should schedule or perform a best-effort refresh for both FTS and embedding rows.
- A healthy warm index reports `record_count == fts_count`, `record_count == embedding_count`, and `missing_embedding_count == 0` for the inspected project scope.
- During startup, crash recovery, model changes, or first-time embedding setup, temporary gaps are allowed, but the runtime must make forward progress toward those healthy counts without requiring manual SQL edits.
- Retrieval may continue while indexes catch up, but user-facing diagnostics should describe the index as warming, stale, or degraded instead of implying that search is fully fresh.
- Index repair must rebuild from canonical records rather than treating derived index rows as authoritative.
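The healthy-index condition above can be expressed as a handful of count queries. The sketch below is an assumption-heavy illustration: the table names come from this page, but the columns (`id`, `project_id`, `record_id`) and the function name `index_freshness` are hypothetical, and the demo uses plain tables in place of a real FTS index.

```python
import sqlite3


def index_freshness(conn: sqlite3.Connection, project_id: str) -> dict:
    """Compare derived index row counts against canonical records."""

    def count(sql: str) -> int:
        return conn.execute(sql, (project_id,)).fetchone()[0]

    record_count = count("SELECT COUNT(*) FROM records WHERE project_id = ?")
    fts_count = count(
        "SELECT COUNT(*) FROM records_fts f "
        "JOIN records r ON r.id = f.record_id WHERE r.project_id = ?"
    )
    embedding_count = count(
        "SELECT COUNT(*) FROM record_embeddings e "
        "JOIN records r ON r.id = e.record_id WHERE r.project_id = ?"
    )
    # Records in scope that have no embedding row at all.
    missing_embedding_count = count(
        "SELECT COUNT(*) FROM records r "
        "LEFT JOIN record_embeddings e ON e.record_id = r.id "
        "WHERE r.project_id = ? AND e.record_id IS NULL"
    )
    return {
        "record_count": record_count,
        "fts_count": fts_count,
        "embedding_count": embedding_count,
        "missing_embedding_count": missing_embedding_count,
        "healthy": record_count == fts_count == embedding_count
        and missing_embedding_count == 0,
    }


# Demo with a hypothetical schema: three records, one embedding missing.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE records (id INTEGER PRIMARY KEY, project_id TEXT NOT NULL, body TEXT);
CREATE TABLE records_fts (record_id INTEGER PRIMARY KEY, body TEXT);
CREATE TABLE record_embeddings (record_id INTEGER PRIMARY KEY, vector BLOB);
INSERT INTO records VALUES (1, 'p1', 'a'), (2, 'p1', 'b'), (3, 'p1', 'c');
INSERT INTO records_fts VALUES (1, 'a'), (2, 'b'), (3, 'c');
INSERT INTO record_embeddings VALUES (1, x'00'), (2, x'00');
""")
before = index_freshness(conn, "p1")
conn.execute("INSERT INTO record_embeddings VALUES (3, x'00')")
after = index_freshness(conn, "p1")
```

A diagnostic built on this would report a non-fresh state whenever `healthy` is false, while retrieval keeps serving from whatever rows exist.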
Practical capacity envelope
- Trace size: normal traces should stay under 5 MB; large traces up to 50 MB are supported but should be processed asynchronously.
- Records per project: design for 10k-100k active records/project before users need to archive, split projects, or tune curation.
- Global database: design for hundreds of projects and up to about 1M total records on a developer laptop.
- Queue depth: short bursts of hundreds of pending sessions are acceptable if the runtime can make steady forward progress and expose retry state.
SQLite write-lock assumptions
SQLite is the right default while Lerim is local-first, but it is still a single-writer database.
- Keep write transactions short: normalize outside the transaction, then commit records, versions, embeddings, and queue state together.
- Allow many readers, but assume only one active writer for `ingest` or `curate` at a time.
- Use bounded busy timeouts and retry with backoff instead of blocking the CLI indefinitely.
- Do not place `~/.lerim/context.sqlite3` on Dropbox, iCloud Drive, NFS, or other syncing/network filesystems.
- If a manual command and the background runtime contend, the user-facing command should fail clearly or wait briefly with a visible message.
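The short-transaction and bounded-retry guidance can be sketched with Python's standard `sqlite3` module. The function `write_with_retry`, the `queue_state` table, and the specific timeout and backoff values are assumptions for illustration; the runtime's actual retry policy is not described here.

```python
import sqlite3
import time


def write_with_retry(conn, write_fn, attempts=5, base_delay_s=0.05):
    """Run write_fn in one short transaction, retrying on lock contention.

    Returns the number of attempts used. Re-raises after the final attempt
    so the caller can fail clearly instead of blocking indefinitely.
    """
    for attempt in range(attempts):
        try:
            # The connection context manager commits on success and rolls
            # back if write_fn raises, so related rows land together.
            with conn:
                write_fn(conn)
            return attempt + 1
        except sqlite3.OperationalError as exc:
            message = str(exc).lower()
            contended = "locked" in message or "busy" in message
            if not contended or attempt == attempts - 1:
                raise
            time.sleep(base_delay_s * 2 ** attempt)  # exponential backoff


# timeout bounds how long SQLite itself waits on a lock before raising.
conn = sqlite3.connect(":memory:", timeout=2.0)
conn.execute("CREATE TABLE queue_state (session_id TEXT PRIMARY KEY, status TEXT)")

calls = {"n": 0}


def flaky_write(c):
    # Simulate one round of lock contention, then succeed.
    calls["n"] += 1
    if calls["n"] == 1:
        raise sqlite3.OperationalError("database is locked")
    c.execute("INSERT INTO queue_state VALUES (?, ?)", ("s1", "done"))


attempts_used = write_with_retry(conn, flaky_write, base_delay_s=0.001)
rows = conn.execute("SELECT session_id, status FROM queue_state").fetchall()
```

Normalization and other slow work would happen before `write_fn` is called, so the lock is held only for the inserts and the commit.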
Cloud scale trigger points
Move beyond local SQLite only when the product needs behavior that local-first SQLite should not pretend to provide:
- multiple machines writing to the same project concurrently
- shared team memory with permissions and audit trails
- p95 `answer` retrieval above 2 s after index tuning at 100k+ active records/project
- sustained ingest backlog that cannot drain overnight on a normal developer laptop
- trace volumes regularly above 50 MB/session or thousands of sessions/day
- need for centralized backup, observability, billing, or SRE ownership
Until those triggers appear, prefer improving local indexing, pruning, curation, and queue visibility over introducing a server.