Capacity and SLOs¶

These targets describe the intended operating envelope for the current open-source runtime on one machine, not a hosted multi-tenant service.

Baseline assumptions¶

Storage is on a local SSD, not a network filesystem.
One background runtime owns routine ingest and curation.
SQLite runs with WAL enabled so readers can continue during short writes.
Embedding model download and first-time warmup are outside steady-state SLOs.
LLM answer generation time is tracked separately from local retrieval time.

Current Measured Baseline¶

The current pre-release benchmark artifact is local hybrid retrieval over LongMemEval-S session windows. It is not an official hosted service SLO, it was generated before the final clean release-candidate rerun, and it does not yet cover 10k-100k record scale.

Corpus size	Measured local retrieval
100 records	p50 8.7 ms, p99 9.7 ms
1,000 records	p50 32.4 ms, p99 54.5 ms

See Retrieval Latency for the benchmark boundary and raw artifact.

SLO targets¶

Area	Target
Current `answer` retrieval latency claim	Publish only measured benchmark numbers. Current measured LongMemEval-S local artifact is sub-second through 1,000 records; do not extrapolate that to 10k-100k records until benchmarked.
Tuned retrieval latency target	p50 under 500 ms, p95 under 2 s for a warm local index up to 100k records/project after retrieval/index optimization.
`answer` end-to-end latency	p95 under 8 s for normal questions when the answer model is available locally or through a responsive API.
Ingest throughput	Drain normal agent-session traces at 5-20 MB/minute, bounded mostly by extraction/model time rather than SQLite writes.
Queue recovery	After a crash or restart, queued ingest work resumes without manual cleanup; duplicate processing must be idempotent.
Index freshness	FTS and embedding indexes are derived from canonical records and should normally have zero missing rows after writes settle.
Curation	Routine compaction, supersession, and index refresh should finish in minutes for an active project, not block reads.

Search index freshness contract¶

The records table is the source of truth. records_fts and record_embeddings are derived indexes used by answer retrieval. Search quality depends on those derived indexes staying complete, so freshness is an explicit operational contract:

Every committed record write should schedule or perform a best-effort refresh for both FTS and embedding rows.
A healthy warm index reports record_count == fts_count, record_count == embedding_count, and missing_embedding_count == 0 for the inspected project scope.
During startup, crash recovery, model changes, or first-time embedding setup, temporary gaps are allowed, but the runtime must make forward progress toward those healthy counts without requiring manual SQL edits.
Retrieval may continue while indexes catch up, but user-facing diagnostics should describe the index as warming, stale, or degraded instead of implying that search is fully fresh.
Index repair must rebuild from canonical records rather than treating derived index rows as authoritative.

Practical capacity envelope¶

Trace size: normal traces should stay under 5 MB; large traces up to 50 MB are supported but should be processed asynchronously.
Records per project: design for 10k-100k active records/project before users need to archive, split projects, or tune curation.
Global database: design for hundreds of projects and up to about 1M total records on a developer laptop.
Queue depth: short bursts of hundreds of pending sessions are acceptable if the runtime can make steady forward progress and expose retry state.

SQLite write-lock assumptions¶

SQLite is the right default while Lerim is local-first, but it is still a single-writer database.

Keep write transactions short: normalize outside the transaction, then commit records, versions, embeddings, and queue state together.
Allow many readers, but assume only one active writer for ingest or curate at a time.
Use bounded busy timeouts and retry with backoff instead of blocking the CLI indefinitely.
Do not place ~/.lerim/context.sqlite3 on Dropbox, iCloud Drive, NFS, or other syncing/network filesystems.
If a manual command and background runtime contend, the user-facing command should fail clearly or wait briefly with a visible message.

Cloud scale trigger points¶

Move beyond local SQLite only when the product needs behavior that local-first SQLite should not pretend to provide:

multiple machines writing to the same project concurrently
shared team memory with permissions and audit trails
p95 answer retrieval above 2 s after index tuning at 100k+ active records/project
sustained ingest backlog that cannot drain overnight on a normal developer laptop
trace volumes regularly above 50 MB/session or thousands of sessions/day
need for centralized backup, observability, billing, or SRE ownership

Until those triggers appear, prefer improving local indexing, pruning, curation, and queue visibility over introducing a server.