Tracing
Lerim uses MLflow for agent observability.
Tracing is opt-in and controlled by [observability].mlflow_enabled in
~/.lerim/config.toml. The LERIM_MLFLOW environment variable can override it
for one-off runs.
What gets traced
When tracing is enabled, MLflow records:
- Runtime operations -- ingest, curate, answer, and Context Brief emit root lerim.<operation> traces tagged with run id, session id, project id, and workspace artifact paths.
- Named agent spans -- trace ingestion, context curation, context answering, and context-brief compilation emit lerim.agent.<name> spans when they run inside a traced runtime operation.
- Agent events -- ingest, curate, and Context Brief still write detailed local agent_trace.json files under the run workspace. These are the most detailed per-phase record today; MLflow currently shows the operation and named-agent boundaries rather than every pipeline step as its own nested span.
- Retrieval actions -- answerer retrieval planning and read-only context queries are recorded in the local debug trace when verbose answer output is enabled.
- agent_trace.json -- each ingest/curate run also writes local agent events under the run workspace. Answer debug output writes model and retrieval events.
- Lerim run id correlation -- each ingest/curate trace is tagged with lerim.run_id, and the MLflow client_request_id is set to the same value used in the local run manifest.json and workspace folder name.
Setup
MLflow ships as a Lerim dependency, so pip install lerim already includes the
client library. A common local setup is a small Docker Compose MLflow service
outside the Lerim repo, for example under ~/codes/personal/local-mlflow.
No account needed
The shared MLflow server is local. No authentication, external account, or API key is required. Each project uses the same tracking URL with a different experiment name.
Enable tracing
Enable tracing for the long-running Lerim server process. Setting it only on a
client command like lerim ingest will not enable tracing for a server that is
already running.
MLflow has two separate roles in Lerim:
- Lerim server writes traces. This happens during lerim serve or the Docker service started by lerim up, when tracing is enabled in config.
- Shared MLflow server stores and shows traces. It must be running when LERIM_MLFLOW_REQUIRED=1; otherwise Lerim fails early instead of silently losing observability.
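One way to picture the fail-early behaviour is a reachability probe that runs before any traced work starts. The sketch below is hypothetical: check_mlflow and its probe parameter are invented for illustration, /health is the health endpoint exposed by a standard MLflow tracking server, and continuing without tracing when the server is not required is an assumption rather than documented Lerim behaviour.

```python
import urllib.error
import urllib.request
from typing import Callable


def http_probe(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if the server answers its /health endpoint with HTTP 200."""
    url = f"{base_url.rstrip('/')}/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or an HTTP error status.
        return False


def check_mlflow(
    base_url: str,
    required: bool,
    probe: Callable[[str], bool] = http_probe,
) -> bool:
    """Return whether tracing can proceed; raise if the server is required but down."""
    if probe(base_url):
        return True
    if required:
        raise RuntimeError(f"MLflow is required but unavailable at {base_url}")
    return False  # assumption: without strict mode, work continues untraced
```

The probe is injectable so the decision logic can be exercised without a running server.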
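In ~/.lerim/config.toml, set mlflow_enabled = true in the [observability] table, then restart the Lerim server (lerim serve, or the Docker service started by lerim up) so the change takes effect.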
Viewing traces
Start the shared MLflow server:
Then navigate to http://127.0.0.1:5050.
The server is both the trace writer target and the UI. If it is stopped and
LERIM_MLFLOW_REQUIRED=1, Lerim refuses to start traced work.
In the UI, look for:
- Experiments -- select the lerim experiment.
- Traces -- the primary view for Lerim operation and agent spans. Expand a trace to see named spans such as lerim.agent.trace_ingestion, lerim.agent.context_curator, lerim.agent.context_answerer, or lerim.agent.context_brief_compiler.
- Run id -- match a local run folder to MLflow by searching for the manifest.json run_id value. It is also stored as client_request_id and the lerim.run_id tag.
- Model labels and inputs -- agent spans include model/scope inputs where the runtime has them.
- Local graph detail -- for per-node graph events, open the matching run folder's agent_trace.json.
Classic MLflow Runs may be empty for agent traces. That does not mean tracing is broken; check the Traces view or use the API check below.
Filtering
Use the MLflow search bar to filter traces by experiment name, tags, status, or text. This is useful when you have many ingest/curate cycles logged.
Verify logging
You do not need the UI to confirm that the shared server is reachable:
curl -s http://127.0.0.1:5050/api/2.0/mlflow/experiments/search \
-H 'Content-Type: application/json' \
-d '{"max_results": 20}'
You should see the lerim experiment after a traced run creates it.
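The same check can be scripted. The sketch below mirrors the curl call with the Python standard library; the function names are hypothetical, and the response shape (a top-level experiments list whose entries carry a name field) follows the MLflow REST API as commonly documented, so treat it as an assumption for your server version.

```python
import json
import urllib.request


def search_experiments(base_url: str = "http://127.0.0.1:5050", max_results: int = 20) -> dict:
    """POST to the experiments/search endpoint and return the decoded JSON body."""
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/2.0/mlflow/experiments/search",
        data=json.dumps({"max_results": max_results}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.load(response)


def has_experiment(payload: dict, name: str = "lerim") -> bool:
    """Check whether a search response lists an experiment with the given name."""
    return any(exp.get("name") == name for exp in payload.get("experiments", []))
```

With the shared server running, has_experiment(search_experiments()) should return True once a traced run has created the experiment.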
Local run artifacts
Each ingest or curate execution writes a local artifact bundle under:
Important files:
- manifest.json -- run id, operation, project, session id, artifact paths, and status. mlflow_client_request_id matches the MLflow trace request id.
- events.jsonl -- compact started/succeeded/failed events for that run.
- agent_trace.json -- serialized pipeline, model, or retrieval events when available.
- agent.log -- short human-readable agent summary on success.
- error.json -- structured error details on failure.
Notes
- Lerim reads MLFLOW_TRACKING_URI, LERIM_MLFLOW_EXPERIMENT, and LERIM_MLFLOW_REQUIRED from .env / shell.
- [observability].mlflow_enabled = true is the persistent switch for the server process.
- LERIM_MLFLOW=true is still supported as an environment override.
- If MLFLOW_TRACKING_URI is missing and strict mode is off, Lerim still has a legacy SQLite fallback under ~/.lerim/observability/mlflow.db.
- Hidden provider chain-of-thought is not available to Lerim or MLflow unless a provider exposes it. Visible prompts, model responses, tool calls, tool results, timing, token metadata, and spans are the expected trace payload.
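The tracking URI fallback in the notes above might look roughly like this. The sketch is not taken from Lerim's source: resolve_tracking_uri is a hypothetical name, "strict mode" is assumed to mean LERIM_MLFLOW_REQUIRED=1, and raising when the URI is missing in strict mode is an assumption.

```python
from pathlib import Path

# Legacy fallback location named in the notes above.
LEGACY_DB = Path("~/.lerim/observability/mlflow.db")


def resolve_tracking_uri(env: dict[str, str]) -> str:
    """Pick the MLflow tracking URI from the environment, with the SQLite fallback."""
    uri = env.get("MLFLOW_TRACKING_URI", "").strip()
    if uri:
        return uri
    if env.get("LERIM_MLFLOW_REQUIRED") == "1":
        # Assumption: strict mode refuses to fall back to the local database.
        raise RuntimeError("MLFLOW_TRACKING_URI is not set and MLflow is required")
    # SQLAlchemy-style URI; an absolute path yields four slashes on POSIX systems.
    return f"sqlite:///{LEGACY_DB.expanduser()}"
```

For example, resolve_tracking_uri({"MLFLOW_TRACKING_URI": "http://127.0.0.1:5050"}) returns the shared server URL unchanged.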
Troubleshooting
If Lerim says MLflow is required but unavailable, start the shared server:
For the legacy SQLite fallback only: if mlflow ui reports an out-of-date or
unknown database revision, start Lerim once with tracing enabled. Lerim checks
the MLflow schema at startup and will upgrade compatible databases. If MLflow
cannot migrate the recorded revision, Lerim backs up the incompatible DB under
~/.lerim/observability/backups/ and creates a fresh trace DB.