LLM Observability

Also known as: LLM observability, model observability

Tracing what a model actually did in production — prompts, retrievals, latencies, and costs — so failures are debuggable, not mysterious.

Observability for LLM systems means capturing the full trace of each request (the prompt, what was retrieved, the response, token counts, latency, and cost) so that when something goes wrong you can see where. I wired this in with Phoenix at North AI.
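To make "the full trace of each request" concrete, here is a minimal sketch of a per-request trace record in plain Python. It is not the Phoenix API: `TraceRecord`, `traced_call`, `count_tokens`, the stub retriever and generator, and the per-token price are all hypothetical names and values chosen for illustration, and the whitespace word count is only a crude stand-in for a real tokenizer.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TraceRecord:
    """One request's trace: prompt, retrievals, response, tokens, latency, cost."""
    prompt: str
    retrieved: List[str]
    response: str
    prompt_tokens: int
    response_tokens: int
    latency_s: float
    cost_usd: float


def count_tokens(text: str) -> int:
    # Assumption: whitespace word count as a rough proxy, not a real tokenizer.
    return len(text.split())


def traced_call(
    prompt: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
    usd_per_1k_tokens: float = 0.002,  # hypothetical price, for illustration only
) -> TraceRecord:
    """Run retrieval and generation, recording what happened along the way."""
    start = time.perf_counter()
    passages = retrieve(prompt)
    response = generate(prompt, passages)
    latency_s = time.perf_counter() - start

    # Retrieved passages are counted as part of the input sent to the model.
    prompt_tokens = count_tokens(prompt) + sum(count_tokens(p) for p in passages)
    response_tokens = count_tokens(response)
    cost_usd = (prompt_tokens + response_tokens) / 1000 * usd_per_1k_tokens

    return TraceRecord(
        prompt=prompt,
        retrieved=passages,
        response=response,
        prompt_tokens=prompt_tokens,
        response_tokens=response_tokens,
        latency_s=latency_s,
        cost_usd=cost_usd,
    )


# Stub components standing in for a real retriever and model.
def fake_retrieve(query: str) -> List[str]:
    return ["Paris is the capital of France."]


def fake_generate(query: str, passages: List[str]) -> str:
    return "The capital is Paris."


trace = traced_call("What is the capital of France?", fake_retrieve, fake_generate)
print(trace.retrieved, trace.prompt_tokens, trace.response_tokens)
```

Keeping the retrieved passages on the same record as the prompt and response is the point of the design: when an answer looks wrong, the record shows whether the model was handed the wrong passage or misused the right one.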

A model is a non-deterministic black box, which makes the surrounding instrumentation more important, not less. Without traces, “the AI gave a weird answer” is unfixable folklore; with them, it’s a specific retrieval that pulled the wrong passage.