Learn how to set up LLM tracing for better insights into AI requests and performance. Easy steps and code examples included.
Naman Arora
January 24, 2026

# Practical LLM Tracing Setup Guide
LLM tracing is the practice of recording the lifecycle of requests to large language models. This guide explains what that means in practice and shows how to set it up with code you can run today. Many posts explain why tracing matters, but few show how to wire it end to end. This guide fills that gap: an OpenTelemetry setup for LLM calls, a LangChain tracing integration, and a reusable decorator you can drop into your code.
## 1. Intro and Goals
What is LLM tracing? Think of it like adding CCTV to a factory line. Each machine is a step that changes the product. Tracing shows which machine slows the line. For LLM systems, the machines are prompt building, tokenization, network calls, model inference, and post-processing. Tracing shows latency, errors, and metadata like model name and cost.
Why trace LLM calls in production? Because AI requests are noisy. You may change a prompt and see worse results, but you do not know why. Tracing gives hard evidence. It shows which step is slow, how many tokens were used, what the request ID was, and which prompt version ran. This helps debug cost spikes, silent regressions, and intermittent failures.
**Tutorial Goals and Outcomes**
- Show a runnable OpenTelemetry setup for Python.
- Instrument low-level HTTP calls and LangChain.
- Provide a decorator for quick adoption.
- Capture tokens, cost, model name, and request ID as span attributes.
- Run a local demo with Jaeger or OTLP collector.
**Scope**
- Python focused, using OpenTelemetry and LangChain.
- Hands-on code for tracing and exporting to OTLP or Jaeger.
- No heavy APM vendor lock-in.
**Demo Repos**
- Demo page: /demo
- LLM Observability & Tracing pillar page: /llm-observability
**What I Will Cover Next**
- Prerequisites, install, and environment
- End-to-end demo with LangChain and OpenAI
- Troubleshooting and privacy notes
## 2. Prerequisites
Setting up prerequisites is like laying out ingredients before cooking. If you miss an ingredient, you stop in the middle of the recipe.
**What Do I Need to Start Tracing LLM Calls?**
- Python 3.9 or newer
- virtualenv or venv
- Access to LLM provider API keys, for example, OpenAI API key
- A tracing backend like Jaeger, or an OTLP collector such as otelcol
**Which Packages Support OpenTelemetry for Python?**
- opentelemetry-api
- opentelemetry-sdk
- opentelemetry-exporter-otlp
- opentelemetry-instrumentation-requests
- opentelemetry-exporter-jaeger or jaeger-client
- langchain
- openai
**requirements.txt**
```text
opentelemetry-api>=1.19.0
opentelemetry-sdk>=1.19.0
opentelemetry-exporter-otlp>=1.19.0
opentelemetry-instrumentation-requests>=0.35b0
opentelemetry-exporter-jaeger>=1.19.0
langchain>=0.0.200
openai>=0.27.0
requests>=2.28.0
```
**Example Environment Variables**
```text
OPENAI_API_KEY=sk-REPLACE_WITH_KEY
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
OTEL_SERVICE_NAME=llm-service
OTEL_RESOURCE_ATTRIBUTES=deployment.environment=staging,team=ai-platform
```
**Optional Tracing Backends to Try**
- Jaeger for local testing
- OTLP collector like otelcol for forwarding
- Tempo for trace storage
## 3. Expected Outcomes Before You Start
A finished trace is like a CCTV playback showing each machine and the time spent at each one.
**What Will Tracing Show Me for LLM Workflows?**
- A span for prompt building
- A span for the HTTP request to the model
- Token counts and estimated cost as attributes
- Errors and exceptions as trace events
- Request ID returned by provider
**Example Trace Summary JSON**
```json
{
  "trace_id": "abc123",
  "spans": [
    {
      "name": "prompt_build",
      "duration_ms": 45,
      "attributes": {
        "prompt.template": "summarize_v1",
        "prompt.tokens": 78
      }
    },
    {
      "name": "llm_request",
      "duration_ms": 1200,
      "attributes": {
        "model": "gpt-4",
        "cost_usd": 0.0048,
        "request_id": "req-xyz"
      }
    },
    {
      "name": "post_processing",
      "duration_ms": 12,
      "attributes": {
        "output.snippet": "In summary, ..."
      }
    }
  ]
}
```
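Once summaries like this are exported, even a few lines of Python can answer "which step is slow?". A minimal sketch using the field names from the example above (the `slowest_span` helper is mine, not part of any SDK):

```python
# Sketch: post-process a trace summary like the JSON above to find the
# slowest span. Field names follow the example summary in this guide.

def slowest_span(summary):
    """Return (name, duration_ms) of the longest span in a trace summary."""
    spans = summary.get("spans", [])
    if not spans:
        return None
    top = max(spans, key=lambda s: s.get("duration_ms", 0))
    return top["name"], top["duration_ms"]

summary = {
    "trace_id": "abc123",
    "spans": [
        {"name": "prompt_build", "duration_ms": 45},
        {"name": "llm_request", "duration_ms": 1200},
        {"name": "post_processing", "duration_ms": 12},
    ],
}
print(slowest_span(summary))  # ('llm_request', 1200)
```

In a real pipeline the same logic runs against traces pulled from your backend's query API rather than a hand-built dict.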
**How Does Tracing Help Debug LLM Issues?**
- It tells you which step is slow. You can fix that step.
- It shows token usage per request, so you can spot cost leaks.
- It ties output to the exact prompt version, so you can run A/B tests.
Link for deep reading: LLM Observability & Tracing pillar page /llm-observability
## 4. Step 1: Install and Configure OpenTelemetry
Installing OpenTelemetry is like wiring a new camera to a command center. You set where the feed goes and tag the feed with a camera ID.
**How Do I Set Up OpenTelemetry for Python?**
- Install opentelemetry packages from requirements.txt
- Create a TracerProvider and export spans to OTLP or Jaeger
- Set resource attributes like service.name
**Python Code to Initialize Tracer Provider**
```python
import os

from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

resource = Resource.create({
    "service.name": os.getenv("OTEL_SERVICE_NAME", "llm-service"),
    "deployment.environment": os.getenv("DEPLOYMENT_ENV", "dev"),
})

provider = TracerProvider(resource=resource)
trace.set_tracer_provider(provider)

otlp_endpoint = os.getenv("OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:4317")
exporter = OTLPSpanExporter(endpoint=otlp_endpoint)
provider.add_span_processor(BatchSpanProcessor(exporter))

tracer = trace.get_tracer(__name__)
```
**How Do I Export Traces to Jaeger or OTLP?**
- For Jaeger, use opentelemetry-exporter-jaeger and a Jaeger exporter.
- For OTLP, use OTLPSpanExporter as above.
- Choose BatchSpanProcessor for production to reduce overhead.
## 5. Step 2: Instrument Low-Level HTTP Clients
Instrumenting HTTP is like tagging every parcel with a tracking number. Each request gets metadata.
**Can I Trace OpenAI API Calls?**
Yes. OpenAI calls are HTTP requests. Instrument the HTTP client used by the OpenAI library or add headers manually.
**How to Propagate Trace Context to External LLM Providers?**
- Inject trace context into outgoing headers using TraceContextTextMapPropagator.
- Many tracing backends do not automatically connect server to provider spans because the provider is external. But the traceparent header helps when providers support it.
**Example Code for Requests Instrumentation**
```python
import os

import requests
from opentelemetry.instrumentation.requests import RequestsInstrumentor
from opentelemetry.trace.propagation.tracecontext import TraceContextTextMapPropagator

# Patch requests so every outgoing call produces a client span
RequestsInstrumentor().instrument()

# Inject the current trace context into headers for manual calls
carrier = {}
TraceContextTextMapPropagator().inject(carrier)

response = requests.post(
    "https://api.openai.com/v1/completions",
    headers={**carrier, "Authorization": f"Bearer {os.getenv('OPENAI_API_KEY')}"},
    json={"model": "gpt-4", "prompt": "Hello"},
)
```
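The carrier injected above ends up as a `traceparent` header whose format is fixed by the W3C Trace Context spec: `version-traceid-spanid-flags`, all lowercase hex. Knowing the shape helps when you inspect provider logs. A pure-Python sketch (`build_traceparent` is my illustrative helper; use the OTel propagator in real code):

```python
# Sketch of the W3C Trace Context "traceparent" header format:
# 00-<32 hex trace id>-<16 hex span id>-<2 hex flags>
import re
import secrets

def build_traceparent(trace_id=None, span_id=None, sampled=True):
    """Build a traceparent header value (illustrative only)."""
    trace_id = trace_id or secrets.token_hex(16)  # 32 hex chars
    span_id = span_id or secrets.token_hex(8)     # 16 hex chars
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"

TRACEPARENT_RE = re.compile(r"^00-[0-9a-f]{32}-[0-9a-f]{16}-[0-9a-f]{2}$")

header = build_traceparent()
assert TRACEPARENT_RE.match(header)
```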
## 6. Step 3: Instrument LangChain, Using Built-In Hooks or Callbacks
Instrumenting LangChain is like marking each station on an assembly line. You tag the start and end of each station.
**How to Add Tracing to LangChain?**
- LangChain has callback handlers and tracing hooks.
- You can write a BaseCallbackHandler to start and end spans around chain steps.
**Does LangChain Support OpenTelemetry?**
- LangChain does not ship a full OpenTelemetry tracer out of the box. But callbacks let you add spans easily.
**Example LangChain Callback Handler**
```python
from langchain.callbacks.base import BaseCallbackHandler
from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode

tracer = trace.get_tracer("langchain-otel")

class OTELCallback(BaseCallbackHandler):
    def on_llm_start(self, serialized, prompts, **kwargs):
        self.span = tracer.start_span("llm_call", attributes={
            "llm.model": serialized.get("model", "unknown"),
            "prompt.count": len(prompts),
        })

    def on_llm_end(self, response, **kwargs):
        if hasattr(self, "span"):
            self.span.set_attribute("llm.response_length", len(str(response)))
            self.span.end()

    def on_llm_error(self, error, **kwargs):
        if hasattr(self, "span"):
            self.span.record_exception(error)
            self.span.set_status(Status(StatusCode.ERROR, str(error)))
            self.span.end()
```
Link for more context: LLM Observability & Tracing pillar page /llm-observability
## 7. Step 4: Build a Reusable Python Decorator for LLM Calls
A decorator is like a sleeve that wraps each operation with monitoring sensors. You slide it on, and each call is measured.
**How to Instrument Custom LLM Wrappers?**
- Write a decorator that starts a span, records attributes, and handles errors.
**Can I Add Tracing Without Changing Library Code?**
- Yes. Wrap calls in a decorator or monkey patch a single wrapper function.
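Before the full decorator, here is the monkey-patch idea in isolation: rebind the function name to a wrapped version, so existing call sites pick up timing for free. A stdlib-only sketch with a stand-in client (`fake_llm`, `with_timing`, and the `sink` list are illustrative names, not part of any library):

```python
# Sketch: wrap an existing client function with timing by monkey patching,
# so call sites stay untouched. fake_llm stands in for a real client call.
import time
from functools import wraps

def with_timing(func, sink):
    """Return func wrapped so each call appends a timing record to sink."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        try:
            return func(*args, **kwargs)
        finally:
            sink.append({"name": func.__name__,
                         "latency_ms": int((time.time() - start) * 1000)})
    return wrapper

def fake_llm(prompt):
    return f"echo: {prompt}"

records = []
fake_llm = with_timing(fake_llm, records)  # "monkey patch" the name in place
print(fake_llm("hi"), records[0]["name"])  # echo: hi fake_llm
```

In real code you would patch the attribute on the client module (e.g. rebind your wrapper module's call function) and record into a span instead of a list.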
**Full Decorator Example**
```python
import os
import time
from functools import wraps

from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode

tracer = trace.get_tracer(__name__)

def llm_traced(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        model = kwargs.get("model", "unknown")
        with tracer.start_as_current_span("llm.request") as span:
            span.set_attribute("component", "llm")
            span.set_attribute("model", model)
            span.set_attribute("service.version", os.getenv("SERVICE_VERSION", "dev"))
            start = time.time()
            try:
                result = func(*args, **kwargs)
                latency_ms = int((time.time() - start) * 1000)
                span.set_attribute("latency_ms", latency_ms)
                span.set_attribute("response_snippet", str(result)[:200])
                return result
            except Exception as e:
                span.record_exception(e)
                span.set_status(Status(StatusCode.ERROR, str(e)))
                raise
    return wrapper
```
Link to demo: /demo
## 8. Step 5: Capture Semantic Metadata and Costs
Adding metadata is like labeling each CCTV clip with who worked on the machine and what product passed by.
**What LLM Metadata Should I Record in Traces?**
- model name
- prompt token count
- completion token count
- estimated cost in USD
- provider request ID
- prompt version ID
**How to Add Token Usage and Cost to Traces?**
- Use a tokenizer to count tokens. For OpenAI, you can estimate cost with model pricing.
- Record token counts and cost as span attributes.
**Example Token Counting and Span Attributes**
```python
import tiktoken
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

def record_prompt_metrics(span, prompt, model):
    enc = tiktoken.encoding_for_model(model)
    prompt_tokens = len(enc.encode(prompt))
    span.set_attribute("tokens.prompt", prompt_tokens)
    cost_per_token = 0.000002  # example rate, not a real price
    estimated_cost = prompt_tokens * cost_per_token
    span.set_attribute("estimated_cost_usd", estimated_cost)
```
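The flat per-token rate above can be replaced with a per-model pricing table that distinguishes prompt and completion tokens. A stdlib sketch (the prices below are illustrative placeholders, not current provider rates, and `estimate_cost_usd` is my helper):

```python
# Sketch: per-model pricing table for cost attributes.
# Prices are illustrative placeholders, NOT current provider rates.
PRICE_PER_1K = {
    "gpt-4": {"prompt": 0.03, "completion": 0.06},
    "gpt-3.5-turbo": {"prompt": 0.0015, "completion": 0.002},
}

def estimate_cost_usd(model, prompt_tokens, completion_tokens):
    """Estimate request cost in USD; unknown models fall back to 0."""
    prices = PRICE_PER_1K.get(model)
    if prices is None:
        return 0.0
    return (prompt_tokens / 1000) * prices["prompt"] + \
           (completion_tokens / 1000) * prices["completion"]

print(round(estimate_cost_usd("gpt-4", 100, 50), 4))  # 0.006
```

Keep the table in config rather than code so price updates do not need a deploy, and record the result as the `estimated_cost_usd` span attribute.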
Link for production concepts: LLMOps & Production AI pillar page /llmops-production-ai
## 9. Step 6: End-to-End Demo: OpenAI, LangChain, and OTLP
Show the whole factory line working while the monitors record each step. You watch the feed and see where the slowdown is.
**Full Runnable Script (Compact)**
```python
import os
import time
from functools import wraps

import openai
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.requests import RequestsInstrumentor
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

resource = Resource.create({"service.name": os.getenv("OTEL_SERVICE_NAME", "llm-service")})
provider = TracerProvider(resource=resource)
trace.set_tracer_provider(provider)

exporter = OTLPSpanExporter(endpoint=os.getenv("OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:4317"))
provider.add_span_processor(
    BatchSpanProcessor(exporter, max_queue_size=2048, schedule_delay_millis=5000)
)
tracer = trace.get_tracer(__name__)

RequestsInstrumentor().instrument()
openai.api_key = os.getenv("OPENAI_API_KEY")

def llm_traced(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        model = kwargs.get("model", "gpt-4")
        with tracer.start_as_current_span("llm.request") as span:
            span.set_attribute("model", model)
            start = time.time()
            try:
                result = func(*args, **kwargs)
                span.set_attribute("latency_ms", int((time.time() - start) * 1000))
                span.set_attribute("response_snippet", str(result)[:200])
                return result
            except Exception as e:
                span.record_exception(e)
                raise
    return wrapper

@llm_traced
def call_openai(prompt, model="gpt-4"):
    resp = openai.ChatCompletion.create(model=model, messages=[{"role": "user", "content": prompt}])
    return resp

def run():
    prompt = "Write a short poem about coffee and debugging."
    resp = call_openai(prompt, model="gpt-4")
    print("Got response ID:", resp["id"])

if __name__ == "__main__":
    run()
```
**Start Jaeger Locally**
```shell
docker run -d --name jaeger \
  -e COLLECTOR_ZIPKIN_HOST_PORT=:9411 \
  -p 16686:16686 -p 14250:14250 \
  jaegertracing/all-in-one:1.41
```
Then run your script. Visit Jaeger UI at http://localhost:16686 to view traces.
**Can I Trace a LangChain App End to End?**
Yes. Add the OTELCallback to the chain and use the OTLP exporter. You will see chain spans and nested LLM spans.
**How to View Traces Locally with Jaeger?**
Open Jaeger UI and search by service name. You will see trace timelines and span attributes.
Link: /demo
## 10. Step 7: Validate Traces and Common Checks
Validation is like checking every camera feed is live and has timestamps.
**How Do I Know My Traces Are Working?**
- Check the collector logs or health endpoint.
- Open Jaeger UI and search by service name.
- Verify spans include expected attributes like model and tokens.
**What to Test**
1. Error paths, confirm exceptions are recorded.
2. Retries, confirm each retry has its own span.
3. Timeouts, confirm spans end with proper status.
4. Parallel chains, confirm context propagation works across threads.
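The attribute check in step 3 can be automated against exported summaries. A stdlib sketch (the `REQUIRED` map and `missing_attributes` helper are mine; the span shape follows the example JSON earlier in this guide):

```python
# Sketch: verify that exported spans carry the attributes you expect,
# using the span/attribute shape from the trace summary example.
REQUIRED = {"llm_request": {"model", "request_id"}}

def missing_attributes(summary, required=REQUIRED):
    """Return {span_name: missing_keys} for spans lacking expected attributes."""
    problems = {}
    for span in summary.get("spans", []):
        want = required.get(span["name"])
        if not want:
            continue
        have = set(span.get("attributes", {}))
        missing = want - have
        if missing:
            problems[span["name"]] = missing
    return problems

summary = {"spans": [{"name": "llm_request",
                      "attributes": {"model": "gpt-4"}}]}
print(missing_attributes(summary))  # {'llm_request': {'request_id'}}
```

Run a check like this in CI against a trace pulled from your backend so attribute regressions fail the build instead of surfacing in production dashboards.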
**Common Mistakes and Fixes**
- No spans seen: check exporter endpoint and environment variables.
- Missing attributes: ensure spans are created before attributes are set.
- Context lost across threads: use contextvars or proper propagation libraries.
**Quick Curl Check for Jaeger (Example)**
```shell
curl "http://localhost:16686/api/traces?service=llm-service"
```
## 11. Step 8: Troubleshooting and Performance Considerations
Troubleshooting tracing is like tuning camera frame rate so you do not overload the network.
**Does Tracing Add Latency to LLM Calls?**
- Minimal. Starting spans is cheap, but exporting can add overhead if done synchronously. Use BatchSpanProcessor and asynchronous exporters.
**How to Sample Traces for High Volume?**
- Use probabilistic sampling in TracerProvider.
- Sample only error traces or a subset of requests.
**BatchSpanProcessor Example Config**
```python
from opentelemetry.sdk.trace.export import BatchSpanProcessor

provider.add_span_processor(
    BatchSpanProcessor(exporter, max_queue_size=2048, schedule_delay_millis=5000)
)
```
**Probabilistic Sampler Example**
```python
from opentelemetry.sdk.trace.sampling import TraceIdRatioBased

# Keep roughly 10% of traces
provider = TracerProvider(sampler=TraceIdRatioBased(0.1), resource=resource)
```
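To see what a ratio sampler is doing, here is the core idea in stdlib Python: hash a stable identifier into [0, 1) and compare against the ratio. `keep_trace` is my sketch, not the OTel implementation (which derives its decision from the trace ID), but the property is the same: deterministic per ID, so repeated lookups agree.

```python
# Sketch of ratio-based sampling: hash a stable ID and keep the request
# if the hash falls below the ratio. Deterministic per ID. Illustrative only.
import hashlib

def keep_trace(request_id: str, ratio: float) -> bool:
    digest = hashlib.sha256(request_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return bucket < ratio

assert keep_trace("req-1", 1.0)       # ratio 1.0 keeps everything
assert not keep_trace("req-1", 0.0)   # ratio 0.0 drops everything
assert keep_trace("req-1", 0.5) == keep_trace("req-1", 0.5)  # deterministic
```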
Avoid sending full prompts to traces. Truncate or hash them.
## 12. Step 9: Security and Privacy Notes
Treat traces like CCTV footage that may contain sensitive faces: mask them.
**Is It Safe to Store Prompts in Traces?**
- Store only what you need. Prompts can contain PII or secrets. Avoid raw prompts in production traces unless you have clear policies.
**How to Redact Sensitive Data in Traces?**
- Hash or redact prompt content before setting span attributes.
**Example Hashing Snippet**
```python
import hashlib

def attach_prompt_hash(span, prompt):
    prompt_hash = hashlib.sha256(prompt.encode()).hexdigest()
    span.set_attribute("prompt_hash", prompt_hash)
    span.set_attribute("prompt_redacted", True)
```
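If you do need to keep some prompt text in spans, redact obvious secrets first. A stdlib sketch using regex substitution (the two patterns are illustrative, not an exhaustive PII policy, and `redact` is my helper):

```python
# Sketch: redact obvious secrets/PII before attaching prompt text to a span.
# Patterns are illustrative, not an exhaustive PII policy.
import re

PATTERNS = [
    (re.compile(r"sk-[A-Za-z0-9]{8,}"), "[REDACTED_KEY]"),
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[REDACTED_EMAIL]"),
]

def redact(text: str) -> str:
    """Apply each redaction pattern in order and return the cleaned text."""
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    return text

print(redact("Contact a@b.com with key sk-abcdef123456"))
# Contact [REDACTED_EMAIL] with key [REDACTED_KEY]
```

Run `redact` before `attach_prompt_hash` style helpers so neither the raw attribute nor the hash preimage contains a live credential.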
Also, follow provider terms of service when storing request content.
Link: LLMOps & Production AI pillar page /llmops-production-ai
## 13. Appendix: Useful Snippets and Config
An appendix is like a toolbox with ready-made parts.
**requirements.txt (Repeat)**
```text
opentelemetry-api
opentelemetry-sdk
opentelemetry-exporter-otlp
opentelemetry-instrumentation-requests
opentelemetry-exporter-jaeger
langchain
openai
requests
tiktoken
```
**docker-compose.yml for otelcol + Jaeger**
```yaml
version: '3'
services:
  jaeger:
    image: jaegertracing/all-in-one:1.41
    ports:
      - "16686:16686"
      - "14250:14250"
  otelcol:
    image: otel/opentelemetry-collector-contrib:0.78.0
    command: ["--config=/etc/otel-collector-config.yaml"]
    volumes:
      - ./otel-config.yaml:/etc/otel-collector-config.yaml
    ports:
      - "4317:4317"
```
**Sample Environment File**
```text
OPENAI_API_KEY=sk-...
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
OTEL_SERVICE_NAME=llm-service
```
**Checklist for Production Rollout**
1. Enable BatchSpanProcessor
2. Set service.name and resource attributes
3. Configure sampling
4. Redact or hash prompts
5. Monitor exporter health and lag
Link to demo: /demo
## Conclusion with LaikaTest
I showed how to go from zero to end-to-end LLM tracing. We installed OpenTelemetry, instrumented HTTP calls, added LangChain callbacks, and wrote a decorator for quick wins. We also captured tokens and cost as span attributes. Many posts explain why you need tracing. This guide shows the wiring and working code. You can run the demo locally, inspect traces in Jaeger, and expand the approach to production.
If you want a practical next step, run a quick LaikaTest after deployment. LaikaTest helps teams run prompt A/B tests on real traffic and link those results to traces. That means you can see which prompt version was used, what the model returned, what the cost was, and how latency changed. It is helpful when you need evidence that a prompt change actually improved outcomes. LaikaTest works well with the tracing setup in this guide. It helps you close the loop from experiment to trace to evaluation.
**Further Reading**
- LLM Observability & Tracing pillar page: /llm-observability
- LLMOps & Production AI pillar page: /llmops-production-ai
- Demo page: /demo
Run a quick LaikaTest, check your traces in Jaeger, and you will stop guessing and start knowing which prompt change worked.