Learn how to set up LLM tracing for better insights into AI requests and performance. Easy steps and code examples included.
Naman Arora
January 24, 2026

# Practical LLM Tracing Setup Guide
LLM tracing is the practice of recording the lifecycle of requests to large language models. This guide explains what that means in practice and shows how to set it up with code you can run today. Many posts explain why tracing matters, but few show how to wire it end to end. This guide fills that gap: an OpenTelemetry setup for LLM calls, a LangChain tracing integration, and a reusable decorator you can drop into your code.
## 1. Intro and Goals
What is LLM tracing? Think of it like adding CCTV to a factory line. Each machine is a step that changes the product. Tracing shows which machine slows the line. For LLM systems, the machines are prompt building, tokenization, network calls, model inference, and post-processing. Tracing shows latency, errors, and metadata like model name and cost.
Why trace LLM calls in production? Because AI requests are noisy. You may change a prompt and see worse results, but you do not know why. Tracing gives hard evidence. It shows which step is slow, how many tokens were used, what the request ID was, and which prompt version ran. This helps debug cost spikes, silent regressions, and intermittent failures.
**Tutorial Goals and Outcomes**
- Show a runnable OpenTelemetry setup for Python.
- Instrument low-level HTTP calls and LangChain.
- Provide a decorator for quick adoption.
- Capture tokens, cost, model name, and request ID as span attributes.
- Run a local demo with Jaeger or OTLP collector.
**Scope**
- Python focused, using OpenTelemetry and LangChain.
- Hands-on code for tracing and exporting to OTLP or Jaeger.
- No heavy APM vendor lock-in.
**Demo Repos**
- Demo page: /demo
- LLM Observability & Tracing pillar page: /llm-observability
**What I Will Cover Next**
- Prerequisites, install, and environment
- End-to-end demo with LangChain and OpenAI
- Troubleshooting and privacy notes
## 2. Prerequisites
Setting up prerequisites is like laying out ingredients before cooking. If you miss an ingredient, you stop in the middle of the recipe.
**What Do I Need to Start Tracing LLM Calls?**
- Python 3.9 or newer
- virtualenv or venv
- Access to LLM provider API keys, for example, OpenAI API key
- A tracing backend like Jaeger, or an OTLP collector such as otelcol
**Which Packages Support OpenTelemetry for Python?**
- opentelemetry-api
- opentelemetry-sdk
- opentelemetry-exporter-otlp
- opentelemetry-instrumentation-requests
- opentelemetry-exporter-jaeger or jaeger-client
- langchain
- openai
**requirements.txt**
```text
opentelemetry-api>=1.19.0
opentelemetry-sdk>=1.19.0
opentelemetry-exporter-otlp>=1.19.0
opentelemetry-instrumentation-requests>=0.35b0
opentelemetry-exporter-jaeger>=1.19.0
langchain>=0.0.200
openai>=0.27.0
requests>=2.28.0
```
**Example Environment Variables**
```text
OPENAI_API_KEY=sk-REPLACE_WITH_KEY
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
OTEL_SERVICE_NAME=llm-service
OTEL_RESOURCE_ATTRIBUTES=deployment.environment=staging,team=ai-platform
```
**Optional Tracing Backends to Try**
- Jaeger for local testing
- OTLP collector like otelcol for forwarding
- Tempo for trace storage
## 3. Expected Outcomes Before You Start
A finished trace is like a CCTV playback showing each machine and the time spent at each one.
**What Will Tracing Show Me for LLM Workflows?**
- A span for prompt building
- A span for the HTTP request to the model
- Token counts and estimated cost as attributes
- Errors and exceptions as trace events
- Request ID returned by provider
**Example Trace Summary JSON**
```json
{
  "trace_id": "abc123",
  "spans": [
    {
      "name": "prompt_build",
      "duration_ms": 45,
      "attributes": {
        "prompt.template": "summarize_v1",
        "prompt.tokens": 78
      }
    },
    {
      "name": "llm_request",
      "duration_ms": 1200,
      "attributes": {
        "model": "gpt-4",
        "cost_usd": 0.0048,
        "request_id": "req-xyz"
      }
    },
    {
      "name": "post_processing",
      "duration_ms": 12,
      "attributes": {
        "output.snippet": "In summary, ..."
      }
    }
  ]
}
```
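Once summaries like this are exported, even a few lines of Python can answer "which step is slow?". A minimal sketch using the field names from the example above (the `slowest_span` helper is mine, not part of any SDK):

```python
# Sketch: post-process a trace summary like the JSON above to find the
# slowest span. Field names follow the example summary in this guide.

def slowest_span(summary):
    """Return (name, duration_ms) of the longest span in a trace summary."""
    spans = summary.get("spans", [])
    if not spans:
        return None
    top = max(spans, key=lambda s: s.get("duration_ms", 0))
    return top["name"], top["duration_ms"]

summary = {
    "trace_id": "abc123",
    "spans": [
        {"name": "prompt_build", "duration_ms": 45},
        {"name": "llm_request", "duration_ms": 1200},
        {"name": "post_processing", "duration_ms": 12},
    ],
}
print(slowest_span(summary))  # ('llm_request', 1200)
```

In a real pipeline the same logic runs against traces pulled from your backend's query API rather than a hand-built dict.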
**How Does Tracing Help Debug LLM Issues?**
- It tells you which step is slow. You can fix that step.
- It shows token usage per request, so you can spot cost leaks.
- It ties output to the exact prompt version, so you can run A/B tests.
Link for deep reading: LLM Observability & Tracing pillar page /llm-observability
## 4. Step 1: Install and Configure OpenTelemetry
Installing OpenTelemetry is like wiring a new camera to a command center. You set where the feed goes and tag the feed with a camera ID.
**How Do I Set Up OpenTelemetry for Python?**
- Install opentelemetry packages from requirements.txt
- Create a TracerProvider and export spans to OTLP or Jaeger
- Set resource attributes like service.name
**Python Code to Initialize Tracer Provider**
```python
import os

from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

resource = Resource.create({
    "service.name": os.getenv("OTEL_SERVICE_NAME", "llm-service"),
    "deployment.environment": os.getenv("DEPLOYMENT_ENV", "dev"),
})

provider = TracerProvider(resource=resource)
trace.set_tracer_provider(provider)

otlp_endpoint = os.getenv("OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:4317")
exporter = OTLPSpanExporter(endpoint=otlp_endpoint)
provider.add_span_processor(BatchSpanProcessor(exporter))

tracer = trace.get_tracer(__name__)
```
**How Do I Export Traces to Jaeger or OTLP?**
- For Jaeger, use opentelemetry-exporter-jaeger and a Jaeger exporter.
- For OTLP, use OTLPSpanExporter as above.
- Choose BatchSpanProcessor for production to reduce overhead.
## 5. Step 2: Instrument Low-Level HTTP Clients
Instrumenting HTTP is like tagging every parcel with a tracking number. Each request gets metadata.
**Can I Trace OpenAI API Calls?**
Yes. OpenAI calls are HTTP requests. Instrument the HTTP client used by the OpenAI library or add headers manually.
**How to Propagate Trace Context to External LLM Providers?**
- Inject trace context into outgoing headers using TraceContextTextMapPropagator.
- Many tracing backends do not automatically connect server to provider spans because the provider is external. But the traceparent header helps when providers support it.
**Example Code for Requests Instrumentation**
```python
import os

import requests
from opentelemetry.instrumentation.requests import RequestsInstrumentor
from opentelemetry.trace.propagation.tracecontext import TraceContextTextMapPropagator

# Patch requests so every outgoing call produces a client span
RequestsInstrumentor().instrument()

# Inject the current trace context into headers for manual calls
carrier = {}
TraceContextTextMapPropagator().inject(carrier)

response = requests.post(
    "https://api.openai.com/v1/completions",
    headers={**carrier, "Authorization": f"Bearer {os.getenv('OPENAI_API_KEY')}"},
    json={"model": "gpt-4", "prompt": "Hello"},
)
```
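The carrier injected above ends up as a `traceparent` header whose format is fixed by the W3C Trace Context spec: `version-traceid-spanid-flags`, all lowercase hex. Knowing the shape helps when you inspect provider logs. A pure-Python sketch (`build_traceparent` is my illustrative helper; use the OTel propagator in real code):

```python
# Sketch of the W3C Trace Context "traceparent" header format:
# 00-<32 hex trace id>-<16 hex span id>-<2 hex flags>
import re
import secrets

def build_traceparent(trace_id=None, span_id=None, sampled=True):
    """Build a traceparent header value (illustrative only)."""
    trace_id = trace_id or secrets.token_hex(16)  # 32 hex chars
    span_id = span_id or secrets.token_hex(8)     # 16 hex chars
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"

TRACEPARENT_RE = re.compile(r"^00-[0-9a-f]{32}-[0-9a-f]{16}-[0-9a-f]{2}$")

header = build_traceparent()
assert TRACEPARENT_RE.match(header)
```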
## 6. Step 3: Instrument LangChain, Using Built-In Hooks or Callbacks
Instrumenting LangChain is like marking each station on an assembly line. You tag the start and end of each station.
**How to Add Tracing to LangChain?**
- LangChain has callback handlers and tracing hooks.
- You can write a BaseCallbackHandler to start and end spans around chain steps.
**Does LangChain Support OpenTelemetry?**
- LangChain does not ship a full OpenTelemetry tracer out of the box. But callbacks let you add spans easily.
**Example LangChain Callback Handler**
```python
from langchain.callbacks.base import BaseCallbackHandler
from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode

tracer = trace.get_tracer("langchain-otel")

class OTELCallback(BaseCallbackHandler):
    def on_llm_start(self, serialized, prompts, **kwargs):
        self.span = tracer.start_span("llm_call", attributes={
            "llm.model": serialized.get("model", "unknown"),
            "prompt.count": len(prompts),
        })

    def on_llm_end(self, response, **kwargs):
        if hasattr(self, "span"):
            self.span.set_attribute("llm.response_length", len(str(response)))
            self.span.end()

    def on_llm_error(self, error, **kwargs):
        if hasattr(self, "span"):
            self.span.record_exception(error)
            self.span.set_status(Status(StatusCode.ERROR, str(error)))
            self.span.end()
```
Link for more context: LLM Observability & Tracing pillar page /llm-observability
## 7. Step 4: Build a Reusable Python Decorator for LLM Calls
A decorator is like a sleeve that wraps each operation with monitoring sensors. You slide it on, and each call is measured.
**How to Instrument Custom LLM Wrappers?**
- Write a decorator that starts a span, records attributes, and handles errors.
**Can I Add Tracing Without Changing Library Code?**
- Yes. Wrap calls in a decorator or monkey patch a single wrapper function.
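Before the full decorator, here is the monkey-patch idea in isolation: rebind the function name to a wrapped version, so existing call sites pick up timing for free. A stdlib-only sketch with a stand-in client (`fake_llm`, `with_timing`, and the `sink` list are illustrative names, not part of any library):

```python
# Sketch: wrap an existing client function with timing by monkey patching,
# so call sites stay untouched. fake_llm stands in for a real client call.
import time
from functools import wraps

def with_timing(func, sink):
    """Return func wrapped so each call appends a timing record to sink."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        try:
            return func(*args, **kwargs)
        finally:
            sink.append({"name": func.__name__,
                         "latency_ms": int((time.time() - start) * 1000)})
    return wrapper

def fake_llm(prompt):
    return f"echo: {prompt}"

records = []
fake_llm = with_timing(fake_llm, records)  # "monkey patch" the name in place
print(fake_llm("hi"), records[0]["name"])  # echo: hi fake_llm
```

In real code you would patch the attribute on the client module (e.g. rebind your wrapper module's call function) and record into a span instead of a list.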
**Full Decorator Example**
```python
import os
import time
from functools import wraps

from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode

tracer = trace.get_tracer(__name__)

def llm_traced(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        model = kwargs.get("model", "unknown")
        with tracer.start_as_current_span("llm.request") as span:
            span.set_attribute("component", "llm")
            span.set_attribute("model", model)
            span.set_attribute("service.version", os.getenv("SERVICE_VERSION", "dev"))
            start = time.time()
            try:
                result = func(*args, **kwargs)
                latency_ms = int((time.time() - start) * 1000)
                span.set_attribute("latency_ms", latency_ms)
                span.set_attribute("response_snippet", str(result)[:200])
                return result
            except Exception as e:
                span.record_exception(e)
                span.set_status(Status(StatusCode.ERROR, str(e)))
                raise
    return wrapper
```
Link to demo: /demo
## 8. Step 5: Capture Semantic Metadata and Costs
Adding metadata is like labeling each CCTV clip with who worked on the machine and what product passed by.
**What LLM Metadata Should I Record in Traces?**
- model name
- prompt token count
- completion token count
- estimated cost in USD
- provider request ID
- prompt version ID
**How to Add Token Usage and Cost to Traces?**
- Use a tokenizer to count tokens. For OpenAI, you can estimate cost with model pricing.
- Record token counts and cost as span attributes.
**Example Token Counting and Span Attributes**
```python
import tiktoken
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

def record_prompt_metrics(span, prompt, model):
    enc = tiktoken.encoding_for_model(model)
    prompt_tokens = len(enc.encode(prompt))
    span.set_attribute("tokens.prompt", prompt_tokens)
    cost_per_token = 0.000002  # example rate, not a real price
    estimated_cost = prompt_tokens * cost_per_token
    span.set_attribute("estimated_cost_usd", estimated_cost)
```
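The flat per-token rate above can be replaced with a per-model pricing table that distinguishes prompt and completion tokens. A stdlib sketch (the prices below are illustrative placeholders, not current provider rates, and `estimate_cost_usd` is my helper):

```python
# Sketch: per-model pricing table for cost attributes.
# Prices are illustrative placeholders, NOT current provider rates.
PRICE_PER_1K = {
    "gpt-4": {"prompt": 0.03, "completion": 0.06},
    "gpt-3.5-turbo": {"prompt": 0.0015, "completion": 0.002},
}

def estimate_cost_usd(model, prompt_tokens, completion_tokens):
    """Estimate request cost in USD; unknown models fall back to 0."""
    prices = PRICE_PER_1K.get(model)
    if prices is None:
        return 0.0
    return (prompt_tokens / 1000) * prices["prompt"] + \
           (completion_tokens / 1000) * prices["completion"]

print(round(estimate_cost_usd("gpt-4", 100, 50), 4))  # 0.006
```

Keep the table in config rather than code so price updates do not need a deploy, and record the result as the `estimated_cost_usd` span attribute.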
Link for production concepts: LLMOps & Production AI pillar page /llmops-production-ai
## 9. Step 6: End-to-End Demo: OpenAI, LangChain, and OTLP
Show the whole factory line working while the monitors record each step. You watch the feed and see where the slowdown is.
**Full Runnable Script (Compact)**
```python
import os
import time
from functools import wraps

import openai
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.requests import RequestsInstrumentor
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

resource = Resource.create({"service.name": os.getenv("OTEL_SERVICE_NAME", "llm-service")})
provider = TracerProvider(resource=resource)
trace.set_tracer_provider(provider)

exporter = OTLPSpanExporter(endpoint=os.getenv("OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:4317"))
provider.add_span_processor(
    BatchSpanProcessor(exporter, max_queue_size=2048, schedule_delay_millis=5000)
)
tracer = trace.get_tracer(__name__)

RequestsInstrumentor().instrument()
openai.api_key = os.getenv("OPENAI_API_KEY")

def llm_traced(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        model = kwargs.get("model", "gpt-4")
        with tracer.start_as_current_span("llm.request") as span:
            span.set_attribute("model", model)
            start = time.time()
            try:
                result = func(*args, **kwargs)
                span.set_attribute("latency_ms", int((time.time() - start) * 1000))
                span.set_attribute("response_snippet", str(result)[:200])
                return result
            except Exception as e:
                span.record_exception(e)
                raise
    return wrapper

@llm_traced
def call_openai(prompt, model="gpt-4"):
    resp = openai.ChatCompletion.create(model=model, messages=[{"role": "user", "content": prompt}])
    return resp

def run():
    prompt = "Write a short poem about coffee and debugging."
    resp = call_openai(prompt, model="gpt-4")
    print("Got response ID:", resp["id"])

if __name__ == "__main__":
    run()
```
**Start Jaeger Locally**
```shell
docker run -d --name jaeger \
  -e COLLECTOR_ZIPKIN_HOST_PORT=:9411 \
  -p 16686:16686 -p 14250:14250 \
  jaegertracing/all-in-one:1.41
```
Then run your script. Visit Jaeger UI at http://localhost:16686 to view traces.
**Can I Trace a LangChain App End to End?**
Yes. Add the OTELCallback to the chain and use the OTLP exporter. You will see chain spans and nested LLM spans.
**How to View Traces Locally with Jaeger?**
Open Jaeger UI and search by service name. You will see trace timelines and span attributes.
Link: /demo
## 10. Step 7: Validate Traces and Common Checks
Validation is like checking every camera feed is live and has timestamps.
**How Do I Know My Traces Are Working?**
- Check the collector logs or health endpoint.
- Open Jaeger UI and search by service name.
- Verify spans include expected attributes like model and tokens.
**What to Test**
1. Error paths, confirm exceptions are recorded.
2. Retries, confirm each retry has its own span.
3. Timeouts, confirm spans end with proper status.
4. Parallel chains, confirm context propagation works across threads.
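The attribute check in step 3 can be automated against exported summaries. A stdlib sketch (the `REQUIRED` map and `missing_attributes` helper are mine; the span shape follows the example JSON earlier in this guide):

```python
# Sketch: verify that exported spans carry the attributes you expect,
# using the span/attribute shape from the trace summary example.
REQUIRED = {"llm_request": {"model", "request_id"}}

def missing_attributes(summary, required=REQUIRED):
    """Return {span_name: missing_keys} for spans lacking expected attributes."""
    problems = {}
    for span in summary.get("spans", []):
        want = required.get(span["name"])
        if not want:
            continue
        have = set(span.get("attributes", {}))
        missing = want - have
        if missing:
            problems[span["name"]] = missing
    return problems

summary = {"spans": [{"name": "llm_request",
                      "attributes": {"model": "gpt-4"}}]}
print(missing_attributes(summary))  # {'llm_request': {'request_id'}}
```

Run a check like this in CI against a trace pulled from your backend so attribute regressions fail the build instead of surfacing in production dashboards.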
**Common Mistakes and Fixes**
- No spans seen: check exporter endpoint and environment variables.
- Missing attributes: ensure spans are created before attributes are set.
- Context lost across threads: use contextvars or proper propagation libraries.
**Quick Curl Check for Jaeger (Example)**
```shell
curl "http://localhost:16686/api/traces?service=llm-service"
```
## 11. Step 8: Troubleshooting and Performance Considerations
Troubleshooting tracing is like tuning camera frame rate so you do not overload the network.
**Does Tracing Add Latency to LLM Calls?**
- Minimal. Starting spans is cheap, but exporting can add overhead if done synchronously. Use BatchSpanProcessor and asynchronous exporters.
**How to Sample Traces for High Volume?**
- Use probabilistic sampling in TracerProvider.
- Sample only error traces or a subset of requests.
**BatchSpanProcessor Example Config**
```python
from opentelemetry.sdk.trace.export import BatchSpanProcessor

provider.add_span_processor(
    BatchSpanProcessor(exporter, max_queue_size=2048, schedule_delay_millis=5000)
)
```
**Probabilistic Sampler Example**
```python
from opentelemetry.sdk.trace.sampling import TraceIdRatioBased

# Keep roughly 10% of traces
provider = TracerProvider(sampler=TraceIdRatioBased(0.1), resource=resource)
```
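To see what a ratio sampler is doing, here is the core idea in stdlib Python: hash a stable identifier into [0, 1) and compare against the ratio. `keep_trace` is my sketch, not the OTel implementation (which derives its decision from the trace ID), but the property is the same: deterministic per ID, so repeated lookups agree.

```python
# Sketch of ratio-based sampling: hash a stable ID and keep the request
# if the hash falls below the ratio. Deterministic per ID. Illustrative only.
import hashlib

def keep_trace(request_id: str, ratio: float) -> bool:
    digest = hashlib.sha256(request_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return bucket < ratio

assert keep_trace("req-1", 1.0)       # ratio 1.0 keeps everything
assert not keep_trace("req-1", 0.0)   # ratio 0.0 drops everything
assert keep_trace("req-1", 0.5) == keep_trace("req-1", 0.5)  # deterministic
```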
Avoid sending full prompts to traces. Truncate or hash them.
## 12. Step 9: Security and Privacy Notes
Treat traces like CCTV footage that may contain sensitive faces: mask them.
**Is It Safe to Store Prompts in Traces?**
- Store only what you need. Prompts can contain PII or secrets. Avoid raw prompts in production traces unless you have clear policies.
**How to Redact Sensitive Data in Traces?**
- Hash or redact prompt content before setting span attributes.
**Example Hashing Snippet**
```python
import hashlib

def attach_prompt_hash(span, prompt):
    prompt_hash = hashlib.sha256(prompt.encode()).hexdigest()
    span.set_attribute("prompt_hash", prompt_hash)
    span.set_attribute("prompt_redacted", True)
```
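If you do need to keep some prompt text in spans, redact obvious secrets first. A stdlib sketch using regex substitution (the two patterns are illustrative, not an exhaustive PII policy, and `redact` is my helper):

```python
# Sketch: redact obvious secrets/PII before attaching prompt text to a span.
# Patterns are illustrative, not an exhaustive PII policy.
import re

PATTERNS = [
    (re.compile(r"sk-[A-Za-z0-9]{8,}"), "[REDACTED_KEY]"),
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[REDACTED_EMAIL]"),
]

def redact(text: str) -> str:
    """Apply each redaction pattern in order and return the cleaned text."""
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    return text

print(redact("Contact a@b.com with key sk-abcdef123456"))
# Contact [REDACTED_EMAIL] with key [REDACTED_KEY]
```

Run `redact` before `attach_prompt_hash` style helpers so neither the raw attribute nor the hash preimage contains a live credential.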
Also, follow provider terms of service when storing request content.
Link: LLMOps & Production AI pillar page /llmops-production-ai
## 13. Appendix: Useful Snippets and Config
An appendix is like a toolbox with ready-made parts.
**requirements.txt (Repeat)**
```text
opentelemetry-api
opentelemetry-sdk
opentelemetry-exporter-otlp
opentelemetry-instrumentation-requests
opentelemetry-exporter-jaeger
langchain
openai
requests
tiktoken
```
**docker-compose.yml for otelcol + Jaeger**
```yaml
version: '3'
services:
  jaeger:
    image: jaegertracing/all-in-one:1.41
    ports:
      - "16686:16686"
      - "14250:14250"
  otelcol:
    image: otel/opentelemetry-collector-contrib:0.78.0
    command: ["--config=/etc/otel-collector-config.yaml"]
    volumes:
      - ./otel-config.yaml:/etc/otel-collector-config.yaml
    ports:
      - "4317:4317"
```
**Sample Environment File**
```text
OPENAI_API_KEY=sk-...
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
OTEL_SERVICE_NAME=llm-service
```
**Checklist for Production Rollout**
1. Enable BatchSpanProcessor
2. Set service.name and resource attributes
3. Configure sampling
4. Redact or hash prompts
5. Monitor exporter health and lag
Link to demo: /demo
## Conclusion with LaikaTest
I showed how to go from zero to end-to-end LLM tracing. We installed OpenTelemetry, instrumented HTTP calls, added LangChain callbacks, and wrote a decorator for quick wins. We also captured tokens and cost as span attributes. Many posts explain why you need tracing. This guide shows the wiring and working code. You can run the demo locally, inspect traces in Jaeger, and expand the approach to production.
If you want a practical next step, run a quick LaikaTest after deployment. LaikaTest helps teams run prompt A/B tests on real traffic and link those results to traces. That means you can see which prompt version was used, what the model returned, what the cost was, and how latency changed. It is helpful when you need evidence that a prompt change actually improved outcomes. LaikaTest works well with the tracing setup in this guide. It helps you close the loop from experiment to trace to evaluation.
**Further Reading**
- LLM Observability & Tracing pillar page: /llm-observability
- LLMOps & Production AI pillar page: /llmops-production-ai
- Demo page: /demo
Run a quick LaikaTest, check your traces in Jaeger, and you will stop guessing and start knowing which prompt change worked.