Score Evaluation
Track performance metrics for experimental prompts to measure impact and determine winning variants
What is Score Evaluation?
Score Evaluation allows you to push performance metrics after using experimental prompts. This data helps you:
- Compare Variants: Understand which prompt performs better
- Make Data-Driven Decisions: Rely on metrics instead of intuition
- Track Multiple Metrics: Ratings, conversions, response times, or custom values
- Analyze Over Time: See how variants evolve during the experiment
Key Concept: Scores can only be pushed for prompts fetched via getExperimentPrompt(). Regular prompts cannot be scored.
Score Types
LaikaTest supports three score types to cover the most common evaluation scenarios:
int — whole-number metrics such as ratings or timings:

```js
{ name: 'rating', type: 'int', value: 5 }
{ name: 'response_time_ms', type: 'int', value: 342 }
```

bool — true/false outcomes:

```js
{ name: 'helpful', type: 'bool', value: true }
{ name: 'task_completed', type: 'bool', value: false }
```

string — free-form text values:

```js
{ name: 'user_feedback', type: 'string', value: 'Excellent response!' }
{ name: 'sentiment', type: 'string', value: 'positive' }
```

Pushing Scores
After using an experimental prompt, push performance metrics using the SDK:
Basic Usage
```js
import { LaikaTest } from '@laikatest/js-client';

const client = new LaikaTest(process.env.LAIKATEST_API_KEY);

// Fetch experimental prompt
const prompt = await client.getExperimentPrompt('my-experiment', {
  userId: 'user-123'
});

// Use the prompt in your app
const response = await callYourLLM(prompt.getContent());

// Push scores
const result = await prompt.pushScore(
  [
    { name: 'rating', type: 'int', value: 5 },
    { name: 'helpful', type: 'bool', value: true },
    { name: 'feedback', type: 'string', value: 'Great response!' }
  ],
  { sessionId: 'session-abc-123', userId: 'user-123' }
);
```

Required Parameters
- scores: Array of score objects, each with a name, type, and value.
- sessionId or userId: At least one is required. Both are recommended for richer analytics.
Tip: Provide both identifiers when possible to improve tracking accuracy.
How Score Tracking Works
Every prompt returned by getExperimentPrompt() is tied to a specific variant. When you call pushScore() on that prompt, the scores are attributed to the variant that was served, which is how LaikaTest compares variant performance over the course of an experiment.

Common Use Cases
User Satisfaction Tracking
```js
await prompt.pushScore(
  [
    { name: 'user_rating', type: 'int', value: 4 },
    { name: 'satisfied', type: 'bool', value: true },
    { name: 'comment', type: 'string', value: 'Very helpful!' }
  ],
  { userId: 'user-123', sessionId: 'session-456' }
);
```

Conversion Tracking
```js
await prompt.pushScore(
  [
    { name: 'clicked_cta', type: 'bool', value: true },
    { name: 'purchased', type: 'bool', value: true },
    { name: 'cart_value', type: 'int', value: 149 }
  ],
  { userId: 'user-123' }
);
```

Performance Metrics
```js
await prompt.pushScore(
  [
    { name: 'response_time_ms', type: 'int', value: 342 },
    { name: 'tokens_used', type: 'int', value: 1250 },
    { name: 'task_completed', type: 'bool', value: true }
  ],
  { sessionId: 'session-789' }
);
```

Error Handling
Score submission failures do NOT throw exceptions. Instead, the method resolves safely:
```js
const result = await prompt.pushScore(scores, options);

if (result.success) {
  console.log('Score submitted successfully!');
} else {
  console.warn('Score submission failed:', result.error);
}
```

Important: Validation errors (e.g., a missing name, type, or value, or a value that does not match its declared type) throw immediately. Network issues will not crash your application.
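Because pushScore() reports transient failures through the result object rather than by throwing, you can layer a simple retry on top of it. The sketch below is not part of the SDK; it assumes only the `{ success, error }` result shape shown above, and the `maxAttempts` parameter is our own.

```js
// Hypothetical retry wrapper (not an SDK feature): re-sends scores when
// pushScore() resolves with { success: false }.
async function pushScoreWithRetry(prompt, scores, options, maxAttempts = 3) {
  let result;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    result = await prompt.pushScore(scores, options);
    if (result.success) return result;
    console.warn(`Score submission attempt ${attempt} failed:`, result.error);
  }
  // All attempts failed; return the last result so callers can inspect it.
  return result;
}
```

Keep the attempt count low: scores are analytics data, and endlessly retrying them is rarely worth delaying your application.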
Best Practices
- Track Every Interaction: Push scores for all prompt uses.
- Use Descriptive Names: Avoid generic metric names.
- Mix Metric Types: Track both success metrics and diagnostics.
- Provide Both Identifiers: Send both sessionId and userId when possible.
- Don’t Block UI: You can optionally call pushScore() without await.
- Validate Values: Ensure the value type matches the declared score type.
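To make the "Validate Values" practice concrete, here is a small client-side checker. It is a hypothetical helper, not part of the LaikaTest SDK; it simply mirrors the three documented score types (int, bool, string) so mismatches are caught before a score object reaches pushScore().

```js
// Hypothetical validation helper (not part of the SDK): checks that each
// score object's value matches its declared type.
const SCORE_TYPES = {
  int: (v) => Number.isInteger(v),
  bool: (v) => typeof v === 'boolean',
  string: (v) => typeof v === 'string',
};

function validateScores(scores) {
  const errors = [];
  for (const score of scores) {
    if (!score.name) {
      errors.push('score is missing a name');
      continue;
    }
    const check = SCORE_TYPES[score.type];
    if (!check) {
      errors.push(`unknown type "${score.type}" for "${score.name}"`);
    } else if (!check(score.value)) {
      errors.push(`value of "${score.name}" does not match type "${score.type}"`);
    }
  }
  return errors; // empty array means all scores are valid
}
```

Running this before submission turns a would-be thrown validation error into a list you can log or fix, which is especially useful when scores are assembled dynamically.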
Next Steps
- Observability — View and analyze experiment metrics
- Experiments — Learn how to design A/B tests & workflows