Score Evaluation

Track performance metrics for experimental prompts to measure impact and determine winning variants

What is Score Evaluation?

Score Evaluation allows you to push performance metrics after using experimental prompts. This data helps you:

  • Compare Variants: Understand which prompt performs better
  • Make Data-Driven Decisions: Rely on metrics instead of intuition
  • Track Multiple Metrics: Ratings, conversions, response times, or custom values
  • Analyze Over Time: See how variants evolve during the experiment

Key Concept: Scores can only be pushed for prompts fetched via getExperimentPrompt() . Regular prompts cannot be scored.

Score Types

LaikaTest supports three score types to cover the most common evaluation scenarios:

Integer (`int`)
Numeric metrics (ratings, counts, durations).
TypeScript
{ name: 'rating', type: 'int', value: 5 }{ name: 'response_time_ms', type: 'int', value: 342 }
Boolean (`bool`)
Yes/No outcomes (success, helpfulness).
TypeScript
{ name: 'helpful', type: 'bool', value: true }{ name: 'task_completed', type: 'bool', value: false }
String (`string`)
Free-form feedback or categories.
TypeScript
{ name: 'user_feedback', type: 'string', value: 'Excellent response!' }{ name: 'sentiment', type: 'string', value: 'positive' }

Pushing Scores

After using an experimental prompt, push performance metrics using the SDK:

Basic Usage

TypeScript
import { LaikaTest } from '@laikatest/js-client';const client = new LaikaTest(process.env.LAIKATEST_API_KEY);// Fetch experimental promptconst prompt = await client.getExperimentPrompt('my-experiment', {  userId: 'user-123'});// Use the prompt in your appconst response = await callYourLLM(prompt.getContent());// Push scoresconst result = await prompt.pushScore(  [    { name: 'rating', type: 'int', value: 5 },    { name: 'helpful', type: 'bool', value: true },    { name: 'feedback', type: 'string', value: 'Great response!' }  ],  {    sessionId: 'session-abc-123',    userId: 'user-123'  });

Required Parameters

  • scores: Array of score objects with name, type, and value.
  • sessionId or userId:At least one is required. Both are recommended for richer analytics.

Tip: Provide both identifiers when possible to improve tracking accuracy.

How Score Tracking Works

1
Assignment: User receives a prompt via getExperimentPrompt().
2
Metadata: The prompt carries experiment & variant metadata internally.
3
Usage: Your application uses the prompt output.
4
Score Submission: You call pushScore().
5
Dashboard Analytics: Scores appear in experiment reporting.

Common Use Cases

User Satisfaction Tracking

TypeScript
await prompt.pushScore(  [    { name: 'user_rating', type: 'int', value: 4 },    { name: 'satisfied', type: 'bool', value: true },    { name: 'comment', type: 'string', value: 'Very helpful!' }  ],  { userId: 'user-123', sessionId: 'session-456' });

Conversion Tracking

TypeScript
await prompt.pushScore(  [    { name: 'clicked_cta', type: 'bool', value: true },    { name: 'purchased', type: 'bool', value: true },    { name: 'cart_value', type: 'int', value: 149 }  ],  { userId: 'user-123' });

Performance Metrics

TypeScript
await prompt.pushScore(  [    { name: 'response_time_ms', type: 'int', value: 342 },    { name: 'tokens_used', type: 'int', value: 1250 },    { name: 'task_completed', type: 'bool', value: true }  ],  { sessionId: 'session-789' });

Error Handling

Score submission failures do NOT throw exceptions. Instead, the method resolves safely:

TypeScript
const result = await prompt.pushScore(scores, options);if (result.success) {  console.log('Score submitted successfully!');} else {  console.warn('Score submission failed:', result.error);}

Important:Validation errors (e.g., missing name/type/value or wrong type) will throw immediately. Network issues will not crash your application.

Best Practices

  • Track Every Interaction: Push scores for all prompt uses.
  • Use Descriptive Names: Avoid generic metric names.
  • Mix Metric Types: Track both success metrics and diagnostics.
  • Provide Both Identifiers: Send both sessionId and userId when possible.
  • Don’t Block UI: You can optionally call pushScore() without await.
  • Validate Values: Ensure the value type matches the declared score type.

Next Steps