Score Evaluation
Track performance metrics for experimental prompts to measure impact and determine winning variants
What is Score Evaluation?
Score Evaluation allows you to push performance metrics after using experimental prompts. This data helps you:
- Compare Variants: Understand which prompt performs better
- Make Data-Driven Decisions: Rely on metrics instead of intuition
- Track Multiple Metrics: Ratings, conversions, response times, or custom values
- Analyze Over Time: See how variants evolve during the experiment
Key Concept: Scores can only be pushed for prompts fetched via getExperimentPrompt(). Regular prompts cannot be scored.
Score Types
LaikaTest supports three score types to cover the most common evaluation scenarios:
int — whole-number metrics such as ratings or timings:

```js
{ name: 'rating', type: 'int', value: 5 }
{ name: 'response_time_ms', type: 'int', value: 342 }
```

bool — true/false outcomes:

```js
{ name: 'helpful', type: 'bool', value: true }
{ name: 'task_completed', type: 'bool', value: false }
```

string — free-form text values:

```js
{ name: 'user_feedback', type: 'string', value: 'Excellent response!' }
{ name: 'sentiment', type: 'string', value: 'positive' }
```

Pushing Scores
After using an experimental prompt, push performance metrics using the SDK:
Basic Usage
```js
import { LaikaTest } from '@laikatest/js-client';

const client = new LaikaTest(process.env.LAIKATEST_API_KEY);

// Fetch experimental prompt
const prompt = await client.getExperimentPrompt('my-experiment', {
  userId: 'user-123'
});

// Use the prompt in your app
const response = await callYourLLM(prompt.getContent());

// Push scores
const result = await prompt.pushScore(
  [
    { name: 'rating', type: 'int', value: 5 },
    { name: 'helpful', type: 'bool', value: true },
    { name: 'feedback', type: 'string', value: 'Great response!' }
  ],
  { sessionId: 'session-abc-123', userId: 'user-123' }
);
```

Required Parameters
- scores: Array of score objects, each with a name, type, and value.
- sessionId or userId: At least one is required. Both are recommended for richer analytics.
Tip: Provide both identifiers when possible to improve tracking accuracy.
How Score Tracking Works
Every prompt returned by getExperimentPrompt() is tied to a specific variant. When you call pushScore() on that prompt, the scores are attributed to the variant that was served, which is how LaikaTest compares variant performance over the course of an experiment.

Common Use Cases
User Satisfaction Tracking
```js
await prompt.pushScore(
  [
    { name: 'user_rating', type: 'int', value: 4 },
    { name: 'satisfied', type: 'bool', value: true },
    { name: 'comment', type: 'string', value: 'Very helpful!' }
  ],
  { userId: 'user-123', sessionId: 'session-456' }
);
```

Conversion Tracking
```js
await prompt.pushScore(
  [
    { name: 'clicked_cta', type: 'bool', value: true },
    { name: 'purchased', type: 'bool', value: true },
    { name: 'cart_value', type: 'int', value: 149 }
  ],
  { userId: 'user-123' }
);
```

Performance Metrics
```js
await prompt.pushScore(
  [
    { name: 'response_time_ms', type: 'int', value: 342 },
    { name: 'tokens_used', type: 'int', value: 1250 },
    { name: 'task_completed', type: 'bool', value: true }
  ],
  { sessionId: 'session-789' }
);
```

Error Handling
Score submission failures do NOT throw exceptions. Instead, the method resolves safely:
```js
const result = await prompt.pushScore(scores, options);

if (result.success) {
  console.log('Score submitted successfully!');
} else {
  console.warn('Score submission failed:', result.error);
}
```

Important: Validation errors (e.g., a missing name, type, or value, or a value that does not match its declared type) throw immediately. Network issues will not crash your application.
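Because pushScore() reports transient failures through the result object rather than by throwing, you can layer a simple retry on top of it. The sketch below is not part of the SDK; it assumes only the `{ success, error }` result shape shown above, and the `maxAttempts` parameter is our own.

```js
// Hypothetical retry wrapper (not an SDK feature): re-sends scores when
// pushScore() resolves with { success: false }.
async function pushScoreWithRetry(prompt, scores, options, maxAttempts = 3) {
  let result;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    result = await prompt.pushScore(scores, options);
    if (result.success) return result;
    console.warn(`Score submission attempt ${attempt} failed:`, result.error);
  }
  // All attempts failed; return the last result so callers can inspect it.
  return result;
}
```

Keep the attempt count low: scores are analytics data, and endlessly retrying them is rarely worth delaying your application.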
Best Practices
- Track Every Interaction: Push scores for all prompt uses.
- Use Descriptive Names: Avoid generic metric names.
- Mix Metric Types: Track both success metrics and diagnostics.
- Provide Both Identifiers: Send both sessionId and userId when possible.
- Don’t Block UI: You can optionally call pushScore() without await.
- Validate Values: Ensure the value type matches the declared score type.
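To make the "Validate Values" practice concrete, here is a small client-side checker. It is a hypothetical helper, not part of the LaikaTest SDK; it simply mirrors the three documented score types (int, bool, string) so mismatches are caught before a score object reaches pushScore().

```js
// Hypothetical validation helper (not part of the SDK): checks that each
// score object's value matches its declared type.
const SCORE_TYPES = {
  int: (v) => Number.isInteger(v),
  bool: (v) => typeof v === 'boolean',
  string: (v) => typeof v === 'string',
};

function validateScores(scores) {
  const errors = [];
  for (const score of scores) {
    if (!score.name) {
      errors.push('score is missing a name');
      continue;
    }
    const check = SCORE_TYPES[score.type];
    if (!check) {
      errors.push(`unknown type "${score.type}" for "${score.name}"`);
    } else if (!check(score.value)) {
      errors.push(`value of "${score.name}" does not match type "${score.type}"`);
    }
  }
  return errors; // empty array means all scores are valid
}
```

Running this before submission turns a would-be thrown validation error into a list you can log or fix, which is especially useful when scores are assembled dynamically.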
Next Steps
- Observability — View and analyze experiment metrics
- Experiments — Learn how to design A/B tests & workflows