
Variant Comparison

Overview

evalite.each() lets you compare multiple task variants (models, prompts, configs) within a single eval. Use it to:

  • Compare different models on the same dataset
  • A/B test prompt strategies
  • Test different config parameters (temperature, system prompts, etc.)

Basic Usage

import { evalite } from "evalite";
import { openai } from "@ai-sdk/openai";
import { anthropic } from "@ai-sdk/anthropic";
import { generateText } from "ai";
import { Factuality, Levenshtein } from "autoevals";

evalite.each([
  { name: "GPT-4o mini", input: { model: openai("gpt-4o-mini"), temp: 0.7 } },
  { name: "GPT-4o", input: { model: openai("gpt-4o"), temp: 0.7 } },
  {
    name: "Claude Sonnet",
    input: { model: anthropic("claude-3-5-sonnet-latest"), temp: 1.0 },
  },
])("Compare models", {
  data: async () => [
    { input: "What's the capital of France?", expected: "Paris" },
    { input: "What's the capital of Germany?", expected: "Berlin" },
  ],
  task: async (input, variant) => {
    // Each variant supplies its own model instance and temperature.
    const result = await generateText({
      model: variant.model,
      temperature: variant.temp,
      prompt: input,
    });
    return result.text;
  },
  scorers: [Factuality, Levenshtein],
});
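
The same pattern covers the config-parameter case from the overview. The sketch below is a minimal, illustrative example (the variant names, temperature values, and single data row are assumptions, not part of the Evalite API) that keeps the model and prompt fixed and varies only the sampling temperature:

import { evalite } from "evalite";
import { openai } from "@ai-sdk/openai";
import { generateText } from "ai";
import { Factuality } from "autoevals";

evalite.each([
  { name: "temp 0", input: { temp: 0 } },
  { name: "temp 0.7", input: { temp: 0.7 } },
  { name: "temp 1.2", input: { temp: 1.2 } },
])("Temperature sweep", {
  data: async () => [
    { input: "What's the capital of France?", expected: "Paris" },
  ],
  task: async (input, variant) => {
    // Only the temperature changes between variants, so score differences
    // can be attributed to that one setting.
    const result = await generateText({
      model: openai("gpt-4o-mini"),
      temperature: variant.temp,
      prompt: input,
    });
    return result.text;
  },
  scorers: [Factuality],
});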

Example: Prompt Comparison

import { evalite } from "evalite";
import { openai } from "@ai-sdk/openai";
import { generateText } from "ai";
import { Levenshtein } from "autoevals";

evalite.each([
  {
    name: "Direct",
    input: {
      system: "Answer concisely.",
    },
  },
  {
    name: "Chain of Thought",
    input: {
      system: "Think step by step, then answer.",
    },
  },
  {
    name: "Few-Shot",
    input: {
      system: `Examples:
Q: What's 2+2? A: 4
Q: What's 5+3? A: 8
Now answer the question.`,
    },
  },
])("Prompt Strategies", {
  data: async () => [
    { input: "What's 12 * 15?", expected: "180" },
    // ...
  ],
  task: async (input, variant) => {
    // The question is the same for every variant; only the system prompt changes.
    const result = await generateText({
      model: openai("gpt-4o-mini"),
      system: variant.system,
      prompt: input,
    });
    return result.text;
  },
  scorers: [Levenshtein],
});
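
Because the variants are plain objects, you can also define the list once and pass it to evalite.each by reference. The sketch below assumes a hypothetical PromptVariant type (it is not part of Evalite) that mirrors the variant shape used above:

import { evalite } from "evalite";
import { openai } from "@ai-sdk/openai";
import { generateText } from "ai";
import { Levenshtein } from "autoevals";

// Hypothetical helper type mirroring the { name, input } shape used above.
type PromptVariant = { name: string; input: { system: string } };

const promptVariants: PromptVariant[] = [
  { name: "Direct", input: { system: "Answer concisely." } },
  { name: "Chain of Thought", input: { system: "Think step by step, then answer." } },
];

evalite.each(promptVariants)("Prompt Strategies (shared variants)", {
  data: async () => [{ input: "What's 12 * 15?", expected: "180" }],
  task: async (input, variant) => {
    // As in the examples above, the task receives the variant's `input` object.
    const result = await generateText({
      model: openai("gpt-4o-mini"),
      system: variant.system,
      prompt: input,
    });
    return result.text;
  },
  scorers: [Levenshtein],
});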