Variant Comparison
Overview
evalite.each() enables comparing multiple task variants (models, prompts, configs) within a single eval. This lets you:
- Compare different models on the same dataset
- A/B test prompt strategies
- Test different config parameters (temperature, system prompts, etc.)
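At a glance, the call has two parts: `evalite.each()` takes the list of variants, and the function it returns takes the same name-and-options arguments as a regular `evalite()` call. The skeleton below is a minimal sketch (the variant names and fields are placeholders, not part of the API); the second argument to `task` is the `input` object of whichever variant is currently running.

```ts
import { evalite } from "evalite";
import { Levenshtein } from "autoevals";

evalite.each([
  // Each variant has a display name and an `input` object of variant-specific config.
  { name: "Variant A", input: { setting: "a" } }, // placeholder fields for illustration
  { name: "Variant B", input: { setting: "b" } },
])("My eval", {
  data: async () => [{ input: "some question", expected: "some answer" }],
  task: async (input, variant) => {
    // `variant` is the `input` object of the variant being run (here: { setting: ... }).
    return `answered using ${variant.setting}`;
  },
  scorers: [Levenshtein],
});
```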
Basic Usage
```ts
import { evalite } from "evalite";
import { openai } from "@ai-sdk/openai";
import { anthropic } from "@ai-sdk/anthropic";
import { generateText } from "ai";
import { Factuality, Levenshtein } from "autoevals";

evalite.each([
  { name: "GPT-4o mini", input: { model: "gpt-4o-mini", temp: 0.7 } },
  { name: "GPT-4o", input: { model: "gpt-4o", temp: 0.7 } },
  { name: "Claude Sonnet", input: { model: "claude-3-5-sonnet-latest", temp: 1.0 } },
])("Compare models", {
  data: async () => [
    { input: "What's the capital of France?", expected: "Paris" },
    { input: "What's the capital of Germany?", expected: "Berlin" },
  ],
  task: async (input, variant) => {
    // Route Claude model ids to the Anthropic provider, everything else to OpenAI.
    const model = variant.model.startsWith("claude")
      ? anthropic(variant.model)
      : openai(variant.model);
    const result = await generateText({
      model,
      temperature: variant.temp,
      prompt: input,
    });
    // Return plain text so string-based scorers like Levenshtein have a string to compare.
    return result.text;
  },
  scorers: [Factuality, Levenshtein],
});
```

Example: Prompt Comparison
```ts
evalite.each([
  {
    name: "Direct",
    input: {
      system: "Answer concisely.",
    },
  },
  {
    name: "Chain of Thought",
    input: {
      system: "Think step by step, then answer.",
    },
  },
  {
    name: "Few-Shot",
    input: {
      system: `Examples:
Q: What's 2+2? A: 4
Q: What's 5+3? A: 8
Now answer the question.`,
    },
  },
])("Prompt Strategies", {
  data: async () => [
    { input: "What's 12 * 15?", expected: "180" },
    // ...
  ],
  task: async (input, variant) => {
    const result = await generateText({
      model: openai("gpt-4o-mini"),
      system: variant.system,
      prompt: input,
    });
    // Return the generated text so Levenshtein can compare it to `expected`.
    return result.text;
  },
  scorers: [Levenshtein],
});
```
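Example: Config Comparison

The same pattern covers the third use case from the overview, config parameters. The sketch below is illustrative rather than taken verbatim from the docs: it holds the model and prompt fixed and varies only the sampling temperature.

```ts
import { evalite } from "evalite";
import { openai } from "@ai-sdk/openai";
import { generateText } from "ai";
import { Levenshtein } from "autoevals";

// Hypothetical temperature sweep: same dataset and model, different temperature per variant.
evalite.each([
  { name: "Temperature 0", input: { temp: 0 } },
  { name: "Temperature 0.7", input: { temp: 0.7 } },
  { name: "Temperature 1.2", input: { temp: 1.2 } },
])("Temperature Sweep", {
  data: async () => [
    { input: "What's the capital of France?", expected: "Paris" },
  ],
  task: async (input, variant) => {
    const result = await generateText({
      model: openai("gpt-4o-mini"),
      temperature: variant.temp,
      prompt: input,
    });
    return result.text;
  },
  scorers: [Levenshtein],
});
```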