# CI/CD
Evalite integrates seamlessly into CI/CD pipelines, allowing you to validate LLM-powered features as part of your automated testing workflow.
## Static UI Export
Export eval results as a static HTML bundle for viewing in CI artifacts without running a live server.
### Basic Usage

```bash
evalite export
```

Exports the latest full run to the `./evalite-export` directory.
### Options

Custom output directory:

```bash
evalite export --output=./my-export
```

Export a specific run:

```bash
evalite export --run-id=123
```
### Export Structure

The generated bundle contains:

- `index.html` - Standalone UI (works without a server)
- `data/*.json` - Pre-computed API responses
- `files/*` - Images, audio, etc. from eval results
- `assets/*` - UI JavaScript/CSS
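Before uploading the bundle as a CI artifact, it can be worth sanity-checking it. A minimal Node sketch, assuming the layout above; the check itself is illustrative, not part of Evalite:

```ts
import { existsSync, readdirSync } from "node:fs";
import { join } from "node:path";

const exportDir = "./evalite-export";

// The standalone UI entry point should always be present.
if (!existsSync(join(exportDir, "index.html"))) {
  throw new Error("Export is missing index.html");
}

// Count the pre-computed API responses.
const dataFiles = readdirSync(join(exportDir, "data")).filter((file) =>
  file.endsWith(".json"),
);
console.log(`Found ${dataFiles.length} pre-computed data file(s).`);
```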
### Viewing Exports

Local preview:

```bash
npx serve -s ./evalite-export
```

Static hosting: upload the bundle to artifact.ci, S3, GitHub Pages, etc.
### CI Integration Example

A GitHub Actions workflow that runs evals and exports the UI to artifacts:

```yaml
name: Run Evals
on: [push, pull_request]

jobs:
  evals:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-node@v3
        with:
          node-version: "22"
      - run: npm install
      - name: Run evaluations
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: npx evalite --threshold=70
      - name: Export UI
        run: npx evalite export --output=./ui-export
      - name: Upload static UI
        uses: actions/upload-artifact@v3
        with:
          name: evalite-ui
          path: ui-export
```

View the results by downloading the artifact and running `npx serve -s ./ui-export`.
## Running on CI

Run Evalite in run-once mode (the default):

```bash
evalite
```

This executes all evals once and exits.
## Score Thresholds

Fail the CI build if scores fall below a threshold:

```bash
evalite --threshold=70
```

The process exits with code 1 if the average score is below 70.
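If you need the gate inside a larger script rather than as a bare CI step, you can branch on that exit code. A hedged sketch: only the exit-code contract comes from Evalite; the wrapper script is hypothetical:

```ts
import { spawnSync } from "node:child_process";

// Run the evals with a score threshold; inherit stdio so Evalite's
// usual output still appears in the CI log.
const result = spawnSync("npx", ["evalite", "--threshold=70"], {
  stdio: "inherit",
});

if (result.status !== 0) {
  // Non-zero exit: the average score fell below 70 (or the run failed).
  console.error("Eval score gate failed - blocking the next step.");
  process.exit(result.status ?? 1);
}

console.log("Eval score gate passed.");
```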
## JSON Export

For programmatic analysis, export the raw results as JSON:

```bash
evalite --outputPath=./results.json
```
### Export Format

The output follows a typed hierarchical structure:

```ts
import type { Evalite } from "evalite";

type Output = Evalite.Exported.Output;
```
It contains:

- `run`: Metadata (id, runType, createdAt)
- `evals`: Array of evaluations with:
  - Basic info (name, filepath, duration, status, averageScore)
  - `results`: Individual test results with:
    - Test data (input, output, expected)
    - `scores`: Scorer results
    - `traces`: LLM call traces
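A minimal sketch of reading that file from a Node script. Field access follows the structure listed above; treat it as an assumption and check it against the `Evalite.Exported.Output` type shipped with your version:

```ts
import { readFileSync } from "node:fs";
import type { Evalite } from "evalite";

// Parse the file produced by `evalite --outputPath=./results.json`.
const output: Evalite.Exported.Output = JSON.parse(
  readFileSync("./results.json", "utf-8"),
);

console.log(`Run ${output.run.id} (${output.run.runType})`);

for (const evaluation of output.evals) {
  console.log(
    `${evaluation.name}: ${evaluation.results.length} results, ` +
      `average score ${evaluation.averageScore}`,
  );
}
```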
### Use Cases

- Analytics: Import results into dashboards for performance tracking
- Archiving: Store historical results for comparison (see the sketch below)
- Custom tooling: Build scripts around eval data
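For example, archived results can back a simple regression check between runs. A sketch under the same shape assumptions as above; the baseline path is hypothetical:

```ts
import { readFileSync } from "node:fs";
import type { Evalite } from "evalite";

const load = (path: string): Evalite.Exported.Output =>
  JSON.parse(readFileSync(path, "utf-8"));

// A baseline archived from an earlier run, next to the current results.
const baseline = load("./archive/baseline-results.json");
const current = load("./results.json");

// Flag any eval whose average score dropped against the baseline.
for (const evaluation of current.evals) {
  const previous = baseline.evals.find((e) => e.name === evaluation.name);
  if (!previous) continue;

  const delta = evaluation.averageScore - previous.averageScore;
  if (delta < 0) {
    console.warn(`Regression in ${evaluation.name}: ${delta.toFixed(2)}`);
  }
}
```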