
Benchmark Scenarios

Benchmark scenarios are Python classes registered with @register_scenario(...) and loaded from pare/scenarios/benchmark/.

This page covers what most users need for day-to-day benchmark work: how to inspect scenarios, choose subsets, and run sweeps.

Where They Live

  • pare/scenarios/benchmark/*.py: benchmark scenario files.
  • pare/scenarios/registration.py: registration/discovery logic.
  • pare/scenarios/scenario.py: base scenario lifecycle and helpers.
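The registration mechanism can be sketched as a class-decorator pattern. This is a minimal illustration only, not pare's actual implementation: the SCENARIO_REGISTRY dict, the Scenario stand-in base class, and the scenario_id argument are all assumptions made for the sketch.

```python
# Minimal sketch of a decorator-based scenario registry, in the style of
# @register_scenario(...). All names here (SCENARIO_REGISTRY, Scenario,
# scenario_id) are illustrative assumptions, not pare's real API.

SCENARIO_REGISTRY: dict[str, type] = {}

def register_scenario(scenario_id: str):
    """Return a class decorator that files the class under scenario_id."""
    def decorator(cls: type) -> type:
        SCENARIO_REGISTRY[scenario_id] = cls
        return cls
    return decorator

class Scenario:
    """Stand-in base class with a trivial lifecycle hook."""
    def run(self) -> str:
        raise NotImplementedError

@register_scenario("scenario_a")
class ScenarioA(Scenario):
    def run(self) -> str:
        return "scenario_a ran"

print(sorted(SCENARIO_REGISTRY))
```

Under this pattern, discovery amounts to importing every module under pare/scenarios/benchmark/ so that each decorator executes and populates the registry.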

How to Inspect and Run

List scenarios:

uv run pare scenarios list

Run benchmark sweeps:

uv run pare benchmark sweep --split full --observe-model gpt-5 --execute-model gpt-5

Run a custom subset by scenario ID:

uv run pare benchmark sweep --scenarios scenario_a,scenario_b --observe-model gpt-5 --execute-model gpt-5
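When sweeping the same subset across several models, a small shell loop keeps the invocations consistent. This sketch uses only the flags shown above; the model list is illustrative, and the commands are collected and printed rather than executed so it can be dry-run safely.

```shell
# Build one sweep command per model; printing instead of executing makes
# this safe to dry-run. Model names are illustrative assumptions.
cmds=""
for model in gpt-5 gpt-5-mini; do
  cmds="${cmds}uv run pare benchmark sweep --scenarios scenario_a,scenario_b --observe-model $model --execute-model $model
"
done
printf '%s' "$cmds"   # inspect first; run each line once it looks right
```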

Review and Curation

  • Review guidance: pare/scenarios/benchmark/scenario_review_guidelines.md
  • Metadata ledger: pare/scenarios/scenario_metadata.json
  • Reviewer assignment: scripts/create_review_csvs.py
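Because the metadata ledger is JSON, it is easy to query from a short script. The sketch below assumes a hypothetical per-scenario record shape (scenario ID mapped to a review_status field); check pare/scenarios/scenario_metadata.json for the real schema before relying on any key names.

```python
import json

# Hypothetical excerpt of a metadata ledger keyed by scenario ID. The
# "review_status" field is an assumption for illustration only; the real
# schema lives in pare/scenarios/scenario_metadata.json.
ledger_text = """
{
  "scenario_a": {"review_status": "approved"},
  "scenario_b": {"review_status": "needs_review"}
}
"""

ledger = json.loads(ledger_text)
needs_review = sorted(
    scenario_id
    for scenario_id, meta in ledger.items()
    if meta.get("review_status") == "needs_review"
)
print(needs_review)
```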

Runtime Details

For execution-specific details beyond scenario authoring, see: