Benchmark Scenarios¶
Benchmark scenarios are Python classes registered with @register_scenario(...) and loaded from pare/scenarios/benchmark/.
This is the part of the docs most users need for day-to-day benchmark usage: how to inspect scenarios, choose subsets, and run sweeps.
Where They Live¶
pare/scenarios/benchmark/*.py: benchmark scenario files.pare/scenarios/registration.py: registration/discovery logic.pare/scenarios/scenario.py: base scenario lifecycle and helpers.
How to Inspect and Run¶
List scenarios:
uv run pare scenarios list
Run benchmark sweeps:
uv run pare benchmark sweep --split full --observe-model gpt-5 --execute-model gpt-5
Run custom subset by IDs:
uv run pare benchmark sweep --scenarios scenario_a,scenario_b --observe-model gpt-5 --execute-model gpt-5
Review and Curation¶
- Review guidance:
pare/scenarios/benchmark/scenario_review_guidelines.md - Metadata ledger:
pare/scenarios/scenario_metadata.json - Reviewer assignment workflows can be managed through
scripts/create_review_csvs.py
Runtime Details¶
For execution-specific details beyond scenario authoring, see: