Scenarios Overview¶
If you mainly want to use the benchmark, this is the most important section. The scenarios subsystem has two parts:
- Benchmark scenarios: curated, registered scenarios used for evaluation.
- Scenario generator: a multi-step pipeline that produces new candidate scenarios.
Most Common Tasks¶
List available benchmark scenarios¶
```
uv run pare scenarios list
```
Filter scenarios by app usage¶
```
uv run pare scenarios list --apps StatefulEmailApp --apps StatefulCalendarApp
```
Run the benchmark¶
```
uv run pare benchmark sweep --split full --observe-model gpt-5 --execute-model gpt-5
```
Generate additional scenarios¶
```
uv run pare scenarios generate --num-scenarios 3
```
Benchmark Scenarios¶
Core modules:
- `pare/scenarios/scenario.py`: `PAREScenario` base class.
- `pare/scenarios/registration.py`: scenario discovery and registration.
- `pare/scenarios/config.py`: runner configs.
- `pare/scenarios/benchmark/`: benchmark scenario implementations.
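As a rough illustration of how these pieces fit together, here is a minimal sketch of a scenario subclass plus a registry. The `PAREScenario` stand-in, the `register_scenario` decorator, and the scenario/app names below are all hypothetical simplifications for this example; the real base class and registration mechanism live in the modules above and will differ.

```python
# Hypothetical, simplified stand-ins for the real PAREScenario base class
# and registration helpers in pare/scenarios/ -- illustrative only.
from dataclasses import dataclass

SCENARIO_REGISTRY: dict[str, type] = {}


def register_scenario(cls):
    """Add a scenario class to the registry, keyed by its name."""
    SCENARIO_REGISTRY[cls.name] = cls
    return cls


@dataclass
class PAREScenario:
    """Minimal sketch of a scenario: a name plus the apps it exercises."""
    name: str = "base"
    apps: tuple[str, ...] = ()


@register_scenario
@dataclass
class EmailTriageScenario(PAREScenario):
    name: str = "email_triage"
    apps: tuple[str, ...] = ("StatefulEmailApp",)


# Filtering the registry by app usage, mirroring `pare scenarios list --apps`:
matches = [n for n, c in SCENARIO_REGISTRY.items() if "StatefulEmailApp" in c.apps]
```

A registry keyed by name is one common way to make `pare scenarios list` and app-based filtering cheap; whether the real code uses a decorator or module scanning for discovery, the lookup shape is similar.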
CLI entrypoints:
- `pare scenarios list`
- `pare benchmark sweep`
Execution/runtime details:
Scenario Generator¶
Core modules:
- `pare/scenarios/generator/scenario_generator.py`: generator CLI/utilities.
- `pare/scenarios/generator/agent/scenario_generating_agent_orchestrator.py`: step orchestration.
- `pare/scenarios/generator/agent/step_agents.py`: step-specific LLM agents.
- `pare/scenarios/scenario_metadata.json`: metadata used for listing and uniqueness checks.
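The metadata file supports uniqueness checks when new scenarios are generated. A minimal sketch of such a check, assuming a hypothetical metadata layout (a JSON list of entries with a `name` field; the actual schema of `pare/scenarios/scenario_metadata.json` may differ):

```python
import json
from pathlib import Path


def is_unique(candidate_name: str, metadata_path: Path) -> bool:
    """Return True if no existing scenario already uses candidate_name.

    Assumes the metadata file is a JSON list of {"name": ...} entries;
    the real schema may differ.
    """
    entries = json.loads(metadata_path.read_text())
    return all(entry["name"] != candidate_name for entry in entries)
```

A check like this lets the generator reject candidates that duplicate an existing scenario before spending LLM calls on the remaining pipeline steps.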
Outputs:
- Working file: `pare/scenarios/default_generation_output/editable_seed_scenario.py`
- Final generated scenarios: `pare/scenarios/default_generation_output/`
- Step trajectory snapshots: `pare/scenarios/generator/step_trajectory/`
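The shape of the pipeline, with a snapshot written per step, can be sketched as follows. This is a hypothetical simplification: the step functions, the `(name, fn)` pair format, and the JSON snapshot naming are illustrative assumptions, not the orchestrator's actual interface.

```python
import json
from pathlib import Path


def run_pipeline(steps, seed: dict, trajectory_dir: Path) -> dict:
    """Run step functions in order, snapshotting each step's output.

    Hypothetical sketch of a multi-step generator: `steps` is a list of
    (name, fn) pairs, each fn transforming the working scenario dict.
    Each intermediate state is written to trajectory_dir as JSON, so a
    failed or low-quality run can be inspected step by step.
    """
    trajectory_dir.mkdir(parents=True, exist_ok=True)
    state = seed
    for i, (name, step) in enumerate(steps):
        state = step(state)
        (trajectory_dir / f"{i:02d}_{name}.json").write_text(json.dumps(state))
    return state
```

Persisting per-step snapshots is what makes a directory like `step_trajectory/` useful: each file records the working scenario as it left one step and entered the next.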
See: