Scenarios Overview

If you mainly want to use the benchmark, this is the most important section. The scenarios subsystem has two parts:

  • Benchmark scenarios: curated, registered scenarios used for evaluation.
  • Scenario generator: a multi-step pipeline that produces new candidate scenarios.

Most Common Tasks

List available benchmark scenarios

uv run pare scenarios list

Filter scenarios by app usage

uv run pare scenarios list --apps StatefulEmailApp --apps StatefulCalendarApp

Run the benchmark

uv run pare benchmark sweep --split full --observe-model gpt-5 --execute-model gpt-5
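
The flag semantics here are inferred from the flag names rather than documented above: --split full runs the full scenario split, and --observe-model / --execute-model select which LLMs fill the observe and execute roles.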

Generate additional scenarios

uv run pare scenarios generate --num-scenarios 3

Benchmark Scenarios

Core modules:

  • pare/scenarios/scenario.py: PAREScenario base class (see the sketch after this list).
  • pare/scenarios/registration.py: scenario discovery and registration.
  • pare/scenarios/config.py: runner configs.
  • pare/scenarios/benchmark/: benchmark scenario implementations.
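
A minimal sketch of what a custom scenario might look like. PAREScenario and pare/scenarios/scenario.py are real; the subclass name, method names, and signatures below are illustrative assumptions, not the actual API (consult scenario.py and registration.py for the real interfaces).

# Illustrative only: the hook names and signatures are assumptions.
from pare.scenarios.scenario import PAREScenario

class EmailTriageScenario(PAREScenario):  # hypothetical example scenario
    def setup(self):
        # Seed the stateful apps (e.g. inbox contents) before the agent runs.
        ...

    def evaluate(self, trajectory):
        # Score the agent's trajectory against this scenario's goal.
        ...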

CLI entrypoints:

  • pare scenarios list
  • pare benchmark sweep

Scenario Generator

Core modules:

  • pare/scenarios/generator/scenario_generator.py: generator CLI/utilities.
  • pare/scenarios/generator/agent/scenario_generating_agent_orchestrator.py: step orchestration (general shape sketched after this list).
  • pare/scenarios/generator/agent/step_agents.py: step-specific LLM agents.
  • pare/scenarios/scenario_metadata.json: metadata used for listing and uniqueness checks.
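
The orchestrator/step-agent split follows a common pipeline pattern. The sketch below shows only the general shape; the class names and interfaces are assumptions, and the real implementations live in the modules listed above.

# Illustrative pipeline shape; not the repo's actual classes.
class StepAgent:
    """One LLM-backed generation step (e.g. draft, critique, finalize)."""
    def run(self, state: dict) -> dict:
        raise NotImplementedError

class Orchestrator:
    """Feeds each step agent's output into the next."""
    def __init__(self, steps: list[StepAgent]):
        self.steps = steps

    def generate(self, seed: dict) -> dict:
        state = dict(seed)
        for step in self.steps:
            state = step.run(state)  # each step refines the candidate scenario
        return state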

Outputs:

  • Working file: pare/scenarios/default_generation_output/editable_seed_scenario.py
  • Final generated scenarios: pare/scenarios/default_generation_output/
  • Step trajectory snapshots: pare/scenarios/generator/step_trajectory/
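
To inspect the results of a generation run, list the output directories above:

ls pare/scenarios/default_generation_output/
ls pare/scenarios/generator/step_trajectory/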
