Scenario Summary Generation and Deduplication¶
This document describes the scenario summary generation system, which creates natural language summaries of scenario code and validates them against existing summaries to prevent duplicates.
Overview¶
The summary generation system consists of three main components:
- SummaryGeneratingAgent: An LLM-based agent that generates concise summaries of scenario code
- generate_scenario_summaries.py: A batch script to generate summaries for multiple scenario files
- validate_and_add_scenario_summary.py: A validation script that checks new summaries against existing ones using similarity metrics
SummaryGeneratingAgent¶
The SummaryGeneratingAgent is a specialized agent that analyzes scenario Python code and generates human-readable summaries.
Location¶
pare/scenario_generator/agent/summary_generating_agent.py
Features¶
- Automatic Summary Generation: Uses LLM to analyze scenario code and create concise summaries
- Scenario ID Extraction: Automatically extracts scenario IDs from `@register_scenario` decorators
- Output Cleaning: Removes markdown formatting and common prefixes from LLM output
- Error Handling: Gracefully handles failures and returns `None` when generation fails
API¶
__init__(llm_engine: LLMEngine)¶
Initialize the summary generating agent.
Parameters:
- llm_engine: The LLM engine to use for summary generation
generate_summary(scenario_code: str) -> str | None¶
Generate a summary for the given scenario code.
Parameters:
- scenario_code: The scenario Python code as a string
Returns:
- The generated summary text, or None if generation failed
Example:
```python
from are.simulation.agents.llm.llm_engine_builder import LLMEngineBuilder
from are.simulation.agents.are_simulation_agent_config import LLMEngineConfig
from pare.scenario_generator.agent.summary_generating_agent import SummaryGeneratingAgent

config = LLMEngineConfig(model_name="gpt-4o-mini", provider="openai")
engine = LLMEngineBuilder().create_engine(engine_config=config)
agent = SummaryGeneratingAgent(engine)

scenario_code = """
@register_scenario("meeting_invite_coordination")
class MeetingInviteCoordination(Scenario):
    # ... scenario code ...
"""

summary = agent.generate_summary(scenario_code)
```
generate_summary_from_file(file_path: Path | str) -> tuple[str | None, str | None]¶
Generate a summary for a scenario file and extract its scenario ID.
Parameters:
- file_path: Path to the scenario Python file
Returns:
- Tuple of (scenario_id, summary). Returns (None, None) if extraction/generation fails.
Example:
```python
scenario_id, summary = agent.generate_summary_from_file("path/to/scenario.py")
if scenario_id and summary:
    print(f"Scenario: {scenario_id}")
    print(f"Summary: {summary}")
```
Summary Format¶
Summaries are 2-4 sentences that describe:
- The primary objective/goal of the scenario
- The applications used
- The main workflow and interaction patterns
- Key validation criteria
generate_scenario_summaries.py¶
A command-line script for batch generating summaries for scenario files.
Location¶
pare/scenario_generator/utils/generate_scenario_summaries.py
Usage¶
Generate Summary for a Single File¶
```shell
uv run python pare/scenario_generator/utils/generate_scenario_summaries.py \
    --file pare/scenarios/generated_scenarios/meeting_invite_coordination.py
```
Generate Summaries for All Scenarios¶
```shell
uv run python pare/scenario_generator/utils/generate_scenario_summaries.py --all
```
This recursively searches all subdirectories in generated_scenarios/ for Python files.
Force Regeneration of Existing Summaries¶
```shell
uv run python pare/scenario_generator/utils/generate_scenario_summaries.py --all --force
```
Custom LLM Configuration¶
```shell
uv run python pare/scenario_generator/utils/generate_scenario_summaries.py \
    --file scenario.py \
    --model gpt-4 \
    --provider openai
```
Custom Output File¶
```shell
uv run python pare/scenario_generator/utils/generate_scenario_summaries.py \
    --all \
    --output custom_summaries.json
```
Command-Line Options¶
| Option | Description | Default |
|---|---|---|
| `--file FILE_PATH` | Path to a single scenario file | None |
| `--all` | Generate summaries for all scenario files | False |
| `--force` | Force regeneration of existing summaries | False |
| `--model MODEL` | LLM model to use | `gpt-4o-mini` |
| `--provider PROVIDER` | LLM provider | `openai` |
| `--endpoint ENDPOINT` | Optional endpoint URL | None |
| `--output OUTPUT_FILE` | Path to output JSON file | `generated_scenarios/scenario_summaries.json` |
Output Format¶
The script saves summaries to a JSON file with the following structure:
```json
{
  "scenario_id_1": "Summary text for scenario 1...",
  "scenario_id_2": "Summary text for scenario 2...",
  ...
}
```
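Downstream code can read this mapping with the standard library. A minimal sketch, using an illustrative inline payload in place of the real `scenario_summaries.json`:

```python
import json

# Illustrative payload mirroring the documented {scenario_id: summary} structure
raw = """
{
  "meeting_invite_coordination": "Summary text for scenario 1...",
  "weekend_grocery_pickup": "Summary text for scenario 2..."
}
"""

summaries: dict[str, str] = json.loads(raw)
known_ids = set(summaries)  # scenario ids available for duplicate checks
```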
validate_and_add_scenario_summary.py¶
A validation script that generates a summary for a scenario file, compares it against existing summaries using similarity metrics, and adds it to the JSON file only if it passes all threshold checks.
Location¶
pare/scenario_generator/utils/validate_and_add_scenario_summary.py
Usage¶
Basic Validation and Addition¶
```shell
uv run python pare/scenario_generator/utils/validate_and_add_scenario_summary.py \
    --file pare/scenarios/generated_scenarios/new_scenario.py
```
Custom Similarity Thresholds¶
```shell
uv run python pare/scenario_generator/utils/validate_and_add_scenario_summary.py \
    --file scenario.py \
    --difflib-threshold 0.75 \
    --jaccard-threshold 0.75 \
    --cosine-threshold 0.90
```
Using in Scripts¶
The script prints "True" to stdout if validation passes and the summary is added, or "False" if validation fails:
```shell
if uv run python pare/scenario_generator/utils/validate_and_add_scenario_summary.py \
    --file scenario.py; then
    echo "Summary added successfully!"
else
    echo "Summary too similar to existing ones"
fi
```
Command-Line Options¶
| Option | Description | Default |
|---|---|---|
| `--file FILE_PATH` | Path to the scenario file (required) | None |
| `--difflib-threshold FLOAT` | Threshold for `difflib_ratio` | 0.8 |
| `--jaccard-threshold FLOAT` | Threshold for `jaccard_shingles` | 0.8 |
| `--cosine-threshold FLOAT` | Threshold for `cosine_tokens` | 0.94 |
| `--k INT` | Shingle size for Jaccard similarity | 3 |
| `--model MODEL` | LLM model to use | `gpt-4o-mini` |
| `--provider PROVIDER` | LLM provider | `openai` |
| `--endpoint ENDPOINT` | Optional endpoint URL | None |
| `--output OUTPUT_FILE` | Path to output JSON file | `generated_scenarios/scenario_summaries.json` |
Similarity Metrics¶
The script uses three similarity metrics to compare summaries:
1. difflib_ratio¶
Description: Structural/sequential similarity using the longest matching subsequence algorithm.
Threshold: Default 0.8
Interpretation: Measures how similar the overall structure and sequence of words are between two summaries. Higher values indicate more structural similarity.
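This metric corresponds to the standard library's `difflib.SequenceMatcher`; a minimal sketch (the wrapper function and sample strings are illustrative, not the script's actual code):

```python
import difflib

def difflib_ratio(a: str, b: str) -> float:
    """Sequential similarity in [0, 1]: 2 * matched_chars / total_chars."""
    return difflib.SequenceMatcher(None, a, b).ratio()

same = difflib_ratio("book a table for two", "book a table for two")
shuffled = difflib_ratio("book a table for two", "two for table a book")
# identical strings score 1.0; reordering the words lowers the score
```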
2. jaccard_shingles¶
Description: Pattern similarity based on overlapping k-gram token shingles (default k=3).
Threshold: Default 0.8
Interpretation: Measures the overlap of token patterns between summaries. More robust to minor edits than difflib. Higher values indicate more pattern overlap.
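A plausible implementation of k-gram shingle Jaccard similarity, sketched under the assumption of whitespace tokenization (the helper below is illustrative, not the script's actual code):

```python
def jaccard_shingles(a: str, b: str, k: int = 3) -> float:
    """Jaccard overlap between the sets of contiguous k-token shingles."""
    def shingles(text: str) -> set[tuple[str, ...]]:
        tokens = text.lower().split()
        if len(tokens) < k:
            return {tuple(tokens)}  # fall back to a single short shingle
        return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}

    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

exact = jaccard_shingles("send the meeting invite today", "send the meeting invite today")
# changing one word invalidates only the shingles containing it
edited = jaccard_shingles("send the meeting invite today", "send the meeting invite tomorrow")
```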
3. cosine_tokens¶
Description: Vocabulary similarity based on token frequency (bag-of-words approach).
Threshold: Default 0.94
Interpretation: Measures how similar the vocabulary usage is between summaries. Uses a higher threshold because summaries naturally share common words. Higher values indicate more vocabulary overlap.
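A bag-of-words cosine can be sketched with `collections.Counter` (again an illustrative implementation, assuming whitespace tokenization):

```python
import math
from collections import Counter

def cosine_tokens(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words token frequency vectors."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# word order is ignored: same vocabulary scores ~1.0, disjoint vocabulary 0.0
reordered = cosine_tokens("pickup grocery weekend", "weekend grocery pickup")
disjoint = cosine_tokens("send email", "book flight")
```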
Validation Logic¶
For a new summary to be accepted:
- Summary Generation: The script generates a summary for the input scenario file
- Comparison: The new summary is compared against ALL existing summaries in the JSON file
- Threshold Check: For each existing summary, all three metrics must be below their respective thresholds:
  - `difflib_ratio < difflib_threshold`
  - `jaccard_shingles < jaccard_threshold`
  - `cosine_tokens < cosine_threshold`
- Addition: If all comparisons pass, the summary is added to the JSON file
- Return Value: Returns `True` if added, `False` if validation failed
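The acceptance rule above can be sketched as follows (the function, metric table, and sample summaries are illustrative, not the script's actual code):

```python
import difflib

def passes_dedup_check(new_summary: str, existing: dict[str, str],
                       metrics: dict, thresholds: dict) -> bool:
    """Accept only if every metric stays below its threshold against every existing summary."""
    for scenario_id, summary in existing.items():
        for name, metric in metrics.items():
            if metric(new_summary, summary) >= thresholds[name]:
                return False  # violation against this scenario_id on this metric
    return True

# Toy setup with a single metric and one existing summary
metrics = {"difflib_ratio": lambda a, b: difflib.SequenceMatcher(None, a, b).ratio()}
thresholds = {"difflib_ratio": 0.8}
existing = {"meeting_invite_coordination":
            "The user coordinates a meeting invite across two calendars."}

novel = passes_dedup_check(
    "The user schedules a weekend grocery pickup order.", existing, metrics, thresholds)
duplicate = passes_dedup_check(
    "The user coordinates a meeting invite across two calendars.", existing, metrics, thresholds)
```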
Output¶
The script provides detailed logging:
- For each comparison: Logs the scenario ID and all three metric values
- For violations: Logs a warning with the scenario ID and which thresholds were exceeded
- Final result: Prints "True" or "False" to stdout
Example Output:
```
INFO:__main__:Generating summary for scenario.py...
INFO:__main__:Generated summary for scenario 'new_scenario': Summary text...
INFO:__main__:Validating summary against existing summaries...
INFO:__main__:Comparing with 'meeting_invite_coordination': difflib_ratio=0.1234, jaccard_shingles=0.2345, cosine_tokens=0.3456
INFO:__main__:Comparing with 'weekend_grocery_pickup': difflib_ratio=0.4567, jaccard_shingles=0.5678, cosine_tokens=0.6789
WARNING:__main__: ⚠️ VIOLATION: Summary too similar to existing scenario 'weekend_grocery_pickup' (exceeds thresholds: difflib≥0.8, jaccard≥0.8, cosine≥0.94)
False
```
Integration with Scenario Generation¶
The summary system is integrated into the scenario generation workflow:
- During Generation: The `SeedScenarioGeneratingAgent` uses summaries from `scenario_summaries.json` to provide context when detecting duplicate scenarios
- Similarity Detection: When a generated scenario is flagged as too similar to an existing one, the summary of the similar scenario is included in the error message to help the LLM understand what needs to be changed
File Structure¶
```
pare/
├── scenario_generator/
│   ├── agent/
│   │   └── summary_generating_agent.py         # SummaryGeneratingAgent class
│   ├── prompt/
│   │   └── summary_generator_prompts.py        # Prompt templates
│   └── utils/
│       ├── generate_scenario_summaries.py      # Batch generation script
│       └── validate_and_add_scenario_summary.py  # Validation script
└── scenarios/
    └── generated_scenarios/
        └── scenario_summaries.json             # Summary storage (generated)
```
Best Practices¶
- Initial Setup: Run `generate_scenario_summaries.py --all` to create initial summaries for all existing scenarios
- New Scenarios: Use `validate_and_add_scenario_summary.py` when adding new scenarios to ensure they're unique
- Threshold Tuning: Adjust thresholds based on your needs:
  - Lower thresholds (0.7-0.75) for stricter deduplication
  - Higher thresholds (0.85-0.9) for more lenient deduplication
- Regular Updates: Periodically regenerate summaries with `--force` to keep them up to date with code changes
Error Handling¶
All components include comprehensive error handling:
- File Not Found: Scripts check for file existence and provide clear error messages
- LLM Failures: Summary generation failures are logged and return `None`
- JSON Errors: Invalid JSON files are handled gracefully with a fallback to an empty dictionary
- Missing Scenario IDs: If scenario ID extraction fails, the filename is used as a fallback
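The JSON fallback behavior can be sketched as follows (the function name and path are illustrative, not the scripts' actual code):

```python
import json
from pathlib import Path

def load_summaries(path: Path) -> dict[str, str]:
    """Load {scenario_id: summary}, falling back to {} on a missing or invalid file."""
    try:
        return json.loads(path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return {}

# A missing file yields an empty mapping instead of raising
missing = load_summaries(Path("no_such_summaries.json"))
```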
Dependencies¶
- `are.simulation.agents.llm.llm_engine`: LLM engine for summary generation
- `are.simulation.agents.are_simulation_agent_config`: LLM configuration
- Standard library: `json`, `pathlib`, `logging`, `difflib`, `re`, `collections`