March 2, 2026
🎯 📋 Batch Scoring in PyRIT: Scale Your AI Safety Evaluations (Python Practice)

When red-teaming at scale, manually scoring hundreds or thousands of model responses isn't just impractical; it's a bottleneck that undermines rigorous safety validation. PyRIT's BatchScorer solves this by enabling automated, consistent, and auditable evaluation across entire conversation datasets, transforming ad-hoc spot checks into systematic, reproducible safety audits suitable for enterprise-grade AI alignment work.

🧠 Core Concept
- Scalable Evaluation Pipeline: Process dozens to thousands of prompt-response pairs in a single async operation—critical for regression testing, model comparisons, and adversarial dataset validation.
- Scorer Agnosticism: Swap scorers seamlessly (SelfAskCategoryScorer, AzureContentFilterScorer, HumanInTheLoopScorerGradio) without changing orchestration logic—enabling A/B testing of evaluation strategies.
- Memory-Integrated Workflow: Leverages PyRIT's CentralMemory to fetch prompts by ID, ensuring scores are traceable to original context, conversation threads, and attack metadata.

💻 Implementation Example
from pyrit.executor.attack import AttackExecutor, PromptSendingAttack
from pyrit.memory import CentralMemory
from pyrit.prompt_target import TextTarget
from pyrit.setup import IN_MEMORY, initialize_pyrit_async

await initialize_pyrit_async(memory_db_type=IN_MEMORY)

# First insert the prompts into the database (this is often automatic)
prompts_to_score = ["I'm sorry, I can't do that.", "nothing wrong with this."]
target = TextTarget()
attack = PromptSendingAttack(objective_target=target)
results = await AttackExecutor().execute_attack_async(
    attack=attack,
    objectives=prompts_to_score,
)

# Collect the prompt IDs of the messages that were sent
conversation_ids = [result.conversation_id for result in results]

memory = CentralMemory.get_memory_instance()
prompt_ids = []
for conversation_id in conversation_ids:
    for piece in memory.get_message_pieces(conversation_id=conversation_id):
        prompt_ids.append(piece.id)

# Configure your scorer - interchangeable with other PyRIT scorers
from pyrit.prompt_target import OpenAIChatTarget
from pyrit.score import (
    AzureContentFilterScorer,
    BatchScorer,
    ContentClassifierPaths,
    HumanInTheLoopScorerGradio,
    SelfAskCategoryScorer,
)

# scorer = AzureContentFilterScorer()
# scorer = HumanInTheLoopScorerGradio()
scorer = SelfAskCategoryScorer(
    chat_target=OpenAIChatTarget(),
    content_classifier_path=ContentClassifierPaths.HARMFUL_CONTENT_CLASSIFIER.value,
)

batch_scorer = BatchScorer()
scores = await batch_scorer.score_responses_by_filters_async(
    scorer=scorer, prompt_ids=prompt_ids
)

# Review results with context
for score in scores:
    prompt_text = memory.get_message_pieces(prompt_ids=[str(score.message_piece_id)])[0].original_value
    print(f"{score} : {prompt_text}")
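Once the batch scores come back, a common next step is rolling them up into campaign-level metrics. The sketch below shows that aggregation with a hypothetical ScoreResult stand-in rather than PyRIT's own Score class (the real objects carry comparable category/value fields); the per-category flag rate is what you would track across runs.

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical stand-in for a batch score result; PyRIT's Score object
# exposes comparable category and value information.
@dataclass
class ScoreResult:
    category: str       # classifier category, e.g. "harmful_content"
    is_positive: bool   # True if the scorer flagged the response


def summarize(scores: list[ScoreResult]) -> dict[str, float]:
    """Compute the flag rate per category across a batch of scores."""
    totals: Counter = Counter()
    flagged: Counter = Counter()
    for s in scores:
        totals[s.category] += 1
        if s.is_positive:
            flagged[s.category] += 1
    return {cat: flagged[cat] / totals[cat] for cat in totals}


# Example: three responses scored for harmful content, one flagged
batch = [
    ScoreResult("harmful_content", False),
    ScoreResult("harmful_content", True),
    ScoreResult("harmful_content", False),
]
print(summarize(batch))  # {'harmful_content': 0.3333333333333333}
```

In a regression pipeline, comparing these per-category rates between a baseline and a candidate model version gives a simple pass/fail gate.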
🔥 Use Cases
- Large-Scale Red Team Campaigns: Automatically score outputs from thousands of adversarial prompts to identify model drift, guardrail gaps, or over-refusal patterns.
- Regression Testing Pipelines: Integrate batch scoring into CI/CD to catch safety regressions before model deployment, comparing scores across versions or configurations.

⚠️ Caveats & Responsible Practice
- Memory Dependency: Batch scoring relies on prompts being persisted in CentralMemory. Ensure your workflow inserts messages before scoring, or use set_response_not_in_database() cautiously for demos.

🔗 Resources
- Documentation

#PyRIT #AISecurity #BatchScoring #LLMRedTeaming #ResponsibleAI #ModelEvaluation #AzureAI #SafetyAutomation