feat(reporting): add LLM-based vulnerability deduplication

- Add dedupe.py with XML-based LLM deduplication using direct litellm calls
- Integrate deduplication check in create_vulnerability_report tool
- Add get_existing_vulnerabilities() method to tracer for fetching reports
- Update schema and system prompt with deduplication guidelines
This commit is contained in:
0xallam
2026-01-07 19:23:17 -08:00
committed by Ahmed Allam
parent 0e9cd9b2a4
commit 01ae348da8
5 changed files with 268 additions and 1 deletions

View File

@@ -157,6 +157,45 @@ def create_vulnerability_report(
tracer = get_global_tracer()
if tracer:
from strix.llm.dedupe import check_duplicate
existing_reports = tracer.get_existing_vulnerabilities()
candidate = {
"title": title,
"description": description,
"impact": impact,
"target": target,
"technical_analysis": technical_analysis,
"poc_description": poc_description,
"poc_script_code": poc_script_code,
"endpoint": endpoint,
"method": method,
}
dedupe_result = check_duplicate(candidate, existing_reports)
if dedupe_result.get("is_duplicate"):
duplicate_id = dedupe_result.get("duplicate_id", "")
duplicate_title = ""
for report in existing_reports:
if report.get("id") == duplicate_id:
duplicate_title = report.get("title", "Unknown")
break
return {
"success": False,
"message": (
f"Potential duplicate of '{duplicate_title}' "
f"(id={duplicate_id[:8]}...). Do not re-report the same vulnerability."
),
"duplicate_of": duplicate_id,
"duplicate_title": duplicate_title,
"confidence": dedupe_result.get("confidence", 0.0),
"reason": dedupe_result.get("reason", ""),
}
cvss_breakdown = {
"attack_vector": attack_vector,
"attack_complexity": attack_complexity,

View File

@@ -2,6 +2,8 @@
<tool name="create_vulnerability_report">
<description>Create a vulnerability report for a discovered security issue.
IMPORTANT: This tool includes automatic LLM-based deduplication. Reports that describe the same vulnerability (same root cause on the same asset) as an existing report will be rejected.
Use this tool to document a specific fully verified security vulnerability.
DO NOT USE:
@@ -10,9 +12,12 @@ DO NOT USE:
- When you don't have a proof of concept, or still not 100% sure if it's a vulnerability
- For tracking multiple vulnerabilities (create separate reports)
- For reporting multiple vulnerabilities at once. Use a separate create_vulnerability_report for each vulnerability.
- To re-report a vulnerability that was already reported (even with different details)
White-box requirement (when you have access to the code): You MUST include code_file, code_before, code_after, and code_diff. These must contain the actual code (before/after) and a complete, apply-able unified diff.
DEDUPLICATION: If this tool returns with success=false and mentions a duplicate, DO NOT attempt to re-submit. The vulnerability has already been reported. Move on to testing other areas.
Professional, customer-facing report rules (PDF-ready):
- Do NOT include internal or system details: never mention local or absolute paths (e.g., "/workspace"), internal tools, agents, orchestrators, sandboxes, models, system prompts/instructions, connection issues, internal errors/logs/stack traces, or tester machine environment details.
- Tone and style: formal, objective, third-person, vendor-neutral, concise. No runbooks, checklists, or engineering notes. Avoid headings like "QUICK", "Approach", or "Techniques" that read like internal guidance.
@@ -122,7 +127,9 @@ H = High (total loss of availability)</description>
</parameter>
</parameters>
<returns type="Dict[str, Any]">
<description>Response containing success=true, message, report_id, severity, cvss_score</description>
<description>Response containing:
- On success: success=true, message, report_id, severity, cvss_score
- On duplicate detection: success=false, message (with duplicate info), duplicate_of (ID), duplicate_title, confidence (0-1), reason (why it's a duplicate)</description>
</returns>
</tool>
</tools>