From 655ddb4d7f53df1efaf129cb61ec4688f482faef Mon Sep 17 00:00:00 2001
From: Ahmed Allam <49919286+0xallam@users.noreply.github.com>
Date: Fri, 23 Jan 2026 23:05:26 +0400
Subject: [PATCH] Update README with full details section

---
 benchmarks/README.md | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/benchmarks/README.md b/benchmarks/README.md
index 9ddcdb4..b7d99de 100644
--- a/benchmarks/README.md
+++ b/benchmarks/README.md
@@ -2,6 +2,15 @@
 
 We use security benchmarks to track Strix's capabilities and improvements over time. We plan to add more benchmarks, both existing ones and our own, to help the community evaluate and compare security agents.
 
+
+## Full Details
+
+For the complete benchmark results, evaluation scripts, and run data, see the [usestrix/benchmarks](https://github.com/usestrix/benchmarks) repository.
+
+> [!NOTE]
+> We are actively adding more benchmarks to our evaluation suite.
+
+
 ## Results
 
 | Benchmark | Challenges | Success Rate |
@@ -32,10 +41,3 @@ pie title Challenge Outcomes (104 Total)
 **Resource Usage:**
 - Average solve time: ~19 minutes
 - Total cost: ~$337 for 100 challenges
-
-## Full Details
-
-For the complete benchmark results, evaluation scripts, and run data, see the [usestrix/benchmarks](https://github.com/usestrix/benchmarks) repository.
-
-> [!NOTE]
-> We are actively adding more benchmarks to our evaluation suite.