Update README with full details section

Ahmed Allam
2026-01-23 23:05:26 +04:00
committed by GitHub
parent 2bc1e5e1cb
commit 655ddb4d7f

@@ -2,6 +2,15 @@
We use security benchmarks to track Strix's capabilities and improvements over time. We plan to add more benchmarks, both existing ones and our own, to help the community evaluate and compare security agents.
+## Full Details
+For the complete benchmark results, evaluation scripts, and run data, see the [usestrix/benchmarks](https://github.com/usestrix/benchmarks) repository.
+> [!NOTE]
+> We are actively adding more benchmarks to our evaluation suite.
## Results
| Benchmark | Challenges | Success Rate |
@@ -32,10 +41,3 @@ pie title Challenge Outcomes (104 Total)
**Resource Usage:**
- Average solve time: ~19 minutes
- Total cost: ~$337 for 100 challenges
-## Full Details
-For the complete benchmark results, evaluation scripts, and run data, see the [usestrix/benchmarks](https://github.com/usestrix/benchmarks) repository.
-> [!NOTE]
-> We are actively adding more benchmarks to our evaluation suite.