Simbian launches new security benchmark with AI SOC LLM Leaderboard

June 12, 2025
The new benchmark compares LLMs across a diverse range of attacks and SOC tools in a realistic IT environment over all phases of alert investigation.

Simbian today announced the “AI SOC LLM Leaderboard,” a comprehensive benchmark that measures LLM performance in Security Operations Centers (SOCs). The new benchmark compares LLMs across a diverse range of attacks and SOC tools in a realistic IT environment over all phases of alert investigation, from alert ingestion to disposition and reporting. It includes a public leaderboard to help professionals choose the best LLM for their SOC needs.

“SOC analysts and vendors building tools for the SOC are rapidly embracing LLMs to scale their operations, increase accuracy, and reduce costs,” said Ambuj Kumar, Simbian CEO and co-founder. “Our benchmark enables SOC teams and vendors to pick the best LLM for this purpose.”

The benchmark comprehensively measures LLMs on the primary role of SOCs, which is to investigate alerts end-to-end. This task involves diverse skills (see the sketch after this list), including the ability to:

  • Understand alerts from a broad range of detection sources;

  • Determine how to investigate any given alert;

  • Generate code to support that investigation;

  • Understand data, extract evidence, and map it to attack stages;

  • Reason over evidence to arrive at a clear disposition and severity;

  • Produce clear reports and response actions; and

  • Customize investigations for each organization’s context.
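To make this breakdown concrete, here is a minimal sketch of how a framework might decompose an alert investigation into discrete, LLM-addressable tasks. The phase names, `TaskResult` structure, and `run_investigation` helper are illustrative assumptions, not Simbian’s published implementation.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable

class Phase(Enum):
    """Hypothetical investigation phases mirroring the skills listed above."""
    PARSE_ALERT = auto()         # understand the alert from its detection source
    PLAN_INVESTIGATION = auto()  # decide how to investigate this alert
    GENERATE_QUERIES = auto()    # write code/queries to gather supporting data
    EXTRACT_EVIDENCE = auto()    # pull evidence and map it to attack stages
    DISPOSE = auto()             # reason over evidence to a disposition and severity
    REPORT = auto()              # produce a report and recommended response actions

@dataclass
class TaskResult:
    phase: Phase
    completed: bool
    output: str

def run_investigation(alert: dict,
                      llm_call: Callable[[Phase, dict], TaskResult]) -> list[TaskResult]:
    """Run each phase in order, carrying context forward.

    `llm_call` stands in for a prompt to the LLM under test; a real
    framework would add tool access, retries, and org-specific context.
    """
    context = {"alert": alert}
    results = []
    for phase in Phase:  # Enum members iterate in definition order
        result = llm_call(phase, context)
        results.append(result)
        context[phase.name] = result.output  # later phases see earlier outputs
        if not result.completed:
            break  # an incomplete phase blocks the rest of the chain
    return results
```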

Simbian’s AI SOC LLM Leaderboard measures LLMs on autonomous end-to-end investigation of alerts, drawing on the skills above. To make the benchmark applicable across a range of SOC environments, it uses 100 diverse full kill-chain scenarios that test all layers of defense. It also measures investigation performance in a lab environment that mimics an enterprise, with the LLM autonomously retrieving data from live tools across the environment.

The first run of the benchmark tested today’s top-tier LLMs from Anthropic, OpenAI, Google, and DeepSeek. All tested models completed over half (61%-67%) of the tasks involved in alert investigation, provided a solid framework broke each investigation down into clearly defined tasks for the LLM. For this benchmark, that framework was provided by Simbian’s AI SOC Agent.
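As a rough illustration of how a task-completion figure like 61%-67% might be tallied, the snippet below scores per-task pass/fail flags across scenarios. The scoring scheme and data are assumptions for illustration, not Simbian’s published methodology.

```python
def completion_rate(results_by_scenario: dict[str, list[bool]]) -> float:
    """Fraction of investigation tasks completed across all scenarios.

    Maps a scenario name to per-task pass/fail flags (one flag per
    clearly defined task the framework handed the LLM).
    """
    total = sum(len(flags) for flags in results_by_scenario.values())
    passed = sum(sum(flags) for flags in results_by_scenario.values())
    return passed / total if total else 0.0

# Toy example: two hypothetical scenarios, six tasks each.
scores = {
    "phishing-kill-chain": [True, True, True, False, True, True],
    "lateral-movement":    [True, True, False, True, False, False],
}
print(f"{completion_rate(scores):.0%}")  # -> 67% (8 of 12 tasks completed)
```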

The AI SOC LLM Leaderboard reveals that LLMs are more capable at autonomous alert investigation than commonly believed. Only a marginal difference was observed between standard LLMs and thinking LLMs on this task. The results showed that the best LLM for cybersecurity is a generalist (like Sonnet 3.5) that can both write code and perform logical reasoning, rather than a specialist that excels at code (Sonnet 4.0) or at logical reasoning (Opus 4). Finally, the benchmark highlighted that specialization, such as SOC-specific training or a mix of LLMs, yields higher performance than any single off-the-shelf LLM.

Alert fatigue is common across SOCs, and it is only getting worse with AI-powered attacks, forcing SOC teams to scale their capacity rapidly. AI offers a solution, and this benchmark guides the industry toward the best LLM for the SOC. Simbian will update the measurement results periodically.

Follow the AI SOC LLM Leaderboard page at https://simbian.ai/best-ai-for-cybersecurity.