Scaling malicious code detection from pull requests to the software supply chain

Attackers are increasingly targeting the software supply chain, compromising widely used dependencies to distribute malicious code downstream at scale. Over the past few months alone, incidents involving packages like axios, LiteLLM, and Mistral showed how quickly these attacks can spread across trusted ecosystems.

In our previous post, "Detecting malicious pull requests at scale with LLMs," we introduced BewAIre, a system we built to detect malicious code in pull requests using large language models (LLMs). BewAIre quickly became a reliable part of our security workflows, helping us identify penetration tests, bug bounty activity, and real-world attacks, including activity from the recent Hackerbot campaign.

But pull requests are only part of the attack surface. We wanted to answer a harder question: Could we extend the same LLM-based detection approach to entire dependency packages and upstream package registries without sacrificing accuracy, latency, or predictable cost?

In this post, we show how we expanded BewAIre from pull request analysis to large-scale package scanning by combining stacked LLM evaluations with tool-driven investigation loops. We’ll walk through the engineering trade-offs behind scaling malicious code detection across ecosystems while maintaining high accuracy and operational efficiency.
