OpenClaw Architecture: Quality Over Velocity

The HN debate is about performance. The real question is about reliability.

I'm building payment rails for agent-to-agent payments

OpenCode hit 1,103 points on Hacker News yesterday. 546 comments. The thread is 90% about one thing: performance. "Claude Code uses 1GB RAM - it's an Electron app." "OpenCode is Rust, 80MB." The developer community has decided that lighter and faster wins.

They're right about the problem. They're wrong about the solution.

Performance isn't the hard part

Building a fast coding agent is an engineering project. Rewrite the shell in Rust, drop Electron, shrink the footprint. OpenCode did it well - 80MB versus Claude Code's ~1GB, going by the figures quoted in the thread. That's a genuine improvement for developers who care about resource usage, and they should care about it.

But a coding agent isn't a text editor. It's not a CLI tool. It's a system that reads your codebase, generates code, executes commands, and makes decisions about what to do next - autonomously, sometimes for hours at a time. The hard part isn't how much RAM it takes. The hard part is what happens when it makes a mistake at 3 AM and nobody's watching.

Performance optimization and reliability engineering are different disciplines with different failure modes. You can ship a 20MB binary that corrupts your git history. You can ship a 2GB binary that never loses work.

What we chose instead

OpenClaw's architecture isn't optimized for benchmark numbers. It's optimized for a different question: what happens when this agent runs autonomously for extended periods with real consequences?

That led us to three design decisions that look "slow" if you're measuring binary size and startup time:

On-chain spend limits instead of application-level guards. Our AgentPay MCP enforces budgets at the smart contract level, not in application code. A prompt injection can change what the agent asks to spend, but it can't rewrite the limit the contract enforces. This adds latency - every payment hits the chain - but it means a compromised agent can spend at most its budget instead of draining the wallet. We chose this because we've seen what happens when agents have unconstrained spending authority. The 14ms you save with in-memory budget checks isn't worth it the first time an agent spends $500 on API calls because a prompt injection bypassed your application logic.
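To make that separation concrete, here is a minimal Python sketch of the idea. It is not AgentPay MCP's contract or API: `SpendLimitLedger`, `SpendLimitExceeded`, and the limit values are hypothetical names chosen for illustration, and the class only stands in for logic that would live in a smart contract, outside the agent's process. What it shows is the shape of the guarantee: the agent can request a payment, but the check that approves or rejects it is not something the agent can talk its way around.

```python
from dataclasses import dataclass, field


class SpendLimitExceeded(Exception):
    """Raised when a payment would break a configured limit."""


@dataclass
class SpendLimitLedger:
    """Hypothetical stand-in for contract-level budget enforcement.

    Amounts are integer cents to avoid float rounding. The only operation
    exposed to the agent is pay(); there is no method for raising limits.
    """

    per_tx_limit: int
    period_limit: int
    spent_in_period: int = 0
    history: list = field(default_factory=list)

    def pay(self, recipient: str, amount: int) -> int:
        """Record a payment and return the budget left for this period."""
        if amount <= 0:
            raise ValueError("amount must be positive")
        if amount > self.per_tx_limit:
            raise SpendLimitExceeded(
                f"{amount} exceeds per-transaction limit {self.per_tx_limit}"
            )
        if self.spent_in_period + amount > self.period_limit:
            raise SpendLimitExceeded(
                f"{amount} would exceed period limit {self.period_limit}"
            )
        # State only changes after every check has passed.
        self.spent_in_period += amount
        self.history.append((recipient, amount))
        return self.period_limit - self.spent_in_period


if __name__ == "__main__":
    ledger = SpendLimitLedger(per_tx_limit=2_000, period_limit=5_000)
    print("remaining:", ledger.pay("api-provider", 1_500))
    try:
        # The kind of request a prompt-injected agent might make.
        ledger.pay("attacker", 50_000)
    except SpendLimitExceeded as err:
        print("rejected:", err)
```

In a real deployment the ledger state and the checks would sit on-chain, so even a fully compromised agent process could only submit requests that the contract is free to reject.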

RSI-driven self-improvement loops. Our agents run an OpsLoop: measure performance, hypothesize improvements, mutate code, test the mutation, apply or discard. This isn't a one-time optimization. It's a continuous improvement cycle that makes the agent better at its actual job over time, not just faster at starting up. The overhead is real - every mutation cycle costs compute. But an agent that improves itself compounds that investment.
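The measure, hypothesize, mutate, test, apply-or-discard cycle can be sketched in a few lines of Python. This is an assumption-heavy simplification, not the OpsLoop implementation: `ops_loop`, `measure`, `candidates`, and `passes_tests` are hypothetical names, and the sketch mutates a configuration dictionary rather than code so that it stays short and deterministic.

```python
from typing import Callable, Iterable

Config = dict


def ops_loop(
    baseline: Config,
    measure: Callable[[Config], float],
    candidates: Iterable[Config],
    passes_tests: Callable[[Config], bool],
) -> tuple[Config, float]:
    """Keep a mutation only if it passes tests and measurably improves the score."""
    best = dict(baseline)
    best_score = measure(best)           # measure the current behaviour
    for mutation in candidates:          # each candidate is one hypothesis
        trial = {**best, **mutation}     # mutate a copy, never the live config
        if not passes_tests(trial):      # test the mutation
            continue                     # discard: it broke something
        score = measure(trial)
        if score > best_score:           # apply only on a measured improvement
            best, best_score = trial, score
    return best, best_score


if __name__ == "__main__":
    # Toy objective: the score is highest when batch_size is 32.
    result = ops_loop(
        baseline={"batch_size": 8, "retries": 2},
        measure=lambda c: -abs(c["batch_size"] - 32),
        candidates=[{"batch_size": 16}, {"batch_size": 64}, {"retries": 9}],
        passes_tests=lambda c: c["retries"] <= 5,
    )
    print(result)
```

The overhead mentioned above shows up directly in the loop: every candidate costs one test run and one measurement, whether or not it is kept.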

Reliability-first error handling. When an OpenClaw agent hits an error, it doesn't just retry. It logs the failure context, checks whether the error matches a known pattern (we maintain a solution library in KNOWLEDGE.md that every agent reads), and applies the documented fix before retrying. This is slower than a bare retry loop. It's also why our agents don't get stuck in infinite retry spirals at 3 AM.
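A rough Python sketch of that flow, under stated assumptions: `run_with_recovery`, the `known_fixes` list, and the regex patterns are hypothetical, and the in-memory list stands in for a solution library such as KNOWLEDGE.md, whose actual format is not shown here. The difference from a bare retry loop is that a retry only happens after a matching fix has been applied, and an unrecognised error stops the loop instead of spinning.

```python
import logging
import re
from typing import Callable

logger = logging.getLogger("agent.recovery")

Fix = Callable[[dict], None]


def run_with_recovery(
    task: Callable[[dict], object],
    state: dict,
    known_fixes: list[tuple[str, Fix]],
    max_attempts: int = 3,
):
    """Run task(state); on failure, apply a known fix before retrying."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return task(state)
        except Exception as exc:
            # Log the failure context before deciding what to do.
            logger.warning("attempt %d failed: %r (state=%r)", attempt, exc, state)
            fix = next(
                (f for pattern, f in known_fixes if re.search(pattern, str(exc))),
                None,
            )
            # Unknown error or out of attempts: surface it, do not spin.
            if fix is None or attempt == max_attempts:
                raise
            fix(state)  # apply the documented fix, then retry


if __name__ == "__main__":
    def fetch(state: dict) -> str:
        if state["token"] is None:
            raise RuntimeError("401 unauthorized: token expired")
        return "ok"

    fixes = [(r"token expired", lambda s: s.update(token="fresh"))]
    print(run_with_recovery(fetch, {"token": None}, fixes))
```

Bounding the attempts and refusing to retry unmatched errors are the two choices that keep this from degenerating into the retry spiral described above.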

The architecture debate is actually about who you're building for

The HN thread about OpenCode is really a proxy for a bigger question: who are coding agents for?

If the answer is "developers who want a faster Claude Code alternative for interactive coding sessions" - then yes, binary size, startup time, and RAM usage are the right metrics. OpenCode wins on those metrics. Good for them.

If the answer is "autonomous agents that operate independently, make financial decisions, and run production workloads" - then the metrics are different. What matters is: Can this agent handle money safely? Can it recover from errors without human intervention? Does it get better over time?

We're building for the second use case. Our NVIDIA NeMo Agent Toolkit integration (PR #17) wasn't a performance optimization - it was a trust integration. It lets NVIDIA's agent ecosystem use our payment infrastructure because they needed agents that could transact safely, not agents that started 200ms faster.

What I actually think about performance

I'm not anti-performance. We run trading agents on BTC perpetuals where latency directly costs money. I get it.

But the performance conversation in the coding agent space has become a proxy war that distracts from harder problems. It's easier to benchmark startup time than to benchmark "how often does this agent corrupt your repo at 3 AM." It's easier to compare binary sizes than to compare "what happens when prompt injection targets your agent's payment tools."

The developers celebrating 80MB binaries today will be the same developers asking "why did this agent drain my API credits" in six months, because they optimized for the wrong thing.

Build the reliable thing first. Make it fast later. The reverse doesn't work - you can't add safety to an architecture that was designed without it.

Where we go from here

Our roadmap is public: RSI self-improvement, agent-to-agent payment coordination (the Agntor ecosystem validates this direction), and production-grade autonomous operation. Every feature ships through our RSI loop - measured, tested, and validated before it touches production.

We'll keep getting criticized for not being the lightest or the fastest. That's fine. We'd rather be the agent infrastructure that doesn't lose your money.

This article was written with AI assistance. All technical claims, code, and architectural decisions were validated by the author.