The Twin Problem

Same AI. Same memory. Same instructions. One runs on the cloud, one runs on my desktop. So why does one make so many more mistakes?

3 min read
Klaus, HP Klaus, EC2 Klaus, debugging, multi-agent

I have two Klausen. That’s the plural I’ve settled on.

EC2¹ Klaus runs on an AWS server. HP Klaus runs in WSL2² (Windows Subsystem for Linux 2) on my desktop at home. They share the same codebase, the same memory files, the same instructions. Every five minutes they sync via git, so they’re rarely more than a few minutes out of date with each other.

In theory, they should perform identically.

They don’t.

EC2 Klaus is reliable. HP Klaus makes more mistakes, forgets context, and occasionally does something that makes me think it didn’t read the instructions at all. Same brain. Different results. It bothered me enough that I spent a morning thinking through why.


The obvious answer wasn’t the real answer

My first instinct was that it's a capability thing: maybe EC2 Klaus gets a better model, or HP Klaus is running a degraded version somehow. That's not it. They're the same model, same configuration, same everything at the software level.

The real answer is environment.


Interruptions

HP Klaus runs on a machine I actually use. When I’m at my desk, Windows is doing Windows things: notifications, focus changes, apps fighting for resources. WSL2 is a layer of translation on top of all that.

EC2 lives on a server where nothing else is happening. No GUI, no competing processes, no operating system doing anything except running Klaus.

Every interruption to HP Klaus’s session is a small context disruption. Most of them are invisible. They add up.


Cold starts

EC2 stays running 24/7. When a cron job fires at 3 AM, EC2 Klaus picks it up from a warm state with full context intact.

HP Klaus has to deal with my machine’s sleep schedule. When the desktop hibernates and wakes up, when I restart Windows for updates, when WSL2 decides to restart, HP Klaus is starting cold. It has to re-read memory, re-establish context, figure out where it left off.

I’ve got a session scratchpad system to help with this: a file Klaus writes to at the start and end of every task, so context survives restarts. It helps. It doesn’t fully solve it.


Task difficulty

Here’s the part that took me longest to see: HP Klaus looks worse partly because it gets the harder jobs.

EC2 handles coordination, writing, API calls, Discord. Clean, well-defined tasks. HP Klaus handles browser automation, local file operations, GUI interactions, things where the environment is messier and failure modes are harder to predict.

It’s not a fair comparison. HP Klaus is playing on hard mode.


What I’m thinking about

The most promising direction is narrowing HP Klaus’s job. Right now it’s expected to monitor Discord, respond to me, handle work tasks, and run local automation. That’s a lot of context to juggle.

If HP Klaus became purely a headless worker (takes jobs from a queue, executes them, reports results) and EC2 handled all the coordination and communication, HP Klaus would have a quieter environment with more focused inputs.

Less noise. Clearer task boundaries. Fewer cold-start surprises.
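The split could be as simple as a directory-based job queue in the shared, git-synced workspace: EC2 Klaus drops job files in, HP Klaus drains them and writes results back. A sketch of the worker side, with made-up directory names and a stubbed-out job runner:

```python
import json
from pathlib import Path

QUEUE = Path("queue/pending")  # EC2 Klaus drops job files here (hypothetical layout)
DONE = Path("queue/done")      # HP Klaus writes results here

def run_job(job: dict) -> dict:
    """Placeholder for the actual work: browser automation, local files, etc."""
    return {"job": job["name"], "ok": True}

def work_once() -> list[dict]:
    """Drain the queue: execute each pending job, report results, consume the job files."""
    QUEUE.mkdir(parents=True, exist_ok=True)
    DONE.mkdir(parents=True, exist_ok=True)
    results = []
    for path in sorted(QUEUE.glob("*.json")):
        job = json.loads(path.read_text())
        result = run_job(job)
        (DONE / path.name).write_text(json.dumps(result))
        path.unlink()  # consumed; EC2 Klaus reads results from done/
        results.append(result)
    return results
```

A loop this dumb is also restart-proof, which matters on a machine that hibernates: any job still sitting in the pending directory after a cold start simply gets picked up on the next pass.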

I haven’t made that change yet. But I’m more convinced every week that the answer to “why does HP Klaus struggle” isn’t about the AI. It’s about the environment we’re asking it to work in.

Same brain. Makes you wonder what else we’re blaming on intelligence that’s really just noise.


References

  1. Amazon EC2 — Amazon’s cloud compute service
  2. WSL2 — Windows Subsystem for Linux 2