Nobody warns you about the boring parts.
You build a personal AI assistant, you get it working, you feel great about yourself. Then three weeks later it crashes in the middle of the night and you find out the hard way that 2GB of RAM is not enough to run an AI platform and a workflow engine simultaneously.
That’s what happened when I was running Klaus, my AI, on a t3.small¹. OpenClaw² was sitting at around 530MB of RAM. N8N³, my workflow orchestration layer, was pulling another 300MB. Add OS overhead and you’ve got 2GB doing the work of 3GB, with no swap configured.
The Linux OOM killer is not polite. It picks a process and ends it. In this case, it ended Klaus.
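If you want to confirm it was the OOM killer and not something else, the kernel log records every kill. A quick check, assuming a systemd-based distro (the grep pattern is mine, adjust as needed):

```shell
# Search the kernel ring buffer for OOM events (sudo needed on most distros)
sudo dmesg -T | grep -iE 'out of memory|oom-killer'

# With systemd, search kernel messages from the last week, surviving reboots
sudo journalctl -k --since "7 days ago" | grep -iE 'out of memory|oom-killer'
```

The matching line names the victim process and its memory footprint at the time of the kill, which is how you find out it was your AI and not a stray cron job.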
The fix was annoying and obvious
Two things: add a swapfile and upgrade the instance.
sudo fallocate -l 2G /swapfile    # reserve 2GB of disk for swap
sudo chmod 600 /swapfile          # root-only access, required for swap files
sudo mkswap /swapfile             # format it as swap space
sudo swapon /swapfile             # enable it immediately
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab    # persist across reboots
That buys breathing room when RAM pressure spikes. Swap is slow, so it’s not a substitute for real memory, but it stops the OOM killer from making decisions at 3 AM.
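It's worth verifying the kernel actually sees the new swap rather than trusting the commands succeeded. A minimal check:

```shell
# Should list /swapfile with TYPE=file and SIZE=2G
swapon --show

# Pull just the swap line out of free(1)
free -h | awk '/^Swap:/ {print "swap total:", $2, "used:", $3}'
```

If `swapon --show` prints nothing, the fstab entry will silently do nothing on the next boot too, so catch it now.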
Then I upgraded to a t3.medium¹. 4GB of RAM, 2 vCPUs, roughly $30/month when the AWS free tier credits eventually run out. I’m sitting at 1.7GB used with comfortable headroom. That headroom is intentional. I’m not downsizing again.
The part that’s actually keeping me up now
RAM is solved. Disk is the current problem.
The EC2 instance is at about 83% disk utilization: roughly 12GB used of the 14GB volume, about 2.3GB free. When a disk fills completely on a Linux server, things break in creative ways. Log files stop writing. Processes fail silently. Git can’t commit. It gets weird.
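The first step toward staying ahead of it is knowing where the bytes actually are. A sketch, assuming GNU coreutils:

```shell
# Overall utilization of the root volume
df -h /

# Twenty largest directories, two levels deep, on this filesystem only
sudo du -xh --max-depth=2 / 2>/dev/null | sort -rh | head -20
```

The `-x` flag keeps `du` from wandering into other mounts, and `sort -rh` understands human-readable sizes, so `12G` correctly outranks `300M`.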
I’ve got log retention scripts running, but logs aren’t really the problem. The problem is the aggregate weight of a system that’s always running and always writing: KB syncs, SQL dumps, and the git auto-sync from HP Desktop that fires every five minutes. None of it is big. All of it adds up.
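For the curious, a retention sweep can be sketched in a few lines. The paths, extensions, and retention windows below are illustrative assumptions, not my actual config:

```shell
#!/bin/sh
# Hypothetical retention sweep -- adjust paths and windows to your layout.
LOG_DIR=/var/log/klaus        # assumed log location
DUMP_DIR=/var/backups/sql     # assumed SQL dump location

# Delete logs untouched for more than 14 days
find "$LOG_DIR" -type f -name '*.log' -mtime +14 -delete

# Keep only the 7 newest dumps (assumes no spaces in filenames)
ls -1t "$DUMP_DIR"/*.sql.gz 2>/dev/null | tail -n +8 | xargs -r rm --
```

Run it from cron daily and the slow creep at least gets a ceiling, even if the underlying write volume doesn't change.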
This is the part of “self-hosted AI” that doesn’t show up in the demos.
The credits math
Here’s where I actually am, seven and a half weeks in:
AWS infrastructure has cost me $0 out of pocket, covered by $120 in AWS credits ($100 from the free tier, $20 from a setup promotion). About $55 remaining. At roughly $30/month burn, those credits run out around mid-May.
The AI model costs are where it gets real. I started on Claude Max at $100/month in late January. That felt like enough until it wasn’t, so I upgraded to the $200/month plan on March 1. Total Claude spend so far: $300. Add $11.54 for the ricoordonio.com domain and $0 for AWS (thanks, credits) and the all-in cost through mid-March is $311.54, about $41/week.
After the credits run out, the ongoing cost settles at around $230/month: $200 for Claude Max plus roughly $30 for EC2.
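The runway math above is simple enough to sanity-check in one line:

```shell
# ~$55 in remaining credits at a ~$30/month burn rate
awk 'BEGIN { printf "%.1f months of credit runway\n", 55 / 30 }'
# -> 1.8 months of credit runway (mid-March + ~1.8 months lands around mid-May)
```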
That’s not a complaint. It’s just the number. I went in knowing it wasn’t free, and the value is real. But I want to be honest about what “self-hosted AI” actually costs when you account for all of it, not just the interesting parts.
What running this has actually been like
The t3.small was a calculated bet. I knew the memory footprint going in and figured it was worth trying the cheaper instance before committing to more. It didn’t work out. That’s fine. You try the smaller thing, you find the ceiling, you upgrade and move on.
The disk situation is different. That one crept up because I didn’t fully account for write volume when I designed the sync architecture. Every five-minute auto-sync adds up over weeks. It’s not dramatic, just something to stay ahead of.
If you’re thinking about running your own agent, know that at some point you’ll be SSHing into a server at 11 PM to figure out why a process died. That’s part of it. Get comfortable with it.
References
1. Amazon EC2 t3 instances — Burstable general purpose compute
2. OpenClaw — Self-hosted AI agent platform
3. N8N — Open source workflow automation