I Built a Team of AI Agents That Live on My Own Hardware
Gary
Editor
Building With AI · Home Lab
Five specialist agents, a coordinator I talk to over Telegram, and a model router that quietly shops between my Windows PC, a Mac Mini, and the cloud. Here is how I set it up, and the two problems that ate most of my weekends.
For the last couple of months I have been running something that still feels slightly absurd to say out loud: a small software team made entirely of AI agents, hosted on hardware I already own, that ships real projects while I am asleep. I message a "coordinator" in Telegram the way I would message a colleague, and behind the scenes a handful of specialists pick up the work, hand it between each other, and report back when it is done.
The platform underneath it is OpenClaw, an open, self-hosted framework for running long-lived agents in Docker. This is the story of how I stood it up, the agents I designed, and the two challenges that taught me the most: data that kept vanishing every time I restarted a container, and the surprisingly hard problem of getting agents to actually talk to one another.
The setup: containers, a model router, and a phone
The whole system runs as a set of Docker containers on my home network. Nothing exotic — a single docker compose up brings up the entire team plus the supporting cast: a PostgreSQL database (with the pgvector extension for embeddings), Redis for caching and queues, an n8n instance for workflow automation, Portainer so I can see what is happening, and a Cloudflare Tunnel for when I want to expose a finished app to the outside world.
The piece that makes the economics work is a model router called LiteLLM. Instead of pointing every agent at one expensive cloud model, LiteLLM sits in the middle and routes each request to whichever model makes sense. My configuration looks roughly like this:
minimax -> MiniMax M2.7 (cloud)
qwen3-32b -> Windows PC running LM Studio
glm-flash -> Mac Mini running Ollama
claude -> Claude Sonnet (explicit calls only)
MiniMax M2.7 is my default model, unless I just want to run small tasks locally and time is not important. Claude is kept out of the fallback chain entirely: agents have to ask for it by name.
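The routing rules above can be sketched in a few lines of Python. This is not LiteLLM's API or its configuration format, only an illustration of the idea. The names `ROUTES`, `FALLBACK_CHAIN` and `pick_model`, and the order of the fallback chain, are assumptions made for the example.

```python
# Hypothetical illustration of alias routing; not LiteLLM's real config or API.
ROUTES = {
    "minimax": "MiniMax M2.7 (cloud)",
    "qwen3-32b": "Windows PC running LM Studio",
    "glm-flash": "Mac Mini running Ollama",
    "claude": "Claude Sonnet (explicit calls only)",
}

# Assumed fallback order. Claude is deliberately absent, so it can only be
# reached when an agent asks for it by name.
FALLBACK_CHAIN = ["minimax", "qwen3-32b", "glm-flash"]


def pick_model(requested=None, unavailable=frozenset()):
    """Return the alias to use for one request.

    requested:   alias an agent asked for explicitly, or None for the default.
    unavailable: aliases that are currently down or erroring.
    """
    if requested is not None:
        if requested not in ROUTES:
            raise KeyError(f"unknown model alias: {requested}")
        if requested not in unavailable:
            return requested
    # No explicit request, or the requested model is down: walk the chain.
    for alias in FALLBACK_CHAIN:
        if alias not in unavailable:
            return alias
    raise RuntimeError("no model available")
```

Keeping the expensive model out of the chain means a local outage can never silently turn into a cloud bill; the worst case is an error.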
I talk to the system through Telegram. Each agent has its own bot token, so I can message any specialist directly, but in practice I almost always just message the coordinator and let it delegate. Every agent also exposes a small control UI on its own local port, which is handy when I want to watch one think.
The agents I created
Rather than one general-purpose assistant, I modelled the team on how a real software group is structured — each agent has a narrow role, its own system prompt, and a model chosen to fit the job.
Coordinator
My primary point of contact. Breaks requests into steps and delegates to specialists. Runs on MiniMax and calls Claude by name for the hard reasoning.
PM
Does the research for me, then turns a vague request into a spec: problem statement, acceptance criteria, constraints. The "think before you build" agent.
Dev
Writes and debugs the actual code. Its skills tell it to plan the work, check that its knowledge of the software and tools it needs is up to date, and build with a full testing process for each part of the product.
Testing
QA and correctness. Runs the test suites, reviews the code, and hands bugs back to Dev. Ships with pytest and Playwright pre-installed.
EA / Marketing
Calendar, research, and content: drafting posts and market research for my projects. Read-only access to everything else.
The flow mirrors a normal delivery pipeline: me → coordinator → PM → Dev → Testing, with a loop back to Dev whenever Testing finds a problem. Giving each agent a job description and a sensible model assignment turned out to matter far more than any single clever prompt. A focused agent on a cheap local model frequently beats a do-everything agent on an expensive one.
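The delivery flow is small enough to write down as a lookup. The sketch below assumes lowercase stage names and a simple pass/fail result from Testing; `PIPELINE` and `next_stage` are hypothetical names, not part of OpenClaw.

```python
# Hypothetical stage names; the order is the pipeline described above.
PIPELINE = ["coordinator", "pm", "dev", "testing"]


def next_stage(current, tests_passed=None):
    """Return the next agent in the pipeline, or None when the work is done."""
    if current == "testing":
        # Testing either signs off or loops the work back to Dev.
        if tests_passed is None:
            raise ValueError("testing must report a result")
        return None if tests_passed else "dev"
    index = PIPELINE.index(current)
    return PIPELINE[index + 1]
```

For example, `next_stage("testing", tests_passed=False)` hands the work back to `"dev"`, which is the loop that does most of the real work.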
Challenge #1: the data that kept disappearing
The first hard lesson was a classic, and I walked straight into it. Containers are ephemeral by design. Anything an agent wrote inside its own container — its memory, its notes, the code it had just generated — lived only in that container's writable layer. The moment I ran docker compose down, pulled a new image, or simply restarted to apply a config change, all of it evaporated.
I lost an afternoon of agent work this way before the penny dropped. An agent would tell me, completely sincerely, that it had finished a feature — and the files were just gone, because "finished" had meant "saved inside a container I had since recreated."
The fix is unglamorous but absolute: if you want it to survive a restart, it cannot live inside the container. It has to be on a volume.
So I split persistence into two kinds. For the databases — Postgres, Redis, n8n, Portainer — I used named Docker volumes, which Docker manages and keeps around between restarts. For everything I want to see and edit myself — each agent's config and memory, the shared handoff folder, and all the project code — I used bind mounts straight to folders on the host machine:
volumes:
- ./agents/dev/config:/home/node/.openclaw:rw
- ./shared:/home/node/shared:rw
- ./projects:/home/node/projects:rw
Now the container is genuinely disposable. I can rebuild the Dev agent from scratch and its memory, its work, and its identity are all still sitting in plain folders on disk. The only thing I deliberately left ephemeral is /tmp, mounted as a RAM disk, because scratch space should evaporate. The mental shift was treating the container as the engine and the host folders as the hard drive, never the other way round.
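A quick way to reason about "will this survive?" is to check whether a path inside the container falls under one of the mounted directories. The sketch below hard-codes the three container-side mount points from the volumes list; the function name `survives_restart` is made up for the example, and it does not resolve `..` segments or symlinks.

```python
from pathlib import PurePosixPath

# Container-side mount points taken from the volumes list above.
PERSISTENT_MOUNTS = [
    PurePosixPath("/home/node/.openclaw"),
    PurePosixPath("/home/node/shared"),
    PurePosixPath("/home/node/projects"),
]


def survives_restart(path):
    """True if `path` is one of the bind-mounted directories or lies inside one."""
    target = PurePosixPath(path)
    return any(
        target == mount or mount in target.parents
        for mount in PERSISTENT_MOUNTS
    )
```

Anything that fails this check, such as a file under `/tmp` or directly under `/home/node`, lives only in the container and disappears when the container is recreated.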
Challenge #2: getting agents to talk to each other
The second challenge was subtler and, honestly, more interesting. Once each agent worked on its own, I needed them to collaborate — for the PM to hand a spec to Dev, for Dev to hand a build to Testing, for Testing to hand bugs back. There is no shared chat window between them. So how does a handoff actually happen?
My answer was a shared folder and a strict file-naming convention. Every agent can read a common shared/ directory, and work moves through it as plain Markdown files:
TASK-<agent>-<slug>.md # work assigned to an agent
TASK-dev-<slug>-issue.md # a bug handed back to Dev
STATUS-<slug>.md # rolling status, newest line on top
Each agent runs an hourly cron job that watches for files addressed to it: the PM reacts to TASK-pm-*, Dev to TASK-dev-*, and so on. When the coordinator needs to know where a project stands, it reads the top line of the relevant STATUS- file. It is essentially a team that communicates by leaving notes in a shared inbox.
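The two operations a watcher needs, finding files addressed to an agent and reading the newest status line, are short enough to sketch. This is an illustration of the convention rather than the actual cron job; `pending_tasks` and `latest_status` are hypothetical helper names.

```python
from pathlib import Path


def pending_tasks(shared_dir, agent):
    """List TASK files addressed to `agent`, sorted by filename."""
    return sorted(Path(shared_dir).glob(f"TASK-{agent}-*.md"))


def latest_status(shared_dir, slug):
    """Return the top (newest) non-empty line of STATUS-<slug>.md, or None."""
    status_file = Path(shared_dir) / f"STATUS-{slug}.md"
    if not status_file.exists():
        return None
    with status_file.open(encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                return line.strip()
    return None
```

Because the glob only matches the agreed prefix, a file with any other name is invisible to the watcher, which is exactly the property that makes the naming convention so strict.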
Simple in theory. In practice, this is where most of my debugging time went, and the failures were rarely loud — they were silent.
Invented filenames. An agent would decide a file called REVIEW- or DONE- was clearer than the convention. The cron watchers only match known prefixes, so those files were simply ignored. The work was done; nobody picked it up.
Notes left in private rooms. Each agent has a private workspace folder. More than once an agent wrote a handoff there instead of in shared/, the equivalent of writing a memo and locking it in your own drawer.
Mismatched slugs. If the PM called a project auth-refresh and Dev called it login, the status trail fractured into two and the coordinator could no longer answer "where is this at?"
Polling lag. Hourly crons mean a handoff can sit untouched for up to an hour. Fine for an overnight pipeline, frustrating when you are watching live.
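The first and third failures are easy to detect mechanically, because they show up in the filenames themselves. Below is a minimal lint sketch, assuming slugs are lowercase letters, digits and hyphens, and assuming the agent names listed in `KNOWN_AGENTS`; both are guesses for the example, as is the helper name `lint_shared`.

```python
import re

# Assumed agent names and slug format; adjust to match the real protocol.
KNOWN_AGENTS = ("coordinator", "pm", "dev", "testing", "ea")
TASK_RE = re.compile(r"^TASK-(?P<agent>[a-z]+)-(?P<slug>[a-z0-9-]+)\.md$")
STATUS_RE = re.compile(r"^STATUS-(?P<slug>[a-z0-9-]+)\.md$")


def lint_shared(filenames):
    """Return (project slugs seen, filenames no watcher would match)."""
    slugs, ignored = set(), []
    for name in filenames:
        task = TASK_RE.match(name)
        status = STATUS_RE.match(name)
        if task and task["agent"] in KNOWN_AGENTS:
            # A bug report reuses the project slug with an -issue suffix.
            slugs.add(task["slug"].removesuffix("-issue"))
        elif status:
            slugs.add(status["slug"])
        else:
            ignored.append(name)
    return slugs, ignored
```

A non-empty `ignored` list points at invented filenames, and more slugs than active projects suggests two agents have named the same project differently. Files kept in the folder on purpose under other names would need an allowlist.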
The cure was to write the rules down once, in a canonical HANDOFF-PROTOCOL.md that every agent reads and treats as the source of truth, ahead of its own instructions. The lesson generalises well beyond AI: a distributed team needs a shared protocol more than it needs clever individuals.
There was a second, lower-level communication problem too — between the router and the models. Streaming responses from the local models would occasionally die mid-flight with a cryptic fallback error, a single malformed chunk poisoning the whole reply. Diagnosing that meant cranking LiteLLM up to full debug logging so it would print the offending model and the raw chunk. Not glamorous, but it is the kind of plumbing that decides whether a system like this is a toy or something you can actually rely on.
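To make the failure mode concrete, here is a sketch of a stream reader that records a malformed chunk instead of letting it sink the whole reply. It assumes the stream arrives as `data: <json>` lines ending with `data: [DONE]`, which is a common convention but an assumption here, and it is not how LiteLLM handles streams internally. `parse_stream` is a hypothetical helper.

```python
import json
import logging

logger = logging.getLogger("stream-debug")


def parse_stream(lines, model):
    """Decode streamed chunks, logging and collecting any that fail to parse."""
    chunks, bad = [], []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and anything unprefixed
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        try:
            chunks.append(json.loads(payload))
        except json.JSONDecodeError:
            # Record which model sent it and exactly what arrived.
            logger.warning("malformed chunk from %s: %r", model, raw)
            bad.append(raw)
    return chunks, bad
```

The two pieces of information it logs, the model and the raw chunk, are the same two that made the real problem diagnosable.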
What I actually do with it
The point of all this was never the plumbing — it was to have a team that builds things. A sample of what has come out of the pipeline:
Bloom: a small web app that went through the full cycle, built by Dev, tested by Testing (59 passing unit tests), and signed off for deployment.
Software Memory: a knowledge-management platform with semantic search and a knowledge graph, where the Testing agent caught real security issues (including hardcoded credentials) before they shipped.
Security Software: an automated CVE scanner for a self-hosted version control system that checks every active branch and generates AI-written mitigation steps.
Love Languages: a quiz app, fully specced, built, and tested end to end.
And on the EA/Marketing side, full go-to-market research and marketing plans for product ideas, complete with sourced market sizing.
None of these are throwaway demos. They were specced by one agent, built by another, broken and fixed by a third, with me acting as the development manager rather than the bottleneck.
What I would tell anyone starting out
If you take three things from this: decide what persists before you build anything, because losing a day of work to an ephemeral container is a lesson you only need once. Give your agents narrow roles and right-sized models — a focused agent on a cheap local model routinely beats a generalist on an expensive one. And write the communication protocol down, because the moment more than one agent is involved, coordination becomes the actual product.
I still find it slightly absurd. But it works, it runs mostly on hardware I already owned, and it has quietly become one of the most useful things in my home lab. If you have a spare machine and an afternoon, it is more achievable than it looks.