I now run an AI-driven, three-agent system that finds its own work, executes it, and routes the output to me for review.
My job has shifted from doing the work to directing it. I set the strategy, review what the agents produce, and handle the edge cases that need human judgment. The agents handle implementation.
It’s the closest thing I’ve found to actually using AI as an autonomous collaborator, rather than an occasional assistant.
Here’s the full story.
How the system actually works
Let me walk through the architecture before getting into how it was built.
The task queue
At the center of everything is a simple folder system:
~/gravitykit/tasks/
  inbox/        -- new tasks land here
  in-progress/  -- runner moves task here while working
  review/       -- completed output, waiting for my approval
  blocked/      -- agent got stuck and needs input
  done/         -- I move approved work here
Tasks are markdown files with a YAML header (title, type, priority, due date) and a checklist of steps. The agent works through each step, ticking them off as it goes. If it finishes, the file moves to review/. If it hits something it can’t resolve without input, it moves to blocked/ and pings me.
This checklist format also solves a practical problem: if a run gets interrupted, the next sweep can see which steps are already ticked and pick up from where it left off rather than starting over.
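To make the format concrete, here's a sketch of what one of these task files looks like. The fields match the ones described above (title, type, priority, due date, plus a checklist of steps); the specific values and step wording are illustrative, not copied from the real queue.

```markdown
---
title: "Comparison page: GravityKit vs. competitor X"
type: content
priority: medium
due: 2025-03-14
---

## Steps

- [x] Pull the competitor's public feature list and pricing
- [x] Check our existing comparison pages for overlap
- [ ] Draft the comparison table
- [ ] Write the intro and summary sections
- [ ] Add a self-review note and move this file to review/
```

A partially ticked checklist like this is exactly what an interrupted run leaves behind, so the next sweep knows to start at the first unchecked box.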
The content pipeline
One of the more useful things the task queue enables is a fully staged content pipeline. It works like this:
- The Scout researches and surfaces content ideas based on support tickets, keyword gaps, and competitor analysis, then logs them as tasks in the inbox
- The Completer picks up a content idea, researches it further, and produces a structured outline, which lands in review/
- I review the outline, approve it (or edit it), and optionally drop in screenshots or reference material
- The Completer picks it back up, writes the full draft, and pushes it directly to the site as a draft post
I’m involved at the outline review step and again at final approval. Everything else runs without me.
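The post doesn't spell out the mechanics of the hand-back after outline approval, but in a file-queue design like this it can be as small as moving the task file and flipping a stage marker. A hypothetical helper, assuming the YAML header carries a `stage:` field the Completer checks before deciding whether to outline or draft:

```python
from pathlib import Path

TASKS = Path.home() / "gravitykit" / "tasks"

def approve_outline(filename: str) -> None:
    """Send an approved outline back to the inbox for the drafting pass.

    The `stage:` field name is a hypothetical convention, not the real system's.
    """
    approved = TASKS / "review" / filename
    text = approved.read_text()
    # Flip the stage marker so the next sweep writes the full draft.
    text = text.replace("stage: outline-review", "stage: draft")
    (TASKS / "inbox" / filename).write_text(text)
    approved.unlink()  # clear it out of review/
```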
Three agents
The system has three agents, each with a distinct role.
The Scanner monitors our internal tools continuously: Slack, Linear, and Notion. It’s looking for anything that needs a marketing response. A teammate asks for copy in a Slack thread. A Linear ticket gets tagged for marketing input. A Notion page needs updating. When the Scanner finds something, it formats it as a task and drops it in the inbox.
The Scout reads our support desk every day. Pre-sales questions, support tickets, customer language in the wild. It’s specifically hunting for content gaps (topics customers ask about that we haven’t written about), documentation holes, and marketing opportunities. When it finds something worth acting on, it logs a task.
The critical thing about these two agents: they generate most of their own task queue. I’m not sitting there writing briefs. The system is identifying the work, structuring it, and queueing it up.
The Completer is the executor. It runs on a 30-minute cycle via LaunchD (Apple’s built-in job scheduler, which runs persistently in the background on macOS). It checks the inbox, picks up the next task, reads the instructions, and works through it step by step. When it’s done, it moves the file to review/ and sends me a Slack message with a summary of what it did.
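A minimal version of that sweep could look like the sketch below. It assumes the runner shells out to the Claude Code CLI in non-interactive mode (claude -p) and that a launchd job fires it every 30 minutes; the real runner also handles the Slack summary and has its own convention for signaling a blocked task, so treat the sentinel string here as a placeholder.

```python
import subprocess
from pathlib import Path

TASKS = Path.home() / "gravitykit" / "tasks"

def run_next_task() -> None:
    """One sweep: claim the oldest inbox task, let the agent work it, file the result."""
    queued = sorted((TASKS / "inbox").glob("*.md"))
    if not queued:
        return  # nothing to do; launchd fires the next sweep in 30 minutes

    task = queued[0]
    working = TASKS / "in-progress" / task.name
    task.rename(working)  # claim the task so a second sweep can't grab it

    # Hand the task to Claude Code in non-interactive mode. CLAUDE.md and the
    # persona files supply the behavioral rules; the prompt only points at the file.
    result = subprocess.run(
        ["claude", "-p", f"Work through the task file at {working}, ticking off each step as you complete it."],
        capture_output=True,
        text=True,
    )

    # "NEEDS-INPUT" is a hypothetical sentinel the agent could print when stuck.
    destination = "blocked" if "NEEDS-INPUT" in result.stdout else "review"
    working.rename(TASKS / destination / task.name)
    # The real runner posts a Slack summary of what the agent did here.

if __name__ == "__main__":
    run_next_task()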
Skills and platform access
This is the part that makes the system genuinely useful rather than just a demo.
Each agent has access to a library of specialized marketing skills. These aren’t general AI capabilities. They’re specific, documented procedures for things I actually do: how to pull analytics data, how to structure a competitor comparison, how to write in GravityKit’s voice, how to format a help doc.
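As a purely illustrative example, one of those skills might be a short markdown file along these lines; the real library's naming and structure aren't published here:

```markdown
---
name: competitor-comparison
description: How to research and structure a GravityKit competitor comparison post
---

1. Pull the competitor's public feature list and pricing page.
2. Map each feature against our equivalent, and note any gap honestly.
3. Lead with the customer's use case, not the feature table.
4. Write in GravityKit's voice: plain, specific, no superlatives.
```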
Beyond skills, each agent is connected to the platforms we actually use via MCP (Model Context Protocol) integrations. This means the agents can:
- Pull live analytics data from our dashboards
- Push content directly to our WordPress site
- Edit existing pages and posts
- Save documentation articles as drafts in our docs system
- Run SEO keyword research
- Read and post to Slack
- Pull data from Linear and Notion
The distinction matters. These agents aren’t just generating text that I then paste somewhere. They’re operating the actual tools. When the Completer finishes a documentation article, it doesn’t hand me a Word doc. It saves the draft directly to the docs system.
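For readers unfamiliar with MCP, wiring up a platform is mostly configuration: you point Claude Code at an MCP server for that platform, and the server exposes the tools the agent can call. A rough sketch of what a project-level .mcp.json could look like; the server package names and environment variables below are placeholders, not the ones GravityKit actually runs:

```json
{
  "mcpServers": {
    "wordpress": {
      "command": "npx",
      "args": ["-y", "example-wordpress-mcp"],
      "env": { "WP_URL": "https://example.com", "WP_APP_PASSWORD": "…" }
    },
    "linear": {
      "command": "npx",
      "args": ["-y", "example-linear-mcp"],
      "env": { "LINEAR_API_KEY": "…" }
    }
  }
}
```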
Where the agent’s instructions live
One design decision worth explaining: the agent’s behavioral rules don’t live in the runner script. They live in dedicated CLAUDE.md files that Claude Code reads automatically when it starts, combined with persona files that give the Completer a specific identity depending on the type of task it’s working on.
This separation matters. The CLAUDE.md handles global behavior: how to run a pre-flight check, how to format questions if the brief is unclear, how to write the self-review section at the end of each task, when to move something to blocked/ versus just proceeding with reasonable assumptions. The persona files shape how the agent writes and thinks for a given task type. Skills then invoke specific brand and style guides at a more granular level — the voice, the format, the rules for that particular piece of output.
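To make that concrete, the global rules file might contain entries like the following. This is a paraphrase of the kinds of rules described above, not the actual file:

```markdown
## Before starting any task (pre-flight)
- Read the whole task file. If a step is genuinely ambiguous, list your open
  questions at the top of the file, move it to blocked/, and stop.
- If the ambiguity is minor, state your assumption inline and proceed.

## Finishing a task
- Append a "Self-review" section: what you did, what you assumed, and what a
  human should double-check.
- Move the file to review/ and post a one-paragraph summary to Slack.
```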
Keeping instructions in files rather than baked into code means I can edit the agent’s behavior without touching the scripts. It’s version-controlled, readable, and easy to iterate on.
The problem I was trying to solve
I’m the growth and operations lead at GravityKit, a WordPress plugin company. My job involves a lot of repeatable work that still requires judgment: scanning Slack for teammate requests that need a marketing response, reading through support tickets looking for content gaps, writing competitor research briefs, drafting documentation articles, updating landing pages.
None of this is particularly hard. But all of it requires context, attention, and time. And a lot of it follows a pattern.
I kept thinking: this is exactly the kind of work AI should be able to handle. Not replacing judgment, but handling the execution once the direction is clear.
The question was how to actually build it.
Designing the system
I didn’t start with code. I started with a conversation.
I use a personal AI assistant (a custom Claude-powered bot I run via Telegram) for day-to-day tasks and thinking. So I did what I usually do with a hard design problem: I talked it through over several sessions.
We went back and forth on the architecture. What should trigger tasks? How should the agent decide what to do first? What happens when it hits an edge case? What’s the folder structure? How does it notify me when something’s done?
By the end, we had a full implementation spec: folder structure, task file format, scheduling approach, agent behavior rules, Slack notification design, known risks, phased build plan, the works. It was written up as a detailed markdown document.
That document became the handoff.
The build: one afternoon with Claude Code
I opened Claude Code, handed the brief to Opus 4.7 and said: build it.
By the end of the afternoon, it was running. The folder structure existed. The runner script worked. LaunchD was configured. The agents were picking up test tasks from the inbox, working through them, and moving files to the review folder when done.
I didn’t write a single line of code myself. I reviewed the output, flagged a few things to adjust, and directed the build from a high level. That was my job.
This felt worth noting because the narrative around AI and technical work often assumes you need to be a developer to build something like this. You don’t. You need to be able to clearly describe what you want, review what you get, and iterate on it. Those aren’t developer skills. They’re thinking skills.
What my job looks like now
This is the part I didn’t fully anticipate before building it.
My role has genuinely changed. I’m not executing tasks anymore. I’m managing a team that executes tasks.
I review output. I approve what’s good. I send things back for revision with notes on what needs changing. I handle edge cases that need human judgment. I set the strategic direction for what we’re actually trying to accomplish.
The Scanner and Scout surface the work. The Completer does the work. I decide what matters, review the results, and stay focused on the things that actually need me in the loop.
The skill that matters most in this setup, the one I’ve gotten better at since building this: writing a clear brief. The quality of the agent’s output tracks almost directly with the quality of the task description. Vague in, vague out. Specific, well-structured brief in, solid work out.
That’s not a new lesson. It’s what good managers have always known about delegating to humans. It just turns out it applies equally well to agents. I’ve written more about this systems-level thinking in WordPress marketing if you want to go deeper on the underlying mental model.
A few honest notes
This isn’t without limitations.
The system only runs when my Mac is awake. LaunchD doesn’t fire with the lid closed. Tasks queue up and get processed when the machine is on, which works fine for my workflow but wouldn’t suit something that needs to run 24/7.
Long or complex tasks can occasionally hit context window limits. The checklist format helps by breaking big tasks into smaller steps, but it’s something to be aware of.
And the pre-flight check matters. If a task description is too vague, the agent either makes assumptions that lead to mediocre output or moves it to blocked/, which means I have to clarify before anything happens. Either outcome is fine, but it reinforces the brief-writing point.
Why I’m writing this
I built this because I had a specific problem and a specific context. I’m not a developer. I wasn’t looking to build an impressive technical project. I was looking to use AI to its full potential, not just as a faster way to do what I was already doing.
What surprised me is how accessible this actually was. The design work took longer than the build. And the design work was just thinking, writing, and iterating. Things I’ve always done.
If you’re in a similar role, running marketing or operations for a small-to-mid-sized SaaS company, and you’re doing a lot of repeatable work that follows a pattern, I think the architecture here is worth understanding. Not because everyone should build exactly this, but because the underlying pattern (agents that find their own work, an agent that executes, you as PM) applies broadly.
You don’t have to be a developer to build a team. You just have to know what you want and be able to describe it clearly.
This post is based on a system I built for internal use at GravityKit. Specific platform names and internal details have been kept general where appropriate.