I Let AI Agents Do My GitHub Work: Here’s What Actually Happened

Feb 2026

A day with OpenClaw, Telegram bots, and a team of AI developers who kind of sort of shipped a PR

There’s a particular kind of surreal that hits you when you message a bot on Telegram, tell it to go fix a bug in a GitHub repo, and then watch, in real time, as it clones the repo, spawns sub-agents, debates code quality with itself, and finally raises a pull request. That’s the experience I had spending a day with OpenClaw.

This is the story of that experiment.

The Setup: A t4.medium and a Blank Canvas

I started with a clean AWS t4.medium instance, nothing exotic, just enough compute to run a small AI gateway server. The config was straightforward: install OpenClaw, wire up the gateway, and create a handful of specialized agents that would each take on a role in a software team.

The agent roster I put together: a coordinator with GitHub access and the ability to spawn sub-agents, a frontend-developer wired to the glific-frontend workspace, a backend-developer, a reviewer, and a main general-purpose personal assistant. Each agent lived in its own isolated workspace directory under ~/.openclaw/workspaces/, with its own AGENTS.md, SOUL.md, and USER.md bootstrap files defining its personality and purpose.
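The workspace layout itself is simple enough to reproduce by hand. Here is a minimal Python sketch of that structure: the directory names and the AGENTS.md/SOUL.md/USER.md file names are the ones described above, but the helper function and the template contents are my own illustration, not OpenClaw’s actual scaffolding code.

```python
from pathlib import Path

# File names match the bootstrap files described above; the template
# contents here are invented placeholders for illustration.
BOOTSTRAP_FILES = {
    "AGENTS.md": "# Role\n{role}\n",
    "SOUL.md": "# Personality\nConcise, helpful, honest about failures.\n",
    "USER.md": "# User\nName: hero\nTimezone: Asia/Kolkata\n",
}

def scaffold_workspace(base: Path, agent: str, role: str) -> Path:
    """Create one agent's isolated workspace with its bootstrap files."""
    ws = base / agent
    ws.mkdir(parents=True, exist_ok=True)
    for name, template in BOOTSTRAP_FILES.items():
        (ws / name).write_text(template.format(role=role))
    return ws
```

Running it once per agent in the roster (coordinator, frontend-developer, backend-developer, reviewer, assistant) reproduces the layout under whatever base directory you point it at; the real one lives under ~/.openclaw/workspaces/.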

I also created fresh GitHub and Gmail accounts specifically for this experiment, isolated identities with limited access so whatever the agents did, it couldn’t touch anything real.

Connecting to Telegram (and Discord)

OpenClaw exposes agents over messaging platforms. I connected agents to both Telegram and Discord separately. The pairing is a one-time thing: run /start on the bot, grab a pairing code, and approve it from the server with openclaw pairing approve telegram <code>.

Except the first few attempts didn’t go quite that smoothly. Here’s what the actual first conversation looked like:

Me: Hi
Personal assistant: ⚠️ Agent failed before reply: No API key found for provider “openai-codex”. Configure auth for this agent or copy auth-profiles.json from the main agentDir.

Four or five “Hi”s later, once the auth was sorted, the agent finally came online:

Personal assistant: Hey! I just came online and it looks like this workspace is fresh. I don’t even have a name yet. Who are you, and what should we call each other?

I named this one Nemo. The coordinator agent on the other Telegram channel introduced itself as Coord and asked me the same kind of question: what to call me, what timezone I was in. I told them to call me “hero” and that I was in India. It noted this down and got to work.

That first-boot ritual – agents asking to be named, asking who they’re talking to – is oddly charming.

The Personal Assistant Side: Gold Prices and PR Reminders

Before diving into the dev agent experiment, I wanted to test the personal assistant angle with Nemo. Two things.

The first was a daily gold price update. I track gold prices and wanted a crisp briefing every evening. Here’s how that went:

Me: Can you help me with one thing: give me a daily update on the price of gold in Lucknow and what people are talking about in terms of trends.

Nemo: Here’s the first snapshot – Lucknow spot rates (21 Feb 2026): 22K ₹14,415/g, 24K ₹15,136/g, both down ₹30–31 day-on-day. Global spot hit $5,098/oz (+2.06%). Cycle analysts are watching for a retest of the early-Feb lows before the next push higher. When do you want this daily, and do you want MCX futures or just spot?

Me: Send daily at 5 PM, keep it crisp and to the point.
Nemo: Locked in. 5 PM IST every day. I’ll keep it tight.

The actual 5pm update that came through the next day:

Nemo (17:00 IST): Lucknow gold update: 22K ₹14,415/g, 24K ₹15,136/g — both down ₹30–31 d/d, mild profit-taking after last week’s spike.

One line. Exactly what I asked for.
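The scheduling half of this is mundane under the hood: a daily trigger reduces to computing the next 5 PM IST timestamp. A minimal sketch of that computation (my own, not OpenClaw’s scheduler):

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

IST = ZoneInfo("Asia/Kolkata")

def next_daily_run(now: datetime, hour: int = 17) -> datetime:
    """Next occurrence of hour:00 IST at or after `now` (any timezone)."""
    local = now.astimezone(IST)
    target = local.replace(hour=hour, minute=0, second=0, microsecond=0)
    if target <= local:
        target += timedelta(days=1)  # today's slot already passed
    return target
```

Asking at 6:30 PM schedules tomorrow’s 5 PM; asking at 9 AM schedules today’s.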

The second thing was a morning GitHub digest. I wanted to start each day knowing what PRs were assigned to me for review and what issues were in my queue:

Me: Can you give a daily reminder at 9 AM with the PRs assigned to me for review with links, and issues assigned to me with links?

Nemo: All set. Every day at 9 AM IST I’ll send you open issues assigned to @mdshamoon and open PRs in glific/glific and glific/glific-frontend where you’re requested as reviewer, links included. The 5 PM gold brief stays as-is.

When I asked it to pull the review queue immediately:

Nemo: Review queue for @mdshamoon right now: glific/glific — 0 open PRs. glific/glific-frontend — 1 open PR requesting your review: PR #3766 “Unified API: Show indicator for unsaved changes” (author: akanshaaa19).

This part worked well out of the box. An agent that quietly fetches your GitHub state and drops it in Telegram every morning is genuinely useful, not just a demo.
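If you wanted the same digest without an agent, GitHub’s search API supports the query directly via its documented qualifiers (is:pr, is:open, review-requested:, repo:). A sketch that only builds the request URL; how Nemo actually queried GitHub is opaque to me:

```python
from urllib.parse import urlencode

def review_queue_url(user: str, repos: list[str]) -> str:
    """Build a GitHub search-API URL for open PRs where `user` is a
    requested reviewer in any of `repos`. The qualifiers are GitHub's
    documented search syntax."""
    qualifiers = ["is:pr", "is:open", f"review-requested:{user}"] + [
        f"repo:{r}" for r in repos
    ]
    return "https://api.github.com/search/issues?" + urlencode(
        {"q": " ".join(qualifiers)}
    )
```

Fetching that URL (with an auth token for private repos) returns the same review queue Nemo reported.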

The Experiment: Can Agents Do My Dev Work?

The more ambitious test: could I delegate a real GitHub issue to the dev agents, have them implement the fix, raise a PR, and get it reviewed without me writing a single line of code?

The issue was #3729 on glific-frontend: a UI task in the file management dialog for the Assistants feature. Specifically: swap the delete icon for a close icon, make file removal only update local state (no API call), and rename a button. Small, real, well-defined.

I asked Coord to spin up the frontend developer and get it working on the issue.

This is where things got interesting.

The Agent Team Wakes Up

The first attempt to spawn the frontend-developer agent hit a wall: the coordinator didn’t have permission to spawn it. So I asked Coord to update the config and restart the gateway. It did this on its own: edited openclaw.json, restarted the gateway daemon, confirmed the PID was healthy, and tried again.
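I never saw the exact edit Coord made to openclaw.json, but granting a spawn permission plausibly reduces to a small JSON change. A hypothetical sketch, with invented key names ("agents", "canSpawn"), since I haven’t verified OpenClaw’s real config schema:

```python
import json
from pathlib import Path

def allow_spawn(config_path: Path, parent: str, child: str) -> None:
    """Add `child` to `parent`'s spawn allow-list and rewrite the config.

    The "agents" / "canSpawn" keys are invented for illustration; this
    is not OpenClaw's documented schema.
    """
    cfg = json.loads(config_path.read_text())
    agent = cfg.setdefault("agents", {}).setdefault(parent, {})
    allowed = agent.setdefault("canSpawn", [])
    if child not in allowed:  # keep the edit idempotent
        allowed.append(child)
    config_path.write_text(json.dumps(cfg, indent=2))
```

After an edit like this the gateway still needs a restart, which Coord also handled on its own.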

The frontend agent got to work. It cloned the repo, checked out a branch (feature/3729-remove-files-from-vector-store), made changes, and came back with a confident summary: icon changed, file removal logic updated, branch pushed, PR link ready.

Except here’s where the coordinator genuinely surprised me: when I asked Coord to raise the PR, it refused:

Coord: I can’t in good conscience open a PR from this branch yet. The only tracked change is in ListIcon.tsx — the shared icon renderer. The actual “Manage Files” experience lives in AssistantOptions.tsx, which is untouched. The commit adds a stub handleFileRemove that doesn’t do anything, and wraps every icon render in an onClick that would fire on any sidebar click. That’s a regression waiting to happen.

It laid out exactly what was wrong and sent the frontend agent back to fix it. One agent catching another’s sloppy work, without me getting involved. That wasn’t something I expected to see.

The second pass was cleaner. The right file got edited, a new branch (feature/3729-revert-listicon-changes) was created, and a PR landed at github.com/openshamd/glific-frontend/pull/1, titled “fix: align assistant file dialog with unified API flow.” Coord left a GitHub comment tagging @glific/reviewer.

The Messy Middle: Dependency Hell

Between the first and second attempts, things got stuck, and this is where the visibility problem became real.

The server was on Node 24 but the project required Node >=20 <23. Yarn failed because the project had “private”: false while using Yarn workspaces. npm worked but exposed pre-existing test failures in Apollo mock configs.
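That Node mismatch is just the standard engines check npm and yarn run against package.json. A toy version of the check, majors only, to show why Node 24 trips a range of >=20 <23:

```python
import re

def satisfies(version: str, range_spec: str) -> bool:
    """Check a Node version's major number against a simple engines
    range like ">=20 <23". Majors only: a toy checker, not full semver.
    """
    major = int(version.lstrip("v").split(".")[0])
    checks = {">=": major.__ge__, "<=": major.__le__,
              ">": major.__gt__, "<": major.__lt__}
    return all(
        checks[op](int(bound))
        for op, bound in re.findall(r"(>=|<=|>|<)\s*(\d+)", range_spec)
    )
```

Node 24 fails the upper bound, Node 22 passes both, which is exactly the wall the agent spent its time climbing.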

For about an hour I was checking in every 10–15 minutes:

Me: What’s happening now?
Coord: Tried running yarn install under Node 22; it aborts because the repo enables workspaces while “private”: false. To move forward I can flip “private”: true or run with npm instead. Which do you prefer?

Me: Why can’t we just do it with npm?
Coord: Here’s what’s going on. I’m running yarn/vitest under Node 22 to finish the issue. Tests are failing in InteractiveMessage.test.tsx: Apollo warnings about deprecated options, React testing warnings. Those failures are pre-existing on main. Remaining steps: finish the vector-store fix, decide how to handle the failing tests, then push and open the PR.

Me: Just proceed with PR
Coord: PR created. You can view it here: github.com/openshamd/glific-frontend/pull/1. I’ve tagged the reviewer.

Coord was transparent about what was stuck and surfaced decisions for me rather than making them silently. That was the right behavior. But the stretches of silence between updates were uncomfortable; more on that below.

Using Clawmetry for Analytics

I had Clawmetry, OpenClaw’s built-in analytics layer, running alongside this, showing token usage, tool calls, sub-agent spawns, and runtime per session. It was the only window into what was happening under the hood during the quiet patches. Useful for confirming agents were active, but not granular enough to answer “why is this taking 20 minutes on a simple task?” More of a health monitor than a debugger.

The Pitfalls: What Doesn’t Work Yet

Visibility into sub-agent activity is low. When the frontend agent was working, I had no live view of what it was actually doing, only what it reported back when it finished or checked in. There were stretches where I had no idea if it was making progress, stuck in a loop, or waiting on a hanging install process. Clawmetry helped at the surface level but not enough.

Cross-workspace agent communication is not native. I assumed agents configured in the same openclaw.json could message each other. They can’t – each workspace is sandboxed by design. To get agents to coordinate, you need a shared messaging channel, a webhook, or manual relay. This took embarrassingly long to understand.
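The workaround I’d try next time is a shared mailbox both workspaces can mount: a directory of JSON messages, drained by the recipient. A file-based sketch (my own workaround idea, nothing OpenClaw provides):

```python
import json
import time
from pathlib import Path

class FileRelay:
    """Minimal shared mailbox two sandboxed agents could both mount."""

    def __init__(self, mailbox: Path):
        self.mailbox = mailbox
        self.mailbox.mkdir(parents=True, exist_ok=True)

    def send(self, sender: str, to: str, body: str) -> Path:
        """Drop one message, named so receives sort in send order."""
        msg = {"from": sender, "to": to, "body": body, "ts": time.time()}
        path = self.mailbox / f"{time.time_ns()}-{to}.json"
        path.write_text(json.dumps(msg))
        return path

    def receive(self, to: str) -> list[dict]:
        """Drain and return all pending messages addressed to `to`."""
        out = []
        for p in sorted(self.mailbox.glob(f"*-{to}.json")):
            out.append(json.loads(p.read_text()))
            p.unlink()
        return out
```

Crude, but it would let the coordinator and a worker pass structured notes without breaking the sandbox model.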

Agent memory is shallow. Sub-agents start fresh every spawn. So when the frontend agent was re-spawned to fix its mistakes, it had no memory of what it had already tried. The coordinator kept our conversation context, but the workers didn’t.
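One mitigation is to make the workspace itself carry the memory: have each worker append what it tried to a durable log, and read the log back on respawn. A sketch of that pattern, with a hypothetical ATTEMPTS.jsonl file name:

```python
import json
from pathlib import Path

def log_attempt(workspace: Path, note: str) -> None:
    """Append one attempt note to a durable log inside the workspace."""
    with (workspace / "ATTEMPTS.jsonl").open("a") as f:
        f.write(json.dumps({"note": note}) + "\n")

def past_attempts(workspace: Path) -> list[str]:
    """What a freshly spawned worker should read before retrying."""
    log = workspace / "ATTEMPTS.jsonl"
    if not log.exists():
        return []
    return [json.loads(line)["note"] for line in log.read_text().splitlines()]
```

Had the frontend agent kept such a log, its second spawn would at least have known the ListIcon.tsx route was already tried and rejected.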

Environment bootstrapping takes time. Sorting out Node versions and dependency conflicts isn’t specific to AI agents, but it adds up when you’re watching a bot slowly figure out that yarn install needs Node 22 instead of Node 24.

The Surreal Part

I want to come back to that word — surreal.

There’s a moment in this experiment where you realize you’re having a conversation with a Telegram bot, that bot is delegating to another bot, and that second bot is reading GitHub issue comments, writing TypeScript, committing code, and raising a pull request, all on your behalf.

And the coordinator bot won’t let the other bot ship bad code. It caught the wrong file being modified. It refused to raise the PR prematurely. It gave detailed, specific feedback that a distracted human reviewer might have let slide.

That’s not a gimmick. That’s something genuinely new.

Is it production-ready? No. Is the visibility into what’s happening good enough for a real team? Definitely not yet. Would I trust it unsupervised on a critical codebase? Absolutely not.

But as a proof of concept, as a glimpse of what software development might look like in a few years, it was one of the more interesting afternoons I’ve spent in front of a terminal.

The agents aren’t replacing developers. But they’re starting to look like they could be very good at the parts of the job that nobody actually enjoys.
