What I Built

error-agent

Myles Henderson · Builder Night Atlanta · April 2025


Interrupt me. Ask questions as they come up.

The Problem


  • Production errors create Linear tickets
  • Every ticket needs me to triage it, and the real bugs need me to fix them
  • Some are duplicates, some are not bugs, and some need fixing
  • I'd rather be building

Linear — error tickets · GitHub — PRs · commscenter.com · AWS

The Pipeline


Linear → open error tickets
Classify — Claude reads the batch
Fix — Claude Code subprocess
GitHub PR + Linear comment

Runs Unattended in AWS


  • EventBridge: scheduled trigger
  • VPC: egress only
  • Fargate: error-agent container wakes up → runs → shuts down
  • ECR: container image
  • Secrets Manager: API keys, tokens
  • Security Group: HTTPS (443) + SSH (22) egress only, no inbound

All managed by Terraform.

How Claude Runs Inside the Container


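import { spawn } from "node:child_process";

// clonedRepoPath, prompt, and sanitizeEnv are defined elsewhere in the agent.
const child = spawn("claude", [
  "-p", "-",                          // print (non-interactive) mode
  "--dangerously-skip-permissions",   // no approval prompts; nobody is there to answer
  "--output-format", "text",
], {
  cwd: clonedRepoPath,                // Claude works inside the cloned repo
  env: sanitizeEnv(process.env),      // allowlist only
});

child.stdin.write(prompt);            // the prompt goes in on stdin
child.stdin.end();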

Claude Code as a subprocess. Prompt on stdin, result on stdout. Called once for classify, once per fix.
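The snippet above shows the prompt going in but not the result coming back. A minimal sketch of the "result on stdout" half, assuming a hypothetical helper named runWithStdin (not taken from the project) that wraps the spawn call in a Promise:

```typescript
import { spawn } from "node:child_process";

// Hypothetical helper (an assumption, not the project's code): run a CLI,
// write `input` to its stdin, and resolve with everything it printed to stdout.
function runWithStdin(
  command: string,
  args: string[],
  input: string,
  options: { cwd?: string; env?: Record<string, string | undefined> } = {},
): Promise<string> {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args, options);
    let stdout = "";
    let stderr = "";

    child.stdout.setEncoding("utf8");
    child.stderr.setEncoding("utf8");
    child.stdout.on("data", (chunk: string) => { stdout += chunk; });
    child.stderr.on("data", (chunk: string) => { stderr += chunk; });

    // Spawn failures (for example, binary not found) arrive as an "error" event.
    child.on("error", reject);
    child.on("close", (code) => {
      if (code === 0) resolve(stdout);
      else reject(new Error(`${command} exited with code ${code}: ${stderr.trim()}`));
    });

    // Ignore write errors on stdin; the "error"/"close" handlers report the failure.
    child.stdin.on("error", () => {});
    child.stdin.end(input);
  });
}
```

With the arguments from the slide, a call might look like runWithStdin("claude", ["-p", "-", "--dangerously-skip-permissions", "--output-format", "text"], prompt, { cwd: clonedRepoPath, env: sanitizeEnv(process.env) }).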

The Program Loop


error-agent (parent process)

  • fetch tickets
  • claude -p: classify batch
      • non-fix → close / label
      • fix tickets → continue
  • for each fix ticket (sequential):
      • clone + branch
      • claude -p: diagnose + fix files
      • parse { diagnosis, confidence }
      • high confidence + changes → commit → push → open PR → comment on ticket
      • low confidence → comment diagnosis → label needs-human

Steps marked claude -p run in a Claude subprocess; everything else runs in the parent process.
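The loop can be sketched as one function that takes its steps as injected callbacks. Every name here (Steps, runOnce, handleNonFix, fixTicket) is hypothetical and chosen for illustration; the real project may be organized differently.

```typescript
// Hypothetical shapes; only the four action names come from the slides.
type Action = "fix" | "notabug" | "duplicate" | "infra";
interface Classified { ticketId: string; action: Action; reason: string }
type FixOutcome = "pr-opened" | "needs-human";

interface Steps {
  fetchTickets: () => Promise<string[]>;                     // open ticket ids
  classify: (ticketIds: string[]) => Promise<Classified[]>;  // one Claude call for the batch
  handleNonFix: (c: Classified) => Promise<void>;            // close / label
  fixTicket: (ticketId: string) => Promise<FixOutcome>;      // clone, Claude call, PR or comment
}

async function runOnce(steps: Steps): Promise<Record<FixOutcome | "non-fix", number>> {
  const summary = { "pr-opened": 0, "needs-human": 0, "non-fix": 0 };

  const ticketIds = await steps.fetchTickets();
  if (ticketIds.length === 0) return summary; // nothing to do, skip the Claude call

  const classified = await steps.classify(ticketIds);

  // Non-fix tickets are handled by the parent process without another Claude call.
  for (const c of classified) {
    if (c.action === "fix") continue;
    await steps.handleNonFix(c);
    summary["non-fix"] += 1;
  }

  // Fix tickets are processed one at a time (sequentially).
  for (const c of classified) {
    if (c.action !== "fix") continue;
    const outcome = await steps.fixTicket(c.ticketId);
    summary[outcome] += 1;
  }
  return summary;
}
```

Injecting the steps keeps the loop testable with stubs, without Linear, GitHub, or Claude in the picture.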

Step 1: Classify


One Claude call classifies the entire batch:


{
  "tickets": [
    {
      "ticketId": "AI-141",
      "action": "fix",
      "reason": "Null pointer in auth middleware — code bug"
    },
    {
      "ticketId": "AI-142",
      "action": "notabug",
      "reason": "Expected behavior — rate limiter returns 429 by design"
    },
    {
      "ticketId": "AI-143",
      "action": "duplicate",
      "duplicateOf": "AI-142",
      "reason": "Same stack trace as AI-142"
    },
    {
      "ticketId": "AI-144",
      "action": "infra",
      "reason": "Database connection timeout — not a code fix"
    }
  ]
}
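Since everything downstream trusts this JSON, it is worth validating before acting on it. A minimal sketch, assuming a hypothetical parseClassification function; the field names and the four action values come from the example above, the rest is illustrative:

```typescript
type Action = "fix" | "notabug" | "duplicate" | "infra";

interface Classification {
  ticketId: string;
  action: Action;
  reason: string;
  duplicateOf?: string; // present on duplicates in the example above
}

const ACTIONS: readonly Action[] = ["fix", "notabug", "duplicate", "infra"];

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Hypothetical validator: throws on anything that does not match the expected shape.
function parseClassification(raw: string): Classification[] {
  const parsed: unknown = JSON.parse(raw);
  if (!isRecord(parsed)) throw new Error("Expected a JSON object");
  const tickets = parsed["tickets"];
  if (!Array.isArray(tickets)) throw new Error('Expected a "tickets" array');

  return tickets.map((entry: unknown, i: number): Classification => {
    if (!isRecord(entry)) throw new Error(`tickets[${i}] is not an object`);
    const { ticketId, action, reason, duplicateOf } = entry;
    if (typeof ticketId !== "string" || typeof reason !== "string") {
      throw new Error(`tickets[${i}] needs string ticketId and reason`);
    }
    if (typeof action !== "string" || !ACTIONS.includes(action as Action)) {
      throw new Error(`tickets[${i}] has unknown action: ${String(action)}`);
    }
    // Assumption: a duplicate without a target ticket is treated as invalid.
    if (action === "duplicate" && typeof duplicateOf !== "string") {
      throw new Error(`tickets[${i}] is a duplicate without duplicateOf`);
    }
    return {
      ticketId,
      action: action as Action,
      reason,
      ...(typeof duplicateOf === "string" ? { duplicateOf } : {}),
    };
  });
}
```

Rejecting unknown actions keeps a malformed or hallucinated label from silently closing a ticket.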

Step 2: Fix


For each fix ticket: clone repo, create branch, pipe this into Claude Code:


You are fixing a production error. Here is the ticket:

**Title:** Null pointer in auth middleware
**Priority:** Urgent | **State:** Triage

**Description:**
TypeError: Cannot read properties of null (reading 'userId')
  at AuthMiddleware.verify (src/auth/middleware.ts:47)

## Instructions

1. Diagnose the root cause
2. Find and fix the bug — minimal, targeted changes
3. Do NOT refactor, add features, or write tests
4. Output JSON: { diagnosis, confidence, changedFiles }
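A prompt like the one above can be assembled from ticket fields with a plain template function. This is a sketch only: the PromptTicket shape and the buildFixPrompt name are assumptions, and the wording of the instructions is abbreviated from the slide.

```typescript
// Hypothetical ticket shape; the four fields mirror what the prompt above displays.
interface PromptTicket {
  title: string;
  priority: string;
  state: string;
  description: string;
}

function buildFixPrompt(ticket: PromptTicket): string {
  return [
    "You are fixing a production error. Here is the ticket:",
    "",
    `**Title:** ${ticket.title}`,
    `**Priority:** ${ticket.priority} | **State:** ${ticket.state}`,
    "",
    "**Description:**",
    ticket.description,
    "",
    "## Instructions",
    "",
    "1. Diagnose the root cause",
    "2. Find and fix the bug with minimal, targeted changes",
    "3. Do NOT refactor, add features, or write tests",
    "4. Output JSON: { diagnosis, confidence, changedFiles }",
  ].join("\n");
}
```

The ticket description is untrusted input that ends up inside the prompt, which is one reason the subprocess runs with a stripped-down environment.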

Step 3: Ship It


High confidence + changes

  • Commit & push
  • Open PR on GitHub
  • Comment on Linear ticket
  • Label: auto-fix-pr

Low confidence

  • Comment with diagnosis
  • Label: needs-human
  • Human takes it from here
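The gate itself is a small pure function. In this sketch the "high" and "low" confidence values, the FixResult shape, and the decideNextStep name are assumptions; the slides do not say what happens on high confidence with no changed files, so it is routed to a human here.

```typescript
// Assumed shape of Claude's JSON result: { diagnosis, confidence, changedFiles }.
type Confidence = "high" | "low";
interface FixResult {
  diagnosis: string;
  confidence: Confidence;
  changedFiles: string[];
}

type NextStep =
  | { kind: "open-pr"; label: "auto-fix-pr" }
  | { kind: "needs-human"; label: "needs-human" };

function decideNextStep(result: FixResult): NextStep {
  const hasChanges = result.changedFiles.length > 0;
  if (result.confidence === "high" && hasChanges) {
    return { kind: "open-pr", label: "auto-fix-pr" };
  }
  // Low confidence, or nothing was changed (assumption): hand off with the diagnosis.
  return { kind: "needs-human", label: "needs-human" };
}
```

Keeping the decision separate from the Git and Linear calls makes the policy easy to unit test and to tighten later.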

PRs Welcome


github.com/AI-Batteries-Included/error-agent

MIT License



Appendix

The Container


FROM node:24-slim
USER node

# In production (Fargate):
# read-only filesystem
# --cap-drop=ALL
# no privilege escalation
# secrets in AWS Secrets Manager

Environment allowlist:
  ✓ ANTHROPIC_API_KEY, HOME, PATH, GIT_*
  ✗ GITHUB_TOKEN
  ✗ LINEAR_API_KEY
  ✗ AWS_SECRET_ACCESS_KEY

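The agent runs inside this container. Even if Claude gets prompt-injected, the subprocess never sees the GitHub, Linear, or AWS credentials, so it can't exfiltrate them. The Anthropic API key is the only secret it can read.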
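The allowlist above could be enforced by a function along these lines. This is a sketch of what sanitizeEnv might look like, not the project's actual code; the constant names are hypothetical.

```typescript
// Names taken from the allowlist above: three exact names plus the GIT_ prefix.
const ALLOWED_EXACT = new Set(["ANTHROPIC_API_KEY", "HOME", "PATH"]);
const ALLOWED_PREFIXES = ["GIT_"];

// Copies only allowlisted variables; everything else is dropped by default.
function sanitizeEnv(env: Record<string, string | undefined>): Record<string, string> {
  const clean: Record<string, string> = {};
  for (const [name, value] of Object.entries(env)) {
    if (value === undefined) continue;
    const allowed =
      ALLOWED_EXACT.has(name) || ALLOWED_PREFIXES.some((prefix) => name.startsWith(prefix));
    if (allowed) clean[name] = value;
  }
  return clean;
}
```

An allowlist fails closed: a secret added to the container later stays hidden from the subprocess until someone explicitly permits it. Note that GITHUB_TOKEN does not match the GIT_ prefix, because the underscore is part of the prefix.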

Pluggable Rules


const defaultRules = {
  duplicate: {
    description: "Same underlying issue as another ticket",
    comment: "Closing as duplicate.\n\nReason: {{reason}}",
    label: "duplicate",
    close: true,
  },
  infra: {
    description: "Infrastructure issue, not fixable in code",
    comment: "Classified as infra issue.\n\nReason: {{reason}}",
    label: "infra",
  },
  notabug: {
    description: "Not actionable",
    comment: "Classified as not actionable.\n\nReason: {{reason}}",
    label: "notabug",
  },
};

Duplicates get a comment, a label, and are closed. Infra and not-a-bug tickets get a comment and a label but stay open. Fix tickets enter the pipeline.
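Applying a rule comes down to filling in the {{reason}} placeholder and turning the rule into ticket operations. A minimal sketch, where the Rule fields mirror defaultRules above and the TicketOp, renderComment, and planOps names are hypothetical:

```typescript
interface Rule {
  description: string;
  comment: string; // template with {{placeholders}}
  label: string;
  close?: boolean;
}

type TicketOp =
  | { op: "comment"; body: string }
  | { op: "label"; name: string }
  | { op: "close" };

// Replaces {{name}} with vars[name]; unknown placeholders are left untouched.
function renderComment(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (match: string, key: string) =>
    Object.prototype.hasOwnProperty.call(vars, key) ? String(vars[key]) : match,
  );
}

// Turns a rule plus Claude's reason into the operations to run against the ticket.
function planOps(rule: Rule, reason: string): TicketOp[] {
  const ops: TicketOp[] = [
    { op: "comment", body: renderComment(rule.comment, { reason }) },
    { op: "label", name: rule.label },
  ];
  if (rule.close) ops.push({ op: "close" });
  return ops;
}
```

Returning a plan instead of calling the ticket API directly keeps the rules pluggable: a new rule is data, and the executor does not change.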

Provider Pattern


type TicketProvider = {
  fetchOpenTickets: () => Promise<Ticket[]>
  postComment: (id: string, comment: string) => Promise<void>
  addLabel: (id: string, label: string) => Promise<void>
  closeTicket: (id: string) => Promise<void>
}

type SourceControlProvider = {
  clone: (dest: string) => Promise<void>
  createBranch: (dest: string, name: string) => Promise<void>
  commitAndPush: (dest: string, msg: string, branch: string) => Promise<void>
  openPR: (opts: { title: string; body: string; branch: string; base: string }) => Promise<string>
}

Linear today. Jira tomorrow. GitHub today. GitLab tomorrow.
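The same seam that lets Jira replace Linear also allows an in-memory provider for tests. A sketch under assumptions: the Ticket fields and the createInMemoryTicketProvider name are hypothetical, and only the TicketProvider methods come from the slide.

```typescript
// Hypothetical minimal Ticket; the real type likely carries more fields.
interface Ticket { id: string; title: string }

type TicketProvider = {
  fetchOpenTickets: () => Promise<Ticket[]>;
  postComment: (id: string, comment: string) => Promise<void>;
  addLabel: (id: string, label: string) => Promise<void>;
  closeTicket: (id: string) => Promise<void>;
};

interface StoredTicket extends Ticket {
  open: boolean;
  labels: string[];
  comments: string[];
}

// Test double: keeps tickets in a Map instead of calling a real tracker.
function createInMemoryTicketProvider(seed: Ticket[]) {
  const store = new Map<string, StoredTicket>();
  for (const t of seed) {
    store.set(t.id, { ...t, open: true, labels: [], comments: [] });
  }

  const find = (id: string): StoredTicket => {
    const ticket = store.get(id);
    if (!ticket) throw new Error(`Unknown ticket ${id}`);
    return ticket;
  };

  const provider: TicketProvider = {
    fetchOpenTickets: async () =>
      Array.from(store.values())
        .filter((t) => t.open)
        .map(({ id, title }) => ({ id, title })),
    postComment: async (id, comment) => { find(id).comments.push(comment); },
    addLabel: async (id, label) => { find(id).labels.push(label); },
    closeTicket: async (id) => { find(id).open = false; },
  };

  // The store is returned too so tests can inspect labels and comments.
  return { provider, store };
}
```

Because the agent only depends on the TicketProvider type, the classify and rule logic can be exercised end to end against this double.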

The Full Loop


Errors hit production
Monitoring creates Linear tickets
error-agent classifies + fixes
PRs land, tickets close
New errors → new tickets → agent runs again

What I Learned


  • Classification is the highest-leverage step
  • Confidence gating prevents bad PRs
  • The hard part is the plumbing, not the AI