With great power...

Jonathan Hall

Builder Night · May 11, 2026

The first time I tried Claude Code...

With great power
comes great stupidity

Who I am


  • Fractional Gopher
  • Daily Blogger • https://boldlygo.tech
  • Host of Cup o' Go podcast • https://cupogo.dev

Agenda


  • How I work today (Live demo! 🎉)
  • Where it still falls short
  • What I'm building past it

The setup


  • 3 Claude Code sessions, 1 git repo
  • 3 skills: /todo, /tdd-go, /commit
  • Each session in its own git worktree
  • Auto-accept on edits — review per-cycle, not per-edit
  • Plan mode rarely used — plans emerge while working
  • House rules in CLAUDE.md (project + global)
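
One way to picture "each session in its own git worktree" is the sketch below. It only builds the `git worktree add -b <branch> <path>` commands and prints them. The session names and the sibling-directory layout are assumptions for illustration, not the setup shown in the demo.

```go
package main

import (
	"fmt"
	"os/exec"
	"path/filepath"
)

// worktreeCmd builds (but does not run) the git command that gives one
// session its own checkout and branch. Placing the worktree next to the
// repo as "<repo>-<session>" is an assumed layout.
func worktreeCmd(repo, session string) *exec.Cmd {
	path := filepath.Join(filepath.Dir(repo), filepath.Base(repo)+"-"+session)
	cmd := exec.Command("git", "worktree", "add", "-b", session, path)
	cmd.Dir = repo // run from inside the main checkout
	return cmd
}

func main() {
	// Hypothetical session names, one per Claude Code session.
	for _, s := range []string{"session-a", "session-b", "session-c"} {
		cmd := worktreeCmd("/src/myrepo", s)
		fmt.Println(cmd.Dir, cmd.Args)
	}
}
```

Because each session edits a separate checkout, three agents can work in one repository without stepping on each other's files.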

Backlog management

/todo


  • TODO.md, versioned with the code
  • Source of truth for what's next
  • "TODO means TODO" — done items are deleted
  • Query: /todo what's next? · /todo quick wins

Memory in the system, not the agent.
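
A minimal sketch of the "TODO means TODO" rule, assuming TODO.md keeps one `- ` bullet per item (the talk does not specify the file format). Finishing an item deletes its line rather than checking it off, so the file only ever lists remaining work.

```go
package main

import (
	"fmt"
	"strings"
)

// openItems returns the text of every "- " bullet in a TODO.md-style
// document. The bullet format is an assumption.
func openItems(todo string) []string {
	var items []string
	for _, line := range strings.Split(todo, "\n") {
		if rest, ok := strings.CutPrefix(strings.TrimSpace(line), "- "); ok {
			items = append(items, rest)
		}
	}
	return items
}

// complete removes the finished item's line entirely instead of
// marking it done.
func complete(todo, item string) string {
	var kept []string
	for _, line := range strings.Split(todo, "\n") {
		if strings.TrimSpace(line) == "- "+item {
			continue // done items are deleted, not archived
		}
		kept = append(kept, line)
	}
	return strings.Join(kept, "\n")
}

func main() {
	todo := "# TODO\n\n- Add retry to uploader\n- Fix flaky login test\n"
	todo = complete(todo, "Fix flaky login test")
	fmt.Println(openItems(todo))
}
```

Since the file is versioned with the code, the history of finished work lives in git rather than in the agent's context.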

Test-first discipline

/tdd-go


  • Triggered automatically — a hook reminds Claude on every prompt
  • Red → green → refactor as three subagents
  • Each in fresh context, gated handoff between them
  • The agent can't skip the failing test

Declare success before starting.
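
The gated handoff can be sketched as follows. Each phase declares the test outcome it must leave behind, and the cycle stops if that outcome is not observed. In the talk each phase is a subagent with fresh context. Here a phase is a plain function, and the `Phase` type and `runCycle` are hypothetical names.

```go
package main

import "fmt"

// Phase is one step of red/green/refactor. Run stands in for the
// subagent's work; WantPass is the test state the phase must produce.
type Phase struct {
	Name     string
	Run      func() error
	WantPass bool
}

// runCycle runs the phases in order and refuses the handoff when the
// tests are not in the state the phase was supposed to leave behind.
func runCycle(phases []Phase, testsPass func() bool) error {
	for _, p := range phases {
		if err := p.Run(); err != nil {
			return fmt.Errorf("%s: %w", p.Name, err)
		}
		if got := testsPass(); got != p.WantPass {
			return fmt.Errorf("%s: gate closed: tests pass=%v, want %v", p.Name, got, p.WantPass)
		}
	}
	return nil
}

func main() {
	// Simulated project state: tests fail only when a test exists
	// without an implementation.
	hasTest, hasImpl := false, false
	testsPass := func() bool { return !hasTest || hasImpl }

	cycle := []Phase{
		{Name: "red", Run: func() error { hasTest = true; return nil }, WantPass: false},
		{Name: "green", Run: func() error { hasImpl = true; return nil }, WantPass: true},
		{Name: "refactor", Run: func() error { return nil }, WantPass: true},
	}
	fmt.Println(runCycle(cycle, testsPass))
}
```

A red phase that writes no failing test leaves the suite green, so the first gate rejects it. That is the sense in which the agent cannot skip the failing test.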

Commit gating

/commit + hook


  • Lint + tests must pass to commit
  • Human-invoked only — Claude can't auto-trigger
  • Hook blocks every other path to commit

The only door, every other window locked.

Where it still falls short

The orchestrator is human


  • Tool approval on every command
  • Serial review of parallel work
  • Dispatch, routing, every interruption through me
  • Waiting

The bottleneck isn't the agent. It's me.

Telling isn't learning


  • Rules in CLAUDE.md get forgotten or overridden
  • Long contexts drift from instructions
  • The agent stays the same. I get smarter.
  • "Memory" features just bloat the prompt

Prompt engineering is fragile at best.

The agent is fallible


  • Says wrong things with the same confidence as right things
  • Hallucinated APIs, plausible-but-broken reasoning
  • I can only catch what I know to check for
  • The scary errors are the ones I can't catch

...so are humans.

Replacing stupidity with responsibility

What they're selling


  • The magic: "general intelligence"
  • The fix: always "more" — bigger context, more prompt, more instruction, ultimately more GPUs, more power plants, more more MORE!

My take


  • The magic: pattern matching on steroids
  • The fix: specialization — smaller agents, narrower scope, single-purpose tools. Less is more.

What I've learned. What I'm betting on.


Learned

  • LLMs amplify the system you put around them
  • Declare success before starting
  • Make the right path the only path

Betting

  • The orchestrator doesn't have to be human
  • Keep the agent dumb; make the system smart
  • Human gates are scaffolding, not structural

Why Lindy


  • Started exploring — nothing coherent existed
  • Pieces exist: Finster's reviewer swarm, Yegge's Gas Town, the Wiggum loop, Claude skills
  • Nothing puts them together
  • And nothing takes antifragility seriously — which I think is foundational

An experiment in code. Some of it will turn out wrong.

Why "Lindy"?


  • The Lindy effect: things that last tend to keep lasting
  • Antifragile systems get stronger from use, not despite it (Taleb)
  • Lindy uses theater terminology internally (scene, beat, take, etc.)

Lindy directs; I produce


  • I create a backlog of scenes (potentially prioritized)
  • Lindy schedules and dispatches; scenes run autonomously
  • I review post-scene
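
The division of labor above might look roughly like this. It is an illustrative sketch, not Lindy's actual scheduler: the `Scene` and `Result` types, the priority field, and the worker limit are all assumptions.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Scene is a hypothetical unit of work from the backlog.
type Scene struct {
	Title    string
	Priority int // higher runs first; zero means unprioritized
}

// Result is what the human reviews after a scene finishes.
type Result struct {
	Scene Scene
	Err   error
}

// dispatch orders the backlog by priority, runs at most `workers`
// scenes concurrently, and returns results in dispatch order.
func dispatch(backlog []Scene, workers int, run func(Scene) error) []Result {
	if workers < 1 {
		workers = 1
	}
	queue := append([]Scene(nil), backlog...) // leave the caller's slice untouched
	sort.SliceStable(queue, func(i, j int) bool {
		return queue[i].Priority > queue[j].Priority
	})

	results := make([]Result, len(queue))
	sem := make(chan struct{}, workers) // limits concurrent scenes
	var wg sync.WaitGroup
	for i, s := range queue {
		wg.Add(1)
		sem <- struct{}{}
		go func(i int, s Scene) {
			defer wg.Done()
			defer func() { <-sem }()
			// Each goroutine writes only its own slot, so no lock is needed.
			results[i] = Result{Scene: s, Err: run(s)}
		}(i, s)
	}
	wg.Wait()
	return results
}

func main() {
	backlog := []Scene{
		{Title: "Fix flaky login test", Priority: 1},
		{Title: "Add retry to uploader", Priority: 5},
	}
	// The scene body is stubbed; a real one would drive an agent.
	for _, r := range dispatch(backlog, 2, func(Scene) error { return nil }) {
		fmt.Println(r.Scene.Title, r.Err)
	}
}
```

The human appears only at the two ends: writing the backlog and reading the results.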

Build the rails


  • Hooks and scene gates, not prompt rules
  • Minimal prompts, isolated context per take
  • Built in: only what's foundational — RED/GREEN (adversarial review for agents)
  • Configurable: everything else — evolves through use

To err is ... also human


  • Humans are fallible too — we've built systems for that
    • Adversarial review
    • Define success first
    • Static analysis
    • Fast recovery
  • Lindy adapts them for AI agents

The collaborator changes. The discipline doesn't.

Lindy today


  • Sandwiched between an OpenCode skill and raw OpenCode
  • Per-scene workflow runs end-to-end
  • Has completed real bug fixes, but does not yet write tests for them

A scene, today

[Diagram: per-scene workflow, current Lindy implementation]

What's coming


In progress

  • TDD loop: produces tests, proves the fix, prevents regression

Then: start building Lindy with Lindy.

Backlog

  • Review swarm
  • Model selection per beat
  • Decomposition
  • Multi-scene scheduler
  • Web UI

A scene, tomorrow

[Diagram: per-scene workflow, aspirational Lindy implementation]

Other goals


  • Run on cheaper models where possible
  • Not locked to a single provider
  • Home lab deployment
  • Multi-project orchestration
  • Eventually: multi-tenant

Thanks

gitlab.com/flimzy/lindy

What did I get wrong?