With great power...

Jonathan Hall

Builder Night · May 11, 2026

The first time I tried Claude Code...

With great power
comes great stupidity

Who I am

Fractional Gopher
Daily Blogger • https://boldlygo.tech
Host of Cup o' Go podcast • https://cupogo.dev

Agenda

How I work today (Live demo! 🎉)
Where it still falls short
What I'm building past it

The setup

3 Claude Code sessions, 1 git repo
3 skills: /todo, /tdd-go, /commit
Each session in its own git worktree
Auto-accept on edits — review per-cycle, not per-edit
Plan mode rarely used — plans emerge while working
House rules in CLAUDE.md (project + global)

Backlog management

/todo

TODO.md, versioned with the code
Source of truth for what's next
"TODO means TODO" — done items are deleted
Query: /todo what's next? · /todo quick wins

Memory in the system, not the agent.
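
A minimal sketch of the "TODO means TODO" rule, assuming TODO.md uses Markdown checkboxes ("- [x]" for done). The checkbox format and the pruneDone helper are illustrative assumptions, not how the /todo skill is actually implemented.

```go
package main

import (
	"fmt"
	"strings"
)

// pruneDone drops checked-off items ("- [x] ...") and returns the rest.
// The checkbox syntax is an assumption about how TODO.md is written.
func pruneDone(todo string) string {
	var kept []string
	for _, line := range strings.Split(todo, "\n") {
		trimmed := strings.ToLower(strings.TrimSpace(line))
		if strings.HasPrefix(trimmed, "- [x]") {
			continue // done items are deleted, not archived
		}
		kept = append(kept, line)
	}
	return strings.Join(kept, "\n")
}

func main() {
	todo := "# TODO\n- [ ] add retry to uploader\n- [x] fix flaky test\n- [ ] document config"
	fmt.Println(pruneDone(todo))
}
```

Because the file is versioned with the code, deleted items are still recoverable from git history.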

Test-first discipline

/tdd-go

Triggered automatically — a hook reminds Claude on every prompt
Red → green → refactor as three subagents
Each in fresh context, gated handoff between them
The agent can't skip the failing test

Define success before starting.
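
One way to picture the gated handoff is as a small state machine where each phase can only advance if the test suite is in the expected state. This is a sketch under assumptions: the Phase type and the advance function are hypothetical names, and the real skill hands off between subagents rather than calling a function.

```go
package main

import (
	"errors"
	"fmt"
)

// Phase is one step of the red/green/refactor cycle.
type Phase int

const (
	Red Phase = iota
	Green
	Refactor
	Done
)

// advance returns the next phase, or an error if the gate for the current
// phase is not satisfied. testsPass is the test-suite result after the
// current phase's work.
func advance(p Phase, testsPass bool) (Phase, error) {
	switch p {
	case Red:
		if testsPass {
			return p, errors.New("red gate: the new test must fail first")
		}
		return Green, nil
	case Green:
		if !testsPass {
			return p, errors.New("green gate: tests must pass before refactoring")
		}
		return Refactor, nil
	case Refactor:
		if !testsPass {
			return p, errors.New("refactor gate: tests must still pass")
		}
		return Done, nil
	}
	return p, errors.New("cycle already complete")
}

func main() {
	// Trying to leave Red with passing tests is rejected.
	if _, err := advance(Red, true); err != nil {
		fmt.Println("blocked:", err)
	}
	p, _ := advance(Red, false)
	p, _ = advance(p, true)
	p, _ = advance(p, true)
	fmt.Println("finished:", p == Done)
}
```

The point of the gate is that skipping the failing test is not a matter of instruction-following; the transition simply does not exist.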

Commit gating

/commit + hook

Lint + tests must pass to commit
Human-invoked only — Claude can't auto-trigger
Hook blocks every other path to commit

The only door, every other window locked.
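
A rough sketch of the gate itself: run each check in order and refuse to commit at the first failure. The Check type, the gate and command helpers, and the use of "go vet" as the lint step are assumptions for illustration; the talk does not say which linter or hook mechanism is used.

```go
package main

import (
	"fmt"
	"os/exec"
)

// Check is a named precondition for committing.
type Check struct {
	Name string
	Run  func() error
}

// command wraps an external command as a check function.
func command(name string, args ...string) func() error {
	return func() error { return exec.Command(name, args...).Run() }
}

// gate runs checks in order and stops at the first failure.
func gate(checks []Check) error {
	for _, c := range checks {
		if err := c.Run(); err != nil {
			return fmt.Errorf("%s failed: %w", c.Name, err)
		}
	}
	return nil
}

func main() {
	checks := []Check{
		// "go vet" stands in for whatever linter the project uses.
		{Name: "lint", Run: command("go", "vet", "./...")},
		{Name: "tests", Run: command("go", "test", "./...")},
	}
	if err := gate(checks); err != nil {
		fmt.Println("commit blocked:", err)
		return
	}
	fmt.Println("ok to commit")
}
```

The gate only works if it is the single path to a commit, which is why the hook has to block every other route.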

Where it still falls short

The orchestrator is human

Tool approval on every command
Serial review of parallel work
Dispatch, routing, every interruption through me
Waiting

The bottleneck isn't the agent. It's me.

Telling isn't learning

Rules in CLAUDE.md get forgotten or overridden
Long contexts drift from instructions
The agent stays the same. I get smarter.
"Memory" features just bloat the prompt

Prompt engineering is fragile at best.

The agent is fallible

Says wrong things with the same confidence as right things
Hallucinated APIs, plausible-but-broken reasoning
I can only catch what I know to check for
The scary errors are the ones I can't catch

...so are humans.

Replacing stupidity with responsibility

What they're selling

The magic: "general intelligence"
The fix: always "more" — bigger context, more prompt, more instruction, ultimately more GPUs, more power plants, more more MORE!

My take

The magic: pattern matching on steroids
The fix: specialization — smaller agents, narrower scope, single-purpose tools. Less is more.

What I've learned. What I'm betting on.

Learned

LLMs amplify the system you put around them
Define success before starting
Make the right path the only path

Betting

The orchestrator doesn't have to be human
Keep the agent dumb; make the system smart
Human gates are scaffolding, not structural

Why Lindy

Started exploring — nothing coherent existed
Pieces exist: Finster's reviewer swarm, Yegge's Gas Town, the Wiggum loop, Claude skills
Nothing puts them together
And nothing takes antifragility seriously — which I think is foundational

An experiment in code. Some of it will turn out wrong.

Why "Lindy"?

The Lindy effect: things that last tend to keep lasting
Antifragile systems get stronger from use, not despite it (Taleb)
Uses theater terminology internally (scene, beat, take, etc.)

Lindy directs; I produce

I create a backlog of scenes (potentially prioritized)
Lindy schedules and dispatches; scenes run autonomously
I review post-scene

Build the rails

Hooks and scene gates, not prompt rules
Minimal prompts, isolated context per take
Built in: only what's foundational — RED/GREEN (adversarial review for agents)
Configurable: everything else — evolves through use

To err is ... also human

Humans are fallible too — we've built systems for that
- Adversarial review
- Define success first
- Static analysis
- Fast recovery
Lindy adapts them for AI agents

The collaborator changes. The discipline doesn't.

Lindy today

Sandwiched between an OpenCode skill and raw OpenCode
Per-scene workflow runs end-to-end
Has completed real bug fixes, though not yet with accompanying tests

A scene, today

[Diagram: per-scene workflow in the current Lindy implementation]

What's coming

In progress

TDD loop: produces tests, proves the fix, prevents regression

Then: start building Lindy with Lindy.

Backlog

Review swarm
Model selection per beat
Decomposition
Multi-scene scheduler
Web UI

A scene, tomorrow

[Diagram: per-scene workflow in the aspirational Lindy implementation]

Other goals

Run on cheaper models where possible
Not locked to a single provider
Home lab deployment
Multi-project orchestration
Eventually: multi-tenant

Thanks

gitlab.com/flimzy/lindy

What did I get wrong?