An applied AI research & product lab
Agents that finish what they start.
Bombadil Labs is an independent lab working on long-horizon autonomy — systems that hold up over thousands of steps, and the instruments that prove it. We publish the math, build the products, and take a small number of serious engagements.
Trusted by teams at
What we do
Three practices, one thesis: long horizons are won, not waited for.
Research
Long-horizon reliability is arithmetic before it's engineering. We work out the math of agents that hold up — horizons, verifiers, failure economics — and publish the instruments next to the claims.
Agentic R&D
Warden, Loremaster, Mirrormere, Goldberry: products built end to end in the lab, run on our own harnesses, pointed at problems where autonomy compounds instead of merely impressing.
Consulting
A few embedded engagements at a time, for teams whose agentic systems have to work. Architecture, evaluation, and the judgment calls between them.
In the workshop
What we're building.
Four systems in motion, each a bet on the same thesis: the next order of magnitude in software comes from agents you can leave alone.
How we work
House rules, written in ink.
- 01
- Long horizons or nothing.
- Demos sprint; value runs marathons. We build for the thousandth step, where the interesting failures and the interesting products both live.
- 02
- Every claim ships with its instrument.
- If you can't check us, we haven't finished. The math in our field notes runs live on the page, against the same code we unit-test.
- 03
- Buy nines where they're cheap.
- Verifier nines cost an engineering sprint; actor nines cost a training run. Knowing which store to shop in is half the discipline.
- 04
- Small, sharp tools.
- Capability is subtraction done well. The best systems carry the fewest moving parts that still clear the bar.
- 05
- Leave things better.
- Codebases, datasets, teams — we hand back the keys in better shape than we found them.
Field notes
Dispatches from the lab notebook.
What we learn, written down while it's still sharp — with the instruments embedded, so you can check the math as you read.
Building something with a long horizon?
Tell us what you want to leave running unattended. We read everything that isn't a template, and we reply to everything we read.