Harness-First Agent System
Harness-First Runtime for AI Agents
PonyBunny gives AI agents a controlled execution harness, so they can plan and act while you replay and audit their work like any other software system.
pb run "analyse company financials"
runId: pb_01H8xK9...
planner: default
executor: evented
agents: scout, analyst
tools: companies-house, web, file-parser
✓ execution completed
✓ artifact created
✓ audit log saved
✓ replay available
Why most agent systems break in production
Opaque execution
Most agents can act, but you can’t reliably see why they made a decision.
Fragile tool use
A small change to a model or a tool can silently break the workflow.
No replayability
When an agent fails, reproducing the exact run is often impossible.
No audit trail
Without a record of what actually ran, “it seemed to work” is all you can say, and in real-world work that is not good enough.
PonyBunny solves this with a harness-first architecture for AI execution.
Built as an execution harness, not just an agent wrapper
PonyBunny is designed to make agent behaviour controllable, observable, replayable, and auditable.
Controlled Execution
Run tasks inside a bounded runtime, not a free-floating model loop.
Planner–Executor Separation
Keep decision-making and execution distinct for better reliability and inspection.
Tool Harness
Connect skills, MCP (Model Context Protocol) servers, scripts, and external systems through a controlled layer.
Replayable Runs
Re-run tasks with traceable context and compare outcomes across changes.
Audit Logs
Keep a structured record of what happened, what was used, and what was produced.
Model Flexibility
Move reliability into the harness layer instead of depending on one model vendor.
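To make the "controlled layer" idea concrete, here is a minimal Python sketch of a tool harness. It is an illustration only: `ToolHarness`, `register`, `call`, and the `word_count` tool are hypothetical names, not PonyBunny's API. The assumption is that only registered tools can run and that every call, allowed or denied, lands in an audit log.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ToolHarness:
    """Hypothetical controlled layer (not PonyBunny's API).

    Only registered tools can be called, and every attempt is logged.
    """

    tools: dict[str, Callable[..., Any]] = field(default_factory=dict)
    audit_log: list[dict[str, Any]] = field(default_factory=list)

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        self.tools[name] = fn

    def call(self, name: str, **kwargs: Any) -> Any:
        if name not in self.tools:
            # Denied calls are recorded too, so the log shows what was attempted.
            self.audit_log.append({"tool": name, "args": kwargs, "status": "denied"})
            raise PermissionError(f"tool not registered: {name}")
        result = self.tools[name](**kwargs)
        self.audit_log.append(
            {"tool": name, "args": kwargs, "status": "ok", "result": result}
        )
        return result


harness = ToolHarness()
harness.register("word_count", lambda text: len(text.split()))
count = harness.call("word_count", text="annual report 2023")
```

Because the agent never holds a direct reference to a tool function, swapping the model does not change what can be executed or what gets logged.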
How PonyBunny Works
The model can reason. PonyBunny makes execution structured.
Intent Intake
User intent enters a controlled task boundary.
Planning
A planner produces a structured execution plan.
Agent Execution
Agents perform scoped work through the runtime.
Tool Orchestration
Skills, MCP tools, and external systems are invoked through a harness layer.
Artifacts, Audit, Replay
Outputs are produced with logs, audit trails, and replay-ready execution records.
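The stages above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not PonyBunny's implementation: `TaskContract`, `Step`, `RunRecord`, `intake`, `make_plan`, and `execute` are hypothetical names, the planner is a fixed stand-in where a real one would call a model, and the task boundary is reduced to a `max_steps` limit.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class TaskContract:
    """Hypothetical task boundary: the intent plus a hard limit on plan size."""
    intent: str
    max_steps: int


@dataclass(frozen=True)
class Step:
    tool: str
    argument: str


@dataclass
class RunRecord:
    contract: TaskContract
    plan: list[Step]
    events: list[dict] = field(default_factory=list)
    artifact: str = ""


def intake(raw_intent: str, max_steps: int = 5) -> TaskContract:
    # Intent intake: normalise the request and attach the boundary.
    return TaskContract(intent=raw_intent.strip(), max_steps=max_steps)


def make_plan(contract: TaskContract) -> list[Step]:
    # Stand-in planner: it only decides what to do and executes nothing.
    return [Step("upper", contract.intent), Step("count_words", contract.intent)]


# Toy tools standing in for skills, MCP tools, or external systems.
TOOLS: dict[str, Callable[[str], str]] = {
    "upper": str.upper,
    "count_words": lambda text: str(len(text.split())),
}


def execute(contract: TaskContract, steps: list[Step]) -> RunRecord:
    # The executor enforces the contract before any step runs.
    if len(steps) > contract.max_steps:
        raise ValueError("plan exceeds the task contract")
    record = RunRecord(contract=contract, plan=list(steps))
    outputs = []
    for index, step in enumerate(steps):
        output = TOOLS[step.tool](step.argument)
        record.events.append({"step": index, "tool": step.tool, "output": output})
        outputs.append(output)
    record.artifact = "\n".join(outputs)
    return record


contract = intake("  analyse company financials ")
record = execute(contract, make_plan(contract))
```

Keeping `make_plan` free of side effects is the design choice that matters here: the plan can be inspected or rejected before anything runs, and the resulting record holds the contract, the plan, and the events together.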
Architecture Overview
Layer 1 — Intent & Planning
- Intent Interface
- Planner
- Task Contract
Layer 2 — Runtime & Agents
- Agent Kernel
- Execution Runtime
- Agent Mesh
- State / Event Flow
Layer 3 — Tooling & Systems
- Tool Bus (MCP)
- Skills
- External APIs
- Local Scripts
- Artifact Outputs
- Audit Store
PonyBunny separates reasoning from execution so the system can be inspected, improved, and trusted.
Composable Agent Roles
Scout
Researches, gathers context, and explores external sources.
Forge
Builds outputs, artifacts, and structured deliverables.
Keeper
Maintains memory, continuity, and execution state.
Guard
Enforces boundaries, checks safety, and supports trustworthy execution.
Use built-in roles or define your own domain-specific agents.
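As a rough illustration of what a domain-specific role might look like, the Python sketch below models a role as a name, a purpose, and a set of permitted tools. Everything here is an assumption made for the example: `AgentRole`, `may_use`, and the idea of scoping a role by tool are hypothetical and do not describe PonyBunny's actual role definition format. The tool names are borrowed from the sample run at the top of the page.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AgentRole:
    """Hypothetical role definition: what the agent is for and what it may touch."""
    name: str
    purpose: str
    allowed_tools: frozenset[str]

    def may_use(self, tool: str) -> bool:
        return tool in self.allowed_tools


# Two of the built-in roles, modelled with assumed tool scopes.
BUILT_IN = {
    "scout": AgentRole("scout", "Researches and gathers context", frozenset({"web"})),
    "guard": AgentRole("guard", "Enforces boundaries", frozenset()),
}

# A custom, domain-specific role added alongside the built-in ones.
analyst = AgentRole(
    name="analyst",
    purpose="Extracts structured data from filings",
    allowed_tools=frozenset({"file-parser"}),
)
roles = {**BUILT_IN, analyst.name: analyst}
```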
Built for real operational workflows
Research & Analysis
Turn a high-level request into sourced, structured outputs.
Compliance Workflows
Run traceable, auditable multi-step compliance processes.
Developer Automation
Connect tools, files, APIs, and scripts into controllable execution flows.
Internal Ops
Coordinate repeatable business tasks without turning everything into brittle scripts.
AI Product Prototyping
Build and test domain agents with visible execution paths.
Agent Reliability Engineering
Replay, inspect, and improve agent behaviour across iterations.
Why not just use a workflow builder or an agent wrapper?
Workflow Builders
Good for fixed automation. Weak when tasks require dynamic planning and execution.
Agent Wrappers
Good for demos. Weak when you need replay, audit, and engineering reliability.
PonyBunny
Built for AI work that needs both adaptability and control.
Execution Trace
Every run produces a structured, replayable trace.
Intent parsed
"analyse company financials" → task contract created
Plan generated
planner: default — 3 subtasks identified
Scout used web/company tools
tools: companies-house, web
Analyst processed data
agent: analyst — structured extraction complete
Report generated
artifact: financial-report.md (12.4 KB)
Audit saved
audit log: 47 entries, 0 warnings
Replay snapshot available
replay: pb replay pb_01H8...
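One way to think about a replayable trace is as a serialisable list of step records that can be fed back into the runtime. The Python sketch below illustrates that idea only. The `run` function, the trace fields, and the `flaky_lookup` tool are hypothetical, and it does not show the format `pb replay` actually uses. The assumption is that replay reuses recorded tool outputs instead of calling the tools again.

```python
from __future__ import annotations

import json

calls = {"n": 0}


def flaky_lookup(name: str) -> str:
    # Stand-in for an external system whose answer changes between calls.
    calls["n"] += 1
    return f"{name}#{calls['n']}"


def run(steps, tools, recorded=None):
    """Execute steps and return a trace.

    When `recorded` is given, tool outputs come from that earlier trace
    rather than from live tool calls.
    """
    trace = []
    for index, (tool, argument) in enumerate(steps):
        if recorded is not None:
            output = recorded[index]["output"]
        else:
            output = tools[tool](argument)
        trace.append({"step": index, "tool": tool, "input": argument, "output": output})
    return trace


tools = {"lookup": flaky_lookup}
steps = [("lookup", "acme"), ("lookup", "globex")]

original = run(steps, tools)
saved = json.dumps(original)  # what a stored trace might look like
replayed = run(steps, tools, recorded=json.loads(saved))
fresh = run(steps, tools)  # a live re-run, for comparison
```

In this sketch `replayed` matches `original` exactly, while `fresh` differs because the live tool returned new values. That difference is what makes a failed run reproducible after the outside world has moved on.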
Built for serious agent engineering
Open Architecture
Built around MCP, tools, runtime boundaries, and inspectable execution.
Developer-First
Designed for builders who need systems, not just prompts.
Audit-Ready Thinking
Created with replayability, traceability, and controlled execution in mind.
Build agents you can actually trust
PonyBunny helps you move from agent demos to controlled, replayable, auditable execution.
Start with the architecture. Then build your first harnessed workflow.