Harness-First Agent System

Harness-First Runtime for AI Agents

PonyBunny gives AI agents a controlled execution harness, so their work can be planned, executed, replayed, and audited like any other software system.

Explore on GitHub | Read the Docs | See Architecture

pb run "analyse company financials"

runId: pb_01H8xK9...
planner: default
executor: evented
agents: scout, analyst
tools: companies-house, web, file-parser

✓ execution completed
✓ artifact created
✓ audit log saved
✓ replay available

Intent: task boundary

↓

Planner: structured plan

↓

Agents: scoped execution

↓

Tools: harness layer

↓

Artifacts: outputs + audit

Why most agent systems break in production

Opaque execution

Most agents can act, but you can’t reliably see why they made a decision.

Fragile tool use

A small model or tool change can silently break the workflow.

No replayability

When an agent fails, reproducing the exact run is often impossible.

No audit trail

In real-world work, “it seemed to work” is not good enough.

PonyBunny solves this with a harness-first architecture for AI execution.

Built as an execution harness, not just an agent wrapper

PonyBunny is designed to make agent behaviour controllable, observable, replayable, and auditable.

Controlled Execution

Run tasks inside a bounded runtime, not a free-floating model loop.

Planner–Executor Separation

Keep decision-making and execution distinct for better reliability and inspection.

Tool Harness

Connect skills, MCP servers, scripts, and external systems through a controlled layer.

Replayable Runs

Re-run tasks with traceable context and compare outcomes across changes.

Audit Logs

Keep a structured record of what happened, what was used, and what was produced.

Model Flexibility

Put reliability in the harness layer, so you can change models without depending on a single vendor.
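To make the tool-harness and audit-log ideas above concrete, here is a minimal Python sketch in which every tool call passes through one wrapper that records a structured entry. The `ToolHarness` class, its method names, and the audit entry fields are illustrative assumptions, not PonyBunny's actual API.

```python
import json
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class ToolHarness:
    """Hypothetical harness: tools are only reachable through call()."""
    tools: Dict[str, Callable[..., Any]] = field(default_factory=dict)
    audit: List[Dict[str, Any]] = field(default_factory=list)

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        self.tools[name] = fn

    def call(self, name: str, **kwargs: Any) -> Any:
        # Unregistered tools are rejected and the attempt is still recorded.
        if name not in self.tools:
            self.audit.append({"tool": name, "args": kwargs, "status": "rejected"})
            raise PermissionError(f"tool not registered: {name}")
        started = time.time()
        try:
            result = self.tools[name](**kwargs)
        except Exception as exc:
            # Failures are logged before the exception propagates.
            self.audit.append({"tool": name, "args": kwargs,
                               "status": "error", "error": repr(exc)})
            raise
        self.audit.append({"tool": name, "args": kwargs, "status": "ok",
                           "seconds": round(time.time() - started, 3)})
        return result

    def dump_audit(self) -> str:
        # One JSON object per line, so the log is easy to grep or diff.
        return "\n".join(json.dumps(entry, sort_keys=True) for entry in self.audit)


harness = ToolHarness()
harness.register("word_count", lambda text: len(text.split()))
count = harness.call("word_count", text="controlled observable replayable auditable")
```

Because the wrapper is the only path to a tool, the audit list covers successful, failed, and rejected calls alike, which is the property the feature list describes.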

How PonyBunny Works

The model can reason. PonyBunny makes execution structured.

Intent Intake

User intent enters a controlled task boundary.

Planning

A planner produces a structured execution plan.

Agent Execution

Agents perform scoped work through the runtime.

Tool Orchestration

Skills, MCP tools, and external systems are invoked through a harness layer.

Artifacts, Audit, Replay

Outputs are produced with logs, audit trails, and replay-ready execution records.
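The planning and execution stages above can be sketched as two separate functions: one that turns an intent into plan data, and one that runs that data without making decisions of its own. This is a hedged illustration only. `Step`, `plan`, and `execute` are hypothetical names, and the fixed two-step plan stands in for whatever a real planner would produce.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Step:
    agent: str   # which agent should do the work
    action: str  # what that agent is asked to do


def plan(intent: str) -> List[Step]:
    """Stand-in planner: a real one would call a model; this one is fixed."""
    return [
        Step(agent="scout", action=f"gather sources for: {intent}"),
        Step(agent="analyst", action=f"summarise findings for: {intent}"),
    ]


def execute(steps: List[Step], agents: Dict[str, Callable[[str], str]]) -> List[str]:
    """Executor: walks the plan in order and never chooses the next step itself."""
    outputs = []
    for step in steps:
        if step.agent not in agents:
            raise KeyError(f"no agent named {step.agent!r}")
        outputs.append(agents[step.agent](step.action))
    return outputs


# Toy agents that just echo their instructions.
agents = {
    "scout": lambda action: f"[scout] {action}",
    "analyst": lambda action: f"[analyst] {action}",
}
steps = plan("analyse company financials")
results = execute(steps, agents)
```

Keeping the plan as plain data means it can be inspected, stored, or diffed before anything runs.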

Architecture Overview

Layer 1 — Intent & Planning

Intent Interface
Planner
Task Contract

Layer 2 — Runtime & Agents

Agent Kernel
Execution Runtime
Agent Mesh
State / Event Flow

Layer 3 — Tooling & Systems

Tool Bus (MCP)
Skills
External APIs
Local Scripts
Artifact Outputs
Audit Store

PonyBunny separates reasoning from execution so the system can be inspected, improved, and trusted.

Composable Agent Roles

Scout

Researches, gathers context, and explores external sources.

Forge

Builds outputs, artifacts, and structured deliverables.

Keeper

Maintains memory, continuity, and execution state.

Guard

Enforces boundaries, checks safety, and supports trustworthy execution.

Use built-in roles or define your own domain-specific agents.
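One way to picture a domain-specific role is as a small record that names the agent and scopes the tools it may touch. The sketch below is an assumption about how such a definition could look, not PonyBunny's role format. The `Role` class and `authorise` helper are hypothetical, and the mapping of tools to roles is invented for illustration (the tool names themselves come from the example run at the top of the page).

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Role:
    name: str
    description: str
    allowed_tools: FrozenSet[str]


def authorise(role: Role, tool: str) -> None:
    """Raise if the role tries to use a tool outside its scope."""
    if tool not in role.allowed_tools:
        raise PermissionError(f"role {role.name!r} may not call {tool!r}")


# A built-in style role and a custom, domain-specific one.
scout = Role(
    name="scout",
    description="Researches, gathers context, and explores external sources.",
    allowed_tools=frozenset({"web", "companies-house"}),
)
analyst = Role(
    name="analyst",
    description="Extracts structured figures from gathered documents.",
    allowed_tools=frozenset({"file-parser"}),
)

authorise(scout, "web")            # within scope, returns None
authorise(analyst, "file-parser")  # within scope, returns None
```

Calling `authorise(analyst, "web")` would raise `PermissionError`, which is the kind of boundary a Guard-style role could enforce.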

Built for real operational workflows

Research & Analysis

Turn a high-level request into sourced, structured outputs.

Compliance Workflows

Run traceable, auditable multi-step compliance processes.

Developer Automation

Connect tools, files, APIs, and scripts into controllable execution flows.

Internal Ops

Coordinate repeatable business tasks without turning everything into brittle scripts.

AI Product Prototyping

Build and test domain agents with visible execution paths.

Agent Reliability Engineering

Replay, inspect, and improve agent behaviour across iterations.

Why not just use a workflow builder or an agent wrapper?

Workflow Builders

Good for fixed automation. Weak when tasks require dynamic planning and execution.

Agent Wrappers

Good for demos. Weak when you need replay, audit, and engineering reliability.

PonyBunny

Built for AI work that needs both adaptability and control.

Execution Trace

Every run produces a structured, replayable trace.

pb run "analyse company financials" — runId: pb_01H8x...

[1] Intent parsed
    "analyse company financials" → task contract created

[2] Plan generated
    planner: default, 3 subtasks identified

[3] Scout used web/company tools
    tools: companies-house, web-search

[4] Analyst processed data
    agent: analyst, structured extraction complete

[5] Report generated
    artifact: financial-report.md (12.4 KB)

[6] Audit saved
    audit log: 47 entries, 0 warnings

[7] Replay snapshot available
    replay: pb replay pb_01H8...
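To show why a recorded trace makes runs replayable and comparable, here is a small Python sketch. It records each tool call with its output, replays a run from the recording without touching live tools, and reports which steps changed between two runs. It does not describe how `pb replay` is implemented. The function names and the trace entry shape are assumptions made for illustration.

```python
from typing import Any, Callable, Dict, List, Tuple

Call = Tuple[str, str]  # (tool name, single string argument)
Trace = List[Dict[str, Any]]


def record_run(calls: List[Call], tools: Dict[str, Callable[[str], Any]]) -> Trace:
    """Run each call against live tools and keep inputs and outputs together."""
    return [{"tool": name, "arg": arg, "output": tools[name](arg)}
            for name, arg in calls]


def replay_run(trace: Trace) -> List[Any]:
    """Reproduce a run from its recording; no live tool is invoked."""
    return [entry["output"] for entry in trace]


def diff_runs(old: Trace, new: Trace) -> List[int]:
    """Return the indices of steps whose outputs differ between two runs."""
    return [i for i, (a, b) in enumerate(zip(old, new))
            if a["output"] != b["output"]]


calls = [("normalise", "PonyBunny"), ("shout", "audit")]

# Version 1 of the tools, then a version where one tool's behaviour changed.
tools_v1 = {"normalise": str.lower, "shout": str.upper}
tools_v2 = {"normalise": str.lower, "shout": lambda s: s.upper() + "!"}

trace_v1 = record_run(calls, tools_v1)
trace_v2 = record_run(calls, tools_v2)

replayed = replay_run(trace_v1)           # outputs of the original run
changed = diff_runs(trace_v1, trace_v2)   # only the second step differs
```

The same comparison applies when a model or tool is upgraded: record before, record after, and the diff points at the step that silently changed.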

Built for serious agent engineering

Open Architecture

Built around MCP, tools, runtime boundaries, and inspectable execution.

Developer-First

Designed for builders who need systems, not just prompts.

Audit-Ready Thinking

Created with replayability, traceability, and controlled execution in mind.

Build agents you can actually trust

PonyBunny helps you move from agent demos to controlled, replayable, auditable execution.


Start with the architecture. Then build your first harnessed workflow.