How It Works
Every Eiryx task — whether a bugfix or a feature — follows the same four-stage pipeline.
The Pipeline: Detect → Isolate → Verify → Ship
1. Detect
Errors arrive through one of three paths:
- Webhook — Sentry, GitHub, or any JSON webhook sends an error payload. Eiryx parses it, normalizes it to a standard `ErrorEvent`, and classifies severity using Gemini Flash (AI triage).
- Manual — A user creates a task in the UI with a description of the bug or feature and optional reference files.
- Pipeline — A configured pipeline triggers a task based on rules (coming in Pipelines V1).
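The normalization step can be sketched as a small dataclass. The field names and the `normalize_sentry` helper below are hypothetical illustrations, not Eiryx's actual schema:

```python
from dataclasses import dataclass

@dataclass
class ErrorEvent:
    source: str        # "sentry", "github", "generic"
    title: str
    stack_trace: str
    severity: str      # filled in later by AI triage

def normalize_sentry(payload: dict) -> ErrorEvent:
    # Map a Sentry-style webhook payload onto the common shape
    # (payload keys are assumptions for illustration).
    event = payload.get("event", {})
    return ErrorEvent(
        source="sentry",
        title=event.get("title", "unknown error"),
        stack_trace=event.get("stacktrace", ""),
        severity="unclassified",
    )
```

Whatever the inbound format, downstream stages only ever see the one normalized shape.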
The Smart Router evaluates the task and selects the appropriate AI model and iteration budget based on complexity.
2. Isolate
A disposable Docker container is created (based on `node:20-bullseye` or a custom image). Inside this sandbox:
- The repository is cloned using a GitHub App Installation Access Token (valid for 60 minutes, `x-access-token` auth — no personal access tokens).
- Dependencies are installed using the detected package manager.
- The `.ai-agent.yml` file is read to configure the agent’s behavior.
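A minimal `.ai-agent.yml` might look like the following. Only `commands.test` and `max_iterations` are referenced elsewhere in this doc; the other keys are illustrative assumptions:

```yaml
# Hypothetical .ai-agent.yml (keys other than commands.test and
# max_iterations are illustrative, not confirmed Eiryx options)
image: node:20-bullseye     # sandbox base image
commands:
  install: npm ci
  test: npm test            # must exit 0 for the fix to proceed
max_iterations: 5           # retry budget before giving up
```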
The agent then enters its exploration phase:
- AST-based exploration — The agent uses `explore_file` (shows class/function signatures) and `read_symbol` (reads a specific function body) to navigate the codebase structurally, not by grep.
- Fault localization — For bugfixes, the `fault_localization` module pre-extracts file paths, error types, line numbers, and function names from the issue description using regex, giving the agent a head start.
- For features — The agent studies the codebase architecture first (mandatory Phase 1), then plans the implementation bottom-up: schema → services → endpoints → UI.
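The AST-based exploration idea can be illustrated with Python's `ast` module. This `explore_file` is a simplified sketch, not Eiryx's implementation:

```python
import ast

def explore_file(source: str) -> list[str]:
    # List top-level class and function signatures, in the spirit of
    # an AST-based explore_file tool: structural navigation, not grep.
    tree = ast.parse(source)
    signatures = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = ", ".join(a.arg for a in node.args.args)
            signatures.append(f"def {node.name}({args})")
        elif isinstance(node, ast.ClassDef):
            signatures.append(f"class {node.name}")
    return signatures
```

A `read_symbol`-style tool would then fetch just one of these bodies, so the model never has to load whole files into context.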
3. Verify
This is where Eiryx differs from “AI says it’s fixed” tools. Execution-based evaluation means:
- The agent writes a fix using `edit_symbol` (modifies a function) or `edit_file` (modifies config files).
- The container is reset to a clean state — the fix is applied as a patch.
- The full test suite runs (`commands.test` from `.ai-agent.yml`).
- Only if the exit code is `0` (all tests pass) does the fix proceed.
This is the same methodology used in SWE-bench, the industry standard for evaluating AI coding agents.
If tests fail, the agent iterates: it reads the test output, adjusts the fix, and retries — up to max_iterations.
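The verify-and-iterate loop can be sketched like this; the `verify_fix` and `apply_fix` names are hypothetical:

```python
import subprocess

def verify_fix(test_command: str, apply_fix, max_iterations: int = 5) -> bool:
    # Execution-based evaluation: a fix only counts if the full test
    # suite exits 0. On failure, feed the test output back and retry.
    feedback = ""
    for _ in range(max_iterations):
        apply_fix(feedback)                       # agent adjusts the patch
        result = subprocess.run(
            test_command, shell=True,
            capture_output=True, text=True,
        )
        if result.returncode == 0:                # all tests pass: proceed
            return True
        feedback = result.stdout + result.stderr  # iterate on the failure
    return False
```

The key property is that success is decided by the exit code of the real test suite, never by the model's own claim that the bug is fixed.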
4. Ship
When tests pass, the agent:
- Creates a new branch (e.g., `eiryx/fix-null-email-validation`).
- Commits the changes with a structured message.
- Opens a Pull Request on GitHub with:
- Root cause analysis (what caused the bug)
- Files modified and why
- Test results (which tests ran, pass count)
- Cost breakdown (model used, tokens, cost in USD)
- The PR enters the Data Moat pipeline: if it survives 14 days without revert or regression, the agent’s trajectory is promoted to golden data and used to improve accuracy for similar bugs across all tenants.
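Assembling the structured PR description might look like the following sketch; the function and parameter names are assumptions, only the four sections come from the list above:

```python
def build_pr_body(root_cause: str, files: dict[str, str],
                  tests_passed: int, model: str, cost_usd: float) -> str:
    # Build the four sections of the PR description: root cause,
    # files modified, test results, and cost breakdown.
    lines = ["## Root Cause", root_cause, "", "## Files Modified"]
    for path, reason in files.items():
        lines.append(f"- `{path}`: {reason}")
    lines += [
        "",
        "## Test Results",
        f"{tests_passed} tests passed",
        "",
        "## Cost",
        f"Model: {model}, cost: ${cost_usd:.4f}",
    ]
    return "\n".join(lines)
```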
Architecture Diagram
```
┌──────────────┐     ┌──────────────────┐     ┌──────────────────┐
│   Webhook    │────▶│   Auto-Triage    │────▶│    Task Queue    │
│   (Sentry)   │     │  (Gemini Flash)  │     │ (asyncio.Queue)  │
└──────────────┘     └──────────────────┘     └────────┬─────────┘
                                                       │
┌──────────────┐                                       ▼
│    Manual    │────────────────────────────▶┌──────────────────┐
│  (UI Form)   │                             │   Task Worker    │
└──────────────┘                             │   (ThreadPool)   │
                                             └────────┬─────────┘
                                                      │
                                                      ▼
                                             ┌──────────────────┐
                                             │   Smart Router   │
                                             │  (model select)  │
                                             └────────┬─────────┘
                                                      │
                                                      ▼
                                             ┌──────────────────┐
                                             │  Docker Sandbox  │
                                             │   (disposable)   │
                                             │                  │
                                             │  ┌────────────┐  │
                                             │  │  AI Agent  │  │
                                             │  │ (CodingAg) │  │
                                             │  └────────────┘  │
                                             └────────┬─────────┘
                                                      │
                                                      ▼
                                             ┌──────────────────┐
                                             │    GitHub PR     │
                                             │   + Data Moat    │
                                             └──────────────────┘
```
Smart Router Model Tiers
The Smart Router selects the cheapest model that can handle the task. It escalates only when needed.
| Tier | Models | Use Case | Cost/iteration |
|---|---|---|---|
| Economy | Gemini 3 Flash, Claude Haiku 4.5 | Typos, missing imports, config fixes | ~$0.002 |
| Standard | Claude Sonnet 4.6, Gemini 3.1 Pro, GPT-4o | Moderate bugs, small features | ~$0.02 |
| Premium | Claude Opus 4.6, o3 | Complex architecture, multi-file refactors | ~$0.05 |
If the economy model fails after several attempts, the router automatically escalates to standard, then premium.
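The escalation policy can be sketched as a nested loop over tiers. The tier names match the table above; the per-tier attempt budget and function names are assumptions:

```python
TIERS = ["economy", "standard", "premium"]   # cheapest first

def route(task, run_attempt, attempts_per_tier: int = 3):
    # Try the cheapest tier first; escalate only after repeated failure.
    # run_attempt(task, tier) returns True when tests pass at that tier.
    for tier in TIERS:
        for _ in range(attempts_per_tier):
            if run_attempt(task, tier):
                return tier
    return None                              # exhausted every tier
```

Because most tasks resolve at the economy tier, the expensive models are only paid for when a cheap attempt has demonstrably failed.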
Data Moat
The Data Moat is Eiryx’s compounding advantage. Every successful fix generates a trajectory — a record of which tools the agent called, which files it explored, what it tried, and what worked.
Trajectories are validated through Survivorship Bias Validation: a PR must survive 14 days in production without revert, reopen, or regression (checked via the Regression Guardian cron job). Only then is it promoted to golden data.
Golden trajectories are indexed by stack fingerprint (language + framework + test runner). When a new task arrives on a similar stack, the agent can reference past successful approaches — making it faster and more accurate over time.
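The fingerprint indexing might be sketched as follows; the key format and function names are illustrative assumptions:

```python
def stack_fingerprint(language: str, framework: str, test_runner: str) -> str:
    # Index key for golden trajectories: language + framework + test runner.
    return f"{language.lower()}:{framework.lower()}:{test_runner.lower()}"

golden_index: dict[str, list[str]] = {}

def promote(trajectory_id: str, language: str, framework: str, runner: str):
    # Called only after a PR survives the 14-day survivorship window.
    golden_index.setdefault(
        stack_fingerprint(language, framework, runner), []
    ).append(trajectory_id)
```

A new task on a `python:django:pytest` stack can then look up trajectories from any tenant that shipped on the same fingerprint.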
This is a cross-tenant advantage: every customer’s successful fixes make the system better for all customers.