The Outcome Engineering Framework: Evolving Software at the Speed of Human Intent
Why we abandoned the chat window and built a Documentation-as-Control-Plane framework to eliminate project drift.
The Context Window Trap
If you've spent any serious time building software with AI agents recently, you know the cycle:
- The Honeymoon: You type "build me a React app," and magic happens. If only it were that simple with real-world projects and codebases, let alone running a business on top of them.
- The Drift: You ask for a new routing feature, and suddenly you bear the burden of deciding how to evolve the codebase, what needs fixing, and what to type into the big prompt box staring back at you. The cognitive load shifts to you: think of the next feature, verify the model actually did what you asked, and ensure nothing falls through the cracks, all while sticking to the plan and preventing scope creep.
- The Collapse: The context window fills up, the agent hallucinates a library that doesn't exist, and suddenly you're reverting 15 files while yelling at your screen. Then you realize you forgot a feature you needed, and you're back at the drawing board planning it out again. Now you're doing two things at once. Your productive time plunges as mental fatigue sets in, and you're left with only a few hours of sleep. That's the bottleneck to outcomes.
As coding agents become more capable, developers fall into the trap of over-depending on the model's transient memory. We throw massive prompts at an LLM, hoping it remembers the architecture, the goals, and the constraints. We go on tool sprees to appease our desire to feel productive, and I'm sure you've noticed the result: AI becomes just a cool tool, and we delay achieving real-world outcomes because we somehow believe an AGI will do it for us and we'll never have to be involved. I could slide into doomsday mode and tell you why that's not a good idea; it turns out that if we seek God, it's better to stick to praying instead of attempting to build one.
When that memory breaks, progress stalls. Software decays because the model lacks a rigid, externalized "brain" to ground its execution, and the human lacks a mechanism to tightly control the model's impulses without micromanaging code and prompts line-by-line.
We needed a lightweight system that anchored the model entirely outside of its own context window, without introducing new tool dependencies that add cost and complexity.
Enter the Outcome Engineering Framework (OEF).
I came up with this from an old idea I've held for the past three years, which I endearingly termed FIONA: the Framework for the Implementation and Optimization of Novel AI. The goal was to systematize how I implement AI and optimize it for real-world projects. Then one night the concept of Medusa clicked: what if I could always freeze the model into observing a fixed protocol, as if it had caught Medusa's stare? With that, the idea was fully formed.
What is OEF?
OEF is a radical departure from traditional Agile or ad-hoc AI prompting. It is a documentation-as-control-plane methodology that begins with ascertaining whether the project is brownfield or greenfield, borrowing those terms from other domains of human development, be it software or urban planning.
By maintaining three exact, continuously updated Markdown artifacts in the root of your repository, OEF flips the traditional dynamic: The human acts as the Director, and the model acts as the Chief Engineer.
Crucially, the human does not manually maintain these documents; the model does. The human simply dictates the pace, pivots the strategy, and commands execution. The model handles the documentation and the code, even becoming an autonomous engineer for your project or codebase thanks to the recursive nature of the prompts once the framework is locked in. For example, the "Generate Outcomes" feature of OutcomeDev hinges on this framework and system of prompting and execution.
The Three Pillars of Execution
OEF forces the model to read the room, acknowledge its constraints, and execute strictly against a checklist. It relies on these three files:
- The ENGINEERING_PROPOSAL.md (The Grounding Truth)
  - What it is: A brutally honest assessment of current reality, written by the model after auditing your codebase.
  - Why it matters: It identifies technical debt and explicitly declares a "What to Deliberately Not Do (Yet)" section. This prevents the model (and the enthusiastic founder) from chasing shiny features when the foundation is broken.
- The IMPROVEMENTS_PLAN.md (The Vision & Phases)
  - What it is: The model's translation of your high-level human intent into a structured, phased architectural plan.
  - Why it matters: It ensures the model doesn't try to build the entire app in one prompt. It forces sequential, methodical progress (Phase 1, Phase 2, etc.).
- The CHECKLIST.md (The Execution Engine)
  - What it is: A granular markdown checklist derived directly from the active phase of the plan.
  - Why it matters: This is the literal punch-list. The model relies on this checklist to track progress across sessions, guaranteeing zero hallucination drift.
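To make the third pillar concrete, here is a sketch of what a CHECKLIST.md fragment might look like. The exact headings and item wording are whatever the model generates for your project; this is an illustrative shape, not a prescribed template:

```markdown
# CHECKLIST.md — Phase 1: Foundation Hardening
<!-- Maintained by the model. The human only issues directives. -->

- [x] Audit existing routing setup; record findings in ENGINEERING_PROPOSAL.md
- [x] Pin dependency versions and remove unused packages
- [ ] Extract the API client into a single typed module
- [ ] Add smoke tests for the three critical user flows
```

Because the checked/unchecked state lives in the file rather than the chat, a fresh session can pick up exactly where the last one stopped.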
P.S. I have since renamed these files to just PROPOSAL.md, PLAN.md, and CHECKLIST.md respectively, partly because I'm lazy and partly as a nod to the ubiquitous AGENTS.md. More on that later on this blog.
How the Loop Actually Works
This isn't a theory; it's an infinite, iterative loop that can govern the entire lifecycle of a product. I developed it out of necessity: I had so many requests to build software that my limited time became the bottleneck, yet for each project I ended up prompting more or less the same things and correcting coding agents for the same mistakes. Suffering from IDE and LLM fatigue, I wanted to execute tasks without the human (me, in this case) having to carry part of the cognitive load for the model. After all, it's supposed to make my life easier, not suck the life out of my neurons, which I'd rather preserve for the higher-order, long-term strategic reasoning that silicon just can't do. It turns out generative AI is (ta-da) "generative," so making the process of software and outcome development itself generative, not merely natural-language driven, is the major unlock of the OEF framework.
- Initialization: You point a model at your repo and fire the "Directive Zero" superprompt (we call this The Medusa Protocol because it freezes AI chaos into stone). The model audits the code and generates the three pillars.
- The Command: You sit back and shout a directive like, "Execute Phase 1," or "Suspend Priority 3," or "Pivot our focus entirely to mobile-first."
- The Re-alignment: The model absorbs the command, recalculates dependencies, and autonomously rewrites the three markdown documents to reflect your new reality.
- The Execution: The model reads the newly updated CHECKLIST.md, writes the code, verifies it, checks the box, and waits for your next command.
You never write a line of code. You never maintain a Jira board. You just stand over the map table and direct the engine.
Regaining Agency
The fundamental flaw in current AI coding tools is Lost Agency. Right now, developers either go prompt by prompt, which is an improvement over writing code line by line (too slow), or, God forbid, vibe it out and hope for the best, which is scary and chaotic given how LLMs sometimes have a "mind of their own" that may not align with the human's intent.
The Outcome Engineering Framework creates the perfect middle ground.
Stop fighting your codebase. Dictate the outcome, let the model document the reality, and evolve your software at the exact speed of human intent.
(Want to try it yourself? Check out our open-source Medusa Protocol Superprompts to inject OEF into any LLM environment today).