The Blog of phaus – my personal Site of Things
Written by Philipp on 2026-03-17

Prompt Engineering Is Dead. Long Live Context Engineering.

Personal

I stopped optimizing my prompts. Not because it doesn’t work – but because it’s the wrong question.

The right question is: What’s in the context when the agent starts working?


The Problem with Long Conversations

Anyone who works regularly with LLMs knows the pattern: at the beginning of a chat, everything runs smoothly. After twenty, thirty messages, the model starts ignoring earlier instructions, contradicting itself, or forgetting details it actually knows. This is called Context Rot.

That’s not a bug. It’s a structural problem.

Meredith Whittaker, President of the Signal Foundation, put a name to it at 39C3: Exponential Decay of Success. The math behind it is sobering. If a model has an error rate of just one percent per step (meaning each step is 99 percent correct), the probability that all of 100 steps succeed drops to around 37 percent. After 1,000 steps: 0.004 percent. Current top models lose significant reliability after roughly 60 steps – even at a nominal per-step accuracy above 85 percent.

Long conversations are therefore not just inconvenient. They are systematically unreliable.
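The compounding is easy to reproduce yourself. A minimal sketch (plain Python, no LLM involved) that multiplies a per-step success rate over n independent steps:

```python
# Compounded success: a per-step success rate p applied over n
# independent steps gives an overall success probability of p ** n.

def chain_success(p: float, n: int) -> float:
    """Probability that all n steps succeed, assuming independent errors."""
    return p ** n

if __name__ == "__main__":
    print(f"100 steps at 99% per step:  {chain_success(0.99, 100):.1%}")   # ~36.6%
    print(f"1000 steps at 99% per step: {chain_success(0.99, 1000):.4%}")  # ~0.0043%
```

The independence assumption is generous – in practice one early mistake often poisons everything downstream, which makes long chains worse than this model suggests, not better.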


What Newer Models Do Differently

Early models were optimized for single requests – short context, one answer, done. Newer models are designed differently. They’re not built for long dialogues, but for loading a complete context once and then acting autonomously: launching sub-agents, calling tools, delegating sub-problems.

That’s not a gradual difference. That’s a different paradigm.

The open-weights model GPT-OSS-20B represents the old school: a model primarily supplied with information through a carefully crafted prompt – large context was neither the goal nor the strength. That is precisely why it is explicitly documented as not suitable for long-context recall and tool calling. That wasn't a weakness of the model, but a reflection of the assumptions of its time. Today it's becoming clear that this can be changed with reasonable effort: newer models like Nvidia Nemotron 3 Super or the fine-tuned Persona Kappa (20.9B MoE, 131K-token context, RULER benchmark at 100 percent across all context lengths) are specifically designed for large contexts and tool calling. Kappa was even trained on a single workstation with four desktop GPUs – no data center, no InfiniBand.


Context Engineering Instead of Prompt Engineering

Anthropic puts it succinctly in their engineering blog:

“Find the smallest possible set of high-signal tokens that maximize the likelihood of desired outcomes.”

High-signal tokens are tokens that actually provide the model with useful information – as opposed to filler text that bloats the context without contributing anything. A precise function name is a high-signal token. A lengthy introduction that prepares the model for what it already knows is not.

Context is a finite, valuable resource – not a free-text field. What ends up in the context window determines the quality of the result: which documents, which tool definitions, which system prompt, which artifacts from previous steps.

Prompt Engineering was the art of getting the best out of a poor context. Context Engineering is the discipline of building the context correctly from the start.
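Taken literally, context engineering is an assembly problem. A hedged sketch of what "building the context" can mean in code – the Context class and all its field names are illustrative, not any real framework's API:

```python
# Illustrative context assembly: gather only high-signal pieces
# (system prompt, tool definitions, documents) into one payload.
# This mirrors the idea from the post, not a real agent framework.

from dataclasses import dataclass, field

@dataclass
class Context:
    system_prompt: str
    tool_definitions: list[str] = field(default_factory=list)
    documents: list[str] = field(default_factory=list)

    def render(self) -> str:
        # Concatenate non-empty parts; nothing else enters the window.
        parts = [self.system_prompt, *self.tool_definitions, *self.documents]
        return "\n\n".join(p.strip() for p in parts if p.strip())

ctx = Context(
    system_prompt="Implement the spec in implementation-plan.md.",
    tool_definitions=["tool: run_tests() -> str"],
    documents=["# implementation-plan.md\n1. Parse config\n2. Write tests"],
)
payload = ctx.render()
```

The point is the inversion: instead of coaxing a model with phrasing, you decide up front which artifacts deserve a place in the window at all.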


Ralph Loops: Short Cycles Instead of Long Chains

A practical answer to Context Rot is Ralph Loops: instead of one long, increasingly degrading conversation, you work in short, focused iterations. Each loop gets a freshly built, targeted context. Errors are resolved, then you move to the next loop – with a clean starting state.

That sounds like more effort than one long chat. In practice, it's more reliable.
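The loop itself is structurally simple. In the sketch below, run_agent is a placeholder for any real agent call; the only point being illustrated is that build_context runs fresh on every iteration instead of appending to a growing transcript:

```python
# Ralph-style loop: each iteration gets a freshly built, targeted
# context; nothing carries over except explicit artifacts.
# `run_agent` is a placeholder, not a real LLM/agent invocation.

from typing import Callable

def run_agent(context: str) -> tuple[str, bool]:
    # Placeholder behavior: the task "finishes" once the plan reaches step 3.
    finished = "step 3" in context
    return ("done" if finished else "progress"), finished

def ralph_loop(build_context: Callable[[int], str], max_iterations: int = 10) -> str:
    for i in range(max_iterations):
        context = build_context(i)        # fresh context, clean starting state
        result, finished = run_agent(context)
        if finished:
            return result
    return "gave up"

outcome = ralph_loop(lambda i: f"plan: step {i}")
```

Compare this to a chat loop, where context would be `transcript += new_messages` – exactly the accumulation that produces Context Rot.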


Two Phases, Not One

Anyone working with agents today effectively runs two phases – even though most don't treat them explicitly yet.

Phase 1: Build the context. I work interactively with an agent to develop a project’s specification. Not through a single long prompt, but in dialogue: roughly outline the project, define technical requirements, create individual spec files – based on PDFs, existing scripts, requirements. I clarify open questions in interview mode: the agent asks, I answer. The result is an implementation-plan.md – the document that starts the next agent.

Phase 2: Let the agents loose. Hand over the finished context, start one or more agents, Ralph Loop style, and then – largely autonomously – let them run. No readjusting via prompt. No magic.
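The hand-off between the two phases can be made concrete with a file. A sketch under the assumption that implementation-plan.md is the only thing Phase 2 ever sees – spawn_agent is hypothetical, standing in for whatever actually launches an agent:

```python
# Two-phase split with a file-based hand-off: Phase 1 condenses the
# interview into implementation-plan.md; Phase 2 starts from it alone.
# `spawn_agent` is a hypothetical stand-in for a real agent launcher.

import tempfile
from pathlib import Path

def spawn_agent(context: str) -> str:
    return f"agent started with {len(context)} chars of context"

def phase_one(spec_notes: list[str], plan_path: Path) -> Path:
    """Condense interview results into the hand-off document."""
    plan_path.write_text("# implementation-plan.md\n" + "\n".join(spec_notes))
    return plan_path

def phase_two(plan_path: Path) -> str:
    """Autonomous run: the plan file is the entire starting context."""
    return spawn_agent(plan_path.read_text())

workdir = Path(tempfile.mkdtemp())
plan = phase_one(["Goal: CLI tool", "Tests first"], workdir / "implementation-plan.md")
status = phase_two(plan)
```

The file boundary is the discipline: if something isn't in the plan, Phase 2 doesn't know it – which forces Phase 1 to surface every open question before any agent runs.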


What This Means

All the tricks from the Prompt Engineering era are becoming obsolete. “Act as an expert in…” – unnecessary. Magical phrasings meant to put the model in the right mode – workarounds for a poorly built context.

The actual work shifts forward: What information does the agent really need? What do I leave out? How do I structure the spec so the next step can start cleanly?

That’s less wizardry. And significantly more engineering.


Sources and Further Reading

  • Anthropic: Effective Context Engineering for AI Agents
  • arxiv: Beyond Exponential Decay
  • Geoffrey Huntley: Ralph Loops
  • Level1Techs (Wendel): Best 120b Model for Offline Use? Nemotron 3 Super Out Now
  • Level1Techs Forum: Persona Kappa
  • Meredith Whittaker, 39C3: youtube.com

