Context engineering: How to give AI exactly what it needs

Rohit Biswal

Published: August 28, 2025

If you’ve spent any time around AI development circles recently, you’ve probably noticed a shift from simple prompting to something people are calling 'context engineering.'

Whether you’re working with Claude 4, GPT-4o or Gemini 2.5, the difference between mediocre and exceptional results often hinges on one thing: how intelligently you design your context window.

What is context engineering?

Context engineering isn’t about stuffing more into the prompt; it’s about curating what goes in. Think of it as the art of structuring, optimizing and trimming information so that large language models respond faster, at lower cost and with better results.

Instead of dumping an entire codebase or dataset into the context window, context engineering strategically gives the model only what matters: the right information, in the right format, at the right time.

Tools and approaches such as MCP (the Model Context Protocol), code-as-context (CaC) and context engines are making this easier by bridging data sources to AI, turning codebases into living docs and creating structured summaries from complex projects.

The context paradox: Why more isn’t always better

Overwhelming your AI with information often makes it perform worse. That might sound counterintuitive, but it's the reality of building AI systems and applications.

Imagine you’re asking an LLM to generate a UserService in Spring Boot. A rookie move is to dump the entire codebase — controllers, full repository logic, configuration classes, utility layers, even README files — into the prompt.

But what happens?

Token burn: 10,000+ tokens for mostly irrelevant data
Slower inference: Long inputs increase latency
Confused output: Model struggles to find signal in the noise

A smarter alternative

It’s not about less context — it’s about the right context. Instead of dumping everything, context engineering focuses on strategic inclusion.

If you’re generating a UserService, you might only need:

The user model (fields only)
The repository interface signature
The controller endpoint patterns

That’s roughly 300 tokens instead of 10,000, with better results.
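To sanity-check numbers like these before sending a prompt, a rough estimate is often enough. The sketch below is a minimal illustration, not a real tokenizer: it assumes the common rule of thumb of about four characters per token, which varies by model (code often tokenizes differently from prose), so use your provider's own tokenizer when you need exact counts. The class TokenBudget and its methods are hypothetical names, and the repeated "x" strings are stand-ins for real context. It assumes Java 11+ for String.repeat.

```java
// Hypothetical helper for comparing context sizes before sending a prompt.
class TokenBudget {

    // Rough heuristic (an assumption, not a real tokenizer):
    // about 4 characters per token, rounded up.
    static int estimateTokens(String text) {
        return (text.length() + 3) / 4;
    }

    // Fraction of the estimated token count saved by sending the trimmed
    // context instead of the full one (0.0 means no saving).
    static double savings(String fullContext, String trimmedContext) {
        int full = estimateTokens(fullContext);
        if (full == 0) {
            return 0.0;
        }
        return 1.0 - (double) estimateTokens(trimmedContext) / full;
    }

    public static void main(String[] args) {
        String full = "x".repeat(40_000);   // stand-in for a whole-codebase dump
        String trimmed = "x".repeat(1_200); // stand-in for model + interface + endpoints
        System.out.printf("full=%d trimmed=%d saved=%.0f%%%n",
                estimateTokens(full), estimateTokens(trimmed),
                100 * savings(full, trimmed));
    }
}
```

Because the heuristic only looks at length, it is useful for comparing two candidate contexts against each other rather than for predicting an exact bill.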

Three context engineering techniques

There are a number of context engineering techniques that can help us get more from AI. In this blog post I'll look at three of them:

Skeleton trimming — where you keep only the essential structure, like method signatures, class declarations and annotations, and strip away implementation details.
Relevance-first file selection — where you start by identifying only the files that are directly relevant to the task when preparing LLM inputs.
Context phasing — where, instead of providing all context in one go, you deliver it in stages. Each step contains only the information relevant for that point in the process.

It's worth noting that these three techniques aren’t official standards — they come from hands-on experimentation in backend code generation (mainly Java/Spring Boot). Think of them as adaptable patterns.

Let's now look at each one in more detail.

Skeleton trimming

Here's what our code looks like before we implement skeleton trimming (full controller):

...And here's what it looks like after:
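The trimming step itself can be automated. The following is a deliberately naive sketch, not the author's tooling: it tracks brace depth line by line, keeps everything at class level (package, imports, annotations, fields, method signatures) and replaces each multi-line method body with "{ ... }". It assumes one declaration per line and no unbalanced braces inside string literals, comments or field-initializer lambdas; a production version would use a real Java parser. SkeletonTrimmer and the sample UserController are hypothetical.

```java
// Hypothetical, line-based skeleton trimmer for simple Java classes.
class SkeletonTrimmer {

    static String trim(String source) {
        StringBuilder out = new StringBuilder();
        int depth = 0; // brace depth before the current line
        for (String line : source.split("\n")) {
            int opens = count(line, '{');
            int closes = count(line, '}');
            if (depth <= 1) {
                if (depth == 1 && opens > closes) {
                    // A member whose body continues on later lines (usually a method):
                    // keep the signature, collapse the body.
                    out.append(line, 0, line.indexOf('{')).append("{ ... }\n");
                } else {
                    // Package, imports, annotations, fields and the class braces.
                    out.append(line).append('\n');
                }
            }
            // Lines at depth >= 2 are implementation details and are dropped.
            depth += opens - closes;
        }
        return out.toString();
    }

    private static int count(String line, char c) {
        return (int) line.chars().filter(ch -> ch == c).count();
    }

    public static void main(String[] args) {
        String controller = String.join("\n",
                "@RestController",
                "@RequestMapping(\"/users\")",
                "public class UserController {",
                "    private final UserService userService;",
                "",
                "    @GetMapping(\"/{id}\")",
                "    public User getUser(@PathVariable Long id) {",
                "        return userService.findById(id)",
                "                .orElseThrow(() -> new UserNotFoundException(id));",
                "    }",
                "}");
        System.out.print(trim(controller));
    }
}
```

The mapping annotations survive because their braces are balanced on a single line, while the method body, including the lambda, is replaced by a placeholder.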

Relevance-first file selection

As mentioned above, this is where you start by identifying the files that are relevant to the task. Here's how this is done...

i) Must-have inputs:

These are files that define the task directly.

For instance, if you’re generating an OrderService, you’ll always want:

OrderRepository interface (just the method declarations)
Order model (fields and annotations only)
OrderController (only endpoint mappings)

ii) Conditional extras:

Include these only when your task depends on them:

If OrderService throws OrderNotFoundException, include that class.
If it uses a CreateOrderRequest DTO, include that too.

iii) Irrelevant files:

Files that don’t affect the current task should be ignored.

You don’t need SecurityConfig, Application.java, or unrelated service classes unless your logic touches them.
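The three tiers above can be expressed as a small rule set. This is a hypothetical sketch rather than a standard tool: must-have files are passed in explicitly, and any other file counts as a conditional extra only if the task description mentions its class name as a whole word (so OrderServiceImpl does not match "OrderService"). Everything else, such as SecurityConfig or Application.java, is left out. Matching names in a free-text spec is a simplifying assumption; a real selector might use the project's dependency graph instead. FileSelector, its methods and the sample paths are illustrative names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical three-tier file selector for a single code-generation task.
class FileSelector {

    enum Relevance { MUST_HAVE, CONDITIONAL, IRRELEVANT }

    // "src/main/java/com/example/model/Order.java" -> "Order"
    static String simpleName(String path) {
        String file = path.substring(path.lastIndexOf('/') + 1);
        return file.endsWith(".java") ? file.substring(0, file.length() - 5) : file;
    }

    // Must-have files are listed explicitly. Any other file is a conditional extra
    // only if the task spec mentions its class name as a whole word; the rest
    // are irrelevant and stay out of the prompt.
    static Relevance classify(String path, Set<String> mustHave, String taskSpec) {
        String name = simpleName(path);
        if (mustHave.contains(name)) {
            return Relevance.MUST_HAVE;
        }
        Pattern mention = Pattern.compile("\\b" + Pattern.quote(name) + "\\b");
        return mention.matcher(taskSpec).find() ? Relevance.CONDITIONAL : Relevance.IRRELEVANT;
    }

    static Map<String, Relevance> classifyAll(List<String> paths, Set<String> mustHave, String taskSpec) {
        Map<String, Relevance> result = new LinkedHashMap<>();
        for (String path : paths) {
            result.put(path, classify(path, mustHave, taskSpec));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> paths = List.of(
                "model/Order.java",
                "repository/OrderRepository.java",
                "controller/OrderController.java",
                "exception/OrderNotFoundException.java",
                "dto/CreateOrderRequest.java",
                "config/SecurityConfig.java",
                "Application.java");
        Set<String> mustHave = Set.of("Order", "OrderRepository", "OrderController");
        String spec = "Implement OrderService. Throw OrderNotFoundException for unknown IDs "
                + "and accept a CreateOrderRequest DTO when creating orders.";
        classifyAll(paths, mustHave, spec).forEach((path, relevance) ->
                System.out.println(relevance + "  " + path));
    }
}
```

Only the MUST_HAVE and CONDITIONAL files would then be passed on, ideally after skeleton trimming.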

Context phasing

Rather than providing all the context at once, you deliver it in stages, with each step containing only the information relevant to that point in the process.

Here's how it works...

Setup: Define the high-level goal and constraints — e.g., “implement CRUD operations for OrderService.”
Structure: Provide just the file structures, models and interfaces needed to outline the solution — e.g., the order model fields, repository interface and DTOs.
Detail: Add specific implementation elements such as exception classes, constants and edge cases.

By pacing the flow of information, you guide the model’s focus step by step, leading to cleaner and more coherent outputs.
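One way to keep the phases separate in code is to model each stage explicitly and turn it into its own message. This is a minimal sketch under stated assumptions: Phase and ContextPhaser are hypothetical names, the Order fields and snippets are invented for illustration, and actually sending each message to a model (and waiting for its reply before the next one) is left to whatever client library you use. It assumes Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of phased context delivery: each phase carries
// only the snippets needed at that step.
record Phase(String name, List<String> snippets) {}

class ContextPhaser {

    // Turns each phase into its own message, preserving order, so detail-level
    // context (exceptions, constants, edge cases) arrives only after the goal
    // and structure have been established.
    static List<String> toMessages(List<Phase> phases) {
        List<String> messages = new ArrayList<>();
        for (Phase phase : phases) {
            StringBuilder message = new StringBuilder("[" + phase.name() + "]\n");
            for (String snippet : phase.snippets()) {
                message.append(snippet).append('\n');
            }
            messages.add(message.toString());
        }
        return messages;
    }

    public static void main(String[] args) {
        List<Phase> phases = List.of(
                new Phase("Setup", List.of("Goal: implement CRUD operations for OrderService.")),
                new Phase("Structure", List.of(
                        "class Order { Long id; String status; BigDecimal total; }",
                        "interface OrderRepository extends JpaRepository<Order, Long> {}",
                        "record CreateOrderRequest(String status, BigDecimal total) {}")),
                new Phase("Detail", List.of(
                        "class OrderNotFoundException extends RuntimeException {}",
                        "Edge case: reject orders with a negative total.")));

        // A real client would send each message and wait for the model's reply
        // before sending the next; this sketch just prints them in order.
        for (String message : toMessages(phases)) {
            System.out.println(message);
        }
    }
}
```

Keeping phases as data makes it easy to check that nothing from the Detail stage leaks into the Setup message.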

Final thoughts

Effective context management is a cornerstone of building high-performance AI systems. By delivering only the most relevant information, you can reduce latency, control costs and improve the precision of model outputs.

By combining standards like MCP with approaches such as code-as-context and context engines (CTX), you give your LLMs exactly the information they need and nothing more. Simple tricks like skeleton trimming can keep token counts low, responses fast and quality high.

Your LLM’s output is only as good as the context you give it. Treat your token budget like premium real estate and fill it with high-value content, not clutter.

An earlier version of this piece was published on Medium.