Four retrieval techniques to improve RAG you need to know

Richard Gall

Published: April 14, 2025

Retrieval-augmented generation (RAG) is a key technique in improving the reliability and accuracy of generative AI. However, it does have some limitations, not least around context, cost and handling particularly large data sets.

It’s not surprising, then, that over the last year or so we’ve seen a range of different approaches emerge that attempt to address RAG’s limitations. In the process of putting together the latest volume of the Thoughtworks Technology Radar (Volume 32), a number of techniques came up during our conversations. Some of them even appeared on the final version. As if that weren’t enough, we also thought retrieval was so significant in today’s technology landscape that we made it a theme, too.

But let’s take a deeper look at what’s actually happening in retrieval-augmented generation in 2025. Although this list certainly isn’t exhaustive, it does spotlight some key approaches and explore how they compare — and when they should be used.

How RAG works — and its limitations

Retrieval-augmented generation is a technique that augments LLMs by making it possible for the LLM to retrieve (as the name suggests) additional external information. External knowledge sources (documents or databases, for instance) are ingested, chunked and then vectorized to create what are called vector embeddings. These are stored somewhere, often but not always a vector database, where the system can access them when a user inputs a prompt. The chunks most relevant to the prompt are retrieved and passed to the LLM alongside it, which means the LLM has more accurate, up-to-date or contextual information to work with.
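
To make those steps concrete, here is a minimal sketch of the ingest, chunk, vectorize, store and retrieve flow. It rests on loud assumptions: the "embedding" is a toy bag-of-words count rather than a real embedding model, a plain Python list stands in for the vector store, and every function name is hypothetical.

```python
import math
import re
from collections import Counter


def chunk(text, size=12):
    """Split text into chunks of roughly `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text):
    """Toy embedding (assumption): word counts stand in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents, size=12):
    """Ingest, chunk and vectorize; the returned list plays the role of the vector store."""
    return [(c, embed(c)) for doc in documents for c in chunk(doc, size)]


def retrieve(index, query, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(query, contexts):
    """Place the retrieved chunks in front of the user's question."""
    joined = "\n".join(f"- {c}" for c in contexts)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


documents = [
    "Fast GraphRAG uses PageRank to find relevant nodes in a knowledge graph.",
    "Corrective RAG adds a retrieval evaluator that grades retrieved documents.",
]
index = build_index(documents)
question = "Which technique uses PageRank?"
top = retrieve(index, question, k=1)
prompt = build_prompt(question, top)
```

In a real system the prompt would then be sent to the LLM. Swapping in a proper embedding model and vector database changes `embed` and `build_index`, but not the overall shape of the flow.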

RAG is incredibly powerful but it does have a number of limitations. For instance, retrieval is only as effective as the data it retrieves, which means it requires well-organized, up-to-date data. There are also challenges around handling complex queries and retrieving information from large data sets. In particular, basic RAG (sometimes called “naive RAG”) can either confuse similar meanings in a data set or lack the necessary nuance to retrieve information relevant to a given query.

Corrective RAG

Corrective RAG (CRAG) is one of the most popular new approaches to retrieval-augmented generation. The central idea behind it is the introduction of an evaluation step in the process through what’s described as a “self-reflection” or “self-grading” mechanism. This is where an evaluator checks the relevance of what has been retrieved; if what’s retrieved doesn’t hit a certain threshold, the system will look elsewhere. (This check is implemented through a lightweight retrieval evaluator, which calculates how relevant the retrieved material is.) Sometimes looking elsewhere means returning to the data set; sometimes it means searching the web.
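
The grade-then-fall-back idea can be sketched in a few lines. This is a simplified illustration, not the CRAG authors' implementation: the evaluator here is a toy term-overlap score standing in for a trained lightweight model, and the names `relevance`, `corrective_retrieve`, `primary_store` and `web_search`, as well as the 0.5 threshold, are all hypothetical.

```python
import re


def tokens(text):
    """Lowercased set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def relevance(query, document):
    """Toy evaluator (assumption): fraction of query terms found in the document."""
    q = tokens(query)
    return len(q & tokens(document)) / len(q) if q else 0.0


def corrective_retrieve(query, retrieve, fallback, threshold=0.5):
    """Keep retrieved documents that pass the grade; otherwise look elsewhere."""
    graded = [(doc, relevance(query, doc)) for doc in retrieve(query)]
    accepted = [doc for doc, score in graded if score >= threshold]
    if accepted:
        return accepted, "primary"
    # Nothing cleared the threshold, so use the fallback source (e.g. a web search).
    return fallback(query), "fallback"


def primary_store(query):
    """Hypothetical stand-in for the normal retrieval step."""
    return ["Quarterly revenue figures for the retail division."]


def web_search(query):
    """Hypothetical stand-in for a web search fallback."""
    return ["Corrective RAG grades retrieved documents before generation."]


docs_a, source_a = corrective_retrieve("retail revenue figures", primary_store, web_search)
docs_b, source_b = corrective_retrieve("corrective RAG grades documents", primary_store, web_search)
```

The first query is answered from the primary store because the retrieved document clears the threshold; the second falls through to the fallback. The extra grading pass is also where the added latency discussed below comes from.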

What challenge does it tackle?

Corrective RAG is a technique that tackles inaccurate retrievals. For instance, confusion about semantically similar information can sometimes happen in a RAG system; the introduction of an evaluation step is useful in bolstering the reliability of what’s retrieved.

What are the limitations of corrective RAG?

Corrective RAG does have some limitations that may affect whether it’s the right option for your generative AI project. For instance, the evaluation step requires additional computation, which inevitably increases latency. This could affect performance downstream (particularly important if it’s supporting a live customer-facing application, for instance). It also adds complexity to your AI pipelines, which can slow down team productivity and make it more difficult to correct issues if they arise.

It’s also important to note that corrective RAG can’t, of course, correct issues in the data, whether it’s inaccurate, out-of-date or just poorly organized and chunked.

When should you use corrective RAG?

Corrective RAG is a good option where you want to balance accuracy and real-time data integration.

Self-RAG

Self-RAG is closely related to corrective RAG. The “self” in its name refers to self-reflection, which, as we saw above, is a feature of corrective RAG.

However, self-RAG goes further than evaluating each instance of retrieval. It extends self-reflection in two directions: first to the decision about whether to retrieve additional data at all, and then to learning from its own evaluations in an iterative manner. It does this by using three models in training: a retriever, a critic and a generator. This tripartite approach allows self-RAG to employ something called a “reflection token.” As the team that developed self-RAG explain, “generating reflection tokens makes the LM [language model] controllable during the inference phase, enabling it to tailor its behavior to diverse task requirements.”

In short, self-RAG involves a kind of feedback loop in which the decisions it makes at the retrieval step reinforce the system’s understanding. This ultimately improves its overall performance.
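
One way to picture how reflection tokens make a model "controllable" at inference time is to treat them as scores that can be weighted differently per task. The sketch below is an illustration under stated assumptions, not the self-RAG implementation: the numbers are made-up stand-ins for the probabilities a trained model would assign to its reflection tokens, and the field names, weights and threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    is_relevant: float   # hypothetical probability that the passage is relevant
    is_supported: float  # hypothetical probability that the answer is supported
    is_useful: float     # hypothetical normalized usefulness score


def should_retrieve(retrieve_prob, threshold=0.5):
    """Decide whether to retrieve at all, based on a hypothetical 'retrieve' score."""
    return retrieve_prob >= threshold


def score(candidate, weights):
    """Weighted sum of the critique scores for one candidate answer."""
    return (weights["relevant"] * candidate.is_relevant
            + weights["supported"] * candidate.is_supported
            + weights["useful"] * candidate.is_useful)


def pick_best(candidates, weights):
    """Choose the candidate with the highest weighted critique score."""
    return max(candidates, key=lambda c: score(c, weights))


candidates = [
    Candidate("fluent but weakly grounded answer", 0.9, 0.2, 0.9),
    Candidate("well-grounded answer", 0.8, 0.9, 0.6),
]

# Two hypothetical task profiles: one that prizes grounding, one that ignores it.
strict = {"relevant": 1.0, "supported": 2.0, "useful": 0.5}
fluent = {"relevant": 1.0, "supported": 0.0, "useful": 1.0}

best_strict = pick_best(candidates, strict)
best_fluent = pick_best(candidates, fluent)
```

With the same candidates, changing only the weights changes which answer wins. That is the sense in which behavior can be tailored to different task requirements without retraining.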

What challenges does it tackle?

Like corrective RAG, self-RAG can tackle accuracy challenges that we can sometimes encounter when using naive RAG. The iterative feature is also valuable insofar as it leads to improvements over time.

What are the limitations of self-RAG?

The limitations of self-RAG aren’t dissimilar to those of corrective RAG. However, it has some additional issues. The self-reflection mechanism can, for instance, sometimes lead to outputs that aren’t actually borne out in the data (the system essentially “overthinking”).

Implementing self-RAG also comes with some trade-offs. If tokens used for training are used for self-reflection, this may reduce the quality or fluency of the system's outputs. Ultimately, then, it’s a question of what’s most important to you and the nature of the data your AI system is dealing with.

When should you use it?

Self-RAG is particularly useful when you want an LLM to be adaptive. It’s well suited to open-domain questions and sophisticated reasoning.

RAG-fusion

RAG-fusion is different to corrective RAG and self-RAG. While those approaches are oriented toward self-reflection, RAG-fusion is a technique in which retrieved sources of data (documents or entries in a wiki, for instance) are fused into a single batch. It works in several steps: multiple queries are generated from the original prompt, retrieval is run for each of them, and the resulting ranked lists are combined into a single ranking using something called reciprocal rank fusion (RRF).

In effect, this expands what the model can retrieve, allowing it to grasp more context and nuance in the data.
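
Reciprocal rank fusion itself is compact enough to sketch. Each document earns 1 / (k + rank) for every ranked list it appears in, and the totals decide the fused order. In the example below the three ranked lists are invented stand-ins for the results of three generated queries, the document IDs are hypothetical, and k = 60 is a commonly used default rather than anything the technique requires.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of document IDs into one ranking."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            # Documents near the top of any list contribute the most.
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, breaking ties alphabetically for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical retrieval results for three queries generated from one prompt.
results_per_query = [
    ["doc_a", "doc_b", "doc_c"],
    ["doc_b", "doc_a", "doc_d"],
    ["doc_b", "doc_c", "doc_a"],
]

fused = reciprocal_rank_fusion(results_per_query)
```

Here `doc_b` comes out on top because it ranks first or second in every list, while `doc_d`, which only one query surfaced, lands last. Documents that several query variants agree on are rewarded, which is what lets the approach pick up more context than a single query would.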

What challenges does it tackle?

RAG-fusion can help tackle some of the problems RAG has with context and nuance. It not only helps the model supply more consistent and detailed responses to prompts, but also helps it better manage difficult or multifaceted prompts.

What are the limitations of RAG-fusion?

RAG-fusion does add substantial complexity to your LLM architecture and pipelines (and, by extension, costs) — more than the two mentioned above. As with other techniques, it can cause performance problems downstream due to the additional steps that are implemented.

When should you use RAG-fusion?

Because it helps a model negotiate nuance and produce more consistent outputs, RAG-fusion is a useful technique in fields like customer support. In fact, wherever the specificity and depth of outputs are particularly important, RAG-fusion is a good technique to use.

Fast GraphRAG

Fast GraphRAG is an open-source implementation of GraphRAG, a RAG technique built on knowledge graphs. GraphRAG works not by retrieving “chunks” of data, but instead by extracting data and placing it into a knowledge graph — making something that’s a bit like a map of the data that can be retrieved. The benefit of such an approach is that it makes the connections between data visible and accessible to the LLM. This, in theory, means it can retrieve information with greater nuance and depth than it otherwise would.

Fast GraphRAG builds on GraphRAG’s core idea but introduces PageRank (the algorithm famously developed by Larry Page and Sergey Brin at Google) to help the system identify the most relevant information in the knowledge graph faster.
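
To show how PageRank can surface relevant parts of a knowledge graph, here is a small pure-Python sketch. It uses a personalized variant, where the random walk restarts at entities mentioned in the query. That is an illustrative assumption rather than a description of Fast GraphRAG's internals; the tiny graph and the function name are hypothetical too.

```python
def personalized_pagerank(graph, seeds, damping=0.85, iterations=50):
    """Rank nodes by how reachable they are from the seed nodes.

    `graph` maps every node to a list of its neighbors (all of which are keys).
    """
    nodes = list(graph)
    # The walk restarts at the seed nodes with probability (1 - damping).
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        nxt = {n: (1 - damping) * restart[n] for n in nodes}
        for n in nodes:
            neighbors = graph[n]
            if not neighbors:
                # A node with no outgoing edges hands its score back to the seeds.
                for m in nodes:
                    nxt[m] += damping * rank[n] * restart[m]
                continue
            share = damping * rank[n] / len(neighbors)
            for m in neighbors:
                nxt[m] += share
        rank = nxt
    return rank


# A hypothetical knowledge graph of entities extracted from documents.
graph = {
    "Fast GraphRAG": ["GraphRAG", "PageRank"],
    "GraphRAG": ["Fast GraphRAG", "knowledge graph"],
    "PageRank": ["Fast GraphRAG", "Google"],
    "knowledge graph": ["GraphRAG"],
    "Google": ["PageRank"],
    "vector database": ["naive RAG"],
    "naive RAG": ["vector database"],
}

rank = personalized_pagerank(graph, seeds={"PageRank"})
ranked = sorted(rank, key=rank.get, reverse=True)
```

Entities connected to the query entity receive most of the score, while the disconnected "vector database" and "naive RAG" nodes receive none. A system could then retrieve the text attached to the top-ranked nodes instead of scanning the whole graph.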

What challenges does it solve?

Fast GraphRAG solves a number of common challenges with RAG, the most obvious being around interpretation and nuance. The use of knowledge graphs in particular gives the AI system a richer “understanding” of the retrieval data. Beyond that, it’s also better suited to larger data sets and can better adapt to dynamic data. While naive RAG and some of the other techniques mentioned above are designed with a relatively static data set in mind, Fast GraphRAG is designed to manage change in data as new information is added or as information becomes outdated.

Fast GraphRAG is also a more cost-efficient alternative to GraphRAG (potentially up to six times cheaper) while also being faster.

What are the limitations?

Despite the benefits of Fast GraphRAG, it’s slower than other RAG techniques that depend on vector databases rather than knowledge graphs. It also brings additional complexity, which means it might not be worth it for many use cases.

When should you use Fast GraphRAG?

Fast GraphRAG might sometimes be overkill, but if you’re trying to tackle a particularly large data set or if accuracy is critical, then it is a very good option.

What’s next in RAG?

The techniques discussed above aren’t exhaustive. There are other approaches emerging as teams seek to tackle the trade-offs of working with LLMs.

For instance, there’s some work happening on multimodal RAG, which, as the name suggests, goes beyond just text data to incorporate images, charts and tables — even audio.

There’s also a more significant alternative to RAG being explored called cache-augmented generation. This is where the retrieval step is circumvented by preloading data into a model’s context window from a cache — so it can be accessed faster and more easily. Cache-augmented generation isn’t necessarily a technique that will improve accuracy and quality, but it can make the model more efficient.
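
The difference from RAG can be shown with a deliberately simple sketch: the knowledge is assembled once, cached, and reused for every question, with no per-query retrieval step. This is a simplification under stated assumptions. A cached text prefix stands in here for whatever a real system caches (typically state inside the model rather than a string), and the class and method names are hypothetical.

```python
class CachedContext:
    """Preload a fixed set of documents once and reuse them for every prompt."""

    def __init__(self, documents):
        self._documents = list(documents)
        self._prefix = None
        self.builds = 0  # counts how often the context was actually assembled

    def prefix(self):
        """Build the preloaded context on first use, then serve it from the cache."""
        if self._prefix is None:
            self.builds += 1
            self._prefix = "\n\n".join(self._documents)
        return self._prefix

    def prompt(self, question):
        """No retrieval step: every question sees the same preloaded context."""
        return f"{self.prefix()}\n\nQuestion: {question}"


cache = CachedContext([
    "Corrective RAG adds an evaluation step to retrieval.",
    "RAG-fusion combines ranked lists with reciprocal rank fusion.",
])
first = cache.prompt("What does corrective RAG add?")
second = cache.prompt("How does RAG-fusion combine results?")
```

Both prompts share the same preloaded context and it is built only once. That is where the efficiency gain comes from, and also why the approach only fits knowledge small enough to sit in the context window.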

Pay attention: The RAG space is dynamic and evolving

It should be clear by now that there’s a lot happening in the world of RAG. While generative AI and LLMs are the terms that usually take the headlines, it’s the experimentation and innovation happening in retrieval right now that’s really shaping the effectiveness of AI-driven products.

But it’s important to be open-minded about the approach you take. It might be obvious, but there’s no single “best” RAG technique — there’s always a trade-off between complexity, speed and cost, for instance.

What matters most is understanding what’s important to your use case and then assessing the options thoroughly so you make an informed and effective decision.

Thanks to Jem Elias for his support on this piece.