Four RAG retrieval techniques you need to master
Source: Thoughtworks Tech
Four retrieval techniques to improve RAG you need to know
Retrieval-augmented generation (RAG) is a key technique in improving the reliability and accuracy of generative AI. However, it does have some limitations, not least around context, cost and handling particularly large data sets.
It's not surprising, then, that we've seen a range of different approaches emerge over the last year or so that attempt to address RAG's limitations. In the process of putting together the latest volume of the Thoughtworks Technology Radar (32), a number of techniques came up during our conversations. Some of them even appeared in the final version. As if that weren't enough, we also thought retrieval was so significant in today's technology landscape that we made it a theme, too.
But let's take a deeper look at what's actually happening in retrieval-augmented generation in 2025. Although this list certainly isn't exhaustive, it does spotlight some key approaches and explore how they compare, and when they should be used.
How RAG works and its limitations
Retrieval-augmented generation is a technique that augments LLMs by making it possible for the LLM to retrieve (as the name suggests) additional external information. External knowledge sources (documents or databases, for instance) are ingested, chunked and then vectorized to create what are called vector embeddings. These are then stored somewhere (often a vector database, but not always) where the system can access them when a user inputs a prompt. This means the LLM has more accurate, up-to-date or contextual information.
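The ingest, chunk, vectorize, store and retrieve steps described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the "embedding" is a toy word-count vector standing in for a real embedding model, the "vector store" is a plain list, and every function name here is hypothetical.

```python
import math
from collections import Counter

def chunk(text, size=12):
    """Split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text):
    """Toy embedding (assumption): lowercase word counts, not a real model."""
    return Counter(word.strip(".,?!").lower() for word in text.split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_index(documents):
    """Ingest: chunk every document and store (chunk, vector) pairs in a list."""
    return [(c, embed(c)) for doc in documents for c in chunk(doc)]

def retrieve(index, query, top_k=2):
    """Return the top_k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]

def build_prompt(query, contexts):
    """Place the retrieved chunks in front of the user's question."""
    context_block = "\n".join(f"- {c}" for c in contexts)
    return f"Context:\n{context_block}\n\nQuestion: {query}"

# Made-up example documents.
docs = [
    "Refunds are processed within five business days.",
    "Passwords must be rotated every ninety days.",
]
index = build_index(docs)
question = "How long do refunds take?"
top = retrieve(index, question, top_k=1)
prompt = build_prompt(question, top)
```

The resulting prompt is what would be sent to the LLM; the model call itself is left out because it depends entirely on the provider you use.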
RAG is incredibly powerful, but it does have a number of limitations. For instance, retrieval is only as effective as the data it draws on, which means it requires well-organized, up-to-date data. There are also challenges around complex queries and retrieving information from large data sets. In particular, RAG in its basic form, sometimes called "naive RAG", can either confuse similar meanings in a data set or lack the necessary nuance to retrieve information relevant to a given query.
Corrective RAG
Corrective RAG (CRAG) is one of the most popular new approaches to retrieval-augmented generation. The central idea behind it is the introduction of an evaluation step in the process through what's described as a "self-reflection" or "self-grading" mechanism. This is where an evaluator checks the relevance of what has been retrieved; if it doesn't hit a certain threshold, the system will look elsewhere. (This check is implemented through a lightweight retrieval evaluator, which calculates how relevant the retrieved content is.) Looking elsewhere sometimes means returning to the data set or searching the web.
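The grade-then-fall-back idea can be sketched as follows. This is a simplified illustration, not the published CRAG method: the evaluator here is a toy word-overlap score rather than a trained model, the 0.3 threshold is arbitrary, and `primary` and `fallback` are hypothetical callables standing in for your own retriever and, say, a web search.

```python
def relevance(query, passage):
    """Toy evaluator (assumption): fraction of query words found in the passage."""
    q = {w.strip(".,?!").lower() for w in query.split()}
    p = {w.strip(".,?!").lower() for w in passage.split()}
    return len(q & p) / len(q) if q else 0.0

def corrective_retrieve(query, primary, fallback, threshold=0.3):
    """Return (passages, source); use the fallback when nothing passes the grade."""
    graded = [(relevance(query, p), p) for p in primary(query)]
    kept = [p for score, p in graded if score >= threshold]
    if kept:
        return kept, "primary"
    return fallback(query), "fallback"

# Hypothetical stand-ins for a vector-store lookup and a web search.
def toy_primary(query):
    return ["Passwords must be rotated every ninety days."]

def toy_fallback(query):
    return ["Refunds take five business days."]

hit = corrective_retrieve("How often must passwords be rotated?", toy_primary, toy_fallback)
miss = corrective_retrieve("How long do refunds take?", toy_primary, toy_fallback)
```

In the first call the stored passage passes the grade and is kept; in the second it scores zero, so the system "looks elsewhere" and returns the fallback result.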
What challenge does it tackle?
Corrective RAG is a technique that tackles inaccurate retrievals. For instance, confusion about semantically similar information can sometimes happen in a RAG system; the introduction of an evaluation step is useful in bolstering the reliability of what's retrieved.
What are the limitations of corrective RAG?
Corrective RAG does have some limitations that may affect whether it's the right option for your generative AI project. For instance, the introduction of the evaluation step inevitably increases latency, as it requires additional computational resources. This could affect performance downstream (particularly important if it's supporting a live customer-facing application, for instance). It also adds complexity to your AI pipelines, which can slow down team productivity and make it more difficult to correct issues if they arise.
It's also important to note that corrective RAG can't, of course, correct issues in the data, whether it's inaccurate, out-of-date or just poorly organized and chunked.
When should you use corrective RAG?
Corrective RAG is a good option where you want to balance accuracy and real-time data integration.
Self-RAG
Self-RAG is closely related to corrective RAG. The "self" in its name refers to self-reflection, which, as we saw above, is a feature of corrective RAG.
However, it goes further than evaluating each instance of retrieval by expanding self-reflection: first to the decision of whether to retrieve additional data at all, and then to learning from its evaluations in an iterative manner. It does this by using three models in training: a retriever, a critic and a generator. This tripartite approach allows self-RAG to employ something called a "reflection token." As the team that developed self-RAG explains, "generating reflection tokens makes the LM [language model] controllable during the inference phase, enabling it to tailor its behavior to diverse task requirements."
In short, self-RAG involves a kind of feedback loop in which the decisions it makes at the retrieval step reinforce the system's understanding. This ultimately improves its overall performance.
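The control flow can be sketched like this. It is only an outline under heavy assumptions: in self-RAG the retrieve-or-not decision and the critique are expressed as reflection tokens generated by the trained model, whereas here they are ordinary Python callables with hypothetical names, and the toy stand-ins below exist purely to make the example runnable.

```python
def self_rag_answer(query, needs_retrieval, retrieve, generate, critique):
    """Decide whether to retrieve, then keep the best-critiqued candidate answer."""
    if not needs_retrieval(query):
        return generate(query, None)
    candidates = []
    for passage in retrieve(query):
        answer = generate(query, passage)
        candidates.append((critique(query, passage, answer), answer))
    if not candidates:
        return generate(query, None)
    # Highest critique score wins; ties keep the earliest candidate.
    return max(candidates, key=lambda c: c[0])[1]

# Toy stand-ins (assumptions) for the model's decisions.
def toy_needs_retrieval(query):
    return "refund" in query.lower()

def toy_retrieve(query):
    return ["Passwords rotate every ninety days.", "Refunds take five business days."]

def toy_generate(query, passage):
    return passage if passage else "I can answer that without looking anything up."

def toy_critique(query, passage, answer):
    q = set(query.lower().strip("?").split())
    a = set(answer.lower().strip(".").split())
    return len(q & a)

with_lookup = self_rag_answer(
    "How long do refunds take?", toy_needs_retrieval, toy_retrieve, toy_generate, toy_critique
)
without_lookup = self_rag_answer(
    "Say hello", toy_needs_retrieval, toy_retrieve, toy_generate, toy_critique
)
```

The point of the sketch is the shape of the loop: retrieval is optional, and each candidate answer is judged before one is returned.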
What challenges does it tackle?
Like corrective RAG, self-RAG can tackle accuracy challenges that we can sometimes encounter when using naive RAG. The iterative feature is also valuable insofar as it leads to improvements over time.
What are the limitations of self-RAG?
The limitations of self-RAG aren't dissimilar to those of corrective RAG. However, it has some additional issues. The self-reflection mechanism can, for instance, sometimes lead to outputs that aren't actually borne out in the data (the system essentially "overthinking").
Implementing self-RAG also comes with some trade-offs. If tokens used for training are used for self-reflection, this may reduce the quality or fluency of the system's outputs. Ultimately, then, it's a question of what's most important to you and the nature of the data your AI system is dealing with.
When should you use it?
Self-RAG is particularly useful when you want an LLM to be adaptive. It's well suited to open-domain questions and sophisticated reasoning.
RAG-fusion
RAG-fusion is different from corrective RAG and self-RAG. While those approaches are oriented toward self-reflection, RAG-fusion is a technique in which the results retrieved for several versions of a query (documents or entries in a wiki, for instance) are fused into a single ranked list. It involves multiple steps: the system generates multiple queries from the original prompt, retrieves results for each of them, and then combines the ranked results using something called reciprocal rank fusion (RRF).
In effect, this expands what the model can retrieve, allowing it to grasp more context and nuance in the data.
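Reciprocal rank fusion itself is small enough to show in full. In the sketch below, each document scores 1 / (k + rank) for every ranked list it appears in, and the scores are summed; k = 60 is a commonly used default rather than something this article specifies. The document ids and the three rankings are made up, standing in for the results of three generated queries.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one list, best first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical results for three rewrites of the same user prompt.
rankings = [
    ["doc_a", "doc_b", "doc_c"],
    ["doc_b", "doc_a", "doc_d"],
    ["doc_b", "doc_c", "doc_a"],
]
fused = reciprocal_rank_fusion(rankings)
# fused == ["doc_b", "doc_a", "doc_c", "doc_d"]
```

A document that ranks consistently well across queries (doc_b here) rises to the top, even if no single query put every good document first.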
What challenges does it tackle?
RAG-fusion can help tackle some of the problems RAG has with context and nuance. It not only helps the model supply more consistent and detailed responses to prompts, it also helps the model better manage difficult or multifaceted prompts.
What are the limitations of RAG-fusion?
RAG-fusion does add substantial complexity to your LLM architecture and pipelines (and, by extension, costs), more than the two techniques mentioned above. As with other techniques, it can cause performance problems downstream due to the additional steps that are implemented.
When should you use RAG-fusion?
Because it helps a model negotiate nuance and create more consistent outputs, RAG-fusion is a useful technique in fields like customer support. In fact, wherever the specificity and depth of outputs are particularly important, RAG-fusion is a good technique to use.
Fast GraphRAG
Fast GraphRAG is an open-source implementation of GraphRAG, a RAG technique built on knowledge graphs. GraphRAG works not by retrieving "chunks" of data, but instead by extracting data and placing it into a knowledge graph, making something that's a bit like a map of the data that can be retrieved. The benefit of such an approach is that it makes the connections between data visible and accessible to the LLM. This, in theory, means it can retrieve information with greater nuance and depth than it otherwise would.
Fast GraphRAG builds on GraphRAG's core idea but introduces PageRank (the algorithm famously developed by Larry Page and Sergey Brin at Google) to help the system identify the most relevant information in the knowledge graph faster.
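To give a feel for how PageRank can surface relevant parts of a knowledge graph, here is a minimal sketch of personalized PageRank over a tiny, made-up graph. It is not Fast GraphRAG's actual code or API: the graph, the entity names, the choice of seed nodes and the function name are all assumptions for illustration.

```python
def personalized_pagerank(graph, seeds, damping=0.85, iterations=50):
    """Power iteration for PageRank that restarts at the seed nodes.

    `graph` maps each node to the list of nodes it links to; every
    linked node must also appear as a key.
    """
    nodes = list(graph)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        for node in nodes:
            targets = graph[node]
            if targets:
                share = damping * rank[node] / len(targets)
                for t in targets:
                    nxt[t] += share
            else:
                # Dangling node: hand its score back to the seed nodes.
                for n in nodes:
                    nxt[n] += damping * rank[node] * restart[n]
        rank = nxt
    return rank

# Made-up knowledge graph: entities and the entities they are connected to.
graph = {
    "refund policy": ["billing team", "payment provider"],
    "billing team": ["refund policy"],
    "payment provider": ["refund policy"],
    "password policy": ["security team"],
    "security team": ["password policy"],
}
# Assumption: the user's query was matched to the "refund policy" entity.
scores = personalized_pagerank(graph, seeds={"refund policy"})
most_relevant = sorted(scores, key=scores.get, reverse=True)[:3]
```

Entities connected to the seed accumulate score, while the unrelated password-policy cluster stays at zero, so the system knows which part of the graph to pull context from.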
What challenges does it solve?
Fast GraphRAG solves a number of common challenges with RAG, the most obvious being around interpretation and nuance. The use of knowledge graphs in particular gives the AI system a richer "understanding" of the retrieval data. As well as that, though, it's also better suited to larger data sets and can better adapt to dynamic data. While naive RAG (and some of the other techniques mentioned above) is designed with a relatively static data set in mind, Fast GraphRAG is designed to manage change in data as new information is added or as information becomes outdated.
Fast GraphRAG is also a more cost-efficient alternative to GraphRAG, potentially up to six times cheaper, while also being faster.
What are the limitations?
Despite the benefits of Fast GraphRAG, it's slower than other RAG techniques that depend on vector databases rather than knowledge graphs. There's also additional complexity, which means it might not be worth it for many use cases.
When should you use Fast GraphRAG?
Fast GraphRAG might sometimes be overkill, but if you're trying to tackle a particularly large data set or if accuracy is critical, then it is a very good option.
What's next in RAG?
The techniques discussed above aren't exhaustive. There are other approaches emerging as teams seek to tackle the trade-offs of working with LLMs.
For instance, there's some work happening on multimodal RAG, which, as the name suggests, goes beyond just text data to incorporate images, charts and tables, and even audio.
There's also a more significant alternative to RAG being explored called cache-augmented generation. This is where the retrieval step is circumvented by preloading data into a model's context window from a cache, so it can be accessed faster and more easily. Cache-augmented generation isn't necessarily a technique that will improve accuracy and quality, but it can make the model more efficient.
Pay attention: The RAG space is dynamic and evolving
It should be clear by now that there's a lot happening in the world of RAG. While generative AI and LLMs are the terms that usually take the headlines, it's the experimentation and innovation happening in retrieval right now that's really shaping the effectiveness of AI-driven products.
But it's important to be open-minded about the approach you take. It might be obvious, but there's no single "best" RAG technique; there's always a trade-off between complexity, speed and cost, for instance.
What matters most is understanding what's important to your use case and then assessing the options thoroughly so you make an informed and effective decision.
Thanks to Jem Elias for his support on this piece.