DSPy Few-Shot Optimization Cheat Sheet

Optimizing RAG Systems with DSPy

What is DSPy?

DSPy is a framework that treats prompting as a programming paradigm rather than an art form. Instead of manually crafting and tweaking prompts, you define the structure of your task, and DSPy automatically optimizes the prompts and few-shot examples for you.

The framework introduces “signatures” that declare what your LM should do, and “teleprompters” that compile your program by optimizing these components against a metric.

The RAG Pattern

Retrieval-Augmented Generation (RAG) combines information retrieval with language generation. The system retrieves relevant documents, then uses them as context for generating answers. This approach grounds responses in factual information and reduces hallucinations.

A basic RAG pipeline follows three steps:

  1. Retrieve relevant passages for a question
  2. Pass these passages as context to the language model
  3. Generate an answer based on the context
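The three steps above can be sketched in plain Python, independent of DSPy. The keyword-overlap retriever and prompt template here are illustrative assumptions, not DSPy APIs; real systems use vector search and an actual LM call for step 3:

```python
# Minimal RAG sketch: retrieve by word overlap, then assemble a prompt.

def retrieve(question, corpus, k=2):
    """Step 1: rank passages by word overlap with the question."""
    q_words = set(question.lower().split())
    scored = sorted(
        corpus,
        key=lambda p: len(q_words & set(p.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(question, passages):
    """Step 2: pass the retrieved passages as context to the LM."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

corpus = [
    "Paris is the capital of France.",
    "The Nile is a river in Africa.",
]
passages = retrieve("What is the capital of France?", corpus, k=1)
prompt = build_prompt("What is the capital of France?", passages)
```

Step 3 would send `prompt` to a language model; everything before that is ordinary retrieval and string assembly.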

Building RAG in DSPy

DSPy lets you define RAG as a modular pipeline. First, declare your task signature:

import dspy

class GenerateAnswer(dspy.Signature):
    """Answer questions with short factoid answers."""
    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")

This signature tells DSPy what inputs your module receives and what outputs it produces. The descriptions guide prompt optimization.
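To make that concrete, the signature's fields roughly correspond to a structured prompt template. The exact wording DSPy compiles is optimizer-dependent, so this rendering is only an approximation:

```python
# Rough prompt layout implied by the GenerateAnswer signature.
# The real text DSPy generates differs; this is a sketch of the idea.

instructions = "Answer questions with short factoid answers."

def render_prompt(context, question):
    """Lay out the signature's fields as a prompt skeleton."""
    return (
        f"{instructions}\n\n"
        f"Context: {context}\n"
        f"Question: {question}\n"
        f"Answer:"
    )
```

The docstring becomes the task instructions, input fields become labeled slots, and the output field marks where the model's completion begins.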

Next, implement the RAG module:

class RAG(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)

    def forward(self, question):
        context = self.retrieve(question).passages
        prediction = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=prediction.answer)

The module retrieves passages and generates answers using chain-of-thought reasoning. DSPy modules follow a define-by-run approach—you write normal Python code in the forward method.

Optimizing with BootstrapFewShot

The key advantage of DSPy emerges during optimization. Define a validation metric:

def validate_context_and_answer(example, pred, trace=None):
    answer_EM = dspy.evaluate.answer_exact_match(example, pred)
    answer_PM = dspy.evaluate.answer_passage_match(example, pred)
    return answer_EM and answer_PM

This metric checks two conditions: the predicted answer matches the correct answer, and the retrieved context actually contains that answer.
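The same two checks can be written out in plain Python. This is a simplified stand-in for the `dspy.evaluate` matchers, which also normalize punctuation and articles:

```python
def normalize(text):
    """Lowercase and strip whitespace (simplified normalization)."""
    return text.strip().lower()

def exact_match(gold_answer, pred_answer):
    """Condition 1: predicted answer matches the correct answer."""
    return normalize(gold_answer) == normalize(pred_answer)

def passage_match(gold_answer, passages):
    """Condition 2: some retrieved passage contains the answer."""
    return any(normalize(gold_answer) in normalize(p) for p in passages)

def validate(gold_answer, pred_answer, passages):
    return exact_match(gold_answer, pred_answer) and passage_match(
        gold_answer, passages
    )
```

Requiring both conditions filters out lucky guesses: an answer that is correct but unsupported by the retrieved context fails validation.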

Now compile the program:

from dspy.teleprompt import BootstrapFewShot

teleprompter = BootstrapFewShot(metric=validate_context_and_answer)
compiled_rag = teleprompter.compile(RAG(), trainset=trainset)

BootstrapFewShot automatically:

  1. Runs your RAG on training examples
  2. Selects successful traces that pass validation
  3. Uses these traces as few-shot examples
  4. Optimizes prompts to maximize your metric

The compiled program typically outperforms the uncompiled one because DSPy learns which examples demonstrate correct behavior and incorporates them into the prompts.
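At its core, the four steps above amount to a filtering loop. A toy version in plain Python, where `program` and `metric` are placeholders rather than the DSPy implementation:

```python
def bootstrap_few_shot(program, metric, trainset, max_demos=4):
    """Keep training examples the current program handles correctly,
    and reuse them as few-shot demonstrations."""
    demos = []
    for example in trainset:                # step 1: run on training examples
        pred = program(example["question"])
        if metric(example, pred):           # step 2: keep traces that validate
            demos.append({**example, "prediction": pred})
        if len(demos) >= max_demos:         # cap the number of demonstrations
            break
    return demos                            # step 3: these become few-shot examples

# Toy program and metric for illustration.
answers = {"2+2?": "4", "capital of France?": "Paris"}
program = lambda q: answers.get(q, "unknown")
metric = lambda ex, pred: pred == ex["answer"]

trainset = [
    {"question": "2+2?", "answer": "4"},
    {"question": "capital of France?", "answer": "Paris"},
    {"question": "5+5?", "answer": "10"},
]
demos = bootstrap_few_shot(program, metric, trainset)
```

Step 4, prompt optimization, is where the real teleprompter does more work: it searches over which validated demonstrations (and instructions) yield the best metric score.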

Why This Matters

Traditional prompt engineering requires manual iteration: write prompts, test them, adjust wording, add examples, repeat. DSPy automates this process. You declare the task structure and success criteria, then the framework handles optimization.

This approach scales better as systems grow more complex. When you add modules or change retrieval strategies, recompiling lets DSPy re-optimize the prompts for the entire pipeline.

Tags: Dspy, Llm, Prompt-Optimization, Few-Shot-Learning, Rag, Machine-Learning