TDD 101: Test-Driven Development for Dummies

2/10/2025

Test-Driven Development (TDD) is a software development approach that emphasizes writing tests before writing the actual implementation code. The concept was popularized by Kent Beck as part of Extreme Programming (XP) in the late 1990s. The core idea: write tests first, then write just enough code to pass those tests, and finally, refine the implementation. This cycle is known as Red-Green-Refactor.

The TDD Cycle

First, you write a failing test that defines an expected behavior—this is the "Red" phase because the test will fail initially. Then, you write just enough code to make the test pass, which is the "Green" phase. Finally, you clean up and optimize the code while making sure the tests still pass—this is the "Refactor" phase.
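
As a minimal illustration of the cycle before we get to anything fancy, consider a hypothetical slugify helper (the file and function names here are purely for illustration). Red: the test imports code that does not exist yet, so it fails:

test_text_utils.py
import unittest
from text_utils import slugify  # fails with ImportError until text_utils.py exists (Red)

class TestSlugify(unittest.TestCase):
    def test_lowercases_and_hyphenates(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

if __name__ == "__main__":
    unittest.main()

Green is the simplest implementation that passes, and Refactor is any cleanup afterwards, rerunning the test to confirm it stays green:

text_utils.py
def slugify(text):
    # Just enough to satisfy the test; edge cases get their own tests later
    return text.lower().replace(" ", "-")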

Why Use TDD?

TDD helps you write better code by making sure every piece of functionality is tested before it even exists. This means fewer bugs, cleaner design, and modular, maintainable code. It also gives you confidence when making changes since you’ll know right away if something breaks. Plus, it speeds up development in the long run by catching issues early and making debugging way easier. If you're working in a team or using continuous integration, having tests from the start makes deployment smoother and more reliable.

TDD in Action

TDD isn’t just for simple functions—it’s super useful in AI applications too, like Retrieval-Augmented Generation (RAG) models. Imagine you're building a chatbot that fetches relevant information before generating a response. With TDD, you can write tests to make sure the bot is retrieving the right data and structuring responses correctly before you even implement the retrieval system. In this case, we're building a RAG pipeline that can generate multiple-choice question options and answers from a piece of domain knowledge.

First Iteration: Writing the Initial Test

Before writing any implementation, we first define a test that will validate the expected structure of our response. This test ensures that our function returns a dictionary with multiple-choice options labeled as 'a', 'b', 'c', and 'd'.

test_haystack_pipeline.py
import unittest
import json
from unittest.mock import patch
from haystack_pipeline import generate_multiple_choice_response

class TestMultipleChoiceResponse(unittest.TestCase):
    @patch("haystack.pipelines.Pipeline.run")
    def test_generate_multiple_choice_response(self, mock_run):
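        # Stub the pipeline call so the test needs no real model or document store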
        mock_run.return_value = json.dumps({"a": "Option A", "b": "Option B", "c": "Option C", "d": "Option D"})
        prompt = "What is the capital of France? Choose one: a, b, c, or d."
        result = generate_multiple_choice_response(prompt)
        
        self.assertIsInstance(result, dict)
        for key in ["a", "b", "c", "d"]:
            self.assertIn(key, result)
            self.assertTrue(result[key])  # Ensures values are not empty

if __name__ == "__main__":
    unittest.main()

First Iteration: Implementing the Haystack Function

Now, we implement the function just enough to make the test pass:

haystack_pipeline.py
import json
from haystack.pipelines import Pipeline

def generate_multiple_choice_response(prompt):
    # A bare Pipeline is a placeholder; the run call is mocked in our test
    pipeline = Pipeline()
    query_prompt = f"Generate a multiple-choice question based on the following prompt and return the answers as a JSON object with keys 'a', 'b', 'c', and 'd': {prompt}"
    response = pipeline.run(query=query_prompt)
    return json.loads(response)  # Parse the JSON string into a dict

At this stage, we have a basic test and a corresponding implementation that ensures the generated multiple-choice options contain the expected keys.
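
A quick aside: the bare Pipeline() above is only a stand-in, and it works here because our test mocks Pipeline.run. In a real RAG setup you would wire actual nodes into the pipeline, for example a retriever feeding a prompt model. Here is a rough sketch, assuming Haystack 1.x; the document store, retriever, and model choice are illustrative, not part of our implementation:

haystack_pipeline.py (sketch of a fully wired pipeline)
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import BM25Retriever, PromptNode
from haystack.pipelines import Pipeline

document_store = InMemoryDocumentStore(use_bm25=True)  # index your domain documents here
retriever = BM25Retriever(document_store=document_store)
prompt_node = PromptNode(model_name_or_path="google/flan-t5-base")  # illustrative model

pipeline = Pipeline()
pipeline.add_node(component=retriever, name="Retriever", inputs=["Query"])
pipeline.add_node(component=prompt_node, name="PromptNode", inputs=["Retriever"])

Bear in mind that a real Pipeline.run returns a dictionary of results rather than a raw string, so the json.loads call would need to extract the generated text from that dictionary first. Either way, you can run the test at any point with python test_haystack_pipeline.py and watch it go from Red to Green.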

Second Iteration: Expanding the Test to Include an Answer Key

Now that we have a basic implementation, we need to extend the test to check if an 'answer' key exists and matches one of the provided options.

test_haystack_pipeline.py
class TestMultipleChoiceResponse(unittest.TestCase):
    @patch("haystack.pipelines.Pipeline.run")
    def test_generate_multiple_choice_response(self, mock_run):
        mock_run.return_value = json.dumps({"a": "Option A", "b": "Option B", "c": "Option C", "d": "Option D", "answer": "Option B"})
        prompt = "What is the capital of France? Choose one: a, b, c, or d."
        result = generate_multiple_choice_response(prompt)
        
        self.assertIsInstance(result, dict)
        for key in ["a", "b", "c", "d"]:
            self.assertIn(key, result)
            self.assertTrue(result[key])  # Ensures values are not empty
        
        self.assertIn("answer", result)
        self.assertIn(result["answer"], result.values())  # Ensures answer is one of the options

Second Iteration: Updating the Implementation to Pass the Test

We modify the function so the response always contains an 'answer' key, falling back to a random option if the model omits one. This is deliberately just enough to turn the test green; proper validation comes in the next iteration.

haystack_pipeline.py
import json
import random
from haystack.pipelines import Pipeline

def generate_multiple_choice_response(prompt):
    pipeline = Pipeline()
    query_prompt = f"Generate a multiple-choice question based on the following prompt and return the answers as a JSON object with keys 'a', 'b', 'c', and 'd'. Also, include an 'answer' key, which should contain the exact value of one of the options: {prompt}"
    response = pipeline.run(query=query_prompt)
    response_json = json.loads(response)
    
    # Fall back to a random option only if the model omitted the 'answer' key
    if "answer" not in response_json:
        response_json["answer"] = random.choice([response_json[k] for k in ["a", "b", "c", "d"]])
    
    return response_json

Third Iteration: Adding Error Handling

Now, let's add validation to ensure the function handles missing or incorrect responses properly:

test_haystack_pipeline.py
class TestMultipleChoiceResponseWithErrors(unittest.TestCase):
    @patch("haystack.pipelines.Pipeline.run")
    def test_invalid_json_raises_exception(self, mock_run):
        mock_run.return_value = "not a json"
        with self.assertRaises(json.JSONDecodeError):
            generate_multiple_choice_response("Test invalid JSON")

    @patch("haystack.pipelines.Pipeline.run")
    def test_missing_option_key_raises_exception(self, mock_run):
        mock_run.return_value = json.dumps({"a": "Option A", "b": "Option B", "c": "Option C", "answer": "Option A"})
        with self.assertRaises(ValueError):
            generate_multiple_choice_response("Test missing key")

    @patch("haystack.pipelines.Pipeline.run")
    def test_answer_not_in_options_raises_exception(self, mock_run):
        mock_run.return_value = json.dumps({"a": "Option A", "b": "Option B", "c": "Option C", "d": "Option D", "answer": "Option E"})
        with self.assertRaises(ValueError):
            generate_multiple_choice_response("Test answer mismatch")

Third Iteration: Updating the Implementation

Then, we add error handling to ensure robustness. The updated implementation checks for invalid JSON responses, missing option keys, and cases where the 'answer' key is not valid. If the response is malformed or missing expected keys, the function raises an appropriate exception instead of silently failing. Note that the random fallback from the previous iteration is gone: a missing or mismatched 'answer' now raises a ValueError rather than being papered over.

haystack_pipeline.py
import json
from haystack.pipelines import Pipeline

def generate_multiple_choice_response(prompt):
    pipeline = Pipeline()
    query_prompt = f"Generate a multiple-choice question based on the following prompt and return the answers as a JSON object with keys 'a', 'b', 'c', and 'd'. Also, include an 'answer' key, which should contain the exact value of one of the options: {prompt}"
    response = pipeline.run(query=query_prompt)
    # A malformed response raises json.JSONDecodeError, which we let propagate to the caller
    response_json = json.loads(response)
    
    required_keys = ["a", "b", "c", "d"]
    for key in required_keys:
        if key not in response_json or not response_json[key]:
            raise ValueError(f"Missing or empty key: {key}")
    
    if "answer" not in response_json or response_json["answer"] not in [response_json[k] for k in required_keys]:
        raise ValueError("Invalid 'answer' key")
    
    return response_json
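
With all three iterations green, we can also demonstrate the "Refactor" phase the cycle is named for. One possible cleanup, sketched below, extracts the validation into a helper (the _validate_response name is ours) without changing behavior; rerunning the test suite confirms everything stays green:

haystack_pipeline.py (refactored)
import json
from haystack.pipelines import Pipeline

REQUIRED_KEYS = ["a", "b", "c", "d"]

def _validate_response(response_json):
    # Every option key must be present and non-empty
    for key in REQUIRED_KEYS:
        if key not in response_json or not response_json[key]:
            raise ValueError(f"Missing or empty key: {key}")
    # The answer must be the exact value of one of the options
    if response_json.get("answer") not in [response_json[k] for k in REQUIRED_KEYS]:
        raise ValueError("Invalid 'answer' key")

def generate_multiple_choice_response(prompt):
    pipeline = Pipeline()
    query_prompt = f"Generate a multiple-choice question based on the following prompt and return the answers as a JSON object with keys 'a', 'b', 'c', and 'd'. Also, include an 'answer' key, which should contain the exact value of one of the options: {prompt}"
    response = pipeline.run(query=query_prompt)
    response_json = json.loads(response)  # JSONDecodeError still propagates
    _validate_response(response_json)
    return response_json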

Conclusion

By iterating through tests and implementations, we systematically build a robust AI-driven application. The added error handling catches unexpected responses early, whether non-JSON outputs, missing multiple-choice keys, or invalid answers, and prevents silent failures. TDD bakes reliability and maintainability into the process, making it an essential practice for modern software development.