Chapter 02: DSPy Core Concepts and Components

Haiyue

September 1, 2025

Learning Objectives

Master DSPy’s Signature mechanism
Understand the concept and usage of Modules
Learn the working principles of Predictors
Understand data types and structures in DSPy
Master basic input/output processing

Key Concepts

1. DSPy Signatures

Signatures are the central concept in DSPy. Each signature declares the inputs and outputs of a language model call, and its docstring serves as the task description.

Basic Structure of Signatures

class MySignature(dspy.Signature):
    """Task description docstring"""
    input_field = dspy.InputField(desc="Input field description")
    output_field = dspy.OutputField(desc="Output field description")

Field Types and Attributes

Field Type	Purpose	Example
`InputField`	Define input parameters	`question = dspy.InputField()`
`OutputField`	Define output results	`answer = dspy.OutputField()`

Field Attribute Parameters

# Detailed field attributes
field = dspy.InputField(
    desc="Field function description",     # Describes the field's purpose to the model
    prefix="Prefix:",                       # Label shown before the field in the prompt
    format=lambda x: x.strip()             # Function applied to the value before it is inserted
)

2. DSPy Modules

Modules are DSPy's building blocks. Each module wraps one or more predictors behind a forward method, so it can be reused and composed with other modules.

Basic Module Types

# Basic prediction module
class BasicPredict(dspy.Module):
    def __init__(self, signature):
        super().__init__()
        self.predict = dspy.Predict(signature)

    def forward(self, **kwargs):
        return self.predict(**kwargs)

Chain of Thought Module

# Chain of Thought module
class CoTModule(dspy.Module):
    def __init__(self, signature):
        super().__init__()
        self.predict = dspy.ChainOfThought(signature)

    def forward(self, **kwargs):
        return self.predict(**kwargs)

3. DSPy Predictors

Predictors are the components that actually call the language model. Each type applies a different prompting strategy, so different types suit different scenarios.

Predictor Hierarchy

Built-in modules
├── Predict - Basic prediction
├── ChainOfThought - Step-by-step reasoning before the answer
├── ProgramOfThought - Generates and runs code to compute the answer
├── ReAct - Reasoning-action loop with tools
└── Retrieve - Retrieval step (fetches passages; not itself a predictor)

4. Data Types and Structures

DSPy Built-in Data Types

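# Basic data types (a field declared without a type annotation carries text)
text_field = dspy.InputField()        # Plain text
list_field = dspy.OutputField()       # List, written out as text (e.g. comma-separated)
dict_field = dspy.InputField()        # Dictionary, written out as text (e.g. JSON)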

# Complex data types
class ComplexData(dspy.Signature):
    structured_input = dspy.InputField(desc="Structured input")
    json_output = dspy.OutputField(desc="JSON format output")
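
When field values come back as plain text, the calling code usually has to convert them before using them. The sketch below shows one way to normalize a sentiment label and a 0-1 confidence score. The helper names `parse_sentiment` and `parse_confidence` are hypothetical and not part of DSPy; the strings passed in stand in for model output.

```python
from typing import Optional

VALID_SENTIMENTS = {"positive", "negative", "neutral"}


def parse_sentiment(raw: str) -> Optional[str]:
    """Return a normalized sentiment label, or None if it is not recognized."""
    label = raw.strip().lower().rstrip(".")
    return label if label in VALID_SENTIMENTS else None


def parse_confidence(raw: str) -> Optional[float]:
    """Parse a confidence score from text, accepting '0.92' or '85%'."""
    text = raw.strip()
    try:
        if text.endswith("%"):
            value = float(text[:-1]) / 100.0
        else:
            value = float(text)
    except ValueError:
        return None  # The model returned something that is not a number
    # Clamp to the 0-1 range the field description asks for
    return min(1.0, max(0.0, value))


# Stand-ins for text that a predictor might return
print(parse_sentiment(" Positive. "))
print(parse_confidence("0.92"))
print(parse_confidence("85%"))
print(parse_confidence("very high"))
```

Returning None instead of raising keeps a single malformed output from stopping a longer pipeline; the caller decides whether to retry or fall back to a default.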

Example Code

Signature Definition Examples

import dspy

# 1. Simple Q&A signature
class SimpleQA(dspy.Signature):
    """Answer user questions"""
    question = dspy.InputField(desc="User's question")
    answer = dspy.OutputField(desc="Concise answer")

# 2. Complex analysis signature
class TextAnalysis(dspy.Signature):
    """Text sentiment analysis and keyword extraction"""
    text = dspy.InputField(desc="Text to be analyzed")
    sentiment = dspy.OutputField(desc="Sentiment: positive/negative/neutral")
    keywords = dspy.OutputField(desc="Keyword list, comma-separated")
    confidence = dspy.OutputField(desc="Confidence score (0-1)")

# 3. Multi-step reasoning signature
class MathWordProblem(dspy.Signature):
    """Solve math word problems"""
    problem = dspy.InputField(desc="Math word problem")
    reasoning = dspy.OutputField(desc="Problem-solving approach and steps")
    answer = dspy.OutputField(desc="Final answer")

# Using signatures
def demo_signatures():
    # Configure model (recent DSPy releases use dspy.LM in place of the older dspy.OpenAI client)
    lm = dspy.LM("openai/gpt-3.5-turbo")
    dspy.configure(lm=lm)

    # Simple Q&A
    qa_predictor = dspy.Predict(SimpleQA)
    result1 = qa_predictor(question="What is machine learning?")
    print(f"Q&A result: {result1.answer}")

    # Text analysis
    analysis_predictor = dspy.ChainOfThought(TextAnalysis)
    result2 = analysis_predictor(
        text="The weather is really nice today, I'm very happy!"
    )
    print(f"Sentiment analysis: {result2.sentiment}")
    print(f"Keywords: {result2.keywords}")
    print(f"Confidence: {result2.confidence}")
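
Later examples in this chapter also declare signatures with a string shorthand such as "question -> question_type". To make the shorthand concrete, here is a toy parser that splits such a string into input and output field names. It is only an illustration of the notation, not DSPy's own parser, and `parse_shorthand` is a hypothetical name.

```python
from typing import List, Tuple


def parse_shorthand(signature: str) -> Tuple[List[str], List[str]]:
    """Split 'a, b -> c, d' into (input field names, output field names)."""
    if signature.count("->") != 1:
        raise ValueError(f"expected exactly one '->' in {signature!r}")
    left, right = signature.split("->")
    inputs = [name.strip() for name in left.split(",") if name.strip()]
    outputs = [name.strip() for name in right.split(",") if name.strip()]
    if not inputs or not outputs:
        raise ValueError("need at least one input and one output field")
    return inputs, outputs


print(parse_shorthand("question -> question_type"))
print(parse_shorthand("text -> entities, relationships"))
```

Field names on the left become keyword arguments when calling the predictor, and field names on the right become attributes of the returned prediction.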

Module Building Examples

class SmartQAModule(dspy.Module):
    """Smart Q&A module"""

    def __init__(self):
        super().__init__()
        # Define internal components
        self.classifier = dspy.Predict("question -> question_type")
        self.simple_qa = dspy.Predict(SimpleQA)
        self.complex_qa = dspy.ChainOfThought(MathWordProblem)

    def forward(self, question):
        # 1. Question classification (lowercased so the checks below ignore case)
        question_type = self.classifier(question=question).question_type.lower()

        # 2. Choose processing method based on type
        if "math" in question_type or "calculation" in question_type:
            # Use chain of thought for math problems
            result = self.complex_qa(problem=question)
            return dspy.Prediction(
                answer=result.answer,
                reasoning=result.reasoning
            )
        else:
            # Use simple prediction for general questions
            result = self.simple_qa(question=question)
            return dspy.Prediction(
                answer=result.answer,
                reasoning="Direct answer"
            )

# Using the module
def demo_modules():
    smart_qa = SmartQAModule()

    # Test general question
    result1 = smart_qa("What is artificial intelligence?")
    print(f"General Q&A: {result1.answer}")

    # Test math question
    result2 = smart_qa("Xiaoming has 10 apples, ate 3, how many left?")
    print(f"Math Q&A: {result2.answer}")
    print(f"Reasoning process: {result2.reasoning}")
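
The routing step in SmartQAModule relies on substring checks against free-form model output, so it is worth testing in isolation. The sketch below reproduces that logic in plain Python with stub functions in place of the LM-backed predictors. `pick_route`, `answer`, `fake_classifier`, and the handlers are all hypothetical stand-ins.

```python
from typing import Callable, Dict


def pick_route(question_type: str) -> str:
    """Map a free-form question type to a handler name, ignoring case."""
    label = question_type.strip().lower()
    if "math" in label or "calculation" in label:
        return "complex"
    return "simple"


def answer(
    question: str,
    classify: Callable[[str], str],
    handlers: Dict[str, Callable[[str], str]],
) -> Dict[str, str]:
    """Classify the question, then dispatch to the matching handler."""
    route = pick_route(classify(question))
    return {"route": route, "answer": handlers[route](question)}


def fake_classifier(question: str) -> str:
    """Stub classifier: treats any question containing a digit as math."""
    if any(ch.isdigit() for ch in question):
        return "Math / Calculation"
    return "General knowledge"


handlers = {
    "simple": lambda q: "direct answer",
    "complex": lambda q: "step-by-step answer",
}

print(answer("What is artificial intelligence?", fake_classifier, handlers))
print(answer("Xiaoming has 10 apples, ate 3, how many left?", fake_classifier, handlers))
```

Keeping the routing decision in a small pure function makes it easy to unit-test without calling a model.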

Predictor Detailed Examples

# 1. Basic predictor
class BasicPredict(dspy.Module):
    def __init__(self, signature):
        super().__init__()
        self.predict = dspy.Predict(signature)

    def forward(self, **kwargs):
        return self.predict(**kwargs)

# 2. Chain of Thought predictor
class CoTPredict(dspy.Module):
    def __init__(self, signature):
        super().__init__()
        self.cot = dspy.ChainOfThought(signature)

    def forward(self, **kwargs):
        return self.cot(**kwargs)

# 3. Program of Thought predictor
class PoTPredict(dspy.Module):
    def __init__(self, signature):
        super().__init__()
        self.pot = dspy.ProgramOfThought(signature)

    def forward(self, **kwargs):
        return self.pot(**kwargs)

# Compare different predictors
def compare_predictors():
    # Define math problem signature
    class MathProblem(dspy.Signature):
        """Solve math problems"""
        problem = dspy.InputField()
        answer = dspy.OutputField()

    # Create different predictors
    basic = BasicPredict(MathProblem)
    cot = CoTPredict(MathProblem)
    pot = PoTPredict(MathProblem)

    problem = "Calculate 15 * 23 + 47"

    print("=== Predictor Comparison ===")

    # Basic prediction
    result1 = basic(problem=problem)
    print(f"Basic prediction: {result1.answer}")

    # Chain of thought prediction
    result2 = cot(problem=problem)
    print(f"Chain of thought prediction: {result2.answer}")

    # Program of thought prediction
    result3 = pot(problem=problem)
    print(f"Program of thought prediction: {result3.answer}")
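
Printing the three answers side by side does not show which one is right. A small checker helps: the sketch below pulls the last number out of an answer string and compares it with the known result of 15 * 23 + 47. The helper names are hypothetical, and the sample strings stand in for model output. It assumes commas appear only as thousands separators.

```python
import re
from typing import Optional


def extract_number(answer: str) -> Optional[float]:
    """Return the last number found in the text, or None if there is none."""
    # Drop commas first so '1,000' is read as 1000
    matches = re.findall(r"-?\d+(?:\.\d+)?", answer.replace(",", ""))
    return float(matches[-1]) if matches else None


def is_correct(answer: str, expected: float, tol: float = 1e-9) -> bool:
    """Check whether the final number in the answer matches the expected value."""
    value = extract_number(answer)
    return value is not None and abs(value - expected) <= tol


expected = 15 * 23 + 47  # 345 + 47 = 392

# Stand-ins for answers returned by different predictors
print(is_correct("392", expected))
print(is_correct("15 * 23 = 345, and 345 + 47 = 392", expected))
print(is_correct("The answer is 382.", expected))
```

Taking the last number is a heuristic that suits step-by-step answers, where the final result usually comes at the end.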

Data Processing and Flow Examples

class DataProcessor(dspy.Module):
    """Data processing module"""

    def __init__(self):
        super().__init__()
        self.extractor = dspy.Predict(
            "text -> entities, relationships"
        )
        self.summarizer = dspy.ChainOfThought(
            "entities, relationships -> summary"
        )

    def forward(self, raw_text):
        # 1. Data extraction
        extraction = self.extractor(text=raw_text)

        # 2. Data processing
        processed_entities = self._process_entities(
            extraction.entities
        )
        processed_relationships = self._process_relationships(
            extraction.relationships
        )

        # 3. Generate summary
        summary = self.summarizer(
            entities=processed_entities,
            relationships=processed_relationships
        )

        return dspy.Prediction(
            entities=processed_entities,
            relationships=processed_relationships,
            summary=summary.summary
        )

    def _process_entities(self, entities_text):
        """Process entity data"""
        # Simple text processing
        entities = [e.strip() for e in entities_text.split(',')]
        return entities

    def _process_relationships(self, relationships_text):
        """Process relationship data"""
        relationships = [r.strip() for r in relationships_text.split(';')]
        return relationships

# Using the data processor
def demo_data_processing():
    processor = DataProcessor()

    text = """
    Apple Inc. is an American multinational technology company.
    The company was founded by Steve Jobs in 1976.
    Apple mainly produces iPhone, iPad and other products.
    """

    result = processor(raw_text=text)

    print("Entities:", result.entities)
    print("Relationships:", result.relationships)
    print("Summary:", result.summary)
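
The two helper methods in DataProcessor assume the model always returns a clean delimited string. A more defensive variant, sketched below, also accepts a value that is already a list, drops empty items, and removes duplicates while keeping order. `split_items` is a hypothetical helper, not part of DSPy.

```python
from typing import Iterable, List, Optional, Union


def split_items(
    value: Optional[Union[str, Iterable[str]]], sep: str = ","
) -> List[str]:
    """Turn delimited text (or an existing list) into a clean list of items."""
    if value is None:
        return []
    parts = value.split(sep) if isinstance(value, str) else list(value)
    seen = set()
    items: List[str] = []
    for part in parts:
        item = str(part).strip()
        # Skip blanks and repeats, preserving first-seen order
        if item and item not in seen:
            seen.add(item)
            items.append(item)
    return items


print(split_items("Apple Inc., Steve Jobs, , Apple Inc."))
print(split_items("Steve Jobs founded Apple; ; Apple produces iPhone", sep=";"))
print(split_items(["iPhone", " iPad "]))
```

The same function can replace both `_process_entities` and `_process_relationships` by passing the appropriate separator.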

Advanced Composition Examples

class AdvancedPipeline(dspy.Module):
    """Advanced processing pipeline"""

    def __init__(self):
        super().__init__()

        # Multiple processing stages
        self.preprocessor = dspy.Predict(
            "raw_input -> cleaned_input, metadata"
        )

        self.analyzer = dspy.ChainOfThought(
            "cleaned_input, metadata -> analysis, insights"
        )

        self.postprocessor = dspy.Predict(
            "analysis, insights -> final_output, confidence"
        )

    def forward(self, raw_input, context=None):
        # Stage 1: Preprocessing
        stage1 = self.preprocessor(raw_input=raw_input)

        # Stage 2: Analysis (considering context)
        if context:
            analysis_input = f"{stage1.cleaned_input}\n\nContext: {context}"
        else:
            analysis_input = stage1.cleaned_input

        stage2 = self.analyzer(
            cleaned_input=analysis_input,
            metadata=stage1.metadata
        )

        # Stage 3: Post-processing
        stage3 = self.postprocessor(
            analysis=stage2.analysis,
            insights=stage2.insights
        )

        return dspy.Prediction(
            raw_input=raw_input,
            processed_input=stage1.cleaned_input,
            metadata=stage1.metadata,
            analysis=stage2.analysis,
            insights=stage2.insights,
            final_output=stage3.final_output,
            confidence=stage3.confidence
        )

# Using the advanced pipeline
def demo_advanced_pipeline():
    pipeline = AdvancedPipeline()

    result = pipeline(
        raw_input="    This is a test text, with some noise data!!!",
        context="This is a text processing example"
    )

    print("Raw input:", result.raw_input)
    print("Processed input:", result.processed_input)
    print("Metadata:", result.metadata)
    print("Analysis:", result.analysis)
    print("Insights:", result.insights)
    print("Final output:", result.final_output)
    print("Confidence:", result.confidence)

Practice Exercises

Exercise 1: Custom Signatures

Create an article summarization signature that includes article content, summary, keywords, and other fields.

# Your exercise code
class ArticleSummary(dspy.Signature):
    """Your article summary signature"""
    # Define your fields
    pass

Exercise 2: Composite Modules

Build a complex module containing multiple processing steps.

class ComplexModule(dspy.Module):
    def __init__(self):
        super().__init__()
        # Define your sub-modules
        pass

    def forward(self, **kwargs):
        # Implement processing logic
        pass

Exercise 3: Predictor Comparison

Compare how different predictors perform on the same task, for example in accuracy and latency.

def compare_my_predictors():
    # Implement your comparison experiment
    pass
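
As a possible starting point for this exercise, the sketch below measures exact-match accuracy and elapsed time for any set of callables on a small dataset. The predictors here are stubs; in your own experiment you would pass functions that call your DSPy modules. All names (`evaluate`, `dataset`, the stub predictors) are hypothetical.

```python
import time
from typing import Callable, Dict, List, Tuple


def evaluate(
    predictors: Dict[str, Callable[[str], str]],
    dataset: List[Tuple[str, str]],
) -> Dict[str, Dict[str, float]]:
    """Run each predictor over the dataset and record accuracy and seconds."""
    results: Dict[str, Dict[str, float]] = {}
    for name, predict in predictors.items():
        start = time.perf_counter()
        correct = sum(1 for question, gold in dataset if predict(question).strip() == gold)
        elapsed = time.perf_counter() - start
        results[name] = {"accuracy": correct / len(dataset), "seconds": elapsed}
    return results


# Tiny illustrative dataset of (question, expected answer) pairs
dataset = [("2 + 2", "4"), ("3 * 5", "15"), ("10 - 7", "3")]
lookup = dict(dataset)

# Stub predictors standing in for LM-backed modules
predictors = {
    "always_four": lambda q: "4",
    "lookup": lambda q: lookup[q],
}

results = evaluate(predictors, dataset)
for name, stats in results.items():
    print(name, round(stats["accuracy"], 2))
```

Exact string match is a strict metric; for free-form answers you may want a looser comparison.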

Best Practices

Make signature descriptions clear and specific to help the model understand the task
Use different types of predictors appropriately - Predict for simple tasks, ChainOfThought for complex reasoning
Module design should consider reusability and maintainability
Pay attention to data flow and format conversion between modules

Performance Tips

Avoid creating predictor instances in loops
Cache repeated computation results
Set the max_tokens parameter appropriately
Consider using batch processing to improve efficiency
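
The first two tips can be illustrated without a model. In the sketch below, a stub function stands in for an expensive predictor call; it is defined once outside the loop and wrapped with `functools.lru_cache` so that a repeated question is answered from the cache. `slow_predict` and `cached_predict` are hypothetical names.

```python
from functools import lru_cache

call_count = 0


def slow_predict(question: str) -> str:
    """Stand-in for an expensive LM-backed predictor call."""
    global call_count
    call_count += 1
    return f"answer to: {question}"


@lru_cache(maxsize=256)
def cached_predict(question: str) -> str:
    """Return a cached answer when the same question was seen before."""
    return slow_predict(question)


questions = ["What is DSPy?", "What is a signature?", "What is DSPy?"]

# The predictor is created once, outside the loop, and reused for every item
answers = [cached_predict(q) for q in questions]

print(call_count)
print(cached_predict.cache_info())
```

`lru_cache` requires hashable arguments, so this pattern fits string inputs; structured inputs would need to be converted to a hashable key first.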

Chapter Summary

Through this chapter, you have mastered:

Signature Mechanism: Defining clear input/output specifications
Module Building: Creating reusable functional components
Predictor Usage: Choosing appropriate reasoning methods
Data Processing: Handling different types of inputs and outputs
Composition Patterns: Building complex processing pipelines

These core concepts lay the foundation for building more complex DSPy applications. In the next chapter, we will dive deeper into the detailed usage and best practices of various predictors.
