Chapter 02: DSPy Core Concepts and Components
- Master DSPy’s Signature mechanism
- Understand the concept and usage of Modules
- Learn the working principles of Predictors
- Understand data types and structures in DSPy
- Master basic input/output processing
Key Concepts
1. DSPy Signatures
Signatures are the central concept in DSPy: each one defines the input/output specification of a language model program.
Basic Structure of Signatures
class MySignature(dspy.Signature):
    """Task description docstring"""
    input_field = dspy.InputField(desc="Input field description")
    output_field = dspy.OutputField(desc="Output field description")
Field Types and Attributes
| Field Type | Purpose | Example |
|---|---|---|
| InputField | Define input parameters | question = dspy.InputField() |
| OutputField | Define output results | answer = dspy.OutputField() |
Field Attribute Parameters
# Detailed field attributes
field = dspy.InputField(
    desc="Field function description",  # Describes the field's purpose to the model
    prefix="Prefix:",                   # Label placed before the field in the prompt
    format=lambda x: x.strip()          # Function that formats the value for the prompt
)
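Besides class-based signatures, later examples in this chapter use the shorthand string form, such as "question -> question_type". The sketch below is a plain-Python illustration of how such a string maps to input and output field names. The function `parse_signature` is hypothetical and is not DSPy's own parser.

```python
def parse_signature(spec: str) -> tuple[list[str], list[str]]:
    """Split 'a, b -> c, d' into (input names, output names).

    Illustrative only: a hypothetical helper, not part of DSPy.
    """
    if spec.count("->") != 1:
        raise ValueError(f"expected exactly one '->' in {spec!r}")
    left, right = spec.split("->")
    # Field names are comma-separated on each side of the arrow.
    inputs = [name.strip() for name in left.split(",") if name.strip()]
    outputs = [name.strip() for name in right.split(",") if name.strip()]
    if not inputs or not outputs:
        raise ValueError(f"need at least one input and one output in {spec!r}")
    return inputs, outputs


print(parse_signature("question -> answer"))
print(parse_signature("text -> entities, relationships"))
```

Reading a signature this way makes it easier to check that the keyword arguments passed to a predictor match the names on the left of the arrow, and that the attributes read from the result match the names on the right.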
2. DSPy Modules
Modules are the basic building blocks of DSPy, providing reusable functional components.
Basic Module Types
# Basic prediction module
class BasicPredict(dspy.Module):
    def __init__(self, signature):
        super().__init__()
        self.predict = dspy.Predict(signature)
    def forward(self, **kwargs):
        return self.predict(**kwargs)
Chain of Thought Module
# Chain of Thought module
class CoTModule(dspy.Module):
    def __init__(self, signature):
        super().__init__()
        self.predict = dspy.ChainOfThought(signature)
    def forward(self, **kwargs):
        return self.predict(**kwargs)
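Both modules above follow the same pattern: sub-components are created in `__init__`, and the logic lives in `forward`, which runs when the module instance is called. A minimal plain-Python sketch of that calling convention follows. `TinyModule`, `Shout`, and `Pipeline` are hypothetical names used for illustration and do not reproduce `dspy.Module`'s implementation.

```python
class TinyModule:
    """Hypothetical stand-in for a module base class (not dspy.Module)."""

    def __call__(self, *args, **kwargs):
        # Calling the instance delegates to forward().
        return self.forward(*args, **kwargs)

    def forward(self, *args, **kwargs):
        raise NotImplementedError


class Shout(TinyModule):
    """Leaf module: upper-cases its input and appends '!'."""

    def forward(self, text):
        return text.upper() + "!"


class Pipeline(TinyModule):
    """Composite module: holds sub-modules and chains them in forward()."""

    def __init__(self):
        self.first = Shout()
        self.second = Shout()

    def forward(self, text):
        return self.second(self.first(text))


print(Pipeline()("hello"))  # HELLO!!
```

Because composite modules call their sub-modules the same way users call the composite, modules can be nested to any depth without changing the calling code.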
3. DSPy Predictors
Predictors are the components that actually call the language model; each type suits a different kind of task.
Predictor Hierarchy
Predictor Base Class
├── Predict - Basic prediction
├── ChainOfThought - Chain of thought reasoning
├── ProgramOfThought - Generates and runs code to compute the answer
├── ReAct - Reasoning-action loop with tools
└── Retrieve - Retrieval step (fetches passages; does not itself call the language model)
4. Data Types and Structures
DSPy Built-in Data Types
# Basic data types: a field without a type annotation carries text (str)
text_field = dspy.InputField()   # Plain text
list_field = dspy.OutputField()  # List content, returned as text to be parsed
dict_field = dspy.InputField()   # Dictionary content, passed as text
# Complex data types
class ComplexData(dspy.Signature):
    structured_input = dspy.InputField(desc="Structured input")
    json_output = dspy.OutputField(desc="JSON format output")
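A field described as "JSON format output" still arrives as text, so the caller has to parse it and decide what to do when the model wraps the JSON in extra words or returns something invalid. The sketch below shows one defensive approach using only the standard library. The helper name `parse_json_output` is an assumption made for this example, not a DSPy function.

```python
import json


def parse_json_output(raw: str, default=None):
    """Parse the span between the first '{' and the last '}' in raw.

    Returns `default` when no such span exists or it is not valid JSON.
    Hypothetical helper for illustration; not part of DSPy.
    """
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        return default
    try:
        return json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return default


print(parse_json_output('Result: {"name": "Apple", "founded": 1976}'))
print(parse_json_output("no structured data here", default={}))
```

Returning a default instead of raising keeps a multi-stage module running when a single field is malformed; raising may be preferable when downstream stages cannot work without the data.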
Example Code
Signature Definition Examples
import dspy
# 1. Simple Q&A signature
class SimpleQA(dspy.Signature):
    """Answer user questions"""
    question = dspy.InputField(desc="User's question")
    answer = dspy.OutputField(desc="Concise answer")
# 2. Complex analysis signature
class TextAnalysis(dspy.Signature):
    """Text sentiment analysis and keyword extraction"""
    text = dspy.InputField(desc="Text to be analyzed")
    sentiment = dspy.OutputField(desc="Sentiment: positive/negative/neutral")
    keywords = dspy.OutputField(desc="Keyword list, comma-separated")
    confidence = dspy.OutputField(desc="Confidence score (0-1)")
# 3. Multi-step reasoning signature
class MathWordProblem(dspy.Signature):
    """Solve math word problems"""
    problem = dspy.InputField(desc="Math word problem")
    reasoning = dspy.OutputField(desc="Problem-solving approach and steps")
    answer = dspy.OutputField(desc="Final answer")
# Using signatures
def demo_signatures():
    # Configure model.
    # Note: dspy.OpenAI is the legacy client; newer DSPy releases use
    # dspy.LM("openai/gpt-3.5-turbo") together with dspy.configure(lm=lm).
    lm = dspy.OpenAI(model="gpt-3.5-turbo")
    dspy.settings.configure(lm=lm)
    # Simple Q&A
    qa_predictor = dspy.Predict(SimpleQA)
    result1 = qa_predictor(question="What is machine learning?")
    print(f"Q&A result: {result1.answer}")
    # Text analysis
    analysis_predictor = dspy.ChainOfThought(TextAnalysis)
    result2 = analysis_predictor(
        text="The weather is really nice today, I'm very happy!"
    )
    print(f"Sentiment analysis: {result2.sentiment}")
    print(f"Keywords: {result2.keywords}")
    print(f"Confidence: {result2.confidence}")
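The three outputs of `TextAnalysis` come back as strings, even though the descriptions ask for a label, a comma-separated list, and a score between 0 and 1. A minimal sketch of converting them into Python values is shown below. The helper names are hypothetical, and the fallback values ("unknown", 0.0) are assumptions chosen for this example.

```python
ALLOWED_SENTIMENTS = {"positive", "negative", "neutral"}


def parse_sentiment(raw: str) -> str:
    """Normalize the label; anything outside the allowed set becomes 'unknown'."""
    label = raw.strip().lower()
    return label if label in ALLOWED_SENTIMENTS else "unknown"


def parse_keywords(raw: str) -> list[str]:
    """Split a comma-separated string and drop empty entries."""
    return [k.strip() for k in raw.split(",") if k.strip()]


def parse_confidence(raw: str, default: float = 0.0) -> float:
    """Convert to float and clamp to [0, 1]; fall back to default if not numeric."""
    try:
        value = float(raw.strip())
    except ValueError:
        return default
    return min(1.0, max(0.0, value))


print(parse_sentiment(" Positive "))
print(parse_keywords("weather, happy, "))
print(parse_confidence("0.85"))
```

Validating outputs at the boundary like this keeps the rest of the program working with well-typed values, whatever formatting the model happens to choose.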
Module Building Examples
class SmartQAModule(dspy.Module):
    """Smart Q&A module"""
    def __init__(self):
        super().__init__()
        # Define internal components
        self.classifier = dspy.Predict("question -> question_type")
        self.simple_qa = dspy.Predict(SimpleQA)
        self.complex_qa = dspy.ChainOfThought(MathWordProblem)
    def forward(self, question):
        # 1. Question classification (lower-cased so "Math" and "math" both match)
        question_type = self.classifier(question=question).question_type.lower()
        # 2. Choose processing method based on type
        if "math" in question_type or "calculation" in question_type:
            # Use chain of thought for math problems
            result = self.complex_qa(problem=question)
            return dspy.Prediction(
                answer=result.answer,
                reasoning=result.reasoning
            )
        else:
            # Use simple prediction for general questions
            result = self.simple_qa(question=question)
            return dspy.Prediction(
                answer=result.answer,
                reasoning="Direct answer"
            )
# Using the module
def demo_modules():
    smart_qa = SmartQAModule()
    # Test general question
    result1 = smart_qa("What is artificial intelligence?")
    print(f"General Q&A: {result1.answer}")
    # Test math question
    result2 = smart_qa("Xiaoming has 10 apples, ate 3, how many left?")
    print(f"Math Q&A: {result2.answer}")
    print(f"Reasoning process: {result2.reasoning}")
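The routing decision inside `forward` depends on free-form text from the classifier, so it is worth testing separately from any model call. The sketch below isolates that decision as a pure function and exercises it with stub handlers. `route`, `answer`, and the stubs are hypothetical names for illustration.

```python
MATH_HINTS = ("math", "calculation")


def route(question_type: str) -> str:
    """Map a free-form classifier label to 'complex' or 'simple'."""
    label = question_type.strip().lower()
    return "complex" if any(hint in label for hint in MATH_HINTS) else "simple"


def answer(question, classify, simple, complex_):
    """Pick a handler based on the classifier's label, then call it."""
    handler = complex_ if route(classify(question)) == "complex" else simple
    return handler(question)


# Stub components standing in for the LM-backed predictors.
def stub_classify(question):
    return "Math word problem" if any(ch.isdigit() for ch in question) else "General"


def stub_simple(question):
    return "simple path"


def stub_complex(question):
    return "complex path"


print(answer("What is artificial intelligence?", stub_classify, stub_simple, stub_complex))
print(answer("Xiaoming has 10 apples, ate 3, how many left?", stub_classify, stub_simple, stub_complex))
```

Keeping the branch logic in a small pure function means it can be unit-tested without an API key, while the predictors themselves are evaluated separately.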
Predictor Detailed Examples
# 1. Basic predictor
class BasicPredict(dspy.Module):
    def __init__(self, signature):
        super().__init__()
        self.predict = dspy.Predict(signature)
    def forward(self, **kwargs):
        return self.predict(**kwargs)
# 2. Chain of Thought predictor
class CoTPredict(dspy.Module):
    def __init__(self, signature):
        super().__init__()
        self.cot = dspy.ChainOfThought(signature)
    def forward(self, **kwargs):
        return self.cot(**kwargs)
# 3. Program of Thought predictor
class PoTPredict(dspy.Module):
    def __init__(self, signature):
        super().__init__()
        self.pot = dspy.ProgramOfThought(signature)
    def forward(self, **kwargs):
        return self.pot(**kwargs)
# Compare different predictors
def compare_predictors():
    # Define math problem signature
    class MathProblem(dspy.Signature):
        """Solve math problems"""
        problem = dspy.InputField()
        answer = dspy.OutputField()
    # Create different predictors
    basic = BasicPredict(MathProblem)
    cot = CoTPredict(MathProblem)
    pot = PoTPredict(MathProblem)
    problem = "Calculate 15 * 23 + 47"
    print("=== Predictor Comparison ===")
    # Basic prediction
    result1 = basic(problem=problem)
    print(f"Basic prediction: {result1.answer}")
    # Chain of thought prediction
    result2 = cot(problem=problem)
    print(f"Chain of thought prediction: {result2.answer}")
    # Program of thought prediction
    result3 = pot(problem=problem)
    print(f"Program of thought prediction: {result3.answer}")
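Printing one answer per predictor shows the outputs but does not measure which predictor is better. A small harness that scores any callable against expected answers makes the comparison repeatable. The sketch below uses stub predictors in place of real ones, and `evaluate` along with the stubs are hypothetical names. The expected value for the sample problem is 15 * 23 + 47 = 392.

```python
import time


def evaluate(predictor, dataset):
    """Return (accuracy, elapsed_seconds) for a callable over (problem, expected) pairs."""
    start = time.perf_counter()
    correct = sum(
        1 for problem, expected in dataset
        if str(predictor(problem)).strip() == expected
    )
    elapsed = time.perf_counter() - start
    return correct / len(dataset), elapsed


DATASET = [
    ("Calculate 15 * 23 + 47", "392"),
    ("Calculate 2 + 2", "4"),
]

# Stub predictors standing in for LM-backed ones.
LOOKUP = dict(DATASET)


def stub_exact(problem):
    return LOOKUP[problem]


def stub_constant(problem):
    return "392"


for name, predictor in [("exact", stub_exact), ("constant", stub_constant)]:
    accuracy, seconds = evaluate(predictor, DATASET)
    print(f"{name}: accuracy={accuracy:.2f}, time={seconds:.4f}s")
```

With real predictors, each callable would wrap a module and return its `answer` field, for example `lambda p: cot(problem=p).answer`.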
Data Processing and Flow Examples
class DataProcessor(dspy.Module):
    """Data processing module"""
    def __init__(self):
        super().__init__()
        self.extractor = dspy.Predict(
            "text -> entities, relationships"
        )
        self.summarizer = dspy.ChainOfThought(
            "entities, relationships -> summary"
        )
    def forward(self, raw_text):
        # 1. Data extraction
        extraction = self.extractor(text=raw_text)
        # 2. Data processing
        processed_entities = self._process_entities(
            extraction.entities
        )
        processed_relationships = self._process_relationships(
            extraction.relationships
        )
        # 3. Generate summary
        summary = self.summarizer(
            entities=processed_entities,
            relationships=processed_relationships
        )
        return dspy.Prediction(
            entities=processed_entities,
            relationships=processed_relationships,
            summary=summary.summary
        )
    def _process_entities(self, entities_text):
        """Split comma-separated entities, dropping empty entries"""
        entities = [e.strip() for e in entities_text.split(',') if e.strip()]
        return entities
    def _process_relationships(self, relationships_text):
        """Split semicolon-separated relationships, dropping empty entries"""
        relationships = [r.strip() for r in relationships_text.split(';') if r.strip()]
        return relationships
# Using the data processor
def demo_data_processing():
    processor = DataProcessor()
    text = """
    Apple Inc. is an American multinational technology company.
    The company was founded by Steve Jobs in 1976.
    Apple mainly produces iPhone, iPad and other products.
    """
    result = processor(raw_text=text)
    print("Entities:", result.entities)
    print("Relationships:", result.relationships)
    print("Summary:", result.summary)
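The two `_process_*` helpers assume the extractor always returns a delimited string. In practice a field may repeat items or, depending on how the signature is typed, already be a list. The sketch below is one way to normalize both cases. `to_items` is a hypothetical helper, and case-insensitive de-duplication is an assumption made for this example.

```python
def to_items(value, sep=","):
    """Normalize a field into a de-duplicated list of non-empty strings.

    Accepts either a delimited string or an iterable of items.
    Duplicates are detected case-insensitively; first spelling wins.
    """
    parts = value.split(sep) if isinstance(value, str) else list(value)
    seen, items = set(), []
    for part in parts:
        item = str(part).strip()
        if item and item.lower() not in seen:
            seen.add(item.lower())
            items.append(item)
    return items


print(to_items("Apple Inc., Steve Jobs, apple inc., "))
print(to_items("Steve Jobs founded Apple; Apple produces iPhone", sep=";"))
```

A single normalizer with a `sep` parameter could replace both `_process_entities` and `_process_relationships` while keeping their current behavior for well-formed input.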
Advanced Composition Examples
class AdvancedPipeline(dspy.Module):
    """Advanced processing pipeline"""
    def __init__(self):
        super().__init__()
        # Multiple processing stages
        self.preprocessor = dspy.Predict(
            "raw_input -> cleaned_input, metadata"
        )
        self.analyzer = dspy.ChainOfThought(
            "cleaned_input, metadata -> analysis, insights"
        )
        self.postprocessor = dspy.Predict(
            "analysis, insights -> final_output, confidence"
        )
    def forward(self, raw_input, context=None):
        # Stage 1: Preprocessing
        stage1 = self.preprocessor(raw_input=raw_input)
        # Stage 2: Analysis (considering context)
        if context:
            analysis_input = f"{stage1.cleaned_input}\n\nContext: {context}"
        else:
            analysis_input = stage1.cleaned_input
        stage2 = self.analyzer(
            cleaned_input=analysis_input,
            metadata=stage1.metadata
        )
        # Stage 3: Post-processing
        stage3 = self.postprocessor(
            analysis=stage2.analysis,
            insights=stage2.insights
        )
        return dspy.Prediction(
            raw_input=raw_input,
            processed_input=stage1.cleaned_input,
            metadata=stage1.metadata,
            analysis=stage2.analysis,
            insights=stage2.insights,
            final_output=stage3.final_output,
            confidence=stage3.confidence
        )
# Using the advanced pipeline
def demo_advanced_pipeline():
    pipeline = AdvancedPipeline()
    result = pipeline(
        raw_input=" This is a test text, with some noise data!!!",
        context="This is a text processing example"
    )
    print("Raw input:", result.raw_input)
    print("Processed input:", result.processed_input)
    print("Metadata:", result.metadata)
    print("Analysis:", result.analysis)
    print("Insights:", result.insights)
    print("Final output:", result.final_output)
    print("Confidence:", result.confidence)
Practice Exercises
Exercise 1: Custom Signatures
Create an article summarization signature that includes article content, summary, keywords, and other fields.
# Your exercise code
class ArticleSummary(dspy.Signature):
    """Your article summary signature"""
    # Define your fields
    pass
Exercise 2: Composite Modules
Build a complex module containing multiple processing steps.
class ComplexModule(dspy.Module):
    def __init__(self):
        super().__init__()
        # Define your sub-modules
        pass
    def forward(self, **kwargs):
        # Implement processing logic
        pass
Exercise 3: Predictor Comparison
Compare the performance differences of different predictors on the same task.
def compare_my_predictors():
    # Implement your comparison experiment
    pass
Best Practices
- Make signature descriptions clear and specific to help the model understand the task
- Match the predictor to the task: Predict for simple tasks, ChainOfThought for complex reasoning
- Design modules for reusability and maintainability
- Pay attention to data flow and format conversion between modules
Performance Tips
- Avoid creating predictor instances in loops
- Cache repeated computation results
- Set the max_tokens parameter appropriately
- Consider using batch processing to improve efficiency
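Two of these tips, creating predictors once and caching repeated results, can be illustrated without a model. In the sketch below, `slow_predict` is a hypothetical stand-in for an expensive language model call, and `functools.lru_cache` from the standard library memoizes results by question text. This is an application-level sketch and says nothing about any caching DSPy may do internally.

```python
from functools import lru_cache

call_count = 0  # Counts how many times the expensive function really runs.


def slow_predict(question: str) -> str:
    """Hypothetical stand-in for an expensive LM call."""
    global call_count
    call_count += 1
    return f"answer to: {question}"


@lru_cache(maxsize=256)
def cached_predict(question: str) -> str:
    # Identical questions are served from the cache after the first call.
    return slow_predict(question)


questions = ["What is DSPy?", "What is a signature?", "What is DSPy?"]
answers = [cached_predict(q) for q in questions]
print(call_count)  # 2: the repeated question did not trigger a second call
```

The same idea applies to predictor objects: build them once, for example in a module's `__init__`, and reuse them inside loops rather than constructing a new one per iteration.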
Chapter Summary
Through this chapter, you have mastered:
- Signature Mechanism: Defining clear input/output specifications
- Module Building: Creating reusable functional components
- Predictor Usage: Choosing appropriate reasoning methods
- Data Processing: Handling different types of inputs and outputs
- Composition Patterns: Building complex processing pipelines
These core concepts lay the foundation for building more complex DSPy applications. In the next chapter, we will dive deeper into the detailed usage and best practices of various predictors.