Chapter 4: Parameterized Tests and Data-Driven Testing

Haiyue

October 3, 2025

16min

Learning Objectives

Master the @pytest.mark.parametrize decorator
Learn parameterization with multiple parameters and complex data
Understand best practices for parameterized testing
Master dynamic parameter generation

Knowledge Points

Parameterized Testing Concept

Parameterized testing allows running the same test function multiple times with different input data, which:

Improves test coverage: Validate the same logic with multiple data sets
Reduces code duplication: Avoid writing duplicate code for similar tests
Data-driven testing: Separate test data from test logic
Boundary value testing: Easily test various boundary conditions

Parameterization Decorator Syntax

@pytest.mark.parametrize("parameter_name", [parameter_value_list])
@pytest.mark.parametrize("param_name1,param_name2", [(value1, value2), ...])
@pytest.mark.parametrize("parameter_name", [value_list], ids=[identifier_list])

Advantages of Parameterization

Advantage	Description
Code reuse	One test function handles multiple cases
Clear reports	Each parameter combination has independent test results
Easy to maintain	Adding new test data only requires modifying parameter list
Parallel execution	Each parameter set is an independent test, so cases can run in parallel with a plugin such as pytest-xdist

Example Code

Basic Parameterized Tests

# test_parametrize_basic.py
import pytest

def add(a, b):
    """Simple addition function"""
    return a + b

@pytest.mark.parametrize("a,b,expected", [
    (2, 3, 5),
    (0, 0, 0),
    (-1, 1, 0),
    (10, -5, 5),
    (100, 200, 300)
])
def test_add_function(a, b, expected):
    """Parameterized test for addition function"""
    result = add(a, b)
    assert result == expected

# Single parameter parameterization
@pytest.mark.parametrize("number", [1, 2, 3, 4, 5])
def test_positive_numbers(number):
    """Test positive numbers"""
    assert number > 0
    assert isinstance(number, int)

# String parameterization
@pytest.mark.parametrize("text", [
    "hello",
    "world",
    "pytest",
    "parametrize"
])
def test_string_length(text):
    """Test string length"""
    assert len(text) > 0
    assert isinstance(text, str)

Complex Data Structure Parameterization

# test_parametrize_complex.py
import pytest

# Dictionary parameterization
@pytest.mark.parametrize("user_data", [
    {"name": "Alice", "age": 30, "email": "alice@example.com"},
    {"name": "Bob", "age": 25, "email": "bob@example.com"},
    {"name": "Charlie", "age": 35, "email": "charlie@example.com"}
])
def test_user_validation(user_data):
    """Test user data validation"""
    assert "name" in user_data
    assert "age" in user_data
    assert "email" in user_data
    assert user_data["age"] > 0
    assert "@" in user_data["email"]

# List parameterization
@pytest.mark.parametrize("numbers", [
    [1, 2, 3],
    [10, 20, 30, 40],
    [100],
    [5, 5, 5, 5, 5]
])
def test_list_operations(numbers):
    """Test list operations"""
    assert len(numbers) > 0
    assert sum(numbers) > 0
    assert max(numbers) >= min(numbers)

# Nested data structures
@pytest.mark.parametrize("test_case", [
    {
        "input": {"username": "alice", "password": "secret123"},
        "expected": {"status": "success", "user_id": 1}
    },
    {
        "input": {"username": "bob", "password": "password456"},
        "expected": {"status": "success", "user_id": 2}
    },
    {
        "input": {"username": "invalid", "password": "wrong"},
        "expected": {"status": "error", "message": "Invalid credentials"}
    }
])
def test_login_scenarios(test_case):
    """Test login scenarios"""
    def mock_login(username, password):
        """Mock login function"""
        users = {
            "alice": {"password": "secret123", "user_id": 1},
            "bob": {"password": "password456", "user_id": 2}
        }

        if username in users and users[username]["password"] == password:
            return {"status": "success", "user_id": users[username]["user_id"]}
        else:
            return {"status": "error", "message": "Invalid credentials"}

    input_data = test_case["input"]
    expected = test_case["expected"]

    result = mock_login(input_data["username"], input_data["password"])

    assert result["status"] == expected["status"]
    if "user_id" in expected:
        assert result["user_id"] == expected["user_id"]
    if "message" in expected:
        assert result["message"] == expected["message"]

Multiple Parameterization and Combination Tests

# test_multiple_parametrize.py
import pytest

# Multiple parameterization - Cartesian product
@pytest.mark.parametrize("x", [1, 2, 3])
@pytest.mark.parametrize("y", [10, 20])
def test_multiplication_combinations(x, y):
    """Test multiplication combinations (3×2=6 tests)"""
    result = x * y
    assert result > 0
    assert result == x * y

# Operating system and browser combination tests
@pytest.mark.parametrize("os", ["Windows", "macOS", "Linux"])
@pytest.mark.parametrize("browser", ["Chrome", "Firefox", "Safari"])
def test_browser_os_compatibility(os, browser):
    """Test browser and OS compatibility"""
    # Simulate compatibility check
    incompatible = [
        ("Linux", "Safari"),
        ("Windows", "Safari")
    ]

    if (os, browser) in incompatible:
        pytest.skip(f"{browser} not available on {os}")

    # Execute compatibility test
    assert f"Testing {browser} on {os}" is not None

# Data type and operation combination
@pytest.mark.parametrize("data_type", ["list", "tuple", "set"])
@pytest.mark.parametrize("operation", ["len", "bool", "iter"])
def test_data_type_operations(data_type, operation):
    """Test data type and operation combinations"""
    # Create test data
    test_data = {
        "list": [1, 2, 3],
        "tuple": (1, 2, 3),
        "set": {1, 2, 3}
    }

    data = test_data[data_type]

    if operation == "len":
        assert len(data) == 3
    elif operation == "bool":
        assert bool(data) is True
    elif operation == "iter":
        # Iterating must yield every element (sorted, because set order is not guaranteed)
        assert sorted(iter(data)) == [1, 2, 3]
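
The browser/OS test above builds every combination and skips the unsupported ones at run time. An alternative, sketched here under the assumption that the unsupported pairs are known up front, is to build the Cartesian product with itertools.product and filter it before handing it to parametrize. The names SUPPORTED_PAIRS and test_supported_pair are illustrative and not part of the original examples.

```python
import itertools

import pytest

OSES = ["Windows", "macOS", "Linux"]
BROWSERS = ["Chrome", "Firefox", "Safari"]

# Pairs that do not exist in practice (same list as the skip-based example)
INCOMPATIBLE = {("Linux", "Safari"), ("Windows", "Safari")}

# Build the full 3 x 3 product, then drop the unsupported pairs
SUPPORTED_PAIRS = [
    pair for pair in itertools.product(OSES, BROWSERS)
    if pair not in INCOMPATIBLE
]


@pytest.mark.parametrize("os_name,browser", SUPPORTED_PAIRS)
def test_supported_pair(os_name, browser):
    """Runs once per supported pair; nothing is reported as skipped."""
    assert (os_name, browser) not in INCOMPATIBLE
```

Filtering up front keeps the report free of skipped entries, while the skip-based version keeps the unsupported pairs visible in the output. Which one fits better depends on whether those pairs should be documented in the test run.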

Custom Test IDs and Descriptions

# test_custom_ids.py
import pytest

# Use ids parameter to customize test identifiers
@pytest.mark.parametrize("number,expected", [
    (2, 4),
    (3, 9),
    (4, 16),
    (5, 25)
], ids=["square_of_2", "square_of_3", "square_of_4", "square_of_5"])
def test_square_with_ids(number, expected):
    """Square test with custom IDs"""
    # "number" is used instead of "input" to avoid shadowing the built-in
    assert number ** 2 == expected

# Use a function to generate test IDs
# (its name must not start with "test_", or pytest would collect the helper as a test)
def make_test_id(val):
    """Function to generate test IDs"""
    if isinstance(val, dict):
        return f"user_{val.get('name', 'unknown')}"
    elif isinstance(val, list):
        return f"list_len_{len(val)}"
    else:
        return str(val)

@pytest.mark.parametrize("test_data", [
    {"name": "Alice", "age": 30},
    {"name": "Bob", "age": 25},
    [1, 2, 3, 4],
    [10, 20],
    "simple_string"
], ids=make_test_id)
def test_with_id_function(test_data):
    """Test using function-generated IDs"""
    assert test_data is not None

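# Use pytest.param to set IDs and marks for individual parameters
# ("slow" is a custom mark; register it in the pytest configuration to avoid an unknown-mark warning)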
@pytest.mark.parametrize("value", [
    pytest.param(1, id="positive"),
    pytest.param(0, id="zero"),
    pytest.param(-1, id="negative"),
    pytest.param(1000, id="large_positive", marks=pytest.mark.slow),
    pytest.param(-1000, id="large_negative", marks=pytest.mark.slow)
])
def test_with_param_marks(value):
    """Test using pytest.param"""
    assert isinstance(value, int)
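
The slow mark used above is a custom mark, and pytest warns about marks it does not know. One way to register it, shown here as a minimal sketch that assumes the project has a conftest.py next to the tests, is the pytest_configure hook. The description text after "slow:" is an illustrative choice.

```python
# conftest.py
def pytest_configure(config):
    """Register the custom 'slow' marker so pytest recognizes it."""
    config.addinivalue_line(
        "markers",
        "slow: marks a test case as slow (deselect with -m 'not slow')",
    )
```

The same registration can be done declaratively under the markers option of the pytest configuration file. The hook form is used here only because it keeps the example in Python.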

Dynamic Parameter Generation

# test_dynamic_params.py
import pytest

def generate_test_data():
    """Dynamically generate test data"""
    base_cases = []

    # Generate numeric test cases
    for i in range(1, 6):
        base_cases.append((i, i * 2, i + i))

    # Generate boundary value test cases
    boundary_cases = [
        (0, 0, 0),
        (-1, -2, -1 + -1),
        (100, 200, 100 + 100)
    ]

    return base_cases + boundary_cases

@pytest.mark.parametrize("a,doubled,expected", generate_test_data())
def test_dynamic_addition(a, doubled, expected):
    """Check that a + a and a * 2 agree for dynamically generated parameters"""
    # Each generated tuple is (a, a * 2, a + a)
    assert a + a == expected
    assert a * 2 == doubled

# Read test data from external file
def read_test_data_from_config():
    """Read test data from configuration"""
    # Simulate reading from file or database
    return [
        {"url": "https://api.example.com/users", "method": "GET", "expected_status": 200},
        {"url": "https://api.example.com/posts", "method": "POST", "expected_status": 201},
        {"url": "https://api.example.com/invalid", "method": "GET", "expected_status": 404}
    ]

@pytest.mark.parametrize("api_config", read_test_data_from_config())
def test_api_endpoints(api_config):
    """API endpoint test"""
    def mock_api_call(url, method):
        """Mock API call"""
        if "invalid" in url:
            return 404
        elif method == "POST":
            return 201
        else:
            return 200

    result = mock_api_call(api_config["url"], api_config["method"])
    assert result == api_config["expected_status"]

# Conditional parameter generation
def get_math_test_cases():
    """Generate math test cases based on conditions"""
    import sys

    basic_cases = [
        (2, 3, 5),
        (10, 5, 15)
    ]

    # Add additional tests on specific Python versions
    if sys.version_info >= (3, 8):
        basic_cases.append((100, 200, 300))

    return basic_cases

@pytest.mark.parametrize("a,b,expected", get_math_test_cases())
def test_conditional_math(a, b, expected):
    """Conditional math test"""
    assert a + b == expected
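
The read_test_data_from_config example above simulates an external source with a hard-coded list. A minimal sketch of the real thing, assuming the cases are stored as a JSON array of objects, might look like the following. CASES_JSON stands in for the file contents, and load_cases and test_api_config_shape are hypothetical names introduced for illustration.

```python
import json

import pytest

# Stand-in for the text of a JSON data file (hypothetical content)
CASES_JSON = """
[
  {"url": "https://api.example.com/users", "method": "GET", "expected_status": 200},
  {"url": "https://api.example.com/invalid", "method": "GET", "expected_status": 404}
]
"""


def load_cases(raw_json):
    """Parse JSON text into pytest.param objects with readable IDs."""
    cases = json.loads(raw_json)
    return [
        # ID is "<method>-<last path segment>", e.g. "GET-users"
        pytest.param(case, id=f"{case['method']}-{case['url'].rsplit('/', 1)[-1]}")
        for case in cases
    ]


@pytest.mark.parametrize("api_config", load_cases(CASES_JSON))
def test_api_config_shape(api_config):
    """Every loaded case must carry the keys the API test relies on."""
    assert {"url", "method", "expected_status"} <= api_config.keys()
```

In a project the string would typically come from reading a file next to the tests. Note that the data is loaded at collection time, so a missing or malformed file shows up as a collection error rather than a test failure.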

Parameterization Combined with Fixtures

# test_params_with_fixtures.py
import pytest

@pytest.fixture
def database_connection():
    """Database connection fixture"""
    class MockDB:
        def __init__(self):
            self.data = {
                1: {"name": "Alice", "age": 30},
                2: {"name": "Bob", "age": 25},
                3: {"name": "Charlie", "age": 35}
            }

        def get_user(self, user_id):
            return self.data.get(user_id)

    return MockDB()

@pytest.mark.parametrize("user_id,expected_name", [
    (1, "Alice"),
    (2, "Bob"),
    (3, "Charlie")
])
def test_database_queries(database_connection, user_id, expected_name):
    """Parameterized database query test"""
    user = database_connection.get_user(user_id)
    assert user is not None
    assert user["name"] == expected_name

# Parameterized fixture
@pytest.fixture(params=["json", "xml", "yaml"])
def data_format(request):
    """Parameterized data format fixture"""
    return request.param

@pytest.fixture(params=[10, 100, 1000])
def sample_size(request):
    """Parameterized sample size fixture"""
    return request.param

def test_data_processing(data_format, sample_size):
    """Test data processing (will run 3×3=9 times)"""
    assert data_format in ["json", "xml", "yaml"]
    assert sample_size in [10, 100, 1000]

    # Simulate data processing
    processing_time = sample_size * 0.001  # Assumed processing time
    if data_format == "xml":
        processing_time *= 1.5  # XML processing is slower

    assert processing_time > 0

Conditional Skip and Marks

# test_conditional_params.py
import pytest
import sys

@pytest.mark.parametrize("platform,feature", [
    pytest.param("windows", "ntfs", marks=pytest.mark.skipif(
        sys.platform != "win32", reason="Windows only")),
    pytest.param("linux", "ext4", marks=pytest.mark.skipif(
        not sys.platform.startswith("linux"), reason="Linux only")),
    pytest.param("macos", "apfs", marks=pytest.mark.skipif(
        sys.platform != "darwin", reason="macOS only")),
    ("any", "generic")
])
def test_platform_features(platform, feature):
    """Platform-specific feature test"""
    assert platform is not None
    assert feature is not None

# Slow test marks
@pytest.mark.parametrize("size", [
    10,
    100,
    pytest.param(10000, marks=pytest.mark.slow),
    pytest.param(100000, marks=pytest.mark.slow)
])
def test_performance_with_size(size):
    """Performance test (large datasets marked as slow)"""
    # Simulate processing time
    import time
    if size > 1000:
        time.sleep(0.01)  # Simulate slow operation

    assert size > 0

# Expected failure parameters
@pytest.mark.parametrize("input_val,expected", [
    (1, 1),
    (2, 4),
    (3, 9),
    pytest.param(4, 15, marks=pytest.mark.xfail(reason="Known bug in square calculation"))
])
def test_square_with_known_bug(input_val, expected):
    """Square test with known bug"""
    result = input_val ** 2
    assert result == expected

Parameterized Testing Best Practices

Test Data Organization

# test_data_organization.py
import pytest

# Separate test data at module level
VALID_EMAILS = [
    "user@example.com",
    "test.email+tag@domain.co.uk",
    "user123@test-domain.com"
]

INVALID_EMAILS = [
    "invalid-email",
    "@domain.com",
    "user@",
    ""
]

@pytest.mark.parametrize("email", VALID_EMAILS)
def test_valid_email_format(email):
    """Test valid email format"""
    assert "@" in email
    assert "." in email.split("@")[1]

@pytest.mark.parametrize("email", INVALID_EMAILS)
def test_invalid_email_format(email):
    """Test invalid email format"""
    def is_valid_email(email_str):
        # Require a non-empty local part, an "@", and a dot in the domain
        local, sep, domain = email_str.partition("@")
        return bool(local) and bool(sep) and "." in domain

    assert not is_valid_email(email)

Running Parameterized Tests

# Run all parameterized tests
pytest -v

# Run tests with specific parameters
pytest -k "square_of_3" -v

# Run slow tests
pytest -m slow -v

# Skip slow tests
pytest -m "not slow" -v

# List the collected tests (one entry per parameter set) without running them
pytest --collect-only

Parameterized Testing Tips

Meaningful naming: Use descriptive parameter names and test IDs
Separate data: Separate test data from test logic
Boundary testing: Include boundary values and exceptional cases
Performance considerations: Use appropriate marks for large datasets
Readability: Maintain readability and maintainability of parameterized tests
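
The boundary-testing tip also covers inputs that should raise. One common way to keep normal and exceptional cases in a single parameter list is to pass the expectation itself as a parameter: contextlib.nullcontext for cases that should succeed and pytest.raises for cases that should fail. The sketch below assumes a hypothetical safe_divide function that is not part of the original examples.

```python
from contextlib import nullcontext

import pytest


def safe_divide(a, b):
    """Hypothetical function under test."""
    return a / b


@pytest.mark.parametrize("a,b,expectation", [
    pytest.param(6, 3, nullcontext(2), id="exact"),
    pytest.param(-9, 3, nullcontext(-3), id="negative"),
    pytest.param(0, 5, nullcontext(0), id="zero_numerator"),
    pytest.param(1, 0, pytest.raises(ZeroDivisionError), id="zero_divisor"),
])
def test_safe_divide(a, b, expectation):
    # nullcontext(x) yields x as the expected value;
    # pytest.raises(...) requires the block to raise instead
    with expectation as expected:
        assert safe_divide(a, b) == expected
```

For the raising case the division fails before the comparison runs, so the value bound to expected is never compared.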

Considerations

Avoid over-parameterization: Don’t parameterize just for the sake of it
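Memory usage: Parameter lists are built at collection time, so very large data sets are held in memory before any test runs
Execution time: Each parameter set runs as a separate test, so total execution time grows with the number of combinations
Debugging difficulty: A failure applies to one parameter set only; use clear test IDs so the failing case can be identified and re-run on its own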

Parameterized Test Output Example

$ pytest test_parametrize_basic.py -v

test_parametrize_basic.py::test_add_function[2-3-5] PASSED
test_parametrize_basic.py::test_add_function[0-0-0] PASSED
test_parametrize_basic.py::test_add_function[-1-1-0] PASSED
test_parametrize_basic.py::test_add_function[10--5-5] PASSED
test_parametrize_basic.py::test_add_function[100-200-300] PASSED
test_parametrize_basic.py::test_positive_numbers[1] PASSED
test_parametrize_basic.py::test_positive_numbers[2] PASSED
test_parametrize_basic.py::test_positive_numbers[3] PASSED
test_parametrize_basic.py::test_positive_numbers[4] PASSED
test_parametrize_basic.py::test_positive_numbers[5] PASSED

Parameterized testing is a powerful feature of pytest that can significantly improve test coverage and code reusability, making it a core tool for implementing data-driven testing.
