Chapter 1: Introduction to Apache Airflow 3.x

Haiyue

Learning Objectives
  • Understand what Apache Airflow is and its use cases
  • Learn the key differences between Airflow 3.x and Airflow 2.x
  • Install and configure Airflow 3.x development environment
  • Navigate the Airflow Web UI and CLI

Knowledge Points

1.1 What is Apache Airflow?

Apache Airflow is an open-source platform for programmatically authoring, scheduling, and monitoring workflows. It allows you to define workflows as Directed Acyclic Graphs (DAGs) of tasks using Python code.
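Before installing anything, it helps to see what "a DAG of tasks" means concretely. The sketch below is plain Python (no Airflow required): three hypothetical extract/transform/load tasks plus their dependencies, executed in topological order the way a scheduler would. Airflow expresses the same idea with operators and the `>>` dependency syntax, covered in later chapters.

```python
# A workflow as a Directed Acyclic Graph: tasks plus "runs after" edges.
# Task names here are hypothetical stand-ins for real pipeline steps.
from graphlib import TopologicalSorter

def extract():   return "raw data"
def transform(): return "clean data"
def load():      return "loaded"

tasks = {"extract": extract, "transform": transform, "load": load}

# Each task maps to the set of tasks it depends on.
dag = {"extract": set(), "transform": {"extract"}, "load": {"transform"}}

# static_order() yields a valid execution order, and raises CycleError
# if the graph has a cycle -- the "Acyclic" in DAG.
order = list(TopologicalSorter(dag).static_order())
print(order)  # ['extract', 'transform', 'load']
for name in order:
    tasks[name]()
```

Swapping an edge to create a cycle (e.g. making `extract` depend on `load`) would make the graph unschedulable, which is exactly why Airflow rejects cyclic task dependencies at parse time.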

Common Use Cases:

  • ETL/ELT data pipelines
  • Machine learning model training and deployment
  • Infrastructure automation
  • Report generation and data analytics
  • Data quality monitoring

1.2 Key Features of Airflow 3.x

Airflow 3.x introduces several major improvements over 2.x:

  Feature        | Airflow 2.x     | Airflow 3.x
  ---------------|-----------------|-------------------------------------
  Task Execution | Tightly coupled | Task Execution Interface (decoupled)
  DAG Versioning | Not supported   | Built-in DAG versioning
  UI             | Flask-based     | Modernized React UI
  API            | REST API v1     | Enhanced REST API
  Multi-tenancy  | Limited         | Improved isolation
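The enhanced REST API is easiest to appreciate with a request in hand. The sketch below composes (but does not send) a call that triggers a DAG run. The /api/v2 base path, the dagRuns endpoint name, and bearer-token auth are assumptions about a default local deployment; check your instance's API documentation before relying on them.

```python
# Sketch: composing a call to the Airflow 3.x REST API (served by the
# api-server). The request is built but intentionally not sent.
import json
import urllib.request

BASE_URL = "http://localhost:8080/api/v2"  # assumed local api-server
TOKEN = "<your-jwt-token>"                 # hypothetical placeholder

def build_trigger_request(dag_id: str, conf: dict) -> urllib.request.Request:
    """Build (without sending) a POST that triggers a run of dag_id."""
    return urllib.request.Request(
        url=f"{BASE_URL}/dags/{dag_id}/dagRuns",
        data=json.dumps({"conf": conf}).encode(),
        headers={"Authorization": f"Bearer {TOKEN}",
                 "Content-Type": "application/json"},
        method="POST",
    )

req = build_trigger_request("my_dag", {"run_date": "2026-01-01"})
print(req.method, req.full_url)
# To actually send it: urllib.request.urlopen(req)
```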

1.3 Installing Airflow 3.x

Method 1: Using pip (Development)

# Create a virtual environment
python3 -m venv airflow_venv
source airflow_venv/bin/activate

# Set Airflow home directory
export AIRFLOW_HOME=~/airflow

# Install Airflow 3.x with constraints
pip install "apache-airflow==3.0.0" \
  --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-3.0.0/constraints-3.9.txt"

# Initialize the database
airflow db migrate

# Create an admin user (note: in Airflow 3.x this command is provided by the
# FAB auth manager; the default SimpleAuthManager generates credentials for
# you instead)
airflow users create \
  --username admin \
  --firstname Admin \
  --lastname User \
  --role Admin \
  --email admin@example.com \
  --password admin

Method 2: Using Docker Compose

# docker-compose.yaml
version: '3.8'
services:
  airflow-webserver:
    image: apache/airflow:3.0.0
    command: api-server  # Airflow 3.x replaces the webserver command with api-server
    ports:
      - "8080:8080"
    environment:
      - AIRFLOW__CORE__EXECUTOR=LocalExecutor
      - AIRFLOW__DATABASE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:airflow@postgres/airflow
    volumes:
      - ./dags:/opt/airflow/dags
    depends_on:
      - postgres

  airflow-scheduler:
    image: apache/airflow:3.0.0
    command: scheduler
    environment:
      - AIRFLOW__CORE__EXECUTOR=LocalExecutor
      - AIRFLOW__DATABASE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:airflow@postgres/airflow
    volumes:
      - ./dags:/opt/airflow/dags
    depends_on:
      - postgres

  # Airflow 3.x parses DAGs in a separate dag-processor component
  airflow-dag-processor:
    image: apache/airflow:3.0.0
    command: dag-processor
    environment:
      - AIRFLOW__CORE__EXECUTOR=LocalExecutor
      - AIRFLOW__DATABASE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:airflow@postgres/airflow
    volumes:
      - ./dags:/opt/airflow/dags
    depends_on:
      - postgres

  postgres:
    image: postgres:15
    environment:
      - POSTGRES_USER=airflow
      - POSTGRES_PASSWORD=airflow
      - POSTGRES_DB=airflow
    volumes:
      - postgres-data:/var/lib/postgresql/data

volumes:
  postgres-data:

# Start the environment
docker compose up -d

# Check service status
docker compose ps

1.4 Navigating the Airflow UI

After starting Airflow, access the Web UI at http://localhost:8080.

Key UI Pages:

  • DAGs List - Overview of all DAGs with status and toggle switches
  • DAG Detail - Graph view, Grid view, and Code view of a specific DAG
  • Task Instance - Logs, rendered template, and task details
  • Admin - Connections, Variables, Pools, and XCom management

1.5 Essential CLI Commands

# List all DAGs
airflow dags list

# Trigger a DAG run
airflow dags trigger my_dag

# List tasks in a DAG
airflow tasks list my_dag

# Test a specific task
airflow tasks test my_dag my_task 2026-01-01

# Check Airflow version
airflow version

# View configuration
airflow config list

Practice Exercise

Exercise 1: Setup and Explore

  1. Install Airflow 3.x using either pip or Docker Compose
  2. Start the webserver and scheduler
  3. Log into the Web UI and explore the example DAGs
  4. Use the CLI to list available DAGs and check their status

# Start Airflow standalone (development only)
airflow standalone

# This starts the api-server, scheduler, and supporting components,
# and creates an admin user
# Access at http://localhost:8080 (credentials shown in terminal)

Exercise 2: Explore the Example DAGs

# Enable example DAGs by setting this in airflow.cfg:
# [core]
# load_examples = True
# (equivalently: export AIRFLOW__CORE__LOAD_EXAMPLES=True)

# List example DAGs
airflow dags list | head -20

# Check details of a specific example DAG
airflow dags show example_bash_operator

Summary

In this chapter, you learned:

  • Airflow is a workflow orchestration platform using Python-defined DAGs
  • Airflow 3.x introduces Task Execution Interface, DAG versioning, and improved UI
  • How to install and configure Airflow 3.x for development
  • How to navigate the Web UI and use essential CLI commands