Chapter 1: Introduction to Apache Airflow 3.x
Haiyue
Learning Objectives
- Understand what Apache Airflow is and its use cases
- Learn the key differences between Airflow 3.x and Airflow 2.x
- Install and configure Airflow 3.x development environment
- Navigate the Airflow Web UI and CLI
Knowledge Points
1.1 What is Apache Airflow?
Apache Airflow is an open-source platform for programmatically authoring, scheduling, and monitoring workflows. It allows you to define workflows as Directed Acyclic Graphs (DAGs) of tasks using Python code.
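Since workflows are just Python files, a DAG definition can be very small. A minimal sketch, assuming Airflow 3.x is installed — the `airflow.sdk` import is the 3.x Task SDK authoring API, and the DAG and task names here are made up for illustration:

```python
# dags/hello_pipeline.py -- a minimal TaskFlow-style DAG (illustrative names)
from airflow.sdk import dag, task


@dag(schedule=None, catchup=False, tags=["example"])
def hello_pipeline():
    @task
    def extract() -> dict:
        # Pretend to pull a record from somewhere
        return {"value": 42}

    @task
    def load(payload: dict) -> None:
        print(f"loaded {payload['value']}")

    # Calling one task with another's output creates the dependency edge
    load(extract())


hello_pipeline()
```

Dropping a file like this into the `dags/` folder is all it takes for the scheduler to pick it up.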
Common Use Cases:
- ETL/ELT data pipelines
- Machine learning model training and deployment
- Infrastructure automation
- Report generation and data analytics
- Data quality monitoring
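Across all of these use cases the scheduler's contract is the same: a task runs only after every task upstream of it has succeeded, so the dependency graph fixes a valid execution order. A plain-Python sketch of that ordering idea, using only the standard library (task names are illustrative):

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# A tiny pipeline: each key depends on the tasks in its set.
deps = {
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}

# static_order() yields tasks with all dependencies satisfied first
order = list(TopologicalSorter(deps).static_order())
print(order)  # ['extract', 'transform', 'load', 'report']
```

Airflow does far more than this (retries, scheduling, distribution), but every DAG run ultimately respects such a topological order.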
1.2 Key Features of Airflow 3.x
Airflow 3.x introduces several major improvements over 2.x:
| Feature | Airflow 2.x | Airflow 3.x |
|---|---|---|
| Task Execution | Tightly coupled | Task Execution Interface (decoupled) |
| DAG Versioning | Not supported | Built-in DAG versioning |
| UI | Flask-based | Modernized React UI |
| API | REST API v1 | Enhanced REST API |
| Multi-tenancy | Limited | Improved isolation |
1.3 Installing Airflow 3.x
Method 1: Using pip (Development)
```bash
# Create a virtual environment
python3 -m venv airflow_venv
source airflow_venv/bin/activate

# Set Airflow home directory
export AIRFLOW_HOME=~/airflow

# Install Airflow 3.x with the matching constraints file
pip install "apache-airflow==3.0.0" \
  --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-3.0.0/constraints-3.9.txt"

# Initialize the metadata database (replaces `airflow db init` from 2.x)
airflow db migrate

# Create an admin user. In 3.x this subcommand comes from the FAB auth
# manager; with the default SimpleAuthManager, credentials are generated
# for you instead (see `airflow standalone` below).
airflow users create \
  --username admin \
  --firstname Admin \
  --lastname User \
  --role Admin \
  --email admin@example.com \
  --password admin
```
Method 2: Using Docker Compose (Recommended)
```yaml
# docker-compose.yaml (minimal; the top-level `version:` key is obsolete
# in Compose v2 and has been dropped)
services:
  airflow-apiserver:
    image: apache/airflow:3.0.0
    # In 3.x, `airflow api-server` replaces the 2.x `airflow webserver`
    command: api-server
    ports:
      - "8080:8080"
    environment:
      - AIRFLOW__CORE__EXECUTOR=LocalExecutor
      - AIRFLOW__DATABASE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:airflow@postgres/airflow
    volumes:
      - ./dags:/opt/airflow/dags
    depends_on:
      - postgres

  airflow-scheduler:
    image: apache/airflow:3.0.0
    command: scheduler
    environment:
      - AIRFLOW__CORE__EXECUTOR=LocalExecutor
      - AIRFLOW__DATABASE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:airflow@postgres/airflow
    volumes:
      - ./dags:/opt/airflow/dags
    depends_on:
      - postgres

  # In 3.x the DAG processor runs as its own component, separate from
  # the scheduler
  airflow-dag-processor:
    image: apache/airflow:3.0.0
    command: dag-processor
    environment:
      - AIRFLOW__CORE__EXECUTOR=LocalExecutor
      - AIRFLOW__DATABASE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:airflow@postgres/airflow
    volumes:
      - ./dags:/opt/airflow/dags
    depends_on:
      - postgres

  postgres:
    image: postgres:15
    environment:
      - POSTGRES_USER=airflow
      - POSTGRES_PASSWORD=airflow
      - POSTGRES_DB=airflow
    volumes:
      - postgres-data:/var/lib/postgresql/data

volumes:
  postgres-data:
```
```bash
# Run the database migrations once before the first start
docker compose run --rm airflow-scheduler airflow db migrate

# Start the environment
docker compose up -d

# Check service status
docker compose ps
```
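The `AIRFLOW__CORE__EXECUTOR`-style entries in the compose file use Airflow's environment-variable override convention: `AIRFLOW__{SECTION}__{KEY}` maps to `[section] key` in `airflow.cfg`. A small stdlib sketch of that mapping (the helper function is hypothetical, for illustration only):

```python
def env_to_config(name: str) -> tuple[str, str]:
    """Map an AIRFLOW__{SECTION}__{KEY} variable to (section, key)."""
    prefix = "AIRFLOW__"
    if not name.startswith(prefix):
        raise ValueError(f"not an Airflow config override: {name}")
    # Split on the FIRST double underscore only, so keys that themselves
    # contain underscores (e.g. SQL_ALCHEMY_CONN) survive intact
    section, _, key = name[len(prefix):].partition("__")
    return section.lower(), key.lower()

print(env_to_config("AIRFLOW__CORE__EXECUTOR"))
# ('core', 'executor')  -> [core] executor = ... in airflow.cfg
print(env_to_config("AIRFLOW__DATABASE__SQL_ALCHEMY_CONN"))
# ('database', 'sql_alchemy_conn')
```

Environment variables take precedence over `airflow.cfg`, which is why container deployments configure Airflow almost entirely this way.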
1.4 Navigating the Airflow UI
After starting Airflow, access the Web UI at http://localhost:8080.
Key UI Pages:
- DAGs List - Overview of all DAGs with status and toggle switches
- DAG Detail - Graph view, Grid view, and Code view of a specific DAG
- Task Instance - Logs, rendered template, and task details
- Admin - Connections, Variables, Pools, and XCom management
1.5 Essential CLI Commands
```bash
# List all DAGs
airflow dags list

# Trigger a DAG run
airflow dags trigger my_dag

# List tasks in a DAG
airflow tasks list my_dag

# Test a single task for a given logical date (runs it in isolation,
# without the scheduler)
airflow tasks test my_dag my_task 2026-01-01

# Check Airflow version
airflow version

# View configuration
airflow config list
```
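One detail worth noting: the date passed to `airflow tasks test` is the run's logical date (the 3.x name for what 2.x called the execution date), written in ISO 8601. A quick stdlib check of how such a string parses:

```python
from datetime import datetime

# `airflow tasks test my_dag my_task 2026-01-01` supplies the logical date
# as ISO 8601; with no time component, midnight is implied.
logical_date = datetime.fromisoformat("2026-01-01")
print(logical_date)  # 2026-01-01 00:00:00
```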
Practice Exercise
Exercise 1: Setup and Explore
- Install Airflow 3.x using either pip or Docker Compose
- Start the webserver and scheduler
- Log into the Web UI and explore the example DAGs
- Use the CLI to list available DAGs and check their status
```bash
# Start Airflow standalone (development only)
airflow standalone

# This starts the api-server, scheduler, and dag-processor in one process
# and generates login credentials.
# Access at http://localhost:8080 (credentials are printed in the terminal)
```
Exercise 2: Explore the Example DAGs
```bash
# Enable example DAGs by setting this in airflow.cfg:
#   [core]
#   load_examples = True

# List example DAGs
airflow dags list | head -20

# Print the structure of a specific example DAG (Graphviz DOT output)
airflow dags show example_bash_operator
```
Summary
In this chapter, you learned:
- Airflow is a workflow orchestration platform using Python-defined DAGs
- Airflow 3.x introduces Task Execution Interface, DAG versioning, and improved UI
- How to install and configure Airflow 3.x for development
- How to navigate the Web UI and use essential CLI commands