Chapter 11: Security, RBAC, and Multi-Tenancy


Learning Objectives
  • Configure Role-Based Access Control (RBAC) in Airflow 3.x
  • Manage users, roles, and permissions
  • Secure connections and variables with secrets backends
  • Implement multi-tenancy patterns for team isolation

Knowledge Points

11.1 Role-Based Access Control (RBAC)

Airflow 3.x has RBAC enabled by default with a modernized permission model; the user, role, and permission features shown in this chapter are provided by the Flask-AppBuilder (FAB) auth manager.

Built-in Roles:

Role      Description
Admin     Full access to all features
Op        Operational access (DAGs, connections, pools)
User      DAG-level access (view, trigger, edit)
Viewer    Read-only access to DAGs and logs
Public    No access (used for unauthenticated users)
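
These built-in roles interact with a handful of settings in webserver_config.py when the FAB auth manager is in use. A minimal sketch, assuming you want self-registered users to land in the Viewer role (adjust to your own policy):

# webserver_config.py (FAB auth manager) -- role-related settings
AUTH_ROLE_ADMIN = "Admin"                 # role treated as the admin role
AUTH_ROLE_PUBLIC = "Public"               # role applied to unauthenticated requests
AUTH_USER_REGISTRATION = True             # allow users to self-register
AUTH_USER_REGISTRATION_ROLE = "Viewer"    # role granted to newly registered users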

11.2 User Management

# Create users with different roles
airflow users create \
    --username admin \
    --firstname Admin \
    --lastname User \
    --role Admin \
    --email admin@example.com \
    --password 'secure_password'

airflow users create \
    --username data_engineer \
    --firstname Data \
    --lastname Engineer \
    --role Op \
    --email engineer@example.com \
    --password 'engineer_pass'

airflow users create \
    --username analyst \
    --firstname Data \
    --lastname Analyst \
    --role Viewer \
    --email analyst@example.com \
    --password 'analyst_pass'

# List all users
airflow users list

# Delete a user
airflow users delete --username old_user

# Export users
airflow users export users.json

# Import users
airflow users import users.json

11.3 Custom Roles and Permissions

"""
File: webserver_config.py (in AIRFLOW_HOME)
Configure custom roles with fine-grained permissions.
"""
# Custom role configuration
# In Airflow 3.x, manage roles through the Web UI or CLI

# Available permission types:
# - can_read       : View/read access
# - can_edit       : Modify/update access
# - can_create     : Create new items
# - can_delete     : Remove items

# Resource types include:
# - DAGs           : Individual DAGs or all DAGs
# - Connections    : Connection management
# - Variables      : Variable management
# - Pools          : Pool management
# - Config         : Airflow configuration
# - Audit Log      : Audit trail

# Create a custom role via CLI
airflow roles create "DataTeamLead"

# Add permissions to the custom role (one resource/action pair per call)
airflow roles add-perms "DataTeamLead" --resource "DAGs" --action "can_read"
airflow roles add-perms "DataTeamLead" --resource "DAGs" --action "can_edit"
airflow roles add-perms "DataTeamLead" --resource "Connections" --action "can_read"
airflow roles add-perms "DataTeamLead" --resource "Task Instances" --action "can_read"

# List roles
airflow roles list

# Assign role to user
airflow users add-role --username data_engineer --role DataTeamLead

11.4 DAG-Level Access Control

"""
File: dags/restricted_dag.py
Restrict DAG access to specific roles.
"""
from airflow.decorators import dag, task
from datetime import datetime

@dag(
    dag_id="finance_pipeline",
    start_date=datetime(2026, 1, 1),
    schedule="@daily",
    catchup=False,
    tags=["finance", "restricted"],
    # Roles listed here get the specified DAG-level permissions;
    # roles that are not listed get no access to this DAG
    access_control={
        "finance_team": {"can_read", "can_edit", "can_delete"},
        "Admin": {"can_read", "can_edit", "can_delete"},
        "Viewer": {"can_read"},
    },
)
def finance_pipeline():

    @task
    def process_financial_data():
        print("Processing sensitive financial data...")
        return {"revenue": 1000000, "status": "processed"}

    @task
    def generate_financial_report(data: dict):
        print(f"Generating financial report: revenue=${data['revenue']:,}")

    data = process_financial_data()
    generate_financial_report(data)

finance_pipeline()
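
After changing a DAG's access_control, the role-to-DAG permission mappings may need to be re-synchronized so existing roles pick up the change. A quick sketch, assuming your deployment exposes the sync-perm command (run it where the DAG files are parseable):

# Re-sync permissions, including per-DAG access_control entries
airflow sync-perm --include-dags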

11.5 Securing Connections with Secrets Backends

# airflow.cfg configuration for different secrets backends
# Note: backend_kwargs must be a single-line JSON string in airflow.cfg;
# it is wrapped across lines below only for readability.

# --- AWS Secrets Manager ---
# [secrets]
# backend = airflow.providers.amazon.aws.secrets.secrets_manager.SecretsManagerBackend
# backend_kwargs = {
#     "connections_prefix": "airflow/connections",
#     "variables_prefix": "airflow/variables",
#     "config_prefix": "airflow/config"
# }

# --- HashiCorp Vault ---
# [secrets]
# backend = airflow.providers.hashicorp.secrets.vault.VaultBackend
# backend_kwargs = {
#     "connections_path": "connections",
#     "variables_path": "variables",
#     "mount_point": "airflow",
#     "url": "http://vault:8200"
# }

# --- GCP Secret Manager ---
# [secrets]
# backend = airflow.providers.google.cloud.secrets.secret_manager.CloudSecretManagerBackend
# backend_kwargs = {
#     "connections_prefix": "airflow-connections",
#     "variables_prefix": "airflow-variables",
#     "gcp_project_id": "my-gcp-project"
# }

# Example: Store secrets in AWS Secrets Manager
aws secretsmanager create-secret \
    --name "airflow/connections/production_db" \
    --secret-string '{
        "conn_type": "postgres",
        "host": "prod-db.internal",
        "port": 5432,
        "login": "app_user",
        "password": "super_secret_password",
        "schema": "production"
    }'

# Example: Store in HashiCorp Vault
vault kv put airflow/connections/production_db \
    conn_type=postgres \
    host=prod-db.internal \
    port=5432 \
    login=app_user \
    password=super_secret_password \
    schema=production
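
Once a secrets backend is configured, DAG and hook code does not change: a conn_id is resolved against the secrets backend first, then environment variables, then the metadata database. A minimal sketch, assuming the "production_db" secret created above exists (the import path below is the long-standing Airflow location; adjust it if your version exposes BaseHook under the task SDK instead):

from airflow.hooks.base import BaseHook

# Resolved from the secrets backend before the metadata DB is consulted
conn = BaseHook.get_connection("production_db")
print(conn.host, conn.login, conn.schema)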

11.6 Fernet Key Encryption

Airflow uses Fernet encryption to protect sensitive data in the metadata database:

# Generate a Fernet key
python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"

# Set in airflow.cfg or environment
export AIRFLOW__CORE__FERNET_KEY='your_generated_fernet_key_here'

# Rotate Fernet keys (comma-separated, new key first)
export AIRFLOW__CORE__FERNET_KEY='new_key,old_key'

# After rotation, re-encrypt existing connections and variables with the new key
airflow rotate-fernet-key

# Once re-encryption is complete, drop the old key from the configuration
export AIRFLOW__CORE__FERNET_KEY='new_key'
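
For intuition, this is what Fernet does with that key: symmetric, authenticated encryption of credential fields before they are written to the metadata database. A minimal standalone sketch using the cryptography library:

from cryptography.fernet import Fernet

key = Fernet.generate_key()                    # the value AIRFLOW__CORE__FERNET_KEY holds
f = Fernet(key)
token = f.encrypt(b"super_secret_password")    # ciphertext as stored in the metadata DB
print(f.decrypt(token).decode())               # decrypted transparently on read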

11.7 Multi-Tenancy Patterns

"""
File: dags/multi_tenant_setup.py
Implement team-based isolation patterns.
"""
from airflow.decorators import dag, task
from datetime import datetime


# Pattern 1: DAG ownership via tags and access_control
def create_team_dag(team_name: str, dag_id: str, schedule: str):
    """Factory that creates team-scoped DAGs."""

    @dag(
        dag_id=f"{team_name}_{dag_id}",
        start_date=datetime(2026, 1, 1),
        schedule=schedule,
        catchup=False,
        tags=[team_name, "team-managed"],
        default_args={"owner": team_name},
        access_control={
            f"{team_name}_role": {"can_read", "can_edit"},
            "Admin": {"can_read", "can_edit", "can_delete"},
        },
    )
    def team_dag():
        @task
        def team_task():
            print(f"Running task for team: {team_name}")

        team_task()

    return team_dag()


# Create DAGs for different teams
sales_etl = create_team_dag("sales", "daily_etl", "@daily")
marketing_report = create_team_dag("marketing", "weekly_report", "@weekly")
engineering_deploy = create_team_dag("engineering", "deploy_check", "0 9 * * 1-5")
"""
File: dags/pool_isolation.py
Use pools for resource isolation between teams.
"""
from airflow.decorators import dag, task
from datetime import datetime

# Pools limit concurrent task execution per resource group
# Create pools via CLI or UI:
# airflow pools set sales_pool 5 "Pool for sales team tasks"
# airflow pools set ml_pool 3 "Pool for ML team GPU tasks"

@dag(
    dag_id="pool_isolation_demo",
    start_date=datetime(2026, 1, 1),
    schedule="@daily",
    catchup=False,
    tags=["pools", "isolation"],
)
def pool_isolation_demo():

    @task(pool="sales_pool")
    def sales_task_1():
        """Limited by sales_pool concurrency."""
        print("Sales task running (uses sales_pool slot)")

    @task(pool="sales_pool")
    def sales_task_2():
        print("Another sales task (uses sales_pool slot)")

    @task(pool="ml_pool", pool_slots=2)
    def ml_training_task():
        """Uses 2 slots from ml_pool (GPU-intensive)."""
        print("ML training task (uses 2 ml_pool slots)")

    @task(pool="default_pool")
    def general_task():
        print("General task in default pool")

    sales_task_1()
    sales_task_2()
    ml_training_task()
    general_task()

pool_isolation_demo()

11.8 Authentication Backends

"""
File: webserver_config.py
Configure authentication methods.
"""
# The AUTH_* constants come from Flask-AppBuilder:
# from flask_appbuilder.security.manager import AUTH_DB, AUTH_LDAP, AUTH_OAUTH

# LDAP Authentication
# AUTH_TYPE = AUTH_LDAP
# AUTH_LDAP_SERVER = "ldap://ldap.example.com"
# AUTH_LDAP_USE_TLS = True
# AUTH_LDAP_SEARCH = "ou=people,dc=example,dc=com"
# AUTH_LDAP_UID_FIELD = "uid"

# OAuth Authentication (e.g., Google)
# AUTH_TYPE = AUTH_OAUTH
# OAUTH_PROVIDERS = [{
#     "name": "google",
#     "token_key": "access_token",
#     "icon": "fa-google",
#     "remote_app": {
#         "client_id": "YOUR_CLIENT_ID",
#         "client_secret": "YOUR_CLIENT_SECRET",
#         "api_base_url": "https://www.googleapis.com/oauth2/v2/",
#         "client_kwargs": {"scope": "email profile"},
#         "access_token_url": "https://accounts.google.com/o/oauth2/token",
#         "authorize_url": "https://accounts.google.com/o/oauth2/auth",
#     },
# }]
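
When an external identity provider handles login, you typically also want its groups mapped onto Airflow roles. A sketch of the Flask-AppBuilder settings for this, assuming your provider returns group names such as "airflow_admins" (adjust the keys to whatever your IdP actually sends):

# Sync roles from the identity provider on every login
# AUTH_ROLES_SYNC_AT_LOGIN = True
# AUTH_ROLES_MAPPING = {
#     "airflow_admins": ["Admin"],
#     "airflow_users": ["User"],
# }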

Practice Exercise

Exercise: Set Up a Multi-Team Environment

#!/bin/bash
# setup_multi_tenant.sh
# Set up a multi-tenant Airflow environment

# Create teams and roles
airflow roles create "data_engineering"
airflow roles create "data_science"
airflow roles create "analytics"

# Create pools for resource isolation
airflow pools set data_eng_pool 10 "Data engineering tasks"
airflow pools set data_sci_pool 5 "Data science tasks (GPU)"
airflow pools set analytics_pool 8 "Analytics and reporting"

# Create users for each team
airflow users create \
    --username de_lead \
    --firstname Data \
    --lastname "Eng Lead" \
    --role data_engineering \
    --email de_lead@example.com \
    --password 'de_password'

airflow users create \
    --username ds_lead \
    --firstname Data \
    --lastname "Sci Lead" \
    --role data_science \
    --email ds_lead@example.com \
    --password 'ds_password'

airflow users create \
    --username analyst1 \
    --firstname Business \
    --lastname Analyst \
    --role analytics \
    --email analyst1@example.com \
    --password 'analyst_password'

echo "Multi-tenant setup complete!"
airflow users list
airflow roles list
airflow pools list

Summary

In this chapter, you learned:

  • Airflow 3.x has built-in RBAC with Admin, Op, User, Viewer, and Public roles
  • Custom roles enable fine-grained permission control at the DAG level
  • Secrets backends (AWS, Vault, GCP) securely store credentials outside Airflow
  • Fernet encryption protects sensitive data in the metadata database
  • Multi-tenancy is achieved through access control, pools, and team-scoped DAGs