Chapter 11: Security, RBAC, and Multi-Tenancy
Learning Objectives
- Configure Role-Based Access Control (RBAC) in Airflow 3.x
- Manage users, roles, and permissions
- Secure connections and variables with secrets backends
- Implement multi-tenancy patterns for team isolation
Knowledge Points
11.1 Role-Based Access Control (RBAC)
Airflow 3.x ships with RBAC enabled by default. The users, roles, and permissions described below are provided by the Flask AppBuilder (FAB) auth manager, which in 3.x is distributed as the apache-airflow-providers-fab package.
Built-in Roles:
| Role | Description |
|---|---|
| Admin | Full access to all features |
| Op | Operational access (DAGs, connections, pools) |
| User | DAG-level access (view, trigger, edit) |
| Viewer | Read-only access to DAGs and logs |
| Public | No access (used for unauthenticated users) |
11.2 User Management
# Create users with different roles
airflow users create \
    --username admin \
    --firstname Admin \
    --lastname User \
    --role Admin \
    --email admin@example.com \
    --password 'secure_password'

airflow users create \
    --username data_engineer \
    --firstname Data \
    --lastname Engineer \
    --role Op \
    --email engineer@example.com \
    --password 'engineer_pass'

airflow users create \
    --username analyst \
    --firstname Data \
    --lastname Analyst \
    --role Viewer \
    --email analyst@example.com \
    --password 'analyst_pass'
# List all users
airflow users list
# Delete a user
airflow users delete --username old_user
# Export users
airflow users export users.json
# Import users
airflow users import users.json
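The export/import round trip is handy for bulk administration: the file is a JSON array of user objects, roughly the shape below (a hedged example; the exact field set may vary by version):
[
    {
        "username": "admin",
        "email": "admin@example.com",
        "firstname": "Admin",
        "lastname": "User",
        "roles": ["Admin"]
    }
]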
11.3 Custom Roles and Permissions
"""
File: webserver_config.py (in AIRFLOW_HOME)
Configure custom roles with fine-grained permissions.
"""
# Custom role configuration
# In Airflow 3.x, manage roles through the Web UI or CLI
# Available permission types:
# - can_read : View/read access
# - can_edit : Modify/update access
# - can_create : Create new items
# - can_delete : Remove items
# Resource types include:
# - DAGs : Individual DAGs or all DAGs
# - Connections : Connection management
# - Variables : Variable management
# - Pools : Pool management
# - Config : Airflow configuration
# - Audit Log : Audit trail
# Create a custom role via CLI
airflow roles create "DataTeamLead"
# Add permissions to the custom role, one resource/action pair per call
# (repeating --resource/--action within a single call only keeps the last pair)
airflow roles add-perms "DataTeamLead" --resource "DAGs" --action "can_read"
airflow roles add-perms "DataTeamLead" --resource "DAGs" --action "can_edit"
airflow roles add-perms "DataTeamLead" --resource "Connections" --action "can_read"
airflow roles add-perms "DataTeamLead" --resource "Task Instances" --action "can_read"
# List roles
airflow roles list
# Assign role to user
airflow users add-role --username data_engineer --role DataTeamLead
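Role definitions can also be exported and imported in bulk, mirroring the user export above; this is useful for promoting the same RBAC setup across environments. A quick sketch:
# Export all roles and their permissions to JSON
airflow roles export roles.json
# Recreate them on another Airflow instance
airflow roles import roles.json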
11.4 DAG-Level Access Control
"""
File: dags/restricted_dag.py
Restrict DAG access to specific roles.
"""
from airflow.decorators import dag, task
from datetime import datetime
@dag(
dag_id="finance_pipeline",
start_date=datetime(2026, 1, 1),
schedule="@daily",
catchup=False,
tags=["finance", "restricted"],
# Only users with 'finance_team' role can access this DAG
access_control={
"finance_team": {"can_read", "can_edit", "can_delete"},
"Admin": {"can_read", "can_edit", "can_delete"},
"Viewer": {"can_read"},
},
)
def finance_pipeline():
@task
def process_financial_data():
print("Processing sensitive financial data...")
return {"revenue": 1000000, "status": "processed"}
@task
def generate_financial_report(data: dict):
print(f"Generating financial report: revenue=${data['revenue']:,}")
data = process_financial_data()
generate_financial_report(data)
finance_pipeline()
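Note that finance_team is not a built-in role: it must be created and assigned before the access_control entry grants anyone access, using the same CLI commands from 11.2 and 11.3:
# Create the custom role referenced in access_control and assign it
airflow roles create "finance_team"
airflow users add-role --username analyst --role finance_team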
11.5 Securing Connections with Secrets Backends
# airflow.cfg configuration for different secrets backends
# Note: backend_kwargs must be a single JSON object on one line

# --- AWS Secrets Manager ---
# [secrets]
# backend = airflow.providers.amazon.aws.secrets.secrets_manager.SecretsManagerBackend
# backend_kwargs = {"connections_prefix": "airflow/connections", "variables_prefix": "airflow/variables", "config_prefix": "airflow/config"}

# --- HashiCorp Vault ---
# [secrets]
# backend = airflow.providers.hashicorp.secrets.vault.VaultBackend
# backend_kwargs = {"connections_path": "connections", "variables_path": "variables", "mount_point": "airflow", "url": "http://vault:8200"}

# --- GCP Secret Manager ---
# [secrets]
# backend = airflow.providers.google.cloud.secrets.secret_manager.CloudSecretManagerBackend
# backend_kwargs = {"connections_prefix": "airflow-connections", "variables_prefix": "airflow-variables", "gcp_project_id": "my-gcp-project"}
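In container or Kubernetes deployments, the same settings are typically injected as environment variables using Airflow's standard AIRFLOW__SECTION__KEY mapping; for example, for the AWS backend:
# Equivalent configuration via environment variables
export AIRFLOW__SECRETS__BACKEND="airflow.providers.amazon.aws.secrets.secrets_manager.SecretsManagerBackend"
export AIRFLOW__SECRETS__BACKEND_KWARGS='{"connections_prefix": "airflow/connections", "variables_prefix": "airflow/variables"}'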
# Example: Store a connection in AWS Secrets Manager
aws secretsmanager create-secret \
    --name "airflow/connections/production_db" \
    --secret-string '{
        "conn_type": "postgres",
        "host": "prod-db.internal",
        "port": 5432,
        "login": "app_user",
        "password": "super_secret_password",
        "schema": "production"
    }'

# Example: Store the same connection in HashiCorp Vault
vault kv put airflow/connections/production_db \
    conn_type=postgres \
    host=prod-db.internal \
    port=5432 \
    login=app_user \
    password=super_secret_password \
    schema=production
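Once a backend is configured, DAG code needs no changes: connection lookups check the secrets backend before falling back to the metadata database. A minimal sketch, assuming the production_db secret created above:
from airflow.hooks.base import BaseHook

# "production_db" resolves to airflow/connections/production_db in the
# configured secrets backend; nothing is stored in Airflow itself
conn = BaseHook.get_connection("production_db")
print(conn.host, conn.port, conn.schema)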
11.6 Fernet Key Encryption
Airflow uses Fernet encryption to protect sensitive data in the metadata database:
# Generate a Fernet key
python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"
# Set in airflow.cfg or environment
export AIRFLOW__CORE__FERNET_KEY='your_generated_fernet_key_here'
# Rotate Fernet keys (comma-separated, new key first; the old key is
# kept so existing rows can still be decrypted during the rotation)
export AIRFLOW__CORE__FERNET_KEY='new_key,old_key'
# Re-encrypt existing connections and variables with the new key;
# once this completes, the old key can be dropped from the config
airflow rotate-fernet-key
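To make the rotation behavior concrete, here is a small standalone sketch using the cryptography library, which Airflow uses under the hood (the keys here are generated on the spot for illustration):
from cryptography.fernet import Fernet, MultiFernet

old_key = Fernet.generate_key()
new_key = Fernet.generate_key()

# Data encrypted while the old key was active
token = Fernet(old_key).encrypt(b"super_secret_password")

# MultiFernet mirrors AIRFLOW__CORE__FERNET_KEY='new_key,old_key':
# it encrypts with the first key but can decrypt with any of them
keys = MultiFernet([Fernet(new_key), Fernet(old_key)])
assert keys.decrypt(token) == b"super_secret_password"

# rotate() re-encrypts the token under the new (first) key, which is
# what `airflow rotate-fernet-key` does for every stored row
rotated = keys.rotate(token)
assert Fernet(new_key).decrypt(rotated) == b"super_secret_password"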
11.7 Multi-Tenancy Patterns
"""
File: dags/multi_tenant_setup.py
Implement team-based isolation patterns.
"""
from airflow.decorators import dag, task
from datetime import datetime
# Pattern 1: DAG ownership via tags and access_control
def create_team_dag(team_name: str, dag_id: str, schedule: str):
"""Factory that creates team-scoped DAGs."""
@dag(
dag_id=f"{team_name}_{dag_id}",
start_date=datetime(2026, 1, 1),
schedule=schedule,
catchup=False,
tags=[team_name, "team-managed"],
default_args={"owner": team_name},
access_control={
f"{team_name}_role": {"can_read", "can_edit"},
"Admin": {"can_read", "can_edit", "can_delete"},
},
)
def team_dag():
@task
def team_task():
print(f"Running task for team: {team_name}")
team_task()
return team_dag()
# Create DAGs for different teams
sales_etl = create_team_dag("sales", "daily_etl", "@daily")
marketing_report = create_team_dag("marketing", "weekly_report", "@weekly")
engineering_deploy = create_team_dag("engineering", "deploy_check", "0 9 * * 1-5")
"""
File: dags/pool_isolation.py
Use pools for resource isolation between teams.
"""
from airflow.decorators import dag, task
from datetime import datetime
# Pools limit concurrent task execution per resource group
# Create pools via CLI or UI:
# airflow pools set sales_pool 5 "Pool for sales team tasks"
# airflow pools set ml_pool 3 "Pool for ML team GPU tasks"
@dag(
dag_id="pool_isolation_demo",
start_date=datetime(2026, 1, 1),
schedule="@daily",
catchup=False,
tags=["pools", "isolation"],
)
def pool_isolation_demo():
@task(pool="sales_pool")
def sales_task_1():
"""Limited by sales_pool concurrency."""
print("Sales task running (uses sales_pool slot)")
@task(pool="sales_pool")
def sales_task_2():
print("Another sales task (uses sales_pool slot)")
@task(pool="ml_pool", pool_slots=2)
def ml_training_task():
"""Uses 2 slots from ml_pool (GPU-intensive)."""
print("ML training task (uses 2 ml_pool slots)")
@task(pool="default_pool")
def general_task():
print("General task in default pool")
sales_task_1()
sales_task_2()
ml_training_task()
general_task()
pool_isolation_demo()
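The pools must exist before these tasks can run: the scheduler will not queue a task whose pool is missing. Creating and inspecting them from the CLI, using the commands referenced in the comments above:
# Create the pools used by the demo DAG
airflow pools set sales_pool 5 "Pool for sales team tasks"
airflow pools set ml_pool 3 "Pool for ML team GPU tasks"

# Inspect current pools and their slot counts
airflow pools list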
11.8 Authentication Backends
"""
File: webserver_config.py
Configure authentication methods.
"""
# LDAP Authentication
# AUTH_TYPE = AUTH_LDAP
# AUTH_LDAP_SERVER = "ldap://ldap.example.com"
# AUTH_LDAP_USE_TLS = True
# AUTH_LDAP_SEARCH = "ou=people,dc=example,dc=com"
# AUTH_LDAP_UID_FIELD = "uid"
# OAuth Authentication (e.g., Google)
# AUTH_TYPE = AUTH_OAUTH
# OAUTH_PROVIDERS = [{
# "name": "google",
# "token_key": "access_token",
# "icon": "fa-google",
# "remote_app": {
# "client_id": "YOUR_CLIENT_ID",
# "client_secret": "YOUR_CLIENT_SECRET",
# "api_base_url": "https://www.googleapis.com/oauth2/v2/",
# "client_kwargs": {"scope": "email profile"},
# "access_token_url": "https://accounts.google.com/o/oauth2/token",
# "authorize_url": "https://accounts.google.com/o/oauth2/auth",
# },
# }]
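For multi-tenant setups it is usually better to map directory groups to Airflow roles at login than to assign roles by hand. A sketch of the relevant FAB settings, in the same commented style as above (the group DNs are illustrative):
# Sync roles from the identity provider on every login
# AUTH_ROLES_SYNC_AT_LOGIN = True
# AUTH_LDAP_GROUP_FIELD = "memberOf"
# AUTH_ROLES_MAPPING = {
#     "cn=data_engineering,ou=groups,dc=example,dc=com": ["Op"],
#     "cn=analysts,ou=groups,dc=example,dc=com": ["Viewer"],
# }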
Practice Exercise
Exercise: Set Up a Multi-Team Environment
#!/bin/bash
# setup_multi_tenant.sh
# Set up a multi-tenant Airflow environment
# Create team roles (grant permissions afterwards with
# `airflow roles add-perms`, as shown in 11.3)
airflow roles create "data_engineering"
airflow roles create "data_science"
airflow roles create "analytics"
# Create pools for resource isolation
airflow pools set data_eng_pool 10 "Data engineering tasks"
airflow pools set data_sci_pool 5 "Data science tasks (GPU)"
airflow pools set analytics_pool 8 "Analytics and reporting"
# Create users for each team
airflow users create \
    --username de_lead \
    --firstname Data \
    --lastname "Eng Lead" \
    --role data_engineering \
    --email de_lead@example.com \
    --password 'de_password'

airflow users create \
    --username ds_lead \
    --firstname Data \
    --lastname "Sci Lead" \
    --role data_science \
    --email ds_lead@example.com \
    --password 'ds_password'

airflow users create \
    --username analyst1 \
    --firstname Business \
    --lastname Analyst \
    --role analytics \
    --email analyst1@example.com \
    --password 'analyst_password'
echo "Multi-tenant setup complete!"
airflow users list
airflow roles list
airflow pools list
Summary
In this chapter, you learned:
- Airflow 3.x has built-in RBAC with Admin, Op, User, Viewer, and Public roles
- Custom roles enable fine-grained permission control at the DAG level
- Secrets backends (AWS, Vault, GCP) securely store credentials outside Airflow
- Fernet encryption protects sensitive data in the metadata database
- Multi-tenancy is achieved through access control, pools, and team-scoped DAGs