⚙️ Data Platform Engineer Guide

Deploy complete Databricks infrastructure in 15 minutes

🎯 What You'll Accomplish

Set up a complete Databricks learning environment where students get instant access to course content, new content automatically deploys, and everything is reproducible.

👥 8 Users
Role-based permissions configured automatically

🗄️ 5 Catalogs
Complete Unity Catalog structure

📚 24 Schemas
Medallion architecture (bronze, silver, gold)

📓 21 Notebooks
Deployed to shared workspace

🚀 Quick Start (15 minutes)

Step 1 Get the Infrastructure Code

# Clone the repository
git clone https://github.com/datatribe-collective-labs/databricks-infra
cd databricks-infra

Step 2 Configure Authentication

Generate a Personal Access Token (PAT):

  1. Log in to your Databricks workspace as an admin
  2. Navigate to: Settings → Developer → Access Tokens
  3. Click "Generate New Token" (a 90-day lifetime is recommended)
  4. Copy the token and store it securely

Configure Databricks CLI:

# Interactive setup
databricks configure --token --profile <profile_name>
# Enter your workspace URL and PAT when prompted

Step 3 Customize for Your Organization

Edit terraform/users.json with your team:

{
  "users": [
    {
      "user_name": "instructor@yourcompany.com",
      "display_name": "Lead Instructor",
      "groups": ["admins"]
    },
    {
      "user_name": "student@yourcompany.com",
      "display_name": "Student Name",
      "groups": ["students"]
    }
  ]
}
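A malformed entry in this file only surfaces once Terraform runs, so it can help to check the file first. The following is a minimal sketch in Python, assuming the structure shown above (a top-level users array whose entries carry user_name, display_name, and groups). The function name validate_users and the specific checks, including the assumption that user_name is always an email address, are illustrative and not part of the repository.

```python
import json

# Keys every entry is assumed to need, based on the example in this guide.
REQUIRED_KEYS = {"user_name", "display_name", "groups"}


def validate_users(raw: str) -> list:
    """Return a list of human-readable problems found in users.json text."""
    data = json.loads(raw)
    users = data.get("users") if isinstance(data, dict) else None
    if not isinstance(users, list):
        return ["top-level 'users' must be a list"]

    problems = []
    seen = set()
    for i, user in enumerate(users):
        if not isinstance(user, dict):
            problems.append(f"entry {i}: must be an object")
            continue
        missing = REQUIRED_KEYS - user.keys()
        if missing:
            problems.append(f"entry {i}: missing {sorted(missing)}")
            continue
        name = user["user_name"]
        # Assumption: user names are email addresses, as in the examples.
        if "@" not in str(name):
            problems.append(f"entry {i}: user_name {name!r} is not an email address")
        if name in seen:
            problems.append(f"entry {i}: duplicate user_name {name!r}")
        seen.add(name)
        if not isinstance(user["groups"], list) or not user["groups"]:
            problems.append(f"entry {i}: 'groups' must be a non-empty list")
    return problems
```

To use it, read terraform/users.json into a string, pass it to validate_users, and only continue to the deploy step when the returned list is empty.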

Step 4 Create Catalogs

In the Databricks UI, create these catalogs (use default storage):

  • sales_dev
  • sales_prod
  • marketing_dev
  • marketing_prod

Step 5 Deploy Infrastructure

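cd terraform
# Initialize Terraform
terraform init
# Import the manually created catalogs (one import per catalog)
terraform import 'databricks_catalog.custom_catalogs["sales_dev"]' sales_dev
terraform import 'databricks_catalog.custom_catalogs["sales_prod"]' sales_prod
terraform import 'databricks_catalog.custom_catalogs["marketing_dev"]' marketing_dev
terraform import 'databricks_catalog.custom_catalogs["marketing_prod"]' marketing_prod
# Review what will be created
terraform plan
# Deploy everything
terraform apply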

✅ What Gets Created

👥 Users & Groups

  • User accounts for all students
  • Groups with appropriate permissions
  • RBAC configured automatically

🗄️ Catalogs & Schemas

  • Reference catalogs (sales, marketing)
  • Course catalog for student work
  • Medallion architecture schemas

📚 Course Content

  • 19 notebooks deployed
  • Sample datasets in shared location
  • Utility notebooks for user isolation

🔒 Permissions

  • User-specific data isolation
  • Read-only access to shared catalogs
  • Write access to personal schemas

👥 Managing Users

Adding New Students

Add an entry to the users array in terraform/users.json:

{
  "user_name": "newstudent@yourcompany.com",
  "display_name": "New Student",
  "groups": ["students"]
}

Then run:

cd terraform
terraform apply

What Happens Automatically:

  • User account created in Databricks
  • Added to workspace group (platform_students)
  • Personal schema created: databricks_course.newstudent
  • Permissions configured based on group membership
  • Data isolation: all of the user's table writes go to their personal schema
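The personal schema in the example above is named after the part of the email address before the @ sign. Below is a minimal sketch of that mapping in Python. Only the newstudent example comes from this guide. The lowercasing and the replacement of other characters with underscores are assumptions, the function name personal_schema is hypothetical, and the repository's actual rule may differ.

```python
import re

# Course catalog name taken from the example in this guide.
COURSE_CATALOG = "databricks_course"


def personal_schema(user_name: str) -> str:
    """Guess the personal schema for a user from their email address.

    Assumption: the schema is the email's local part, lowercased, with any
    character outside [a-z0-9_] replaced by an underscore.
    """
    local_part = user_name.split("@", 1)[0].lower()
    safe = re.sub(r"[^a-z0-9_]", "_", local_part)
    return f"{COURSE_CATALOG}.{safe}"
```

For newstudent@yourcompany.com this yields databricks_course.newstudent, matching the schema listed above. For addresses with dots or other punctuation, check the result against what Terraform actually created.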

Removing Users

⚠️ Warning: Removing a user permanently deletes the user account, the personal schema, and ALL data in it.

To preserve data before removal:

-- Transfer schema ownership first
ALTER SCHEMA databricks_course.student_name OWNER TO `admin@yourcompany.com`;

Then remove the user's entry from terraform/users.json and run terraform apply.
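Editing the JSON by hand is fine for one user. For repeated removals, a small helper can drop the entry and fail loudly if the name is misspelled. This is a sketch only, assuming the users.json layout shown in this guide. The helper name remove_user is hypothetical and not part of the repository.

```python
import json


def remove_user(raw: str, user_name: str) -> str:
    """Return users.json text with the given user dropped.

    Raises ValueError when no entry matches, so a typo does not silently
    leave the file unchanged.
    """
    data = json.loads(raw)
    remaining = [u for u in data["users"] if u.get("user_name") != user_name]
    if len(remaining) == len(data["users"]):
        raise ValueError(f"{user_name} not found in users.json")
    data["users"] = remaining
    return json.dumps(data, indent=2) + "\n"
```

Write the returned text back to terraform/users.json, then review terraform plan before applying, since the plan lists exactly which account and schema will be destroyed.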

🔄 Getting Course Updates

When new course content is released, pull the latest changes and re-apply Terraform:

# 1. Pull latest changes
git pull origin main
# 2. Review what's new (optional)
git log --oneline --since="1 week ago" course/notebooks/
# 3. Deploy updates to student workspace
cd terraform
terraform plan    # Review new content
terraform apply   # Deploy to students
# Users automatically see new notebooks in shared workspace!

🚨 Common Issues

"Only accessible by admins" Error

Root Cause: The PAT belongs to a user without workspace admin privileges

Solution:

  1. Verify you're in the admins group in the Databricks UI
  2. Regenerate the PAT after confirming admin access
  3. Update ~/.databrickscfg with the new token
  4. Re-run Terraform
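After updating ~/.databrickscfg, a quick structural check can catch a missing or empty token before Terraform is retried. The file is INI-formatted, with one section per profile containing host and token keys. The sketch below, with the hypothetical helper name check_profile, only confirms those keys are present. It cannot tell whether the token actually carries admin rights.

```python
import configparser


def check_profile(cfg_text: str, profile: str) -> list:
    """Return problems found for one profile in .databrickscfg-style INI text."""
    # default_section is renamed so that a [DEFAULT] profile is treated like
    # any other section instead of leaking its values into every profile.
    parser = configparser.ConfigParser(
        interpolation=None, default_section="__unused__"
    )
    parser.read_string(cfg_text)
    if not parser.has_section(profile):
        return [f"profile [{profile}] not found"]

    section = parser[profile]
    problems = []
    if not section.get("host", "").startswith("https://"):
        problems.append("host is missing or does not start with https://")
    if not section.get("token"):
        problems.append("token is missing or empty")
    return problems
```

Pass the contents of ~/.databrickscfg and the profile name chosen during setup. An empty list means the profile is structurally complete.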

Catalog Creation Errors

Solution: Create the catalogs manually in the UI first, then import each one into Terraform state:

terraform import 'databricks_catalog.custom_catalogs["catalog_name"]' catalog_name
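The import has to be run once per catalog. The following Python sketch builds the command for each of the four catalogs from Step 4, using the resource address pattern shown above. The helper names import_command and render are illustrative, and the catalog list is an assumption that should be adjusted if your catalogs are named differently.

```python
import shlex

# Catalog names from Step 4 of this guide; adjust if yours differ.
CATALOGS = ["sales_dev", "sales_prod", "marketing_dev", "marketing_prod"]


def import_command(catalog: str) -> list:
    """Build the argv that imports one existing catalog into Terraform state."""
    address = f'databricks_catalog.custom_catalogs["{catalog}"]'
    return ["terraform", "import", address, catalog]


def render(commands) -> str:
    """Render argv lists as shell-quoted lines for review or copy-paste."""
    return "\n".join(shlex.join(cmd) for cmd in commands)


# One shell-ready line per catalog, e.g. for pasting into a terminal.
ALL_IMPORTS = render(import_command(name) for name in CATALOGS)
```

Printing ALL_IMPORTS gives one quoted terraform import line per catalog. To execute them directly instead, each argv list can be passed to subprocess.run with check=True and cwd set to the terraform directory.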

Students Can't Access Notebooks

# Check that the shared notebooks folder exists
databricks workspace get-status /Shared/terraform-managed/course/notebooks
# Verify the user exists
databricks users list | grep student@yourcompany.com

Ready to Deploy?

View the complete technical documentation for advanced configuration
