⚙️ Data Platform Engineer Guide
Deploy complete Databricks infrastructure in 15 minutes
🎯 What You'll Accomplish
Set up a complete Databricks learning environment where students get instant access to course content, new content automatically deploys, and everything is reproducible.
- 👥 8 Users: role-based permissions configured automatically
- 🗄️ 5 Catalogs: complete Unity Catalog structure
- 📚 24 Schemas: medallion architecture (bronze, silver, gold)
- 📓 21 Notebooks: deployed to the shared workspace
🚀 Quick Start (15 minutes)
Step 1: Get the Infrastructure Code
git clone https://github.com/datatribe-collective-labs/databricks-infra && cd databricks-infra
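Before cloning, it can help to confirm that the tools this guide calls are installed. A minimal sketch; the tool list (git, terraform, databricks) is taken from the commands used in this guide, and the helper name require_tools is hypothetical:

```shell
#!/usr/bin/env bash
# require_tools TOOL... (hypothetical helper)
# Prints "missing: TOOL" to stderr for each tool not found on PATH
# and returns non-zero if any were missing.
require_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: tool list assumed from the commands in this guide
# require_tools git terraform databricks || exit 1
```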
Step 2: Configure Authentication
Generate a Personal Access Token (PAT):
- Log into your Databricks workspace as an admin
- Navigate to: Settings → Developer → Access Tokens
- Click "Generate New Token" and set a lifetime (90 days is recommended)
- Copy the token and store it securely; Databricks shows it only once
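Configure a Databricks CLI profile (the command prompts for your workspace URL and the token, then saves both to ~/.databrickscfg):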
databricks configure --token --profile <profile_name>
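To confirm the profile was written, here is a minimal sketch that checks ~/.databrickscfg for a section containing both a host and a token. It assumes the INI-style layout that databricks configure produces; profile_configured is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# profile_configured FILE PROFILE (hypothetical helper)
# Succeeds only if FILE has a [PROFILE] section that sets both host and token.
# Assumes the INI-style layout written by `databricks configure`.
profile_configured() {
  local cfg="$1" profile="$2"
  [ -f "$cfg" ] || return 1
  awk -v want="[$profile]" '
    # Section header: strip blanks and compare with the wanted [PROFILE].
    /^[ \t]*\[/ { header = $0; gsub(/[ \t]/, "", header); in_section = (header == want); next }
    in_section && /^[ \t]*host[ \t]*=/  { has_host = 1 }
    in_section && /^[ \t]*token[ \t]*=/ { has_token = 1 }
    END { exit !(has_host && has_token) }
  ' "$cfg"
}

# Usage:
# profile_configured ~/.databrickscfg <profile_name> && echo "profile OK"
```

This only inspects the file. Recent versions of the Databricks CLI should also be able to test the token against the workspace with `databricks current-user me --profile <profile_name>`. How Terraform selects the profile depends on the repository's provider configuration; the Databricks Terraform provider accepts a `profile` argument and also reads the `DATABRICKS_CONFIG_PROFILE` environment variable.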
Step 3: Customize for Your Organization
Edit terraform/users.json with your team:
{
  "users": [
    {
      "user_name": "instructor@yourcompany.com",
      "display_name": "Lead Instructor",
      "groups": ["admins"]
    },
    {
      "user_name": "student@yourcompany.com",
      "display_name": "Student Name",
      "groups": ["students"]
    }
  ]
}
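Before running Terraform, a quick local check catches two easy mistakes in this file: invalid JSON (often a missing comma between entries) and the same user_name listed twice, which is likely to collide if the Terraform code keys users by email (an assumption). A minimal sketch; it relies on python3's built-in json.tool module for the syntax check and on grep/sed text matching for duplicates, so it assumes each `"user_name": "<email>"` pair sits on one line as in the example. Both helper names are hypothetical:

```shell
#!/usr/bin/env bash
# duplicate_users FILE (hypothetical helper)
# Prints every user_name that appears more than once in FILE.
# Text matching only: assumes each "user_name": "<email>" pair is on one line.
duplicate_users() {
  grep -o '"user_name"[[:space:]]*:[[:space:]]*"[^"]*"' "$1" \
    | sed 's/.*:[[:space:]]*"\([^"]*\)"/\1/' \
    | sort | uniq -d
}

# validate_users_json FILE (hypothetical helper)
# Fails if FILE is not valid JSON or contains duplicate user_name values.
validate_users_json() {
  local file="$1" dups
  python3 -m json.tool "$file" >/dev/null || return 1
  dups=$(duplicate_users "$file")
  if [ -n "$dups" ]; then
    echo "duplicate user_name: $dups" >&2
    return 1
  fi
}

# Usage from the repository root:
# validate_users_json terraform/users.json
```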
Step 4: Create Catalogs
In the Databricks UI, create these four catalogs (use the default storage location):
- sales_dev
- sales_prod
- marketing_dev
- marketing_prod
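If you prefer the command line to the UI, this sketch creates the same four catalogs with the newer Databricks CLI (v0.200 or later assumed), which provides `databricks catalogs create NAME`. It leaves the storage location unset, which we assume matches the UI's default storage choice. The function name and the DRY_RUN switch are hypothetical conveniences:

```shell
#!/usr/bin/env bash
# create_reference_catalogs PROFILE (hypothetical helper)
# Creates the four reference catalogs from Step 4 via the Databricks CLI.
# Assumption: omitting a storage root gives the same result as the UI's
# default storage option. Set DRY_RUN=1 to print the commands instead.
create_reference_catalogs() {
  local profile="$1" catalog
  for catalog in sales_dev sales_prod marketing_dev marketing_prod; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "databricks catalogs create $catalog --profile $profile"
    else
      databricks catalogs create "$catalog" --profile "$profile" || return 1
    fi
  done
}

# Usage: preview first, then run for real
# DRY_RUN=1 create_reference_catalogs <profile_name>
# create_reference_catalogs <profile_name>
```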
Step 5: Deploy Infrastructure
cd terraform
terraform init
terraform import 'databricks_catalog.custom_catalogs["sales_dev"]' sales_dev
terraform plan
terraform apply
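The commands above import only sales_dev. If the Terraform configuration also manages the other three catalogs from Step 4 under the same custom_catalogs map (an assumption; check the repository's .tf files), each one needs importing before terraform apply, otherwise Terraform may try to create catalogs that already exist. A minimal sketch that lists which catalogs are not yet in state, so re-running it is safe; missing_imports is a hypothetical helper:

```shell
#!/usr/bin/env bash
# missing_imports STATE_LIST CATALOG... (hypothetical helper)
# STATE_LIST is a file holding the output of `terraform state list`.
# Prints each catalog whose address (assumed pattern:
# databricks_catalog.custom_catalogs["NAME"]) is not in that file.
missing_imports() {
  local state_list="$1" catalog
  shift
  for catalog in "$@"; do
    if ! grep -qxF "databricks_catalog.custom_catalogs[\"$catalog\"]" "$state_list"; then
      echo "$catalog"
    fi
  done
}

# Usage from terraform/ after terraform init:
# terraform state list > /tmp/state.txt 2>/dev/null || true
# for c in $(missing_imports /tmp/state.txt sales_dev sales_prod marketing_dev marketing_prod); do
#   terraform import "databricks_catalog.custom_catalogs[\"$c\"]" "$c"
# done
```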
✅ What Gets Created
👥 Users & Groups
- User accounts for everyone listed in users.json
- Workspace groups with role-based permissions
- RBAC applied automatically from each user's group membership
🗄️ Catalogs & Schemas
- Reference catalogs (sales_dev, sales_prod, marketing_dev, marketing_prod)
- Course catalog (databricks_course) for student work
- Medallion architecture schemas (bronze, silver, gold)
📚 Course Content
- Course notebooks deployed to /Shared/terraform-managed/course/notebooks
- Sample datasets in shared location
- Utility notebooks for user isolation
🔒 Permissions
- User-specific data isolation
- Read-only access to shared catalogs
- Write access to personal schemas
👥 Managing Users
Adding New Students
Add an entry to the users array in terraform/users.json (remember the comma between entries):
{
  "user_name": "newstudent@yourcompany.com",
  "display_name": "New Student",
  "groups": ["students"]
}
Then run:
cd terraform
terraform apply
What Happens Automatically:
- User account created in Databricks
- Added to the workspace group (platform_students)
- Personal schema created: databricks_course.newstudent
- Permissions configured based on group membership
- Data isolation: all of the student's table writes go to their personal schema
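The example (newstudent@yourcompany.com becomes databricks_course.newstudent) suggests the schema name is derived from the part of the email before the @. The sketch below reproduces that mapping so you can predict where a student's tables land. How it treats dots, hyphens, and capitals is a hypothetical guess, so check the rule in the repository's Terraform code before relying on it; schema_for_user is a hypothetical helper:

```shell
#!/usr/bin/env bash
# schema_for_user EMAIL (hypothetical helper)
# HYPOTHETICAL naming rule: take the part before "@", lowercase it, and
# replace every character outside [a-z0-9_] with "_".
schema_for_user() {
  local local_part="${1%%@*}"
  printf 'databricks_course.%s\n' \
    "$(printf '%s' "$local_part" | tr '[:upper:]' '[:lower:]' | sed 's/[^a-z0-9_]/_/g')"
}

# Usage: confirm the schema exists after terraform apply (newer Databricks CLI)
# databricks schemas get "$(schema_for_user newstudent@yourcompany.com)" --profile <profile_name>
```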
Removing Users
⚠️ Warning: Removing a user permanently deletes their account, their personal schema, and ALL data in it.
To preserve the data, transfer ownership of the schema before removal:
ALTER SCHEMA databricks_course.student_name
OWNER TO `admin@yourcompany.com`;
Then remove the user's entry from terraform/users.json and run terraform apply.
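Because removal is destructive, it is worth reading the plan before applying. This sketch filters terraform plan output (run with -no-color so color codes do not get in the way) and prints every resource address Terraform plans to destroy or replace. Running it after the ownership change also shows whether Terraform still plans to delete the schema. list_destroyed is a hypothetical helper:

```shell
#!/usr/bin/env bash
# list_destroyed (hypothetical helper)
# Reads `terraform plan -no-color` output on stdin and prints the address of
# each resource marked "will be destroyed", plus "must be replaced" ones
# tagged with "(replace)", since a replacement also deletes the old object.
list_destroyed() {
  sed -n 's/^[[:space:]]*# \(.*\) will be destroyed$/\1/p; s/^[[:space:]]*# \(.*\) must be replaced$/\1 (replace)/p'
}

# Usage from terraform/ after editing users.json:
# terraform plan -no-color | list_destroyed
```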
🔄 Getting Course Updates
When new course content is released, pull it and let Terraform redeploy it:
git pull origin main
git log --oneline --since="1 week ago" course/notebooks/   # review recent notebook changes
cd terraform
terraform plan    # preview what will change
terraform apply
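To see exactly which notebooks a pull changed (rather than every commit from the past week), you can diff the commit you were on before the pull against the new HEAD; after a git pull, Git normally leaves that previous commit in ORIG_HEAD. A minimal sketch; changed_notebooks is a hypothetical helper, and it must run from the repository root because the path filter is relative to the current directory:

```shell
#!/usr/bin/env bash
# changed_notebooks OLD NEW (hypothetical helper)
# Lists files under course/notebooks/ that differ between two commits.
# Run from the repository root: the pathspec is relative to the current dir.
changed_notebooks() {
  local old="$1" new="$2"
  git diff --name-only "$old" "$new" -- course/notebooks/
}

# Usage, right after `git pull origin main` and before `cd terraform`:
# changed_notebooks ORIG_HEAD HEAD
```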
🚨 Common Issues
"Only accessible by admins" Error
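Root Cause: The PAT was generated by a user without workspace admin privileges (a token carries the permissions of the user who created it)
Solution:
- Verify in the Databricks UI that your user belongs to the admins group
- Regenerate the PAT after confirming admin access
- Update ~/.databrickscfg with the new token
- Re-run terraform plan and terraform apply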
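Re-running databricks configure --token --profile <profile_name> and pasting the new token is the simplest way to update the file. For a non-interactive update, here is a minimal sketch that rewrites only the token line of one profile in ~/.databrickscfg. It assumes the profile already has a token line (it does not add one); set_profile_token is a hypothetical helper:

```shell
#!/usr/bin/env bash
# set_profile_token FILE PROFILE TOKEN (hypothetical helper)
# Replaces the token line inside [PROFILE] and leaves all other lines as-is.
# Fails without touching FILE if the profile has no token line.
set_profile_token() {
  local cfg="$1" profile="$2" token="$3" tmp
  tmp=$(mktemp) || return 1
  awk -v want="[$profile]" -v token="$token" '
    /^[ \t]*\[/ { header = $0; gsub(/[ \t]/, "", header); in_section = (header == want) }
    in_section && /^[ \t]*token[ \t]*=/ { print "token = " token; found = 1; next }
    { print }
    END { exit !found }
  ' "$cfg" > "$tmp" && mv "$tmp" "$cfg" || { rm -f "$tmp"; return 1; }
  chmod 600 "$cfg"   # keep the credentials file private
}

# Usage: read the token without echoing it, then update the profile
# read -rs NEW_TOKEN
# set_profile_token ~/.databrickscfg <profile_name> "$NEW_TOKEN"
```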
Catalog Creation Errors
Solution: Create the catalog manually in the Databricks UI first, then import it into Terraform:
terraform import 'databricks_catalog.custom_catalogs["catalog_name"]' catalog_name
Students Can't Access Notebooks
Check that the notebooks were deployed and that the student's account exists:
databricks workspace get-status /Shared/terraform-managed/course/notebooks
databricks users list | grep student@yourcompany.com
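When several students report the problem, comparing terraform/users.json with the workspace's user list saves grepping one address at a time. A minimal sketch; it assumes you saved the output of databricks users list to a file and, like the grep above, only checks that each email appears somewhere in that text. missing_users is a hypothetical helper:

```shell
#!/usr/bin/env bash
# missing_users USERS_JSON WORKSPACE_USERS (hypothetical helper)
# Prints each user_name from USERS_JSON that does not appear (case-insensitive)
# anywhere in WORKSPACE_USERS, a saved copy of `databricks users list` output.
# Text matching only: assumes each "user_name": "<email>" pair is on one line.
missing_users() {
  local user
  grep -o '"user_name"[[:space:]]*:[[:space:]]*"[^"]*"' "$1" \
    | sed 's/.*:[[:space:]]*"\([^"]*\)"/\1/' \
    | while IFS= read -r user; do
        grep -qiF -- "$user" "$2" || echo "$user"
      done
}

# Usage from the repository root:
# databricks users list --profile <profile_name> > /tmp/workspace_users.txt
# missing_users terraform/users.json /tmp/workspace_users.txt
```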
Ready to Deploy?
View the complete technical documentation for advanced configuration