Tricky scenarios, debugging exercises, and rapid-fire questions. Click "Reveal Answer" to check your thinking.
A stakeholder says "the revenue dashboard hasn't updated since yesterday, but nobody got an alert." What happened and how do you prevent it?
Your orders table suddenly has 2x the expected rows. All row-level checks pass. What's your investigation plan?
A data analyst says the warehouse total revenue differs from the billing system by $50K. How do you debug this?
Every Monday, your daily active users metric jumps 30% then drops Tuesday. The pipeline hasn't changed. What's causing this?
Your staging table has 1M rows, but after your dbt model runs, the mart has only 800K. No errors in logs. What happened?
After deploying a new version of your API, a column that was never NULL suddenly has 15% NULLs. No schema change detected. Why?
Revenue increased 200% in the warehouse but the source system shows no change. All pipeline checks pass. What's wrong?
An upstream team renames a column in their API response without telling you. Your pipeline doesn't fail — it just fills the old column with NULLs. How do you prevent this?
Your team has 200 data quality alerts. 80% are false positives. Engineers ignore them all. How do you fix this?
Your mobile app batches events and sends them when the user regains internet. Some events are 3 days late. How do you handle this without re-running 3 days of pipeline history?
Your quarantine table grows to 10% of total data. Stakeholders ask "is this normal?" What do you do?
Click each question to reveal the answer. Aim for 30 seconds per answer.
Your golden datasets must include NULLs, empty strings, duplicates, max-length values, and timezone edges. Production data is messy — test for mess.
A table with zero rows passes all NOT NULL checks. Always pair row-level rules with volume and freshness checks.
An alert that fires at 3 AM with no instructions on what to do is worse than no alert. Every alert needs an owner and a step-by-step runbook.
SELECT * breaks when columns are added or reordered. Always enumerate columns explicitly in production queries and use contract-enforced schemas.
Even trusted APIs change behavior. A field that was "always populated" can become optional in a new API version. Validate at ingestion, not just at transformation.
If nobody reviews quarantined data, you're just hiding problems. Set a weekly quarantine review cadence and track quarantine rate as a metric.
You've completed the course! Go ace that interview. 🎉