15 tricky scenario questions that separate senior data engineers from the rest. No warm-up β every question is interview-grade.
Read each scenario carefully. Pick your answer before revealing the solution. In a real interview, you'd need to explain your reasoning β practice saying it out loud.
"Design a dimensional model for an e-commerce company. They need to track orders, returns, and customer lifetime value. Orders can have multiple items, promotions, and shipping methods."
"Design a model to track patient encounters. A patient can have multiple diagnoses per visit, a doctor can see multiple patients, and you need to report on average length of stay by diagnosis."
"Design a model for a SaaS company that needs to track MRR, churn, expansions, and contractions. Subscriptions have plan changes, upgrades, and cancellations over time."
If you can't state what one row represents in one sentence, stop and fix it before writing any DDL.
Never store avg_price or margin% in fact tables. Store components (revenue, cost, quantity) and compute at query time.
Balances, inventory counts, and headcounts can't be summed across time. Use AVG, LAST_VALUE, or filter to single period.
Many-to-many without a bridge = row explosion = wrong totals. Always check cardinality before joining.
Natural keys change (email, product SKU). Use surrogate keys in dimensions. Keep the natural key as a business key column.
"Marketing's customer count doesn't match Sales." Same entity, different definitions. One conformed dimension fixes this.
You've covered data modeling fundamentals, fact/dimension tables, star and snowflake schemas, SCDs, advanced patterns, and survived the gauntlet. You're ready to ace your data modeling interview.