Module 5: Advanced Modeling Patterns | Data Modeling Interview Mastery

🌉 Bridge Tables

A bridge table resolves many-to-many relationships that star schemas can't handle natively.

Without one, joining a fact to a multi-valued dimension explodes row counts and breaks aggregations.

dim_patient

bridge_patient_diagnosis

dim_diagnosis

↕ FK ↕

fact_encounter

bridge_patient_diagnosis
patient_key	diagnosis_key	weight_factor
101	D-10	0.50
101	D-22	0.50
202	D-10	1.00

💡 Interview Gold

"Bridge tables include a weight factor that sums to 1.0 per group. Multiply measures by the weight before aggregating to prevent double-counting. Revenue for patient 101 is split 50/50 across two diagnoses."

🎭 Role-Playing Dimensions

A role-playing dimension is one dimension table used multiple times in the same fact, each time playing a different role.

📅 dim_date as 3 roles

fact_orders has order_date_key, ship_date_key, and delivery_date_key — all pointing to the SAME dim_date table with different aliases.

👤 dim_employee as 2 roles

fact_support_tickets has assigned_agent_key and resolved_by_key — both point to dim_employee but represent different people.

-- Role-playing dim_date in a query
SELECT
  od.calendar_month AS order_month,
  sd.calendar_month AS ship_month,
  SUM(f.revenue) AS total_revenue
FROM fact_orders f
JOIN dim_date od ON f.order_date_key = od.date_key
JOIN dim_date sd ON f.ship_date_key = sd.date_key
GROUP BY od.calendar_month, sd.calendar_month;

🎯 Interview Tip

"BI tools often create views for each role (v_order_date, v_ship_date) that alias the same underlying dim_date. This makes self-service easier — analysts pick the right date without knowing it's the same table."

🗑️ Junk & Outrigger Dimensions

🗑️ Junk Dimensions

📌 Combines miscellaneous low-cardinality flags into one table
📌 Avoids cluttering the fact table with dozens of flag columns
📌 Example: is_gift, is_rush, payment_method, shipping_tier
📌 Pre-populated with all valid combinations
📌 Keeps the fact table lean and joinable

🔗 Outrigger Dimensions

📌 A dimension joined to another dimension, not the fact
📌 Example: dim_customer → dim_geography (customer's region details)
📌 Use sparingly — it normalizes the star, adding joins
📌 Only justified when the sub-dimension is very large or shared
📌 Also called a "snowflake arm"

dim_order_flags (junk dimension)
flag_key	is_gift	is_rush	payment_method
1	N	N	Credit Card
2	N	Y	Credit Card
3	Y	N	PayPal
4	Y	Y	PayPal

💡 Interview Gold

"Junk dimensions reduce fact table width by consolidating flags. If you have 5 binary flags, that's 32 possible combinations — a tiny dimension that replaces 5 columns in the fact table with one FK."

🚫 Preventing Double-Counting

Double-counting is the #1 data quality issue in poorly designed models. Here's your defense toolkit:

📐 Enforce Correct Grain

If you join two tables and get more rows than expected, your grain is wrong. Always test: does COUNT(*) match expectations after each join?

🔑 Use Many-to-One Joins

Fact → dimension should always be many-to-one. If it's many-to-many, you need a bridge table with weight factors.

📊 Define Metric Rules

SUM revenue, COUNT DISTINCT users, LAST_VALUE for balances. Document which aggregation function is valid for each metric.

🧪 Reconciliation Tests

Compare aggregated totals against source systems. Check row counts at each layer. Use dbt tests or Great Expectations for automation.

-- Detect fan-out: if this returns rows, you have a problem
SELECT f.order_line_id, COUNT(*) AS cnt
FROM fact_order_lines f
JOIN dim_product p ON f.product_key = p.product_key
GROUP BY f.order_line_id
HAVING COUNT(*) > 1;

-- Reconciliation: compare warehouse total to source
SELECT
  (SELECT SUM(revenue) FROM fact_sales) AS warehouse_total,
  (SELECT SUM(amount) FROM source_orders) AS source_total;

✅ Validating a Data Model

After any model change, run these checks. Interviewers often ask: "How do you know your model is correct?"

1. Key Uniqueness

Every surrogate key must be unique. Every natural key + effective_date combo must be unique in SCD2 tables. Test with COUNT vs COUNT DISTINCT.

2. Referential Integrity

Every FK in the fact table should match a PK in its dimension. Orphaned FKs indicate load failures or late-arriving dimensions.

3. Row Count Reconciliation

Source → staging → warehouse row counts should be explainable. Document expected ratios (e.g., orders : order_lines is 1:N).

4. Metric Regression

Compare key business metrics before/after model changes. Total revenue, customer count, and order count should not shift unexpectedly.

🎯 Interview Tip

"I validate models with four layers: key uniqueness, referential integrity, row-count reconciliation, and metric regression. I automate these as dbt tests or CI checks so they run on every model change."

🌉 Bridge Tables

🎭 Role-Playing Dimensions

📅 dim_date as 3 roles

👤 dim_employee as 2 roles

🗑️ Junk & Outrigger Dimensions

🗑️ Junk Dimensions

🔗 Outrigger Dimensions

🚫 Preventing Double-Counting

📐 Enforce Correct Grain

🔑 Use Many-to-One Joins

📊 Define Metric Rules

🧪 Reconciliation Tests

✅ Validating a Data Model

1. Key Uniqueness

2. Referential Integrity

3. Row Count Reconciliation

4. Metric Regression

Quiz: Test Yourself

Q1: Products belong to multiple categories. Joining fact_sales to dim_category directly causes row explosion. What's the fix?

Q2: fact_orders has order_date_key, ship_date_key, delivery_date_key — all pointing to dim_date. What is this pattern called?

Q3: Your fact table has 8 boolean flag columns (is_gift, is_rush, etc.). What pattern reduces fact table width?

Q4: After joining fact_sales to dim_product, total revenue increased by 30%. What happened?

Q5: How do you detect that a dimension load failed after a fact load completed?