A bridge table resolves many-to-many relationships that star schemas can't handle natively.
Without one, joining a fact to a multi-valued dimension explodes row counts and breaks aggregations.
dim_patient
bridge_patient_diagnosis
dim_diagnosis
โ FK โ
fact_encounter
bridge_patient_diagnosis
patient_key
diagnosis_key
weight_factor
101
D-10
0.50
101
D-22
0.50
202
D-10
1.00
๐ก Interview Gold
"Bridge tables include a weight factor that sums to 1.0 per group. Multiply measures by the weight before aggregating to prevent double-counting. Revenue for patient 101 is split 50/50 across two diagnoses."
๐ญ Role-Playing Dimensions
A role-playing dimension is one dimension table used multiple times in the same fact, each time playing a different role.
๐ dim_date as 3 roles
fact_orders has order_date_key, ship_date_key, and delivery_date_key โ all pointing to the SAME dim_date table with different aliases.
๐ค dim_employee as 2 roles
fact_support_tickets has assigned_agent_key and resolved_by_key โ both point to dim_employee but represent different people.
-- Role-playing dim_date in a querySELECT
od.calendar_month AS order_month,
sd.calendar_month AS ship_month,
SUM(f.revenue) AS total_revenue
FROM fact_orders f
JOIN dim_date od ON f.order_date_key = od.date_key
JOIN dim_date sd ON f.ship_date_key = sd.date_key
GROUP BY od.calendar_month, sd.calendar_month;
๐ฏ Interview Tip
"BI tools often create views for each role (v_order_date, v_ship_date) that alias the same underlying dim_date. This makes self-service easier โ analysts pick the right date without knowing it's the same table."
๐๏ธ Junk & Outrigger Dimensions
๐๏ธ Junk Dimensions
๐ Combines miscellaneous low-cardinality flags into one table
๐ Avoids cluttering the fact table with dozens of flag columns
๐ A dimension joined to another dimension, not the fact
๐ Example: dim_customer โ dim_geography (customer's region details)
๐ Use sparingly โ it normalizes the star, adding joins
๐ Only justified when the sub-dimension is very large or shared
๐ Also called a "snowflake arm"
dim_order_flags (junk dimension)
flag_key
is_gift
is_rush
payment_method
1
N
N
Credit Card
2
N
Y
Credit Card
3
Y
N
PayPal
4
Y
Y
PayPal
๐ก Interview Gold
"Junk dimensions reduce fact table width by consolidating flags. If you have 5 binary flags, that's 32 possible combinations โ a tiny dimension that replaces 5 columns in the fact table with one FK."
๐ซ Preventing Double-Counting
Double-counting is the #1 data quality issue in poorly designed models. Here's your defense toolkit:
๐ Enforce Correct Grain
If you join two tables and get more rows than expected, your grain is wrong. Always test: does COUNT(*) match expectations after each join?
๐ Use Many-to-One Joins
Fact โ dimension should always be many-to-one. If it's many-to-many, you need a bridge table with weight factors.
๐ Define Metric Rules
SUM revenue, COUNT DISTINCT users, LAST_VALUE for balances. Document which aggregation function is valid for each metric.
๐งช Reconciliation Tests
Compare aggregated totals against source systems. Check row counts at each layer. Use dbt tests or Great Expectations for automation.
-- Detect fan-out: if this returns rows, you have a problemSELECT f.order_line_id, COUNT(*) AS cnt
FROM fact_order_lines f
JOIN dim_product p ON f.product_key = p.product_key
GROUP BY f.order_line_id
HAVINGCOUNT(*) > 1;
-- Reconciliation: compare warehouse total to sourceSELECT
(SELECTSUM(revenue) FROM fact_sales) AS warehouse_total,
(SELECTSUM(amount) FROM source_orders) AS source_total;
โ Validating a Data Model
After any model change, run these checks. Interviewers often ask: "How do you know your model is correct?"
1. Key Uniqueness
Every surrogate key must be unique. Every natural key + effective_date combo must be unique in SCD2 tables. Test with COUNT vs COUNT DISTINCT.
2. Referential Integrity
Every FK in the fact table should match a PK in its dimension. Orphaned FKs indicate load failures or late-arriving dimensions.
3. Row Count Reconciliation
Source โ staging โ warehouse row counts should be explainable. Document expected ratios (e.g., orders : order_lines is 1:N).
4. Metric Regression
Compare key business metrics before/after model changes. Total revenue, customer count, and order count should not shift unexpectedly.
๐ฏ Interview Tip
"I validate models with four layers: key uniqueness, referential integrity, row-count reconciliation, and metric regression. I automate these as dbt tests or CI checks so they run on every model change."
Quiz: Test Yourself
Q1: Products belong to multiple categories. Joining fact_sales to dim_category directly causes row explosion. What's the fix?
Q2: fact_orders has order_date_key, ship_date_key, delivery_date_key โ all pointing to dim_date. What is this pattern called?
Q3: Your fact table has 8 boolean flag columns (is_gift, is_rush, etc.). What pattern reduces fact table width?
Q4: After joining fact_sales to dim_product, total revenue increased by 30%. What happened?
Q5: How do you detect that a dimension load failed after a fact load completed?