⭐ The Star Schema

The star schema is the most widely used dimensional modeling pattern. A central fact table connects to multiple dimension tables in a hub-and-spoke layout.

dim_customer
dim_product
dim_date
↕ many-to-one joins ↕
fact_sales
↕ many-to-one joins ↕
dim_store
dim_promotion
💡 Interview Gold

"Star schemas optimize for query simplicity and read performance. All joins are fact-to-dimension (many-to-one), which BI tools understand natively. Denormalized dimensions mean fewer joins per query."

❄️ The Snowflake Schema

A snowflake schema normalizes dimensions into sub-tables. Instead of one flat dim_product, you split into dim_product → dim_category → dim_department.

dim_city
dim_state
dim_customer
dim_product
dim_date
fact_sales
dim_category
dim_brand

⭐ Star Schema

  • ✅ Fewer joins — faster queries
  • ✅ BI tools prefer it (auto-aggregation)
  • ✅ Simpler for analysts to understand
  • ⚠️ Dimension redundancy (city name repeated)
  • ⚠️ Larger dimension tables

❄️ Snowflake Schema

  • ✅ Less dimension redundancy
  • ✅ Easier to maintain hierarchies
  • ✅ Smaller dimension storage
  • ⚠️ More joins — slower query patterns
  • ⚠️ Harder for analysts to navigate

Conformed Dimensions

A conformed dimension is the same dimension used consistently across multiple fact tables. This is what makes enterprise-wide analytics possible.

📅 dim_date

Shared across fact_sales, fact_support_tickets, fact_web_visits. Same date_key, same fiscal calendar attributes everywhere.

👤 dim_customer

Used by orders, support, campaigns, and billing facts. One definition of "customer" prevents conflicting counts across teams.

📦 dim_product

Shared by sales, inventory, returns. Conformed attributes like category, brand, and department enable cross-fact analysis.

🎯 Interview Tip

"Without conformed dimensions, you can't drill across fact tables. Marketing says '10K customers,' Sales says '12K.' The difference? Different definitions of 'customer.' Conformed dimensions fix this."

Data Vault as a Contrast

Interviewers may ask how Data Vault compares to Kimball star schemas. Know the three building blocks:

🔵 Hub

Stores unique business keys (customer_id, product_id). Immutable — once loaded, never changes. The anchor entity.

🔗 Link

Captures relationships between hubs. Example: "Customer purchased Product." Stores only foreign keys to hubs. No business attributes.

🛰️ Satellite

Holds descriptive attributes with full history (load date, record source). All changes are appended, never overwritten. Enables time-travel queries.

⭐ Kimball Star Schema

  • 📌 Optimized for BI and ad-hoc queries
  • 📌 Business-process oriented
  • 📌 Delivers ready-to-query tables
  • 📌 Harder to adapt when sources change

🏛️ Data Vault

  • 📌 Optimized for agility and auditability
  • 📌 Source-system oriented
  • 📌 Requires business vault / star on top for BI
  • 📌 Easily adapts to new sources
💡 Interview Gold

"Data Vault is for the integration layer — it captures everything with full history. Star schemas are for the presentation layer — optimized for consumption. Many architectures use Data Vault underneath with star schemas on top."

When to Choose Which Schema

⭐ Use Star When...

BI tools query directly. Ad-hoc analytics. Performance matters. Analysts need simple self-service. Most common choice for presentation layer.

❄️ Use Snowflake When...

Dimension hierarchies change frequently. Storage is very constrained. You need strict normalization for audit. Some cloud warehouses optimize snowflake joins well.

🏛️ Use Data Vault When...

Many source systems feed one warehouse. Requirements change frequently. Full history and auditability required. Large enterprise with evolving schemas.

🎯 Interview Tip

"In practice, star schema is the default for 80% of warehouse use cases. Snowflake is rare in modern columnar warehouses because the storage savings are negligible. Data Vault is an integration pattern, not a replacement for dimensional modeling."

Quiz: Test Yourself

Q1: Your BI team needs self-service analytics on sales data. Which schema pattern is best?

Q2: Marketing counts 10K customers, Sales counts 12K. What's the root cause in a dimensional model?

Q3: In Data Vault, which component stores descriptive attributes with change history?

Q4: Why is snowflake schema LESS common in modern cloud warehouses like Snowflake or BigQuery?

Q5: How do Data Vault and Star Schema typically coexist in an enterprise?