What Is Schema Drift?

Schema drift is when upstream data changes its structure without warning. A column gets renamed, a type changes, a new field appears, and your pipeline breaks or, worse, silently produces wrong results.

Column Renamed

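user_name becomes username. Strict SQL engines fail loudly on the missing column, but schema-on-read paths (JSON extraction, readers with a fixed schema) keep asking for the old name and return NULLs everywhere, silently.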

Type Changed

price goes from DECIMAL to STRING. Aggregations break or produce garbage.
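
A minimal sketch of the "garbage" case, using made-up price values. Once prices arrive as strings, `max` compares them lexicographically instead of numerically and raises no error. The `total` helper is hypothetical and only shows one way to fail loudly instead.

```python
from decimal import Decimal

# Hypothetical price values, before and after the upstream type change.
prices_decimal = [Decimal("9.50"), Decimal("10.00"), Decimal("100.00")]
prices_string = ["9.50", "10.00", "100.00"]

# Numeric comparison: the largest price wins.
max_decimal = max(prices_decimal)

# String comparison is lexicographic: "9.50" sorts after "100.00"
# because "9" > "1", so the answer is wrong and nothing fails.
max_string = max(prices_string)


def total(values):
    """Sum prices, failing loudly if any value is not a Decimal."""
    bad = [v for v in values if not isinstance(v, Decimal)]
    if bad:
        raise TypeError(f"expected Decimal prices, got {type(bad[0]).__name__}")
    return sum(values, Decimal("0"))
```

Here `max_string` is "9.50" while `max_decimal` is 100.00, which is the silent failure mode; `total(prices_string)` raises instead of guessing.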

Column Dropped

A "deprecated" field removed without notice. Any downstream model depending on it fails.

New Column Added

Usually safe but can break SELECT * patterns, change column ordering, or exceed schema limits.

Interview Gold

"I detect schema drift by comparing incoming schemas against an expected contract: column names, types, and nullability. Additive changes can be auto-allowed; breaking changes fail fast."

Breaking vs Non-Breaking Changes

Not all schema changes are equal. Your policy should distinguish between safe and dangerous changes.

Breaking Changes

  • Column renamed or removed
  • Type changed (DECIMAL → STRING)
  • Nullability changed (NOT NULL → nullable)
  • Primary key columns changed
  • Enum values removed

Non-Breaking Changes

  • New column added (with default)
  • Column description updated
  • New enum values added
  • Type widened (INT → BIGINT)
  • New optional metadata fields

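
The two lists can be turned into a small drift check. The sketch below is one possible shape, not a reference implementation: `Column`, `detect_drift`, `enforce`, and `BreakingSchemaChange` are hypothetical names, schemas are plain dictionaries, and every type change (including widening) is conservatively treated as breaking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    type: str
    nullable: bool


class BreakingSchemaChange(Exception):
    """Raised when the incoming schema violates the expected contract."""


def detect_drift(expected, incoming):
    """Compare two {column_name: Column} mappings.

    Returns (breaking, additive) lists of human-readable findings.
    """
    breaking, additive = [], []
    for name, col in expected.items():
        got = incoming.get(name)
        if got is None:
            # A rename looks like a drop plus an unrelated new column.
            breaking.append(f"{name}: missing (dropped or renamed)")
            continue
        if got.type != col.type:
            breaking.append(f"{name}: type {col.type} -> {got.type}")
        if got.nullable and not col.nullable:
            breaking.append(f"{name}: NOT NULL -> nullable")
    for name in incoming.keys() - expected.keys():
        additive.append(f"{name}: new column")
    return breaking, sorted(additive)


def enforce(expected, incoming):
    """Allow additive changes, fail fast on breaking ones."""
    breaking, additive = detect_drift(expected, incoming)
    if breaking:
        raise BreakingSchemaChange("; ".join(breaking))
    return additive
```

A caller would typically log the additive findings returned by `enforce` and let the exception stop the load when anything breaking shows up.
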
Gotcha

Type widening (INT → BIGINT) looks safe but can break Avro/Protobuf consumers that hardcode types. Always check compatibility at the serialization level, not just SQL.
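
To make the serialization-level check concrete, here is a small sketch based on the type promotions the Avro specification allows during schema resolution. It covers primitive types only, and `reader_can_decode` is a hypothetical helper, not part of any Avro library.

```python
# Promotions the Avro specification allows during schema resolution:
# a value written as the key type may be read as any type in the set.
AVRO_PROMOTIONS = {
    "int": {"long", "float", "double"},
    "long": {"float", "double"},
    "float": {"double"},
    "string": {"bytes"},
    "bytes": {"string"},
}


def reader_can_decode(writer_type: str, reader_type: str) -> bool:
    """True if data written as writer_type can be read as reader_type.

    Primitive types only (hypothetical helper for illustration).
    """
    if writer_type == reader_type:
        return True
    return reader_type in AVRO_PROMOTIONS.get(writer_type, set())
```

The asymmetry is the point: `reader_can_decode("int", "long")` is true, so consumers can widen first, but `reader_can_decode("long", "int")` is false, so a producer that widens before its consumers do breaks every reader still expecting `int`.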

What Are Data Contracts?

Data contracts are explicit agreements between producers and consumers about schema, semantics, SLAs, and ownership. They shift quality left: problems are caught before production.

# Example data contract (YAML)
contract:
  name: orders
  owner: payments-team
  sla:
    freshness: 1h
    completeness: 99.5%
  schema:
    - name: order_id
      type: STRING
      required: true
      unique: true
    - name: amount
      type: DECIMAL(10,2)
      required: true
      checks: ["value >= 0"]
    - name: status
      type: STRING
      allowed: [pending, paid, refunded]

Key Insight

A contract isn't just a schema. It includes SLAs (freshness, completeness), semantic definitions (what "amount" means), ownership, and change policies.
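
As a rough illustration of how the schema part of the orders contract could be checked against a batch of rows, here is a sketch in plain Python. The schema is transcribed by hand rather than parsed from YAML, the `min` key stands in for the `value >= 0` check, and `validate` is a hypothetical helper. The SLA, ownership, and semantic parts of the contract are not covered.

```python
from decimal import Decimal

# The example contract's schema, transcribed by hand into Python.
# "min" stands in for the check "value >= 0" (an assumption of this sketch).
ORDERS_SCHEMA = [
    {"name": "order_id", "type": str, "required": True, "unique": True},
    {"name": "amount", "type": Decimal, "required": True, "min": Decimal("0")},
    {"name": "status", "type": str, "allowed": {"pending", "paid", "refunded"}},
]


def validate(rows, schema=ORDERS_SCHEMA):
    """Return a list of violation messages for a batch of dict rows."""
    violations = []
    # Track values already seen for each field declared unique.
    seen = {f["name"]: set() for f in schema if f.get("unique")}
    for i, row in enumerate(rows):
        for field in schema:
            name = field["name"]
            value = row.get(name)
            if value is None:
                if field.get("required"):
                    violations.append(f"row {i}: {name} is required")
                continue
            if not isinstance(value, field["type"]):
                violations.append(f"row {i}: {name} has type {type(value).__name__}")
                continue
            if "min" in field and value < field["min"]:
                violations.append(f"row {i}: {name} below {field['min']}")
            if "allowed" in field and value not in field["allowed"]:
                violations.append(f"row {i}: {name} not in allowed set")
            if name in seen:
                if value in seen[name]:
                    violations.append(f"row {i}: {name} duplicates an earlier row")
                seen[name].add(value)
    return violations
```

Returning a list of messages instead of raising on the first problem lets a pipeline report every violation in a batch and then decide whether to quarantine or reject it.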

Contract Enforcement in CI/CD

Contracts only work if they're enforced. Here's how they integrate into the development workflow.

Producer PR (changes schema) → CI Check (compare to contract) → Breaking? (block merge) → Notify Consumers (migration plan)
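
The last two steps of this flow can be sketched as a small gate function. This is an illustration only: `ci_gate` is a hypothetical name, the findings are assumed to come from an earlier schema comparison step, and the consumer list is assumed to be registered alongside the contract.

```python
def ci_gate(breaking_changes, consumers):
    """Decide the CI outcome for a producer PR.

    breaking_changes: finding strings from a schema comparison step.
    consumers: team names registered against the contract (assumed input).
    Returns (exit_code, notifications); a non-zero code blocks the merge.
    """
    if not breaking_changes:
        return 0, []
    summary = "; ".join(breaking_changes)
    notifications = [
        f"@{team}: proposed breaking change ({summary}); migration plan required"
        for team in consumers
    ]
    return 1, notifications
```

A CI job would pass the returned code to `sys.exit` and post each notification to whatever channel the consuming teams watch.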

Schema Registries

Confluent Schema Registry and AWS Glue Schema Registry store versioned schemas and enforce compatibility rules (BACKWARD, FORWARD, FULL) when a new version is registered.
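
The three compatibility modes are easier to remember as "who reads whose data". BACKWARD means consumers on the new schema can read data written with the previous one; FORWARD means consumers on the previous schema can read data written with the new one; FULL means both. The sketch below applies a simplified Avro-style rule to flat records. `can_read` and `is_compatible` are hypothetical helpers, and real registries also handle type promotion, nested types, and aliases.

```python
def can_read(reader_fields, writer_fields):
    """True if a reader schema can decode data written with a writer schema.

    Each schema is {field_name: {"type": str, "has_default": bool}}.
    Simplified rule: shared fields must have the same type, and any field
    the reader expects but the writer omitted must have a default.
    """
    for name, spec in reader_fields.items():
        written = writer_fields.get(name)
        if written is None:
            if not spec["has_default"]:
                return False
        elif written["type"] != spec["type"]:
            return False
    return True


def is_compatible(new, old, mode):
    """Check a new schema version against the previous one."""
    backward = can_read(reader_fields=new, writer_fields=old)
    forward = can_read(reader_fields=old, writer_fields=new)
    results = {"BACKWARD": backward, "FORWARD": forward, "FULL": backward and forward}
    return results[mode]
```

Under this rule, adding a field without a default is FORWARD but not BACKWARD compatible, and removing a field that has no default is BACKWARD but not FORWARD compatible.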

dbt Contracts

dbt 1.5+ supports contract: {enforced: true} on models. Column names, data types, and constraints are validated at build time.

Protobuf / Avro

Serialization formats with built-in schema evolution rules. Adding fields is generally safe (in Avro, a new field needs a default for older data to remain readable); removals require deprecation.

Producer vs Consumer Responsibilities

Data contracts create clear boundaries. Know who owns what, because interviewers test this.

Producer Responsibilities

  • Define and maintain the contract
  • Never make breaking changes without notice
  • Meet freshness and completeness SLAs
  • Run contract validation in CI
  • Announce deprecations with migration time

Consumer Responsibilities

  • Read and understand the contract
  • Tolerate non-breaking changes gracefully
  • Don't depend on undocumented fields
  • Migrate within deprecation windows
  • Report contract violations to producers

Interview Tip

"Data contracts shift accountability to producers. Without contracts, the data team owns every downstream break. With contracts, producers own stability and consumers own adaptation."

Quiz: Test Yourself

Q1: Which schema change is most dangerous for downstream consumers?

Q2: What distinguishes a data contract from a simple schema definition?

Q3: In a schema registry, what does BACKWARD compatibility mean?

Q4: Who bears the cost of a breaking schema change when there are NO data contracts?

Q5: How does dbt enforce data contracts?