The Integration That Almost Wasn't
In Q3 2025, Networth Corp was brought in six weeks into a $400M merger between two regional P&C insurers — Allegheny Mutual and Great Plains Indemnity. The stated goal was straightforward: unify claims adjudication onto a single platform within 120 days. The reality was that Allegheny ran a modernized event-driven architecture on Kafka and Snowflake, while Great Plains still operated batch ETL jobs feeding a DB2 warehouse updated nightly. By the time we arrived, the integration team had already burned three sprints trying to reconcile 14 shared data entities — among them claimant, policy, loss event, reserve, and payment — across systems that agreed on almost nothing at the schema level.
The symptoms were familiar: 1,847 reconciliation errors per week, $2.3M in duplicate reserve postings flagged by internal audit, and a data engineering team of eleven working nights to hand-patch misaligned fields. Schema drift between the two platforms was not a future risk — it was an active, measurable cost. The CTO told us the board was two weeks from pausing the integration entirely.
What a Data Contract Actually Contains
We proposed data contracts as the interface layer between the two platforms. A data contract, in our implementation, is a versioned YAML specification owned by the producing team that defines the schema, semantic types, freshness SLA, and quality invariants for a specific data product. Each contract is versioned in Git alongside the pipeline code that fulfills it. The contract for the unified ClaimEvent entity, for example, specified 38 fields with explicit nullability rules, an enum registry for claim_status with seven allowed values, a maximum delivery latency of 45 seconds for streaming and 6 hours for batch, and four quality checks including referential integrity against the Policy contract.
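A trimmed sketch of what such a contract file might look like follows. The field names, enum values, and check names here are illustrative stand-ins, not the production ClaimEvent spec; only the seven-value status enum and the SLA figures come from the engagement as described.

```yaml
# Illustrative excerpt of a data contract (not the production ClaimEvent spec)
name: claim_event
version: 2.1.0
owner: claims-platform-team        # producing team owns the contract
schema:
  fields:
    - name: claim_id
      type: string
      nullable: false
    - name: claim_status
      type: enum
      values: [OPEN, UNDER_REVIEW, RESERVED, APPROVED, PAID, DENIED, CLOSED]
      nullable: false
    - name: reserve_amount
      type: decimal(14,2)
      nullable: true
sla:
  streaming_max_latency_seconds: 45
  batch_max_latency_hours: 6
quality_checks:
  - not_null: [claim_id, claim_status]
  - enum_membership: claim_status
  - referential_integrity:
      field: policy_id
      references: policy.policy_id   # checked against the Policy contract
```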
Versioning and Compatibility Rules
We adopted a semver-inspired versioning scheme. Additive changes — new nullable columns, expanded enums — increment the minor version and are backward-compatible. Breaking changes — column removals, type changes, tightened nullability — require a major version bump, a 30-day deprecation window, and explicit opt-in from every registered consumer. The contract registry tracked 23 active contracts across five major data entities by week four, with consumer dependency graphs generated automatically from CI metadata.
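The compatibility rules above can be mechanized. The sketch below is a hypothetical classifier, not the registry's actual code: given the old and new field definitions of a contract, it decides which version component must be bumped, following the additive-vs-breaking distinctions the scheme draws.

```python
# Sketch of the semver-inspired compatibility rules (illustrative, not the
# production registry code). Classifies a contract change as major/minor/patch.
from dataclasses import dataclass, field as dc_field

@dataclass(frozen=True)
class Field:
    name: str
    type: str
    nullable: bool
    enum_values: tuple = ()

def required_bump(old: dict, new: dict) -> str:
    """Return 'major' for breaking changes, 'minor' for additive ones, else 'patch'."""
    breaking = additive = False
    for name, f_old in old.items():
        f_new = new.get(name)
        if f_new is None:
            breaking = True                      # column removal
        elif f_new.type != f_old.type:
            breaking = True                      # type change
        elif f_old.nullable and not f_new.nullable:
            breaking = True                      # tightened nullability
        elif not set(f_old.enum_values) <= set(f_new.enum_values):
            breaking = True                      # removed enum value
        elif set(f_old.enum_values) < set(f_new.enum_values):
            additive = True                      # expanded enum
    for name, f_new in new.items():
        if name not in old:
            if f_new.nullable:
                additive = True                  # new nullable column
            else:
                breaking = True                  # new required column is breaking
    return "major" if breaking else ("minor" if additive else "patch")
```

A major result triggers the 30-day deprecation window and consumer opt-in flow; a minor result merges without consumer action.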
CI Enforcement: Contracts Without Teeth Are Just Documentation
The technical specification alone would have failed. What made the contracts operational was CI enforcement. Every pull request that modified a producer pipeline triggered a contract validation stage in GitHub Actions. The pipeline ran three checks: schema compatibility against the registered contract using a custom JSON Schema validator, a sample-data smoke test that pushed 500 synthetic records through the transformation and asserted output conformance, and a breaking-change detector that blocked merges to main when a major version bump was required but not declared. Failed checks produced a diff-style report showing exactly which fields violated the contract, with line-level references to the YAML spec.
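A minimal stand-in for the validation stage is sketched below. The real pipeline used a custom JSON Schema validator inside GitHub Actions; the contract rules, field names, and function names here are hypothetical, but the shape — per-field conformance checks plus a smoke test over synthetic records that gates the merge — follows the description above.

```python
# Minimal stand-in for the CI contract-validation stage (hypothetical names;
# the actual pipeline used a custom JSON Schema validator in GitHub Actions).

ALLOWED_STATUS = {"OPEN", "UNDER_REVIEW", "RESERVED",
                  "APPROVED", "PAID", "DENIED", "CLOSED"}

# Illustrative slice of a registered contract, keyed by field name.
CONTRACT = {
    "claim_id":       {"type": str,   "nullable": False},
    "claim_status":   {"type": str,   "nullable": False, "enum": ALLOWED_STATUS},
    "reserve_amount": {"type": float, "nullable": True},
}

def validate_record(record: dict) -> list[str]:
    """Return human-readable violations for one record; empty means conformant."""
    errors = []
    for name, rules in CONTRACT.items():
        value = record.get(name)
        if value is None:
            if not rules["nullable"]:
                errors.append(f"{name}: null in non-nullable field")
            continue
        if not isinstance(value, rules["type"]):
            errors.append(f"{name}: expected {rules['type'].__name__}, "
                          f"got {type(value).__name__}")
        elif "enum" in rules and value not in rules["enum"]:
            errors.append(f"{name}: {value!r} not in allowed enum")
    return errors

def smoke_test(records: list[dict]) -> dict:
    """Validate every synthetic record; a non-empty report blocks the merge."""
    return {i: errs for i, rec in enumerate(records)
            if (errs := validate_record(rec))}
```

In CI, a non-empty report is rendered as the diff-style output described above, one line per violating field.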
Runtime Monitoring and Alerting
CI catches problems before deployment; runtime monitoring catches everything else. We deployed Great Expectations suites as post-load validation on both the Kafka consumer and the nightly batch landing zone. Contract violations at runtime — a null value in a non-nullable field, a claim_status value outside the enum, a delivery beyond the SLA window — generated PagerDuty alerts routed to the producing team, not the consumers. This ownership inversion was critical: producers could no longer silently ship breaking changes and let downstream teams absorb the debugging cost.
The Harder Problem: Organizational Change
The most difficult part of this engagement was not technical. It was convincing two legacy-org data teams — who had operated independently for decades — to accept producer ownership of data quality. We ran a two-day workshop with both teams to collaboratively define the first five contracts, ensuring neither side felt the other was dictating terms. We established a rotating contract review board with three engineers from each organization that met weekly to adjudicate disputes, approve breaking changes, and update the shared enum registry. By week six, the teams were self-governing; our role shifted to observability tuning and edge-case support.
Eight weeks after contract adoption, weekly reconciliation errors dropped from 1,847 to 143 — a 92.3% reduction. The duplicate reserve postings stopped entirely once the Payment contract enforced idempotency keys. The integration hit its revised 120-day deadline with two weeks to spare, and the unified claims platform processed its first live policy on January 14, 2026. Data contracts did not make the merger easy. They made the merger possible.