Engineering & ArchitectureApril 202611 min read

Data Platform Modernization: A Reference Architecture We Actually Ship

The patterns we use when re-platforming mid-market data estates onto the modern lakehouse — and the four decisions that determine whether the platform stays alive in production.

Andrei Volkov

Principal · Engineering & Architecture

“A data platform isn't a stack diagram. It's the contract between the people producing data and the people accountable for decisions made on it.”

We are usually called in after the second platform has stalled. The first one — the on-prem warehouse — ran out of room and out of patience. The second — the modern data stack assembled from twelve SaaS tools and a Slack channel — is somehow more expensive, more brittle, and produces fewer trusted numbers than the thing it replaced. This is the reference architecture and the four decisions we use to break that pattern.

The four decisions that decide everything else

Almost every architecture choice on a data platform is downstream of four decisions. If a leadership team gets these four right, the rest of the platform is mostly a procurement exercise. If any of these four are unresolved, no amount of tooling sophistication recovers the program.

1Where the analytical compute lives — Snowflake, Databricks, BigQuery, or self-managed. This decision constrains the team you can hire and the SLAs you can promise.
2Where transformations are owned — in the warehouse via dbt, in a separate orchestration layer, or in application code. Whoever owns transformations effectively owns the semantics of every business metric.
3Where governance is enforced — at ingestion, in the catalog, or at the BI layer. Each is defensible. Picking two is fatal.
4Where AI/ML workloads run — same platform as analytics, a parallel platform, or a third-party vendor. The cost and lock-in profiles diverge sharply between these three over a five-year horizon.

The reference architecture

For most mid-market clients in 2026, the architecture we converge on looks remarkably similar. Sources land into raw object storage (S3 / GCS / ADLS) via Fivetran, Airbyte, or hand-rolled connectors for the long tail. A lakehouse engine — Snowflake or Databricks, depending on the AI ambition — provides the compute. dbt owns the transformation layer end-to-end, with tests as a build gate, not an afterthought. A catalog (Unity, Atlan, or DataHub) is the system of record for definitions, lineage, and access. BI is split: governed dashboards in one tool, exploratory work in another, and the line between them is enforced by which models the BI tool can hit.

There is nothing exotic about this. The discipline is in what we say no to. We do not stand up a separate streaming platform until a real sub-second use case arrives. We do not pre-build a feature store before the second ML model. We do not buy reverse ETL until we have a customer for the data being reverse-ETL'd. Every component in the platform earns its way in by being on the critical path of a working use case.

“Every component in the platform earns its way in by being on the critical path of a working use case.”

Lakehouse vs warehouse: how we actually pick

The Snowflake-vs-Databricks decision is rarely about benchmarks. It is about the team and the workload mix. We pick Snowflake when the workload is dominated by SQL analytics, the team is SQL-fluent and Python-light, and the AI roadmap is mostly off-the-shelf or vendor-hosted. We pick Databricks when the workload meaningfully includes ML training, when the team has Python and Spark fluency, or when the company has a serious data science investment that needs to live next to the analytical data.

Both can do the other's job. Both have closed the feature gap by a wide margin. The decision is about which platform your team will operate well, sustainably, in year three — not which one has the better demo.

Where modernization programs actually fail

We have run the post-mortem on enough failed modernizations to see the same three failure modes:

The migration is run as a lift-and-shift with no semantic re-anchoring. The same broken metrics arrive in the new platform faster, and the leadership team blames the platform.
Governance is bolted on after the platform is live. Within six months there are seven definitions of revenue and the data team is the bottleneck for every executive question.
The team that built it cannot operate it. The migration was shipped by a partner who left. The internal team inherits a stack they did not choose, and operating cost compounds quietly until someone notices the run-rate.

How we run the engagement

A typical Polance data platform engagement runs in three phases. Phase one is two to four weeks of architecture design and decision capture — we land the four decisions above, get explicit sign-off from the operating leadership, and produce a one-page reference architecture the engineering team will live with. Phase two is the build, typically twelve to twenty weeks, with our senior engineers on the keyboard alongside the client team — we do not write code that only we can maintain. Phase three is a deliberate handover and stabilization period, where we step out of the critical path and the client team runs the platform under our review for a final four to eight weeks.

We are deliberately on the work, not above it. Architecture decisions made by people who are not also implementing them tend to be elegant in the deck and unworkable in the repo.

Cost discipline that actually compounds

FinOps for data platforms is mostly common sense applied consistently. The patterns that move the needle: warehouse sizing tied to actual query patterns, not vendor-suggested defaults; auto-suspend everywhere; cost per use case as a first-class observability metric; and a quarterly architecture review that is allowed to retire components that did not earn their cost. None of this is sophisticated. All of it requires somebody senior to care about it for the duration of the engagement.

Written by

Andrei Volkov

Principal · Engineering & Architecture

Let's talk about the outcome
you're trying to move.

Tell us where you're stuck — strategy, digital, data, or all three. We'll bring a senior partner to the first call, share an honest perspective, and only propose work if we genuinely believe we can move the number.

Typical response within one business day · NDA on request · No obligation

Data Platform Modernization: A Reference Architecture We Actually Ship

The four decisions that decide everything else

The reference architecture

Lakehouse vs warehouse: how we actually pick

Where modernization programs actually fail

How we run the engagement

Cost discipline that actually compounds

Related reading

Beyond Dashboards: Building a Data-Driven Operating Model

AI Without the Hype: A Practical Adoption Framework

Let's talk about the outcome
you're trying to move.

The four decisions that decide everything else

The reference architecture

Lakehouse vs warehouse: how we actually pick

Where modernization programs actually fail

How we run the engagement

Cost discipline that actually compounds

Related reading

Beyond Dashboards: Building a Data-Driven Operating Model

AI Without the Hype: A Practical Adoption Framework

Let's talk about the outcome you're trying to move.

Let's talk about the outcome
you're trying to move.