Why do we need Data Products?

The true reason behind any data-related activity is to enhance business outcomes. The objective of leveraging data was and will be to enrich the business experience and returns. This direct tie between data and business, even though obvious, was lost in translation for a long time as we, the data community, dived into the tactics and forgot the end objective.

As data teams got caught up in defining, building, and maintaining the process (data infrastructures, pipelines, and architectures), that work progressively ate into the time they could spend on the core data and data applications. What this meant was:

  • Limited data applications to power customer-facing endpoints or business decisions.
  • Faulty and untrustworthy data as a result of increasing debt in the data infra layers and never enough data engineers to manage it.
  • Long and loopy path from data to insights that led to the loss of several business opportunities.
  • High time to ROI and low ROI for data teams, because complex builds and fixer-uppers hogged significant resources and ate into whatever ROI was generated in the first place.

Over 90% of the world's data was generated (captured) in just the last couple of years and stored across expensive storage such as data lakes, warehouses, or lakehouses. The result is hauntingly similar to a basement of dusty files, filled with rich information that no one can actually operationalize. Another name for it is a data swamp.

Such data mismanagement results from data stacks that silo, duplicate, fragment, lock in, and misgovern data. To battle this chaos, we need to establish the data product ideology and implement it through a unified data architecture that pushes back against the data disruption caused by prevalent stacks and frees up the organization’s resources to focus on building the real deal – the data product.

What is a Data Product?

A data product is a reliable unit of data or a container of data that enables a direct and seamless impact on business decisions and outcomes at the time of opportunity.

There are five aspects to the above definition:

Simplicity or Seamless Impact

One of the challenges that businesses face is wading through the complexities of data to somehow mine insights and patterns that they can only half-heartedly rely on. A data product is meant to lower this barrier, since it is born out of decoupled physical and logical layers where control of the data narrative lies with the business. From the perspective of product thinking, a data product is an enhanced user experience of data.

Data Unit or a Container

A data unit is the most fundamental physical or logical unit of data that can independently add value to the end user. While physical units are consumed directly, logical units act as on-the-fly channels for materializing the physical units, powering intelligent data movement and saving movement costs in the process.

Direct Business Impact

The ideal objective of any data stack, data team, or data initiative should be to create valuable data that actually uplifts business objectives. While this end vision is lost under the task of maintaining heaps of complex subsystems, the data product brings back the target to the forefront and solves it right-to-left.

Reliable Unit

Nobody wants to use data they cannot trust. Even when limited options force them to, they carry the inevitable burden of engaging high-cost dedicated teams and resources to look after the lineage of the unstable business use case.

Operationalized at the time of opportunity

The most critical factor in business is time, especially due to the rate at which business moves today. Infra infiltrated with leaky pipelines needing constant fixes steals away time from actually producing business-relevant insights. A data product, therefore, is data that is available on demand with quality and governance enforced on the fly.

These five factors are key to businesses and embody the outcome of a well-developed product mindset in which the organization treats data itself as a product. Contrary to common practice, data must be developed and managed as a product to yield optimal business value.

Interestingly, data product materialization is more of a people and process problem than a technology problem. There are numerous potential ways to power a data product, as we have seen pop up over the last few years, but the key is to implement an architecture that doubles as a data culture enabler.

Otherwise, as is common in the data ecosystem, everything falls apart as another unmanageable asset that the data team is compelled to look after due to the high investments poured in by the organization. Establishing the product mindset is the largest barrier to enabling and maintaining a data product so that it doesn’t become another feared high-entropy design approach.

A data product adds value to the user consistently and reliably and, at every point in time, embodies the following features:

Discoverable

Data discoverability is perhaps the most important feature behind data operationalization. If you cannot find the data, you are blocked at the very first stage. Discoverability implies reusability, which is why the data product approach encourages uniqueness and discourages duplication. Discoverability is powered by a global metadata manager that sits in a central control plane and has visibility over the entire data ecosystem across distributed user planes.
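
To make the idea concrete, here is a minimal sketch of what such a metadata manager could look like, assuming a simple in-memory catalog. The names MetadataCatalog, CatalogEntry, register, and search are hypothetical and chosen only for illustration; a real control plane would persist metadata and span many data sources.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """Metadata describing one data product (hypothetical shape)."""
    name: str
    description: str
    tags: frozenset


class MetadataCatalog:
    """Hypothetical in-memory stand-in for a global metadata manager."""

    def __init__(self):
        self._entries = {}

    def register(self, name, description, tags=()):
        # Uniqueness: refuse a second product under an existing name.
        if name in self._entries:
            raise ValueError(f"data product {name!r} is already registered")
        self._entries[name] = CatalogEntry(name, description, frozenset(tags))

    def search(self, term):
        """Return sorted names of products whose name, description, or tags mention the term."""
        needle = term.lower()
        return sorted(
            entry.name
            for entry in self._entries.values()
            if needle in entry.name.lower()
            or needle in entry.description.lower()
            or any(needle in tag.lower() for tag in entry.tags)
        )
```

Rejecting a duplicate name at registration time is one simple way to express the uniqueness principle: consumers are nudged to search for and reuse an existing product instead of creating a copy.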

Secure

Data products, being on-demand value providers, are inherently embedded with security protocols that activate on the fly based on the demand center. They must be governed centrally, with policies enforced at the asset, row, and column levels. The uniqueness of data products heightens the importance of security, requiring adherence to standards and compliance requirements across all materialization channels.
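
As a rough illustration of policies at the asset, row, and column levels, the sketch below applies all three when data is read. It assumes rows are plain dictionaries; AccessPolicy and read_with_policy are hypothetical names, not part of any particular platform.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AccessPolicy:
    """Hypothetical policy covering asset, row, and column levels."""
    allowed_roles: frozenset             # asset level: who may read at all
    row_filter: Callable[[dict], bool]   # row level: which rows are visible
    masked_columns: frozenset            # column level: which values are hidden


def read_with_policy(rows, role, policy):
    """Return only the rows and values the given role is allowed to see."""
    if role not in policy.allowed_roles:
        raise PermissionError(f"role {role!r} may not read this data product")
    visible = [row for row in rows if policy.row_filter(row)]
    return [
        {col: ("***" if col in policy.masked_columns else value)
         for col, value in row.items()}
        for row in visible
    ]
```

Because the policy is applied at read time, the same underlying data can serve different consumers without maintaining separately secured copies.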

Addressable

Addressable data is available as a standard asset across cross-functional, regional, and multi-cloud environments. This means having a common and unique address that conforms to the organizational structure and is accessible via code or low-code/no-code platforms.
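
One way to picture a common and unique address is a small, parseable naming scheme. The sketch below assumes a made-up format, dp://<domain>/<product>@v<version>, purely for illustration; the scheme and the DataProductAddress class are hypothetical and not an established standard.

```python
import re
from dataclasses import dataclass

# Hypothetical address format: dp://<domain>/<product>@v<version>
_ADDRESS = re.compile(
    r"dp://(?P<domain>[a-z0-9-]+)/(?P<product>[a-z0-9-]+)@v(?P<version>\d+)"
)


@dataclass(frozen=True)
class DataProductAddress:
    domain: str
    product: str
    version: int

    @classmethod
    def parse(cls, address):
        """Parse an address string, rejecting anything outside the format."""
        match = _ADDRESS.fullmatch(address)
        if match is None:
            raise ValueError(f"not a valid data product address: {address!r}")
        return cls(match["domain"], match["product"], int(match["version"]))

    def __str__(self):
        return f"dp://{self.domain}/{self.product}@v{self.version}"
```

Keeping the domain in the address is one way for it to conform to the organizational structure, and a strict parser means the same string resolves identically from code or from a low-code tool.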

Trustworthy

Trustworthy data meets quality expectations, adheres to standards, and includes lineage and provenance metadata. It allows business users to rely confidently on data, supported by governance systems that manage quality without manual intervention.
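
Quality expectations of this kind can be written down as declarative checks that run automatically. The following is a minimal sketch, assuming rows are dictionaries; QualityCheck, run_checks, is_trustworthy, and the two example checks are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class QualityCheck:
    """A named expectation over a batch of rows (hypothetical shape)."""
    name: str
    predicate: Callable[[list], bool]


def run_checks(rows, checks):
    """Map each check name to whether the rows satisfy it."""
    return {check.name: bool(check.predicate(rows)) for check in checks}


def is_trustworthy(rows, checks):
    """True only if every declared expectation holds."""
    return all(run_checks(rows, checks).values())


# Example expectations, assumed for illustration only.
EXAMPLE_CHECKS = [
    QualityCheck("not_empty", lambda rows: len(rows) > 0),
    QualityCheck("email_present", lambda rows: all(r.get("email") for r in rows)),
]
```

Returning a per-check result, and not just a single pass or fail, gives consumers and governance tooling something concrete to inspect when a product falls short.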

Natively Accessible

Natively accessible data is agnostic to language, format, personnel, and systems. It includes support for DSLs, low-level subsystems, multiple programming languages, and the ability to onboard new ones through native transpilers, which makes data more usable across all roles.

Interoperable

Data generated in or sourced from any system should be able to work with other data without running into conflicts. This is possible through data APIs or logical constructs and loosely coupled yet tightly integrated components that sit on top of unique and addressable data to ensure visibility across the required use cases with end-to-end on-the-fly governance. Interoperability also requires universal semantics that are manageable through a central control plane.

Valuable on its own

Data is valuable on its own if it acts as a complete and independent entity that directly impacts business decisions. This requires the data to adhere to universal semantics, business quality assertions, and use-case-specific governance standards.

Examples of Data Products

Now that we have a better understanding of what data products are and why they are necessary for the data ecosystem, let’s look at a few examples to get a clearer picture:

  • A spreadsheet in an S3 bucket
  • A table view knitted from across heterogeneous sources
  • A report generated from an analytics dashboard
  • A metrics layer on top of a data model
  • Features in an ML feature store
  • A database of dynamic rules in a self-driving car
  • An encrypted or masked dataview displayed through a SQL interface

The list is not exhaustive. In short, any data or logical data construct that embodies the attributes discussed above can be declared and used as a data product.

How is a Data Product materialized?

Data products are the outcome of elegant DataOps and agile methodologies. While there are multiple ways to generate data products, maintaining them is the key challenge, and the primary barrier is data culture.

A unified architectural approach that decouples a central control plane from user data planes, with domain-based segregation, strikes the right tradeoff between technology enhancement and cultural enablement. Such an architecture also inherently decouples logical and physical data layers to restore ownership of data products to the business.

To deep-dive into each component of this architecture, feel free to refer to the Data Mesh Implementation of a Data Operating System.

Unified architecture with decoupled control and user planes, domain segregation, and decoupled logical and physical layers.
