Behind the trend of data mesh lies the true challenge of data decentralization

Article Data Governance 11.07.2024
By Laurent Nicolas-Guennoc

Thibault Lorrain, Senior Manager, and David Guede, Practice Leader, both in the Data Technologies teams at Converteo, regularly support large companies in defining and implementing their data governance strategies. Here they examine the fashionable concept of “data mesh.” In this interview conducted by Laurent Nicolas-Guennoc, Marketing Director of Converteo, Thibault and David dissect the data mesh to show what the concept truly enables, and under what conditions.

 

The term ‘data mesh’ has been a buzzword among digital and data professionals for the past few years. Is it just a fad, or a real trend?

Thibault Lorrain – As often happens when such a term emerges, we need to cut through the ambient noise and dig methodically into what the expression actually means.

The data mesh was first described by a consultant, Zhamak Dehghani, in 2019, in an article titled “How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh,” written while she was working at the American company Thoughtworks.

The key term for Dehghani is “decentralized.” A data mesh is both a technical architecture and a way of organizing teams, based on the decentralization of data. The assumption, and the objective, is that this decentralization makes it possible to respond better to data users and to the company's business needs.

David Guede – In English, “mesh” evokes a network or a set of interconnections: we seek to “mesh” the organization by distributing responsibility for data, and access to it, to the domains of the company where the users who need it actually work.

Distribution, domains: here we are. Behind the recent expression “data mesh” lies a whole body of software engineering literature. As early as 2003, in Domain-Driven Design: Tackling Complexity in the Heart of Software, Eric Evans described the need to define domains (the specific contexts in which software or data operates), to work with the experts in those domains (what we now call “the business”), and to describe clearly the relationships and interactions between these contexts in order to avoid ambiguities and inconsistencies.

In other words, there is no “mesh” without a “domain.” And we have been talking about domains for over twenty years, so this is more than just a trend; it is a fundamental principle.

What is the “data mesh” approach supposed to address?

TL – It is a counter-model to the “monolith,” the target of a long-standing critique: an IT department that, in the name of quality, security, and reliability, dictates the rules and processes, chooses the data platform, deploys it, and strictly controls access. The result: frustration on the business side, and a feeling that its needs are misunderstood and not addressed quickly enough…

Decades of organizational sociology have shown that the more this type of dependency is closed off and strictly enforced, the more it creates a fertile ground for workarounds.

DG – What is sometimes referred to as “shadow IT,” where a team resorts to its own tools, duplicates parts of infrastructure and data, and defines its own rules, is actually very common.

While the business may regain some leeway, this “everyone for themselves” approach cannot last without having negative effects on the organization: interoperability issues, the creation of silos, divergent practices and rules, data reliability problems, and often, a multiplication and redundancy of costs.

 

How can the dilemma between centralism and chaos be resolved?

TL – We generally prefer the expression “decentralized governance” to “data mesh,” because there is in fact a whole range of possible options rather than a single data mesh model. Decentralized governance proposes a third way, whose primary aim is to restore autonomy to the business units so that they gain in responsiveness.

In practical terms, as with any governance, the project must start with formalization steps: defining the roles of the technical and data teams on the one hand and those of the different “domains,” or meshes, of the organization on the other, and documenting the conditions for interoperability between domains. It is particularly essential to standardize the definition of indicators. Take revenue, for example: it is used by different teams (marketing, finance, etc.) but can be calculated differently by each of them. In our experience, the role of an external advisor in these major restructuring phases is essential for companies that genuinely want to build a consensus-driven approach.

DG – At Converteo, we also believe that data governance cannot be decentralized successfully without a cultural shift: treating data as an internal product. For data usage to adapt to the needs of different teams while remaining consistent across the company, there needs to be a “data product” that is standardized, reliable, and interoperable.

To ensure interoperability, the data product must meet very strict requirements. A good practice is to standardize information rigorously. Simple recommendations include defining a uniform format for dates and using ISO standards wherever possible. Some industries take this standardization very far, with standardized field names and very strict formats. Take, for example, the CDISC standards in the pharmaceutical industry, which have become established and have simplified everyone's work by facilitating exchanges, both internally to cross-reference data sources and externally to share data with partners. Another interesting approach is to make better use of metadata, which is often neglected today.
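As an illustration of the “uniform format for dates” recommendation, here is a minimal Python sketch that normalizes dates to ISO 8601 (YYYY-MM-DD). The list of input formats and the function name `to_iso_date` are assumptions made for the example, not something described by the interviewees. The sketch also assumes that ambiguous numeric dates are written day first, a convention each company would need to agree on explicitly.

```python
from datetime import datetime

# Hypothetical input formats that different domains are assumed to emit.
# Numeric dates are assumed to be day-first (e.g. 11/07/2024 = 11 July 2024).
KNOWN_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y", "%Y%m%d")


def to_iso_date(raw: str) -> str:
    """Return `raw` as an ISO 8601 calendar date (YYYY-MM-DD).

    Raises ValueError when the value matches none of the known formats,
    so that non-conforming data is rejected rather than silently guessed.
    """
    text = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next known format
    raise ValueError(f"unrecognized date format: {raw!r}")


if __name__ == "__main__":
    for value in ("11.07.2024", "11/07/2024", "20240711"):
        print(value, "->", to_iso_date(value))
```

Rejecting unknown formats, rather than guessing, keeps the rule explicit: a domain that wants to publish a new date format has to get it added to the shared list first.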

Advanced use of metadata allows for even greater data sharing across domains, with the metadata serving as a universal “dictionary” for the company. One cannot talk about a data product without mentioning data contracts, a very important topic for many of our clients. A data contract establishes sharing rules between the producer (responsible for the data) and the consumer (the data team): it defines the formats and the transformations required at collection time, and specifies which fields are mandatory and which may remain empty. This framework greatly accelerates the standardization phases and thus the cross-functional use of data within a company.
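To make the idea of a data contract more concrete, here is a minimal Python sketch of a contract that lists expected fields, their types, and whether each one is mandatory, together with a check that a record complies. Everything in it is hypothetical and chosen for illustration: the `ContractField` class, the `ORDERS_CONTRACT` example, the field names, and the `validate` function. Real data contracts typically cover more (transformations, ownership, service levels) and are often expressed in a dedicated schema language.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any


@dataclass(frozen=True)
class ContractField:
    """One field of a (hypothetical) data contract."""
    name: str
    type_: type
    required: bool = True  # False means the field may remain empty


# Hypothetical contract for an "orders" data product.
ORDERS_CONTRACT = (
    ContractField("order_id", str),
    ContractField("order_date", date),
    ContractField("revenue_eur", float),
    ContractField("promo_code", str, required=False),
)


def validate(record: dict[str, Any], contract: tuple[ContractField, ...]) -> list[str]:
    """Return the list of contract violations found in `record` (empty if compliant)."""
    errors = []
    for spec in contract:
        value = record.get(spec.name)
        if value is None:
            # A missing or empty value is only a violation for mandatory fields.
            if spec.required:
                errors.append(f"{spec.name}: missing mandatory field")
            continue
        if not isinstance(value, spec.type_):
            errors.append(
                f"{spec.name}: expected {spec.type_.__name__}, "
                f"got {type(value).__name__}"
            )
    return errors


if __name__ == "__main__":
    good = {"order_id": "A-1", "order_date": date(2024, 7, 11), "revenue_eur": 120.0}
    bad = {"order_id": "A-2", "revenue_eur": "120"}
    print(validate(good, ORDERS_CONTRACT))
    print(validate(bad, ORDERS_CONTRACT))
```

Returning every violation at once, instead of stopping at the first, gives the producer a complete list of what to fix before the data is shared.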

TL – The companies that are most successful in making the shift to decentralization are those that manage to lower the barriers to data usage within their business units, particularly by investing heavily in the user interface and by producing simple, clear documentation that is accessible to non-specialists.

It is no coincidence that the demand for “Data Product Owner” profiles, who possess both product management culture and data expertise, is growing during the operationalization phases of these new governance structures.

The parallel with the decentralization of the state is quite effective, I believe: local authorities have requested the means to exercise the new competencies that have been delegated to them. Business teams, the “domains” of the company that are calling for access to data, must also have the right resources—user-friendly tools, training, and new skills within the teams—to fully benefit from data decentralization.

 
