As companies struggle to process, store, and leverage ever-increasing amounts of structured and unstructured data, data governance is becoming a critical part of every company’s data management.
Governance not only helps a company understand and use its data, but also ensures everyone has access to the data they need, when they need it. “Data doesn’t have much value if it lies dormant in your system, where no one can gain insight from it,” says Salim Syed, head of engineering for Capital One Slingshot. “A well-governed data platform brings data out of that darkness.”
Effective governance also enables a company to implement and manage internal policies and standards related to the security and usage of data. This not only supports a company’s response to external compliance directives, but also standardizes the data for use across the company. Standardized data provides the “single source of truth” required for critical business decisions, as well as the data quality and trustworthiness teams need to do their jobs.
Data governance challenges
On the surface, implementing data governance might seem obvious and straightforward, but the act of governing data across a company’s teams and products introduces levels of complexity that many companies either half-heartedly attempt to address or avoid altogether.
Instilling the policies and protections of governance requires new mindsets around people, processes, and technology. “It’s not the run-time activities that persuade someone not to do governance,” says Syed. “It’s all the work that’s needed to set up governance.”
For many, the approach to data governance is to establish policies that are overseen by individual sectors of the business, which makes implementation all the more difficult. “Think about all the different teams that are doing that in a large organization,” explains Syed. “They all have to do that dependency check, and each team is also doing separate development work to meet those requirements, which is a lot of duplicated effort.”
A siloed data governance initiative that requires each team to monitor its own data dependencies takes time and effort away from other work as well. “It becomes cumbersome to innovate because at every step of innovation, you have to check if there are dependencies on your governance policies,” says Syed.
Siloed approaches also introduce the possibility of error and make it harder to ensure governance policies are applied consistently. These hurdles can erode buy-in from employees and stakeholders, undercutting whatever benefits the governance program does deliver.
A federated governance solution
In many companies, data is viewed as an IT asset, and thus an IT responsibility. Although that might have been true in the past, the volume and speed of data today, and the innovative ways companies are using their data, mean that data is the responsibility of, and a driving force for, all business units.
To build an effective data governance program that serves every area of the business, it’s best to centralize the framework to reduce errors and duplicated effort. “For federated teams to be successful in applying data management rules and governance, you can’t just set a policy and let every team go build technology to enforce it,” says Syed. A centralized approach is less complicated to monitor, facilitates data consistency and accuracy, and is easier to make transparent, all of which help with stakeholder buy-in. “If you have a centrally managed data platform, a centrally managed data ingestion pipeline, and a centrally managed data policy, then you only make changes to [the data] in one place,” Syed explains. This ensures data remains compliant, secure, and consistent wherever it is used.
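To make that “one place” idea concrete, here is a minimal sketch of what a centrally maintained policy module might look like; the module name, classification levels, and masking rules are hypothetical illustrations, not a description of Capital One’s platform. The assumption is that each team’s ingestion job imports and calls this one module rather than re-implementing its own checks.

```python
# central_policy.py (hypothetical module name): the single place where
# data-handling rules live. Team pipelines import this module rather than
# hard-coding their own rules, so a policy change is made once, here.

# Illustrative classification levels and the handling each one requires.
POLICY = {
    "public":       {"mask_fields": [],               "retention_days": 365},
    "internal":     {"mask_fields": [],               "retention_days": 180},
    "confidential": {"mask_fields": ["email", "ssn"], "retention_days": 90},
}

def apply_policy(record: dict, classification: str) -> dict:
    """Return a copy of `record` with the centrally defined rules applied."""
    rules = POLICY[classification]
    cleaned = dict(record)
    for field_name in rules["mask_fields"]:
        if field_name in cleaned:
            cleaned[field_name] = "***"          # mask restricted fields
    cleaned["_retention_days"] = rules["retention_days"]
    return cleaned
```

Under that assumption, adding a newly restricted field to the “confidential” rules propagates to every pipeline that calls apply_policy, with no per-team development work.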
A best practice in establishing a centralized data governance initiative, Syed argues, is to build a central data catalog. All incoming data is ingested to a central location where it is first classified: identified, labeled with metadata, and assigned restriction levels. From there, access and permissions can be assigned, which facilitates sharing across the organization. “With a centralized catalog, wherever your data resides, it’s the map,” explains Syed. “Once it’s cataloged and classified, then you can share. You can basically break the silo.”
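As a rough sketch of what such a catalog might track, assuming hypothetical field names, roles, and storage paths rather than any particular product’s schema, each entry below records where a dataset lives, how it is classified, and which roles may use it; registration is what makes the data discoverable and shareable.

```python
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    name: str                     # logical dataset name
    location: str                 # where the data physically resides
    classification: str           # e.g., "public", "internal", "confidential"
    tags: dict = field(default_factory=dict)        # descriptive metadata
    allowed_roles: set = field(default_factory=set) # who may access it

CATALOG: dict[str, CatalogEntry] = {}

def register(entry: CatalogEntry) -> None:
    """Classify-then-share: nothing is discoverable until it is cataloged."""
    CATALOG[entry.name] = entry

def can_access(dataset: str, role: str) -> bool:
    entry = CATALOG.get(dataset)
    return entry is not None and role in entry.allowed_roles

# Example: register one dataset, then check access for two roles.
register(CatalogEntry(
    name="card_transactions",
    location="s3://example-bucket/transactions/",   # illustrative path
    classification="confidential",
    tags={"owner": "payments-team", "refresh": "daily"},
    allowed_roles={"fraud-analyst"},
))
print(can_access("card_transactions", "fraud-analyst"))  # True
print(can_access("card_transactions", "marketing"))      # False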
A data catalog effectively creates a compliant, secure data marketplace that allows teams to access any data they have permissions to use. This is beneficial on several fronts. It assists in data discovery at scale, which in turn can decrease development time and spur innovation. Cataloged data also comes with context, making it easier and faster to understand and use when making business decisions. This level of data stability fosters data quality, data integrity, and data lineage, all of which instill trust in the data, an essential component of its value. If the data can’t be trusted, it can’t be used to inform business decisions.
Embracing the culture shift
No data governance initiative will be successful without buy-in and adoption. To cultivate the culture shift needed to implement data governance, it’s essential that the tools and processes work in concert. “It’s so important that the data catalog is always in sync with the data platform,” says Syed. “A lot of companies don’t pay attention to that problem. You forget to register something and then you have orphan data everywhere. If something is added or changed and it’s not reflected in your catalog, then it just completely loses its value.” If users are frustrated with a lack of functionality, adoption will be an uphill battle.
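One way to guard against that drift, offered here as an illustration rather than a prescribed design, is a routine reconciliation job that compares what actually sits in storage with what the catalog has registered and flags mismatches in both directions. The dataset names below are invented for the example.

```python
def find_orphans(datasets_in_storage: set[str], catalog_names: set[str]) -> dict:
    """Compare what the platform holds against what the catalog knows about."""
    return {
        # Data that exists but was never registered: invisible to consumers.
        "unregistered": datasets_in_storage - catalog_names,
        # Catalog entries whose data has moved or been deleted: a stale map.
        "stale_entries": catalog_names - datasets_in_storage,
    }

# Example run with illustrative dataset names.
in_storage = {"card_transactions", "support_tickets"}
in_catalog = {"card_transactions", "churn_scores"}
print(find_orphans(in_storage, in_catalog))
# {'unregistered': {'support_tickets'}, 'stale_entries': {'churn_scores'}}
```

Surfacing those mismatches routinely, rather than waiting for a frustrated user to report missing data, is one way to keep the catalog trustworthy enough to drive adoption.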
It’s also important that the tools and processes remain flexible. Tools need to accommodate the unique load patterns of each line of business, notes Syed. Otherwise, teams will find workarounds to achieve their goals (sometimes called shadow IT), which can jeopardize the integrity of a data governance framework.
Perhaps most important for adoption, the tools and processes need to be clear-cut. “It has to be simple and very easy to use,” says Syed. “And you really have to empower the business to own the data and treat data as a product—with its own service-level agreements, with its own quality, and own resiliency. That is also a completely different mindset that must be changed.”
This content was produced by Insights, the custom content arm of MIT Technology Review. It was not written by MIT Technology Review’s editorial staff.