Modern data platforms continue to grow in complexity to meet the changing needs of data consumers. Data analysts and data scientists demand faster access to data, but IT, security and governance are stuck, unable to figure out how to give access to the data in a simple, secure, and standardized way across a wide variety of analytic tools.
In fact, according to Gartner, through 2022, only 20 percent of organizations investing in information governance will succeed in scaling their digital businesses. As a result, organizations are designing data access frameworks that allow them to overcome the data delivery challenge, maintain scalability, and ensure universal data authorizations across all parties.
Why Modern Data Platforms are So Complex
Organizations of all sizes continue to leverage data to better understand their customers, achieve competitive advantage, and improve operational efficiency. To meet these needs, an enterprise data platform capable of handling the complexity of managing and using the data is essential.
One of the biggest challenges facing data platform teams today is how to make data universally accessible from the wide range of disparate storage systems (data lakes, data warehouses, relational databases, etc.) while meeting increasingly complex data governance and compliance requirements driven by emerging privacy legislation such as GDPR and CCPA.
This complexity is exacerbated by the disconnect between data stakeholder groups: the technical data platform and data architecture teams; centralized data security and compliance; data scientists and analysts sitting in the lines of business chartered with generating insights; and data owners and stewards responsible for building new data products.
Without proper data access and an authorization framework to help automate processes, the complexity of managing customer data and personally identifiable information (PII) will significantly affect productivity and limit the amount of available data that can be used.
How To Establish Cloud-Based Data Security and Regulatory Compliance
When data stakeholders are not in alignment, organizations become stuck on their data delivery journey. This is because data consumers need to be able to find the right dataset, understand its context, trust its quality, and access it in the tool of their choice — all while the data security and governance teams must be trusted to apply the correct data authorization and governance policies.
Accelerating time-to-insight on data platforms requires a solid framework that not only meets the needs of all stakeholders, but also provides the ability to scale as systems expand.
When designing or architecting a solution to ensure responsible data use, it is important to develop a universal data authorization framework that includes these six key capabilities:
1. Leverage Attribute-Based Access Control (ABAC)
Most organizations start creating access control policies using role-based access control (RBAC). This approach is useful for simple use cases, but since roles are manual and inherently static, every new use case requires the creation of a new role, with new permissions granted to the users who need it.
As the data platform grows in scale and complexity, the result is a painful policy environment called “role explosion.” Also, each system has its own standards of defining and managing permissions on roles, and RBAC is often limited to coarse-grained access (e.g. to an entire table or file).
Alternatively, ABAC allows organizations to define dynamic data authorization policies by leveraging attributes from multiple systems in order to make a context-aware decision on any individual request for access.
ABAC, a superset of RBAC, is able to support the complexity of granular policy requirements and expand data access to more people and use cases via three main categories of attributes (user, resource and/or environmental) that can be used to define policies.
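To make the three attribute categories concrete, here is a minimal sketch of an ABAC decision function. It is illustrative only: the `Policy` class, the `is_allowed` helper, and every attribute name (`department`, `classification`, `network`, and so on) are assumptions made for this example, not part of any particular product. Note that a role is simply one more user attribute, which is why ABAC can express anything RBAC can.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Attributes = Dict[str, Any]


@dataclass(frozen=True)
class Policy:
    name: str
    # The condition sees user, resource, and environment attributes together.
    condition: Callable[[Attributes, Attributes, Attributes], bool]


def is_allowed(policies: List[Policy], user: Attributes,
               resource: Attributes, env: Attributes) -> bool:
    """Default deny: allow only if at least one policy condition holds."""
    return any(p.condition(user, resource, env) for p in policies)


# Hypothetical policies; all attribute names are illustrative.
POLICIES = [
    Policy(
        name="analysts_read_non_pii_in_own_region",
        condition=lambda u, r, e: (
            u.get("department") == "analytics"
            and r.get("classification") != "pii"
            and u.get("region") == r.get("region")
        ),
    ),
    Policy(
        name="privacy_officers_read_pii_on_corporate_network",
        condition=lambda u, r, e: (
            u.get("role") == "privacy_officer"  # a role is just an attribute
            and r.get("classification") == "pii"
            and e.get("network") == "corporate"
        ),
    ),
]

analyst = {"department": "analytics", "region": "eu"}
officer = {"role": "privacy_officer", "region": "eu"}
orders = {"table": "orders", "classification": "internal", "region": "eu"}
customers = {"table": "customers", "classification": "pii", "region": "eu"}

print(is_allowed(POLICIES, analyst, orders, {"network": "vpn"}))           # True
print(is_allowed(POLICIES, analyst, customers, {"network": "corporate"}))  # False
print(is_allowed(POLICIES, officer, customers, {"network": "corporate"}))  # True
print(is_allowed(POLICIES, officer, customers, {"network": "public"}))     # False
```

The same request can be allowed or denied depending on the environment, which a static role assignment cannot express without creating additional roles.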
2. Dynamically Enforce Access Policies
Most existing solutions for policy enforcement still require maintaining multiple copies of each dataset, and the cost of creating and maintaining these can quickly add up. Simply leveraging ABAC to define policies doesn’t completely alleviate the pain, even when the attributes are evaluated against the access policy at the decision point, because the resulting decision still points toward a static copy of the data.
Once the demanding job of defining attributes and policies is completed, they should be pushed down to the enforcement engine to dynamically filter and transform the data by redacting a column, or applying data transformations like anonymization, tokenization, masking, or even advanced techniques such as differential privacy.
Dynamic enforcement is key to increasing the granularity of access policies without increasing complexity in the overall data system. It’s also key to ensuring the organization remains highly responsive to changing governance requirements.
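The idea of transforming data at read time, rather than maintaining a separate copy per audience, can be sketched as follows. This is a simplified illustration under stated assumptions: the `clearance` attribute, the column names, and the helper functions are all hypothetical, and the unsalted hash stands in for tokenization only to keep the example short (a real system would use a keyed or vault-backed scheme).

```python
import hashlib
from typing import Any, Callable, Dict, Iterable, Iterator

Row = Dict[str, Any]
Transform = Callable[[Any], Any]


def redact(_value: Any) -> None:
    """Remove the value entirely."""
    return None


def mask_email(value: str) -> str:
    """Keep the first character and the domain, hide the rest."""
    local, _, domain = value.partition("@")
    return local[:1] + "***@" + domain


def tokenize(value: Any) -> str:
    """Deterministic stand-in token (illustrative, not production-grade)."""
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()[:12]


def transforms_for(user: Dict[str, Any]) -> Dict[str, Transform]:
    """Pick per-column transforms from the user's attributes (hypothetical rule)."""
    if user.get("clearance") == "pii":
        return {}
    return {"email": mask_email, "ssn": redact, "customer_id": tokenize}


def enforce(rows: Iterable[Row], user: Dict[str, Any]) -> Iterator[Row]:
    """Apply the transforms while reading; the stored rows are never changed."""
    rules = transforms_for(user)
    for row in rows:
        yield {col: (rules[col](val) if col in rules else val)
               for col, val in row.items()}


ROWS = [{"customer_id": 42, "email": "ada@example.com",
         "ssn": "123-45-6789", "country": "UK"}]

for row in enforce(ROWS, {"clearance": "none"}):
    print(row["email"], row["ssn"], row["country"])  # a***@example.com None UK
```

Both audiences read from the same single dataset; only the view produced at access time differs.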
3. Create a Unified Metadata Layer
If ABAC is the engine needed to drive scalable, secure data access, then metadata is the engine’s fuel. It provides visibility into the what and where of the organization’s datasets and is required to construct attribute-based access control policies. A richer layer of metadata also enables organizations to create more granular and relevant access policies.
There are four key areas to consider when architecting the metadata lifecycle:
- Access: How can we enable seamless access via API, in order to leverage metadata for policy decisions?
- Unification: How can we create a unified metadata layer?
- Metadata Drift: How do we ensure the metadata is up to date?
- Discovery: How can we discover new technical and business metadata?
The challenge is that metadata, just like data, typically exists in multiple places in the enterprise and is owned by different teams. Each analytical engine requires its own technical metastore, whereas governance teams maintain the business context and classifications within a business catalog like Collibra or Alation.
Therefore, organizations need to federate and unify their metadata so that the complete set is available in real time for governance and access control policies. By necessity, this unification is done via an abstraction layer, since it would be unreasonable, and almost impossible, to expect to have metadata defined in a single place.
Unifying metadata on a continuous basis establishes a single source of truth with respect to data. This helps to avoid “metadata drift” or “schema drift” (aka inconsistency in data management) over time and enables effective data governance and business processes such as data classification or tagging across the organization. It also establishes a unified data taxonomy, making data discovery and access easier for data consumers.
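As a rough sketch of what unification and drift detection involve, the example below joins a technical metastore with a business catalog and reports where the two disagree. The dictionary shapes, the `unify` and `detect_drift` functions, and the sample table are assumptions made for illustration; real metastores and catalogs expose this information through their own APIs.

```python
from typing import Dict, List

# Hypothetical technical metastore: table -> column -> data type.
TECHNICAL = {
    "customers": {"id": "bigint", "email": "string", "signup_ts": "timestamp"},
}
# Hypothetical business catalog: table -> column -> classification tags.
BUSINESS = {
    "customers": {"email": ["pii"], "phone": ["pii"]},
}


def unify(technical: Dict[str, Dict[str, str]],
          business: Dict[str, Dict[str, List[str]]]) -> Dict[str, Dict[str, dict]]:
    """Build one view per column that carries both type and business tags."""
    unified = {}
    for table, columns in technical.items():
        tags = business.get(table, {})
        unified[table] = {
            col: {"type": typ, "tags": list(tags.get(col, []))}
            for col, typ in columns.items()
        }
    return unified


def detect_drift(technical: Dict[str, Dict[str, str]],
                 business: Dict[str, Dict[str, List[str]]]) -> Dict[str, dict]:
    """Report columns known to only one of the two sources."""
    drift = {}
    for table in sorted(set(technical) | set(business)):
        tech_cols = set(technical.get(table, {}))
        biz_cols = set(business.get(table, {}))
        unclassified = sorted(tech_cols - biz_cols)  # exist, but no catalog entry
        stale = sorted(biz_cols - tech_cols)         # cataloged, but no longer exist
        if unclassified or stale:
            drift[table] = {"unclassified": unclassified, "stale": stale}
    return drift


print(unify(TECHNICAL, BUSINESS)["customers"]["email"])
# {'type': 'string', 'tags': ['pii']}
print(detect_drift(TECHNICAL, BUSINESS))
# {'customers': {'unclassified': ['id', 'signup_ts'], 'stale': ['phone']}}
```

Running a check like this on a schedule is one simple way to notice that a newly added column has not yet been classified before a policy silently fails to cover it.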
Metadata management tools that use artificial intelligence to automate parts of the metadata lifecycle are also helpful as they can perform tasks like identifying sensitive data types and applying the appropriate data classification, automating data discovery and schema inference, and automatically detecting metadata drift.
4. Enable Distributed Stewardship
Scaling secure data access is not just a matter of scaling the types of policies and enforcement methods. The process of policy decision-making must also be able to scale because the types of data available, and the business requirements needed to leverage it, are so diverse and complex.
In the same way that the enforcement engine could be a bottleneck if not properly architected, the lack of an access model and user experience that enables non-technical users to manage these policies will get in the way of an organization’s ability to scale access control.
Effective data access management should seek to embrace the unique needs of all constituents, not obstruct them. Unfortunately, many access management tools require complex change management and the development of bespoke processes and workflows to be effective. Enterprises need to ask early on how the access model will adapt to their organization.
To enable distributed stewardship, the access system should support two key areas. First, delegate the management of data and access policies to people in the lines of business (data stewards and administrators) who understand the data or governance requirements, while replicating centralized governance standards across groups in the organization. Second, ensure that change can be propagated consistently throughout the organization.
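One way to picture these two areas is a policy store in which stewards manage grants only within their own domains, while centrally defined standards are applied to every decision. The sketch below is an assumption-laden illustration: `PolicyStore`, its fields, and the steward, domain, and tag names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class PolicyStore:
    # Central standard: columns carrying any of these tags are always masked.
    central_masked_tags: Set[str]
    # steward -> domains that steward is allowed to manage.
    stewardship: Dict[str, Set[str]]
    # domain -> dataset -> groups granted access by a steward.
    grants: Dict[str, Dict[str, Set[str]]] = field(default_factory=dict)

    def grant(self, steward: str, domain: str, dataset: str, group: str) -> None:
        """Delegated management: only a domain's own steward may grant access."""
        if domain not in self.stewardship.get(steward, set()):
            raise PermissionError(f"{steward} does not steward {domain}")
        self.grants.setdefault(domain, {}).setdefault(dataset, set()).add(group)

    def decide(self, domain: str, dataset: str, group: str,
               column_tags: Dict[str, Set[str]]) -> Optional[Dict[str, str]]:
        """Return per-column visibility, or None if the group has no grant."""
        if group not in self.grants.get(domain, {}).get(dataset, set()):
            return None
        # Central standards are evaluated at decision time for every domain.
        return {col: ("masked" if tags & self.central_masked_tags else "clear")
                for col, tags in column_tags.items()}


store = PolicyStore(central_masked_tags={"pii"},
                    stewardship={"maria": {"marketing"}})
store.grant("maria", "marketing", "campaigns", "marketing_analysts")

tags = {"email": {"pii"}, "spend": {"financial"}}
print(store.decide("marketing", "campaigns", "marketing_analysts", tags))
# {'email': 'masked', 'spend': 'clear'}

# A change to the central standard applies everywhere on the next decision.
store.central_masked_tags.add("financial")
print(store.decide("marketing", "campaigns", "marketing_analysts", tags))
# {'email': 'masked', 'spend': 'masked'}
```

Because the central rule is consulted on every decision rather than copied into each domain's policies, a change made once by the governance team propagates without any steward having to act.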
5. Ensure Easy Centralized Auditing
Knowing where sensitive data lives, who is accessing it, and who has permission to access it is critical for enabling intelligent access decisions.
Yet auditing is a consistent challenge for governance teams, since there is no single standard across the variety of tools in the modern enterprise environment. Collating audit logs across various systems so that governance teams can answer basic questions is painful and does not scale.
Even the governance team, despite setting the policies at the top level, has no easy way to confirm that its policies are being enforced at the time of data access and that the organization’s data is actually being protected.
Centralized auditing with a consistent schema is critical for generating reports on how data is being used and can enable automated data breach alerts through a single integration with the enterprise SIEM. Organizations are also looking to solutions built around an audit log schema, since these enable governance teams to answer audit questions, whereas many log management solutions are more focused on application logs.
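A consistent schema can be as simple as one record type that every source system's log is normalized into. The sketch below assumes two invented log formats (a warehouse query log and an object-store access log); the field names such as `principal`, `stmt_type`, and `requester` are hypothetical and do not describe any specific product's log format.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict, List


@dataclass(frozen=True)
class AuditEvent:
    timestamp: str  # ISO 8601, always UTC
    user: str
    dataset: str
    action: str     # "read", "write", or "other"
    decision: str   # "allow" or "deny"
    source: str


def from_warehouse(record: Dict[str, Any]) -> AuditEvent:
    """Normalize a hypothetical warehouse query-log record."""
    ts = datetime.fromtimestamp(record["epoch"], tz=timezone.utc).isoformat()
    actions = {"select": "read", "insert": "write",
               "update": "write", "delete": "write"}
    return AuditEvent(ts, record["principal"], record["object"],
                      actions.get(record["stmt_type"].lower(), "other"),
                      "allow" if record["granted"] else "deny", "warehouse")


def from_lake(record: Dict[str, Any]) -> AuditEvent:
    """Normalize a hypothetical object-store access-log record."""
    parsed = datetime.strptime(record["time"], "%Y-%m-%dT%H:%M:%SZ")
    ts = parsed.replace(tzinfo=timezone.utc).isoformat()
    actions = {"GetObject": "read", "PutObject": "write"}
    return AuditEvent(ts, record["requester"], record["dataset"],
                      actions.get(record["operation"], "other"),
                      "allow" if record["status"] < 400 else "deny", "lake")


def accesses_to(events: List[AuditEvent], dataset: str) -> List[AuditEvent]:
    """Who touched this dataset, regardless of which system served it?"""
    return [e for e in events if e.dataset == dataset]


def denied(events: List[AuditEvent]) -> List[AuditEvent]:
    """Candidate events to forward to alerting."""
    return [e for e in events if e.decision == "deny"]


events = [
    from_warehouse({"epoch": 1700000000, "principal": "ada", "object": "customers",
                    "stmt_type": "SELECT", "granted": True}),
    from_lake({"time": "2023-11-15T08:30:00Z", "requester": "bob",
               "dataset": "customers", "operation": "GetObject", "status": 403}),
]
print([e.user for e in accesses_to(events, "customers")])  # ['ada', 'bob']
print([e.user for e in denied(events)])                    # ['bob']
```

Once every source is mapped into the same record type, a question such as "who attempted to read the customers dataset?" becomes a single filter rather than one query per system.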
Another consideration is to invest in a basic visibility mechanism early in the data platform journey to help data stewards and governance teams understand data usage and help demonstrate the value of the platform. Once the business knows what data it has and how people are using it, teams can design more effective access policies around it.
Lastly, look for a flexible, API-driven architecture to ensure that the access control framework is future-proof and capable of adapting to the needs of the data platform.
6. Future-Proof Integrations
Integrating with an organization’s broader environment is a key factor in any successful access control approach, as the data platform will likely change over time as data sources and tools evolve. The access control framework must therefore be adaptable and support flexible integrations across the data fabric.
One advantage of using ABAC for access control is that attributes can come from existing systems within the organization, provided that attributes can be retrieved in a performant way in order to make dynamic policy decisions.
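The performance caveat is commonly addressed by caching attribute lookups for a short period. The following sketch shows one possible shape for this, assuming pluggable provider functions and a time-to-live cache; `AttributeResolver`, the provider names, and the returned attributes are hypothetical and stand in for calls to systems such as an HR directory or identity provider.

```python
import time
from typing import Any, Callable, Dict, Tuple

# A provider takes a user id and returns that system's attributes for the user.
Provider = Callable[[str], Dict[str, Any]]


class AttributeResolver:
    """Merge attributes from several systems, caching each user's result."""

    def __init__(self, providers: Dict[str, Provider],
                 ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._providers = providers
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[float, Dict[str, Any]]] = {}

    def resolve(self, user_id: str) -> Dict[str, Any]:
        now = self._clock()
        hit = self._cache.get(user_id)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh: no calls to the source systems
        merged: Dict[str, Any] = {}
        for name, provider in self._providers.items():
            for key, value in provider(user_id).items():
                merged[f"{name}.{key}"] = value  # namespace by source system
        self._cache[user_id] = (now, merged)
        return merged


# Hypothetical providers standing in for real directory lookups.
def hr_directory(user_id: str) -> Dict[str, Any]:
    return {"department": "analytics"}


def identity_provider(user_id: str) -> Dict[str, Any]:
    return {"groups": ["eu-analysts"]}


resolver = AttributeResolver({"hr": hr_directory, "idp": identity_provider})
print(resolver.resolve("ada"))
# {'hr.department': 'analytics', 'idp.groups': ['eu-analysts']}
```

The time-to-live is a trade-off: a longer value reduces load on the source systems, while a shorter value means a revoked attribute stops influencing decisions sooner. Adding a new source system is a matter of registering one more provider.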
Creating a flexible foundation also prevents the organization from having to figure out the entire architecture from day one. Instead, they can start with a few key tools and use cases and add more as they understand how the organization uses data.
After all, policy insight is a continuum, and interesting insights sit at the overlap of key questions such as: What sensitive data do we have? Who is accessing it and why? Who should have access?
Some organizations choose to focus on open source for this reason since they have the option to customize integrations to meet their needs. However, a key consideration is that building and maintaining these integrations can quickly become a full-time job.
In the ideal scenario, the data platform team should remain lean and have low operational overhead. Investing time into engineering and maintaining integrations is unlikely to provide differentiation to the organization, especially when several high-quality integration tools exist in the ecosystem.
Success with Universal Data Authorization
As with any big initiative, it’s important to take a step back and leverage a design-to-value approach when trying to secure data access. This means finding the highest value data domains that need access to sensitive data and enabling or unblocking them first, as well as trying to establish visibility on how data is being used today in order to prioritize action.
Organizations are making significant investments in their data platforms in order to unlock new innovation; however, data efforts will continue to be blocked at the last mile without an underlying framework.
Scaling secure, universal data authorization can be a tremendous enabler of agility within the organization, and by leveraging the six principles above, organizations can ensure that they’re staying ahead of the curve and designing the right underlying framework that will make all stakeholders successful.