Sharing data
A cloud-native philosophy enables a high degree of transferability among data and processes. GCP's BigQuery architecture has built-in tools that enable sharing data efficiently and safely with audiences across your organization and across the world.
Topics Include:
- Defining rules to share data
- Publishing datasets
- Publishing reports and visualizations
- Analytics Hub
GCP Professional Data Engineer Certification Preparation Guide (Nov 2023)
Defining rules to share data
Data products are meant to be shared, analyzed, recycled, and acted upon, but they must be shared in accordance with pre-defined organizational rules and policies. To support this, GCP provides tools and best practices for sharing data within your organization or with external parties while controlling access.
Data is meant to be shared. Whether you are producing a report for management or creating a publicly searchable human genome dataset, the process is essentially the same: identify who the audience for the data is, how they will access it, and which rules should be applied to ensure a private, safe, and secure experience for all stakeholders.
Tools like Dataplex can provide sophisticated rules that are applied automatically as you develop data in your data warehouse. Data Catalog allows you to apply tags to data assets such as tables, datasets, or views. You can use Dataplex and Data Catalog to organize your data into hierarchical groups and enable access based upon user classification and categorization. Dataplex data quality checks also help ensure that the data you share is of high quality and can provide the information your audience needs.
Identifying the target audience is the first step when choosing to share data. This should be a rigorous process, and it sometimes requires very sophisticated rules to determine who should (and who shouldn't) have access to your data. It depends on a well-designed organizational resource hierarchy, in which the permissions tree typically reflects the org chart. When you're working with a data council on a data mesh project, you can work out the sets of rules needed to ensure proper access controls.
Use Data Catalog tag templates to attach metadata that makes your data searchable by users within your organization. Use IAM to control access at the level of the resource hierarchy, so that you do not have to apply permissions to individual datasets one by one.
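The benefit of granting access high in the resource hierarchy can be illustrated with a small model. The sketch below is not the IAM API. It is a simplified simulation of the documented behavior that allow policies set on a parent resource are inherited by its descendants. The resource names, the PARENT and BINDINGS tables, and the effective_members helper are all hypothetical and exist only for illustration.

```python
from __future__ import annotations

# Simplified model of IAM allow-policy inheritance (illustration only, not the IAM API).
# Each resource maps to its parent; None marks the root of the hierarchy.
PARENT = {
    "organizations/example-org": None,
    "folders/analytics": "organizations/example-org",
    "projects/sales-data": "folders/analytics",
    "datasets/orders": "projects/sales-data",
    "datasets/returns": "projects/sales-data",
}

# Role bindings attached directly to a resource: resource -> role -> members.
BINDINGS = {
    "folders/analytics": {
        "roles/bigquery.dataViewer": {"group:analysts@example.com"},
    },
    "datasets/returns": {
        "roles/bigquery.dataViewer": {"user:auditor@example.com"},
    },
}


def effective_members(resource: str, role: str) -> set[str]:
    """Return the union of members holding `role` on the resource or any ancestor."""
    members: set[str] = set()
    node = resource
    while node is not None:
        members |= BINDINGS.get(node, {}).get(role, set())
        node = PARENT[node]
    return members


if __name__ == "__main__":
    for dataset in ("datasets/orders", "datasets/returns"):
        print(dataset, sorted(effective_members(dataset, "roles/bigquery.dataViewer")))
```

In this model, a single binding on the folder gives the analysts group read access to every dataset beneath it, while the auditor's direct binding applies only to one dataset.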
BigQuery
Within BigQuery there are generally two ways to share data: authorized views and authorized datasets. Each applies inclusionary rules to determine which parties or users have permission to view your data. Another option is to use tags to create rules for data access. For example, you can add a tag that limits access based upon a user role, such as analyst or engineer. Combine this with Sensitive Data Protection (Cloud DLP) to safely share and profit from data generated within your organization.
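As a rough illustration of how authorized views and authorized datasets are expressed, the sketch below builds the access entries that would be added to the source dataset's access list. The field names follow the shape of the access entries in the BigQuery REST datasets resource as best understood here, so verify them against the current API reference. The project, dataset, and view names and the helper functions are hypothetical, and the code only manipulates dictionaries. It does not call BigQuery.

```python
import copy


def view_entry(project, dataset, view):
    """Access entry that authorizes one view to read the source dataset."""
    return {"view": {"projectId": project, "datasetId": dataset, "tableId": view}}


def dataset_entry(project, dataset):
    """Access entry that authorizes all views in another dataset."""
    return {
        "dataset": {
            "dataset": {"projectId": project, "datasetId": dataset},
            "targetTypes": ["VIEWS"],
        }
    }


def authorize(source_access, entry):
    """Return a new access list with `entry` appended unless it is already present."""
    updated = copy.deepcopy(source_access)
    if entry not in updated:
        updated.append(entry)
    return updated


# Hypothetical existing access list on the private source dataset.
source_access = [{"role": "OWNER", "specialGroup": "projectOwners"}]

with_view = authorize(source_access, view_entry("example-project", "shared_views", "orders_summary"))
with_both = authorize(with_view, dataset_entry("example-project", "reporting_views"))
```

Users then need read access only to the dataset that holds the views, not to the source dataset itself.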
You can also apply individual user permissions to datasets, but this becomes cumbersome to maintain over time. Using tags to control access permissions is a more sustainable and robust solution. Use tools such as dynamic data masking and column-level security to mask fields based upon a user's role.
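The idea behind role-based masking can be shown with a short simulation. In BigQuery, masking is configured through policy tags and data policies rather than application code, so the sketch below is only a conceptual model. The role names, the MASKING_RULES and CLEAR_TEXT_COLUMNS tables, and the helper functions are hypothetical.

```python
import hashlib

# Hypothetical masking rule per sensitive column, applied when the reader
# lacks clear-text access to that column.
MASKING_RULES = {"email": "hash", "ssn": "nullify"}

# Hypothetical mapping of role -> columns that role may read in clear text.
CLEAR_TEXT_COLUMNS = {"engineer": {"email", "ssn"}, "analyst": set()}


def mask_value(value, rule):
    """Apply one masking rule to a single value."""
    if rule == "hash":
        return hashlib.sha256(str(value).encode("utf-8")).hexdigest()
    if rule == "nullify":
        return None
    raise ValueError(f"unknown masking rule: {rule}")


def read_row(row, role):
    """Return the row as the given role would see it."""
    allowed = CLEAR_TEXT_COLUMNS.get(role, set())
    visible = {}
    for column, value in row.items():
        rule = MASKING_RULES.get(column)
        if rule is None or column in allowed:
            visible[column] = value  # unprotected column, or role has clear-text access
        else:
            visible[column] = mask_value(value, rule)
    return visible


row = {"order_id": 7, "email": "a@example.com", "ssn": "123-45-6789"}
analyst_view = read_row(row, "analyst")
engineer_view = read_row(row, "engineer")
```

The same query result is returned to both roles, but the analyst sees a hashed email and a null SSN while the engineer sees the original values.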
Publishing datasets
GCP offers a few tools and methods which can be used to share datasets with various audiences.
Datasets can be published and made searchable in various ways within GCP. These include making data searchable within Dataplex by attaching business and technical metadata to incoming objects, sharing datasets within BigQuery, and using Analytics Hub to publish and access data.
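Sharing a dataset within BigQuery can be expressed as a SQL data control statement. The following sketch builds a GRANT statement that gives read access on a dataset to a list of members. It follows the GRANT ... ON SCHEMA ... TO syntax of BigQuery's data control language as understood here, so check the DCL reference before relying on it. The project, dataset, and member names and the grant_dataset_viewer helper are hypothetical. The function only builds a string, and the values are interpolated directly, so it should be used with trusted input only.

```python
def grant_dataset_viewer(project, dataset, members):
    """Build a GRANT statement giving the BigQuery Data Viewer role on a dataset."""
    if not members:
        raise ValueError("at least one member is required")
    # Members use IAM principal syntax, for example "group:analysts@example.com".
    member_list = ", ".join(f'"{member}"' for member in members)
    return (
        "GRANT `roles/bigquery.dataViewer` "
        f"ON SCHEMA `{project}`.`{dataset}` "
        f"TO {member_list}"
    )


statement = grant_dataset_viewer(
    "example-project",
    "sales",
    ["group:analysts@example.com", "user:partner@example.org"],
)
print(statement)
```

Granting to a group rather than to individual users keeps the statement stable as people join or leave the team.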
Dataplex is a useful tool for automatically collecting metadata and applying complex data quality and discovery rules to the individual datasets, lakes, and tables onboarded onto your data ecosystem. Dataplex can perform some very sophisticated discovery actions, such as automatically creating BigQuery external tables or filesets from your incoming structured or semi-structured data, as long as the data is formatted according to Dataplex standards. The resulting metadata is then made searchable within Dataplex according to the pre-defined governance rules.
Publishing reports and visualizations
GCP has tools and methods to safely and securely publish reports and visualizations.
Reports and visualizations can be published with Looker Studio or Looker (Core). Publishing reports and visualizations lets your intended audience access the data the way you intend it to be seen and digested. Use detailed, sophisticated reports to communicate the economic activity of your organization through your data.
Reports and visualizations breathe life into your data and help tell the story of your organization more completely and more compellingly than words or a spreadsheet alone can.
Analytics Hub
GCP's Analytics Hub is a high-performing platform for publishing and subscribing to data sources across organizations. It enables data producers to create, share, and profit from valuable data products.
Analytics Hub is a powerful and complex data discovery and sharing platform hosted by GCP. It provides a highly integrated security and privacy framework which can be tailored to your organization's security and privacy rules. With Analytics Hub you can publish your BigQuery datasets and set rules to enable monetization or public sharing of the datasets.
It is BigQuery's unique serverless architecture which enables the publish and subscribe model of dataset sharing via Analytics Hub. This capability is unmatched outside of GCP. Use Analytics Hub Data Exchanges to share almost any BigQuery object, from datasets to tables, views, and ML models. Objects can be shared via private exchanges with limited permissions or via public exchanges to share your data with the world. Share data with partners to produce powerful synergies and enable conversation.
You can protect source data by creating authorized views. Control Subscriber and Publisher access as the Analytics Hub administrator. Analytics Hub allows sharing read-only access to data resources via Listings, which are searchable entries in Analytics Hub Data Exchanges.
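The relationship between exchanges, listings, and subscribers described above can be sketched as a small in-memory model. This is not the Analytics Hub API. The class names, fields, and methods below are hypothetical, and the model only captures the ideas that listings are searchable entries in an exchange, that a private exchange limits who may subscribe, and that subscribing yields read-only access to the shared data.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Listing:
    listing_id: str
    source_dataset: str  # hypothetical reference, e.g. "publisher-project.sales"
    description: str = ""


@dataclass(frozen=True)
class LinkedDataset:
    project: str
    source_dataset: str
    read_only: bool = True  # subscribers never get write access to the source


@dataclass
class Exchange:
    name: str
    public: bool = False
    subscribers: set = field(default_factory=set)  # members allowed on a private exchange
    listings: dict = field(default_factory=dict)

    def publish(self, listing):
        """Add a listing to the exchange so it becomes discoverable."""
        self.listings[listing.listing_id] = listing

    def search(self, term):
        """Return listings whose id or description contains the search term."""
        term = term.lower()
        return [
            listing
            for listing in self.listings.values()
            if term in listing.listing_id.lower() or term in listing.description.lower()
        ]

    def subscribe(self, listing_id, member, target_project):
        """Give `member` a read-only linked dataset in their own project."""
        if not self.public and member not in self.subscribers:
            raise PermissionError(f"{member} may not subscribe to {self.name}")
        listing = self.listings[listing_id]
        return LinkedDataset(project=target_project, source_dataset=listing.source_dataset)


exchange = Exchange(name="partner-exchange", subscribers={"group:partners@example.org"})
exchange.publish(Listing("orders", "publisher-project.sales", "Daily order summaries"))
linked = exchange.subscribe("orders", "group:partners@example.org", "partner-project")
```

A public exchange would be modeled by setting public=True, which skips the subscriber check while still returning a read-only linked dataset.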