Storing the data Foxtrot Communications

Storing the data

Ingested data must be stored, and how you architect your storage layer will set the tone for your entire project. GCP offers a number of fully managed storage options. The correct choice depends on trade-offs among availability, durability, cost, and performance.

GCP Professional Data Engineer Certification Preparation Guide (Nov 2023)

Module Topics

Selecting storage systems
Planning for using a data warehouse
Using a data lake
Designing for a data mesh

Selecting storage systems


The storage layer sits at the core of your solution. Build it correctly and the rest of the solution will run smoothly; build it incorrectly and the resulting problems will be extremely difficult to fix and will plague the entire solution. Leverage GCP-native services to ensure a high-quality, cost-effective storage solution.

Topics Include:
  • Analyzing data access patterns
  • Choosing managed services (e.g., Bigtable, Cloud Spanner, Cloud SQL, Cloud Storage, Firestore, Memorystore)
  • Planning for storage costs and performance
  • Lifecycle management of data
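
As a rough illustration of how access patterns and lifecycle planning fit together, the sketch below encodes a few common rules of thumb in Python. The `AccessPattern` type and the `suggest_service` and `lifecycle_config` helpers are hypothetical names invented for this example, the rules are deliberately simplified, and the day thresholds are placeholders. The lifecycle dictionary follows the general JSON shape of Cloud Storage lifecycle rules (a `rule` list of `action` and `condition` pairs); check the current documentation before applying anything like it to a real bucket.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessPattern:
    """Hypothetical description of a workload; not a GCP API type."""
    data_shape: str             # "object", "relational", "document", "wide_column", "cache"
    global_scale: bool = False  # needs horizontal scale across regions


def suggest_service(pattern: AccessPattern) -> str:
    """Return a starting-point service for the pattern (simplified heuristic)."""
    if pattern.data_shape == "object":
        return "Cloud Storage"
    if pattern.data_shape == "cache":
        return "Memorystore"
    if pattern.data_shape == "document":
        return "Firestore"
    if pattern.data_shape == "wide_column":
        return "Bigtable"
    if pattern.data_shape == "relational":
        # Assumption: globally scaled relational workloads point to Spanner,
        # regional ones to Cloud SQL.
        return "Cloud Spanner" if pattern.global_scale else "Cloud SQL"
    raise ValueError(f"unknown data shape: {pattern.data_shape!r}")


def lifecycle_config(nearline_after_days: int, delete_after_days: int) -> dict:
    """Build a lifecycle configuration as a plain dict (illustrative thresholds)."""
    if not 0 <= nearline_after_days < delete_after_days:
        raise ValueError("expected 0 <= nearline_after_days < delete_after_days")
    return {
        "lifecycle": {
            "rule": [
                {
                    # Move objects to a colder storage class once they age.
                    "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
                    "condition": {"age": nearline_after_days},
                },
                {
                    # Delete objects once they pass the retention window.
                    "action": {"type": "Delete"},
                    "condition": {"age": delete_after_days},
                },
            ]
        }
    }
```

For example, `suggest_service(AccessPattern("relational", global_scale=True))` returns `"Cloud Spanner"`, and `lifecycle_config(30, 365)` produces a two-rule configuration that could be serialized with `json.dumps`.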

Planning for using a data warehouse


A data warehouse is an effective tool for structuring and leveraging your data for analysis and reporting. Use tools such as BigQuery to create a high-performance, serverless data warehouse. Organize your data effectively by mapping current and future architecture to current and future business requirements, and use existing tools to support your data access patterns.

Topics Include:
  • Designing the data model
  • Deciding the degree of data normalization
  • Mapping business requirements
  • Defining architecture to support data access patterns
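
To make the normalization trade-off concrete, the following sketch folds two normalized tables into one denormalized record per order, with the line items nested as a repeated field. BigQuery supports nested and repeated columns, so a shape like this can avoid a join at query time at the cost of more involved updates. The table and field names (`orders`, `line_items`, `order_id`, and so on) and the sample rows are invented for illustration.

```python
from collections import defaultdict

# Normalized form: two flat tables linked by order_id (hypothetical sample data).
orders = [
    {"order_id": 1, "customer": "acme"},
    {"order_id": 2, "customer": "globex"},
]
line_items = [
    {"order_id": 1, "sku": "A-100", "qty": 2},
    {"order_id": 1, "sku": "B-200", "qty": 1},
    {"order_id": 2, "sku": "A-100", "qty": 5},
]


def denormalize(orders, line_items):
    """Nest each order's line items inside the order record."""
    items_by_order = defaultdict(list)
    for item in line_items:
        # Drop the join key from the nested record; the parent already carries it.
        nested = {k: v for k, v in item.items() if k != "order_id"}
        items_by_order[item["order_id"]].append(nested)
    # Orders without line items get an empty list.
    return [{**order, "items": items_by_order[order["order_id"]]} for order in orders]


nested_orders = denormalize(orders, line_items)
```

Each element of `nested_orders` now carries its own `items` list, which mirrors how a repeated record column would hold the child rows alongside the parent.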

Using a data lake


A data lake can be a great solution for data storage and architecture. Although there are many ways to build a data lake, a few best practices apply to managing one in GCP.

Topics Include:
  • Managing the lake (configuring data discovery, access, and cost controls)
  • Processing data
  • Monitoring the data lake

Designing for a data mesh


Data mesh is a fairly new concept that encourages domain-driven data product development. GCP has a number of native tools that let you quickly and efficiently build a data mesh, segment data, and establish a federated governance model.

Topics Include:
  • Building a data mesh based on requirements by using Google Cloud tools (e.g., Dataplex, Data Catalog, BigQuery, Cloud Storage)
  • Segmenting data for distributed team usage
  • Building a federated governance model for distributed data systems
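
One way to picture federated governance is that each domain team owns its data products while a small set of central rules applies to all of them. The sketch below models that idea in plain Python. It does not call Dataplex, Data Catalog, or any other GCP API; the `DataProduct` class, the `REQUIRED_LABELS` rule set, and the helper functions are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical descriptor for a domain-owned data product."""
    name: str
    domain: str   # owning team, e.g. "sales"
    owner: str    # contact responsible for the product
    labels: dict = field(default_factory=dict)


# Central rules every domain must satisfy (illustrative, not a GCP feature).
REQUIRED_LABELS = ("data_classification", "retention_days")


def policy_violations(product: DataProduct) -> list:
    """Check one product against the shared, centrally defined rules."""
    problems = []
    if not product.owner:
        problems.append("missing owner")
    for label in REQUIRED_LABELS:
        if label not in product.labels:
            problems.append(f"missing label: {label}")
    return problems


def group_by_domain(products) -> dict:
    """Segment product names by owning domain for distributed team usage."""
    grouped = {}
    for product in products:
        grouped.setdefault(product.domain, []).append(product.name)
    return grouped


catalog = [
    DataProduct("orders_daily", "sales", "sales-data@example.com",
                {"data_classification": "internal", "retention_days": 365}),
    DataProduct("clickstream_raw", "marketing", "",
                {"data_classification": "internal"}),
]
```

Running `policy_violations` over `catalog` flags the second product for its missing owner and missing `retention_days` label, while `group_by_domain(catalog)` shows which team is responsible for which product.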