Ingesting and processing the data
The first stage of any pipeline is ingesting the data you are working with. GCP offers a number of proprietary technologies that help keep your stack consistent and performant.
Topics include:
- Planning the data pipelines, such as data transformations, networking, and encryption.
- Building the data pipelines, from cleansing the data to identifying core technologies, data transformations, and data integrations.
- Deploying and operationalizing the data pipelines, such as implementing Cloud Composer and CI/CD pipelines.
GCP Professional Data Engineer Certification Preparation Guide (Nov 2023)
Module Topics
Planning the data pipelines
Building the pipelines
Deploying and operationalizing the pipelines
Planning the data pipelines
Data pipelines are software applications. They require planning and standardization to operate at peak efficiency. Understand and plan for the various requirements of data pipelining in GCP.
Topics include:
- Defining data sources and sinks
- Defining data transformation logic
- Networking fundamentals
- Data encryption
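As a rough illustration of the planning items above, the following minimal Python sketch records a pipeline's source, sink, transformation logic, networking, and encryption choices in one place and checks that nothing is missing before any build work starts. The `PipelinePlan` class, its field names, and the resource names in the usage note are hypothetical and are not part of any GCP API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Record = Dict[str, object]


@dataclass
class PipelinePlan:
    """Hypothetical planning record for one pipeline (not a GCP API)."""
    source: str                      # where the data comes from
    sink: str                        # where the results are written
    transforms: List[Callable[[Record], Record]] = field(default_factory=list)
    private_network: bool = True     # assumed networking decision
    encryption_key: str = ""         # empty means "use the provider default"

    def problems(self) -> List[str]:
        """Return a list of planning gaps; an empty list means the plan is complete."""
        issues = []
        if not self.source:
            issues.append("no source defined")
        if not self.sink:
            issues.append("no sink defined")
        if not self.transforms:
            issues.append("no transformation logic defined")
        return issues

    def run(self, records: List[Record]) -> List[Record]:
        """Apply every transform, in order, to each record (dry run on sample data)."""
        out = []
        for record in records:
            for transform in self.transforms:
                record = transform(record)
            out.append(record)
        return out
```

For example, `PipelinePlan(source="orders-topic", sink="orders_table", transforms=[...]).problems()` returns an empty list once all three core pieces are defined, and `run` lets you try the transformation logic on a few sample records before committing to a service.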
Building the pipelines
After planning your pipeline, you can begin developing your solution. Identify and develop the components your pipelines require, relying on GCP core services to facilitate your efforts.
Topics include:
- Data cleansing
- Identifying the services (e.g., Dataflow, Apache Beam, Dataproc, Cloud Data Fusion, BigQuery, Pub/Sub, Apache Spark, Hadoop ecosystem, and Apache Kafka)
- Transformations
  - Batch
  - Streaming (e.g., windowing, late arriving data)
  - Language
  - Ad hoc data ingestion (one-time or automated pipeline)
- Data acquisition and import
- Integrating with new data sources
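The streaming topics above (windowing and late arriving data) can be sketched in plain Python without any streaming framework. This is a simplified illustration, not how Dataflow or Apache Beam implement it: the window size, the allowed lateness, and the use of "largest event time seen so far" as the watermark are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

WINDOW_SECONDS = 60    # assumed fixed (tumbling) window size
ALLOWED_LATENESS = 30  # assumed grace period after a window closes


def window_start(event_time: int) -> int:
    """Return the start of the fixed window that contains event_time."""
    return event_time - (event_time % WINDOW_SECONDS)


def count_by_window(event_times: Iterable[int]) -> Tuple[Dict[int, int], List[int]]:
    """Count events per window, given event times in arrival order.

    The watermark is approximated as the largest event time seen so far.
    An event is dropped when the watermark has already passed the end of
    its window plus the allowed lateness; otherwise it is counted, even
    if it arrived after its window closed.
    """
    counts: Dict[int, int] = defaultdict(int)
    dropped: List[int] = []
    watermark = 0
    for event_time in event_times:
        start = window_start(event_time)
        window_end = start + WINDOW_SECONDS
        if watermark >= window_end + ALLOWED_LATENESS:
            dropped.append(event_time)   # too late: window is finalized
        else:
            counts[start] += 1           # on time, or late but within grace
        watermark = max(watermark, event_time)
    return dict(counts), dropped


counts, dropped = count_by_window([5, 62, 50, 130, 10])
print(counts)   # {0: 2, 60: 1, 120: 1}
print(dropped)  # [10]
```

In this run, the event at time 50 arrives after the watermark has reached 62, so its window (0 to 60) has closed, but it is still within the 30-second grace period and is counted. The event at time 10 arrives after the watermark has reached 130 and is dropped.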
Deploying and operationalizing the pipelines
After developing your pipeline and its components, you should have a plan in place to deploy and operationalize it effectively. Tools such as Cloud Build can support frequent, automated releases built around a CI/CD workflow.
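Cloud Composer is a managed Apache Airflow service, and Airflow schedules work as a directed acyclic graph of tasks. The sketch below uses only the Python standard library to show the idea: it derives a valid execution order from task dependencies and rejects a graph that contains a cycle, which is the kind of check a CI step could run before deployment. The task names and the `validate` helper are hypothetical, and this is not Airflow or Cloud Build code.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set

# Hypothetical pipeline tasks; each task maps to the tasks it depends on.
TASKS: Dict[str, Set[str]] = {
    "ingest": set(),
    "cleanse": {"ingest"},
    "transform": {"cleanse"},
    "load": {"transform"},
}


def execution_order(tasks: Dict[str, Set[str]]) -> List[str]:
    """Return an order in which every task runs after its dependencies."""
    return list(TopologicalSorter(tasks).static_order())


def validate(tasks: Dict[str, Set[str]]) -> bool:
    """Return True if the task graph has no cycles and can be scheduled."""
    try:
        execution_order(tasks)
        return True
    except CycleError:
        return False


print(execution_order(TASKS))  # ['ingest', 'cleanse', 'transform', 'load']
```

Running a check like `validate(TASKS)` on every commit catches a broken dependency graph before the pipeline definition reaches the orchestrator.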