Ingesting and processing the data Foxtrot Communications

The first stage of any pipeline is ingesting the data you are working with. GCP offers a number of proprietary technologies that help ensure consistency and high performance throughout your stack.

Topics include:

Planning the data pipelines, such as data transformations, networking, and encryption.
Building the data pipelines, from cleansing the data to identifying core technologies, data transformations, and data integrations.
Deploying and operationalizing the data pipelines, such as implementing Cloud Composer and CI/CD pipelines.

GCP Professional Data Engineer Certification Preparation Guide (Nov 2023)

Module Topics

Planning the data pipelines
Building the pipelines
Deploying and operationalizing the pipelines

Planning the data pipelines

Data pipelines are software applications. They require planning and standardization to operate at peak efficiency. Understand and plan for the various requirements of data pipelining in GCP.

Topics include:

Defining data sources and sinks
Defining data transformation logic
Networking fundamentals
Data encryption
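
One way to make the planning items above concrete is to write the source, the transformation logic, and the sink as separate, swappable pieces before choosing any GCP service. The following is a minimal sketch in plain Python, not a GCP API: `PipelinePlan`, `read_orders`, and `to_dollars` are hypothetical names, and the in-memory list stands in for a real sink.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List

Record = dict


@dataclass
class PipelinePlan:
    """Hypothetical container tying a source, a transform, and a sink together."""
    source: Callable[[], Iterable[Record]]
    transform: Callable[[Record], Record]
    sink: Callable[[Record], None]

    def run(self) -> int:
        """Read every record, transform it, write it, and return the count."""
        count = 0
        for record in self.source():
            self.sink(self.transform(record))
            count += 1
        return count


def read_orders() -> Iterator[Record]:
    # Stand-in for a real source such as a message subscription or a file.
    yield {"order_id": "a1", "amount_cents": 1250}
    yield {"order_id": "b2", "amount_cents": 399}


def to_dollars(record: Record) -> Record:
    # Transformation logic kept pure so it can be tested without any I/O.
    return {"order_id": record["order_id"], "amount": record["amount_cents"] / 100}


written: List[Record] = []  # Stand-in for a real sink such as a warehouse table.

plan = PipelinePlan(source=read_orders, transform=to_dollars, sink=written.append)
processed = plan.run()
```

Keeping the three roles separate means a source or sink can later be swapped for a managed service without touching the transformation logic.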

Building the pipelines

After planning your pipeline, you can begin developing your solution. Identify and develop the components your pipelines require, relying on GCP core services to facilitate your efforts.

Topics include:

Data cleansing
Identifying the services (e.g., Dataflow, Apache Beam, Dataproc, Cloud Data Fusion, BigQuery, Pub/Sub, Apache Spark, Hadoop ecosystem, and Apache Kafka)
Transformations:

Batch
Streaming (e.g., windowing, late arriving data)
Language
Ad hoc data ingestion (one-time or automated pipeline)

Data acquisition and import
Integrating with new data sources
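
The streaming item above mentions windowing and late-arriving data. The idea can be illustrated without any streaming engine. The sketch below is plain Python and does not use the Dataflow or Apache Beam APIs. The window size, the allowed lateness, and the simplified watermark (the largest event time seen so far) are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

WINDOW_SECONDS = 60      # assumed fixed (tumbling) window size
ALLOWED_LATENESS = 30    # assumed grace period after a window closes

Event = Tuple[int, int]  # (event_time_seconds, value)


def window_start(event_time: int) -> int:
    """Return the start of the fixed window that contains event_time."""
    return event_time - (event_time % WINDOW_SECONDS)


def sum_by_window(events: Iterable[Event]) -> Tuple[Dict[int, int], List[Event]]:
    """Sum values per window in arrival order, dropping events that arrive too late.

    The watermark is simply the largest event time seen so far, which is a
    simplification of how a real streaming engine estimates progress.
    """
    sums: Dict[int, int] = defaultdict(int)
    dropped: List[Event] = []
    watermark = 0
    for event_time, value in events:
        watermark = max(watermark, event_time)
        window_end = window_start(event_time) + WINDOW_SECONDS
        if watermark > window_end + ALLOWED_LATENESS:
            dropped.append((event_time, value))  # too late, window is finalized
        else:
            sums[window_start(event_time)] += value
    return dict(sums), dropped


# Events listed in arrival order; the third and fifth arrive out of order.
arrivals = [(5, 1), (70, 2), (50, 3), (200, 4), (55, 5)]
totals, late = sum_by_window(arrivals)
```

Here the event at time 50 arrives after the event at time 70 but still falls within the grace period, so it is counted. The event at time 55 arrives after the watermark has moved to 200 and is dropped.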

Deploying and operationalizing the pipelines

Once your pipeline and its components are developed, you need a plan to deploy and operationalize them effectively. Tools such as Cloud Build support a high-frequency release cycle built around CI/CD and DevOps practices.
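
A CI/CD setup is only as useful as the checks it runs. As a minimal sketch of what a build step might execute before deploying a pipeline, the example below unit-tests a single transformation with Python's standard `unittest` module. `normalize_email` is a hypothetical cleansing step, and wiring the suite into a specific build tool is left out.

```python
import unittest


def normalize_email(raw: str) -> str:
    """Hypothetical cleansing step: trim whitespace and lowercase an email address."""
    return raw.strip().lower()


class NormalizeEmailTest(unittest.TestCase):
    def test_trims_and_lowercases(self):
        self.assertEqual(normalize_email("  Ada@Example.COM "), "ada@example.com")

    def test_is_idempotent(self):
        once = normalize_email(" Bob@Example.com")
        self.assertEqual(normalize_email(once), once)


def run_checks() -> bool:
    """Run the suite programmatically and report whether every test passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(NormalizeEmailTest)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()


ci_passed = run_checks()
```

A build step would typically fail the deployment when a check like `run_checks()` returns `False`, so a broken transformation never reaches production.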