Ingesting and processing the data

The first stage of any pipeline is ingesting the data you are working with. GCP provides a number of proprietary, managed services to ensure consistency and high performance throughout your stack.
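For event-style ingestion, Pub/Sub is a common entry point. The following is a minimal sketch of publishing an event from Python with the google-cloud-pubsub client; the project and topic names (my-project, raw-events) are hypothetical placeholders.

    import json

    from google.cloud import pubsub_v1

    # Hypothetical identifiers -- substitute your own project and topic.
    PROJECT_ID = "my-project"
    TOPIC_ID = "raw-events"

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(PROJECT_ID, TOPIC_ID)

    # Pub/Sub payloads are raw bytes, so serialize the event first.
    event = {"user_id": 42, "action": "page_view"}
    future = publisher.publish(topic_path, json.dumps(event).encode("utf-8"))

    # result() blocks until the service acknowledges and returns the message ID.
    print(f"Published message {future.result()}")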


Module Topics

  • Planning the data pipelines
  • Building the pipelines
  • Deploying and operationalizing the pipelines

Planning the data pipelines


Data pipelines are software applications: they require planning and standardization to operate at peak efficiency. Understand and plan for the various requirements of data pipelining in GCP; a minimal source-to-sink sketch follows the topic list below.

Topics Include:
  • Defining data sources and sinks
  • Defining data transformation logic
  • Networking fundamentals
  • Data encryption
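To make these planning topics concrete, the sketch below wires together a source, one piece of transformation logic, and a sink using Apache Beam; the bucket path, BigQuery table, and schema are hypothetical placeholders.

    import json

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # Hypothetical locations -- substitute your own bucket and table.
    SOURCE = "gs://my-bucket/raw/events-*.json"
    SINK = "my-project:analytics.events"

    def parse_event(line):
        # Transformation logic: decode one JSON line into a row dict.
        return json.loads(line)

    with beam.Pipeline(options=PipelineOptions()) as pipeline:
        (pipeline
         | "ReadSource" >> beam.io.ReadFromText(SOURCE)   # the source
         | "Transform" >> beam.Map(parse_event)           # the logic
         | "WriteSink" >> beam.io.WriteToBigQuery(        # the sink
               SINK,
               schema="user_id:INTEGER,action:STRING",
               write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))

Deciding what the sources and sinks are, and what the transformation between them must do, is exactly the planning work listed above; encryption and networking choices (e.g., customer-managed keys, private IPs) then constrain where this code can run.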

Building the pipelines


After planning your pipeline, you can begin developing your solution. Identify and develop the components your pipelines require while relying on GCP's core services to facilitate your efforts; a streaming windowing sketch follows the topic list below.

Topics Include:
  • Data cleansing
  • Identifying the services (e.g., Dataflow, Apache Beam, Dataproc, Cloud Data Fusion, BigQuery, Pub/Sub, Apache Spark, Hadoop ecosystem, and Apache Kafka)

  • Transformations
    • Batch
    • Streaming (e.g., windowing, late arriving data)
    • Language
    • Ad hoc data ingestion (one-time or automated pipeline)

  • Data acquisition and import
  • Integrating with new data sources
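As a concrete instance of the streaming topics above (windowing and late arriving data), the sketch below groups a Pub/Sub stream into one-minute fixed windows and tolerates events arriving up to five minutes late; the subscription name and window sizes are hypothetical placeholders.

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions
    from apache_beam.transforms import window
    from apache_beam.transforms.trigger import AccumulationMode, AfterWatermark

    # Hypothetical subscription -- substitute your own.
    SUBSCRIPTION = "projects/my-project/subscriptions/raw-events-sub"

    options = PipelineOptions(streaming=True)

    with beam.Pipeline(options=options) as pipeline:
        (pipeline
         | "ReadStream" >> beam.io.ReadFromPubSub(subscription=SUBSCRIPTION)
         | "Window" >> beam.WindowInto(
               window.FixedWindows(60),       # one-minute fixed windows
               trigger=AfterWatermark(),      # fire once the watermark passes the window end
               allowed_lateness=300,          # keep accepting data up to 5 minutes late
               accumulation_mode=AccumulationMode.DISCARDING)
         | "PairWithOne" >> beam.Map(lambda msg: ("events", 1))
         | "CountPerWindow" >> beam.CombinePerKey(sum))

DISCARDING mode emits late corrections as deltas rather than re-emitting the whole window; ACCUMULATING is the alternative when the downstream sink can overwrite earlier results.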

Deploying and operationalizing the pipelines


Following development of your pipeline and its components, you should have a plan in place to deploy and operationalize the pipeline effectively. Leverage tools such as Cloud Build to achieve a high-frequency development and release workflow centered on a CI/CD DevOps platform.
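Concretely, deployment means submitting the pipeline to a managed runner with environment-specific options, and the CI/CD platform automates that submission on every merge. Below is a minimal sketch of Dataflow submission options; the project, region, bucket, and job names are hypothetical and, in a Cloud Build setup, would typically be injected by the build rather than hard-coded.

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # Hypothetical values -- in CI/CD these come from build substitutions,
    # not source code.
    options = PipelineOptions(
        runner="DataflowRunner",          # run on Dataflow instead of locally
        project="my-project",
        region="us-central1",
        temp_location="gs://my-bucket/temp",
        job_name="events-pipeline",
    )

    with beam.Pipeline(options=options) as pipeline:
        ...  # the same pipeline graph developed in the previous section

A Cloud Build trigger that runs this script after each merge is one straightforward way to get the high-frequency delivery loop described above.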