Deploying and operationalizing the pipelines
Once your pipeline and its components are developed, you should have a plan in place to deploy and operationalize the pipeline effectively. Use tools such as Cloud Build to support frequent, automated releases built around a CI/CD and DevOps workflow.
GCP Professional Data Engineer Certification Preparation Guide (Nov 2023)
Topic Contents
Job automation and orchestration (e.g., Cloud Composer and Workflows)
CI/CD (Continuous Integration and Continuous Deployment)
Job automation and orchestration (e.g., Cloud Composer and Workflows)
Pipelines should be scalable, repeatable, and robust. Use pipelines and workloads to perform batch processing on a defined cron schedule. Use GCP's Cloud Composer to orchestrate your data pipelines and to manage the execution of individual tasks and workloads.
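A cron expression such as `0 2 * * *` means "at 02:00 every day", and Airflow DAGs accept cron strings like this as their schedule. To illustrate how such an expression is read, the sketch below lists the next few run times for a five-field cron expression. The `next_runs` and `parse_field` helpers are hypothetical and deliberately simplified: they support only `*`, `*/n`, single numbers, and comma lists, and they require all five fields to match. This is not Airflow's scheduler, which understands the full cron syntax.

```python
from datetime import datetime, timedelta
from typing import List, Set

# (low, high) bounds for the five fields: minute, hour, day of month, month, day of week.
FIELD_RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]


def parse_field(field: str, low: int, high: int) -> Set[int]:
    """Expand one simplified cron field (*, */n, n, or a comma list) into matching values."""
    values: Set[int] = set()
    for part in field.split(","):
        if part == "*":
            values.update(range(low, high + 1))
        elif part.startswith("*/"):
            values.update(range(low, high + 1, int(part[2:])))
        else:
            values.add(int(part))
    return values


def next_runs(expr: str, start: datetime, count: int) -> List[datetime]:
    """Return the first `count` whole minutes after `start` that match `expr`."""
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 cron fields, got {len(fields)}")
    minutes, hours, days, months, weekdays = (
        parse_field(f, low, high) for f, (low, high) in zip(fields, FIELD_RANGES)
    )
    runs: List[datetime] = []
    t = start.replace(second=0, microsecond=0) + timedelta(minutes=1)
    limit = t + timedelta(days=366 * 5)  # guard against expressions that never match
    while len(runs) < count:
        if t > limit:
            raise ValueError("no matching time within five years")
        # datetime.weekday() counts Monday as 0; cron counts Sunday as 0.
        cron_weekday = (t.weekday() + 1) % 7
        # Simplification: all five fields must match (real cron ORs day-of-month
        # and day-of-week when both are restricted).
        if (t.minute in minutes and t.hour in hours and t.day in days
                and t.month in months and cron_weekday in weekdays):
            runs.append(t)
        t += timedelta(minutes=1)
    return runs


if __name__ == "__main__":
    # Daily at 02:00, the kind of expression you might use as a DAG schedule.
    for run in next_runs("0 2 * * *", datetime(2023, 11, 1, 12, 0), 3):
        print(run)
```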
Cloud Composer
Cloud Composer is GCP's fully managed Apache Airflow service. It is hosted and configured in GCP and runs natively on Google Kubernetes Engine (GKE), which gives Cloud Composer considerable flexibility in how environments are sized and scaled.
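Airflow models a pipeline as a directed acyclic graph (DAG) of tasks: a task starts only after its upstream tasks succeed, failed tasks can be retried, and tasks downstream of a failure end up in the `upstream_failed` state. The sketch below imitates that behaviour in plain Python with the standard library's `graphlib.TopologicalSorter`. The `run_dag` helper and the task names are hypothetical; in a Composer environment you would define the DAG with Airflow operators and let the Airflow scheduler run it.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Set


def run_dag(tasks: Dict[str, Callable[[], None]],
            upstream: Dict[str, Set[str]],
            retries: int = 1) -> Dict[str, str]:
    """Run tasks in dependency order and return the final state of each task.

    A task is attempted up to `retries + 1` times. If it still fails, every task
    downstream of it is marked "upstream_failed" and never started.
    """
    unknown = {dep for deps in upstream.values() for dep in deps} - set(tasks)
    if unknown:
        raise ValueError(f"unknown upstream tasks: {sorted(unknown)}")
    # graphlib expects node -> predecessors, which is exactly the upstream mapping.
    graph = {name: upstream.get(name, set()) for name in tasks}
    states: Dict[str, str] = {}
    for name in TopologicalSorter(graph).static_order():
        # Predecessors always come earlier in static_order, so their state is known.
        if any(states[dep] != "success" for dep in graph[name]):
            states[name] = "upstream_failed"
            continue
        for _attempt in range(retries + 1):
            try:
                tasks[name]()
                states[name] = "success"
                break
            except Exception:
                states[name] = "failed"
    return states


if __name__ == "__main__":
    steps = {
        "extract": lambda: print("extract"),
        "transform": lambda: print("transform"),
        "load": lambda: print("load"),
    }
    print(run_dag(steps, {"transform": {"extract"}, "load": {"transform"}}))
```

Using the upstream mapping directly as the graph keeps the sketch close to how Airflow expresses dependencies (`extract >> transform >> load`), while the retry loop mirrors a task's `retries` setting.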
Cloud Composer is packaged and run in environments, which are collections of the services that Cloud Composer requires. These include a web server, a database, a Cloud Storage bucket, a Redis deployment, and a Google Kubernetes Engine cluster deployed in Autopilot mode. Although it is technically possible to change the cluster configuration of Cloud Composer's node pools, doing so is not recommended and might break the environment.
Cloud Composer environments can be created with the Google Cloud console, the gcloud CLI, the API, or Terraform. A large number of configuration options are available: you can choose the Airflow version, worker configuration, high resilience, networking, web server access (for example, for a private deployment), environment variables, and data encryption settings.
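As an illustration of the gcloud route, the sketch below assembles a `gcloud composer environments create` command from a few of the options listed above. The `composer_create_command` helper, the environment name, the region, the image version string, and the environment variable are placeholders; check the gcloud reference for the flags and image versions your Composer release supports before running the printed command.

```python
import shlex
from typing import Dict, List, Optional


def composer_create_command(name: str, location: str, image_version: str,
                            env_vars: Optional[Dict[str, str]] = None,
                            environment_size: Optional[str] = None,
                            private: bool = False) -> List[str]:
    """Build the argument list for `gcloud composer environments create`."""
    cmd = ["gcloud", "composer", "environments", "create", name,
           f"--location={location}",
           f"--image-version={image_version}"]
    if env_vars:
        # Environment variables are passed as a comma-separated KEY=VALUE list.
        pairs = ",".join(f"{key}={value}" for key, value in env_vars.items())
        cmd.append(f"--env-variables={pairs}")
    if environment_size:
        # Composer 2 size presets: small, medium, or large.
        cmd.append(f"--environment-size={environment_size}")
    if private:
        # Private IP environments may need additional networking flags.
        cmd.append("--enable-private-environment")
    return cmd


if __name__ == "__main__":
    cmd = composer_create_command(
        "example-composer-env",       # hypothetical environment name
        "us-central1",
        "composer-2-airflow-2",       # placeholder; pick a supported image version
        env_vars={"DATA_BUCKET": "example-bucket"},
        environment_size="small",
        private=True,
    )
    print(shlex.join(cmd))  # paste into a shell, or pass cmd to subprocess.run
```

Returning an argument list rather than a single string avoids shell-quoting problems if the command is later run with `subprocess.run`.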
The diagram below shows the architecture of the Composer service. You don't need to memorize it for the exam, but it is useful in practice: Cloud Composer and Apache Airflow are widely used in data engineering, so understanding how they work will be beneficial.
CI/CD (Continuous Integration and Continuous Deployment)
DevOps is an important consideration for most software applications, especially in the cloud, and data engineering is no different. Use tools such as Git, Cloud Code, and Cloud Build to create robust CI/CD pipelines that help ensure a high-quality, high-performing product.
CI/CD
GCP has a number of native services that enable CI/CD and DevOps practices in the cloud. Cloud Build and Cloud Deploy are tightly integrated with GCP, and you will often use Cloud Build without even realizing it. For example, Cloud Composer relies on Cloud Build behind the scenes to build the container images for an environment (for instance, when you install custom PyPI packages), and those images then run on Google Kubernetes Engine.
Cloud Build
Cloud Build is GCP's serverless CI/CD service. It runs each build as a series of steps, with every step executed in a container, which makes it quick to automate building, testing, and deploying complex software on Google Cloud. It is especially helpful for Cloud Run and Cloud Functions, which can be developed, emulated, tested, and deployed right from your workstation. Cloud Build can run test suites and push container images to Container Registry or Artifact Registry. It is flexible enough to be used across your entire cloud ecosystem, including hybrid and multi-cloud environments.
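Cloud Build reads its steps from a build config file (YAML or JSON, conventionally `cloudbuild.yaml`), where each step names a container image to run and the arguments to pass to it. The sketch below generates a JSON config in Python with a test step followed by an image build, and lists the image under `images` so Cloud Build pushes it once every step succeeds. The `build_config` helper, the repository and image names, and the use of `pytest` with a `requirements.txt` file are assumptions for illustration; `$PROJECT_ID` and `$SHORT_SHA` are Cloud Build substitutions filled in at build time.

```python
import json


def build_config(region: str, repository: str, image: str) -> dict:
    """Return a Cloud Build config that runs tests, then builds a container image."""
    # Artifact Registry path; $PROJECT_ID and $SHORT_SHA are Cloud Build substitutions.
    image_uri = f"{region}-docker.pkg.dev/$PROJECT_ID/{repository}/{image}:$SHORT_SHA"
    return {
        "steps": [
            {
                # Install dependencies and run the test suite in a Python container.
                "id": "test",
                "name": "python:3.11",
                "entrypoint": "bash",
                "args": ["-c", "pip install -r requirements.txt && python -m pytest"],
            },
            {
                # Steps run in order by default, so this starts only after "test" succeeds.
                "id": "build",
                "name": "gcr.io/cloud-builders/docker",
                "args": ["build", "-t", image_uri, "."],
            },
        ],
        # Images listed here are pushed to the registry once all steps succeed.
        "images": [image_uri],
    }


if __name__ == "__main__":
    print(json.dumps(build_config("us-central1", "example-repo", "example-pipeline"), indent=2))
```

Saved as `cloudbuild.json` in the repository root, a config like this is meant to be run by a build trigger, since `$SHORT_SHA` is only populated for builds started from a commit.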
Cloud Build is also useful for automating the scripts that build your GCP infrastructure. Combined with tools like Terraform, it lets you automate infrastructure provisioning.
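A common pattern for infrastructure automation is a build config whose steps run Terraform inside the `hashicorp/terraform` container image, whose entrypoint is the `terraform` binary. The sketch below generates such a config. The `terraform_build_config` helper, the `infra` directory, the pinned image tag, and the choice to apply a saved plan are assumptions; a real setup also needs a remote state backend (for example, a Cloud Storage bucket) and a build service account with suitable IAM roles.

```python
import json

TERRAFORM_IMAGE = "hashicorp/terraform:1.5.7"  # assumed tag; pin the version you use


def terraform_build_config(working_dir: str = "infra", apply: bool = False) -> dict:
    """Return a Cloud Build config that runs terraform init and plan, optionally apply."""
    steps = [
        # Arguments go straight to the terraform binary (the image's entrypoint);
        # "dir" is the working directory relative to /workspace.
        {"id": "init", "name": TERRAFORM_IMAGE, "dir": working_dir,
         "args": ["init", "-input=false"]},
        {"id": "plan", "name": TERRAFORM_IMAGE, "dir": working_dir,
         "args": ["plan", "-input=false", "-out=tfplan"]},
    ]
    if apply:
        # Apply exactly the saved plan; /workspace persists between steps.
        steps.append({"id": "apply", "name": TERRAFORM_IMAGE, "dir": working_dir,
                      "args": ["apply", "-input=false", "tfplan"]})
    return {"steps": steps}


if __name__ == "__main__":
    print(json.dumps(terraform_build_config(apply=True), indent=2))
```

Applying the saved `tfplan` file means the build changes only what the plan step reported, which makes the plan output a reviewable record of each infrastructure change.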
Using build triggers, Cloud Build can be connected to your source repository so that a build starts automatically when code is committed or merged.
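As a sketch of connecting a repository, the helper below assembles a `gcloud builds triggers create github` command that runs a build config whenever a commit lands on a branch matching a regular expression. The `github_trigger_command` helper, the repository owner and name, and the trigger name are placeholders, and the GitHub repository must already be connected to Cloud Build for the command to succeed.

```python
import shlex
from typing import List, Optional


def github_trigger_command(owner: str, repo: str, branch_regex: str,
                           build_config: str = "cloudbuild.yaml",
                           name: Optional[str] = None) -> List[str]:
    """Build the argument list for `gcloud builds triggers create github`."""
    cmd = ["gcloud", "builds", "triggers", "create", "github",
           f"--repo-owner={owner}",
           f"--repo-name={repo}",
           f"--branch-pattern={branch_regex}",
           f"--build-config={build_config}"]
    if name:
        cmd.append(f"--name={name}")
    return cmd


if __name__ == "__main__":
    # Anchored pattern so that only pushes to the main branch start a build.
    cmd = github_trigger_command("example-org", "example-pipeline", "^main$",
                                 name="build-on-main")
    print(shlex.join(cmd))
```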
Artifact Registry
Artifact Registry can be used to store your software artifacts, such as container images and language packages, as part of your CI/CD process. Cloud Build can push artifacts to it, and images stored in Artifact Registry can be deployed to GKE, Cloud Run, Compute Engine, or App Engine.
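Docker images in Artifact Registry are addressed as `LOCATION-docker.pkg.dev/PROJECT_ID/REPOSITORY/IMAGE:TAG`, and a Docker repository can be created beforehand with `gcloud artifacts repositories create REPOSITORY --repository-format=docker --location=LOCATION`. The small helper below composes such a path so that build and deploy steps refer to the same image. The `artifact_registry_image` name is hypothetical, and its validation rule (lowercase letters, digits, and hyphens) is a simplified assumption rather than the full naming specification.

```python
import re

# Simplified check for location, project, and repository names (an assumption).
_NAME = re.compile(r"^[a-z][a-z0-9-]*$")


def artifact_registry_image(location: str, project: str, repository: str,
                            image: str, tag: str = "latest") -> str:
    """Compose a Docker image path in Artifact Registry."""
    for label, value in [("location", location), ("project", project),
                         ("repository", repository)]:
        if not _NAME.match(value):
            raise ValueError(f"invalid {label}: {value!r}")
    return f"{location}-docker.pkg.dev/{project}/{repository}/{image}:{tag}"


if __name__ == "__main__":
    # Hypothetical project and repository names.
    print(artifact_registry_image("us-central1", "example-project", "pipelines", "etl", "v1"))
```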
Cloud Deploy
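Use Cloud Deploy, a managed continuous delivery service, to automatically create releases and promote them through a delivery pipeline of targets (for example, from staging to production) as part of your CI/CD stack.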
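A release is often created as the last step of a build. The sketch below assembles a `gcloud deploy releases create` command for an existing delivery pipeline and derives the release name from the commit SHA. The `release_command` helper, the pipeline name, the region, the image mapping, and the `rel-` naming scheme are hypothetical, and the name check assumes Cloud Deploy's rule that release names use lowercase letters, digits, and hyphens, start with a letter, and are at most 63 characters long.

```python
import re
from typing import Dict, List

# Assumed rule: starts with a letter, ends with a letter or digit, at most 63 chars.
_RELEASE_NAME = re.compile(r"^[a-z]([a-z0-9-]{0,61}[a-z0-9])?$")


def release_command(pipeline: str, region: str, commit_sha: str,
                    images: Dict[str, str]) -> List[str]:
    """Build the argument list for `gcloud deploy releases create`."""
    name = f"rel-{commit_sha[:7].lower()}"  # hypothetical naming scheme
    if not _RELEASE_NAME.match(name):
        raise ValueError(f"invalid release name: {name!r}")
    # --images maps each image name used in the manifests to a full image path.
    image_args = ",".join(f"{key}={value}" for key, value in images.items())
    return ["gcloud", "deploy", "releases", "create", name,
            f"--delivery-pipeline={pipeline}",
            f"--region={region}",
            f"--images={image_args}"]


if __name__ == "__main__":
    print(" ".join(release_command(
        "example-pipeline", "us-central1", "3f9a2c7e1b",
        {"etl": "us-central1-docker.pkg.dev/example-project/pipelines/etl:3f9a2c7"},
    )))
```

Deriving the release name from the commit ties each release back to the exact source revision that produced its images.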