Overview

Introduction

Deploying a data pipeline is a critical step in putting your data processing automations into action.

This article covers the deployment options available after the initial pipeline deployment and how to manage the pipeline's execution.

Pipeline States

Before diving into deployment options, it's important to understand the different states a pipeline can be in:

Draft State: The initial state of a pipeline that has never been deployed. All configurations and logic exist only in the development environment.
Unpublished Changes: This state occurs when modifications have been made to a previously deployed pipeline, but these changes haven't been deployed yet. The running pipeline still operates on the last deployed version.

Deployment Options

After a pipeline has been deployed for the first time, you have several options for subsequent deployments and management:

Deploy
- Function: Applies new changes to the pipeline.
- Behaviour:
  - Moves the pipeline from "Unpublished Changes" to "Deployed" state.
  - Resumes operation from the last checkpoint.
- Use Case: Implementing minor changes or updates that don't require full reprocessing.
- Example: Adjusting a transformation rule or adding a new field to the output.
Deploy & Restart
- Function: Applies changes and restarts the entire pipeline process.
- Behaviour:
  - Deploys any unpublished changes.
  - Clears all progress and begins processing from the start.
- Use Case: Major changes that affect historical data or require full reprocessing.
- Example: Changing the primary key of a data set or modifying core transformation logic.
Restart Currently Deployed
- Function: Restarts the existing deployed version without applying any new changes.
- Behaviour:
  - Keeps the current deployed version.
  - Clears progress and starts from the beginning.
- Use Case: Reprocessing data without any pipeline changes.
- Example: Rerunning the pipeline after source data has been updated or corrected.
  Note
  If the pipeline has no new changes, only the “Restart Currently Deployed” option is enabled.
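
The behaviour of the three options can be summarised in a small model. This is a minimal sketch and not the product's API: the `Pipeline` class, its fields, and the function names are hypothetical. It also assumes that all three options are enabled while a pipeline has unpublished changes, and it represents "clearing progress" as resetting a numeric checkpoint to 0.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    DRAFT = "Draft"
    UNPUBLISHED_CHANGES = "Unpublished Changes"
    DEPLOYED = "Deployed"


@dataclass
class Pipeline:
    # Hypothetical model of a pipeline; not the product's data structure.
    state: State
    deployed_version: int = 0  # version the running pipeline uses
    latest_version: int = 0    # most recent edited version
    checkpoint: int = 0        # last processed position; 0 means "start"


def available_options(pipeline: Pipeline) -> list:
    """List the options enabled after the first deployment."""
    if pipeline.state is State.UNPUBLISHED_CHANGES:
        # Assumption: every option is offered when changes are pending.
        return ["Deploy", "Deploy & Restart", "Restart Currently Deployed"]
    if pipeline.state is State.DEPLOYED:
        # No new changes: only a restart of the current version is possible.
        return ["Restart Currently Deployed"]
    return []  # Draft: the first deployment is outside this sketch


def deploy(pipeline: Pipeline, restart: bool = False) -> None:
    """Apply unpublished changes; optionally clear progress as well."""
    if pipeline.state is not State.UNPUBLISHED_CHANGES:
        raise ValueError("there are no unpublished changes to deploy")
    pipeline.deployed_version = pipeline.latest_version
    pipeline.state = State.DEPLOYED
    if restart:
        pipeline.checkpoint = 0  # Deploy & Restart: reprocess from the start
    # Plain Deploy keeps the checkpoint, so processing resumes from it.


def restart_currently_deployed(pipeline: Pipeline) -> None:
    """Clear progress without changing the deployed version or state."""
    pipeline.checkpoint = 0
```

In this model, calling `deploy(pipeline)` leaves `checkpoint` untouched, while `deploy(pipeline, restart=True)` and `restart_currently_deployed(pipeline)` both reset it; only the two deploy variants change the deployed version.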

Pipeline Execution Control

Pause/Resume Toggle

Function: Allows you to pause or resume the pipeline execution.
Pause: Temporarily stops the pipeline processing.
Resume: Continues pipeline execution from where it was paused.
States: The toggle has three states:
- Paused: The pipeline is paused and is not processing data.
- Running: The pipeline is running and migrating data from source to destination.
- Updating: A transient state between Paused and Running. The toggle shows Updating whenever it is clicked or the pipeline is deployed.
  Note
  If the pipeline stays paused for longer than the log retention period, it can fail permanently. In that case, the pipeline has to be restarted.
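
The toggle states and the retention warning in the note can be sketched as follows. This is an illustrative model only: `ToggleState`, `next_state_after_click`, and `pause_exceeds_retention` are hypothetical names, the retention period is whatever your source system is configured with, and rejecting clicks while the toggle is Updating is an assumption rather than documented behaviour.

```python
from datetime import datetime, timedelta
from enum import Enum


class ToggleState(Enum):
    PAUSED = "Paused"
    RUNNING = "Running"
    UPDATING = "Updating"  # transient state between the other two


def next_state_after_click(current: ToggleState) -> ToggleState:
    """Return the state the toggle settles into once Updating finishes."""
    if current is ToggleState.PAUSED:
        return ToggleState.RUNNING
    if current is ToggleState.RUNNING:
        return ToggleState.PAUSED
    # Assumption: a click during the transient state is not accepted.
    raise ValueError("the toggle is still updating")


def pause_exceeds_retention(
    paused_at: datetime, now: datetime, log_retention: timedelta
) -> bool:
    """True if the pause has outlasted the log retention period."""
    return now - paused_at > log_retention
```

A check like `pause_exceeds_retention(paused_at, datetime.now(), timedelta(days=7))` could drive an alert before a long pause turns into a forced restart; the seven-day value here is only a placeholder.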

Best Practices

Testing: Always test changes in a non-production environment before deploying.
Checkpoints: Utilise checkpoints to enable efficient resumption after pauses or failures.
Monitoring: Keep an eye on pipeline performance after deployments to catch any issues early.
Review Changes: Review all changes made to the pipeline in the audit logs before deploying to ensure data integrity.

FAQs

When should I use "Deploy" vs "Deploy & Restart"?

Use "Deploy" for minor changes that don't affect historical data.

Use "Deploy & Restart" for major changes or when you need to reprocess all data.

What happens to in-progress data when I pause a pipeline?

In-progress data is typically held in a temporary state. Upon resuming, the pipeline will continue from the last successfully processed checkpoint.
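
One way to picture checkpoint-based resumption is a batch loop that only advances the checkpoint after a batch completes. This is a simplified sketch under stated assumptions, not a description of the product's internals: the `run` function, the batch size, the `pause_at` parameter, and the uppercase "transformation" are all hypothetical, and work staged in an unfinished batch is simply reprocessed on resume.

```python
def run(records, checkpoint, batch_size, pause_at=None):
    """Process records starting at checkpoint.

    Returns (output, new_checkpoint). If pause_at is given, processing
    stops when that record index is reached, and the unfinished batch
    is not committed.
    """
    output = []
    position = checkpoint
    while position < len(records):
        batch = records[position:position + batch_size]
        staged = []  # in-progress data held in a temporary state
        for offset, record in enumerate(batch):
            if pause_at is not None and position + offset >= pause_at:
                # Paused mid-batch: drop staged work, keep the old checkpoint.
                return output, position
            staged.append(record.upper())  # stand-in for real processing
        output.extend(staged)      # commit the completed batch
        position += len(batch)     # advance the checkpoint
    return output, position


records = ["a", "b", "c", "d", "e"]
first_output, saved_checkpoint = run(records, 0, batch_size=2, pause_at=3)
resumed_output, final_checkpoint = run(records, saved_checkpoint, batch_size=2)
```

Here the pause lands in the middle of the second batch, so the saved checkpoint stays at the end of the first batch and the resumed run picks up from there without skipping or duplicating committed records.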

Can I roll back a deployment if I encounter issues?

There is no explicit rollback option. Instead, you can revert the pipeline to its last known good state and then use “Deploy & Restart”.

How often should I restart my pipeline?

Restart frequency depends on your data freshness requirements and the stability of your pipeline. Some pipelines run continuously with occasional restarts for maintenance, while others might restart daily or weekly.

Last updated: 10 September 2024, 01:10 PM