Amazon Redshift as Source

UnifyApps enables seamless integration with Amazon Redshift as a source for your data pipelines. This article covers essential configuration elements and best practices for connecting to Redshift sources.

Overview

Amazon Redshift is a fully managed, petabyte-scale data warehouse service optimized for analytics workloads. UnifyApps provides native connectivity to extract data from Redshift clusters efficiently and securely, supporting both historical data loads and continuous data synchronization.

Connection Configuration

Parameter | Description | Example
--------- | ----------- | -------
Connection Name* | Descriptive identifier for your connection | "Production Redshift Analytics"
Host Address* | Redshift cluster endpoint | "redshift-cluster.abc123xyz.us-east-1.redshift.amazonaws.com"
Port Number* | Database port | 5439 (default)
User* | Database username with read permissions | "unify_reader"
Password* | Authentication credentials | "********"
Database Name* | Name of your Redshift database | "analytics_db"
Server Time Zone* | Database server's timezone | "UTC"
Connection* | Method of connecting to the database | Direct or Via SSH

To set up a Redshift source, navigate to the Connections section, click New Connection, and select Amazon Redshift. Fill in the parameters above based on your Redshift environment details.
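
Before saving the connection, it can help to confirm the credentials and database details directly against the cluster. The query below is a minimal sanity check, assuming the "unify_reader" user and "analytics_db" database from the table above; run it from any SQL client connected to your cluster endpoint:

    -- Confirm you are connected to the expected database as the expected user
    SELECT current_database() AS database_name,
           current_user       AS connected_user,
           version()          AS redshift_version;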

Server Timezone Configuration

When adding objects from a Redshift source, you'll need to specify the database server's timezone. This setting is crucial for proper handling of date and time values.

  • In the Add Objects dialog, find the Server Time Zone setting

  • Select your Redshift server's timezone

This ensures all timestamp data is normalized to UTC during processing, maintaining consistency across your data pipeline.
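
If you are unsure which timezone your cluster is using, you can typically check it directly before filling in this setting (Redshift defaults to UTC unless it has been changed at the session or user level); as a sketch:

    -- Show the timezone currently in effect for the session
    SHOW timezone;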

Data Extraction Methods

UnifyApps uses specialized techniques for Redshift data extraction:

Historical Data (Initial Load)

For historical data, UnifyApps uses a snapshot-based approach:

  1. Data is extracted in chunks from Redshift tables

  2. These chunks are temporarily staged in Amazon S3

  3. The data is processed and loaded into the destination

  4. S3 staging objects are automatically cleaned up after successful processing

This method optimizes for Redshift's architecture and minimizes impact on your cluster performance.
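
Conceptually, the staging step resembles Redshift's native UNLOAD mechanism, which writes query results to S3 in parallel. The statement below is only an illustrative sketch; the table, bucket path, and IAM role are hypothetical, and UnifyApps manages the actual staging and cleanup for you:

    -- Illustrative only: export one chunk of a table to S3 staging (hypothetical names)
    UNLOAD ('SELECT * FROM public.orders WHERE order_id BETWEEN 1 AND 1000000')
    TO 's3://example-staging-bucket/orders/chunk_001_'
    IAM_ROLE 'arn:aws:iam::123456789012:role/example-unload-role'
    FORMAT AS PARQUET;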

Live Data (Incremental Updates)

For ongoing changes, UnifyApps implements cursor-based polling:

  1. You select a cursor field (typically a timestamp or sequential ID)

  2. The pipeline tracks the highest cursor value processed in each run

  3. Subsequent runs query only for records with cursor values higher than the last checkpoint

Recommended cursor fields are datetime columns that track record modifications; a sketch of the resulting incremental query follows.
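
As a rough illustration (the table name, column name, and checkpoint value here are hypothetical), each polling run issues a query of roughly this shape, substituting the checkpoint recorded by the previous run:

    -- Illustrative incremental poll: fetch only rows changed since the last checkpoint
    SELECT *
    FROM public.orders
    WHERE updated_at > '2024-01-15 08:30:00'   -- last checkpoint recorded by the pipeline
    ORDER BY updated_at;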

Note

Important: A validation error will be thrown if no cursor is configured when using "Historical and Live" or "Live Only" ingestion modes.

Supported Data Types for Cursor Fields

Category | Supported Cursor Types
-------- | ----------------------
Numeric | INTEGER, DECIMAL, SMALLINT, BIGINT
String | VARCHAR, CHAR (lexicographically ordered)
Date/Time | DATE, TIMESTAMP, TIMESTAMPTZ, TIME, TIMETZ
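
For example, a table designed with incremental extraction in mind usually carries a modification timestamp that falls in the Date/Time category above. The DDL below is purely a hypothetical sketch (table, columns, and sort key are invented for illustration):

    -- Hypothetical table whose updated_at column can serve as the cursor field
    CREATE TABLE public.orders (
        order_id    BIGINT,
        customer_id BIGINT,
        order_total DECIMAL(12,2),
        updated_at  TIMESTAMP      -- good cursor candidate: set on every insert or update
    )
    SORTKEY (updated_at);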

Ingestion Modes

Mode | Description | Business Use Case | Requirements
---- | ----------- | ----------------- | ------------
Historical and Live | Loads all existing data and captures ongoing changes | Data warehouse migration with continuous synchronization | Valid cursor field required
Live Only | Captures only new data from deployment forward | Real-time dashboard without historical context | Valid cursor field required
Historical Only | One-time load of all existing data | Point-in-time analytics or compliance snapshot | No cursor field needed

Choose the mode that aligns with your business requirements during pipeline configuration.

CRUD Operations Tracking

When tracking changes from Redshift sources:

Operation | Support | Notes
--------- | ------- | -----
Create | ✓ Supported | New record insertions are detected
Read | ✓ Supported | Data retrieval via full or incremental queries
Update | ✓ Supported | Updates are detected as new inserts with the updated values
Delete | ✗ Not Supported | Delete operations cannot be detected

Note

Due to Redshift's architecture, update operations appear as new inserts in the destination, and delete operations are not tracked. Consider this when designing your data synchronization strategy.
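
One common way to handle this in the destination is to keep only the most recent version of each record at query time, for example with a window function. This is a generic sketch, assuming a hypothetical destination table dest.orders with a key column order_id and a modification timestamp updated_at:

    -- Keep only the latest version of each record (hypothetical destination table)
    SELECT *
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY updated_at DESC) AS row_rank
        FROM dest.orders
    ) versions
    WHERE row_rank = 1;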

Supported Data Types

Category | Supported Types
-------- | ---------------
Integer | SMALLINT, INTEGER, BIGINT
Decimal | DECIMAL, REAL, DOUBLE PRECISION
Date/Time | DATE, TIME, TIMESTAMP, TIMESTAMPTZ, TIMETZ
String | CHAR, VARCHAR
Boolean | BOOLEAN
Other | SUPER, VARBYTE

All standard Redshift data types are supported, enabling comprehensive data extraction from various analytics workloads.

Prerequisites and Permissions

To establish a successful Redshift source connection, ensure:

  • Access to an active Amazon Redshift cluster

  • Network connectivity between UnifyApps and your Redshift cluster

  • IAM permissions for S3 staging if the staging bucket is in the same AWS account

  • Database user with the following minimum permissions:

    -- Allow the pipeline user to read every table in the public schema
    GRANT SELECT ON ALL TABLES IN SCHEMA public TO unify_reader;
    
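If the tables you want to ingest live in a schema other than public, the same user will typically also need USAGE on that schema. As a sketch, with a hypothetical schema name:

    -- Hypothetical non-public schema: grant schema usage plus read access on its tables
    GRANT USAGE ON SCHEMA sales TO unify_reader;
    GRANT SELECT ON ALL TABLES IN SCHEMA sales TO unify_reader;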

Common Business Scenarios

Analytics Data Consolidation

  • Extract processed analytics data from Redshift

  • Combine with operational data for comprehensive reporting

  • Configure appropriate cursor fields on timestamp columns

Data Warehouse Migration

  • Extract schema and data from legacy Redshift clusters

  • Transform and optimize for new target platforms

  • Maintain continuous synchronization during migration

Business Intelligence Enhancement

  • Extract aggregated datasets from Redshift

  • Combine with real-time operational data

  • Create unified dashboards across data sources

Best Practices

Category | Recommendations
-------- | ---------------
Performance | Use distribution key columns as chunking fields where possible; schedule extractions during off-peak hours; use column pruning to extract only necessary fields
Cursor Selection | Prefer timestamp columns with last-modified information; ensure chosen cursor fields are indexed; use monotonically increasing fields for consistent ordering
S3 Configuration | Use the same AWS region for Redshift and S3 staging; configure appropriate S3 bucket lifecycle policies; use VPC endpoints for enhanced security
Query Optimization | Limit large table extractions with appropriate WHERE clauses; avoid cursors on columns with many duplicate values; use sort key columns when possible for range queries
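
As an illustration of the column pruning and sort key recommendations (table, column, and sort key names are hypothetical), a scoped extraction query might look like this rather than an unfiltered SELECT *:

    -- Prune columns and filter on the sort key instead of scanning the full table
    SELECT order_id, customer_id, order_total, updated_at
    FROM public.orders                       -- assumes updated_at is the table's sort key
    WHERE updated_at >= '2024-01-01'
      AND updated_at <  '2024-02-01';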

By properly configuring your Redshift source connections and following these guidelines, you can ensure reliable, efficient data extraction while meeting your business requirements for data timeliness, completeness, and compliance.