Enabling compression in Redshift COPY command

Amazon Redshift is a powerful, fully managed data warehouse that supports fast and scalable analytics. As businesses handle ever-increasing volumes of data, efficient storage and performance optimization become critical. One of the key ways to achieve this is by using compression (encoding) when loading data into Redshift using the COPY command.

In this blog, we'll explore how to enable compression in the Redshift COPY command, why it’s important, and the best practices to follow for optimal performance.


🔍 What Is Compression in Redshift?

In Amazon Redshift, compression (also called column encoding) reduces the amount of disk space required to store data. It improves query performance by reducing I/O, since less data needs to be read from disk. Redshift supports several compression types such as LZO, Zstandard, BYTEDICT, and DELTA.

By default, Redshift automatically applies compression when you use the COPY command with an empty table that has no compression encoding defined. This allows Redshift to analyze sample data and choose the best compression algorithm for each column.


🛠️ How to Enable Compression in COPY Command

To enable compression in Redshift during a COPY operation, follow these steps:


Option 1: Let Redshift Auto-Apply Compression

This is the most commonly used and recommended method.

Steps:

Create your target table without any column encodings.

Run the COPY command.

Redshift analyzes the incoming data sample and applies the most optimal compression for each column.

Example:

sql

Copy

Edit

CREATE TABLE sales (

    sale_id INT,

    product_id INT,

    quantity INT,

    sale_date DATE

);


COPY sales

FROM 's3://your-bucket/sales_data.csv'

CREDENTIALS 'aws_access_key_id=...;aws_secret_access_key=...'

CSV;

After the data is loaded, Redshift automatically assigns compression types based on the column data.


To check the assigned encodings:


sql

Copy

Edit

SELECT "column", type, encoding

FROM pg_table_def

WHERE tablename = 'sales';

Option 2: Manually Define Compression Types

If you want more control or are loading into an existing table, you can manually specify compression encodings when creating the table.


Example:


sql

Copy

Edit

CREATE TABLE sales (

    sale_id INT ENCODE BYTEDICT,

    product_id INT ENCODE AZ64,

    quantity INT ENCODE DELTA,

    sale_date DATE ENCODE LZO

);

Then run your COPY command as usual.


This approach is ideal when you already know your data patterns or want consistency across environments.


⚙️ Best Practices for Compression in Redshift

Use ANALYZE COMPRESSION to Optimize Existing Tables

If your table already contains data, run:


sql

Copy

Edit

ANALYZE COMPRESSION sales;

This recommends optimal compression for each column.


Use Columnar Data Formats

When loading from S3, prefer formats like Parquet or ORC. These already use compression and are columnar, which improves COPY performance.


Compress Data at Source

Redshift’s COPY command supports loading compressed files (.gz, .bz2, .zstd). This reduces S3 transfer time and speeds up the load:


sql

Copy

Edit

COPY sales

FROM 's3://bucket/data.csv.gz'

GZIP

CSV;

Avoid Overuse of BYTEDICT

BYTEDICT is great for low-cardinality columns but can degrade performance if misused.


Monitor Storage and Vacuum Regularly

Compression helps save space, but regularly VACUUM your tables to maintain performance.


🧠 Conclusion

Enabling compression in the Redshift COPY command is a powerful way to optimize storage and boost query performance. Whether you let Redshift automatically choose encodings or define them manually, the key is to understand your data and apply the best-fit strategy. 

Learn AWS Data Engineer Training

Read More: Writing custom job bookmarks in AWS Glue
Read More: Integrating Glue with Apache Airflow on MWAA
Read More: Data pipeline blueprint for e-commerce analytics

Visit IHUB Training Institute Hyderabad
Get Direction

Comments

Popular posts from this blog

How to Use Tosca's Test Configuration Parameters

Creating a Test Execution Report with Charts in Playwright

Installing Java and Eclipse IDE for Selenium Automation