New features for data analysts in Amazon Redshift

Every year, the Amazon Redshift team launches new and exciting features, and 2020 was no exception. Features that improve the data warehouse service and add interoperability with other AWS services rolled out all year.

I am on a team that uses Amazon Redshift to store tables from our systems and usage data from our SaaS product. Amazon Redshift is our single source of truth. We use it to prepare reports that support the business and to run ad hoc queries when numbers are needed quickly.

When AWS re:Invent comes around, I look forward to the new features, enhancements, and functionality that make things easier for analysts. Whether you haven’t tried Amazon Redshift in a while or you’re a longtime user, these new capabilities are designed with analysts in mind and make it easier to analyse data at scale.

Amazon Redshift ML

The preview of Amazon Redshift ML lets data analysts use Amazon SageMaker from within Amazon Redshift to solve business problems, without needing a data scientist to create custom models.

As a data analyst myself, I think this is one of the most interesting announcements to come out of re:Invent 2020. Analysts generally use SQL to query data and present insights, but they don’t often do data science as well. Now there is no need to wait for a data scientist or learn a new language to create predictive models.
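As a rough sketch of what this looks like in practice, the snippet below holds the two SQL statements an analyst might run: one to train a model with CREATE MODEL and one to call the resulting prediction function. The table customer_activity, its columns, the function name predict_churn, the IAM role ARN, and the S3 bucket are all placeholders I made up for illustration. The SQL is wrapped in Python strings only so the example runs without a cluster.

```python
# Sketch only: table, columns, role ARN, and bucket are hypothetical placeholders.

# Train a model from the result of a query. TARGET names the column to predict,
# FUNCTION names the SQL function that Amazon Redshift creates for predictions.
CREATE_MODEL_SQL = """
CREATE MODEL customer_churn
FROM (
    SELECT age, plan_type, monthly_usage, churned
    FROM customer_activity
    WHERE signup_date < '2020-01-01'
)
TARGET churned
FUNCTION predict_churn
IAM_ROLE 'arn:aws:iam::123456789012:role/ExampleRedshiftMLRole'
SETTINGS (S3_BUCKET 'example-redshift-ml-bucket');
"""

# Once training has finished, the function is called like any other SQL function.
PREDICT_SQL = """
SELECT customer_id,
       predict_churn(age, plan_type, monthly_usage) AS likely_to_churn
FROM customer_activity
WHERE signup_date >= '2020-01-01';
"""


def statements():
    """Return the statements in the order you would run them."""
    return [CREATE_MODEL_SQL.strip(), PREDICT_SQL.strip()]


if __name__ == "__main__":
    for sql in statements():
        print(sql, end="\n\n")
```

You would paste the printed statements into your usual SQL client. Training is not instant, so the prediction query only works after the model is ready.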

To find out what you need to get started, see Create, train, and deploy machine learning models in Amazon Redshift using SQL with Amazon Redshift ML.

Federated queries

We often have to join datasets that aren’t in the same format or aren’t ready in the same place. By using federated queries to access data in other databases or S3, you don’t need to wait for an engineer or ETL process to move data.

re:Invent 2019 featured some interesting talks from Amazon Redshift customers who were tackling this problem. Now that federated query for Aurora PostgreSQL is generally available and Aurora MySQL is in preview, I’m excited to hear more.
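To make the idea concrete, here is a minimal sketch of a federated setup for Aurora PostgreSQL: an external schema that points at the live database, followed by a query that joins it to a local Amazon Redshift table. The schema name app_live, the tables, the endpoint, the role ARN, and the secret ARN are invented placeholders. As before, the SQL sits in Python strings so the example runs on its own.

```python
# Sketch only: schema, tables, endpoint, and ARNs are hypothetical placeholders.

# Register the remote PostgreSQL database as an external schema.
# Credentials come from an AWS Secrets Manager secret referenced by SECRET_ARN.
CREATE_SCHEMA_SQL = """
CREATE EXTERNAL SCHEMA app_live
FROM POSTGRES
DATABASE 'appdb' SCHEMA 'public'
URI 'example-cluster.cluster-abc123.us-east-1.rds.amazonaws.com' PORT 5432
IAM_ROLE 'arn:aws:iam::123456789012:role/ExampleFederatedQueryRole'
SECRET_ARN 'arn:aws:secretsmanager:us-east-1:123456789012:secret:example-secret';
"""

# Join warehouse data (usage_events) with live application data (app_live.accounts)
# in a single query, with no ETL step in between.
JOIN_SQL = """
SELECT a.plan_name,
       u.account_id,
       COUNT(*) AS events_last_7_days
FROM usage_events AS u
JOIN app_live.accounts AS a
  ON a.account_id = u.account_id
WHERE u.event_time >= DATEADD(day, -7, GETDATE())
GROUP BY a.plan_name, u.account_id;
"""


def federated_statements():
    """Return the setup statement followed by the example join."""
    return [CREATE_SCHEMA_SQL.strip(), JOIN_SQL.strip()]


if __name__ == "__main__":
    for sql in federated_statements():
        print(sql, end="\n\n")
```

The external schema only needs to be created once. After that, analysts can reference app_live.accounts as if it were a local table.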

For a step-by-step example, see Build a Simplified ETL and Live Data Query Solution Using Redshift Federated Query.

SUPER data type

Another problem we face as analysts is that the data we need isn’t always in rows and columns. The new SUPER data type makes JSON data easy to use natively in Amazon Redshift with PartiQL.

PartiQL is an extension of SQL that helps analysts get up and running quickly with structured and semistructured data, so you can unnest and query nested values directly. This is exciting for those who deal with data coming from applications that store data in JSON or unstructured formats.
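If unnesting is new to you, the sketch below may help. It shows a PartiQL query of the kind you could run against a SUPER column, then imitates its result in plain Python so you can see what comes back: one output row per element of the nested array. The table orders, the column payload, and the JSON fields are hypothetical. The Python function is only an illustration of the idea, not how Amazon Redshift evaluates the query.

```python
import json

# Sketch only: the table "orders", the SUPER column "payload", and the JSON
# fields are hypothetical. Dot notation navigates into the document, and
# listing o.payload.items in the FROM clause unnests the array.
PARTIQL = """
SELECT o.payload.customer.name AS customer, i.sku, i.qty
FROM orders AS o, o.payload.items AS i;
"""


def unnest_items(documents):
    """Imitate the query above: one row per element of each document's items array."""
    rows = []
    for doc in documents:
        customer = doc["customer"]["name"]
        # Documents without an items array contribute no rows in this sketch.
        for item in doc.get("items", []):
            rows.append((customer, item["sku"], item["qty"]))
    return rows


# Two JSON documents as an application might store them.
raw_documents = [
    '{"customer": {"name": "Ana"}, "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}]}',
    '{"customer": {"name": "Raj"}, "items": [{"sku": "C3", "qty": 5}]}',
]
payloads = [json.loads(text) for text in raw_documents]

if __name__ == "__main__":
    for row in unnest_items(payloads):
        print(row)
```

Two documents become three rows here, because the first document has two items. That is the shape most reporting tools expect.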

For use cases and a quickstart, see Ingesting and querying semistructured data in Amazon Redshift (preview).

Partner console integration

The newly announced preview of native console integration with partners will also make data analysis quicker and easier. Although analysts might not be doing the ETL work, this release makes it easier to move data from Salesforce, Google Analytics, and Facebook Ads into Amazon Redshift.

Matillion, Sisense, Segment, Etleap, and Fivetran are launch partners, with other partners coming soon. If you’re an Amazon Redshift partner and would like to integrate into the console, contact [email protected].

RA3 nodes with managed storage

Previously, when you added Amazon Redshift nodes to a cluster, both storage and compute were scaled up together. This all changed with the 2019 announcement of RA3 nodes, which let you scale storage and compute independently.

In 2020, RA3.xlplus nodes were introduced, which offer more compute sizing options to address a broader set of workload requirements.

AQUA for Amazon Redshift

We want queries to run quickly so we can spend more time empowering the users of our insights and less time waiting for data to return. AQUA, the Advanced Query Accelerator for Amazon Redshift, tackles this problem at an infrastructure level by bringing data closer to the compute power.

This hardware-accelerated cache enables Amazon Redshift to run up to 10 times faster as it scales out and processes data in parallel. Each node accelerates compression, encryption, and data processing tasks like scans, aggregates, and filtering. Analysts should still try their best to write efficient code, but AQUA will speed up the return of results considerably.

AQUA is available on Amazon Redshift RA3 instances at no additional cost. To get started with AQUA, sign up for the preview.

Automated performance tuning

For analysts who haven’t used sort and distribution keys, the learning curve can be steep. A table created with the wrong keys can mean results take much longer to return.

Automatic table optimisation tackles this problem by using machine learning to select sort and distribution keys and tune the table design. Letting Amazon Redshift determine how to improve cluster performance reduces manual effort.
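As a small sketch of how you hand these decisions over, the snippet below shows a table created with DISTSTYLE AUTO and SORTKEY AUTO, plus a helper that generates the ALTER TABLE statements to switch an existing table to automatic optimisation. The table usage_events and its columns are hypothetical, and the helper auto_optimise_statements is something I wrote for this example, not part of any AWS library.

```python
# Sketch only: the table and columns are hypothetical placeholders.

# A new table that leaves distribution and sort key choices to Amazon Redshift.
CREATE_TABLE_SQL = """
CREATE TABLE usage_events (
    account_id BIGINT,
    event_time TIMESTAMP,
    event_name VARCHAR(100)
)
DISTSTYLE AUTO
SORTKEY AUTO;
"""


def auto_optimise_statements(table_name):
    """Build the statements that switch an existing table to automatic keys.

    Hypothetical helper: it accepts only simple names (letters, digits,
    underscores) so nothing unexpected ends up in the generated SQL.
    """
    if not table_name.replace("_", "").isalnum():
        raise ValueError(f"unexpected table name: {table_name!r}")
    return [
        f"ALTER TABLE {table_name} ALTER DISTSTYLE AUTO;",
        f"ALTER TABLE {table_name} ALTER SORTKEY AUTO;",
    ]


if __name__ == "__main__":
    print(CREATE_TABLE_SQL.strip(), end="\n\n")
    for sql in auto_optimise_statements("usage_events"):
        print(sql)
```

If you already know your query patterns well, you can still set keys by hand. The AUTO settings are most useful when you don't, or when the patterns change over time.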

Summary

These are just some of the Amazon Redshift announcements made in 2020 to help analysts get query results faster. Some of these features help you get access to data, whether it’s in Amazon Redshift or somewhere else. Others are under-the-hood enhancements that make things run smoothly with less manual effort.

For more information about these announcements and a complete list of new features, see What’s New in Amazon Redshift.

Photo by Krivec Ales from Pexels

This post first appeared on AWS Big Data Blog