Takeaways from AWS re:Invent 2019

I’ve just got back from my first AWS re:Invent. It was exciting and busy, and I learned a whole lot. This post covers the key things I learned about new releases and other companies’ experiences.

If you’d like to read about the experience of being there, getting through the crowds and surviving, check out My 2019 re:Invent: From A to Z.

The gameplan

My strategy when approaching the re:Invent schedule was to prioritise the hands-on workshops and save sessions and keynotes for later. The workshops were my favourite experiences at re:Invent. Getting to talk to the engineers who are working on Redshift and Aurora was so valuable.

I’ve been working through the sessions I either caught in an overflow room or saved to my playlist for later. These are the best of the best of data, analytics and leadership from re:Invent 2019.

New Athena functionality
The Lakehouse
Enhanced Redshift
Design for infinite
Resilient services
Biases in AI and ML
Streaming data for fraud detection
Innovation at speed
Streaming data in one year
Amazon and Oracle

New Athena functionality

ANT307 – Amazon Athena Deep dive

Athena Federated Query was technically announced before the conference, but this “pre:Invent” release changes the game for Athena users.

With Athena Federated Query, users can run SQL queries across data stored in relational, non-relational, object, and custom data sources. Prebuilt connectors execute in Lambda and write the result to S3 for further analysis.

The Lakehouse

ANT335 – Scale data analytics w/ Amazon Redshift, ft. Warner Bros

In the first of a series of talks on Redshift, the team from Warner Bros showed how they converted their traditional architecture to a Lakehouse. The Lakehouse approach lets users ingest data into Redshift or operate directly on an S3 lake. The best of both worlds.

The team talked us through how they wanted to combine Redshift with tools for overnight processing, with Lambdas sending data between systems.

Slides
Spectrum and Glue Workshop
Redshift Federated Query
Redshift Unload to Parquet

Enhanced Redshift

ANT320 – What’s new with Redshift, featuring Yelp

The team from Yelp showed how Redshift functionality and bringing compute closer to storage improve performance. They showed how small changes to architecture and Redshift make a difference to performance.

Announced pre:Invent, Query Priority lets you assign a priority to each Redshift queue. We can now prioritise mission-critical work and deprioritise exploratory queries.

Redshift Workshop
Spark-Redshift connector
RA3 Uncoupling of Storage and Compute
Auto WLM and Query Prioritisation Blog Post
Auto WLM Documentation

Design for Infinite

FSI304 – Nasdaq: From Data Warehouse to Data Lake

The team from Nasdaq presented their Big Data architecture and the journey to their current state. The team was hitting hard limits, long load times and unhappy users.

Moving to Athena worked in the short term but, because of their complex queries, didn’t help with performance. They then engaged the AWS Data Lab for a custom solution, which highlighted quick fixes and recommended multiple Lakehouses.

CTEs default to diststyle EVEN but it’s better to create temp tables to avoid broadcast joins.
Use sort keys on the columns used in WHERE clauses.
Always compress and analyse.
Right size columns and SELECT only what you need.
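
To make a few of these tips concrete, here is a minimal sketch of my own, not something shown in the talk. It builds a Redshift CREATE TEMP TABLE ... AS statement with an explicit distribution key and sort key, selecting only the columns needed. The helper temp_table_sql and the table and column names (trades, symbol, trade_date) are hypothetical, and the right keys depend entirely on your own joins and filters.

```python
def temp_table_sql(name, select_sql, dist_key, sort_keys):
    """Build a Redshift CTAS statement for a temp table with explicit keys."""
    if not sort_keys:
        raise ValueError("at least one sort key is required")
    sort_cols = ", ".join(sort_keys)
    return (
        f"CREATE TEMP TABLE {name}\n"
        f"DISTKEY({dist_key})\n"      # distribute on the column you join on
        f"SORTKEY({sort_cols})\n"     # sort on the columns you filter on
        f"AS\n{select_sql.strip()};"
    )


# Hypothetical table and column names, for illustration only.
sql = temp_table_sql(
    name="recent_trades",
    select_sql="""
        SELECT trade_id, symbol, trade_date, price
        FROM trades
        WHERE trade_date >= '2019-12-01'
    """,
    dist_key="symbol",
    sort_keys=["trade_date"],
)
print(sql)
```

The function only produces the SQL text. Running it against a cluster is left to whichever client you already use.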

Slides
AWS Data Lab

Resilient services

DOP342 – Amazon’s approach to building resilient services

I attended this session as I had some time at the Venetian. It ended up being one of the best of the week and highlights how technology can’t succeed without culture.

The takeaways related to the need for operational accountability and creating a safe space for operations teams.

Leadership must be connected to the team and not lose sight of the details.
Fix what isn’t broken rather than waiting until it is on fire.
If all your Principal is doing is drawing diagrams on a whiteboard, they are solving the wrong problems.
Small teams with strong ownership over their services – no ops teams, no QA teams, everyone is hands-on.

Slides

Biases in AI and ML

WPT202 – Promoting fairness in AI/ML

This fireside chat discussed how whatever you believe about the world influences your model.

The discussion was lively and raised important talking points that everyone involved in building models should be aware of:

Tech is developing so quickly that there are unintended consequences.
It is very hard to develop fair algorithms if women and minorities don’t have a seat at the table.
The more human-like something is, the higher the standards we need to hold it to.
There will always be confirmation bias, as you always want to see things in a certain light.
It’s also not clear what the absence of bias should look like.
All of us need to be aware, it is not only for underrepresented people to solve.

Slides

Streaming data for fraud detection

ANT331 – AWS analytics enables fraud prevention for Sony’s PlayStation

The Sony team uses streaming technology to evaluate every purchase on their network and prevent fraudulent logins.

The final architecture took the best of batch and streaming processing to reduce time to resolve. A sign that batch processing isn’t dead yet.

Batch Processing:

Processes all data
REST API interacts with Kinesis Firehose to convert to Parquet
Glue and Spark aggregate data and persist it in DynamoDB

Stream Processing:

Adds a speed layer for temporary, real-time decisions on current state
Uses Kinesis Analytics and persists in DynamoDB
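
The idea of combining a batch view with a speed layer can be sketched in a few lines. This is my own simplified illustration, not Sony's implementation: plain dictionaries stand in for the DynamoDB tables, the class PurchaseState and the account IDs are hypothetical, and it assumes each batch run covers every event seen so far, so the speed layer can be cleared afterwards.

```python
class PurchaseState:
    """Combine a batch view with a temporary speed layer."""

    def __init__(self):
        self.batch_view = {}   # account -> purchase count as of the last batch run
        self.speed_view = {}   # account -> purchases seen since the last batch run

    def record_event(self, account_id):
        """Speed layer: update the temporary real-time count immediately."""
        self.speed_view[account_id] = self.speed_view.get(account_id, 0) + 1

    def load_batch(self, counts):
        """Batch layer: replace the view with a full recomputation.

        Assumes the batch run covered every event so far, so the
        temporary speed-layer counts are discarded.
        """
        self.batch_view = dict(counts)
        self.speed_view = {}

    def current_count(self, account_id):
        """Serving: the batch result plus whatever has arrived since."""
        return (self.batch_view.get(account_id, 0)
                + self.speed_view.get(account_id, 0))


state = PurchaseState()
state.load_batch({"acct-1": 3})      # overnight batch result
state.record_event("acct-1")         # new purchases arrive on the stream
state.record_event("acct-2")
print(state.current_count("acct-1"))  # 4
print(state.current_count("acct-2"))  # 1
```

The speed layer only has to be right until the next batch run replaces it, which is one reason batch processing still earns its place.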

AI/ML Detection Workshop

Innovation at speed

ARC203 – Innovation at speed

In this talk, Adrian Cockcroft, VP Cloud Architecture Strategy at AWS, distils what he knows about leadership.

Not all of us can expect to operate like a tech giant, but the takeaways were thought-provoking.

What is the incentive for an employee to become trained? – don’t train them for a brain drain.
Move from Projects to Product teams – long term ownership, continuous delivery and DevOps reduce tech debt.
Do small things quickly – less risk, faster repair, less time merging changes, faster flow, happier developers.

“If you want to build a ship, don’t drum up the men to gather wood, divide the work, and give orders. Instead, teach them to yearn for the vast and endless sea.”
Antoine de Saint-Exupéry

Slides
14 Leadership Principles from Amazon
7 Leadership Principles from Netflix

Streaming data in one year

ANT326 – Building a streaming data platform with Amazon Kinesis

GoDaddy went from architecture to business as usual (BAU) with a streaming data platform in just one year.

In this talk, they discuss the challenges of integrating Kinesis with other infrastructure.

Create APIs for products designed with business logic in mind.
Learn how to recover from failure – how can you tell if records are missed?
Use CDC on non-event-driven data stores.
Start with new products rather than forcing change on existing systems.
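
On the question of how to tell if records are missed, one common approach is to look for gaps in a counter. The sketch below is my own assumption rather than GoDaddy's method: it supposes each producer stamps its records with its own contiguous counter, and the field names producer and seq are hypothetical.

```python
def missing_by_producer(records):
    """Return {producer: [missing seq numbers]} for gaps in each producer's counter."""
    seen = {}
    for record in records:
        seen.setdefault(record["producer"], set()).add(record["seq"])

    gaps = {}
    for producer, seqs in seen.items():
        # Anything between the lowest and highest counter seen should be present.
        missing = [n for n in range(min(seqs), max(seqs) + 1) if n not in seqs]
        if missing:
            gaps[producer] = missing
    return gaps


received = [
    {"producer": "orders", "seq": 1},
    {"producer": "orders", "seq": 2},
    {"producer": "orders", "seq": 5},
    {"producer": "billing", "seq": 7},
    {"producer": "billing", "seq": 8},
    {"producer": "orders", "seq": 2},  # duplicate delivery, ignored by the set
]
print(missing_by_producer(received))  # {'orders': [3, 4]}
```

A check like this can only find holes between records that did arrive. It can't tell you that the most recent records from a producer never showed up, so it would need pairing with something like a heartbeat or an expected count.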

Slides
Realtime Data Platform Workshop

Amazon and Oracle

DAT359 – How Amazon.com migrated from Oracle to AWS databases

Amazon has been in the press recently celebrating the migration of over 7,500 databases to AWS. They also faced some of the issues I had when using the Schema Conversion Tool and Database Migration Service. This was comforting: it’s not just a ‘me’ problem. They’ve used this to build out playbooks to support those on a similar journey.

Slides

That sums up what I enjoyed and took away from my re:Invent experience. There were interesting lessons not only about the data and analytics world but about team culture as well.

What was your favourite re:Invent session? Are there any more you think I should check out?

Photo by Lisa Fotios from Pexels
