I’ve just got back from my first AWS re:Invent. It was exciting and busy, and I learned a whole lot. This post covers the key things I learned about new releases and other companies’ experiences.
If you’d like to read about the experience of being there, getting through the crowds and surviving, check out “My 2019 re:Invent: From A to Z”.
The gameplan
My strategy when approaching the re:Invent schedule was to prioritise the hands-on and keep sessions and keynotes for later. My favourite experiences at re:Invent were the workshops. Getting to talk to the engineers who are working on Redshift and Aurora was so valuable.
I’ve been tackling sessions I either got to in an overflow room or put in my playlist for later. These are the best of the best of data, analytics and leadership from re:Invent 2019.
New Athena functionality
The Lakehouse
Enhanced Redshift
Design for infinite
Resilient services
Biases in AI and ML
Streaming data for fraud detection
Innovation at speed
Streaming data in one year
Amazon and Oracle
New Athena functionality
ANT307 – Amazon Athena Deep dive
Technically announced before the conference, this “pre:Invent” release changes the game for Athena users.
With Athena Federated Query, users can run SQL queries across data stored in relational, non-relational, object, and custom data sources. Prebuilt connectors execute in Lambda and write the result to S3 for further analysis.
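As a rough illustration of what such a query can look like, the sketch below builds a SQL statement that joins a table in the S3 data lake with a table exposed through a Lambda connector. The connector, table and column names are all hypothetical, and the `"lambda:<function name>"` catalog prefix is assumed from the syntax documented for the feature at launch, so check the current Athena documentation before relying on it.

```python
def federated_join_sql(connector: str, remote_table: str, lake_table: str, key: str) -> str:
    """Build a query joining an S3 lake table to a table served by a Lambda connector."""
    # Assumed syntax: an unregistered connector is addressed as "lambda:<function name>".
    remote = f'"lambda:{connector}".{remote_table}'
    return (
        f"SELECT l.{key}, r.* "
        f"FROM {lake_table} AS l "
        f"JOIN {remote} AS r ON l.{key} = r.{key}"
    )


# Hypothetical names, for illustration only.
sql = federated_join_sql(
    connector="orders_connector",
    remote_table="sales.orders",
    lake_table="lake.customers",
    key="customer_id",
)
print(sql)
```

The resulting string would be submitted to Athena like any other query; the connector does the work of fetching the remote rows.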
Read more:
- Slides
- Video with sample code and walk through
- Athena Documentation
- Blog post announcement
The Lakehouse
ANT335 – Scale data analytics w/ Amazon Redshift, ft. Warner Bros
In the first of a series of talks on Redshift, the team from Warner Bros. showed how they converted their traditional architecture to a Lakehouse. This approach lets users ingest data into Redshift or operate directly on an S3 data lake. The best of both worlds.
The team talked us through how they wanted to take advantage of Redshift in combination with tools for overnight processing, using Lambdas to send data between systems.
Read more:
- Slides
- Spectrum and Glue Workshop
- Redshift Federated Query
- Redshift Unload to Parquet
Enhanced Redshift
ANT320 – What’s new with Redshift, featuring Yelp
The team from Yelp showed how Redshift functionality and bringing compute closer to storage improve performance. They also showed how small changes to architecture and to Redshift make a difference to performance.
Announced pre:Invent, Query Priority lets you assign a priority to each Redshift queue. We can now prioritise mission-critical work and deprioritise exploratory queries.
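To make the idea concrete, here is a toy model of priority-based scheduling. It is only a sketch: the `QueryScheduler` class is hypothetical, and it runs queries in strict priority order, whereas Redshift itself manages this inside the cluster and, as far as I understand it, gives higher-priority work more resources rather than simply reordering a list. The level names mirror the ones Redshift exposes.

```python
import heapq
from itertools import count

# Priority names as exposed by Redshift; a lower number is served first in this toy model.
PRIORITY = {"highest": 0, "high": 1, "normal": 2, "low": 3, "lowest": 4}


class QueryScheduler:
    """Hypothetical scheduler that always runs the highest-priority waiting query."""

    def __init__(self):
        self._heap = []
        self._arrival = count()  # tie-breaker: first in, first out within a priority

    def submit(self, sql, priority="normal"):
        heapq.heappush(self._heap, (PRIORITY[priority], next(self._arrival), sql))

    def run_next(self):
        _, _, sql = heapq.heappop(self._heap)
        return sql


scheduler = QueryScheduler()
scheduler.submit("exploratory query", priority="low")
scheduler.submit("dashboard refresh", priority="high")
scheduler.submit("nightly ETL", priority="high")

run_order = [scheduler.run_next() for _ in range(3)]
print(run_order)
```

The exploratory query was submitted first but runs last, which is the behaviour you want when an analyst's ad hoc work competes with a mission-critical load.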
Read more:
- Redshift Workshop
- Spark-Redshift connector
- RA3 Uncoupling of Storage and Compute
- Auto WLM and Query Prioritisation Blog Post
- Auto WLM Documentation
Design for infinite
FSI304 – Nasdaq: From Data Warehouse to Data Lake
The team from Nasdaq presented their Big Data architecture and the journey to their current state. The team was hitting hard limits, with long load times and unhappy users.
Moving to Athena worked in the short term but, because of their complex queries, didn’t help with performance. They then engaged the AWS Data Lab for a custom solution. The engagement recommended multiple Lakehouses and highlighted some quick fixes:
- CTEs default to DISTSTYLE EVEN, but it’s better to create temp tables to avoid broadcast joins.
- Use Sort Keys on the columns used in WHERE clauses.
- Always compress and analyse your tables.
- Right size columns and SELECT only what you need.
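A minimal sketch of how three of these tips combine in practice: the helper below builds a Redshift `CREATE TEMP TABLE ... AS` statement that replaces a CTE, distributes on the join column, sorts on the filter column and selects only the named columns. The helper, table and column names are hypothetical and not taken from the Nasdaq talk.

```python
def temp_table_sql(name, source, columns, dist_key, sort_key, where):
    """Build a CTAS statement that materialises an intermediate result as a temp table."""
    column_list = ", ".join(columns)  # select only the columns that are needed
    return (
        f"CREATE TEMP TABLE {name} "
        f"DISTKEY ({dist_key}) "  # distribute on the join column to avoid broadcast joins
        f"SORTKEY ({sort_key}) "  # sort on the column used in the WHERE clause
        f"AS SELECT {column_list} FROM {source} WHERE {where}"
    )


# Hypothetical table and column names, for illustration only.
statement = temp_table_sql(
    name="recent_trades",
    source="trades",
    columns=["symbol", "trade_date", "price"],
    dist_key="symbol",
    sort_key="trade_date",
    where="trade_date >= '2019-12-01'",
)
print(statement)
```

Later joins against `recent_trades` on `symbol` can then be co-located with any other table distributed on the same key.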
Read more:
- Slides
- AWS Data Lab
Resilient services
DOP342 – Amazon’s approach to building resilient services
I attended this session as I had some time at the Venetian. It ended up being one of the best of the week and highlighted how technology can’t succeed without culture.
The takeaways related to the need for operational accountability and creating a safe space for operations teams.
- Leadership must be connected to the team and not lose sight of the details.
- Fix what isn’t broken rather than waiting until it is on fire.
- If all your Principal is doing is drawing diagrams on a whiteboard, they are solving the wrong problems.
- Small teams with strong ownership over their services – no ops teams, no QA teams, everyone is hands-on.
Read more:
- Slides
Biases in AI and ML
WPT202 – Promoting fairness in AI/ML
This fireside chat discussed how whatever you think about the world influences your model.
The discussion was lively and raised important talking points that everyone involved in building models should be aware of:
- Tech is developing so quickly that there are unintended consequences.
- It is very hard to develop fair algorithms if women and minorities don’t have a seat at the table.
- The more human-like something is, the higher the standard we need to hold it to.
- There will always be confirmation bias, as you always want to see things in a certain light.
- It’s also not clear what the absence of bias should look like.
- All of us need to be aware; it is not only for underrepresented people to solve.
Read more:
- Slides
Streaming data for fraud detection
ANT331 – AWS analytics enables fraud prevention for Sony’s PlayStation
The Sony team uses streaming technology to evaluate every purchase on their network and prevent fraudulent logins.
The final architecture took the best of batch and streaming processing to reduce time to resolve. A sign that batch processing isn’t dead yet.
Batch Processing:
- Processes all data
- REST API interacts with Kinesis Firehose to convert the data to Parquet
- Glue and Spark aggregate the data and persist it in DynamoDB
Stream Processing:
- Adds a speed layer for temporary, real-time decisions on current state
- Uses Kinesis Analytics and persists the results in DynamoDB
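The way the two layers fit together can be sketched in a few lines. This is not Sony's implementation: plain in-memory counters stand in for the DynamoDB tables, and the event shape and account names are made up. The batch view is recomputed from all history, the speed view only covers events that arrived since the last batch run, and a decision reads both.

```python
from collections import Counter


def count_purchases(events):
    """Aggregate purchase events into a per-account count."""
    return Counter(event["account"] for event in events)


# Hypothetical events: everything up to the last batch run, and what arrived since.
history = [{"account": "alice"}, {"account": "alice"}, {"account": "bob"}]
recent = [{"account": "alice"}, {"account": "carol"}]

batch_view = count_purchases(history)  # complete but refreshed slowly
speed_view = count_purchases(recent)   # temporary, discarded after the next batch run


def purchases_so_far(account):
    """Combine both layers to get an up-to-date answer for a single account."""
    return batch_view[account] + speed_view[account]


print(purchases_so_far("alice"))
```

Once the next batch run has absorbed the recent events, the speed view can be reset, which is why it only needs to be temporary.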
Read more:
- AI/ML Detection Workshop
Innovation at speed
ARC203 – Innovation at speed
In this talk, Adrian Cockcroft, VP Cloud Architecture Strategy at AWS, distils what he knows about leadership.
Not all of us can expect to operate like a tech giant, but the takeaways were thought-provoking.
- What is the incentive for an employee to become trained? Don’t train them for a brain drain.
- Move from Projects to Product teams: long-term ownership, continuous delivery and DevOps reduce tech debt.
- Do small things quickly – less risk, faster repair, less time merging changes, faster flow, happier developers.
“If you want to build a ship, don’t drum up the men to gather wood, divide the work, and give orders. Instead, teach them to yearn for the vast and endless sea.”
Antoine de Saint-Exupéry
Read more:
- Slides
- 14 Leadership Principles from Amazon
- 7 Leadership Principles from Netflix
Streaming data in one year
ANT326 – Building a streaming data platform with Amazon Kinesis
GoDaddy went from architecting a streaming data platform to running it as business as usual (BAU) in just one year.
In this talk, they discussed the challenges of integrating Kinesis with other infrastructure.
- Create APIs for products designed with business logic in mind.
- Learning how to recover from failure – how can you tell if records are missed?
- Using CDC on non-event-driven data stores.
- Start with new products rather than forcing change on existing systems.
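On the question of how to tell whether records were missed, one common approach (not necessarily the one GoDaddy used) is to have each producer stamp its records with its own incrementing counter and have the consumer look for gaps. The counter is an assumption of this sketch; Kinesis sequence numbers increase over time but are not consecutive, so they can't be used for this directly.

```python
def find_missing(received, first, last):
    """Return the producer counters in [first, last] that never arrived."""
    seen = set(received)
    return [n for n in range(first, last + 1) if n not in seen]


# Counters observed by the consumer for one hypothetical producer.
missing = find_missing([1, 2, 4, 7], first=1, last=7)
print(missing)
```

Any gaps found this way can then be replayed from the source or flagged for investigation.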
Read more:
- Slides
- Realtime Data Platform Workshop
Amazon and Oracle
DAT359 – How Amazon.com migrated from Oracle to AWS databases
Amazon has been in the press recently, celebrating the migration of over 7,500 databases to AWS. They also faced some of the issues I had when using the Schema Conversion Tool and Database Migration Service. This was comforting: it’s not just a ‘me’ problem. They’ve used this experience to build out playbooks to support those on a similar journey.
Read more:
- Slides
That sums up what I enjoyed and took away from my re:Invent experience. There were some interesting learnings, not only about the data and analytics world but about team culture as well.
What was your favourite re:Invent session? Are there any more you think I should check out?
Photo by Lisa Fotios from Pexels