Amazon Redshift now supports Elastic resize

One of the major pain points for me with Amazon Redshift has always been the coupling between storage and compute. Competitors like Snowflake and Google’s BigQuery offer independent compute and storage, which means easier (and quicker) scaling in times of increased load. Redshift’s main drawback in the scalability sense has been that it can take up to 24 hours to resize your cluster (during which it’s in read-only mode), meaning there’s a lot of pressure to get your cluster configuration spot on before you go into production. Redshift’s provision of elasticity is just not up to par with most of Amazon’s other services. While Redshift Spectrum helps with this, it’s not a solution to the issue of scalability for an existing cluster.

In the lead-up to re:Invent, Amazon last night dropped a load of really neat announcements (server-side encryption for DynamoDB as standard, SSE support for SNS), among which was the reveal of Elastic resize for Redshift. As an aside, if this is the stuff they’re announcing now, there should be some really nice announcements at re:Invent itself.

How does resizing work?

Traditionally, when you identified a need to resize your Redshift Data Warehouse, you’d have to schedule some maintenance time to carry out the resize operation. This can typically take anywhere between 1 and 24 hours, depending on your node type, volume of data, and other factors.

Under the “classic” model, Redshift switches your cluster into read-only mode and takes a snapshot of your data. It then provisions an entirely new cluster that meets your new spec and starts loading all your data in from the snapshot. Only once this load operation is complete does Redshift point your cluster endpoints over to the new cluster and release its read-only hold. The old cluster then gets destroyed.

As you can imagine, this is a time-consuming and disruptive process. Do you really want your Enterprise Data Warehouse to be unavailable for writes for up to a day? While there are workarounds, such as provisioning a new cluster yourself and creating a pseudo-replication process, these are typically heavy on effort and cost.

Elastic resizing

As they often do, Amazon have recognised the pain point and worked to remedy it. Elastic resizing massively improves the resizing process by turning it into a mainly online operation, and reduces the period of disruption from up to 24 hours to only a few minutes.

If you want to understand how this works under the hood, I thoroughly recommend you watch Amazon’s online tech talk on the subject, which details how they’ve achieved Elastic resizing: https://pages.awscloud.com/Best-Practices-for-Scaling-Amazon-Redshift_1111-ABD_OD.html

At a high level, they’ve developed a way for some of the slices of your cluster to be transferred to new nodes in a transparent manner, minimising disruption and allowing the cluster to remain read/write capable throughout the majority of the process. There may be some minor disruption, such as query cancellations, but I’ll take a few minutes over several hours any day.
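To make the idea of moving slices between nodes a little more concrete, here is a toy model in Python. It is only my own illustration, not Amazon's actual mechanism: the node names, the two-slices-per-node starting layout and the `spread_slices` helper are all assumptions made for the sketch.

```python
from typing import Dict, List


def spread_slices(layout: Dict[str, List[int]], new_nodes: List[str]) -> Dict[str, List[int]]:
    """Toy model: move slices onto new nodes until every node holds an equal share."""
    # Copy the layout so the caller's "before" picture is left untouched.
    result = {node: list(slices) for node, slices in layout.items()}
    for node in new_nodes:
        result[node] = []

    total = sum(len(slices) for slices in result.values())
    if total % len(result) != 0:
        raise ValueError("slices do not divide evenly across the target nodes")
    per_node = total // len(result)

    # Take the surplus slices off the existing nodes...
    surplus: List[int] = []
    for slices in result.values():
        while len(slices) > per_node:
            surplus.append(slices.pop())

    # ...and hand them to the nodes that are short (the new ones).
    for slices in result.values():
        while len(slices) < per_node:
            slices.append(surplus.pop())
    return result


# Hypothetical 2-node cluster with two slices per node, doubled to 4 nodes.
before = {"node-0": [0, 1], "node-1": [2, 3]}
after = spread_slices(before, ["node-2", "node-3"])

# Count the slices that no longer live on their original node.
moved = sum(1 for node, slices in before.items() for s in slices if s not in after[node])
print(after)
print(f"slices moved: {moved} of 4")
```

In this toy layout only half of the slices change node, and nothing is rebuilt from a snapshot, which is roughly why I'd expect this style of resize to be so much quicker than the classic approach.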

There are some limitations of course:

  • You can only use Elastic resize to add/remove nodes, not change node type.
  • It only supports dc2 and ds2 node types. Anyone still running a dc1 cluster will have to upgrade. It’s worth doing this anyway for the free performance boost.
  • Single-node clusters aren’t supported (not that you’d be using single-node in production anyway).
  • It appears that you can only double or halve the number of nodes in your cluster. I suspect this is related to the way slices are allocated on disk. For example, if you’re running a 4-node ds2.xlarge cluster, you can Elastic resize to a 2-node or 8-node cluster.
  • There’s no sorting involved with an Elastic resize, so it can’t substitute for a vacuum operation, whereas a Classic resize can.
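The limitations above are simple enough to check before you request a resize. The following Python sketch encodes them as I've described them in this post, so treat it as a rough pre-flight check rather than AWS's authoritative rules. The function name `elastic_resize_problems` is made up for illustration, and I've assumed the single-node restriction applies to the target size as well as the current one.

```python
from typing import List, Optional

# Node families listed as supported in this post (an observation, not an official list).
SUPPORTED_FAMILIES = ("dc2", "ds2")


def elastic_resize_problems(
    node_type: str,
    current_nodes: int,
    target_nodes: int,
    target_node_type: Optional[str] = None,
) -> List[str]:
    """Return the reasons an Elastic resize looks unsuitable; an empty list means none found."""
    problems = []

    # "ds2.xlarge" -> "ds2"
    family = node_type.split(".")[0]
    if family not in SUPPORTED_FAMILIES:
        problems.append(f"node family {family} is not supported")

    if target_node_type is not None and target_node_type != node_type:
        problems.append("node type cannot be changed")

    # Assumption: neither the current nor the target cluster may be single-node.
    if current_nodes < 2 or target_nodes < 2:
        problems.append("single-node clusters are not supported")

    doubling = target_nodes == current_nodes * 2
    halving = target_nodes * 2 == current_nodes
    if not (doubling or halving):
        problems.append("node count can only be doubled or halved")

    return problems


print(elastic_resize_problems("ds2.xlarge", 4, 8))
print(elastic_resize_problems("dc1.large", 4, 6))
```

If the list comes back non-empty, a Classic resize (or a node type upgrade first) is the fallback.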

Summary

All in all, the introduction of the Elastic resize capability is a major plus for Redshift. While it doesn’t remove the coupled storage/compute setup, it does remove a major barrier to cluster scaling, and even opens the door to scaling up or down according to demand - a use case that has never really been practical on Redshift until now.
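As a rough idea of what demand-based scaling might look like, here is a minimal Python sketch. The CPU thresholds, the cluster name and the `plan_elastic_resize` helper are my own assumptions for illustration. The request it builds is intended for boto3's Redshift `resize_cluster` call, where `Classic=False` asks for an Elastic rather than a Classic resize.

```python
from typing import Optional


def plan_elastic_resize(
    cluster_id: str,
    current_nodes: int,
    cpu_percent: float,
    high: float = 80.0,  # assumed scale-up threshold
    low: float = 20.0,   # assumed scale-down threshold
) -> Optional[dict]:
    """Build resize_cluster arguments for doubling or halving, or None if no change is needed."""
    if cpu_percent >= high:
        target = current_nodes * 2
    elif cpu_percent <= low and current_nodes % 2 == 0 and current_nodes // 2 >= 2:
        # Only halve when the result is still a multi-node cluster.
        target = current_nodes // 2
    else:
        return None
    return {"ClusterIdentifier": cluster_id, "NumberOfNodes": target, "Classic": False}


def apply_resize(request: dict) -> dict:
    """Send the request to Redshift. Needs boto3 and AWS credentials, so it is not called here."""
    import boto3  # imported lazily so the planning step works without boto3 installed

    return boto3.client("redshift").resize_cluster(**request)


# "my-cluster" is a hypothetical cluster identifier.
print(plan_elastic_resize("my-cluster", 4, 90.0))
print(plan_elastic_resize("my-cluster", 4, 50.0))
```

In practice you'd want some damping around this (a cool-down period between resizes, for instance), since even an Elastic resize can cancel running queries.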

Has anyone tried out Elastic resize so far? If so, let me know what you think of the capability and how this has impacted your business.
