Cockroach Labs, creators of the distributed SQL database CockroachDB, recently released a report that benchmarks raw system and database transaction performance on the big players in the cloud computing industry: Amazon AWS, Microsoft Azure, and Google GCP. While conducted with the needs of their database product in mind, the exhaustive benchmarks cover metrics that apply to virtually any application, including CPU, network throughput and latency, storage I/O, and an OLTP benchmark derived from TPC-C. Cockroach Labs has been publishing cloud benchmarking results since 2017. What catches the eye immediately in the 2021 test rankings is that AWS occupies 7 of the 12 last-place rankings (vs. 3 for Azure and 2 for GCP). AWS is almost absent from the second-place rankings, and Azure has the fewest first-place rankings. As the report points out, GCP outperformed all competitors, but of course it’s not that simple. Let’s dig a little deeper.
The report consists of 12 benchmarks, some of which may not be terribly meaningful for typical use cases. The problem with reading the benchmark blindly is that equal weight is given to each test, which skews the results when evaluating a cloud platform for a specific workload. For example, one test measures single-core CPU performance, which has dubious value for many workloads. Another example is the TPM (transactions per minute) measurements, which are of particular value for database workloads but less so for applications like web servers and machine learning.
Another consideration is the margin of victory. Narrow margins have dubious value when comparing platforms, but wider margins in individual tests should be taken seriously, especially if they show a year-over-year trend. For those selecting a single public cloud platform for a multi-year investment, committing to a declining platform can have serious consequences.
Only single-core and 16-core performance are considered for CPU. In the single-core measure, GCP outpaces the other contenders by almost 9%. If you have a rare single-core use case, 9% is a considerable margin of victory. For the 16-core measure, AWS wins by about 5% over the other two. Most will not notice a 5% difference, especially considering the inherent uncertainty in benchmarks, but if you have an extraordinarily performance-sensitive application, perhaps 5% is a lot. What is perhaps more significant is the movement of AWS to the top of the heap, previously occupied by Azure in 2020. This is due to AWS’s introduction of Graviton2 processors.
The report also covered network throughput and latency. As in previous years, GCP dominated the throughput benchmark, delivering nearly 3x the network throughput of AWS and Azure. Counterintuitively, however, AWS beats the others by about 30% when latency is measured. Both GCP’s throughput performance and AWS’s latency performance are far too significant to ignore, but what does it mean, given that latency and throughput are not independent variables?
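One way to see the coupling (a back-of-the-envelope sketch, not a calculation from the report): for a single TCP stream, achievable throughput is bounded by the window size divided by the round-trip time, so lower latency directly raises the throughput ceiling. The window size below is an illustrative assumption.

```python
# Illustrative sketch: bandwidth-delay product bound for one TCP stream.
# Achievable throughput <= window_size / RTT, so latency and throughput
# are coupled rather than independent metrics.

def max_throughput_gbps(window_bytes: float, rtt_ms: float) -> float:
    """Upper bound on single-stream throughput in Gbit/s."""
    rtt_s = rtt_ms / 1000.0
    return (window_bytes * 8) / rtt_s / 1e9

# Assume a 4 MiB TCP window (hypothetical, for illustration only).
window = 4 * 1024 * 1024

# Halving the RTT doubles the single-stream throughput ceiling.
print(max_throughput_gbps(window, rtt_ms=1.0))  # ~33.55 Gbit/s
print(max_throughput_gbps(window, rtt_ms=0.5))  # ~67.11 Gbit/s
```

Real benchmarks use many parallel streams, which is why a provider can win on aggregate throughput while losing on latency, but the single-stream bound shows why the two metrics can never be read in isolation.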
The report gives a potential answer when discussing the variability of the latency measures. They tend to be highly dependent on instance placement, and showed great variability even between runs on the same provider. Consider that the tests were run on cloud providers under “real world” conditions. This means that any results (not just networking) have the potential to be greatly affected by the workloads being run by myriad other customers at any given time. From a high level, the “why” doesn’t really matter, just the performance. What it may mean is that the results measure the level of load on a given cloud/AZ rather than its underlying infrastructure.
Considering the primacy of storage to a database vendor, the majority of tests in the report measure storage performance, comparing read and write IOPS, latency, and throughput. The report focuses mainly on network-attached storage, although it does address local SSDs. In general, AWS brings up the rear in storage performance, coming in last in five out of six tests. Azure narrowly beats GCP in IOPS, but GCP wins in throughput by a large margin.
As you might expect from the networking throughput results, GCP significantly outperformed the competition (especially AWS) in the storage read/write throughput tests. It beat Azure by about 20% and more than doubled the performance of AWS. Read/write latency was a mixed bag, with no obvious winner. IOPS (I/O operations per second) was won by Azure, by about 10% over GCP, with AWS losing by over 25%.
In summary, IOPS-dominated workloads (lots of small reads and writes) will see a benefit from Azure, while throughput-dependent workloads (less chatty, with large reads and writes) will see a significant boost from GCP.
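The distinction between the two workload shapes can be sketched with a little arithmetic. The disk limits below are hypothetical placeholders, not figures from the report; the point is only that the binding constraint flips with I/O size.

```python
# Hedged sketch (hypothetical disk limits, not the report's numbers):
# a workload finishes no faster than both its IOPS budget and its
# throughput budget allow, and which one binds depends on I/O size.

def completion_time_s(total_bytes: float, io_size: float,
                      max_iops: float, max_mbps: float) -> float:
    ops = total_bytes / io_size
    time_iops = ops / max_iops                  # limited by operations/sec
    time_tput = total_bytes / (max_mbps * 1e6)  # limited by bytes/sec
    return max(time_iops, time_tput)            # the tighter limit wins

GB = 1e9
# 1 GB of 4 KiB random reads: the IOPS ceiling dominates (~12.2 s).
print(completion_time_s(1 * GB, 4096, max_iops=20_000, max_mbps=1200))
# 1 GB of 1 MiB sequential reads: the throughput ceiling dominates (~0.83 s).
print(completion_time_s(1 * GB, 2**20, max_iops=20_000, max_mbps=1200))
```

With small random I/O the disk's IOPS limit is hit long before its bandwidth, so an IOPS leader like Azure pays off; with large sequential I/O the bandwidth limit binds, which is where GCP's throughput lead matters.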
OLTP (On-Line Transaction Processing) is a high-level measure of database performance under real-world workloads like shopping carts and financial applications. Due to limitations in the benchmark itself, it essentially ignored the benefits of fast storage and emphasized fast CPUs and memory (RAM) size.
As far as the cost-per-transaction measurement is concerned, it was a virtual dead heat between all providers, with AWS winning by a hair. Unsurprisingly, GCP won in transaction throughput, even considering how little the benchmark stressed storage throughput.
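The cost-per-transaction metric itself is simple to reconstruct: instance price divided by transaction throughput. The prices and TPM figures below are made-up placeholders (not the report's data), chosen only to show why a pricier instance can still win on $/transaction.

```python
# Hypothetical sketch of a cost-per-transaction metric: hourly instance
# price divided by transactions completed per hour. All numbers below
# are placeholders, not figures from the Cockroach Labs report.

def dollars_per_million_txns(hourly_price_usd: float, tpm: float) -> float:
    txns_per_hour = tpm * 60
    return hourly_price_usd / txns_per_hour * 1e6

# A cheaper/slower instance vs a pricier/faster one (both hypothetical).
print(dollars_per_million_txns(hourly_price_usd=1.50, tpm=30_000))  # ~0.833
print(dollars_per_million_txns(hourly_price_usd=2.00, tpm=45_000))  # ~0.741
```

Here the more expensive instance is cheaper per transaction, which is how throughput leaders and price leaders can end up in a near dead heat on this metric.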
GCP puts on an impressive display with its overall competitiveness and its dominance in throughput. GCP also delivered great IOPS value, with its general-purpose disk losing to Azure’s UltraDisk by only 5%. AWS lights up the CPU benchmark with its Graviton2 processors and provides good $/OLTP value, but comes in a disappointing last in 7 of the 12 tests. Azure rests solidly in the middle of the pack, competitive with GCP (except in throughput) and handily beating AWS.