Advanced Computing in the Age of AI | Thursday, March 28, 2024

Researchers Target Cloud Cost, Performance 

When it comes to determining the best cloud provider, the answer depends on many variables. A team of researchers from the Polytechnic School at the University of São Paulo is helping to increase the accuracy of the decision-making process by analyzing different aspects of cloud costs and performance.

Cloud computing, with an emphasis on elasticity, availability, and scalability, is rapidly becoming an acceptable IT platform. The number of cloud service providers has expanded to keep pace with demand, assisted by falling hardware prices and accessible virtualization technologies. While the increase in available options should provide customers with better performance and more competitive prices, would-be cloud adopters are now faced with the task of comparing all these competing offerings. For the average customer, this endeavor can be met with some degree of trepidation. Adding to the confusion, cloud is a catch-all term and a pretty fuzzy one at that.

Cloud providers Amazon Web Services and Rackspace Hosting, say the Polytechnic School researchers, offer sufficiently high I/O performance to make them viable candidates for a range of enterprise and HPC workloads. But how does one go about selecting from the growing menu of instance and storage types? And for that matter, how does vendor A's small instance compare with vendor B's small instance? Their paper, "Experiences Applying Performance Evaluation to Select a Cloud Provider," appears in the just-published Recent Advances in Computer Engineering, Communications and Information Technology and seeks to answer these and other questions.

The authors set about analyzing CPU, memory, and I/O performance in Amazon EC2, and I/O performance in Rackspace. They note that these characteristics are directly relevant to an application's characteristics, i.e., whether it is CPU bound, memory bound, or I/O bound. Two types of instances (small and large) and two types of storage (disk and block) are evaluated for each cloud provider. The SSD storage type is also analyzed. The IOzone benchmark is used to evaluate the storage types using sequential (write and rewrite) and random (write and read) access modes with synchronous operations. The SPEC benchmark is applied to the CPU and the NPB benchmark to the memory.
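
The difference between the sequential and random synchronous access modes can be illustrated with a minimal Python sketch. This is not IOzone and not the researchers' setup: the record size, record count, file name, and the use of `os.fsync` as a stand-in for synchronous I/O are all assumptions chosen for illustration.

```python
import os
import random
import tempfile
import time

RECORD_SIZE = 4096   # bytes per write (assumed, not from the paper)
RECORD_COUNT = 64    # writes per run (assumed, not from the paper)

def sync_write_throughput(path, offsets, record):
    """Write `record` at each offset, flushing to disk every time; return bytes/second."""
    flags = os.O_WRONLY | os.O_CREAT | getattr(os, "O_BINARY", 0)
    fd = os.open(path, flags, 0o600)
    try:
        start = time.perf_counter()
        for offset in offsets:
            os.lseek(fd, offset, os.SEEK_SET)
            os.write(fd, record)
            os.fsync(fd)  # force the data to storage before the next write
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return len(offsets) * len(record) / max(elapsed, 1e-9)

record = os.urandom(RECORD_SIZE)
sequential = [i * RECORD_SIZE for i in range(RECORD_COUNT)]
shuffled = sequential[:]
random.Random(0).shuffle(shuffled)  # same offsets, visited in random order

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "bench.dat")  # hypothetical scratch file
    seq_bps = sync_write_throughput(path, sequential, record)
    rnd_bps = sync_write_throughput(path, shuffled, record)

print(f"sequential sync write: {seq_bps / 1e6:.2f} MB/s")
print(f"random sync write:     {rnd_bps / 1e6:.2f} MB/s")
```

Results from a toy run like this depend heavily on the file system and caching layers, which is one reason a dedicated benchmark such as IOzone is preferable for real comparisons.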

Amazon EC2 and Rackspace offer a large variety of instance types without an explicit hardware relationship. This is one of those sticking points that makes vendor-to-vendor comparisons difficult in the infrastructure cloud space. Out of all the available instance types, the team selected two distinct groups, small and large. The small group has simple hardware (one virtual CPU, or vCPU) and a small amount of disk storage, and prices are roughly equivalent between the two providers. For the large group, the team selected the second best instance of each provider. The Amazon instance types are called ec2.small and ec2.large; for Rackspace they are rack.small and rack.large. Refer to Table 1 for more details.

[Table 1: cloud evaluation]

The paper provides a thorough overview of the different types of storage offered by both providers, including disk, block, and object storage options. For the performance evaluations, the team selected the ec2.root, ec2.stand, ec2.iops, ec2.ssd, rack.local, rack.sata and rack.ssd storage configurations, as outlined in Table 3. Object storage is not explored in this research.

[Table 3: cloud evaluation]

Amazon experiments were performed in the us-east-1 region (Northern Virginia) and Rackspace experiments took place in the Chicago (ORD) region. The CentOS 6.4 variant of Linux was used for all experiments.

The experiments led to some interesting discoveries. For small instances, rack.local achieved the best performance for both sequential and random write and rewrite operations, and it was also the cheapest configuration. For write operations, rack.local obtained a 197 percent performance improvement over the second best provider configuration, i.e., rack.ssd. The best Amazon configuration (ec2.iops) was 358 percent worse than the worst Rackspace configuration (rack.sata).
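
To put the 197 percent figure in perspective, a percent improvement can be converted to a speedup factor. The short Python sketch below does that conversion; it assumes the improvement is measured relative to the slower configuration's result, which the article does not state explicitly, and the function name is purely illustrative.

```python
def improvement_to_factor(percent_improvement):
    """Convert a percent improvement over a baseline into a multiplicative factor."""
    return 1.0 + percent_improvement / 100.0

# 197 percent is the rack.local vs. rack.ssd write figure quoted above.
factor = improvement_to_factor(197)
print(f"A 197 percent improvement is about {factor:.2f}x the baseline result")
```

Under that assumption, rack.local delivered close to three times the write performance of rack.ssd.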

"These results in the small instances can be explained, despite the fact that Amazon EC2 instances have more memory than Rackspace instances (1.7GB versus 1GB)," the authors write. "The best results in Rackspace are probably due to the better memory and disk configuration (higher bandwidth between memory and disk, numbers of channels, disk technology and cache memory, among others)."

This leads to an important point about the challenge of accurately evaluating virtual machines with unspecified underlying hardware.

"It is difficult to accurately report why this happens because the information about real resources in Cloud Computing is 'cloudy.' It is not easy to get accurate information about the hardware. The hardware virtualization hides the real information about the hardware, e.g., number of memory channels or I/O bus," the team observes.

The small instance Amazon IOPS volume had the best performance for random reads. For large instances, Amazon EC2 with local SSD provided the best performance for write, rewrite, and random write operations. As for random read operations, all the configurations returned similar results, making Rackspace with local disk the most cost-effective option.

In analyzing the experimental results from running the SPEC CPU and NPB benchmarks on small, medium, and large EC2 instances, the team came upon a surprising finding: the best performing instance was cpu.mem.medium. The medium instance had an 89 percent performance improvement for the BT benchmark and a 24 percent improvement for the SPEC CPU benchmark, compared with the cpu.mem.large instance. The team hypothesized that the seeming anomaly was caused by the noisy neighbor effect, in which other virtual machines running on the same physical host compete for shared resources. By performing additional tests, the team verified their interference hypothesis. As the experience highlights, the selection of an appropriate cloud provider is more complex than simply going with the more powerful or expensive one.

Along similar lines, computation doesn't necessarily scale proportionately with price, as another experiment showed. Using a Perl script as a benchmark, the team found that the computational difference between the cpu.mem.small and cpu.mem.medium instances was about 54 percent, while the cost increased 100 percent. Furthermore, the computational difference between the cpu.mem.medium and cpu.mem.large instances was only 24 percent, despite another doubling of cost. This serves as a reminder to the user to consider how much better an application will perform when executing in a larger instance and whether the increased cost is justified.
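
The trade-off can be made concrete with a rough calculation of performance per unit cost. The Python sketch below assumes the 54 and 24 percent figures are throughput gains over the next smaller instance and that each step exactly doubles the price; the function name and the normalization to the small instance are illustrative choices, not the paper's method.

```python
def perf_per_cost(gains, cost_multipliers):
    """Return performance per unit cost for each tier, relative to a baseline of 1.0."""
    perf, cost = 1.0, 1.0
    ratios = [1.0]  # the smallest tier is the baseline
    for gain, multiplier in zip(gains, cost_multipliers):
        perf *= 1.0 + gain   # cumulative performance relative to baseline
        cost *= multiplier   # cumulative cost relative to baseline
        ratios.append(perf / cost)
    return ratios

tiers = ["cpu.mem.small", "cpu.mem.medium", "cpu.mem.large"]
ratios = perf_per_cost(gains=[0.54, 0.24], cost_multipliers=[2.0, 2.0])
for name, ratio in zip(tiers, ratios):
    print(f"{name}: {ratio:.2f}x performance per unit cost")
# cpu.mem.small: 1.00x performance per unit cost
# cpu.mem.medium: 0.77x performance per unit cost
# cpu.mem.large: 0.48x performance per unit cost
```

Under these assumptions, each step up delivers less computation per dollar, so the large instance yields roughly half the value per unit cost of the small one for this benchmark.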

In the final analysis, the research reaffirms that there are no easy answers and no single best cloud provider. The study did, however, generate some interesting findings, which are summarized as follows:

1. It's not always the most expensive instance that has the best performance.

2. External volumes can perform better than local disks and the network may not be the system bottleneck.

3. Local disk can offer better cost-benefit than more expensive storage.

4. Increasing the number of threads can improve performance on local or external SSD volumes.

5. The same instance type can offer different performance depending on the selected datacenter location.

When it comes to making the right choice about a cloud service provider, the team maintains that "a deep knowledge of the application behavior and the technologies used by cloud provider are the main keys."

About the author: Tiffany Trader

With over a decade’s experience covering the HPC space, Tiffany Trader is one of the preeminent voices reporting on advanced scale computing today.

EnterpriseAI