Data centers and the systems that power hyperscale cloud platforms represent the pinnacle of IT infrastructure design and implementation. They offer levels of scalability, reliability, and throughput beyond what any average business will ever need.
That said, enterprise IT teams, including storage administrators, have a lot to learn from Google, AWS, and other large public cloud providers. By applying certain hyperscale data center design principles, administrators can work toward more scalable, resilient, and automated storage systems.
Main similarities and differences
Hyperscale cloud providers and corporate IT operators alike are struggling to cope with a data explosion. They also share similarities when it comes to spending. Every dollar counts for cloud operators and online service providers when building servers and storage systems; seemingly small savings add up when multiplied across tens of thousands of systems. While businesses aren't as cost-conscious and are willing to pay more for products from a trusted vendor, no IT organization has money to waste.
To minimize operational costs – an essential aspect of optimizing IT spending – hyperscale cloud providers automate every task that does not require manual oversight. The key to task automation is software, which, in the context of cloud infrastructure, means replacing function-specific hardware with extensible software that runs on standard servers.
These and other demands from hyperscale cloud operators have reshaped the server, networking and storage industries in several ways, including:
- new techniques for distributed redundancy and scalability;
- an emphasis on flexible hardware built from core components; and
- a concomitant transition from purpose-built appliances to software-defined services that run on standard, easily replaceable servers.
Once IT organizations and engineers embrace the cloud ethos of treating systems like cattle (managed as a herd) rather than pets (cared for individually), every part of the IT department – whether compute resources or storage pools – turns into software.
Hyperscale data center design implications for storage
While there are similarities between traditional enterprise and public cloud infrastructure, the analogy is not perfect – a point Google makes in a blog post on cloud-native architectures.
For example, traditional architectures tend to involve expensive infrastructure that IT teams must manually manage and modify, and they tend to have a small, fixed number of components. This type of fixed infrastructure does not make sense for the public cloud, given its pay-as-you-go model: organizations reduce costs when they shrink their infrastructure footprint. Public cloud resources also scale automatically. Therefore, some attributes of on-demand cloud services do not apply to private infrastructure.
However, IT teams can apply the following hyperscale data center design principles to optimize enterprise storage:
Adopt the software abstraction layer
Servers were the first infrastructure layer to be virtualized, with a software abstraction layer between physical hardware and logical resources. Virtual machines (VMs) have become the standard runtime environment for business applications. Over the past decade, software virtualization has spread throughout the data center as virtual machines have evolved into containers. Software-defined networking has spawned software-defined WAN, virtualization of network functions, and virtual network overlays. Software Defined Storage (SDS) has decoupled data storage devices from the control plane of information management and data placement.
The initial SDS platforms were designed for specific uses, such as providing block volumes for VM instances and databases. Recent products have become format- and protocol-independent, capable of partitioning data across multiple nodes and presenting it as a logical volume, a network file share, or object storage. To provide hardware flexibility, SDS also works with standard servers that have JBOD SSDs, hard drives, and built-in NVMe devices.
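To make the partitioning idea concrete, here is a minimal Python sketch of how an SDS control plane might map a volume's logical blocks onto storage nodes. All names are illustrative assumptions, not taken from any product.

```python
# Hypothetical sketch: stripe a logical volume's blocks across storage
# nodes round-robin, the simplest form of partitioning data on multiple
# nodes while presenting it as one logical volume.

def stripe_blocks(num_blocks: int, nodes: list[str]) -> dict[int, str]:
    """Map each logical block index to a storage node, round-robin."""
    return {blk: nodes[blk % len(nodes)] for blk in range(num_blocks)}

# Eight blocks spread across three (hypothetical) nodes.
placement = stripe_blocks(8, ["node-a", "node-b", "node-c"])
# block 0 -> node-a, block 1 -> node-b, block 2 -> node-c, block 3 -> node-a, ...
```

Real SDS products layer replication, metadata, and rebalancing on top of a placement function like this, but the core idea is the same: the mapping lives in software, not in the hardware.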
Create services, not infrastructure
By isolating resources from physical hardware, software abstraction layers provide the flexibility to mix and match hardware. They let teams bundle resources as services instead of raw infrastructure. To take inspiration from hyperscale cloud providers, use SDS to deliver object, file, or volume services that include not only capacity but also valuable ancillary features such as backup, long-term archiving, version management, and QoS levels.
Providing services instead of infrastructure also provides flexibility in infrastructure design and in how the associated services are packaged. It enables feature and performance upgrades without changing delivery and billing models. With Storage as a Service, administrators can also use servers and disks with different performance and cost characteristics to provide different levels of service, as well as spread data across multiple data centers and regions for greater availability.
Design for automation
Replacing raw storage with software-defined data and information management services also makes tasks easier to automate. This, in turn, reduces operating expenses, shortens provisioning time, and increases reliability. SDS enables programmatic control because it exposes a multitude of APIs for storage configuration, deployment, software updates, and user provisioning. To deliver storage like a hyperscale cloud provider, use the APIs that SDS products expose from automation and infrastructure-as-code platforms such as Terraform, Ansible, SaltStack, or VMware vRealize Automation, which turn manual processes into programmable scripts.
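As a sketch of what such programmatic control looks like, the following Python snippet builds a volume-provisioning request against a hypothetical SDS REST API. The endpoint path, payload fields, and tier names are assumptions for illustration, not any vendor's actual API.

```python
import json
import urllib.request

def provision_volume(api_base: str, name: str,
                     size_gb: int, tier: str) -> urllib.request.Request:
    """Build the HTTP POST an automation script would send to a
    (hypothetical) SDS provisioning endpoint."""
    payload = json.dumps(
        {"name": name, "size_gb": size_gb, "tier": tier}
    ).encode()
    return urllib.request.Request(
        url=f"{api_base}/v1/volumes",   # illustrative endpoint
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# A script, not a ticket: provisioning becomes a repeatable function call.
req = provision_volume("https://sds.example.internal", "analytics-01", 500, "gold")
```

In practice, a Terraform provider or Ansible module wraps calls like this one, adding idempotency and state tracking on top.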
Plan for failures
Servers and storage devices die regularly. For cloud providers with hundreds of thousands of servers and millions of disks, outages happen all the time. To embrace the cattle-not-pets philosophy, design for failure. Make sure that a dead drive or server does not damage a storage volume or blob. A standard technique is to fragment files, blobs, or volumes into chunks that are replicated and spread across multiple disks, nodes, and data centers, using erasure coding, hashing, or similar algorithms to ensure data integrity.
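The fragment-and-replicate technique can be sketched in a few lines of Python. This toy placement function (node names and replica count are illustrative assumptions) guarantees that each chunk's copies land on distinct nodes, so no single dead drive or server holds every copy:

```python
def place_chunks(num_chunks: int, nodes: list[str],
                 copies: int = 3) -> dict[int, list[str]]:
    """Assign each chunk `copies` replicas on distinct nodes by
    offsetting a round-robin placement, so a single node failure
    never takes out all replicas of any chunk."""
    n = len(nodes)
    assert copies <= n, "need at least as many nodes as replicas"
    return {c: [nodes[(c + r) % n] for r in range(copies)]
            for c in range(num_chunks)}

# Ten chunks, triple-replicated across five hypothetical nodes.
placement = place_chunks(10, ["n1", "n2", "n3", "n4", "n5"])
```

Production systems use more sophisticated placement (failure domains, racks, regions) and often erasure coding instead of full replicas, but the invariant is the same: copies of one chunk never share a failure domain.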
Some failures do not involve destruction of data, but rather corruption or loss of performance. Cloud operators continuously monitor for such events and use notification systems and automated scripts to repair or mitigate the damage without manual intervention – and, ideally, before users notice. Monitoring can also determine the extent of any corruption or failure, and route incoming storage requests to intact replicas and unaffected data centers.
Focus on scalability
IT has always struggled to meet demands for storage capacity, but today, accelerating data growth has created a crisis in many organizations. To build storage like a hyperscale cloud platform, design for Moore's Law-style growth. Administrators must be able to add storage nodes and JBOD arrays to expand scalable systems without disruption.
SDS is essential to such designs because it separates the control plane – volume, file and node management and configuration – from the data plane – storage nodes and arrays. Thus, adding capacity to a distributed system does not require removing and migrating a volume. Instead, IT staff can add nodes and allow the system to automatically redistribute data to new available capacity.
Unlike traditional SAN-based enterprise storage designs, hyperscale clouds don't scale up and consolidate; they scale out and distribute. They also use monitoring telemetry and predictive machine learning algorithms to determine scaling profiles for capacity additions. The goal is to have enough capacity without wasting too much in reserve.
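Even a crude predictive model beats guessing. The following toy forecast, a plain least-squares trend line over periodic capacity samples, stands in for the far richer models cloud operators use; the sample figures are invented for illustration:

```python
def forecast_capacity(samples: list[float], horizon: int) -> float:
    """Fit a least-squares line to used-capacity samples (one per
    period) and extrapolate `horizon` periods ahead, to decide when
    to order capacity before it runs out."""
    n = len(samples)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(samples) / n
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, samples))
             / sum((x - x_mean) ** 2 for x in xs))
    intercept = y_mean - slope * x_mean
    return intercept + slope * (n - 1 + horizon)

# Monthly used capacity in TB (illustrative), projected 3 months out.
projected = forecast_capacity([100.0, 110.0, 121.0, 133.0, 146.0], 3)
```

Feeding a forecast like this into the provisioning automation described earlier closes the loop: telemetry predicts demand, and capacity is added just ahead of need rather than held idle in reserve.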
Remember that machines are interchangeable
Compared to traditional storage systems, standard servers running an SDS stack save money: businesses can replace expensive, proprietary storage hardware with inexpensive commodity servers. These machines are also interchangeable, like Lego blocks in a larger distributed system. Because each file or chunk of data is replicated to drives across multiple nodes, the failure of one or two systems does not affect an entire data volume. Machine interchangeability and data redundancy also allow IT staff to perform repairs or replacements in batches at a convenient time, rather than as reactive fire drills.
To act like a cloud operator, IT organizations must be large enough to justify the number of systems that scalable distributed designs require. When you only have one dairy cow, it is impossible to treat Elsie like just another animal in the herd.