Going All Flash, SAN-less and Staying Fault Tolerant

Benefitting from Flash Performance with Uninterrupted Service in a Virtualized Environment

Allon Cohen, PhD
Scott Harlin

Introduction

When server virtualization is added to an IT environment, application data is typically placed in a centrally reachable repository, such as a SAN, so that any application load can be run from any server in the data center through the use of virtual machines (VMs). Over the years, these external SAN repositories have been filled with hard disk drives (HDDs). Because HDDs traditionally deliver between 200 and 350 input/output operations per second (IOPS) per drive, SANs had to include hundreds if not thousands of drives to keep up with the requirements of virtual server environments.
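To make the drive-count claim concrete, the following minimal Python sketch estimates how many HDDs a SAN needs to reach a given random IOPS target. The per-drive figures are those quoted above. The 100,000 IOPS target and the function name `drives_needed` are illustrative assumptions, and the calculation ignores RAID overhead and any spare drives added for availability.

```python
def drives_needed(target_iops: int, iops_per_drive: int) -> int:
    """Minimum number of drives whose combined IOPS meets the target."""
    # Ceiling division using integers only, to avoid float rounding.
    return -(-target_iops // iops_per_drive)


# Assumed example workload; not a figure from this paper.
TARGET_IOPS = 100_000

# Per-drive range quoted in the text above.
for per_drive in (200, 350):
    count = drives_needed(TARGET_IOPS, per_drive)
    print(f"{per_drive} IOPS/drive: {count} drives")
```

At the quoted per-drive rates, this assumed target alone calls for 286 to 500 drives before any redundancy is added.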

Figure 1: Concurrently running multiple virtual machines (VMs) in a virtualized environment will cause heavy randomization of data access towards the SAN

To complicate matters further, HDDs were designed primarily for handling sequential read and write data streams located on the same track. They cannot keep pace with randomized workload demands in virtualized environments, where servers process concurrent data streams from multiple workloads that generate millions of IOPS. The resulting load of random read/write storage requests creates a system performance disparity in virtualized environments, as external storage becomes the main bottleneck preventing servers from reaching their full potential. It is typical in today’s data center for thousands of VMs to compete for data from external storage arrays filled with slow HDDs that have trouble servicing this multitude of simultaneous requests. This bottleneck is depicted in Figure 1.

The solution in the past required IT departments to continue purchasing additional HDDs just to satisfy server IOPS performance demands. As each SAN and its stockyard of HDDs continued to grow, significantly more power and cooling were also required to keep the HDDs spinning, driving up data center total cost of ownership (TCO). Because mechanical HDDs are prone to failure at random times, complex high availability (HA) schemes were also required to address individual HDD and storage array malfunctions. These schemes further increased the number of HDDs required to keep the system running, and the HA software at the SAN layer resulted in higher CAPEX/OPEX for the data center.

In contrast, on-host flash memory storage, having no moving parts, handles random data access effortlessly, making it a superior enabler of today’s virtualization requirements. A single flash-based SSD that fits directly into a server’s PCI Express (PCIe) bus can deliver random IOPS performance to VMs equivalent to large external SAN arrays deployed with thousands of HDDs. When flash is inside the host, the solid-state technology becomes a local resource whose IOPS performance matches server requirements, making it an excellent storage solution for a virtualized environment. However, deploying on-host flash in these environments requires care, as choosing the wrong solution architecture can negate critical virtualization services such as HA and Fault Tolerance (FT).

The purpose of this white paper is to present a deeper look at how OCZ Technology’s innovative virtualization solutions keep applications running at the speed of flash while at the same time providing uninterrupted services to end-users. These solutions, with their ability to present flash as a highly available network resource, set the precedent for an all-silicon SAN-less data center that delivers all the benefits of virtualization without the need for costly back-end HDD SANs.

OCZ's On-Host Flash Solution

OCZ has developed hardware and software technology that integrates the power of flash acceleration with the power of sophisticated storage virtualization. This advanced approach not only moves data onto host-based flash to maximize performance and efficiently utilize host resources, but also provides the required storage services for continuous, uninterrupted availability of enterprise applications even during total server failures.

To achieve this level of optimized virtualization, OCZ’s fourth generation Z-Drive R4 PCIe SSD provides a compact, performance-rich, power-efficient solid-state solution that plugs directly into a server’s PCIe bus. In conjunction with its proprietary Virtualized Controller Architecture™ (VCA) and OCZ’s VXL Caching and Virtualization Software, it can efficiently distribute randomized data between all available NAND cells locally on-host while delivering a complete virtual performance system that accelerates VM applications.

Figure 2: VXL Software monitors VM data requests to/from the external SAN, keeping critical data in the Z-Drive R4 PCIe card on-host to reduce data traffic by up to 90%

VXL Software creates a flash virtualization layer on top of the virtualization hypervisor (such as ESXi), enabling IT managers to dynamically deploy flash resources to match the needs of VMs exactly when the VMs need them. It enables intelligent and efficient on-demand distribution of flash resources between all connected VMs so that the Z-Drive R4 PCIe SSD can be virtualized as a highly available network resource to be shared amongst any VM in the cluster. The result is that no VM inefficiently occupies flash when it can be better used elsewhere in the environment. OCZ’s virtualized solution is depicted in Figure 2.
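The caching behavior shown in Figure 2 can be illustrated with a simplified host-side read cache. This is only a sketch. It uses a least-recently-used (LRU) eviction policy as a stand-in, since this paper does not describe the policy VXL Software actually applies, and the class `HostReadCache` and the `san_read` callback are hypothetical names.

```python
from collections import OrderedDict


class HostReadCache:
    """Simplified on-host read cache in front of a slower backing store."""

    def __init__(self, capacity_blocks, san_read):
        self.capacity = capacity_blocks
        self.san_read = san_read      # hypothetical callback: block_id -> data
        self.blocks = OrderedDict()   # least recently used entry is first
        self.hits = 0
        self.misses = 0

    def read(self, block_id):
        if block_id in self.blocks:
            # Served from local flash: no request reaches the SAN.
            self.blocks.move_to_end(block_id)
            self.hits += 1
            return self.blocks[block_id]
        # Miss: fetch from the backing store and keep a local copy.
        self.misses += 1
        data = self.san_read(block_id)
        self.blocks[block_id] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used
        return data
```

Every hit is a request that never travels to the external array, which is the mechanism by which a host-side cache reduces SAN traffic. The reduction achieved in practice depends on how much of the hot data fits in flash.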

Since the data that resides on the Z-Drive flash card is virtualized by VXL Software, it can be shared by multiple servers, and typically any entry point will be able to share the flash resources even when the Z-Drive is installed on a different server. This industry-leading approach provides the highest return on investment (ROI) in a virtualized environment where many VMs share the flash cache resource and often do not reach peak workload requirements at the same time.
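The ROI argument rests on simple arithmetic: when VMs peak at different times, a shared pool only has to cover the highest combined demand, not the sum of every VM's individual peak. The sketch below shows this with made-up numbers. The VM names, the demand values (in GB of flash per time slot), and both function names are assumptions for illustration.

```python
def dedicated_capacity(demand_by_vm):
    """Flash needed if each VM gets a private allocation sized for its own peak."""
    return sum(max(series) for series in demand_by_vm.values())


def shared_capacity(demand_by_vm):
    """Flash needed if all VMs draw from one pool sized for the combined peak.

    Assumes every VM's series covers the same time slots.
    """
    return max(sum(slot) for slot in zip(*demand_by_vm.values()))


# Hypothetical flash demand in GB over three time slots.
demand = {
    "vm_a": [40, 10, 10],
    "vm_b": [10, 40, 10],
    "vm_c": [10, 10, 40],
}

print("dedicated:", dedicated_capacity(demand), "GB")
print("shared:   ", shared_capacity(demand), "GB")
```

With these example values, private allocations require 120 GB while a shared pool requires 60 GB. If all VMs peaked in the same slot, the two figures would be equal and sharing would offer no saving.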

VXL Software does not require agents or special drivers for caching VM data on the Z-Drive PCIe cards because it communicates directly with the virtualization hypervisor. This ‘no agents’ approach dramatically simplifies the deployment, management and maintenance of storage, especially if there are hundreds or thousands of VMs in the virtualized environment. With Z-Drive R4 PCIe cards deployed at the host layer of virtual servers, VXL Software enables those servers to run up to 10 times the number of VMs while keeping up with the random I/O requirements of all VMs in the cluster.

Understanding Critical Services in a Virtualized Environment

With a number of hazards associated with modern data center infrastructures, IT managers are also concerned as to whether the implemented flash-based SSDs are able to provide uninterrupted services to end-users as depicted in Figure 3. A combination of high performance and HA/FT capabilities is a critical requirement for SSDs in the enterprise.

Figure 3: Key resilient services that are critical to an application include High Availability, Fault Tolerance, End-to-end Mirroring, and vMotion

VM resiliency to failure is typically categorized into two levels: HA and FT. HA capabilities in a virtualized environment provide the first level of resiliency, assuring that if a server containing flash resources fails, all VMs with stored data in the virtualized cluster can be rebooted on a new server so that processing can continue with data intact as of the time of failure. In this scenario, the flash virtualization layer is required to write data not only to a primary flash resource but also to a secondary flash resource that can be made available to the surviving server. Though HA capabilities provide full data access upon system failures, they do not enable continuous user operation. This capability is reserved for FT.

FT is one of the most demanding services in virtualized environments as it provides continuous, uninterrupted availability of an application and its data even during total server failures. To achieve successful FT, two live identical copies of a VM (mapped down to the last bit) are required so that one copy can be an immediate backup for the other. With on-host flash, this requires a solution that supports both synchronous mirroring between host servers and immediate connectivity failover to assure no downtime and no data loss during these critical failures.

Since HA and FT typically assume the existence of externally accessible storage, if the on-host flash-based SSDs are not deployed correctly in the data center, the server application in question may gain a performance boost but may lose the ability to deliver key HA/FT services critical to that application. IT managers must take into account these critical HA/FT requirements when considering SSD deployments.

Enabling Uninterrupted FT Services to End-Users

OCZ developed VXL Software as a flash virtualization and acceleration appliance that enables data to be replicated synchronously between two servers. It seamlessly integrates with the VMware server virtualization platform, extending both HA and FT capabilities to on-host flash. For example, consider the fault tolerant SAN-less environment depicted in Figure 4.

Figure 4: OCZ’s All-Flash, SAN-less, Fault Tolerant Data Center

The environment contains two servers running a VMware cluster at the speed of flash using the on-host Z-Drive cards virtualized by VXL Software without a back-end SAN. The servers run a fault tolerant VM whose memory image is replicated by VMware to a secondary shadow copy. While VXL Software replicates the underlying data on flash, it exposes the data to the servers so that each server believes it is accessing the same flash volume. In actuality, two identical copies of that volume are maintained down to the last command. Though both VMs have identical flash volumes, one is live while the other is a synchronous mirror. The two VMs continue to run in parallel so that commands and new data written by one server are synchronously replicated to the other.
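The essential property of synchronous mirroring is that a write is acknowledged only after both copies hold it, so the mirror is never behind the live volume. The following minimal sketch models that rule with in-memory dictionaries. The classes `FlashVolume` and `MirroredVolume` are hypothetical and do not represent VXL Software's interfaces.

```python
class FlashVolume:
    """Stand-in for one host's flash volume (block_id -> data)."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def write(self, block_id, data):
        self.blocks[block_id] = data


class MirroredVolume:
    """Presents two flash volumes as a single volume, mirrored synchronously."""

    def __init__(self, live, mirror):
        self.live = live
        self.mirror = mirror

    def write(self, block_id, data):
        # Both copies are updated before the caller is told the write
        # completed. An asynchronous scheme would return after the first.
        self.live.write(block_id, data)
        self.mirror.write(block_id, data)
        return True  # acknowledgement to the VM

    def read(self, block_id):
        return self.live.blocks[block_id]


volume = MirroredVolume(FlashVolume("server-1"), FlashVolume("server-2"))
volume.write(7, b"payload")
print(volume.live.blocks == volume.mirror.blocks)
```

The cost of this design is that every write waits for the slower of the two copies, which is one reason low-latency host-to-host links matter in such a configuration.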

This OCZ hardware/software solution assures that performance will be maintained at accelerated levels even through a complete server failure. When the server running the FT VM fails, VMware will attempt to continue to service end-users through the shadow VM. To enable this failover, VMware will check that the shadow VM has access to all of the data that the original VM had access to. What VMware will find is that VXL Software has a copy of the data running in a secondary server that is immediately available to the FT VM shadow.

VXL Software will then make sure that the data is up-to-date, down to the last write operation, and that all connectivity parameters are transferred to the surviving server. The fault tolerant VM from this secondary server is now activated and continues operation from exactly the point where the downed server stopped, providing flash speed to end-users as if no problem had occurred.

To provide this functionality, VXL Software deploys two recovery mechanisms concurrently synchronized to work with VMware’s FT capability. The first mechanism immediately recognizes that VXL Software residing on the failed server can no longer service VMs, and the surviving VXL Software on the secondary server takes over operations. Any VMs that were serviced by VXL Software in the downed server will now have access to data using this target failover mechanism.

Concurrent with the target failover, VXL Software will provide a flash mirror failover mechanism. Recognizing that the original host-based flash card is no longer accessible, the surviving VXL Software on the secondary server will immediately switch to using the mirror copy of the drive in the surviving server. This failover mechanism between mirrored flash SSD resources is completely transparent to VMs so that I/O access is not interrupted.
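The two mechanisms can be pictured as two independent pointers that both move to the surviving server: one for the software instance that services I/O (target failover) and one for the flash copy being used (mirror failover). The sketch below is a toy model under that assumption. The classes `Host` and `FaultTolerantVolume` are hypothetical, and real failure detection is far more involved than checking a flag.

```python
class Host:
    """A server with local flash and a software instance that services I/O."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.flash = {}  # block_id -> data on this host's flash card


class FaultTolerantVolume:
    def __init__(self, primary, secondary):
        self.hosts = [primary, secondary]
        self.target = primary  # host whose software instance services I/O
        self.copy = primary    # host whose flash copy is in use

    def _fail_over_if_needed(self):
        survivors = [h for h in self.hosts if h.alive]
        if not survivors:
            raise RuntimeError("no surviving host holds the volume")
        if not self.target.alive:
            self.target = survivors[0]  # target failover
        if not self.copy.alive:
            self.copy = survivors[0]    # flash mirror failover

    def write(self, block_id, data):
        self._fail_over_if_needed()
        for host in self.hosts:
            if host.alive:
                host.flash[block_id] = data  # mirrored to every live copy

    def read(self, block_id):
        self._fail_over_if_needed()
        return self.copy.flash[block_id]
```

For example, after `primary.alive = False`, the next `read` or `write` switches both `target` and `copy` to the secondary host and completes without the caller seeing an error, which is the transparency the text describes.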

The ability to provide both mirrored replication between flash resources and transparent target failover is the underlying technology that enables the virtualized environment to function at the speed of flash while being completely fault tolerant.

The SAN-Less Environment

If the application needs to fulfill a large I/O load consisting of hot data, an external SAN may not be required, as VXL Software with Z-Drive R4 PCIe SSDs could very easily do the job without one. For example, a separate database volume within a database application can be created to function in the same way as a SAN, except that it will use local server-based storage.

In this scenario, the VXL Software layer manages how best to utilize the Z-Drive SSD flash resource within the virtualization OS platform, enabling storage to be localized within the host and providing host-to-host data sharing rather than host-to-SAN. Maximum performance is also achieved as the Z-Drive flash resource fits directly into the server’s PCIe bus for fast access to host CPU and memory.

By enabling uninterrupted FT services, the OCZ SAN-less solution provides the same level of resiliency as a traditional SAN while delivering dramatically accelerated performance coupled with the TCO savings of not having to purchase and maintain a SAN for the environment. Leveraging VMware FT in combination with VXL flash virtualization and mirroring capabilities provides data protection and application availability when an unplanned server failure occurs. The result is no data loss for end-users: as soon as the VMware FT process realizes that the original VM is down, the application picks up right where it left off.

The result is the ultimate SAN replacement solution, one that eliminates the bottlenecks associated with the SAN. For many environments that already have an investment in a SAN, OCZ’s all-flash SAN-less technology is an ideal IT refresh alternative. Instead of purchasing a new storage system, IT managers can leverage the existing one, changing its focus to an HDD capacity-centric operation that stores older VMs and their legacy data. Using VXL Software and HA/FT support, the most active VMs can be moved to PCIe-based Z-Drive storage in their respective hosts through the VMware vMotion capability, eliminating the long and tedious process associated with VM migration. The result is a significant performance boost to production VMs, while lessening the load on the legacy storage system.

Conclusion

The combination of OCZ’s VXL Software and Z-Drive R4 PCIe SSDs keeps applications running at the speed of flash while providing uninterrupted services to end-users. The Z-Drive R4 cards are virtualized as highly available network resources, creating an all-silicon SAN-less data center that delivers all the benefits of virtualization and eliminates the need for costly back-end HDD SANs. OCZ’s all-flash approach enables nearly instantaneous server responses (near zero latency) without the additional costs associated with maintenance, power consumption or cooling. The result is a new all-silicon, extreme performance, fault tolerant, SAN-less data center as depicted in Figure 5 below.

Figure 5: OCZ’s all-silicon, fault tolerant, SAN-less solution
