Storage Virtualization Showdown (Part 5): Data Protection

In this part of the Storage Virtualization Showdown series I will focus on the data protection capabilities of the four platforms.  As datasets grow larger and become more difficult to protect and recover with traditional backup means, organizations will increasingly need to deploy array-based data protection.  Array-based data protection, when implemented as part of a broader data protection strategy, can provide the nearest Recovery Points and the shortest Recovery Times, and do so with the least impact to production applications.

I’ll be focusing on the following capabilities in this post:

  • Synchronous Replication: A Disaster Recovery capability that enables remote locations to hold exact, byte-for-byte copies of production data.  Synchronous replication ensures zero data loss during site failures or other catastrophic events.  It requires locally written data to also be written at the remote site before the write is acknowledged to the host, which adds some performance overhead and limits the distance over which data can be replicated (see the sketch after this list).
  • Asynchronous Replication: A Disaster Recovery capability similar to synchronous replication, except that there is no requirement to keep both copies byte-for-byte in sync.  This allows replication across far greater distances than synchronous replication, but assumes some data will be lost in the event of a site loss.  Asynchronous replication can be near-real-time, losing only a few seconds or milliseconds of data, or scheduled to occur at hourly, daily or custom intervals, in which case greater data loss can occur.
  • Clones: Rapid, low-impact, full copies of volumes.  Clones can also be used as production replicas to facilitate reporting or testing of applications.
  • Snapshots: Instantaneous, logical copies of volumes at lower cost, because they do not require space for a full copy.  Snapshots only require space for delta changes; however, some snapshot methods carry performance overhead as changed blocks are copied out.
  • Consistency Groups: Enable I/O consistency across multiple volumes during replication, clone or snapshot operations.  This ensures large, multi-volume applications are fully consistent and recoverable.
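
To make the trade-off between the first two modes concrete, here is a minimal, vendor-neutral sketch in Python (the class and method names are illustrative, not any vendor’s API).  Synchronous replication pays the remote round trip on every write before acknowledging the host; asynchronous replication acknowledges immediately and ships the write later, which is exactly where the potential data loss lives.

    import queue
    import threading
    import time

    class ReplicatedVolume:
        """Toy model of a volume with a local copy and a remote (DR) copy."""

        def __init__(self, remote_latency_s=0.005):
            self.local = {}                      # block -> data at the production site
            self.remote = {}                     # block -> data at the DR site
            self.remote_latency_s = remote_latency_s
            self._pending = queue.Queue()        # writes waiting to be shipped (async mode)
            threading.Thread(target=self._drain_pending, daemon=True).start()

        def _remote_write(self, block, data):
            time.sleep(self.remote_latency_s)    # simulate the WAN round trip
            self.remote[block] = data

        def write_sync(self, block, data):
            """Synchronous: the host ack waits for BOTH copies to be written."""
            self.local[block] = data
            self._remote_write(block, data)      # WAN latency is paid on every host write
            return "ack"

        def write_async(self, block, data):
            """Asynchronous: ack after the local write; the remote copy catches up later."""
            self.local[block] = data
            self._pending.put((block, data))     # anything still queued is lost if the site fails
            return "ack"

        def _drain_pending(self):
            while True:
                block, data = self._pending.get()
                self._remote_write(block, data)

    vol = ReplicatedVolume()
    vol.write_sync(1, b"payroll")    # host waits roughly 5 ms for the remote ack
    vol.write_async(2, b"payroll")   # host ack is immediate; block 2 is the exposure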

HDS VSP

The VSP supports both replication types through two different facilities: Hitachi TrueCopy is used for synchronous replication, while Hitachi Universal Replicator (HUR) is used for asynchronous.  Cascading replication is supported by implementing TrueCopy for the first hop and HUR for the second hop.  Multi-target replication can be achieved by implementing both TrueCopy and HUR off the source volume.  The VSP does not support native IP replication, so customers should plan on installing FCIP extension switches if they wish to replicate over IP.

The VSP also supports Copy-On-Write (COW) snapshots as well as full volume clones (ShadowImage).  Up to 9 ShadowImage clones and up to 64 COW snapshots can be created from a single volume.  Other competing platforms allow significantly larger numbers of snapshots and clones, which weakens the VSP’s value if you plan to maximize these technologies.  HDS also has a more complex method of creating clones: while 9 total clones can be created, only 3 can be 1st-level clones; the additional 6 are established by making up to two 2nd-level clones of each 1st-level clone.  While ShadowImage clones are copied in the background with limited production performance impact, the deltas in COW snapshots are copied out on demand as new overwrites are written into the volume.  For write-intensive applications, this type of snapshot processing can create noticeable performance overhead on the host applications.
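
For readers less familiar with the copy-on-write mechanics behind that overhead, here is a small illustrative sketch in Python.  It is not Hitachi’s implementation, just the general technique: the snapshot itself is instantaneous, but the first overwrite of each protected block must preserve the old data before the new write can land, adding back-end I/O to the write path.

    class CowSnapshotVolume:
        """Illustrative copy-on-write snapshots (not HDS code, just the mechanics)."""

        def __init__(self):
            self.blocks = {}        # live volume: block -> data
            self.snapshots = []     # one entry per snapshot

        def create_snapshot(self):
            # Instantaneous: record which blocks the snapshot covers, copy nothing yet.
            self.snapshots.append({"covered": set(self.blocks), "saved": {}})
            return len(self.snapshots) - 1

        def write(self, block, data):
            # Copy-out penalty: the first overwrite after each snapshot must first
            # read and preserve the old block so the snapshot stays point-in-time.
            for snap in self.snapshots:
                if block in snap["covered"] and block not in snap["saved"]:
                    snap["saved"][block] = self.blocks[block]   # extra back-end I/O
            self.blocks[block] = data

        def read_snapshot(self, snap_id, block):
            snap = self.snapshots[snap_id]
            if block not in snap["covered"]:
                return None                       # block did not exist at snapshot time
            # Preserved copy if the block changed since the snapshot, else the live block.
            return snap["saved"].get(block, self.blocks[block])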

Consistency Groups are supported for VSP replication as well as ShadowImage and Snapshot features.

IBM SVC

SVC also supports both replication types: Metro Mirror and Global Mirror.  Metro Mirror is used for synchronous replication within metro distances, while Global Mirror is used for asynchronous replication across a much wider geography.  Cascading of replication relationships between clusters, as well as multi-target replication of volumes, is supported by combining Metro Mirror and Global Mirror relationships.  Like the VSP, SVC does not support native IP replication; FCIP extension switches must be used.

The platform also supports, through the FlashCopy capability, both clones and snapshots of production data.  Clones are created with standard FlashCopy, which establishes the cloned images in the background with limited production impact.  FlashCopy also supports a “Thin” mode, accomplished by setting “background copy” to 0%, that does not create full clones but instead uses Copy-On-Write mechanics to track changes as the production volume receives new write I/O.  Like other Copy-On-Write snapshots, this can cause a performance impact on write-intensive workloads.  IBM SVC does support cascaded (clones of clones) FlashCopy, but, unlike the VSP, this is not required to achieve the maximum of 256 FlashCopy targets.  FlashCopy targets do count toward the maximum LUN limit in an SVC cluster, so if you plan on taking many snapshots, be aware that you may exhaust the LUN count (4,096) before exhausting the cluster’s resources.
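
A quick, hedged back-of-envelope using the two limits quoted above (a 4,096-LUN cluster and 256 FlashCopy targets) shows how quickly the LUN count becomes the real ceiling; the helper function below is purely illustrative.

    def max_targets_per_volume(production_volumes, cluster_lun_limit=4096):
        """How many FlashCopy targets each volume can carry before the cluster runs
        out of LUNs, since every target consumes a LUN from the same 4,096 pool."""
        remaining = cluster_lun_limit - production_volumes
        return remaining // production_volumes

    # Example: 512 production volumes leave room for only 7 targets each,
    # far below the 256 FlashCopy target maximum quoted above.
    print(max_targets_per_volume(512))   # -> 7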

Consistency Groups are also supported with Metro Mirror and Global Mirror as well as FlashCopy.

NetApp V-Series

NetApp supports three modes of SnapMirror replication, referred to as Sync, Async and Semi-Sync.  Synchronous SnapMirror, like other synchronous replication, maintains a byte-for-byte replica at the destination.  Unlike the competitors’ asynchronous approaches, which send new data to the destination copy as soon as it is received at the source, NetApp’s asynchronous replication leverages NetApp’s Snapshot capability to update the destination at scheduled intervals.  These intervals can be as short as a few minutes or as long as hours, days, weeks or months.  While this mode carries the risk of additional data loss through a longer Recovery Point Objective, the replication method is very efficient and well suited to long distances or low-bandwidth links.  NetApp also offers Semi-Sync replication, which operates much like other vendors’ asynchronous replication and offers Recovery Point Objectives of under 10 seconds.  NetApp supports native IP replication built into the controllers, and it is the preferred means of replication.  Native replication compression is also supported, which gives NetApp an edge in long-distance and low-bandwidth environments without requiring WAN accelerators, especially when combined with deduplicated datasets.  NetApp has the most flexible replication topologies, with support for multi-target replication, cascades and even cascades of cascades.
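
For scheduled, snapshot-based updates, the worst-case exposure is roughly the update interval plus the transfer time, since a failure just before an update completes throws you back to the previous completed transfer.  A minimal, hedged calculation (the numbers are only an example):

    def worst_case_rpo_minutes(interval_min, transfer_min):
        """Worst-case data exposure for scheduled, snapshot-based replication:
        a site loss just before the current update finishes loses everything
        written since the snapshot behind the last completed update."""
        return interval_min + transfer_min

    # Hourly SnapMirror-style updates that take 10 minutes to transfer:
    print(worst_case_rpo_minutes(interval_min=60, transfer_min=10))   # -> 70 minutes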

NetApp is well known for its unique approach to snapshots.  The dynamic mapping of blocks eliminates the need for copy-on-write snapshots as implemented by the competition.  Instead of copying out the old block to a snapshot location, NetApp writes the new block to a new location and updates the virtualization-layer references to point to it.  This gives NetApp the ability to maintain large numbers of snapshots (up to 255 per volume) with little to no performance overhead during writes.  For this reason, full clones are less common on NetApp; snapshots perform well enough that clones are rarely required.  For heavy I/O needs such as reporting and testing, however, snapshots and clones can be created from SnapMirror destination volumes, serving this purpose without impacting production.
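
Here is a small illustrative sketch of that redirect-on-write idea in Python, deliberately parallel to the copy-on-write example earlier; it is not WAFL code, just the general technique.  The snapshot is only a frozen copy of the block map, and new writes never trigger a copy-out because the old physical blocks are simply left in place.

    class RedirectOnWriteVolume:
        """Illustrative redirect-on-write snapshots: new data goes to new blocks and
        only the block map is copied, so there is no copy-out on the write path."""

        def __init__(self):
            self.store = {}          # physical blocks: address -> data
            self.active_map = {}     # logical block -> physical address
            self.snapshots = []      # each snapshot is just a frozen block map
            self._next_addr = 0

        def create_snapshot(self):
            self.snapshots.append(dict(self.active_map))   # metadata-only copy
            return len(self.snapshots) - 1

        def write(self, block, data):
            # One write, no extra read: the old physical block stays untouched and
            # remains reachable through any snapshot map that still references it.
            self.store[self._next_addr] = data
            self.active_map[block] = self._next_addr
            self._next_addr += 1

        def read_snapshot(self, snap_id, block):
            addr = self.snapshots[snap_id].get(block)
            return None if addr is None else self.store[addr]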

NetApp does have a consistency group capability that is natively leveraged when using the NetApp SnapManager data protection suite.  This ensures snapshots are fully application consistent, even across many volumes and LUNs.  However, NetApp doesn’t have a straightforward mechanism to maintain consistency groups during replication.  For synchronous replication, NetApp relies on the write-ordering capabilities of the application to ensure consistency.  Asynchronous replication consistency groups are only supported if the SnapMirror updates are directed to use the SnapManager-created consistency group snapshots.  Semi-Sync consistency groups cannot be maintained while preserving the <10 second Recovery Point Objective.
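
To show what a consistency group buys you mechanically, here is a generic, crash-consistent sketch in Python (not SnapManager or any vendor’s implementation): every member volume is briefly frozen, all snapshots are taken inside that window, and then writes resume, so no volume reflects a write that another member missed.

    import contextlib
    import threading

    class Volume:
        """Toy volume that can hold incoming writes while a group snapshot runs."""

        def __init__(self, name):
            self.name = name
            self.blocks = {}
            self._lock = threading.Lock()     # held for the duration of a freeze

        def write(self, block, data):
            with self._lock:                  # writes wait if a freeze is in progress
                self.blocks[block] = data

        @contextlib.contextmanager
        def frozen(self):
            with self._lock:                  # no writes can land while frozen
                yield

        def create_snapshot(self):
            return dict(self.blocks)          # a map copy stands in for a real snapshot

    def consistency_group_snapshot(volumes):
        """Freeze every member, snapshot them all, then release: no volume in the
        group can reflect a write that another member missed."""
        with contextlib.ExitStack() as stack:
            for vol in volumes:
                stack.enter_context(vol.frozen())
            return {vol.name: vol.create_snapshot() for vol in volumes}

    # Example: the data and log volumes of one database are captured as a unit.
    snaps = consistency_group_snapshot([Volume("db_data"), Volume("db_log")])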

EMC VPLEX

VPLEX also supports both synchronous and asynchronous replication through VPLEX Distributed Mirroring.  RAID 1 distributed volumes in a VPLEX Metro configuration perform synchronous “write-through” caching to ensure zero data loss in the event of a site failure, for sites within 5 ms of round-trip latency.  “Write-through” caching ensures data written by a host is present on both legs of the mirror before the acknowledgment is sent back to the host.  In a VPLEX Geo cluster, asynchronous replication can be deployed over much greater distances through the use of “write-back” caching.  With “write-back” caching, host writes are mirrored in the cache of two VPLEX directors and an acknowledgement is sent to the host before the write I/O is forwarded to the destination mirror.  Like all asynchronous mirroring, some data loss is to be expected in the event of a site failure.  VPLEX is currently limited to only two clusters, which rules out native multi-hop or cascading replication; this could, however, be accomplished through the external arrays.  Like the VSP and SVC, EMC requires FCIP extension switches for remote replication.
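
As a rough sense of what that 5 ms round-trip budget means for site separation, here is a hedged back-of-envelope in Python.  It assumes roughly 5 µs of one-way fiber latency per kilometre and ignores switch, protocol and array overhead, so real deployments land well below the theoretical figure.

    def max_sync_distance_km(rtt_budget_ms=5.0, fiber_us_per_km_one_way=5.0):
        """Upper bound on site separation for a given round-trip budget:
        ~5 us/km one way means ~10 us/km round trip, before any equipment overhead."""
        return (rtt_budget_ms * 1000.0) / (2.0 * fiber_us_per_km_one_way)

    print(max_sync_distance_km())   # -> 500.0 km in theory; far less in practice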

VPLEX uniquely supports the ability to treat distributed volumes as active/active, where application writes can be sent to either side of the mirror.  VPLEX volume and cache coherency algorithms ensure write consistency while leveraging caching resources at each cluster to boost I/O performance.  This enables distributed host clusters to fail over between sites more seamlessly and makes it easier to migrate active datasets between locations.  Due to the latency requirements of VPLEX, most customers will not be able to leverage active/active capabilities outside of metro distances.

As with many other virtualization capabilities, VPLEX relies on the external storage arrays for snapshots and clones.  The complexity of coordinating snapshots through a VPLEX should be noted, however.  Customers should also be sure to discuss the supportability of application-integrated snapshot suites (such as NetApp SnapManager) when devices are virtualized through VPLEX.  Customers will most likely have difficulty troubleshooting and receiving support for such complex, multi-vendor implementations.

VPLEX does support consistency groups for both Metro and Geo Clusters.  Snapshot consistency groups will need to be coordinated by external array vendors.

Next Up

Data Protection and Disaster Recovery capabilities are an important part of an overall storage virtualization strategy.  Be sure the vendor you select can meet your needs in the areas that are most important to your business.  Feel free to share your experiences in the comments section or add additional commentary to my points above.

In the final post in this series, I’ll discuss the High Availability capabilities of each vendor and what behaviors you can expect during node and component failures. 
