Final Round: Data Mobility and Solution Longevity
This post ends the series in which I’ve been comparing appliance-based deduplication with Data Domain vs. Simpana Native Deduplication and explaining why I think native deduplication is the better approach. If you haven’t read the earlier posts in this series, they can be found here:
Backup data can exist for a long time, in some cases forever. It is also well known that the longer you retain data on deduplicated media, the greater your deduplication savings. For these reasons, deduplicated media, such as that provided by Data Domain, make an excellent target for long-term storage of backup and compliance archive data. However, this also means that the data management strategies we pick should be designed for the long term (at least 5 to 7 years).
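To see why longer retention improves savings, here is a toy model. It is not how Data Domain or Simpana actually compute anything. It assumes a hypothetical 10TB weekly full backup in which an assumed 2% of the data is new each week. Every retained full adds its whole size to the logical total but only its changed fraction to what is physically stored. The function name `retention_savings` is illustrative only.

```python
def retention_savings(full_tb: float, weekly_change: float, weeks: int) -> tuple[float, float, float]:
    """Toy model: (logical TB, physical TB, dedup ratio) after `weeks` retained weekly fulls.

    Assumes the first full is stored in its entirety and each later full adds
    only its changed fraction (`weekly_change`) as new unique data.
    """
    logical = full_tb * weeks
    physical = full_tb + full_tb * weekly_change * (weeks - 1)
    return logical, physical, logical / physical


# Hypothetical inputs: 10TB fulls, 2% new data per week.
for years in (1, 3, 7):
    logical, physical, ratio = retention_savings(10.0, 0.02, weeks=52 * years)
    print(f"{years} yr: {logical:,.0f}TB logical, {physical:.1f}TB stored, {ratio:.1f}:1")
```

Under these assumptions the ratio climbs from roughly 26:1 after one year to about 44:1 after seven. As retention grows it approaches 1 divided by the weekly change rate (50:1 here). Real change rates vary widely, so treat the numbers as illustrative only.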
Hardware Is Interchangeable…or Is It?
Hardware migrations have always been notoriously painful. In my previous job as a Storage Administrator, we would spend large portions of the year just migrating applications from one storage system to another. With the advent of server and storage virtualization, much of that has changed in a very good way. Virtualization makes applications more hardware-agnostic, so hardware can easily be swapped out as it ages or as vendors need to be replaced. In the backup world, where data may need to be retained for years, it is important that we have a solid, long-term strategy for dealing with these hardware refreshes or vendor changes.
The good news is that Data Domain does support replication between devices. An important feature is that it allows the data to be replicated (or, in this case, migrated) in deduplicated form. For this reason, it should be easy enough to replicate your backup data to the new device, switch over the backup application access paths and then decommission the old device. This migration/upgrade path works well as long as both you and EMC are committed to Data Domain in the long run.
Migrating deduplicated data off a dedup appliance, however, is extremely difficult. Let’s say you have 100TB of data that you have kept for 3 years of a 7-year retention period. Through deduplication you’ve achieved a 95% reduction (a 20:1 deduplication ratio), so you are storing only 5TB. Then the maintenance on your Data Domain expires. Suddenly, *shock*, you are no longer getting the great pricing you first received for a new appliance, and EMC refuses to budge (I’m sure this never happens). Let’s also say that you’ve found another dedup appliance vendor, or decided that the native dedup approach is much more affordable or unlocks valuable capabilities not available with the appliance approach. Moving to another approach would require a migration.
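A reduction percentage and an N:1 ratio describe the same thing in different units, and the two are easy to blur. The short Python sketch below converts between them for the scenario above. Storing 100TB of logical data as 5TB is a 20:1 ratio, or 95% saved. The helper names are purely illustrative.

```python
def reduction_from_ratio(ratio: float) -> float:
    """Percent of logical capacity saved for an N:1 deduplication ratio."""
    return 100.0 * (1.0 - 1.0 / ratio)


def ratio_from_reduction(percent_saved: float) -> float:
    """N in N:1 for a given percentage of logical capacity saved."""
    return 1.0 / (1.0 - percent_saved / 100.0)


logical_tb, physical_tb = 100.0, 5.0
ratio = logical_tb / physical_tb
print(f"{ratio:.0f}:1 ratio = {reduction_from_ratio(ratio):.0f}% saved")  # prints "20:1 ratio = 95% saved"
print(f"99% saved = {ratio_from_reduction(99.0):.0f}:1 ratio")
```

The conversion is nonlinear. Moving from 95% to 99% saved means going from 20:1 to 100:1, which cuts stored capacity by a factor of five, so small-looking percentage differences can hide large differences in what is actually on disk.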
The first major hurdle in the migration is rehydration. Even though your data was deduplicated down to 5TB, you would have to migrate the entire logical set: all 100TB. This is because the deduplication occurs within the appliance, so only Data Domain can replicate this data in deduplicated form. Also remember that read speeds from deduplicated storage are significantly slower than backup throughput, so much so that they are not published. Such a migration, copying the full 100TB logical set at impaired read speeds across a backup infrastructure already loaded with non-deduplicated backup traffic, could take weeks. Now imagine the data set in petabytes…
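To put rough numbers on the rehydration penalty, the sketch below estimates copy time for the full 100TB logical set versus the 5TB deduplicated form. The throughput figures are hypothetical, since the post notes that deduplicated read speeds aren't published. The model also assumes a steady sustained rate, decimal units, and no contention from other backup traffic, so real migrations would likely be slower.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(size_tb: float, throughput_mb_s: float) -> float:
    """Days to move size_tb (decimal TB) at a sustained throughput in MB/s."""
    size_bytes = size_tb * 1e12
    bytes_per_second = throughput_mb_s * 1e6
    return size_bytes / bytes_per_second / SECONDS_PER_DAY


LOGICAL_TB = 100.0  # full rehydrated set
DEDUPED_TB = 5.0    # same data in deduplicated form
for rate in (50, 100, 200):  # hypothetical sustained read rates in MB/s
    print(f"{rate:>4} MB/s: rehydrated {transfer_days(LOGICAL_TB, rate):5.1f} days, "
          f"deduplicated {transfer_days(DEDUPED_TB, rate):4.1f} days")
```

At an assumed 100 MB/s, for example, the rehydrated copy takes about 11.6 days, versus roughly 0.6 days for the deduplicated data. At 50 MB/s it stretches past three weeks.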
This problem has lingered for years with tape. But, unlike disk, tape requires no maintenance, so it was perfectly acceptable to store it in a cave until the retention expired and not worry about a migration. Disks fail and must be replaced, or you could lose the whole data set. So you either pay maintenance for the remainder of the retention period, go through the painful migration, or pay the premium for the upgrade and stick with the Data Domain hardware. The more your data grows, the more difficult the migration becomes and the more “locked-in” you are to your vendor.
Native IS Future-Proof
Since the Simpana platform can use any type of disk or tape to begin with, migrating between vendors is easy. In a similar scenario, let’s say you were using Simpana Native Dedup on an EMC CX and wanted to migrate to an IBM XIV. Simpana, understanding the deduplication layout, would migrate only the 5TB of deduplicated data, making the migration much easier to handle.
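Software that understands its own deduplication layout can move only the unique data, and a toy content-addressed store makes the reason easy to see. The Python sketch below is a generic illustration, not Simpana’s actual on-disk format, which the post doesn’t describe. In this model, each backup is a “recipe” of SHA-256 chunk hashes and each unique chunk is stored once. Fixed-size chunks are used for simplicity, and `ChunkStore`, `migrate_deduplicated`, and `migrate_rehydrated` are hypothetical names. Migrating in deduplicated form copies chunks and recipes as they are. Migrating through a read path that returns only whole backups must move every logical byte.

```python
import hashlib


class ChunkStore:
    """Toy deduplicated store: unique chunks keyed by SHA-256, plus per-backup recipes."""

    def __init__(self, chunk_size: int = 4096) -> None:
        self.chunk_size = chunk_size
        self.chunks: dict[str, bytes] = {}       # hash -> unique chunk bytes
        self.recipes: dict[str, list[str]] = {}  # backup name -> ordered chunk hashes

    def write(self, name: str, data: bytes) -> None:
        """Split data into fixed-size chunks, storing each unique chunk only once."""
        recipe = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)
            recipe.append(digest)
        self.recipes[name] = recipe

    def read(self, name: str) -> bytes:
        """Rehydrate a backup by concatenating its chunks in recipe order."""
        return b"".join(self.chunks[d] for d in self.recipes[name])

    def physical_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())


def migrate_deduplicated(src: ChunkStore, dst: ChunkStore) -> int:
    """Copy unique chunks and recipes as-is; return the number of chunk bytes moved."""
    moved = 0
    for digest, chunk in src.chunks.items():
        if digest not in dst.chunks:
            dst.chunks[digest] = chunk
            moved += len(chunk)
    for name, recipe in src.recipes.items():
        dst.recipes[name] = list(recipe)
    return moved


def migrate_rehydrated(src: ChunkStore, dst: ChunkStore) -> int:
    """Read every backup back to full size and rewrite it; return logical bytes moved."""
    moved = 0
    for name in src.recipes:
        data = src.read(name)
        dst.write(name, data)
        moved += len(data)
    return moved


payload = b"".join(i.to_bytes(4, "big") for i in range(1024))  # 4,096 bytes
src = ChunkStore(chunk_size=64)
for week in range(20):  # 20 identical "weekly fulls"
    src.write(f"full-{week}", payload)
print("deduplicated:", migrate_deduplicated(src, ChunkStore(chunk_size=64)))
print("rehydrated:", migrate_rehydrated(src, ChunkStore(chunk_size=64)))
```

With 20 identical weekly fulls of a 4,096-byte payload, the deduplicated migration moves 4,096 bytes while the rehydrated one moves 81,920. That is the same 20:1 gap as the 5TB vs. 100TB scenario.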
Native dedup customers will be in a great position to let a variety of hardware vendors compete for the best price, rather than being cornered and potentially faced with lose-lose decisions. Also, unlike a switch between deduplication appliances, a move between hardware vendors won’t require new training for Backup Admins, because Simpana Native Deduplication is agnostic to the hardware change.
Simpana also natively supports deduplication to tape. This means multi-year compliance backup or archive sets can be deduplicated on tape and stored on that maintenance-free media without a forced migration every few years. When it is time to migrate data from old media to new, Simpana makes this easy too, with a media refresh capability built into the platform.
The final consideration for Simpana Native Dedup is around future capabilities. End-to-End deduplication is a core part of the Simpana platform. There are a lot of great capabilities in v9 that leverage deduplication and I’m sure many future capabilities will leverage the technology in clever new ways. By choosing an appliance approach today, you may limit your ability to take advantage of new features built on native deduplication as the platform continues to evolve.
Final Round Results:
As I’ve said before, Data Domain is a solid appliance, and the question of this series wasn’t so much whether it is a good approach as whether Simpana Native Deduplication is a better one. If you are on a legacy backup product today and you absolutely can’t stomach a change in software, then Data Domain might be a good fit. However, if you are dissatisfied that your backup software is not a technological leader, or are growing weary of the high cost of deduplication appliances, Simpana is definitely worth a look. It could save you a lot in both time and resources over the life of your backup data.
So, what do you think of the series? Have I missed the mark? Have I left anything out or misconstrued the facts? Does this give you something to think about or help your decision in any way? I’d be interested in hearing your thoughts.