Deduplication: the pros and cons end users are not always told

Deduplication is the process of examining a data set or byte stream and storing or sending only unique data; each duplicate is replaced with a pointer to the first occurrence of that data. Some IT professionals think that deduplication and Single Instance Store (SIS) are the same thing, but they are not. The key difference is that SIS evaluates data at the file level: if a user renames a file, SIS sees it as new and stores it again, whereas deduplication examines the file's internal contents and recognises all of them as duplicates. As a result, SIS delivers smaller space savings.
All deduplication journeys end with significantly less data on disk, but the ways they get there can differ greatly. The two prevailing methods are fixed-block length and variable-block length. With the latter, the deduplication engine can vary the block size and so recognise more duplicate patterns, which reduces the amount of data stored and increases the space savings.
Inline and post-process deduplication also offer different advantages and tradeoffs. With inline deduplication, data is deduplicated before being stored on disk. This approach requires no additional disk space to hold the data prior to deduplication, but it has tradeoffs of its own.
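The difference between fixed-block and variable-block deduplication can be illustrated with a small Python sketch. It is not any vendor's implementation: the names `fixed_chunks`, `variable_chunks` and `DedupStore` are hypothetical, the 8-byte block size is chosen only for readability, and the variable-block boundary rule (cut after any byte whose value is divisible by 8) is a deliberately simplistic stand-in for the rolling-hash schemes that real engines tend to use. The store keeps each unique chunk once and represents every file as a list of pointers (chunk digests).

```python
import hashlib


def fixed_chunks(data: bytes, size: int = 8):
    """Cut at fixed offsets, regardless of what the data contains."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def variable_chunks(data: bytes, divisor: int = 8):
    """Toy content-defined rule: cut after any byte divisible by `divisor`."""
    chunks, start = [], 0
    for i, byte in enumerate(data):
        if byte % divisor == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes with no boundary
    return chunks


class DedupStore:
    """Keeps each unique chunk once; a file is a list of chunk digests."""

    def __init__(self, chunker):
        self.chunker = chunker
        self.chunks = {}  # digest -> chunk bytes
        self.files = {}   # file name -> list of digests (the pointers)

    def put(self, name: str, data: bytes) -> int:
        """Store a file and return how many new bytes were written."""
        new_bytes, pointers = 0, []
        for chunk in self.chunker(data):
            key = hashlib.sha256(chunk).hexdigest()
            if key not in self.chunks:
                self.chunks[key] = chunk
                new_bytes += len(chunk)
            pointers.append(key)
        self.files[name] = pointers
        return new_bytes

    def get(self, name: str) -> bytes:
        """Rebuild a file by following its pointers."""
        return b"".join(self.chunks[key] for key in self.files[name])


original = b"the quick brown fox jumps over the lazy dog"
shifted = b"X" + original  # a single byte inserted at the front

for label, chunker in (("fixed", fixed_chunks), ("variable", variable_chunks)):
    store = DedupStore(chunker)
    first = store.put("v1", original)
    second = store.put("v2", shifted)
    assert store.get("v2") == shifted  # the file is fully recoverable
    print(label, first, second)
```

With this input the fixed-block store writes 43 bytes for the first version and 44 for the second, because the inserted byte shifts every block boundary and nothing matches. The variable-block store writes 39 bytes and then just 1, because its boundaries follow the content and only the new leading chunk is unseen.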
With post-process deduplication, the backup is briefly placed on disk-based staging storage before being deduplicated. Some technologies allow deduplication to start once a set amount of the data stream has been staged, which reduces the sizing requirements for the staging storage while still allowing backups to complete as fast as possible. So although post-process deduplication requires additional disk space for the staging area, it enables faster backups and therefore a shorter backup window, it allows data that does not deduplicate well to be left undeduplicated, and it offers faster restores.
Where the deduplication occurs is just as important: on the source (client) or on the target (storage). Source-side deduplication typically uses a deduplication engine on the client that checks for duplicates against a central deduplication index, usually located on the backup server or media server; only unique blocks are transmitted to disk. The advantage of source-side deduplication is that it reduces network contention, because less data is sent over the network. However, source-side deduplication adds hashing, a processor-intensive operation, to the client. Clients that are already overloaded will be stretched even further, possibly slowing down backups and lengthening the backup window.
Target-side deduplication is generally better suited to data-intensive environments. It runs the deduplication at the storage level, so clients do not need spare processing power because the hashing occurs at the target. The tradeoff is that more data is sent over the network.
Different vendors offer solutions that mix and match the when and the where: for example, one solution might do inline deduplication starting at the source, while another might do post-process deduplication starting at the target.
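The network tradeoff between source-side and target-side deduplication can be sketched in a few lines of Python. Everything here is an assumption made for illustration: `DedupIndex` stands in for the central index on the backup or media server, the 4096-byte block size is arbitrary, and the small cost of sending the digests themselves is ignored.

```python
import hashlib


class DedupIndex:
    """Hypothetical stand-in for the central deduplication index."""

    def __init__(self):
        self.stored = {}  # digest -> block bytes

    def missing(self, digests):
        """Return the digests the index has not seen yet."""
        return {d for d in digests if d not in self.stored}

    def store(self, blocks):
        self.stored.update(blocks)


def split(data: bytes, block_size: int):
    """Map each block's SHA-256 digest to the block itself."""
    blocks = {}
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        blocks[hashlib.sha256(block).hexdigest()] = block
    return blocks


def source_side_backup(data: bytes, index: DedupIndex, block_size: int = 4096) -> int:
    """Hash on the client and send only unseen blocks. Returns bytes sent."""
    blocks = split(data, block_size)        # CPU cost lands on the client
    needed = index.missing(blocks.keys())   # ask the index what it lacks
    payload = {d: blocks[d] for d in needed}
    index.store(payload)
    return sum(len(block) for block in payload.values())


def target_side_backup(data: bytes, index: DedupIndex, block_size: int = 4096) -> int:
    """Send everything; the target hashes and deduplicates. Returns bytes sent."""
    for digest, block in split(data, block_size).items():  # CPU cost at the target
        index.stored.setdefault(digest, block)
    return len(data)


monday = b"A" * 4096 + b"B" * 4096 + b"C" * 4096
tuesday = b"A" * 4096 + b"B" * 4096 + b"D" * 4096  # one block changed

source_index, target_index = DedupIndex(), DedupIndex()
print("source-side:", source_side_backup(monday, source_index),
      source_side_backup(tuesday, source_index))
print("target-side:", target_side_backup(monday, target_index),
      target_side_backup(tuesday, target_index))
```

Both indexes end up holding the same four unique blocks. The difference lies in where the hashing runs and how many bytes cross the network on the second backup: one block for the source-side path, the full three blocks for the target-side path.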
A final criterion when evaluating deduplication technologies is how long data will be retained: the more data that is examined, the greater the likelihood that duplicates are found and hence the greater the space savings. For example, an initial full backup can only be deduplicated against itself, but when the full backup for week 2 is performed, only the data that has been changed or added since week 1 is stored. When deduplicating backups, each additional week of backup can be retained using a decreasing amount of additional disk space. This allows organisations to keep more backups for longer on the existing storage, virtually eliminating the need to restore from offsite storage unless there is a complete site failure.
So, in summary, what should users consider when planning a deduplication strategy? Their goals will influence the deduplication technologies they should evaluate.
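The effect of retention on space savings can be made concrete with some simple arithmetic, sketched below in Python. The figures are hypothetical (a 1000 GB full backup with 50 GB of new or changed data each week), and the model assumes a constant weekly change and no duplicates inside a single backup, which real data sets will not match exactly.

```python
def retained_storage(full_gb: int, weekly_change_gb: int, weeks: int):
    """Return (raw_gb, deduplicated_gb) after `weeks` weekly full backups.

    Simplified model: week 1 stores the whole backup, and every later
    week stores only the data changed or added since the week before.
    """
    if weeks < 1:
        raise ValueError("weeks must be at least 1")
    raw_gb = full_gb * weeks
    deduplicated_gb = full_gb + weekly_change_gb * (weeks - 1)
    return raw_gb, deduplicated_gb


for weeks in (1, 4, 12):
    raw, deduped = retained_storage(full_gb=1000, weekly_change_gb=50, weeks=weeks)
    print(f"{weeks:>2} weeks: {raw} GB raw, {deduped} GB deduplicated, "
          f"ratio {raw / deduped:.1f}:1")
```

Under these assumptions, 12 weeks of full backups occupy 1550 GB instead of 12000 GB, and the ratio keeps improving the longer backups are retained.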
Deduplication can lead to significant savings in time, human resources and, of course, budget. Although the technology continues to develop, there are several proven solutions already on the market today, and organisations that choose the right products to meet their requirements will find that few storage technologies have made such a difference to their datacentres in the past.