Wednesday, November 4, 2009

Quick spin with ZFS dedup

I've had a quick look at deduplication in ZFS. It works as expected and seems quite fast in my simple tests.

Enabling dedup couldn't be easier:
# zfs set dedup=on zdedup01

The simplest case, the same file under a different name, gives a dedup factor of 2:
# cp Solaris/sol-nv-b121-x86-dvd.iso /zdedup01
# cp Solaris/sol-nv-b121-x86-dvd.iso /zdedup01/duplicate.iso
# zfs list zdedup01
NAME USED AVAIL REFER MOUNTPOINT
zdedup01 6.91G 55.6G 6.90G /zdedup01
# zpool list zdedup01
NAME SIZE USED AVAIL CAP DEDUP HEALTH ALTROOT
zdedup01 63.5G 3.47G 60.0G 5% 2.00x ONLINE -
# ls -lh /zdedup01
total 6.9G
-rw-r--r-- 1 root root 3.5G 2009-11-04 22:52 duplicate.iso
-rw-r--r-- 1 root root 3.5G 2009-11-04 22:51 sol-nv-b121-x86-dvd.iso

ZFS dedup is block based: multiple blocks with the same checksum point to a single stored block. If the exact same data appears more than once but with a different block alignment, it won't get deduped.
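
The alignment effect can be imitated outside ZFS with a rough sketch. The hypothetical helper unique_blocks below cuts a file into fixed-size blocks and counts the distinct SHA-256 checksums. It assumes GNU-style split, seq, head -c and sha256sum are available, and uses 512-byte blocks only to keep the demo small. It illustrates the idea and is not how ZFS itself lays out blocks.

```shell
# unique_blocks FILE BLOCKSIZE
# Prints "<total> <unique>": how many BLOCKSIZE-byte blocks FILE splits into,
# and how many distinct SHA-256 checksums those blocks have.
# Hypothetical helper for illustration only.
unique_blocks() {
    ub_dir=$(mktemp -d) || return 1
    split -a 4 -b "$2" "$1" "$ub_dir/blk."
    ub_total=$(ls "$ub_dir" | wc -l | tr -d ' ')
    ub_unique=$(for f in "$ub_dir"/blk.*; do sha256sum < "$f"; done |
        sort -u | wc -l | tr -d ' ')
    rm -rf "$ub_dir"
    echo "$ub_total $ub_unique"
}

demo=$(mktemp -d)
# Deterministic 4096-byte payload made of increasing numbers.
seq 1 2000 > "$demo/numbers"
head -c 4096 "$demo/numbers" > "$demo/chunk"
# Payload twice, back to back: the second copy starts on a block boundary.
cat "$demo/chunk" "$demo/chunk" > "$demo/aligned"
# Payload twice with one extra byte in between: the second copy is shifted.
{ cat "$demo/chunk"; printf 'x'; cat "$demo/chunk"; } > "$demo/shifted"

unique_blocks "$demo/aligned" 512   # prints: 16 8
unique_blocks "$demo/shifted" 512   # prints: 17 17
```

In the aligned file every block of the second copy matches a block of the first, so only half the blocks are unique. A single inserted byte shifts every later block boundary and no checksum matches any more.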

Unarchiving a tar archive shows this: the block alignment of the extracted files differs from their alignment inside the archive, so the block checksums differ and nothing gets deduped:
# cp sunstudio.tar /zdedup01
# cd /zdedup01
# tar xf sunstudio.tar
# zpool list zdedup01
NAME SIZE USED AVAIL CAP DEDUP HEALTH ALTROOT
zdedup01 63.5G 1.76G 61.7G 2% 1.00x ONLINE -

Zero-filled files give a very nice dedup ratio:
# mkfile 5G testfile
# zpool list zdedup01
NAME SIZE USED AVAIL CAP DEDUP HEALTH ALTROOT
zdedup01 63.5G 1.73M 63.5G 0% 40960.00x ONLINE -
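
The 40960.00x figure is what one would expect if the 5 GiB file is stored as 128 KiB records that are all identical. The 128 KiB value is the default ZFS recordsize and is an assumption here, since the property was not checked on this pool. A quick sketch of the arithmetic:

```shell
# Assumed values: a 5 GiB file (mkfile 5G) and a 128 KiB recordsize.
file_bytes=$(( 5 * 1024 * 1024 * 1024 ))
record_bytes=$(( 128 * 1024 ))
# Every record is all zeros, so all of them share one checksum.
blocks=$(( file_bytes / record_bytes ))
echo "$blocks identical blocks stored once"   # prints: 40960 identical blocks stored once
```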

In practice, dedup should give a ratio that is on par with the actual duplication when dealing with ordinary files such as binaries, executables, application installations, zones etc. The ratio is harder to estimate for virtual server disk images (or iSCSI LUNs). A very quick test with two VirtualBox Solaris 10 U8 (core installation) images showed a dedup ratio of 1.35x, which works out to roughly 26 percent less disk space used:
NAME SIZE USED AVAIL CAP DEDUP HEALTH ALTROOT
zdedup01 63.5G 984M 62.5G 1% 1.35x ONLINE -
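
The DEDUP column is a ratio, not a percentage: the fraction of space saved is 1 - 1/ratio. A minimal sketch of the conversion, using a hypothetical helper saved_percent and awk for the floating-point arithmetic:

```shell
# saved_percent RATIO
# Prints the percentage of space saved for a given dedup ratio.
# Hypothetical helper for illustration; not a ZFS command.
saved_percent() {
    awk -v r="$1" 'BEGIN { printf "%.1f\n", (1 - 1 / r) * 100 }'
}

saved_percent 2.00   # prints: 50.0
saved_percent 1.35   # prints: 25.9
```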

Deduplication of course also works with compression enabled (the checksums used for dedup are computed on the compressed data):
# zfs get compressratio zdedup01
NAME PROPERTY VALUE SOURCE
zdedup01 compressratio 1.43x -
# zpool list zdedup01
NAME SIZE USED AVAIL CAP DEDUP HEALTH ALTROOT
zdedup01 63.5G 709M 62.8G 1% 1.25x ONLINE -
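
As a rough estimate only, the two ratios can be multiplied to get an overall reduction factor. This assumes the pool holds nothing but this one dataset, so that the dataset-level compressratio and the pool-level dedup ratio describe the same data. The helper name combined_ratio is made up for this sketch:

```shell
# combined_ratio COMPRESSRATIO DEDUPRATIO
# Prints the product of the two ratios, rounded to two decimals.
# Hypothetical helper; only meaningful if both ratios cover the same data.
combined_ratio() {
    awk -v c="$1" -v d="$2" 'BEGIN { printf "%.2f\n", c * d }'
}

combined_ratio 1.43 1.25   # prints: 1.79
```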

2 comments:

David McClelland said...

Good stuff - looks like it could be handy. Do you have any view on the performance impact of enabling this dedup in the OS layer?

Henkis said...

I'll wait until a preview of OpenSolaris 2010.03 based on build 128 is available before evaluating the performance of dedup. Perhaps in a later post ;)