When Server 2012 DeDuplication goes bad

This is a really weird one.

Our old SAN was really struggling for space despite having EMC DeDuplication switched on, so I commissioned a Server 2012 guest on some new SAN storage and started serving files through this guest off a dynamic VHDX file (sat on another chunk of the same storage). Because our old SAN used NDMP backups, I had to use RoboCopy to migrate the data, which was slow but did the job. The reason for the intermediate VHDX is simply portability: we could pick it up off the SAN and drop it onto any Windows server (even a stand-alone one), whereas anything stored directly on the SAN would need to be connected to servers on the iSCSI network. The VHDX file is attached to the guest's SCSI controller because, unlike the IDE controllers, SCSI allows hot-removal.

We’d also been waiting for Backup Exec 2010 R3 SP3, as this was the first release of Backup Exec 2010 whose agent would work on Server 2012 at all.

This was all good so far; RoboCopy had, as expected, kept all the NTFS permissions intact, so it was just a case of sharing the folders out. As both shares now sat on the same volume, I thought Windows Server 2012 DeDuplication could really get to work by DeDuping across the shares, which the previous configuration hadn’t allowed. The DeDupe process started slowly, but I didn’t think anything of this because the Celerra (our old EMC SAN) took ages to fully DeDupe all its volumes. I reasoned that this should be fine because it was the Hyper-V guest doing the DeDuplication: the guest should see the VHDX as just another block device, which is not the same as the host trying to DeDuplicate the VHDX file itself.

This is where it all went a bit strange. Nobody had complained about access speeds or anything, yet Backup Exec was taking absolutely ages to back up this volume (some rough maths suggested it would never complete a full backup inside a week, compared with “just” 48 hours or so previously). Then it turned out that it was actually finishing “full” backups in under an hour, because it wasn’t really doing full backups at all.

This is when it became apparent that there is a clash between the presentation layer of Server 2012 (what I would normally call the Explorer shell, but this is Core), DeDuplication and dynamic VHDX files. The storage subsystem within Windows knows how much data is actually on the volume, because it correctly reports x TB used. DeDuplication seems to go mad and actually un-DeDuplicate everything, so you end up with a space saving of 0 bytes (down from 11GB, so it’s gone backwards). And the… “explorer” shell goes further and loses all reason, reporting insanely small “size on disk” figures for vast amounts of data (real-world example: 5.1TB of actual data takes up just 37GB on disk. Yeah, right). So, to be fair, this is why Backup Exec is being so erratic: it asks Windows for the amount of data, and Windows replies that although there’s supposedly 5TB of data, it’s only taking up 37GB on disk.
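To check whether another volume is showing the same symptom, a rough sketch like the one below totals the logical file sizes under a folder and compares them with the allocated “size on disk” the OS reports for each file. It’s Python purely for illustration, and it assumes that the Win32 call GetCompressedFileSizeW is a reasonable stand-in for the “size on disk” figure Explorer shows; I haven’t confirmed that for deduplicated files. On non-Windows systems it falls back to `st_blocks`. The function names and the 20:1 threshold are hypothetical, picked only to flag something as lopsided as 5.1TB against 37GB.

```python
import os
import sys


def allocated_bytes(path):
    """Bytes the OS reports as actually allocated on disk for one file."""
    if sys.platform == "win32":
        import ctypes
        from ctypes import wintypes

        kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
        func = kernel32.GetCompressedFileSizeW
        func.argtypes = [wintypes.LPCWSTR, ctypes.POINTER(wintypes.DWORD)]
        func.restype = wintypes.DWORD
        high = wintypes.DWORD(0)
        ctypes.set_last_error(0)
        low = func(path, ctypes.byref(high))
        # 0xFFFFFFFF is only an error if the last-error code is also set
        if low == 0xFFFFFFFF and ctypes.get_last_error() != 0:
            raise ctypes.WinError(ctypes.get_last_error())
        return (high.value << 32) + low
    # Elsewhere, st_blocks counts 512-byte units allocated to the file
    return os.stat(path).st_blocks * 512


def scan(root):
    """Return (logical_bytes, allocated_bytes) summed over every file under root."""
    logical = 0
    on_disk = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(full)
                alloc = allocated_bytes(full)
            except OSError:
                continue  # skip files we cannot read (permissions, locks)
            logical += size
            on_disk += alloc
    return logical, on_disk


def looks_suspicious(root, max_ratio=20.0):
    """Print the totals and return True if logical size dwarfs size on disk."""
    logical, on_disk = scan(root)
    if on_disk == 0:
        ratio = float("inf") if logical else 1.0
    else:
        ratio = logical / on_disk
    print(f"{root}: {logical:,} bytes of data, {on_disk:,} bytes on disk "
          f"(about {ratio:.1f}:1)")
    return ratio > max_ratio


if __name__ == "__main__":
    for root in sys.argv[1:]:
        looks_suspicious(root)
```

Pass one or more folders on the command line. A sparse file on an ordinary volume should also produce a lopsided ratio, which is a cheap way to sanity-check the script before pointing it at the real shares.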

The fix is currently unknown, as apparently even Microsoft haven’t encountered this one. I’ve stopped the actual DeDupe jobs (although DeDuplication is still enabled on the volume) and set off a RoboCopy job to rehydrate (fingers crossed) everything onto yet another volume, this time a pass-through disk on the new SAN, so that at least we can start getting valid backups. We’re passing information on to Microsoft to see if they can come up with anything, and have discovered that this is perfectly repeatable: enable DeDupe against another (live but not backed up) dynamic VHDX and the same thing happens. Windows reports the actual amount of data as X, but the size on disk as X minus a lot of space.

Update 02-NOV-2013: still unresolved. I found a few discrepancies in NTFS permissions, but these don’t explain much. The data does seem to rehydrate onto another volume, although “rehydrate” may be the wrong word, as it implies the data was deduplicated in the first place, which it wasn’t really. Still, at least we can get some form of backup for now.


