Yep, I screwed up, I made assumptions, didn’t double and triple check things, and made a mess of something I was working on, professionally none the less. I did fix it, but it was a stupid screw up none the less. The irony is not lost on me about how I harp on about backups regularly to everyone.
The other day I ended up with one of those Oh!! SHIT moments, I was migrating an older 2012 R2 file server to 2016 and whilst I was doing this I decided to kick the old server that was due to return for the leasing company from the Failover Cluster it was in. As standard I paused the node, removed all the references from it and I then hit the evict function on the node to evict it from the cluster, and it was from this that it all went to shit, doing this whilst doing migrations, what the first part of my mistake. What happened the Failover Cluster borked itself and crashed on the remaining servers, and it would not restart, to this day I have not got it to restart.
After spending an hour or so trying to get the cluster to restart, I relented and went to the backups to restore the offending server. Hitting the backups I go to the server I want to store, and notice that its only 16GB WTF!!!! the server should be several TB in size (it is a file server after all).
Upon further investigation it seems that I was missreading the backup reports and the old server, which has the same name on the old Hyper-V Cluster as the new server does on the new implementation was not getting backed up, it was the new one. I misread the report and assumed that it was backing up the old server, mistake number 0 (this had been happening for the 6 weeks before the backup failure) and the old restore points being more than our retention limit were gone. Ok I will hit the long term off-site backups might take a while but the data is safe, well it was not or so it seems, the other technican at the offsite location had removed the offsite backups for the fileserver from the primary site. Why, because they were taking up too much space on that site’s primary backup disk (The storage at each site is partitioned to provide onsite backup for each site, with the second partition being the offsite backup for the other site).
Damn, so this copy of the data is the only one.
Ok so I killed the cluster server that everything was on, and using the old evicted node I rebuild a single node “cluster” and mounted the CSV, mounted the VHDX and everything appeared as it should. Whoo Hoo access to the data, well not so fast there buddy.
After moving some data an error popped up stating that the data was inaccessible, ok no problem loss of a single file is not a real issue. Then it popped up again, then again.. the second Oh! Shit! moment within several hours.
I recovered and moved the data I could access leaving me purely with data I couldn’t. I tried chkdsk and other tools and after several hours I took a break from it, needing to clear my mind.
Coming back to it later I looked at the error, looked at what was happening, and recalled seeing an article on another blog about Data Deduplication corrupting files on Server 2016. With this I began wondering if it had effected Server 2012 R2, then the lightning struck deduplication, this process leaves redirects in place and essentially has a database of files that it links to for the deduplication. The server the VHDX was mounted did not know about the Deduplication, the database or how to access it.
Up until now I had only mounted the Data VHD. Now I rebuilt the server utilising the original Operating System VHDX to run the server. I let it install the new devices and boot.
Upon the server booting I opened a file I could not access before, and it instantly popped onto my screen. Problem Solved
Note to remember If you are doing Data Recovery or trying to copy data from a VHDX (or other disk, virtual or physical) that was part of a deduplicated file server, you need to do it from the server due to the deduplication database. You may be able to import the database to another server, I really have no idea, and I am not going to try to find out.