Just before Christmas 2018 a catastrophe occurred: due to a software glitch on my Windows 10 computer I lost access to all of my images and many other files when the directory of my network attached storage (NAS) unit was corrupted. It has taken a long while to get my files back to (almost) normality and it is a timely reminder to all NAS users and/or RAID users that comprehensive backups are just as essential as for locally attached hard drives.
Why Comprehensive RAID Backups Are Still Essential
Note to the ‘techies’ reading this: in a few areas I’ve simplified the explanation in order to reduce the length of the document and make it more digestible to the majority of Nikonians.
What is a NAS and a RAID?
First of all, let me explain what these two things are in simple terms:
A Network Attached Storage (NAS) device is simply a box containing a number of disk drives that is connected to a network rather than being plugged directly into your computer. This not only allows you access to far more storage than your desktop computer or laptop can hold, but also access from anywhere that you are connected to the internet. In most cases the extra time that it takes to access your files across the network is so minimal that you won’t notice the difference. A NAS is effectively a personal computer that doesn’t have the ‘human interface’ components such as video and sound cards, keyboard, mouse, etc.
A RAID, or Redundant Array of Independent Disks, is a way of providing protection in case of a disk failure, so that if a disk does fail, you can continue to access your data seamlessly. Once the bad disk is replaced, the NAS software will deal with bringing the new disk up to date. There are a number of possible RAID configurations but the safest, and most expensive, is where each disk is replicated on another so, for instance, a pair of 2TB disks are seen by your computer as a single 2TB unit: the NAS controls the data flow between the two disks without any end-user intervention.
Having the disks in a NAS can also make your data management easier because it can present a number of disks to any users accessing it as if they were a single large drive.
My QNAP NAS drive is my primary drive for almost all of my data: documents, spreadsheets and around 100k images. It holds a set of four 3TB Western Digital “Red” hard drives configured in a RAID 10 array: that means that my computer sees a single 5.8TB drive mapped to drive letter “G”, but within the NAS the drives are in two pairs, each being a duplicate of the other, so that if any one drive fails (or two, so long as they are not from the same pair) the computer continues to see all the data with no awareness of the failure. Simply plug in a new drive to replace the dead unit and the QNAP file system will bring the new drive up to date. Easy: 100% redundancy means 100% protection, doesn’t it?
Backup Strategy Employed
Because of the fault tolerance within the RAID system I only backed up the data about once every 4-6 months to a pair of external hard drives which were then stored in an external building away from my house so that I have theft and fire protection. My local disks get backed up monthly (almost all of my non-system files, including the Windows standard folders of Desktop, Documents, Downloads, etc. are on a two-disk RAID within the system unit). All seemed satisfactory and had worked perfectly well for two years.
The Crash: What Happened?
When it was time for me to take another NAS backup, I plugged in my two external hard drives as usual and immediately noticed that Windows had assigned one of them (which was bootable, created by the Acronis True Image backup software) to Drive “G” – which should be my NAS. I immediately ejected the drive and checked my drive assignments. My NAS was still assigned to “G”, but when I looked at its contents I discovered that the external HDD had overwritten the directory on the NAS, leaving only a few hundred files showing. At this point I felt a tad cold and sweaty, and somewhat sick! Perhaps one reason that I’d never had this happen before is because there were always at least two spare drive letters before G, but as there were now new extra drives in the system unit, “G” was the next in line for that external hard drive – or so Windows assumed, clearly ignoring the fact that it was already in use.
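The drive-lettering clash can be illustrated with a tiny sketch. To be clear, this is purely my hypothesis of what happened, not how Windows actually allocates letters; the function and the set contents are invented for illustration only:

```python
import string

def next_free_letter(local_in_use: set[str]) -> str:
    """Hypothetical allocator: pick the first letter from D onwards that is
    not in the *locally attached* in-use set. Network-mapped letters are
    (per my hypothesis, wrongly) never consulted, so a mapped share can be
    shadowed by a newly attached drive."""
    for letter in string.ascii_uppercase[3:]:  # start at "D"
        if letter not in local_in_use:
            return letter
    raise RuntimeError("no free drive letters")

# With D, E and F taken by the new internal drives, the external HDD is
# handed "G" -- the very letter the NAS share was already mapped to.
```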
My online searching has turned up no similar situation, but there is no doubt that the corruption was caused by the bootable external drive, because the files in the additional folders are predominantly “bootmgr.exe.mui” files. It was not the first time that this drive had been attached, so the trigger remains a mystery. No doubt it was a combination of events which may never happen again, but I do not feel inclined to try to replicate the problem!
The scenario which I had never anticipated was that if the file directory is corrupted then the whole RAID becomes unusable and the only option is a full rebuild from backups, despite having 100% redundancy to provide fault tolerance within the RAID.
Step One: As the posters say “KEEP CALM AND DON’T PANIC”. That’s easier said than done. So, first off, turn off the NAS. I didn’t want to risk any further accidental loss, because even though the file directory was corrupted, I knew that the files themselves would still be there, but I’d need lots of time to think about what to do.
Step Two: Create a new Lightroom catalog. Although the Catalog was not on the NAS and creating a new one was not strictly necessary, I wanted to preserve its contents as at the time of the crash so that I could see later exactly what should be rebuilt on the NAS, since the Catalog would flag missing files.
Step Three: Develop a plan of action. Apart from emails and the Web, I didn’t use the PC at all for a few days while I got a clear head around what had actually happened and how I should recover everything. In the first 24 hours my head was not at all clear, but 40 years in the IT industry had taught me that when there’s a crisis it’s best not to make any knee-jerk recovery attempts as these can often lead to even more data loss.
Step Four: Document a full inventory of my ‘formal’ backups plus all other sources of data, such as Dropbox, memory cards, etc. I had backups of everything on the NAS as at Feb 2018 and also images as at June 2018 – all, that is, except my images from 2017. Why 2017 had been excluded from the backups is still a mystery. One thing I could be thankful for was that my Outlook email data files and the Lightroom catalog were on the PC-based RAID drive.
Step Five: Research RAID recovery. To my astonishment, the NAS manufacturer QNAP offered no assistance about possible recovery strategies, other than some discussion threads on its support forum. Once I started to research data recovery from a NAS/RAID configuration I discovered two key issues:
- Most data recovery software packages cannot analyse the drives while they are still network attached, because access is then only via the NAS file system, so the software cannot bypass the directory and examine the disk content directly.
- Most RAID configurations use a technique called ‘striping’ to increase performance, by writing alternate blocks of data to each disk in the RAID group, thereby speeding up writing and reading back. This means that the recovery software you use must be aware that it is dealing with a RAID since the data for every file of any size is split across more than one disk, with no disk holding the entire file.
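The second point can be shown with a minimal Python sketch. This is my own illustration of the principle, not any vendor’s actual on-disk layout, and the four-byte block size is purely for the demo (real arrays use blocks of e.g. 64KB):

```python
from itertools import zip_longest

STRIPE = 4  # bytes per block for this demo only

def stripe(data: bytes, num_disks: int) -> list[list[bytes]]:
    """Distribute consecutive fixed-size blocks round-robin across disks."""
    disks: list[list[bytes]] = [[] for _ in range(num_disks)]
    for i in range(0, len(data), STRIPE):
        disks[(i // STRIPE) % num_disks].append(data[i:i + STRIPE])
    return disks

def reassemble(disks: list[list[bytes]]) -> bytes:
    """RAID-aware recovery: interleave the blocks back in original order."""
    return b"".join(
        block for group in zip_longest(*disks, fillvalue=b"") for block in group
    )
```

Reading one member disk on its own yields only every other block of each file, which is why the recovery software must understand the RAID geometry before anything meaningful can be carved out.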
From my research, the three recovery programs most likely to produce a good result were:
- DiskInternals RAID Recovery followed by NTFS Recovery
- PhotoRec Recovery
- ReclaiMe File Recovery
DiskInternals and ReclaiMe rely on the RAID disks being directly connected to the PC whereas PhotoRec is run on the NAS but controlled from the PC.
DiskInternals and ReclaiMe both have a ‘trial’ mode which completes the disk analysis then shows what files are found, claiming later recovery success once the license fee has been paid. PhotoRec is free and saves recovered files to another disk.
Step Six: Define a key recovery strategy, which was “Do not delete ANY data until after the NAS has been reconstructed and new backups have been taken” in order to ensure that I always had at least two copies of all my data. It was clear that this would need lots of spare disk capacity, so two WD 6TB external drives, plus four new disks for the NAS reconstruction, were ordered.
1. DiskInternals RAID Recovery (v6.2) + DiskInternals NTFS Recovery
The RAID Recovery software is free and its purpose is to identify the technical configuration of the RAID and how it was distributed across the disks. These details are stored in an XMP file for input to the NTFS Recovery software. The whole process, including a full disk analysis, took nearly 24 hours and reported a significant number of files available for recovery, but only about 60% of the number that I expected, so I knew that this software would not produce a satisfactory outcome. The NTFS Recovery software licence costs $99.95.
2. PhotoRec Recovery
The rather more complex process required to run the free PhotoRec software on the NAS was well documented on the QNAP Forum by someone who had been through it before, though it assumed that you understood UNIX command formats. First it was necessary to download the PhotoRec software (which is part of the TestDisk package) onto an external drive, then connect this drive directly to the QNAP NAS. Software called PuTTY, running on the PC, provides a “DOS box” style interface for running UNIX commands on the NAS. The recovered files are created on the external drive (or wherever specified) in a series of folders, each holding between 500 and 750 files, but there is no categorisation of which files are in which folder. Overall, PhotoRec did the best job of recovering files and finding the original file dates and tags, also taking about 24 hours to run.
3. ReclaiMe File Recovery

The ReclaiMe package has a very intuitive interface which looks for all available disks on the system and asks for confirmation of which ones are to be analysed. Once this is done, everything is automatic. A good feature of the package is that it lists and totals recovered files by file type as well as showing a rebuilt directory (where that is possible) and it is possible to recover selected files while the recovery analysis is going on. Although you must pay the licence fee ($199.95) in order to recover any files, they do offer a 30-day 100% guarantee of a refund if the recovery is not considered satisfactory.
ReclaiMe did a very good job, but as with PhotoRec, it did not successfully recover many PSD or TIF files. I raised a fault ticket with them and within a few hours had a response suggesting that they connect to my computer using TeamViewer. This was scheduled for the following day and started at the agreed time. The issue concerning PSD and TIF files was identified as being due to them having been updated significantly since being created. The reason is as follows:
When data is written to disk it will be written to consecutive unused blocks if enough are available (which will generally be the case). If the file is subsequently updated and increased in size, it may need additional blocks, but if the adjacent block is already in use, the location of the new block is stored with the file’s details in the directory. Without the directory, the recovery software can identify the start of a file by recognising the file type’s “signature”, and it can continue to retrieve consecutive blocks of data until it recognises another file “signature” which tells it that it has reached the start of a new file. It cannot retrieve the entire file because it has no knowledge of the blocks that hold the rest of the file’s data. Because of this, none of the recovery software packages I’ve tried have been able to fully recover many of my PSD or TIF files, as they had been updated and, very often, had significantly increased in size since their creation. The files are recovered, but errors occur when opening them in Photoshop, typically “unexpected end of file”. In contrast, all NEF files appear to have been successfully recovered, since these are never updated by Lightroom or Photoshop.
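The signature-carving behaviour described above can be sketched in a few lines of Python. This is a naive illustration, not PhotoRec’s actual implementation, and the two-entry signature table is just a sample:

```python
SIGNATURES = {
    b"\xff\xd8\xff": "jpg",  # JPEG start-of-image marker
    b"8BPS": "psd",          # Photoshop file header
}

def find_all(buf: bytes, sig: bytes):
    """Yield every offset at which a signature occurs in the buffer."""
    i = buf.find(sig)
    while i != -1:
        yield i
        i = buf.find(sig, i + 1)

def carve(disk: bytes) -> list[tuple[str, bytes]]:
    """Carve files out of a raw byte stream: each recovered 'file' runs
    from one signature to the next, or to the end of the disk."""
    hits = sorted(
        (i, ext) for sig, ext in SIGNATURES.items() for i in find_all(disk, sig)
    )
    files = []
    for n, (start, ext) in enumerate(hits):
        end = hits[n + 1][0] if n + 1 < len(hits) else len(disk)
        files.append((ext, disk[start:end]))
    return files
```

A carver like this stops at the next signature regardless of where the file’s later blocks actually live, which is exactly why an updated, fragmented PSD comes back truncated while a write-once NEF comes back whole.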
As it was clear that no recovery software would be successful without the original file directory, I raised a claim for a refund of my licence fee and to my surprise it was back into my account in just four days!
4. DiskInternals RAID Recovery (v6.4)
I discovered that there was a new version of this package which had the two parts combined, so I downloaded it to see if that would achieve a better recovery. Unfortunately this encountered an error immediately after starting up and would not progress further. I’ve raised it as an issue but it has yet to be resolved. This rebundled version is significantly more expensive than previously, starting at $249 USD.
Understanding The Recovered Data
Once the data had been recovered I then had to sift out the files I actually needed, but there are many obstacles that complicate that process:
- Because the file names are held in the directory (which had been trashed) the recovery software gives each file a ‘made up’ name such as “f00034586220198.jpg”. Fortunately, when files are originally written to the disk as a group (such as from a memory card upload) they will generally occupy a contiguous area of the disk and will therefore be processed sequentially by the recovery software, thereby receiving sequential numbers – which means that they can be sorted and re-filed as a group.
- As with the filenames, the Created and Modified dates are held in the directory, so with the exception of some formats which also hold the file name and metadata in the file itself, such as Excel and Word files, all file dates are set to the date of the recovery.
- Files that have previously been deleted can also be found by the analysis if they are still intact, so some files will appear more than once.
- JPEG thumbnails embedded within larger image files (such as NEF files) will also be found and ‘recovered’ giving you thousands of additional small (e.g. 150x100 pixels) JPEG images.
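On the first obstacle above, the sequential numbering can be exploited to bucket recovered files back into their original groups. The helper below is hypothetical (not part of any recovery package) and assumes every recovered name contains a run of digits:

```python
import re

def group_runs(names: list[str], gap: int = 1) -> list[list[str]]:
    """Split recovered filenames into runs of (near-)consecutive numbers.
    Files whose numbers are close together were very likely carved from
    one contiguous disk area, i.e. one original folder or card upload."""
    numbered = sorted((int(re.search(r"\d+", n).group()), n) for n in names)
    runs: list[list[str]] = []
    current: list[str] = []
    last = None
    for num, name in numbered:
        if last is not None and num - last > gap:
            runs.append(current)  # big jump: start a new group
            current = []
        current.append(name)
        last = num
    if current:
        runs.append(current)
    return runs
```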
What Was Lost?

1) Some PSD and TIF files
This has been the area of major loss for me, not in numbers of files but in terms of work which I had done in Photoshop during the latter half of 2018. I have large JPG copies of the final images, but I cannot now go back to see how I had achieved some significant outcomes, especially for some very creative exhibition work.
Two images with lost Photoshop (PSD) files
2) XMP sidecar files
Not lost...just orphans, because the recovery software cannot pair up RAW files with their associated XMP files without the original file names. I had very few of these anyway, so I can live on happily without them.
3) The Unknown Unknowns
Almost certainly there are still non-image files that I created or updated during 2018 which I have yet to recover and will one day wonder “what happened to that file....?”. For that reason I shall keep all the files that PhotoRec recovered in perpetuity...just in case.
Rebuilding The NAS
Without the original file directory, there was no option other than to put four new disk drives into the NAS unit (or format the existing ones, but that was against my strategy) and set up again from scratch, creating a completely new RAID array ready to receive the recovered files.
Once the NAS and RAID configuration were up and running again, I recovered the majority of the data from the Acronis backups I’d previously taken. Where there were gaps, particularly 2017 and the latter half of 2018, I had to create new folders to replicate what had been there before, then drag and drop files from the restored folders and from various other locations where I still had copies of the data.
I then had to identify and find non-image files that had been created or updated in 2018 by looking individually through XLS, DOC, PDF and other formats to find the latest versions. Fortunately, PhotoRec found modify dates for many of these which meant that I could filter out much of the unwanted data, but by no means all.
In future I will not have the NAS assigned to a drive letter, but simply accessed via a network link/association. The downside of this is that some software does not recognise these links and so one must navigate to files via the ‘Network’ path, but I can live with that.
Rebuilding Lightroom

This was relatively straightforward: opening the pre-crash Catalog showed me what all the folder names were and every image was flagged with the “!” symbol to indicate that it couldn’t be located. Since Acronis restored most of the images from my backups to their original folders, Lightroom was immediately happy to apply the Catalog’s stored post-processing to them.
The outstanding issue that I’ll have to deal with is that although the PhotoRec recovered files are now located in the same place that Lightroom previously recorded, the original filenames have changed, so Lightroom cannot apply the recorded post-processing to the new files. There are two methods of addressing this: (i) compare the thumbnails seen in Lightroom with those of the recovered files to find matches, then rename the “f000...” files to the names which Lightroom expects, or (ii) ‘Remove’ the existing files from Lightroom, then import (Add) the recovered files and apply new post-processing as and when required, keeping the new names.
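Option (i) can be semi-automated once matches have been found by eye. The sketch below is hypothetical: it simply applies a manually built {recovered name: original name} mapping within one folder, refusing to overwrite anything that already exists:

```python
import os

def apply_mapping(folder: str, mapping: dict[str, str]) -> list[str]:
    """Rename recovered files per {recovered_name: original_name} and
    return the new names actually applied. Skips missing sources and
    never clobbers an existing file, so it is safe to re-run."""
    renamed = []
    for old, new in mapping.items():
        src = os.path.join(folder, old)
        dst = os.path.join(folder, new)
        if os.path.exists(src) and not os.path.exists(dst):
            os.rename(src, dst)
            renamed.append(new)
    return renamed
```

Once the files carry the names Lightroom recorded, the catalog’s stored post-processing re-attaches to them without any re-import.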
Once I had Lightroom back again as at the point of the crash, I then added in the new work that I’d created under a different catalog. This merging process is very easy using Lightroom’s “Import from Another Catalog...” function contained in the File menu.
The last step of the process was to take new Acronis backups from the NAS and then tidy up all the ‘extra’ backups of data that I’d accumulated throughout the recovery process.
New Backup Strategy
With hindsight, I clearly needed to change my backup strategy to protect myself from such unusual glitches as well as hardware failures. A fellow Nikonian pointed out that I can attach an external hard drive to the QNAP NAS and get the QNAP system to initiate backups on a pre-determined basis, which also eliminates the workload on my PC as well as the network traffic while the backups take place. However, a new QNAP product was launched in the UK in March 2019, which is a 4-bay expansion unit that can be plugged into my existing NAS and can be configured in many ways, including as a RAID expansion, separate RAID system or just as individual disks. As I have acquired four 3TB HDDs as part of the recovery process, I shall re-use the original disks in the new expansion unit and schedule daily overnight incremental backups. Backups to the External drives will be taken monthly and stored offsite as before for fire and theft protection.
I’ve been lucky (relatively speaking, given the circumstances) because I worked in IT all of my career and am therefore quite IT savvy. Despite that, it has been a long and stressful exercise to recover the data and rebuild my NAS. Quite expensive too. For the less savvy it would probably be beyond their capability and lead to a much greater data loss (or significant expense for specialist support), so ensuring a comprehensive backup strategy is in place is vital.
Finally, I recommend to all of my fellow Nikonians that they not simply wince at the pain such a catastrophe can inflict but learn from it too. There’s no telling what Windows will do to YOU next!