
1) Search for zip files:
Why "zip" files?
Because OpenDocument files (ods, odt, odg, ...) are really just XML files (plus some other stuff) packed into an archive - technically, they're plain .zip files.
In my case, I was looking for a text document (originally an .odt file).
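If you want to convince yourself of this, here's a minimal sketch that checks whether a file starts with the zip signature "PK\x03\x04" (the same header scalpel looks for). The function name has_zip_magic is made up for this example.

```shell
# Hypothetical helper: succeeds if the file starts with the zip
# local-file-header signature "PK\x03\x04" (hex 50 4b 03 04).
has_zip_magic() {
    local magic
    # Dump the first 4 bytes as hex and strip spaces/newlines.
    magic=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "504b0304" ]
}

# Usage (file name is just an example):
#   has_zip_magic letter.odt && echo "looks like a zip"
```

Any intact .odt file should pass this check, which is why carving for zips also turns up OpenDocument files.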
2) Use "scalpel" to find the zips:
Luckily, GNU/Linux systems come with top-notch professional forensic tools, so my plan was to use "scalpel" to find all traces of ".zip" files on her disk.
2b) Enable the filetype "zip" in scalpel's config:
Edit the file /etc/scalpel/scalpel.conf and search for "zip". Then uncomment the zip line so that it looks like this:
Code:
#---------------------------------------------------------------------
# MISCELLANEOUS
#---------------------------------------------------------------------
#
zip y 10000000 PK\x03\x04 \x3c\xac
#
# java y 1000000 \xca\xfe\xba\xbe
#
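If you'd rather not edit the file by hand, something like the following might do it. It's only a sketch: it assumes the rule is a line starting with "#", optional whitespace, then "zip" followed by whitespace, and the function name enable_zip_rule is made up. It prints the modified config instead of changing the file, so you can review the result first.

```shell
# Hypothetical helper: print a copy of a scalpel config with the
# commented-out "zip" rule enabled. Other commented lines stay as they are.
enable_zip_rule() {
    sed -E 's/^#[[:space:]]*(zip[[:space:]])/\1/' "$1"
}

# Usage (output file name is just an example):
#   enable_zip_rule /etc/scalpel/scalpel.conf > scalpel-zip.conf
```

Check the output before using it, since your distribution's scalpel.conf may be laid out differently.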
3) Let scalpel search the disk:
Code:
scalpel -v -o /path/to/results /dev/sdX
In my case it was a 500 GB disk, but it didn't take that long: roughly 2 hours.
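To get a feeling for how much scalpel found, you can count the carved files. A small sketch, assuming the carved files end in ".zip" and sit somewhere below the results folder (count_carved_zips is a made-up name):

```shell
# Hypothetical helper: count all *.zip files below a results folder.
count_carved_zips() {
    find "$1" -type f -name '*.zip' | wc -l | tr -d ' '
}

# Usage:
#   count_carved_zips /path/to/results
```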
4) Find the "good" files:
Since scalpel also carves out files that merely look like zips but are actually junk, you need to separate the good files from the rest.
I used "unzip" to check whether the files were complete garbage or not.
In order to check the several thousand files that scalpel carved out (> 20,000 in my case), I wrote a short bash script that sorts all zips according to the return code of unzip.
The script's a quick-n-dirty hack, but I'm sure you can adapt it to fit your needs.
What it does is:
a) Run "unzip -l" (list) on each zip file
b) Take the return value of that execution (RESULT=$?)...
c) ...and if it's not '9' (= invalid file), copy the zip into a subfolder named after the return code (usually 0, 1 or 2)
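The steps above could be sketched roughly like this. Note that this is a reconstruction from the description, not the original script; the function name and the folder layout are assumptions. It also assumes the carved file names are unique, because files with the same name and return code would overwrite each other.

```shell
# Sketch only: sort carved zips into <dest>/<return code>/ based on
# the exit status of "unzip -l". Files that return 9 are skipped.
sort_zips_by_exit_code() {
    local src dest zip result
    src=$1
    dest=$2
    find "$src" -type f -name '*.zip' | while IFS= read -r zip; do
        # a) list the archive; we only care about the exit status
        unzip -l "$zip" </dev/null >/dev/null 2>&1
        # b) remember the return value
        result=$?
        # c) everything except 9 goes into a folder named after the code
        if [ "$result" -ne 9 ]; then
            mkdir -p "$dest/$result"
            cp "$zip" "$dest/$result/"
        fi
    done
}

# Usage (paths are examples):
#   sort_zips_by_exit_code /path/to/results /path/to/sorted
```

After that, the folder "0" holds the archives unzip could list without complaints, and "1" and "2" hold the ones with warnings or errors that may still be partly readable.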
There were several odt files stored on that disk, but I was looking for a certain one that got lost.
IMPORTANT: You need to know a string that appears in the file you're looking for!
(Hint: if you know the title or headline of the document, you could use that.)
5) Now, find the "right" file:
Unzip each "valid" zip into its own subfolder (you can use my script for that), and then use "grep" to search for the string:
Code:
grep -lr "text I am looking for" *
That's it!
In my case, that narrowed more than 20,000 files down to 2, and one of them was the match.
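For completeness, the last step (unpack every zip into its own subfolder, then search) might look like the sketch below. Again, this is not the original script; the function names and folder layout are made up. The search uses "grep -F" so the string is taken literally instead of as a regular expression.

```shell
# Sketch: unpack each archive in <src> into its own subfolder of <dest>.
extract_each() {
    local src dest zip name
    src=$1
    dest=$2
    for zip in "$src"/*.zip; do
        [ -e "$zip" ] || continue          # nothing matched the glob
        name=$(basename "$zip" .zip)
        mkdir -p "$dest/$name"
        # -q: quiet, -o: overwrite without asking, -d: target folder
        unzip -q -o "$zip" -d "$dest/$name" </dev/null >/dev/null 2>&1
    done
}

# List all extracted files that contain the given string.
find_matches() {
    grep -rlF -- "$1" "$2"
}

# Usage (paths are examples):
#   extract_each /path/to/sorted/0 /path/to/extracted
#   find_matches "text I am looking for" /path/to/extracted
```

In an .odt, the document text normally ends up in content.xml. Formatting tags can split a longer phrase there, so a short, distinctive string tends to work better than a whole sentence.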
