[Geek] The human Perl interpreter
Jan. 16th, 2004 11:04 pm
Remember the Perl question from a while back?
I've cracked it!
This is long and tedious but it might be useful to some.
I tried the tool I got off CIX but it died, repeatedly, trying to parse my CIX messagebase. I knew there was some corruption hidden in there, but with a quarter of a million messages going back to the mid-80s across ¾ of a gigabyte of stuff it was hard to find.
So I decided to have a look myself.
I exported a small mail folder to a CIX scratchpad and opened it side-by-side with a Thunderbird MBOX-format mail file.
The Thunderbird "From " header is just a date & it's always the same one, so that's clearly not used. So, in principle, I thought, all I need to do is replace CIX's
Memo (number)
header with one saying
From CIX scratchpad
X-CIX-Memo: (number)
(just as a marker for myself)... and that might do it.
This was accomplished with a smart search & replace in WinWord. I bunged the file into Thunderbird's Profile directory in an appropriate place and fired 'er up.
Well, it found the file, added it as a folder, and found all the messages... but not quite right. There's an extra delimiter in there, reading
!MF
or
!MFA
The result of the malformed file was that it had the messages all dated 00:00:00 1/1/1970 with no sender/recipient/subject or anything else, and all the headers appearing unparsed in the message body.
So I tried again. Back to the original file, replace
!MF
Memo
and
!MFA
Memo
with
From CIX scratchpad
X-CIX-Memo:
Place the file in the Profile and...
Bang! Perfect! 250-odd sorted, threaded messages with headers parsed correctly! By Jove, I think I've got it!
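For anyone who would rather script that swap than do it in a word processor, here is a minimal Python sketch of the same search & replace. It assumes the layout described above: a "!MF" or "!MFA" line immediately followed by a "Memo (number)" line. The function name and the sample text are made up for illustration; a real scratchpad may well differ in detail.

```python
import re

# Assumption: every message starts with a "!MF" or "!MFA" line that is
# immediately followed by a "Memo <number>" line.
DELIMITER = re.compile(r"^!MFA?\r?\nMemo[ \t]*", re.MULTILINE)


def scratchpad_to_mbox(text):
    """Swap each delimiter-plus-Memo pair for a "From " line and an X-CIX-Memo header."""
    return DELIMITER.sub("From CIX scratchpad\nX-CIX-Memo: ", text)


# Invented sample, only to show the shape of the input.
sample = (
    "!MF\nMemo 101\nFrom: alice@example.com\nSubject: Hi\n\nBody one\n"
    "!MFA\nMemo 102\nFrom: bob@example.com\nSubject: Re: Hi\n\nBody two\n"
)
converted = scratchpad_to_mbox(sample)
```

This reads the whole file into one string, which is fine for a 250-message folder but not a good idea for anything really large.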
Rather than try to export and edit my 30+ mail folders separately, I exported my entire Mail folder to a file. 107,200 and something messages. After a few min, it barfed. I moved some folders I knew contained corrupted stores out of the Mail folder and tried again.
Success. One 325MB file of CIX messages.
Now to try to load that into WinWord...
Watch the memory allocation rise. 15MB, 20, 40, 70, 80, 150, 200, 250, 300... 280... 290... 300... 305... 275... 175... 80... 35... 15. This took some time; I "only" have 512MB RAM.
Empty window, just an hourglass.
Kill Word. Try again.
Same thing.
Arse. WinWord can't handle such a big text file. No huge surprise there.
So, I need a text editor with smart search & replace, which can handle newlines. I always knew I should have mastered Emacs years ago.
5min with Google and I have a copy of Textpad instead. Read the search & replace help... It does RegExes... good. That should do.
Try opening mail file. Takes a few min but it does it.
Several searches later, each one doing nearly 110,000 swaps, I'm done. Save it out. This takes ages... has it died? Yes. Oh. Is the file processed? Look in DOS... yes, it is. Grown to about 375MB, though.
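With a file this size, a script that works one line at a time sidesteps the editor's memory appetite altogether. The following is a rough Python sketch of that idea, not what I actually ran. The function name `convert_stream` is hypothetical, and it assumes the same "!MF"/"!MFA" plus "Memo" layout as before. Latin-1 is used only because it round-trips any byte unchanged.

```python
def convert_stream(src_path, dst_path):
    """Convert a scratchpad to an mbox file line by line; returns the message count."""
    count = 0
    pending = None  # a "!MF"/"!MFA" line held back until we see the next line
    with open(src_path, "r", encoding="latin-1", newline="") as src, \
         open(dst_path, "w", encoding="latin-1", newline="") as dst:
        for line in src:
            if pending is not None:
                if line.startswith("Memo"):
                    # Delimiter followed by Memo: emit the mbox separator
                    # and turn "Memo ..." into "X-CIX-Memo: ...".
                    dst.write("From CIX scratchpad\n")
                    dst.write("X-CIX-Memo:" + line[len("Memo"):])
                    count += 1
                    pending = None
                    continue
                # Not followed by Memo, so pass the held line through untouched.
                dst.write(pending)
                pending = None
            if line.rstrip("\r\n") in ("!MF", "!MFA"):
                pending = line
            else:
                dst.write(line)
        if pending is not None:
            dst.write(pending)
    return count
```

Only one line (plus at most one held-back delimiter) is in memory at any moment, so the size of the scratchpad should not matter much.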
Move into Thunderbird profile. Launch Thunderbird. Wait.
There's a new folder! Click it... The new message count starts to climb. 250... 1000... 2000... 5000... I leave it to it. 10min later it has 107,276 unread. The mail file has grown to 384MB now. Kill some stuff on system drive to make room.
Then came the long process of splitting that down into year folders, 1 per year from 1994 to date. 4 from '94; few hundred from '95; 1500 from '96; 3000 from '97; 6000 from '98...
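The per-year split could in principle be scripted too. Here is a hedged sketch using Python's standard `mailbox` module. It assumes each converted message carries an ordinary RFC 2822 "Date:" header, which I have not verified for every CIX message. The function name `split_by_year` and the output naming scheme are invented for the example. Anything without a parseable date goes into an "undated" file.

```python
import mailbox
from email.utils import parsedate_to_datetime


def split_by_year(src_path, dst_prefix):
    """Copy each message into '<dst_prefix><year>.mbox'; returns {year: count}."""
    counts = {}
    boxes = {}
    src = mailbox.mbox(src_path, create=False)
    try:
        for msg in src:
            try:
                year = str(parsedate_to_datetime(msg["Date"]).year)
            except (TypeError, ValueError):
                # Missing or unparseable Date header.
                year = "undated"
            if year not in boxes:
                boxes[year] = mailbox.mbox(dst_prefix + year + ".mbox")
            boxes[year].add(msg)
            counts[year] = counts.get(year, 0) + 1
    finally:
        for box in boxes.values():
            box.close()  # close() also flushes pending writes
        src.close()
    return counts
```

The source file is only read, never modified, so a failed run leaves the original intact.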
Slow job. Thunderbird ate more and more RAM 'til it was using 350MB and my machine was struggling. (Dual 2GHz Athlon, ½GB RAM, 80GB disk, XP Pro SP1.)
Now start recreating mail filters and making it sort the last 2yr into proper folders. Takes ages but it does it.
But it's still dog-slow and taking over a third of a gig of RAM. And the files haven't shrunk.
Google for "how to defragment MBOX files". Ah. Cunningly hidden as "Compact folders" on the File menu. Guess my mind must be too highly-trained.
That takes a while, but suddenly I have 500MB more disk space back. Still taking all that RAM though.
Quit, reload. 25MB RAM! And it flies!
I have done it. For the first time since 1994 I'm not using Ameol, but all my old messages are there. Attachments didn't survive, and the messages which had them are blank, but that's not a major issue.
Perl, who needs it? :¬)
There. Hope that wasn't too dull.
Many thanks to all those who offered help!