mbx2eml  –  Program for splitting mbox files

Version 0.68                Freeware                 © Jürgen Lüthje 2003-2007


Contents

Description
Install
Uninstall
Usage
Features
Names of extracted e-mail files
Settings in the INI file
Note concerning FAT file systems
Background information about the mbox format
References
Credits
License


Description

mbx2eml is a 32-bit program for Windows 95/98/ME/NT/2000/XP/Vista/7 that splits mailbox files in mbox format, as well as Foxmail's mailbox files, into separate e-mail files (including all attachments). See also Examples of e-mail conversion.

Install

No installation is required. Just unpack the files in the ZIP archive to a directory on your hard disk or, for example, on a USB flash drive. Change the settings in the INI file if you want.
mbx2eml doesn't add entries to the Windows registry, nor does it alter anything else on the system.

Uninstall

Just delete the files.

Usage

Mbox files typically have the extension MBX (or sometimes e.g. MBS, or no extension at all). These are just conventions. mbx2eml can process all mbox files, so it is not necessary to rename them beforehand. The mailbox files remain unchanged.

Command:  mbx2eml <file specification> <output directory> [options]

( Example:  mbx2eml c:\data\*.mbx c:\temp\ /p /n )

In the file specification, use  *.*  for all files. With  *  or  *.  you'll get all files without extension (e.g. Mozilla Mail's and Thunderbird's mbox files).
mbx2eml splits all mbox files that match the given specification into separate e-mail files. Normally it creates a corresponding subdirectory in the output directory for each mailbox file. If the output directory already exists, it should be empty. Otherwise existing e-mail files with the same name will be overwritten!

Options:
/p                In case of an error, the program does not ask the user but writes the
                  message to the file %TEMP%\mbx2eml.log and tries to proceed.
/i                The program starts as an icon in the taskbar.
/n                Serial numbers with 8 digits are used as names of the generated e-mail files
                  (beginning with 00000001).
/d                All mails are unpacked to the output directory (without subdirectories).
/MinDate=...      Only mails with the given date or a more recent one will be extracted.
/MaxDate=...      Only mails with the given date or an older one will be extracted.
/Message-ID=...   Only mails with the given Message-ID will be extracted.

An option must not contain any spaces. The options can be combined in any way you like.

The date must be given in ISO format, i.e. {Year}-{Month}-{Day}. So it looks like this:  /MinDate=2006-04-27.
Only dates in the range from 1980-01-01 to 2100-12-31 are allowed.

The message-ID must be written without angle brackets, and it's not case sensitive.
As with the other filters, all messages that match the given criterion are extracted. However, a message-ID is normally unique. It is therefore recommended to write an exclamation mark directly after a message-ID that you want to search for. With this addition, the program ends as soon as a matching message has been found and extracted successfully. This can save a lot of time.
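
The filter options described above can be assembled by a small helper script. The following Python sketch is not part of mbx2eml; the function names are hypothetical, and it only builds option strings according to the rules stated in this section (ISO date, allowed date range, no angle brackets, no spaces, optional exclamation mark).

```python
from datetime import date

# Date range accepted by mbx2eml according to this documentation.
MIN_ALLOWED = date(1980, 1, 1)
MAX_ALLOWED = date(2100, 12, 31)


def date_option(name: str, day: date) -> str:
    """Build '/MinDate=...' or '/MaxDate=...' (hypothetical helper)."""
    if name not in ("MinDate", "MaxDate"):
        raise ValueError("name must be 'MinDate' or 'MaxDate'")
    if not MIN_ALLOWED <= day <= MAX_ALLOWED:
        raise ValueError("date outside the range 1980-01-01 to 2100-12-31")
    # date.isoformat() yields {Year}-{Month}-{Day}, e.g. 2006-04-27.
    return f"/{name}={day.isoformat()}"


def message_id_option(message_id: str, stop_after_first: bool = True) -> str:
    """Build '/Message-ID=...' without angle brackets (hypothetical helper)."""
    bare = message_id.strip()
    if bare.startswith("<") and bare.endswith(">"):
        bare = bare[1:-1]
    if " " in bare:
        raise ValueError("an option must not contain spaces")
    # The trailing '!' asks the program to stop after the first match.
    return f"/Message-ID={bare}" + ("!" if stop_after_first else "")
```

For example, date_option("MinDate", date(2006, 4, 27)) gives the option shown above, /MinDate=2006-04-27.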

When the program ends, it returns one of the following status codes to the operating system. Batch programs, for example, can read this value using the ERRORLEVEL feature (works on Windows XP, but for some reason not on Windows 98):
0   Program successfully completed.
1   No matching file found.
2   One or more errors occurred, or the user has prematurely terminated the program.
3   A severe error occurred, so that the program had to be aborted.
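
Besides batch files, any scripting language can read the status code. The following Python sketch assumes that mbx2eml is reachable via the PATH; the function names are hypothetical, and only the command syntax and the four status codes are taken from this documentation.

```python
import subprocess

# Status codes as listed in this documentation.
STATUS_MEANINGS = {
    0: "Program successfully completed.",
    1: "No matching file found.",
    2: "One or more errors occurred, or the user terminated the program.",
    3: "A severe error occurred, so that the program had to be aborted.",
}


def build_command(file_spec, output_dir, *options, exe="mbx2eml"):
    """Return the argument list: mbx2eml <file spec> <output dir> [options]."""
    return [exe, file_spec, output_dir, *options]


def describe_status(code):
    """Translate a status code into the text from the table above."""
    return STATUS_MEANINGS.get(code, f"Unknown status code {code}")


def run_mbx2eml(file_spec, output_dir, *options):
    """Run mbx2eml (assumed to be on the PATH) and return (code, meaning)."""
    result = subprocess.run(build_command(file_spec, output_dir, *options),
                            check=False)
    return result.returncode, describe_status(result.returncode)
```

A call such as run_mbx2eml(r"c:\data\*.mbx", "c:\\temp\\", "/p", "/n") corresponds to the example command in this section. The option /p is useful here because the program then does not wait for user input.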

Features

Each e-mail file gets a time stamp according to its Date header field. Since this time stamp is expressed in Coordinated Universal Time (UTC) for all mails, messages from all parts of the world (e.g. on a mailing list) can be sorted by time in a consistent way. If the Date header field of a mail contains invalid data, then “01.01.1980 00:00:00” is used as time stamp.
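
To illustrate the conversion to UTC, here is a minimal Python sketch using the standard library. It is not the program's own code. The function name is hypothetical, and treating the fallback value as a UTC time is an assumption.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Fallback for invalid Date fields; interpreting it as UTC is an assumption.
FALLBACK = datetime(1980, 1, 1, tzinfo=timezone.utc)


def utc_timestamp(date_header):
    """Return the Date header field as a UTC datetime, or the fallback."""
    try:
        parsed = parsedate_to_datetime(date_header)
    except (TypeError, ValueError):
        # Raised for missing or unparsable dates, depending on Python version.
        return FALLBACK
    if parsed.tzinfo is None:
        # No usable zone information: treat the value as UTC (assumption).
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)
```

A mail sent at 14:30 in a zone two hours ahead of UTC and a mail sent at 08:30 in a zone four hours behind UTC both map to 12:30 UTC, which is why sorting by this time stamp is consistent.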

mbx2eml can read mbox files that contain DOS/Windows line breaks (pairs of ASCII characters 13 and 10), UNIX  –  including Linux and FreeBSD  –  line breaks (ASCII 10), or Macintosh line breaks (ASCII 13), even when they are all intermixed in the same file. The generated mail files contain only DOS/Windows line breaks, and each file ends with a line break.
If the last line of a message contains only a dot, that line is not copied to the mail file. This is because some programs cannot correctly process mail files that contain such a line.

mbx2eml locks the mbox files while it reads them, so that they can't be altered by other programs at the same time. The program can read even corrupted files that contain binary data. When extracting messages, every ASCII character 26 (“End of File” marker) is replaced with the string “<EOF>”. So even corrupted messages can be opened with any text editor after extraction.
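
The line-break handling, the removal of a final dot line and the replacement of ASCII character 26 can be sketched as follows. This Python code only mirrors the behaviour described above; it is not taken from mbx2eml, and the function name is hypothetical.

```python
import re


def normalize_message(raw: bytes) -> bytes:
    """Return the message with DOS/Windows line breaks only (sketch)."""
    # CR LF, a lone CR and a lone LF each count as one line break.
    lines = re.split(rb"\r\n|\r|\n", raw)
    # A trailing line break leaves an empty last element; drop it.
    if lines and lines[-1] == b"":
        lines.pop()
    # A last line that contains only a dot is not copied.
    if lines and lines[-1] == b".":
        lines.pop()
    # Every line, including the last one, ends with CR LF.
    text = b"".join(line + b"\r\n" for line in lines)
    # ASCII 26 ("End of File" marker) becomes the visible string <EOF>.
    return text.replace(b"\x1a", b"<EOF>")
```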

When mbx2eml copies a very large number of mail files to a disk that has a FAT file system, it automatically creates additional directories if required. Their names have a trailing underscore and a serial number. This happens both with and without the command-line option /d.

This software has been used to split e.g. mbox files with a size of about 200 MiB, containing more than 40 000 messages. Mbox files bigger than 2 GiB cannot be processed.

Names of extracted e-mail files

An mbox does not contain file names, so the program must create them itself.

Normally each mail file is named after its subject. Special characters that are not allowed in FAT32 and NTFS file names under Windows [1] are replaced:
    " is replaced with '
    : is replaced with .
    /\?*<>| and ASCII characters < 32 are each replaced with a blank


Superfluous blanks as well as certain expressions at the beginning of a name are removed. Long file names are truncated, so that they don't exceed a particular maximum. A truncated name is denoted by “...”. If a mail doesn't have a Subject header field, or the field body is empty, “[no subject]” is used as file name. The names of the generated mail files can be changed by means of the optional INI file.
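
The naming rules above can be approximated with a short Python sketch. It is an illustration only: the function name is hypothetical, the exact truncation rule of mbx2eml is not documented here, and case-insensitive, repeated prefix removal is an assumption. The suffix that makes names unique is left out.

```python
import re

# Default prefixes as listed for the INI setting PrefixesToRemove.
DEFAULT_PREFIXES = ("Re:", "Re[2]:", "Re[3]:", "Re[4]:", "Fw:", "Fwd:", "Aw:")


def subject_to_filename(subject, extension="eml", max_length=60,
                        prefixes=DEFAULT_PREFIXES):
    """Derive a file name from a subject line (sketch, not the real code)."""
    name = subject or ""
    # Strip known prefixes from the beginning until none is left.
    changed = True
    while changed:
        changed = False
        name = name.lstrip()
        for prefix in prefixes:
            if name.lower().startswith(prefix.lower()):
                name = name[len(prefix):]
                changed = True
    # Replace characters that are not allowed in file names.
    name = name.replace('"', "'").replace(":", ".")
    name = re.sub(r"[/\\?*<>|\x00-\x1f]", " ", name)
    # Remove superfluous blanks.
    name = re.sub(r" +", " ", name).strip()
    if not name:
        name = "[no subject]"
    # The maximum length counts the dot and the extension as well.
    suffix = f".{extension}" if extension else ""
    room = max_length - len(suffix)
    if len(name) > room:
        name = name[: room - 3].rstrip() + "..."
    return name + suffix
```

Prefixes are removed before the colon is replaced, because the prefixes themselves contain a colon.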

To obtain a unique name, '_' and a 9-digit hexadecimal number (representing the date, time and time zone of the mail) are appended to the file name. This way, the file name almost always depends only on characteristics of the message itself.
If duplicate names still remain, serial numbers in square brackets are added to all names except the first one, e.g.
    important_message_2F5B5B73A.eml
    important_message_2F5B5B73A[2].eml
    important_message_2F5B5B73A[3].eml
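
The numbering of duplicates can be sketched in Python as follows. The function name is hypothetical, and comparing names without regard to case is an assumption based on the case-insensitive file names of Windows. How the 9-digit hexadecimal number is computed is not documented here, so the sketch takes it as given.

```python
def unique_name(stem, extension, used):
    """Return a name not yet in 'used', adding [2], [3], ... if needed."""
    candidate = f"{stem}.{extension}"
    serial = 2
    # Compare in lower case, since Windows file names ignore case.
    while candidate.lower() in used:
        candidate = f"{stem}[{serial}].{extension}"
        serial += 1
    used.add(candidate.lower())
    return candidate
```

Calling unique_name("important_message_2F5B5B73A", "eml", used) three times with the same set produces the three names listed above.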

With the command-line option /n, serial numbers with 8 digits are used as names of the generated e-mail files. If you don't choose a file extension with more than 3 characters, then you'll get short 8.3 names, which are even valid on DOS.

Settings in the INI file

By means of an optional INI file, the user can change the names of the generated mail files. The file must be in the same directory as the program mbx2eml, and its name must be “mbx2eml.ini”.

Example of a file “mbx2eml.ini”:
-------------------------------------------
[MailFiles]
FileExtension = msg
PrefixesToRemove = Re:, Re^2:, Re^3:, Re^4:
ReplaceInFilename = "%_", " _", ",."
MaxFilenameLength = 50
-------------------------------------------


FileExtension (default: eml)
You can write an arbitrary file extension here. It is used for all generated mail files. With the setting
    FileExtension =
the mail files will not get an extension at all.

The following settings do not apply when the command-line option /n is used:

PrefixesToRemove (default: Re:,Re[2]:,Re[3]:,Re[4]:,Fw:,Fwd:,Aw:)
Comma-separated list of expressions, all of which are removed from the beginning of file names. With the setting
    PrefixesToRemove =
no expressions will be removed from the beginning of file names.

ReplaceInFilename (default: empty)
You can write an arbitrary number of character pairs here. They must be surrounded by double quotes and separated by commas. The first character of each pair is replaced with the second character. This option is especially useful for IFS, where particular characters are forbidden that are allowed on FAT32 and NTFS. That is why the character '%' is replaced with '_' in the example.

MaxFilenameLength (default: 60)
All characters are counted, including dot and extension (if present). Valid values are whole numbers between 20 and 120 (inclusive). If the program reads an invalid number here, it uses the default.

To disable an option in the INI file, just turn it into a comment by putting a semicolon at the beginning of the line. If you only want to use the default settings of the program, you can also delete the INI file.
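
For scripts that need to interpret the same INI file, the settings can be read with Python's configparser module. The sketch below is not how mbx2eml itself parses the file. The function name and the keys of the returned dictionary are hypothetical; the defaults and the valid range are taken from this section.

```python
import configparser
import re

DEFAULT_PREFIXES = "Re:,Re[2]:,Re[3]:,Re[4]:,Fw:,Fwd:,Aw:"


def load_settings(ini_text: str) -> dict:
    """Parse the text of mbx2eml.ini and apply the documented defaults."""
    # interpolation=None keeps '%' literal (needed for pairs such as "%_").
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    section = parser["MailFiles"] if parser.has_section("MailFiles") else {}

    extension = section.get("FileExtension", "eml").strip()
    raw_prefixes = section.get("PrefixesToRemove", DEFAULT_PREFIXES)
    prefixes = [p.strip() for p in raw_prefixes.split(",") if p.strip()]
    # Each pair is two characters in double quotes, e.g. "%_" or ",.".
    pairs = re.findall(r'"(.)(.)"', section.get("ReplaceInFilename", ""))
    try:
        max_length = int(section.get("MaxFilenameLength", "60"))
    except ValueError:
        max_length = 60
    if not 20 <= max_length <= 120:
        max_length = 60  # invalid numbers fall back to the default
    return {
        "extension": extension,
        "prefixes": prefixes,
        "replacements": dict(pairs),
        "max_length": max_length,
    }
```

Lines that begin with a semicolon are treated as comments by configparser as well, so a disabled option simply falls back to its default.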

Note concerning FAT file systems

In contrast to the NTFS file system, on FAT file systems (see “My Computer” > right-click on the respective drive > “Properties”) the number of entries in a directory is limited. That means if you write too many files into one directory, this directory will eventually be “full”, even if there is enough free space on the disk!

One must take into consideration that long file names (LFN) are stored using a series of linked directory entries. An LFN uses one directory entry for its short 8.3 name, and a hidden secondary directory entry for every 13 characters in its long name (including dot and extension). So a file name that is 120 characters long, for example, uses 11 entries.
On FAT32, a file with a short name uses 1 directory entry under Windows XP, but oddly enough 2 entries under Windows 98.

On FAT32, for example, there seems to be a maximum of 65 536 entries (including “.” and “..”) per directory. Say we have an mbox file that contains about 20 000 mails, and for the sake of simple calculation let's assume that the names of all these mails have the same length. If the whole mbox file is to be unpacked to one directory, the names of the mails must not be longer than 26 characters.
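
The arithmetic behind these numbers can be checked with a few lines of Python. The function names are hypothetical; the constants (13 characters per additional entry, 65 536 entries per directory) are the ones given in this section.

```python
import math

CHARS_PER_LFN_ENTRY = 13   # characters stored in one secondary entry
MAX_DIR_ENTRIES = 65536    # apparent FAT32 limit, including "." and ".."


def entries_for_long_name(name_length):
    """One entry for the 8.3 name plus one per 13 characters of the LFN."""
    return 1 + math.ceil(name_length / CHARS_PER_LFN_ENTRY)


def max_name_length(file_count, max_entries=MAX_DIR_ENTRIES):
    """Longest name (same length for all files) that fits in one directory."""
    available = max_entries - 2            # minus "." and ".."
    per_file = available // file_count     # whole entries per file
    return max(per_file - 1, 0) * CHARS_PER_LFN_ENTRY
```

With 20 000 files, 65 534 entries are available, i.e. 3 whole entries per file: one for the short name and two for the long name, which gives 2 x 13 = 26 characters.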

Background information about the mbox format [2,3,4]

This is a common format for storage of mail messages. There is no precise specification of it, though. An 'mbox' is a text file containing an arbitrary number of e-mail messages. Each message is preceded by a 'postmark', and the messages are formatted according to RFC 2822 [5]. The file format is line-oriented.
The 'postmark' is a line that begins with the string “From ” (note the space!), not followed by a colon. Because of the wide range of variations in practice, nothing else on the “From ” line should be considered.

However, this software does not regard every such “From ” line as the beginning of a new message, because sometimes it is a normal line in the text body of the mail (e.g. “From now on ...”). A “From ” line is regarded as a delimiter between two messages only if the lines immediately following it look like an e-mail header. This makes the program robust: it recognizes even syntactically incorrect headers, as long as they are not too seriously damaged.
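
The idea of checking the lines after a “From ” line can be sketched in Python. The test used by mbx2eml itself is not documented here and is certainly more tolerant; the sketch assumes, as a simplification, that a header is recognized when the very next line has the form "Name:". All names in the code are hypothetical.

```python
import re

# Simplified idea of a header line: a field name followed by a colon.
HEADER_LINE = re.compile(r"^[A-Za-z][A-Za-z0-9-]*:")


def split_mbox(lines):
    """Split a list of text lines into messages (lists of lines)."""
    messages = []
    current = None
    for index, line in enumerate(lines):
        following = lines[index + 1] if index + 1 < len(lines) else ""
        if line.startswith("From ") and HEADER_LINE.match(following):
            # Postmark followed by something that looks like a header.
            if current is not None:
                messages.append(current)
            current = []
        elif current is not None:
            # Ordinary line, including "From now on ..." in a mail body.
            current.append(line)
    if current is not None:
        messages.append(current)
    return messages
```

The postmark lines themselves are not copied into the messages in this sketch.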

Foxmail
Instead of a “From ” line, the program Foxmail uses a line consisting of ASCII characters 16,16,16,16,16,16,16,17,17,17,17,17,17,83 to denote the beginning of a new mail. mbx2eml recognizes these mailbox files automatically, and can process them as well.
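
Written as a byte string, the Foxmail delimiter looks like this in Python. The constant and function names are hypothetical; only the byte values come from the description above.

```python
# Seven times ASCII 16, six times ASCII 17, then ASCII 83 (the letter "S").
FOXMAIL_DELIMITER = bytes([16] * 7 + [17] * 6 + [83])


def is_foxmail_delimiter(line: bytes) -> bool:
    """True if the line, without its line break, is the Foxmail delimiter."""
    return line.rstrip(b"\r\n") == FOXMAIL_DELIMITER
```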

Eudora [6]
The Eudora mailbox format is nearly identical to the mbox format but, contrary to popular belief, not exactly the same. It is not supported by mbx2eml. Unfortunately, Eudora uses the file extension MBX, too.
The Date header field is often omitted from Eudora messages, presumably because the date is contained in the initial “From ” line. This does not conform to RFC 2822 [5]. Also in contrast to the mbox format, Eudora extracts all attachments and saves them as separate files.

References

File systems
[1] http://en.wikipedia.org/wiki/Comparison_of_file_systems

Mbox format
[2] http://www.faqs.org/rfcs/rfc4155.html
[3] http://www.qmail.org/man/man5/mbox.html
[4] http://en.wikipedia.org/wiki/Mbox

Internet message format
[5] http://www.faqs.org/rfcs/rfc2822.html

Eudora mailbox format
[6] http://eudora2unix.sourceforge.net/details.html

Credits

The program was written in Euphoria, and translated using the Euphoria To C Translator 3.0.2. Thanks to RDS for this good, free and open-source general purpose programming language, and for outstanding support.

The program uses the Euphoria programming library ARWEN 0.93c. Thanks to Michael <vulcan {AT} win.co {DOT} nz>.

The generated C code was compiled with the Borland C++ 5.5.1 Command-line Compiler. Thanks to Borland Software Corporation for having provided this powerful tool free of charge.

For suggestions and bug reports I want to thank Erik Kerger, Ton Kerkers, Mattias Nyholm, Mark Finney, Marc Schneider, and Dominik Runggaldier.

License

If you do not accept the following license, then you are not allowed to use or distribute this software.

1. Copyright
mbx2eml is copyright 2003-2007 by the author Jürgen Lüthje, all rights are reserved.

2. Right to use
mbx2eml is freeware. You may use the program free of charge and without any time limit.

3. Copying
You may copy and distribute the software and its documentation, as long as the file mbx2e068_en.zip is not modified. This means, among other things, that you are not allowed to rename the file, or split it into pieces.
Without clear written permission from the author, you are not allowed to distribute the program as part of another archive or file.
You are not allowed to sell the program, or to enclose it with a commercial program or a commercial collection of programs. The program may be distributed as part of freeware/shareware collections, e.g. on accompanying DVDs of computer magazines, though.

4. Support
You are not entitled to support by the author. However, the author tries to answer inquiries by e-mail.

5. Disclaimer
This software is distributed WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. The author does not accept responsibility or liability for any effects, adverse or otherwise, that this code may have on you or your computer. Use it at your own risk.


Last updated 3 October 2012
I am not responsible for the contents of external websites.