4.3 - The Queue

The Queue (often referred to in full as the Queue Manager) is the subsystem that controls the storage of messages which are in transit through this mailserver. In other words, this is where messages live in between being submitted into the MTA, and being forwarded on to their next-hop destination.
The queue is of vital importance to the integrity of an MTA, as it contains messages which we have accepted responsibility for, and not yet handed off to the next-hop MTA on their journey. At any one time, exactly one MTA holds official responsibility for any particular message.

The Queue encompasses the Spool, which is the subsystem that stores the messages proper, and a meta queue that contains control information about those messages. When we talk of the "queue" without further qualification, we generally mean the meta queue.

The meta queue comes in two flavours, one of which is based on the filesystem, and the other on an SQL database.
The choice of which queue manager to use ultimately comes down to preference.
The filesystem queue obviously works right off the bat without requiring any extra software to be installed or a database to be available. On the other hand, "installation" of the embedded database flavours requires nothing more than copying a JAR file into place, and the SQL-based queues do offer a simpler listing facility.

Performance-wise, H2 and HSQLDB appear to be the fastest embedded databases. Benchmark comparisons between them and the filesystem queue vary, but over a more balanced workload that includes deferring and removing messages, the embedded databases may gain an advantage. This is one area where users are advised to run their own benchmarks.

4.3.1 - Generic Settings

These config settings apply to any flavour of queue, and are shown with their default values.

<queue>
    <maxmemoryqueue>10000</maxmemoryqueue>
    <retry_maxtime>72h</retry_maxtime>
    <retry_maxtime_reports>24h</retry_maxtime_reports>
    <retry_delays>30m | 2h | 4h</retry_delays>
    <spool>
        ...
    </spool>
</queue>

maxmemoryqueue
This sets the maximum size of the in-memory cache which the queue manager will return to a requesting naflet, regardless of the size of its request.
The default is 10,000 messages. This setting is best understood by referring to the various naflets that build an in-memory cache of queued messages.

retry_maxtime
This specifies the max time a message may remain in the queue before our attempts to forward it time out, and we declare it to be a bounce. At that point, an NDR (Non Delivery Report) will get generated and returned to the message's sender.

retry_maxtime_reports
Similar to retry_maxtime, except this is the time limit applied to NDRs, as opposed to original messages.
If an NDR expires, it is simply discarded, as it makes no sense to generate NDRs in response to NDRs (and probably means the original email was spam with a faked sender, anyway).

retry_delays
This controls the retry schedule for messages which we have failed to forward to their next-hop destination.
You can specify a sequence of any length, with any intervals. The defaults mean that after the original failure to send a message, we will wait 30 minutes before sending it again. If that fails, we will wait 2 hours before the next attempt, and then another 4 hours until the attempt after that. A retry interval of 4 hours will then continue to apply until the message reaches the limit specified in retry_maxtime (or retry_maxtime_reports for NDRs), at which point it is deemed to have expired.
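
The schedule described above can be modelled with a few lines of Java. This is only a sketch to make the arithmetic concrete, not the actual Mailismus code: the class and method names are hypothetical, and it assumes that the first delivery attempt happens as soon as the message is queued, and that a retry falling exactly on the time limit is still attempted.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; this is not the Mailismus implementation.
final class RetryScheduleSketch {

    // Delay to apply after the given number of failed attempts (1 or more).
    // Once the configured sequence is exhausted, its last entry keeps repeating.
    static Duration delayAfterFailures(List<Duration> retryDelays, int failedAttempts) {
        int idx = Math.min(failedAttempts, retryDelays.size()) - 1;
        return retryDelays.get(idx);
    }

    // Offsets (measured from the first failed attempt) of every retry within maxTime.
    // Assumption: the first attempt is made at the moment the message is queued.
    static List<Duration> retryOffsets(List<Duration> retryDelays, Duration maxTime) {
        if (retryDelays.isEmpty()) {
            throw new IllegalArgumentException("at least one retry delay is required");
        }
        for (Duration d : retryDelays) {
            if (d.isZero() || d.isNegative()) {
                throw new IllegalArgumentException("retry delays must be positive");
            }
        }
        List<Duration> offsets = new ArrayList<>();
        Duration elapsed = Duration.ZERO;
        for (int failures = 1; ; failures++) {
            elapsed = elapsed.plus(delayAfterFailures(retryDelays, failures));
            if (elapsed.compareTo(maxTime) > 0) {
                break; // expired: a message becomes a bounce, an NDR is discarded
            }
            offsets.add(elapsed);
        }
        return offsets;
    }

    public static void main(String[] args) {
        // The default retry_delays and retry_maxtime values shown above
        List<Duration> defaults = List.of(
                Duration.ofMinutes(30), Duration.ofHours(2), Duration.ofHours(4));
        for (Duration offset : retryOffsets(defaults, Duration.ofHours(72))) {
            System.out.println("retry at +" + offset);
        }
    }
}
```

With the default settings, this places the retries at 30 minutes, 2 hours 30 minutes and 6 hours 30 minutes after the first failure, and every 4 hours thereafter until the 72-hour limit.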

spool
See §4.3.4


4.3.2 - Filesystem Queue

This version of the queue manager stores control information about queued messages as plain disk files.
They are known as MMQ (Mailismus Meta Queue) files, and have the extension .mmq.
If a message has multiple recipients, Mailismus will generate a different MMQ file for each destination domain, but multiple recipients within a domain will be aggregated into the same MMQ file. Note that this only applies within a particular message, as common recipients of distinct messages are always stored in different MMQ files.
The format of MMQ files is proprietary and subject to change without notice, but for now at least, their filenames embody the destination domain and the SPID (see §4.3.4).
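
To illustrate the per-domain aggregation rule, the following sketch groups the recipients of a single message by destination domain, with each resulting group corresponding to one MMQ file. The class and method names are hypothetical, and the handling of case and of addresses without a domain part are our own assumptions rather than documented Mailismus behaviour.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only; this is not the Mailismus implementation.
final class MmqGroupingSketch {

    // Groups the recipients of ONE message by destination domain.
    // Each map entry stands for one MMQ file for that message.
    static Map<String, List<String>> groupByDomain(List<String> recipients) {
        Map<String, List<String>> byDomain = new TreeMap<>();
        for (String rcpt : recipients) {
            int at = rcpt.lastIndexOf('@');
            // Assumption: domains compare case-insensitively, and an address
            // with no domain part is filed under the empty string.
            String domain = (at < 0) ? "" : rcpt.substring(at + 1).toLowerCase(Locale.ROOT);
            byDomain.computeIfAbsent(domain, k -> new ArrayList<>()).add(rcpt);
        }
        return byDomain;
    }

    public static void main(String[] args) {
        List<String> rcpts = List.of("a@example.com", "b@example.com", "c@example.org");
        // Two destination domains, so this message would yield two MMQ files
        System.out.println(groupByDomain(rcpts));
    }
}
```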

It should go without saying, but for performance reasons, the queue should be located on a local filesystem, not one that's been remotely mounted across the network from a fileserver.

The Filesystem Queue (also known as FilesysQueue, after its Java class) is the default queue manager, so it will be in effect even if you don't specify the class attribute below.
This means that the config block in §4.3.1 above also represents a filesystem queue. Indeed, you can omit the queue block from mailismus.xml altogether, and you will still have a filesystem queue with functional defaults.
The first reference to "QMGR" in any of the logfiles will tell you which Queue implementation is in effect.

This is what the queue config block looks like for the filesystem queue, illustrating the default values of the relevant settings.

<queue class="com.grey.mailismus.mta.queue.queue_providers.filesystem.FilesysQueue">
    <rootpath>%DIRVAR%/queue</rootpath>
    <quota_deferred>10</quota_deferred>
    <retry_granularity>15m</retry_granularity>
</queue>

rootpath
This specifies the root of the directory tree under which FilesysQueue stores its control files. See the NAF Guide for the derivation of the %DIRVAR% token.
Subject to the usual disclaimers that layout is subject to change without notice, you will note 3 top-level subdirectories under this root:
incoming: This holds newly submitted messages which we have not yet attempted to deliver.
deferred: This holds messages which we have already attempted to deliver at least once, and failed. They will all be at various stages of the retry schedule.
bounces: This holds messages which have failed, and they're waiting for the Mailismus Reporting task to pick them up and generate the NDR.
Messages are obviously deleted from the system after being successfully delivered, so if a message is successfully delivered at the first attempt, it will never appear anywhere except the Incoming area.

quota_deferred
This is a percentage figure, which tunes the priority of deferred messages that are ready to be retried versus newly received messages, when the number of messages in the queue exceeds the Delivery cache (see Delivery Task).
The default of 10% means that at least 10% of each delivery batch will be taken from the deferred queue, thus avoiding starvation if new messages are flooding in. If there are insufficient new messages to fill the remaining 90% of the cache, then more messages will be taken from the Deferred queue, up to the cache-size limit.
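
The batch split described above can be sketched as follows. This is a simplified model under our own assumptions, not the actual Mailismus code: the names are hypothetical, the reserved share is rounded up, and any part of the reserved share that the deferred queue cannot fill is given to new messages.

```java
// Illustrative sketch only; this is not the Mailismus implementation.
final class DeferredQuotaSketch {

    // Returns {fromDeferred, fromNew} for one delivery batch.
    static int[] splitBatch(int cacheSize, int quotaPercent, int readyDeferred, int newMessages) {
        // Share reserved for deferred messages, rounded up (assumption),
        // but never more than the number that are actually ready to retry.
        int reserved = Math.min(readyDeferred, (cacheSize * quotaPercent + 99) / 100);
        // New messages fill what is left of the cache.
        int fromNew = Math.min(newMessages, cacheSize - reserved);
        // If new messages did not fill their share, deferred messages take up the slack.
        int fromDeferred = Math.min(readyDeferred, cacheSize - fromNew);
        return new int[] {fromDeferred, fromNew};
    }

    public static void main(String[] args) {
        int[] busy = splitBatch(1000, 10, 500, 5000);  // new messages flooding in
        int[] quiet = splitBatch(1000, 10, 500, 200);  // few new messages
        System.out.println("busy:  deferred=" + busy[0] + " new=" + busy[1]);
        System.out.println("quiet: deferred=" + quiet[0] + " new=" + quiet[1]);
    }
}
```

With a cache of 1,000 and the default quota, a flood of new messages still leaves 100 slots for deferred messages, while a quiet period lets all 500 ready deferred messages into the batch.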

retry_granularity
This is a rounding-up factor which serves to cluster as many deferred messages as possible into common directories.
Deferred directories are named after the timestamp at which their messages will be eligible to be retried, so in the most degenerate case we could end up with one directory per deferred message, which would obviously be very inefficient. At the cost of adding a minimal delay factor (15 minutes at worst) to some messages, this mechanism avoids that and optimises the message storage.
This setting cannot exceed half the value of the first retry_delays interval (see §4.3.1), and will be silently reduced if necessary to enforce that.
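
As a rough illustration of the rounding and the cap, consider the sketch below. The names are hypothetical, and we assume here that retry times are rounded up to the next multiple of the granularity, counted from the epoch; the real directory-naming scheme may well differ.

```java
import java.time.Duration;

// Illustrative sketch only; this is not the Mailismus implementation.
final class RetryGranularitySketch {

    // Applies the rule that the granularity cannot exceed half of the first retry delay.
    static Duration effectiveGranularity(Duration configured, Duration firstRetryDelay) {
        Duration cap = firstRetryDelay.dividedBy(2);
        return configured.compareTo(cap) > 0 ? cap : configured;
    }

    // Rounds a retry time (epoch milliseconds) up to the next granularity boundary,
    // so that messages due at similar times land in the same bucket.
    static long roundUp(long retryTimeMillis, Duration granularity) {
        long g = granularity.toMillis();
        long remainder = Math.floorMod(retryTimeMillis, g);
        return remainder == 0 ? retryTimeMillis : retryTimeMillis + (g - remainder);
    }

    public static void main(String[] args) {
        Duration g = effectiveGranularity(Duration.ofMinutes(20), Duration.ofMinutes(30));
        System.out.println("effective granularity: " + g);
        long due = Duration.ofMinutes(61).toMillis();
        System.out.println("bucket (minutes): " + Duration.ofMillis(roundUp(due, g)).toMinutes());
    }
}
```

Under these assumptions, a configured granularity of 20 minutes would be reduced to 15 minutes by the default retry_delays, and a message due at minute 61 would share a bucket with everything else due by minute 75.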


4.3.3 - Database Queue

This version of the queue manager stores control information about queued messages in a relational database.
Mailismus accesses databases using the Java JDBC API, and it operates on a table called MMTA_SMTPQUEUE (which it automatically creates), containing one record per message per recipient.
See the application/database config block in §4.1, for database-related config.
See §2.3 for database-related installation and setup.

This is what the queue config block looks like for the database queue, illustrating the default values of the relevant settings.

<queue class="com.grey.mailismus.mta.queue.queue_providers.sql.SQLQueue">
    <bounces_interval>10m</bounces_interval>
</queue>

bounces_interval
This is a vendor-neutral setting, and specifies the frequency with which Mailismus checks which deferred messages have now expired, and flags them as bounces.
There should be no need to modify the default of 10 minutes.


4.3.4 - Spool

The spool subsystem sits within the queue manager, and is responsible for managing the raw message content, which it stores as message files with the extension .msg.
The spooler assigns a unique SPID (Spool ID) to each message, and spooled messages have a 1-to-many relationship with items on the meta queue, as multi-recipient messages will generate multiple entries on the meta queue, but are still represented by a single spooled message file.

It should go without saying, but for performance reasons, the spool should be located on a local filesystem, not one that's been remotely mounted across the network from a fileserver.

To keep directory sizes down for performance reasons (and for sheer manageability), the spooler partitions its storage area, and allocates message files to one of 256 subdirectories, in a 2-level tree, based on the final 2 digits of their SPID (which is hexadecimal). The SPID generation scheme ensures a uniform distribution of message files.
For example, if the SPID was CB891D28, then the message file would be stored as SPOOLROOT/8/2/CB891D28.msg. As you can see, we've walked backward from the end of the SPID, to a depth of 2, to trace out the 8/2 subdirectory path.
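
The path derivation in this example can be expressed as a short sketch. The class and method names are hypothetical, and the code simply mirrors the description above (walking backward from the end of the SPID, one directory level per digit) rather than reproducing the actual spooler code. The depth parameter corresponds to the partitiondepth setting described below.

```java
// Illustrative sketch only; this is not the Mailismus implementation.
final class SpoolPathSketch {

    // Builds the message file path by walking backward from the end of the SPID,
    // adding one subdirectory level per trailing digit.
    static String messagePath(String spoolRoot, String spid, int partitionDepth) {
        if (partitionDepth < 0 || partitionDepth > spid.length()) {
            throw new IllegalArgumentException("invalid partition depth: " + partitionDepth);
        }
        StringBuilder path = new StringBuilder(spoolRoot);
        for (int i = 1; i <= partitionDepth; i++) {
            path.append('/').append(spid.charAt(spid.length() - i));
        }
        return path.append('/').append(spid).append(".msg").toString();
    }

    public static void main(String[] args) {
        System.out.println(messagePath("SPOOLROOT", "CB891D28", 2)); // the example above
        System.out.println(messagePath("SPOOLROOT", "CB891D28", 0)); // flat spool directory
    }
}
```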

<spool>
    <rootpath>%DIRVAR%/spool</rootpath>
    <partitiondepth>2</partitiondepth>
    <bufsize>8192</bufsize>
</spool>

rootpath
This specifies the root of the directory tree under which the Spooler will construct its little hive. See the NAF Guide for the derivation of the %DIRVAR% token.
This path is what was referred to as SPOOLROOT in the example above.

partitiondepth
This sets the number of levels in the spooler's directory tree, which by default is the 2-level, 256-directory structure described above. Reducing it all the way to zero would turn the spool area into a single flat directory.
There may be grounds for reducing the partition depth in some cases, but we strongly recommend against increasing it beyond 2, no matter how large your queue grows, as this introduces other issues.

bufsize
This specifies the OS buffer size used when writing the message files.


4.3.5 - NAFMAN Commands

There are some NAFMAN commands which you can issue to display info on the queue:

qlist
This displays the queued messages in a tabular format which is suitable for CSV import (but with semi-colon field delimiters), and the first row contains the names of the header fields.
The final "DSN" column simply indicates whether the email is a delivery (or non-delivery) report, or an actual message.
At the end of the listing, the total number of messages in the queue is shown, along with the number of new or incoming messages, i.e. messages which have been accepted by the Submit task, but not yet acted on by the Delivery task.

qcount
This is a simpler query which only lists the totals displayed at the end of the qlist command.

Note that these commands are not available for the Filesystem Queue, on the basis that it is amenable to manual inspection via the filesystem.