The SurgeMail 'Mirror' system lets you link two systems together so that you can read or deliver email on either one, and both systems will continually 'match' each other. This can be used in several ways:
Mirroring works over a LAN or WAN connection and can be encrypted. Unlike shared NFS drives, a SurgeMail mirrored system has no single point of failure, so you have genuine failover capability.
In almost all cases you should be running a mirror of your mail server; it's the cheapest and most efficient way to keep a live backup of your system. The only cases we can think of where you don't need a mirror are if:
Some people forget that disk drives fail. They do: your mail server's disk will fail approximately once in the next 2-3 years. Some people think RAID 5 or similar systems provide protection from disk failure. They do not. We've had so many customers lose RAID 5 arrays (and we've lost so many ourselves) that we actually consider them less reliable than non-RAID 5 disk arrays. (Speaking of which, when possible always use RAID 10, NOT RAID 5, for high performance and reliability on a mail server.)
Many people think mirroring should be combined with a load balancer, as you would do with a web server. This is NOT the case. A simple load balancer causes serious risks when used with mirroring: if the mirroring fails even briefly and a user accesses both servers during that time, new messages could be assigned identical UID values. One of those messages will then be invisible and lost to the end user.
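To make the risk concrete, here is a toy Python model of the failure. It is not SurgeMail code: the `Mailbox` class, its fields, and the assumption that each server simply hands out the next sequential UID are all hypothetical, chosen only to illustrate how two different messages can end up with the same UID while the mirror link is down.

```python
class Mailbox:
    """Toy mailbox that hands out sequential UIDs (an assumption for illustration)."""

    def __init__(self, next_uid=1):
        self.next_uid = next_uid
        self.messages = {}  # uid -> message text

    def deliver(self, text):
        # Assign the next free UID locally, without consulting the other server.
        uid = self.next_uid
        self.messages[uid] = text
        self.next_uid += 1
        return uid


# Both servers start in sync, so the next UID is 11 on each.
server_a = Mailbox(next_uid=11)
server_b = Mailbox(next_uid=11)

# The mirror link is down and a load balancer sends one delivery to each server.
uid_a = server_a.deliver("invoice from Alice")
uid_b = server_b.deliver("reply from Bob")

# Two different messages now share one UID; a mail client can only show one of them.
collision = uid_a == uid_b and server_a.messages[uid_a] != server_b.messages[uid_b]
print(uid_a, uid_b, collision)
```

Routing all users to one server at a time avoids this, because only one side is assigning UIDs during an outage.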
To avoid this and still have a fault redundant system, you can do any of the following:
In general nwauth is the only module that natively supports mirroring, but some other modules work when both servers can access a common back-end server (MySQL, LDAP, etc.). ntauth doesn't work because it relies on some local files to fill in fields that are not available in the Windows database.
Module      Support?
nwauth      Yes
ntauth      No
mysqlauth   Yes, but both servers must point to the same MySQL database back end
ldapauth    Yes, but again both servers must typically point to the same LDAP back end
If you want to add mirroring to an existing server, you'll need to read this.
Simply set up two mail servers in a similar manner. We recommend you copy the config from one to the other and then adjust any system-specific settings (mail paths etc.). It's important that both configs have the same domains, the same forward rules, and the same g_mirror_secret.
Example (add these settings to surgemail.ini):

Server 1: ip 10.0.0.1 (master)
g_mirror_nossl "TRUE"
g_mirror_mode "master"
g_mirror_host "10.0.0.2"
g_mirror_secret "testing"
g_mirror_config "true" (if you want to mirror config changes as well)
g_mirror_repair "true" (auto repair once a month)

Server 2: ip 10.0.0.2 (slave)
g_mirror_nossl "TRUE"
g_mirror_mode "slave"
g_mirror_host "10.0.0.1"
g_mirror_secret "testing"
g_mirror_config "true" (if you want to mirror config changes as well)

The above are the settings that go into each server's surgemail.ini.

Commands to issue after adding a new SLAVE to an existing system:
issue "tellmail resync_config" on master (if using g_mirror_config)
issue "tellmail resync_nwauth" on master (if using nwauth)
issue "tellmail resync_fast" on master
issue "tellmail resync_mkdir" on master (to create empty folders on slave)
That will give you a mirror; it's that simple.
You may wish to add g_mirror_trash "true" if you want the trash folder to mirror as well.
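The two sides of the example differ only in g_mirror_mode and g_mirror_host. As a minimal sketch of that symmetry, the Python helper below renders the settings lines for one side of the pair. The function name `mirror_settings` and its parameters are hypothetical; the setting names and values are the ones from the example, and optional settings such as g_mirror_repair and g_mirror_trash are left out.

```python
def mirror_settings(mode, peer_ip, secret, mirror_config=True):
    """Return the surgemail.ini lines for one side of a mirror pair."""
    if mode not in ("master", "slave"):
        raise ValueError("mode must be 'master' or 'slave'")
    lines = [
        'g_mirror_nossl "TRUE"',
        f'g_mirror_mode "{mode}"',
        f'g_mirror_host "{peer_ip}"',    # each server points at the OTHER server
        f'g_mirror_secret "{secret}"',   # must be identical on both servers
    ]
    if mirror_config:
        lines.append('g_mirror_config "true"')
    return lines


master = mirror_settings("master", "10.0.0.2", "testing")
slave = mirror_settings("slave", "10.0.0.1", "testing")

# Only the mode and host lines should differ between the two servers.
differing = [i for i, (m, s) in enumerate(zip(master, slave)) if m != s]
print("\n".join(master))
print("\n".join(slave))
```

Generating both fragments from one place is one way to keep the shared secret from drifting apart between the two ini files.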
Now you need to consider how users get to the server and how you can easily allow them to get to the 'working' server in the event of a failure.
For incoming messages you can just set up 'MX' records so that the backup server is listed as a low priority host, e.g.:
your.domain MX=10 mail.your.domain
your.domain MX=20 mail2.your.domain
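In MX records the lowest preference number is tried first, so MX=10 is the primary and MX=20 is the fallback. The short Python sketch below shows that ordering for the two records above; the `delivery_order` helper is hypothetical and only illustrates how a sending server ranks the hosts.

```python
# (preference, host) pairs taken from the MX example above.
mx_records = [
    (20, "mail2.your.domain"),
    (10, "mail.your.domain"),
]


def delivery_order(records):
    """Return hosts in the order a sending server tries them (lowest preference first)."""
    return [host for _, host in sorted(records)]


print(delivery_order(mx_records))
```

With these records, mail2.your.domain only receives incoming mail when mail.your.domain cannot be reached.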
But for user access to the server you have several options:
You can choose to enable config setting mirroring. This causes SurgeMail to send its config from master to slave, and vice versa, whenever config changes are made in the web interface (it does not notice manual changes made by editing the config file).
First make a backup of both ini files, just in case :-)
To enable it, set g_mirror_config "true" on BOTH servers.
Then, ON THE MASTER, issue: tellmail resync_config
Of course, you do not always want to mirror all the settings, especially settings that handle mirroring itself, such as g_mirror_host. You can use g_mirror_config_except to specify settings to be ignored when processing an incoming config. In addition, a number of settings are ignored by default; see g_mirror_config_except for details.
You need to install SurgeMail on a new system, then follow the instructions above in "How do I turn it on?" to add the correct mirror settings to both ini files (old system and new system). Set the new system up as the SLAVE, then...
issue "tellmail resync_config" on master (if using g_mirror_config)
issue "tellmail surgehost_update" on master
issue "tellmail resync_nwauth" on master (if using nwauth)
issue "tellmail resync_fast" on master
issue "tellmail resync_mkdir" on master
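Two of the commands above are conditional, which is easy to get wrong when scripting the procedure. The Python sketch below builds the command list in the documented order; `resync_commands` and its two flags are hypothetical names, and the function only assembles strings, it does not run tellmail.

```python
def resync_commands(uses_mirror_config, uses_nwauth):
    """List the tellmail commands to run on the master, in the documented order."""
    commands = []
    if uses_mirror_config:
        commands.append("tellmail resync_config")   # only with g_mirror_config
    commands.append("tellmail surgehost_update")
    if uses_nwauth:
        commands.append("tellmail resync_nwauth")   # only with nwauth
    commands.append("tellmail resync_fast")
    commands.append("tellmail resync_mkdir")
    return commands


for command in resync_commands(uses_mirror_config=True, uses_nwauth=True):
    print(command)
```

Printing the list first and running the commands by hand on the master keeps the sketch safe to try on a live system.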
Note: Although this feature exists in earlier versions of surgemail, we recommend upgrading to 3.1 before using it as we made significant improvements to the fault tolerance of this feature (it's more idiot proof in version 3.1 :-)
Only the users' mailboxes/folders, nwauth, and surgemail.ini are mirrored. Any files external to these will not be mirrored. On a new system it is sometimes wise to start by duplicating the surgemail root directory /usr/local/surgemail (c:\surgemail) to pick up other odd files you may have tailored. This depends a lot on how much tailoring you've done of your system.
If you are using the CalDAV plug-in, its database is backed up to the mirror once per day. If you are moving the master and want an up-to-date copy for some reason, manually copy the SQLite database from surgemail/scripts/data/caldb.sqlite.
Always check in two ways, first check the status as below, then compare two directories manually to be 'absolutely' sure.
In the status window (near the end) you will see the following information:
Mirror out: Que/sent add=612/611 (3343432 bytes)
Mirror in: Received add=0 del=0 rename=0 failed=0
This shows both halves of the mirroring operation. The "Mirror out:" line shows items queued to be sent to the other system. In add=612/611, the first number is how many additions have been queued and the second (/611) is how many have been successfully sent, so one is still queued. Delete operations are reported the same way as queued/sent (for example, 612 queued and /610 sent would mean 2 are still queued). Obviously these numbers should normally match.
The second line, "Mirror in:" shows how many new items or deletions have arrived from the other system.
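If you want to watch the backlog from a script, the queued/sent pairs can be pulled out of the status text. The Python sketch below assumes the line format matches the sample shown above (name=queued/sent); that format is inferred from the sample only, and `mirror_backlog` is a hypothetical helper, not part of SurgeMail.

```python
import re

# Matches counters written as name=queued/sent, e.g. add=612/611.
COUNTER = re.compile(r"(\w+)=(\d+)/(\d+)")


def mirror_backlog(line):
    """Return {counter_name: queued - sent} for each name=queued/sent pair in the line."""
    return {
        name: int(queued) - int(sent)
        for name, queued, sent in COUNTER.findall(line)
    }


sample = "Mirror out: Que/sent add=612/611 (3343432 bytes)"
print(mirror_backlog(sample))  # {'add': 1}
```

A backlog that stays above zero, or keeps growing, over several checks suggests the other server is unreachable.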
To compare directories do this on both servers, and compare the directory listings:
tellmail path firstname.lastname@example.org
dir [path it returns]/mdir/new
Lastly, issue a 'tellmail resync_fast' and check in the status to see how many corrections it needs to send.
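Comparing two long directory listings by eye is error prone. As a rough sketch, the Python helper below takes the file names from each server's listing and reports what is missing on either side. The `compare_listings` function and the sample file names are hypothetical; how you collect the names (for example by saving the dir output to a file) is up to you.

```python
def compare_listings(master_files, slave_files):
    """Return (missing_on_slave, missing_on_master) as sorted lists of file names."""
    master_set, slave_set = set(master_files), set(slave_files)
    return sorted(master_set - slave_set), sorted(slave_set - master_set)


# Hypothetical file names standing in for the contents of mdir/new on each server.
master_listing = ["msg_0001", "msg_0002", "msg_0003"]
slave_listing = ["msg_0001", "msg_0003"]

missing_on_slave, missing_on_master = compare_listings(master_listing, slave_listing)
print("missing on slave:", missing_on_slave)
print("missing on master:", missing_on_master)
```

Both lists being empty means the two directories hold the same set of file names; it does not compare file contents.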
For internal reasons we needed to establish a master/slave concept, although in almost all respects the two servers are identical and neither is the 'master' in any behavioral sense. For example, if you change something on the slave the change will appear on the master, and vice versa. The one thing you should never do is swap the master/slave settings over, as this will confuse the mirroring software! (It can be done reasonably safely if both servers are stopped at the time, but it's best avoided :-)
We do recommend that you generally avoid doing things on both servers randomly. It's best to make everything go through one server and do all changes etc. on one server, then use the other server purely as a 'hot' backup. That way, if something goes wrong but goes unnoticed (e.g. they get unplugged from each other), you will know which one is in a 'good' state and which one is 'out of date'.
Note: DLIST runs only on the 'master', so in the situation where your master is going to be down for several days, you will need to swap master and slave so that the dlist on the 'slave' will come to life.
How long is a piece of string? :-) The time depends mainly on the number of messages stored on the server, so it is not directly related to the number of users or the size of your mail store.
But as a rough guide, to resync from scratch, you would expect it to take something like:
The mirroring is very forgiving. It will try to continue after a crash, and when one server is down, changes are 'queued' until it reappears. The only time you must issue commands is when one server's disk is lost or reformatted; then you must issue a 'tellmail resync' on the 'good' system.
Mostly because they can't. To implement mirroring it's essential to integrate it into the core mail server code at the design stage, so they are too far down the path to add it.
Most other suppliers offer one of two alternatives instead. They either provide 'file system' level mirroring, which at best is much less efficient and likely to be minutes or hours 'out of date', or they promote the 'shared network drive' approach, even though this clearly fails to duplicate the 'data' and thus is completely ineffective as a fault redundant solution.
Assuming these systems currently run different domains, yes you can. First add the other system's domains to each of the servers, then turn on the mirroring settings. Then issue a tellmail resync on 'both' servers so that each one sends the new domains to the 'other' system.
The best test is to send an email to one system, then read it from the other. You can set up our 'watchdog' utility to do this automatically once an hour so that you will always know if anything goes wrong.
You can also check the mirror section of the 'status' page. The cryptic errors there are often not that important; the key thing to look for is whether the counters showing successfully mirrored items are ticking over.
The best answer we can give here is 'probably'. :-) We've tried to identify everything, but there may be things we've missed. In particular, if you add settings to your config file which refer to non-standard files, those files may not get mirrored. (An alias file setting for a domain would be one example.)
DLIST currently only runs on 'one' of the servers (the master), to avoid mailing list messages being sent twice by mistake :-). Its files are mirrored though, so the data is duplicated.
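If you prefer to script the send-then-read test yourself rather than use the watchdog utility, it might look something like the Python sketch below, which uses only the standard library. This is not how the watchdog is implemented. The function names, the plain (unencrypted) SMTP and IMAP connections, and the use of the INBOX folder are all assumptions for illustration; substitute your own hosts, account, and TLS settings.

```python
import imaplib
import smtplib
import uuid
from email.message import EmailMessage


def build_probe(sender, recipient):
    """Create a test message whose subject carries a unique token."""
    token = uuid.uuid4().hex
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"mirror-probe {token}"
    msg.set_content("Mirror round-trip test message.")
    return token, msg


def send_probe(smtp_host, msg):
    """Deliver the probe to the FIRST server over SMTP."""
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)


def probe_arrived(imap_host, user, password, token):
    """Log in to the OTHER server over IMAP and look for the probe's token."""
    with imaplib.IMAP4(imap_host) as imap:
        imap.login(user, password)
        imap.select("INBOX", readonly=True)
        status, data = imap.search(None, "SUBJECT", f'"{token}"')
        return status == "OK" and bool(data[0].split())
```

A typical run would call `build_probe`, pass the message to `send_probe` with the first server's address, wait a short while for mirroring, then call `probe_arrived` against the second server and raise an alert if it returns False.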
The user database is not mirrored unless you are using NWAUTH and you turn on the setting. If you are using some other user database then you will need to consider if it needs mirroring in some way. Usually in this situation it won't be an issue as it will be a network accessed database anyway.
Please also note that the config mirroring is new and requires SurgeMail 3.1 or later for best results.
There is of course some extra load, and the data does need to be sent between the two systems. However, the load is by no means doubled, as the mirroring occurs at the delivery stage after much pre-processing has occurred (e.g. spam and virus filtering). Also, most mail servers run at about 98% idle, so the extra load is really of no relevance (even for quite large ISP operations). We run mirroring on the servers we host, with 40,000 plus users on a system. So far we've had about 3 RAID/system failures on our own hosting systems where mirroring has 'saved the day' and resulted in no significant loss of data.
You can mirror over a WAN connection, but the round trip time may slow down the mirroring a bit so if the system is very heavily used it may struggle to keep in sync. On most systems this is not a problem. But on very large busy systems this would be a mistake.
For short periods there is no need to swap master/slave. The only thing that doesn't function on the slave is mailing lists.
If the master is dead or is being replaced, and that may take a week, then you may choose to swap them. Do it like this:
1) Stop both servers.
2) On the old slave, change the g_mirror_mode setting from "slave" to "master".
3) If needed, change/swap IP addresses for the servers.
4) Start the new master server (which was the slave).
5) When the old master server is repaired, be sure to set its g_mirror_mode to "slave" before starting it!!!
6) Issue a 'resync_config' and 'resync' on the MASTER (which was the slave) once the new 'slave' is running.
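Steps 2 and 5 both come down to changing one line in surgemail.ini. As a minimal sketch, the Python helper below rewrites the g_mirror_mode value in the text of an ini file and refuses to proceed unless exactly one such line is found. `set_mirror_mode` is a hypothetical helper that works on a string; reading and writing the real file, and stopping the server first, are left to you.

```python
import re

# Matches a line such as: g_mirror_mode "slave"
MODE_LINE = re.compile(r'^(g_mirror_mode\s+)"(master|slave)"', re.MULTILINE)


def set_mirror_mode(ini_text, new_mode):
    """Return ini_text with the g_mirror_mode value replaced by new_mode."""
    if new_mode not in ("master", "slave"):
        raise ValueError("new_mode must be 'master' or 'slave'")
    updated, count = MODE_LINE.subn(
        lambda match: f'{match.group(1)}"{new_mode}"', ini_text
    )
    if count != 1:
        raise ValueError("expected exactly one g_mirror_mode line")
    return updated


old_slave_ini = 'g_mirror_host "10.0.0.1"\ng_mirror_mode "slave"\n'
print(set_mirror_mode(old_slave_ini, "master"))
```

Take a backup of the ini file before editing it, as recommended earlier for config mirroring.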