On-Premises Arbitration Mailboxes and Cloud Archives

An interesting thing happened the other day. I had a client who was proceeding down the normal Hybrid deployment path: they were running Exchange 2010 SP3 UR4 and had just installed DirSync. However, they were now having production issues in their on-premises Exchange environment. The Transport service on several Hub Transport servers was shutting down with multiple errors.

Troubleshooting Process
The first step was to examine the connectivity, protocol and message tracking logs for any irregularities. After combing through the logs, I was able to discern that poison messages were being generated and were jamming up the queues. As soon as a poison message was generated, it would set off Events 10003 and 4999, and this would stop the Transport service on the Exchange 2010 server in question.
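For anyone retracing these steps, here is a minimal sketch of how the two events and the poison queue could be inspected from the Exchange Management Shell. The server name "HUB01" is a placeholder, not a name from the client's environment, and the sketch assumes both events are written to the Application log.

```powershell
# Pull recent occurrences of Events 10003 and 4999 from the Application log.
# Assumption: both events are logged to the Application log on the Hub Transport server.
Get-WinEvent -FilterHashtable @{ LogName = 'Application'; Id = 10003, 4999 } -MaxEvents 20 |
    Format-List TimeCreated, Id, ProviderName, Message

# List the messages currently sitting in the poison queue.
# "HUB01" is a hypothetical server name; substitute your own.
Get-Message -Queue "HUB01\Poison" |
    Format-List Identity, FromAddress, Subject, Status
```

Comparing the timestamps of the events with the messages in the poison queue is one way to tie a specific message to a Transport service crash.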


We also saw an error message that confirmed the link between the poison message and a message addressed to a user who had an online archive.


The poison message also seemed to be generated only when a user in the Recipients field had an online archive. If the message was sent to a distribution group and a member of the group had an online archive, a poison message would be generated. The same would happen if the email was sent to that user alone. There were no other clues.

In order to keep the environment functional while troubleshooting, we turned off Poison Message Detection. The default setting is True.

Once we set this to False, poison messages no longer caused the Transport service to crash.
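The screenshots from the original session are not reproduced here, so the following is a sketch of the change rather than a record of the exact commands we ran. It assumes the Exchange 2010 Management Shell, and "HUB01" is a placeholder server name.

```powershell
# Show the current setting on every Hub Transport server (the default is True).
Get-TransportServer | Format-List Name, PoisonMessageDetectionEnabled

# Temporarily disable poison message detection on one server.
# "HUB01" is a hypothetical server name; substitute your own.
Set-TransportServer -Identity "HUB01" -PoisonMessageDetectionEnabled $false
```

Treat this as a temporary workaround only. The same cmdlet with `$true` restores the default once the root cause is fixed, and the Transport service may need a restart before the change takes effect.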


At first we thought malformed messages were causing the poison messages, and that Exchange 2010, because of its outdated update level, was not able to handle them properly. Older builds of Exchange 2010 frequently had this issue, as can be seen from the fixes in later updates. The customer did not want to perform the upgrade, which would have involved 16 Exchange 2010 servers. We then opened a case with Microsoft.

A call to Microsoft revealed that something was indeed missing in the environment. The customer was missing an arbitration mailbox, specifically the federated email mailbox with the well-known name “FederatedEmail.4c1f4d8b-8179-4148-93bf-00a95fa1e042”.
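A quick way to check for the same gap in another environment is to list the arbitration mailboxes from the Exchange Management Shell. This is a minimal sketch; the output will vary by organization.

```powershell
# List every arbitration mailbox in the organization, with its hosting server and database.
Get-Mailbox -Arbitration | Format-Table Name, ServerName, Database -AutoSize

# Look for the federated email mailbox specifically.
# This returns an error if the mailbox does not exist.
Get-Mailbox -Arbitration -Identity "FederatedEmail.4c1f4d8b-8179-4148-93bf-00a95fa1e042"
```

If the second command fails, or the mailbox is listed against a server or database that no longer exists, the environment is likely in the same state as this client's.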

The problem had two causes. First, the arbitration mailbox had been hosted on an old server that was removed improperly, before we were involved as consultants. Second, Exchange 2010 seems susceptible to poison message issues. Creating an online archive allowed both of these issues to surface at the same time, causing transport and mail flow issues.

The fix is to create a new Federated arbitration mailbox. This mailbox performs background tasks for Exchange Server (2010 as well as 2013) with external federated organizations like Office 365. In particular, the “FederatedEmail.4c1f4d8b-8179-4148-93bf-00a95fa1e042” mailbox performs message arbitration for mailboxes with archives in Office 365. What this means is that without this account, an email to an on-premises account with an Office 365 archive will become a poison message. Normally poison messages would be handled well by Exchange, but the client's Exchange 2010 servers did not handle them gracefully.

Make sure to follow this TechNet Blog Article. Once the mailbox was recreated, all the problems with poison messages and users with Online Archives cleared up.
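As a rough outline of what the recreation involves, here is a sketch under the assumption that the mailbox's Active Directory user account may also be missing. The TechNet article remains the authoritative procedure; verify each step against it before running anything in production.

```powershell
# Step 1 (only if the AD user account for the mailbox is missing):
# run the following from the Exchange setup media in an elevated command prompt,
# which recreates the system accounts in Active Directory.
#   Setup.com /PrepareAD

# Step 2: mailbox-enable the account as an arbitration mailbox.
# Add -Database "<database name>" to control where the mailbox is created.
Enable-Mailbox -Arbitration -Identity "FederatedEmail.4c1f4d8b-8179-4148-93bf-00a95fa1e042"

# Step 3: confirm the mailbox now exists and is hosted on a live database.
Get-Mailbox -Arbitration | Format-Table Name, ServerName, Database -AutoSize
```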

We also turned Poison Message Detection back to ‘True’ and verified that there were no more issues and that no more poison messages were generated.

Further Reading
Use the Shell to re-create the Discovery system mailbox
Recreate Arbitration mailbox in Exchange 2010 – Applies to Exchange 2013 as well.
Mail access issues in a hybrid Exchange deployment with cloud-based archive

