Adventures in Custom DLP Rules – Part One

Custom Data Loss Prevention (DLP) rules are useful for those who want to block, report on, or otherwise act on messages containing data types that the built-in rules do not cover. The built-in DLP rules only cover the more generic situations. In part one of my series on custom DLP rules, I will cover the basic steps for creating a custom DLP rule, which are as follows:

  • Create XML file.
  • Import XML into Exchange 2013 or Office 365.
  • Create new DLP Policy.
  • Create rule based off the DLP policy.

Pretty simple.

However, as you are about to find out, the most difficult part will be the creation of the XML file that is needed to define the policy settings. There are numerous resources out there to help with the creation of the XML file. What I have found to be most useful when creating the file is the following:

  • GUID Generator – the XML file has a few GUIDs that need to be replaced with new ones to ensure uniqueness of GUIDs. A good GUID generator can be found at this site.
    *** Note that GUIDs can be generated in PowerShell as well (see part two of this series for more information).

  • RegEx checker – RegEx is the syntax that allows the Exchange DLP feature to know what to look for in a message. Regexr is a good website to use for this purpose. The site allows testing of the regex syntax and can determine if test text matches the criteria you specified.
  • Good Notepad Editor – this is key for being able to properly work on XML files. Notepad++ is a prime example of this type of editor. While I believe it has some UTF16 limitations, UTF8 formatting works fine.

XML Creation
Where do we start? A good place is a Microsoft TechNet article which contains some sample code for a DLP XML file. The sample XML looks like this:

<?xml version="1.0" encoding='UTF-8'?>
<RulePackage xmlns="">
    <RulePack id="b4b4c60e-2ff7-47b2-a672-86e36cf608be">
        <Version major="1" minor="0" build="0" revision="0"/>
        <Publisher id="7ea13c35-0e58-472a-b864-5f2e717edec6"/>
        <Details defaultLangCode="en-us">
            <LocalizedDetails langcode="en-us">
                <PublisherName>DLP by the Cloud Master</PublisherName>
                <Name>Custom SSN Classification</Name>
                <Description>Custom SSN Classification</Description>
            </LocalizedDetails>
        </Details>
    </RulePack>
    <Rules>
        <!-- SSN -->
        <Entity id="0ba2cb9d-4ef1-4fdd-bd16-c3f431363d4b" patternsProximity="300" recommendedConfidence="75">
            <Pattern confidenceLevel="85">
                <IdMatch idRef="FormattedSSN" />
            </Pattern>
            <Pattern confidenceLevel="85">
                <IdMatch idRef="UnformattedSSN" />
            </Pattern>
        </Entity>
        <Regex id="FormattedSSN"><!-- your RegEx for formatted SSNs goes here --></Regex>
        <Regex id="UnformattedSSN"><!-- your RegEx for unformatted SSNs goes here --></Regex>
        <LocalizedStrings>
            <Resource idRef="0ba2cb9d-4ef1-4fdd-bd16-c3f431363d4b">
                <Name default="true" langcode="en-us">
                    Custom Social Security Number
                </Name>
                <Description default="true" langcode="en-us">
                    A custom classification for detecting Social Security numbers
                </Description>
            </Resource>
        </LocalizedStrings>
    </Rules>
</RulePackage>

What exactly is all of this? Let's start with the easy part: GUIDs. There are three distinct GUIDs in this file, and one of them is listed twice. Generate three new GUIDs, either with the website or with PowerShell, and use them to replace the existing ones at these locations within the file:

  • RulePack id
  • Publisher id
  • Entity id, which is repeated as the Resource idRef (both must carry the same GUID)
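The GUID swap can also be scripted. What follows is only a minimal sketch in Python rather than PowerShell, and the function name `refresh_guids` is made up for illustration. It assumes every GUID in the file is written in the usual 8-4-4-4-12 hexadecimal form, and it gives a GUID that appears twice the same new value in both places:

```python
import re
import uuid

# Matches GUIDs in the 8-4-4-4-12 hexadecimal form used in the sample XML.
GUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def refresh_guids(xml_text):
    """Replace every distinct GUID in xml_text with a newly generated one.

    A GUID that appears more than once (such as the Entity id and the
    matching Resource idRef) maps to the same new value each time.
    Returns the rewritten text and the old-to-new mapping.
    """
    mapping = {}

    def swap(match):
        old = match.group(0).lower()
        if old not in mapping:
            mapping[old] = str(uuid.uuid4())
        return mapping[old]

    return GUID_RE.sub(swap, xml_text), mapping


# Trimmed-down lines from the sample file, used only to demonstrate the swap.
sample = (
    '<RulePack id="b4b4c60e-2ff7-47b2-a672-86e36cf608be">\n'
    '<Publisher id="7ea13c35-0e58-472a-b864-5f2e717edec6"/>\n'
    '<Entity id="0ba2cb9d-4ef1-4fdd-bd16-c3f431363d4b">\n'
    '<Resource idRef="0ba2cb9d-4ef1-4fdd-bd16-c3f431363d4b">\n'
)
new_text, mapping = refresh_guids(sample)
print(new_text)
```

Because the mapping is keyed on the old value, the Entity id and the Resource idRef stay in sync without any manual copy and paste.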


Once the GUIDs are replaced, we can concentrate on the name and descriptions to help better identify the purpose of this DLP policy.

  • Line 8 (PublisherName) – who created the rule (a person, company, or department, for example).
  • Line 9 (Name) – the name of your DLP policy (Social Security, Bank Account Numbers, etc.).
  • Line 10 (Description) – a description of the policy, as short or as long as needed.
  • Line 18 (IdMatch idRef) – the name of the condition, which is referenced further down. Repeat for each condition.
  • Line 24 (Regex id) – the same name used on line 18, and likewise for each subsequent condition.
  • Line 33 (Name under Resource) – the name of the DLP policy.

The last section to concentrate on is the set of RegEx conditions (the Regex elements) that the rule uses to determine when to trigger.

For the RegEx code, the best option is to test it before trying to add it to the XML file.


In this case I have an SSN rule that looks for a pattern of ###-##-#### or ### ## ####. There are plenty of options for this pattern, and you can search the Internet for the various flavors and iterations; however, this one works for my client. Once you have your RegEx syntax, test it against live text to see if there are any matches. Once the RegEx syntax is perfected, place it into the XML file. The completed XML file can then be imported into Exchange 2013 on-premises servers or Office 365.
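For those who prefer a script over a website, the same kind of test can be sketched in a few lines of Python. The pattern below is a hypothetical stand-in, not the RegEx used for my client: it assumes three digits, a dash or space, two digits, the same separator again, and four digits. Python's `re` engine is also only an approximation of how Exchange evaluates the expression, so treat a pass here as a first check rather than a guarantee.

```python
import re

# Hypothetical pattern (an assumption, not the article's RegEx): three digits,
# a separator, two digits, the same separator again, then four digits.
# The separator is either a dash or a space.
SSN_RE = re.compile(r"\b\d{3}([- ])\d{2}\1\d{4}\b")


def find_ssn_like(text):
    """Return every substring of text that matches the SSN-style pattern."""
    return [m.group(0) for m in SSN_RE.finditer(text)]


samples = [
    "My number is 123-45-6789.",
    "Alternate form: 123 45 6789",
    "No separators: 123456789",
    "Mixed separators: 123-45 6789",
]
for s in samples:
    print(s, "->", find_ssn_like(s))
```

The backreference `\1` is a design choice: it rejects mixed separators such as `123-45 6789`. If those should match as well, replacing `\1` with `[- ]` relaxes the check.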

To import the rule into Exchange 2013, simply run this one-liner, which will then make this DLP policy available for transport rules:

New-ClassificationRuleCollection -FileData ([Byte[]]$(Get-Content -Path "<file name and full path>" -Encoding Byte -ReadCount 0))

To import the rule into Office 365, we need to use the Azure AD Module for PowerShell and run these lines:

$LiveCred = Get-Credential
$Session = New-PSSession -Name ExchangeOnline -ConfigurationName Microsoft.Exchange -ConnectionUri "<Exchange Online connection URI>" -Credential $LiveCred -Authentication Basic -AllowRedirection
Import-PSSession $Session
New-ClassificationRuleCollection -FileData ([Byte[]]$(Get-Content -Path "C:\temp\ssn2.xml" -Encoding Byte -ReadCount 0))

Once the import is successful, you will see this:

You can verify the collection via PowerShell with the Get-ClassificationRuleCollection cmdlet.



Now that the XML file is imported, we can create a DLP rule that references this:

  • Create a new DLP rule.
  • Select Sensitive Information Type.
  • Select the custom rule you created.
  • Click OK.


Once the rule is created, it can be tested with a new email, where a Policy Tip should appear.


When trying to import the XML file into Office 365 I was consistently getting an encoding error.

Thinking I had an issue with the way the file was saved, I tried Notepad, WordPad and Notepad++ without success. After doing some line-by-line verification, I found that my RegEx syntax was not quite correct. Once this was changed, I was able to import the XML file.

Another issue is the format of the XML file itself. To test the formatting, simply open the file with your favorite web browser. If there is an issue, the page will either be blank or show an error describing the problem.

If the XML file is correct, the browser will display the XML tree.
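The browser test can also be scripted. This is a minimal sketch using Python's standard `xml.etree.ElementTree` parser, with `check_well_formed` being an illustrative name. It only confirms that the XML is well-formed; it does not validate the file against the DLP rule package schema, so a file that passes here can still be rejected on import.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, error message)."""
    try:
        ET.fromstring(xml_text)
        return True, None
    except ET.ParseError as err:
        return False, str(err)


# Two tiny fragments for demonstration: the second is missing </Regex>.
good = r'<Rules><Regex id="FormattedSSN">\d{3}</Regex></Rules>'
bad = r'<Rules><Regex id="FormattedSSN">\d{3}</Rules>'

print(check_well_formed(good))
print(check_well_formed(bad))
```

For a real file, passing the raw bytes (for example `open(path, "rb").read()`) lets the parser honor the encoding declared on the first line of the XML.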


Final Word
There is so much more that can be done with the custom DLP policies and this article only scratches the surface. Good sources of information:


9 thoughts on “Adventures in Custom DLP Rules – Part One”

  1. Pingback: Adventures in Custom DLP Rules – Part Two | Just A UC Guy
  2. Pingback: Regular Expression Bölüm 3 – Exchange 2013 Data Loss Prevention | IT diaries by barisca & seldaa
  3. Pingback: Kurallı İfadelerle Exchange 2013 Veri Sızıntısı Önleme (DLP) – Sibergah
  4. Hi, thank you for this. It's very helpful information.

    I wish to know if there is a way to create a policy which checks for a 7 digit value match from a dictionary. We have certain 7 digit numbers and we want to make a policy that if any of those numbers show up in an email, it should be blocked.

    We are using RSA DLP currently which allows us to import a text file which looks something like this:

    Can Office DLP provide this functionality?

    Thank you

    • Yes, you can create RegEx to handle these numbers. RegEx usually is about matching just patterns, but it can be coded to handle a series of numbers. The RegEx search pattern would look like this:
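As a rough illustration of the idea in this reply, a pattern for a fixed list of numbers can be assembled as an alternation. The three values below are hypothetical placeholders for the real list, and it is an assumption that the DLP engine accepts the `\b` and `(?:...)` constructs exactly as Python does, so test the generated pattern before putting it in the XML file.

```python
import re


def build_number_pattern(numbers):
    """Build a regex that matches any of the given numbers as whole tokens."""
    alternatives = "|".join(re.escape(n) for n in numbers)
    return r"\b(?:" + alternatives + r")\b"


# Hypothetical 7-digit values standing in for the real list.
numbers = ["1234567", "7654321", "5550100"]
pattern = build_number_pattern(numbers)
print(pattern)

# A listed number matches; a longer number that merely contains one does not.
print(bool(re.search(pattern, "ref 7654321 attached")))
print(bool(re.search(pattern, "ref 87654321 attached")))
```

The word boundaries keep a listed value from matching inside a longer run of digits. A very long list may be better split across several Regex elements, though that is a judgment call rather than a documented requirement.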


  5. I got this error after loading my regex expression in xml in a DLP.

    The following rule(s) reference more than 20 distinct regular expression text processor(s) and may impact performance:

    can anyone help in this??

      • am using regex expressions for c language and java language for all the syntaxes.. how do import that into xml?? getting error as The following rule(s) reference more than 20 distinct regular expression text processor(s) and may impact performance:

      • have created a policy to identify syntaxes in c and Java using regex in a xml file. but was getting error as “The following rule(s) reference more than 20 distinct regular expression text processor(s) and may impact performance”. Do you have any idea about this?
