Adventures in Custom DLP Rules – Part One

Custom Data Loss Prevention (DLP) rules are useful for those wishing to block, report, or otherwise work with messages that contain data types not covered by the built-in rules, which handle only the more generic situations. In part one of my series on custom DLP rules I will cover the basic steps for creating one, which are as follows:

  • Create XML file.
  • Import XML into Exchange 2013 or Office 365.
  • Create new DLP Policy.
  • Create rule based off the DLP policy.


Pretty simple.

However, as you are about to find out, the most difficult part will be the creation of the XML file that is needed to define the policy settings. There are numerous resources out there to help with the creation of the XML file. What I have found to be most useful when creating the file is the following:

  • GUID Generator – the XML file has a few GUIDs that need to be replaced with new ones to ensure uniqueness of GUIDs. A good GUID generator can be found at this site.
    *** Note that GUIDs can be generated in PowerShell as well (see part two of this series for more information).

  • RegEx checker – RegEx is the syntax that allows the Exchange DLP feature to know what to look for in a message. Regexr is a good website to use for this purpose. The site allows testing of the regex syntax and can determine if test text matches the criteria you specified.
  • Good Notepad Editor – this is key for being able to properly work on XML files. Notepad++ is a prime example of this type of editor. While I believe it has some UTF16 limitations, UTF8 formatting works fine.

XML Creation
Where do we start? A good place is a Microsoft TechNet article which contains some sample code for a DLP XML file. The sample XML looks like this:

<?xml version="1.0" encoding='UTF-8'?> 
<RulePackage xmlns="http://schemas.microsoft.com/office/2011/mce"> 
    <RulePack id="b4b4c60e-2ff7-47b2-a672-86e36cf608be"> 
        <Version major="1" minor="0" build="0" revision="0"/> 
        <Publisher id="7ea13c35-0e58-472a-b864-5f2e717edec6"/> 
        <Details defaultLangCode="en-us"> 
            <LocalizedDetails langcode="en-us"> 
                <PublisherName>DLP by the Cloud Master</PublisherName> 
                <Name>Custom SSN Classification</Name> 
                <Description>Custom SSN Classification</Description> 
            </LocalizedDetails> 
        </Details> 
    </RulePack> 
    <Rules> 
        <!-- SSN -->     
        <Entity id="0ba2cb9d-4ef1-4fdd-bd16-c3f431363d4b" patternsProximity="300" recommendedConfidence="75"> 
            <Pattern confidenceLevel="85"> 
             <IdMatch idRef="FormattedSSN" /> 
            </Pattern>             
            <Pattern confidenceLevel="85"> 
             <IdMatch idRef="UnformattedSSN" /> 
            </Pattern> 
        </Entity> 
        <Regex id="FormattedSSN"> 
        (?!\b(\d)\1+-(\d)\1+-(\d)\1+\b)(?!123-45-6789|219-09-9999|078-05-1120)(?!666|000|9\d{2})\d{3}-(?!00)\d{2}-(?!0{4})\d{4} 
        </Regex> 
        <Regex id="UnformattedSSN"> 
        (?!\b(\d)\1+\b)(?!123456789|219099999|078051120)(?!666|000|9\d{2})\d{3}(?!00)\d{2}(?!0{4})\d{4} 
        </Regex> 
        <LocalizedStrings> 
            <Resource idRef="0ba2cb9d-4ef1-4fdd-bd16-c3f431363d4b"> 
                <Name default="true" langcode="en-us"> 
                    Custom Social Security Number 
                </Name> 
                <Description default="true" langcode="en-us"> 
                    A custom classification for detecting Social Security numbers 
                </Description> 
            </Resource> 
        </LocalizedStrings> 
    </Rules> 
</RulePackage> 

What exactly is all of this? Let’s start with the easy part: GUIDs. There are three distinct GUIDs in this file, and one of them appears twice. To generate new ones we can use either the website or PowerShell. Once we have three new GUIDs, we need to replace the originals at these locations within the file: the RulePack id, the Publisher id, and the Entity id, which is repeated as the Resource idRef under LocalizedStrings:

XMLSample
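The GUID swap can also be scripted. The following is a minimal sketch in Python rather than PowerShell, assuming the GUIDs appear in the usual 8-4-4-4-12 hexadecimal form; the function name replace_guids and the shortened SAMPLE fragment are illustrative and not part of any Exchange tooling. It gives each distinct GUID one fresh replacement, so the Entity id and the Resource idRef stay in sync.

```python
import re
import uuid

# Matches GUIDs written as 8-4-4-4-12 hexadecimal digits.
GUID_RE = re.compile(r"[0-9a-fA-F]{8}-(?:[0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}")


def replace_guids(xml_text: str) -> str:
    """Swap every distinct GUID for a new random one.

    A GUID that occurs more than once receives the same replacement
    each time, so ids and the idRefs that point at them still match.
    """
    mapping = {}

    def fresh(match):
        old = match.group(0).lower()
        if old not in mapping:
            mapping[old] = str(uuid.uuid4())
        return mapping[old]

    return GUID_RE.sub(fresh, xml_text)


# Shortened fragment of the sample file (illustration only, not a full rule pack).
SAMPLE = """<RulePack id="b4b4c60e-2ff7-47b2-a672-86e36cf608be">
<Publisher id="7ea13c35-0e58-472a-b864-5f2e717edec6"/>
<Entity id="0ba2cb9d-4ef1-4fdd-bd16-c3f431363d4b">
<Resource idRef="0ba2cb9d-4ef1-4fdd-bd16-c3f431363d4b">"""

if __name__ == "__main__":
    print(replace_guids(SAMPLE))
```

Running the script prints the fragment with three new GUIDs, the last of which appears on both the Entity and Resource lines.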

Once the GUIDs are replaced, we can concentrate on the name and descriptions to help better identify the purpose of this DLP policy.

  • Line 8 (PublisherName) – who created the rule (a person, company, or department, for example).
  • Line 9 (Name) – the name of your DLP policy (Social Security, Bank Account Numbers, etc.).
  • Line 10 (Description) – a description of the policy, as short or as long as needed.
  • Line 18 (IdMatch idRef) – the name of the condition, which is referenced further down. Repeat for each condition.
  • Line 24 (Regex id) – the same name as on line 18; repeat for each subsequent condition name.
  • Line 33 (Name under Resource) – the name of the DLP policy.
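Because the names on lines 18 and 24 have to match, a quick cross-check can catch a mistyped reference before import. The sketch below is Python, not part of Exchange, and assumes every IdMatch points at a Regex element, as in the sample; real rule packs can reference other element types, which this check would flag incorrectly. The function name check_references and the trimmed SAMPLE are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the RulePackage element of the sample file.
NS = {"mce": "http://schemas.microsoft.com/office/2011/mce"}

# Trimmed version of the sample, kept only for illustration.
SAMPLE = """<RulePackage xmlns="http://schemas.microsoft.com/office/2011/mce">
  <Rules>
    <Entity id="0ba2cb9d-4ef1-4fdd-bd16-c3f431363d4b">
      <Pattern confidenceLevel="85"><IdMatch idRef="FormattedSSN"/></Pattern>
    </Entity>
    <Regex id="FormattedSSN">\\d{3}-\\d{2}-\\d{4}</Regex>
    <LocalizedStrings>
      <Resource idRef="0ba2cb9d-4ef1-4fdd-bd16-c3f431363d4b"/>
    </LocalizedStrings>
  </Rules>
</RulePackage>"""


def check_references(xml_text: str) -> list:
    """Return a list of idRef values that do not resolve to a defined id."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    rules = root.find("mce:Rules", NS)
    if rules is None:
        return ["no Rules element found"]

    regex_ids = {r.get("id") for r in rules.findall("mce:Regex", NS)}
    entity_ids = {e.get("id") for e in rules.findall("mce:Entity", NS)}

    problems = []
    # Every IdMatch should name a Regex defined in the same file.
    for id_match in rules.iterfind(".//mce:IdMatch", NS):
        if id_match.get("idRef") not in regex_ids:
            problems.append("IdMatch -> " + str(id_match.get("idRef")))
    # Every Resource should name an Entity defined in the same file.
    for resource in rules.iterfind(".//mce:Resource", NS):
        if resource.get("idRef") not in entity_ids:
            problems.append("Resource -> " + str(resource.get("idRef")))
    return problems


if __name__ == "__main__":
    print(check_references(SAMPLE) or "all references resolve")
```

An empty result means every reference in the file points at something that exists; any entry in the list names the idRef that needs attention.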

The last section to concentrate on is the RegEx conditions that determine when the rule triggers:

RegEx

For the RegEx code, the best option is to test it before trying to add it to the XML file:

RegExChecker

In this case I have an SSN rule that is looking for a pattern of ###-##-#### or ### ## ####. There are plenty of options for this pattern and you can search the Internet for the various flavors and iterations. However, this one works for my client. Once you have your RegEx syntax, test it against live text to see whether there are any matches. Once the RegEx syntax is perfected, place it into the XML file. Once the XML file is completed, it can be imported into Exchange 2013 on-premises servers or Office 365.
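The same kind of test can be run offline. As a rough sketch, the Python below compiles the FormattedSSN expression from the sample XML above (not my client's pattern) and tries it against a few strings. Python's regex engine is not the one Exchange uses, so treat this as a first sanity check only; the helper name find_ssn is made up for this example.

```python
import re

# The FormattedSSN expression from the sample XML, split across lines for readability.
FORMATTED_SSN = re.compile(
    r"(?!\b(\d)\1+-(\d)\1+-(\d)\1+\b)"            # reject one digit repeated throughout
    r"(?!123-45-6789|219-09-9999|078-05-1120)"    # reject well-known dummy numbers
    r"(?!666|000|9\d{2})\d{3}-(?!00)\d{2}-(?!0{4})\d{4}"
)


def find_ssn(text: str):
    """Return the first SSN-like match in text, or None if there is none."""
    match = FORMATTED_SSN.search(text)
    return match.group(0) if match else None


if __name__ == "__main__":
    for sample in ("My number is 123-45-6788.", "123-45-6789", "000-12-3456", "111-11-1111"):
        print(repr(sample), "->", find_ssn(sample))
```

Only the first string should produce a match; the other three are excluded by the negative lookaheads at the start of the expression.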

To import the rule into Exchange 2013, simply run this one liner which will then make this DLP policy available for Transport rules:

New-ClassificationRuleCollection -FileData ([Byte[]]$(Get-Content -Path "<file name and full path>" -Encoding Byte -ReadCount 0))

To import the rule into Office 365, we need to open a remote PowerShell session to Exchange Online and run these lines:

$LiveCred = Get-Credential
$Session = New-PSSession -name ExchangeOnline -ConfigurationName Microsoft.Exchange -ConnectionUri https://ps.outlook.com/powershell/ -Credential $LiveCred -Authentication Basic -AllowRedirection
Import-PSSession $Session
New-ClassificationRuleCollection -FileData ([Byte[]]$(Get-Content -Path "C:\temp\ssn2.xml" -Encoding Byte -ReadCount 0))

Once the import is successful, you will see this:

Import-Success

You can verify the collection via PowerShell:

RuleVerify1

RuleVerify2

Now that the XML file is imported, we can create a DLP rule that references this:

Create a new DLP rule:

DLPRule1

Select Sensitive Information Type:

DLPrule2

Select the custom rule you created:

DLPrule3

Click OK.

DLPrule4

Once the rule is created, the DLP rule can be tested with a new email (See the Policy Tip that appears):

DLPrule5

NOTES
When trying to import the XML file into Office 365 I was consistently getting an encoding error.

EncodingError
Thinking I had an issue with the way the file was saved, I tried Notepad, WordPad and Notepad++ without success. After doing some line-by-line verification, I found that my RegEx syntax was not quite correct. Once this was changed, I was able to import the XML file.

Another issue is the format of the XML file itself. To test the formatting, simply open the file with your favorite web browser. If there is an issue, the page will either be blank or some information will appear:

Badxml

If the XML file is correct, something like this will appear:

goodxml
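The browser test can be reproduced in a script as well. Here is a small sketch in Python, assuming only that the file should be well-formed XML; it does not validate against the rule pack schema, and the function name xml_problem is illustrative. It also shows one way a regex can break the file: a literal < inside a Regex element is not valid XML unless it is escaped.

```python
import xml.etree.ElementTree as ET


def xml_problem(xml_bytes: bytes):
    """Return the parser's error message, or None if the XML is well-formed."""
    try:
        ET.fromstring(xml_bytes)
    except ET.ParseError as err:
        return str(err)
    return None


if __name__ == "__main__":
    good = b"<Rules><Regex id='a'>\\d{3}-\\d{2}-\\d{4}</Regex></Rules>"
    mismatched = b"<Rules><Regex id='a'>\\d{3}</Rules>"
    unescaped = b"<Rules><Regex id='a'>(?<!\\d)\\d{3}</Regex></Rules>"
    for label, data in (("good", good), ("mismatched", mismatched), ("unescaped", unescaped)):
        print(label, "->", xml_problem(data))
```

To check a real file, read it as bytes and pass the result in, for example xml_problem(open("ssn.xml", "rb").read()), where ssn.xml is a placeholder file name.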

Final Word
There is so much more that can be done with the custom DLP policies and this article only scratches the surface. Good sources of information:

http://technet.microsoft.com/en-us/library/dn781122(v=exchg.150).aspx
http://blogs.technet.com/b/govcloud/archive/2014/04/15/dlp-creating-custom-rules.aspx#.VGzf2_9OUiR


10 thoughts on “Adventures in Custom DLP Rules – Part One”

  4. Hi, thank you for this. It's very helpful information.

    I wish to know if there is a way to create a policy which checks for a 7 digit value match from a dictionary. We have certain 7 digit numbers and we want to make a policy that if any of those numbers show up in an email, it should be blocked.

    We are using RSA DLP currently which allows us to import a text file which looks something like this:
    1264967
    2052781
    1506868
    2044661
    1551890
    2137657
    1855740
    1336706
    1484340
    1653824
    1823220
    1964979
    1298132

    Can Office DLP provide this functionality?

    Thank you

    • Yes, you can create RegEx to handle these numbers. RegEx usually is about matching just patterns, but it can be coded to handle a series of numbers. The RegEx search pattern would look like this:

      “^(1264967|2052781|1506868|2044661|1551890|2137657|1855740|1336706|1484340|1653824|1823220|1964979|1298132)$”

  5. I got this error after loading my regex expressions in XML into a DLP policy:

    The following rule(s) reference more than 20 distinct regular expression text processor(s) and may impact performance:

    Can anyone help with this?

      • I am using regex expressions that cover all of the syntax of the C and Java languages. How do I import that into XML? I am getting the error “The following rule(s) reference more than 20 distinct regular expression text processor(s) and may impact performance”.

      • I have created a policy to identify C and Java syntax using regex in an XML file, but I was getting the error “The following rule(s) reference more than 20 distinct regular expression text processor(s) and may impact performance”. Do you have any idea about this?

