Website Analytics

Regular Expressions (Regex) in Google Analytics

What are Regular Expressions?

A regular expression (also known as regex) is a special text string for describing a search pattern.
Searching with regular expressions enables you to get results with just one search instead of many searches.

Why use Regular Expressions?

In Google Analytics, you can use Regular Expressions to –

  • Create filters
  • Create one goal that matches multiple goal pages
  • Fine-tune your funnel steps so that you can get exactly what you need

Regular Expression Characters

This article covers 13 regular expression characters used in Google Analytics, including combinations of the most common ones:

Regular Expressions for Google Analytics

1. Backslash : (\)

  • A backslash escapes a character.
  • It turns a special regular expression character into an ordinary, plain character.

For example:

  • Suppose /folder?pid=123 is your goal page. The problem we have is that the question mark already has another use in Regular Expressions – but we need it to be an ordinary question mark. (We need it to be plain text.)
    We can do it like this:
    /folder\?pid=123
    The backslash in front of the question mark turns it into a plain question mark.
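To see what the backslash changes, here is a minimal sketch in Python. It assumes Python's `re.search` is a fair stand-in for Google Analytics, which by default matches a pattern anywhere in the field unless it is anchored; the two regex engines may differ in edge cases.

```python
import re

goal_page = "/folder?pid=123"

# Unescaped: "?" makes the preceding "r" optional, so the literal "?" in the URL is never matched.
unescaped = re.search(r"/folder?pid=123", goal_page)

# Escaped: "\?" is a plain question mark, so the whole URL matches.
escaped = re.search(r"/folder\?pid=123", goal_page)

print(unescaped)        # None
print(escaped.group())  # /folder?pid=123

# re.escape() adds the backslashes for you when a whole string should be treated as plain text.
print(re.search(re.escape(goal_page), goal_page) is not None)  # True
```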

2. Pipe : (|)

  • The pipe symbol is the simplest one, and it means OR.
  • It is used to indicate that one of several possible choices can match.

For example:

  • Coke|Pepsi
    A soft-drink blog might use that expression with their Google Analytics keyword report to find all examples of searches that came to their blog using either the keyword Coke or the keyword Pepsi.
  • Let’s say you have two thank-you pages, and you need to roll them up into one goal. The first one is named thanks, and the second is named confirmation.
    You could create your goal like this: confirmation|thanks
    That says, match either of those pages.
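A quick Python check of both pipe examples follows. The page paths and keywords are hypothetical, and `re.search` stands in for Google Analytics' match-anywhere behaviour (an assumption).

```python
import re

# Hypothetical page paths and keywords, for illustration only.
pages = ["/checkout/thanks", "/signup/confirmation", "/checkout/cart"]
keywords = ["Coke Zero", "Pepsi Max", "Sprite"]

# One goal pattern covers both thank-you pages.
goal_pages = [p for p in pages if re.search(r"confirmation|thanks", p)]
# One keyword filter covers both brands.
brand_keywords = [k for k in keywords if re.search(r"Coke|Pepsi", k)]

print(goal_pages)      # ['/checkout/thanks', '/signup/confirmation']
print(brand_keywords)  # ['Coke Zero', 'Pepsi Max']
```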

3. Question Mark : (?)

  • The question mark is used to denote that either zero or one of the previous item should be matched. (“The previous item” is the character, or parenthesized group, that comes right before the question mark.)

For example:

  • Let’s say that you have an economics website and you only want to look at the referrers that have the word “labor” in their title. But some of those referrers come from countries where they spell it “labour”. You could create a filter like this: labou?r
    It will match “labour” (which does have a “u”, the previous item) and “labor” (which has zero of the previous item, i.e. no “u”).
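Here is a minimal Python sketch of the labou?r filter, run against hypothetical referrer titles, with `re.search` assumed as a stand-in for Google Analytics' unanchored matching.

```python
import re

# Hypothetical referrer titles.
titles = ["US labor statistics", "UK labour market", "labr typo", "laboratory equipment"]

# "u?" makes the u optional; every other character is still required.
hits = [t for t in titles if re.search(r"labou?r", t)]
print(hits)  # ['US labor statistics', 'UK labour market', 'laboratory equipment']
```

The last hit is a reminder that an unanchored pattern also matches inside longer words such as laboratory.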

4. Parentheses : ()

  • Parentheses allow you to group characters together.

For example:

  • /folder(one|two)/thanks
    This regular expression would match two URLs, /folderone/thanks and /foldertwo/thanks.
    Here we are allowing the Regular Expression to match either the thanks page in folderone or the thanks page in foldertwo.
  • If a website has three thank-you pages: /thanks and /thankyou and /thanksalot
    If we only want the /thanks and the /thanksalot pages to be part of our goal, we could do it like this: /thanks(alot)?
    This means the target string must include /thanks, but alot is optional, so it matches both /thanks and /thanksalot. The /thankyou page will never be included because there is no s after thank in its URL, so it doesn’t contain /thanks, the beginning of this regex.

NOTE: The pipe symbol (|) and parentheses often go together.
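Both parentheses examples can be checked in Python. The paths are hypothetical, and `re.search` is assumed to behave like Google Analytics' match-anywhere default.

```python
import re

folder_goal = re.compile(r"/folder(one|two)/thanks")
thanks_goal = re.compile(r"/thanks(alot)?")

folder_results = {p: bool(folder_goal.search(p))
                  for p in ["/folderone/thanks", "/foldertwo/thanks", "/folderthree/thanks"]}
thanks_results = {p: bool(thanks_goal.search(p))
                  for p in ["/thanks", "/thanksalot", "/thankyou"]}

print(folder_results)  # {'/folderone/thanks': True, '/foldertwo/thanks': True, '/folderthree/thanks': False}
print(thanks_results)  # {'/thanks': True, '/thanksalot': True, '/thankyou': False}
```

Because the match is unanchored, /thanks on its own would already match /thanksalot. Anchoring the pattern as ^/thanks(alot)?$ is what keeps out longer URLs such as /thanks/old.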

5. Square Brackets : ([])

  • Square brackets are used to define a set of characters.
    Any one of the characters within the brackets can be matched.

For example:

  • [Dd]istilled can be used to match both distilled and Distilled.
    p[aiu]n will match pan, pin and pun. But it will not match pain, because that would require us to use two items from the [aiu] list, and that is not allowed.

NOTE: Characters that are usually special, like $ and ?, are no longer special inside square brackets. The exceptions are the dash (-), the caret (^) and the backslash (\).
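This short Python sketch demonstrates the square-bracket examples. `re.search` stands in for Google Analytics here, which is an assumption.

```python
import re

distilled = bool(re.search(r"[Dd]istilled", "Distilled"))
print(distilled)  # True

words = ["pan", "pin", "pun", "pain", "pen"]
# A bracket set matches exactly ONE character, so "pain" (two vowels) is rejected.
p_words = [w for w in words if re.search(r"p[aiu]n", w)]
print(p_words)  # ['pan', 'pin', 'pun']

# Inside brackets, $ and ? lose their special meaning and match literally.
literal_chars = re.findall(r"[$?]", "Price $5?")
print(literal_chars)  # ['$', '?']
```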

6. Dash : (-)

  • Dashes are used to create a range of items, such as all the letters from a to l.
  • Square brackets can be used with dashes to create a powerful list.

For example:

  • Instead of creating a list like this [abcdefghijkl], you can create it like this: [a-l], and it means the same thing – only one letter out of the list gets matched.
  • [a-z] – it will create a list of all lower-case letters in the English alphabet
  • [a-zA-Z0-9] – it will create a list of all lower-case and upper-case letters and digits.
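A quick Python check shows that a range is only shorthand for the longer list and that each match consumes a single character. This is an illustration in Python's `re`, not Google Analytics itself.

```python
import re
import string

# [a-l] is shorthand for [abcdefghijkl]: both accept exactly the same single characters.
same_set = all(
    bool(re.fullmatch(r"[a-l]", c)) == bool(re.fullmatch(r"[abcdefghijkl]", c))
    for c in string.ascii_letters + string.digits
)
print(same_set)  # True

# Only one character from the range is matched at a time.
range_hits = re.findall(r"[a-l]", "hello")
print(range_hits)  # ['h', 'e', 'l', 'l']

# Letters of either case plus digits; the colon, space and hyphen are skipped.
alnum = re.findall(r"[a-zA-Z0-9]", "ID: x9-Q")
print(alnum)  # ['I', 'D', 'x', '9', 'Q']
```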

7. Curly Brackets / Braces : ({})

  • Curly brackets or braces are used to define a specific number or range of repetitions.
  • Braces repeat the last “piece” of information a specific number of times.

(When there are two numbers in the braces, such as {x,y}, it means: repeat the last “item” at least x times and no more than y times. When there is only one number in the braces, such as {z}, it means: repeat the last item exactly z times.)

For example:

  • Lots of companies want to take all visits from their own IP addresses out of their analytics. So let’s say that their IP addresses go from 123.145.167.0 to 123.145.167.99.
    Using braces, our regular expression would be: 123\.145\.167\.[0-9]{1,2}
    Here we used a backslash to turn the special character dot into an everyday character dot, we used brackets as well as dashes to define the set of allowable choices, i.e. the last “item”, and we used braces to determine how many digits could be in the final part of the IP address.
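The following Python sketch tests the IP pattern. It assumes Google Analytics matches anywhere in the field, which `re.search` mimics. Under that assumption the unanchored pattern also matches 123.145.167.199, because that string contains 123.145.167.19. Adding ^ and $ (covered later in this article) restricts it to the intended block.

```python
import re

loose = re.compile(r"123\.145\.167\.[0-9]{1,2}")
# ^ and $ force the whole address to match, not just a prefix of it.
anchored = re.compile(r"^123\.145\.167\.[0-9]{1,2}$")

results = {ip: (bool(loose.search(ip)), bool(anchored.search(ip)))
           for ip in ["123.145.167.0", "123.145.167.99", "123.145.167.199"]}

for ip, (loose_hit, anchored_hit) in results.items():
    print(ip, loose_hit, anchored_hit)
# 123.145.167.0 True True
# 123.145.167.99 True True
# 123.145.167.199 True False
```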

8. Dot : (.)

  • A dot matches any single character. It can be any single letter, number, symbol or space.

For example:

  • The regular expression .ead would match the strings read, bead, xead, 3ead, !ead, etc., but not ead.
  • Let’s say your company owned a block of IP addresses: 123.45.67.250 to 123.45.67.255
    You want to create a Regular Expression that will match the entire block so that you can take your company data out of your Google Analytics.
    Since only the last character changes, you could do it with this expression: 123\.45\.67\.25.
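Both dot examples can be tried in Python, with `re.search` assumed to stand in for Google Analytics. The last line shows why the other dots in the IP pattern are escaped.

```python
import re

dot_hits = [s for s in ["read", "bead", "xead", "3ead", "!ead", "ead"] if re.search(r".ead", s)]
print(dot_hits)  # ['read', 'bead', 'xead', '3ead', '!ead']

company_block = re.compile(r"123\.45\.67\.25.")
print(bool(company_block.search("123.45.67.250")), bool(company_block.search("123.45.67.255")))  # True True
print(bool(company_block.search("123.45.67.249")))  # False

# Without the backslashes, each dot matches ANY character, not just a period.
unescaped_dots = bool(re.search(r"123.45.67.25.", "123x45x67x250"))
print(unescaped_dots)  # True
```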

9. Plus Sign : (+)

  • A plus sign matches one or more of the previous character.

For example:

  • abc+ would match abc, abcc, and anything like abccccccc.
  • [a]+ would match one or more occurrences of the lowercase letter ‘a’.
  • gooo+gle would match gooogle and goooogle, but never google.
  • Alternatively, you can build a list of previous items by using square brackets.
    Like this: [abc]+
    This will match a, abc, abcbbbbb, etc.
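The plus-sign examples can be checked with a minimal Python sketch, using Python's `re` as an illustrative stand-in for Google Analytics.

```python
import re

plus_c = [s for s in ["ab", "abc", "abcc", "abccccccc"] if re.search(r"abc+", s)]
print(plus_c)  # ['abc', 'abcc', 'abccccccc']

goo = [s for s in ["google", "gooogle", "goooogle"] if re.search(r"gooo+gle", s)]
print(goo)  # ['gooogle', 'goooogle']

# [abc]+ grabs the longest run of a, b and c characters it can find.
runs = re.findall(r"[abc]+", "a-abc-abcbbbbb-xyz")
print(runs)  # ['a', 'abc', 'abcbbbbb']
```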

10. Star : (*)

  • A star matches zero, one or more of the previous character.
  • It functions just like a plus sign, except it allows you to match ZERO (or more) of the previous item, whereas a plus sign requires at least one match.

For example:

  • Let’s say that your company uses five-digit part numbers in the format PN00000, and you want to know how many people are searching for part number 34. Technically, the part number is PN00034.
    So, your regular expression would be: PN0*34
    This will match searches for PN034, PN0034, PN00034, PN00000034 and, for that matter, PN34, since the star means the previous item (the 0) doesn’t need to appear in the search at all: it matches zero or more of it.
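Here is the part-number pattern run against a hypothetical list of site-search terms, with `re.search` assumed as a stand-in for Google Analytics.

```python
import re

# Hypothetical site-search terms.
searches = ["PN034", "PN0034", "PN00034", "PN00000034", "PN34", "PN00035"]

# "0*" accepts any number of zeros, including none, before "34".
part_34 = [s for s in searches if re.search(r"PN0*34", s)]
print(part_34)  # ['PN034', 'PN0034', 'PN00034', 'PN00000034', 'PN34']
```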

11. Dot-Star : (.*)

  • Dot-Star is a dot followed by a star.
  • It matches zero or more of any character.

For example:

  • /folderone/.*index\.php
    This regular expression will match any URL in which /folderone/ is followed, somewhere later, by index.php. So if you have pages in the /folderone directory that end with .html, they won’t be a match.
    The dot can match any letter, digit or symbol on your keyboard, and the star right after it lets the dot repeat zero or more times (because it is zero or MORE), so together they match any run of characters.
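A small Python check of the dot-star pattern follows. The paths are hypothetical, and `re.search` is assumed to mimic Google Analytics' matching.

```python
import re

pattern = re.compile(r"/folderone/.*index\.php")
paths = [
    "/folderone/index.php",      # .* matches zero characters
    "/folderone/a/b/index.php",  # .* matches "a/b/"
    "/folderone/about.html",     # no index.php
    "/foldertwo/index.php",      # wrong folder
]
dot_star_hits = [p for p in paths if pattern.search(p)]
print(dot_star_hits)  # ['/folderone/index.php', '/folderone/a/b/index.php']
```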

12. Caret : (^)

  • A caret anchors the match to the beginning of the string.
  • A caret also means NOT when used right after the opening square bracket: [^...] matches any single character that is not listed after the caret.

For example:

  • If you’re trying to match all URLs within a specific directory of your site, you could use ^/products/. This would match things like /products/item1 and /products/item2/description but not a URL that doesn’t start with that string, such as /support/products/.
  • [^0-9] matches any single character that is not a digit, so a target string matches as long as it contains at least one non-digit character.
  • [^a] means check for any single character other than the lowercase letter ‘a’.
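The following Python sketch exercises both uses of the caret. It assumes page paths begin with "/", as they do in Google Analytics page reports, and uses `re.search` as a stand-in for the tool's engine.

```python
import re

paths = ["/products/item1", "/products/item2/description", "/support/products/"]
starts_with_products = [p for p in paths if re.search(r"^/products/", p)]
print(starts_with_products)  # ['/products/item1', '/products/item2/description']

# [^0-9] matches ONE non-digit character; it does not reject strings that contain digits.
non_digits = re.findall(r"[^0-9]", "a1b2")
print(non_digits)                            # ['a', 'b']
print(bool(re.search(r"[^0-9]", "item42")))  # True
print(bool(re.search(r"[^0-9]", "42")))      # False

not_a = re.findall(r"[^a]", "aab")
print(not_a)  # ['b']
```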

13. Dollar Sign : ($)

  • A dollar sign anchors the match to the end of the string (or the end of a line).

For example:

  • Colou?r$ would check for a pattern that ends with ‘Color’ or ‘Colour’.
  • product-price\.php$ would check for a pattern that ends with ‘product-price.php’.
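A quick Python check of the dollar-sign examples follows. The strings are hypothetical, and `re.search` stands in for Google Analytics.

```python
import re

colour_hits = [s for s in ["Favourite Colour", "Color", "Colors"] if re.search(r"Colou?r$", s)]
print(colour_hits)  # ['Favourite Colour', 'Color']

price_pages = [p for p in ["/shop/product-price.php", "/shop/product-price.php?id=7"]
               if re.search(r"product-price\.php$", p)]
print(price_pages)  # ['/shop/product-price.php']
```

The second result shows that once a pattern is anchored with $, a URL carrying a query string no longer matches. Keep this in mind if your page paths may include parameters.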

Conclusion: Regular expressions are an important tool to be mastered by anyone who needs to do significant work in Google Analytics. Spend some time learning about the basic setup of regular expressions and brainstorm ways to use regex in displaying data relevant to the site(s) you work on.

For a detailed tutorial about regular expressions, download the PDF here – Google Analytics – Regular Expressions

Shivani Singh

Shivani is a Dubai-based SEO Consultant. She has been actively involved in SEO since 2010 and believes in smart work rather than hard work. She is constantly working towards providing brands and websites with all their SEO needs, both content and technical.