Filtering Yahoo Mail and Live Mail in Google Analytics
[Note: Please leave a comment if there is another type of filter that you would like to see, or an issue you have with GA data.]
One of the time-consuming and annoying things about Google Analytics is that it handles the sub-domains of Yahoo Mail and Live Mail as separate referral sources. Google’s documentation does not adequately explain how to condense these mail programs into a single source, so I will show you how.
First, if you don’t already have one, you should create a Test Profile. If anything goes wrong, you don’t want to screw up your existing profile’s data.
Now make a Custom Advanced Filter:

Extract Campaign Source. Here I am matching any source that ends in .mail.yahoo.com and overwriting it with “mail.yahoo.com”:
- Field A -> Extract A: Campaign Source: (.*)\.mail\.yahoo\.com$
- Output To -> Constructor: Campaign Source: mail.yahoo.com
- Field A Required: Yes
- Override Output Field: Yes
- Case Sensitive: No
This filter will collect all referral sources that end in .mail.yahoo.com and condense them to just mail.yahoo.com. You can build a similar filter for Live or any other e-mail service that shows up as multiple referrers, so your goal conversion tab will be more accurate and useful.
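To see what the filter does before applying it, here is a minimal Python sketch of the same idea. It only imitates the filter; it is not how Google Analytics runs it internally. The function name `condense_source` and the sample hostnames are made up for illustration, and Python's `re` module may differ from GA's regex engine in edge cases.

```python
import re

# Same pattern as Field A in the filter above.
# IGNORECASE mirrors the "Case Sensitive: No" setting.
YAHOO_MAIL = re.compile(r"(.*)\.mail\.yahoo\.com$", re.IGNORECASE)


def condense_source(source: str) -> str:
    """Return 'mail.yahoo.com' for any Yahoo Mail sub-domain.

    Sources that do not match are returned unchanged, which mirrors
    "Field A Required: Yes" (the filter does nothing without a match).
    """
    if YAHOO_MAIL.search(source):
        return "mail.yahoo.com"
    return source


# Hypothetical referrer hostnames, invented for this example.
samples = [
    "us.mc123.mail.yahoo.com",
    "uk.mg4.mail.yahoo.com",
    "search.yahoo.com",
]
for s in samples:
    print(s, "->", condense_source(s))
```

The first two samples collapse to mail.yahoo.com and search.yahoo.com is left alone. A Live version would presumably swap in a pattern such as `(.*)\.mail\.live\.com$`, but that is my assumption about how those hostnames end, so check your own referral report first.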
Why a Custom Filter is Necessary
Google Analytics has a pre-made filter called Search and Replace that also accepts regular expressions, but it requires one search and replace per provider. A custom filter gives you a second extraction parameter for those who want to create a more powerful filter.
A note on regular expressions: the extract fields give special meaning to some characters (see below). Make sure that you put a backslash (\) before any period that is supposed to be read as a literal period; otherwise you may get bad results.
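A quick way to convince yourself that the backslash matters is to try both versions of the pattern against a lookalike hostname. This is a small Python sketch; the hostname `webmail-yahoo.com` is hypothetical and chosen only because it fools the unescaped pattern.

```python
import re

unescaped = re.compile(r"mail.yahoo.com$")   # each "." matches any character
escaped = re.compile(r"mail\.yahoo\.com$")   # each "\." matches only a period

# Hypothetical hostname that is not Yahoo Mail.
lookalike = "webmail-yahoo.com"

unescaped_hit = unescaped.search(lookalike) is not None
escaped_hit = escaped.search(lookalike) is not None

print(unescaped_hit)  # True: the "-" was accepted where a period was intended
print(escaped_hit)    # False: the escaped pattern insists on real periods
```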
From Google FAQ
Regular Expression Characters
Wildcards
| . | Matches any single character (letter, number or symbol) | goo.gle matches gooogle, goodgle, goo8gle |
| * | Matches zero or more of the previous item | The default previous item is the previous character. goo*gle matches gooogle, goooogle |
| + | Just like a star, except that a plus sign must match at least one previous item | gooo+gle matches goooogle, but never google. |
| ? | Matches zero or one of the previous item | labou?r matches both labor and labour |
| | | Lets you do an “or” match | a|b matches a or b |
Anchors
| ^ | Requires that your data be at the beginning of its field | ^site matches site but not mysite |
| $ | Requires that your data be at the end of its field | site$ matches site but not sitescan |
Note: to understand why anchors are necessary, please read Tips for Regular Expressions below.
Grouping
| () | Use parentheses to create an item, instead of accepting the default | Thank(s|you) will match both Thanks and Thankyou |
| [] | Use brackets to create a list of items to match to | [abc] creates a list with a, b and c in it |
| - | Use dashes with brackets to extend your list | [A-Z] creates a list for the uppercase English alphabet |
Other
| \ | Turns a regular expression character into an everyday character | mysite\.com keeps the dot from being a wildcard |
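If you want to check the table's examples yourself, the short Python sketch below runs several of them. It assumes Python's `re` module behaves like the GA filter fields for these simple patterns, which should hold for the basic characters listed here but is not guaranteed for every expression.

```python
import re

# (pattern, text) pairs taken from the examples in the table above.
examples = [
    (r"goo.gle", "goo8gle"),
    (r"goo*gle", "goooogle"),
    (r"gooo+gle", "goooogle"),
    (r"gooo+gle", "google"),      # too few o's, should not match
    (r"labou?r", "labor"),
    (r"labou?r", "labour"),
    (r"Thank(s|you)", "Thankyou"),
    (r"[A-Z]", "a"),              # lowercase is outside the list
]

# True when the pattern is found anywhere in the text.
results = [(p, t, re.search(p, t) is not None) for p, t in examples]

for pattern, text, matched in results:
    print(f"{pattern!r} vs {text!r}: {matched}")
```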
Tips for Regular Expressions
- Make the regular expression as simple as possible so that you and your colleagues can work with them easily in the future.
- Be sure to use a backslash if you have characters like “?” or “.” and you wish to match those literal characters — otherwise, they will be interpreted as special regular expression characters.
- Not all regular expressions include special characters. For example, you can specify that a Google Analytics goal be a regular expression, and even if you don’t have any special characters, your goal will be interpreted according to the rules of regular expressions.
Regular expressions are greedy. For example, site matches mysite and yoursite and sitescan. If site is your regular expression, it is the equivalent of asking to match to all strings that contain site. Therefore, you should use anchors whenever necessary, to get a more accurate match. ^site$, which uses both a beginning ^ and ending $ anchor, will ensure that the expression has to start with site and end with site and include nothing else. Notice, too, that there were no special characters in the regular expression site – it is interpreted as a regular expression only if it is in a regular expression-sensitive field.
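The anchor behaviour described above is easy to reproduce. Here is a minimal Python sketch using the same four words; `re.search` stands in for a regular expression-sensitive field, which is an approximation on my part.

```python
import re

words = ["site", "mysite", "yoursite", "sitescan"]

# No anchors: matches any string that contains "site".
unanchored = [w for w in words if re.search(r"site", w)]
# ^ requires the match at the start, $ requires it at the end.
start_only = [w for w in words if re.search(r"^site", w)]
end_only = [w for w in words if re.search(r"site$", w)]
both = [w for w in words if re.search(r"^site$", w)]

print(unanchored)  # ['site', 'mysite', 'yoursite', 'sitescan']
print(start_only)  # ['site', 'sitescan']
print(end_only)    # ['site', 'mysite', 'yoursite']
print(both)        # ['site']
```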



July 28th, 2008 at 9:22 pm
[...] Filtering Yahoo Mail and Live Mail in Google Analytics [...]
November 13th, 2008 at 8:26 pm
[...] you should be linking with an appropriate description of the content that you are linking to like: filtering mail in Google Analytics. If you need a full sentence to describe a link something is suspect — both as a reader and [...]
January 15th, 2009 at 8:26 pm
[...] Filtering Yahoo Mail and Live Mail in Google Analytics – 100 Dollar SEO [...]
March 12th, 2009 at 12:33 pm
Is there a way to put this on a trial basis?
March 12th, 2009 at 12:42 pm
I’m not sure what you mean by trial basis. You can do it on a separate profile.
March 15th, 2009 at 10:18 pm
This is very interesting information. I just bookmarked it now.
June 24th, 2009 at 7:07 am
I have to say I’m really impressed with your posts and blog overall. I stumbled on your site accidentally but am now happy I did. I’ll be stopping in to read more often now. Thanks again !
Thanks,
Lou
July 23rd, 2009 at 10:28 pm
Nice post. Thanks for sharing these tips.
August 18th, 2009 at 1:05 pm
Thank you for sharing such a complicated topic with the distinct screen shots. Great info to know when setting up our Analytics!
November 1st, 2009 at 3:30 am
Well, I don’t actually want any email hosts appearing in my referrals.
I already tag my emails so I’m wondering if GA is actually double counting when it lists email hosts.
IE When I look in my Medium section it shows me X number of visits came from email campaigns.
Now, when I look in referrers I see various email hosts. Have these already been counted under email campaigns?
If so I may as well filter them out altogether.
Any thoughts?
November 1st, 2009 at 6:32 pm
No, Google isn’t double counting; it is just a different way of labeling the visitors. Adding this filter will allow you to better see Yahoo and MSN mail as channels in conjunction with, and independent of, your tagged campaigns.
November 2nd, 2009 at 5:14 pm
Thanks Carlos,
that’s what I wanted to know.
November 2nd, 2009 at 5:26 pm
Another Question.
Could I also use this to fix my own URL referring issues?
IE: “mysite.com.au” often appears in the Referring sites report.
Apart from the fact that it’s annoying, I’d rather it was not counted as a referrer.
It’s not a really large amount, and I’ve done the usual things like setting up sub domain inclusion, excluding staff traffic etc, but I still get a certain amount of referrals from my own domain.
The standard explanation is users have loaded a page on their browser and left it for long enough for the session to expire. Then when they re-engage with the page it gets recorded as a referral from my own site.
Fair enough, but I still don’t want it appearing in the referrals report.
Can I adapt your method to overcome this?
January 21st, 2010 at 9:24 am
Great, thank you for this post. When does the filter get applied to the data? Does it affect future visits or will it adjust past records of referrals? Thanks.
January 21st, 2010 at 9:25 am
It is applied immediately and only affects future visits.
January 21st, 2010 at 9:54 am
Thanks Carlos!
February 10th, 2010 at 1:01 pm
[...] the most important areas you can apply campaign tagging is e-mail. And though you can get fancy and create a filter that combines email sources you should not have to do this because your campaigns should be tagged to begin with. So definitely [...]
March 7th, 2010 at 5:26 pm
If a customer clicks on a pay per click ad and then sends us an email, we respond, and the customer clicks on a tagged email link, what does GA report?
March 8th, 2010 at 11:16 am
@Denis
In the case that you described, GA would tag the initial click as referred from PPC, and the second click as referred from e-mail AND as part of the user defined tag group.
The purpose of the above filter is to consolidate all users of a particular web e-mail client into a cohesive group, instead of treating each e-mail user as a separate referrer.
July 12th, 2010 at 6:44 pm
who knows how to add yahoomail into Live mail client?
July 12th, 2010 at 7:56 pm
You can roll them up, but I wouldn’t.
September 30th, 2010 at 7:41 am
[...] the visits between several ’sources’ on your reports. To get them all together, you can create an advanced filter in your Google Analytics profile, then wait to collect the data in a different way, OR you can [...]
November 23rd, 2010 at 1:55 am
Is it possible to do some sort of bulk filter for all website hosted email services?
I must have a hundred of them appearing in referrers, so I don’t really want to have to write individual filters for all of them.
December 1st, 2010 at 2:47 pm
Well, for what it seems you are trying to do, you could use a regular expression to capture the word “mail,” but that will lose a lot of useful data and potentially create some weird artifacts. It is best to make individual filters. You can look for common trends in URL construction, but the more complicated you make the regex, the more likely you are to create a broken filter.
May 1st, 2011 at 2:59 pm
If I understand correctly, I would perhaps point out that the changes will be only applied on any new data received. This will not affect your existing profiles’ data. Therefore, there’s no need to be overly cautious using the test profile.
Indeed, there also seems to be a way to do this using search and replace and regular expression “[a-zA-Z0-9]*\.[a-zA-Z0-9]*\.mail\.yahoo\.com” as documented in picture linked below (a) taken from clutterme.com blog post (b).
a) http://blog.clutterme.com/uploaded_images/analytics-filter-768409.gif
b) http://blog.clutterme.com/2008/04/combining-sources-in-google-analytics.html
May 1st, 2011 at 4:11 pm
@Frank
Yes, filters only affect data that comes through after the filter is put in place. The reason that you need to be careful is that this method of filtering is permanent and irreversible. If you write the filter wrong you will never be able to unfilter the data.
Yes, the expression that Clutterme.com uses also works. I prefer to use the advanced filter as a demonstration because it is clearer about what exactly is happening.
July 1st, 2011 at 10:47 am
I didn’t know about the regular expressions and the wildcards, thank you!