Google Analytics Spam Removal Guide

A step-by-step guide to beating the spam ruining your Google Analytics data


Logging in to Google Analytics can be both an exhilarating and a disappointing experience. You’ve worked hard to drive traffic to your website, spending hours writing content and promoting it through various channels, and now you come to check your results.

This is the exhilarating part, the build up to checking…I wonder how many people have checked out what I’ve got to say, what’s the bounce rate, how many of my goals have been triggered, what’s the highest converting path?

And initially you’re buoyed by the results you are seeing…the traffic is high, referrals are high, bounce rate is low…but then you investigate further and find that your analytics is inundated with SPAM.

This is the disappointing experience of using Google Analytics.

There are consequences to your business linked to spam on your Google Analytics account.

1 – You don’t know that your results are spam, and you make decisions based on the numbers Google Analytics gives you. It may show that sessions are up and bounce rate is low, giving the impression of business as normal. Even a “continue as we are” decision, if based on incorrect data, could ultimately be the wrong decision.

And following any decision made, resources, both time and money, are directed towards the strategy. So, whether or not the spam does any harm to your site, the spam showing in your analytics is suddenly having a big impact.

2 – You realise that a lot of the data you are reading is affected by spam, and you spend hours trying to resolve it, but the next time you come back to your account there it is again. As a result, you stop using Google Analytics, your digital strategy becomes unquantifiable, and disappointingly your digital motivation dies.

Here at Leadfreak, we know the frustrations of spam, having dealt with it on many client accounts. Our clients rightly expect clean reporting, and that is what we deliver. From this, we developed a methodology for removing spam data from your account for good. We want you to have this: the more businesses utilising analytics, the better.

So, we created this guide.

This guide will:

– Introduce you to the spam you are very likely to be seeing in your Google Analytics account.

– Provide a step-by-step method to removing spam from your analytics.

– Show you how to remove historic spam data from your report view.

A Spam Overview


First, let’s begin by introducing you to the world of spam, and not the delicious tinned meat that can be found around the world, but the spam that is harming your analytics.

We’ve all experienced spam, be it in emails, unwanted phone calls about PPI, text messages, etc. It’s the stuff you receive that you’ve not requested or signed up for, and it generally comes from people who mass target certain systems to deliver their message. It is generally found advertising porn, Viagra, and in the latest case of analytics spam…Donald Trump (more on this later).

There are 3 types of spam that you may be receiving in your analytics data: ghost spam, crawler spam, and language spam.

Ghost Spam

Ghost spam inflates your traffic data without ever actually visiting your site. It does this by pinging your Google Analytics tracking code. A spammer can get hold of this code by scraping your site’s source code or, because all tracking codes share the same format, by pinging lots of different numbers and hoping one hits a site.

See the image below to see how this looks:

Ghost Spam in Google Analytics

Ghost spam can appear in the form of referrals, organic traffic, and event tracking, which means it is not confined to one isolated location. Unless you know that there is no spam in your analytics, assume it can affect all your data.

Language Spam

Language spam is the latest in spam developments and was the fastest growing spam strategy in the back end of 2016.

It has multiple key points that you should be aware of:

  • Language spam imitates well-known sites like twitter.com, secret.google.com and reddit.com by using lookalike Latin characters in place of standard English letters, leading you to click on the URL.
  • The metrics it logs with Google Analytics make it appear not to be spam. The bounce rate is average (40 – 50%), and it registers multiple page views and a long session duration. Past spam would hit your sessions, but your bounce rate would be 100%.
  • The language spam only impacts data on your homepage and no other internal page on your site.

Don’t be fooled by these tactics, language spam is spam.

Note: In the build-up to the American Presidential elections in 2016 there was an influx of language spam reminding people to “Vote For Trump”. The man responsible for this is Russian spammer Vitaly Popov. As a revenge attack on Google for blocking his AdSense account, Vitaly turned to spamming Google Analytics, and he took this opportunity to push his political opinions on his unsuspecting victims. Vitaly wants fame and fortune, and he wants his name to be remembered (which is why he is open about who he is). We should all ignore Mr Popov.

Crawler Spam

Crawler spam consists of actual visits to your site from 3rd parties, but it counts as spam because the traffic is automatically generated by firms indexing your site, such as Google or Bing. There is nobody on the other end of the line looking at your site, considering whether to purchase your products, or reading your latest article.

So as important as this traffic is to facilitate search engine ranking position, it’s not traffic that we want to be in our analytics data.

Why Do People Spam

People spam to get attention, and with attention they can make money.

Referral spam that lands on your site will contain an external link to the site they want you to view. Clicking on these links will then show you their products, be it porn or pharmaceutical goods, and we’ve even seen spamming services being offered.

Be careful not to click these links. As much as they may just contain products they want you to purchase, you’re also opening your computer to the risk of viruses and malware being installed through the connection.

How to Stop Spam in Your Analytics Data

In this next part of the guide we will run through our methodology to stop spam from distorting analytics data, so you can see the true website analytics for your site.

A word of warning, you may see as much as 50% of your traffic numbers disappear depending on your true site traffic values. As much as we tell you this is spam, meaningless data that is distorting the picture of what is happening, it can still be tough to see first-hand.

What this guide is not going to do is to build filters for every single spam site seen on a Google Analytics account. Although this works at the moment it is installed, it quickly becomes outdated once a new spam site attacks your analytics. Not only this, but it is also very inefficient in setting up.

We will be guiding you in the following:

– Creating a Ghost spam filter
– Creating a Fake language spam filter
– Creating a Crawler spam filter
– Creating an Internal traffic filter
– Enabling “Exclude all hits from known bots and spiders” setting

First, let us create the necessary views required on your Google Analytics account to reconfigure the set up.

Create a filtered and unfiltered view

Reconfiguring the way Google Analytics provides data is a great optimisation technique, but we need to be careful. To keep a backup that we can refer to later if need be, we set up a separate view.

The unfiltered view will stay exactly as it is. We won’t apply any filters to it, so if any errors were to occur on our filtered view, we can always go back to the unfiltered data.

To create an additional view:

Step 1: Within the admin column of your Google Analytics account, select the grey view box.

Filtered view set up

Step 2: Select “Create new view”. This will be your unfiltered view, so enter an appropriate Reporting View Name, e.g. sitename-unfiltered, and click Create View.

Step 3: Select the grey view box once again, and make sure you are in the original view, i.e. not the view that you just created.

We want to change the name of this view to reflect its setup. So, select View Settings once you are in this view.

Within this menu, you can change the View Name.

This view will contain the filters, so an apt name might be sitename-filtered.

Now we have our 2 views.

Blocking Google Analytics Spam

A few points to note before we begin setting up the spam filters.

– According to Google, any changes we make, including any filters we install, can officially take up to 24 hours to come into effect. In our experience this takes minutes, but going by what Google says, it may take longer.

– You will only install filters in the filtered view selected above. Do not create these filters in both views; doing so only repeats work for no benefit.

A) Ghost Spam

The filter we will set up for blocking ghost spam will stop it from affecting referral, organic, page and language data.

This method will show you how to create a Valid Hostname Filter, which is the most effective filter against ghost spam. As we are filtering your hostnames you will be in control of the incoming data, and below we’ll explain why.

Ghost spam, as seen in the diagram above, never visits your site. It uses the Measurement Protocol to send data to your Google Analytics account directly. Spam sent this way will always leave either a fake hostname or an undefined hostname in your report.

By creating a filter that specifies which hostnames (yours) can impact your analytics data, we can automatically exclude all ghost traffic.

Be careful: hostname and source are not the same thing. Source is where your visitors come from to land on your site. Hostname is where the visitor arrives.
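To make the difference between hostname and source concrete, here is a minimal Python sketch that reads both values out of a Measurement Protocol style hit. It assumes the Universal Analytics parameter names dh (document host name) and dr (document referrer). The two payloads, the placeholder tracking ID and the spam-site.example domain are made up for illustration, and the function name is our own.

```python
from urllib.parse import parse_qs, urlparse

def hostname_and_source(payload):
    """Return (hostname, source) for one URL-encoded hit payload."""
    params = parse_qs(payload)
    # dh = document host name: the site the hit claims to have landed on.
    hostname = params.get("dh", ["(not set)"])[0]
    # dr = document referrer: the page the visitor supposedly came from.
    referrer = params.get("dr", [""])[0]
    source = urlparse(referrer).hostname or "(direct)"
    return hostname, source

# Hypothetical hit from a real visitor who clicked a link on Twitter.
real_hit = ("v=1&tid=UA-XXXXX-Y&cid=555&t=pageview"
            "&dh=leadfreak.co.uk&dp=%2F&dr=https%3A%2F%2Ftwitter.com%2F")
# Hypothetical ghost hit: the spammer never loaded the site, so no hostname is sent.
ghost_hit = ("v=1&tid=UA-XXXXX-Y&cid=666&t=pageview"
             "&dp=%2F&dr=http%3A%2F%2Fspam-site.example%2F")

for hit in (real_hit, ghost_hit):
    print(hostname_and_source(hit))
```

The ghost hit still carries a source (the spammer’s site), but its hostname is missing, which is exactly what a Valid Hostname Filter keys on.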

Creating a Valid Hostname Filter

Note: before we begin, make sure you are in your filtered view.

1) In the Reporting section, first select a wide time frame (to capture all data), and go to the Audience reports in the navigation menu on the left.

2) Select Technology, and then in the sub menu select Network.

3) Within the page, select Hostname. This will report all the hostnames associated with your data both real and fake.

4) Make a list of all the relevant hostnames you find. At least one will be your primary domain.

Other valid hostnames depend on your site configuration and the services you’re using your tracking code with. A few examples could be:

– Your main domain (in our case Leadfreak.co.uk)
– Translation services (Bing)
– CDNs (Cloudflare)
– Cache Services (Google cache)
– Payment Services (PayPal)
– Subdomains (support.leadfreak.co.uk)
– Shopping carts (Shopify)
– Video hosts (YouTube)
– IPs

Essentially, all the hostnames in this report that you do not control, or do not know, will be invalid hostnames. These include:

– Hostnames linking to spam websites
– Known sites that may not look like spam but you do not control. E.g. google.com, mashable.com, blog.google.com.
– The common spam hostname (not set), where the spammer hasn’t even bothered to write a fake name.

From the report above, we can identify the valid hostname as leadfreak.co.uk.

5) Create your hostname expression

Once you have gathered together your valid hostnames you need to create a hostname filter expression.

– To separate each hostname, use a bar or pipe character |, which works as OR. If you can’t find it on your keyboard, hold Alt and type 124 on the numeric pad.

– The dot . and the hyphen - are treated as special characters in REGEX, so you should add a backslash \ before them.

– Try to find a good way to match as many hostnames as you can. For example, if you want to match blog.leadfreak.co, uk.leadfreak.co and www.leadfreak.co, you don’t need to add all of them to the expression; entering leadfreak will be enough to match all of them (just avoid using common names).

– Don’t leave any spaces.
– The REGEX has a limit of 255 characters. If your expression exceeds this limit, try to optimise it to keep everything under one expression, because you can only have 1 Include hostname filter.

– Don’t add a pipe/bar |, at the beginning or the end of the expression.

As an example, a hostname filter expression could be:

leadfreak\.co|translate\.service\.com|videoservice\.com|webcache\.service\.com|mail\-list\-manage\.com

Make sure you add all the relevant hostnames otherwise you will lose valuable data. To make sure, you could always create a Test view (following the same instructions as earlier) and run the filter in this view for a few days.
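If you would rather check your expression before pasting it into Google Analytics, the rules above can be sketched in a few lines of Python. This is only an illustration: the function names are our own, the hostname list is a placeholder, and Python’s regex engine is assumed to behave closely enough to the one Google Analytics uses for simple expressions like these.

```python
import re

MAX_PATTERN_LENGTH = 255  # the limit quoted in this guide

def build_hostname_expression(hostnames):
    """Escape each hostname fragment and join the fragments with | (OR)."""
    cleaned = [name.strip() for name in hostnames if name.strip()]
    # re.escape adds the backslash before dots and hyphens for us.
    expression = "|".join(re.escape(name) for name in cleaned)
    if len(expression) > MAX_PATTERN_LENGTH:
        raise ValueError(
            f"expression is {len(expression)} characters; "
            f"the limit is {MAX_PATTERN_LENGTH}"
        )
    return expression

def is_valid_hostname(expression, hostname):
    # Partial match, so a fragment such as "leadfreak" covers every subdomain.
    return re.search(expression, hostname) is not None

# Placeholder list: swap in the valid hostnames from your own report.
valid = ["leadfreak.co", "translate.service.com", "mail-list-manage.com"]
expression = build_hostname_expression(valid)
print(expression)

for host in ["www.leadfreak.co.uk", "blog.leadfreak.co", "(not set)", "spam-site.example"]:
    verdict = "keep" if is_valid_hostname(expression, host) else "drop"
    print(host, "->", verdict)
```

Joining the fragments programmatically also rules out stray spaces and a leading or trailing pipe, two of the easiest mistakes to make by hand.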

6) Create a Valid Hostname Filter.

a) Go to the Admin tab, make sure you are on the filtered view selected earlier.
b) Select Filters under the view column and select + Add Filter
c) Enter a relevant name such as Valid Hostnames
d) In Filter Type select custom.
e) Important: We are creating an “Include” filter for valid hostnames. Make sure to select this type.
f) Open the Filter Field box, and search for Hostname. Select Hostname.
g) Copy and paste your hostname expression into the Filter Pattern field. Test the filter using “Verify This Filter”.

h) After making sure your filter is okay, select Save.

B) Crawler Spam

We will now create a crawler spam filter.

Crawler spam is harder to detect as it uses a valid hostname, so we need to install a different filter with an expression that matches all known crawler spam.

We use an optimised Regular Expression (REGEX) that you will see below. We will set up filters like those in the above method, but in place of an include & hostname filter, we will set up an exclude & campaign source filter.

a) Go to the Admin tab, make sure you are on the filtered view selected earlier.
b) Select Filters under the view column and select + Add Filter
c) Enter a relevant name such as Crawler Spam
d) In Filter Type select custom.
e) Important: We are creating an “Exclude” filter for crawler spam. Make sure to select this type.
f) Open the Filter Field box, and search for Campaign Source. Select Campaign Source.
g) Copy and paste the crawler spam expression into the Filter Pattern field.

Crawler Spam Expression – Create 1 filter for each expression

Crawler Spam Expression #1
(best|dollar|success|top1)\-seo|(videos|buttons)\-for|anticrawler|^scripted\.|semalt|forum69|7makemon|sharebutton|ranksonic|sitevaluation|dailyrank|vitaly|profit\.xyz|rankings\-|dbutton|uptime(bot|check|\.com)

Crawler Spam Expression #2
datract|hacĸer|ɢoogl|responsive\-test|dogsrun|tkpass|free\-video|keywords\-monitoring|pr\-cy\.ru|fix\-website|checkpagerank|seo\-2\-0\.|platezhka|timer4web|share\-buttons|99seo|3\-letter|top10\-way

Test the filter using “Verify This Filter”.

h) Click Save.
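To see what the two expressions will and won’t catch, you can try them against a handful of source names first. The sketch below uses Python’s re module as a stand-in for the Google Analytics filter engine, which is an assumption on our part. Apart from semalt and the ɢoogle lookalike, which come from the expressions themselves, the sample sources are invented.

```python
import re

# The two expressions from this guide, copied unchanged.
EXPRESSION_1 = (
    r"(best|dollar|success|top1)\-seo|(videos|buttons)\-for|anticrawler|^scripted\."
    r"|semalt|forum69|7makemon|sharebutton|ranksonic|sitevaluation|dailyrank|vitaly"
    r"|profit\.xyz|rankings\-|dbutton|uptime(bot|check|\.com)"
)
EXPRESSION_2 = (
    r"datract|hacĸer|ɢoogl|responsive\-test|dogsrun|tkpass|free\-video"
    r"|keywords\-monitoring|pr\-cy\.ru|fix\-website|checkpagerank|seo\-2\-0\."
    r"|platezhka|timer4web|share\-buttons|99seo|3\-letter|top10\-way"
)

# One compiled pattern per expression, mirroring "1 filter for each expression".
CRAWLER_PATTERNS = [re.compile(EXPRESSION_1), re.compile(EXPRESSION_2)]

def is_crawler_spam(source):
    """True if either expression matches anywhere in the source name."""
    return any(pattern.search(source) for pattern in CRAWLER_PATTERNS)

# Hypothetical campaign sources for illustration.
for source in ["semalt.com", "best-seo-offer.com", "ɢoogle.com", "google", "twitter.com"]:
    verdict = "exclude" if is_crawler_spam(source) else "keep"
    print(source, "->", verdict)
```

Note that the genuine source google is kept: only the lookalike spelling with the small capital ɢ is matched.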

C) Fake Language Spam

The hostname filter will catch much of the fake language spam, but we will also create a filter to capture the hybrid crawler/ghost spam.

Again, we have utilised an optimised expression that you can use in your filter set up.

a) Go to the Admin tab, make sure you are on the filtered view selected earlier.
b) Select Filters under the view column and select + Add Filter
c) Enter a relevant name such as Language Spam
d) In Filter Type select custom.
e) Important: We are creating an “Exclude” filter for language spam. Make sure to select this type.
f) Open the Filter Field box, and search for Language Settings. Select Language Settings.
g) Copy and paste the language spam expression into the Filter Pattern field.

Language Spam Expression
\s[^\s]*\s|.{15,}|\.|,

h) After making sure your filter is okay, select Save.
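The language expression is compact, so it helps to see what it actually matches. Read left to right, it flags any language value that contains two whitespace characters, is 15 or more characters long, contains a dot, or contains a comma. Genuine language codes such as en-gb do none of these. The Python sketch below tests it against a few sample values; the samples are illustrative and Python’s re module is assumed to be a fair stand-in for the filter engine.

```python
import re

# The language spam expression from this guide, copied unchanged.
LANGUAGE_SPAM = re.compile(r"\s[^\s]*\s|.{15,}|\.|,")

def is_language_spam(language):
    """True if the language value looks like a spam message rather than a code."""
    return LANGUAGE_SPAM.search(language) is not None

samples = [
    "en-gb",               # genuine language code
    "zh-cn",               # genuine language code
    "(not set)",           # one space only, short, no dot or comma
    "Vote for Trump!",     # two spaces and 15 characters long
    "secret.ɢoogle.com",   # contains dots
]
for language in samples:
    verdict = "exclude" if is_language_spam(language) else "keep"
    print(repr(language), "->", verdict)
```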

D) Excluding Internal IP Traffic

Internal IP data isn’t technically spam, but it is data that you don’t want to be reflected in your analytics. Internal IPs belong to the computers you and your team use to visit, amend, and manage your website.

To find your IP address, click HERE

For this filter, you will need your own and your team members’ IP addresses.

a) Go to the Admin tab, make sure you are on the filtered view selected earlier.
b) Select Filters under the view column and select + Add Filter
c) Enter a relevant name such as IP Traffic
d) In Filter Type select custom.
e) Important: We are creating an “Exclude” filter for IP traffic. Make sure to select this type.
f) Open the Filter Field box, and search for IP Address. Select IP Address.
g) Copy and paste the IP address / network IP address into the Filter Pattern field.

h) After making sure your filter is okay, select Save.
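One thing to watch: if the Filter Pattern is read as a regular expression, as it is for the other custom filters in this guide, an unescaped dot in an IP address matches any character. A minimal Python sketch of building a safer pattern for several addresses follows. The addresses are placeholders from the reserved documentation ranges and the function name is our own.

```python
import re

def build_ip_expression(ip_addresses):
    """Escape each address, anchor it with ^ and $, and join with | (OR)."""
    return "|".join("^" + re.escape(ip) + "$" for ip in ip_addresses)

# Placeholder addresses: replace with your own and your team members' IPs.
office_ips = ["203.0.113.7", "198.51.100.24"]
expression = build_ip_expression(office_ips)
print(expression)

# The escaped, anchored pattern only matches the exact addresses.
print(bool(re.search(expression, "203.0.113.7")))   # an internal address
print(bool(re.search(expression, "203.0.113.70")))  # a different address
# An unescaped pattern is looser: each dot stands for any character.
print(bool(re.search("203.0.113.7", "20390.113.7")))
```

The ^ and $ anchors stop 203.0.113.7 from also excluding 203.0.113.70 and similar addresses that merely begin the same way.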

E) Enabling “Exclude all hits from known bots and spiders”

Known bots and spiders that crawl your site for indexing purposes will leave a trace that will be included within your traffic data. It is best practice to stop this data being recorded.

1) As earlier in the tutorial, head to the Admin tab and to the View column.
2) Select View Settings
3) Scroll to the bottom of the View Settings page and select the tick box “Exclude all hits from known bots and spiders”
4) Click Save.

Removing Historical Data From Your Results

In this section, we are going to run through the method of removing the historical spam data from view. Historical data in Google Analytics cannot be deleted, but we can hide it from our reports using a segment.

a) Go to the Reporting tab and click the blue circle named All Users.
b) Click the + NEW SEGMENT button

c) In the segment window, on the left navigation menu, select Conditions (near the bottom).

Follow the instructions below:

1) First Condition
a) Filter > Sessions > Include
b) Dropdown 1 > Hostname
c) Dropdown 2 > matches regex
d) Textbox > Paste the Hostname Expression used in the earlier filter.

2) Click + Add Filter

3) Second Condition
a) Filter > Sessions > Exclude
b) Dropdown 1 > Source
c) Dropdown 2 > matches regex
d) Textbox > Paste the below crawler spam expression

(best|dollar|success|top1)\-seo|(videos|buttons)\-for|anticrawler|^scripted\.|\-gratis|semalt|forum69|7make|sharebutton|ranksonic|sitevaluation|dailyrank|vitaly|profit\.xyz|rankings\-|dbutton|\-crew|uptime(bot|check|\.com)|datract|hacĸer|ɢoogl|responsive\-test|torrent\-to|magnet\-to|dogsrun|tkpass|free\-video|keywords\-monitoring|pr\-cy\.ru|fix\-website|checkpagerank|seo\-2\-0\.|platezhka|timer4web|share\-buttons|99seo|3\-letter|top10\-way

4) Click the “Or” button on the right-hand side of the condition you just completed

5) Third Condition
a) Dropdown 1 > Language
b) Dropdown 2 > matches regex
c) Textbox > Paste the below Language spam expression

\s[^\s]*\s|.{15,}|\.|,

6) Enter “All Users – Clean” as a name for the segment and Save.

After saving the segment you will be able to see spam-free reports as long as the segment is selected. Eventually you will not need to use it, as the filters will keep new spam out of your data from the moment they take effect.
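The logic of the segment is: keep a session if its hostname is valid, unless its source matches the crawler expression or its language matches the language expression. A rough Python sketch of that logic is below. The session records are invented, the hostname expression is a one-item placeholder, and the crawler expression is a shortened subset of the full one above, used only to keep the example readable.

```python
import re

HOSTNAME = re.compile(r"leadfreak\.co")                      # placeholder hostname expression
CRAWLER = re.compile(r"semalt|sharebutton|ranksonic|ɢoogl")  # shortened subset, illustration only
LANGUAGE = re.compile(r"\s[^\s]*\s|.{15,}|\.|,")             # language expression from this guide

def is_clean(session):
    """Include on valid hostname, then exclude on crawler source OR spam language."""
    include = HOSTNAME.search(session["hostname"]) is not None
    exclude = (
        CRAWLER.search(session["source"]) is not None
        or LANGUAGE.search(session["language"]) is not None
    )
    return include and not exclude

# Hypothetical sessions: one genuine, then ghost, crawler and language spam.
sessions = [
    {"hostname": "www.leadfreak.co.uk", "source": "google", "language": "en-gb"},
    {"hostname": "(not set)", "source": "spam-site.example", "language": "en-us"},
    {"hostname": "leadfreak.co.uk", "source": "semalt.com", "language": "en-us"},
    {"hostname": "leadfreak.co.uk", "source": "(direct)", "language": "Vote for Trump!"},
]

clean = [session for session in sessions if is_clean(session)]
print(len(clean), "of", len(sessions), "sessions kept")
```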

END OF GUIDE


Questions & Feedback

We’ve tried to cover every important detail in this article, however, if there are any parts of this guide where you have struggled, or it hasn’t been clear, please get in touch and we’ll happily clarify this for you.

