Could the New Google Spider be
causing issues with websites?
Around the time
Google announced "Big Daddy," there was a new Googlebot roaming the
web. Since then I've heard stories from clients of websites and
servers going down and previously unindexed content getting indexed.
I started digging
into this and you'd be surprised at what I found out.
First, let's look
at the timeline of events:
In September, some astute spider watchers over at Webmasterworld spotted
unique Googlebot activity. In fact, it was in this thread that the
bot was first reported on. It concerned some posters who thought
that perhaps this could be regular users masquerading as the famous
Early on it also
appeared that the new bot wasn't obeying the robots.txt file. This
is the protocol which allows or denies crawling to parts of a
on what the new crawler was until Matt Cutts mentioned a new Google
test data center. For those who don't know, Matt Cutts is a senior
engineer with Google and one of the few Google employees talking to
us "regular folk." This mention happened in November.
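To make the robots.txt idea concrete, here is a minimal sketch of how a well-behaved crawler decides whether it may fetch a URL. It uses Python's standard `urllib.robotparser`. The rules and paths below are made up for illustration; they are not taken from any real site, and nothing here claims to show how Googlebot itself is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption for this example).
# It is parsed from a string so the sketch needs no network access.
RULES = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /tmp/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def allowed(agent, path):
    """Return True if the given user agent may crawl the given path."""
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for agent, path in [
        ("Googlebot", "/index.html"),
        ("Googlebot", "/private/report.html"),
        ("SomeOtherBot", "/tmp/cache.html"),
    ]:
        print(agent, path, "allowed" if allowed(agent, path) else "blocked")
```

A crawler that ignores these rules would simply skip this check, which is what the early reports suggested the new bot was doing.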
There wasn't much
mention of Big Daddy until early January of this year, when Matt
again blogged about it asking for feedback.
Much feedback was
given on the accuracy of the results. There were also those who
asked if the Mozilla Googlebot (known as "Mozilla/5.0 (compatible;
in your visitor logs) and Big Daddy were related, but no response
Now I'm going to
begin some of my own speculation:
I do in fact
believe the two are related. In fact, I think this new crawler will
eventually replace the old crawlers just as Big Daddy will replace
the current data infrastructure.
Why is this
important? Based on my observations, this crawler may be able to do
so much more than the old crawler. For one, it emulates a newer
browser. The old bot was based on the Lynx text-based browser.
While I'm sure Google added features as time went on, the basic Lynx
browser is just that: basic.
However, with the new spider, built on the Mozilla engine, there
are so many possibilities.
Just look at what
your Mozilla or Firefox browser can do itself: render CSS, read and
browsers. But that's not all.
I've talked to a
few of my clients and their sites are getting hammered by this new
spider. It has gotten so bad that some of their servers have gone
down because of the volume of traffic from this one spider!
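If you want to see how much of your own traffic comes from the spider, a rough sketch like the one below can tally requests by crawler type. It assumes an Apache-style combined log where the user agent is the last quoted field. The sample lines, IP addresses, and category names are all made up for illustration, and a user agent string can be faked, so treat the counts as a first approximation only.

```python
import re
from collections import Counter

# Hypothetical access-log lines (assumed combined log format).
SAMPLE_LOG = [
    '192.0.2.10 - - [12/Jan/2006:10:00:01 +0000] "GET /a.html HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.10 - - [12/Jan/2006:10:00:02 +0000] "GET /b.html HTTP/1.1" 200 734 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.11 - - [12/Jan/2006:10:00:05 +0000] "GET /c.html HTTP/1.1" 200 120 "-" '
    '"Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '198.51.100.7 - - [12/Jan/2006:10:00:09 +0000] "GET /a.html HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (Windows; U; Windows NT 5.1) Firefox/1.5"',
]

# The user agent is taken to be the last double-quoted field on the line.
USER_AGENT = re.compile(r'"([^"]*)"\s*$')


def classify(line):
    """Label a log line by the kind of client its user agent claims to be."""
    match = USER_AGENT.search(line)
    agent = match.group(1) if match else ""
    if "Googlebot" not in agent:
        return "other"
    if agent.startswith("Mozilla/5.0"):
        return "mozilla-googlebot"
    return "other-googlebot"


def count_hits(lines):
    """Count requests per client category."""
    return Counter(classify(line) for line in lines)


if __name__ == "__main__":
    for label, hits in count_hits(SAMPLE_LOG).most_common():
        print(label, hits)
```

Run against a real log file (one line at a time), the same function gives a quick sense of whether the Mozilla-style Googlebot is responsible for a traffic spike.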
On the plus side,
I have clients who went from a few hundred thousand indexed pages to
over 10 million in just a few weeks! Since December 2005
there's been a 3,500% increase in indexed pages over an 8-week
period! Just so you know, this is also the client's site that went
down because of the huge volume of crawling happening.
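As a quick sanity check on those figures, the arithmetic can be sketched as follows. The 10 million and 3,500% numbers come from the paragraph above; the implied starting count is only what the math suggests, not a figure reported by the client.

```python
def percent_increase(before, after):
    """Percentage growth from `before` to `after`."""
    return (after - before) / before * 100


def implied_start(after, increase_pct):
    """Starting value that would grow to `after` given a percentage increase."""
    return after / (1 + increase_pct / 100)


if __name__ == "__main__":
    # A 3,500% increase means the final count is 36 times the starting count.
    start = implied_start(10_000_000, 3500)
    print(f"Implied starting page count: about {start:,.0f}")
    print(f"Check: {percent_increase(start, 10_000_000):.0f}% increase")
```

The implied starting point lands in the "few hundred thousand" range, so the two statements are consistent with each other.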
But that's still
not all. I have another client that uses IP recognition to serve
content based on a person's geographic location. If you live in the
US you get American content and pricing; if you live in the UK you
get UK content and pricing. As you may imagine, the UK, US,
Canadian and Australian content is all very similar. In fact about
the only thing noticeably different is the pricing aspect.
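For readers unfamiliar with IP recognition, the idea can be sketched roughly as below. This is not the client's actual setup: the network ranges are reserved documentation addresses standing in for a real GeoIP database, and the country codes, currencies, and function names are assumptions made for this example.

```python
import ipaddress

# Hypothetical country lookup table (reserved documentation ranges,
# used here in place of a real GeoIP database).
COUNTRY_NETWORKS = {
    "US": ipaddress.ip_network("192.0.2.0/24"),
    "UK": ipaddress.ip_network("198.51.100.0/24"),
    "CA": ipaddress.ip_network("203.0.113.0/24"),
}

# Hypothetical per-country pricing currency.
CURRENCY = {"US": "USD", "UK": "GBP", "CA": "CAD"}

DEFAULT_COUNTRY = "US"  # assumed fallback when the address is unknown


def country_for(ip):
    """Map a visitor IP address to a country code."""
    address = ipaddress.ip_address(ip)
    for country, network in COUNTRY_NETWORKS.items():
        if address in network:
            return country
    return DEFAULT_COUNTRY


def content_version(ip):
    """Pick the site version and pricing currency for a visitor."""
    country = country_for(ip)
    return {"country": country, "currency": CURRENCY[country]}


if __name__ == "__main__":
    for visitor in ["192.0.2.44", "198.51.100.9", "10.1.2.3"]:
        print(visitor, content_version(visitor))
```

Because the decision depends only on the requesting IP address, a crawler that always arrives from one country should only ever see one version of the site.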
This is my
concern: if the duplicate content gets indexed by Google, what will
they do? There's a good chance that the site would be penalized or
even banned for violation of the webmaster quality guidelines set
forth by Google.
This is why we
implemented IP recognition: so that Googlebot, which crawls from US
IP addresses, only sees one version of the site. However, a review of
the server logs shows that this new Googlebot has been visiting not
only the US content but also the content of the other sections of
the site. Naturally, I wanted to verify that the IP recognition was
working. It is. This leads me to wonder, then: can this browser
spoof its location and/or use a proxy?
Imagine that:
the browser is smart enough to do some of its own testing by viewing
the site from multiple IP addresses. If that's the case, then those
who cloak sites are going to have problems. In any case, from the
limited observations I've made, this new Google, both the data
center and the spider, is going to change the way we do things.
If you have
experienced anything similar in the past few months to do with
Google, be sure to add it to our comments section below.
Rob Sullivan is an SEO Consultant
and Writer for
http://www.textlinkbrokers.com. Textlinkbrokers is the trusted
leader in building long term rankings through safe and effective