Tuesday, January 26, 2010

Indexation for SEO: Real Numbers in 5 Easy Steps

Posted by randfish

How many pages has Google indexed?


This question and the problems surrounding it run rampant through the SEO world. It usually arises when someone starts doing searches like this:


Indexation of SEOmoz According to Google


Google claims to have 93,800 pages indexed on the root domain, seomoz.org. That sounds pretty good, but when I ran that search query last week, the number was closer to 75,000, and when I run it again from Google.co.uk 60 seconds later, the number changes even more dramatically:


Indexation of SEOmoz.org on Google.co.uk


How about if I hit refresh on my Google.com results again:


Indexation on Google.com 3 minutes later


Doh! Google just dropped 8,500 of my pages out of their index. That sucks - but not nearly as much as managers, marketing directors and CEOs who use these numbers as actual KPIs! Can you imagine? A number that means nothing, fluctuates 300% between data centers, can change at a moment's notice and provides no actionable insight being used as a business metric?


And yet... It happens.


Fortunately, there's an easy way to get much, much better data than what the search engines provide through "site:" queries, and this post walks you through that process step by step.


Step 1: Go to Traffic Sources in Your Analytics


Google Analytics Step 1


Click the "Traffic Sources" link in Google Analytics or Omniture (it can also be called "referring sources" in other analytics packages).


Step 2: Head to the Search Engines Section


Step 2 of the Indexation Process


We want to find out how many pages the search engines have indexed, so the obvious next step is to go to the "search engines" sub-section.


Step 3: Choose an Engine




Choose the engine you want indexation data on and click. We're not far now.


Step 4: Filter by Landing Pages


Step 4: Filter by Landing Page


The "Landing Page" filter in the dropdown will show you the traffic each individual page on your site received from the engine you've selected. This also produces the magical "total" number of pages that have received traffic, described in the final step.


Step 5: Record the Number at the Bottom


Step 5: Indexation Count Arrives


That count tells you the unique number of pages that received at least one visit from searches performed on Google. It's the Holy Grail of indexation - a number you can accurately track over time to see how the search engine is indexing your site. On its own, it isn't particularly useful, but over time (I usually recommend recording monthly, but for some sites, every 2-3 months can make more sense), it gives you insight into whether your pages are doing better or worse at drawing in traffic from the engine.
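If you export the landing page report rather than reading the total off the screen, the same count can be reproduced with a few lines of Python. This is only a minimal sketch: the column names "Landing Page" and "Visits" and the sample rows are assumptions made for illustration, so adjust them to whatever your analytics package actually exports.

```python
import csv
import io


def count_pages_with_visits(csv_text, page_col="Landing Page", visits_col="Visits"):
    """Return the number of distinct landing pages with at least one visit.

    The column names are assumptions; rename them to match your export.
    """
    pages = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Exports often format large numbers as "1,234", so strip the commas.
        visits = int(row[visits_col].replace(",", ""))
        if visits >= 1:
            pages.add(row[page_col].strip())
    return len(pages)


# Made-up sample export, not real SEOmoz data.
SAMPLE_EXPORT = """Landing Page,Visits
/blog/post-a,120
/blog/post-b,3
/tools,0
/blog/post-a,5
"""

page_count = count_pages_with_visits(SAMPLE_EXPORT)
print(page_count)
```

Using a set means a URL that appears on more than one row is still counted once, and rows with zero visits are left out, which matches the "at least one visit" definition above.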


Now, technically I'm being a bit cheeky here. This number doesn't tell you the full story - it's not showing the actual number of pages a search engine has crawled or indexed on your site, but it does tell you the unique number of URLs that received at least 1 visit from the engine. In my opinion this data is far more accurate and more actionable. The first adjective - accurate - is hard to argue (particularly given the visual evidence atop this post), but the second requires a bit of an explanation.


Why is Number of Pages Receiving ≥1 Visit Actionable?


Indexation numbers alone are useless. Businesses and websites use them as KPIs because they want to know if, over time, more of their pages are making their way into the engines' indices. I'd argue that actually, you don't care if your pages are in the indices - you care if your pages have the opportunity to EARN TRAFFIC!


Being a row in a search index means nothing if your page is:



  • too low in PageRank/link juice to appear in any results

  • displaying content the engines can't properly parse

  • devoid of keywords or content that could send traffic

  • broken, misdirected or unavailable

  • a duplicate of other pages that the engine will rank instead


Thus, the metric you want to count over time isn't (in most cases) number of pages indexed, it's number of pages that earned traffic. Over time, that's the number you want to rise, the number you want marketers to concentrate on and the KPI that's meaningful. It tells you whether the engine is crawling, indexing AND listing your pages in results where someone could click them (and, since the page earned a visit, actually did).
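To watch whether that number is rising, it helps to log the count on each recording date and look at the change between readings. The sketch below assumes you keep a simple list of (month, count) pairs. The figures in it are invented for illustration, not taken from any real site.

```python
def month_over_month(readings):
    """Return (month, count, percent_change) tuples for each reading.

    readings is a list of (month_label, pages_with_traffic) pairs in
    chronological order. The first reading has no prior value, so its
    percent change is None.
    """
    results = []
    previous = None
    for month, count in readings:
        if previous is None or previous == 0:
            change = None
        else:
            change = (count - previous) * 100 / previous
        results.append((month, count, change))
        previous = count
    return results


# Invented example counts of pages receiving at least one search visit.
history = [("2009-11", 1000), ("2009-12", 1100), ("2010-01", 990)]

trend = month_over_month(history)
for month, count, change in trend:
    label = "n/a" if change is None else f"{change:+.1f}%"
    print(month, count, label)
```

A sustained drop across several readings is the cue to dig into the individual URLs.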


If the number drops, you can investigate the actual pages that are no longer receiving traffic by exporting the data to Excel and doing a side-by-side with the previous month. If the number rises, you can see the new pages getting traffic. Those individual URLs will tell a story - of pages that broke, that stopped being linked-to, that fell too far down in paginated results or lost their unique content. It's so much better than playing the mystery game that SEOs so often confront in the face of "lower indexation numbers" from the site: command.
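The same side-by-side comparison can be done in code if Excel gets unwieldy. This is a hedged sketch: it assumes you have already pulled the landing page URLs for two months into plain lists, and the URLs shown are hypothetical.

```python
def compare_months(previous_pages, current_pages):
    """Split landing pages into those that lost, gained, or kept search traffic."""
    previous = set(previous_pages)
    current = set(current_pages)
    return {
        "lost": sorted(previous - current),    # had visits last month, none now
        "gained": sorted(current - previous),  # newly receiving visits
        "kept": sorted(previous & current),    # received visits in both months
    }


# Hypothetical URLs standing in for two monthly exports.
december = ["/blog/post-a", "/blog/post-b", "/tools"]
january = ["/blog/post-a", "/tools", "/blog/post-c"]

report = compare_months(december, january)
for status, urls in report.items():
    print(status, urls)
```

The "lost" list is the one to investigate first, since those are the pages that may have broken, lost links or dropped too deep into pagination.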


I'd, of course, love your feedback. I know many SEOs are addicted to and supportive of the site: command numbers as a way to measure progress, so maybe there are things I'm not considering or situations where it makes sense. I also know that many of you like the number reported in Google Webmaster Tools under the Sitemaps crawl data (I'm skeptical of this too, for the record) and I'd like to hear how you find value with that data as well.


p.s. Tomorrow we'll be announcing two webinars (open to all) about using Open Site Explorer to get ACTIONABLE data. Be sure to leave either Wednesday the 27th at 2pm Pacific or Thursday the 28th at 10am Pacific free :-)





http://tinyurl.com/yb443ax