Monday, 30 November 2009

Xenu's Link Sleuth - More Than Just A Broken Links Finder

Posted by Tom_C

There are literally a bazillion SEO tools on the internet (literally!). This post discusses just one of them: Xenu's Link Sleuth. Many people in the SEO industry are already aware of this tool, but many of those I've spoken to treat it only as a broken link finder. It's so much more than that.



This post is aimed both at those who haven't heard of it before and at those who use it regularly. There are lots of nifty features that solve all kinds of SEO problems, and hopefully beginners and advanced users alike will learn something from this post.



What is Xenu?



Xenu's Link Sleuth is a FREE download (everyone loves free) that runs on all versions of Windows (but not quite on Macs, unfortunately). It's a lightweight download and I've never had issues with it crashing or hanging. In a nutshell it's a site crawler: once you point it at a URL it will crawl around the site and spit out a report at the end. Its main focus and branding are all about finding broken links on your site (that is, places where you link internally to a page that returns a 404 error), but I use it to solve a whole host of different SEO-related issues, which I explain below.







Problem - How do I find broken links on my site?



This is the most basic use of Xenu in my opinion, but also the most common use. Simply point the program at the homepage of your site, check 'skip external' to avoid it crawling the entire web, and set it going!








Click here to view a sample report provided by Xenu for the Distilled site (note that this is a sample report only, not run across the whole site).



You can see that there is a handy section which reports any broken links that it finds, though in this case I've chosen a rather poor example since there are no broken links on the homepage of Distilled :-)



Problem - How do I get a crawl of my site into Microsoft Excel?



The answer to this one, as you may have guessed, is also Xenu! Simply choose the following menu option once the report is run:







Click here for a Google Docs copy of a sample report from the Distilled site. As you can see, you get some really useful data, such as:


  • The status code of all pages crawled

  • The type of page crawled

  • The title tag of each page crawled


Problem - How do I check the length of my title tags across my whole site?



Looking at the above data sheet, simply filter for HTML pages and then work out the length of each value in the column titled "Title" (Excel's LEN function does this). That gives you the length of each title tag. Filter for any above 65 characters and bingo - there's your to-do list!
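If you'd rather script this check than filter by hand, here's a minimal Python sketch. It assumes the crawl was exported as tab-separated text with a header row containing "Address", "Type" and "Title" columns. Those column names, the long_titles helper and the sample data are assumptions for illustration, so check them against your own export.

```python
import csv
import io

MAX_TITLE_LENGTH = 65  # the threshold used in this post


def long_titles(tsv_text, limit=MAX_TITLE_LENGTH):
    """Return (address, title length) pairs for HTML pages whose title exceeds limit."""
    rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    results = []
    for row in rows:
        page_type = row.get("Type") or ""      # assumed column name
        title = (row.get("Title") or "").strip()  # assumed column name
        if page_type.startswith("text/html") and len(title) > limit:
            results.append((row["Address"], len(title)))
    return results


# Tiny made-up sample standing in for a real export.
SAMPLE = (
    "Address\tType\tTitle\n"
    "http://example.com/\ttext/html\tHome\n"
    "http://example.com/long\ttext/html\t" + "x" * 70 + "\n"
    "http://example.com/logo.png\timage/png\t" + "y" * 80 + "\n"
)

print(long_titles(SAMPLE))
```

To run it on a real crawl, read the exported file into a string and pass that to long_titles instead of SAMPLE.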



Problem - How do I analyse my site's information architecture?



Yep, you guessed it - Xenu will do this too. This one requires a little more explaining, however. Firstly, notice that the spreadsheet above has a column called "level". This column tells you how many links away each crawled page is from the URL you started the crawl at. So in the example sheet all the pages have a level of 1, since I restricted the crawl to just those pages 1 link away from the homepage.




This is really useful information as it tells you how many clicks it takes to get to a given page on your site from the homepage, especially on a large site where you have multiple levels of information architecture and several different types of navigation. Below is a quick screenshot of a report run 3 levels deep on the site. I've pivot-tabled the data (zomg - Excel ftw) and selected the following options:








Of course, the beauty of pivot tables is that I can double-click each of those rows and see which pages are contained within each level. This is, of course, a pretty basic application of the data, but once you start getting more data you can do more powerful things.
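The same grouping can be roughed out without a pivot table. The sketch below is only an illustration: it assumes a tab-separated export with "Address" and "Level" columns (check the names in your own file), and pages_by_level and the sample rows are made up for the example.

```python
import csv
import io
from collections import defaultdict


def pages_by_level(tsv_text):
    """Group crawled addresses by their "Level" value (clicks from the start URL)."""
    rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    grouped = defaultdict(list)
    for row in rows:
        level = (row.get("Level") or "").strip()  # assumed column name
        if level.isdigit():  # skip rows with no usable level
            grouped[int(level)].append(row["Address"])
    return dict(sorted(grouped.items()))


# Made-up sample data, not a real crawl.
SAMPLE = (
    "Address\tLevel\n"
    "http://example.com/\t0\n"
    "http://example.com/a\t1\n"
    "http://example.com/b\t1\n"
    "http://example.com/a/c\t2\n"
)

groups = pages_by_level(SAMPLE)
for level, pages in groups.items():
    print(level, len(pages))
```

The page count per level plays the role of the pivot table rows, and looking up groups[1] is the equivalent of double-clicking the level 1 row.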



The second application of the very same data is the useful links in/links out columns, which look like this:







There are other ways of getting this data for your site (Linkscape does it, for example), but the good thing about Xenu is that you get the data structured in Excel and you have all the other page metrics alongside it. There's plenty more you can do with this, but at a very crude level you can use it to identify pages across your site that have more than 100 links on them!
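As a rough illustration of that crude check, here's a short Python sketch. It assumes the export is tab-separated and has "Address" and "Links Out" columns. Both names, along with the pages_with_many_links helper and the sample rows, are assumptions rather than anything confirmed by the tool's documentation.

```python
import csv
import io


def pages_with_many_links(tsv_text, limit=100):
    """Return (address, outgoing link count) for pages with more than limit links."""
    rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    flagged = []
    for row in rows:
        links_out = (row.get("Links Out") or "").strip()  # assumed column name
        if links_out.isdigit() and int(links_out) > limit:
            flagged.append((row["Address"], int(links_out)))
    return flagged


# Made-up sample data, not a real crawl.
SAMPLE = (
    "Address\tLinks Out\n"
    "http://example.com/\t42\n"
    "http://example.com/sitemap.html\t150\n"
    "http://example.com/logo.png\t\n"
)

print(pages_with_many_links(SAMPLE))
```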



Taking this data to the next level, here's a glimpse of what's possible: an analysis of type of page vs number of internal links shows that for this site (not the Distilled site) the money pages are getting very few internal links compared to top-level pages, so something is broken in the information architecture:
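A rough sketch of this kind of analysis in Python follows. How you define "type of page" depends entirely on your site. Here the first URL path segment is used as a stand-in, which is purely a hypothetical rule, as are the page_type and average_links_in helpers, the "Address" and "Links In" column names, and the sample data.

```python
import csv
import io
from collections import defaultdict
from urllib.parse import urlsplit


def page_type(address):
    """Hypothetical classifier: treat the first path segment as the page type."""
    segments = [s for s in urlsplit(address).path.split("/") if s]
    return segments[0] if segments else "home"


def average_links_in(tsv_text):
    """Return the mean number of internal links pointing at each page type."""
    rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    totals = defaultdict(lambda: [0, 0])  # type -> [sum of links in, page count]
    for row in rows:
        links_in = (row.get("Links In") or "").strip()  # assumed column name
        if links_in.isdigit():
            bucket = totals[page_type(row["Address"])]
            bucket[0] += int(links_in)
            bucket[1] += 1
    return {kind: total / count for kind, (total, count) in totals.items()}


# Made-up sample data: "product" stands in for the money pages.
SAMPLE = (
    "Address\tLinks In\n"
    "http://example.com/\t40\n"
    "http://example.com/category/shoes\t30\n"
    "http://example.com/category/hats\t20\n"
    "http://example.com/product/red-shoe\t2\n"
    "http://example.com/product/blue-hat\t4\n"
)

print(average_links_in(SAMPLE))
```

In this invented sample the product pages average far fewer internal links than the category pages, which is the sort of imbalance described above.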










Problem - How do I find any 302 redirects on my site?



Xenu to the rescue! In order to catch redirects on your site you need to modify one of the settings on the crawl preferences to "treat redirects as errors":












Then, when you run the report and export to Excel, redirects will no longer get the status code 200 but will get their true status code, be it 301 or 302! Perfect.
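Once the true status codes are in the export, pulling out the redirects is a simple filter. Here's a minimal Python sketch, assuming a tab-separated export with "Address" and "Status-Code" columns. The column names, the find_redirects helper and the sample rows are illustrative assumptions.

```python
import csv
import io

REDIRECT_CODES = {"301", "302"}


def find_redirects(tsv_text, codes=REDIRECT_CODES):
    """Return (address, status code) for every row whose status is a redirect."""
    rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    redirects = []
    for row in rows:
        status = (row.get("Status-Code") or "").strip()  # assumed column name
        if status in codes:
            redirects.append((row["Address"], int(status)))
    return redirects


# Made-up sample data, not a real crawl.
SAMPLE = (
    "Address\tStatus-Code\n"
    "http://example.com/\t200\n"
    "http://example.com/old\t301\n"
    "http://example.com/temp\t302\n"
    "http://example.com/gone\t404\n"
)

# Only the temporary redirects, which are usually the ones to fix.
print(find_redirects(SAMPLE, codes={"302"}))
```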



Problem - How do I check the indexing of a test version of my site?



Xenu of course! If your test version lies at a public URL such as testsite.distilled.co.uk then you can just point Xenu at that URL. However, if that's not an option then you can even run Xenu off a local HTML file which is pretty nifty:








Problem - How do I generate an XML sitemap for my site?



Although there are many, many ways of generating an XML sitemap for your site, Xenu does this in quite a nice (if not particularly customisable) way. I think this is perfect for small site owners with limited technical knowledge:
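If you do need more control than Xenu's built-in option gives you, a basic sitemap can also be built from a list of crawled URLs with a few lines of Python. This is a minimal sketch and not how Xenu itself does it. The build_sitemap helper and the example URLs are made up, and only the required loc element of the sitemaps.org format is written.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal XML sitemap (one <url><loc> entry per URL) as a string."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" when serialising.
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Made-up URLs standing in for the HTML pages found by a crawl.
EXAMPLE_URLS = [
    "http://example.com/",
    "http://example.com/search?q=links&page=2",
]

print(build_sitemap(EXAMPLE_URLS))
```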








Problem - How do I find images missing alt text?



If only Xenu would do such a thing.... Wait, it does! Simply filter your Excel download to image files, then the "Title" column is the alt text of the image:
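The same filter can be sketched in Python. It relies on the point made above that the "Title" column holds the alt text for image rows, and it assumes a tab-separated export with "Address", "Type" and "Title" columns. The column names, the images_missing_alt helper and the sample rows are assumptions for illustration.

```python
import csv
import io


def images_missing_alt(tsv_text):
    """Return addresses of image rows whose "Title" (alt text) is empty."""
    rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    missing = []
    for row in rows:
        content_type = row.get("Type") or ""         # assumed column name
        alt_text = (row.get("Title") or "").strip()  # assumed column name
        if content_type.startswith("image/") and not alt_text:
            missing.append(row["Address"])
    return missing


# Made-up sample data, not a real crawl.
SAMPLE = (
    "Address\tType\tTitle\n"
    "http://example.com/\ttext/html\tHome\n"
    "http://example.com/logo.png\timage/png\tCompany logo\n"
    "http://example.com/banner.jpg\timage/jpeg\t\n"
)

print(images_missing_alt(SAMPLE))
```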








Well, that's just a few of the many, many applications of the Xenu tool. Hopefully it's inspired you to go out and give it a try - I know I use it a lot for all kinds of things. I mean, once you get your data into Excel the world is literally your oyster. Mmmmmm data oysters.



But wait! That's not all. I reached out to Rich Baxter, as I know he's a very knowledgeable and smart SEO who uses Xenu a lot, and asked him if he had any killer tips. Here's his. Thanks a lot, Rich, for getting me this at short notice:



Crawling web directories, looking for errors (By Rich Baxter)



Xenu’s not just a great tool to look inside your own site, it’s also pretty powerful for crawling external resources like directories, particularly if you’re looking for a domain to buy.



Try crawling dmoz.org, being sure to restrict Xenu’s access to “editors.dmoz.org”, but allow the crawler to “check external links”.







Quite quickly you’ll start finding “not found” URL errors from directory entries that might have been forgotten, on domains that may not yet have expired. Just sort by “status” in the crawl results table in Xenu. Here’s one I found earlier. I’m pretty sure that with the right offer via SEDO, the owner of fridgemagnet.org.uk (with its 634 sub domain links) might be interested in selling before the domain expires.



I’ve always found the “Copy URL”, Google cache and Wayback Machine options invaluable; you get them by right-clicking the results you’re interested in:








As a side note: If you are crawling external resources, try to be a good citizen and crawl slowly. Set your maximum threads to a very low level, so as not to get your IP banned by your target host.



Thanks Rich! Great tips. Let's get link sleuthing! If anyone has any other creative/useful uses for Xenu please share them in the comments.




http://tinyurl.com/ykhwsce