Wednesday, 05 December 2007

How to Get Out of the Supplemental Index with Google Sitemaps

I'm trying to do a little maintenance on the blogging software I use to run this blog. For historical reasons, I use blojsom, although most of my other blogs are done using WordPress. Unfortunately, blojsom simply isn't as well-supported as WordPress, which means I haven't been able to find an equivalent to the Google Sitemap Generator for WordPress, although in honesty I could have gotten off my butt and written one myself. And I have, sort of, but that's a different post. There's now an official sitemap.xml file for this site that's been submitted to Google and hopefully will fix some of the wonkiness that's happened with this blog. That's what I want to talk about in this post: using Google Sitemaps to better index your blog or site.



What is a Google Sitemap?


You probably understand what a sitemap is, and it's something I recommend that each site have. A sitemap is just an HTML file that lists all the pages (or all the major pages) of your site. It's a simple way for both humans and search engines to find the other pages on your site.


In June of 2005, however, Google took the sitemap concept further and introduced Google Sitemaps, which is now part of Google's expanded set of Google Webmaster Tools.


A Google Sitemap is like a regular sitemap, except it's not HTML. It's in XML format, which looks a lot like HTML except that the tags are different. Here's a very simple sitemap for one of my sites:


<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
<url>
<loc>http://www.NoDebtIsGood.com/</loc>
<changefreq>daily</changefreq>
<priority>1.0</priority>
</url>
</urlset>

Create a file like this called sitemap.xml at the root of your site or blog and add a <url> entry for each page you want indexed by Google. (Yahoo! and MSN will soon be supporting this format, too, by the way. You can find out more at Sitemaps.org, the new site for the common Sitemaps format.) All the gory details are on the "Using the Sitemap Protocol" page.
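
If you'd rather not type the file by hand, a few lines of script will do it. Here's a minimal Python sketch, not the generator I actually use for this blog. The function name build_sitemap and the example.com addresses are placeholders I made up for illustration; the namespace is the same one used in the sample file.

```python
import xml.etree.ElementTree as ET

# Same namespace as the hand-written sample sitemap.
SITEMAP_NS = "http://www.google.com/schemas/sitemap/0.84"

# Serialize the sitemap namespace as the default (unprefixed) namespace.
ET.register_namespace("", SITEMAP_NS)


def build_sitemap(page_urls):
    """Return sitemap XML as bytes, with one <url> entry per page URL."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url in page_urls:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
    # xml_declaration=True needs Python 3.8 or later.
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


# Hypothetical pages; replace with the real URLs of your own site.
sitemap_bytes = build_sitemap([
    "http://www.example.com/",
    "http://www.example.com/about.html",
])
```

Write the returned bytes to sitemap.xml at the root of the site (open the file in "wb" mode) and you have the same kind of file as the sample, just with more entries.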



Why Use Google Sitemaps?


So why use Google Sitemaps if you already have a perfectly good HTML sitemap? There are several reasons. One is completeness. Your human-readable sitemap may not list every page on your site, both to prevent information overload and to avoid being flagged by the search engines for excessive linking. Another is that a Google Sitemap tells the search engine more about each page (how often it changes, when it was last updated, how important it is relative to your other pages) than a simple link does.
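
To make that extra information concrete, here's a rough Python sketch of building a single entry with the optional fields filled in. The helper name url_entry and the example.com address are my own inventions. The allowed change frequencies and the 0.0 to 1.0 priority range come from the Sitemap protocol, and I've left the namespace off to keep the sketch short.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Change-frequency values defined by the Sitemap protocol.
CHANGEFREQS = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}


def url_entry(loc, lastmod=None, changefreq=None, priority=None):
    """Build one <url> element; optional fields appear only when given."""
    if changefreq is not None and changefreq not in CHANGEFREQS:
        raise ValueError(f"unknown changefreq: {changefreq!r}")
    if priority is not None and not 0.0 <= priority <= 1.0:
        raise ValueError("priority must be between 0.0 and 1.0")

    url = ET.Element("url")
    ET.SubElement(url, "loc").text = loc
    if lastmod is not None:
        # A plain YYYY-MM-DD date is an accepted lastmod format.
        ET.SubElement(url, "lastmod").text = lastmod.isoformat()
    if changefreq is not None:
        ET.SubElement(url, "changefreq").text = changefreq
    if priority is not None:
        ET.SubElement(url, "priority").text = f"{priority:.1f}"
    return url


# Hypothetical front page that changes daily and matters most.
entry = url_entry(
    "http://www.example.com/",
    lastmod=date(2007, 12, 5),
    changefreq="daily",
    priority=1.0,
)
```

Keep in mind that priority only ranks pages against the other pages on your own site, so setting everything to 1.0 tells the search engine nothing.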


A really good reason, though, is to help deal with site mishaps. Say you've got pages accessible via a number of different URLs, possibly because you screwed up the configuration of the site once or twice. With a proper Sitemap you can “normalize” the site by providing Google with the list of “real” URLs and then taking steps to redirect all the other URLs to the normalized version. (This may or may not be simple to do, though.)
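
Here's one way the normalizing step might look in Python, strictly as a sketch. The rules are assumptions of mine, not a description of this blog's setup: one canonical host name, a trailing index.html dropped, and fragments thrown away. The host names are placeholders too. Your own duplicate URLs will probably need different rules.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical host names; substitute the ones your site answers to.
CANONICAL_HOST = "www.example.com"
HOST_ALIASES = {"example.com", "www.example.com"}


def canonical_url(url):
    """Return the "real" form of an absolute URL under the assumed rules."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host in HOST_ALIASES:
        host = CANONICAL_HOST
    path = parts.path or "/"
    if path.endswith("/index.html"):
        # /blog/index.html and /blog/ are treated as the same page.
        path = path[: -len("index.html")]
    # Keep the query string, drop the fragment.
    return urlunsplit((parts.scheme or "http", host, path, parts.query, ""))


def redirect_map(urls):
    """Map each non-canonical URL to the URL it should redirect to."""
    return {u: canonical_url(u) for u in urls if canonical_url(u) != u}


# Hypothetical URLs, as you might pull them out of a server log.
seen = [
    "http://example.com/index.html",
    "http://WWW.Example.com",
    "http://www.example.com/blog/post-1",
]
sitemap_urls = sorted({canonical_url(u) for u in seen})
redirects = redirect_map(seen)
```

The sitemap_urls list is what goes into the Sitemap, and redirects is the to-do list for the web server configuration.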


Supplemental Index Hell


Another reason to use Google Sitemaps is to get out of Google's supplemental index and back into the main index. The supplemental index is where misbehaving pages go, although the Google Webmaster FAQ doesn't put it so kindly. Google likes to index those bad pages for completeness, but avoids exposing them to searchers unless absolutely necessary. It's like being in Limbo, the first circle of Hell.


Sometimes you land in the supplemental index by accident more than anything. Having duplicate pages is one way. Having confused URL structures is another. This blog's been in supplemental hell for a while now, but I was just too lazy (again) to do anything about it. The new sitemap, which normalizes the site's URL structure, is my first step toward getting back into the main index. I'm not expecting a sitemap by itself to do this; I'll also need to fix the ways the blog entries link to each other, put some nofollows on certain links, and noodle with the robots.txt file. But hopefully I'll be able to make the transition within a month or two.



If you're running a site and are having problems getting properly indexed, try building and submitting a sitemap. A good place to start is with the Sitemaps FAQ.

