Online business Toolkit

Use the sitemap standard to help search engines

Tony Patton

Published: 11 Dec 2006 12:59 GMT


Most website owners want to increase their site's visibility and user traffic. One way to do so is through search engine optimisation.

Another method is to use sitemaps, which allow you to specify which pages a search engine should process or index. The sitemap concept was originally developed by Google, and Yahoo and MSN recently agreed to use the standard. This week, I examine the sitemap standard.

The need for a standard
Search engines use spiders to crawl the internet, locate pages, and index them in their databases. The process is resource intensive, and sometimes the pages you want indexed are overlooked while non-essential pages are indexed instead. A good example is Googlebot, Google's spider, which traverses the web looking for new and changed pages, then indexes and ranks them accordingly.

Sitemaps provide a way for websites to specify which pages within the site should be indexed and what new content has been added. In effect, a sitemap provides a communication channel between the search engine and the site. In theory, sitemaps can ease the resource burden on search engine spiders by reducing what they process, but at present sitemaps do not replace the crawling process.

What is a sitemap?
A sitemap is an XML file that contains a list of site URLs and related attributes detailing what should be indexed within a specific site. It must be UTF-8 encoded. The following XML elements are required in the sitemap file:

  • <urlset>: The file begins and ends with this tag, and the opening tag must include the namespace (xmlns) attribute.
  • <url>: Each page included in the file is enclosed in this element.
  • <loc>: The address of the page. It is a child of the <url> element.

The following optional elements are available as well:

  • <lastmod>: A child of the <url> element. It specifies when the page was last modified.
  • <changefreq>: A child of the <url> element. It specifies how often the page changes; valid values are always, hourly, daily, weekly, monthly, yearly, and never.
  • <priority>: A child of the <url> element. It specifies the importance of the page relative to other pages within the site; valid values range from 0.0 to 1.0, and the default is 0.5.
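
The value rules for the optional elements can be checked before a file is written. The following Python sketch is one possible way to do that; the function names check_optional_fields and effective_priority are hypothetical and not part of the sitemap standard.

```python
from typing import List, Optional

# Values the article lists for <changefreq>.
CHANGEFREQ_VALUES = {
    "always", "hourly", "daily", "weekly", "monthly", "yearly", "never",
}
# Default the article gives for <priority> when the element is omitted.
DEFAULT_PRIORITY = 0.5


def check_optional_fields(changefreq: Optional[str] = None,
                          priority: Optional[float] = None) -> List[str]:
    """Return a list of problems; an empty list means the values look valid.

    Hypothetical helper for illustration only.
    """
    problems = []
    if changefreq is not None and changefreq not in CHANGEFREQ_VALUES:
        problems.append(f"unknown changefreq: {changefreq!r}")
    if priority is not None and not 0.0 <= priority <= 1.0:
        problems.append(f"priority out of range: {priority}")
    return problems


def effective_priority(priority: Optional[float] = None) -> float:
    """Priority a page is treated as having when <priority> is left out."""
    return DEFAULT_PRIORITY if priority is None else priority
```

For example, check_optional_fields("daily", 0.3) reports no problems, while a changefreq of "fortnightly" or a priority of 1.5 would each be flagged.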

The following sample sitemap shows how these elements may be used for a sample site. It specifies the home page for a fictitious site, along with how often it changes, when it was last changed, and its priority within the site.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://www.test.com/</loc>
<lastmod>2006-11-20</lastmod>
<changefreq>daily</changefreq>
<priority>0.3</priority>
</url>
</urlset>
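
To see how a program might read such a file, here is a minimal Python sketch using the standard library's xml.etree.ElementTree module. The SAMPLE string repeats the sample's content, and read_sitemap is a hypothetical helper name. Because the <urlset> element declares a namespace, every element lookup has to be qualified with that namespace.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# The sample sitemap's content, repeated here so the sketch is self-contained.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://www.test.com/</loc>
<lastmod>2006-11-20</lastmod>
<changefreq>daily</changefreq>
<priority>0.3</priority>
</url>
</urlset>
"""


def read_sitemap(xml_text):
    """Return one dict per <url> entry (hypothetical helper)."""
    # Parse from bytes so the UTF-8 declaration in the file is honoured.
    root = ET.fromstring(xml_text.encode("utf-8"))
    ns = {"sm": SITEMAP_NS}
    entries = []
    for url in root.findall("sm:url", ns):
        entries.append({
            "loc": url.findtext("sm:loc", namespaces=ns),
            # Optional elements come back as None when absent.
            "lastmod": url.findtext("sm:lastmod", namespaces=ns),
            "changefreq": url.findtext("sm:changefreq", namespaces=ns),
            "priority": url.findtext("sm:priority", namespaces=ns),
        })
    return entries


if __name__ == "__main__":
    for entry in read_sitemap(SAMPLE):
        print(entry)
```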

The location of the sitemap file is up to you, but its location determines the set of URLs that may be included in it. For example, if the previous sample sitemap file is located at http://www.test.com/sitemap.xml, then it may include any URLs starting with http://www.test.com/. For this reason, it is suggested that you place sitemap files in your site's root directory. A sitemap file must not exceed 10 MB uncompressed. You may compress it with gzip to reduce bandwidth, but compression does not lift the limit; a site that needs more room should split its URLs across several sitemap files.
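
The location and size rules lend themselves to a quick automated check. The Python sketch below is illustrative only: the function names are hypothetical, the /catalog/ URL is an invented example, and the sketch assumes the 10 MB limit is measured against the uncompressed bytes.

```python
import gzip

MAX_SITEMAP_BYTES = 10 * 1024 * 1024  # the 10 MB limit mentioned in the text


def allowed_prefix(sitemap_url: str) -> str:
    """Directory part of the sitemap's own URL, including the trailing slash."""
    return sitemap_url.rsplit("/", 1)[0] + "/"


def in_scope(sitemap_url: str, page_url: str) -> bool:
    """True if page_url starts with the directory that holds the sitemap."""
    return page_url.startswith(allowed_prefix(sitemap_url))


def within_size_limit(xml_bytes: bytes) -> bool:
    """Check the size of the uncompressed sitemap (assumed basis of the limit)."""
    return len(xml_bytes) <= MAX_SITEMAP_BYTES


def compress_sitemap(xml_bytes: bytes) -> bytes:
    """gzip the sitemap, for example before saving it as sitemap.xml.gz."""
    return gzip.compress(xml_bytes)
```

With these helpers, a sitemap at http://www.test.com/sitemap.xml may list http://www.test.com/about.html, whereas a sitemap stored at http://www.test.com/catalog/sitemap.xml would not cover that page, which is why the root directory is the safest place for the file.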

Creating a sitemap
Since a sitemap is plain XML, you can easily create and edit one in any text editor, but there are also dedicated tools available. The following list provides a sample of currently available tools:

  • Node Map: A tool for the generation and validation of sitemap XML files.
  • Gsitemap: A sitemap generation tool built with the .NET Framework.
  • GSiteCrawler: A Windows-based tool for generating sitemap files.
  • phpSitemapNG: A free sitemaps generator written in PHP.
  • Google Sitemap Generator: A Python script that may be used to generate sitemap files.
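
If none of these tools fits, a small script can do the job. The following Python sketch is not taken from any of the tools listed; it is a minimal illustration that uses the standard library's xml.etree.ElementTree module (Python 3.8 or later), with build_sitemap and the page dictionary layout as assumptions made for this example.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build UTF-8 encoded sitemap XML from a list of page dicts.

    Each dict needs a "loc" key and may also carry "lastmod",
    "changefreq", and "priority" (hypothetical layout for this sketch).
    """
    # Writing xmlns as a plain attribute yields the required namespace declaration.
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page["loc"]
        for tag in ("lastmod", "changefreq", "priority"):
            if tag in page:
                ET.SubElement(url, tag).text = str(page[tag])
    # Returns bytes that start with an XML declaration naming UTF-8.
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__":
    data = build_sitemap([
        {"loc": "http://www.test.com/", "lastmod": "2006-11-20",
         "changefreq": "daily", "priority": 0.3},
    ])
    with open("sitemap.xml", "wb") as handle:
        handle.write(data)
```

ElementTree escapes characters such as & in the URLs as it writes them, which avoids one common mistake in hand-edited files.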

Notifying a search engine
Once you have a sitemap file, you must submit it to a search engine. Each search engine has its own interface for submitting sitemaps. Google includes a sitemap submission page as part of its webmaster toolset, and Yahoo offers a free submission page of its own; in both cases you must sign up for an account before you can use it. Other search engines are likely to provide similar functionality as they follow the lead of Google, Yahoo, and MSN.

Another tool
The crawling process by which search engines index the web is slow and resource intensive. Sitemaps provide a way for websites to specify which parts of their content should be indexed for searching. They are plain text files formatted as XML, so you can write them by hand, and there are plenty of tools available to assist with their creation. At this time, they only supplement the crawling process rather than replace it.
