Introduction

Sitemaps are XML files sitting on your web server that are used to give hints to search engines about the existence of new pages, how often pages change and the priority a search engine should give to each page.

The format is simple and complete documentation is available on sitemaps.org. There is a root urlset element which contains multiple url elements. Each url element must contain a loc element, which specifies the location of that URL. It can optionally contain three more elements: lastmod (the date and time at which the page was last modified), changefreq (how often you expect the page to change) and priority (the priority of that page compared to others on your website).
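To make the structure concrete, here is a minimal sketch in plain Ruby (no Rails or Builder) that prints a two-entry sitemap. The page data, the PAGES constant and the build_sitemap helper are all made up for illustration; only the element names and the namespace come from the sitemaps.org format.

```ruby
# Hypothetical page data: the URLs, dates and values are made up for illustration.
PAGES = [
  { loc: 'http://example.com/', lastmod: '2011-03-05',
    changefreq: 'daily', priority: '1.0' },
  { loc: 'http://example.com/photos?page=2&sort=new' } # only loc is required
]

# Builds the sitemap as a string: a urlset root with one url element per page.
def build_sitemap(pages)
  urls = pages.map do |page|
    # Emit the child elements in the order loc, lastmod, changefreq, priority,
    # skipping the optional ones that are absent.
    children = [:loc, :lastmod, :changefreq, :priority].map do |name|
      next unless page[name]
      # encode(xml: :text) escapes &, < and > so the value is safe inside XML
      "    <#{name}>#{page[name].encode(xml: :text)}</#{name}>"
    end
    "  <url>\n#{children.compact.join("\n")}\n  </url>"
  end

  <<~XML
    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    #{urls.join("\n")}
    </urlset>
  XML
end

puts build_sitemap(PAGES)
```

Note that the ampersand in the second URL comes out as &amp;amp; in the output, since URLs in a sitemap have to be entity-escaped like any other XML text.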

Implementing sitemaps in Rails

Rails makes it very easy to generate your own sitemap.xml file dynamically. The sitemap can be implemented using a simple controller to generate a list of pages and a view that uses Builder to generate the XML output.

The first step is to create the route in the config/routes.rb file, which will look like this:

get 'sitemap', :to => 'sitemap#show'

This route sends GET requests for sitemap.xml to the SitemapController’s show action. You do not need to specify .xml in the route: Rails routes accept an optional format suffix, so a request for /sitemap.xml matches this route and Rails picks the XML view automatically.

The next step is to create the controller in the app/controllers/sitemap_controller.rb file. This needs to fetch information about all of your pages from the database or disk or wherever you are storing them, so the code will vary. In this example I’ll pretend that I’ve got a Photo model and a page for each photo, along with a Link model used by a single page of links.

class SitemapController < ApplicationController
  def show
    # grab info about all the photos since they each have their own page
    @photos = Photo.all

    # grab info about the most recently-updated link as they share a page
    @link = Link.first :order => 'updated_at desc'
  end
end

This is fairly simple code that should be easily understood by anyone with a basic understanding of ActiveRecord.

The final part is to create a view called app/views/sitemap/show.xml.builder. The name signifies that this is a view for the XML format (as I said earlier, Rails uses this to automatically detect the view to use) using Builder as a template engine.

Builder uses a domain-specific language implemented on top of Ruby to allow you to easily create XML documents.

In general, your calls to Builder look like this:

xml.tag_name :attribute => 'value of attribute' do
  # nest tags here, can include ruby code to do loops, etc.
end

Here is the code in the view that we can use to produce the sitemap:

# this produces the <?xml ... ?> tag at the start of the document
#   note: this is different to calling builder normally as the <?xml?> tag
#         is very different to how you'd write a normal tag!
xml.instruct! :xml, :version => '1.0', :encoding => 'UTF-8'

# create the urlset
xml.urlset :xmlns => 'http://www.sitemaps.org/schemas/sitemap/0.9' do
  # photo pages
  @photos.each do |photo|
    xml.url do # create the url entry, with the specified location and date
      xml.loc photo_url(photo)
      xml.lastmod photo.updated_at.strftime('%Y-%m-%d')
    end
  end

  # links page
  xml.url do
    xml.loc links_url
    xml.lastmod @link.updated_at.strftime('%Y-%m-%d')
  end
end
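The lastmod element expects a W3C Datetime value, which may be a date on its own (as in the view above) or a full timestamp with a time zone. The following plain-Ruby sketch shows both forms; the timestamp is made up and stands in for photo.updated_at, and I am assuming that Rails timestamps respond to strftime and iso8601 in the same way as a plain Time.

```ruby
require 'time' # adds Time#iso8601

# A made-up modification time, standing in for photo.updated_at.
updated_at = Time.utc(2011, 3, 5, 14, 30, 0)

date_only = updated_at.strftime('%Y-%m-%d') # the form used in the view above
full      = updated_at.iso8601              # date, time and UTC offset

puts date_only # 2011-03-05
puts full      # 2011-03-05T14:30:00Z
```

The date-only form is usually enough; the full timestamp is only worth it if pages change several times a day.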

We’re almost done now. First I’d fire up WEBrick, request /sitemap.xml to check that it works and fix any issues. The last step is to create or modify your robots.txt file to specify the location of the sitemap.xml file.

To do this, simply add a line like the following to the bottom of your public/robots.txt file:

Sitemap: http://example.com/sitemap.xml

where example.com is your domain name.
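If you want to double-check that the line is picked up, a small script can pull the sitemap locations out of a robots.txt file. This is only a sketch: sitemap_urls is a hypothetical helper, the sample robots.txt content is made up, and matching the field name case-insensitively is my assumption rather than something taken from the sitemaps.org documentation.

```ruby
# Returns every URL named by a "Sitemap:" line in the given robots.txt text.
# Assumption: the field name is matched case-insensitively.
def sitemap_urls(robots_txt)
  robots_txt.each_line.map { |line| line[/\A\s*sitemap:\s*(\S+)/i, 1] }.compact
end

# Made-up sample content; in practice you would read public/robots.txt.
robots = <<~ROBOTS
  User-agent: *
  Disallow: /admin

  Sitemap: http://example.com/sitemap.xml
ROBOTS

p sitemap_urls(robots)
```

An empty array means no Sitemap line was found.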

Caching your sitemap

If you’re running a small site then you probably don’t need to worry about this. However, on a large site each request to your sitemap could end up pulling a large amount of data from your database so you may wish to cache it to speed things up.

On my sitemap I use page caching. To enable this, you simply need to add the following line to the top of your controller:

caches_page :show

This will make Rails save the page to the disk when it is generated, so Rails isn’t even involved when a client requests the page for the second time.

However, the final thing you need to do is to expire the cached sitemap when something changes. To do this, add the following snippet of code to the controller actions that create, update or delete the pages listed in the sitemap:

expire_page :controller => :sitemap, :action => :show

You might also want to look into sweepers to avoid copying and pasting that snippet of code around everywhere in a complex application, but if you’re just running a simple blog or personal site the code above will probably be sufficient.

Submitting your sitemap to search engines

Once you’re done, you’ll probably want to submit your sitemap to search engines. Generally this happens automatically, but some of them provide tools to see how often they look at your sitemap or if there are any problems with it.

For Google, you can do this with the webmaster tools. The official sitemaps website also has more information about this.