Dynamic Sitemap with CodeIgniter

Sitemaps are a great tool to help your site’s SEO. A Sitemap is just a list of URLs for your site that search engines may consider when indexing your website. Without a Sitemap, search engines have to discover the pages in your site organically by following links. With a Sitemap, you gain an advantage by presenting the list of pages you believe are important for indexing. There is no guarantee that Google or other search engines will actually index each page in your Sitemap, but without one you are relying on simple discovery alone. A great resource to start with is the Google Webmaster documentation on Sitemaps.

On a recent project for a recipe site using CodeIgniter, I wanted to create a Sitemap that would be dynamically updated every day as recipes are added, deleted or modified. My plan was:

  1. Write a library class and method to regenerate the sitemap.xml file with the current list of links along with the last modified date.
  2. Call this method from the command line interface (CLI) as needed.
  3. Schedule a cron job to run this command daily. I considered updating the sitemap “live” as recipes were added or updated, but I didn’t want to add that overhead to every recipe form submission. Besides, a daily update is more than enough for most search engines.
  4. Finally, submit the updated Sitemap to Google.

Most of the plan went well, but I hit a snag attempting to generate the sitemap from the command line on the host for our website, Hostmonster. That was the hardest part to debug, but read on for the solution.

The Sitemap

The goal is to create an XML file that adheres to the Sitemap Protocol syntax. The resulting file should look like this:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://example.com/</loc>
  </url>

  <url>
    <loc>http://example.com/main/about</loc>
  </url>

  <url>
    <loc>http://example.com/recipe/show/1/Brocolli-Salad</loc>
    <lastmod>2012-07-12</lastmod>
  </url>

  <url>
    <loc>http://example.com/recipe/show/3/Chicken-with-Peanut-Curry-Sauce</loc>
    <lastmod>2012-06-12</lastmod>
  </url>
 </urlset>

For small websites with no more than 50,000 links (and a resulting file no larger than 50MB), a single Sitemap file is fine. If your site is larger, you can break it into multiple Sitemaps and reference them from a Sitemap Index file. For most of us, this is not an issue. I’ve also included last modified dates where possible; there are other optional properties, such as changefreq and priority, that you might consider including as well.
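For a site that does outgrow a single file, the split could be sketched roughly as follows. This is only an illustration and not part of the library in this post: the function names chunk_sitemap_urls() and build_sitemap_index(), and the sitemap-N.xml file naming, are assumptions. The sitemapindex, sitemap and loc elements are the ones defined by the Sitemap Protocol.

```php
<?php
// Hypothetical helpers (not part of CodeIgniter) for splitting a large URL list.

// Split a flat list of URLs into groups of at most $limit entries,
// one group per Sitemap file.
function chunk_sitemap_urls($urls, $limit = 50000)
{
    return array_chunk($urls, $limit);
}

// Build the Sitemap Index XML that points at each separate Sitemap file.
// $base_url and the "sitemap-N.xml" naming scheme are assumptions of this sketch.
function build_sitemap_index($base_url, $file_count)
{
    $xml  = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n";
    $xml .= "<sitemapindex xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n";

    for ($i = 1; $i <= $file_count; $i++)
    {
        $loc  = rtrim($base_url, '/') . '/sitemap-' . $i . '.xml';
        $xml .= "\t<sitemap>\n\t\t<loc>" . htmlspecialchars($loc, ENT_QUOTES, 'UTF-8') . "</loc>\n\t</sitemap>\n";
    }

    $xml .= "</sitemapindex>\n";
    return $xml;
}
```

Each group returned by chunk_sitemap_urls() would be written to its own file, and the index file is then built from the number of groups.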

The Custom Library

This library class contains all the code to generate the Sitemap. We could have put all of this in a controller method, but it seemed cleaner to keep it in a custom library. If you use a prefix other than MY_, rename the file and class to match the custom prefix you set in $config['subclass_prefix'], and load the library under that name in the controller.

 

<?php if ( ! defined('BASEPATH')) exit('No direct script access allowed');

/**
 *  Sitemap Class
 *  Copyright (c) 2008 - 2013 All Rights Reserved.
 *
 *  Props to Mike's Imagination for the approach
 *  http://www.mikesimagination.net/blog/post/29-Aug-12/Codeigniter-auto-XML-sitemap
 *
 *  Generates sitemap
 */

class MY_sitemap {

  // CI instance property
  protected $ci;

  /**
   *  Constructor
   */
  public function __construct()
  {
    // Get the CI instance by reference to make the CI superobject available in this library
    $this->ci =& get_instance();
  }

  /**
   *  Generate sitemap
   */
  public function create()
  {
    // Begin assembling the sitemap starting with the header
    $sitemap = "<\x3Fxml version=\"1.0\" encoding=\"UTF-8\"\x3F>\n<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n";

    // Add static pages not in database to sitemap
    // Home page
    $sitemap .= "\t<url>\n\t\t<loc>" . site_url() . "</loc>\n\t</url>\n\n";
    // About page
    $sitemap .= "\t<url>\n\t\t<loc>" . site_url('main/about') . "</loc>\n\t</url>\n\n";

    // Get all recipes (records) from database. Load (or autoload) the model
    $this->ci->load->model('recipe_model');
    $recipes = $this->ci->recipe_model->find_where();

    // Add each recipe URL to the sitemap while enclosing the URL in the XML <url> tags
    // Since my database tracks the last updated date, I am including that as well - but with the date only in YYYY-MM-DD format
    foreach($recipes['results'] as $recipe)
    {
      $sitemap .= "\t<url>\n\t\t<loc>" . site_url('recipe/show/' . $recipe->get_nice_url()) . "</loc>\n";
      $sitemap .= "\t\t<lastmod>" . date('Y-m-d', strtotime($recipe->updated_date)) . "</lastmod>\n\t</url>\n\n";
    }

    // If you have other records you wish to include, get those and continue to append URLs to the sitemap.

    // Close with the footer
    $sitemap .= "</urlset>\n";

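    // Write the sitemap string to file. Make sure you have permissions to write to this file.
    $file = fopen('sitemap.xml', 'w');
    if ($file === FALSE)
    {
      // Stop here rather than calling fwrite() on an invalid handle
      log_message('error', 'Could not open sitemap.xml for writing.');
      return;
    }
    fwrite($file, $sitemap);
    fclose($file);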

    // If this is the production instance, attempt to update Google with the new sitemap.
    // (The instance is set in the index.php file)
    if(ENVIRONMENT === 'production')
    {
      // Ping Google via http request with the encoded sitemap URL
      $sitemap_url = site_url('sitemap.xml');
      $google_url = "http://www.google.com/webmasters/tools/ping?sitemap=".urlencode($sitemap_url);

      $ch = curl_init();
      curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 2);
      curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
      curl_setopt($ch, CURLOPT_URL, $google_url);
      $response = curl_exec($ch);
      $http_status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
      // Release the cURL handle once the status code has been read
      curl_close($ch);

      // Log error if update fails (any status outside the 2xx range)
      if ($http_status < 200 OR $http_status >= 300)
      {
        log_message('error', 'Ping Google with updated sitemap failed. Status: ' . $http_status);
        log_message('error', '    ' . $google_url);
      }
    }

    return;
  }
}

// End of file MY_sitemap
// Location: ./application/libraries/MY_sitemap.php
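
The library above concatenates URLs straight into the XML, which is fine for simple paths, but the Sitemap Protocol expects values to be entity-escaped (an & in a query string, for example). A possible helper is sketched below. It is an assumption of mine and not part of the library: the name sitemap_url_entry() and its $options array are hypothetical, while lastmod, changefreq and priority are the optional elements named by the protocol.

```php
<?php
// Hypothetical helper: build one <url> entry with escaped values.
// $options may contain 'lastmod' (any date string strtotime() can parse),
// 'changefreq' and 'priority'. All three are optional.
function sitemap_url_entry($loc, $options = array())
{
    $entry = "\t<url>\n\t\t<loc>" . htmlspecialchars($loc, ENT_QUOTES, 'UTF-8') . "</loc>\n";

    if ( ! empty($options['lastmod']))
    {
        $timestamp = strtotime($options['lastmod']);
        if ($timestamp !== FALSE)
        {
            // Date only, in YYYY-MM-DD format
            $entry .= "\t\t<lastmod>" . date('Y-m-d', $timestamp) . "</lastmod>\n";
        }
    }

    if (isset($options['changefreq']))
    {
        $entry .= "\t\t<changefreq>" . htmlspecialchars($options['changefreq'], ENT_QUOTES, 'UTF-8') . "</changefreq>\n";
    }

    if (isset($options['priority']))
    {
        // One decimal place, e.g. 0.8
        $entry .= "\t\t<priority>" . number_format((float) $options['priority'], 1, '.', '') . "</priority>\n";
    }

    return $entry . "\t</url>\n\n";
}
```

Inside the foreach loop, the two concatenation lines could then be replaced by a single call that passes the recipe URL and its updated date.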

The Controller

We are going to invoke the create() method from the command line using the CLI feature in CodeIgniter. However, the CodeIgniter CLI accepts its arguments as controller + method + parameters, so we need a simple controller method to call the library. Because we don’t want to refresh the Sitemap if someone stumbles onto this URL, we add a check with the $this->input->is_cli_request() function to see whether the request came from the CLI. Add something like this to an appropriate controller class:

/**
 *  Updates Sitemap.xml when called from the command line. Not available via URL
 */
public function generate_sitemap()
{
  // If not a command line request
  if( ! $this->input->is_cli_request())
  {
    // 404 error or maybe just redirect somewhere else
    show_404();
  }
  else
  {
    // Load the library under the same name as its file and class
    $this->load->library('MY_sitemap');
    $this->my_sitemap->create();
  }
}

Running the Sitemap Generator

After moving all the code to production, you will want to test this out on your hosting server. According to the CodeIgniter CLI documentation, the command should be:

$ php index.php controller method [arguments]

So in our case the command should be:

$ php index.php [your controller] generate_sitemap

But when I ran this statement, the application just returned the HTML from the default controller to my shell! For some reason it was not accepting the [your controller] generate_sitemap as arguments, and was simply invoking the index.php file. I was stumped. After much Googling, I found a clue:

# php -v
PHP 5.2.17 (cgi-fcgi) (built: Oct 29 2012 18:51:17)

The default PHP binary was compiled as CGI-FCGI, not as the Command Line Interface that I was seeking. Instead, I needed to use the PHP CLI binary at /ramdisk/bin/php5-cli:

# /ramdisk/bin/php5-cli -v
PHP 5.2.17 (cli) (built: Oct 29 2012 18:51:22)

That was it. The proper command to run this from the command line is:

/ramdisk/bin/php5-cli ~/public_html/sitefolder/index.php main generate_sitemap
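
If it is unclear which kind of binary a host provides, a small script along these lines can report it. This is a sketch under my own assumptions: the file name check_sapi.php and the function sapi_report() are hypothetical. php_sapi_name() is the standard PHP function that returns values such as cli or cgi-fcgi.

```php
<?php
// check_sapi.php (hypothetical file name): report how this PHP binary was built.

// Build a short report from a SAPI name and the arguments that were received.
function sapi_report($sapi, $args)
{
    $lines = array('SAPI: ' . $sapi);

    if ($sapi === 'cli')
    {
        $lines[] = 'This is the command-line binary.';
    }
    else
    {
        $lines[] = 'This is not the command-line binary; look for a separate CLI build.';
    }

    $lines[] = 'Arguments received: ' . (count($args) > 0 ? implode(' ', $args) : '(none)');

    return implode("\n", $lines) . "\n";
}

// $argv may be missing under CGI builds, so fall back to an empty list
$received = (isset($argv) && is_array($argv)) ? array_slice($argv, 1) : array();
echo sapi_report(php_sapi_name(), $received);
```

Running it once with the default php binary and once with the CLI binary, passing the same arguments both times, should make the difference between the two visible.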

And it worked! A sitemap.xml file was generated in the root of the website public folder. To schedule the cron job I simply used the utility in cPanel, although you could also set it up from the command line. I scheduled it to run once a day, at night. I checked the application log file after the first run to make sure there was no error in submitting the updated Sitemap to Google.

Publishing Your Sitemap.xml

Now that you have a Sitemap, you need to tell the search engines where to look. You can do this with your robots.txt file. If you don’t have a robots.txt file, just create one in your site root and add this line (or add the line to your existing robots.txt):

Sitemap: http://perisplaceforrecipes.com/sitemap.xml
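
If you would rather have a script add the line than edit the file by hand, the logic could look roughly like this. It is a sketch only: the function name add_sitemap_directive() is hypothetical, and it works on the file contents as a string so that reading and writing robots.txt is left to the caller.

```php
<?php
// Hypothetical helper: return robots.txt contents with the Sitemap line present.
// If the exact line is already there, the contents are returned unchanged.
function add_sitemap_directive($robots, $sitemap_url)
{
    $directive = 'Sitemap: ' . $sitemap_url;

    // Look for an existing, identical directive on any line
    foreach (preg_split('/\r\n|\r|\n/', $robots) as $line)
    {
        if (trim($line) === $directive)
        {
            return $robots;
        }
    }

    // Make sure the new directive starts on its own line
    if ($robots !== '' && substr($robots, -1) !== "\n")
    {
        $robots .= "\n";
    }

    return $robots . $directive . "\n";
}
```

Because the function returns the input unchanged when the line already exists, it is safe to run more than once.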

That’s all. If you have a Google Webmasters account, go there and open Optimization > Sitemaps for your site. There you can test whether Google can read your sitemap. Google also has a sitemap validator, which is worth running.
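
Before submitting, a quick local sanity check can catch obvious mistakes. The sketch below is my own and is not a replacement for Google's validator: the function name check_sitemap() is hypothetical, it relies on PHP's DOM extension, and it only checks well-formedness, the root element, the presence of loc, and a date-only lastmod.

```php
<?php
// Hypothetical checker: returns a list of problems found in a Sitemap string.
function check_sitemap($xml)
{
    $ns = 'http://www.sitemaps.org/schemas/sitemap/0.9';

    if (trim($xml) === '')
    {
        return array('File is empty');
    }

    // Collect parser errors quietly instead of emitting PHP warnings
    $previous = libxml_use_internal_errors(TRUE);
    $dom = new DOMDocument();
    $loaded = $dom->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ( ! $loaded)
    {
        return array('File is not well-formed XML');
    }

    $problems = array();
    $root = $dom->documentElement;
    if ($root->localName !== 'urlset' || $root->namespaceURI !== $ns)
    {
        $problems[] = 'Root element is not urlset in the Sitemap namespace';
    }

    $number = 0;
    foreach ($dom->getElementsByTagNameNS($ns, 'url') as $url)
    {
        $number++;
        $loc = $url->getElementsByTagNameNS($ns, 'loc');
        if ($loc->length !== 1 || trim($loc->item(0)->textContent) === '')
        {
            $problems[] = 'Entry ' . $number . ' has no usable loc';
        }

        foreach ($url->getElementsByTagNameNS($ns, 'lastmod') as $lastmod)
        {
            // This sketch only accepts the date-only form YYYY-MM-DD
            if ( ! preg_match('/^\d{4}-\d{2}-\d{2}$/', trim($lastmod->textContent)))
            {
                $problems[] = 'Entry ' . $number . ' has an unexpected lastmod';
            }
        }
    }

    return $problems;
}
```

An empty array means none of these checks failed; it does not guarantee that a search engine will accept the file.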

Although it’s too early to tell if we are getting any better SEO, I do know from Google Webmaster data that Google has now indexed 20 more pages than before (after just one day). Well worth the effort.
