
How to Generate Sitemap.xml On The Fly In Umbraco 7.4.0?

In simple terms, a sitemap is an XML file that lists the URLs of the individual pages on your website. It is like an index of every page on the site. The file should be easy to find so that search engine crawlers can discover it.

Simple Sitemap.xml Umbraco Handler

Umbraco is a fully featured open source content management system with the flexibility to run anything from small campaign or brochure sites right through to complex applications for Fortune 500 companies and some of the largest media sites in the world.

Sitemap.xml is an important component of SEO because it helps search engines index your website. Search engine robots use this file to discover the pages they should crawl and index.

Since the content of an Umbraco website is dynamic, it does not make much sense to maintain a static sitemap.xml file. It is much better to generate the XML on the fly from the current content and structure.

The logic for generating sitemap.xml should include only those pages that are visible to visitors. That means excluding pages that do not have a template defined (data nodes that Umbraco uses only to store data) and pages that are marked as invisible by some property (usually a Boolean property such as umbVisible).

The code that generates the output XML needs to take this into consideration.
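The filtering rule described above can be sketched in plain C# without any Umbraco dependency. PageInfo and SitemapFilterSketch are hypothetical names used only for illustration; in a real site the template id and the visibility flag would come from the published content node.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a content node; not an Umbraco type.
public class PageInfo
{
    public string Url { get; set; }
    public int TemplateId { get; set; }   // 0 means "no template", i.e. a data node
    public bool Visible { get; set; }     // assumed visibility flag set by editors
}

public static class SitemapFilterSketch
{
    // Keep only pages that have a template and are not hidden.
    public static IEnumerable<PageInfo> IndexablePages(IEnumerable<PageInfo> pages)
    {
        return pages.Where(p => p.TemplateId > 0 && p.Visible);
    }
}
```

Calling SitemapFilterSketch.IndexablePages on a list of nodes returns only the entries that should appear in the sitemap; data nodes and hidden pages are dropped.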

Regarding the structure, it should follow the recommendations of search engines such as Google, the dominant search engine these days, but it also needs to comply with others such as Bing and Yahoo. Recommendations about the structure of sitemap.xml can be found in the help documentation of these search engines.
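As a rough illustration of that structure, the sketch below builds a minimal document with LINQ to XML: a urlset root containing one url element per page, each with loc and lastmod children, all in the sitemaps.org namespace. SitemapXmlSketch and its Build method are hypothetical names, and the input is a simple list of URL and date pairs rather than real Umbraco content.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Xml.Linq;

public static class SitemapXmlSketch
{
    // Namespace defined by the sitemaps.org protocol.
    private static readonly XNamespace Ns = "http://www.sitemaps.org/schemas/sitemap/0.9";

    // Builds a urlset document from (URL, last modified) pairs.
    public static XDocument Build(IEnumerable<KeyValuePair<string, DateTime>> pages)
    {
        // Every element is created in the sitemap namespace so that the
        // serialized output carries a single default xmlns declaration.
        var urlset = new XElement(Ns + "urlset");

        foreach (var page in pages)
        {
            urlset.Add(new XElement(Ns + "url",
                new XElement(Ns + "loc", page.Key),
                new XElement(Ns + "lastmod",
                    page.Value.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture))));
        }

        return new XDocument(new XDeclaration("1.0", "utf-8", "yes"), urlset);
    }
}
```

The important detail is that the child elements use the same namespace as the root. Elements created without it would be serialized with an empty xmlns attribute, which crawlers may not accept.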

Since this output is generated automatically and does not need to be managed directly from the back-end, I decided to implement it as an HttpHandler. To make it work, the handler needs to be registered in the web.config file.

<configuration>
	<system.webServer>
		<handlers>
			<add verb="*" path="sitemap.xml" name="Sitemap" type="Umbraco.Cms.Custom.SEO.SitemapHandler, Umbraco.Cms.Custom" />
		</handlers>
	</system.webServer>
</configuration>

Umbraco lets you build a website rapidly, with powerful APIs and easy extensibility. The Umbraco API gives you programmatic access to everything in the CMS, and it is easy to use from Visual Studio or any other development tool.

The following code can be used out of the box. It mainly implements the sitemap.xml structure recommended by Google.

using System;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Text;
using System.Web;
using System.Web.Caching;
using System.Xml;
using System.Xml.Linq;
using Umbraco.Core;
using Umbraco.Core.Models;
using Umbraco.Web;

namespace Umbraco.Cms.Custom.SEO
{
    public class SitemapHandler : IHttpHandler
    {
        // Generated once per application domain, so it cannot clash with other cache keys.
        private static readonly string CACHE_KEY = Guid.NewGuid().ToString();

        public bool IsReusable
        {
            get { return true; }
        }

        public void ProcessRequest(HttpContext context)
        {
            // A plain handler runs outside the Umbraco request pipeline,
            // so make sure an UmbracoContext exists before querying content.
            UmbracoContext.EnsureContext(
                new HttpContextWrapper(HttpContext.Current),
                ApplicationContext.Current,
                true);

            GetSitemapXml(context);
        }

        // Call this (for example after content is published) to force a rebuild.
        // HttpRuntime.Cache is the same cache as context.Cache and needs no current request.
        public static void ClearCache()
        {
            HttpRuntime.Cache.Remove(CACHE_KEY);
        }

        private void GetSitemapXml(HttpContext context)
        {
            HttpResponse response = context.Response;

            // Read the cache once so the entry cannot expire between a check and a read.
            XDocument xdoc = context.Cache[CACHE_KEY] as XDocument;

            if (xdoc == null)
            {
                UmbracoHelper uHelper = new UmbracoHelper(UmbracoContext.Current);
                IPublishedContent siteRoot = uHelper.TypedContentAtRoot().First();

                XNamespace ns = "http://www.sitemaps.org/schemas/sitemap/0.9";
                XNamespace xhtml = "http://www.w3.org/1999/xhtml";

                // Every element must be created in the sitemap namespace. An un-namespaced
                // "urlset" combined with an xmlns attribute fails when the document is written.
                XElement root = new XElement(ns + "urlset",
                    new XAttribute("xmlns", ns),
                    new XAttribute(XNamespace.Xmlns + "xhtml", xhtml));

                xdoc = new XDocument(new XDeclaration("1.0", "utf-8", "yes"), root);

                // DescendantsOrSelf() also includes the root (home) page itself.
                // Only nodes with a template are real pages; add a visibility check here
                // if your document types define a property for it.
                foreach (IPublishedContent content in siteRoot.DescendantsOrSelf().Where(d => d.TemplateId > 0))
                {
                    root.Add(new XElement(ns + "url",
                        new XElement(ns + "loc", content.UrlWithDomain()),
                        new XElement(ns + "lastmod", content.UpdateDate.ToString("yyyy-MM-ddTHH:mm:sszzz", CultureInfo.InvariantCulture)),
                        new XElement(ns + "changefreq", "weekly")));
                }

                // Cache the document for one day (absolute expiration).
                context.Cache.Insert(CACHE_KEY, xdoc, null, DateTime.Now.AddDays(1), Cache.NoSlidingExpiration);
            }

            response.Clear();
            response.ContentType = "text/xml";

            // Disposing the writers flushes the XML to the response stream.
            using (StreamWriter streamWriter = new StreamWriter(response.OutputStream, Encoding.UTF8))
            using (XmlTextWriter xmlWriter = new XmlTextWriter(streamWriter))
            {
                xdoc.WriteTo(xmlWriter);
            }
            response.End();
        }
    }
}

Since the code walks the whole site structure, it makes sense to cache the output to reduce the load generated by search engine crawlers. The handler above keeps the generated document for one day; set the expiration period according to how often your content is updated.
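The idea of an absolute expiration combined with a manual clear can be sketched independently of ASP.NET. The ExpiringValue class below is a hypothetical, simplified stand-in and not the ASP.NET Cache used by the handler; the injectable clock is an assumption added only to make the behaviour easy to verify.

```csharp
using System;

// Hypothetical minimal cache for a single value with absolute expiration.
public class ExpiringValue<T> where T : class
{
    private readonly object _sync = new object();
    private readonly TimeSpan _lifetime;
    private readonly Func<DateTime> _clock;
    private T _value;
    private DateTime _expiresAt;

    public ExpiringValue(TimeSpan lifetime, Func<DateTime> clock = null)
    {
        _lifetime = lifetime;
        _clock = clock ?? (() => DateTime.UtcNow);
    }

    // Returns the cached value, rebuilding it when it is missing or expired.
    public T GetOrCreate(Func<T> factory)
    {
        lock (_sync)
        {
            DateTime now = _clock();
            if (_value == null || now >= _expiresAt)
            {
                _value = factory();
                _expiresAt = now + _lifetime;
            }
            return _value;
        }
    }

    // Drops the cached value so the next call rebuilds it.
    public void Clear()
    {
        lock (_sync)
        {
            _value = null;
        }
    }
}
```

A site that publishes several times a day might use a lifetime of an hour or call Clear whenever content is published, while a rarely updated site can keep the one-day period.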
