Sitecore Development / Kim Hornung

Thursday, August 30, 2007

Performance of Sitecore Query

As many Sitecore developers know, using Sitecore Query to retrieve items is a really powerful concept (if you don't know, have a look at http://sdn5.sitecore.net/Reference/Using%20Sitecore%20Query.aspx).

Sitecore Query allows you to write XPath-like expressions, so in a single line you can, for example, retrieve all items based on a specific template, filtered by various field values:

Item[] activityItems = activityFolderItem.Axes.SelectItems("descendant::*[@@templatekey='intranet.activity' and (@title != '' or @corporatetitle != '' or @headline != '')]");

This is really cool!

But what about performance?
It turns out that the previous code snippet is not as efficient as it could be. The flexibility of Sitecore Query does come with a price.

I did some testing on a relatively small data set, and it turned out that:

  • The first time the code is executed, it takes anywhere from 300 ms to 1 second! [updated Aug. 31, 2007: the original test was performed with a debugger attached; without it, the timings are more like 70-200 ms]
  • For subsequent executions, it takes 3-7 ms
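
If you want to reproduce measurements like these, a small timing helper is enough. The following is only a minimal sketch and not the test harness behind the numbers above: QueryTiming and Measure are hypothetical names, and the only framework class it relies on is System.Diagnostics.Stopwatch. It returns one timing per run so the first (cold) execution can be compared with the later ones.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper, not part of Sitecore.
public static class QueryTiming
{
    // Runs the action `runs` times and returns the elapsed milliseconds of each run.
    // Index 0 is the first execution, which includes any database fetches and cache fills.
    public static double[] Measure(Action action, int runs)
    {
        double[] timings = new double[runs];
        for (int i = 0; i < runs; i++)
        {
            Stopwatch watch = Stopwatch.StartNew();
            action();
            watch.Stop();
            timings[i] = watch.Elapsed.TotalMilliseconds;
        }
        return timings;
    }
}
```

Assuming the activityFolderItem from the snippet above and a string variable query holding the expression, a call could look like QueryTiming.Measure(delegate { activityFolderItem.Axes.SelectItems(query); }, 5). Run it without a debugger attached, since an attached debugger inflates the first timing considerably.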

So how to optimize the code?
The trick is to avoid the predicates. It is (at least with the current implementation of Sitecore Query) cheaper to simply return all descendants and then filter them in regular C# code:

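Item[] activityItems = activityFolderItem.Axes.GetDescendants();
if (activityItems != null)
{
    for (int i = 0; i < activityItems.Length; i++)
    {
        Item item = activityItems[i];
        // Same filter as the query predicate: template key plus at least one non-empty field
        if (item.Template.Key == "intranet.activity" && (item["title"] != "" || item["corporatetitle"] != "" || item["headline"] != ""))
        {
            // item matches the filter; handle it here
        }
    }
}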

With this simple change, the code runs much better:

  • The first time the code is executed, it now takes 1-3 ms [updated Aug. 31, 2007: this is not true, my test case was buggy. In fact, it takes almost as long as the original code - most of the time is spent fetching items from the database]
  • For subsequent executions, it now takes ½-1½ ms

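Your measurements might vary, but the difference should be significant for subsequent executions. As the update above notes, the first execution is dominated by fetching items from the database, so it takes almost as long either way.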

Happy coding :o)


Wednesday, August 01, 2007

Understanding Sitecore's path cache

Just a quick note about Sitecore's so-called "path cache". Knowing how it works enables you to write more efficient code.

Let's look at some code
I recently saw the following lines of code in a Sitecore solution that I was reviewing:

public Item GetReference(string referenceName)
{
    Item root = Sitecore.Context.ContentDatabase.Items["/sitecore/content/home/references/"];
    Sitecore.Diagnostics.Error.AssertObject(root, "References folder");
    Item reference = root.Children[referenceName];
    return reference;
}

So what's wrong with this code?
Actually, there's nothing wrong with the code seen from a coding point-of-view. I really like this approach where you assert that the references item actually exists.

But the above code isn't as efficient as it could be. And this becomes more and more apparent as the users add more and more subitems below the /references folder.

The problem with the code is the use of the Children[] collection. This is a potentially expensive operation, even if the collection of children is cached (which it normally would be).

How can the code be improved?
A simple restructuring of the code can make it a bit more efficient - and the more subitems below /references, the larger the difference:

public Item GetReference(string referenceName)
{
    Item reference = Sitecore.Context.ContentDatabase.Items["/sitecore/content/home/references/" + referenceName];
    if (reference == null)
    {
        Item root = Sitecore.Context.ContentDatabase.Items["/sitecore/content/home/references"];
        Sitecore.Diagnostics.Error.AssertObject(root, "References folder");
    }
    return reference;
}

Notice how we have gotten rid of accessing the Children collection, while still checking whether the references folder exists (and yes, you might as well make an event handler to prevent the user from moving, deleting or renaming this folder, but that's a completely different discussion).

And why is this more efficient?
The first time you look up a referenceName, Sitecore will traverse the path and therefore also load the children. But once the item is found, the path and the corresponding item ID will be stored in the "path cache", making any additional lookups for this referenceName lightning fast - no matter how many subitems there are below /references.
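
To make the idea concrete, here is a small, self-contained sketch of a path-to-ID cache. It is not Sitecore's implementation: Node, PathLookup and the Traversals counter are hypothetical names invented for illustration, and the case-insensitive keys are an assumption. It only shows the principle described above: the first lookup of a path walks the tree and scans children, while repeated lookups of the same path are a dictionary hit.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a content item; this is not a Sitecore type.
public class Node
{
    public readonly Guid Id = Guid.NewGuid();
    public readonly string Name;
    public readonly List<Node> Children = new List<Node>();

    public Node(string name) { Name = name; }
}

public class PathLookup
{
    private readonly Node root;
    // Stand-in for an item cache: ID -> item.
    private readonly Dictionary<Guid, Node> itemsById = new Dictionary<Guid, Node>();
    // The "path cache" idea: full path -> item ID. Case-insensitive keys are an assumption.
    private readonly Dictionary<string, Guid> pathCache =
        new Dictionary<string, Guid>(StringComparer.OrdinalIgnoreCase);

    // Number of lookups that had to walk the tree.
    public int Traversals;

    public PathLookup(Node root) { this.root = root; }

    public Node Get(string path)
    {
        Guid id;
        if (pathCache.TryGetValue(path, out id))
        {
            return itemsById[id]; // cache hit: no children are scanned
        }

        Traversals++;
        string[] segments = path.Split(new char[] { '/' }, StringSplitOptions.RemoveEmptyEntries);
        if (segments.Length == 0 || !SameName(segments[0], root.Name))
        {
            return null;
        }

        Node current = root;
        for (int i = 1; i < segments.Length; i++)
        {
            Node next = null;
            foreach (Node child in current.Children) // linear scan of the children
            {
                if (SameName(child.Name, segments[i])) { next = child; break; }
            }
            if (next == null) return null; // misses are not cached in this sketch
            current = next;
        }

        itemsById[current.Id] = current;
        pathCache[path] = current.Id;
        return current;
    }

    private static bool SameName(string a, string b)
    {
        return string.Equals(a, b, StringComparison.OrdinalIgnoreCase);
    }
}
```

Calling Get twice with the same path leaves Traversals at 1, regardless of how many children the parent node has. The sketch deliberately ignores invalidation when items are moved, renamed or deleted, which a real cache has to handle.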

Conclusion: Taking advantage of Sitecore's path cache gives you better performance. You should of course NOT substitute this for structuring your subitems in a sensible way. But even with a reasonable number of subitems, looking up items by their full path will be more efficient than accessing the Children[] collection of the parent item.