Dictionary is Not Suitable for Caching

September 15, 2015 Steve Hawley

Let’s start off by first defining a cache. A cache is a table of values that are associated with a set of keys, where the values are derived from the keys. We use caches because we either want to associate a single object with each key; perhaps because each object is huge and we want to avoid allocating multiple ones or perhaps the initial calculation or retrieval time for one of the values is onerous.

Dictionary<T,U> is a handy generic class that gets used frequently for writing caches. It’s got the tools in it that allow you to use it as a cache – it associates keys with values, which is what we want.

Unfortunately, the interfacing is into Dictionary<T, U> makes it barely suitable for the task.

To illustrate the problem, let’s start by writing a cache the way a lot of us do, by solving the problem in front of us and not the general problem (which is a violation of D.R.Y. and you know you’ve done that too):

private Dictionary<string, Person> _cache = new Dictionary<string, Person>();

public Person Get(string name)
{
	Person person = null;
	if (!_cache.TryGetValue(name, out person))
	{
		person = RetrievePerson(name);
		_cache.Add(name, person);
	}
	return value;
}

I’ve written this in the typical style. The problem stems from the issue that I don’t know what the value for person will be unless it’s in the table. If it’s not in the table, I have to hash the name twice. The problem is that Dictionary is missing the right level of granularity. In reality, what I want is a process like this:

  1. Find the right location in the table and determine if the table contains the item
  2. If the table doesn’t contain the item, perform the calculation and put the value in the table
  3. Return the value

Given that, we should never be writing the above code whenever we need a cache. What we should be doing is generalizing this code. I’ll give a sample chunk of code that will at least prevent us from violating DRY. In reality, this code is a specific example of memoizing. You can use any typical memoizing code, but you should be aware that the textbook examples of memoizing rarely have a way of limiting the size of your cache and that’s a bad thing if your dataset is big. This example will at least give the tools to manage it yourself.

First we’ll start with an interface. Why? Because when we use a cache in a public property or method, we don’t have to specify how the cache is implemented. This is similar to the IList/List pairing.

 

public delegate U CacheFactory<T, U>(T key);

public interface ICache<T, U>
{
	U Get(T key);
	U Get(T key, CacheFactory<T, U> f);
	int Count { get; }
	void Clear();
}

In the interface, we declare a delegate which is the factory function that can turn a key into a value. You’ll notice also that there are two flavors of Get, one that takes a factory and one that does not. This is a concession that we’re likely to have two flavors of factories: one that can be known at the time of cache creation and one that can be known at the time of cache use. It’s a reasonable concession to making the experience of using the cache simpler.

It should be noted that I don’t bother with Contains() because in reality we don’t care if the cache contains an item or not. We only care about getting a value. If you really wanted, you could add methods that return an IEnumerable<T> for the keys and an IEnumerable<T> for the values, but again this is not typical use.

Here’s a simple implementation that is based on Dictionary<T,U>. I’m not going to implement a full hashtable here; exercise for the reader and all that.

   

internal class SimpleCache<T, U> : ICache<T, U>
{

	private Dictionary<T, U> _cache = new Dictionary<T,U>();

	private CacheFactory<T, U> _factory;

	public SimpleCache(Factory factory)
	{
		_factory = factory;
	}

	public SimpleCache() : this(null) { }

	public U Get(T key) { return Get(key, _factory); }

	public U Get(T key, CacheFactory<T, U> factory)
	{
		factory = factory ?? _factory;
		if (factory == null)
			throw new ArgumentNullException("factory");

		U value = default(U);
		if (!_cache.TryGetValue(key, out value))
		{
			value = factory(key);
			_cache.Add(key, value);
		}
		return value;
	}

	public int Count { get { return _cache.Count; } }

	public void Clear() { _cache.Clear(); }
}

Now you can take your textbook implementation of a hashtable and plug it directly into the get method above and you will have an even better implementation of a cache. Even without going that far, the above code is suitable enough to be a win because at least you won’t be writing WET code.

 

About the Author

Steve Hawley

Steve has been with Atalasoft since 2005. Not only is he responsible for the architecture and development of DotImage, he is one of the masterminds behind Bacon Day. Steve has over 20 years of experience with companies like Bell Communications Research, Adobe Systems, Newfire, Presto Technologies.

Follow on Twitter More Content by Steve Hawley
Previous Article
New"-ish" Feature: AtalaImage Debugger Visualizer
New"-ish" Feature: AtalaImage Debugger Visualizer

How to use the new AtalaImage Debugger Visualizer.

Next Article
Image Compression and Why You Should Care
Image Compression and Why You Should Care

This article discusses image compression and the challenges of scanned documents in an increasingly paperle...

Try any of our Imaging SDKs free for 30 days with Full Support

Download Now