Being Lazy is a Virtue

December 19, 2014 Steve Hawley

One of the things that I admire in solid programmers is laziness.  Some programmers will go to extreme lengths to avoid doing certain classes of work, and frankly, I can’t blame them.  For example, if you need to have a set of lookup tables in code from some specification, you could just pull up the spec and start typing in the tables.  A clever programmer will look at the task and start asking the following questions:

  1. How much typing will each entry in the table take (and by extension, how long will the whole task take)?
  2. How many mistakes will I make just typing?
  3. How long will it take to find all of them?
  4. How long will it take to fix all of them?
  5. How irate will my customers get because of 3 and 4?

When the answer to 1 gets measured in any units greater than ‘minutes’ and the answer to 3 in units of days to years, a very clever programmer will also ask, “how can I automate this?”

This week I was looking at creating a number of tables (22 in all), each of which had somewhere around 200 entries in it.  That’s around 4000-4800 entries to type in.  Yuck.  That’s easily 3 days of typing, and the error rate would probably be about 10%, which would mean another day to catch 95% of those errors and weeks for the rest.

Enter the robot.

First trick: look at your source data and see if you can easily parse it.  For example, I needed to parse Adobe Font Metrics files, and the spec is blissfully simple.  It’s mostly a collection of one key/value pair per line, along with a set of separately delimited subsections whose lines each hold several differently formatted key/value pairs.  In this case, I decided to put reflection to work for me, since all of the keys overlapped the set of names allowed for .NET object properties.  On top of that, the tokenization can be done with String.Split().  I’m already on board.  Here’s the guts of the parsing code:

 
while (true)
{
    string line = reader.ReadLine();
    if (line == null)
        break;  // end of file
    // tokenize on whitespace - AFM is mostly one key/value pair per line
    string[] pieces = line.Split(new char[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
    if (pieces.Length == 0)
        continue;  // skip blank lines
    string delimiter = pieces[0];
    switch (delimiter)
    {
        case "StartCharMetrics":
            // the char metrics subsection has its own format - hand it off
            ParseCharMetrics(metrics, reader, GetInteger(pieces[1]));
            break;
        default:
            // everything else maps onto a property via reflection
            SetProperty(metrics, delimiter, pieces);
            break;
    }
}

SetProperty is very simple – it just looks for a settable property named by delimiter and, when it finds one, coerces the string value to the property type and sets it.  Most of the conversion is done with System.Convert, which is easy.
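
A minimal sketch of how that can work, using the signature implied by the loop above – the body here is my reconstruction, not the shipping code:

// Sketch only - assumes using System, System.Globalization, and System.Reflection.
private static void SetProperty(GlobalFontMetrics metrics, string key, string[] pieces)
{
    PropertyInfo prop = typeof(GlobalFontMetrics).GetProperty(key);
    if (prop == null || !prop.CanWrite)
        return; // no matching settable property - ignore the key
    if (pieces.Length < 2)
        return; // key with no value
    object value;
    if (prop.PropertyType == typeof(double[]))
    {
        // array-valued keys like FontBBox carry several numeric tokens
        double[] numbers = new double[pieces.Length - 1];
        for (int i = 1; i < pieces.Length; i++)
            numbers[i - 1] = double.Parse(pieces[i], CultureInfo.InvariantCulture);
        value = numbers;
    }
    else if (prop.PropertyType == typeof(string))
    {
        // string values like Notice may contain spaces - rejoin the tokens
        value = string.Join(" ", pieces, 1, pieces.Length - 1);
    }
    else
    {
        // int, double, and bool all go through System.Convert
        value = Convert.ChangeType(pieces[1], prop.PropertyType, CultureInfo.InvariantCulture);
    }
    prop.SetValue(metrics, value, null);
}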

For the main AFM file, I project the data onto an object that looks like this:

 
class GlobalFontMetrics
{
    public string FontName { get; set; }
    public string FullName { get; set; }
    public string FamilyName { get; set; }
    public string Weight { get; set; }
    public double[] FontBBox { get; set; }
    public string Version { get; set; }
    public string Notice { get; set; }
    public string EncodingScheme { get; set; }
    public int MappingScheme { get; set; }
    public int EscChar { get; set; }
    public string CharacterSet { get; set; }
    public int Characters { get; set; }
    public bool IsBaseFont { get; set; }
    public double[] VVector { get; set; }
    public bool IsFixedV { get; set; }
    public double CapHeight { get; set; }
    public double Ascender { get; set; }
    public double Descender { get; set; }
    public double StdHW { get; set; }
    public double StdVW { get; set; }
    public double XHeight { get; set; }
}
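
The one place the one-key-per-line pattern breaks down is the char metrics subsection, where each line carries several semicolon-separated key/value pairs, e.g. “C 32 ; WX 600 ; N space ;”.  Here’s a sketch of how ParseCharMetrics could handle that – the CharMetric type and the CharMetrics list are stand-ins of my own invention:

// Hypothetical holder for one character entry.
class CharMetric
{
    public int Code { get; set; }
    public double Width { get; set; }
    public string Name { get; set; }
}

// Sketch of the subsection parser.
private static void ParseCharMetrics(GlobalFontMetrics metrics, TextReader reader, int count)
{
    for (int i = 0; i < count; i++)
    {
        string line = reader.ReadLine();
        if (line == null || line.StartsWith("EndCharMetrics"))
            break;
        CharMetric metric = new CharMetric();
        // split "C 32 ; WX 600 ; N space ;" into its pairs
        foreach (string pair in line.Split(new char[] { ';' }, StringSplitOptions.RemoveEmptyEntries))
        {
            string[] kv = pair.Split(new char[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
            if (kv.Length < 2)
                continue;
            switch (kv[0])
            {
                case "C": metric.Code = int.Parse(kv[1]); break;
                case "WX": metric.Width = double.Parse(kv[1], CultureInfo.InvariantCulture); break;
                case "N": metric.Name = kv[1]; break;
            }
        }
        metrics.CharMetrics.Add(metric); // assumes a List<CharMetric> property on the metrics object
    }
}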

Now, all of this code is scaffolded into my unit tests, and the output of each test is one or more .cs files containing code that constructs the tables from the AFM data. These source files are then added into the main project – but not automatically.
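
The generation step itself doesn’t need to be fancy.  Something in the spirit of the following is enough – the file and class names here are invented for illustration:

// Illustration only - a test emits one table as C# source.
// Assumes using System.Globalization and System.IO.
using (StreamWriter writer = new StreamWriter("CourierMetrics.generated.cs"))
{
    writer.WriteLine("// Generated from Courier.afm - do not edit by hand.");
    writer.WriteLine("internal static class CourierMetrics");
    writer.WriteLine("{");
    writer.WriteLine("    public const double CapHeight = {0};",
        metrics.CapHeight.ToString(CultureInfo.InvariantCulture));
    writer.WriteLine("    public const double Ascender = {0};",
        metrics.Ascender.ToString(CultureInfo.InvariantCulture));
    writer.WriteLine("    public const double Descender = {0};",
        metrics.Descender.ToString(CultureInfo.InvariantCulture));
    writer.WriteLine("}");
}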

Now, suppose that you have source data that isn’t quite so easy to parse as AFM files.  In my case I needed to build encoding tables for the various supported standard encodings in PDF, including StandardEncoding, MacRomanEncoding, and WinAnsiEncoding.  These are listed out in the spec as tables.  Fortunately, I could copy them within Acrobat and then paste them into Excel.  From Excel, I ran a set of functions that transformed the data into abbreviated lists that I could use, and then was able to turn that into C# with a handful of regular expressions.  This is less than ideal, since the tables are not as easily reproducible, but hopefully this won’t have to be done again.
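
For flavor, the kind of regular expression involved might look like this – the column layout here is invented, since the real pasted tables had their own:

// Illustration only - assumes using System.Text.RegularExpressions.
// Turns pasted "glyphname<TAB>code" lines into C# initializer entries,
// so "quotesingle\t39" becomes { 39, "quotesingle" },
string csharp = Regex.Replace(pastedText,
    @"^(\w+)\t(\d+)\r?$",
    "    { $2, \"$1\" },",
    RegexOptions.Multiline);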

All in all, the code and Excel chicanery took me two days to implement, verify, and unit test – far less than just typing in the tables – and I’m far more confident that the data is correct.

About the Author

Steve Hawley

Steve was with Atalasoft from 2005 until 2015. He was responsible for the architecture and development of DotImage, and one of the masterminds behind Bacon Day. Steve has over 20 years of experience with companies like Bell Communications Research, Adobe Systems, Newfire, and Presto Technologies.
