Your Whole Programming Language is a Set of Domain-Specific Languages

May 5, 2014 Steve Hawley

 

A Domain-Specific Language (DSL) is a small language used to make routine tasks in a particular problem domain easier. Examples of DSLs include spreadsheet macros, the Unix software build utility known as Make, and the simple language implemented by the virtual machine I wrote to parse PDF.

When you consider the syntax of most modern-ish programming languages (I’m looking at you, C++, Java, C#, and F#), nearly all of them are a hodge-podge of DSLs jammed together. This is sometimes a horrible thing, and unfortunately it’s our own fault. It stems from how we got here in the first place and how we saw our problem domain.

The first thing that comes to mind is assignment, which is the first DSL. Value mutation is a direct reflection of the initial implementation of hardware. We had memory that was used to hold numbers and we needed a way to put values into cells and get them back out. Thus was born the “move” instruction (or load/store instructions in accumulator machines), which was directly reflected in assignment. In C/C++/C#/Java, this looks like variable = value; in F# it looks like variable <- value. (Yes, I do understand that F# discourages you from using assignment by making mutability a specific annotation you have to apply before you can use the <- operator.)

Next there are arithmetic expressions. Nearly every programming language has a parser that includes an arithmetic expression handler, and this is because we’re all taught traditional arithmetic using the usual PEMDAS rules. This isn’t a surprise, since mathematical computations were the first applications that ran on computers. The problem is that infix notation doesn’t really fit in with the rest of the syntax. I wonder if maybe FORTH and LISP/Scheme had it right by using strictly postfix or prefix operations. C took infix to an extreme with the addition of bit operations and pre- and post-increment and decrement. As a result, you had to have what one of my professors referred to as the periodic chart to figure out precedence.
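To make the contrast concrete, here is a minimal sketch in Python (not one of the languages discussed above; the names OPS, eval_postfix and eval_prefix are made up for this illustration). It evaluates the same calculation written in postfix and in prefix form. Neither evaluator needs a precedence table, because the order of operations is carried entirely by the position of the operators.

```python
import operator

# Binary operators supported by this toy evaluator (an illustrative subset).
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}

def eval_postfix(source):
    """Evaluate a whitespace-separated postfix (FORTH-style) expression."""
    stack = []
    for token in source.split():
        if token in OPS:
            # The operator applies to the two most recent values.
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[token](left, right))
        else:
            stack.append(float(token))
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]

def eval_prefix(form):
    """Evaluate a nested-tuple prefix (Lisp-style) form, e.g. ("+", 1, ("*", 2, 3))."""
    if not isinstance(form, tuple):
        return form  # a plain number evaluates to itself
    op, left, right = form
    return OPS[op](eval_prefix(left), eval_prefix(right))
```

With infix, 1 + 2 * 3 only means 7 because the reader knows that * binds tighter than +. The postfix string "1 2 3 * +" and the prefix form ("+", 1, ("*", 2, 3)) say the same thing with no precedence rules at all.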

Next there are conditionals – if/else expressions again are fundamentally different from everything else and are there to match up with our typical thinking. Lisp/Scheme again have something more uniform: (if expr true-clause else-clause). It more or less matches the syntax of arithmetic, except that the semantics are very different: neither true-clause nor else-clause is evaluated until expr has been evaluated, and then only one of them is. So again, another DSL created by a necessary semantic island. As an aside, this type of problem can be completely avoided by making evaluation of arguments lazy rather than eager. When all arguments to functions are lazy, there need not be a difference between an if-construct and a function.
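The point about laziness can be sketched in Python, which evaluates arguments eagerly. In this illustration (all function names are hypothetical), laziness is simulated by passing zero-argument lambdas, so an ordinary function can behave like an if-construct. The eager version shows why the built-in conditional has to be a special form in an eager language.

```python
def eager_if(condition, then_value, else_value):
    """An 'if' written as a plain function: both branches were already
    evaluated by the caller before this body runs."""
    return then_value if condition else else_value

def lazy_if(condition, then_thunk, else_thunk):
    """The same function, but each branch arrives as a zero-argument
    callable (a thunk) and only the chosen one is ever called."""
    return then_thunk() if condition else else_thunk()

def unsafe_reciprocal(x):
    # 1 / x is computed while building the argument list, even when x == 0.
    return eager_if(x != 0, 1 / x, 0.0)

def safe_reciprocal(x):
    # The division is wrapped in a lambda, so it runs only when x != 0.
    return lazy_if(x != 0, lambda: 1 / x, lambda: 0.0)
```

Calling unsafe_reciprocal(0) raises ZeroDivisionError because the division happens before eager_if is entered, while safe_reciprocal(0) returns 0.0. In a language where every argument is lazy by default, the lambdas would be unnecessary and the function form would be enough.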

Next are loops – we like to be able to do repetition, so most languages have at least three looping constructs: for, while, and do-while/repeat-until. I’ve always liked the C DSL for for-loops, for (init-expr; test-expr; post-expr) body-expr, but we need to keep in mind that again the loop expressions are unlike anything else heretofore, so they represent their own little sub-language. Once again F# discourages you from using for-loops by making them broken and making tail-recursion cheap. To be fair, F# inherited the broken syntax from OCaml.
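As a rough illustration of how much of this is sub-language rather than necessity, the sketch below writes one summation three ways in Python (function names are made up for the example): with the for construct, with the pieces of a C-style for spelled out as a while loop, and as a tail-recursive function with an accumulator. Note the assumption baked into the last one: Python does not eliminate tail calls, so it only stands in for what a language with cheap tail recursion would do.

```python
def sum_for(values):
    """Summation using the dedicated for-loop construct."""
    total = 0
    for value in values:
        total += value
    return total

def sum_while(values):
    """The same loop with the parts of for (init; test; post) body made explicit."""
    total = 0
    i = 0                       # init-expr
    while i < len(values):      # test-expr
        total += values[i]      # body-expr
        i += 1                  # post-expr
    return total

def sum_tail(values, i=0, total=0):
    """Tail-recursive form: the loop state travels in the arguments.
    Python keeps a stack frame per call, so this is limited by the
    recursion depth; it is here only to show the shape."""
    if i >= len(values):
        return total
    return sum_tail(values, i + 1, total + values[i])
```

All three compute the same result, but only the first two rely on syntax that exists purely for repetition; the third uses nothing beyond conditionals and function calls.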

Function calls are yet another DSL which is nothing like the rest.

All of these little DSLs exist either because they map onto the hardware task, onto a known abstraction, or to make a common task easier. A prime example of the latter in C# is the event model, which allows you to implement the Observer pattern with only a modicum of fuss. It still stinks, by the way – the amount of repeated code needed to implement the pattern correctly and in a thread-safe way is unacceptable. I guess it’s a small selling point to say “at least it’s better than Java/C++”.
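For a sense of what a language without an event construct leaves you to write by hand, here is a minimal Observer sketch in Python. The Event class and its method names are hypothetical, not the C# event model or any library's API; it only shows the kind of plumbing (a lock around the handler list, a snapshot before dispatch) that thread safety demands.

```python
import threading

class Event:
    """A minimal thread-safe event: handlers subscribe, raise_event notifies them."""

    def __init__(self):
        self._lock = threading.Lock()
        self._handlers = []

    def subscribe(self, handler):
        with self._lock:
            self._handlers.append(handler)

    def unsubscribe(self, handler):
        with self._lock:
            self._handlers.remove(handler)

    def raise_event(self, *args):
        # Copy the list while holding the lock, then call handlers outside it,
        # so a handler may subscribe or unsubscribe without deadlocking.
        with self._lock:
            snapshot = list(self._handlers)
        for handler in snapshot:
            handler(*args)
```

A subject would own one Event per notification it offers, and observers would pass in any callable. Every class that wants to publish notifications needs this same machinery, which is the repetition a dedicated language construct is meant to absorb.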

So what are we left with? We have programming languages that are far less uniform than we think, with upwards of a dozen different DSLs that differ from each other syntactically or, worse, semantically.

Should we have uniform languages? No. This really is a case of “a foolish consistency is the hobgoblin of little minds”. A DSL within a language is supposed to be there to make it easier to express a particular idiom.

Some of the real questions, when designing syntax, are:

  • is this idiom unimportant/useless?
  • can I live without this?
  • are there any reasonable alternatives?
  • am I putting this in just for me-too?
  • is my syntax a leaky abstraction only to appear consistent?

If the answer to any of these is ‘yes’ then perhaps you might consider an alternative in your syntax.

These same questions are important when you’re overloading or creating operators in languages that allow it. In those circumstances, you’re creating a new (and sometimes inconsistent) DSL extension to your language. Sometimes this is acceptable and provides some nice syntactic sugar, but other times it creates a maintainability headache when someone new looks at your code and has to first understand the semantics of your new DSL.
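A small Python sketch of both outcomes (Vec2 and Step are invented for this example). Overloading + on a vector borrows a meaning the reader already has, while overloading >> to chain functions invents a new mini-language that has to be learned before the code can be read.

```python
class Vec2:
    """A 2-D vector where + keeps its familiar arithmetic meaning."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        return Vec2(self.x + other.x, self.y + other.y)

    def __eq__(self, other):
        return isinstance(other, Vec2) and (self.x, self.y) == (other.x, other.y)

class Step:
    """A wrapped one-argument function where >> is redefined to mean
    'then feed the result into', which has nothing to do with bit shifting."""

    def __init__(self, fn):
        self.fn = fn

    def __rshift__(self, other):
        # Build a new Step that runs self first, then other.
        return Step(lambda value: other.fn(self.fn(value)))

    def __call__(self, value):
        return self.fn(value)

# Reads naturally to anyone who knows vectors.
moved = Vec2(1, 2) + Vec2(3, 4)

# Reads naturally only after you learn what >> means for Step.
clean = Step(str.strip) >> Step(str.upper)
```

Here moved equals Vec2(4, 6) and clean("  hi ") returns "HI". Both are legal uses of the same language feature, but only the second one asks the next maintainer to learn new semantics.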

There is a sad trade-off between uniformity and readability/utility. LISP is probably the most uniform language I’ve encountered. If you ignore the special forms, the syntax is incredibly uniform, but since it steers so close to lambda calculus and so far away from conventional grammar (both mathematical and spoken), it ends up being a real lose for most people – even though LISP is an incredibly useful language. With functional languages, the first barrier to entry is getting used to thinking recursively (and more importantly, tail recursively).

There is an interesting set of graphs by Simon Peyton Jones about the life cycle of programming languages. If a language is to succeed, it has to be better than the alternatives. The tricky part is to define what it means to be better. Terse expression is certainly not a good definition, as seen in this analysis of the Roslyn compiler and the F# compiler. I think ‘better’ needs to be defined in terms of how well the syntax lends itself to managing the complexity of the target domain as well as how well the syntax lends itself to being read. I think one should also keep in mind that there are some classes of problems that are far easier to solve in a structured/OOP idiom than in a functional idiom and vice versa. Keep your mind open.

About the Author

Steve Hawley

Steve was with Atalasoft from 2005 until 2015. He was responsible for the architecture and development of DotImage, and was one of the masterminds behind Bacon Day. Steve has over 20 years of experience with companies like Bell Communications Research, Adobe Systems, Newfire, and Presto Technologies.
