Domain Specific Language

From APIDesign

Current revision (10:27, 17 May 2011)

Domain Specific Languages are the topic of the User:RichUnger and User:JaroslavTulach JavaOne 2010 shootout. A very interesting introductory external resource can be found here (http://www.ibm.com/developerworks/java/library/j-eaed13/index.html?ca=drs-).

Overview

We had a presentation about DSLs during JavaOne2010. Here are the slides: Image:Domain-library-shootout.pdf.

What is a DSL?

A programming language or specification language dedicated to a particular problem domain, a particular problem representation technique, and/or a particular solution technique. --wikipedia

DSL examples

  • LOGO (a language for children), like Karel? (Yes, they're very similar)
  • SQL, HQL
  • regex
  • ZIL (Zork Implementation Language)
  • Graphics rendering (POVray, Postscript)
  • Building, dependency management (Makefiles)
  • Document formatting (TeX, CSS)
  • BNF Grammars (YACC, Antlr)
  • XML variants (Ant, VoiceXML, XSLT, SVG, Docbook). Note: XML variants count as DSLs, and are a valid option if
    • You want to code the parser very quickly
    • Human readability and performance are not major concerns
    • Or the format is close to HTML, as in the case of Docbook.

Jarda: Does this mean that XML is a meta-DSL language, then? A language that helps create DSLs without writing a parser? What other languages like this can we find, and where is the boundary? For example, when using a library written in some high-level language like Scala, Haskell, or Clean, I sometimes feel like I am using some other, completely different language. For a BNF grammar example, compare Haskell Parser Combinators (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.122.1785&rep=rep1&type=pdf). Of course you are bound by the underlying language (but the same applies to XML), but the program code itself is quite close to BNF and gets processed automatically without writing anything special. Conclusion? A good language (Haskell) and a good library (the parser generators, for example) give you the same comfort more easily.

Rich: Yes, this is essentially correct, though I would draw a distinction between XML and these other examples. Martin Fowler calls the latter an "internal DSL" (http://martinfowler.com/articles/languageWorkbench.html#InternalDsl). Languages like LISP are particularly suited to this, though Scala has been making much hay of it lately. Also, the extreme operator overloading in Fortress theoretically makes it an excellent foundation for DSLs.

The distinction I would draw for XML is that there is a formalized syntax for specifying the DSL (several, actually, including XML schemas and DTDs). Standards bodies such as the W3C write entire specifications for individual XML-derived DSLs (e.g. XHTML, VoiceXML, etc.).

The reason I wasn't intending to include internal DSLs in this discussion is that they're not particularly suited (in my opinion) to the task at hand, which is to expose proprietary technology to an outside group of developers.

When is it good to use a DSL?

When you're targeting domain experts, not Java programmers

  • ZIL: lets authors program whole games
  • TeX: used in academia across many disciplines
  • Excel formulas: non-programmers do amazing things with Excel

Note: This is a sliding scale. The more you limit your domain, the wider an audience you can target. Apex is almost a full language, but we have many users who would be far too intimidated to write their app in Java.

When you can do validation or eliminate boilerplate based on domain assumptions

For example, in Apex...

  • No DB connections, pools, etc
  • Static type checking for domain objects
 Account[] myaccounts = [select firstname, lastname from Contact]; // compile error
  • Bring in complex set of data visibility rules specified by the admin in the UI
 public class Foo with sharing { ... }
  • Create a SOAP endpoint with no extra configuration or code
 webservice String getSomething(integer someParam) { ... }

Note: Yes, you can use an annotation to create something similar to the webservice keyword. However, as a keyword it is more expressive. It is a type of visibility, just like public and private. Public means visible to other code on that server. Webservice means visible to other code on other servers. To say:

 @webservice public foo();

...would be using annotations to present a syntax that is as wrong as:

 @public private foo();


When the domain lends itself to an idiom that can be clearly expressed in the syntax

  • BNF syntax
 expression: unaryExpression | binaryExpression | constant;
 binaryExpression : expression binaryOp expression;
 binaryOp : '+' | '-' | '*' | '/';
  • LOGO
 FORWARD 100 ; draws a square with sides 100 units long
 LEFT 90
 FORWARD 100
 LEFT 90
 FORWARD 100
 LEFT 90
 FORWARD 100
 LEFT 90
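
For comparison, here is a rough sketch of how the same square might look when the turtle is exposed as a Java library instead of a language. The Turtle class and its method names are hypothetical, invented only for this illustration; no real LOGO implementation is implied.

```java
// Hypothetical turtle library: just enough state to follow the LOGO program above.
final class Turtle {
    private double x = 0.0;
    private double y = 0.0;
    private double headingDegrees = 0.0; // 0 = east, counter-clockwise is positive

    // Moves the turtle along its current heading.
    Turtle forward(double distance) {
        double radians = Math.toRadians(headingDegrees);
        x += distance * Math.cos(radians);
        y += distance * Math.sin(radians);
        return this;
    }

    // Turns the turtle counter-clockwise.
    Turtle left(double degrees) {
        headingDegrees = (headingDegrees + degrees) % 360.0;
        return this;
    }

    double x() { return x; }
    double y() { return y; }
    double heading() { return headingDegrees; }

    // The library version of the eight LOGO lines: four sides, four turns.
    static Turtle drawSquare(double side) {
        Turtle turtle = new Turtle();
        for (int i = 0; i < 4; i++) {
            turtle.forward(side).left(90);
        }
        return turtle;
    }
}
```

The behavior is the same, but the reader has to get past classes, methods, and a loop before seeing the drawing, which is the kind of ceremony the LOGO syntax removes.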

Developer Experience

Clarity of syntax

With DSLs you can bake pertinent concepts right into the language:

 public class Foo with sharing { ... }

The with sharing key phrase tells the interpreter that access to objects in this code should enforce visibility restrictions set up by the organization's administrator. For example, say there are 2 Account objects in the system, and you only have the right to access one of them. The following code will return one Account:

 public class Foo with sharing {
   public static List<Account> getAccounts() {
     return [select id, name from Account];
   }
 }

...whereas the same code without sharing will return 2:

 public class Foo without sharing { // FYI, without sharing is the default
   public static List<Account> getAccounts() {
     return [select id, name from Account];
   }
 }
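
To see what the keyword buys, consider a rough sketch of how a plain Java library might have to express the same choice. Everything here (SharingSketch, SharingMode, the visibleIds parameter) is a hypothetical stand-in and not part of Apex or any Salesforce API; the point is only that the caller has to pass the mode and the visibility data by hand.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical library-style equivalent of "with sharing" / "without sharing".
final class SharingSketch {
    enum SharingMode { WITH_SHARING, WITHOUT_SHARING }

    static final class Account {
        final String id;
        final String name;

        Account(String id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Returns every account when sharing is off, otherwise only the accounts
    // whose ids appear in visibleIds (assumed to come from the admin's rules).
    static List<Account> getAccounts(List<Account> all, Set<String> visibleIds, SharingMode mode) {
        List<Account> result = new ArrayList<>();
        for (Account account : all) {
            if (mode == SharingMode.WITHOUT_SHARING || visibleIds.contains(account.id)) {
                result.add(account);
            }
        }
        return result;
    }
}
```

In the DSL, the runtime already knows the current user and the administrator's rules, so none of these extra parameters appear in user code.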

Accessing Database

Jarda: Is it easier to access a database structure in a DSL than in Java? I would expect so. I would expect that a DSL can make it as easy as Ruby on Rails or Grails, e.g. you just access an object representing the database and all columns are accessible as fields. As soon as you change the database schema, the new field is immediately visible. This could be hard to simulate when writing Java libraries, or not? Actually not - see the LiveDB demo!

Rich: I would agree with this, but more because of the "Domain" part of "Domain Specific Language" than the "Language" part. The Apex runtime is aware of the database that is available, already has access to a connection pool, and already knows the tables and columns that have been configured by the organization. A library could take advantage of the same information, but it is really runtime information, and so would rely on a custom, domain-specific runtime. Once you have a general purpose language running in a domain specific runtime, you're really 95% of the way to what we did with Apex, which is to say "as long as we've got our own runtime, let's make things a bit easier on our customers by adding some keywords for the really common stuff". That starts you down the road of a full DSL.

Learning Curve

If you're targeting Java programmers, a large set of assumptions that go with an API are well understood.

Even so, if the technology you're presenting represents a foreign paradigm to your audience, it's often easier to convey that paradigm as a DSL.

The same is true if the paradigm doesn't lend itself to object orientation (e.g. relational data). SQL isn't going away anytime soon, despite very concerted efforts over the past 10 years.

Vendor Perspective (That's You)

Providing Tooling

With APIs you get the standard tooling for free, which is great, IF:

  • You are targeting Java programmers
  • You cannot afford the modest investment necessary to support a DSL in a major IDE
    • NetBeans, Eclipse, and IntelliJ all have pretty straightforward ways to support at least minimal code completion and syntax coloring of DSLs
    • Language workbenches and RCP platforms are lowering the barrier to entry for language tooling

Time to Market

  • Java API, XML-based DSL: about the same
  • DSL with custom syntax: a bit longer

Evolution (Versioning, Deprecation)

Jarda: There are strict rules for the evolution of a Java API. Chapter 6 touches on this topic slightly: moving methods up and down the class hierarchy, adding methods, etc. When doing library design, you need to adjust to the Java (or other framework) rules. When writing a DSL you are basically creating your own rules, aren't you, Rich?

It took me a few years to understand the Java rules; how long does it take to specify your own? There are tools that help you discover backward incompatibilities in Java, like Sigtest. All of this needs to be written from scratch for a DSL. (By the way, we would need something like that for Ant, as Ant XML build scripts support overrides, inheritance, etc. Nothing like that is available, nobody is likely to bother writing it, and so we will rely on visual inspection and early testers to report bugs.)

Rich: The great thing about DSLs is that you can completely change the syntax and semantics from version to version without breaking backwards compatibility. This is because you have complete control over the parser.

The clearest example of this is in certain XML derivatives. In VoiceXML, for example, the root element has a version attribute:

 <vxml version="2.0">

VoiceXML interpreters can be compliant with both the 1.0 and 2.0 specifications by simply reading the root element, and then using different parsers depending on the value of the version attribute.
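
A minimal sketch of that dispatch, using the JDK's standard DOM parser to read the root element's version attribute. The interpreter names returned here are placeholders invented for the example; a real VoiceXML implementation would hand the document to an actual parser for that version.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class VersionDispatch {
    // Parses the document and returns the root element's version attribute
    // (an empty string if the attribute is missing).
    static String rootVersion(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getAttribute("version");
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Cannot read document", e);
        }
    }

    // Chooses a version-specific interpreter; the names are hypothetical placeholders.
    static String chooseInterpreter(String xml) {
        String version = rootVersion(xml);
        switch (version) {
            case "1.0": return "VoiceXml10Interpreter";
            case "2.0": return "VoiceXml20Interpreter";
            default: throw new IllegalArgumentException("Unsupported version: " + version);
        }
    }
}
```

Because the choice is made before any version-specific parsing happens, the 1.0 and 2.0 code paths never have to know about each other.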

I once worked on a NetBeans Platform project where the data files were of a particular XML schema with this type of version attribute. With each version, we tracked the changes we'd made to the file format (additions and modifications), and we'd write an XSLT script that could upgrade each version to the next one. So we had a set of files like:

 from1.xsl
 from2.xsl
 from3.xsl
 ...

Whenever my module opened a user's data file, it would check the version. If the version was not up to date, the XSLT scripts would be invoked sequentially, behind the scenes, to bring the data file up to date.
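
The upgrade loop can be sketched roughly as follows with the JDK's javax.xml.transform API. This is not the original project's code: the class and method names are made up, the stylesheets are passed in as strings keyed by the version they upgrade from (standing in for from1.xsl, from2.xsl, ...), and sampleSheet is a toy stylesheet that only rewrites a root element assumed to be called data.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class UpgradeChain {
    // Applies a single XSLT stylesheet to an XML document held in a string.
    static String transform(String xml, String stylesheet) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Upgrade step failed", e);
        }
    }

    // Runs the "from N" scripts in order until the file reaches currentVersion.
    static String upgrade(String xml, int fileVersion, int currentVersion,
                          Map<Integer, String> stylesheetsByFromVersion) {
        String result = xml;
        for (int version = fileVersion; version < currentVersion; version++) {
            String sheet = stylesheetsByFromVersion.get(version);
            if (sheet == null) {
                throw new IllegalArgumentException("No upgrade script from version " + version);
            }
            result = transform(result, sheet);
        }
        return result;
    }

    // Hypothetical stand-in for a fromN.xsl file: copies the content of the
    // root <data> element and stamps it with the next version number.
    static String sampleSheet(int toVersion) {
        return "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
                + "<xsl:template match=\"/data\">"
                + "<data version=\"" + toVersion + "\"><xsl:copy-of select=\"node()\"/></data>"
                + "</xsl:template></xsl:stylesheet>";
    }
}
```

Each script only needs to know about one step, so adding a new file format version means writing one more stylesheet and leaving the older ones untouched.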

In Apex, we do this a bit differently. Because all Apex classes are stored in our database, we simply have another column in the ApexClass table for the version. When users create a new class, this column defaults to the latest version number, but it is user-editable, in case they want to "upgrade" their class later to take advantage of new features.

One example of a change that is versioned in this manner is the way Apex handles floating point literals. In a class with version 17.0, the literal

 12.4

is a double. If you change the version of the class to 18.0, the same literal is a BigDecimal. If you want to specify a double literal, you'd type

 12.4d

When implementing this change in behavior, we basically add an if statement to our abstract syntax tree like:

 Object value;
 if (currentVersion > 17.0)
   value = new BigDecimal(parsedStringValue);
 else
   value = Double.valueOf(parsedStringValue);
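
A self-contained variant of the same check, extended with the d suffix described above, might look like this. It is only a sketch: LiteralParser and parseLiteral are hypothetical names, and the handling of the suffix in versions after 17.0 is an assumption based on the description, not the actual Apex parser.

```java
import java.math.BigDecimal;

final class LiteralParser {
    // Interprets a floating point literal according to the version of the
    // class that contains it.
    static Object parseLiteral(String text, double classVersion) {
        if (classVersion > 17.0) {
            // Assumed newer rule: an explicit "d" suffix asks for a double,
            // everything else becomes a BigDecimal.
            if (text.endsWith("d")) {
                return Double.valueOf(text.substring(0, text.length() - 1));
            }
            return new BigDecimal(text);
        }
        // Old rule: every floating point literal is a double.
        return Double.valueOf(text);
    }
}
```

With this sketch, parseLiteral("12.4", 17.0) yields a Double, while parseLiteral("12.4", 18.0) yields a BigDecimal and parseLiteral("12.4d", 18.0) yields a Double again.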

DSL versioning is a very powerful concept. Unlike in a Java API, we can completely remove concepts from the latest version of the language instead of deprecating them, keeping the semantics of the language clear and simple. We do this because the old parser logic is still there, and old code will always use the old logic.

Jarda: An additional note here: this requires the existence of a shared application binary interface (ABI) behind the language. All versions of the language then need to compile to the same ABI. If so, each unit of compilation can be written in a different flavor of the language. The shared ABI is crucial, however. I remember an example of a language designed without an ABI and the problems with its evolution (e.g. the complete incompatibility of various versions of JavaFX Script).
