What is Gene, and which capabilities justify its existence? I can
probably answer that best by telling you how it came to be.
Our story begins with a tool I found on the web called parserbuilder (which
is an IDE for COCO/R). This opened up a whole new world to me called EBNF.
With it came the joy of being able to say what I wanted without saying
how, together with the frustration that you can only define the syntax.
In EBNF, semantics need to be defined in another language, which you
weave between the EBNF statements, making the whole source look
like a mess really. Defining output is just out of the question,
although that seemed like a pretty convenient thing to have in a
programming language, I thought. Thus, Gene's first major principle
was born: messages.
Messaging is not a new principle. Object-oriented languages have
known it for years. Most OO implementations, however, just
don't do it. Gene does, but with a twist.
EBNF uses productions, non-terminals, terminals, conditional statements
and operators to define how a text stream needs to be read in.
I wanted to do the same but more generally: I wanted to define not
only what needs to be parsed but also what needs to be
generated. Furthermore, I wanted to be able to define the actual
source of the stream and how this stream makes its data available.
How does Gene do all of this? Very simple really. First off,
every type in Gene is able to define how another type must handle it
when it is sent to or from that type. Every complex type (for OO
people, read 'class') can be a source or destination of a message
stream. This gives us a two-way street (in and out, baby) and it also
solves the 'general source' problem, because any object type can be
just that, be it a memory stream, file, linked list of structures,
TCP/IP connection or anything else you can think of. But what about
productions, terminals, non-terminals... you know, the blood and guts
of an EBNF definition? Well, productions become expressions
(functions for the C programmers, methods for the OO adepts), and you
tell each expression which message handlers it needs to use and in
which direction. Non-terminals become calls to the expressions.
Conditional statements (and the EBNF operators) become the loop and
the option, which doesn't sound like much, but they pack a pretty
powerful punch (imagine loops with 'case'-like capabilities such as
fall-through and the like). Finally, terminals become constants and
variables: constants tell you what is expected to be received or what
should be sent, and variables tell you where to store the input or let
you send variable content.
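
As a rough illustration of the two-way street, here is a minimal sketch
in Python rather than Gene (Gene's own syntax is not shown here). Every
name in it (Const, Var, receive, send) is hypothetical. One description
made of constants and variables is used once to read a text and once to
produce it. The sketch assumes each variable is followed by a constant
or by the end of the input.

```python
from dataclasses import dataclass


@dataclass
class Const:
    """A terminal with fixed text: expected on input, emitted on output."""
    text: str


@dataclass
class Var:
    """A terminal with variable content: stored on input, looked up on output."""
    name: str


def receive(pattern, source):
    """Read `source` according to `pattern` and return the captured variables."""
    values = {}
    pos = 0
    for i, item in enumerate(pattern):
        if isinstance(item, Const):
            if not source.startswith(item.text, pos):
                raise ValueError(f"expected {item.text!r} at position {pos}")
            pos += len(item.text)
        else:
            # A variable runs up to the next constant, or to the end of input.
            nxt = pattern[i + 1] if i + 1 < len(pattern) else None
            end = source.find(nxt.text, pos) if isinstance(nxt, Const) else len(source)
            if end < 0:
                raise ValueError(f"expected {nxt.text!r} after position {pos}")
            values[item.name] = source[pos:end]
            pos = end
    if pos != len(source):
        raise ValueError("unexpected trailing input")
    return values


def send(pattern, values):
    """Write text according to the same `pattern`, filling in the variables."""
    return "".join(
        item.text if isinstance(item, Const) else values[item.name]
        for item in pattern
    )


# One description, used in both directions.
assignment = [Var("name"), Const(" = "), Var("value"), Const(";")]

parsed = receive(assignment, "width = 42;")
regenerated = send(assignment, parsed)
```

The point of the sketch is only that `assignment` is written once and
serves as both the parser and the generator; the direction is chosen by
the caller, not by the description.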
The second and third major principles of Gene (containers and
automations) both have their origin in the same idea: automate coding.
I have always been a little ashamed of the fact that, although we
are in the business of automating, we still haven't been able to
truly automate our own business. Sure, we have come a long way from
writing assembler code to call a function or do a 'for' loop.
Those things were automated by procedural languages, and later OO
programming gave us even more.
These days, however, we tend to work a lot with code generators,
design/CASE tools, GUI designers and what have you. All of them are
very useful and I consider myself a happy user. But
these are just bandages, layers on top of older systems that really
weren't designed for this style of working.
Then there is this thing called 'data'. Over the years, programs
have become more and more complex. Complex programs need complex
data structures, so data structures have become more
and more complex as well. Yes, I hear you say, that's why we have
relational databases and the like. But those tables and
records eventually need to be accessed in real-world
applications, which still requires data structures (although this
is usually screened off behind common interfaces). Besides, a lot
of applications don't use databases, for various reasons (can you fit an
HTML type of data structure into a relational database model? It's
probably possible, but not very clean, is it?). So we resort to the
bag of tricks we have had for decades now: algorithms for working with
data structures, which are part of the larger pool of algorithms
designed over the last 5 or 6 decades by some very bright people.
The latest addition is the set of techniques grouped under
'Patterns'. Most of these techniques have been cast into common
library functions, templates, class structures and so on. Other
techniques don't lend themselves so well to being molded into
prewritten code.
This got me thinking: wouldn't it be nice if a common programming
language also supported data manipulation statements, the way SQL does?
(It was in search of this, by the way, that I came across
parserbuilder and COCO/R.) The goal was a programming language that
provides a common interface for adding, deleting, selecting and looping
through a set of records. And this is exactly what Gene
does. But how, I hear you say. The trick, as usual, is found in
'divide and conquer'. Gene splits 'the way' you store data from
'what' you store. This results in a new kind of type, called 'the
store'. A store simply defines how you store, add, delete and
access data (any kind of data). Other types can then say
which type they store (let's say an int or a pointer to a structure)
together with how they want to store it. Gene then provides
operators for adding, deleting, looping and selecting. This means
that you can easily change the storage mechanism, which is great for
prototyping and optimizing. All of this forms the second principle of
Gene: containers.
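
The split between 'how' and 'what' can be sketched in Python (again,
this is not Gene code, and ListStore, SortedStore and Container are
made-up names for illustration). Each store only knows how to keep
items; the container says which type it holds and borrows the 'how'
from whichever store it is given.

```python
import bisect


class ListStore:
    """'How': keeps items in insertion order."""

    def __init__(self):
        self._items = []

    def add(self, item):
        self._items.append(item)

    def delete(self, item):
        self._items.remove(item)

    def __iter__(self):
        return iter(self._items)


class SortedStore:
    """'How': keeps items in ascending order."""

    def __init__(self):
        self._items = []

    def add(self, item):
        bisect.insort(self._items, item)

    def delete(self, item):
        self._items.remove(item)

    def __iter__(self):
        return iter(self._items)


class Container:
    """'What': names the stored type and delegates storage to a store."""

    def __init__(self, item_type, store):
        self.item_type = item_type
        self.store = store

    def add(self, item):
        if not isinstance(item, self.item_type):
            raise TypeError(f"expected {self.item_type.__name__}")
        self.store.add(item)

    def delete(self, item):
        self.store.delete(item)

    def select(self, predicate):
        return [item for item in self.store if predicate(item)]

    def __iter__(self):
        return iter(self.store)


def fill(store):
    """The same client code, whatever the storage mechanism."""
    numbers = Container(int, store)
    for n in (5, 1, 4, 1):
        numbers.add(n)
    numbers.delete(4)
    return numbers


as_list = fill(ListStore())
as_sorted = fill(SortedStore())
```

Swapping `ListStore()` for `SortedStore()` changes the iteration order
but not a single line of `fill`, which is the prototyping and
optimizing benefit described above.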
OK, but what about all those other techniques that don't have anything
to do with data structures? How do you automate these? This is
where the third major principle comes into play: automations.
Automations allow you to create your own meta types. Again, we
need to make a split between how and what. To explain
the 'how' part, we need to go back to the messages principle.
Remember that compiler generators use a syntax definition to
define what they have to read, and that messages are able to reverse the
direction so you can define what needs to be put out. Well,
now suppose that what we generate is actually code itself.
Interesting. But how do you make a distinction between the code
that the compiler needs to generate and the code that it needs to execute?
To do this, we turn our compiler into an interpreter that
is able to execute code at compile time, and we mark the interpreted
code by putting a '#' in front of it. Now the only thing that's
left is to define 'what' needs to be generated. This will
naturally be Gene code consisting of structures, stores, spaces (that's
what classes are called), expressions, variables and so on. But
that's not what I mean by 'what'. If you look at a space (class
for the OO people), it consists of variables, properties,
expressions, constructors, destructors and so on. So if a space were an
automation, the 'what' part would consist of these variables,
properties, expressions and so on. An automation calls them
chapters, and you are free to define as many new chapters as you want.
They can map to variables, properties, expressions and so on, but
they can also be integers, constant strings or identifiers. So, to put
it all together, an automation defines a new meta type. It tells
you which chapters are available and how it needs to convert all this
information into standard language constructions.
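
A loose Python analogy may help here (it is not Gene code, and
record_automation with its 'name' and 'fields' chapters is invented for
this sketch). The Python statements inside the function play the role
of the interpreted '#' code, and the strings they emit play the role of
the generated code.

```python
def record_automation(name, fields):
    """Hypothetical automation: expand two chapters into class source code.

    Chapters: `name` (an identifier) and `fields` (variable -> default).
    """
    if not fields:
        raise ValueError("this sketch needs at least one field")
    params = ", ".join(f"{field}={default!r}" for field, default in fields.items())
    lines = [f"class {name}:", f"    def __init__(self, {params}):"]
    for field in fields:
        lines.append(f"        self.{field} = {field}")
    # One generated getter per entry in the 'fields' chapter.
    for field in fields:
        lines.append(f"    def get_{field}(self):")
        lines.append(f"        return self.{field}")
    return "\n".join(lines) + "\n"


# "Compile time": run the automation to obtain ordinary source code.
source = record_automation("Point", {"x": 0, "y": 0})

# Compile the generated code and use the resulting type like any other.
namespace = {}
exec(source, namespace)
Point = namespace["Point"]
p = Point(x=3)
```

Printing `source` shows the plain class that the chapters were
converted into, which is the 'standard language constructions' step.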
There are many more little things that make Gene interesting,
like default values for variables in structures and spaces
(written after the variable and not in the constructor), and the
ability to override these default values when you call the constructor.
Structures can also have properties, expressions, constructors and
destructors. They can even inherit from one or more other structures
(although they can't have RTTI).
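
For readers who know Python, its dataclasses give a similar feel for
defaults written next to the variable. This is only a comparison, not
Gene syntax, and Window is a made-up example type.

```python
from dataclasses import dataclass


@dataclass
class Window:
    title: str = "untitled"  # default written next to the field
    width: int = 640
    height: int = 480


default_window = Window()
wide_window = Window(width=1024)  # override one default at construction
```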
There is also extensive support for libraries through a dedicated meta
type. When you declare a variable of type 'lib', it
automatically loads the library (it has a special constructor that
lets you define the name of the library to load and define
function mappings).
Another neat feature is owned pointers: pointer types
that automatically call the default destructor (if there is one) and
free the memory when they go out of scope (no, this is not the same
as garbage collection).
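
The difference from garbage collection is that the cleanup happens at a
known point, namely the end of the scope. A rough Python analogy uses a
'with' block as the scope (Owned and Resource are hypothetical names,
and Python itself has no owned pointers).

```python
class Resource:
    """Stand-in for something with a destructor."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def destroy(self):
        self.log.append(f"destroy {self.name}")


class Owned:
    """Hypothetical owner: destroys its target when the 'with' block ends."""

    def __init__(self, target):
        self.target = target

    def __enter__(self):
        return self.target

    def __exit__(self, exc_type, exc, tb):
        self.target.destroy()
        self.target = None
        return False  # do not swallow exceptions


log = []
with Owned(Resource("buffer", log)) as buf:
    log.append(f"use {buf.name}")
log.append("after scope")
```

The destructor call sits between the last use and the first statement
after the scope, every time, without waiting for a collector.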
Cursors allow you to store a location in a container (a space) much
like SQL cursors.
There is no need for a ';' or anything else (not even a return) at the
end of a statement, because Gene uses semantics to find the end of a
statement, not the syntax. This has some disadvantages for the
compiler designer (that's me) though, mainly in error reporting,
which becomes more difficult to manage.
It is also possible to define multiple processes in one unit since a
process is yet another type that can have expressions and variables.
It can even inherit from other processes. Each process also
defines the output type, that is, win32, win32console, Linux, ...
As I mentioned earlier, Gene has some unique conditional statements
which are very powerful. Think of looped case statements which can be
of any type, if-else-if statements with fall-through (a case part
without the break) or loops with multiple conditions, where the loop
runs for as long as one of the conditions is true and only the code
for the condition that was true gets executed (in other words, an eternal
loop combined with an if-else statement). But what's most important (at
least for me) about the conditional statements is how easily you can
change them from one type to another and how they allow you to 'see'
the structure of the code without reading it.
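
Spelled out by hand in Python, the multi-condition loop described above
is the eternal loop with an if-else chain. This shows the equivalent
long form, not Gene's own notation, and drain is a made-up example.

```python
def drain(high, low):
    """Serve `high` while it has items, otherwise `low`; stop when both are empty."""
    high, low = list(high), list(low)
    served = []
    while True:
        if high:  # first condition, with its own body
            served.append(("high", high.pop(0)))
        elif low:  # second condition, with its own body
            served.append(("low", low.pop(0)))
        else:  # no condition holds any more: the loop ends
            break
    return served


order = drain(["a", "b"], ["x"])
```

In Gene the two conditions and their bodies would be part of a single
loop statement, which is what makes the structure visible at a glance.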
Gene doesn't introduce anything new that can't be done in another
language in another way. No language ever has (in the end, it
always gets transformed into assembler, and that really is what defines
the limits and possibilities). I do believe though that Gene
introduces enough new 'nice to have' features for it to be useful and
fun to have. Although it doesn't have a cool IDE (yet) or any
other toys, I prefer it to any other language I have ever used (but
hey, I'm a bit biased here). I will definitely continue to work on
this for as long as I am able to. I hope you will enjoy it as much as I
have.