The G2 preprocessor


Introduction

This small document describes the G2 preprocessor, its purposes, and how to use it. We limit ourselves to a general description here; there's another document with specific information about the TCL extensions of the preprocessor, in particular the use of roles in object-oriented programming.

The G2 preprocessor is an extremely simple tool written in C. It acts as a very generic preprocessor for text output. It is based on the idea that you take an existing language (e.g. TCL, Python, Perl) and use it as a preprocessor for another language (e.g. C++, HTML, XML, TeX, or even freeform ASCII output). I have made one particular implementation of this idea, with TCL as the preprocessor language and C++ as the target language. So instead of learning to live with the limitations of "#define" and "#ifdef", you can now bring the full scripting capabilities of TCL to the preprocessing phase of your C/C++ development.

Automatic escaping

The easiest way to make TCL act as a preprocessor would be to generate code with "puts":

set ctor_is_protected 1   ;# example value for the flag tested below
puts "class A : public B \{"
if { $ctor_is_protected } {
   puts "protected:"
} else {
   puts "public:"
}
puts "  A() { printf(\"boo!\\n\"); }"
puts "\};"

This approach has some major drawbacks. Typing "puts" all the time is quite annoying, and the escape characters do not make this kind of code generation very readable, either.

So if "puts" and some escape characters are a nuisance, why don't we automate them? That's exactly what the G2 tool does: it automatically adds "puts" and escape characters to your TCL scripts, so that you need not worry about them anymore. The only thing the G2 tool does is properly "escape" your output for use in a TCL script. It is easy to adapt the tool so that it works for languages other than TCL (e.g. Python, Perl, even C/C++ itself).
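As a rough illustration of what "adding puts and escape characters" amounts to, here is a minimal C sketch (C being the language the tool itself is written in) that turns one piece of literal output text into a TCL command. It is not taken from the actual tool: the function name g2_escape_text, the buffer-size rule, and the decision to backslash-escape every character that TCL treats specially inside double quotes are assumptions made for this example. The command name g2puts is the output procedure used throughout this document.

```c
#include <assert.h>
#include <string.h>

/* Characters that TCL interprets inside a double-quoted string. */
static const char TCL_SPECIAL[] = "\\\"[]${}";

/* Hypothetical helper: write the TCL command  g2puts "<text>"  into 'out',
   putting a backslash in front of every TCL-special character of 'text'.
   'out' must have room for at least 2 * strlen(text) + 10 bytes.
   Returns 'out'. */
char *g2_escape_text(const char *text, char *out) {
    char *dst = out;

    assert(text != NULL && out != NULL);
    strcpy(dst, "g2puts \"");
    dst += strlen(dst);
    for (; *text != '\0'; text++) {
        if (strchr(TCL_SPECIAL, *text) != NULL)
            *dst++ = '\\';          /* e.g. '{' becomes "\{" */
        *dst++ = *text;
    }
    *dst++ = '"';                   /* close the quoted argument */
    *dst = '\0';
    return out;
}
```

Applied to the constructor line from the hand-written example above, it produces the escaped double quotes and the doubled backslash that had to be typed manually there; it also escapes the braces, which is harmless inside double quotes.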

If you do not believe that such a simple tool can be useful, remember that TCL is an extremely powerful scripting language, much more powerful than CPP. As a preprocessor, TCL allows you to do incredible things, such as write procedures, construct lists and arrays, and put traces on variables. So in fact, the power of this approach does not come from the g2 tool itself, but rather from TCL. The g2 tool simply provides some glue.

General example

Here's a small example that shows everything the g2 preprocessor can do. We use TCL as the preprocessor language; the target output is just some freeform text.

First, we write the following file:

# Example of all the features of the TCL preprocessor.
#
# The file starts in TCL mode (i.e. we just write a
# TCL script as we always do).  We switch to output mode using the
# @@ sign (two of them required here because otherwise we really
# switch to output mode!)
#
# This script shows all the features of the preprocessor
# one by one.

# Print some literal output.  The 'at' sign switches us to output mode.
@
This is the first line of the output.
We are now in output mode; everything you read here, goes to the
"g2puts" procedure implicitly.

Note that we do not need "puts" or any escaping!
We can use characters like 'single quotes', "double quotes",
\backslash\, {braces} and [square brackets] without having to escape
them.  There are only two special signs that we have to escape
in output mode:
- The dollar sign.  If you want a literal dollar in the output,
  write two of them: $$.
- The at sign. Again, you need two of them: @@.

Now let's go back to TCL mode.
@

# Back in TCL mode, we can now write pure TCL again.  As long as we are
# in TCL mode, we can write plain old TCL scripts; the only thing
# we need to escape, is the 'at' sign: @@.  Note that the $ need
# not be escaped in TCL mode!  It serves its usual purpose of variable
# substitution.

# Names of cities in Belgium
array set cities {1 Antwerpen 2 Brussel 3 Charleroi}
foreach i {1 2 3} {
   # Examples of variable substitution in output mode.
   @
      In output mode:
      - Variable substitution:
        i = $(i).  Parentheses are obligatory.
      - Variables can also be arrays:
        cities(i) = $(cities($i)).
      - They can even contain expressions:
        cities([expr 4-$$i]) = $(cities([expr 4 - $i])).
   @
}

set a 66
set b 20
# Example of command substitution.  We use a dollar for consistency.
@
   The array "cities" has these names: $[array names cities].
   And now for some math: [expr 55+22] = $[expr 55+22].
   Or a more complex example of command substitution:
   [expr [expr $$a - 11] + [expr $$b + 2]]=
      $[expr [expr $a - 11] + [expr $b + 2]]
   This shows that we do not need to escape the dollar sign within a
   command.
@

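This example shows the following features of the tool:

- Switching between TCL mode and output mode with a single '@' sign.
- Escaping in output mode: "$$" produces a literal '$' and "@@" a literal '@'. In TCL mode, only '@' needs escaping.
- Variable substitution in output mode with $(name), which also works for array elements and computed indices such as $(cities([expr 4 - $i])).
- Command substitution in output mode with $[command].
- Mixing both modes freely, e.g. producing a block of output from inside a TCL foreach loop.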
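The substitutions in output mode have to end up as ordinary TCL substitutions in the generated script. The following C sketch shows one plausible way to translate a piece of output-mode text. The mapping it uses ($(X) becomes [set X], $[X] is copied unchanged, $$ becomes an escaped dollar) and the function name g2_translate_output are assumptions for illustration; the real g2pp may generate different TCL.

```c
#include <assert.h>
#include <string.h>

/* Characters that TCL interprets inside a double-quoted string. */
static const char TCL_SPECIAL[] = "\\\"[]${}";

/* Return a pointer just past the delimiter that closes the one at *p,
   counting nested pairs, or NULL if it is never closed. */
static const char *skip_balanced(const char *p, char open, char close) {
    int depth = 0;

    for (; *p != '\0'; p++) {
        if (*p == open)
            depth++;
        else if (*p == close && --depth == 0)
            return p + 1;
    }
    return NULL;
}

/* Hypothetical translation of output-mode text into one g2puts command:
     $$    ->  \$        (literal dollar)
     $(X)  ->  [set X]   (variable substitution, X copied unchanged)
     $[X]  ->  [X]       (command substitution, copied unchanged)
   Every other TCL-special character gets a backslash.  'out' must have
   room for at least 2 * strlen(text) + 10 bytes.  Returns 'out'. */
char *g2_translate_output(const char *text, char *out) {
    char *dst = out;

    assert(text != NULL && out != NULL);
    strcpy(dst, "g2puts \"");
    dst += strlen(dst);
    while (*text != '\0') {
        const char *after = NULL;

        if (text[0] == '$' && text[1] == '$') {
            *dst++ = '\\';
            *dst++ = '$';
            text += 2;
            continue;
        }
        if (text[0] == '$' && text[1] == '(')
            after = skip_balanced(text + 1, '(', ')');
        else if (text[0] == '$' && text[1] == '[')
            after = skip_balanced(text + 1, '[', ']');

        if (after != NULL && text[1] == '(') {
            size_t n = (size_t)(after - text) - 3;   /* text between the parens */
            memcpy(dst, "[set ", 5);
            dst += 5;
            memcpy(dst, text + 2, n);
            dst += n;
            *dst++ = ']';
            text = after;
        } else if (after != NULL) {
            size_t n = (size_t)(after - text) - 1;   /* "[...]" with brackets */
            memcpy(dst, text + 1, n);
            dst += n;
            text = after;
        } else {                                     /* ordinary character */
            if (strchr(TCL_SPECIAL, *text) != NULL)
                *dst++ = '\\';
            *dst++ = *text++;
        }
    }
    *dst++ = '"';
    *dst = '\0';
    return out;
}
```

With this mapping, the example line cities(i) = $(cities($i)). becomes g2puts "cities(i) = [set cities($i)].", whose argument TCL evaluates to cities(i) = Antwerpen. when i is 1. The sketch uses [set X] simply because it works unchanged for plain variables and for array elements with computed indices.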

In the next section, we will see how to run the preprocessor on this little piece of input, and we will look at the resulting output.

The code generation process

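When you use the g2 preprocessor to generate code, you follow two successive steps:

- First, run the g2pp tool on your G2 input file. It translates the file into an ordinary TCL script, adding the output calls and escape characters for you.
- Then run that TCL script with a TCL interpreter. The script writes the final output (C++ code, or whatever text you are generating).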

The standard C preprocessor (CPP) can do this in a single step, so you may think that this new process is more cumbersome. But when you put the two steps in a makefile, you do not even notice them. And remember the extreme power you have at your fingertips now: A C++ preprocessor that can iterate over lists, extract information from arrays or data files, and put traces on variables. All that with only a stupid little 300-line C tool...

The result of performing the two-step process on our example is the following output file:


This is the first line of the output.
We are now in output mode; everything you read here, goes to the
"g2puts" procedure implicitly.

Note that we do not need "puts" or any escaping!
We can use characters like 'single quotes', "double quotes",
\backslash\, {braces} and [square brackets] without having to escape
them.  There are only two special signs that we have to escape
in output mode:
- The dollar sign.  If you want a literal dollar in the output,
  write two of them: $.
- The at sign. Again, you need two of them: @.

Now let's go back to TCL mode.

      In output mode:
      - Variable substitution:
        i = 1.  Parentheses are obligatory.
      - Variables can also be arrays:
        cities(i) = Antwerpen.
      - They can even contain expressions:
        cities([expr 4-$i]) = Charleroi.
   
      In output mode:
      - Variable substitution:
        i = 2.  Parentheses are obligatory.
      - Variables can also be arrays:
        cities(i) = Brussel.
      - They can even contain expressions:
        cities([expr 4-$i]) = Brussel.
   
      In output mode:
      - Variable substitution:
        i = 3.  Parentheses are obligatory.
      - Variables can also be arrays:
        cities(i) = Charleroi.
      - They can even contain expressions:
        cities([expr 4-$i]) = Antwerpen.
   
   The array "cities" has these names: 1 2 3.
   And now for some math: [expr 55+22] = 77.
   Or a more complex example of command substitution:
   [expr [expr $a - 11] + [expr $b + 2]]=
      77
   This shows that we do not need to escape the dollar sign within a
   command.

Strengths and limitations

Strengths of this approach:

Limitations of this approach:

More general point of view

The two-step preprocessing approach I explained earlier can be described in a more general way. You want to generate output in a certain language; call that output language G0. You use another language as the intermediate language to generate the code; call it G1. In the setup described in the introduction, TCL serves as a preprocessor for C++, so the target language is G0 = C++ and the intermediate language is G1 = TCL. (In the general example above, G0 is simply freeform text.)

G2 is just the "language" that glues the other two together. It has an extremely simple "syntax" with only a few constructs: the '@' sign, the two execution modes, and the substitution mechanism using '$'. The preprocessor reads a script that you wrote in G2 and generates a new script in G1; running that script finally generates G0. So now you know where the name "G2" came from.

I have implemented the tool so that it uses TCL as the G1 language. Feel free to adapt it for any other language you are more familiar with (it does not even have to be a scripting language).
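The G2-to-G1 step can be pictured as a small state machine that flips between the two execution modes. Below is a hypothetical C sketch of that step with TCL as G1; it is not the author's implementation. The function name g2_to_tcl, the Buf helper, and the shape of the generated code (each output-mode chunk becomes one g2puts "..." command) are assumptions. The $(...) and $[...] substitutions are left out to keep it short, so a lone '$' in output mode is simply escaped.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bounded output buffer that stays NUL-terminated and records overflow. */
typedef struct {
    char *data;
    size_t len, cap;
    int overflow;
} Buf;

static void buf_putc(Buf *b, char c) {
    if (b->len + 1 < b->cap)
        b->data[b->len++] = c;
    else
        b->overflow = 1;
    b->data[b->len] = '\0';
}

static void buf_puts(Buf *b, const char *s) {
    while (*s != '\0')
        buf_putc(b, *s++);
}

/* Hypothetical sketch of the G2 -> TCL step:
   - a single '@' toggles between TCL mode and output mode;
   - "@@" is a literal '@' in both modes;
   - TCL mode: all other text is copied unchanged;
   - output mode: each chunk becomes  g2puts "..."  with TCL-special
     characters backslash-escaped and "$$" turned into a literal '$'.
   $(...) and $[...] substitutions are not handled here.
   'cap' is the size of 'out' (at least 1).  Returns 0 on success,
   -1 if the result was truncated. */
int g2_to_tcl(const char *g2, char *out, size_t cap) {
    Buf b = { out, 0, cap, 0 };
    int output_mode = 0;

    assert(g2 != NULL && out != NULL && cap >= 1);
    out[0] = '\0';
    while (*g2 != '\0') {
        if (g2[0] == '@' && g2[1] == '@') {          /* literal '@' */
            buf_putc(&b, '@');
            g2 += 2;
        } else if (g2[0] == '@') {                    /* mode switch */
            buf_puts(&b, output_mode ? "\"\n" : "g2puts \"");
            output_mode = !output_mode;
            g2++;
        } else if (!output_mode) {                    /* TCL mode: verbatim */
            buf_putc(&b, *g2++);
        } else if (g2[0] == '$' && g2[1] == '$') {   /* literal '$' */
            buf_puts(&b, "\\$");
            g2 += 2;
        } else {                                      /* escaped output text */
            if (strchr("\\\"[]${}", *g2) != NULL)
                buf_putc(&b, '\\');
            buf_putc(&b, *g2++);
        }
    }
    if (output_mode)                                  /* close a final chunk */
        buf_puts(&b, "\"\n");
    return b.overflow ? -1 : 0;
}
```

To run the generated script you would still need a g2puts procedure. A plausible default, again an assumption, is proc g2puts {str} { puts -nonewline $str }, which writes each chunk to standard output without adding newlines of its own, since the chunks already carry the newlines from the input.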

How to run the preprocessor

You can run the g2pp tool with 0, 2 or 3 parameters:

If you're interested, here's the C implementation of the tool.

More examples

Let's look at a few more examples, to get an idea of what the G2 preprocessor is good for.

In the first example, I want to show you that you can provide your own implementation of the g2puts procedure. The preprocessor automatically sends all escaped output to that procedure, and the default implementation just sends it on to standard output. By rewriting g2puts, you can capture the escaped output and do something else with it: store it in a file rather than send it to stdout, or store it in a string for postprocessing, as you can see here:

# Since all output is redirected to the 'g2puts' procedure,
# we can capture the output by re-implementing 'g2puts'.
# In this example, we make it store the output in a global
# string called 'song'.

set song ""

proc g2puts {str} {
   global song
   append song $str
}

# We start by outputting some text.  The G2 preprocessor will
# take the following piece of text, escape it, and send it to
# our new implementation of 'g2puts'.
@
Old NAME had a farm,
[E-I-E-I-O]
And on that farm he had a ANIMAL,
[E-I-E-I-O]
With a "NOISE" "NOISE" here and a "NOISE" "NOISE" there
etc etc
@

# The string 'song' now contains the above text.  Note that we
# did not need to escape the quotes and [square brackets].

# We now copy the string 3 times and use 'regsub' to fill in different
# values for the placeholders:
foreach animal { {Mcdonald horse neigh} {Bill goat baaaa} {Koen cow mooo} } {
   set tmp_song $song
   regsub -all NAME $tmp_song [lindex $animal 0] tmp_song
   regsub -all ANIMAL $tmp_song [lindex $animal 1] tmp_song
   regsub -all NOISE $tmp_song [lindex $animal 2] tmp_song
   puts $tmp_song
   puts "---"
}
(example20)

This rather silly example produces three verses from the classic song; you can also take a look at the intermediary TCL script that does all the actual work.

The example also shows that you can produce a kind of template string, with placeholders such as "NOISE" or "ANIMAL". Then, using regsub, you replace the placeholders with actual values. As a more interesting application of this trick, have a look at this input file called shapes.g2, which implements three C++ classes by means of template strings. (Compare it to the intermediary TCL script and the final output.)



You can contact me at koen.vandamme1 at pandora.be. Or visit my home page.