A TCL preprocessor for C++

When developing C++ applications, I have often bumped into the limitations of the C preprocessor. Sure, it has #define and #if, but why doesn't it have loops, lists, arrays, procedures, objects, file I/O and more? Wouldn't it be great if you could write something like

foreach animal { Cat Dog Snake } {
   class $animal : public Animal {
      ctor ( ) { }
      dtor ( ) { }
   }
}
and have three classes generated at once, each with its constructor and destructor? Or how about this:
class Dog : Animal {
   role_stream
   guard char* name = "Fido"
   guard tail_length = 12
}
I'm talking about a preprocessor that takes this innocent-looking piece of code, and produces a fully functional class for me, with two member variables (char* name and int tail_length) that have their default values set in the constructor (the string variable even gets the right amount of memory allocated for it), with get/set methods for both member variables, and with a stream operator that dumps an instance and its two members to output. As this example shows, I would like my preprocessor to take over much of the boring, repetitive and non-creative effort of C++ programming, like managing member variables, writing a copy constructor and assignment operator, making objects persistent on disk, etc. That's the itch I'm trying to scratch.

This document describes, step by step, how we can create a tool that does all of this and more. Rather than inventing a new preprocessor language (which would imply a steeper learning curve and a lot more development effort), I have chosen to use TCL as the preprocessor language for C/C++. The flexibility and extensibility of TCL are the main reasons for this choice; if you want, you can easily adapt most of the ideas for use with other languages (Python, Perl, Ruby, ...). The ideas in this document are implemented in a tool, the g2 preprocessor, which comes with a set of TCL scripts that offer full preprocessing power for your everyday C++ programming. Check my homepage for the latest downloads of the tool.

Avoiding "puts"

So, where do we begin? Of course you can generate C++ like this:

foreach animal { Cat Dog Snake } {
   puts "class $animal : public Animal \{"
   puts "public:"
   puts "   $animal () \{"
   puts "      printf(\"Creating a new \\\"$animal\\\".\\n\");"
   puts "   \}"
   puts "   ~$animal ();"
   puts "\}"
   puts ""
}
(example00)
(output)
This approach has some drawbacks. First of all, you have to write "puts" all over the place, not to mention all the backslashes you have to juggle. "Cumbersome" is an understatement. It's ugly, it reads like hell, and it ends up slowing you down rather than increasing your proud programmer productivity. I probably don't have to tell you what happens when you forget to escape one little quote somewhere.

The g2 preprocessor deals with this drawback "swiftly and with style". It automatically adds "puts" where needed, and it automatically escapes characters that have a special meaning in TCL (including [square brackets] and "quotes"). I call this process the decoration of the output: the tool decorates it with all the quoting that TCL expects. More details on how this is done are in a separate document about the g2 preprocessor.

The preprocessor tool allows you to write plain TCL. When you want to generate output, you switch to output mode by means of an '@' sign. Here's how that works (the three explanatory sentences between the code lines are annotations, not part of the input):

Start in "TCL mode"
foreach animal { Cat Dog Snake } {@

After the 'at' sign, we are now in "output mode"
class $(animal) : public Animal {
public:
   $(animal) () {
      printf("Creating a new \"$(animal)\".\n");
   }
   ~$(animal) ();
}

Another 'at' sign will now take us back to "TCL mode":
@}
(example01)

I know this looks weird, and we will make some improvements later. The main point here is that the '@' sign switches us between two modes. You start in TCL mode, where you write a normal TCL script as you always do. To produce output, you do not invoke "puts", but instead you switch to output mode. Everything you write in output mode is automatically decorated by the tool. You stay in output mode until the next '@' sign, which takes you back to TCL mode.
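
To make the two modes concrete, here is a minimal C++ sketch of the splitting step. It assumes that every '@' toggles the mode and that a literal '@' cannot be escaped; the real tool may well handle this differently. The names Segment and split_modes are made up for this illustration and are not part of g2.

```cpp
#include <string>
#include <vector>

// Hypothetical types for illustration only; they are not part of g2.
struct Segment {
    bool output_mode;   // false: plain TCL, true: text to be decorated
    std::string text;
};

// Split a script at every '@'. The script starts in TCL mode and each
// '@' flips the mode. Assumption: a literal '@' cannot be escaped.
std::vector<Segment> split_modes(const std::string& script) {
    std::vector<Segment> segments;
    bool output_mode = false;
    std::string current;
    for (char c : script) {
        if (c == '@') {
            segments.push_back({output_mode, current});
            current.clear();
            output_mode = !output_mode;
        } else {
            current += c;
        }
    }
    segments.push_back({output_mode, current});
    return segments;
}
```

For the animal example above, this would yield three segments: the foreach header in TCL mode, the class text in output mode, and the closing brace in TCL mode.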

When I feed this snippet of code (minus the annotations) to the g2 preprocessor, it produces the following TCL script for me:

set G2_ROOT $env(G2_ROOT)
source "$G2_ROOT/g2_stack.tcl"
source "$G2_ROOT/g2_clip.tcl"
source "$G2_ROOT/g2_cxx.tcl"
source "$G2_ROOT/g2_roles.tcl"
source "$G2_ROOT/g2_files.tcl"

proc cr {{len 1}} {
   for { set i 0 } { $i < $len } { incr i } {
      g2puts "\n"
   }
}

foreach animal { Cat Dog Snake } {
g2puts "
class $animal : public Animal \{
public:
   $animal () \{
      printf(\"Creating a new \\\"$animal\\\".\\n\");
   \}
   ~$animal ();
\}

"
}
(animals.tcl)
You can see that the tool automatically inserts a number of "source" commands and a simple procedure called cr (which emits newlines through g2puts) at the beginning; this code actually comes from a configuration file in which you can add your own commands too. Then you basically see the foreach loop from the original input, containing one large call to g2puts. That call is produced by the '@' signs in the input. The escape characters, as you can verify, are all nicely in place. (By default, the g2puts procedure behaves the same as puts -nonewline. We'll discuss it in more detail below.)
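
The decoration visible in the generated script can be sketched in a few lines of C++. This is only an illustration that reproduces the escaping seen above (backslashes, quotes, braces and brackets get a backslash, and $(name) becomes $name); it is not the tool's actual implementation, and the function name decorate is an assumption.

```cpp
#include <string>

// Hypothetical helper, not part of g2: decorate one chunk of output-mode
// text so that it can sit inside a double-quoted TCL string.
std::string decorate(const std::string& text) {
    std::string out;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const char c = text[i];
        // "$(name)" in output mode becomes the TCL substitution "$name".
        if (c == '$' && i + 1 < text.size() && text[i + 1] == '(') {
            const std::size_t close = text.find(')', i + 2);
            if (close != std::string::npos) {
                out += '$';
                out.append(text, i + 2, close - (i + 2));
                i = close;   // skip past the closing parenthesis
                continue;
            }
        }
        // Characters that TCL would interpret get a backslash in front.
        if (c == '\\' || c == '"' || c == '[' || c == ']' ||
            c == '{' || c == '}') {
            out += '\\';
        }
        out += c;
    }
    return out;
}
```

Applied to the printf line of the example, this produces exactly the escaped line shown in animals.tcl. A literal dollar sign that should not trigger substitution is not handled by this sketch.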

That's it. There really isn't anything to it, is there? In fact, you can think of g2 as plain TCL with only one additional "command": A very strange command called '@', which behaves like a rather intelligent replacement for "puts".

You could stop here, and use the preprocessor as I described it. But we can go further: since we now have TCL power, we can write a couple of neat TCL procedures to make the preprocessor even more interesting. We will now introduce several new tricks, all written in pure TCL, making the tool more aware of C++.

Clipboards

Here is a subtle drawback of the approach described so far: The output is generated as a "stream", meaning that I cannot come back and re-edit part of the generated code later. Once I do "puts" (or "g2puts"), the output is gone, and I cannot make changes to it anymore. We need some kind of buffer in which we can capture the output and alter it as much as we want. For example, we could output an empty class declaration (already including the closing curly brace) and then come back and fill up the class body at a later point, by inserting new declarations between the class {braces}.

I implemented such a buffering mechanism in pure TCL, which makes it directly available to our preprocessor. Text is stored in a tree of nodes called a clipboard. The tree also contains plugs, where you can insert new text later. A typical situation is that a clipboard contains the text for a class declaration. The class body contains a plug at which I can later insert the methods and member variables. Each of the methods in turn has a plug in the method body, so that I can add its implementation at a later moment. The tree of text nodes in a clipboard more or less mimics the hierarchy of scopes in a C++ program.
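
The real clipboard is written in TCL, but the idea of a text tree with plugs is easy to sketch in C++. The Node type and its member names below are hypothetical and only serve to illustrate the data structure; they do not mirror the actual g2 procedures.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical sketch of a clipboard: a tree of text nodes. A node holds
// literal text and/or child nodes; a plug is a named, initially empty child
// that can receive more content later.
struct Node {
    std::string text;                          // literal text of this node
    std::vector<std::unique_ptr<Node>> kids;   // children, in output order
    std::map<std::string, Node*> plugs;        // plugs directly inside this node

    // Append literal text at the end of this node.
    void add_text(const std::string& s) {
        kids.push_back(std::make_unique<Node>());
        kids.back()->text = s;
    }

    // Append an empty plug and remember it by name.
    Node& add_plug(const std::string& name) {
        kids.push_back(std::make_unique<Node>());
        plugs[name] = kids.back().get();
        return *kids.back();
    }

    // Look up a plug created earlier; throws std::out_of_range if unknown.
    Node& plug(const std::string& name) { return *plugs.at(name); }

    // Flatten the tree into the final output text.
    std::string render() const {
        std::string out = text;
        for (const auto& k : kids) out += k->render();
        return out;
    }
};
```

A class declaration can then be written with its closing brace already in place, and filled in afterwards: add the text "class Snake {", add a plug named "PUBLIC", add the text "};", and later call plug("PUBLIC").add_text(...) to insert a method declaration before rendering.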

Since TCL is a very flexible language, it is not too difficult to implement special procedures called class, public, ctor and their likes. When you invoke class, it creates a new clipboard, writes some text into it, and places three plugs (one for each access section, public, protected and private). By calling the public procedure, output is automatically redirected to the corresponding plug. The ctor procedure adds a constructor declaration to the appropriate plug in the current class, and again inserts a new plug for the constructor's body. All these procedures work together to handle the following piece of input:

foreach animal { Cat Dog Snake } {
   class $animal : public Animal {
   public:
      ctor ( ) {@
         printf("Creating a new \"$(animal)\"\n");
      @}
      dtor ( ) { }
   }
}

add Snake:H PUBLIC {@
   void rattle();
@}
(example02)

I'm sure you noticed that the '@' signs span a much smaller part of the script now. Only the call to printf is still decorated as before; the rest is now handled entirely in TCL! Also check out the little addition we make at the end: the procedure add inserts new code into the class body of the Snake class, after the "body" of that class has already been closed. You can insert snippets of code almost anywhere you want, thanks to the clipboard mechanism. You can even clear an existing plug and write something new. For instance, let's change the implementation of only one of the three generated constructors:

cb_clear Dog:CC CTOR

add Dog:CC CTOR {@
   printf("The Dog constructor behaves differently now!.\n");
@}
(example02)

The g2 preprocessor turns this input script into a TCL script with correct decoration; running the script results in two C++ files implementing the classes Dog, Cat and Snake. Have a look at the generated header and generated source. Please forgive me for the disappointing indentation: the g2 preprocessor does not generate nicely indented code. If you prefer a cleaner style, just insert a beautifier such as GNU indent in your makefiles, right after running the preprocessor. For now, we will not care about the looks of the resulting code, as long as it compiles and does what we expect it to do.

Finally, I should inform you that the class procedure evaluates its final argument using uplevel. The final argument is the stuff that starts with the opening brace '{' and runs all the way to the closing brace '}', with the class "body" in between. We're actually kinda lucky here: TCL happens to have the same braces for grouping things together as C++, so that we can write TCL code that looks and feels a lot like C++. Be careful though: you must put the opening brace on the same line as the call to class, otherwise the TCL interpreter will not understand that the whole block is an argument to the class procedure. In other words:

class A : public B {      <--- correct
...
}

class C : public D
{                         <--- wrong!  Should be on same line as 'class'
...
}
It's easy to forget that you're writing TCL now, not C++ :-)

Functions and methods

Apart from the constructor and destructor in our previous examples, you can of course also add other functions/methods to your class. You can do this in two ways:

class Animal {
   ctor ( ) {
   }

   @
   // First way: just write the method in output mode, between
   // two 'at' signs.
   char* get_name() { return "Fido"; }
   @
   // Second way: in TCL mode, use the procedure 'f'.
   f :pvc void talk(num) {
      // Abstract function with 'int num' argument
   }
}
(example03)

You already knew the first way: just switch to output mode with the '@' sign, and you can add whatever you like in the class body (including methods, member variables, comments, preprocessor directives, or any total gibberish that tickles your fancy).

The second way uses a TCL procedure called f, which takes the method's name ("talk"), its return type ("void"), a weird string that I'll explain in a moment (":pvc"), and a list of parameters. This generates the following snippet of header file:

//
// class Animal
// <<Animal>>
//
class Animal
{
public:
   Animal();

   // First way: just write the method in output mode, between
   // two 'at' signs.
   char* get_name() { return "Fido"; }
   // Second way: in TCL mode, use the procedure 'f'.
   virtual void talk(int num) const = 0;
protected:
private:
};
(example03 generated header)

I should explain the following things:

You should find out for yourself whether or not you like this approach. Balance the disadvantage (the slightly different syntax, maybe the annoying modifier string) against the advantages (header and source are generated at once, default types and default values are automatic). Better yet, you can alter the implementation of f and send me your improvements. Or you can always write your own "nicer" wrappers around it. Isn't open source great?

Function tracing

Here's another feature that the preprocessor helps you with. By setting the TCL variable g2_trace_ftions to 1, we ask the preprocessor to put a snippet of trace code at the start of each function or method body. This trace code could be something simple like printing a message on the screen (e.g. "Now entering function A"). But you can change it to do other interesting things, such as setting up an automatic object, local to the function's scope. The object then gets deleted automatically when the function scope is left, so that you can print a "goodbye" message too. This allows you to build an on-line stack of objects, one for each function called, so that a C function can find out who called it. Or you can print special "hello" and "goodbye" messages to draw automatic Message Sequence Charts for a running application.

The tracing code is produced by the following TCL procedure (its default implementation is shown):

proc f_trace {f_kind f_mod f_typ f_class f_nam f_tag params} {
   if { $f_class != "" && $f_kind != "ctor" && $f_kind != "dtor" } {
      set f_nam "${f_class}::${f_nam}"
   }

   if { $f_tag == "" } {
      g2puts "   printf(\"Entering $f_kind $f_nam.\\n\");\n"
   } else {
      g2puts "   printf(\"Entering $f_kind $f_nam <<$f_tag>>.\\n\");\n"
   }
}
The procedure is called for every function or method (including constructor and destructor) when tracing is active. It receives the parameters listed in its signature above: f_kind, f_mod, f_typ, f_class, f_nam, f_tag and params.

The default implementation of the f_trace procedure is to print a message; it does not use the params argument at all. But as I said, you can change the implementation (TCL allows you to change the implementation of an existing procedure as often as you want).

See example04 (nothing much to see, except the first line that sets g2_trace_ftions). The generated source file shows the trace message in each method.
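
The "automatic object" variant mentioned at the start of this section could look roughly like the C++ sketch below. It is hand-written for illustration and is not what f_trace generates by default; the class name ScopeTrace and its members are assumptions. For simplicity it records its messages in a vector, where a real version would probably call printf.

```cpp
#include <string>
#include <vector>

// Hypothetical trace object; the names are illustrative, not generated by g2.
class ScopeTrace {
public:
    // "Hello": runs when the traced function is entered.
    explicit ScopeTrace(const std::string& name) : name_(name) {
        stack().push_back(name_);
        log().push_back("Entering " + name_);
    }
    // "Goodbye": runs automatically when the function scope is left.
    ~ScopeTrace() {
        log().push_back("Leaving " + name_);
        stack().pop_back();
    }
    ScopeTrace(const ScopeTrace&) = delete;
    ScopeTrace& operator=(const ScopeTrace&) = delete;

    // Name of the function that called the current one, or "" at top level.
    static std::string caller() {
        const std::vector<std::string>& s = stack();
        return s.size() >= 2 ? s[s.size() - 2] : std::string();
    }

    // All messages recorded so far, in order.
    static std::vector<std::string>& log() {
        static std::vector<std::string> messages;
        return messages;
    }

private:
    // The run-time stack of traced functions that are currently active.
    static std::vector<std::string>& stack() {
        static std::vector<std::string> names;
        return names;
    }
    std::string name_;
};

std::string inner() {
    ScopeTrace trace("inner");     // this line is what the trace snippet would insert
    return ScopeTrace::caller();   // a function can find out who called it
}

std::string outer() {
    ScopeTrace trace("outer");
    return inner();
}
```

Calling outer() returns "outer" (the caller seen from inside inner) and leaves four messages in the log: entering outer, entering inner, leaving inner, leaving outer.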

If you want to activate tracing for only a limited number of functions, you can set g2_trace_ftions back to 0 (its default value), and place a 't' in the modifier string for each function you want to trace. For example:

   ctor :it () {@
      blah blah blah
   @}
This switches on tracing for the constructor only. By the way, the 'i' stands for "inline", which means that the constructor will not be generated in the cc file but in the header file, with an "inline" keyword prepended. So inlining a function is now as easy as adding a single letter to the modifier string (rather than copy/pasting the entire implementation from one file to another).
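
As a rough illustration of how such a modifier string can be interpreted, here is a small C++ sketch. The letters 'i' and 't' are described above, and 'v' and 'c' correspond to "virtual" and "const" in the generated code. The meaning of 'p' (pure, "= 0") is my reading of the generated header for talk, and the Modifiers type and parse_modifiers function are hypothetical names, not part of g2.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical result type; field names are illustrative only.
struct Modifiers {
    bool is_inline = false;    // 'i': generate in the header with "inline"
    bool is_traced = false;    // 't': insert trace code in the body
    bool is_virtual = false;   // 'v': prepend "virtual"
    bool is_const = false;     // 'c': append "const"
    bool is_pure = false;      // 'p': append "= 0" (inferred, see lead-in)
};

// Parse a modifier string such as ":pvc" or "it". The leading ':' is
// optional here because options like -ctor_mod are written without it.
Modifiers parse_modifiers(const std::string& mod) {
    Modifiers m;
    std::size_t start = (!mod.empty() && mod[0] == ':') ? 1 : 0;
    for (std::size_t i = start; i < mod.size(); ++i) {
        switch (mod[i]) {
            case 'i': m.is_inline = true; break;
            case 't': m.is_traced = true; break;
            case 'v': m.is_virtual = true; break;
            case 'c': m.is_const = true; break;
            case 'p': m.is_pure = true; break;
            default:
                throw std::invalid_argument(
                    std::string("unknown modifier letter: ") + mod[i]);
        }
    }
    return m;
}
```

The real tool may accept more letters than these five; this sketch rejects anything it does not know.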

Roles

Object oriented design hinges on the fact that each object has a specific task. One of the most difficult jobs in software design is dividing the tasks in a unique and efficient way. When your objects have no clearly defined task from the start, you will almost always run into problems later.

Tasks can usually be divided into smaller tasks recursively. The simplest tasks are things like "managing the value of a data member", or "dispatching to a set of other objects", or "creating a new object of class X". You can model most of these simple tasks with design patterns, which are described in terms of simple problems and solutions. When you assign a task to an object, you can typically break it down into a combination of well-known design patterns plus a (hopefully not too large) number of application-specific "patterns" or tasks.

The point of all this theory is that a good preprocessor can extend design patterns into our implementation. We should not only use design patterns to ease the design of our object oriented system; we should also find a way to ease the implementation and reuse "implementation patterns". And what do you know: the perfect place for "implementation patterns" is in a strong, object-aware preprocessor.

The implementation equivalent of a design pattern is commonly known as a role. I was first introduced to this concept by my colleague Luc De Ceulaer, and I adapted his original ideas for code generation. A role is a small piece of the functionality of an object; its implementation can be distributed over many of the object's methods, including the constructor, destructor and operators. I know that this is not a very tight and formal definition (you can shoot holes in it by blowing your nose, so to speak). Rather than bore you to death with the fine details and the consequences on object oriented methodologies X and Y, let me just paint the picture with a few examples: regular classes, guarding data members, and an output operator.

Regular classes

A class that has a default constructor, destructor, copy constructor, and assignment operator, is called regular. The g2 preprocessor is aware of this and can generate the signatures of these four methods for you. You just have to declare that your class is regular, by giving it the "regular" role:

set g2_trace_ftions 1

class Animal {
   role_regular
}
(example05)

Note that we request function tracing by setting g2_trace_ftions to 1. The preprocessor now knows enough to generate the following header and cc files (only the Animal class is shown):

header file:
class Animal
{
public:
   Animal();
   ~Animal();
   Animal(const Animal& other);
   Animal& operator=(const Animal& other);
protected:
private:
};

source file:
// <<CTOR>>
Animal::Animal()
{
   printf("Entering ctor Animal <<CTOR>>\n");
   // Animal ctor
}

// <<DTOR>>
Animal::~Animal()
{
   printf("Entering dtor ~Animal <<DTOR>>\n");
   // Animal dtor
}

// <<COPY>>
Animal::Animal(const Animal& other)
{
   printf("Entering ctor Animal <<COPY>>\n");
   // Animal copy
}

// <<ASSIGN>>
Animal& Animal::operator=(const Animal& other)
{
   printf("Entering method Animal::operator= <<ASSIGN>>\n");
   if(&other != this)
   {
      // Animal assign
   }
   return *this;
}

All this code comes from one call to role_regular! Since we enabled function tracing, all the generated functions print a message when they are entered. The assignment operator even checks for self-assignment (which is a typical thing to forget when you're a busy programmer), and returns a reference to the object itself.

Obviously, these methods are not very useful the way they are, but you can insert your implementation into them by using plugs and references:

You can fine-tune the code generation by providing options to the role_regular procedure. For example, append -ctor "no" to the call if you do not want the default constructor to be generated; the other three are still generated as before. You can also pass -ctor_mod "it" to make the constructor inline ('i') and traced ('t'), or any other valid combination of modifier letters. Similar options exist for the other three methods. (I should write a reference guide at some point, describing all the available options.)

Not every little detail of the code generation can be tweaked with options, though. In a previous implementation of the tool, I had options for everything, including the name of the argument of the assignment operator (which is now just called 'other'; the option allowed you to avoid name clashes with data members or functions that happened to be called 'other'). As the number of options grew, the usability of the roles was jeopardized. In the current implementation, I decided to keep only the most frequently used options. If you run into a problem (such as the name clash with 'other') that you cannot fix with options, the best thing you can do is make a backup copy of the role_regular procedure, and change its implementation to fit your needs. The TCL code is available open-source, so you can alter it in any way you like.

So only the most common options are available as actual "options", and the rest should be solved by altering the role's implementation. That way, simplicity is maintained (which is very important in this kind of tool), while still providing maximal flexibility (tweaking an open source procedure is the ultimate form of "options" :-) ). By the way, when you make changes to any of the TCL procedures in this tool, please send them to me so I can include them in the distribution for everybody's enjoyment.

Data members

"Having a data member" does not really sound like a complicated task. But it may involve quite a bit of work: allocation/deallocation of memory, access through a get/set function, assignment of default values. All this is taken care of by the "guardian" role, implemented by the guard procedure.

You guard a data member like this:

class Animal {
   role_regular

protected:
   guard char* name= "Fido"
}
(example07)

With a few restrictions due to TCL syntax, you can write an almost normal C++ declaration after the call to guard. You can even provide a default value right there, in the class body. Here is the resulting code:

header file:
class Animal
{
public:
   Animal();
   ~Animal();
   Animal(const Animal& other);
   Animal& operator=(const Animal& other);
   void init();
   char* get_name() const;
   void set_name(char* _v);
protected:
   char* name;
private:
};

// <<GET_NAME>>
inline char* Animal::get_name() const
{
   printf("Entering method Animal::get_name <<GET_NAME>>\n");
   return name;
}

// <<SET_NAME>>
inline void Animal::set_name(char* _v)
{
   printf("Entering method Animal::set_name <<SET_NAME>>\n");
   if(!_v) return;
   if(name) delete [] name;
   name = new char[strlen(_v)+1];
   strcpy(name, _v);
}

source file:
// <<CTOR>>
Animal::Animal()
{
   printf("Entering ctor Animal <<CTOR>>\n");
   // Animal ctor
   name = new char[strlen("Fido")+1];
   strcpy(name, "Fido");
}

// <<DTOR>>
Animal::~Animal()
{
   printf("Entering dtor ~Animal <<DTOR>>\n");
   // Animal dtor
   delete[] name;
}

// <<COPY>>
Animal::Animal(const Animal& other)
{
   printf("Entering ctor Animal <<COPY>>\n");
   // Animal copy
   delete[] name;
   name = new char[strlen(other.name)+1];
   strcpy(name, other.name);
}

// <<ASSIGN>>
Animal& Animal::operator=(const Animal& other)
{
   printf("Entering method Animal::operator= <<ASSIGN>>\n");
   if(&other != this)
   {
      // Animal assign
      delete[] name;
      name = new char[strlen(other.name)+1];
      strcpy(name, other.name);
   }
   return *this;
}

Pay attention to the following features:

Again, you can provide some options for the generator. For example,

guard int i= 5 -assign no
prevents generation of the assignment in the assignment operator (so that every object keeps its original value of i when assigned to). And
guard int i= 5 -get_mod "vc"
generates a virtual const get-method (the default, as you can see in the sample code above, is inline const).

The combination of role_regular and guard shows that roles can work together, enhancing/augmenting each other's generated code thanks to the plugs and clipboard references they create. Then you can go and add some of your own code in addition.

Printing an object

Here's one final example, without much further explanation. By adding only one line of g2 code to a class (the role_stream call below), we give the object a new role: streaming itself to output in a human-readable form.

class Owner {
   role_regular
   role_stream

protected:
   guard char* name= John
}

class Animal {
   role_regular
   role_stream

   f init() {
      ref Animal:CTOR
   }

protected:
   guard char* name= "Fido"
   guard Owner* owner= 0
}
(example08)

If we now stream an object to output, like this:

   Animal a1;
   Owner o1;
   a1.set_owner(&o1);
   cout << "a1= " << a1 << endl;
we obtain a nicely indented print of the object's contents:
Entering function main.
Entering ctor Animal <<CTOR>>
Entering ctor Owner <<CTOR>>
Entering method Animal::set_owner <<SET_OWNER>>
a1= Animal {
   name= "Fido"
   owner:
   Owner {
      name= "John"
   };   // End of Owner
};   // End of Animal

The Animal object streams its owner too, correctly indented and with all its data members. This is possible since the "Owner*" data member is recognized as pointing to another class. (You'll need to change the role's implementation for objects that have circular pointers to each other, because they would cause an infinite loop when printing. I may fix this in a future version.)
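
One possible way around the circular-pointer problem is to remember which objects have already been printed. The following C++ sketch is hand-written, not generated by role_stream; the pet back-pointer, the print helpers and the "<already printed>" marker are all assumptions made for this illustration.

```cpp
#include <ostream>
#include <set>
#include <sstream>
#include <string>

struct Animal;

struct Owner {
    std::string name = "John";
    Animal* pet = nullptr;   // hypothetical back-pointer that creates a cycle
};

struct Animal {
    std::string name = "Fido";
    Owner* owner = nullptr;
};

// Addresses of the objects printed so far during one streaming operation.
using Seen = std::set<const void*>;

void print(std::ostream& os, const Owner& o, Seen& seen, int depth);

void print(std::ostream& os, const Animal& a, Seen& seen, int depth) {
    const std::string pad(static_cast<std::size_t>(depth) * 3, ' ');
    if (!seen.insert(&a).second) {   // seen before: do not descend again
        os << pad << "Animal { <already printed> };\n";
        return;
    }
    os << pad << "Animal {\n";
    os << pad << "   name= \"" << a.name << "\"\n";
    if (a.owner) {
        os << pad << "   owner:\n";
        print(os, *a.owner, seen, depth + 1);
    }
    os << pad << "};\n";
}

void print(std::ostream& os, const Owner& o, Seen& seen, int depth) {
    const std::string pad(static_cast<std::size_t>(depth) * 3, ' ');
    if (!seen.insert(&o).second) {
        os << pad << "Owner { <already printed> };\n";
        return;
    }
    os << pad << "Owner {\n";
    os << pad << "   name= \"" << o.name << "\"\n";
    if (o.pet) {
        os << pad << "   pet:\n";
        print(os, *o.pet, seen, depth + 1);
    }
    os << pad << "};\n";
}

// The stream operator starts each print with an empty "seen" set.
std::ostream& operator<<(std::ostream& os, const Animal& a) {
    Seen seen;
    print(os, a, seen, 0);
    return os;
}

// Convenience helper: stream an Animal into a string.
std::string to_text(const Animal& a) {
    std::ostringstream ss;
    ss << a;
    return ss.str();
}
```

With an Animal and an Owner that point at each other, streaming the Animal now terminates: the nested Owner prints a short marker for the Animal instead of descending into it again. Note that this sketch also abbreviates an object that is merely shared, not only one that is part of a cycle.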

Also note that even though function tracing is on, no message is printed for the '<<' operator. This is because the role_stream role temporarily switches tracing off when it generates this operator (it knows that the operator will print some message anyway).

One interesting option for this role is -streamtype, which defaults to ostream. You can replace this with any other class that has the '<<' operators defined, so that you can stream objects to other media than cout or cerr.

Putting it all together

To wrap it up, I prepared a silly example that combines some of the techniques we have discussed: using TCL as a preprocessor, doing variable substitution in the generated code, adding methods, getting the methods traced, and using roles for common tasks. Have a look at the input file, the generated header and source, and the output of the running program.

If you have any comments, and especially if you would like to cooperate by writing your own roles or other extensions to the preprocessor, please contact me here:

koen.vandamme1 at pandora.be
(or visit my homepage)

Many, many thanks to Luc De Ceulaer and the people on his team, who originally introduced me to the concept of role-based design, and who contributed quite a lot to developing the ideas presented in this paper. Though the implementation has changed a lot since those days, the principles are still the same.