The d-language interpreter
Bart Dierickx 20march2004
-----------------------------------------------------------------

INTRODUCTION

The syntax of d is very c-like. The main differences are:
-functions may return multiple values, which may be used as multiple
 arguments in other function or cell argument lists.
-identifiers (alphanumeric words) are not variables in the same sense as
 in a c program. Variables do exist, however, but they are implemented as
 pointers to a generic type (* operator). There is no distinction between
 int, double, string, etc. Variables can be strings, doubles, integers or
 booleans, and implicitly pointers to arrays or structures, depending on
 the operator applied to them.
-operators are largely the same as those known from c, with the same
 precedence.
-quite a few c kernel and library functions do not exist (they are not
 relevant or not yet done).
-the c preprocessor constructs do not exist (no #define, no conditional
 compilation; file inclusion is done by the include() statement).
-a standard library of functions exists, to be called as
   include (standard.d);

THE LANGUAGE

token
  The unit of the language is the token. Tokens are identifiers (strings
  of characters) and special tokens (separators and operators).
  The separator tokens { } ; ( ) , and the operators are parsed as
  separate tokens even if they are not separated by white space.
  Identifiers must be separated by special tokens or by white space.
  Comments are understood as white space.
  Tokens can include white space and even special characters if they are
  surrounded by "". Note that "{" is effectively the same as { . Double
  quotes thus do not suffice to make a literal {. A longer string such as
  "{}", however, is a non-special token.
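
The token rules above can be sketched in a few lines of Python. This is only
an illustrative model, not the interpreter's own code: the names SEPARATORS
and tokenize are hypothetical, operators and comments are left out, and escape
sequences inside "" are ignored.

```python
# Hypothetical model of the token rules; not taken from the interpreter source.
SEPARATORS = set("{};(),")


def tokenize(text):
    """Split text into tokens: separators stand alone, "..." is one token."""
    tokens = []
    word = ""
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == '"':
            # A quoted token runs up to the next double quote.
            # (An unterminated quote raises ValueError in this sketch.)
            end = text.index('"', i + 1)
            if word:
                tokens.append(word)
                word = ""
            tokens.append(text[i + 1:end])
            i = end + 1
            continue
        if ch.isspace() or ch in SEPARATORS:
            # White space and separators both end the current identifier.
            if word:
                tokens.append(word)
                word = ""
            if ch in SEPARATORS:
                tokens.append(ch)
        else:
            word += ch
        i += 1
    if word:
        tokens.append(word)
    return tokens


print(tokenize("include (standard.d);"))
print(tokenize('print("a b"){'))
```

Because the sketch returns plain strings, a quoted "{" and a bare { come out
as the same token, which mirrors the note above.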

comments
  exist in two flavors:
  -beginning with /* and ending with */
  -beginning with // and ending with the LF character (ascii 10)

expression
  may be a single token or the result of a hierarchical calculation

statement list
  is either a single statement, ended by ; or a sequence of single
  statements, surrounded by { }

Statements
  At the highest level d is composed of a list of statements. A statement
  is typically built from a series of expressions, and ends with a
  semicolon.

  statement:
    keyword [(arguments)] [qualifiers ...]

  keyword
    The first token determines the statement type. There is an exhaustive
    list of allowed statement types. Examples:

    declare name [= expression] [, ...] ;
        declares [and assigns] a variable. name may be the result of an
        expression. A declaration inside a function body is local.
    global name [= expression] [, ...] ;
        explicitly declares a global variable, also in a function body
    *expression1 = expression2 ;
        assigns the value of expression2 to the variable pointed to by
        expression1.
    *name (arguments) ;
        call of a previously declared function
    while (expression) { statement list }
    if (expression) { statement list } [else { statement list }]
    include (filename) ;
        includes file in stream.
        note: loops (while, ..) cannot be maintained across include
        borders (although an include completely embedded inside a while
        loop and which does not affect hierarchy is acceptable)

  function argument list
    Arguments must come between ( ) and are separated by , . Each argument
    is a single expression.

  qualifiers
    Many statements can be extended with qualifiers. These are white space
    separated expressions, some of which may have argument lists.

OPERATORS AND BUILT-IN FUNCTIONS

Operators largely have the "c-feel". The main differences are:
-there is a single type. The single type encompasses strings, doubles,
 integers and booleans, and is declared by the generic declare statement.
-not all c operators are supported.
 The following do NOT exist:
   -implicit assignments ++, --, +=, *=, /=, %=, etc.
   -bit-wise logical operators such as ~, >>, <<, &, |
   -array and pointer: [], ->, ., & (which do exist implicitly, see
    further)
   -casting (which is useless), sizeof
   -the comma operator ,
 Operators that are different:
   -&* means: global address of local variable
   -/% explicit integer division
   -(expression expression): string concatenation

Declaration is an operator! It may happen at any place in the code. If a
variable is not previously declared at run time, it will result in a run
time error.
Juxtaposition of expressions inside () yields string concatenation.
Note that even = is an operator. Assignment happens inside expressions:
  while(*i=(*i)-1)
is thus possible.
If a token is recognized as an operator, the operator is executed; all
other tokens are understood as identifiers. E.g. (1*+) will attempt to
multiply 1 and the + character.

ARITHMETIC

simple arithmetic on double or integer:      + - * /
explicit integer division:                   /%
modulo (remainder) of the integer division:  %
logic:                                       && ||
choice of two:                               a ? b : c
    NOTE: unlike in the C language, a, b and c ARE all evaluated; if a is
    true b is returned, else c.
numerical order:                             < > <= >= == !=
string concatenation:                        (expression expression ...)
    yields the concatenated string of the values of the expressions

POINTER AND ASSIGNMENT

pointer operator:    *expression
    yields the contents of the variable named by expression
    note: * must be the first token of the expression, otherwise * means
    multiplication
address_of_pointer:  &*
    yields the true address of a local variable, which can be passed as a
    function argument
function call:       *name(arguments)
assignment:          *name = expression
    the variable is set to the result of expression
    E.g.
      *name="Janssens"
      *index2= *(index 3)=(5-1)
    The last expression will result in 4, and have as side effect that
    variables index2 and index3 are set to 4.

Built-in functions employ the same format as runtime defined functions:

NUMERIC AND MATH

*chr(a)         the ascii character a, e.g.
                *chr(123) is string "{"
*sqrt(a)        the square root of a
*pow(a,b)       a to the power b
*max(a,b,...)   maximum value
*min(a,b,...)   minimum value

STRING OPERATIONS

*format(s,a)    string formatting
                s is a format string, with EXACTLY ONE % field. This field
                may be any of the C printf formats. Depending on the format
                used, a is interpreted as string, integer or double.
                Example:
                  print (*format("VDD[%.3d]",33));
                results in VDD[033]
*eq(a,b)        returns 1 if string a is equal to string b, 0 otherwise
                also: *ne(), *gt(), *lt(), *ge(), *le()
*sins(a,f,b)    checks if string a occurs in string b from position f.
                returns starting position+1

OTHER

*item(n, ...)   returns item n from the remainder of the argument list
                starts counting at 0

PRECEDENCE OF OPERATORS

1) expressions between brackets; function arguments (left to right)
2) unary operators, functions and their arguments, and = , are executed
   right-to-left
3) * / % /%     left to right
4) + -          left to right
5) > < >= <=    left to right
6) == and !=    left to right
7) || and &&    left to right
8) conditional operator ...?...:...
   recursion at the right of the ? is possible
9) string concatenation (... ...)    left to right

VARIABLES AND CONSTANTS

Generic type variables (covering string, double, integer and boolean)
appear in expressions preceded by a *. Variables are declared in the
declare and global statements.
Operators are reserved tokens inside expressions and act upon other
expressions. Non-operator tokens are called identifiers. Identifiers may
be:
-strings (any alphanumerical string not including operator or separator
 characters)
-integers
-doubles
   -in decimal (3.14) or in exponential (314E-2) notation
   -they may be supplemented with a metric unit character:
    u=micro, n=nano, m=milli
    hence 3.14u could mean 3.14 micrometer
-boolean 0 is any expression evaluating to zero without suffix (!), or an
 empty string
-boolean 1 is anything else

Before being used, variables must be declared.
Variables declared inside a function body are local, i.e.
only visible within the function body in which they are declared.
[In reality the local variables are stored with prefix "#\003" where # is
the function nesting level, which is available as variable
*function.nesting]

FUNCTIONS (runtime defined)

Functions are called in expressions as pointers to an identifier or as a
pointer to the result of an expression. The function must have been
previously (in time) defined in a "function" statement.
Example:
  function plus(2) {result (*0 + *1);}
  ...
  *print (4* (*plus(2,3)));
which would return 20 (4*(2+3))

FUNCTION RETURN VALUES

A function may return zero, one or more values. Inside the function body
values are returned in sequence by the "result" statement. The result
statement is highly similar to the c return statement, except that it does
not end the function. Multiple result statements will yield multiple
returned values.
In the code calling the function, the list of return values acts as if the
function call is replaced by a literal, comma separated, list of the
return values.
Example:
  function count(1)
    {
    while(*0>0)
      {
      result(*0);
      *0=*0-1;
      }
    }
  ...
  *print (*count(5));
will act as:
  *print (5,4,3,2,1);
  ...
  *print (*count(5)*2);
will act as:
  *print (5,4,3,2,1*2);
In the above example we see a multiplication of a multiple return value
and a single value. This is in principle invalid, but the interpreter
accepts it. Details of the interpreter or compiler may differ, so the
result is not portable. Multiple return values should only be used in
argument lists.

BUG (or feature): When no result() statement is called in a function, it
still implicitly returns one, empty, argument! Thus:
  *print(*count(0))
effectively has one argument, and prints one empty string.

PROGRAM FLOW STATEMENTS

declare name [= expression] [, ...] ;
    declares the storage space and assigns the value for a variable.
    Variable values are available in other statements by the * (pointer)
    operator.
    The expression can be a simple value or a more complex expression.
    Variables declared outside function bodies are global. Variables
    declared inside function bodies are local (and non static!).
    Multiple declarations are possible.
    Examples:
      declare v = vdd;
      declare (yes)= (*v _sub);
      declare label= (*v(12) (*yes)) ;
    this results in *label containing the value vdd12vdd_sub
      declare a=1, b[12]=4, c;
    note that b[12] is not an array but a simple single variable

global name [= expression] [, ...] ;
    as declare, but global, from inside a function body.

*expression1 = expression2 ;
    assigns a value (expression2) to the variable pointed to by
    expression1. The variable must have been previously declared, either
    locally or globally. Note that * is the pointer operator.

function name (argument_number) statement_list
    is the declaration of a function. By subsequently calling the
    function, the statement list (=function body) will be executed. name
    may be the result of an expression.
    The argument list is NOT declared as in c. The value between brackets
    is the required number of arguments. An empty () means a variable
    number of arguments. The arguments of the call will be available as
    the local variables *0, *1, etc. in the body. The total number of
    arguments really given in the call will be available inside the
    function body as *#

*name (arguments) ;
    is the non-returning call of a pre-declared function. name may be the
    result of an expression. But here name does not begin with * as an
    evaluated function!
    Note the difference between:
      *myfunction (12, 11);
      **myvar (12,11);
      **otherfunc (13,14) (12,11);
    where myfunction() is a declared function,
    where myvar is a declared variable, containing the name of a declared
    function,
    where otherfunc() is a declared function returning the name of a
    function.
    The argument list of a function is evaluated left-to-right before
    executing the body of the function.

result(expression);
    as the c return statement, except that result does not end the
    function execution. Multiple result() statements will yield a list of
    return values. Must be used inside a function body.
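
The behaviour of result() can be modelled in Python as a rough sketch. The
helper names call, count and show are hypothetical stand-ins rather than part
of d; the sketch assumes only what is described in this document: result()
appends a value without ending the function, the returned values are spliced
into the caller's argument list, and a function that never calls result()
still yields one empty value.

```python
# Hypothetical Python model of d's result() statement; not interpreter code.
def call(function, *args):
    """Run a d-style function body and collect every value given to result()."""
    results = []
    function(results.append, *args)
    # Documented quirk: with no result() call, one empty value is returned.
    return results if results else [""]


def count(result, n):
    # Mirrors: function count(1) { while(*0>0) { result(*0); *0=*0-1; } }
    while n > 0:
        result(n)  # unlike a C return, this does not end the function
        n = n - 1


def show(*args):
    """Stand-in for a d print call: joins its arguments with commas."""
    return ",".join(str(a) for a in args)


# Python's * unpacking plays the role of splicing the return list
# into the argument list of the calling function.
print(show(*call(count, 5)))
print(len(call(count, 0)))
```

The second print shows the quirk: calling count with 0 still produces a list
of length one, holding an empty string.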

system ;
    various system calls and debugging items
system variables ;
    prints all variables and their values at that point in the program

input name [prompts ...] ;
    the interpreter requires (keyboard?) input at this time. The prompts
    are evaluated and displayed. The string entered, stripped of all
    control characters, is assigned to the variable.

if(expression) statement_list
    expression is evaluated, and if true, the statement list is executed.
else statement_list
    the statement list is executed if the previous "if" evaluation failed.
while(expression) statement_list
    expression is evaluated, and as long as it is true, the statement list
    is executed
include (filename) ;
    filename is an expression. The contents of the file are included at
    this position.
    note: loops (while, ..) cannot be maintained across include borders
    (although an include completely embedded inside a while loop and that
    does not affect hierarchy is acceptable)

STRUCTURES AND ARRAYS

As all variables are in fact pointers to allocated space, the distinction
between pointers and variables vanishes. The & (address of) operator as in
c is superfluous (except in the &* operator).
Structures and arrays as such do not exist explicitly, but implicitly one
may freely use the c-feel syntax. Each member of an array or of a
structure must be individually declared before being used.

*a              value pointed to by constant a
*(a)            value pointed to by expression that returns the constant a,
                which is identical to the previous
*a[10]          value pointed to by the constant a[10]
*(a[(1+9)])     value pointed to by the identifier a[10]
*a[(1+9)]       value pointed to by a[ , followed by 10, followed by ]
This last example is typically a programmer's mistake! One must be aware
that [ ] are just plain alphanumerical characters without precedence over
the separators ( ) and the operator +

*a.buffer[1].value[22]
                value pointed to by identifier a.buffer[1].value[22]
*(a.buffer[(2-1)].value[(11*2)])
                the same
*a.buffer[(2-1)].value[(11*2)]
                certainly not the same!
                probably nonsense

Variables can be passed to functions by value or by pointer.
By value:
  call myfunction(*a);
By pointer:
  call otherfunction (b);
In the latter case otherfunction assumes that its 0th argument is a
pointer to a global variable b. Note that it is normally not possible to
pass a pointer to a local variable. These functions could have been:
  function myfunction(1) print *0;
  function otherfunction(1) print *(*0);
One could thus easily pass pointers to structures or arrays, on the
condition that these are global, or via the &* operator.
  function initmystructure(2)
    {
    global ((*0).numberinstock),((*0).date)=(*today),((*0).customer);
    *((*0).numberinstock)=(*1);
    *((*0).date)=(*today);
    *((*0).customer)=Jansens;
    }
  call initmystructure(product[77],0);

Pointers to a local variable (and thus local structures and arrays) can be
passed as function arguments using the &* operator.
Example:
  function printmystructure(1)
    {
    print *((*0).name) *((*0).city);
    print *0;   //this would reveal the true address of the local structure
    global (*0).printed=1;
    /*new item added to the struct! note that variables accessed using
      address-of-pointer, are global variables! */
    }
  function handlemystructure()
    {
    declare item[0];
    declare item[0].name=jansens;
    declare item[0].city=antwerp;
    call printmystructure(&*item[0]);
    ...
    }

STRING CONSTANTS

By the way, strings are not arrays of char as in C. Variables may be
anything, thus also strings. Any identifier, thus also an operator or
separator, is potentially a string. Spaces, special characters and control
characters can be embedded in identifiers by surrounding them with "".
Control character sequences are coded as follows:
  \n     newline
  \t     tab
  \\     \
  \"     "
  \123   ascii character 0123 (octal)
example:
  *format("vvd(%.3d)",*a)

RUNNING THE INTERPRETER
=======================

Under the Windows command prompt:
  > D MYPROGRAM.D
As a Unix shell command:
  d myprogram.d
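
Returning to the STRUCTURES AND ARRAYS section: the idea that [ ] and . are
just characters of a variable name can be modelled with a flat name table.
The Python below is only a sketch under that assumption; the names variables,
declare and deref are hypothetical and do not come from the interpreter.

```python
# Hypothetical model: every d variable is one entry in a flat name table.
variables = {}


def declare(name, value=""):
    """Model of the declare statement: create one named variable."""
    variables[name] = value


def deref(name):
    """Model of the * operator: an undeclared name is a run time error."""
    if name not in variables:
        raise KeyError("undeclared variable: " + name)
    return variables[name]


# "a[10]" is a single name; no array called a exists.
declare("a[10]", "ten")

# Like *(a[(1+9)]): the name is built by concatenation, then dereferenced.
index = str(1 + 9)
print(deref("a[" + index + "]"))
```

In this model each member has to be declared on its own, which matches the
rule that every array or structure member must be individually declared.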