Back.module



-- Copyright (C) International Computer Science Institute, 1994.  COPYRIGHT  --
-- NOTICE: This code is provided "AS IS" WITHOUT ANY WARRANTY and is subject --
-- to the terms of the SATHER LIBRARY GENERAL PUBLIC LICENSE contained in    --
-- the file "Doc/License" of the Sather distribution.  The license is also   --
-- available from ICSI, 1947 Center St., Suite 600, Berkeley CA 94704, USA.  --
--------> Please email comments to "sather-bugs@icsi.berkeley.edu". <----------

(* 

CGEN (Compilation to C)

There are many constraints and goals that the C generator has to meet.
Some of these are a result of experience with the C generation of the
previous compiler.

The code has to be readable, at least on demand, to allow debugging the
compiler.  This means more than indenting code; optional explanatory
comments are necessary, and the mangling of the Sather namespace to C
must be reasonable.

The code has to be portable; strictly ANSI-compliant C is emitted.

It must be possible to have symbolic information emitted for
debugging.  In the previous compiler, this was done by emitting a
special symbol table that was read in by a modified version of gdb.
The current compiler instead emits readable C structs and gdb may be
used as is.  "#line" directives may be inserted in the generated code.
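
As a small illustration of what a "#line" directive does in generated
C, the sketch below uses made-up names: the routine FOO_bar_INT, the
file "foo.sa" and the line numbers are assumptions for the example and
are not taken from actual compiler output.

```c
/* Hypothetical generator output: FOO_bar_INT and "foo.sa" are made-up names. */
#line 12 "foo.sa"
int FOO_bar_INT(int x) { return x + 1; }

/* After a #line directive the C compiler numbers the following lines
   from the given value and attributes them to the given file name. */
int line_seen(void) {
#line 40 "foo.sa"
    return __LINE__;
}

const char *file_seen(void) { return __FILE__; }
```

A debugger that reads the resulting line tables will then report
positions in the Sather source rather than in the generated C file.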

C compilation must be possible; giant C files tend to break compilers,
time out, or thrash systems to death.  So multiple C files must be
generated.  The name mangling must avoid symbols with alternative
meaning, such as "printf".

C compilation must be fast; it is the bottleneck in compilation.  This
means that the generated files must be appropriate for a parallel make
utility.  In addition, because of overhead, files must not be too small,
so more than one class must go in a file.  Only header information
actually needed by a file should be generated.

Because it is so expensive, C compilation should be incremental.  Most
changes to programs are very small, and should be reflected by smaller
compile times.  This also means that global headers are a bad idea,
because if they change all C files must be regenerated.  In addition
code should be clustered in the C files so as to keep changes local,
and the namespace mapping shouldn't propagate changes to other files if
it can be avoided.

Much effort has gone into making a C back-end which meets these goals.
There are numerous "gotchas"; for example, the name mangling can't be
deterministic (because it might collide with a reserved identifier) but
if it isn't deterministic, then it is possible for "namespace pressure"
to change the mapping used in other files.

To meet the above goals a collection of heuristics is used which was
arrived at after exhausting experimentation.  A separate namespace is
managed for each C entity (such as a struct).  C names are constructed
deterministically from the Sather namespace (for example, the routine
FOO::bar(BAZ) is mapped to "FOO_bar_BAZ").  When namespace collisions
occur, or the mapping would collide with a forbidden identifier such as
"printf", an alternative name is generated by deterministically
appending the smallest integer which will resolve the collision.
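
The collision rule above can be sketched in C roughly as follows.
This is a minimal sketch and not the compiler's code: the Namespace
type, ns_claim, mangle_routine and the short forbidden list are
hypothetical, and it is assumed that the appended integers start at 1
and are attached without a separator.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_NAMES 64
#define NAME_LEN  64

/* One namespace: the set of C names already handed out in it. */
typedef struct {
    char used[MAX_NAMES][NAME_LEN];
    int  count;
} Namespace;

/* A tiny sample of forbidden identifiers; a real list would be far longer. */
static const char *const forbidden[] = { "printf", "main", "int", "errno" };

static int is_taken(const Namespace *ns, const char *name) {
    size_t i;
    for (i = 0; i < sizeof forbidden / sizeof forbidden[0]; i++)
        if (strcmp(forbidden[i], name) == 0) return 1;
    for (i = 0; i < (size_t)ns->count; i++)
        if (strcmp(ns->used[i], name) == 0) return 1;
    return 0;
}

/* Build the direct mapping CLASS_routine_ARG, e.g. "FOO_bar_BAZ".
   Returns 0 on success, -1 if the result does not fit in `out`. */
int mangle_routine(const char *cls, const char *rout, const char *arg,
                   char *out, size_t cap) {
    assert(cap > 0);
    return snprintf(out, cap, "%s_%s_%s", cls, rout, arg) < (int)cap ? 0 : -1;
}

/* Reserve a C name for `base`: use it unchanged if it is free, otherwise
   append the smallest positive integer that resolves the collision.
   Returns 0 on success, -1 if the name is too long or the table is full. */
int ns_claim(Namespace *ns, const char *base, char *out, size_t cap) {
    int n;
    assert(cap > 0);
    if (ns->count >= MAX_NAMES) return -1;
    if (snprintf(out, cap, "%s", base) >= (int)cap) return -1;
    for (n = 1; is_taken(ns, out); n++)
        if (snprintf(out, cap, "%s%d", base, n) >= (int)cap) return -1;
    if (strlen(out) >= NAME_LEN) return -1;
    strcpy(ns->used[ns->count++], out);
    return 0;
}
```

Because the result depends only on the names already claimed in the
same namespace, keeping one namespace per C entity limits how far a
collision in one place can shift the names used elsewhere.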

Routines are clustered by the class they are in.  The decision about
what files to create and which classes to place in them is deferred
until all C code is generated.  Then the classes are merged, attempting
to create C files of approximately the same length.  Header information
is generated for each of the resulting files separately, and must
be sorted into a canonical order while respecting the structs'
topological order, to guarantee the same order of generation
for each compile.  For each generated file, a "thumbprint" is generated
(several hash values of the text of the file) which is compared against
that of the previous file of that name; the file is overwritten only if
it has changed.  A make utility can then recompile only the files which
have changed.
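
The overwrite-only-if-changed step can be sketched in C as below.
This is an illustration under assumptions, not the compiler's
implementation: the names Thumbprint and write_if_changed are
hypothetical, two simple 32-bit hashes (FNV-1a and a multiply-by-33
hash) stand in for the unspecified "several hash values", and the old
thumbprint is recomputed from the existing file instead of being
stored anywhere.

```c
#include <stdio.h>
#include <string.h>

/* Two independent 32-bit hashes stand in for the "several hash values". */
typedef struct { unsigned long fnv, djb; } Thumbprint;

static void thumb_init(Thumbprint *t) {
    t->fnv = 2166136261UL;
    t->djb = 5381UL;
}

static void thumb_add(Thumbprint *t, const char *p, size_t n) {
    size_t i;
    for (i = 0; i < n; i++) {
        unsigned long c = (unsigned char)p[i];
        t->fnv = ((t->fnv ^ c) * 16777619UL) & 0xFFFFFFFFUL;
        t->djb = (t->djb * 33UL + c) & 0xFFFFFFFFUL;
    }
}

/* Thumbprint of an existing file; returns 0 if it cannot be opened. */
static int thumb_of_file(const char *path, Thumbprint *t) {
    char buf[4096];
    size_t n;
    FILE *f = fopen(path, "rb");
    if (f == NULL) return 0;
    thumb_init(t);
    while ((n = fread(buf, 1, sizeof buf, f)) > 0)
        thumb_add(t, buf, n);
    fclose(f);
    return 1;
}

/* Write `text` to `path` only if its thumbprint differs from that of the
   file already there.  Returns 1 if written, 0 if the old file was left
   untouched, -1 on an I/O error. */
int write_if_changed(const char *path, const char *text) {
    Thumbprint old_t, new_t;
    size_t len = strlen(text);
    FILE *f;
    thumb_init(&new_t);
    thumb_add(&new_t, text, len);
    if (thumb_of_file(path, &old_t)
        && old_t.fnv == new_t.fnv && old_t.djb == new_t.djb)
        return 0;               /* unchanged: the old timestamp survives */
    f = fopen(path, "wb");
    if (f == NULL) return -1;
    if (fwrite(text, 1, len, f) != len) { fclose(f); return -1; }
    return fclose(f) == 0 ? 1 : -1;
}
```

Leaving an unchanged file alone preserves its modification time, which
is what lets a timestamp-based make utility skip recompiling it.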

It would be far more efficient to redesign the compiler to recompute
all of its internal information incrementally, such that it could
output the C which had changed without generating the C and doing the
comparison.  However, the combination of the above constraints conspires
against this.

*)

cursor.sa
o_iter.sa
o_const.sa
o_cse.sa
o_prefetch.sa
cgen.sa
optimize.sa
code_file.sa
layout.sa
mangle.sa
-- invar.sa
print.sa