OpenC++
Architecture Overview
Grzegorz Jakacki <jakacki at acm dot org>
Intro
This is reverse-engineered developer documentation of the OpenC++
architecture. It assumes that you are somewhat familiar with
OpenC++, or at least that you have read the other documentation.
This document is by no means complete. Please help fill in the gaps.
Data Flow
Originally OpenC++ was designed as a framework implemented as an
executable (occ) with the following functionality:
1. Dynamically load (dload) plugins, as requested on the command line.
2. Preprocess the source files given on the command line.
3. Parse them according to C++ rules plus OCC features (e.g. metaclass, but also,
   say, backquotes).
4. Translate according to the rules given by the set of pre-linked
   and dloaded subclasses of Metaclass.
5. Optionally preprocess again.
6. If -S was specified:
   6.1. Compile into a dloadable module.
7. If -S was not specified:
   7.1. Compile.
   7.2. Link with the libraries given on the command line, obtaining
        an executable.
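The branching on -S in the steps above can be summarized with a small sketch. It is not taken from the occ sources: `OccOptions`, `PlanStages` and the stage names are hypothetical, and only the -S switch itself comes from the text.

```cpp
#include <string>
#include <vector>

// Hypothetical options record; only the -S switch comes from the text above.
struct OccOptions {
    bool dash_s = false;            // -S was given on the command line
    bool preprocess_again = false;  // the optional second preprocessing pass
};

// Returns the names of the stages, in order, for one invocation.
std::vector<std::string> PlanStages(const OccOptions& opt) {
    std::vector<std::string> stages = {
        "load plugins", "preprocess", "parse", "translate"};
    if (opt.preprocess_again) stages.push_back("preprocess again");
    if (opt.dash_s) {
        stages.push_back("compile into dloadable module");
    } else {
        stages.push_back("compile");
        stages.push_back("link");
    }
    return stages;
}
```

With `dash_s` set, the plan ends in a single compile step that yields a plugin; otherwise it ends in separate compile and link steps that yield an executable.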
Apart from the occ executable, the distribution contains a library
(libocc.{a,so,la}) with exactly the same content as occ (in particular,
it also defines main()).
Observe that the libraries in (7.2) do not include libocc.{a,so,la}
unless you specify it explicitly on the command line.
As far as I understand, the library is provided for a static-linking
approach: instead of using occ to create plugins, you can write new
derivatives of Metaclass (or, in practice, Class), compile them with $CC
and link with libocc.a, obtaining an executable that performs exactly
steps (1)-(7), with your derivatives among the "pre-linked" classes that
affect the translation in step (4).
If you want to take advantage of the facilities provided by metaclasses
pre-linked into occ (backquotes being one of them), you can
use occ instead of $CC when taking the static-linking road.
Most of the above functionality is implemented in driver.cc and driver2.cc. Recently (2.8) it
has been moved to a Bash script called occ2.
AST Implementation
Basic Structure
Syntax trees are encoded as binary trees. The binary tree is
implemented using the base class Ptree and two derived classes, Leaf
and NonLeaf. This is the classic Composite pattern ([GoF]) with the
following mapping of participants' names:
| GoF Book | OpenC++ |
|---|---|
| Component | Ptree |
| Component::Operation() | Ptree::IsLeaf(), Ptree::What(), Ptree::IsA(), Ptree::Display() |
| Leaf | Leaf |
| Composite | NonLeaf |
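A minimal sketch of this Composite structure follows. The class names mirror the mapping above, but the members (`text_`, `car_`, `cdr_`), the accessors and the helper `CountLeaves` are hypothetical and are not taken from the OpenC++ sources. Ptree is made abstract here for brevity.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Component: the common interface of every node in the binary tree.
class Ptree {
public:
    virtual ~Ptree() = default;
    virtual bool IsLeaf() const = 0;
};

// Leaf: a single token; it stores its text (hypothetical member).
class Leaf : public Ptree {
public:
    explicit Leaf(std::string text) : text_(std::move(text)) {}
    bool IsLeaf() const override { return true; }
    const std::string& Text() const { return text_; }
private:
    std::string text_;
};

// Composite: exactly two children, either of which may be null.
class NonLeaf : public Ptree {
public:
    NonLeaf(const Ptree* car, const Ptree* cdr) : car_(car), cdr_(cdr) {}
    bool IsLeaf() const override { return false; }
    const Ptree* Car() const { return car_; }
    const Ptree* Cdr() const { return cdr_; }
private:
    const Ptree* car_;
    const Ptree* cdr_;
};

// Counts the leaves of a tree using only the coarse-grained IsLeaf() query.
int CountLeaves(const Ptree* p) {
    if (p == nullptr) return 0;
    if (p->IsLeaf()) return 1;
    // Every node that is not a leaf is a NonLeaf in this sketch.
    assert(dynamic_cast<const NonLeaf*>(p) != nullptr);
    const NonLeaf* n = static_cast<const NonLeaf*>(p);
    return CountLeaves(n->Car()) + CountLeaves(n->Cdr());
}
```

Client code such as `CountLeaves` treats both node kinds uniformly through `Ptree*`, which is the point of the pattern.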
C++ parse trees are encoded as binary trees, e.g. "if (x) i = 0;" is
represented as:
Observe that node n1 is the root of an 'if-else' construct, while nodes
n2 and n3 are not roots of any construct. The OpenC++ parser encodes
this additional information in the type of the n1 object: n1 actually
has type PtreeIfStatement (derived from NonLeaf).
The Ptree inheritance hierarchy contains many classes derived from Leaf
(called LeafFoo) and NonLeaf (called PtreeFoo) that have no data members
and serve only to encode more information about the nature of the node.
The functionality of these classes is usually limited to overriding the
What(), Translate() and Typeof() member functions.
Every tree can be viewed either as a composite of just Leaf and NonLeaf
objects, or as a type-rich structure of miscellaneous PtreeFoo
and LeafFoo nodes. The above tree in the type-rich view looks like this:
Syntax node discrimination
In OpenC++, syntax trees are usually represented by a Ptree* pointer to
the root node. In order to process the tree, one has to find out what the
nature of this node is. This can be done on two levels: coarse-grained
(discriminating between Leaf and NonLeaf nodes) and fine-grained
(discriminating between different kinds of PtreeFoo and LeafFoo nodes).
Coarse-grained discrimination can be achieved by calling Ptree::IsLeaf() on the node.
There are three ways to perform fine-grained discrimination (all
examples below assume that p
is of type Ptree*):
Dynamic Cast
The simplest way of figuring out the node nature is using RTTI:
    if (PtreeIfStatement* s = dynamic_cast<PtreeIfStatement*>(p)) {
        /* process "if-else" */
    }
    else if (PtreeWhileStatement* w = dynamic_cast<PtreeWhileStatement*>(p)) {
        /* process "while" */
    }
    else {
        /* process others */
    }
Type queries
Member function Ptree::What() is guaranteed to return a unique
identifier determining the type of a PtreeFoo or LeafFoo object:
    switch (p->What()) {
        case ntIfStatement:    /* process "if-else" */ break;
        case ntWhileStatement: /* process "while" */   break;
        ...
    }
A similar technique can be used with Ptree::IsA():
    if (p->IsA(ntIfStatement)) {
        /* process "if-else" */
    }
    else if (p->IsA(ntWhileStatement)) {
        /* process "while" */
    }
    else { ... }
If n == p->What(), then p->IsA(n) is guaranteed to be true. The converse
also holds at present, but is not guaranteed to hold in future releases.
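The RTTI and What() techniques can be compared side by side on a toy hierarchy. This is only a sketch: the `NodeKind` values, the definition of `IsA()` in terms of `What()`, and the helpers `DescribeByWhat` and `DescribeByCast` are assumptions made for illustration, not the OpenC++ definitions.

```cpp
#include <cassert>
#include <string>

// Hypothetical node-kind identifiers; the real ones are defined by OpenC++.
enum NodeKind { ntNonLeaf, ntIfStatement, ntWhileStatement };

class Ptree {
public:
    virtual ~Ptree() = default;
    virtual int What() const = 0;
    // Assumption of this sketch: IsA() is exactly "What() equals kind".
    bool IsA(int kind) const { return What() == kind; }
};

class NonLeaf : public Ptree {
public:
    int What() const override { return ntNonLeaf; }
};

// Tag classes: no data members, they only refine What().
class PtreeIfStatement : public NonLeaf {
public:
    int What() const override { return ntIfStatement; }
};

class PtreeWhileStatement : public NonLeaf {
public:
    int What() const override { return ntWhileStatement; }
};

// Classifies a node through its What() identifier.
std::string DescribeByWhat(const Ptree* p) {
    assert(p != nullptr);
    switch (p->What()) {
        case ntIfStatement:    return "if";
        case ntWhileStatement: return "while";
        default:               return "other";
    }
}

// Performs the same classification with RTTI.
std::string DescribeByCast(const Ptree* p) {
    assert(p != nullptr);
    if (dynamic_cast<const PtreeIfStatement*>(p)) return "if";
    if (dynamic_cast<const PtreeWhileStatement*>(p)) return "while";
    return "other";
}
```

Both helpers give the same answer for every node in this sketch. The switch on What() compares one integer, while the dynamic_cast chain tests each candidate type in turn.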
Visitation
You can provide an implementation of the AbstractTranslatingWalker
interface and pass it to Ptree::Translate().
Based on the actual nature of the node, the appropriate member function
of your implementation will be called.
    class MyWalker : public AbstractTranslatingWalker
    {
    public:
        void TranslateIf(Ptree* p) { /* process "if-else" */ }
        void TranslateWhile(Ptree* p) { /* process "while" */ }
        ...
    };
You have to implement all TranslateFoo() member functions
defined in AbstractTranslatingWalker.
This can be tedious, especially when most implementations are intended
to be empty. In that case you can use the convenience
implementation NoOpTranslatingWalker:
    class MyWalker
        : virtual public AbstractTranslatingWalker
        , private NoOpTranslatingWalker
    {
    public:
        void TranslateIf(Ptree* p) { /* process "if-else" */ }
        void TranslateWhile(Ptree* p) { /* process "while" */ }
    };
It is also common to want your walker to traverse the whole tree
and perform specific actions only in certain nodes. In such a case the
convenience implementation TraversingTranslatingWalker
may prove handy:
    class MyWalker
        : virtual public AbstractTranslatingWalker
        , private TraversingTranslatingWalker
    {
    public:
        void TranslateFuncall(Ptree* p) { /* process function call */ }
    };
To actually visit a syntax tree, use:
    MyWalker walker;
    p->Translate(walker);
This scheme is known as the Visitor design pattern ([GoF]). The
names of the participants map onto implementation names as follows:
| GoF Book | OpenC++ |
|---|---|
| Visitor | AbstractTranslatingWalker |
| Visitor::VisitConcreteElementFoo() | AbstractTranslatingWalker::TranslateFoo() |
| ConcreteVisitor | MyWalker, NoOpTranslatingWalker, TraversingTranslatingWalker |
| Element | Ptree |
| Element::Accept() | Ptree::Translate() |
| ConcreteElementFoo | PtreeFoo, LeafFoo |
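The double dispatch behind this mapping can be sketched in a self-contained form. The class names follow the mapping above, but everything else is an assumption for illustration: the reduced two-function interface, the `has_else` member, the `IfCollector` walker, and the use of virtual inheritance in `NoOpTranslatingWalker` are not taken from the OpenC++ sources.

```cpp
#include <cassert>
#include <string>
#include <vector>

class Ptree;

// Visitor: one member function per concrete node kind (reduced to two here).
class AbstractTranslatingWalker {
public:
    virtual ~AbstractTranslatingWalker() = default;
    virtual void TranslateIf(Ptree* p) = 0;
    virtual void TranslateWhile(Ptree* p) = 0;
};

// Element: Translate() plays the role of Accept().
class Ptree {
public:
    virtual ~Ptree() = default;
    virtual void Translate(AbstractTranslatingWalker& w) = 0;
};

class PtreeIfStatement : public Ptree {
public:
    void Translate(AbstractTranslatingWalker& w) override { w.TranslateIf(this); }
    bool has_else = false;  // hypothetical member, used to show the downcast
};

class PtreeWhileStatement : public Ptree {
public:
    void Translate(AbstractTranslatingWalker& w) override { w.TranslateWhile(this); }
};

// Convenience implementation: every member function does nothing.
class NoOpTranslatingWalker : public virtual AbstractTranslatingWalker {
public:
    void TranslateIf(Ptree*) override {}
    void TranslateWhile(Ptree*) override {}
};

// Concrete visitor: overrides only the function it cares about.
class IfCollector
    : public virtual AbstractTranslatingWalker
    , private NoOpTranslatingWalker
{
public:
    void TranslateIf(Ptree* p) override {
        // The interface passes Ptree*, so the walker downcasts on its own.
        assert(dynamic_cast<PtreeIfStatement*>(p) != nullptr);
        PtreeIfStatement* s = static_cast<PtreeIfStatement*>(p);
        log.push_back(s->has_else ? "if-else" : "if");
    }
    std::vector<std::string> log;
};
```

Calling `node.Translate(walker)` first selects the node's override of `Translate()` and then the walker's override of the matching `TranslateFoo()`, so neither side needs a type switch.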
The Ptree hierarchy also supports visitation by AbstractTypingWalker. Using
this visitation scheme in client code is discouraged.
Annotations
The Ptree interface contains GetEncodedType() and GetEncodedName(). These member
functions are used by the OpenC++ backend to attach information to certain
syntax nodes. These functions are likely to change, and clients of Ptree are
discouraged from using them.
Notes
- For historical reasons Ptree
is not an abstract class.
- For historical reasons the visitor interface uses the abstract element
type (Ptree), not the concrete element types (PtreeFoo, LeafFoo).
An implementation of TranslateFoo() has to perform the
downcast on its own.
Translation
The source code of one translation unit is represented by an in-memory text
buffer implemented by class Program.
There are several classes derived from Program,
e.g. ProgramFile (can construct itself from a file) or ProgramString
(can construct itself from a C string). For historical reasons Program
is not an abstract class.
The concrete class Lex
implements a lexer. Its interface provides
access to consecutive tokens. Each Lex object contains a pointer
to a Program object and
uses it to obtain consecutive characters to be lexed. The lexer also
obtains char* pointers from the Program; these are used throughout the
program as iterators into the Program's memory buffer (in particular,
such pointers are planted in certain AST nodes to maintain the
correspondence between the AST and the flat text).
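The idea of tokens that point into the Program's buffer instead of copying text can be sketched as follows. The interfaces shown (`Begin()`, `End()`, `Next()`, the `Token` struct) and the trivial tokenization rules are hypothetical. The real Lex and Program classes differ.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Owns the text of one translation unit (simplified, hypothetical interface).
class Program {
public:
    explicit Program(std::string text) : text_(std::move(text)) {}
    const char* Begin() const { return text_.data(); }
    const char* End() const { return text_.data() + text_.size(); }
private:
    std::string text_;
};

// A token is a range inside the Program's buffer; no characters are copied.
struct Token {
    const char* ptr;
    std::size_t len;
};

class Lex {
public:
    explicit Lex(const Program* prog) : prog_(prog), pos_(prog->Begin()) {}

    // Stores the next token in `out`; returns false at the end of input.
    bool Next(Token& out) {
        const char* end = prog_->End();
        while (pos_ != end && std::isspace(static_cast<unsigned char>(*pos_))) ++pos_;
        if (pos_ == end) return false;
        const char* start = pos_;
        if (IsWordChar(*pos_)) {
            while (pos_ != end && IsWordChar(*pos_)) ++pos_;
        } else {
            ++pos_;  // any other character forms a one-character token
        }
        out = Token{start, static_cast<std::size_t>(pos_ - start)};
        return true;
    }

private:
    static bool IsWordChar(char c) {
        return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
    }
    const Program* prog_;
    const char* pos_;
};
```

Because each `Token::ptr` is an address inside the buffer, the offset `ptr - prog.Begin()` recovers the token's position in the flat text. That is the kind of correspondence the AST nodes rely on.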
The concrete class Parser
implements a parser. Each Parser
object contains a pointer to a Lex
object and uses it to obtain consecutive tokens.
The concrete class ClassWalker
implements a type elaborator and source-to-source translator. ClassWalker
leverages the Visitor pattern to traverse the supplied AST. ClassWalker
maintains a tree of Environment objects that
represent scopes. Occasionally ClassWalker
methods employ ClassBodyWalker
for traversals of certain subtrees. A call to ClassWalker::Translate()
returns a new AST. The original and returned trees are passed to
Program::MinimumSubst(), which
figures out and remembers the minimum set of alterations of the
original flat text of the translation unit necessary to obtain the text
corresponding to the translated tree. Moreover, the ClassWalker object
maintains a set of code fragments that should be
inserted at the beginning and at the end of the source code as a result
of translation.
The driver of the OpenC++ compiler creates Program, Lex, Parser and
ClassWalker objects. Within the main loop, the driver calls a method of Parser
to obtain one top-level definition or declaration,
passes the obtained AST to ClassWalker::Translate(),
and then passes both the original and the translated AST to
Program::MinimumSubst(), as described above. Each loop iteration
processes one definition or declaration (however, such a construct may be
arbitrarily large, e.g. a namespace or a class with nested classes).
After the main loop completes, Program::Write() is called to
output the original source character stream with the modifications
registered in the process of source-to-source translation.
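The "remember alterations, apply them on output" idea can be illustrated with a small sketch. It covers only the recording and writing side, not the tree comparison performed by MinimumSubst(). The class `EditableText` and its `Subst()` and `Write()` signatures are hypothetical stand-ins, not the Program interface.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for Program: the original text is never modified;
// replacements are remembered, keyed by their start offset.
class EditableText {
public:
    explicit EditableText(std::string text) : text_(std::move(text)) {}

    // Registers "replace `len` characters starting at `pos` with `with`".
    // Assumption of this sketch: registered edits do not overlap.
    void Subst(std::size_t pos, std::size_t len, std::string with) {
        edits_[pos] = Edit{len, std::move(with)};
    }

    // Builds the output: the original characters with all edits applied.
    std::string Write() const {
        std::string out;
        std::size_t cur = 0;
        for (const auto& kv : edits_) {
            out.append(text_, cur, kv.first - cur);  // untouched text before the edit
            out += kv.second.with;
            cur = kv.first + kv.second.len;          // skip the replaced characters
        }
        out.append(text_, cur, std::string::npos);   // untouched tail
        return out;
    }

private:
    struct Edit {
        std::size_t len;
        std::string with;
    };
    std::string text_;
    std::map<std::size_t, Edit> edits_;
};
```

Since the map is ordered by offset, edits can be registered in any order during translation and are still applied left to right when the text is written.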
The diagram below illustrates the relations between described classes.
![[translation UML diagram]](architecture-translation.png)
Bibliography
- [GoF] E. Gamma, R. Helm, R. Johnson, J. Vlissides, Design Patterns: Elements of Reusable Object-Oriented Software, Addison-Wesley, 1994.
Copyright
Documentation (C) Copyright by Grzegorz Jakacki, 2004. See file
COPYING for the full license text.