OpenC++ Architecture Overview

Grzegorz Jakacki <jakacki at acm dot org>

Intro

This is reverse-engineered developer documentation of the OpenC++ architecture. It assumes that you are somewhat familiar with OpenC++, or at least that you have read the other documentation.

This document is by no means complete. Please help fill in the gaps.

Data Flow

Originally OpenC++ was designed as a framework implemented as an executable (occ) having the following functionality:

  1. Dynamically load (dload) plugins, as requested on the command line.
  2. Preprocess the source files given on the command line.
  3. Parse them according to C++ rules plus OCC features (e.g. metaclass, but also, say, backquotes).
  4. Translate according to the rules given by the set of pre-linked and dloaded subclasses of Metaclass.
  5. Optionally preprocess again.
  6. If -S was specified:
    1. Compile into a dloadable module.
  7. If -S was not specified:
    1. Compile.
    2. Link with the libraries given on the command line, obtaining an executable.
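
The branching in the steps above can be summarized in a few lines of C++. This is only an illustrative sketch, not code from driver.cc: the names DriverOptions and PlanStages are hypothetical, and the only command-line switch taken from the text is -S.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical option set; only the -S switch comes from the text above.
struct DriverOptions {
    bool dashS = false;            // was -S given on the command line?
    bool preprocessAgain = false;  // step 5 is optional
};

// Returns the ordered list of stages that the numbered steps describe.
std::vector<std::string> PlanStages(const DriverOptions& opts) {
    std::vector<std::string> stages = {
        "dload plugins", "preprocess", "parse", "translate"};
    if (opts.preprocessAgain) {
        stages.push_back("preprocess again");
    }
    if (opts.dashS) {
        stages.push_back("compile into dloadable module");
    } else {
        stages.push_back("compile");
        stages.push_back("link");
    }
    // Four front-end stages plus at least one back-end stage.
    assert(stages.size() >= 5);
    return stages;
}
```

With default options the plan ends with "link"; with dashS set it ends with "compile into dloadable module" and contains no link stage.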

Apart from the occ executable, the distribution contains a library (libocc.{a,so,la}), which has exactly the same content as occ (in particular, it also defines main()).

Observe that the libraries in (7.2) do not include libocc.{a,so,la} unless you specify it explicitly on the command line.

As far as I understand, the library is provided for the static linking approach: instead of using occ to create plugins, you can write new derivatives of Metaclass (or, in practice, of Class), compile them with $CC and link them with libocc.a, obtaining an executable that performs exactly steps (1)-(7), with your derivatives being among the "pre-linked" classes that affect the translation in item (4).

If you want to take advantage of the facilities provided by the metaclasses pre-linked into occ (backquotes being one of them), you can use occ instead of $CC when taking the static linking road.

Most of the above functionality is implemented in driver.cc and driver2.cc. Recently (in version 2.8) it has been moved to a Bash script called occ2.

AST Implementation

Basic Structure

Syntax trees are encoded as binary trees. The binary tree is implemented using the base class Ptree and two derived classes, Leaf and NonLeaf. This is a classic Composite pattern ([GoF]) with the following mapping of participants' names:


GoF Book                 OpenC++
-----------------------  ----------------
Component                Ptree
Component::Operation()   Ptree::IsLeaf()
                         Ptree::What()
                         Ptree::IsA()
                         Ptree::Display()
Leaf                     Leaf
Composite                NonLeaf



[Ptree, Leaf and NonLeaf UML Class Diagram]
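
To make the Composite structure concrete, here is a minimal sketch with simplified stand-ins for Ptree, Leaf and NonLeaf. These are not the real OpenC++ declarations: the car/cdr child naming, the use of std::unique_ptr, the ToString() member (standing in for Display()) and the helpers Cons() and MakeAssignment() are all assumptions made for illustration.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Component: simplified stand-in for Ptree.
class Ptree {
public:
    virtual ~Ptree() {}
    virtual bool IsLeaf() const = 0;
    virtual std::string ToString() const = 0;  // hypothetical; stands in for Display()
};

// Leaf: carries a token and has no children.
class Leaf : public Ptree {
public:
    explicit Leaf(const std::string& text) : text_(text) {}
    bool IsLeaf() const override { return true; }
    std::string ToString() const override { return text_; }
private:
    std::string text_;
};

// Composite: exactly two children, which makes the tree binary.
// A null second child terminates a list (an assumption of this sketch).
class NonLeaf : public Ptree {
public:
    NonLeaf(std::unique_ptr<Ptree> car, std::unique_ptr<Ptree> cdr)
        : car_(std::move(car)), cdr_(std::move(cdr)) {
        assert(car_ != nullptr);
    }
    bool IsLeaf() const override { return false; }
    std::string ToString() const override {
        return "(" + car_->ToString() + " . " +
               (cdr_ ? cdr_->ToString() : std::string("nil")) + ")";
    }
private:
    std::unique_ptr<Ptree> car_;
    std::unique_ptr<Ptree> cdr_;
};

// Hypothetical helper: prepends a token leaf to an existing list.
std::unique_ptr<Ptree> Cons(const std::string& token, std::unique_ptr<Ptree> rest) {
    return std::unique_ptr<Ptree>(
        new NonLeaf(std::unique_ptr<Ptree>(new Leaf(token)), std::move(rest)));
}

// Builds the token list for "i = 0" as a chain of NonLeaf nodes.
std::unique_ptr<Ptree> MakeAssignment() {
    return Cons("i", Cons("=", Cons("0", nullptr)));
}
```

Calling MakeAssignment()->ToString() shows the nesting of the two-child nodes in dotted-pair notation. The exact shape the OpenC++ parser produces for a given construct may differ from this hand-built list.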

C++ parse trees are encoded as binary trees, e.g. "if (x) i = 0;" is represented as:

[AST, coarse]

Observe that node n1 is the root of an 'if-else' construct, while nodes n2 and n3 are not roots of any construct. The OpenC++ parser encodes this additional information in the type of the n1 object: n1 actually has type PtreeIfStatement (derived from NonLeaf).

The Ptree inheritance hierarchy contains many classes derived from Leaf (called LeafFoo) and NonLeaf (called PtreeFoo) that have no data members and act solely as a means of encoding more information about the nature of the node. The functionality of these classes is usually limited to overriding the What(), Translate() and Typeof() member functions.

[Ptree with fine-grained subclasses, UML]

Every tree can be viewed either as a composite of just Leaf and NonLeaf objects, or as a type-rich structure of miscellaneous PtreeFoo and LeafFoo nodes. The above tree in the type-rich view looks like this:
[Ptree, detailed]

Syntax node discrimination

In OpenC++, syntax trees are usually represented by a Ptree* pointer to the root node. In order to process the tree, one has to find out what the nature of this node is. This can be done on two levels: coarse-grained (discriminating between Leaf and NonLeaf nodes) and fine-grained (discriminating between different kinds of PtreeFoo and LeafFoo nodes).

Coarse-grained discrimination can be achieved by calling Ptree::IsLeaf() on the node.
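
As an illustration of coarse-grained discrimination, the following sketch counts the leaves of a tree using nothing but IsLeaf() and access to the two children. The classes are simplified stand-ins, and the child accessors Car() and Cdr() as well as the function CountLeaves() are assumptions of this sketch rather than a description of the actual Ptree interface.

```cpp
#include <cassert>
#include <cstddef>

// Simplified stand-in for Ptree; Car()/Cdr() are assumed child accessors.
class Ptree {
public:
    virtual ~Ptree() {}
    virtual bool IsLeaf() const = 0;
    virtual const Ptree* Car() const { return nullptr; }
    virtual const Ptree* Cdr() const { return nullptr; }
};

class Leaf : public Ptree {
public:
    bool IsLeaf() const override { return true; }
};

class NonLeaf : public Ptree {
public:
    NonLeaf(const Ptree* car, const Ptree* cdr) : car_(car), cdr_(cdr) {
        assert(car != nullptr);
    }
    bool IsLeaf() const override { return false; }
    const Ptree* Car() const override { return car_; }
    const Ptree* Cdr() const override { return cdr_; }
private:
    const Ptree* car_;  // not owned in this sketch
    const Ptree* cdr_;  // may be null (end of list)
};

// Recursively counts leaves; a null pointer is an empty subtree.
std::size_t CountLeaves(const Ptree* p) {
    if (p == nullptr) return 0;
    if (p->IsLeaf()) return 1;
    return CountLeaves(p->Car()) + CountLeaves(p->Cdr());
}
```

The function never needs to know which PtreeFoo or LeafFoo class a node belongs to, which is exactly what distinguishes the coarse-grained level from the fine-grained one.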

There are three ways to perform fine-grained discrimination (all examples below assume that p is of type Ptree*):

Dynamic Cast

The simplest way of determining the nature of a node is to use RTTI:

if (PtreeIfStatement* s = dynamic_cast<PtreeIfStatement*>(p)) {
    /* process "if-else" */
}
else if (PtreeWhileStatement* w = dynamic_cast<PtreeWhileStatement*>(p)) {
    /* process "while" */
}
else {
    /* process others */
}

Type queries

Member function Ptree::What() is guaranteed to return a unique identifier determining the type of a PtreeFoo or LeafFoo object:

switch (p->What()) {
    case ntIfStatement:
        /* process "if-else" */
        break;  /* without this the case falls through to the next one */
    case ntWhileStatement:
        /* process "while" */
        break;
    ...
}

A similar technique can be used with Ptree::IsA():

if (p->IsA(ntIfStatement)) {
    /* process "if-else" */
}
else if (p->IsA(ntWhileStatement)) {
    /* process "while" */
}
else { ... }

If n == p->What(), then p->IsA(n) is guaranteed to be true. The converse also holds at present, but it is not guaranteed to hold in future releases.
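
The relation between What() and IsA() can be illustrated with a small mock. Everything here is a sketch under assumptions: the numeric values of the node-kind constants are made up, the constant ntStatement and the broader IsA() override in PtreeWhileStatement are invented purely to show how the converse could stop holding, and IsIfStatement() is a hypothetical client helper.

```cpp
#include <cassert>

// Made-up values; ntStatement is an invented, broader category.
enum NodeKind { ntIfStatement = 1, ntWhileStatement = 2, ntStatement = 100 };

class Ptree {
public:
    virtual ~Ptree() {}
    virtual int What() const = 0;
    // Default behaviour: IsA() answers exactly the same question as What().
    virtual bool IsA(int kind) const { return What() == kind; }
};

class PtreeIfStatement : public Ptree {
public:
    int What() const override { return ntIfStatement; }
};

// Hypothetical future variation: IsA() also accepts a broader category,
// so p->IsA(n) no longer implies n == p->What().
class PtreeWhileStatement : public Ptree {
public:
    int What() const override { return ntWhileStatement; }
    bool IsA(int kind) const override {
        return kind == ntWhileStatement || kind == ntStatement;
    }
};

// Client helper that relies only on the guaranteed direction.
bool IsIfStatement(const Ptree* p) {
    assert(p != nullptr);
    return p->IsA(ntIfStatement);
}

// True when the converse (IsA(kind) implies What() == kind) holds for p and kind.
bool ConverseHolds(const Ptree& p, int kind) {
    return !p.IsA(kind) || p.What() == kind;
}
```

Client code written like IsIfStatement() keeps working under such a change, whereas code that compares What() against a constant would silently miss nodes that merely "are a" broader kind.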

Visitation

You can provide an implementation of the AbstractTranslatingWalker interface and pass it to Ptree::Translate(). Based on the actual nature of the node, the appropriate member function of your implementation will be called.

class MyWalker : public AbstractTranslatingWalker
{
public:
    void TranslateIf(Ptree* p) { /* process "if-else" */ }
    void TranslateWhile(Ptree* p) { /* process "while" */ }
    ...
};

You have to implement all TranslateFoo() member functions defined in AbstractTranslatingWalker. This can be tedious, especially when most implementations are intended to be empty. If this is the case, you can use the convenience implementation NoOpTranslatingWalker:

class MyWalker
  : virtual public AbstractTranslatingWalker
  , private NoOpTranslatingWalker
{
public:
    void TranslateIf(Ptree* p) { /* process "if-else" */ }
    void TranslateWhile(Ptree* p) { /* process "while" */ }
};

It is also common to want your walker to traverse the whole tree and perform specific actions only in certain nodes. In such a case the convenience implementation TraversingTranslatingWalker may prove handy:

class MyWalker
  : virtual public AbstractTranslatingWalker
  , private TraversingTranslatingWalker
{
public:
    void TranslateFuncall(Ptree* p) { /* process function call */ }
};

To actually visit a syntax tree, use:

MyWalker walker;
p->Translate(walker);

This scheme is known as the Visitor design pattern ([GoF]). The names of the participants map onto implementation names as follows:

GoF Book                             OpenC++
-----------------------------------  -----------------------------------------
Visitor                              AbstractTranslatingWalker
Visitor::VisitConcreteElementFoo()   AbstractTranslatingWalker::TranslateFoo()
ConcreteVisitor                      MyWalker
                                     NoOpTranslatingWalker
                                     TraversingTranslatingWalker
Element                              Ptree
Element::Accept()                    Ptree::Translate()
ConcreteElementFoo                   PtreeFoo
                                     LeafFoo

The Ptree hierarchy also supports visitation by AbstractTypingWalker. Using this visitation scheme in client code is discouraged.
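
The visitation by a translating walker described in this section can be tried out in isolation with the following self-contained mock. It is a sketch only: the walker interface is reduced to three member functions, the node classes are empty stand-ins (PtreeFuncallExpr is an assumed name), and IfCounter and CountIfs() are hypothetical. It does show the double dispatch and the combination of virtual public inheritance from the interface with private inheritance from the no-op implementation.

```cpp
#include <cassert>

class Ptree;

// Visitor interface, reduced to three node kinds for this sketch.
class AbstractTranslatingWalker {
public:
    virtual ~AbstractTranslatingWalker() {}
    virtual void TranslateIf(Ptree* p) = 0;
    virtual void TranslateWhile(Ptree* p) = 0;
    virtual void TranslateFuncall(Ptree* p) = 0;
};

// Convenience implementation: every member function does nothing.
class NoOpTranslatingWalker : virtual public AbstractTranslatingWalker {
public:
    void TranslateIf(Ptree*) override {}
    void TranslateWhile(Ptree*) override {}
    void TranslateFuncall(Ptree*) override {}
};

// Element: Translate() plays the role of Accept().
class Ptree {
public:
    virtual ~Ptree() {}
    virtual void Translate(AbstractTranslatingWalker& w) = 0;
};

class PtreeIfStatement : public Ptree {
public:
    void Translate(AbstractTranslatingWalker& w) override { w.TranslateIf(this); }
};

class PtreeWhileStatement : public Ptree {
public:
    void Translate(AbstractTranslatingWalker& w) override { w.TranslateWhile(this); }
};

class PtreeFuncallExpr : public Ptree {
public:
    void Translate(AbstractTranslatingWalker& w) override { w.TranslateFuncall(this); }
};

// Concrete visitor: overrides one member function, inherits no-ops for the rest.
class IfCounter
  : virtual public AbstractTranslatingWalker
  , private NoOpTranslatingWalker
{
public:
    void TranslateIf(Ptree*) override { ++ifs_; }
    int ifs() const { return ifs_; }
private:
    int ifs_ = 0;
};

// Visits each node once and reports how many were "if" statements.
int CountIfs(Ptree* const* nodes, int count) {
    IfCounter walker;
    for (int i = 0; i < count; ++i) {
        assert(nodes[i] != nullptr);
        nodes[i]->Translate(walker);
    }
    return walker.ifs();
}
```

Because AbstractTranslatingWalker is a virtual base, IfCounter contains a single interface subobject: its own TranslateIf() and the no-op versions of the other member functions together fill in the whole interface, and the public path to the base keeps IfCounter usable wherever an AbstractTranslatingWalker& is expected.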

Annotations

The Ptree interface contains GetEncodedType() and GetEncodedName(). These member functions are used by the OpenC++ backend to attach information to certain syntax nodes. These functions are likely to change, and clients of Ptree are discouraged from using them.

Notes


Translation


The source code of one translation unit is represented by an in-memory text buffer implemented by class Program. There are several classes derived from Program, e.g. ProgramFile (can construct itself from a file) or ProgramString (can construct itself from a C string). For historical reasons Program is not an abstract class.

The concrete class Lex implements a lexer. Its interface provides access to consecutive tokens. Each Lex object contains a pointer to a Program object and uses it to obtain consecutive characters to be lexed. The lexer also obtains char* pointers from the Program and uses them as iterators into the Program's memory buffer; these pointers are used throughout the program (in particular, they are planted in certain AST nodes to maintain the correspondence between the AST and the flat text).
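
The idea of tokens that point back into the source buffer can be sketched as follows. This is not the Lex or Program interface: TextBuffer, Token and SimpleLexer are hypothetical names, and the tokenization rules (identifier-like words and single-character punctuation) are deliberately naive.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-in for Program: owns the flat text.
class TextBuffer {
public:
    explicit TextBuffer(const std::string& text) : text_(text) {}
    const char* Begin() const { return text_.data(); }
    const char* End() const { return text_.data() + text_.size(); }
private:
    std::string text_;
};

// A token does not copy its text; it points into the buffer.
struct Token {
    const char* ptr;
    std::size_t len;
};

// Hypothetical stand-in for Lex: keeps a pointer to the buffer it reads from.
class SimpleLexer {
public:
    explicit SimpleLexer(const TextBuffer* program)
        : program_(program), pos_(nullptr) {
        assert(program_ != nullptr);
        pos_ = program_->Begin();
    }

    // Stores the next token in tok; returns false at end of input.
    bool Next(Token& tok) {
        const char* end = program_->End();
        while (pos_ != end && std::isspace(static_cast<unsigned char>(*pos_))) ++pos_;
        if (pos_ == end) return false;
        const char* start = pos_;
        if (IsWordChar(*pos_)) {
            while (pos_ != end && IsWordChar(*pos_)) ++pos_;
        } else {
            ++pos_;  // single-character punctuation token
        }
        tok.ptr = start;
        tok.len = static_cast<std::size_t>(pos_ - start);
        return true;
    }

private:
    static bool IsWordChar(char c) {
        return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
    }
    const TextBuffer* program_;
    const char* pos_;
};
```

Since every Token carries a pointer into the buffer, the offset of a token in the original text is simply tok.ptr - buffer.Begin(). This is the kind of correspondence between tree nodes and flat text that the paragraph above refers to.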

The concrete class Parser implements a parser. Each Parser object contains a pointer to a Lex object and uses it to obtain consecutive tokens.

The concrete class ClassWalker implements a type elaborator and source-to-source translator. ClassWalker leverages the Visitor pattern to traverse the supplied AST. ClassWalker maintains a tree of Environment objects that represent scopes. Occasionally ClassWalker methods employ ClassBodyWalker for traversals of certain subtrees. A call to ClassWalker::Translate() returns a new AST. The original and returned trees are passed to Program::MinimumSubst(), which figures out and remembers the minimum set of alterations of the original flat text of the translation unit necessary to obtain the text corresponding to the translated tree. Moreover, a ClassWalker object maintains a set of code fragments that should be inserted at the beginning and at the end of the source code as a result of translation.

The driver of the OpenC++ compiler creates Program, Lex, Parser and ClassWalker objects. Within the main loop, the driver calls a method of Parser to obtain one top-level definition or declaration, passes the obtained AST to ClassWalker::Translate(), and then passes the original AST along with the translated AST to Program::MinimumSubst(), as described above. Each loop iteration processes one definition or declaration (although such a construct may be arbitrarily large, e.g. a namespace or a class with nested classes).

After the main loop completes, Program::Write() is called to output the original source character stream with the modifications registered in the process of source-to-source translation.
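
The "remember the alterations, then write the original text with them applied" scheme can be sketched as below. This is an assumption-laden illustration, not the Program implementation: EditableText, Substitute() and the string-returning Write() are hypothetical, and the sketch takes the alterations as explicit arguments instead of computing a minimum set from two trees as Program::MinimumSubst() is described to do.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for Program: original text plus recorded alterations.
class EditableText {
public:
    explicit EditableText(const std::string& text) : text_(text) {}

    // Remembers that original characters [pos, pos + len) are to be
    // replaced by `with`. The original text itself is never modified.
    void Substitute(std::size_t pos, std::size_t len, const std::string& with) {
        assert(pos + len <= text_.size());
        edits_[pos] = Edit{len, with};
    }

    // Produces the original character stream with all recorded
    // alterations applied, in order of position.
    std::string Write() const {
        std::string out;
        std::size_t cursor = 0;
        for (const auto& entry : edits_) {  // std::map iterates in ascending key order
            const std::size_t pos = entry.first;
            assert(pos >= cursor);  // recorded edits must not overlap
            out.append(text_, cursor, pos - cursor);
            out += entry.second.with;
            cursor = pos + entry.second.len;
        }
        out.append(text_, cursor, std::string::npos);
        return out;
    }

private:
    struct Edit {
        std::size_t len;
        std::string with;
    };
    std::string text_;
    std::map<std::size_t, Edit> edits_;
};
```

Keeping the original buffer intact means that pointers into it stay valid for the whole run, and everything that translation did not touch (formatting, comments) comes out exactly as it was written.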

The diagram below illustrates the relations between the classes described above.

[translation UML diagram]


Bibliography

[GoF] E. Gamma, R. Helm, R. Johnson, J. Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley, 1995.


Copyright


Documentation (C) Copyright by Grzegorz Jakacki, 2004. See file COPYING for the full license text.