Semantic Analysis/The Tiny language: semantic analysis example and C generation

From Wiki**3

< Semantic Analysis

Considere a seguinte gramárica (ε representa a produção nula), onde os operadores tWRITE (não associativo), = (associativo à direita) e + (associativo à esquerda) têm precedências crescentes.

O primeiro objectivo é escrever, utilizando as classes disponibilizadas na CDK (subclasses de cdk::basic_node), uma especificação YACC para a gramática dada, contendo as acções semânticas que permitem construir a árvore sintáctica.

O segundo objectivo é escreves os métodos de o tradutor para C (padrão de desenho Visitor) a árvore sintáctica anteriormente obtida. Esta tradução implica que os símbolos encontrados sejam recordados, para verificar futuras utilizações. Para tal, a classe compact::symbol (que representa cada símbolo; reutiliza-se aqui o símbolo do compilador Compact) e a classe cdk::symbol_table serão utilizadas. <text> prog → decls exprs '.' decls → ε | decls decl ';' decl → tINT tID | tSTR tID init init → ε | '=' tSTRING exprs → expr | exprs ',' expr expr → tINTEGER | tID | tID '=' expr | expr '+' expr | tWRITE expr </text>

Tabela de Símbolos

A interface pública da tabela de símbolos é a seguinte (foram omitidas todas as partes não públicas, assim como os métodos de construção/destruição):

<cpp> namespace cdk {

 template<typename Symbol>
 class symbol_table {
 public:
   /** Create a new context and make it current. */
   void push();
   /**
    * Destroy the current context: the previous context becomes
    * the current one. If the first context is reached no operation
    * is performed.
    */
   void pop();
   /**
    * Define a new identifier in the local (current) context.
    *
    * @param name the symbol's name.
    * @param symbol the symbol.
    * @return
    *   true if new identifier (may be defined in an upper
    *   context); false if identifier already exists in the
    *   current context.
    */
   bool insert(const std::string &name, std::shared_ptr<Symbol> symbol);
   /**
    * Replace the data corresponding to a symbol in the current context.
    *
    * @param name the symbol's name.
    * @param symbol the symbol.
    * @return
    *   true if the symbol exists; false if the
    *   symbol does not exist in any of the contexts.
    */
   bool replace_local(const std::string &name, std::shared_ptr<Symbol> symbol);
   /**
    * Replace the data corresponding to a symbol (look for the symbol in all
    * available contexts, starting with the innermost one).
    *
    * @param name the symbol's name.
    * @param symbol the symbol.
    * @return
    *   true if the symbol exists; false if the
    *   symbol does not exist in any of the contexts.
    */
   bool replace(const std::string &name, std::shared_ptr<Symbol> symbol);
   /**
    * Search for a symbol in the local (current) context.
    *
    * @param name the symbol's name.
    * @param symbol the symbol.
    * @return
    *   true if the symbol exists; false if the
    *   symbol does not exist in the current context.
    */
   std::shared_ptr<Symbol> find_local(const std::string &name);
   /**
    * Search for a symbol in the avaible contexts, starting with the first
    * one and proceeding until reaching the outermost context.
    *
    * @param name the symbol's name.
    * @param from how many contexts up from the current one (zero).
    * @return
    *    nullptr if the symbol cannot be found in any of the
    *    contexts; or the symbol and corresponding attributes.
    */
   std::shared_ptr<Symbol> find(const std::string &name, size_t from = 0) const;

}; </cpp>

Gramática e Criação de Nós da Árvore Sintáctica Abstracta

In the answer to this exercise, we will consider the node tree defined as shown below (in a YACC-like syntax). Note that careful attention must be given to the choices between cdk::node::Nil nodes and NULL pointers (these can be tested for performing special/exceptional actions, while Nil nodes are more useful when tests are undesirable and uniform behavior on the part of the code generator is desired).

We follow the same nomenclature used in the Compact compiler: LINE is a macro corresponding to the source line and all nodes are either CDK nodes or derived from them (as in Compact). We assume that the class SimpleSymbol, similar to CompactSymbol is defined.

Completing (and, possibly, correcting) the following code, so that it runs, is left as an exercise. <text> %{

  1. include <string>
  2. include <cdk/ast/basic_node.h>
  3. include <cdk/ast/expression_node.h>

%} %union {

 int i;
 std::string *s;
 cdk::basic_node *n;
 cdk::expression_node *e;

} %token tINT tSTR tWRITE %token tINTEGER %token tID tSTRING %type init %type<e> expr %type<n> prog decls exprs decl %% prog: decls exprs '.' { $$ = new program_node(LINE, $1, $2); }

decls: /* empty */ { $$ = new cdk::nil_node(LINE); }

    | decls decl ';' { $$ = new cdk::sequence_node(LINE, $2, $1); }
    ;

decl: tINT tID { $$ = new declaration_node(LINE, tINT, *$2, nullptr); delete $2; }

   | tSTR tID init { $$ = new declaration_node(LINE, tSTR, *$2, $3);      delete $2; }
   ;

init: /* empty */ { $$ = nullptr; /* must match the last argument in declaration_node */ }

   | '=' tSTRING  { $$ = $2; }
   ;

exprs: expr { $$ = new cdk::sequence_node(LINE, $1); }

    | exprs ',' expr   { $$ = new cdk::sequence_node(LINE, $3, $1); }
    ;

expr: tINTEGER { $$ = new cdk::integer_node(LINE, $1); }

   | tID                { $$ = new cdk::identifier_node(LINE, *$1); delete $1; }
   | tID '=' expr       { $$ = new assignment_node(LINE, *$1, $3); delete $1; }
   | expr '+' expr      { $$ = new cdk::add_node(LINE, $1, $3); }
   | tWRITE expr        { $$ = new write_node(LINE, $2); }
   ;

</text>

In this code we assume that each declaration node contains two attributes: an integer, representing the declaration's type; a node, representing the identifier being declared; and an optional expression (nullptr if that is the case), representing the initial value of the variable being declared. Likewise, assignment nodes have two children: an identifier, corresponding to the left-value; and the expression to be assigned. Write nodes have a single child (the expression to be printed) (note that, in this simple grammar, write nodes are also expressions). The program node (the main node) has two children: a node containing declarations and another containing expressions.

The other nodes are as described in the CDK.

Code Generation and Type Validation

The following code presents just the bare minimum for defining the code generation visitors (the omitted parts are as in Compact): <c> void Cwriter::processProgramNode(ProgramNode *const node, int lvl) {

 node->declarations()->accept(this, lvl+2);
 node->expressions()->accept(this, lvl+2);

}

void Cwriter::processSequence(cdk::node::Sequence *const node, int lvl) {

 for (size_t i = 0; i < node->size(); i++) {
   node->node(i)->accept(this, lvl);
 }

}

void Cwriter::processDeclarationNode(DeclarationNode *const node, int lvl) {

 const char *id = ((cdk::node::expression::Identifier*)node->identifier())->value().c_str();
 if (_symtab.insert(id, new SimpleSymbol(node->type(), id, 0))) {
   os() << std::string(lvl+2, ' ');
   if (node->type() == INT) os() << "int " << id;
   else os() << "char *" << id;
 }
 else {
   std::cerr << "FATAL: " << node->lineno() << ": " << s << " redeclared" << std::endl;
   return;
 }
 try {
   TypeValidator v(os(), _symtab);
   node->accept(&v, lvl);
 }
 catch (std::string s) {
   std::cerr << "FATAL: " << node->lineno() << ": " << s << std::endl;
   return;
 }
 if (node->initializer()) {
   os() << " = \"" << *node->initializer() << "\"";
 }
 os() << ";\n";

}

void Cwriter::processInteger(cdk::node::expression::Integer *const node, int lvl) {

 os() << node->value();

}

void Cwriter::processIdentifier(cdk::node::expression::Identifier *const node, int lvl) {

 try {
   TypeValidator v(os(), _symtab);
   node->accept(&v, lvl);
 }
 catch (std::string s) {
   std::cerr << "FATAL: " << node->lineno() << ": " << s << std::endl;
   return;
 }
 os() << id;

}

void Cwriter::processAssignmentNode(AssignmentNode *const node, int lvl) {

 try {
   TypeValidator v(os(), _symtab);
   node->accept(&v, lvl);
 }
 catch (std::string s) {
   std::cerr << "FATAL: " << node->lineno() << ": " << s << std::endl;
   return;
 }
 os() << std::string(lvl+2, ' ');
 node->lvalue()->accept(this, lvl);
 os() << " = ";
 node->rvalue()->accept(this, lvl);
 os() << ";\n";

}

void Cwriter::processADD(cdk::node::expression::ADD *const node, int lvl) {

 try {
   TypeValidator v(os(), _symtab);
   node->accept(&v, lvl);
 }
 catch (std::string s) {
   std::cerr << "FATAL: " << node->lineno() << ": " << s << std::endl;
   return;
 }
 os() << std::string(lvl+2, ' ');
 node->left()->accept(this, lvl);
 os() << " + ";
 node->right()->accept(this, lvl);
 os() << ";\n";

}

void Cwriter::processWriteNode(WriteNode *const node, int lvl) {

 try {
   TypeValidator v(os(), _symtab);
   node->accept(&v, lvl);
 }
 catch (std::string s) {
   std::cerr << "FATAL: " << node->lineno() << ": " << s << std::endl;
   return;
 }
 if (node->argument()->type().name() != ExpressionType::TYPE_INTEGER)
   os() << std::string(lvl+2, ' ') << "printf(\"%d\\n\", " << node->value() << ");\n";
 else
   os() << std::string(lvl+2, ' ') << "printf(\"%s\\n\", \"" << node->value() << "\");\n";

} </c>

The type validation visitor class is as follows (only relevant code shown: the rest is similar to Compact's):

<c> void TypeValidator::processDeclarationNode(DeclarationNode *const node, int lvl) {

 try {
   node->identifier()->accept(this, lvl+2);
   if (node->initializer()) {
     node->initializer()->accept(this, lvl+2);
     if (node->identifier()->type().name() != node->initializer()->type().name())
       throw std::string("wrong type for initializer");
   }
   node->type(node->identifier()->type());
 }
 catch (std::string s) { throw s; }

}

void TypeValidator::processInteger(cdk::node::expression::Integer *const node, int lvl) {

 node->type(ExpressionType(4, ExpressionType::TYPE_INT));

}

void TypeValidator::processIdentifier(cdk::node::expression::Identifier *const node, int lvl) {

 const char *id = node->value().c_str();
 SimpleSymbol *symbol = _symtab.find(id);
 if (!symbol) throw std::string(s) + " undeclared";
 // hackish stuff...
 node->type(ExpressionType(4, symbol->type() == INT ? ExpressionType::TYPE_INT : ExpressionType::TYPE_STRING));

}

void TypeValidator::processAssignmentNode(AssignmentNode *const node, int lvl) {

 try {
   node->lvalue()->accept(this, lvl+4);
   node->rvalue()->accept(this, lvl+4);
   if (node->lvalue()->type().name() != node->rvalue()->type().name())
     throw std::string("wrong types in assignment");
   node->type(node->lvalue()->type());
 }
 catch (std::string s) { throw s; }

}

void TypeValidator::processADD(cdk::node::expression::ADD *const node, int lvl) {

 node->left()->accept(this, lvl+2);
 if (node->left()->type().name() != ExpressionType::TYPE_INT)
   throw std::string("integer expression expected in add operator (left)");
 node->right()->accept(this, lvl+2);
 if (node->right()->type().name() != ExpressionType::TYPE_INT)
   throw std::string("integer expression expected in add operator (right)");
 node->type(ExpressionType(4, ExpressionType::TYPE_INT));

}

void TypeValidator::processWriteNode(WriteNode *const node, int lvl) {

 node->argument()->accept(this, lvl+2);
 if (node->argument()->type().name() != ExpressionType::TYPE_INT ||
     node->argument()->type().name() != ExpressionType::TYPE_STRING)
   throw std::string("wrong type in write expression");
 node->type(ExpressionType(4, ExpressionType::TYPE_INT));

} </c>