PUMA Library Reference Manual
#include <Puma/Syntax.h>
Syntactic analysis base class.
Implements the top-down parsing algorithm (recursive descent parser). To be derived to implement parsers for specific grammars. Provides infinite look-ahead.
This class uses a tree builder object (see Builder) to create the syntax tree, and a semantic analysis object (see Semantic) to perform required semantic analyses of the parsed code.
The parse process is started by calling Syntax::run() with a token provider as argument. Using the token provider, this method reads the first core language token from the input source code and tries to parse it by applying the top grammar rule.
The top grammar rule has to be provided by reimplementing method Syntax::trans_unit(). It may call sub-rules according to the implemented language-specific grammar. Example:
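The following is only a minimal sketch of that pattern, not code from the PUMA library. It uses a hypothetical stand-in class, MiniSyntax, in place of the real Puma::Syntax; the token codes and the decl() sub-rule are likewise assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical token codes; the real parser works on PUMA token types.
enum Tok { TOK_INT, TOK_ID, TOK_SEMI, TOK_EOF };

// Stand-in for a syntax base class: owns the token stream and a position.
class MiniSyntax {
public:
  virtual ~MiniSyntax() {}

  // Start parsing: reset the position and apply the top grammar rule.
  bool run(const std::vector<Tok> &tokens) {
    toks_ = tokens;
    pos_ = 0;
    return trans_unit() && look_ahead() == TOK_EOF;
  }

protected:
  // Type of the next token, or TOK_EOF at the end of the input.
  Tok look_ahead() const {
    assert(pos_ <= toks_.size());
    return pos_ < toks_.size() ? toks_[pos_] : TOK_EOF;
  }

  // Consume the next token if it has the given type.
  bool parse(Tok t) {
    if (look_ahead() != t) return false;
    ++pos_;
    return true;
  }

  // Top grammar rule, to be reimplemented by the concrete grammar.
  virtual bool trans_unit() = 0;

private:
  std::vector<Tok> toks_;
  std::size_t pos_ = 0;
};

// Grammar: trans_unit := decl+ ; decl := TOK_INT TOK_ID TOK_SEMI
class DeclSyntax : public MiniSyntax {
protected:
  // Sub-rule called from the top rule.
  bool decl() { return parse(TOK_INT) && parse(TOK_ID) && parse(TOK_SEMI); }

  bool trans_unit() override {
    if (!decl()) return false;       // at least one declaration
    while (look_ahead() == TOK_INT)  // further declarations
      if (!decl()) return false;
    return true;
  }
};
```

In this sketch the derived class supplies only grammar rules, while the base class keeps the token position, which mirrors the division of work described above.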
For context-sensitive grammars, the grammar rules may need to perform initial semantic analyses of the parsed code (to differentiate ambiguous syntactic constructs, resolve names, detect errors, and so on). Example:
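As a rough illustration of why a rule may need semantic information, consider a C-like statement of the form `a * b;`, which is a pointer declaration if `a` names a type and a multiplication otherwise. The sketch below is not PUMA code: MiniSemantic, is_type_name(), and classify() are hypothetical names standing in for a query a rule might make to the semantic analysis object.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical semantic object: records which names denote types.
struct MiniSemantic {
  std::set<std::string> types;
  bool is_type_name(const std::string &id) const {
    return types.count(id) != 0;
  }
};

enum Kind { DECLARATION, EXPRESSION };

// Decide how to parse "<first_id> * <other>;" by asking the semantic
// object whether the leading identifier is a known type name.
Kind classify(const MiniSemantic &sem, const std::string &first_id) {
  assert(!first_id.empty());
  return sem.is_type_name(first_id) ? DECLARATION : EXPRESSION;
}
```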
If a rule is parsed successfully, the tree builder is used to create a CTree-based syntax tree (fragment) for the parsed rule. Failing grammar rules shall return NULL. The result of the top grammar rule is the root node of the abstract syntax tree for the whole input source code.
Classes | |
class | State |
Parser state, the current position in the token stream. More... | |
Public Types | |
typedef std::bitset< TOK_NO > | tokenset |
Public Member Functions | |
pointcut | parse_fct () |
Interface for aspects that affect the syntax and parsing process. | |
pointcut | check_fct () |
pointcut | in_syntax () |
pointcut | rule_exec () |
pointcut | rule_call () |
pointcut | rule_check () |
CTree * | run (TokenProvider &tp) |
Start the parse process. | |
template<class T> | |
CTree * | run (TokenProvider &tp, bool(T::*rule)()) |
Start the parse process at a specific grammar rule. | |
virtual void | configure (Config &c) |
Configure the syntactic analysis object. | |
TokenProvider * | provider () const |
Get the token provider from which the parsed tokens are read. | |
Token * | problem () const |
Get the last token that could not be parsed. | |
bool | error () const |
Check if errors occurred during the parse process. | |
bool | look_ahead (int token_type, unsigned n=1) |
Look-ahead n core language tokens and check if the n-th token has the given type. | |
bool | look_ahead (int *token_types, unsigned n=1) |
Look-ahead n core language tokens and check if the n-th token has one of the given types. | |
int | look_ahead (unsigned n=1) |
Look-ahead one core language token. | |
bool | consume () |
Consume all tokens until the next core language token. | |
bool | predict_1 (const tokenset &ts) |
template<class T> | |
bool | parse (CTree *(T::*rule)()) |
Parse the given grammar rule. | |
template<class T> | |
bool | seq (CTree *(T::*rule)()) |
Parse a sequence of the given grammar rule. | |
template<class T> | |
bool | seq (bool(T::*rule)()) |
Parse a sequence of the given grammar rule. | |
template<class T> | |
bool | list (CTree *(T::*rule)(), int separator, bool trailing_separator=false) |
Parse a sequence of rule-separator pairs. | |
template<class T> | |
bool | list (CTree *(T::*rule)(), int *separators, bool trailing_separator=false) |
Parse a sequence of rule-separator pairs. | |
template<class T> | |
bool | list (bool(T::*rule)(), int separator, bool trailing_separator=false) |
Parse a sequence of rule-separator pairs. | |
template<class T> | |
bool | list (bool(T::*rule)(), int *separators, bool trailing_separator=false) |
Parse a sequence of rule-separator pairs. | |
template<class T> | |
bool | catch_error (bool(T::*rule)(), const char *msg, int *finish_tokens, int *skip_tokens) |
Parse a grammar rule automatically catching parse errors. | |
bool | parse (int token_type) |
Parse a token with the given type. | |
bool | parse (int *token_types) |
Parse a token with one of the given types. | |
bool | parse_token (int token_type) |
Parse a token with the given type. | |
bool | opt (bool dummy) const |
Optional rule parsing. | |
Builder & | builder () const |
Get the syntax tree builder. | |
Semantic & | semantic () const |
Get the semantic analysis object. | |
virtual bool | trans_unit () |
Top parse rule to be reimplemented for a specific grammar. | |
virtual void | handle_directive () |
Handle a compiler directive token. | |
State | save_state () |
Save the current parser state. | |
void | forget_state () |
Forget the saved parser state. | |
void | restore_state () |
Restore the saved parser state. | |
void | restore_state (State state) |
Restore the saved parser state to the given state. | |
void | set_state (State state) |
Overwrite the parser state with the given state. | |
bool | accept (CTree *tree, State state) |
Accept the given syntax tree node. | |
CTree * | accept (CTree *tree) |
Accept the given syntax tree node. | |
Token * | locate_token () |
Skip all non-core language tokens until the next core-language token is read. | |
void | skip () |
Skip the current token. | |
void | skip_block (int start, int end, bool inclusive=true) |
Skip all tokens between start and end, including start and end token. | |
void | skip_curly_block () |
Skip all tokens between '{' and '}', including '{' and '}'. | |
void | skip_round_block () |
Skip all tokens between '(' and ')', including '(' and ')'. | |
bool | parse_block (int start, int end) |
Parse all tokens between start and end, including start and end token. | |
bool | parse_curly_block () |
Parse all tokens between '{' and '}', including '{' and '}'. | |
bool | parse_round_block () |
Parse all tokens between '(' and ')', including '(' and ')'. | |
bool | skip (int stop_token, bool inclusive=true) |
Skip all tokens until a token with the given type is read. | |
bool | skip (int *stop_tokens, bool inclusive=true) |
Skip all tokens until a token with one of the given types is read. | |
bool | is_in (int token_type, int *token_types) const |
Check if the given token type is in the set of given token types. | |
Static Public Member Functions | |
template<typename SYNTAX, typename RULE> | |
static bool | seq (SYNTAX &s) |
Parse a sequence of the given grammar rule by calling RULE::check() in a loop. | |
template<typename SYNTAX, typename RULE> | |
static bool | list (SYNTAX &s, int sep, bool trailing_sep=false) |
Parse a sequence of rule-separator pairs by calling RULE::check() in a loop. | |
template<typename SYNTAX, typename RULE> | |
static bool | list (SYNTAX &s, int *separators, bool trailing_sep=false) |
Parse a sequence of rule-separator pairs by calling RULE::check() in a loop. | |
template<class SYNTAX, class RULE> | |
static bool | catch_error (SYNTAX &s, const char *msg, int *finish_tokens, int *skip_tokens) |
Parse a grammar rule automatically catching parse errors. | |
template<class RULE1, class RULE2, class SYNTAX> | |
static bool | ambiguous (SYNTAX &s) |
First parse RULE1; if that rule fails, discard all errors and parse RULE2. | |
Public Attributes | |
TokenProvider * | token_provider |
Token provider for getting the tokens to parse. | |
Protected Member Functions | |
Syntax (Builder &b, Semantic &s) | |
Constructor. | |
virtual | ~Syntax () |
Destructor. | |
typedef std::bitset<TOK_NO> Puma::Syntax::tokenset |
Constructor.
b | The syntax tree builder. |
s | The semantic analysis object. |
|
inline protected virtual |
Destructor.
Accept the given syntax tree node.
Returns the given node.
tree | Tree to accept. |
Accept the given syntax tree node.
If the node is NULL then the parser state is restored to the given state. Otherwise all saved states are discarded.
tree | Tree to accept. |
state | The saved state. |
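The save/accept pairing described here can be sketched as follows. This is a self-contained illustration, not the PUMA implementation: MiniParser, Node, and the paren_x() rule are hypothetical, and the parser state is reduced to a plain token index.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Node { std::string text; };  // stand-in for a syntax tree node

class MiniParser {
public:
  typedef std::size_t State;  // stand-in for a parser state: a token index

  explicit MiniParser(const std::vector<std::string> &t) : toks_(t), pos_(0) {}

  State save_state() const { return pos_; }
  State position() const { return pos_; }

  // A non-null node means success; a null node rewinds to the saved state.
  bool accept(Node *tree, State state) {
    if (tree) return true;
    assert(state <= pos_);  // a rewind only ever moves backwards
    pos_ = state;
    return false;
  }

  // Rule: "(" "x" ")". Tokens are consumed eagerly, so a failure half way
  // through has to be undone by accept().
  bool paren_x(Node &out) {
    State s = save_state();
    Node *result = 0;
    if (next("(") && next("x") && next(")")) {
      out.text = "(x)";
      result = &out;
    }
    return accept(result, s);
  }

private:
  bool next(const std::string &t) {
    if (pos_ < toks_.size() && toks_[pos_] == t) { ++pos_; return true; }
    return false;
  }

  std::vector<std::string> toks_;
  std::size_t pos_;
};
```

With the input `( x y`, the rule consumes two tokens before failing, and the null result makes accept() put the position back to where the rule started.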
|
inline static |
First parse RULE1; if that rule fails, discard all errors and parse RULE2.
RULE1 | The class that represents the first grammar rule |
RULE2 | The class that represents the second grammar rule |
SYNTAX | The type of syntax |
s | The syntax object on which the rules should be executed |
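One plausible reading of this behaviour is sketched below. It is not the library's implementation: ToySyntax, its pos and errors members, and the Decl and Expr rule classes are hypothetical, and only the convention of a static RULE::check() taking the syntax object is borrowed from the descriptions on this page.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical syntax object: token stream, position, collected errors.
struct ToySyntax {
  std::vector<std::string> tokens;
  std::size_t pos = 0;
  std::vector<std::string> errors;

  bool match(const std::string &t) {
    if (pos < tokens.size() && tokens[pos] == t) { ++pos; return true; }
    errors.push_back("expected " + t);
    return false;
  }
};

struct Decl {  // rule: "int" "x"
  static bool check(ToySyntax &s) { return s.match("int") && s.match("x"); }
};

struct Expr {  // rule: "x"
  static bool check(ToySyntax &s) { return s.match("x"); }
};

// Try RULE1; on failure rewind, drop the errors it reported, try RULE2.
template <class RULE1, class RULE2, class SYNTAX>
bool ambiguous(SYNTAX &s) {
  const std::size_t saved_pos = s.pos;
  const std::size_t saved_errors = s.errors.size();
  if (RULE1::check(s)) return true;
  assert(saved_errors <= s.errors.size());  // errors only accumulate
  s.pos = saved_pos;
  s.errors.resize(saved_errors);
  return RULE2::check(s);
}
```

A call such as `ambiguous<Decl, Expr>(s)` deduces SYNTAX from the argument, which is presumably why SYNTAX is the last template parameter in the signature above.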
|
inline |
Get the syntax tree builder.
bool Puma::Syntax::catch_error | ( | bool(T::* | rule )(), |
const char * | msg, | ||
int * | finish_tokens, | ||
int * | skip_tokens ) |
Parse a grammar rule automatically catching parse errors.
rule | The rule to parse. |
msg | The error message to show if the rule fails. |
finish_tokens | Set of token types that abort parsing the rule. |
skip_tokens | If the rule fails skip all tokens until a token is read that has one of the types given here. |
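The recovery half of this idea (report the message, then skip to a synchronisation token) might look roughly like the sketch below. Several things here are assumptions rather than documented PUMA behaviour: the Recovering type, zero-terminated token sets, skipping the synchronisation token itself, and returning false after recovery. The finish_tokens parameter is left out because its exact semantics are not described here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical parser object with a token stream and a message log.
struct Recovering {
  std::vector<int> toks;
  std::size_t pos = 0;
  std::vector<std::string> messages;

  // Membership test on a zero-terminated set (assumed convention).
  static bool is_in(int t, const int *set) {
    for (; *set; ++set)
      if (*set == t) return true;
    return false;
  }

  // Run the rule; on failure log msg and skip up to and including the
  // first token whose type is listed in skip_tokens.
  bool catch_error(bool (*rule)(Recovering &), const char *msg,
                   const int *skip_tokens) {
    assert(rule && msg && skip_tokens);
    if (rule(*this)) return true;
    messages.push_back(msg);
    while (pos < toks.size() && !is_in(toks[pos], skip_tokens)) ++pos;
    if (pos < toks.size()) ++pos;  // step over the synchronisation token
    return false;
  }
};

// Hypothetical rule: succeeds only on a token of type 7.
bool expect_seven(Recovering &r) {
  if (r.pos < r.toks.size() && r.toks[r.pos] == 7) { ++r.pos; return true; }
  return false;
}
```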
|
static |
Parse a grammar rule automatically catching parse errors.
SYNTAX | The type of syntax |
RULE | The class that represents the grammar rule |
s | The syntax object on which the rule should be executed |
msg | The error message to show if the rule fails. |
finish_tokens | Set of token types that abort parsing the rule. |
skip_tokens | If the rule fails skip all tokens until a token is read that has one of the types given here. |
pointcut Puma::Syntax::check_fct | ( | ) |
|
virtual |
Configure the syntactic analysis object.
c | The configuration object. |
Reimplemented in Puma::CCSyntax, Puma::CSyntax, and Puma::InstantiationSyntax.
|
inline |
Consume all tokens until the next core language token.
|
inline |
Check if errors occurred during the parse process.
void Puma::Syntax::forget_state | ( | ) |
Forget the saved parser state.
|
inline virtual |
Handle a compiler directive token.
The default handling is to skip the compiler directive.
Reimplemented in Puma::CSyntax.
pointcut Puma::Syntax::in_syntax | ( | ) |
bool Puma::Syntax::is_in | ( | int | token_type, |
int * | token_types ) const |
Check if the given token type is in the set of given token types.
token_type | The token type to check. |
token_types | The set of token types. |
|
inline |
Parse a sequence of rule-separator pairs.
rule | The rule to parse at least once. |
separators | The separator tokens. |
trailing_separator | True if a trailing separator token is allowed. |
|
inline |
Parse a sequence of rule-separator pairs.
rule | The rule to parse at least once. |
separator | The separator token. |
trailing_separator | True if a trailing separator token is allowed. |
|
inline |
Parse a sequence of rule-separator pairs.
rule | The rule to parse at least once. |
separators | The separator tokens. |
trailing_separator | True if a trailing separator token is allowed. |
|
inline |
Parse a sequence of rule-separator pairs.
rule | The rule to parse at least once. |
separator | The separator token. |
trailing_separator | True if a trailing separator token is allowed. |
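The shape that the list() overloads accept, `rule (separator rule)*` with an optional trailing separator, can be sketched on a plain token vector. This is an illustration only: parse_list() and the token codes are hypothetical, a single token stands in for the rule, and leaving a disallowed trailing separator unconsumed is an assumption about the behaviour, not something stated on this page.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical token codes.
enum { T_ITEM = 1, T_COMMA = 2 };

// Parse item (separator item)* [separator], advancing pos past what was
// consumed. Fails only if there is not at least one item.
bool parse_list(const std::vector<int> &toks, std::size_t &pos, int item,
                int separator, bool trailing_separator = false) {
  assert(item != separator);
  if (pos >= toks.size() || toks[pos] != item) return false;
  ++pos;  // the rule has to match at least once
  while (pos < toks.size() && toks[pos] == separator) {
    const bool item_follows = pos + 1 < toks.size() && toks[pos + 1] == item;
    if (item_follows) {
      pos += 2;  // consume the separator and the next item
      continue;
    }
    if (trailing_separator) ++pos;  // accept the dangling separator
    break;                          // otherwise leave it unconsumed
  }
  return true;
}
```

For the tokens `item , item ,` the call stops before the last comma by default and after it when trailing_separator is true.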
|
inline static |
Parse a sequence of rule-separator pairs by calling RULE::check() in a loop.
s | The syntax object on which the rule should be executed |
SYNTAX | The type of syntax |
RULE | The class that represents the grammar rule |
separators | The separator tokens |
trailing_sep | True if a trailing separator token is allowed. |
|
inline static |
Parse a sequence of rule-separator pairs by calling RULE::check() in a loop.
s | The syntax object on which the rule should be executed |
SYNTAX | The type of syntax |
RULE | The class that represents the grammar rule |
sep | The separator token |
trailing_sep | True if a trailing separator token is allowed. |
Token * Puma::Syntax::locate_token | ( | ) |
Skip all non-core language tokens until the next core-language token is read.
bool Puma::Syntax::look_ahead | ( | int * | token_types, |
unsigned | n = 1 ) |
Look-ahead n core language tokens and check if the n-th token has one of the given types.
token_types | The possible types of the n-th token. |
n | The number of tokens to look-ahead. |
bool Puma::Syntax::look_ahead | ( | int | token_type, |
unsigned | n = 1 ) |
Look-ahead n core language tokens and check if the n-th token has the given type.
token_type | The type of the n-th token. |
n | The number of tokens to look-ahead. |
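A look-ahead that counts only core language tokens can be sketched as below. The names are hypothetical (MiniToken, nth_core_type), and the sketch assumes that non-core tokens such as white space or comments are simply not counted and that the current position is left untouched; neither detail is spelled out on this page.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical token: core == false marks white space, comments, etc.
struct MiniToken {
  int type;
  bool core;
};

// Type of the n-th core token at or after pos, or -1 if there is none.
int nth_core_type(const std::vector<MiniToken> &toks, std::size_t pos,
                  unsigned n = 1) {
  assert(pos <= toks.size());
  if (n == 0) return -1;
  for (; pos < toks.size(); ++pos)
    if (toks[pos].core && --n == 0) return toks[pos].type;
  return -1;
}

// Check the type of the n-th core token without consuming anything.
bool look_ahead(const std::vector<MiniToken> &toks, std::size_t pos,
                int token_type, unsigned n = 1) {
  return nth_core_type(toks, pos, n) == token_type;
}
```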
|
inline |
Look-ahead one core language token.
n | The number of tokens to look-ahead. |
|
inline |
Optional rule parsing.
Always succeeds regardless of the argument.
dummy | Dummy parameter, is not evaluated. |
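Because the argument is evaluated before the call but its value is ignored, a function like this lets an optional sub-rule sit inside a chain of `&&` operators. The sketch below shows that idiom on a hypothetical NumberParser; it is not PUMA code, and only opt() itself mirrors the behaviour described above.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical character-level parser, used only to show the idiom.
struct NumberParser {
  std::string in;
  std::size_t pos = 0;

  bool match(char c) {
    if (pos < in.size() && in[pos] == c) { ++pos; return true; }
    return false;
  }

  bool digit() {
    if (pos < in.size() && std::isdigit(static_cast<unsigned char>(in[pos]))) {
      ++pos;
      return true;
    }
    return false;
  }

  // Always succeeds; the argument has already been evaluated by the caller.
  bool opt(bool) const { return true; }

  // number := '-'? digit
  bool number() {
    const std::size_t start = pos;
    const bool ok = opt(match('-')) && digit();
    assert(!ok || pos > start);  // success consumes at least one character
    return ok;
  }
};
```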
|
inline |
Parse the given grammar rule.
Saves the current state of the builder, semantic, and token provider objects.
rule | The rule to parse. |
bool Puma::Syntax::parse | ( | int * | token_types | ) |
Parse a token with one of the given types.
token_types | The token types. |
|
inline |
Parse a token with the given type.
token_type | The token type. |
bool Puma::Syntax::parse_block | ( | int | start, |
int | end ) |
Parse all tokens between start and end, including start and end token.
start | The start token type. |
end | The end token type. |
bool Puma::Syntax::parse_curly_block | ( | ) |
Parse all tokens between '{' and '}', including '{' and '}'.
pointcut Puma::Syntax::parse_fct | ( | ) |
Interface for aspects that affect the syntax and parsing process.
bool Puma::Syntax::parse_round_block | ( | ) |
Parse all tokens between '(' and ')', including '(' and ')'.
bool Puma::Syntax::parse_token | ( | int | token_type | ) |
Parse a token with the given type.
token_type | The token type. |
|
inline |
|
inline |
Get the last token that could not be parsed.
|
inline |
Get the token provider from which the parsed tokens are read.
void Puma::Syntax::restore_state | ( | ) |
Restore the saved parser state.
Triggers restoring the syntax and semantic trees to the saved state.
void Puma::Syntax::restore_state | ( | State | state | ) |
Restore the saved parser state to the given state.
Triggers restoring the syntax and semantic trees.
state | The state to which to restore. |
pointcut Puma::Syntax::rule_call | ( | ) |
pointcut Puma::Syntax::rule_check | ( | ) |
pointcut Puma::Syntax::rule_exec | ( | ) |
CTree * Puma::Syntax::run | ( | TokenProvider & | tp | ) |
Start the parse process.
tp | The token provider from where to get the tokens to parse. |
CTree * Puma::Syntax::run | ( | TokenProvider & | tp, |
bool(T::* | rule )() ) |
Start the parse process at a specific grammar rule.
tp | The token provider from where to get the tokens to parse. |
rule | The grammar rule where to start. |
State Puma::Syntax::save_state | ( | ) |
Save the current parser state.
Calls save_state() on the builder, semantic, and token provider objects.
|
inline |
Get the semantic analysis object.
|
inline |
Parse a sequence of the given grammar rule.
rule | The rule to parse at least once. |
|
inline |
Parse a sequence of the given grammar rule.
rule | The rule to parse at least once. |
|
static |
Parse a sequence of the given grammar rule by calling RULE::check() in a loop.
s | The syntax object on which the rule should be executed |
SYNTAX | The type of syntax |
RULE | The class that represents the grammar rule |
void Puma::Syntax::set_state | ( | State | state | ) |
Overwrite the parser state with the given state.
state | The new parser state. |
void Puma::Syntax::skip | ( | ) |
Skip the current token.
bool Puma::Syntax::skip | ( | int * | stop_tokens, |
bool | inclusive = true ) |
Skip all tokens until a token with one of the given types is read.
stop_tokens | The token types at which to stop. |
inclusive | If true, the stop token is skipped too. |
bool Puma::Syntax::skip | ( | int | stop_token, |
bool | inclusive = true ) |
Skip all tokens until a token with the given type is read.
stop_token | The token type at which to stop. |
inclusive | If true, the stop token is skipped too. |
void Puma::Syntax::skip_block | ( | int | start, |
int | end, | ||
bool | inclusive = true ) |
Skip all tokens between start and end, including start and end token.
start | The start token type. |
end | The end token type. |
inclusive | If true, the stop token is skipped too. |
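Skipping a delimited block is commonly done with a depth counter so that nested blocks are passed over as a whole. The sketch below shows that technique on a string of single-character tokens. It is an assumption that Puma::Syntax::skip_block() handles nesting this way, and the free function, its return value, and the treatment of unbalanced input are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Skip a balanced block that begins at pos and return the new position.
// With inclusive == false the closing token is left unconsumed.
std::size_t skip_block(const std::string &toks, std::size_t pos, char start,
                       char end, bool inclusive = true) {
  assert(start != end);
  if (pos >= toks.size() || toks[pos] != start) return pos;  // not at a block
  int depth = 0;
  for (; pos < toks.size(); ++pos) {
    if (toks[pos] == start) {
      ++depth;  // entering a (possibly nested) block
    } else if (toks[pos] == end && --depth == 0) {
      return inclusive ? pos + 1 : pos;  // matching end of the outer block
    }
  }
  return pos;  // unbalanced input: ran off the end
}
```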
void Puma::Syntax::skip_curly_block | ( | ) |
Skip all tokens between '{' and '}', including '{' and '}'.
void Puma::Syntax::skip_round_block | ( | ) |
Skip all tokens between '(' and ')', including '(' and ')'.
|
inline virtual |
Top parse rule to be reimplemented for a specific grammar.
Reimplemented in Puma::CSyntax.
TokenProvider* Puma::Syntax::token_provider |
Token provider for getting the tokens to parse.