This is the Reference Guide for writing GSP specification files. Examples are provided alongside the explanations.
A GSP file is a plain text file which should have the ".gsp" file extension.
Its content divides into two parts :
The symbol definitions. The type of symbols that can be defined are the following:
Symbols which are defined outside the gsp file can be imported into the current gsp file with an appropriate instruction.
A GSP file can be empty. In practice, however, it is rare that no symbol needs to be defined, because at least one token is always needed. Most of the time, some symbols also need to be defined for integration with the final application in the target programming language.
The following types can be defined in the scope of the symbol definitions:
Any symbol definition, i.e. token, non-terminal or option, offers the following common features:
Importing a gsp file includes into the current gsp file all the pre-processors, non-terminals and tokens defined in that file. This keeps gsp specifications modular, reusable, shareable and readable.
A GSP file is imported using the keyword import and the name of that file without the " .gsp " file extension.
'import' word ';'
Example:
import int;
There are numerous other examples.
A token is declared with the keyword token, a name followed by some options, and finally the token implementation code.
'token' word [ SymbolDef ] [ Include ] [ Alias ] [ 'declare' '{{' Block '}}' ] [ 'implement' '{{' Block '}}' ] ['parse'] '{{' Block '}}' [ [ 'expand' ] '{{' Block '}}' ] [ STestCode ]
The token is parsed according to the C++ parse code defined for it.
Inheritances can be defined for tokens.
Aliases can be defined for tokens.
Self-tests can be defined for tokens.
C++ includes can be defined for tokens.
A Token is written with C/C++ code enclosed by {{ and }}.
The principle for parsing a token is to read chars one by one from the input as long as a valid char sequence is found for the token.
The input from which to read the input chars is accessible via the following implicit and pre-declared reference :
istream& is
If the read chars are not valid for the token, the parsing can be aborted at any moment by calling the following method:
abort() ;
IMPORTANT: Once a token has been successfully parsed, the input stream pointer shall be re-positioned on the first not-accepted char.
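The parsing principle described above can be pictured with a plain C++ function written outside the generator. This is only a sketch under assumptions: parseIdentifier, isIdChar and isIdentifier are hypothetical names, a false return value stands in for calling abort(), and the stream parameter plays the role of the implicit is reference. The sketch only looks at the next char with peek() before consuming it, so the first not-accepted char is left in the stream.

```cpp
#include <cassert>
#include <cctype>
#include <istream>
#include <sstream>
#include <string>

// True when c may appear in an identifier (hypothetical token definition:
// a letter or '_' first, then letters, digits or '_').
bool isIdChar(int c, bool first)
{
    if (c == std::char_traits<char>::eof()) return false;
    return c == '_' || (first ? std::isalpha(c) : std::isalnum(c)) != 0;
}

// Stand-alone counterpart of a token parse block.
// Returning false stands in for calling abort() in real GSP parse code.
bool parseIdentifier(std::istream& is, std::string& value)
{
    value.clear();
    if (!isIdChar(is.peek(), true))
        return false;                       // invalid start: nothing consumed
    while (isIdChar(is.peek(), false))      // peek first, consume only if valid
        value.push_back(static_cast<char>(is.get()));
    assert(!value.empty());                 // at least the first char was read
    return true;
}

// Convenience check: true when the whole text is exactly one identifier.
bool isIdentifier(const std::string& text)
{
    std::istringstream in(text);
    std::string value;
    return parseIdentifier(in, value)
        && in.peek() == std::char_traits<char>::eof();
}
```

After a successful call, the next read on the stream returns the first char that does not belong to the identifier, which is the behaviour required by the IMPORTANT note above.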
Additional C++ code can be embedded within the token definition using the notations given below. For an example, check the standard block token.
declare {{ ... }} implement {{ ... }}
If the parser should be reversible, the user should also provide the C++ code that reconstructs and outputs the token value (expand code).
The expand code is written with C/C++ code enclosed by {{ and }}.
The expand code is executed when one of the following methods is called on the parsed symbol object (generated for a sample token named IdToken):
bool expand() String expand()
The output chars can be written into the following implicit and pre-declared reference :
ostream& os
Example :
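As an illustration only, the expand step can be pictured with the stand-alone C++ sketch below. IdTokenSketch, expandTo and expandToString are hypothetical names chosen for this sketch and are not the code that LLOOP generates; the stream parameter plays the role of the implicit os reference.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for a generated token class; not LLOOP's actual code.
struct IdTokenSketch {
    std::string value;  // text captured by the parse code

    // Plays the role of the expand block: writes the token value back to os.
    bool expandTo(std::ostream& os) const
    {
        os << value;
        return static_cast<bool>(os);  // false if the stream is in a failed state
    }

    // Convenience wrapper returning the expanded text as a string.
    std::string expandToString() const
    {
        std::ostringstream os;
        expandTo(os);
        return os.str();
    }
};
```

For a reversible parser, expanding a token that was parsed from some text is expected to write that same text back.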
An explicit definition of a non-terminal is useful whenever the user wants the non-terminal to inherit from an application class, which is likely to be the most common case.
A symbol is declared with the keyword symbol and a name.
('symbol'|'string') word [ SymbolDef ] [ImplementsOption] [ Include ] [ Alias ] [ STestCode ] ';'
It is not mandatory to provide a definition for a non-terminal prior to giving its syntax rules. A non-terminal is implicitly defined if a syntax rule is provided for it.
For convenience, it is possible to declare a non-terminal as a string. In that case, the generated symbol class inherits from the standard String class:
string Name;
is equivalent to
symbol Name extends class universal::String() include "universal__String";
The non-terminal is parsed according to the syntax rule(s) which must be defined for it.
Inheritances can be defined for non-terminals.
Aliases can be defined for non-terminals.
Self-tests can be defined for non-terminals.
C++ includes can be defined for non-terminals.
Check the examples.
LLOOP fully automates the management of command line options without any programming. This is achieved simply by providing a specification for the set of options to recognise, i.e. at least an identifying keyword and, if any, the type of expected value for each option.
It is also possible to define subsets of options within a set of options. This structures all options hierarchically and helps manage the complexity of highly configurable applications in an organised manner.
A non-terminal symbol can be declared as implementing a command line option inside the non-terminal definition, using the keywords implements option followed by the specific data defining the option, enclosed in parentheses ( ) :
A non-terminal symbol can be declared as implementing an option set inside the non-terminal definition, using the keywords implements optionset followed by the specific data defining the option set, enclosed in parentheses ( ):
An option set shall be specified in an external (not imported) gsp specification file. In this case, the symbol is referred to as an external symbol, i.e. with the name of the gsp file without extension, followed by a dot and the symbol name.
From the specification of each option and option subset, the following is generated:
ImplementsOption ::= 'implements' 'option' '(' quote ',' quote ',' quote ',' quote { ',' quote } { ',' word } ')'
The following respective specific data shall be specified within quotes for options :
| Option Data | Meaning |
| Label | The name describing what the option is about |
| Synopsis | A line documenting the usage of the option |
| Documentation | The comprehensive documentation of the option |
| Switch(s) | The list of keywords which can be used as switch for the option |
| Value(s) | The list of expected symbols (token, non-terminal) to be read after the switch. There can be none or several. Even though several values can be specified for an option, there is generally only one value associated with it |
ImplementsOption ::= ...
| 'implements' 'optionset' '(' quote ',' quote ',' symbolpath ')'
symbolpath ::= word { '.' word }
The following respective specific data shall be specified within quotes for option sets :
| Option Data | Meaning |
| Label | A name describing what the option set is about |
| Usage Preamble | Preamble to the synopsis of all options of the option set |
| Parse Symbol | The expected symbol (generally an external non-terminal) used to read the options of the option set |
| C++ Parse Code | Reads and checks the passed options and their values. This is of course the base functionality provided by the generator. |
| C++ Formatted Usage Function | Prints out, for each option, the synopsis and the label on the standard output. |
| C++ Formatted Help Function | Prints out, for each option, the synopsis and the comprehensive documentation on the standard output. |
| TCL/TK Entry Forms | Allows entry of options through graphical interface. This feature is detailed in the next paragraphs. |
No syntax rule has to be provided for symbols defined as command line options: the generator builds implicitly the required syntax rules from the specified option data.
The following table summarises the possible rules which are implicitly defined for options and option sets depending on their specification. Note that there are as many alternative rules implicitly defined as there are different possible switches specified for an option.
| Option or Option Set | Rule |
| Option with no value | option ::= switch |
| Option with value and with switch starting with -- | option ::= switch '=' value |
| Option with value and with switch NOT starting with -- | option ::= switch value |
| Option set | option set ::= symbol |
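The three option rules of the table can be pictured with the following C++ sketch. It is not the code produced by the generator: OptionMatch and matchOption are hypothetical names, the switch -p used in the usage note is invented for illustration, and the sketch assumes that each command line argument arrives as one string, with the value of a non "--" switch in the next argument.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Result of trying one option at a given position in the argument list.
struct OptionMatch {
    bool matched = false;
    std::string value;         // stays empty for options without value
    std::size_t consumed = 0;  // number of arguments used by the match
};

// Hypothetical matcher following the three implicit option rules.
OptionMatch matchOption(const std::vector<std::string>& args, std::size_t pos,
                        const std::string& sw, bool hasValue)
{
    OptionMatch m;
    if (pos >= args.size()) return m;
    const std::string& arg = args[pos];

    if (!hasValue) {                           // option ::= switch
        if (arg == sw) { m.matched = true; m.consumed = 1; }
    } else if (sw.compare(0, 2, "--") == 0) {  // option ::= switch '=' value
        const std::string prefix = sw + "=";
        if (arg.compare(0, prefix.size(), prefix) == 0) {
            m.matched = true;
            m.value = arg.substr(prefix.size());
            m.consumed = 1;
        }
    } else {                                   // option ::= switch value
        if (arg == sw && pos + 1 < args.size()) {
            m.matched = true;
            m.value = args[pos + 1];
            m.consumed = 2;
        }
    }
    return m;
}
```

With the arguments --make-exe, --platform=linux, -p and win32, the first argument matches a switch without value, the second yields the value linux for the switch --platform, and the last two together yield the value win32 for the switch -p.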
The definition of a root syntax rule is still required for specifying which alternative options should be parsed and how.
A root syntax rule for parsing command line options would typically look like:
Arguments ::= { Argument } {{ }}
Argument ::= ( option1 | option2 | option3 ...)
A root syntax rule for parsing sub-options would typically look like:
SubArgument ::= ( sub-option1 | sub-option2 | sub-option3 ...)
This advanced functionality is used extensively by the generator itself (gspc.exe). It can therefore be illustrated by excerpts of the matching specification and of the resulting generated code and outputs. Please note that "[...]" stands for information removed to save space and improve readability:
Options Specification
[...]
symbol MakeExeOption implements option(
"Executable",
"--make-exe",
"Tells whether to generate a makefile that builds an executable",
"--make-exe") ;
symbol MakeLibOption implements option(
"Library",
"--make-lib",
"Tells whether to generate a makefile that builds a library",
"--make-lib") ;
symbol PlatformOption implements option(
"Platform",
"--platform=<PLATFORM>",
"Gives the name [...]",
PlatformName) ;
[...]
symbol AdvancedMakefileOptionSet implements optionset(
"Advanded Makefile Option Set",
"Advanded Makefile Option Set",
gspc_arguments_make.ArgumentMAK) ;
[...]
There is a TCL/TK form generated for each option set.
There is a separate page describing the generated TCL/TK entry forms, how to use and how to integrate them into your applications.
LLOOP makes it possible to create Graphical Front End Applications (GFE) for controlling command line programs, without any programming, based on the generated forms and a configurable template GFE application.
The gsp spec files used to implement the gspc generator provide a complete example for this functionality.
A preprocessor is declared with the keyword preprocessor, a name followed by some options, and finally the preprocessor implementation code.
'preprocessor' word [ 'extends' 'class' [ Namespace ] word '(' Block ')' ] [ Include ] [ 'declare' '{{' Block '}}' ] [ 'implement' '{{' Block '}}' ] '{{' Block '}}'
It is mandatory to define pre-processors prior to any token and non-terminal definition.
A pre-processor is written with C/C++ code enclosed by {{ and }}.
The principle of pre-processing consists of filtering the chars of the input one by one. In this process, each char is either kept, discarded or replaced. The user is in charge of setting up all the variables and functions required to carry out the pre-processing.
The pre-processing code is executed for each char of the input. The current read char is stored in the following implicit and pre-declared variable :
char c
The current char can be discarded by calling the following method. If it is not called, the char is assumed to be accepted.
discard() ;
IMPORTANT: To preserve the integrity of error line numbers reported during parsing, newline chars shall never be discarded from the input. The reason is that parsing is actually done on the pre-processed input while errors are reported in the original input.
The base parser object is accessible via the following implicit and pre-declared reference :
LLParser& parser
Additional C++ code can be embedded within the pre-processor definition using the notations given below. For an example, check the standard define pre-processor.
declare {{ ... }} implement {{ ... }}
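The char-by-char filtering principle can be pictured with the stand-alone C++ sketch below. It is an illustration under assumptions, not generated code: CommentFilter and preprocess are hypothetical names, the '#' line comment syntax is invented for the example, and a false return value from keep() stands in for calling discard(). The sketch never drops newline chars, in line with the IMPORTANT note above.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical per-char filter: the chars of a '#' line comment are
// discarded, newline chars are always kept.
class CommentFilter {
public:
    // Returns true if c is kept, false if it is discarded
    // (the latter stands in for calling discard()).
    bool keep(char c)
    {
        if (c == '\n') { inComment_ = false; return true; }  // never drop newlines
        if (c == '#') inComment_ = true;
        return !inComment_;
    }
private:
    bool inComment_ = false;  // state carried from one char to the next
};

// Runs the filter over a whole input, one char at a time.
std::string preprocess(const std::string& input)
{
    CommentFilter filter;
    std::string output;
    for (char c : input)
        if (filter.keep(c)) output.push_back(c);
    // Line numbers stay valid only if every newline survived.
    assert(std::count(output.begin(), output.end(), '\n')
           == std::count(input.begin(), input.end(), '\n'));
    return output;
}
```

Because the number of newline chars is unchanged, a line number computed on the filtered text designates the same line in the original text.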
Writing post-processors within GSP files is currently not supported. Post-processors have to be written and handled manually directly in C++.
The Beginner's Guide and Tutorial provides further information about writing post-processors.
It is possible to specify how many times a symbol should occur at minimum and at maximum. The generated parsers check whether these limits are respected and, if not, end with an error.
Count limits are defined separately from the symbol definition itself, with an additional statement as follows:
'symbol' word SymbolOccurrence ';'
SymbolOccurrence ::= 'occurs' SymbolOccurrenceAmount
| 'occurs' [ 'at' 'least' SymbolOccurrenceAmount] [ 'at' 'most' SymbolOccurrenceAmount ]
SymbolOccurrenceAmount ::= uint
| 'once'
| 'twice'
| 'thrice'
At the end of the parsing, the parser takes stock of all instances created for a given symbol. If the number of instances found for that symbol is not within the defined limits, an error is raised.
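The check performed at the end of the parsing can be pictured with the small C++ sketch below. OccurrenceLimits and withinLimits are hypothetical names; how the generated parser actually stores the limits and reports the error is not described here, so the sketch only shows the comparison itself.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical representation of "occurs at least N at most M".
// A missing bound defaults to "no constraint".
struct OccurrenceLimits {
    std::size_t atLeast = 0;
    std::size_t atMost = std::numeric_limits<std::size_t>::max();
};

// True when the number of instances created for a symbol respects the limits.
bool withinLimits(std::size_t instanceCount, const OccurrenceLimits& limits)
{
    assert(limits.atLeast <= limits.atMost);  // the limits must be consistent
    return instanceCount >= limits.atLeast && instanceCount <= limits.atMost;
}
```

A limit such as "occurs once" corresponds to setting both bounds to 1, so that zero or two instances are rejected.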
The gsp spec files used to implement the gspc generator provide a concrete example where this functionality is used to specify how many times a command line option can be defined.
A non-terminal, token or pre-processor can inherit from one of the following type patterns:
There can be multiple inheritances, except for pre-processors.
Inheritance is always implicitly public.
The constructor arguments must be specified within ( ) for constructing the base object when the symbol is created. The user is free to provide any string for the constructor arguments as long as the matching generated code compiles.
If the base class is defined in a C++ namespace, the complete namespace path must be specified.
The base class can also be a template class.
Inheritances are specified within the definition of a non-terminal, token or pre-processor, using the keyword extends followed for each inheritance by: the name of the inherited type pattern, the fully qualified type name (i.e. with C++ namespace if required), and finally the constructor arguments or template type name.
When a symbol inherits from a class, the generator implicitly looks for a header file with the same name (without extension) as the class name where to find the class definition. If the actual header file name differs from the class name, it must be explicitly included. Example :
symbol WordList extends class Stack(50) ;
is strictly equivalent to:
symbol WordList extends class Stack(50) include "Stack.h";
Here the non-terminal named "WordList" extends base class "Stack" and takes "50" as constructor argument. Concretely, it means that a class "WordList" will be generated and inherit from class "Stack".
When a symbol inherits from another, the generator automatically resolves the dependencies related to the generated code of the inherited symbol in the target programming language. Therefore, there is no include to specify nor any namespace or class name.
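For the WordList declaration discussed above, the C++ sketch below gives a rough idea of what inheriting from an application class means. The Stack class shown here is a hypothetical stand-in (the real one is whatever "Stack.h" declares), and the WordList class is only an assumed shape, not the code actually generated by LLOOP.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical application class standing in for the one in "Stack.h".
class Stack {
public:
    explicit Stack(std::size_t capacity) : capacity_(capacity) {}
    bool push(const std::string& word)
    {
        if (items_.size() >= capacity_) return false;  // stack is full
        items_.push_back(word);
        return true;
    }
    std::size_t size() const { return items_.size(); }
    std::size_t capacity() const { return capacity_; }
private:
    std::size_t capacity_;
    std::vector<std::string> items_;
};

// Assumed shape of a class generated for
//   symbol WordList extends class Stack(50) ;
// Inheritance is public and "50" is passed as is to the base constructor.
class WordList : public Stack {
public:
    WordList() : Stack(50) {}
};
```

Since the constructor arguments are copied as given, any text that compiles as a base class initialiser can be used in the specification.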
Note that an alias can not be used in inheritances. The primary symbol name must be used.
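Example:
The image token defined in the standard GSP library inherits from symbol ExistingFileToken:
symbol ImageFileToken
extends symbol ExistingFileToken()
alias "image", "imagefile", "existingimage"
...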
A set of unit tests can be specified for a symbol. Such tests are useful for validation and non-regression testing.
The unit tests are part of the definition of the symbol itself and allow the symbol to test itself.
A test is declared at the end of a token definition or a non-terminal definition, with the keyword test and a list of unit test definitions enclosed by {{ }}:
STestCode ::= 'test' [ 'no-preprocessing' nil ] '{{' { STestCaseCode } '}}'
STestCaseCode ::= [quote] ':' ('pass'|'fail') STestCaseCodeString [ STestCaseCodeString !':' ]
STestCaseCodeString ::= quote
| '{' block '}'
| urlfile
The definition of a unit test consists of a quoted (short) description, followed by a colon, a keyword (pass or fail) indicating whether the test should pass or fail, the input to parse and optionally the expected output when expanding (reverse parsing).
An input and output can be either of the following:
A test input is pre-processed according to the pre-processors defined for the overall project where the symbol is defined or imported. It is possible to deactivate pre-processing by specifying the keyword no-preprocessing after the keyword test.
The generated Parser Class, used to carry out the parsing according to the input gsp spec file(s), defines the testSymbols() function, which makes all defined or imported symbols test themselves.
Unit tests for a particular symbol can be run individually by calling the test() function defined in the class generated for the symbol.
There are numerous examples of self-tests provided. The File URL token itself contains helpful examples.
A list of aliases can be defined for tokens and non-terminals. These names can be used in the syntax rules instead of the officially declared symbol names.
Defining an alias is useful or even required in these situations:
An alias is defined inside a symbol definition with the keyword alias and a list of names enclosed by quotes and separated from each other with commas.
If you attempt to define a symbol named "int", a "class int" will be generated for it. Unfortunately, the C++ compiler will complain that "int" is a reserved language keyword. You can work around this by defining a symbol name which is also a valid class name, e.g. SignedIntegerToken, and defining a list of aliases corresponding to the actual name(s) you would like to use in the syntax rules, here "int":
token SignedIntegerToken
alias "int"
{{
[ ... ]
}}
Check this link for the complete definition of the int token.
There are numerous other examples of alias in the standard GSP library.
For any symbol and preprocessor, it is possible to specify a list of C/C++ header files to be included in the generated symbol class header file.
When a symbol inherits from a class, the generator implicitly looks for a header file with the same base name as the class name where to find the class definition.
The header file must be explicitly included in the following cases:
File inclusions are useful for using types needed for either writing rule reduction code or implementing token parse code or pre/post-processor code.
A file inclusion is declared inside a symbol definition with the keyword include and a list of header file names enclosed by quotes and separated from each other with commas.
Check the example provided earlier on this page.
Existing definitions, like the Standard GSP Library, should be used as much as possible.
It is recommended to define pre/post-processors and tokens in separate specification files.
Writing pre/post-processors and tokens shall be considered an advanced use requiring additional know-how as well as a minimum knowledge of the standard C++ stream API.
A syntax rule is specified using the BNF notation. The principle of BNF is also well described in the tutorial.
It is possible to define constants which contain whitespaces, provided the following peculiarities:
A symbol which is defined in an external (not imported) gsp spec file can be referred to with the name of the gsp file without extension (which serves also as C++ namespace), followed by a dot and the symbol name. There is a simple academic example which demonstrates this functionality.
No syntax rule has to be provided for symbols defined as command line options: the generator implicitly builds the required syntax rules from the specified option data. This is detailed earlier on this page. For any other non-terminal, a missing rule will raise an error during generation.
The reduction code is written in C++ code enclosed by {{ }}. During parsing, the reduction code of a syntax rule is executed immediately after an input was successfully parsed according to this rule.
The parser object is accessible via the following implicit and pre-declared reference :
Parser& parser
The reduction is executed internally by the generated code through the following method (generated for a sample symbol named SIniFile):
reduce()
Trick: It is not mandatory to provide a reduction code for each alternative rule, except for the last one.
The expand code is written in C++ code enclosed by {{ }}. Providing an expand code is optional and only required if the parser should be reversible.
The expand code is executed when one of the following methods is called on the parsed symbol object (generated for a sample symbol named SIniFile):
bool expand()
String expand()
The C++ output stream where to write is accessible via the following implicit and pre-declared reference :
std::ostream& os
From the reduction code, you can access both the parsed constants and the objects corresponding to the parsed tokens and non-terminals, using the specific short notations detailed hereafter. The code into which these notations are translated ensures that accesses to objects are done in a safe way. An attempt to access an unparsed optional symbol object or a repetitive symbol with an invalid index will result in an error and an appropriate error message.
Getting the value of a parsed constant
You can get the value of a parsed constant using the char " # " followed by the index of the constant among all constants of the reduced syntax rule. Example:
#1
#2
Returns respectively the first and second constant in the syntax rule.
The returned value is of the following C type:
const char*
The benefit of this notation is that it refers to a constant within the reduction code independently of the actual constant value, thereby improving maintainability.
To get a C++ non-const object reference, use the char " $ " followed by the index of the token or non-terminal among all tokens and non-terminals in the reduced syntax rule. Example:
$1
$2
Returns respectively a reference to the first and second symbol objects in the syntax rule.
To get a C++ non-const object pointer, use the char " & " followed by the index of the token or non-terminal among all tokens and non-terminals in the reduced syntax rule. Example:
&1
&2
Returns respectively a pointer to the first and second symbol objects in the syntax rule.
Using pointers is not recommended, but is useful to check whether an optional symbol was actually parsed or not. If not parsed, the pointer is null.
If a non-terminal/token is located within a repetitive symbol sequence, an index within that sequence must additionally be provided, enclosed in parentheses. Example:
$1(49)
Returns the 50th instance of the symbol in the repetitive sequence.
When there are several nested repetitive sequences, an index must be specified for each of these sequences. For example, let's assume you have the following dummy syntax rule:
MyRule ::= { 'a' { 'b' MyRule } }
Here { 'a' ... } is the first repetitive sequence. This sequence includes another one which is { 'b' MyRule }. Examples:
$1(0, 0)
Returns the first MyRule instance in the first repetitive symbol sequence.
$1(9, 2)
Returns the third instance in the 10th occurrence of the first repetitive sequence.
$1(20, 0)
Returns the first instance in the 21st occurrence of the first repetitive sequence.
Getting the number of occurrences of a repetitive symbol
To get the number of occurrences of a symbol in a repetitive sequence, use the normal notation as described above, followed by (#). Example:
#1(#)
Returns the number of occurrences of the first repetitive constant.
$1(#)
Returns the number of occurrences of the first repetitive token/non-terminal.
In case of nested repetitive symbol sequences, an index for all intermediate repetitive sequences must be provided. Let's check the following examples based on the above-mentioned dummy rule MyRule:
#1(0, #)
Returns the number of occurrences of constant b within the first occurrence of the first repetitive sequence.
$1(1, #)
Returns the number of occurrences of MyRule within the second occurrence of the first repetitive sequence.
$1(20, #)
Returns the number of occurrences of MyRule within the 21st occurrence of the first repetitive sequence.
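The safe access behaviour of the indexed notations can be pictured with the C++ sketch below. It is not the code into which the notations are translated: Repetition, count, at and find are hypothetical names, and the null-returning find is only an assumption made in the spirit of the pointer notation. As with $1(49) designating the 50th instance, indexes are zero-based.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical holder for the instances of one symbol parsed in a
// repetitive sequence; Symbol is any symbol class.
template <typename Symbol>
class Repetition {
public:
    void add(const Symbol& s) { items_.push_back(s); }

    // In the spirit of $1(#): number of occurrences.
    std::size_t count() const { return items_.size(); }

    // In the spirit of $1(i): checked access by zero-based index.
    Symbol& at(std::size_t i)
    {
        if (i >= items_.size())
            throw std::out_of_range("invalid index for repetitive symbol");
        return items_[i];
    }

    // Pointer-style access (assumption): null when nothing was parsed at i.
    Symbol* find(std::size_t i)
    {
        return i < items_.size() ? &items_[i] : nullptr;
    }
private:
    std::vector<Symbol> items_;
};
```

Reference-style access reports an invalid index as an error, while pointer-style access lets the reduction code test for a missing instance before using it.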
Only integer numbers or simple strings (typically variable names) are accepted and can be used as indexes. The generator does not check whether symbol indexes following either " # ", " $ " or " & " are out-of-range.
Invalid indexes will appear as is in the generated code and will result in a compilation error.
Examples
Check the gspc_rc example, which is the gsp spec file used to parse these notations. Check also the other examples.
This file is part of the LLOOP Reversible Object-Oriented Parser Generator. Copyright (c) 2005-2006 Michel MEHL, France. All rights reserved. LLOOP is distributed by the company ERSA SaRL.
Pre-processors
'preprocessor' word [ 'extends' 'class' [ Namespace ] word '(' Block ')' ] [ Include ] [ 'declare' '{{' Block '}}' ] [ 'implement' '{{' Block '}}' ] '{{' Block '}}'
char c
discard() ;
LLParser& parser
declare {{ ... }} implement {{ ... }}
Post-processors
Symbol Count Limits
'symbol' word SymbolOccurrence ';'
SymbolOccurrence ::= 'occurs' SymbolOccurrenceAmount
| 'occurs' [ 'at' 'least' SymbolOccurrenceAmount] [ 'at' 'most' SymbolOccurrenceAmount ]
SymbolOccurrenceAmount ::= uint
| 'once'
| 'twice'
| 'thrice'
Symbol Inheritances
SymbolDef ::= 'extends' { SymbolDefExt [ Namespace ] word Ctor_Or_Template }
SymbolDefExt ::= 'class'
| 'token'
| 'symbol'
Ctor_Or_Template ::= '(' Block ')'
| '<' Block '>'
symbol WordList extends class Stack(50) ;
symbol WordList extends class Stack(50) include "Stack.h";
symbol ImageFileToken
extends symbol ExistingFileToken()
alias "image", "imagefile", "existingimage"
...
Symbol Self-tests
STestCode ::= 'test' [ 'no-preprocessing' nil ] '{{' { STestCaseCode } '}}'
STestCaseCode ::= [quote] ':' ('pass'|'fail') STestCaseCodeString [ STestCaseCodeString !':' ]
STestCaseCodeString ::= quote
| '{' block '}'
| urlfile
Symbol Aliases
Alias ::= 'alias' AliasList
AliasList ::= quote [ ',' AliasList ]
token SignedIntegerToken
alias "int"
{{
[ ... ]
}}
Including C++ Header Files
Include ::= 'include' IncludeList
IncludeList ::= quote [ ',' IncludeList ]
| '<' Block '>' [ ',' IncludeList ]
Recommendations
Syntax Rules Definitions
BNF Syntax Rules
Syntax Rule Reduction Code
Parser& parser
reduce()
reduce()
Syntax Rule Expand Code
bool expand()
String expand()
std::ostream& os
Getting parsed symbols values within reduction code
#1
#2
const char*
$1
$2
&1
&2
$1(49)
MyRule ::= { 'a' { 'b' MyRule } }
$1(0, 0)
$1(9, 2)
$1(20, 0)
#1(#)
Returns the number of occurrences of the first repetitive constant.
$1(#)
#1(0, #)
$1(1, #)
$1(20, #)
Copyright (c) 2005-2006 Michel MEHL, Haguenau, France
LLOOP version 1.1