Class CodeGenerator
- Direct Known Subclasses:
- CppCodeGenerator, CSharpCodeGenerator, DiagnosticCodeGenerator, DocBookCodeGenerator, HTMLCodeGenerator, JavaCodeGenerator, PythonCodeGenerator
A CodeGenerator knows about a Grammar data structure and a grammar analyzer. The Grammar is walked to generate the appropriate code for both a parser and a lexer (if present). This interface may change slightly so that the lexer itself lives inside a Grammar object (in which case, this class generates only one recognizer). The main method to call is gen(), which initiates all code generation.
The interaction of the code generator with the analyzer is simple: each subrule block calls deterministic() before generating code for the block. Method deterministic() sets lookahead caches in each Alternative object. Technically, a code generator doesn't need the grammar analyzer if all lookahead analysis is done at runtime, but this would result in a slower parser.
This class provides a set of support utilities to handle argument list parsing and so on.
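The order of calls described above (ask the analyzer about a block, then emit code from the cached lookahead) can be pictured with a toy sketch. Everything in it is hypothetical: ToyBlock, ToyAnalyzer and ToyGenerator are stand-ins invented for illustration, not ANTLR classes, and the one-character "lookahead" is a deliberate simplification.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a subrule block; each alternative is just a string. */
class ToyBlock {
    final List<String> alternatives = new ArrayList<>();
    // Lookahead cache, one entry per alternative; filled in by the analyzer.
    final List<Character> lookahead = new ArrayList<>();
}

/** Hypothetical stand-in for the grammar analyzer. */
class ToyAnalyzer {
    /** Caches one character of lookahead per alternative; returns false on a conflict. */
    boolean deterministic(ToyBlock blk) {
        blk.lookahead.clear();
        boolean ok = true;
        for (String alt : blk.alternatives) {
            char la = alt.charAt(0); // assumes non-empty alternatives
            if (blk.lookahead.contains(la)) {
                ok = false; // two alternatives start with the same character
            }
            blk.lookahead.add(la);
        }
        return ok;
    }
}

/** Hypothetical generator: consults the analyzer, then emits code from the cache. */
class ToyGenerator {
    private final ToyAnalyzer analyzer = new ToyAnalyzer();
    final StringBuilder out = new StringBuilder();

    void gen(ToyBlock blk) {
        analyzer.deterministic(blk); // must run first: it fills blk.lookahead
        for (int i = 0; i < blk.alternatives.size(); i++) {
            out.append(i == 0 ? "if" : "else if")
               .append(" (LA(1) == '").append(blk.lookahead.get(i)).append("') { match(\"")
               .append(blk.alternatives.get(i)).append("\"); }\n");
        }
    }
}
```

If the generator skipped the deterministic() call, the cache would be empty and the prediction tests would have to be computed while parsing, which is the slower alternative the description mentions.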
- Version:
- 2.00a
- Author:
- Terence Parr, John Lilley
-
Field Summary
Fields (modifier and type, field: description)
- protected LLkGrammarAnalyzer analyzer: The LLk analyzer
- protected Tool antlrTool
- protected DefineGrammarSymbols behavior: The grammar behavior
- protected static final int BITSET_OPTIMIZE_INIT_THRESHOLD: If there are more than 8 long words to init in a bitset, try to optimize it; e.g., detect runs of -1L and 0L.
- protected Vector bitsetsUsed: List of all bitsets that must be dumped.
- protected int bitsetTestThreshold: This is a hint for the language-specific code generator.
- protected CharFormatter charFormatter: Object used to format characters in the target language.
- protected PrintWriter currentOutput: Current output Stream
- protected boolean DEBUG_CODE_GENERATOR: Use option "codeGenDebug" to generate debugging output
- protected static final int DEFAULT_BITSET_TEST_THRESHOLD
- protected static final int DEFAULT_MAKE_SWITCH_THRESHOLD: Default values for code-generation thresholds
- protected Grammar grammar: The grammar for which we generate code
- protected int makeSwitchThreshold: This is a hint for the language-specific code generator.
- protected int tabs: Current tab indentation for code output
- static String TokenTypesFileExt
- static String TokenTypesFileSuffix
-
Constructor Summary
Constructors
- CodeGenerator(): Construct code generator base class
-
Method Summary
Methods (modifier and type, method: description)
- protected void _print: Output a String to the currentOutput stream.
- protected void _printAction: Print an action without leading tabs, attempting to preserve the current indentation level for multi-line actions. Ignored if string is null.
- protected void _println: Output a String followed by newline, to the currentOutput stream.
- static String decodeLexerRuleName
- static boolean elementsAreRange(int[] elems): Test if a set element array represents a contiguous range.
- static String encodeLexerRuleName
- protected String extractIdOfAction: Get the identifier portion of an argument-action token.
- protected String extractIdOfAction(String s, int line, int column): Get the identifier portion of an argument-action.
- protected String extractTypeOfAction: Get the type string out of an argument-action token.
- protected String extractTypeOfAction(String s, int line, int column): Get the type portion of an argument-action.
- abstract void gen(): Generate the code for all grammars.
- abstract void gen(ActionElement action): Generate code for the given grammar element.
- abstract void gen(AlternativeBlock blk): Generate code for the given grammar element.
- abstract void gen(BlockEndElement end): Generate code for the given grammar element.
- abstract void gen(CharLiteralElement atom): Generate code for the given grammar element.
- abstract void gen: Generate code for the given grammar element (character-range reference).
- abstract void gen(LexerGrammar g): Generate the code for a lexer.
- abstract void gen(OneOrMoreBlock blk): Generate code for the given grammar element.
- abstract void gen(ParserGrammar g): Generate the code for a parser.
- abstract void gen(RuleRefElement rr): Generate code for the given grammar element.
- abstract void gen(StringLiteralElement atom): Generate code for the given grammar element.
- abstract void gen: Generate code for the given grammar element (token-range reference).
- abstract void gen(TokenRefElement atom): Generate code for the given grammar element.
- abstract void gen(TreeElement t): Generate code for the given grammar element.
- abstract void gen: Generate the code for a parser.
- abstract void gen(WildcardElement wc): Generate code for the given grammar element.
- abstract void gen(ZeroOrMoreBlock blk): Generate code for the given grammar element.
- protected void genTokenInterchange: Generate the token types as a text file for persistence across shared lexer/parser.
- abstract String getASTCreateString: Get a string for an expression to generate creation of an AST subtree.
- abstract String getASTCreateString(GrammarAtom atom, String str): Get a string for an expression to generate creation of an AST node.
- protected String getBitsetName(int index): Given the index of a bitset in the bitset list, generate a unique name.
- getFIRSTBitSet(String ruleName, int k)
- getFOLLOWBitSet(String ruleName, int k)
- abstract String mapTreeId(String id, ActionTransInfo tInfo): Map an identifier to its corresponding tree-node variable.
- protected int markBitsetForGen: Add a bitset to the list of bitsets to be generated.
- protected void print: Output tab indent followed by a String, to the currentOutput stream.
- protected void printAction: Print an action with leading tabs, attempting to preserve the current indentation level for multi-line actions. Ignored if string is null.
- protected void println: Output tab indent followed by a String followed by newline, to the currentOutput stream.
- protected void printTabs(): Output the current tab indentation.
- protected abstract String processActionForSpecialSymbols(String actionStr, int line, RuleBlock currentRule, ActionTransInfo tInfo): Lexically process $ and # references within the action.
- processStringForASTConstructor: Process a string for a simple expression for use in xx/action.g. It is used to cast simple tokens/references to the right type for the generated language.
- protected String removeAssignmentFromDeclaration: Remove the assignment portion of a declaration, if any.
- static String reverseLexerRuleName
- void setAnalyzer(LLkGrammarAnalyzer analyzer_)
- void setBehavior(DefineGrammarSymbols behavior_)
- protected void setGrammar: Set a grammar for the code generator to use.
- void setTool
-
Field Details
-
antlrTool
-
tabs
protected int tabs
Current tab indentation for code output
-
currentOutput
Current output Stream
-
grammar
The grammar for which we generate code
-
bitsetsUsed
List of all bitsets that must be dumped. These are Vectors of BitSet.
-
behavior
The grammar behavior
-
analyzer
The LLk analyzer
-
charFormatter
Object used to format characters in the target language. Subclasses must initialize this to the language-specific formatter.
-
DEBUG_CODE_GENERATOR
protected boolean DEBUG_CODE_GENERATOR
Use option "codeGenDebug" to generate debugging output
-
DEFAULT_MAKE_SWITCH_THRESHOLD
protected static final int DEFAULT_MAKE_SWITCH_THRESHOLD
Default values for code-generation thresholds
-
DEFAULT_BITSET_TEST_THRESHOLD
protected static final int DEFAULT_BITSET_TEST_THRESHOLD
-
BITSET_OPTIMIZE_INIT_THRESHOLD
protected static final int BITSET_OPTIMIZE_INIT_THRESHOLD
If there are more than 8 long words to init in a bitset, try to optimize it; e.g., detect runs of -1L and 0L.
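One way to picture the run detection mentioned for BITSET_OPTIMIZE_INIT_THRESHOLD is the sketch below. It is an assumption-laden illustration, not the generator's actual output: the class name, the initStatements helper, and the shape of the emitted statements are all hypothetical. Small arrays get a literal initializer; larger ones are scanned for runs of -1L (emitted as a loop) and runs of 0L (skipped, since a new long[] is already zero-filled).

```java
import java.util.ArrayList;
import java.util.List;

class BitsetInitSketch {
    // Mirrors the "more than 8 long words" rule from the field description.
    static final int OPTIMIZE_THRESHOLD = 8;

    /** Returns target-language statements (as text) that initialize the given words. */
    static List<String> initStatements(long[] words) {
        List<String> stmts = new ArrayList<>();
        if (words.length <= OPTIMIZE_THRESHOLD) {
            // Few words: a plain literal initializer is short enough.
            StringBuilder sb = new StringBuilder("long[] data = { ");
            for (int i = 0; i < words.length; i++) {
                if (i > 0) sb.append(", ");
                sb.append(words[i]).append('L');
            }
            stmts.add(sb.append(" };").toString());
            return stmts;
        }
        stmts.add("long[] data = new long[" + words.length + "];");
        int i = 0;
        while (i < words.length) {
            // Find the end (inclusive) of the run of equal values starting at i.
            int j = i;
            while (j + 1 < words.length && words[j + 1] == words[i]) j++;
            boolean compressible = words[i] == 0L || words[i] == -1L;
            if (compressible && j > i) {
                if (words[i] == -1L) {
                    stmts.add("for (int i = " + i + "; i <= " + j + "; i++) { data[i] = -1L; }");
                }
                // A run of 0L needs no statement at all.
                i = j + 1;
            } else {
                if (words[i] != 0L) stmts.add("data[" + i + "] = " + words[i] + "L;");
                i++;
            }
        }
        return stmts;
    }
}
```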
-
makeSwitchThreshold
protected int makeSwitchThreshold
This is a hint for the language-specific code generator. A switch() or language-specific equivalent will be generated instead of a series of if/else statements for blocks whose number of non-predicated LL(1) alternatives is greater than or equal to this value. This is modified by the grammar option "codeGenMakeSwitchThreshold".
-
bitsetTestThreshold
protected int bitsetTestThreshold
This is a hint for the language-specific code generator. A bitset membership test will be generated instead of an ORed series of LA(k) comparisons for lookahead sets with degree greater than or equal to this value. This is modified by the grammar option "codeGenBitsetTestThreshold".
-
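The two hints makeSwitchThreshold and bitsetTestThreshold both amount to a "greater than or equal" comparison that selects between two code shapes. The sketch below illustrates that decision only; the class, the helper names useSwitch and lookaheadTest, and the emitted text are hypothetical, and the threshold values are passed in by the caller rather than taken from the real defaults.

```java
class ThresholdSketch {
    /** True when a switch (or equivalent) should replace an if/else chain. */
    static boolean useSwitch(int nonPredicatedLL1Alts, int makeSwitchThreshold) {
        return nonPredicatedLL1Alts >= makeSwitchThreshold;
    }

    /**
     * Builds the text of a lookahead test for LA(1). Sets with at least
     * bitsetTestThreshold elements use a bitset membership test; smaller
     * sets use an ORed series of comparisons.
     */
    static String lookaheadTest(int[] tokenTypes, int bitsetTestThreshold, String bitsetName) {
        if (tokenTypes.length >= bitsetTestThreshold) {
            return bitsetName + ".member(LA(1))"; // illustrative text only
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokenTypes.length; i++) {
            if (i > 0) sb.append(" || ");
            sb.append("LA(1)==").append(tokenTypes[i]);
        }
        return sb.toString();
    }
}
```

For example, with a threshold of 4, a two-element set yields `LA(1)==4 || LA(1)==5`, while a four-element set yields the membership test.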
TokenTypesFileSuffix
-
TokenTypesFileExt
-
-
Constructor Details
-
CodeGenerator
public CodeGenerator()
Construct code generator base class
-
-
Method Details
-
_print
Output a String to the currentOutput stream. Ignored if string is null.
- Parameters:
s - The string to output
-
_printAction
Print an action without leading tabs, attempting to preserve the current indentation level for multi-line actions. Ignored if string is null.
- Parameters:
s - The action string to output
-
_println
Output a String followed by newline, to the currentOutput stream. Ignored if string is null.
- Parameters:
s - The string to output
-
elementsAreRange
public static boolean elementsAreRange(int[] elems)
Test if a set element array represents a contiguous range.
- Parameters:
elems - The array of elements representing the set, usually from BitSet.toArray().
- Returns:
- true if the elements are a contiguous range (with two or more).
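A minimal sketch of such a contiguity test follows. It takes the Returns text at face value (two or more elements, each one greater than the previous by exactly one) and assumes the array is sorted ascending; the real method may apply a different minimum length or handle edge cases differently. RangeSketch is a hypothetical name.

```java
class RangeSketch {
    /** True if elems holds at least two values forming an unbroken ascending run. */
    static boolean elementsAreRange(int[] elems) {
        if (elems == null || elems.length < 2) {
            return false; // too few elements to call it a range
        }
        for (int i = 1; i < elems.length; i++) {
            if (elems[i] != elems[i - 1] + 1) {
                return false; // a hole (or a duplicate) breaks the range
            }
        }
        return true;
    }
}
```

A generator could use a positive answer to emit a two-sided comparison on the first and last element instead of testing every element.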
-
extractIdOfAction
Get the identifier portion of an argument-action token. The ID of an action is assumed to be a trailing identifier. Specific code-generators may want to override this if the language has unusual declaration syntax.
- Parameters:
t - The action token
- Returns:
- A string containing the text of the identifier
-
extractIdOfAction
Get the identifier portion of an argument-action. The ID of an action is assumed to be a trailing identifier. Specific code-generators may want to override this if the language has unusual declaration syntax.
- Parameters:
s - The action text
line - Line used for error reporting.
column - Column used for error reporting.
- Returns:
- A string containing the text of the identifier
-
extractTypeOfAction
Get the type string out of an argument-action token. The type of an action is assumed to precede a trailing identifier. Specific code-generators may want to override this if the language has unusual declaration syntax.
- Parameters:
t - The action token
- Returns:
- A string containing the text of the type
-
extractTypeOfAction
Get the type portion of an argument-action. The type of an action is assumed to precede a trailing identifier. Specific code-generators may want to override this if the language has unusual declaration syntax.
- Parameters:
s - The action text
line - Line used for error reporting.
- Returns:
- A string containing the text of the type
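The "trailing identifier" convention used by the extractIdOfAction and extractTypeOfAction methods above can be sketched as a backwards scan from the end of the text. The class and method names here are hypothetical, error reporting via line and column is omitted, and the sketch assumes the text carries no initializer (an input such as "int x = 0" is not handled).

```java
class ArgActionSketch {
    /** Index where the trailing identifier starts, or -1 if there is none. */
    private static int idStart(String s) {
        int end = s.length();
        while (end > 0 && Character.isWhitespace(s.charAt(end - 1))) end--;
        int start = end;
        while (start > 0 && Character.isJavaIdentifierPart(s.charAt(start - 1))) start--;
        return start == end ? -1 : start;
    }

    /** The trailing identifier, or the empty string if none is found. */
    static String extractId(String s) {
        int start = idStart(s);
        return start < 0 ? "" : s.substring(start).trim();
    }

    /** Everything that precedes the trailing identifier. */
    static String extractType(String s) {
        int start = idStart(s);
        return start < 0 ? s.trim() : s.substring(0, start).trim();
    }
}
```

A language with unusual declaration syntax (for example, one that places the name before the type) would need a different split, which is why the description suggests overriding.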
-
gen
public abstract void gen()
Generate the code for all grammars
-
gen
Generate code for the given grammar element.
- Parameters:
action - The {...} action to generate
-
gen
Generate code for the given grammar element.
- Parameters:
blk - The "x|y|z|..." block to generate
-
gen
Generate code for the given grammar element.
- Parameters:
end - The block-end element to generate. Block-end elements are synthesized by the grammar parser to represent the end of a block.
-
gen
Generate code for the given grammar element.
- Parameters:
atom - The character literal reference to generate
-
gen
Generate code for the given grammar element.
- Parameters:
r - The character-range reference to generate
-
gen
Generate the code for a lexer
- Throws:
IOException
-
gen
Generate code for the given grammar element.
- Parameters:
blk - The (...)+ block to generate
-
gen
Generate the code for a parser
- Throws:
IOException
-
gen
Generate code for the given grammar element.
- Parameters:
rr - The rule-reference to generate
-
gen
Generate code for the given grammar element.
- Parameters:
atom - The string-literal reference to generate
-
gen
Generate code for the given grammar element.
- Parameters:
r - The token-range reference to generate
-
gen
Generate code for the given grammar element.
- Parameters:
atom - The token-reference to generate
-
gen
Generate code for the given grammar element.
- Parameters:
t - The tree to generate code for.
-
gen
Generate the code for a parser
- Throws:
IOException
-
gen
Generate code for the given grammar element.
- Parameters:
wc - The wildcard element to generate
-
gen
Generate code for the given grammar element.
- Parameters:
blk - The (...)* block to generate
-
genTokenInterchange
Generate the token types as a text file for persistence across shared lexer/parser
- Throws:
IOException
-
processStringForASTConstructor
Process a string for a simple expression for use in xx/action.g. It is used to cast simple tokens/references to the right type for the generated language.
- Parameters:
str - A String.
-
getASTCreateString
Get a string for an expression to generate creation of an AST subtree.
- Parameters:
v - A Vector of String, where each element is an expression in the target language yielding an AST node.
-
getASTCreateString
Get a string for an expression to generate creation of an AST node
- Parameters:
str - The text of the arguments to the AST construction
-
getBitsetName
Given the index of a bitset in the bitset list, generate a unique name. Specific code-generators may want to override this if the language does not allow '_' or numerals in identifiers.
- Parameters:
index - The index of the bitset in the bitset list.
-
encodeLexerRuleName
-
decodeLexerRuleName
-
mapTreeId
Map an identifier to its corresponding tree-node variable. This is context-sensitive, depending on the rule and alternative being generated.
- Parameters:
id - The identifier name to map
forInput - true if the input tree node variable is to be returned, otherwise the output variable is returned.
- Returns:
- The mapped id (which may be the same as the input), or null if the mapping is invalid due to duplicates
-
markBitsetForGen
Add a bitset to the list of bitsets to be generated. If the bitset is already in the list, ignore the request. New bitsets are always added to the end of the list, so the caller can rely on the position of bitsets in the list. The returned position can be used to format the bitset name, since it is invariant.
- Parameters:
p - Bit set to mark for code generation
forParser - true if the bitset is used for the parser, false for the lexer
- Returns:
- The position of the bitset in the list.
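The contract shared by markBitsetForGen and getBitsetName (append only, ignore duplicates, derive the name from the stable position) can be sketched as follows. This is an illustration under assumptions: it uses java.util.BitSet and ArrayList as stand-ins for the library's own collection types, and the "_tokenSet_" prefix is assumed for the example, chosen only because the description mentions '_' and numerals in the generated name.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

class BitsetRegistrySketch {
    // Append-only list, so an index handed out once never changes.
    private final List<BitSet> bitsetsUsed = new ArrayList<>();

    /** Returns the position of p in the list, adding it at the end if it is new. */
    int markBitsetForGen(BitSet p) {
        int existing = bitsetsUsed.indexOf(p); // relies on BitSet.equals
        if (existing >= 0) {
            return existing; // already registered: ignore the request
        }
        bitsetsUsed.add((BitSet) p.clone()); // copy so later mutation cannot alter the entry
        return bitsetsUsed.size() - 1;
    }

    /** Builds a unique name from the (invariant) list position. */
    String getBitsetName(int index) {
        return "_tokenSet_" + index; // assumed prefix, for illustration
    }
}
```

Because entries are never removed or reordered, a name produced early in generation still refers to the same bitset when the list is dumped at the end.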
-
print
Output tab indent followed by a String, to the currentOutput stream. Ignored if string is null.
- Parameters:
s - The string to output.
-
printAction
Print an action with leading tabs, attempting to preserve the current indentation level for multi-line actions. Ignored if string is null.
- Parameters:
s - The action string to output
-
println
Output tab indent followed by a String followed by newline, to the currentOutput stream. Ignored if string is null.
- Parameters:
s - The string to output
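The relationship between the underscore-prefixed and plain printing helpers (the plain ones emit the current tab indent first, and all of them ignore null) can be sketched as below. The class name is hypothetical and the bodies are a guess at the simplest behavior consistent with the descriptions, not the library's source; the field names tabs and currentOutput follow the Field Details.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

class IndentPrinterSketch {
    protected int tabs = 0;              // current indentation depth
    protected PrintWriter currentOutput; // where generated code goes

    IndentPrinterSketch(PrintWriter out) {
        currentOutput = out;
    }

    /** Raw output, no indentation. */
    protected void _print(String s) {
        if (s != null) currentOutput.print(s);
    }

    /** Raw output plus newline, no indentation. */
    protected void _println(String s) {
        if (s != null) currentOutput.println(s);
    }

    /** Emits one tab character per indentation level. */
    protected void printTabs() {
        for (int i = 0; i < tabs; i++) currentOutput.print("\t");
    }

    /** Indented output. */
    protected void print(String s) {
        if (s != null) {
            printTabs();
            currentOutput.print(s);
        }
    }

    /** Indented output plus newline. */
    protected void println(String s) {
        if (s != null) {
            printTabs();
            currentOutput.println(s);
        }
    }

    public static void main(String[] args) {
        StringWriter sw = new StringWriter();
        IndentPrinterSketch p = new IndentPrinterSketch(new PrintWriter(sw));
        p.println("void rule() {");
        p.tabs++;
        p.println("match(ID);");
        p.tabs--;
        p.println("}");
        p.currentOutput.flush();
        System.out.print(sw);
    }
}
```

A generator typically increments tabs after opening a block and decrements it before closing, so nested constructs come out indented without each call site tracking depth.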
-
printTabs
protected void printTabs()
Output the current tab indentation. This outputs the number of tabs indicated by the "tabs" variable to the currentOutput stream.
-
processActionForSpecialSymbols
protected abstract String processActionForSpecialSymbols(String actionStr, int line, RuleBlock currentRule, ActionTransInfo tInfo)
Lexically process $ and # references within the action. This will replace #id and #(...) with the appropriate function calls and/or variables.
-
getFOLLOWBitSet
-
getFIRSTBitSet
-
removeAssignmentFromDeclaration
Remove the assignment portion of a declaration, if any.
- Parameters:
d - the declaration
- Returns:
- the declaration without any assignment portion
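As a minimal sketch of what removing the assignment portion of a declaration could look like, the helper below cuts the declaration at the first '=' and trims the remainder. The class and method names are hypothetical, and the sketch assumes '=' does not appear inside the type itself.

```java
class DeclarationSketch {
    /** Returns d up to (not including) the first '=', trimmed; d itself if there is no '='. */
    static String removeAssignment(String d) {
        int eq = d.indexOf('=');
        return eq >= 0 ? d.substring(0, eq).trim() : d;
    }
}
```

For instance, "int x = 5" becomes "int x", while "int x" is returned unchanged.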
-
reverseLexerRuleName
-
setAnalyzer
-
setBehavior
-
setGrammar
Set a grammar for the code generator to use -
setTool
-