What token types are supported can be found in [ETokType.csv](../project/enums/ETokType.csv) you can also find the token types in [ETokType.h](../project/components/gen/etoktype.cpp) , which is the generated enum from the csv file.
I plan to replace IsAssign with a general flags field and properly keep track of all operator types instead of abstracting it away to `ETokType::Operator`.
Traversing the tokens is done with the following interface macros:
| Macro | Description |
| --- | --- |
| `currtok_noskip` | Get the current token without skipping whitespace |
| `currtok` | Get the current token, skip any whitespace tokens |
| `prevtok` | Get the previous token (does not skip whitespace) |
| `nexttok` | Get the next token (does not skip whitespace) |
| `eat( Token Type )` | Check to see if the current token is of the given type, if so, advance Token's index to the next token |
| `left` | Get the number of tokens left in the token array |
| `check_noskip` | Check to see if the current token is of the given type, without skipping whitespace |
| `check` | Check to see if the current token is of the given type, skip any whitespace tokens |
The parser has a limited user interface, only specific types of definitions or statements are expected to be provided by the user directly when using to construct an AST dynamically (See SOA for example). It however does attempt to provide capability to parse a full C/C++ from production codebases.
Below is an outline of the general alogirithim used for these internal procedures. The intention is to provide a basic briefing to aid the user in traversing the actual code definitions. These appear in the same order as they are in the `parser.cpp` file
***NOTE: This is still heavily in an alpha state. A large swaph of this can change, make sure these docs are up to date before considering them 1:1 with the repo commit your considering.***
This is going to get heavily changed down the line to have a more broken down "identifier expression" so that the qualifier, template args, etc, can be distinguished between the targeted identifier.
The function can parse all of them, however the AST node compresses them all into a string.
This is needed as a operator defintion is not easily resolvable early on, as such this function handles resolving a operator after its been made ceratin that the type of declaration or definition is indeed for a operator signature.
By the point this function is called the following are known : export module flag, attributes, specifiers, return type
1. If there is any qualifiers for the operator, parse them
2. Consume operator keyword
3. Determine the operator type (This will be offloaded to the lexer moreso than how it is now) & consume
4.`parse_params`
5. If there is no parameters this is operator is a member of pointer if its symbols is a *.
6. Parse postfix specifiers
7. If there is a opening curly brace, `parse function_body`
8. Otherwise: consume end statement, check for inline comment.
When this function is called, attribute and specifiers may have been resolved, however what comes next can still be either an operator, function, or varaible.
1. Check for preprocessor macro, if there is one : `parse_simple_preprocess`
2.`parse_type` (Does the bulk of the work)
3. Begin lookahead to see if we get qualifiers or we eventually find the operator declaration
4. If we find an operator keyword : `parse_operator_after_ret_type`
5. otherwise :
1.`parse_identifier`
2. If we se a opening parenthesis (capture start), its a function : `parse_function_after_name`
There is still decent room for improvement in this setup. Right now the entire macro's relevant tokens are shoved into an untyped AST. It would be better to store it instead in an `AST_Macro` node instead down the line.
1. Consume the macro token
2. Check for an opening curly brace
1. Consume opening curly brace
2. Until the closing curly is encountered consume all tokens.
This is needed as a variable defintion is not easily resolvable early on, it takes a long evaluation period before its known that the declaration or definition is a variable. As such this function handles resolving a variable.
By the point this function is called the following are known : export module flag, attributes, specifiers, value type, name
1. If its an assignment, parse the assignment expression (currently to an untyped string)
2. If its an opening curly brace, parse the expression within (currnelty to an untyped stirng).
1. Consume the closing curly brace
3. If its a `:`, we're dealing with bitfield definition:
1. Consume the assign classifier
2. Consume the expression (currently to an untyped string)
4. If a comma is encountered : `parse_variable declaration_list`
This function's implementation is awful and not done correctly. It will most likely be overhauled in the future as I plan to segement the AST_Type into several AST varaints along with sub-types to help produce robust type expressions.
Hopefully I won't need to make authentic type expressions as I was hopeing to avoid that...
### Current Algorithim
Anything that is in the qualifier capture of the function typename is treated as an expression abstracted as an untyped string
1.`parse_attributes`
2. Parse specifiers
3. This is where things get ugly for each of these depend on what the next token is.
1. If its an in-place definition of a class, enum, struct, or union:
2. If its a decltype (Not supported yet but draft impl there)
3. If its a compound native type expression (unsigned, char, short, long, int, float, dobule, etc )
4. Ends up being a regular type alias of an identifier
4. Parse specifiers (postfix)
5. We need to now look ahead to see If we're dealing with a function typename
6. If wer're dealing with a function typename:
1. Shove the specifiers, and identifier code we have so far into a return type typename's Name (untyped string)
1. Reset the specifiers code for the top-level typeanme
2. Check to see if the next token is an identifier:
1.`parse_identifier`
3. Check to see if the next token is capture start and is not the last capture ("qualifier capture"):
1. Consume `(`
2. Consume expresssion between capture
3. Consume `)`
4.`parse_params`
5. Parse postfix specifiers
7. Check for varaidic argument (param pack) token:
1. Consume varadic argument token
### WIP - Alternative Algorithim
Currently wrapped up via macro: `GEN_USE_NEW_TYPENAME_PARSING`
Anything that is in the qualifier capture of the function typename is treated as an expression abstracted as an untyped string
1.`parse_attributes`
2. Parse specifiers (prefix)
3. This is where things get ugly for each of these depend on what the next token is.
1. If its an in-place definition of a class, enum, struct, or union:
2. If its a decltype (Not supported yet but draft impl there)
3. If its a compound native type expression (unsigned, char, short, long, int, float, dobule, etc )
4. Ends up being a regular type alias of an identifier
4. Parse specifiers (postfix)
1. If any specifiers are found populate specifiers code with them.
5. We need to now look ahead to see If we're dealing with a function typename
6. If wer're dealing with a function typename:
1. Shove the specifiers, and identifier code we have so far into a return type typename's Name (untyped string)
1. Reset the specifiers code for the top-level typename
2. Check to see if the next token is an identifier:
1.`parse_identifier`
3. Check to see if the next token is capture start and is not the last capture ("qualifier capture"):
1. Consume `(`
2. Parse binding specifiers
3.`parse_identifier`
4.`parse_parameters` -> params_nested
5. Consume `)`
6. Construct a nested function typename definition for the qualifier `Name`
4.`parse_params` - > params
5. Parse postfix specifiers
7. Check for varaidic argument (param pack) token:
1. Consume varadic argument token
### **Later: Algorithim based on typename expressions**