WIP: Improvements to parser, updated docs

Trying to get support for typename keyword soon
2023-11-21 21:27:33 -05:00
parent 772db608be
commit f67f9547df
7 changed files with 500 additions and 271 deletions


@ -1,15 +1,16 @@
# Parser's Algorithm
gencpp uses a hand-written recursive descent parser. Both the lexer and parser handle a full C/C++ file in a single pass.
gencpp uses a hand-written recursive descent parser. Both the lexer and parser currently handle a full C/C++ file in a single pass.
## Notable implementation background
### Lexer
The lex procedure does the lexical pass of content provided as a `StrC` type.
The tokens are stored (for now) in `gen::Parser::Tokens`.
The tokens are stored (for now) in `gen::parser::Tokens`.
Fields:
```cpp
Array<Token> Arr;
s32 Idx;
@ -18,23 +19,34 @@ s32 Idx;
The supported token types are listed in [ETokType.csv](../project/enums/ETokType.csv). They can also be found in [ETokType.h](../project/components/gen/etoktype.cpp), which is the enum generated from the csv file.
Tokens are defined with the struct `gen::Parser::Token`:
Tokens are defined with the struct `gen::parser::Token`:
Fields:
```cpp
char const* Text;
sptr Length;
TokType Type;
s32 Line;
s32 Column;
bool IsAssign;
u32 Flags;
```
`IsAssign` is a flag that is set when the token is an assignment operator, which is used for various purposes:
Flags is a bitfield made up of TokFlags (Token Flags):
* Using statement assignment
* Parameter argument default value assignment
* Variable declaration initialization assignment
* `TF_Operator` : Any operator token used in expressions
* `TF_Assign`
    * Using statement assignment
    * Parameter argument default value assignment
    * Variable declaration initialization assignment
* `TF_Preprocess` : Related to a preprocessing directive
* `TF_Preprocess_Cond` : A preprocess conditional
* `TF_Attribute` : An attribute token
* `TF_AccessSpecifier` : An accessor operation token
* `TF_Specifier` : One of the specifier tokens
* `TF_EndDefinition` : Can be interpreted as an end definition for a scope.
* `TF_Formatting` : Considered a part of the formatting
* `TF_Literal` : Anything considered a literal by C++.
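
A minimal sketch of how a flags bitfield like this can be declared and queried. The bit positions, the reduced `Token` struct, and the helpers `tok_has_flags` and `make_assign_token` are assumptions made for illustration; they are not taken from the gencpp sources.

```cpp
#include <cstdint>

// Hypothetical bit values: the real TokFlags values in gencpp may differ.
enum TokFlags : std::uint32_t
{
    TF_Operator        = 1u << 0,
    TF_Assign          = 1u << 1,
    TF_Preprocess      = 1u << 2,
    TF_Preprocess_Cond = 1u << 3,
    TF_Attribute       = 1u << 4,
    TF_AccessSpecifier = 1u << 5,
    TF_Specifier       = 1u << 6,
    TF_EndDefinition   = 1u << 7,
    TF_Formatting      = 1u << 8,
    TF_Literal         = 1u << 9,
};

// Reduced stand-in for the token struct, keeping only the field used here.
struct Token
{
    std::uint32_t Flags;
};

// True when every bit in `mask` is set on the token.
inline bool tok_has_flags( Token const& tok, std::uint32_t mask )
{
    return ( tok.Flags & mask ) == mask;
}

// An assignment operator such as `=` would plausibly carry both flags.
inline Token make_assign_token()
{
    return Token { static_cast<std::uint32_t>( TF_Operator | TF_Assign ) };
}
```

With a layout like this, a check that used to read `tok.IsAssign` becomes `tok_has_flags( tok, TF_Assign )`, and several properties can be tested in one call by OR-ing flags together.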
I plan to replace IsAssign with a general flags field and properly keep track of all operator types instead of abstracting it away to `ETokType::Operator`.
@ -58,7 +70,7 @@ The parser has a limited user interface, only specific types of definitions or s
Each public user interface procedure has the following format:
```cpp
CodeStruct parse_<definition type>( StrC def )
<code type> parse_<definition type>( StrC def )
{
check_parse_args( def );
using namespace Parser;
@ -76,14 +88,168 @@ The most top-level parsing procedure used for C/C++ file parsing is `parse_globa
It uses a helper procedure called `parse_global_nspace`.
Each internal procedure will be
Each internal procedure will have the following format:
## parse_global_nspace
```cpp
internal
<code type> parse_<definition_type>( <empty or contextual params> )
{
push_scope();
...
<code type> result = (<code type>) make_code();
...
Context.pop();
return result;
}
```
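
The push/pop pairing in the format above can be illustrated with a small, self-contained sketch. The `ParseContext` struct, its `to_string` helper, and the `push_scope` macro shown here are hypothetical stand-ins, assuming the context only needs to remember which procedures are currently active (for example, to report where a parse error occurred).

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the parser's context: a stack of procedure names.
struct ParseContext
{
    std::vector<std::string> Scopes;

    void push( char const* proc_name ) { Scopes.push_back( proc_name ); }
    void pop()                         { Scopes.pop_back(); }

    // Joins the active scopes, outermost first, e.g. for an error message.
    std::string to_string() const
    {
        std::string result;
        for ( std::string const& scope : Scopes )
        {
            if ( ! result.empty() )
                result += " > ";
            result += scope;
        }
        return result;
    }
};

static ParseContext Context;

// Records the calling procedure's name on the context stack.
#define push_scope() Context.push( __func__ )

// Each procedure pushes on entry and pops before returning.
std::string parse_inner()
{
    push_scope();
    std::string trace = Context.to_string();
    Context.pop();
    return trace;
}

std::string parse_outer()
{
    push_scope();
    std::string trace = parse_inner();
    Context.pop();
    return trace;
}
```

Every early return in such a procedure needs its own `Context.pop()`, otherwise the stack no longer matches the call chain.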
Below is an outline of the general algorithm used for these internal procedures. The intention is to provide a basic briefing to aid the user in traversing the actual code definitions. These appear in the same order as they are in the `parser.cpp` file.
## `parse_array_decl`
1. Check if it's an array declaration with no expression.
    1. Consume and return empty array declaration
2. Opening square bracket
3. Consume expression
4. Closing square bracket
5. If adjacent opening bracket
    1. Repeat array declaration parse until no brackets remain
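
The steps above can be sketched over a plain string instead of the token array. The `ArrayDecl` result type and the signature below are assumptions for illustration only; the real procedure works on tokens and returns a code object. Nested brackets inside an expression are not handled.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type: one entry per dimension, empty string for `[]`.
struct ArrayDecl
{
    std::vector<std::string> Dims;
    bool                     Valid = true;
};

// Parses `[expr][expr]...` starting at `pos` and advances `pos` past it.
ArrayDecl parse_array_decl( std::string const& text, std::size_t& pos )
{
    ArrayDecl result;

    while ( pos < text.size() && text[pos] == '[' )
    {
        ++pos; // opening square bracket

        // Array declaration with no expression: consume it and return.
        if ( pos < text.size() && text[pos] == ']' )
        {
            ++pos;
            result.Dims.push_back( "" );
            return result;
        }

        // Consume the expression up to the closing bracket.
        std::size_t start = pos;
        while ( pos < text.size() && text[pos] != ']' )
            ++pos;

        if ( pos == text.size() )
        {
            result.Valid = false; // missing closing square bracket
            return result;
        }

        result.Dims.push_back( text.substr( start, pos - start ) );
        ++pos; // closing square bracket

        // The loop condition handles an adjacent opening bracket.
    }
    return result;
}
```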
## `parse_attributes`
1. Check for standard attribute
2. Check for GNU attribute
3. Check for MSVC attribute
4. Check for a token registered as an attribute
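
As a rough illustration of the four checks, the sketch below classifies the text of the current token. The `AttributeKind` enum, the `classify_attribute` helper, and the idea of passing registered attribute names as a list are hypothetical; the spellings `[[`, `__attribute__` and `__declspec` are the standard, GNU and MSVC attribute introducers.

```cpp
#include <string>
#include <vector>

enum class AttributeKind
{
    None,
    Standard,   // [[ ... ]]
    GNU,        // __attribute__(( ... ))
    MSVC,       // __declspec( ... )
    Registered, // user-registered attribute token, e.g. an API export macro
};

// Classifies the text of the current token, checking in the order listed above.
AttributeKind classify_attribute( std::string const& tok_text, std::vector<std::string> const& registered )
{
    if ( tok_text == "[[" )
        return AttributeKind::Standard;

    if ( tok_text == "__attribute__" )
        return AttributeKind::GNU;

    if ( tok_text == "__declspec" )
        return AttributeKind::MSVC;

    for ( std::string const& name : registered )
    {
        if ( tok_text == name )
            return AttributeKind::Registered;
    }
    return AttributeKind::None;
}
```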
## `parse_class_struct`
## `parse_class_struct_body`
## `parse_comment`
## `parse_compilcated_definition`
## `parse_define`
## `parse_forward_or_definition`
## `parse_function_after_name`
## `parse_function_body`
## `parse_global_nspace`
1. Make sure the type provided to the helper function is one of `Namespace_Body`, `Global_Body`, `Export_Body`, or `Extern_Linkage_body`.
2. If it's not a `Global_Body`, eat the opening brace for the scope.
3.
3. `
## `parse_identifier`
## parse_type
## `parse_include`
## `parse_operator_after_ret_type`
## `parse_operator_function_or_variable`
## `parse_pragma`
## `parse_params`
## `parse_preprocess_cond`
## `parse_simple_preprocess`
## `parse_static_assert`
## `parse_template_args`
## `parse_variable_after_name`
## `parse_variable_declaration_list`
## `parse_class`
## `parse_constructor`
## `parse_destructor`
## `parse_enum`
## `parse_export_body`
## `parse_extern_link_body`
## `parse_extern_link`
## `parse_friend`
## `parse_function`
## `parse_namespace`
## `parse_operator`
## `parse_operator_cast`
## `parse_struct`
## `parse_template`
## `parse_type`
## `parse_typedef`
1. Check for export module specifier
2. typedef keyword
3. If it's a preprocess macro: Get the macro name
4.
## `parse_union`
1. Check for export module specifier
2. union keyword
3. `parse_attributes`
4. Check for identifier
5. Parse the body (Possible options):
    1. Newline
    2. Comment
    3. Decl_Class
    4. Decl_Enum
    5. Decl_Struct
    6. Decl_Union
    7. Preprocess_Define
    8. Preprocess_Conditional
    9. Preprocess_Macro
    10. Preprocess_Pragma
    11. Unsupported preprocess directive
    12. Variable
6. If it's not an inplace definition: End Statement
## `parse_using`
1. Check for export module specifier
2. using keyword
3. Check to see if it's a using namespace
4. Get the identifier
5. If it's a regular using declaration:
    1. `parse_attributes`
    2. `parse_type`
    3. `parse_array_decl`
6. End statement
7. Check for inline comment
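
A simplified sketch of this flow over pre-split token strings. The `UsingDecl` struct and the signature are hypothetical. The calls to `parse_attributes`, `parse_type` and `parse_array_decl` are replaced by a loop that gathers everything up to the end statement, and the inline comment check is omitted.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type for this sketch.
struct UsingDecl
{
    bool        Valid       = false;
    bool        IsExport    = false;
    bool        IsNamespace = false;
    std::string Name;
    std::string Type; // only set for `using Name = Type;`
};

UsingDecl parse_using( std::vector<std::string> const& toks )
{
    UsingDecl   result;
    std::size_t idx = 0;

    // Text of the current token, or an empty string past the end.
    auto current = [&]() -> std::string
    {
        return idx < toks.size() ? toks[idx] : std::string();
    };

    // 1. Export module specifier
    if ( current() == "export" ) { result.IsExport = true; ++idx; }

    // 2. using keyword
    if ( current() != "using" )
        return result;
    ++idx;

    // 3. using namespace
    if ( current() == "namespace" ) { result.IsNamespace = true; ++idx; }

    // 4. Identifier
    if ( current().empty() || current() == ";" )
        return result;
    result.Name = current();
    ++idx;

    // 5. Regular using declaration: everything after `=` is treated as the type.
    if ( ! result.IsNamespace && current() == "=" )
    {
        ++idx;
        while ( idx < toks.size() && toks[idx] != ";" )
        {
            if ( ! result.Type.empty() )
                result.Type += ' ';
            result.Type += toks[idx++];
        }
    }

    // 6. End statement
    result.Valid = current() == ";";
    return result;
}
```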
## `parse_variable`
1. Check for export module specifier
2. `parse_attributes`
3. `parse specifiers`
4. `parse_type`
5. `parse_identifier`
6. `parse_variable_after_name`