mirror of
https://github.com/Ed94/gencpp.git
synced 2025-07-29 16:36:59 -07:00
WIP: Improvements to parser, updated docs
Trying to get support for typename keyword soon
@@ -1,11 +1,3 @@
# Forward

I was never satisfied with how I wrapped the management of the AST.
For C++, the current design may be as good as it gets given the limitations of the language.

I'll at least try in this issue to brainstorm something simpler without losing ergonomics.
This will also be a good place to document the current design.

## Current Design

`AST` is the actual managed node object for the library.
@@ -22,10 +14,8 @@ The simpliest being just a type alias.
using Code = AST*;
```

This is what the genc library would have to use due to the constraints of its language.
Anything else would be an unergonomic mess of struct wrapping, with a pile of macros & procedures needed to interface with it.

Further, to provide intuitive filters on the AST, there are AST types (covered in [AST_Types.md](AST_Types.md)).
This is what the genc library would have to use due to the constraints of its language.
The actual content per type of AST is covered within [AST_Types.md](AST_Types.md).

These are pure PODs that just have the members relevant to the type of AST node they represent.
Each of them has a Code type alias specific to it.
@@ -35,3 +25,20 @@ Again, the simpliest case for these would be a type alias.
```cpp
// Pointer alias, mirroring `using Code = AST*;`.
using CodeTypedef = AST_Typedef*;
```
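To make the relationship between `Code` and a typed alias such as `CodeTypedef` concrete, here is a minimal sketch. It assumes the node is a union of per-type POD views that share their leading members; `CodeKind`, `AST_Common`, and `as_typedef` are hypothetical names for illustration and are not the library's actual definitions.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins; not gencpp's actual definitions.
enum class CodeKind { Invalid, Typedef, Variable };

union AST;  // forward declaration so the views can point back at nodes

// Members every view starts with (a common initial sequence).
struct AST_Common  { CodeKind Type; char const* Name; };
// Typedef view: the common members plus what only typedefs need.
struct AST_Typedef { CodeKind Type; char const* Name; AST* UnderlyingType; };

// The managed node overlays every view on the same storage.
union AST {
	AST_Common  Common;
	AST_Typedef Typedef;
};

using Code        = AST*;
using CodeTypedef = AST_Typedef*;

// Returns the typedef view of a node, or nullptr when the node is another kind.
inline CodeTypedef as_typedef( Code node )
{
	if ( node == nullptr || node->Common.Type != CodeKind::Typedef )
		return nullptr;
	return &node->Typedef;
}
```

The typed alias acts as a filter: code holding a `CodeTypedef` only sees the members that make sense for a typedef, while the storage is still the single managed node.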

As of November 21st, 2023, the AST has had a strict layout for how its content is laid out.
This will be abandoned during its redesign, which will start with support for statements & expressions for both execution and type declarations.
Having a strict layout is too restrictive compared to allowing each AST type to have maximum control over its layout.

The redesign will occur after the following todos are addressed:

* [Improvements Lexer & Token struct#27](https://github.com/Ed94/gencpp/issues/27)
* [Generalize AST Flags to a single 4-byte flag#42](https://github.com/Ed94/gencpp/issues/42)
* [AST-Code Object Redesign.#38](https://github.com/Ed94/gencpp/issues/38)
* [Code-AST Documentation#40](https://github.com/Ed94/gencpp/issues/40)
* [AST::debug_str() improvements#33](https://github.com/Ed94/gencpp/issues/33)
* [AST::is_equal implemented and works with singleheader-test#31](https://github.com/Ed94/gencpp/issues/31)
* [Parser : Add ability to have a parse failure and continue with errors recorded.#35](https://github.com/Ed94/gencpp/issues/35)
* [Scanner : Add CodeFile#29](https://github.com/Ed94/gencpp/issues/29)
* [Auxiliary : AST visual debugger#36](https://github.com/Ed94/gencpp/issues/36)
@@ -21,6 +21,7 @@ Fields:
```cpp
Code Front;
Code Back;
Token* Token;
Code Parent;
StringCached Name;
CodeT Type;
@@ -582,20 +583,21 @@ Serialization:
## Typedef

Typedefs behave as usual, except for function or macro typedefs.
Those don't use the underlying type field, as everything is serialized under the Name field.
Those (macros) don't use the underlying type field, as everything is serialized under the Name field.

Fields:

```cpp
CodeComment InlineCmt;
Code UnderlyingType;
Code Prev;
Code Next;
Code Parent;
StringCached Name;
CodeT Type;
ModuleFlag ModuleFlags;
b32 IsFunction;
CodeComment InlineCmt;
Code UnderlyingType;
Code Prev;
Code Next;
parser::Token* Tok;
Code Parent;
StringCached Name;
CodeT Type;
ModuleFlag ModuleFlags;
b32 IsFunction;
```

Serialization:
@@ -617,6 +619,7 @@ CodeAttributes Attributes;
CodeBody Body;
Code Prev;
Code Next;
parser::Token* Tok;
Code Parent;
StringCached Name;
CodeT Type;
@@ -642,6 +645,7 @@ CodeAttributes Attributes;
CodeType UnderlyingType;
Code Prev;
Code Next;
parser::Token* Tok;
Code Parent;
StringCached Name;
CodeT Type;
@@ -660,6 +664,8 @@ Serialization:

## Variable

[Algo](./Parser_Algo.md:)

Fields:

```cpp
@@ -669,8 +675,10 @@ CodeSpecifiers Specs;
CodeType ValueType;
Code BitfieldSize;
Code Value;
CodeVar NextVar;
Code Prev;
Code Next;
parser::Token* Tok;
Code Parent;
StringCached Name;
CodeT Type;
@@ -681,8 +689,8 @@ Serialization:

```cpp
// Regular
<ModuleFlags> <Attributes> <Specs> <ValueType> <Name> = <Value>; <InlineCmt>
<ModuleFlags> <Attributes> <Specs> <ValueType> <Name> = <Value>, NextVar ...; <InlineCmt>

// Bitfield
<ModuleFlags> <Attributes> <Specs> <ValueType> <Name> : <BitfieldSize> = <Value>; <InlineCmt>
<ModuleFlags> <Attributes> <Specs> <ValueType> <Name> : <BitfieldSize> = <Value>, NextVar ...; <InlineCmt>
```
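The serialization format above, including the `NextVar` chain, can be illustrated with a small sketch. It assumes a flattened stand-in node whose fields are plain strings; `VarSketch`, `append_declarator`, and `serialize_var` are hypothetical names, and module flags, attributes, and inline comments are left out for brevity.

```cpp
#include <cassert>
#include <string>

// Hypothetical, flattened stand-in for a variable node; fields are plain strings here.
struct VarSketch {
	std::string      Specs;         // e.g. "static"; may be empty
	std::string      ValueType;     // e.g. "int"
	std::string      Name;
	std::string      BitfieldSize;  // empty when not a bitfield
	std::string      Value;         // empty when there is no initializer
	VarSketch const* NextVar;       // next declarator in the same statement, or nullptr
};

// Appends "<Name> : <BitfieldSize> = <Value>", skipping the optional parts.
inline void append_declarator( std::string& out, VarSketch const& var )
{
	out += var.Name;
	if ( ! var.BitfieldSize.empty() ) out += " : " + var.BitfieldSize;
	if ( ! var.Value.empty() )        out += " = " + var.Value;
}

// Serializes the first declarator with its specifiers and type,
// then walks the NextVar chain to emit the remaining declarators.
inline std::string serialize_var( VarSketch const& var )
{
	std::string out;
	if ( ! var.Specs.empty() ) out += var.Specs + " ";
	out += var.ValueType + " ";
	append_declarator( out, var );
	for ( VarSketch const* next = var.NextVar; next != nullptr; next = next->NextVar )
	{
		out += ", ";
		append_declarator( out, *next );
	}
	out += ";";
	return out;
}
```

Only the first node in the chain contributes specifiers and a type; each following `NextVar` contributes just its declarator, which matches the `<Value>, NextVar ...;` shape of the format.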
@@ -1,15 +1,16 @@
# Parser's Algorithm

gencpp uses a hand-written recursive descent parser. Both the lexer and parser handle a full C/C++ file in a single pass.
gencpp uses a hand-written recursive descent parser. Both the lexer and parser currently handle a full C/C++ file in a single pass.

## Notable implementation background

### Lexer

The lex procedure does the lexical pass of content provided as a `StrC` type.
The tokens are stored (for now) in `gen::Parser::Tokens`.
The tokens are stored (for now) in `gen::parser::Tokens`.

Fields:

```cpp
Array<Token> Arr;
s32 Idx;
@@ -18,23 +19,34 @@ s32 Idx;

The supported token types are listed in [ETokType.csv](../project/enums/ETokType.csv). They can also be found in [ETokType.h](../project/components/gen/etoktype.cpp), which is the enum generated from the CSV file.

Tokens are defined with the struct `gen::Parser::Token`:
Tokens are defined with the struct `gen::parser::Token`:

Fields:

```cpp
char const* Text;
sptr Length;
TokType Type;
s32 Line;
s32 Column;
bool IsAssign;
u32 Flags;
```

`IsAssign` is a flag that is set when the token is an assignment operator, which is used for various purposes:
`Flags` is a bitfield made up of `TokFlags` (token flags):

* Using statement assignment
* Parameter argument default value assignment
* Variable declaration initialization assignment
* `TF_Operator` : Any operator token used in expressions
* `TF_Assign`
  * Using statement assignment
  * Parameter argument default value assignment
  * Variable declaration initialization assignment
* `TF_Preprocess` : Related to a preprocessing directive
* `TF_Preprocess_Cond` : A preprocess conditional
* `TF_Attribute` : An attribute token
* `TF_AccessSpecifier` : An accessor operation token
* `TF_Specifier` : One of the specifier tokens
* `TF_EndDefinition` : Can be interpreted as an end definition for a scope.
* `TF_Formatting` : Considered a part of the formatting
* `TF_Literal` : Anything considered a literal by C++.
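
As a rough illustration of how such a flags field can be queried, here is a minimal sketch. The flag names follow the list above, but the bit positions, `TokenSketch`, and `has_flags` are assumptions made for this example and may not match the library.

```cpp
#include <cassert>
#include <cstdint>

// Flag names follow the documented list; the bit positions are an assumption of this sketch.
enum TokFlags : std::uint32_t {
	TF_Operator        = 1u << 0,
	TF_Assign          = 1u << 1,
	TF_Preprocess      = 1u << 2,
	TF_Preprocess_Cond = 1u << 3,
	TF_Attribute       = 1u << 4,
	TF_AccessSpecifier = 1u << 5,
	TF_Specifier       = 1u << 6,
	TF_EndDefinition   = 1u << 7,
	TF_Formatting      = 1u << 8,
	TF_Literal         = 1u << 9
};

// Hypothetical reduced token that carries only the flags field.
struct TokenSketch { std::uint32_t Flags; };

// True when every bit in `mask` is set on the token.
inline bool has_flags( TokenSketch const& tok, std::uint32_t mask )
{
	return ( tok.Flags & mask ) == mask;
}
```

With a single bitfield, a token such as `=` can be marked as both an operator and an assignment, which a lone `IsAssign` boolean cannot express.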

I plan to replace IsAssign with a general flags field and properly keep track of all operator types instead of abstracting it away to `ETokType::Operator`.

@@ -58,7 +70,7 @@ The parser has a limited user interface, only specific types of definitions or s
Each public user interface procedure has the following format:

```cpp
CodeStruct parse_<definition type>( StrC def )
<code type> parse_<definition type>( StrC def )
{
	check_parse_args( def );
	using namespace Parser;
@@ -76,14 +88,168 @@ The most top-level parsing procedure used for C/C++ file parsing is `parse_globa

It uses a helper procedure called `parse_global_nspace`.

Each internal procedure will be
Each internal procedure will have the following format:

## parse_global_nspace
```cpp
internal
<code type> parse_<definition_type>( <empty or contextual params> )
{
	push_scope();

	...

	<code type> result = (<code type>) make_code();
	...

	Context.pop();
	return result;
}
```
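The push/pop pairing in this format can be sketched as a small scope stack. This is an illustration only: `ParseContextSketch`, the macro form of `push_scope`, and the two example procedures are assumptions, not the library's implementation.

```cpp
#include <cassert>
#include <vector>

// Hypothetical parser context that records which parse procedure is active.
struct ParseContextSketch {
	std::vector<char const*> Scopes;

	void push( char const* proc_name ) { Scopes.push_back( proc_name ); }
	void pop()                         { Scopes.pop_back(); }
};

static ParseContextSketch Context;

// Stand-in for the push_scope() helper: records the calling procedure's name.
#define push_scope() Context.push( __func__ )

// Innermost procedure: reports how deep the scope stack is while it runs.
inline int parse_inner_sketch()
{
	push_scope();
	int depth = static_cast<int>( Context.Scopes.size() );
	Context.pop();
	return depth;
}

// Outer procedure: pushes its own scope, then delegates to the inner one.
inline int parse_outer_sketch()
{
	push_scope();
	int inner_depth = parse_inner_sketch();
	Context.pop();
	return inner_depth;
}
```

Because every procedure pops what it pushed before returning, the stack always mirrors the chain of active parse procedures, which is useful when reporting where a parse failure occurred.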

Below is an outline of the general algorithm used for these internal procedures. The intention is to provide a basic briefing to aid the user in traversing the actual code definitions. These appear in the same order as they are in the `parser.cpp` file.

## `parse_array_decl`

1. Check if it's an array declaration with no expression.
    1. Consume and return empty array declaration
2. Opening square bracket
3. Consume expression
4. Closing square bracket
5. If adjacent opening bracket
    1. Repeat array declaration parse until no brackets remain
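
The steps above can be sketched over a plain token list. This is a simplified illustration, not the library's code: `TokCursor` and `parse_array_decl_sketch` are hypothetical, tokens are plain strings, and nested brackets inside the expression are not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token cursor: plain strings instead of the library's Token struct.
struct TokCursor {
	std::vector<std::string> Toks;
	std::size_t              Idx;

	bool at( char const* text ) const { return Idx < Toks.size() && Toks[Idx] == text; }
};

// Collects one entry per dimension: the expression text, or "" for `[]`.
// Returns false when the brackets are malformed.
inline bool parse_array_decl_sketch( TokCursor& cur, std::vector<std::string>& dims )
{
	// 1. Array declaration with no expression: consume `[` `]` and return.
	if ( cur.at( "[" ) && cur.Idx + 1 < cur.Toks.size() && cur.Toks[cur.Idx + 1] == "]" )
	{
		cur.Idx += 2;
		dims.push_back( "" );
		return true;
	}

	// 2. Opening square bracket.
	if ( ! cur.at( "[" ) )
		return false;
	++cur.Idx;

	// 3. Consume the expression up to the closing bracket.
	std::string expr;
	while ( cur.Idx < cur.Toks.size() && ! cur.at( "]" ) )
	{
		if ( ! expr.empty() ) expr += " ";
		expr += cur.Toks[cur.Idx++];
	}

	// 4. Closing square bracket.
	if ( ! cur.at( "]" ) )
		return false;
	++cur.Idx;
	dims.push_back( expr );

	// 5. Adjacent opening bracket: repeat for the next dimension.
	if ( cur.at( "[" ) )
		return parse_array_decl_sketch( cur, dims );
	return true;
}
```

For the tokens of `[4][N + 1];` this collects the two dimensions `4` and `N + 1` and leaves the cursor on the `;`.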

## `parse_attributes`

1. Check for standard attribute
2. Check for GNU attribute
3. Check for MSVC attribute
4. Check for a token registered as an attribute

## `parse_class_struct`

## `parse_class_struct_body`

## `parse_comment`

## `parse_compilcated_definition`

## `parse_define`

## `parse_forward_or_definition`

## `parse_function_after_name`

## `parse_function_body`

## `parse_global_nspace`

1. Make sure the type provided to the helper function is one of `Namespace_Body`, `Global_Body`, `Export_Body`, or `Extern_Linkage_body`.
2. If it's not a `Global_Body`, eat the opening brace for the scope.
3.
3. `

## `parse_identifier`

## parse_type
## `parse_include`

## `parse_operator_after_ret_type`

## `parse_operator_function_or_variable`

## `parse_pragma`

## `parse_params`

## `parse_preprocess_cond`

## `parse_simple_preprocess`

## `parse_static_assert`

## `parse_template_args`

## `parse_variable_after_name`

## `parse_variable_declaration_list`

## `parse_class`

## `parse_constructor`

## `parse_destructor`

## `parse_enum`

## `parse_export_body`

## `parse_extern_link_body`

## `parse_extern_link`

## `parse_friend`

## `parse_function`

## `parse_namespace`

## `parse_operator`

## `parse_operator_cast`

## `parse_struct`

## `parse_template`

## `parse_type`

## `parse_typedef`

1. Check for export module specifier
2. typedef keyword
3. If it's a preprocess macro: Get the macro name
4.

## `parse_union`

1. Check for export module specifier
2. union keyword
3. `parse_attributes`
4. Check for identifier
5. Parse the body (Possible options):
    1. Newline
    2. Comment
    3. Decl_Class
    4. Decl_Enum
    5. Decl_Struct
    6. Decl_Union
    7. Preprocess_Define
    8. Preprocess_Conditional
    9. Preprocess_Macro
    10. Preprocess_Pragma
    11. Unsupported preprocess directive
    12. Variable
6. If it's not an inplace definition: End Statement

## `parse_using`

1. Check for export module specifier
2. using keyword
3. Check to see if it's a using namespace
4. Get the identifier
5. If it's a regular using declaration:
    1. `parse_attributes`
    2. `parse_type`
    3. `parse_array_decl`
6. End statement
7. Check for inline comment
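
A minimal sketch of this flow over plain string tokens follows. It is an illustration under stated assumptions: `UsingSketch` and `parse_using_sketch` are hypothetical, a regular using declaration is assumed to be recognised by a `=` after the identifier, the attribute, type, and array steps are collapsed into collecting the tokens up to the `;`, and the inline comment step is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type; not the library's CodeUsing.
struct UsingSketch {
	bool        IsExport;
	bool        IsNamespace;
	std::string Name;
	std::string UnderlyingType;  // empty unless this is an alias declaration
};

inline bool parse_using_sketch( std::vector<std::string> const& toks, UsingSketch& out )
{
	std::size_t idx = 0;
	auto at = [&]( char const* text ) { return idx < toks.size() && toks[idx] == text; };

	out = UsingSketch{};

	// 1. Check for export module specifier.
	if ( at( "export" ) ) { out.IsExport = true; ++idx; }

	// 2. using keyword.
	if ( ! at( "using" ) )
		return false;
	++idx;

	// 3. Check to see if it's a using namespace.
	if ( at( "namespace" ) ) { out.IsNamespace = true; ++idx; }

	// 4. Get the identifier.
	if ( idx >= toks.size() )
		return false;
	out.Name = toks[idx++];

	// 5. Regular using declaration (assumed here to be marked by `=`):
	//    collect the aliased type's tokens up to the end of the statement.
	if ( ! out.IsNamespace && at( "=" ) )
	{
		++idx;
		while ( idx < toks.size() && ! at( ";" ) )
		{
			if ( ! out.UnderlyingType.empty() ) out.UnderlyingType += " ";
			out.UnderlyingType += toks[idx++];
		}
	}

	// 6. End statement.
	return at( ";" );
}
```

The same procedure covers both `using namespace gen;` and `using Code = AST*;`; only step 5 differs between them.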

## `parse_variable`

1. Check for export module specifier
2. `parse_attributes`
3. `parse specifiers`
4. `parse_type`
5. `parse_identifier`
6. `parse_variable_after_name`
@@ -65,7 +65,7 @@ As mentioned in root readme, the user is provided Code objects by calling the co
The AST is managed by the library and provided to the user via its interface.
However, the user may specify memory configuration.

Data layout of AST struct:
Data layout of AST struct (subject to heavy change with the upcoming redesign):

```cpp
union {