Reorganization of parser, refactor of parse_type( bool* ) and progression of parser docs

Wanted to make parser implementation easier to sift through, so I emphasized alphabetical order more.

Since I couldn't just strip whitespace from typenames I decided to make the parse_type more aware of the typename's components if it was a function signature.
This ofc lead to the dark & damp hell that is parsing typenames.

Also made initial implementation to support parsing decltype within a typename signature..

The test failure for the singleheader is still a thing, these changes have not addressed that.
This commit is contained in:
Edward R. Gonzalez 2023-09-05 01:44:04 -04:00
parent 3868e1e811
commit 3e249d9bc5
10 changed files with 1752 additions and 1380 deletions

View File

@ -24,7 +24,9 @@
"filesystem": "cpp", "filesystem": "cpp",
"format": "cpp", "format": "cpp",
"ratio": "cpp", "ratio": "cpp",
"xstring": "cpp" "xstring": "cpp",
"functional": "cpp",
"vector": "cpp"
}, },
"C_Cpp.intelliSenseEngineFallback": "disabled", "C_Cpp.intelliSenseEngineFallback": "disabled",
"mesonbuild.configureOnOpen": true, "mesonbuild.configureOnOpen": true,

View File

@ -561,6 +561,8 @@ Fields:
```cpp ```cpp
CodeAttributes Attributes; CodeAttributes Attributes;
CodeSpecifiers Specs; CodeSpecifiers Specs;
CodeReturnType ReturnType;
CodeParam Params;
Code ArrExpr; Code ArrExpr;
Code Prev; Code Prev;
Code Next; Code Next;

91
docs/Parser_Algo.md Normal file
View File

@ -0,0 +1,91 @@
# Parser's Algorithim
gencpp uses a hand-written recursive descent parser. Both the lexer and parser handle a full C/C++ file in a single pass.
## Notable implementation background
### Lexer
The lex procedure does the lexical pass of content provided as a `StrC` type.
The tokens are stored (for now) in `gen::Parser::Tokens`.
Fields:
```cpp
Array<Token> Arr;
s32 Idx;
```
What token types are supported can be found in [ETokType.csv](../project/enums/ETokType.csv) you can also find the token types in [ETokType.h](../project/components/gen/etoktype.cpp) , which is the generated enum from the csv file.
Tokens are defined with the struct `gen::Parser::Token`:
Fields:
```cpp
char const* Text;
sptr Length;
TokType Type;
s32 Line;
s32 Column;
bool IsAssign;
```
`IsAssign` is a flag that is set when the token is an assignment operator. Which is used for various purposes:
* Using statment assignment
* Parameter argument default value assignment
* Variable declaration initialization assignment
I plan to replace IsAssign with a general flags field and properly keep track of all operator types instead of abstracting it away to `ETokType::Operator`.
Traversing the tokens is done with the following interface macros:
| Macro | Description |
| --- | --- |
| `currtok_noskip` | Get the current token without skipping whitespace |
| `currtok` | Get the current token, skip any whitespace tokens |
| `prevtok` | Get the previous token (does not skip whitespace) |
| `nexttok` | Get the next token (does not skip whitespace) |
| `eat( Token Type )` | Check to see if the current token is of the given type, if so, advance Token's index to the next token |
| `left` | Get the number of tokens left in the token array |
| `check_noskip` | Check to see if the current token is of the given type, without skipping whitespace |
| `check` | Check to see if the current token is of the given type, skip any whitespace tokens |
### Parser
The parser has a limited user interface, only specific types of definitions or statements are expected to be provided by the user directly when using to construct an AST dynamically (See SOA for example). It however does attempt to provide capability to parse a full C/C++ from production codebases.
Each public user interface procedure has the following format:
```cpp
CodeStruct parse_<definition type>( StrC def )
{
check_parse_args( def );
using namespace Parser;
TokArray toks = lex( def );
if ( toks.Arr == nullptr )
return CodeInvalid;
// Parse the tokens and return a constructed AST using internal procedures
...
}
```
The most top-level parsing procedure used for C/C++ file parsing is `parse_global_body`:
It uses a helper procedure called `parse_global_nspace`.
Each internal procedure will be
## parse_global_nspace
1. Make sure the type provided to the helper function is a `Namespace_Body`, `Global_Body`, `Export_Body`, `Extern_Linkage_body`.
2. If its not a `Global_Body` eat the opening brace for the scope.
3.
## parse_type
This is the worst part of the parser. Because other than actual expression values in C++, typenames are the second worst thing to parse in the langauge.

View File

@ -1,11 +1,12 @@
# Parsing # Parsing
The library features a naive parser tailored for only what the library needs to construct the supported syntax of C++ into its AST. The library features a naive parser tailored for only what the library needs to construct the supported syntax of C++ into its AST.
This parser does not, and should not do the compiler's job. By only supporting this minimal set of features, the parser is kept (so far) around 5000 loc. This parser does not, and should not do the compiler's job. By only supporting this minimal set of features, the parser is kept (so far) around 5000 loc.
You can think of this parser of a frontend parser vs a semantic parser. Its intuitively similar to WYSIWYG. What you precerive as the syntax from the user-side before the compiler gets a hold of it, is what you get. You can think of this parser of a frontend parser vs a semantic parser. Its intuitively similar to WYSIWYG. What you precerive as the syntax from the user-side before the compiler gets a hold of it, is what you get.
The parsing implementation supports the following for the user: User exposed interface:
```cpp ```cpp
CodeClass parse_class ( StrC class_def ); CodeClass parse_class ( StrC class_def );
@ -55,9 +56,9 @@ Any preprocessor definition abuse that changes the syntax of the core language i
Exceptions: Exceptions:
* function signatures are allowed for a preprocessed macro: `neverinline MACRO() { ... }` * function signatures are allowed for a preprocessed macro: `neverinline MACRO() { ... }`
* Disable with: `#define GEN_PARSER_DISABLE_MACRO_FUNCTION_SIGNATURES` * Disable with: `#define GEN_PARSER_DISABLE_MACRO_FUNCTION_SIGNATURES`
* typedefs allow for a preprocessed macro: `typedef MACRO();` * typedefs allow for a preprocessed macro: `typedef MACRO();`
* Disable with: `#define GEN_PARSER_DISABLE_MACRO_TYPEDEF` * Disable with: `#define GEN_PARSER_DISABLE_MACRO_TYPEDEF`
*(See functions `parse_operator_function_or_variable` and `parse_typedef` )* *(See functions `parse_operator_function_or_variable` and `parse_typedef` )*
@ -75,8 +76,6 @@ The lexing and parsing takes shortcuts from whats expected in the standard.
* Assumed to *come before specifiers* (`const`, `constexpr`, `extern`, `static`, etc) for a function * Assumed to *come before specifiers* (`const`, `constexpr`, `extern`, `static`, etc) for a function
* Or in the usual spot for class, structs, (*right after the declaration keyword*) * Or in the usual spot for class, structs, (*right after the declaration keyword*)
* typedefs have attributes with the type (`parse_type`) * typedefs have attributes with the type (`parse_type`)
* As a general rule; if its not available from the upfront constructors, its not available in the parsing constructors.
* *Upfront constructors are not necessarily used in the parsing constructors, this is just a good metric to know what can be parsed.*
* Parsing attributes can be extended to support user defined macros by defining `GEN_DEFINE_ATTRIBUTE_TOKENS` (see `gen.hpp` for the formatting) * Parsing attributes can be extended to support user defined macros by defining `GEN_DEFINE_ATTRIBUTE_TOKENS` (see `gen.hpp` for the formatting)
Empty lines used throughout the file are preserved for formatting purposes during ast serialization. Empty lines used throughout the file are preserved for formatting purposes during ast serialization.

View File

@ -773,7 +773,8 @@ String AST::to_string()
result.append( "typedef "); result.append( "typedef ");
if ( IsFunction ) // Determines if the typedef is a function typename
if ( UnderlyingType->ReturnType )
result.append( UnderlyingType->to_string() ); result.append( UnderlyingType->to_string() );
else else
result.append_fmt( "%S %S", UnderlyingType->to_string(), Name ); result.append_fmt( "%S %S", UnderlyingType->to_string(), Name );
@ -796,21 +797,45 @@ String AST::to_string()
case Typename: case Typename:
{ {
if ( Attributes || Specs ) #if GEN_USE_NEW_TYPENAME_PARSING
if ( ReturnType && Params )
{ {
if ( Attributes ) if ( Attributes )
result.append_fmt( "%S ", Attributes->to_string() ); result.append_fmt( "%S ", Attributes->to_string() );
if ( Specs )
result.append_fmt( "%S %S", Name, Specs->to_string() );
else else
result.append_fmt( "%S", Name ); {
if ( Specs )
result.append_fmt( "%S ( %S ) ( %S ) %S", ReturnType->to_string(), Name, Params->to_string(), Specs->to_string() );
else
result.append_fmt( "%S ( %S ) ( %S )", ReturnType->to_string(), Name, Params->to_string() );
}
break;
} }
else #else
if ( ReturnType && Params )
{ {
result.append_fmt( "%S", Name ); if ( Attributes )
result.append_fmt( "%S ", Attributes->to_string() );
else
{
if ( Specs )
result.append_fmt( "%S %S ( %S ) %S", ReturnType->to_string(), Name, Params->to_string(), Specs->to_string() );
else
result.append_fmt( "%S %S ( %S )", ReturnType->to_string(), Name, Params->to_string() );
}
break;
} }
#endif
if ( Attributes )
result.append_fmt( "%S ", Attributes->to_string() );
if ( Specs )
result.append_fmt( "%S %S", Name, Specs->to_string() );
else
result.append_fmt( "%S", Name );
if ( IsParamPack ) if ( IsParamPack )
result.append("..."); result.append("...");

View File

@ -6,11 +6,6 @@
#include "gen/especifier.hpp" #include "gen/especifier.hpp"
#endif #endif
namespace Parser
{
struct Token;
}
struct AST; struct AST;
struct AST_Body; struct AST_Body;
struct AST_Attributes; struct AST_Attributes;
@ -234,12 +229,15 @@ struct AST
union { union {
struct struct
{ {
AST* InlineCmt; // Class, Constructor, Destructor, Enum, Friend, Functon, Operator, OpCast, Struct, Typedef, Using, Variable union {
AST* Attributes; // Class, Enum, Function, Struct, Typedef, Union, Using, Variable AST* InlineCmt; // Class, Constructor, Destructor, Enum, Friend, Functon, Operator, OpCast, Struct, Typedef, Using, Variable
AST* Specs; // Destructor, Function, Operator, Typename, Variable AST* SpecsFuncSuffix; // Only used with typenames, to store the function suffix if typename is function signature.
};
AST* Attributes; // Class, Enum, Function, Struct, Typedef, Union, Using, Variable
AST* Specs; // Destructor, Function, Operator, Typename, Variable
union { union {
AST* InitializerList; // Constructor AST* InitializerList; // Constructor
AST* ParentType; // Class, Struct AST* ParentType; // Class, Struct, ParentType->Next has a possible list of interfaces.
AST* ReturnType; // Function, Operator AST* ReturnType; // Function, Operator
AST* UnderlyingType; // Enum, Typedef AST* UnderlyingType; // Enum, Typedef
AST* ValueType; // Parameter, Variable AST* ValueType; // Parameter, Variable
@ -249,13 +247,13 @@ struct AST
AST* Params; // Constructor, Function, Operator, Template AST* Params; // Constructor, Function, Operator, Template
}; };
union { union {
AST* ArrExpr; // Typename AST* ArrExpr; // Typename
AST* Body; // Class, Constructr, Destructor, Enum, Function, Namespace, Struct, Union AST* Body; // Class, Constructr, Destructor, Enum, Function, Namespace, Struct, Union
AST* Declaration; // Friend, Template AST* Declaration; // Friend, Template
AST* Value; // Parameter, Variable AST* Value; // Parameter, Variable
}; };
}; };
StringCached Content; // Attributes, Comment, Execution, Include StringCached Content; // Attributes, Comment, Execution, Include
SpecifierT ArrSpecs[AST::ArrSpecs_Cap]; // Specifiers SpecifierT ArrSpecs[AST::ArrSpecs_Cap]; // Specifiers
}; };
union { union {
@ -286,12 +284,15 @@ struct AST_POD
union { union {
struct struct
{ {
AST* InlineCmt; // Class, Constructor, Destructor, Enum, Friend, Functon, Operator, OpCast, Struct, Typedef, Using, Variable union {
AST* Attributes; // Class, Enum, Function, Struct, Typename, Union, Using, Variable AST* InlineCmt; // Class, Constructor, Destructor, Enum, Friend, Functon, Operator, OpCast, Struct, Typedef, Using, Variable
AST* Specs; // Function, Operator, Typename, Variable AST* SpecsFuncSuffix; // Only used with typenames, to store the function suffix if typename is function signature.
};
AST* Attributes; // Class, Enum, Function, Struct, Typename, Union, Using, Variable
AST* Specs; // Function, Operator, Typename, Variable
union { union {
AST* InitializerList; // Constructor AST* InitializerList; // Constructor
AST* ParentType; // Class, Struct AST* ParentType; // Class, Struct, ParentType->Next has a possible list of interfaces.
AST* ReturnType; // Function, Operator AST* ReturnType; // Function, Operator
AST* UnderlyingType; // Enum, Typedef AST* UnderlyingType; // Enum, Typedef
AST* ValueType; // Parameter, Variable AST* ValueType; // Parameter, Variable
@ -301,13 +302,13 @@ struct AST_POD
AST* Params; // Function, Operator, Template AST* Params; // Function, Operator, Template
}; };
union { union {
AST* ArrExpr; // Type Symbol AST* ArrExpr; // Type Symbol
AST* Body; // Class, Constructr, Destructor, Enum, Function, Namespace, Struct, Union AST* Body; // Class, Constructr, Destructor, Enum, Function, Namespace, Struct, Union
AST* Declaration; // Friend, Template AST* Declaration; // Friend, Template
AST* Value; // Parameter, Variable AST* Value; // Parameter, Variable
}; };
}; };
StringCached Content; // Attributes, Comment, Execution, Include StringCached Content; // Attributes, Comment, Execution, Include
SpecifierT ArrSpecs[AST::ArrSpecs_Cap]; // Specifiers SpecifierT ArrSpecs[AST::ArrSpecs_Cap]; // Specifiers
}; };
union { union {

View File

@ -472,11 +472,11 @@ struct AST_Type
char _PAD_[ sizeof(SpecifierT) * AST::ArrSpecs_Cap ]; char _PAD_[ sizeof(SpecifierT) * AST::ArrSpecs_Cap ];
struct struct
{ {
char _PAD_CMT_[ sizeof(AST*) ]; CodeSpecifiers SpecsFuncSuffix; // Only used for function signatures
CodeAttributes Attributes; CodeAttributes Attributes;
CodeSpecifiers Specs; CodeSpecifiers Specs;
CodeType ReturnType; // Only used for function signatures CodeType ReturnType; // Only used for function signatures
CodeParam Params; // Only used for function signatures CodeParam Params; // Only used for function signatures
Code ArrExpr; Code ArrExpr;
}; };
}; };

View File

@ -31,6 +31,7 @@ namespace ESpecifier
Virtual, Virtual,
Const, Const,
Final, Final,
NoExceptions,
Override, Override,
Pure, Pure,
NumSpecifiers NumSpecifiers
@ -67,6 +68,7 @@ namespace ESpecifier
{ sizeof( "virtual" ), "virtual" }, { sizeof( "virtual" ), "virtual" },
{ sizeof( "const" ), "const" }, { sizeof( "const" ), "const" },
{ sizeof( "final" ), "final" }, { sizeof( "final" ), "final" },
{ sizeof( "noexcept" ), "noexcept" },
{ sizeof( "override" ), "override" }, { sizeof( "override" ), "override" },
{ sizeof( "= 0" ), "= 0" }, { sizeof( "= 0" ), "= 0" },
}; };

File diff suppressed because it is too large Load Diff

View File

@ -21,5 +21,6 @@ Volatile, volatile
Virtual, virtual Virtual, virtual
Const, const Const, const
Final, final Final, final
NoExceptions, noexcept
Override, override Override, override
Pure, = 0 Pure, = 0

1 Invalid INVALID
21 Virtual virtual
22 Const const
23 Final final
24 NoExceptions noexcept
25 Override override
26 Pure = 0