add gencpp_c11 into thirdparty

This commit is contained in:
2025-02-01 14:40:32 -05:00
parent a04ce430ee
commit 967305e4b5
8 changed files with 31371 additions and 0 deletions
+59
View File
@@ -0,0 +1,59 @@
## Navigation
[Top](../Readme.md)
<- [docs - General](Readme.md)
## Current Design
`AST` is the actual managed node object for the library.
Its raw and really not meant to be used directly.
All user interaction must be with its pointer so the type they deal with is `AST*`.
In order to abstract away constant use of `AST*` its wrapped in a Code type which can be either:
When its the [C generated variant of the library](../gen_c_library/)
```c
typedef AST* Code;
typedef AST_<name>* Code<name>;
...
```
**or**
For C++:
```cpp
struct Code {
AST* ast;
};
struct Code<name> {
...
AST_<name>* ast;
};
```
The full definitions of all asts are within:
* [`ast.hpp`](../base/components/ast.hpp)
* [`ast_types.hpp`](../base/components/ast_types.hpp)
* [`code_types.hpp`](../base/components/code_types.hpp)
The C/C++ interface procedures are located with `ast.hpp` (for the Code type), and `code_types.hpp` for all others.
## Serialization
All code types can either serialize using a function of the pattern:
```c
StrBuilder <prefix>_to_strbuilder(Code code);
// or
<prefix>_to_strbuilder(Code code, StrBuilder& result);
```
Where the first generates strings allocated using Allocator_StringArena and the other appends an existing strings with their backed allocator.
Serialization of for the AST is defined for `Code` in [`ast.cpp`](../base/components/ast.cpp) with `code_to_strbuilder_ptr` & `code_to_strbuilder`.
Serializtion for the rest of the code types is within [`code_serialization.cpp`](../base/components/code_serialization.cpp).
Gencpp's serialization does not provide coherent formatting of the code. The user should use a formatter after serializing.
+788
View File
@@ -0,0 +1,788 @@
## Navigation
[Top](../Readme.md)
<- [docs - General](Readme.md)
# AST Types Documentation
While the Readme for docs covers the data layout per AST, this will focus on the AST types available, and their nuances.
## Body
These are containers representing a scope body of a definition that can be of the following `CodeType` type:
* Class_Body
* Enum_Body
* Export_Body
* Extern_Linkage_Body
* Function_Body
* Global_Body
* Namespace_Body
* Struct_Body
* Union_Body
Fields:
```cpp
StrCached Name;
Code Front;
Code Back;
Token* Tok;
Code Parent;
CodeType Type;
s32 NumEntries;
```
The `Front` member represents the start of the link list and `Back` the end.
NumEntries is the number of entries in the body.
Parent should have a compatible CodeType type for the type of definition used.
Serialization:
Will output only the entries, the braces are handled by the parent.
```cpp
<Front>
...
<Back>
```
## Attributes
Represent standard or vendor specific C/C++ attributes.
Fields:
```cpp
StrCached Content;
StrCached Name;
Code Prev;
Code Next;
Token* Tok;
Code Parent;
CodeType Type;
```
Serialization:
```cpp
<Content>
```
While the parser supports the `__declspec` and `__attribute__` syntax, the upfront constructor ( def_attributes ) must have the user specify the entire attribute, including the `[[]]`, `__declspec` or `__attribute__` parts.
## Comment
Stores a comment.
Fields:
```cpp
StrCached Content;
StrCached Name;
Code Prev;
Code Next;
Token* Tok;
Code Parent;
CodeType Type;
```
Serialization:
```cpp
<Content>
```
The parser will preserve comments found if residing with a body or in accepted inline-to-definition locations.
Otherwise they will be skipped by the TokArray::__eat and TokArray::current( skip foramtting enabled ) functions.
The upfront constructor: `def_comment` expects to recieve a comment without the `//` or `/* */` parts. It will add them during construction.
## Class & Struct
Fields:
```cpp
CodeComment InlineCmt; // Only supported by forward declarations
CodeAttributes Attributes;
CodeType ParentType;
CodeBody Body;
StrCached Name;
CodeType Prev;
CodeType Next;
Token* Tok;
Code Parent;
CodeType Type;
ModuleFlag ModuleFlags;
AccessSpec ParentAccess;
```
Serialization:
```cpp
// Class_Fwd
<ModuleFlags> <class/struct> <Name>; <InlineCmt>
// Class
<ModuleFlags> <class/struct> <Attributes> <Name> : <ParentAccess> <ParentType>, public <ParentType->Next>, ... <InlineCmt>
{
<Body>
};
```
You'll notice that only one parent type is supported only with parent access. This library only supports single inheritance, the rest are assumed to be interfaces and are given public acess specifiers.
## Constructor
Fields:
```cpp
CodeComment InlineCmt; // Only supported by forward declarations
Code InitializerList;
CodeParams Params;
Code Body;
StrCached Name;
Code Prev;
Code Next;
Token* Tok;
Code Parent;
CodeType Type;
```
Serialization:
```cpp
// Constructor_Fwd
<Specs> <Parent->Name>( <Params> ); <InlineCmt>
// Constructor
<Specs> <Parent->Name>( <Params> ) <InlineCmt>
: <InitializerList>
{
<Body>
}
// Constructor Source Implementation
<Specs> <Parent>::~<Parent->Name>( <Params> ) <Specs>
{
<Body>
}
```
## Define
Represents a preprocessor define
Fields:
```cpp
CodeDefineParams Params;
Code Body;
StrCached Name;
Code Prev;
Code Next;
Token* Tok;
Code Parent;
CodeType Type;
```
Serialization:
```cpp
#define <Name> <Content>
```
## DefineParams
Preprocessor define's parameters.
Fields:
```cpp
StrCached Name;
Code Last;
Code Next;
Token* Tok;
Code Parent;
CodeType Type;
s32 NumEntries;
```
Serialization:
```cpp
<Name>, <Next> ...
```
## Destructor
Fields:
```cpp
CodeComment InlineCmt;
CodeSpecifiers Specs;
Code Body;
StrCached Name;
Code Prev;
Code Next;
Token* Tok;
Code Parent;
CodeType Type;
```
Serialization:
```cpp
// Destructor_Fwd
<Specs> ~<Parent->Name>( <Params> ) <Specs>; <InlineCmt>
// Destructor
<Specs> ~<Parent->Name>( <Params> ) <Specs>
{
<Body>
}
// Destructor Source Implementation
<Specs> <Parent>::~<Parent->Name>( <Params> ) <Specs>
{
<Body>
}
```
## Enum
Fields:
```cpp
CodeComment InlineCmt;
CodeAttributes Attributes;
CodeType UnderlyingType;
Code UnderlyingTypeMacro;
CodeBody Body;
Code Prev;
Code Next;
Token* Tok;
Code Parent;
StrCached Name;
CodeType Type;
ModuleFlag ModuleFlags;
```
UnderlyingTypeMacro is a macro the library natively supports: `enum_underlying(type)` that is meant to behave as a wrapper for underlying type assignment.
The `enum_underlying_sig` is a `Str` global var that can be set which will be defined within `PreprocessorDefines` and used in `parser_parse_enum` to identify a valid macro.
Serialization:
```cpp
// Enum_Fwd
<ModuleFlags> enum class <Name> : <UnderlyingType> or <UnderlyingTypeMacro> ; <InlineCmt>
// Enum
<ModuleFlags> <enum or enum class> <Name> : <UnderlyingType> or <UnderlyingTypeMacro>
{
<Body>
};
```
## Execution
Just represents an execution body. Equivalent to an untyped body.
Will be obsolete when function body parsing is implemented.
Fields:
```cpp
StrCached Content;
StrCached Name;
Code Prev;
Code Next;
Token* Tok;
Code Parent;
CodeType Type;
```
Serialization:
```cpp
<Content>
```
## External Linkage (Extern)
Fields:
```cpp
CodeBody Body;
StrCached Name;
Code Prev;
Code Next;
Token* Tok;
Code Parent;
CodeType Type;
```
Serialization:
```cpp
extern "<Name>"
{
<Body>
}
```
## Include
Fields:
```cpp
StrCached Content;
StrCached Name;
Code Prev;
Code Next;
Code Parent;
Token* Tok;
CodeType Type;
```
Serialization:
```cpp
#include <Content>
```
## Friend
This library (until its necessary become some third-party library to do otherwise) does not support friend declarations with in-statment function definitions.
Fields:
```cpp
CodeComment InlineCmt;
Code Declaration;
StrCached Name;
Code Prev;
Code Next;
Token* Tok;
Code Parent;
CodeType Type;
```
Serialization:
```cpp
friend <Declaration>; <InlineCmt>
```
## Function
Fields:
```cpp
CodeComment InlineCmt;
CodeAttributes Attributes;
CodeSpecifiers Specs;
CodeType ReturnType;
CodeParams Params;
CodeBody Body;
StrCached Name;
Code Prev;
Code Next;
Token* Tok;
Code Parent;
CodeType Type;
ModuleFlag ModuleFlags;
```
Serialization:
```cpp
// Function_Fwd
<ModuleFlags> <Attributes> <Specs> <ReturnType> <Name>( <Params> ) <Specs>; <InlineCmt>
// Function
<ModuleFlags> <Attributes> <Specs> <ReturnType> <Name>( <Params> ) <Specs>
{
<Body>
}
```
## Module
Fields:
```cpp
StrCached Name;
Code Prev;
Code Next;
Token* Tok;
Code Parent;
CodeType Type;
ModuleFlag ModuleFlags;
```
Serialization:
```cpp
<ModuleFlags> module <Name>;
```
## Namespace
Fields:
```cpp
CodeBody Body;
StrCached Name;
Code Prev;
Code Next;
Token* Tok;
Code Parent;
CodeType Type;
ModuleFlag ModuleFlags;
```
Serialization:
```cpp
<ModuleFlags> namespace <Name>
{
<Body>
}
```
## Operator Overload (Operator)
Fields:
```cpp
CodeComment InlineCmt;
CodeAttributes Attributes;
CodeSpecifiers Specs;
CodeType ReturnType;
CodeParams Params;
CodeBody Body;
StrCached Name;
Code Prev;
Code Next;
Token* Tok;
Code Parent;
CodeType Type;
ModuleFlag ModuleFlags;
OperatorT Op;
```
Serialization:
```cpp
// Operator_Fwd
<ModuleFlags> <Attributes> <Specs> <ReturnType> operator <Op>( <Params> ) <Specs>; <InlineCmt>
// Operator
<ModuleFlags> <Attributes> <Specs> <ReturnType> <Name>operator <Op>( <Params> ) <Specs>
{
<Body>
}
```
## Operator Cast Overload ( User-Defined Type Conversion, OpCast )
Fields:
```cpp
CodeComment InlineCmt;
CodeSpecifiers Specs;
CodeType ValueType;
CodeBody Body;
StrCached Name;
Code Prev;
Code Next;
Token* Tok;
Code Parent;
CodeType Type;
```
Serialization:
```cpp
// Operator_Cast_Fwd
<Specs> operator <ValueType>() <Specs>; <InlineCmt>
// Operator_Cast
<Specs> <Name>operator <ValueType>() <Specs>
{
<Body>
}
```
## Parameters (AST_Params)
Fields:
```cpp
CodeType ValueType;
Code Macro;
Code Value;
Code PostNameMacro;
StrCached Name;
CodeParams Last;
CodeParams Next;
Token* Tok;
Code Parent;
CodeType Type;
s32 NumEntries;
```
Serialization:
```cpp
<Macro>, <Next> ... <Last>
<Macro> <ValueType> <Name> <PostNameMacro> = <Value>, <Next>... <Last>
```
## Pragma
Fields:
```cpp
StrCached Content;
StrCached Name;
Code Prev;
Code Next;
Token* Tok;
Code Parent;
CodeType Type;
```
Serialization:
```cpp
#pragma <Content>
```
## Preprocessor Conditional
Fields:
```cpp
StrCached Content;
StrCached Name;
Code Prev;
Code Next;
Token* Tok;
Code Parent;
CodeType Type;
```
Serialization:
```cpp
#<based off Type> <Content>
```
## Specifiers
Fields:
```cpp
SpecifierT ArrSpecs[ AST_ArrSpecs_Cap ];
CodeSpecifiers NextSpecs;
StrCached Name;
Code Prev;
Code Next;
Token* Tok;
Code Parent;
CodeType Type;
s32 NumEntries;
```
Serialization:
```cpp
<Spec>, ...
```
## Template
Fields:
```cpp
CodeParams Params;
Code Declaration;
StrCached Name;
Code Prev;
Code Next;
Token* Tok;
Code Parent;
CodeType Type;
ModuleFlag ModuleFlags;
```
Serialization:
```cpp
<ModuleFlags>
template< <Params> >
<Declaration>
```
## Typename
Typenames represent the type "symbol".
Fields:
```cpp
CodeAttributes Attributes;
CodeSpecifiers Specs;
CodeReturnType ReturnType;
CodeParams Params;
Code ArrExpr;
Code Prev;
Code Next;
Token* Tok;
Code Parent;
StrCached Name;
CodeType Type;
b32 IsParamPack;
ETypenameTag TypeTag;
```
Serialization:
```cpp
<Attributes> <TypeTag> <Name> <Specs> <IsParamPack ?: ...>
// Function
<Attributes> <ReturnType> <Name> <Params> <Specs>
```
`<Name>` currently has the full serialization of anything with
*Note: ArrExpr is not used in serialization by `typename_to_strbuilder_ref` its instead handled by a parent AST's serailization (variable, typedef, using).*
## Typedef
Behave as usual except function or macro typedefs.
Those (macros) don't use the underlying type field as everything was serialized under the Name field.
Fields:
```cpp
CodeComment InlineCmt;
Code UnderlyingType;
StrCached Name;
Code Prev;
Code Next;
Token* Tok
Code Parent;
CodeType Type;
ModuleFlag ModuleFlags;
b32 IsFunction;
```
Serialization:
```cpp
// Regular
<ModuleFlags> typedef <UnderlyingType> <Name> <UnderlyingType-ArrExpr>; <InlineCmt>
// Functions
// Currently:
<ModuleFlags> typedef <UnderlyingType (Serialized expression)>; <InlineCmt>
// Desired: Not handled yet
<ModuleFlags> typedef <UnderlyingType->ReturnType> UnderlyingType->Name> <UnderlyingType-ArrExpr> ( <UnderlyingType->Parameters> ); <InlineCmt>
<ModuleFlags> typedef <UnderlyingType->ReturnType> ( <Name->Namespace> for<Specs->has(Spec_Ptr) ?: *> <UnderlyingType->Name> <UnderlyingType-ArrExpr> ) ( <UnderlyingType->Parameters> ); <InlineCmt>
```
## Union
Fields:
```cpp
CodeAttributes Attributes;
CodeBody Body;
StrCached Name;
Code Prev;
Code Next;
Token* Tok;
Code Parent;
CodeType Type;
ModuleFlag ModuleFlags;
```
Serialization:
```cpp
<ModuleFlags> union <Attributes> <Name>
{
<Body>
}
```
## Using
Fields:
```cpp
CodeComment InlineCmt;
CodeAttributes Attributes;
CodeType UnderlyingType;
StrCached Name;
Code Prev;
Code Next;
Token* Tok;
Code Parent;
CodeType Type;
ModuleFlag ModuleFlags;
```
Serialization:
```cpp
// Regular
<ModuleFlags> using <Attributes> <Name> = <UnderlyingType>; <InlineCmt>
// Namespace
<ModuleFlags> using namespace <Name>; <InlineCmt>
```
## Variable
[Algo](./Parser_Algo.md:)
Fields:
```cpp
CodeComment InlineCmt;
CodeAttributes Attributes;
CodeSpecifiers Specs;
CodeType ValueType;
Code BitfieldSize;
Code Value;
StrCached Name;
CodeVar NextVar;
Code Prev;
Code Next;
Token* Tok;
Code Parent;
CodeType Type;
ModuleFlag ModuleFlags;
s32 VarParenthesizedInit;
```
Serialization:
```cpp
// Regular
<ModuleFlags> <Attributes> <Specs> <ValueType> <Name> = <Value>, NextVar ...; <InlineCmt>
// Bitfield
<ModuleFlags> <Attributes> <Specs> <ValueType> <Name> : <BitfieldSize> = <Value>, NextVar ...; <InlineCmt>
// VarParenthesizedInit
<Attributes> <Specs> <ValueType> <Name>( <Value>, NextVar ... ); <InlineCmt>
```
+44
View File
@@ -0,0 +1,44 @@
BSD 3-Clause License
Copyright (c) 2023, Edward R. Gonzalez
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
3. Neither the name of the copyright holder nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Source URL: https://github.com/Ed94/gencpp
Acknowledgements
* The dependencies for gencpp source are derived from the zpl library: https://github.com/zpl-c/zpl
Special thanks to:
* The Handmade Community.
* Casey Muratori, Ginger Bill (Bill Hall), Mr. 4th (Allen Webster), Ryan Fluery: Influnced conceptually how to handle staged metaprograming.
* Jonathan Blow: Jai's metaprogramming influenced the design of this library.
* My friends for putting up with discord spam on this library.
File diff suppressed because it is too large Load Diff
+159
View File
@@ -0,0 +1,159 @@
## Navigation
[Top](../Readme.md)
<- [docs - General](Readme.md)
# Parsing
The library features a naive single-pass parser, tailored for only what the library needs; for construction of C++ code into gencpp's AST for *"front-end"* meta-programming purposes.
This parser does not, and should not do the compiler's job. By only supporting this minimal set of features, the parser is kept (so far) around ~7000 loc. I hope to keep it under 10-15k loc worst case.
You can think of this parser as *frontend parser* vs a *semantic parser*. Its intuitively similar to WYSIWYG. What you ***perceive*** as the syntax from the user-side before the compiler gets a hold of it, is what you get.
User exposed interface:
```cpp
CodeClass parse_class ( Str class_def );
CodeConstructor parse_constructor ( Str constructor_def );
CodeDefine parse_define ( Str define_def );
CodeDestructor parse_destructor ( Str destructor_def );
CodeEnum parse_enum ( Str enum_def );
CodeBody parse_export_body ( Str export_def );
CodeExtern parse_extern_link ( Str exten_link_def );
CodeFriend parse_friend ( Str friend_def );
CodeFn parse_function ( Str fn_def );
CodeBody parse_global_body ( Str body_def );
CodeNS parse_namespace ( Str namespace_def );
CodeOperator parse_operator ( Str operator_def );
CodeOpCast parse_operator_cast( Str operator_def );
CodeStruct parse_struct ( Str struct_def );
CodeTemplate parse_template ( Str template_def );
CodeType parse_type ( Str type_def );
CodeTypedef parse_typedef ( Str typedef_def );
CodeUnion parse_union ( Str union_def );
CodeUsing parse_using ( Str using_def );
CodeVar parse_variable ( Str var_def );
```
To parse file buffers, use the `parse_global_body` function.
***Parsing will aggregate any tokens within a function body or expression statement to an untyped Code AST.***
Everything is done in one pass for both the preprocessor directives and the rest of the language.
The parser performs no macro expansion as the scope of gencpp feature-set is to only support the preprocessor for the goal of having rudimentary awareness of preprocessor ***conditionals***, ***defines***, ***includes***, and ***pragmas***.
The keywords supported for the preprocessor are:
* include
* define
* if
* ifdef
* elif
* endif
* pragma
Each directive `#` line is considered one preproecessor unit, and will be treated as one Preprocessor AST node.
If a directive is used with an unsupported keyword its will be processed as an untyped AST.
The preprocessor lines are stored as members of their associated scope they are parsed within ( Global, Namespace, Class/Struct ).
***Again: These ASTs will be considered members or entries of braced scope they reside within***
Any preprocessor definition abuse that changes the syntax of the core language is unsupported and will fail to parse if not kept within an execution scope (function body, or expression assignment).
Exceptions:
* variable definitions are allowed for a preprocessed macro `extern MACRO();`
* function definitions are allowed for a preprocessed macro: `neverinline MACRO() { ... }`
* Disable with: `#define GEN_PARSER_DISABLE_MACRO_FUNCTION_SIGNATURES`
* typedefs allow for a preprocessed macro: `typedef MACRO();`
* Disable with: `#define GEN_PARSER_DISABLE_MACRO_TYPEDEF`
* Macros can behave as typenames
* There is some macro support in parameters for functions or templates *(Specifically added to support parsing Unreal Engine source)*.
*(Exceptions are added on an on-demand basis)*
*(See functions `parse_operator_function_or_variable` and `parse_typedef` )*
Adding your own exceptions is possible by simply modifying the parser to allow for the syntax you need.
*Note: You could interpret this strictness as a feature. This would allow the user to see if their codebase or a third-party's codebase contains some egregious preprocessor abuse.*
Macros used within a file should be registered by the user before parsing. This can be done two ways:
1. The register macro interface within [interface.hpp](../base/components/interface.hpp).
2. Using `def_define` to create a CodeDefine and making sure to not set `opts.dont_register_to_preprocess_macros` to `true`.
## Registering macros
While the registeration of macros in the meta-program's side for parsing can be considered tedius, its necessary for the parser to accurately resolve the macros intent in one pass (and it provides some hygenics by verifying that they are used as intended).
The following can be used to register a macro:
```c
GEN_API void register_macro( Macro macro );
GEN_API void register_macros( s32 num, ... );
GEN_API void register_macros_arr( s32 num, Macro* macros );
```
The Macro typename is defined with the following in [parser_types.hpp](../base/components/parser_types.hpp):
```c
struct Macro
{
StrCached Name;
MacroType Type;
MacroFlags Flags;
};
```
The macro can be designated one of the following types:
* `MT_Expression`: Intended to resolve to an expression expansion.
* `MT_Statement`: Intended to resolve an statement expansion.
* `MT_Typename`: Intended to resolve to a typename.
Additioonally tthe following flags may be set:
* `MF_Functional`: The macro intended to be passed arguments are at least have the calling `()` as part of its usage.
* `MF_Expects_Body`: The parser should expect a braced-body `{ ... }` after the macro signature `<name> <params>`
* `MF_Allow_As_Identifier`: Will allow the macro to be an acceptable token/s when an `Tok_Identifier` is expected.
* `MF_Allow_As_Attribute`: Will allow the macro to be an acceptable token/s when an attribute token/s is expected.
* `MF_Allow_As_Definition`: Will allow the macro be an acceptable token/s when the parser expects a declartion or definition to resolve after attributes or specifiers have been identified beforehand.
* This flag requires that the macro is of type `MT_Statement` to make any sense of usage.
If a macro is not defined the following warning will be issued if `GEN_BUILD_DEBUG=1` during lexing within [lexer.cpp](../base/components/lexer.cpp) - `lex_preprocessor_define`:
```c
log_fmt("Warning: '%S' was not registered before the lexer processed its #define directive, it will be registered as a expression macro\n"
, name.Text
);
```
Further within the same scope, the lexer will issue a warning if it detects a macro was not flagged as function but has an open parenthesis `(` token right after is name with no whitespace:
```c
log_fmt("Warning: %S registered macro is not flagged as functional yet the definition detects opening parenthesis '(' for arguments\n"
, name.Text
);
```
Macros are tracked using a `MacroTable Macros;` defined as a member of the library's `Context`.
```c
typedef HashTable(Macro) MacroTable;
```
## Notes
* Empty lines used throughout the file are preserved for formatting purposes during ast serialization (they have a dedicated Token: `Tok_NewLine`).
* Numeric literals are not checked for validity.
* The parse API treats any execution scope definitions with no validation and are turned into untyped Code ASTs. (There is a [todo](https://github.com/Ed94/gencpp/issues/49) to add support)
* *This includes the assignment of variables.*
* Attributes ( `[[]]` (standard), `__declspec` (Microsoft), or `__attribute__` (GNU) )
* Assumed to *come before specifiers* (`const`, `constexpr`, `extern`, `static`, etc) for a function or right afterthe return type.
* Or in the usual spot for class, structs, (*right after the declaration keyword*)
* typedefs have attributes with the type (`parse_type`)
* Parsing attributes can be extended to support user defined macros by defining `GEN_DEFINE_ATTRIBUTE_TOKENS` (see `gen.hpp` for the formatting)
* This is useful for example: parsing Unreal `Module_API` macros.
**The lexer & parser do not gracefully attempt to continue when it comes across incorrect code, and doesn't properly track errors into a listing (yet).**
+147
View File
@@ -0,0 +1,147 @@
## Navigation
[Top](../Readme.md)
* [docs](../docs/Readme.md)
# C Library Generation
`c_library.cpp` generates both *segemnted* and *singleheader* variants of the library compliant with C11.
The output will be in the `gen_segmented/gen` directory (if the directory does not exist, it will create it).
If using the library's provided build scripts:
```ps1
.\build.ps1 <compiler> <debug or omit> c_lib
```
To generate a static or dynamic library:
```ps1
.\build.ps1 <compiler> <debug or omit> c_lib_static c_lib_dyn
```
.
All free from tag identifiers will be prefixed with `gen_` or `GEN_` as the namespace. This can either be changed after generation with a `.refactor` script (or your preferred subst method), OR by modifying [c_library.refactor](./c_library.refactor).
**If c_library.refactor is modified you may need to modify c_library.cpp and its [components](./components/). As some of the container generation relies on that prefix.**
## Generation structure
1. Files are scanned in or parsed.
* If they are parsed, its due to requiring some changes to either naming, or adding additonal definitions (container generation, typedefs, etc).
2. All scanned or parsed code is refactored (identifiers substs) and/or formatted.
3. Singleheader generated.
4. Segmented headers and source generated.
## Templated container generation
The array and hashtable containers used across this library are generated using the following implementatioon:
* [containers.array.hpp](./components/containers.array.hpp)
* [containers.hashtable.hpp](./components/containers.hashtable.hpp)
These are functionally (and interface wise) equivalent to the library's `Array<Type>` `HashTable<Type>` within [containers.hpp](../base/dependencies/containers.hpp)
Both files follow the same pattern of providing three procedures:
* `gen_<container>_base` : Intended to be called once, defines universal "base" definitions.
* `gen_<container>` : Called per instatiation of the container for a given set of dependent args.
* `gen_<container>_generic_selection_interface` : Intended to be called once after all of the instantiated containers have finished generating. It will generate a set of generic selection macros as described by Macro Usage section below.
A simple `<container>_DefinitionCounter` is used to know how many instantiations of the template have occured. This is used to determine how to define `GENERIC_SLOT_<ID>_<functionID>` for the generic interface along with how many slots the `_Generic` macro will need to have generated.
## Macro Usage
For the most part macros are kept minimal with exception to `_Generic`...
The `_Generic` macro plays a key role in reducing direct need of the user to wrangle with mangled definition identifiers of 'templated' containers or for type coercion to map distinct data types to a common code path.
Many C11 libraries don't use it.. and, of those that do. they usually end up obfuscate it with excessive preprocessor abuse; Effort was put into minimizing how much of these macros are handled by the preprocessor vs gencpp itself.
*(I will be explaining this thing for the rest of this seciton along with gencpp c library's usage of it)*
The usual presentation (done bare) is the following:
```c
#define macro_that_selects_typeof_arg(arg, y) \
_Generic( (arg), \
int : some expression, \
double : some other expression, \
struct Whatnot : something else again, \
default : fallback expression \
)
```
Where `_Generic` can be considered the follwoing (psuedo-C):
```c
#define type_expr_pair(type, expr) type: expr
C_Expression _Generic( selector_arg, a_type_expr_pair, ... ) {
switch( typeof(selector_arg)) {
case a_type_expr_pair:
return a_type_expr_pari.expr;
...
default:
return default.expr;
}
}
```
The first `arg` of _Generic behaves as the "controlling expression" or the expression that resolves to a type which will dictate which of the following expressions provided after to `_Generic` will be resolved as the one used inline for the implemenation.
For this library's purposes we'll be using the functional macro equivalent *(if there is an exception I'll link it at the end of this section)*:
```c
#define macro_that_uses_selector_arg_for_resolving_a_fucntion( selecting_exp) \
_Generic( (selecting_exp), \
int : func_use_int, \
double : func_use_double, \
struct Whatnot : func_use_Whatnot, \
default : struct SIGNALS_FAILURE \
) (selecting_exp)
```
In this case, we directly added `(selecting_exp)` to the end there.. as we want to directly have the macro resolve to calling a resolved procedure. A default has been set to a struct as that leads to a neat compiler message that would otherwise be impossible beause static_assert is a statement and thus cannot be used within a slot.
Now, even with gencpp generating this type-expression table, we still need wrapper macros to achieve function 'overloading' for the templated containers as _Generic has a [significant drawback](https://www.chiark.greenend.org.uk/~sgtatham/quasiblog/c11-generic/):
> Discarded expressions still have to be semantically valid.
The only way to absolve this issue [(without resorting to nasty preprocessor hacks)](https://github.com/JacksonAllan/CC/blob/main/articles/Better_C_Generics_Part_1_The_Extendible_Generic.md) is with wrapping expressions in 'slot' resolving macros that do not expand if the slot is not defined:
```c
GEN_IF_MACRO_DEFINED_INCLUDE_THIS_SLOT( GENERIC_SLOT_1__function_sig )
```
`GENERIC_SLOT_1__function_sig` is our warpper of a "`int, func_use_int`" pair. The `GEN_IF_MACRO_DEFINED_INCLUDE_THIS_SLOT` is a verbse named macro to indicate that that pair will be expanded ***ONLY IF*** its defined.
So for any given templated container interface. Expect the follwoing (taken straight from generation, and just cleaned up formatting):
```c
#define gen_array_append( selector_arg, ... ) _Generic( \
(selector_arg ), \
GEN_IF_MACRO_DEFINED_INCLUDE_THIS_SLOT( GEN_GENERIC_SLOT_1__array_append ) \
GEN_IF_MACRO_DEFINED_INCLUDE_THIS_SLOT( GEN_GENERIC_SLOT_2__array_append ) \
GEN_IF_MACRO_DEFINED_INCLUDE_THIS_SLOT( GEN_GENERIC_SLOT_3__array_append ) \
GEN_IF_MACRO_DEFINED_INCLUDE_THIS_SLOT( GEN_GENERIC_SLOT_4__array_append ) \
GEN_IF_MACRO_DEFINED_INCLUDE_THIS_SLOT( GEN_GENERIC_SLOT_5__array_append ) \
GEN_IF_MACRO_DEFINED_INCLUDE_THIS_SLOT( GEN_GENERIC_SLOT_6__array_append ) \
GEN_IF_MACRO_DEFINED_INCLUDE_THIS_SLOT( GEN_GENERIC_SLOT_7__array_append ) \
GEN_IF_MACRO_DEFINED_INCLUDE_THIS_SLOT( GEN_GENERIC_SLOT_8__array_append ) \
GEN_IF_MACRO_DEFINED_INCLUDE_THIS_SLOT( GEN_GENERIC_SLOT_9__array_append ) \
GEN_IF_MACRO_DEFINED_INCLUDE_THIS_SLOT( GEN_GENERIC_SLOT_10__array_append ) \
default: gen_generic_selection_fail \
) GEN_RESOLVED_FUNCTION_CALL( &selector_arg, __VA_ARGS__ )
```
*Note(Ed): Unfortunately I cannot get clang-format to output these macros sanely like the above..*
*Eventually I'll add some basic builtin formatting but if the user has suggestions for something better I'm open ears...*
`GEN_RESOLVED_FUNCTION_CALL` is an empty define, its just to indicate that its intended to expand to a function call.
To see the the actual macro definitions used - see: [generic_macros.h](./components/generic_macros.h). They'll be injected right after the usual macros in the generated header file.
+453
View File
@@ -0,0 +1,453 @@
# General Docs
[Top](../Readme.md)
Contains:
* [AST_Design](./AST_Design.md): Overview of ASTs
* [AST Types](./AST_Types.md): Listing of all AST types along with their Code type interface.
* [Parsing](./Parsing.md): Overview of the parsing interface.
* [Parser Algo](./Parser_Algo.md): In-depth breakdown of the parser's implementation.
### *CURRENTLY UNSUPPORTED*
**There is no support for validating expressions.**
Its a [todo](https://github.com/Ed94/gencpp/issues/49)
**Only trivial template support is provided.**
The intention is for only simple, non-recursive substitution.
The parameters of the template are treated like regular parameter AST entries.
This means that the typename entry for the parameter AST would be either:
* `class`
* `typename`
* A fundamental type, function, or pointer type.
***Concepts and Constraints are not supported***
Its a [todo](https://github.com/Ed94/gencpp/issues/21)
### Feature Macros
* `GEN_DEFINE_ATTRIBUTE_TOKENS` : Allows user to define their own attribute macros for use in parsing.
* This can be generated using base.cpp.
* `GEN_DEFINE_LIBRARY_CORE_CONSTANTS` : Optional typename codes as they are non-standard to C/C++ and not necessary to library usage
* `GEN_DONT_ENFORCE_GEN_TIME_GUARD` : By default, the library ( gen.hpp/ gen.cpp ) expects the macro `GEN_TIME` to be defined, this disables that.
* `GEN_ENFORCE_STRONG_CODE_TYPES` : Enforces casts to filtered code types.
* `GEN_EXPOSE_BACKEND` : Will expose symbols meant for internal use only.
* `GEN_ROLL_OWN_DEPENDENCIES` : Optional override so that user may define the dependencies themselves.
* `GEN_DONT_ALLOW_INVALID_CODE` (Not implemented yet) : Will fail when an invalid code is constructed, parsed, or serialized.
* `GEN_C_LIKE_CPP` : Setting to `<true or 1>` Will prevent usage of function defnitions using references and structs with member functions. Structs will still have user-defined operator conversions, for-range support, and other operator overloads
### The Data & Interface
The library's persistent state is managed tracked by a context struct: `global Context* _ctx;` defined within [static_data.cpp](../base/components/static_data.cpp)
https://github.com/Ed94/gencpp/blob/967a044637f1615c709cb723dc61118fcc08dcdb/base/components/interface.hpp#L39-L97
The interface for the context:
* `init`: Initializtion
* `deinit`: De-initialization.
* `reset`: Clears the allocations, but doesn't free the memoery, then calls `init()` on `_ctx` again.
* `get_context`: Retreive the currently tracked context.
* `set_context`: Swap out the current tracked context.
#### Allocato usage
* `Allocator_DyanmicContainers`: Growing arrays, hash tables. (Unbounded sized containers)
* `Allocator_Pool`: Fixed-sized object allocations (ASTs, etc)
* `Allocator_StrCache`: StrCached allocations
* `Allocator_Temp`: Temporary alloations mostly intended for StrBuilder usage. Manually cleared by the user by their own discretion.
The allocator definitions used are exposed to the user incase they want to dictate memory usage
* Allocators are defined with the `AllocatorInfo` structure found in [`memory.hpp`](../base/dependencies/memory.hpp)
* Most of the work is just defining the allocation procedure:
```cpp
void* ( void* allocator_data, AllocType type, ssize size, ssize alignment, void* old_memory, ssize old_size, u64 flags );
```
For any allocator above that the user does not define before `init`, a fallback allocator will be assigned that utiizes the `fallback_allocator_proc` wtihin [interface.cpp](../base/components/interface.cpp).
As mentioned in root readme, the user is provided Code objects by calling the constructor's functions to generate them or find existing matches.
The AST is managed by the library and provided to the user via its interface.
However, the user may specifiy memory configuration.
[Data layout of AST struct (Subject to heavily change with upcoming todos)](../base/components/ast.hpp#L396-461)
https://github.com/Ed94/gencpp/blob/967a044637f1615c709cb723dc61118fcc08dcdb/base/components/ast.hpp#L369-L435
*`StringCahced` is a typedef for `Str` (a string slice), to denote it is an interned string*
*`CodeType` is enum taggin the type of code. Has an underlying type of `u32`*
*`OperatorT` is a typedef for `EOperator::Type` which has an underlying type of `u32`*
*`StrBuilder` is the dynamically allocating string builder type for the library*
AST widths are setup to be AST_POD_Size (128 bytes by default).
The width dictates how much the static array can hold before it must give way to using an allocated array:
```cpp
constexpr static
int AST_ArrSpecs_Cap =
(
AST_POD_Size
- sizeof(Code)
- sizeof(StrCached)
- sizeof(Code) * 2
- sizeof(Token*)
- sizeof(Code)
- sizeof(CodeType)
- sizeof(ModuleFlag)
- sizeof(u32)
)
/ sizeof(Specifier) - 1;
```
Data Notes:
* ASTs are wrapped for the user in a Code struct which is a wrapper for a AST* type.
* Code types have member symbols but their data layout is enforced to be POD types.
* This library treats memory failures as fatal.
* Cached Strings are stored in their own set of arenas. AST constructors use cached strings for names, and content.
* Strings used for serialization and file buffers are not contained by those used for cached strings.
* `_ctx->Allocator_Temp` is used.
* Its intended to generate the AST in one go and serialize after. The constructors and serializer are designed to be a "one pass, front to back" setup.
* Any modifcations to an existing AST should be to just construct another with the modifications done on-demand while traversing the AST (non-destructive).
The following CodeTypes are used which the user may optionally use strong typing with if they enable: `GEN_ENFORCE_STRONG_CODE_TYPES`
* CodeBody : Has support for `for : range` iterating across Code objects.
* CodeAttributes
* CodeComment
* CodeClass
* CodeConstructor
* CodeDefine
* CodeDefineParams
* CodeDestructor
* CodeEnum
* CodeExec
* CodeExtern
* CodeInclude
* CodeFriend
* CodeFn
* CodeModule
* CodeNS
* CodeOperator
* CodeOpCast : User defined member operator conversion
* CodeParams : Has support for `for : range` iterating across parameters.
* CodePreprocessCond
* CodePragma
* CodeSpecifiers : Has support for `for : range` iterating across specifiers.
* CodeStruct
* CodeTemplate
* CodeTypename
* CodeTypedef
* CodeUnion
* CodeUsing
* CodeVar
Each `struct Code<Name>` has an associated "filtered AST" with the naming convention: `AST_<CodeName>`
Unrelated fields of the AST for that node type are omitted and only necessary padding members are defined otherwise.
For the interface related to these code types see:
* [ast.hpp](../base/components/ast.hpp): Under the region pragma `Code C-Interface`
* [code_types.hpp](../base/components/code_types.hpp): Under the region pragma `Code C-Interface`. Additional functionlity for c++ will be within the struct definitions or at the end of the file.
## There are three categories of interfaces for Code AST generation & reflection
* Upfront
* Parsing
* Untyped
### Upfront Construction
All component ASTs must be previously constructed, and provided on creation of the code AST.
The construction will fail and return CodeInvalid otherwise.
Interface :``
* def_attributes
* *This is pre-appended right before the function symbol, or placed after the class or struct keyword for any flavor of attributes used.*
* *Its up to the user to use the desired attribute formatting: `[[]]` (standard), `__declspec` (Microsoft), or `__attribute__` (GNU).*
* def_comment
* def_class
* def_constructor
* def_define
* def_define_params
* def_destructor
* def_enum
* def_execution
* *This is equivalent to untyped_str, except that its intended for use only in execution scopes.*
* def_extern_link
* def_friend
* def_function
* def_include
* def_module
* def_namespace
* def_operator
* def_operator_cast
* def_param
* def_params
* def_pragma
* def_preprocess_cond
* def_specifier
* def_specifiers
* def_struct
* def_template
* def_type
* def_typedef
* def_union
* def_using
* def_using_namespace
* def_variable
Bodies:
* def_body
* def_class_body
* def_enum_body
* def_export_body
* def_extern_link_body
* def_function_body
* *Use this for operator bodies as well*
* def_global_body
* def_namespace_body
* def_struct_body
* def_union_body
Usage:
```cpp
<name> = def_<function type>( ... );
Code <name>
{
...
<name> = def_<function name>( ... );
}
```
All optional parmeters are defined within `struct Opts_def_<functon name>`. This was done to setup a [macro trick](https://x.com/vkrajacic/status/1749816169736073295) for default optional parameers in the C library:
```cpp
struct gen_Opts_def_struct
{
gen_CodeBody body;
gen_CodeTypename parent;
gen_AccessSpec parent_access;
gen_CodeAttributes attributes;
gen_CodeTypename* interfaces;
gen_s32 num_interfaces;
gen_ModuleFlag mflags;
};
typedef struct gen_Opts_def_struct gen_Opts_def_struct;
GEN_API gen_CodeClass gen_def__struct( gen_Str name, gen_Opts_def_struct opts GEN_PARAM_DEFAULT );
#define gen_def_struct( name, ... ) gen_def__struct( name, ( gen_Opts_def_struct ) { __VA_ARGS__ } )
```
In the C++ library, the `def_<funtion name>` is not wrapped in a macro.
When using the body functions, its recommended to use the args macro to auto determine the number of arguments for the varadic:
```cpp
def_global_body( args( ht_entry, array_ht_entry, hashtable ));
// instead of:
def_global_body( 3, ht_entry, array_ht_entry, hashtable );
```
If a more incremental approach is desired for the body ASTs, `Code def_body( CodeT type )` can be used to create an empty bodyss
When the members have been populated use: `code_validate_body` to verify that the members are valid entires for that type.
### Parse construction
A string provided to the API is parsed for the intended language construct.
Interface :
* parse_class
* parse_constructor
* parse_define
* parse_destructor
* parse_enum
* parse_export_body
* parse_extern_link
* parse_friend
* parse_function
* parse_global_body
* parse_namespace
* parse_operator
* parse_operator_cast
* parse_struct
* parse_template
* parse_type
* parse_typedef
* parse_union
* parse_using
* parse_variable
Usage:
```cpp
Code <name> = parse_<function name>( string with code );
Code <name> = def_<function name>( ..., parse_<function name>(
<string with code>
));
```
### Untyped constructions
Code ASTs are constructed using unvalidated strings.
Interface :
* token_fmt_va
* token_fmt
* untyped_str
* untyped_fmt
* untyped_token_fmt
During serialization any untyped Code AST has its string value directly injected inline of whatever context the content existed as an entry within.
Even though these are not validated from somewhat correct c/c++ syntax or components, it doesn't mean that Untyped code can be added as any component of a Code AST:
* Untyped code cannot have children, thus there cannot be recursive injection this way.
* Untyped code can only be a child of a parent of body AST, or for values of an assignment (ex: variable assignment).
These restrictions help prevent abuse of untyped code to some extent.
Usage Conventions:
```cpp
Code <name> = def_variable( <type>, <name>, untyped_<function name>(
<string with code>
));
Code <name> = untyped_str( code(
<some code without "" quotes>
));
```
Optionally, `code_str`, and `code_fmt` macros can be used so that the code macro doesn't have to be used:
```cpp
Code <name> = code_str( <some code without "" quotes > )
```
Template metaprogramming in the traditional sense becomes possible with the use of `token_fmt` and parse constructors:
```cpp
Str value = txt("Something");
char const* template_str = txt(
Code with <key> to replace with token_values
...
);
char const* gen_code_str = token_fmt( "key", value, template_str );
Code <name> = parse_<function name>( gen_code_str );
```
## Predefined Codes
The following are provided predefined by the library as they are commonly used:
* `enum_underlying_macro`
* `access_public`
* `access_protected`
* `access_private`
* `attrib_api_export`
* `attrib_api_import`
* `module_global_fragment`
* `module_private_fragment`
* `fmt_newline`
* `pragma_once`
* `param_varaidc` (Used for varadic definitions)
* `preprocess_else`
* `preprocess_endif`
* `spec_const`
* `spec_consteval`
* `spec_constexpr`
* `spec_constinit`
* `spec_extern_linkage` (extern)
* `spec_final`
* `spec_forceinline`
* `spec_global` (global macro)
* `spec_inline`
* `spec_internal_linkage` (internal macro)
* `spec_local_persist` (local_persist macro)
* `spec_mutable`
* `spec_neverinline`
* `spec_noexcept`
* `spec_override`
* `spec_ptr`
* `spec_pure`
* `spec_ref`
* `spec_register`
* `spec_rvalue`
* `spec_static_member` (static)
* `spec_thread_local`
* `spec_virtual`
* `spec_volatile`
* `t_empty` (Used for varaidc macros)
* `t_auto`
* `t_void`
* `t_int`
* `t_bool`
* `t_char`
* `t_wchar_t`
* `t_class`
* `t_typename`
Optionally the following may be defined if `GEN_DEFINE_LIBRARY_CODE_CONSTANTS` is defined
* `t_b32`
* `t_s8`
* `t_s16`
* `t_s32`
* `t_s64`
* `t_u8`
* `t_u16`
* `t_u32`
* `t_u64`
* `t_ssize` (ssize_t)
* `t_usize` (size_t)
* `t_f32`
* `t_f64`
## Extent of operator overload validation
The AST and constructors will be able to validate that the arguments provided for the operator type match the expected form:
* If return type must match a parameter
* If number of parameters is correct
* If added as a member symbol to a class or struct, that operator matches the requirements for the class (types match up)
* There is no support for validating new & delete operations (yet)
The user is responsible for making sure the code types provided are correct
and have the desired specifiers assigned to them beforehand.
## Code generation and modification
There are two provided auxiliary interfaces:
* Builder
* Scanner
### Builder is a similar object to the jai language's strbuilder_builder
* The purpose of it is to generate a file.
* A file is specified and opened for writing using the open( file_path) function.
* The code is provided via print( code ) function will be serialized to its buffer.
* When all serialization is finished, use the write() command to write the buffer to the file.
### Scanner
* The purpose is to scan or parse files
* Some with two basic functions to convert a fil to code: `scan_file` and `parse_file`
* `scan_file`: Merely grabs the file and stores it in an untyped Code.
* `parse_file`: Will parse the file using `parse_global_body` and return a `CodeBody`.
* Two basic functions for grabbing columns from a CSV: `parse_csv_one_column` and `parse_csv_two_columns`
+28500
View File
File diff suppressed because it is too large Load Diff