Updated docs

Edward R. Gonzalez 2023-08-08 11:56:42 -04:00
parent d2fc1d0a56
commit c7647ab00f
8 changed files with 109 additions and 111 deletions


@ -6,12 +6,11 @@ The library API is a composition of code element constructors.
These build up a code AST to then serialize with a file builder.
This code base attempts to follow the [handmade philosophy](https://handmade.network/manifesto),
it's not meant to be a black box metaprogramming utility, it's meant for the user to extend for their project domain.
it's not meant to be a black box metaprogramming utility, it should be easy to integrate into a user's project domain.
## Notes
The project has reached an *alpha* state; all current functionality works for the test cases, but it will most likely break in many other cases.
The [issues](https://github.com/Ed94/gencpp/issues) marked with v1.0 Feature indicate what's left before the library is considered feature complete.
A `natvis` and `natstepfilter` are provided in the scripts directory.
@ -25,7 +24,7 @@ A metaprogram is built to generate files before the main program is built. We'll
`gen.cpp`'s `main()` is defined as `gen_main()`, which the user will have to define once for their program. There they will dictate everything that should be generated.
In order to keep the locality of this code within the same files the following pattern may be used:
In order to keep the locality of this code within the same files the following pattern may be used (although this pattern isn't required at all):
Within `program.cpp` :
@ -41,6 +40,8 @@ u32 gen_main()
}
#endif
// "Stage" agnostic code.
#ifndef GEN_TIME
#include "program.gen.cpp"
@ -56,6 +57,7 @@ Example using each construction interface:
### Upfront
Validation and construction through a functional interface.
```cpp
Code t_uw = def_type( name(uw) );
@ -75,6 +77,8 @@ Code header;
### Parse
Validation through AST construction.
```cpp
Code header = parse_struct( code(
struct ArrayHeader
@ -89,6 +93,8 @@ Code header = parse_struct( code(
### Untyped
No validation, just glorified text injection.
```cpp
Code header = code_str(
struct ArrayHeader


@ -1,34 +1,36 @@
# Parsing
The library features a naive parser tailored for only what the library needs to construct the supported syntax of C++ into its AST.
This parser does not, and should not do the compiler's job. By only supporting this minimal set of features, the parser is kept under 5000 loc.
This parser does not, and should not do the compiler's job. By only supporting this minimal set of features, the parser is kept (so far) under 5000 loc.
The parsing implementation supports the following for the user:
```cpp
CodeClass parse_class ( StrC class_def );
CodeEnum parse_enum ( StrC enum_def );
CodeBody parse_export_body ( StrC export_def );
CodeExtern parse_extern_link ( StrC exten_link_def);
CodeFriend parse_friend ( StrC friend_def );
CodeFn parse_function ( StrC fn_def );
CodeBody parse_global_body ( StrC body_def );
CodeNS parse_namespace ( StrC namespace_def );
CodeOperator parse_operator ( StrC operator_def );
CodeOpCast parse_operator_cast( StrC operator_def );
CodeStruct parse_struct ( StrC struct_def );
CodeTemplate parse_template ( StrC template_def );
CodeType parse_type ( StrC type_def );
CodeTypedef parse_typedef ( StrC typedef_def );
CodeUnion parse_union ( StrC union_def );
CodeUsing parse_using ( StrC using_def );
CodeVar parse_variable ( StrC var_def );
CodeClass parse_class ( StrC class_def );
CodeConstructor parse_constructor ( StrC constructor_def );
CodeDestructor parse_destructor ( StrC destructor_def );
CodeEnum parse_enum ( StrC enum_def );
CodeBody parse_export_body ( StrC export_def );
CodeExtern parse_extern_link ( StrC exten_link_def );
CodeFriend parse_friend ( StrC friend_def );
CodeFn parse_function ( StrC fn_def );
CodeBody parse_global_body ( StrC body_def );
CodeNS parse_namespace ( StrC namespace_def );
CodeOperator parse_operator ( StrC operator_def );
CodeOpCast parse_operator_cast( StrC operator_def );
CodeStruct parse_struct ( StrC struct_def );
CodeTemplate parse_template ( StrC template_def );
CodeType parse_type ( StrC type_def );
CodeTypedef parse_typedef ( StrC typedef_def );
CodeUnion parse_union ( StrC union_def );
CodeUsing parse_using ( StrC using_def );
CodeVar parse_variable ( StrC var_def );
```
***Parsing will aggregate any tokens within a function body or expression statement to an untyped Code AST.***
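To make the aggregation idea concrete, here is a minimal, self-contained sketch. It does not use gencpp; `capture_untyped_body` is a hypothetical helper that only illustrates capturing a braced body as raw text instead of validating it. As a simplification, it ignores braces inside string literals and comments.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper (not part of gencpp): returns the raw text between the
// first '{' and its matching '}', without inspecting the statements inside.
std::string capture_untyped_body(std::string_view src)
{
    std::size_t open = src.find('{');
    if (open == std::string_view::npos)
        return {};

    int depth = 0;
    for (std::size_t i = open; i < src.size(); ++i)
    {
        if (src[i] == '{')
            ++depth;
        else if (src[i] == '}' && --depth == 0)
            return std::string(src.substr(open + 1, i - open - 1));
    }
    return {}; // Unbalanced braces: nothing to capture.
}
```

Calling `capture_untyped_body("void f() { if (x) { y(); } }")` yields the inner text with its nested braces intact, which is roughly the kind of content an untyped AST would hold.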
Everything is done in one pass for both the preprocessor directives and the rest of the language.
The parser performs no macro expansion, as the scope of gencpp's feature-set is to only support the preprocessor for the goal of having rudimentary awareness of preprocessor ***conditionals***, ***defines***, ***includes***, and ***`pragmas`***.
The parser performs no macro expansion, as the scope of gencpp's feature-set is to only support the preprocessor for the goal of having rudimentary awareness of preprocessor ***conditionals***, ***defines***, ***includes***, and ***pragmas***.
The keywords supported for the preprocessor are:
@ -42,11 +44,29 @@ The keywords supported for the preprocessor are:
* pragma
Each directive `#` line is considered one preprocessor unit, and will be treated as one Preprocessor AST. *These ASTs will be considered members or entries of the braced scope they reside within*.
All keywords except *include* are supported as members of a scope for a class/struct, global, or namespace body.
If a directive is used with an unsupported keyword it will be processed as an untyped AST.
The preprocessor lines are stored as members of their associated scope they are parsed within. ( Global, Namespace, Class/Struct )
Any preprocessor definition abuse that changes the syntax of the core language is unsupported and will fail to parse if not kept within an execution scope (function body, or expression assignment).
Exceptions:
Exceptions to the above rule (If it's too hard to keep track of, just follow the above notion):
* function signatures are allowed for a preprocessed macro: `neverinline MACRO() { ... }`
* typedefs allow for a preprocessed macro: `typedef MACRO();`
* Typedefs allow for a macro expansion to be defined after the keyword; Ex: `typedef GEN_FILE_OPEN_PROC( file_open_proc );`
*(See functions `parse_operator_function_or_variable` and `parse_typedef` )*
The lexing and parsing takes shortcuts from what's expected in the standard.
* Numeric literals are not checked for validity.
* The parse API performs no validation on execution scope definitions; they are turned into untyped Code ASTs.
* *This includes the assignment of variables.*
* Attributes ( `[[]]` (standard), `__declspec` (Microsoft), or `__attribute__` (GNU) )
* Assumed to *come before specifiers* (`const`, `constexpr`, `extern`, `static`, etc) for a function
* Or in the usual spot for class, structs, (*right after the declaration keyword*)
* typedefs have attributes with the type (`parse_type`)
* As a general rule; if it's not available from the upfront constructors, it's not available in the parsing constructors.
* *Upfront constructors are not necessarily used in the parsing constructors, this is just a good metric to know what can be parsed.*
* Parsing attributes can be extended to support user defined macros by defining `GEN_DEFINE_ATTRIBUTE_TOKENS` (see `gen.hpp` for the formatting)
Empty lines used throughout the file are preserved for formatting purposes during AST serialization.


@ -40,12 +40,6 @@ Otherwise the library is free of any templates.
### *WHAT IS NOT PROVIDED*
* Execution statement validation : Execution expressions are defined using the untyped AST.
* Lambdas (This naturally means it's unsupported)
* Non-trivial template validation support.
* RAII : This needs support for constructors/destructor parsing
* Haven't gotten around to it yet (it's in the GitHub issues)
Keywords kept from "Modern C++":
* constexpr : Great to store compile-time constants.
@ -55,13 +49,9 @@ Keywords kept from "Modern C++":
* import : ^^
* module : ^^
When it comes to expressions:
**There is no support for validating expressions.**
It's difficult to parse without enough benefits (At the metaprogramming level).
When it comes to templates:
**Only trivial template support is provided.**
The intention is for only simple, non-recursive substitution.
The parameters of the template are treated like regular parameter AST entries.
@ -78,7 +68,7 @@ Use at your own mental peril.
### The Data & Interface
As mentioned in [Usage](#usage), the user is provided Code objects by calling the constructor's functions to generate them or find existing matches.
As mentioned in the root readme, the user is provided Code objects by calling the constructor's functions to generate them or find existing matches.
The AST is managed by the library and provided to the user via its interface.
However, the user may specify memory configuration.
@ -89,39 +79,44 @@ Data layout of AST struct:
union {
struct
{
AST* Attributes; // Class, Enum, Function, Struct, Typedef, Union, Using, Variable
AST* Specs; // Function, Operator, Type symbol, Variable
AST* Attributes; // Class, Enum, Function, Struct, Typedef, Union, Using, Variable
AST* Specs; // Function, Operator, Type symbol, Variable
union {
AST* ParentType; // Class, Struct
AST* ReturnType; // Function, Operator
AST* UnderlyingType; // Enum, Typedef
AST* ValueType; // Parameter, Variable
AST* InitializerList; // Constructor, Destructor
AST* ParentType; // Class, Struct
AST* ReturnType; // Function, Operator
AST* UnderlyingType; // Enum, Typedef
AST* ValueType; // Parameter, Variable
};
AST* Params; // Function, Operator, Template
union {
AST* ArrExpr; // Type Symbol
AST* Body; // Class, Enum, Function, Namespace, Struct, Union
AST* Declaration; // Friend, Template
AST* Value; // Parameter, Variable
AST* BitfieldSize; // Variable (Class/Struct Data Member)
AST* Params; // Function, Operator, Template
};
union {
AST* ArrExpr; // Type Symbol
AST* Body; // Class, Constructor, Destructor, Enum, Function, Namespace, Struct, Union
AST* Declaration; // Friend, Template
AST* Value; // Parameter, Variable
};
};
StringCached Content; // Attributes, Comment, Execution, Include
StringCached Content; // Attributes, Comment, Execution, Include
SpecifierT ArrSpecs[AST::ArrSpecs_Cap]; // Specifiers
};
union {
AST* Prev;
AST* Front; // Used by CodeBody
AST* Last; // Used by CodeParam
AST* Front;
AST* Last;
};
union {
AST* Next;
AST* Back; // Used by CodeBody
AST* Back;
};
AST* Parent;
StringCached Name;
CodeT Type;
ModuleFlag ModuleFlags;
union {
b32 IsFunction; // Used by typedef to not serialize the name field.
OperatorT Op;
AccessSpec ParentAccess;
s32 NumEntries;
@ -145,7 +140,7 @@ uw ArrSpecs_Cap =
- sizeof(StringCached)
- sizeof(CodeT)
- sizeof(ModuleFlag)
- sizeof(s32)
- sizeof(u32)
)
/ sizeof(SpecifierT) -1; // -1 for 4 extra bytes (Odd num of AST*)
```
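The capacity expression above follows a general pattern: take a fixed node size, subtract the bytes used by the fields outside the payload union, and divide the remainder by the specifier size. The sketch below shows that pattern with stand-in names and sizes (`NodeBudget`, `HeaderBytes`, `Node`, `specs_cap` are all assumptions for illustration, not the library's real definitions).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// All names and sizes here are illustrative stand-ins, not gencpp's definitions.
using SpecifierT = std::uint32_t;
constexpr std::size_t NodeBudget = 128; // Assumed fixed node size in bytes.

struct Node;

// Bytes taken by the fields that live outside the payload union.
constexpr std::size_t HeaderBytes =
      sizeof(Node*) * 3           // Prev, Next, Parent
    + sizeof(const char*)         // Name
    + sizeof(std::uint32_t) * 3;  // Type, ModuleFlags, NumEntries

// Remaining bytes divided by the specifier size, minus one slot of slack
// so alignment padding cannot push the node past the budget.
constexpr std::size_t specs_cap()
{
    return (NodeBudget - HeaderBytes) / sizeof(SpecifierT) - 1;
}

struct Node
{
    union {
        Node*      Body;
        SpecifierT ArrSpecs[specs_cap()];
    };
    Node*         Prev;
    Node*         Next;
    Node*         Parent;
    const char*   Name;
    std::uint32_t Type;
    std::uint32_t ModuleFlags;
    std::uint32_t NumEntries;
};

// The specifier array fills the payload without exceeding the node budget.
static_assert(sizeof(Node) <= NodeBudget, "Node must fit the fixed budget");
```

The `static_assert` is the useful part of the design: if a field is added to the node, the build fails instead of the node silently growing.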
@ -155,7 +150,7 @@ uw ArrSpecs_Cap =
Data Notes:
* The allocator definitions used are exposed to the user in case they want to dictate memory usage
* You'll find the memory handling in `init`, `gen_string_allocator`, `get_cached_string`, `make_code`.
* You'll find the memory handling in `init`, `deinit`, `reset`, `gen_string_allocator`, `get_cached_string`, `make_code`.
* ASTs are wrapped for the user in a Code struct which is a wrapper for an AST* type.
* Both AST and Code have member symbols but their data layout is enforced to be POD types.
* This library treats memory failures as fatal.
@ -168,15 +163,15 @@ Data Notes:
* Linked lists are used for children nodes on bodies and parameters.
* It's intended to generate the AST in one go and serialize after. The constructors and serializer are designed to be a "one pass, front to back" setup.
* Allocations can be tuned by defining the following macros:
* `GEN_BUILDER_STR_BUFFER_RESERVE`
* `GEN_CODEPOOL_NUM_BLOCKS` : Number of blocks per code pool in the code allocator
* `GEN_GLOBAL_BUCKET_SIZE` : Size of each bucket area for the global allocator
* `GEN_LEX_ALLOCATOR_SIZE`
* `GEN_CODEPOOL_NUM_BLOCKS` : Number of blocks per code pool in the code allocator
* `GEN_SIZE_PER_STRING_ARENA` : Size per arena used with string caching.
* `GEN_MAX_COMMENT_LINE_LENGTH` : Longest length a comment can have per line.
* `GEN_MAX_NAME_LENGTH` : Max length of any identifier.
* `GEN_MAX_UNTYPED_STR_LENGTH` : Max content length for any untyped code.
* `GEN_SIZE_PER_STRING_ARENA` : Size per arena used with string caching.
* `GEN_TOKEN_FMT_TOKEN_MAP_MEM_SIZE` : token_fmt_va uses local_persist memory of this size for the hashtable.
* `GEN_LEX_ALLOCATOR_SIZE`
* `GEN_BUILDER_STR_BUFFER_RESERVE`
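A tuning macro of this kind is usually overridden by defining it before the header that consumes it. The sketch below is self-contained and does not include the library. The `#ifndef` guard and the value `128` are assumptions about how a default could be supplied, and `NameBuffer` and `name_capacity` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// User-side override; in practice this would appear before including the library header.
#define GEN_MAX_NAME_LENGTH 256

// Assumed library-side pattern (not confirmed by the docs): only supply a
// default when the user has not already defined the macro.
#ifndef GEN_MAX_NAME_LENGTH
#define GEN_MAX_NAME_LENGTH 128 // Placeholder default, not the library's real value.
#endif

// Hypothetical buffer whose size is driven by the macro.
struct NameBuffer
{
    char data[GEN_MAX_NAME_LENGTH];
};

constexpr std::size_t name_capacity()
{
    return sizeof(NameBuffer::data);
}
```

With the override in place, `name_capacity()` reports the user's value and not the placeholder default.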
The following CodeTypes are used which the user may optionally use strong typing with if they enable: `GEN_ENFORCE_STRONG_CODE_TYPES`
@ -187,7 +182,6 @@ The following CodeTypes are used which the user may optionally use strong typing
* CodeConstructor
* CodeDefine
* CodeDestructor
* CodePreprocessCond
* CodeEnum
* CodeExec
* CodeExtern
@ -199,6 +193,8 @@ The following CodeTypes are used which the user may optionally use strong typing
* CodeOperator
* CodeOpCast
* CodeParam : Has support for `for-range` iterating across parameters.
* CodePreprocessCond
* CodePragma
* CodeSpecifiers : Has support for `for-range` iterating across specifiers.
* CodeStruct
* CodeTemplate
@ -221,7 +217,7 @@ Retrieving a raw version of the ast can be done using the `raw()` function defin
### Upfront Construction
All component ASTs must be previously constructed, and provided on creation of the code AST.
The construction will fail and return Code::Invalid otherwise.
The construction will fail and return CodeInvalid otherwise.
Interface :
@ -231,6 +227,7 @@ Interface :``
* def_comment
* def_class
* def_constructor
* def_define
* def_destructor
* def_enum
* def_execution
@ -245,6 +242,7 @@ Interface :``
* def_operator_cast
* def_param
* def_params
* def_preprocess_cond
* def_specifier
* def_specifiers
* def_struct
@ -321,19 +319,6 @@ Interface :
* parse_using
* parse_variable
The lexing and parsing takes shortcuts from what's expected in the standard.
* Numeric literals are not checked for validity.
* The parse API performs no validation on execution scope definitions; they are turned into untyped Code ASTs.
* *This includes the assignment of variables.*
* Attributes ( `[[]]` (standard), `__declspec` (Microsoft), or `__attribute__` (GNU) )
* Assumed to *come before specifiers* (`const`, `constexpr`, `extern`, `static`, etc) for a function
* Or in the usual spot for class, structs, (*right after the declaration keyword*)
* typedefs have attributes with the type (`parse_type`)
* As a general rule; if it's not available from the upfront constructors, it's not available in the parsing constructors.
* *Upfront constructors are not necessarily used in the parsing constructors, this is just a good metric to know what can be parsed.*
* Parsing attributes can be extended to support user defined macros by defining `GEN_DEFINE_ATTRIBUTE_TOKENS` (see `gen.hpp` for the formatting)
Usage:
```cpp
@ -342,13 +327,6 @@ Code <name> = parse_<function name>( string with code );
Code <name> = def_<function name>( ..., parse_<function name>(
<string with code>
));
Code <name> = make_<function name>( ... )
{
<name>->add( parse_<function name>(
<string with code>
));
}
```
### Untyped constructions
@ -408,12 +386,15 @@ The following are provided predefined by the library as they are commonly used:
* `access_public`
* `access_protected`
* `access_private`
* `attrib_api_export`
* `attrib_api_import`
* `module_global_fragment`
* `module_private_fragment`
* `fmt_newline`
* `pragma_once`
* `param_varaidc` (Used for variadic definitions)
* `preprocess_else`
* `preprocess_endif`
* `pragma_once`
* `spec_const`
* `spec_consteval`
* `spec_constexpr`
@ -425,8 +406,10 @@ The following are provided predefined by the library as they are commonly used:
* `spec_internal_linkage` (internal macro)
* `spec_local_persist` (local_persist macro)
* `spec_mutable`
* `spec_neverinline`
* `spec_override`
* `spec_ptr`
* `spec_pure`
* `spec_ref`
* `spec_register`
* `spec_rvalue`
@ -434,10 +417,6 @@ The following are provided predefined by the library as they are commonly used:
* `spec_thread_local`
* `spec_virtual`
* `spec_volatile`
* `spec_type_signed`
* `spec_type_unsigned`
* `spec_type_short`
* `spec_type_long`
* `t_empty` (Used for variadic macros)
* `t_auto`
* `t_void`


@ -1,7 +1,6 @@
# Documentation
The library is fragmented into a series of headers and sources files meant to be scanned in and then generated to a tailored format for the target
`gen` files.
The library is fragmented into a series of headers and source files meant to be scanned in and then generated to a tailored format for the target `gen` files.
The principal (user) files are `gen.hpp` and `gen.cpp`.
They contain includes for its various components: `components/<component_name>.<hpp/cpp>`
@ -14,7 +13,6 @@ They directly include `depedencies/file_handling.<hpp/cpp>` as the core library
**TODO : Right now the library is not finished, as such the first self-hosting iteration is still WIP**
Both libraries use a *pre-generated* (self-hosting I guess) version of the library to then generate the latest version of itself.
(sort of a verification that the generated version is equivalent).
The default `gen.bootstrap.cpp` located in the project folder is meant to produce a standard segmented library, where the components of the library
have relatively dedicated header and source files. Dependencies are included at the top of the file and each header starts with a pragma once.
@ -28,7 +26,6 @@ Feature Macros:
* This is auto-generated if using the bootstrap or single-header generation
* *Note: The user will use the `AttributeTokens.csv` when the library is fully self-hosting.*
* `GEN_DEFINE_LIBRARY_CORE_CONSTANTS` : Optional typename codes as they are non-standard to C/C++ and not necessary to library usage
* `GEN_DONT_USE_NAMESPACE` : By default, the library is wrapped in a `gen` namespace, this will disable that and expose it to the global scope.
* `GEN_DONT_ENFORCE_GEN_TIME_GUARD` : By default, the library ( gen.hpp/ gen.cpp ) expects the macro `GEN_TIME` to be defined, this disables that.
* `GEN_ENFORCE_STRONG_CODE_TYPES` : Enforces casts to filtered code types.
* `GEN_EXPOSE_BACKEND` : Will expose symbols meant for internal use only.
@ -36,13 +33,13 @@ Feature Macros:
## On multi-threading
Currently unsupported. The following changes would have to be made:
Currently unsupported.
## Extending the library
This library is relatively small, and can be extended without much hassle.
The convention you'll see used throughout the API of the library is as follows:
The convention you'll see used throughout the interface of the library is as follows:
1. Check name or parameters to make sure they are valid for the construction requested
2. Create a code object using `make_code`.
@ -55,3 +52,4 @@ Names or Content fields are interned strings and thus showed be cached using `ge
The library has its code segmented into component files; use them to create a derived version without needing to rewrite a generated file directly or build on top of the header via composition or inheritance.
When the scanner is implemented, this will be even easier to customize.


@ -212,7 +212,7 @@ struct AST
- sizeof(StringCached)
- sizeof(CodeT)
- sizeof(ModuleFlag)
- sizeof(s32)
- sizeof(u32)
)
/ sizeof(SpecifierT) -1; // -1 for 4 extra bytes


@ -16,7 +16,7 @@ BraceSquare_Close, "]"
Capture_Start, "("
Capture_End, ")"
Comment, "__comemnt__"
Comment_End, "__comment_end__"
Comment_End, "__comment_end__"
Comment_Start, "__comment_start__"
Char, "__character__"
Comma, ","
@ -43,9 +43,9 @@ Operator, "__operator__"
Preprocess_Hash, "#"
Preprocess_Define, "define"
Preprocess_If, "if"
Preprocess_IfDef, "ifdef"
Preprocess_IfNotDef, "ifndef"
Preprocess_ElIf, "elif"
Preprocess_IfDef, "ifdef"
Preprocess_IfNotDef, "ifndef"
Preprocess_ElIf, "elif"
Preprocess_Else, "else"
Preprocess_EndIf, "endif"
Preprocess_Include, "include"
@ -71,7 +71,7 @@ Spec_Override, "override"
Spec_Static, "static"
Spec_ThreadLocal, "thread_local"
Spec_Volatile, "volatile"
Spec_Virtual, "virtual"
Spec_Virtual, "virtual"
Star, "*"
Statement_End, ";"
StaticAssert, "static_assert"
@ -83,10 +83,10 @@ Type_Long, "long"
Type_char, "char"
Type_int, "int"
Type_double, "double"
Type_MS_int8, "__int8"
Type_MS_int16, "__int16"
Type_MS_int32, "__int32"
Type_MS_int64, "__int64"
Type_MS_W64, "_W64"
Type_MS_int8, "__int8"
Type_MS_int16, "__int16"
Type_MS_int32, "__int32"
Type_MS_int64, "__int64"
Type_MS_W64, "_W64"
Varadic_Argument, "..."
__Attributes_Start, "__attrib_start__"



@ -1,11 +1,6 @@
# Test
The following tests focus on attempting to generate some math, containers, and the memory module of zpl.
The implementation here is not well organized and needs a rewrite.
Not all the files are written how I would practically use the library, the containers for example would
be better off in C++ as templates, since the templates they generate are trivial symbols to inspect or debug.
An example of a non-trivial generation is a container for elements with SOA or AOS policy for layout.
(If a unified element syntax is desired)
The test is divided between two major sets of tests: Parsed and Upfront.
I only do basic sanity and parsing tests for the most part.
The library is getting practical usage tests in [genc](https://github.com/Ed94/genc) and other projects.