* ASTs are wrapped for the user in a `Code` struct, which is a wrapper for an `AST*`.
* Both AST and Code have member symbols but their data layout is enforced to be POD types.
* This library treats memory failures as fatal.
* Cached Strings are stored in their own set of arenas. AST constructors use cached strings for names, and content.
  * `StringArenas`, `StringCache`, `Allocator_StringArena`, and `Allocator_StringTable` are the associated containers or allocators.
* Strings used for serialization and file buffers are not stored in the arenas used for cached strings.
  * They currently use `GlobalAllocator`, which is a tracked array of arenas that grows as needed (it adds a bucket when one runs out).
  * Memory within the buckets is not reused, so it's inherently wasteful.
  * I will be augmenting the single arena with a simple slab allocator.
* Linked lists are used for the child nodes of bodies and for parameters.
* It's intended to generate the AST in one go and serialize after. The constructors and serializer are designed to be a "one pass, front to back" setup.
* Allocations can be tuned by defining the following macros:
  * `GEN_GLOBAL_BUCKET_SIZE` : Size of each bucket area for the global allocator
  * `GEN_CODEPOOL_NUM_BLOCKS` : Number of blocks per code pool in the code allocator
  * `GEN_SIZE_PER_STRING_ARENA` : Size per arena used with string caching.
  * `GEN_MAX_COMMENT_LINE_LENGTH` : Longest length a comment can have per line.
  * `GEN_MAX_NAME_LENGTH` : Max length of any identifier.
  * `GEN_MAX_UNTYPED_STR_LENGTH` : Max content length for any untyped code.
  * `GEN_TOKEN_FMT_TOKEN_MAP_MEM_SIZE` : `token_fmt_va` uses `local_persist` memory of this size for the hashtable.
  * `GEN_LEX_ALLOCATOR_SIZE`
  * `GEN_BUILDER_STR_BUFFER_RESERVE`
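
The bucket behavior described for `GlobalAllocator` above can be pictured with a small sketch. This is not the library's implementation: `BucketArenaSketch` and `SKETCH_BUCKET_SIZE` are hypothetical names, the bucket size is an arbitrary stand-in for `GEN_GLOBAL_BUCKET_SIZE`, and alignment is ignored for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for GEN_GLOBAL_BUCKET_SIZE (the value is arbitrary).
constexpr std::size_t SKETCH_BUCKET_SIZE = 1024;

// A growing array of fixed-size buckets. Allocation only bumps an offset forward,
// so space left over at the end of a bucket is never reused.
struct BucketArenaSketch
{
    struct Bucket
    {
        std::unique_ptr<unsigned char[]> memory;
        std::size_t                      used;
    };

    std::vector<Bucket> buckets;

    void* alloc( std::size_t size )
    {
        // Requests larger than one bucket are out of scope for this sketch.
        assert( size <= SKETCH_BUCKET_SIZE );

        if ( buckets.empty() || buckets.back().used + size > SKETCH_BUCKET_SIZE )
        {
            // The current bucket cannot fit the request: add a new bucket.
            buckets.push_back( Bucket { std::make_unique<unsigned char[]>( SKETCH_BUCKET_SIZE ), 0 } );
        }

        Bucket& bucket = buckets.back();
        void*   result = bucket.memory.get() + bucket.used;
        bucket.used += size;
        return result;
    }
};
```

With a 1024 byte bucket, two 600 byte requests end up in separate buckets and the 424 bytes left in the first bucket stay unused, which is the waste mentioned above.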
The following Code types are used. The user may optionally enforce strong typing with them by defining `GEN_ENFORCE_STRONG_CODE_TYPES`:
* CodeBody : Has support for `for-range` iterating across Code objects.
* CodeAttributes
* CodeComment
* CodeClass
* CodeConstructor
* CodeDefine
* CodeDestructor
* CodeEnum
* CodeExec
* CodeExtern
* CodeInclude
* CodeFriend
* CodeFn
* CodeModule
* CodeNS
* CodeOperator
* CodeOpCast
* CodeParam : Has support for `for-range` iterating across parameters.
* CodePreprocessCond
* CodePragma
* CodeSpecifiers : Has support for `for-range` iterating across specifiers.
* CodeStruct
* CodeTemplate
* CodeType
* CodeTypedef
* CodeUnion
* CodeUsing
* CodeVar
Each Code type has an associated "filtered AST" with the naming convention: `AST_<CodeName>`
Unrelated fields of the AST for that node type are omitted, and only the necessary padding members are defined in their place.
Retrieving a raw version of the AST can be done using the `raw()` function defined in each AST.
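
As a rough illustration of the `for-range` support noted for CodeBody, CodeParam, and CodeSpecifiers, here is a minimal sketch of a typed handle over linked child nodes. `NodeSketch`, `BodySketch`, and `child_names` are hypothetical names and do not reflect the library's actual layout.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical node: children are chained through `next`.
struct NodeSketch
{
    std::string name;
    NodeSketch* next = nullptr;
};

// Hypothetical typed handle that exposes begin()/end() so range-based for works.
struct BodySketch
{
    NodeSketch* front = nullptr;
    NodeSketch* back  = nullptr;

    // Appends a node to the end of the child list.
    void append( NodeSketch* node )
    {
        assert( node != nullptr );
        node->next = nullptr;
        if ( back == nullptr )
            front = node;
        else
            back->next = node;
        back = node;
    }

    struct Iterator
    {
        NodeSketch* current;

        NodeSketch& operator*() const { return *current; }
        Iterator&   operator++() { current = current->next; return *this; }
        bool        operator!=( Iterator const& other ) const { return current != other.current; }
    };

    Iterator begin() const { return Iterator { front }; }
    Iterator end()   const { return Iterator { nullptr }; }
};

// Collects the child names front to back using range-based for.
inline std::vector<std::string> child_names( BodySketch const& body )
{
    std::vector<std::string> names;
    for ( NodeSketch& child : body )
        names.push_back( child.name );
    return names;
}
```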
## There are three sets of interfaces for Code AST generation the library provides
* Upfront
* Parsing
* Untyped
### Upfront Construction
All component ASTs must be previously constructed, and provided on creation of the code AST.
The construction will fail and return CodeInvalid otherwise.
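
The "fail and return CodeInvalid" behavior can be sketched as follows. All names here (`CodeSketch`, `KindSketch`, `def_function_sketch`) are hypothetical stand-ins, not the library's types or signatures; the point is only that a constructor checks its already-built components and hands back an invalid sentinel when they do not fit.

```cpp
#include <cassert>
#include <string>

enum class KindSketch { Invalid, Type, Params, Function };

// Hypothetical code handle; a default-constructed one plays the role of CodeInvalid.
struct CodeSketch
{
    KindSketch  kind = KindSketch::Invalid;
    std::string name;

    bool is_valid() const { return kind != KindSketch::Invalid; }
};

// Hypothetical upfront constructor: every component must already exist and be of the right kind.
inline CodeSketch def_function_sketch( std::string const& name, CodeSketch const& params, CodeSketch const& ret_type )
{
    if ( name.empty() )
        return CodeSketch {};   // no name provided

    if ( params.kind != KindSketch::Params || ret_type.kind != KindSketch::Type )
        return CodeSketch {};   // a component is missing or of the wrong kind

    return CodeSketch { KindSketch::Function, name };
}
```

Callers of such a constructor would check the result before adding it to a body.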
Interface :
* def_attributes
* *This is prepended right before the function symbol, or placed after the class or struct keyword, for any flavor of attributes used.*
* *It's up to the user to use the desired attribute formatting: `[[]]` (standard), `__declspec` (Microsoft), or `__attribute__` (GNU).*
* def_comment
* def_class
* def_constructor
* def_define
* def_destructor
* def_enum
* def_execution
* *This is equivalent to untyped_str, except that it's intended for use only in execution scopes.*
* def_extern_link
* def_friend
* def_function
* def_include
* def_module
* def_namespace
* def_operator
* def_operator_cast
* def_param
* def_params
* def_pragma
* def_preprocess_cond
* def_specifier
* def_specifiers
* def_struct
* def_template
* def_type
* def_typedef
* def_union
* def_using
* def_using_namespace
* def_variable
Bodies:
* def_body
* def_class_body
* def_enum_body
* def_export_body
* def_extern_link_body
* def_function_body
* *Use this for operator bodies as well*
* def_global_body
* def_namespace_body
* def_struct_body
* def_union_body
Usage:
```cpp
<name> = def_<functiontype>( ... );
Code <name>
{
...
<name> = def_<functionname>( ... );
}
```
When using the body functions, it's recommended to use the `args` macro to automatically determine the number of arguments for the variadic.
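
To show how a macro can determine the argument count for a variadic function, here is a minimal sketch. The library's own `args` macro is not reproduced here: `SKETCH_NUM_ARGS`, `SKETCH_ARGS`, and `join_body_sketch` are hypothetical, the count is capped at 8 arguments, and the expansion assumes a standard-conforming preprocessor (GCC or Clang style).

```cpp
#include <cassert>
#include <cstdarg>
#include <string>

// Counts 1 to 8 arguments by shifting a descending number list past them.
#define SKETCH_NUM_ARGS_IMPL( _1, _2, _3, _4, _5, _6, _7, _8, N, ... ) N
#define SKETCH_NUM_ARGS( ... ) SKETCH_NUM_ARGS_IMPL( __VA_ARGS__, 8, 7, 6, 5, 4, 3, 2, 1, 0 )

// Expands to "count, arg1, arg2, ..." so the callee knows how many arguments follow.
#define SKETCH_ARGS( ... ) SKETCH_NUM_ARGS( __VA_ARGS__ ), __VA_ARGS__

// Stand-in for a variadic body function: takes a count, then that many C strings.
inline std::string join_body_sketch( int num, ... )
{
    std::string result;
    va_list     va;
    va_start( va, num );
    for ( int idx = 0; idx < num; ++idx )
    {
        result += va_arg( va, char const* );
        result += '\n';
    }
    va_end( va );
    return result;
}
```

A call such as `join_body_sketch( SKETCH_ARGS( "int a;", "int b;" ) )` expands to `join_body_sketch( 2, "int a;", "int b;" )`, so the count never has to be written by hand.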
### Untyped Construction
Code ASTs are constructed using unvalidated strings.
Interface :
* token_fmt_va
* token_fmt
* untyped_str
* untyped_fmt
* untyped_token_fmt
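
The token formatting functions in this list substitute named tokens into a string. The sketch below is a guess at the general idea, not the library's code: it assumes placeholders are written as `<name>`, and `token_fmt_sketch` with its map-based signature is hypothetical.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Replaces every "<name>" in `fmt` whose name is a key in `tokens`.
// Anything else between angle brackets is left untouched.
inline std::string token_fmt_sketch( std::unordered_map<std::string, std::string> const& tokens, std::string const& fmt )
{
    std::string result;
    std::size_t pos = 0;
    while ( pos < fmt.size() )
    {
        std::size_t open = fmt.find( '<', pos );
        if ( open == std::string::npos )
        {
            result.append( fmt, pos, std::string::npos );
            break;
        }

        std::size_t close = fmt.find( '>', open + 1 );
        if ( close == std::string::npos )
        {
            result.append( fmt, pos, std::string::npos );
            break;
        }

        auto found = tokens.find( fmt.substr( open + 1, close - open - 1 ) );
        if ( found != tokens.end() )
        {
            // Known token: copy the text before it, then the token's value.
            result.append( fmt, pos, open - pos );
            result += found->second;
            pos = close + 1;
        }
        else
        {
            // Not a token (for example a template bracket): keep the '<' and continue after it.
            result.append( fmt, pos, open - pos + 1 );
            pos = open + 1;
        }
    }
    return result;
}
```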
During serialization, the string value of any untyped Code AST is injected inline, directly into whatever context contains it as an entry.
Although untyped code is not validated against C/C++ syntax or components, it cannot be added as just any component of a Code AST:
* Untyped code cannot have children, thus there cannot be recursive injection this way.
* Untyped code can only be a child of a body AST, or the value of an assignment (ex: variable assignment).
These restrictions help prevent abuse of untyped code to some extent.
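
A minimal sketch of that inline injection, under the assumption that a body simply serializes its entries in order. `ScopeSketch`, `EntrySketch`, and `serialize_sketch` are hypothetical and much simpler than the real serializer; the untyped entry has no child list, mirroring the first restriction.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class EntryKindSketch { Untyped, Variable };

// Hypothetical body entry. There is no child list, so untyped text cannot nest further code.
struct EntrySketch
{
    EntryKindSketch kind;
    std::string     content;   // Untyped: raw text. Variable: a declaration without the ';'.
};

struct ScopeSketch
{
    std::vector<EntrySketch> entries;
};

// Serializes front to back; untyped content is copied verbatim with no validation.
inline std::string serialize_sketch( ScopeSketch const& scope )
{
    std::string out = "{\n";
    for ( EntrySketch const& entry : scope.entries )
    {
        if ( entry.kind == EntryKindSketch::Untyped )
            out += entry.content;
        else
            out += entry.content + ";";
        out += "\n";
    }
    out += "}\n";
    return out;
}
```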
The following are predefined by the library, as they are commonly used:
* `access_public`
* `access_protected`
* `access_private`
* `attrib_api_export`
* `attrib_api_import`
* `module_global_fragment`
* `module_private_fragment`
* `fmt_newline`
* `param_varadic` (Used for variadic definitions)
* `pragma_once`
* `preprocess_else`
* `preprocess_endif`
* `spec_const`
* `spec_consteval`
* `spec_constexpr`
* `spec_constinit`
* `spec_extern_linkage` (extern)
* `spec_final`
* `spec_forceinline`
* `spec_global` (global macro)
* `spec_inline`
* `spec_internal_linkage` (internal macro)
* `spec_local_persist` (local_persist macro)
* `spec_mutable`
* `spec_neverinline`
* `spec_override`
* `spec_ptr`
* `spec_pure`
* `spec_ref`
* `spec_register`
* `spec_rvalue`
* `spec_static_member` (static)
* `spec_thread_local`
* `spec_virtual`
* `spec_volatile`
* `t_empty` (Used for variadic macros)
* `t_auto`
* `t_void`
* `t_int`
* `t_bool`
* `t_char`
* `t_wchar_t`
* `t_class`
* `t_typename`
Optionally, the following are also provided if `GEN_DEFINE_LIBRARY_CODE_CONSTANTS` is defined:
* `t_b32`
* `t_s8`
* `t_s16`
* `t_s32`
* `t_s64`
* `t_u8`
* `t_u16`
* `t_u32`
* `t_u64`
* `t_sw` (ssize_t)
* `t_uw` (size_t)
* `t_f32`
* `t_f64`
## Extent of operator overload validation
The AST and constructors will be able to validate that the arguments provided for the operator type match the expected form:
* Whether the return type matches a parameter, when the operator requires it
* Whether the number of parameters is correct
* If added as a member symbol to a class or struct, whether the operator matches the requirements for the class (types match up)
* There is no support for validating new & delete operations (yet)
The user is responsible for making sure the code types provided are correct and have the desired specifiers assigned to them beforehand.
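
As an illustration of the parameter-count check, here is a small sketch covering three operators. The enum and functions are hypothetical and not the library's validation code; the counts themselves follow standard C++ rules (a member operator takes one fewer explicit parameter, and `operator=` must be a member).

```cpp
#include <cassert>

// Hypothetical subset of operator kinds.
enum class OperatorSketch { Add, Negate, Assign };

// Returns the expected parameter count, or -1 when the operator cannot be declared in that position.
inline int expected_params_sketch( OperatorSketch op, bool is_member )
{
    switch ( op )
    {
        case OperatorSketch::Add:    return is_member ? 1 : 2;    // binary +
        case OperatorSketch::Negate: return is_member ? 0 : 1;    // unary -
        case OperatorSketch::Assign: return is_member ? 1 : -1;   // operator= must be a member
    }
    return -1;
}

// True when the number of parameters provided matches the expected form.
inline bool validate_operator_sketch( OperatorSketch op, bool is_member, int num_params )
{
    int expected = expected_params_sketch( op, is_member );
    return expected >= 0 && expected == num_params;
}
```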
## Code generation and modification
There are three provided auxiliary interfaces:
* Builder
* Editor
* Scanner
Editor and Scanner are disabled by default; define `GEN_FEATURE_EDITOR` and `GEN_FEATURE_SCANNER` to enable them.
### Builder is a similar object to the Jai language's string_builder
* The purpose of it is to generate a file.
* A file is specified and opened for writing using the `open( file_path )` function.
* Code provided via the `print( code )` function is serialized to its buffer.
* When all serialization is finished, use the `write()` command to write the buffer to the file.
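
The open, print, write flow above can be sketched like this. `BuilderSketch` is hypothetical: it buffers plain strings in place of serialized `Code`, and the newline appended after each entry is an assumption rather than documented behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical builder: buffers text, then writes it to the opened file in one go.
struct BuilderSketch
{
    std::FILE*  file = nullptr;
    std::string buffer;

    // Opens the file for writing and resets the buffer.
    bool open( char const* path )
    {
        file = std::fopen( path, "wb" );
        buffer.clear();
        return file != nullptr;
    }

    // Stand-in for print( code ): the string plays the role of already serialized code.
    void print( std::string const& code )
    {
        buffer += code;
        buffer += '\n';
    }

    // Writes the whole buffer and closes the file.
    bool write()
    {
        assert( file != nullptr );
        std::size_t written = std::fwrite( buffer.data(), 1, buffer.size(), file );
        bool        closed  = std::fclose( file ) == 0;
        file = nullptr;
        return closed && written == buffer.size();
    }
};
```

Nothing touches the file until `write()` is called, which matches the "generate in one go, serialize after" approach described earlier.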
### Scanner Auxiliary Interface
Provides *(eventually)* `scan_file` to automatically populate a CodeFile, which contains a parsed AST (`Code`) of the file along with any contextual failures reported by the parser.