Now it's more contextually rich for the AST type; however, I need to hook up tokens from parsing to the AST. There needs to be a way for the debug string to look up the token and provide the contextual line.
Can either pass it ( TokArray* toks ) from the parser on failure (or `CodeFile`).
Technically there is more than enough room for another Token* ptr. I could add another and specifiers would still have at minimum 14 slots before needing to extend to the next specs.
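A minimal sketch of how a token back-reference could let the debug string show the contextual line. The `Token`, `AST`, `line_of`, and `debug_str` definitions here are simplified, hypothetical stand-ins and not the library's real types; it assumes each token's text is a slice of the original source buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical, simplified stand-ins; not the library's actual types.
struct Token {
    std::string_view text;   // slice of the source buffer
    int              line;   // 1-based line number
    int              column; // 1-based column number
};

struct AST {
    std::string  name;
    Token const* token = nullptr; // back-reference set by the parser
};

// Returns the full source line containing `tok`.
// Assumes tok.text points into `source`.
inline std::string_view line_of(std::string_view source, Token const& tok) {
    assert(tok.text.data() >= source.data());
    std::size_t offset = static_cast<std::size_t>(tok.text.data() - source.data());
    assert(offset <= source.size());

    std::size_t begin = 0;
    if (offset > 0) {
        std::size_t nl = source.rfind('\n', offset - 1);
        if (nl != std::string_view::npos) begin = nl + 1;
    }
    std::size_t end = source.find('\n', offset);
    if (end == std::string_view::npos) end = source.size();
    return source.substr(begin, end - begin);
}

// Builds a debug string that includes the token position and its source line.
inline std::string debug_str(AST const& ast, std::string_view source) {
    std::string out = "AST '" + ast.name + "'";
    if (ast.token != nullptr) {
        out += " at " + std::to_string(ast.token->line) + ":" + std::to_string(ast.token->column);
        out += "\n    ";
        std::string_view line = line_of(source, *ast.token);
        out.append(line.data(), line.size());
    }
    return out;
}
```

Storing a pointer keeps the AST small, at the cost of requiring the token array (and source buffer) to outlive any AST that may be printed.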
**************... yeah
parse_static_assert now properly adds new-line to end of statement.
I'm going to end up making a static_assert ast... that or when the statement ast is made it will handle adding that newline.
Interface adding has been adjusted to use ParentType->Next.
Using the AST_Class->Next was bad since that breaks the linked list a body AST would have used for traversal.
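A rough sketch of why chaining interfaces off ParentType->Next keeps body traversal intact. The `AST` struct and both helper functions are hypothetical simplifications; only the `ParentType` and `Next` field names come from the notes above.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified node; only ParentType and Next mirror the notes.
struct AST {
    std::string name;
    AST*        ParentType = nullptr; // first base type of a class/struct
    AST*        Next       = nullptr; // sibling link; a body uses it for traversal
};

// Appends an interface to the chain hanging off ParentType,
// leaving cls.Next untouched so the enclosing body's list stays valid.
inline void add_interface(AST& cls, AST& iface) {
    assert(iface.Next == nullptr); // a new interface node is not linked yet
    if (cls.ParentType == nullptr) {
        cls.ParentType = &iface;
        return;
    }
    AST* last = cls.ParentType;
    while (last->Next != nullptr)
        last = last->Next;
    last->Next = &iface;
}

// Counts the interfaces chained after the parent type.
inline int count_interfaces(AST const& cls) {
    int count = 0;
    if (cls.ParentType != nullptr)
        for (AST const* it = cls.ParentType->Next; it != nullptr; it = it->Next)
            ++count;
    return count;
}
```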
strip_formatting suffers from some edge failure with what looks to be escaped character literals (not entirely sure).
I've decided to not remove formatting from unvalidated function bodies since I plan to support parsing its content properly.
However expression values for a statement will fail to have their formatting removed with this.
Since I don't plan to parse those anytime soon, I'll have to fix any edge cases for those at least..
Need to make the debug_str provided by the AST type-aware so it provides as much contextual information as possible (finally got to this point with validation).
Singleheader test now directly calls clang-format to clean up the reconstructed copy of the singleheader. It's needed to remove any sort of formatting discrepancies found by the parser since it's sensitive to that for new-lines, etc.
Wanted to make parser implementation easier to sift through, so I emphasized alphabetical order more.
Since I couldn't just strip whitespace from typenames I decided to make the parse_type more aware of the typename's components if it was a function signature.
This ofc led to the dark & damp hell that is parsing typenames.
Also made initial implementation to support parsing decltype within a typename signature..
The test failure for the singleheader is still a thing, these changes have not addressed that.
Removed whitespace stripping from parse_type, prepped for doing some major changes for function signature typenames
Moved template argument parsing to its own helper function since it's used in more than one spot.
Latest failure is due to stack overflow when validating parameters. (Shouldn't be hard to debug)
Made debug for viewing whitespace in AST::is_equal with String::visualize_whitespace()
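A minimal sketch of what a whitespace visualizer for this kind of debugging might look like. This free function is a hypothetical stand-in for String::visualize_whitespace(); the escape markers it emits are an assumption, not what the library actually prints.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical stand-in: replaces whitespace with visible ASCII markers
// so two strings that differ only in formatting can be compared by eye.
inline std::string visualize_whitespace(std::string_view text) {
    std::string out;
    out.reserve(text.size());
    for (char c : text) {
        switch (c) {
            case ' ':  out += "\\s"; break;
            case '\t': out += "\\t"; break;
            case '\n': out += "\\n"; break;
            case '\r': out += "\\r"; break;
            default:   out += c;     break;
        }
    }
    return out;
}
```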
Format stripping code is currently confined within parse_define()
I plan to move it to its own function soon, I just want to make sure its finalized first.
Other unvalidated content will need to have an extra check for preprocessed lines.
Example: Function bodies can have a #define <identifier> <definition>. I cannot strip the last <new line> as it will break the semantic importance to distinguish that line.
So it needs to be:
<content before> <new line>
<preprocessed line> <new line>
<content after>
In the content string that is minimally preserved
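The layout above could be sketched roughly like this: collapse formatting everywhere, but keep a new line before and after any preprocessed line. Both function names are hypothetical, and the sketch assumes no line continuations and ignores whitespace inside string or character literals.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Collapses each run of spaces/tabs to one space and trims both ends.
inline std::string collapse_spaces(std::string_view line) {
    std::string out;
    bool pending_space = false;
    for (char c : line) {
        if (c == ' ' || c == '\t' || c == '\r') {
            pending_space = !out.empty();
        } else {
            if (pending_space) out += ' ';
            pending_space = false;
            out += c;
        }
    }
    return out;
}

// Joins ordinary lines with a single space, but keeps every
// preprocessor line on a line of its own.
inline std::string strip_formatting(std::string_view content) {
    std::string out;
    std::size_t pos = 0;
    while (pos <= content.size()) {
        std::size_t end = content.find('\n', pos);
        if (end == std::string_view::npos) end = content.size();
        std::string line = collapse_spaces(content.substr(pos, end - pos));
        pos = end + 1;
        if (line.empty()) continue;

        if (line.front() == '#') {
            if (!out.empty() && out.back() != '\n') out += '\n';
            out += line;
            out += '\n'; // this new line is semantically required
        } else {
            if (!out.empty() && out.back() != '\n') out += ' ';
            out += line;
        }
    }
    return out;
}
```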
The NumEntries checks need to be deferred until the end as a final unresolved check on validation. If there really is a discrepancy of entries, it should be revealed by the specific entry failing.
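A small sketch of the deferred count check: compare the shared entries first so the specific failing entry is reported, and fall back to the count mismatch only when nothing else explains the difference. `Entry`, `Body`, and `first_mismatch` are hypothetical names for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical simplified body; entries.size() plays the role of NumEntries.
struct Entry { std::string content; };
struct Body  { std::vector<Entry> entries; };

// Returns a description of the first discrepancy, or an empty string if equal.
inline std::string first_mismatch(Body const& a, Body const& b) {
    std::size_t shared = std::min(a.entries.size(), b.entries.size());
    for (std::size_t i = 0; i < shared; ++i) {
        if (a.entries[i].content != b.entries[i].content)
            return "entry " + std::to_string(i) + " differs";
    }
    // Only reached when every shared entry matched.
    if (a.entries.size() != b.entries.size()) {
        return "NumEntries differs: " + std::to_string(a.entries.size())
             + " vs " + std::to_string(b.entries.size());
    }
    return "";
}
```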
Right now the latest failure with the single header check involves a define directive. Specifically, the define does not omit whitespace properly, so the check interprets the different cached content as non-equivalent.
This will happen with all unvalidated aspects of the AST ( expressions, function bodies, etc )
There are two ways to resolve, either make an AST that can tokenize all items (not realistic), or I need to strip non-syntax important whitespace and then cache the string. This would mean removing everything but a single whitespace for all content within a content string. Otherwise, I would have to somehow make sure the content of the string has the exact formatting between both files for the definitions that matter.
AST types with this issue:
* Define Directive
* Pragma Directive
* Comment
* Execution
* Platform Attributes
* Untyped
Comments can technically be left unverified as they do not matter semantically.
When the serialization is first emitted, the content of these strings should for the most part be equivalent. However, I do see some possible failures for that if a different style of bracket placement is used (between the serializations).
At that point what I could do is just leave those unverified and just emit the content to the user as a warning that the AST and the other compared could not be verified.
Those technically can be handled on a per-eye basis, and worst case the tests with the compiler will in the end determine if any critical definitions are missing for the user.
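One way the "leave unverified and warn" fallback might look, as a sketch only. The `ContentKind` and `Verdict` enums and `compare_content` are hypothetical; the kinds listed mirror the AST types named above, with `Validated` standing for everything else.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical classification of content strings.
enum class ContentKind { Define, Pragma, Comment, Execution, PlatformAttributes, Untyped, Validated };
enum class Verdict     { Equal, NotEqual, Unverified };

// Validated content must match exactly. Unvalidated content that differs is
// reported as a warning instead of failing the comparison.
inline Verdict compare_content(ContentKind kind,
                               std::string const& a,
                               std::string const& b,
                               std::vector<std::string>& warnings) {
    if (a == b)
        return Verdict::Equal;
    if (kind == ContentKind::Validated)
        return Verdict::NotEqual;
    warnings.push_back("could not verify content: '" + a + "' vs '" + b + "'");
    return Verdict::Unverified;
}
```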
Also now doing comment serialization on def_comment directly as parse_comment doesn't need it.
Essentially comment AST types serialize the same way as untyped and execution ASTs.
I need to change it so that they all use one big arena allocation for the initial. This can be done when the global allocator is changed to a growable arena.
This isn't the last step though; everything in the main project directory that isn't an md file needs to be generated only.
Can't do that till testing is robust enough...
Adding the pragma once and includes to the files broke compilation, still diagnosing why.
- Some string functions were moved to the cpp, still need to do some more evaluation of it and the containers...
- Added support for forceinline and neverinline to parsing (untested)
- Added support for specifiers in operator cast such as explicit, inline/forceinline/neverinline, etc.
- Before it only supported const.
- Still need to support volatile.
- Forceinline was not supported at all for tokenization, fixed that.
Needed so that includes could be added to components so that intellisense would not fail, since the parser (for most editors) doesn't properly parse the environment and cheats on a per-file basis.
There were multiple issues with comment and newline lexing.
Extended printing functions to support Strings with %S flag (capital 'S').
Allows for length detection. Also made it so that precision for strings is the string length.
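For comparison, standard printf already lets the precision carry a string's length through `%.*s`. The sketch below uses only `std::snprintf`; `StrView` and `format_named` are hypothetical and this is not the library's `%S` implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical length-tracked string, standing in for the library's String.
struct StrView {
    char const* ptr;
    int         len;
};

// Formats a length-delimited string without requiring a terminator at `len`.
inline std::string format_named(StrView name) {
    char buf[64];
    // "%.*s" reads the precision (maximum characters to print) from an int
    // argument that precedes the string pointer.
    int n = std::snprintf(buf, sizeof(buf), "name: %.*s", name.len, name.ptr);
    if (n < 0)
        return {};
    std::size_t written = static_cast<std::size_t>(n) < sizeof(buf)
                        ? static_cast<std::size_t>(n)
                        : sizeof(buf) - 1;
    return std::string(buf, written);
}
```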
Also fix for HashTable<>::rehash_fast not having a finished implementation...
The typedef fix is a sort of hack (like how parsing the rest of the language feels like tbh...).
I might make a def_typedef_fn to make it clearer how to define function typedefs using the upfront interface.
Looking into properly dealing with empty lines...
I want to preserve the text's empty lines in the AST for serialization purposes (preserve formatting for gaps between definitions).
Don't want to introduce the possibility of it breaking though, so will have to ignore empty_lines in a general way (if they are in a bad spot).
Attempted to cover that by having TokArray::current() auto-skip empty lines and eat as well if the type doesn't match.
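A rough sketch of the auto-skip behavior described above. The `TokType`, `Token`, and `TokArray` definitions are simplified, hypothetical versions; the assumption is that empty lines appear as new-line tokens and are skipped unless a new-line is what the caller asked for.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified token types.
enum class TokType { NewLine, Identifier, Semicolon };

struct Token {
    TokType     type;
    std::string text;
};

struct TokArray {
    std::vector<Token> toks;
    std::size_t        idx = 0;

    // Returns the current token, first advancing past new-line tokens if asked.
    Token* current(bool skip_new_lines = true) {
        if (skip_new_lines) {
            while (idx < toks.size() && toks[idx].type == TokType::NewLine)
                ++idx;
        }
        return idx < toks.size() ? &toks[idx] : nullptr;
    }

    // Consumes the current token if it has the expected type.
    // New-lines are skipped only when the caller is not expecting one.
    bool eat(TokType expected) {
        Token* tok = current(expected != TokType::NewLine);
        if (tok == nullptr || tok->type != expected)
            return false;
        ++idx;
        return true;
    }
};
```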
Isolating large macros to their own directory (components/temp).
- Plan is to remove them soon with proper generation.
Added additional component files, separating the old data_structures header for a set of ast headers.
Header_end also had its inlines extracted out.
Necessary to complete the macro isolation.
ZPL parser dependencies were removed from the core library along with builder; it's now generated in bootstrap as part of making a gen_builder set of files.
Singleheader will be changed in next few commits to reflect this as well (By making builder deps and components a conditional option).
Tests are most likely all broken for now.
Removing the gen. namespace from the files for components, dependencies, and file_processors.
They are only necessary if the include directory is transparent, and in my case those are not.
Made a docs directory. I'm offloading information from the main readme to there along with additional information I end up elaborating on down the line.
Enum tables were moved to their own directory (project/enums).
Library will not compile for now. Major refactor occurring with parsing related components.
Both bootstrap and singleheader now name the files the same as the library's default.
Output now directed toward gen directory for the corresponding dir (project, singleheader, or test)
I will most likely need to refactor some of the components & dependencies files to get the generation implementation the way I want.
Specifically I want to be able to eliminate macros I'm using for enums and common patterns for implementation of the data structures.
When it comes to the cpp files, I may leave those alone as the macros largely help with readability.
Replacing those macros is expensive and most likely not worth it.
The macros under consideration with replacing using the library bootstrap are:
* Define_Types
* Define_Operators
* Define_Specifiers
* GEN_Define_Attribute_Tokens
* Define_CodeType
* Using_Code
* Define_TokType
* def_constant_spec ?
* Helper Macros for def_**_body functions ?
* AST unallowed types case macros?
(The last three I'm unsure about as they work fine, and needing the debugger steps there is a rare scenario...)
The enums could be manually generated and have its fields derived from a CSV (which is what genc is currently doing).
This would allow the user to specify custom attribute macros as well with greater ease.
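A minimal sketch of deriving an enum's fields from CSV text. It assumes each row's first column is the enumerator name; `trim` and `gen_enum_from_csv` are hypothetical helpers, and the emitted layout is illustrative rather than what the bootstrap actually generates.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <string_view>

// Trims spaces, tabs, and carriage returns from both ends.
inline std::string trim(std::string_view s) {
    std::size_t b = s.find_first_not_of(" \t\r");
    if (b == std::string_view::npos)
        return {};
    std::size_t e = s.find_last_not_of(" \t\r");
    return std::string(s.substr(b, e - b + 1));
}

// Emits an enum definition whose enumerators are the first CSV column.
// Blank rows are skipped.
inline std::string gen_enum_from_csv(std::string_view enum_name, std::string_view csv) {
    std::string out = "enum class " + std::string(enum_name) + "\n{\n";
    std::istringstream lines{ std::string(csv) };
    std::string line;
    while (std::getline(lines, line)) {
        std::string field = trim(line.substr(0, line.find(',')));
        if (field.empty())
            continue;
        out += "\t" + field + ",\n";
    }
    out += "};\n";
    return out;
}
```

A user-supplied CSV with extra rows would then extend the enum without touching the generator itself.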
I may manually inline ProcessModuleFlags, as it's not even necessary: any specific symbol with a module flag will only use the export value. Import is only used on modules themselves (from what I can tell).
The Parser::lex function could be offloaded to its own file in case the user wants to swap the entire thing out.
(Most likely may want to for various purposes)
The problem with extracting any definitions out of a component file currently is that it will lead to splintering that component into multiple other components.
This is necessary as the proper scanner is not implemented yet (only a rudimentary scan_file proc is made so far).
Interfaces are assumed to have the public access specifier for their content (if it's a class definition)
Started to prepare to segment library code into more files (Less scrolling, need for bootstrapping a tailored version + single header support anyway).
- Renamed macro gen_time to GEN_TIME
- Moved scanner and editor to their own headers, I'm going to consider them extensions.
- I'm preparing to setup the library to build on multiple compiler platforms: clang, gcc, msvc.
I'm rewriting it the way I'd like to learn it.
- I want to use csv parsing heavily with the library so I'm just going to add it to the scanner.
- Global memory allocator moved to regular gen header/source as it's something really just made for the library.
- Some small refactors to macros
- The parser was updated to support tokenizing preprocessor directives.
- The purpose is based off intuition that it will be required for the scanner.
- Added support for anonymous structs.
- Gave Token_Fmt::token_map its own static memory.
- Minor natvis fix for CodeBody
- Renamed ESpecifier::Static_Member to just Static (acts as a general use case) specifier option
- Setup the lex token array with a configurable arena allocator.
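A sketch of what backing the lex token array with an arena might look like. `Arena`, `TokArray`, and `make_tok_array` are hypothetical simplifications; the sketch assumes power-of-two alignments no larger than the default allocation alignment and a fixed capacity chosen up front.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-size bump arena.
struct Arena {
    std::vector<unsigned char> buffer;
    std::size_t                used = 0;

    explicit Arena(std::size_t capacity) : buffer(capacity) {}

    // Returns aligned memory from the buffer, or nullptr when out of space.
    // `align` must be a power of two.
    void* alloc(std::size_t size, std::size_t align) {
        std::size_t start = (used + align - 1) & ~(align - 1);
        if (start + size > buffer.size())
            return nullptr;
        used = start + size;
        return buffer.data() + start;
    }
};

struct Token {
    int type;
    int line;
};

// Fixed-capacity token array whose storage lives in an arena.
struct TokArray {
    Token*      data;
    std::size_t count;
    std::size_t capacity;

    bool push(Token tok) {
        if (count == capacity)
            return false;
        data[count++] = tok;
        return true;
    }
};

// The caller chooses the arena, which is what makes the allocator configurable.
inline TokArray make_tok_array(Arena& arena, std::size_t capacity) {
    void* mem = arena.alloc(sizeof(Token) * capacity, alignof(Token));
    return TokArray{ static_cast<Token*>(mem), 0, mem != nullptr ? capacity : 0 };
}
```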
Two major things left before V0.3-4:
- Attribute and Module parsing support with base case test
- AST serialization strings get a dedicated slab allocator.
- Got rid of dynamic arrays for body entries, we're using links only.
- Halved the size of the AST from 256 to 128 bytes.
- Fields for different ast types are not accessed directly. Each type has a unique filtered AST for ease of use.
Also updated the gencpp.refactor script with almost all relevant symbols.
gen.undef.macros.hpp also filled out
Ready to complete gencpp related todos left in implementation...
Pretty much have a working library now... Albeit with probably quite a few hidden bugs in parsing.
Next steps are to start converting library to use its own Arena, Pool, Array, HashTable types. And either work on zpl dependency gutting or making a more robust set of tests.
- Made a better global allocator for the process.
- Some small fixes to gen.hpp, removed clear_code_memory as I'm designing this library to never free any memory for now.
- Fixes to memory usage for cached strings
- Added missing verification for attributes in some upfront constructors. Added attribute param for def_type procedure.
- Started to use internal and global keywords in gen.cpp for associated definitions
- Progress toward getting the parsing constructors to support operator definitions.
- There was an *attempt* to get parse_type to support parsing function types. It's not tested yet....
- It's not a nice setup, there is no validation of parameters, will probably add that in the future.
- Readonly overcomplicates things for the scope of this project. I'm avoiding const correctness to avoid mental overhead.
- Indentation was removed as it still required a formatting pass after, and the only significant thing needed was the newlines.
- Removed some opinionated takes from readme, trying to keep it straight to the point.
- Used def_execution more in array and ring defs (was using untyped_str when could have been using execution...)
token_fmt returns StrC now. Did some fixes so it can properly function now!
Lots of changes to the Array non-parsed implementation.
It shows no errors in the editor static analysis, however I haven't tried it yet.
Unfortunately have some friction with how I would like to set it up like with everything in C++...
Can't inherit from ArrayBase to provide the common members to all Array defs, without introducing constructor complexity.
I may be open to using constructors in the future, depends on how I feel when trying out the context pattern used in jai and odin...
Decided not to support the incremental API, its not necessary as the ergonomics are not that big a deal.
Got operators to pass the sanity base cases, which means now all upfront constructors pass the base cases!
Next up is getting it to pass the array container generation.