2025-01-12 10:29:11 -05:00
parent 470800af2a
commit 207fa94803
4 changed files with 105 additions and 86 deletions

@@ -1,58 +1,62 @@
# VE Font Cache
Vertex Engine GPU Font Cache: A text rendering libary.
Vertex Engine GPU Font Cache: A text rendering library.
This started off as a port of the [VEFontCache](https://github.com/hypernewbie/VEFontCache) library to the Odin programming language.
Its original purpose was for use in game engines, however its rendeirng quality and performance is more than adequate for many other applications.
This project started as a port of the [VEFontCache](https://github.com/hypernewbie/VEFontCache) library to the Odin programming language.
While originally intended for game engines, its rendering quality and performance make it suitable for many other applications.
Since then the library has been overhauled to offer higher performance, improved visual fidelity, additional features, and quality of life improvements.
Since then, the library has been overhauled to offer higher performance, improved visual fidelity, additional features, and quality of life improvements.
Features:
* Simple and well documented.
* Load and unload fonts at anytime
* Almost entirely configurabe and tunable at runtime!
* Simple and well documented
* Load and unload fonts at any time
* Almost entirely configurable and tunable at runtime
* Full support for hot-reload
* Clear the caches at any-time!
* Clear the caches at any time
* Robust quality of life features:
* Tracks text layers!
* Push and pop stack for font, font_size, colour, view, position, scale and zoom!
* Enforce even only font-sizing (useful for linear-zoom)
* Snap-positioning to view for better hinting
* Snap positioning to view for better hinting
* Tracks text layers
* Enforce even-only font sizing (useful for linear zoom)
* Push and pop stack for font, font_size, color, view, position, scale, and zoom
* Basic or advanced text shaping via Harfbuzz
* All rendering is real-time, triangulation done on the CPU, vertex rendering and texture blitting on the gpu.
* Can hand thousands of draw text calls with very large or small shapes.
* All rendering is real-time, with triangulation on the CPU, vertex rendering and texture blitting on the GPU
* Can handle thousands of draw text calls with very large or small shapes
* 4-Level Regioned Texture Atlas for caching rendered glyphs
* Text shape caching
* Glyph texture buffer for rendering the text with super-sampling to downsample to the atlas or direct to target screen.
* Glyph texture buffer for rendering text super-sampled, then downsampling it to the atlas or directly to the target screen
* Super-sample by a font size scalar for sharper glyphs
* All caching backed by an optimized 32-bit LRU indexing cache
* Provides a draw list that is backend agnostic (see [backend](./backend) for usage example).
* Provides a backend-agnostic draw list (see [backend](./backend) for usage example)
Upcoming:
* Support for ear-clipping triangulation, or just better triangulation..
* Support for which triangulation method used on a by font basis?
* [paper](https://www.microsoft.com/en-us/research/wp-content/uploads/2005/01/p1000-loop.pdf)
* Multi-threading supported job queue.
* Lift heavy-lifting portion of the library's context into a thread context.
* Synchronize threads by merging their generated layered draw list into a finished draw-list for processing on the user's render thread.
* User defines how thread context's are distributed for drawing (a basic quandrant based selector procedure will be provided.)
* Support choosing between top-left or bottom-left coordinate convention (currently bottom-left)
* Support for better triangulation
* Support for triangulation method selection on a per-font basis
* [Reference paper](https://www.microsoft.com/en-us/research/wp-content/uploads/2005/01/p1000-loop.pdf)
* Better support for tuning glyph render sampling
* Support for sub-pixel AA
* Ability to decide AA method & degree on a per-font basis
* Multi-threading supported job queue
* Lift heavy-lifting portion of the library's context into a thread context
* Synchronize threads by merging their generated layered draw list into a finished draw list for processing on the user's render thread
* User defines how thread contexts are distributed for drawing (a basic quadrant-based selector procedure will be provided)
## Documentation
* [docs/Readme.md](docs/Readme.md) for the library's interface.
* [docs/guide_backend.md](docs/guide_backend.md) for information on whats needed rolling your own backend.
* [docs/guide_architecture.md](docs/guide_architecture.md) for an in-depth breakdown of the significant design decisions, and codepaths.
* [docs/Readme.md](docs/Readme.md) for the library's interface
* [docs/guide_backend.md](docs/guide_backend.md) for information on implementing your own backend
* [docs/guide_architecture.md](docs/guide_architecture.md) for an in-depth breakdown of significant design decisions and code-paths
## Building
See [scripts/Readme.md](scripts/Readme.md) for building examples or utilizing the provided backends.
Currently the scripts provided & the library itself were developed & tested on Windows. There are bash scripts for building on linux (they build on WSL but need additional testing).
Currently, the scripts provided & the library itself were developed & tested on Windows. There are bash scripts for building on Linux (they build on WSL but need additional testing).
The library depends on harfbuzz, & stb_truetype to build.
Note: harfbuzz could technically be gutted if the user removes their definitions, however they have not been made into a conditional compilation option (yet).
The library depends on harfbuzz & stb_truetype to build.
Note: harfbuzz could technically be removed if the user removes the definitions that depend on it; however, this hasn't been made into a conditional compilation option yet.
# Gallery
@@ -62,4 +66,4 @@ https://github.com/user-attachments/assets/db8c7725-84dd-48df-9a3f-65605d3ab444
https://github.com/user-attachments/assets/40030308-37db-492d-a196-f830e8a39f3c
https://github.com/user-attachments/assets/0985246b-74f8-4d1c-82d8-053414c44aec
https://github.com/user-attachments/assets/0985246b-74f8-4d1c-82d8-053414c44aec

@@ -58,7 +58,7 @@ There a total of six procedures, 3 for shapes, 3 for text:
* `draw_shape_normalized_space`
* `draw_shape_view_space`
* `draw_shape`
* `draw_text_normalized_space
* `draw_text_normalized_space`
* `draw_text_view_space`
* `draw_text`

@@ -1,43 +1,42 @@
# Guide: Architecture
Overview on the state of package design and codepath layout.
Overview of the package design and code-path layout.
---
The purpose of this library to really allieviate four issues with one encapsulating package:
The purpose of this library is to alleviate four key challenges with one encapsulating package:
* font parsing
* text codepoint shaping
* glyph shape triangulation
* glyph draw-list generation
* Font parsing
* Text codepoint shaping
* Glyph shape triangulation
* Glyph draw-list generation
Shaping text, getting metrics for the glyphs, triangulating glyphs, and anti-aliasing their render are expensive todo per frame. So anything related to that compute that may be cached, will be.
Shaping text, getting metrics for glyphs, triangulating glyphs, and anti-aliasing their render are expensive operations to perform every frame. Therefore, any computation that can be cached will be.
There are two cache types used:
* shape cache (`Shaped_Text_Cache.state`)
* atlas region cache (`Atlas_Region.state`)
* Shape cache (`Shaped_Text_Cache.state`)
* Atlas region cache (`Atlas_Region.state`)
The shape cache stores all the data for a piece of text that a draw call uses and that does not depend on a specific position or scale (this data is faster to look up than to recompute per draw call).
The atlas region cache tracks what slots have glyphs rendered to the texture atlas. This essentially is caching of triangulation and super-sampling compute.
The atlas region cache tracks what slots have glyphs rendered to the texture atlas. This essentially caches triangulation and super-sampling computations.
All caching uses the [LRU.odin](../vefontcache/LRU.odin)
All caching uses [LRU.odin](../vefontcache/LRU.odin)
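To illustrate what an LRU indexing cache does, here is a minimal C sketch. It is not a port of LRU.odin. It assumes a fixed-capacity pool of nodes linked by 32-bit indices, with 64-bit keys (for example a shape hash or an atlas LRU code) mapping to 32-bit values (for example a slot index). The names `LRU_Cache`, `lru_get`, and `lru_put` are hypothetical. Lookup is a linear scan to keep the sketch short; a real cache would pair the list with a hash table.

```c
#include <assert.h>
#include <stdint.h>

#define LRU_CAPACITY 4
#define LRU_NIL UINT32_MAX

/* Hypothetical node: key/value plus 32-bit links into the node pool. */
typedef struct {
    uint64_t key;
    uint32_t value;
    uint32_t prev, next;   /* LRU_NIL when there is no neighbour */
} LRU_Node;

typedef struct {
    LRU_Node nodes[LRU_CAPACITY];
    uint32_t count;
    uint32_t head;   /* most recently used */
    uint32_t tail;   /* least recently used: next eviction candidate */
} LRU_Cache;

static void lru_init(LRU_Cache *c) { c->count = 0; c->head = c->tail = LRU_NIL; }

static void lru_unlink(LRU_Cache *c, uint32_t i) {
    LRU_Node *n = &c->nodes[i];
    if (n->prev != LRU_NIL) c->nodes[n->prev].next = n->next; else c->head = n->next;
    if (n->next != LRU_NIL) c->nodes[n->next].prev = n->prev; else c->tail = n->prev;
    n->prev = n->next = LRU_NIL;
}

static void lru_push_front(LRU_Cache *c, uint32_t i) {
    LRU_Node *n = &c->nodes[i];
    n->prev = LRU_NIL;
    n->next = c->head;
    if (c->head != LRU_NIL) c->nodes[c->head].prev = i;
    c->head = i;
    if (c->tail == LRU_NIL) c->tail = i;
}

/* Linear search for brevity only. */
static uint32_t lru_find(const LRU_Cache *c, uint64_t key) {
    for (uint32_t i = c->head; i != LRU_NIL; i = c->nodes[i].next)
        if (c->nodes[i].key == key) return i;
    return LRU_NIL;
}

/* On a hit, marks the entry most recently used, writes *out, and returns 1. */
static int lru_get(LRU_Cache *c, uint64_t key, uint32_t *out) {
    uint32_t i = lru_find(c, key);
    if (i == LRU_NIL) return 0;
    lru_unlink(c, i);
    lru_push_front(c, i);
    *out = c->nodes[i].value;
    return 1;
}

/* Inserts or updates a key. When full, evicts the least recently used entry,
   reports its key through *evicted (if non-NULL), and returns 1. */
static int lru_put(LRU_Cache *c, uint64_t key, uint32_t value, uint64_t *evicted) {
    uint32_t i = lru_find(c, key);
    int did_evict = 0;
    if (i != LRU_NIL) {
        lru_unlink(c, i);
    } else if (c->count < LRU_CAPACITY) {
        i = c->count++;
    } else {
        assert(c->tail != LRU_NIL);
        i = c->tail;
        if (evicted) *evicted = c->nodes[i].key;
        lru_unlink(c, i);
        did_evict = 1;
    }
    c->nodes[i].key = key;
    c->nodes[i].value = value;
    lru_push_front(c, i);
    return did_evict;
}
```

Evicting from the tail is what allows atlas slots to be reused: when a region is full, the glyph that has gone unused the longest gives up its slot.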
## Codepaths
## Code Paths
### Lifetime
The library lifetime is pretty straightfoward, you have a startup to do that should just be called sometime in your usual app start.s. From there you may either choose to manually shut it down or let the OS clean it up.
The library lifetime is straightforward: you have a startup procedure that should be called during your usual app initialization. From there you may either choose to manually shut it down or let the OS clean it up.
If hot-reload is desired, you just need to call hot_reload with the context's backing allocator to refresh the procedure references. After the dll has been reloaded those should be the only aspects that have been scrambled.
If hot-reload is desired, just call `hot_reload` with the context's backing allocator to refresh the procedure references. After the DLL has been reloaded, these references should be the only state that has been invalidated.
Usually when hot-reloading the library for tuning or major changes, you'd also want to clear the caches. Simply call `clear_atlas_region_caches` & `clear_shape_cache` right after.
Usually when hot-reloading the library for tuning or major changes, you'd also want to clear the caches. So just call `clear_atlas_region_caches` & `clear_shape_cache` right after.
Ideally there should be zero dynamic allocation on a per-frame basis so long as the reserves for the dynamic containers are never exceeded. Its alright if they do as their memory locality is so large their distance in the pages to load into cpu cache won't matter, just needs to be a low incidence.
Ideally, there should be zero dynamic allocation on a per-frame basis as long as the reserves for the dynamic containers are never exceeded. It's acceptable if they occasionally are: the containers' memory is large enough that the extra page distance when loading it into CPU cache won't matter, as long as this happens rarely.
### Shaping Pass
If the user is using the library's cache, then at some point `shaper_shape_text_cached` which handles the hasing and lookup. So long as a shape is found it will not enter uncached codepath. By default this library uses `shaper_shape_harfbuzz` as the `shape_text_uncached` procedure.
If using the library's cache, `shaper_shape_text_cached` handles the hashing and lookup. As long as a shape is found, it will not enter the uncached code path. By default, this library uses `shaper_shape_harfbuzz` as the `shape_text_uncached` procedure.
Shapes are cached using the following parameters to hash a key:
@@ -77,7 +76,7 @@ Shaped_Text :: struct #packed {
}
```
What is actually the result of the shaping process is the arrays of glyphs and their positions for the the shape or most historically known as: *Slug*, of prepared text for printing. The end position of where the user's "cursor" would be is also recorded which provided the end position of the shape. The size of the shape is also resolved here, which if using px_scalar must be downscaled. `measure_shape_size` does the downscaling for the user.
The result of the shaping process is the set of glyphs and their positions for the shape, historically resembling what is known as a *Slug* of prepared text for printing. The end position of the user's "cursor" is also recorded, which provides the end position of the shape. The size of the shape is also resolved here; if `px_scalar` is used, it must be downscaled. `measure_shape_size` does the downscaling for the user.
`visible` tracks which glyphs are actually relevant to the draw-list pass, which avoids a conditional jump during draw-list generation. During draw-list generation, glyphs and positions are accessed through `visible`'s relative index.
@@ -89,7 +88,7 @@ As stated under the main heading of this guide, the the following are within sha
* region_kind
* bounds
They're arrays are the same length as `visible`, so indexing those will not need to use visibile's relative index.
These arrays are the same length as `visible`, so indexing them does not require `visible`'s relative index.
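The two indexing schemes can be illustrated with a small C sketch. It assumes a hypothetical `Shape_View` with per-glyph `glyph` and `position` arrays, a `visible` array of indices into them, and a per-visible-glyph `atlas_lru_code` array. The field types are guesses, not the library's `Shaped_Text` layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { float x, y; } Vec2;

/* Hypothetical view over a shaped text's arrays. */
typedef struct {
    const uint16_t *glyph;          /* one entry per shaped glyph */
    const Vec2     *position;       /* one entry per shaped glyph */
    const uint32_t *visible;        /* indices into glyph/position of drawable glyphs */
    size_t          visible_count;
    const uint64_t *atlas_lru_code; /* one entry per visible glyph */
} Shape_View;

typedef struct { uint16_t glyph; Vec2 position; uint64_t lru_code; } Visible_Glyph;

/* Gathers the data for the i-th visible glyph. */
Visible_Glyph get_visible_glyph(const Shape_View *s, size_t i)
{
    assert(i < s->visible_count);
    uint32_t src = s->visible[i];   /* visible-relative index -> shape index */
    Visible_Glyph g = {
        s->glyph[src],              /* per-glyph arrays go through visible[] */
        s->position[src],
        s->atlas_lru_code[i],       /* per-visible arrays are indexed directly */
    };
    return g;
}
```

Because hidden glyphs (for example whitespace) are filtered into `visible` once at shaping time, the draw-list loop can run over `0..visible_count` without a branch per glyph.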
`shaper_shape_text_latin` does naive shaping by utilizing the codepoint's kern_advance and detecting newlines.
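Naive shaping of this kind can be sketched in C as follows. This is an illustration only, not the library's code. It assumes each codepoint's `kern_advance` has already been resolved into a parallel array, and it assumes a Y-up coordinate system (the library currently uses a bottom-left origin), so each newline moves the pen down by `line_height`. The function name and signature are hypothetical. The shape size follows the rule given for the harfbuzz path: x is the maximum line width, and y is the line height multiplied by the line count.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y; } Vec2;

/* Positions every non-newline codepoint and returns how many glyphs were written.
   out_positions must have room for `count` entries. */
size_t shape_latin_naive(const unsigned *codepoints, const float *kern_advance,
                         size_t count, float line_height,
                         Vec2 *out_positions, Vec2 *out_end, Vec2 *out_size)
{
    assert(codepoints && kern_advance && out_positions && out_end && out_size);
    Vec2   pen        = {0.0f, 0.0f};
    float  max_width  = 0.0f;
    size_t line_count = 1;
    size_t glyphs     = 0;
    for (size_t i = 0; i < count; ++i) {
        if (codepoints[i] == '\n') {
            if (pen.x > max_width) max_width = pen.x;
            pen.x = 0.0f;                 /* carriage return */
            pen.y -= line_height;         /* Y-up: the next line sits lower */
            line_count++;
            continue;
        }
        out_positions[glyphs++] = pen;    /* glyph origin at the current pen */
        pen.x += kern_advance[i];         /* advance (kerning already folded in) */
    }
    if (pen.x > max_width) max_width = pen.x;
    *out_end    = pen;                    /* where the user's "cursor" ends up */
    out_size->x = max_width;
    out_size->y = line_height * (float)line_count;
    return glyphs;
}
```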
`shaper_shape_harfbuzz` is an actual shaping *engine*. Here is the general idea of how the library utilizes it for shaping:
@@ -98,11 +97,11 @@ They're arrays are the same length as `visible`, so indexing those will not need
2. Determine the line height
3. Go through the codepoints: (for each)
1. Determine the codepoint's script
2. If the script is netural (Uknown, Inherited, or of Common type), or the script has not changed, or this is the first codepoint of the shape we can add the codepoint to the buffer.
3. Otherwise we may have to start a shaping run if we do encounter a significant script change. After, we can add the codepoint to the post-run-cleared hb_buffer.
2. If the script is neutral (Unknown, Inherited, or Common), the script has not changed, or this is the first codepoint of the shape, we can add the codepoint to the buffer.
3. Otherwise, a significant script change has been encountered, so we start a shaping run. Afterward, we add the codepoint to the `hb_buffer`, which is cleared after each run.
4. This continues until all codepoints have been processed.
4. We do a final shape run after iterating to make sure all codepoints have been processed.
5. Set the size of the shape: x is max line width, y is line height multiplied by the line count.
5. Set the size of the shape: X is max line width, Y is line height multiplied by the line count.
6. Resolve the atlas_lru_code, region_kind, and bounds for all visible glyphs
7. Store the font and px_size information.
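The run-splitting logic in step 3 can be sketched in C. The sketch assumes per-codepoint scripts have already been determined. The `Script` enum is a hypothetical stand-in for HarfBuzz's `hb_script_t`, where `HB_SCRIPT_COMMON`, `HB_SCRIPT_INHERITED`, and `HB_SCRIPT_UNKNOWN` are the neutral values. Each emitted `Script_Run` marks the point where the library would shape the buffered codepoints with HarfBuzz; the sketch only records the boundaries. One further assumption goes beyond the steps above: a run that has only seen neutral codepoints adopts the first strong script it meets, rather than splitting.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { SCRIPT_COMMON, SCRIPT_INHERITED, SCRIPT_UNKNOWN,
               SCRIPT_LATIN, SCRIPT_ARABIC } Script;

typedef struct { size_t start, end; Script script; } Script_Run;   /* [start, end) */

static int script_is_neutral(Script s)
{
    return s == SCRIPT_COMMON || s == SCRIPT_INHERITED || s == SCRIPT_UNKNOWN;
}

/* Splits codepoints into runs that can each be shaped in one pass.
   out_runs must have room for `count` runs. Returns the number of runs. */
size_t segment_script_runs(const Script *scripts, size_t count, Script_Run *out_runs)
{
    if (count == 0) return 0;
    assert(scripts != NULL && out_runs != NULL);
    size_t run_count = 0;
    Script_Run run = { 0, 0, scripts[0] };     /* first codepoint always joins */
    for (size_t i = 1; i < count; ++i) {
        Script s = scripts[i];
        if (script_is_neutral(s) || s == run.script) continue;        /* same run */
        if (script_is_neutral(run.script)) { run.script = s; continue; } /* assumption */
        run.end = i;                           /* significant change: shape this run */
        out_runs[run_count++] = run;
        run.start  = i;                        /* buffer restarts with this codepoint */
        run.script = s;
    }
    run.end = count;
    out_runs[run_count++] = run;               /* final shaping run */
    return run_count;
}
```

Letting neutral codepoints (spaces, punctuation, combining marks) join the surrounding run keeps the number of shaping calls low and avoids breaking a word away from its punctuation.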
@@ -134,15 +133,15 @@ There are other shapers out there:
### Draw List Generation
All interface draw text procedures will ultimately call: `generate_shape_draw_list`. If the draw procedure is given text, it will call `shaper_shape_text_cached` the text immeidately before calling it.
All interface draw text procedures will ultimately call `generate_shape_draw_list`. If the draw procedure is given text, it shapes that text with `shaper_shape_text_cached` immediately before the call.
Its implementation uses a batched-pipeline approach whose goal is to populate three arrays that behave as queues:
* oversized: For drawing oversized glyphs
* to_cache: For glyphs that need triangulation/rendering to glyph buffer then blitting to atlas.
* to_cache: For glyphs that need triangulation & rendering to glyph buffer then blitting to atlas.
* cached: For glyphs that are already cached in the atlas and only need to be blitted to the render target.
And then sent those off to `batch_generate_glyphs_draw_list` for futher actual generaiton to be done. The size of a batch is determined by the capacity of the glyph_buffer's `batch_cache`. This can be set in `glyph_draw_params` for startup.
These queues are then sent to `batch_generate_glyphs_draw_list`, which does the actual draw-list generation. The size of a batch is determined by the capacity of the glyph_buffer's `batch_cache`, which can be set in `glyph_draw_params` at startup.
`glyph_buffer.glyph_pack` is used by both `generate_shape_draw_list` and `batch_generate_glyphs_draw_list` to store various computed data for the glyphs in a struct-of-arrays (SoA) layout.
@@ -151,12 +150,12 @@ generate_shape_draw_list outline:
1. Prepare glyph_pack, oversized, to_cache, cached, and reset the batch cache
* `glyph_pack` is resized to the length of `shape.visible`
* The other arrays have their reserves set to that length as well (appends to them will not bounds-check capacity)
2. Iterate though the shape.visible and resolve glyph_pack's positions.
2. Iterate through `shape.visible` and resolve glyph_pack's positions.
3. Iterate through `shape.visible` again for final region resolution and to sort the glyphs into their appropriate queues.
1. If the glyph's assigned region is `.E`, it's oversized. The `oversample` used for rendering it to the render target will be either 2x or 1x depending on how large it is.
2. The remaining glyphs are checked to see if their assigned region already has the glyph cached.
1. If it does, it's just appended to `cached` and marked as seen in the `batch_cache`.
2. If its doesn't then a slot is reseved for within the atlas's region and the glyph is appended to `to_cache`.
2. If it doesn't, a slot is reserved for it within the atlas region and the glyph is appended to `to_cache`.
3. In either case, the atlas_region_bbox is computed.
3. After a batch has been resolved, `batch_generate_glyphs_draw_list` is called.
4. If there is a partially filled batch (the usual case), `batch_generate_glyphs_draw_list` will be called for it as well.
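The queue segregation and batch flushing in this outline can be sketched in C. The sketch rests on several assumptions:

* `already_cached` stands in for the atlas region lookup.
* Only `to_cache` and `cached` glyphs consume batch capacity.
* Oversized glyphs ride along with whichever batch is open.
* The `batch_cache`'s de-duplication of repeated glyphs is omitted.

`flush_batch` is a hypothetical stand-in for `batch_generate_glyphs_draw_list` that only records how many glyphs each queue held.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { REGION_A, REGION_B, REGION_C, REGION_D, REGION_E } Region_Kind;

typedef struct { size_t oversized, to_cache, cached; } Batch_Counts;

/* Stand-in for batch_generate_glyphs_draw_list: records queue sizes, then resets them. */
static void flush_batch(Batch_Counts *current, Batch_Counts *out, size_t *batch_count)
{
    out[(*batch_count)++] = *current;
    *current = (Batch_Counts){0, 0, 0};
}

/* Sorts glyphs into the three queues, flushing whenever the batch is full.
   out_batches must have room for `count` entries. Returns the number of batches. */
size_t segregate_glyphs(const Region_Kind *regions, const unsigned char *already_cached,
                        size_t count, size_t batch_capacity, Batch_Counts *out_batches)
{
    assert(batch_capacity > 0);
    Batch_Counts cur = {0, 0, 0};
    size_t batches = 0, batch_used = 0;
    for (size_t i = 0; i < count; ++i) {
        if (regions[i] == REGION_E) { cur.oversized++; continue; }  /* oversized queue */
        if (batch_used == batch_capacity) {                           /* batch resolved */
            flush_batch(&cur, out_batches, &batches);
            batch_used = 0;
        }
        if (already_cached[i]) cur.cached++;    /* only needs a blit from the atlas */
        else                   cur.to_cache++;  /* needs render + blit into a slot */
        batch_used++;
    }
    if (cur.oversized || cur.to_cache || cur.cached)  /* partially filled batch */
        flush_batch(&cur, out_batches, &batches);
    return batches;
}
```

Working in fixed-size batches bounds how many atlas slots a single flush can claim, which is presumably why the batch size is tied to the `batch_cache` capacity.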
@@ -171,7 +170,7 @@ The batch is organized into three major stages:
3. blit-from-atlas to render target draw list generation (`to_cache` & `cached`)
Glyph transform & draw quads compute does an iteration for each of the 3 arrays.
Nearly all the math for all three is there *except* for `to_cache`, which does its blitting compute in its glyph_buffer draw-list gen pass.
Nearly all the math for all three is done there *except* for `to_cache`, which does its blitting compute in its glyph_buffer draw-list gen pass.
glyph_buffer draw list generation paths for `oversized` and `to_cache` are unique to each.
@@ -196,9 +195,9 @@ For `to_cached`:
4. free glyph shapes
5. Do blits from atlas to draw list.
`cached` only needsto blit from the atlas to the render target.
`cached` only needs to blit from the atlas to the render target.
`generate_glyph_pass_draw_list`: sets up the draw call for glyph to the glyph buffer. Currently it also handles triangulation as well. For now the shape triangulation is rudimentary and uses triangle fanning. Eventually it would be nice to offer alternatve modes that can be specified on a per-font basis.
`generate_glyph_pass_draw_list`: Sets up the draw call for rendering a glyph to the glyph buffer. It currently handles triangulation as well. For now, shape triangulation is rudimentary and uses triangle fanning. Eventually it would be nice to offer alternative modes that can be specified on a per-font basis.
`flush_glyph_buffer_draw_list`: Merges the contents of the glyph buffer's draw lists into the library's general draw_list, then clears the buffer's draw lists.
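Triangle fanning can be shown with a minimal C sketch that emits a fan from the first vertex of a contour already appended to a vertex buffer. It is a generic illustration, not the library's `fill_path_via_fan_triangulation`. The function name and signature are hypothetical. On its own, a plain fan is a correct fill only for contours that are convex (more precisely, star-shaped around the pivot vertex). How the library handles general glyph contours isn't covered here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Emits triangles (base, base+i, base+i+1) for i = 1 .. vertex_count-2.
   out_indices must hold 3 * (vertex_count - 2) entries.
   Returns the number of indices written (0 for degenerate paths). */
size_t fan_triangulate(uint32_t base_vertex, size_t vertex_count, uint32_t *out_indices)
{
    if (vertex_count < 3) return 0;
    assert(out_indices != NULL);
    size_t n = 0;
    for (size_t i = 1; i + 1 < vertex_count; ++i) {
        out_indices[n++] = base_vertex;                    /* shared pivot */
        out_indices[n++] = base_vertex + (uint32_t)i;
        out_indices[n++] = base_vertex + (uint32_t)(i + 1);
    }
    return n;
}
```

A contour of n points always yields n - 2 triangles, which makes index-buffer sizing trivial. This is part of why fanning is a common first choice before moving to methods such as ear clipping.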

@@ -1,52 +1,68 @@
# Backend Guide
The end-user needs adapt this library for hookup into their own codebase. As an example they may see the [examples](../examples/) and [backend](../backend/) for working code of what this guide will go over.
The end-user needs to adapt this library to hook into their own codebase. For reference, they can check the [examples](../examples/) and [backend](../backend/) directories for working code that demonstrates what this guide covers.
When rendering text, the two products the user has to deal with: The text to draw and their "layering". Similar to UIs text should be drawn in layer batches, where each layer can represent a pass on some arbitrary set of distictions between the other layers.
When rendering text, users need to handle two main aspects: the text to draw and its "layering". Similar to UIs, text should be drawn in layer batches, where each layer can represent a pass with arbitrary distinctions from other layers.
The following are generally needed:
The following components are required:
* Vertex and Index Buffers for glyph meshes
* Glyph shader for rendering the glyph to the glyph buffer
* Atlas shader for blitting the upscaled glyph quads from the glyph buffer to an atlas region slot downsampled.
* Glyph shader for rendering glyphs to the glyph buffer
* Atlas shader for blitting upscaled glyph quads from the glyph buffer to an atlas region slot (downsampled)
* "Screen or Target" shader for blitting glyph quads from the atlas to a render target or swapchain
* The glyph, atlas, and some "target" image buffers
* The glyph, atlas, and target image buffers
Currently the library doesn't support sub-pixel AA so we're just rendering to R8 images.
Currently, the library doesn't support sub-pixel AA, so we're only rendering to R8 images.
## There are four passes that need to be handled when rendering a draw list
## Rendering Passes
There are four passes that need to be handled when rendering a draw list:
* Glyph: Rendering a glyph mesh to the glyph buffer
* Atlas: Blitting a glyph quad from the glyph buffer to an atlas slot
* Target: Blit from the atlas image to the target image
* Target_Uncached: Blit from the glyph buffer image to the target image
* Target: Blitting from the atlas image to the target image
* Target_Uncached: Blitting from the glyph buffer image to the target image
The Target & Target_Uncached passes can technically be handled in the same case. The user just needs to swap out using the atlas image with the glyph buffer image. This is how the backend_soko.odin's `render_text_layer` has those passes setup.
The Target & Target_Uncached passes can technically be handled in the same case. The user just needs to swap between using the atlas image and the glyph buffer image. This is how the backend_soko.odin's `render_text_layer` has these passes set up.
## The vertex buffer will have the following alyout for all passes
## Vertex Buffer Layout
`[2]f32` for positions
`[2]f32` for texture coords (Offset is naturally `[2]f32`)
With a total stride of `[4]f32`
The vertex buffer has the following layout for all passes:
* `[2]f32` for positions
* `[2]f32` for texture coords (Offset is naturally `[2]f32`)
* Total stride: `[4]f32`
---
The index buffer is just a u32 stream.
The index buffer is a simple u32 stream.
For how a quad mesh is laid out see `blit_quad` in [draw.odin](../vefontcache/draw.odin)
For quad mesh layout details, see `blit_quad` in [draw.odin](../vefontcache/draw.odin).
For how glyph shape triangulation meshes, the library currently only uses a triangle fanning technique so `fill_path_via_fan_triangulation` within [draw.odin](../vefontcache/draw.odin) is where that is being done. Eventually the libary will also support other modes on a per-font basis.
For glyph shape triangulation meshes, the library currently only uses a triangle fanning technique, implemented in `fill_path_via_fan_triangulation` within [draw.odin](../vefontcache/draw.odin). Eventually, the library will support other modes on a per-font basis.
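The vertex layout above can be expressed as a C struct, with compile-time checks for the `[4]f32` stride and the `[2]f32` texture-coordinate offset. The quad builder shows how four vertices and six `u32` indices might be appended to a mesh. The corner order and winding here are assumptions; the library's actual quad layout is whatever `blit_quad` in draw.odin produces. `Vertex`, `Mesh`, and `append_quad` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    float pos[2];   /* [2]f32 position */
    float uv[2];    /* [2]f32 texture coordinate */
} Vertex;

_Static_assert(sizeof(Vertex) == 4 * sizeof(float), "stride must be [4]f32");
_Static_assert(offsetof(Vertex, uv) == 2 * sizeof(float), "uv follows position");

typedef struct {
    Vertex   *vertices; size_t vertex_count, vertex_cap;
    uint32_t *indices;  size_t index_count,  index_cap;   /* u32 index stream */
} Mesh;

/* Appends an axis-aligned quad as 4 vertices and 2 triangles (6 indices).
   Corner names assume Y-up (bottom-left origin); winding is illustrative. */
void append_quad(Mesh *m, float x0, float y0, float x1, float y1,
                 float u0, float v0, float u1, float v1)
{
    assert(m->vertex_count + 4 <= m->vertex_cap && m->index_count + 6 <= m->index_cap);
    uint32_t base = (uint32_t)m->vertex_count;
    Vertex quad[4] = {
        {{x0, y0}, {u0, v0}},   /* bottom-left  */
        {{x1, y0}, {u1, v0}},   /* bottom-right */
        {{x1, y1}, {u1, v1}},   /* top-right    */
        {{x0, y1}, {u0, v1}},   /* top-left     */
    };
    for (int i = 0; i < 4; ++i) m->vertices[m->vertex_count++] = quad[i];
    const uint32_t tri[6] = {0, 1, 2, 0, 2, 3};
    for (int i = 0; i < 6; ++i) m->indices[m->index_count++] = base + tri[i];
}
```

Because indices are offset by the current vertex count, many quads and fan meshes can share one vertex buffer and one index buffer. This matches the single vertex/index stream described above.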
## Keep in mind GLSL vs HLSL UV (texture) coordinate convention
## UV Coordinate Conventions (GLSL vs HLSL)
The UV coordinates used DirectX, Metal, and Vulkan all consider the top-left corner (0, 0), Where the Y axis increases downwards (traditional screenspace). This library follows the convention of (0, 0) being at the bottom-left (Y goes up) which is what OpenGL uses.
DirectX, Metal, and Vulkan consider the top-left corner as (0, 0), where the Y axis increases downward (traditional screenspace). This library follows OpenGL's convention, where (0, 0) is at the bottom-left (Y goes up).
In the shader the UV just has to be adjusted accordingly:
Adjust the UV coordinates in your shader accordingly:
```c
#if ! OpenGL
uv = vec2( v_texture.x, 1.0 - v_texture.y );
#if !OpenGL
uv = vec2(v_texture.x, 1.0 - v_texture.y);
#else
uv = vec2( v_texture.x, v_texture.y );
uv = vec2(v_texture.x, v_texture.y);
#endif
```
Eventually, the library will support both conventions as a compile-time conditional.
## Retrieving & Processing the layer
`get_draw_list_layer` provides the layer's vertex, index, and draw call slices. Unless the default is overridden, it will call `optimize_draw_list` before returning the slices (profile to see what's better for your use case).
Once those are retrieved, call `flush_draw_list_layer` to update the layer offsets tracked by the library's `Context`.
The vertex and index slices just need to be appended to your backend's vertex and index buffers.
Iterate over the draw calls with a switch statement on the pass types described above; within each case, construct and enqueue the corresponding pass.
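The switch over pass types can be sketched in C. The `Pass`, `Image`, `Draw_Call`, and `Enqueued_Pass` types are hypothetical; the library's draw call carries its own fields, and a real backend would bind pipelines and textures instead of returning a plan. The sketch shows the point made earlier about sharing a case: Target and Target_Uncached differ only in whether they sample the atlas or the glyph buffer.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { PASS_GLYPH, PASS_ATLAS, PASS_TARGET, PASS_TARGET_UNCACHED } Pass;
typedef enum { IMG_NONE, IMG_GLYPH_BUFFER, IMG_ATLAS, IMG_TARGET } Image;

/* Hypothetical draw call: a pass type plus a range in the layer's index slice. */
typedef struct { Pass pass; uint32_t start_index, end_index; } Draw_Call;

/* What a backend would enqueue: which image to sample, which to render into. */
typedef struct { Image source, dest; uint32_t first_index, index_count; } Enqueued_Pass;

Enqueued_Pass plan_draw_call(const Draw_Call *dc)
{
    assert(dc->end_index >= dc->start_index);
    Enqueued_Pass p = { IMG_NONE, IMG_NONE, dc->start_index,
                        dc->end_index - dc->start_index };
    switch (dc->pass) {
    case PASS_GLYPH:             /* glyph mesh -> glyph buffer (glyph shader) */
        p.dest = IMG_GLYPH_BUFFER;
        break;
    case PASS_ATLAS:             /* glyph buffer -> atlas slot, downsampled (atlas shader) */
        p.source = IMG_GLYPH_BUFFER;
        p.dest   = IMG_ATLAS;
        break;
    case PASS_TARGET:            /* one case, two possible source images */
    case PASS_TARGET_UNCACHED:
        p.source = (dc->pass == PASS_TARGET) ? IMG_ATLAS : IMG_GLYPH_BUFFER;
        p.dest   = IMG_TARGET;
        break;
    }
    return p;
}
```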
---