docs(spec): Add profiling system design spec (Phase 1 Diagnostics + Phase 2 Tracy)

2026-05-15 00:03:57 -04:00
parent 823a10b60d
commit 5fa933728e
@@ -0,0 +1,274 @@
# Performance Profiling System Design Spec
**Date:** 2026-05-15
**Author:** Tier 2 Tech Lead
**Status:** Draft
## Overview
Implement a layered performance profiling system for Manual Slop's `./src` codebase:
- **Phase 1:** Enhanced Diagnostics Panel — sorted component timings, expandable detail rows
- **Phase 2:** Tracy integration via `pytracy` — real-time flamegraph streaming to Tracy GUI
- **Phase 3 (future):** Custom `sys.settrace` sampler + imgui flamegraph as fallback
## Goals
- Surface hot code paths with zero friction (always-on real-time highlighting)
- Enable deep dive profiling on demand via industry-standard tools
- Preserve existing PerformanceMonitor infrastructure
- Self-contained — minimal external dependencies beyond Tracy
## Phase 1: Enhanced Diagnostics Panel
### What's Already There
The Diagnostics Panel (`_render_diagnostics_panel` in `gui_2.py`) already displays:
- FPS, Frame Time, CPU %, Input Lag with live values
- Optional per-metric graphs (toggle checkbox)
- Detailed Component Timings table: Avg, Count, Max, Min per component
- RED highlighting for components with avg > 10ms
- Performance Graphs section with rolling history plots
- Diagnostic Log table
### What's Missing (for D)
1. **No sorting**: components are iterated in dict insertion order, not worst-first
2. **No expandability**: a row cannot be clicked to reveal the full stats breakdown
### Implementation
**Sort by worst avg:**
```python
sorted_components = sorted(
    [(k, v) for k, v in metrics.items() if k.startswith("time_") and k.endswith("_ms")],
    key=lambda x: metrics.get(f"{x[0]}_avg", x[1]),
    reverse=True,
)
```
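To make the ordering rule checkable outside the GUI, the sort can be pulled into a small pure function. The sketch below assumes the metrics dict uses keys of the form `time_<name>_ms` with an optional `time_<name>_ms_avg` companion, as the snippet above implies. `sort_component_timings` and the sample values are hypothetical.

```python
from typing import Dict, List, Tuple


def sort_component_timings(metrics: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (key, last_value) pairs ordered by average time, worst first."""
    components = [
        (key, value)
        for key, value in metrics.items()
        if key.startswith("time_") and key.endswith("_ms")
    ]
    # Fall back to the last sample when no "<key>_avg" entry exists.
    return sorted(
        components,
        key=lambda item: metrics.get(f"{item[0]}_avg", item[1]),
        reverse=True,
    )


# Made-up sample data: one component has no average recorded yet.
sample_metrics = {
    "time_render_ms": 2.0,
    "time_render_ms_avg": 12.5,
    "time_input_ms": 4.0,
    "time_input_ms_avg": 3.0,
    "time_network_ms": 7.0,
    "fps": 60.0,
}
ordered = sort_component_timings(sample_metrics)
```

Keeping the sort free of imgui calls means it can be unit-tested without opening a window.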
**Expandable rows:**
```python
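for key, val in sorted_components:
    # Strip the "time_" prefix and "_ms" suffix to get a display name.
    comp_name = key[len("time_"):-len("_ms")]
    # Render collapsed row with summary; tree_node returns True when expanded.
    if imgui.tree_node(comp_name):
        # Show: last_value, avg, count, max, min, peak frames, stddev.
        # stddev, peak_val and peak_frame are assumed to come from the
        # per-component stats; they are not among the metrics listed above.
        imgui.text(f"Last: {val:.2f}ms")
        imgui.text(f"StdDev: {stddev:.2f}ms")
        imgui.text(f"Peak frame: {peak_val:.2f}ms at frame {peak_frame}")
        imgui.tree_pop()  # only valid when the node is expanded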
```
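The expanded row needs a standard deviation and a peak frame, which the existing Avg/Count/Max/Min columns do not provide. One possible way to track them in O(1) per sample is Welford's running-variance update. In the sketch below `ComponentStats` is a hypothetical name and the samples are made up; it keeps lifetime statistics rather than a rolling window.

```python
import math
from dataclasses import dataclass


@dataclass
class ComponentStats:
    """Running statistics for one component, updated once per frame."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0          # sum of squared deviations from the mean
    last: float = 0.0
    peak_val: float = 0.0
    peak_frame: int = -1

    def add(self, frame: int, ms: float) -> None:
        self.count += 1
        self.last = ms
        # Welford's update: no sample history is stored.
        delta = ms - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (ms - self.mean)
        if ms > self.peak_val:
            self.peak_val = ms
            self.peak_frame = frame

    @property
    def stddev(self) -> float:
        """Population standard deviation of all samples seen so far."""
        return math.sqrt(self.m2 / self.count) if self.count else 0.0


stats = ComponentStats()
for frame, ms in enumerate([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]):
    stats.add(frame, ms)
```

If the panel should reflect only recent frames, the same fields could be recomputed from the rolling history already used for the graphs.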
### Files Affected
| File | Change |
|------|--------|
| `src/gui_2.py` | `_render_diagnostics_panel` — sort + expandable rows |
### Success Criteria
1. Component timings table sorted by worst avg descending
2. Click a row → expands to show last, stddev, peak info
3. Existing red highlighting (>10ms) preserved
4. No regression to other diagnostics panel features
## Phase 2: Tracy Integration
### Tracy Overview
Tracy is a real-time, nanosecond-resolution, frame-based profiler aimed at games and other high-performance applications. It streams profiling data from the instrumented process to a dedicated GUI over a TCP connection. Features:
- Live CPU profiling with call stacks
- Memory profiling (allocations, leaks)
- Lock contention visualization
- Frame capture (well suited to the app's imgui render loop)
- Very low overhead (~1-2%)
### pytracy Binding
`pytracy` is a Python binding on PyPI:
```bash
uv add pytracy
```
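Basic usage (call names are provisional: `begin`/`end`, `ctx_zone`, and `enter`/`leave` all appear in this spec and must be reconciled with the installed `pytracy` API):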
```python
import pytracy
pytracy.setproctitle("manual_slop")
# Zone annotations (instrumentation)
def long_running_function():
    pytracy.begin("my_zone")
    # ... work ...
    pytracy.end("my_zone")
```
### Integration Points in Manual Slop
**1. Process naming:**
```python
# In App.__init__ or gui_2.py:
pytracy.setproctitle("manual_slop")
```
**2. Zone instrumentation for render components:**
```python
# In each _render_* method:
with pytracy.ctx_zone(name="_render_discussion_panel"):
    ...  # render logic
```
**3. Memory tracking (optional):**
```python
pytracy.allocator_hook_enable()
```
**4. Connection handling:**
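In Tracy's model the instrumented application listens for connections (port 8086 by default) and the Tracy GUI connects to it, so the GUI can be attached after manual_slop has started. Whether `pytracy` exposes the port as a setting still needs to be confirmed.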
### Tracy GUI
- Tracy has its own cross-platform UI (Windows/macOS/Linux)
- Download from https://github.com/wolfpld/tracy/releases or build from source
- Once connected, you get live flamegraphs, frame time charts, memory graphs
- Can save trace files for later analysis
### Graceful Degradation
If Tracy is not running or `pytracy` import fails:
- App continues normally — profiling is opt-in
- Existing PerformanceMonitor keeps working
- Log a warning on startup: "Tracy not connected — profiling unavailable"
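The fallback behaviour listed above can be sketched as an optional import that logs and returns `None` on failure. This is a minimal illustration: `load_optional_profiler` and the logger name are hypothetical, and a successful import only shows that the binding loaded, not that a Tracy GUI is attached.

```python
import importlib
import logging
from types import ModuleType
from typing import Optional

log = logging.getLogger("manual_slop.profiling")  # hypothetical logger name


def load_optional_profiler(module_name: str = "pytracy") -> Optional[ModuleType]:
    """Import the profiler binding if possible; otherwise warn and return None."""
    try:
        return importlib.import_module(module_name)
    except Exception as exc:  # ImportError, or a native-library load failure
        log.warning("Tracy not connected, profiling unavailable (%s)", exc)
        return None


# A module name that cannot exist exercises the fallback path.
missing = load_optional_profiler("pytracy_module_that_does_not_exist")
```

Callers can then branch on `None` once at startup instead of guarding every profiling call with its own `try` block.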
### Implementation
```python
# src/profiling/tracy_integration.py (new file)
from __future__ import annotations

import functools
from typing import Any, Callable

_tracy_available = False
_tracy: Any = None


def init_tracy() -> bool:
    """Try to import and set up pytracy. Returns True on success."""
    global _tracy_available, _tracy
    try:
        import pytracy
        pytracy.setproctitle("manual_slop")
    except Exception:
        # Covers a missing package as well as failures inside the binding.
        _tracy, _tracy_available = None, False
        return False
    _tracy, _tracy_available = pytracy, True
    return True


def is_tracy_available() -> bool:
    return _tracy_available


class TracyZone:
    """Context manager for Tracy zones; a no-op when pytracy is unavailable."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.active = False

    def __enter__(self) -> TracyZone:
        if _tracy_available and _tracy:
            _tracy.enter(self.name)  # call name assumed; verify against pytracy
            self.active = True
        return self

    def __exit__(self, *args: object) -> bool:
        if self.active:
            _tracy.leave(self.name)
            self.active = False
        return False  # never swallow exceptions from the wrapped block


# Convenience decorator
def tracy_zone(name: str) -> Callable:
    """Decorator to wrap a function in a Tracy zone."""
    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)  # keep the wrapped function's name and docstring
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            with TracyZone(name):
                return func(*args, **kwargs)
        return wrapper
    return decorator
```
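A zone wrapper is only useful if every enter has a matching leave, including when the wrapped code raises. That property can be checked against a stand-in backend before the real binding is wired in. The sketch below does not touch `pytracy`: `RecordingBackend`, `zone`, and `zoned` are hypothetical names, and the `enter`/`leave` methods simply mirror the calls assumed above.

```python
import functools
from contextlib import contextmanager
from typing import Any, Callable, Iterator, List, Tuple


class RecordingBackend:
    """Stand-in for the profiler binding that records zone events."""

    def __init__(self) -> None:
        self.events: List[Tuple[str, str]] = []

    def enter(self, name: str) -> None:
        self.events.append(("enter", name))

    def leave(self, name: str) -> None:
        self.events.append(("leave", name))


@contextmanager
def zone(backend: RecordingBackend, name: str) -> Iterator[None]:
    backend.enter(name)
    try:
        yield
    finally:
        backend.leave(name)  # runs even if the body raises


def zoned(backend: RecordingBackend, name: str) -> Callable:
    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            with zone(backend, name):
                return func(*args, **kwargs)
        return wrapper
    return decorator


backend = RecordingBackend()


@zoned(backend, "render_panel")
def render_panel(fail: bool) -> str:
    if fail:
        raise RuntimeError("render failed")
    return "ok"


result = render_panel(False)
try:
    render_panel(True)
except RuntimeError:
    pass  # the zone must still have been closed
```

After both calls the recorded events alternate enter/leave, which is the invariant the real `TracyZone` needs to preserve.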
### GUI Button for Tracy Status
Add to Diagnostics Panel:
```python
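if imgui.button("Open Tracy GUI"):
    import subprocess
    try:
        subprocess.Popen(["tracy"])  # Or path to Tracy executable
    except OSError:
        pass  # Tracy is not installed or not on PATH
imgui.same_line()
# is_tracy_available() reports that pytracy loaded, not that a GUI is attached.
imgui.text(f"Tracy: {'Available' if is_tracy_available() else 'Not available'}")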
```
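Since the button should only launch Tracy when it is installed, the lookup can be separated from the launch so the panel can also disable the button when nothing is found. The following is a sketch under assumptions: the candidate executable names are guesses that depend on how Tracy was installed, and `find_tracy_executable` and `launch_tracy_gui` are hypothetical helpers.

```python
import shutil
import subprocess
from typing import Optional, Sequence

# Assumed names only; adjust to match the local Tracy install.
TRACY_CANDIDATES = ("tracy", "tracy-profiler", "Tracy")


def find_tracy_executable(candidates: Sequence[str] = TRACY_CANDIDATES) -> Optional[str]:
    """Return the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def launch_tracy_gui(candidates: Sequence[str] = TRACY_CANDIDATES) -> bool:
    """Start the Tracy GUI without blocking. Returns False if it cannot start."""
    path = find_tracy_executable(candidates)
    if path is None:
        return False
    try:
        subprocess.Popen([path])
    except OSError:
        return False
    return True
```

A configured absolute path could be passed as the only candidate, because `shutil.which` also accepts full paths.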
### Files Affected
| File | Change |
|------|--------|
| `src/profiling/tracy_integration.py` | New — Tracy integration module |
| `src/gui_2.py` | Add Tracy zone wrappers around render methods, button in Diagnostics |
| `src/app_controller.py` | Optionally add Tracy init in startup |
### Success Criteria
1. `pytracy` is a declared dependency in pyproject.toml
2. Tracy zones wrap every `_render_*` component in `gui_2.py`
3. App starts without error if Tracy GUI is not running (graceful degradation)
4. When Tracy GUI is running, live flamegraph appears for running app
5. "Open Tracy GUI" button launches Tracy if installed
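Criterion 2 could be met without editing each method by hand, for example with a class decorator that wraps every method whose name starts with `_render_`. This is one possible approach, not a settled design: `zone_render_methods`, `demo_zone`, and `DemoGui` are hypothetical, and `demo_zone` stands in for the real zone context manager.

```python
import functools
import inspect
from contextlib import contextmanager
from typing import Any, Callable, Iterator, List

entered: List[str] = []  # records zone names for the demonstration


@contextmanager
def demo_zone(name: str) -> Iterator[None]:
    entered.append(name)
    yield


def _wrap(func: Callable, zone_factory: Callable, name: str) -> Callable:
    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        with zone_factory(name):
            return func(*args, **kwargs)
    return wrapper


def zone_render_methods(cls: type, zone_factory: Callable, prefix: str = "_render_") -> type:
    """Wrap every plain method of cls whose name starts with prefix."""
    for name, func in list(vars(cls).items()):
        if name.startswith(prefix) and inspect.isfunction(func):
            setattr(cls, name, _wrap(func, zone_factory, name))
    return cls


class DemoGui:
    def _render_log(self) -> str:
        return "log"

    def _render_diagnostics_panel(self) -> str:
        return "diag"

    def update(self) -> str:
        return "update"


zone_render_methods(DemoGui, demo_zone)
gui = DemoGui()
outputs = (gui._render_log(), gui._render_diagnostics_panel(), gui.update())
```

Only methods defined directly on the class are wrapped, so inherited render methods would need the decorator applied to their own class.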
## Phase 3 (Future): Custom Sampler Fallback
Out of scope for the initial spec. A `sys.settrace`-based sampler would be implemented if:
- Tracy is unavailable/unwanted
- User wants self-contained flamegraph rendered in imgui directly
## Architecture
```
Diagnostics Panel (gui_2.py)
|
├── PerformanceMonitor (component timings, always-on)
|   └── O(1) rolling averages, per-component ms tracking
|
└── Tracy Integration (profiling, on-demand)
    └── pytracy → Tracy GUI (live flamegraph + memory + locks)
```
The two run independently: PerformanceMonitor gives glanceable per-frame data, while Tracy provides a deep dive on demand.
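The "O(1) rolling averages" in the diagram can be illustrated with a fixed-size window and a running total, so that each new sample costs one addition and at most one subtraction. This is a generic sketch, not the existing PerformanceMonitor code; `RollingAverage` is a hypothetical name.

```python
from collections import deque
from typing import Deque


class RollingAverage:
    """Average of the most recent `window` samples with O(1) updates."""

    def __init__(self, window: int) -> None:
        if window <= 0:
            raise ValueError("window must be positive")
        self._window = window
        self._samples: Deque[float] = deque()
        self._total = 0.0

    def add(self, ms: float) -> float:
        self._samples.append(ms)
        self._total += ms
        if len(self._samples) > self._window:
            # Drop the oldest sample from both the window and the total.
            self._total -= self._samples.popleft()
        return self.average

    @property
    def average(self) -> float:
        return self._total / len(self._samples) if self._samples else 0.0


avg = RollingAverage(window=3)
for ms in (1.0, 2.0, 3.0, 10.0):
    avg.add(ms)
```

A running total can accumulate floating-point drift over very long sessions, so recomputing the sum from the window occasionally may be worthwhile.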
## Dependencies
| Dependency | Purpose | Notes |
|------------|---------|-------|
| `pytracy` | Tracy Python binding | Phase 2 only, graceful degradation if unavailable |
## Files
| File | Action |
|------|--------|
| `src/profiling/tracy_integration.py` | Create — Tracy wrapper with graceful degradation |
| `src/profiling/__init__.py` | Create — Package init |
| `src/gui_2.py` | Modify — Sort + expand Diagnostics, wrap render zones |
| `src/app_controller.py` | Optional — Tracy init in startup |
| `pyproject.toml` | Modify — Add `pytracy` dependency |
## Open Questions
1. **Tracy connection params**: the default port is 8086; should it be configurable via `config.toml`?
2. **Which render methods to zone**: all `_render_*` methods, or only top-level ones?
3. **Memory profiling**: enable the allocator hook by default, or opt-in only?
## Success Criteria Summary
- [ ] Phase 1: Diagnostics panel sorts by worst avg, rows expandable
- [ ] Phase 2: pytracy integrated with graceful degradation
- [ ] Phase 2: Every `_render_*` method in gui_2.py has a Tracy zone
- [ ] Phase 2: Tracy GUI shows live flamegraph when connected
- [ ] Phase 3: Future extension point clear for custom sampler