Fix the output buffer size calculation and add bounds checks so that the
output buffer always has enough space. This fixes crashes on inputs such
as "AA" and in other edge cases where the output buffer was too small.
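
As an illustration only (this is not the library's code, and the helper
name decoded_len is hypothetical), the sketch below shows one way a size
calculation can come out too small for a short input such as "AA", and a
bit-counting alternative that leaves room for the decoded byte:

```odin
package main

import "core:fmt"

// Hypothetical helper, not taken from the library: number of bytes that
// n unpadded base32 characters decode to (5 bits per character, 8 bits
// per byte, rounded down).
decoded_len :: proc(n: int) -> int {
	return n * 5 / 8
}

main :: proc() {
	// Sizing by whole 8-character blocks yields no space at all for "AA".
	fmt.println(len("AA") / 8 * 5) // prints 0

	// Counting bits leaves room for the single byte "AA" decodes to.
	fmt.println(decoded_len(len("AA"))) // prints 1
}
```
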
Remove #no_bounds_check, since bounds checking is needed for safe error
handling. The small performance cost is worth the improved robustness.
Replace assertions in base32.decode() with error returns so that programs
can handle invalid input gracefully instead of crashing. The function now
returns ([]byte, Error) instead of just []byte.
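
A minimal sketch of how a caller might use the two-value return. The
Error variants and the decode_stub procedure below are hypothetical
stand-ins that only mirror the ([]byte, Error) shape; the real variant
names and decoding logic are not shown in this message:

```odin
package main

import "core:fmt"

// Hypothetical error set, assumed for illustration only.
Error :: enum {
	None,
	Invalid_Character,
}

// Stand-in with the same return shape as the new decode signature.
// It only checks the alphabet and never produces decoded bytes.
decode_stub :: proc(s: string) -> ([]byte, Error) {
	for c in s {
		switch c {
		case 'A'..='Z', '2'..='7', '=':
			// Character is in the standard base32 alphabet or is padding.
		case:
			return nil, .Invalid_Character
		}
	}
	return nil, .None
}

main :: proc() {
	// '1' is not in the base32 alphabet, so an error comes back
	// instead of an assertion failure.
	_, err := decode_stub("A1")
	if err != .None {
		fmt.println("decode failed:", err)
		return
	}
	fmt.println("decode ok")
}
```

With this shape, the caller decides whether invalid input is fatal,
which the assertion-based version did not allow.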