From SwiGLU Backward to INT8 Quantization: Notes from a KernelGen Challenge 9 Win
Published:
A reconstruction of our KernelGen Challenge 9 optimization: starting from the SwiGLU backward formula, then walking through 128x128 tiling, fused INT8 quantization, memory bandwidth limits, and per-backend Triton engineering.
