Official codebase for the ICML 2025 paper "ELMO: Efficiency via Low-precision and Peak Memory Optimization in Large Output Spaces". This repository implements end-to-end float8 (FP8) training for the ...