Pretraining Large Language Models with NVFP4
Additional authors (80):
Evan Briones, Ian Buck, Bryan Catanzaro, Muya Chang, Jinhang Choi, Mike Chrzanowski, Eric Chung, Victor Cui, Steve Dai, B. Rouhani, Carlo del Mundo, Deena Donia, Burc Eryilmaz, Henry Estela, Abhinav Goel, O. Goncharov, Yugi Guvvala, Robert Hesse, Russell J. Hewett, Herbert Hum, U. Kapasi, Brucek Khailany, Mikail Khona, Nick Knight, Alex Kondratenko, Ronny Krashinsky, Ben Lanir, Simon Layton, M. Lightstone, D. Lo, P. Micikevicius, Asit Mishra, Tim Moon, Deepak Narayanan, Chao Ni, Abhijit Paithankar, Satish Pasumarthi, Ankit B. Patel, M. Patwary, A. Poojary, G. Prasad, Sweta Priyadarshi, Yigong Qin, Xiao-Shuai Ren, O. Rybakov, Charbel Sakr, S. Satheesh, Stas Sergienko, Pasha Shamis, Kirthi Shankar, Nishant Sharma, M. Shoeybi, Michael Y. Siu, Misha Smelyanskiy, Darko Stosic, Dusan Stosic, Bor-Yiing Su, Frank Sun, Nima Tajbakhsh, S. Thomas, Przemek Tredak, Evgeny Tsykunov, Gandhimathi Vaithilingam, Aditya Vavre, Rangharajan Venkatesan, R. Waleffe, Qiyu Wan, Hexin Wang, Mengdi Wang, Lizzie Wei, Hao Wu, Evan Wu, Keith Wyss, Ning Xu, Jinze Xue, Charlene Yang, Yujia Zhai, Ruoxi Zhang, Jingyang Zhu, Zhongbo Zhu
Journal: arXiv