DeepNet: Scaling Transformers to 1,000 Layers

Authors: Hongyu Wang, Shuming Ma, Li Dong, Shaohan Huang, Dongdong Zhang, Furu Wei

Published: 2024-04-10

DOI: 10.1109/tpami.2024.3386927

Source: Full article

