Any plans to use RMSNorm (or FlashNorm) instead of LayerNorm?
1
#12 opened about 2 months ago by graefics
Lack of digit splitting in the slow version of the tokenizer
#11 opened 5 months ago by Forence
Adding Evaluation Results
#10 opened 8 months ago by leaderboard-pr-bot
Big difference in downstream task results between the before-cooldown-ckpt and the final checkpoint?
1
#9 opened 8 months ago by siqi-zz
Adding Evaluation Results
#8 opened 8 months ago by leaderboard-pr-bot
Will there be a version with Traditional Chinese in the future?
#5 opened 10 months ago by win10
Training config link is broken
11
#3 opened 10 months ago by davidgortega