why not include Qwen1.5-MoE-A2.7B in the table?
#4 · opened by J22
IMHO, Qwen1.5-MoE-A2.7B is a SOTA MoE model with ~2B active parameters.
Before comparing, it would be good to know how many tokens the model was trained on and what data was used (including for the original dense model before upcycling). Furthermore, it should be considered concurrent work.