why not include Qwen1.5-MoE-A2.7B in the table?

#4
by J22 - opened

IMHO, Qwen1.5-MoE-A2.7B is the SOTA MoE model in the ~2B-active-parameter range.

JetMoE org
edited Apr 13

Before comparing, it would be good to know how many tokens the model was trained on and what data was used (including for the original dense model before upcycling). Furthermore, it should be considered concurrent work.
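
For readers unfamiliar with the term, here is a minimal sketch of what "upcycling" refers to: initializing each expert of an MoE layer from a pretrained dense FFN rather than from scratch. All names and dimensions below are illustrative assumptions, not Qwen's or JetMoE's actual implementation, and the soft routing shown is a simplification of the sparse top-k routing real MoEs use.

```python
import copy

import torch
import torch.nn as nn

# Illustrative sizes only; real models are far larger.
HIDDEN, FFN_DIM, NUM_EXPERTS = 16, 64, 4

# A dense FFN block as found in a standard transformer layer,
# standing in for the pretrained dense model's weights.
dense_ffn = nn.Sequential(
    nn.Linear(HIDDEN, FFN_DIM),
    nn.GELU(),
    nn.Linear(FFN_DIM, HIDDEN),
)

# "Upcycling": every expert starts as a copy of the dense FFN, so the
# MoE inherits the dense model's knowledge instead of training from scratch.
experts = nn.ModuleList(copy.deepcopy(dense_ffn) for _ in range(NUM_EXPERTS))
router = nn.Linear(HIDDEN, NUM_EXPERTS)  # the gate is freshly initialized

# One soft-routing forward pass over a toy batch.
x = torch.randn(2, HIDDEN)
gates = router(x).softmax(dim=-1)  # (batch, NUM_EXPERTS)
out = sum(gates[:, i : i + 1] * experts[i](x) for i in range(NUM_EXPERTS))
print(out.shape)  # torch.Size([2, 16])
```

The point of the comment stands regardless of the mechanics: an upcycled MoE inherits the dense model's training compute and data, so comparing it to a from-scratch MoE requires accounting for that.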
