Omega 2.6B
This model is derived from phi 1.3B using a layer-stacking (depth up-scaling) technique that duplicates the base model's hidden layers, doubling the depth of the network.
The stacked model was then trained for 1 epoch on data from the tiny-textbooks and tiny-lessons datasets.
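Below is a minimal sketch of the layer-stacking step, assuming the native Hugging Face `transformers` port of phi (where the decoder layers live in `model.model.layers`); the base checkpoint name and output path are placeholders, not necessarily the exact recipe used here:

```python
# Sketch: depth up-scaling by duplicating decoder layers ("layer stacking").
import copy

import torch
from transformers import AutoModelForCausalLM

BASE = "microsoft/phi-1_5"  # assumed base checkpoint (~1.3B parameters)

model = AutoModelForCausalLM.from_pretrained(BASE)

# Duplicate each decoder layer in place: [0, 1, ..., N-1] becomes
# [0, 0, 1, 1, ..., N-1, N-1]. The copies reuse the original weights,
# which serve as the initialization for continued pretraining.
stacked = torch.nn.ModuleList()
for layer in model.model.layers:
    stacked.append(layer)
    stacked.append(copy.deepcopy(layer))
model.model.layers = stacked

# Some attention implementations track per-layer KV-cache slots via
# `layer_idx`; renumber so caching stays correct after stacking.
for idx, layer in enumerate(stacked):
    if hasattr(layer.self_attn, "layer_idx"):
        layer.self_attn.layer_idx = idx

# Keep the config consistent with the new depth, then save the
# stacked model as the initialization checkpoint for training.
model.config.num_hidden_layers = len(stacked)
model.save_pretrained("omega-2.6b-init")  # hypothetical output path
```

Interleaving each layer with its own copy is one common stacking variant; others concatenate the full layer stack end to end.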
Training
Training logs are available at https://wandb.ai/wing-lian/phi-2x-pt-tiny.
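The link above points to the actual run; below is a hedged sketch of what a 1-epoch continued-pretraining pass over the two datasets could look like with the `transformers` Trainer. The dataset hub IDs, the `text` column name, and all hyperparameters are assumptions, not the author's exact configuration:

```python
# Sketch: 1-epoch continued pretraining of the stacked checkpoint.
from datasets import concatenate_datasets, load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("omega-2.6b-init")
model = AutoModelForCausalLM.from_pretrained("omega-2.6b-init")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # phi tokenizers ship without a pad token

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

# Tokenize each corpus separately, then mix; dropping the raw columns
# first keeps the two schemas compatible for concatenation.
parts = []
for name in ("nampdn-ai/tiny-textbooks", "nampdn-ai/tiny-lessons"):  # assumed hub IDs
    ds = load_dataset(name, split="train")
    parts.append(ds.map(tokenize, batched=True, remove_columns=ds.column_names))
train_data = concatenate_datasets(parts).shuffle(seed=42)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="omega-2.6b",
        num_train_epochs=1,              # a single pass, per the model card
        per_device_train_batch_size=4,   # assumed
        learning_rate=2e-5,              # assumed
        report_to="wandb",               # produces runs like the one linked above
    ),
    train_dataset=train_data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```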