Can you increase LLAMA3 8b simply by duplicating some layers?
#2
by Regrin - opened
Tell me, can you enlarge LLAMA3 8b simply by duplicating some of its layers?
Would that be of any use? I would like a model of, say, 13b that is both easy to train and reasonably smart. I'm hoping such a transformation could preserve the model's performance on the one hand and improve its training prospects on the other.
And if you do this, will the model lose any performance? If not, that's great! It would then be possible to fine-tune it on GPT4-generated datasets, and the result should be much better than with the 8b model.
Am I right that the 8b models have reached the limit of their capabilities?
I don't think they have. There's still a lot of performance you can squeeze out of 8B models.
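As for the duplication idea itself: what you're describing is usually called depth up-scaling (sometimes "frankenmerging"); SOLAR 10.7B was built this way by duplicating the middle layers of a 32-layer 7B model. Below is a minimal sketch with `transformers`, assuming the standard `LlamaForCausalLM` layout (`model.model.layers`) and an illustrative overlapping 24+24 split; the exact layer ranges and output directory are my own choices, not a tested recipe.

```python
import copy

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Llama 3 8B has 32 decoder layers. Keep layers 0-23, then append deep copies
# of layers 8-31, giving 48 layers (~11.5B params) with an overlapping middle.
# The copies must be deep copies, otherwise the repeated layers share weights.
layers = model.model.layers
new_layers = list(layers[:24]) + [copy.deepcopy(layer) for layer in layers[8:32]]

# Re-index the attention modules so the KV cache stays consistent
# (recent transformers versions route cache entries by layer_idx).
for i, layer in enumerate(new_layers):
    layer.self_attn.layer_idx = i

model.model.layers = torch.nn.ModuleList(new_layers)
model.config.num_hidden_layers = len(new_layers)

model.save_pretrained("llama3-48layer-upscaled")
AutoTokenizer.from_pretrained(model_id).save_pretrained("llama3-48layer-upscaled")
```

Two caveats: the stitched model usually scores somewhat *worse* than the base 8b right after surgery and only pays off after a "healing" fine-tune (your GPT4 datasets would serve that purpose), so don't expect 13b-level quality out of the box. And if you'd rather not edit the model by hand, the mergekit library's `passthrough` merge method does the same layer-slicing declaratively from a YAML config.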