
Description

Danube3 500M model finetuned on adamo1139/Fal7acy_4chan_archive_ShareGPT, a dataset of roughly 250M tokens of chat data from 4chan, organized into coherent threads and covering various boards.

The model uses the ChatML prompt format. Use system prompts such as "A chat on 4chan board /3/" or "A chat on 4chan board /biz/", since that is what it was trained on.
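For example, a prompt in ChatML format would look like this (the user message here is illustrative):

```
<|im_start|>system
A chat on 4chan board /biz/<|im_end|>
<|im_start|>user
what coin are you holding?<|im_end|>
<|im_start|>assistant
```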

This is a very small 500M model, so it's not very smart.
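A minimal inference sketch using Hugging Face transformers follows; the repo id, user message, and sampling settings are placeholders, not values from this card:

```python
# Minimal inference sketch using Hugging Face transformers.
# NOTE: the repo id below is a placeholder; substitute the actual model repo.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "your-namespace/danube3-500m-4chan-chat"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.bfloat16)

# Assemble the ChatML prompt shown above, with a board-specific system prompt.
prompt = (
    "<|im_start|>system\nA chat on 4chan board /biz/<|im_end|>\n"
    "<|im_start|>user\nwhat coin are you holding?<|im_end|>\n"
    "<|im_start|>assistant\n"
)

inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.8)
# Decode only the newly generated tokens.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```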

Issues

The dataset doesn't have correctly formatted newlines, so quoted content doesn't render correctly.

Instead of this:

>what did you say

I didn't say nothing

it will print:

>what did you say I didn't say nothing

Training details

- 1 epoch
- 16-bit LoRA
- 8192 sequence length
- LoRA rank 256, alpha 256, rslora enabled
- batch size 8
- learning rate 0.00004
- embedding learning rate 0.00001
- target modules ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj", "embed_tokens", "lm_head"]
- cosine learning rate scheduler
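For illustration, here is a rough sketch of how these settings could map onto a Hugging Face peft LoraConfig and TrainingArguments. The card doesn't state which training framework was used, and the separate embedding learning rate is not a standard TrainingArguments field (unsloth, for example, exposes an embedding_learning_rate option), so treat this as an approximation:

```python
# Sketch of the listed hyperparameters as a peft LoraConfig plus
# TrainingArguments. Assumption: standard HF tooling; the actual
# framework used for this finetune is not stated on the card.
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=256,
    lora_alpha=256,
    use_rslora=True,  # rank-stabilized LoRA scaling
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
        "embed_tokens", "lm_head",
    ],
)

training_args = TrainingArguments(
    output_dir="out",
    num_train_epochs=1,
    per_device_train_batch_size=8,
    learning_rate=4e-5,
    lr_scheduler_type="cosine",
    bf16=True,  # 16-bit training
)
# The embedding learning rate of 1e-5 and the 8192 sequence length are
# framework-specific settings not covered by TrainingArguments above.
```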
