See axolotl config
axolotl version: 0.4.1
base_model: Afterparty-hf/pretrain-0.924
load_in_8bit: false
load_in_4bit: false
strict: false
datasets:
  - path: Afterparty-hf/synthetic-instruct
    type: sharegpt
  - path: Afterparty-hf/train-format-server
    type: sharegpt
  - path: Afterparty-hf/help-channels-formatted
    type: sharegpt
  - path: Afterparty-hf/constt-augmented
    type: sharegpt
  - path: Afterparty-hf/transcripts-train
    type: sharegpt
chat_template: chatml
dataset_prepared_path: ./prepath
hub_model_id: Afterparty-hf/finetune-0.559
wandb_project: ap_publi
hf_use_auth_token: true
output_dir: ./finetune-559-a
resume_from_checkpoint: ./finetune-559/checkpoint-1026
wandb_watch: all
hub_private_repo: true
hub_strategy: all_checkpoints
push_to_hub: false
hf_use_auth_token: true
max_grad_norm: 0.6
sequence_len: 14256
sample_packing: true
pad_to_sequence_len: true
micro_batch_size: 1
gradient_accumulation_steps: 1
num_epochs: 4
learning_rate: 0.000004
optimizer: adamw_bnb_8bit
#optim_args:
# amsgrad: true
lr_scheduler: cosine
train_on_inputs: false
group_by_length: false
bfloat16: false
#bf16: auto
fp16:
tf32: false
neftune_noise_alpha: 2
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: true
logging_steps: 1
xformers_attention:
flash_attention: true
#unsloth_lora_mlp: true
#unsloth_lora_qkv: true
#unsloth_lora_o: true
#flash_attn_cross_entropy: true
#flash_attn_rms_norm: true
#flash_attn_fuse_qkv: false
#flash_attn_fuse_mlp: true
warmup_ratio: 0.5
evals_per_step: 0.025
eval_table_size:
saves_per_epoch: 5
debug:
torch_compile: true
rank:
deepspeed: deepspeed_configs/zero2.json
save_safetensors: true
weight_decay: 0.01
special_tokens:
  bos_token: "<s>"
  eos_token: "</s>"
  unk_token: "<unk>"
  pad_token: "</s>"
tokens: # these are delimiters
  - "<|im_start|>"
  - "<|im_end|>"
finetune-0.559
This model is a fine-tuned version of Afterparty-hf/pretrain-0.924 on the five sharegpt-format datasets listed in the axolotl config above.
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
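The axolotl config above loads every dataset with `type: sharegpt`, sets `chat_template: chatml`, and registers `<|im_start|>` and `<|im_end|>` as delimiter tokens. As a rough illustration of what that implies for the training text, the sketch below renders one ShareGPT-style conversation in ChatML. The `render_chatml` helper, the role mapping, and the example record are hypothetical and written for this card; axolotl's own rendering may differ in details such as BOS handling or default system prompts.

```python
# Minimal sketch, NOT axolotl's implementation: render a ShareGPT-style
# conversation with ChatML delimiters.

# Assumed mapping from ShareGPT speaker names to ChatML role names.
ROLE_MAP = {"system": "system", "human": "user", "gpt": "assistant"}


def render_chatml(conversations):
    """Wrap each turn as <|im_start|>{role}\n{text}<|im_end|>\n and concatenate."""
    parts = []
    for turn in conversations:
        role = ROLE_MAP[turn["from"]]
        parts.append(f"<|im_start|>{role}\n{turn['value']}<|im_end|>\n")
    return "".join(parts)


# Hypothetical record, not taken from any of the listed datasets.
example = [
    {"from": "human", "value": "Hello"},
    {"from": "gpt", "value": "Hi there"},
]

print(render_chatml(example))
```

With `train_on_inputs: false` in the config, only the assistant turns would be expected to contribute to the loss; the sketch does not model that masking.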
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 4e-06
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 8
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 310
- num_epochs: 4
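The values in the list above can be cross-checked with a little arithmetic. The sketch below reproduces the total train batch size from the per-device batch size, device count, and `gradient_accumulation_steps: 1` (from the config), and evaluates a cosine schedule with linear warmup at a few steps. The total step count is not reported in this card, so `TOTAL_STEPS = 620` is an assumption chosen only so that the 310 warmup steps match `warmup_ratio: 0.5`. The function names are illustrative and the formula is the standard linear-warmup, half-cosine decay, not code taken from the training run.

```python
import math

PEAK_LR = 4e-6       # learning_rate from the list above
WARMUP_STEPS = 310   # lr_scheduler_warmup_steps from the list above
TOTAL_STEPS = 620    # ASSUMPTION: not reported; picked so warmup is half the run


def effective_batch_size(per_device, num_devices, grad_accum):
    """Sequences consumed per optimizer step across all devices."""
    return per_device * num_devices * grad_accum


def lr_at(step, peak=PEAK_LR, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Linear warmup to `peak`, then half-cosine decay to zero at `total`."""
    if step < warmup:
        return peak * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return peak * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# 1 sequence per device x 8 devices x 1 accumulation step
print(effective_batch_size(1, 8, 1))

# Learning rate at the start, mid-warmup, peak, mid-decay, and end
for step in (0, 155, 310, 465, 620):
    print(step, lr_at(step))
```

Under these assumptions the learning rate climbs linearly for the first half of the run and only reaches its peak of 4e-06 at step 310, so the model spends relatively few steps near the peak rate.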
Training results
Framework versions
- Transformers 4.41.1
- Pytorch 2.1.2+cu118
- Datasets 2.19.1
- Tokenizers 0.19.1