bloom-1b1-emailgen - v1
This model is a fine-tuned version of bigscience/bloom-1b1 on the postbot/multi-emails-100k
dataset.
It achieves the following results on the evaluation set:
- Loss: 1.7397
Model description
More information needed
Intended uses & limitations
⚠️ This model was fine-tuned with none of the original layers frozen. ⚠️
- This is still under investigation, but the model likely needs some layers frozen during fine-tuning to balance retaining its multilingual capabilities against learning how to write emails.
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 7e-05
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 64
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.03
- num_epochs: 2.0
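To make the scheduler settings above concrete, here is a minimal sketch of a cosine schedule with linear warmup. It assumes 512 total optimizer steps (the final step in the results table), warmup steps equal to the warmup ratio times the total steps rounded up, and a half-cosine decay to zero. The constant names and the `lr_at` helper are illustrative and are not taken from the training script.

```python
import math

BASE_LR = 7e-05       # learning_rate from the list above
WARMUP_RATIO = 0.03   # lr_scheduler_warmup_ratio from the list above
TOTAL_STEPS = 512     # assumption: final step count in the results table

# Assumption: warmup steps are the ratio times total steps, rounded up.
WARMUP_STEPS = math.ceil(WARMUP_RATIO * TOTAL_STEPS)


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step (hypothetical helper)."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 up to BASE_LR.
        return BASE_LR * step / max(1, WARMUP_STEPS)
    # Half-cosine decay from BASE_LR down to 0 over the remaining steps.
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


for step in (0, 8, 16, 256, 512):
    print(f"step {step:>3}: lr = {lr_at(step):.3e}")
```

Under these assumptions the learning rate peaks at step 16 and decays to zero by step 512.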
Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 1.8465 | 1.0 | 256 | 1.8656 |
| 1.4903 | 2.0 | 512 | 1.7396 |
details
***** eval metrics *****
epoch = 2.0
eval_loss = 1.7397
eval_runtime = 0:04:27.41
eval_samples = 4216
eval_samples_per_second = 15.766
eval_steps_per_second = 15.766
perplexity = 5.6956
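The reported perplexity is the exponential of the evaluation loss, and the throughput is the sample count divided by the runtime. The short check below reproduces both from the figures above. The variable names are illustrative, and the runtime is read as 4 minutes 27.41 seconds.

```python
import math

eval_loss = 1.7397               # eval_loss reported above
eval_samples = 4216              # eval_samples reported above
eval_runtime_s = 4 * 60 + 27.41  # eval_runtime 0:04:27.41 in seconds

# Perplexity is exp(mean cross-entropy loss, measured in nats).
perplexity = math.exp(eval_loss)

# Throughput is the number of samples divided by wall-clock time.
samples_per_second = eval_samples / eval_runtime_s

print(f"perplexity = {perplexity:.4f}")
print(f"eval_samples_per_second = {samples_per_second:.3f}")
```

Because eval_batch_size is 1, each evaluation step processes one sample, which is consistent with the steps-per-second and samples-per-second figures being identical.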
Framework versions
- Transformers 4.25.0.dev0
- Pytorch 1.13.0+cu117
- Datasets 2.6.1
- Tokenizers 0.13.1