Easy Fine-Tuning with Hugging Face SQL Console, Notebook Creator, and SFT
In this tutorial, we'll take you through an end-to-end process of creating a new dataset, fine-tuning a model with it, and sharing it on Hugging Face. By the end, you'll have a model that can respond in a lovely poetic way!
What We'll Use:
- Hugging Face Dataset Viewer SQL Console
- Dataset Notebook Creator
- Google Colab
For this example, we'll work with a poetry dataset and filter only the poems in the 'Love' category. This will allow us to fine-tune a model to generate answers filled with love and emotion.
1. Getting the data
Let's start by getting our data. We'll use the Georgii/poetry-genre dataset, which contains poems across various topics:
We only need the 'Love' poems, and we'll filter out any shorter than 150 characters. To do this, we'll use the SQL Console:
Click on SQL Console:
And now, write the following SQL query:
```sql
SELECT text AS poem FROM train WHERE genre = 'Love' AND len(text) > 150
```
💡 Tip: For more advanced techniques and examples on using the SQL Console, check out this guide.
Now, click on Download to save the filtered dataset as a Parquet file. We'll use this file in the next steps.
2. Uploading the Dataset to Hugging Face
Create a new repository on Hugging Face for your dataset. You can upload the Parquet file manually, or use the following Python snippet to upload it programmatically:
```python
from datasets import load_dataset

# Load the Parquet file downloaded from the SQL Console
dataset = load_dataset('parquet', data_files='query_result.parquet')

# Push the dataset to your Hugging Face repository
dataset.push_to_hub('your_dataset_name')
```
Or follow these steps to create your dataset.
In my case, I uploaded it as the asoria/love-poems dataset, which now looks like this:
3. Generating the Training Code
Next, we'll use the Notebook Creator app to generate the training code for our dataset:
- Select `asoria/love-poems` as the dataset name.
- Choose the Supervised fine-tuning (SFT) notebook type.
- Click Generate Notebook and open it in Google Colab.
4. Fine-Tuning the Model
Now, it's time to run the cells in the generated notebook. We'll use the dataset to fine-tune a pre-trained model like facebook/opt-350m, creating a new, more love-inspired version.
Follow the instructions in the notebook to train the model. Once training is complete, you'll have a model that responds in a lovelier way!
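Under the hood, supervised fine-tuning needs each example flattened into a single training text. The generated notebook handles this for you, but the core idea can be sketched with a hypothetical prompt template (the exact format your notebook uses may differ):

```python
def to_training_text(poem: str) -> str:
    """Wrap one poem into an instruction/response string for SFT.

    This template is a hypothetical example, not the one the
    Notebook Creator necessarily generates.
    """
    return (
        "### Instruction:\nWrite a romantic poem.\n\n"
        f"### Response:\n{poem}"
    )

example = to_training_text("Roses are red, violets are blue...")
print(example)
```

Each row of the love-poems dataset would be mapped through a function like this before being fed to the trainer.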
Conclusion
With just a few simple steps, we've created a new version of a dataset using the Hugging Face SQL Console, generated the necessary code with the Notebook Creator, and fine-tuned a model to answer with more love and poetry.
Now, your model is ready to spread love in every response!