Vehicle User Instructions Classification - BERT (Chinese)

This repository contains a fine-tuned BERT model for classifying vehicle user instructions in Chinese. The model is trained on a dataset of user instructions related to various vehicle control commands.

Preface

This model was fine-tuned for our team's UOW CSIT998 Professional Capstone Project.

Dataset

The dataset used for training and evaluation consists of Chinese text instructions corresponding to different vehicle control commands. The distribution of the dataset is as follows:

Training set: 4499 samples
Validation set: 2249 samples
Test set: 2250 samples

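The instructions cover 27 vehicle control commands: opening and closing the windows, doors, trunk, and convertible top, starting and stopping the engine, sounding the horn, and two parking spots. Each command label maps to a class index as follows: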

{'开车窗': 0, '关左车门': 1, '关右前车窗': 2, '关闭引擎': 3, '关左前车窗': 4, '开右前车窗': 5, '关左后车窗': 6, '开左后车窗': 7, '开后备箱': 8, '关车门': 9, '关车窗': 10, '开左前车窗': 11, '关右后车窗': 12, '开敞篷': 13, '开左侧车窗': 14, '关敞篷': 15, '喇叭': 16, '开右后车窗': 17, '开右车门': 18, '停车点1': 19, '关后备箱': 20, '关右车门': 21, '开左车门': 22, '停车点2': 23, '开车门': 24, '打开引擎': 25, '关左侧车窗': 26}
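Predictions are expressed as class indices, so the reverse lookup (index to command) is often what downstream code needs. The following is a minimal sketch that inverts the mapping above; the names label2id and id2label are illustrative choices, not names taken from the training code.

```python
# Label-to-index mapping, copied from the list above.
label2id = {
    '开车窗': 0, '关左车门': 1, '关右前车窗': 2, '关闭引擎': 3,
    '关左前车窗': 4, '开右前车窗': 5, '关左后车窗': 6, '开左后车窗': 7,
    '开后备箱': 8, '关车门': 9, '关车窗': 10, '开左前车窗': 11,
    '关右后车窗': 12, '开敞篷': 13, '开左侧车窗': 14, '关敞篷': 15,
    '喇叭': 16, '开右后车窗': 17, '开右车门': 18, '停车点1': 19,
    '关后备箱': 20, '关右车门': 21, '开左车门': 22, '停车点2': 23,
    '开车门': 24, '打开引擎': 25, '关左侧车窗': 26,
}

# Invert the mapping so a predicted index can be turned back into a command.
id2label = {index: label for label, index in label2id.items()}

# Sanity check: every index is unique, so no command was lost in the inversion.
assert len(id2label) == len(label2id)

print(id2label[8])  # 开后备箱 (open the trunk)
```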

Model

The model is based on the pre-trained Chinese BERT model (bert-base-chinese). It has been fine-tuned on the vehicle user instructions dataset using the following training arguments:

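from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir='',                    # left blank here; set to your checkpoint directory
    do_train=True,
    do_eval=True,
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=32,
    warmup_steps=100,                 # learning-rate warmup over the first 100 steps
    weight_decay=0.01,
    logging_strategy='steps',
    logging_dir='',                   # left blank here; set to your log directory
    logging_steps=50,
    evaluation_strategy="steps",      # named eval_strategy in newer transformers releases
    eval_steps=50,                    # evaluate every 50 steps
    save_strategy="steps",
    save_steps=200,                   # a multiple of eval_steps, as load_best_model_at_end requires
    fp16=True,                        # mixed-precision training
    load_best_model_at_end=True
)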
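To illustrate what warmup_steps=100 does, here is a rough sketch of a linear warmup followed by a linear decay. It assumes the TrainingArguments defaults that the arguments above do not override (a base learning rate of 5e-5 and the linear scheduler). The total_steps value of 1000 is a hypothetical figure chosen for illustration, not the actual length of this training run.

```python
def linear_warmup_lr(step, base_lr=5e-5, warmup_steps=100, total_steps=1000):
    """Learning rate at a given step: linear warmup, then linear decay to zero.

    base_lr and the linear shape are assumed library defaults;
    total_steps is a hypothetical value used only for this illustration.
    """
    if step < warmup_steps:
        # Warmup phase: ramp up from 0 to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: ramp down from base_lr to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 50, 100, 550, 1000):
    print(step, linear_warmup_lr(step))
```

The rate peaks at step 100 and falls off afterwards, which keeps the earliest updates small while the newly added classification head is still untrained.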

Training Results

The model was trained for 3 epochs, and the training progress can be summarized as follows:

Step	Training Loss	Validation Loss	Accuracy	F1	Precision	Recall
50	3.257000	2.964479	0.168519	0.089801	0.229036	0.126555
100	2.525000	1.711695	0.648288	0.532127	0.595545	0.590985
150	1.197200	0.628560	0.921298	0.888212	0.892879	0.890719
...	...	...	...	...	...	...
8000	0.045900	0.136842	0.969320	0.969658	0.969638	0.970056

Evaluation

The trained model was evaluated on the training, validation, and test sets, achieving the following performance:

Split	eval_loss	eval_Accuracy	eval_F1	eval_Precision	eval_Recall
train	0.036020	0.991331	0.991048	0.991615	0.990673
val	0.136842	0.969320	0.969658	0.969638	0.970056
test	0.126695	0.974222	0.975473	0.975814	0.975435

The model scores above 0.96 on accuracy, F1, precision, and recall on all three splits, with a test accuracy of 0.974, which indicates that it classifies vehicle user instructions effectively.
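For readers who want to reproduce metrics of this kind, the sketch below computes accuracy together with macro-averaged precision, recall, and F1 in plain Python. The source does not state which averaging mode was used, so macro averaging is an assumption here, and the function name macro_metrics and the toy labels are purely illustrative.

```python
def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1.

    Macro averaging (an unweighted mean over classes) is an assumption;
    the original evaluation may have used a different averaging mode.
    """
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "Accuracy": sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true),
        "Precision": sum(precisions) / n,
        "Recall": sum(recalls) / n,
        "F1": sum(f1s) / n,
    }


# Toy example with three classes (made-up labels, not model output).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
print(macro_metrics(y_true, y_pred))
```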

Usage

To run inference with the fine-tuned model, you can use the Hugging Face Inference API. The following Python example sends a request to the API:

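import requests

API_URL = "https://api-inference.huggingface.co/models/lindsey-chang/vehicle-user-instructions-classification-bert-chinese"
API_TOKEN = "your-api-token"  # placeholder: replace with your own Hugging Face token
headers = {"Authorization": f"Bearer {API_TOKEN}"}

def query(payload):
    """Send the payload to the Inference API and return the parsed JSON response."""
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

# Example usage
input_text = "请打开车窗"  # "Please open the car window"
output = query({"inputs": input_text})
print(output)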

Set API_TOKEN to your personal API token, which you can create in your Hugging Face account settings.

The response contains the model's prediction for the input instruction. You can map a predicted class index back to the corresponding vehicle control command using the class labels listed in the Dataset section.
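As a sketch of that mapping step, the helper below picks the highest-scoring entry from a response and converts it to a command. It rests on assumptions not confirmed by the source: that the response is a list (possibly nested one level) of dictionaries with "label" and "score" keys, and that labels may arrive in the generic LABEL_<index> form. The function name top_command, the mock response, and its scores are invented for illustration.

```python
import re

# A small excerpt of the index-to-command mapping from the Dataset section.
id2label = {0: '开车窗', 8: '开后备箱', 10: '关车窗'}


def top_command(api_output, id2label):
    """Return (command, score) for the highest-scoring prediction.

    Assumes api_output is a list of {"label", "score"} dicts, optionally
    wrapped in one extra list; any other shape is reported as an error.
    """
    if not isinstance(api_output, list) or not api_output:
        raise ValueError(f"Unexpected API response: {api_output!r}")
    # Unwrap one level of nesting if the predictions are wrapped in a list.
    candidates = api_output[0] if isinstance(api_output[0], list) else api_output
    best = max(candidates, key=lambda item: item["score"])
    # Labels of the form LABEL_<index> are mapped through id2label;
    # anything else is assumed to already be a command name.
    match = re.fullmatch(r"LABEL_(\d+)", best["label"])
    command = id2label[int(match.group(1))] if match else best["label"]
    return command, best["score"]


# Mock response with made-up scores, standing in for the output of query().
mock_output = [[
    {"label": "LABEL_0", "score": 0.97},
    {"label": "LABEL_10", "score": 0.02},
]]
print(top_command(mock_output, id2label))
```

Raising on an unexpected shape keeps a non-prediction payload, such as an error message, from being silently treated as a command.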