---
license: apache-2.0
base_model: distilbert-base-multilingual-cased
tags:
- generated_from_keras_callback
model-index:
- name: DistilFEVERen
  results: []
widget:
- text: Soul Food is a 1997 American comedy-drama film produced by Kenneth `` Babyface '' Edmonds , Tracey Edmonds and Robert Teitel and released by Fox 2000 Pictures .Fox 2000 Pictures released the film Soul Food .
language:
- en
---
<!-- This model card has been generated automatically according to the information Keras had access to. You should
probably proofread and complete it, then remove this comment. -->
# DistilFEVERen
This model is a fine-tuned version of [distilbert-base-multilingual-cased](https://huggingface.co/distilbert-base-multilingual-cased), trained on the Recognizing Textual Entailment (RTE) task using [the first fold of the FEVER dataset in English](https://huggingface.co/datasets/raicrits/fever_folds/blob/main/folds_en/1.json).
RTE asks whether a given text supports or refutes a claim. The classification labels are:
- 0: SUPPORT (indicating that the claim is supported by the text)
- 1: CONFUTE (indicating that the claim is refuted by the text)
- 2: NOT ENOUGH INFO (indicating that there is insufficient information in the text to support or refute the claim).
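
The label ids above can be kept in a small lookup table so that predicted class ids are reported by name. This is a minimal sketch; the names `ID2LABEL`, `LABEL2ID`, and `decode_label` are illustrative and do not come from the training notebook.

```python
# Label ids as listed in this model card; the variable names are hypothetical.
ID2LABEL = {0: "SUPPORT", 1: "CONFUTE", 2: "NOT ENOUGH INFO"}
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def decode_label(class_id):
    """Return the label name for a predicted class id."""
    return ID2LABEL[int(class_id)]


print(decode_label(1))
```

Dictionaries of this shape match what `transformers` model configs store under `id2label` and `label2id`.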
## Inference API Usage
When using the Inference API, paste the text first, followed immediately by the claim, with no space or separator between them. The model's tokenizer concatenates these inputs in that order. Interestingly, inverting the order (claim first, then text) seems to produce similar results, which suggests that the model largely captures coherence within the combined input: label 0 indicates a coherent text, while label 1 indicates an incoherent one.
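
As a rough illustration of that input format, the sketch below joins a text and a claim into a single string and wraps it in a JSON body. The helper name `build_inference_input` is hypothetical, and the `{"inputs": ...}` payload shape is an assumption about how the request would be sent; no request is actually made here.

```python
import json


def build_inference_input(text, claim):
    """Join text and claim directly: text first, then claim, no separator."""
    return text + claim


text = ("Soul Food is a 1997 American comedy-drama film produced by Kenneth "
        "`` Babyface '' Edmonds , Tracey Edmonds and Robert Teitel and "
        "released by Fox 2000 Pictures .")
claim = "Fox 2000 Pictures released the film Soul Food ."

# Assumed request body shape for a text-classification endpoint.
payload = json.dumps({"inputs": build_inference_input(text, claim)})
print(payload)
```

The joined string reproduces the widget example in the front matter, where the claim follows the final period of the text with no space in between.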
## Training procedure
The model was trained on Kaggle using a GPU T4 x2 accelerator. See the complete notebook here: <TODO>
```python
import json
import numpy as np
import os
import pickle
from IPython.display import clear_output
import pandas as pd
import tensorflow as tf
import transformers
from datasets import load_dataset
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.model_selection import train_test_split
from transformers import DistilBertTokenizer, TFDistilBertForSequenceClassification
import warnings
# Silence all warnings
warnings.filterwarnings("ignore")
# Create the output directories if they do not already exist
for directory in ("models", "results", "history"):
    os.makedirs(directory, exist_ok=True)
# Flag to determine if existing models and histories should be overwritten
overwrite = True
# Load dataset for the first fold
data = load_dataset("raicrits/fever_folds", data_files="folds_en/1.json")['train']
test = data['test'][0]
val_set = data['val'][0]
train_set = data['train'][0]
# Define paths for model, results, and history
model_path = 'models/DistilFEVERen_weights_0.h5'
results_path = "results/DistilFEVERen_0.json"
history_path = 'history/DistilFEVERen_0.pickle'
# Load the tokenizer
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-multilingual-cased')
# Preprocess the data
test_encodings = tokenizer(test['text'], test['claim'], truncation=True, padding=True, max_length=256, return_tensors='tf')
test_labels = tf.convert_to_tensor(test['label'])
train_encodings = tokenizer(train_set['text'], train_set['claim'], truncation=True, padding=True, return_tensors='tf')
val_encodings = tokenizer(val_set['text'], val_set['claim'], truncation=True, padding=True, return_tensors='tf')
train_labels = tf.convert_to_tensor(train_set['label'])
val_labels = tf.convert_to_tensor(val_set['label'])
# Check if the model and history already exist for the first fold
if not overwrite and os.path.exists(model_path):
    print("Model and history already exist for fold {}. Loading...".format(0))
    model = TFDistilBertForSequenceClassification.from_pretrained('distilbert-base-multilingual-cased', num_labels=3)
    model.load_weights(model_path)
    # with open(history_path, 'rb') as file_pi:
    #     history = pickle.load(file_pi)
else:
    # Create a new model and define loss, optimizer, and callbacks
    model = TFDistilBertForSequenceClassification.from_pretrained('distilbert-base-multilingual-cased', num_labels=3)
    loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5)
    model.compile(optimizer=optimizer, loss=loss, metrics=['accuracy'])
    model_checkpoint = tf.keras.callbacks.ModelCheckpoint(
        model_path,
        monitor='val_loss',
        save_best_only=True,
        mode='min',
        save_weights_only=True
    )
    early_stopping = tf.keras.callbacks.EarlyStopping(
        monitor='val_loss',
        patience=1,
        mode='min',
        restore_best_weights=True
    )
    # Train the model for the first fold
    clear_output(wait=True)
    history = model.fit(
        [train_encodings['input_ids'], train_encodings['attention_mask']], train_labels,
        validation_data=([val_encodings['input_ids'], val_encodings['attention_mask']], val_labels),
        batch_size=10,
        epochs=100,
        callbacks=[early_stopping, model_checkpoint]
    )
    # Save the training history
    with open(history_path, 'wb') as file_pi:
        pickle.dump(history.history, file_pi)
```
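
The imports above pull in `confusion_matrix` and `classification_report`, but the listing stops before any scoring step. As a dependency-free illustration of what such a step computes, the sketch below derives accuracy and a 3x3 confusion matrix from two lists of label ids. The helper name `score_predictions` and the toy labels are made up for illustration; they are not results from this model.

```python
def score_predictions(y_true, y_pred, num_labels=3):
    """Return (accuracy, confusion matrix) with rows = true, columns = predicted."""
    matrix = [[0] * num_labels for _ in range(num_labels)]
    for true_id, pred_id in zip(y_true, y_pred):
        matrix[true_id][pred_id] += 1
    correct = sum(matrix[i][i] for i in range(num_labels))
    accuracy = correct / len(y_true)
    return accuracy, matrix


# Toy label ids, invented for this example only.
y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 1, 1, 0, 2, 2]

accuracy, matrix = score_predictions(y_true, y_pred)
print(accuracy)
for row in matrix:
    print(row)
```

With real data, `y_true` would be the test labels and `y_pred` the argmax of the model's logits for each test example.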
## Inference procedure
```python
def getPrediction(model, tokenizer, claim, text):
    encodings = tokenizer([text], [claim], truncation=True, padding=True, max_length=256, return_tensors='tf')
    preds = model.predict([encodings['input_ids'], encodings["attention_mask"]])
    return preds
text = "Soul Food is a 1997 American comedy-drama film produced by Kenneth `` Babyface '' Edmonds , Tracey Edmonds and Robert Teitel and released by Fox 2000 Pictures ."
claim = 'Fox 2000 Pictures released the film Soul Food .'
getPrediction(model,tokenizer,claim,text)
```
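
The function above returns raw model output rather than a label. One way to turn logits into a readable prediction is a softmax followed by an argmax, sketched below in plain Python. How the logits are exposed by `model.predict` can depend on the Transformers version, so this sketch assumes they have already been extracted as a list of three floats; the helper name `logits_to_prediction` and the example logits are hypothetical.

```python
import math

# Label order follows the ids listed in this model card.
LABELS = ["SUPPORT", "CONFUTE", "NOT ENOUGH INFO"]


def logits_to_prediction(logits):
    """Apply a numerically stable softmax and return (best label, probabilities)."""
    shift = max(logits)
    exps = [math.exp(value - shift) for value in logits]
    total = sum(exps)
    probs = [value / total for value in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], dict(zip(LABELS, probs))


# Made-up logits for a single example, not actual model output.
label, probs = logits_to_prediction([2.0, -1.0, 0.5])
print(label)
print(probs)
```

Subtracting the maximum logit before exponentiating does not change the result but avoids overflow for large values.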
### Evaluation results
It achieves the following results on the evaluation set:
### Framework versions
- Transformers 4.35.0
- TensorFlow 2.13.0
- Datasets 2.1.0
- Tokenizers 0.14.1
- Numpy 1.24.3