arxiv:2311.08648

Explore Spurious Correlations at the Concept Level in Language Models for Text Classification

Published on Nov 15, 2023

Authors:

Furong Huang

Abstract

Language models (LMs) have gained great achievement in various NLP tasks for both fine-tuning and in-context learning (ICL) methods. Despite its outstanding performance, evidence shows that spurious correlations caused by imbalanced label distributions in training data (or exemplars in ICL) lead to robustness issues. However, previous studies mostly focus on word- and phrase-level features and fail to tackle it from the concept level, partly due to the lack of concept labels and subtle and diverse expressions of concepts in text. In this paper, we first use the LLM to label the concept for each text and then measure the concept bias of models for fine-tuning or ICL on the test data. Second, we propose a data rebalancing method to mitigate the spurious correlations by adding the LLM-generated counterfactual data to make a balanced label distribution for each concept. We verify the effectiveness of our mitigation method and show its superiority over the token removal method. Overall, our results show that there exist label distribution biases in concepts across multiple text classification datasets, and LMs will utilize these shortcuts to make predictions in both fine-tuning and ICL methods.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2311.08648 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2311.08648 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2311.08648 in a Space README.md to link it from this page.