igorsterner/german-english-code-switching-identification

German-English Code-Switching Identification

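This is the TongueSwitcher BERT model, fine-tuned for token-level German-English code-switching identification. It was introduced in the TongueSwitcher paper (Sterner and Teufel, 2023; see the BibTeX entry below). This model is case-sensitive.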
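The model card does not include a usage snippet, so the following is only a minimal sketch of how the checkpoint might be loaded, assuming it works with the standard `transformers` token-classification pipeline. The helper names `load_tagger`, `format_predictions`, and `demo`, as well as the example sentence, are hypothetical and chosen for illustration. The label names the model returns are not documented here, so the sketch prints whatever labels come back rather than assuming them.

```python
# Sketch only: assumes the checkpoint loads with the generic
# token-classification pipeline from the `transformers` library.
MODEL_ID = "igorsterner/german-english-code-switching-identification"


def load_tagger(model_id=MODEL_ID):
    """Build a token-classification pipeline (downloads the model on first use)."""
    from transformers import pipeline  # imported lazily; needs `transformers` and a backend such as PyTorch

    # aggregation_strategy="simple" merges adjacent word pieces that share a label.
    return pipeline("token-classification", model=model_id, aggregation_strategy="simple")


def format_predictions(predictions):
    """Reduce pipeline output to (text span, predicted label, rounded score) tuples."""
    return [
        (p["word"], p["entity_group"], round(float(p["score"]), 3))
        for p in predictions
    ]


def demo():
    """Tag one hypothetical mixed German-English sentence and print the result."""
    tagger = load_tagger()
    # The model is case-sensitive, so the input is passed with its original casing.
    sentence = "Ich habe das Meeting gecancelt, weil der Call zu lange gedauert hat."
    for row in format_predictions(tagger(sentence)):
        print(row)
```

Calling `demo()` requires network access to fetch the checkpoint. Because the model is case-sensitive, lowercasing the input beforehand would likely degrade the predictions.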

Overview

Initialized language model: german-english-code-switching-bert
Training data: The Denglish Corpus
Infrastructure: 1x NVIDIA A100 GPU
Published: 16 October 2023

Hyperparameters

batch_size = 16
epochs = 3
n_steps = 789
max_seq_len = 512
learning_rate = 3e-5
weight_decay = 0.01
seed = 2021
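As a rough sanity check on how these values relate, the arithmetic below derives the steps per epoch and the implied range of training-set sizes. It assumes that `n_steps` counts optimizer steps over all epochs, that no gradient accumulation was used, and that training ran on the single GPU listed above. None of these assumptions is stated in the model card, so the result is an estimate, not a reported figure.

```python
# Hyperparameters copied from the list above.
hparams = {
    "batch_size": 16,
    "epochs": 3,
    "n_steps": 789,
    "max_seq_len": 512,
    "learning_rate": 3e-5,
    "weight_decay": 0.01,
    "seed": 2021,
}

# Assumption: n_steps is the total number of optimizer steps across all epochs.
steps_per_epoch = hparams["n_steps"] // hparams["epochs"]
assert steps_per_epoch * hparams["epochs"] == hparams["n_steps"]  # 789 divides evenly by 3

# Assumption: one batch per step (no gradient accumulation, one device).
# The last batch of an epoch may be partially filled, hence a range.
max_examples = steps_per_epoch * hparams["batch_size"]
min_examples = (steps_per_epoch - 1) * hparams["batch_size"] + 1

print(f"steps per epoch: {steps_per_epoch}")
print(f"implied training sequences: {min_examples} to {max_examples}")
```

Under these assumptions the script reports 263 steps per epoch and between 4193 and 4208 training sequences.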

Authors

Igor Sterner: is473 [at] cam.ac.uk
Simone Teufel: sht25 [at] cam.ac.uk

BibTeX entry and citation info

@inproceedings{sterner2023tongueswitcher,
  author    = {Igor Sterner and Simone Teufel},
  title     = {TongueSwitcher: Fine-Grained Identification of German-English Code-Switching},
  booktitle = {Sixth Workshop on Computational Approaches to Linguistic Code-Switching},
  publisher = {Empirical Methods in Natural Language Processing},
  year      = {2023},
}
