facebook/wav2vec2-lv-60-espeak-cv-ft · Customizable Phone and Word Delimiters

Hi!

Is there a way to customize the phone and word delimiters in this model's output? I tried loading Wav2Vec2PhonemeCTCTokenizer with this model's vocabulary and setting the phone_delimiter_token and word_delimiter_token parameters, but it had no visible effect. Here is my code:

from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC, Wav2Vec2PhonemeCTCTokenizer
from datasets import load_dataset
import torch

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-lv-60-espeak-cv-ft")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-lv-60-espeak-cv-ft")

# vocab.json is the unmodified vocabulary file downloaded from this model's repository.
tokenizer = Wav2Vec2PhonemeCTCTokenizer("/content/vocab.json", phone_delimiter_token="|", word_delimiter_token="-")

ds = load_dataset("patrickvonplaten/librispeech_asr_dummy", "clean", split="validation")

input_values = processor(ds[0]["audio"]["array"], return_tensors="pt").input_values

with torch.no_grad():
    logits = model(input_values).logits

predicted_ids = torch.argmax(logits, dim=-1)
transcription = tokenizer.batch_decode(predicted_ids)
print(transcription)

This is the output I get: ['ɐ m æ n s ɛ d t ə ð ə j uː n ɪ v ɚ s s ɚ aɪ ɛ ɡ z ɪ s t']
As you can see, the delimiters haven't changed: the phones are still separated by spaces, and neither "|" nor "-" appears anywhere.
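
To make the goal concrete, here is a minimal sketch of the phone-level result I'm after, done as plain string post-processing. It assumes the decoded string separates phones with whitespace, as in the output above. `replace_phone_delimiter` is a hypothetical helper of my own, not part of transformers, and it cannot insert word delimiters because the decoded string carries no word boundaries to work from.

```python
def replace_phone_delimiter(transcription: str, phone_delimiter: str = "|") -> str:
    """Re-join whitespace-separated phones with a custom delimiter.

    Hypothetical helper for illustration only; it assumes each phone in the
    decoded string is separated from the next by whitespace.
    """
    phones = transcription.split()  # split on any run of whitespace
    return phone_delimiter.join(phones)


# First few phones of the output shown above.
decoded = "ɐ m æ n s ɛ d"
print(replace_phone_delimiter(decoded, "|"))  # prints ɐ|m|æ|n|s|ɛ|d
```

I would rather have the tokenizer produce this directly than patch the string afterwards.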

Any help is appreciated!