openai/whisper-large · How can whisper return the language type?

polaris16

Oct 10, 2023

In the Long-Form Transcription example, the pipeline does not return the detected language. How can I get it to return the language type?

import torch
from transformers import pipeline
from datasets import load_dataset

device = "cuda:0" if torch.cuda.is_available() else "cpu"

pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v2",
    chunk_length_s=30,
    device=device,
)

ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
sample = ds[0]["audio"]

prediction = pipe(sample.copy(), batch_size=8)

for k in prediction:
    print(k)

---- output ----
text

sanchit-gandhi

Oct 10, 2023

You can pass return_language=True when calling the pipeline to get the language detected for each chunk:

import torch
from transformers import pipeline
from datasets import load_dataset

device = "cuda:0" if torch.cuda.is_available() else "cpu"

pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v2",
    chunk_length_s=30,
    device=device,
)

ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
sample = ds[0]["audio"]

prediction = pipe(sample), batch_size=8, return_language=True)

print(prediction)

Printed output:

{'text': ' Mr. Quilter is the apostle of the middle classes and we are glad to welcome his gospel.',
 'chunks': [{'language': 'english',
   'text': ' Mr. Quilter is the apostle of the middle classes and we are glad to welcome his gospel.'}]}
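Because the language comes back inside the chunks list rather than as a top-level key, a small helper can pull it out. This is a minimal sketch that assumes the output shape printed above; the helper names chunk_languages and dominant_language are hypothetical, and taking the most frequent chunk language is just one way to summarize audio where chunks might disagree.

```python
from collections import Counter


def chunk_languages(prediction):
    """Return the language reported for each chunk of a Whisper pipeline output.

    Assumes `prediction` looks like the dict printed above: a "chunks" list
    whose entries carry a "language" key (present when the pipeline is called
    with return_language=True). Missing keys yield None / an empty list.
    """
    return [chunk.get("language") for chunk in prediction.get("chunks", [])]


def dominant_language(prediction):
    """Return the most frequent chunk language, or None if none was reported.

    Counter.most_common breaks ties by first occurrence, so on a tie the
    language of the earliest such chunk wins.
    """
    languages = [lang for lang in chunk_languages(prediction) if lang is not None]
    if not languages:
        return None
    return Counter(languages).most_common(1)[0][0]


# Usage with the output printed above (no model download needed for this part).
prediction = {
    "text": " Mr. Quilter is the apostle of the middle classes and we are glad to welcome his gospel.",
    "chunks": [
        {
            "language": "english",
            "text": " Mr. Quilter is the apostle of the middle classes and we are glad to welcome his gospel.",
        }
    ],
}

print(chunk_languages(prediction))    # ['english']
print(dominant_language(prediction))  # english
```

In practice you would pass the real return value of pipe(sample, batch_size=8, return_language=True) instead of the hard-coded dict.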

unk1911

Dec 19, 2023

The call should be:

prediction = pipe(sample, batch_size=8, return_language=True)

(The line above has a stray closing parenthesis right after sample, a minor syntax error.)