Commit cc8ed6e (parent ef9c1d9) by wsntxxn: Create README.md

---
license: apache-2.0
datasets:
- agkphysics/AudioSet
pipeline_tag: audio-classification
---
# Model Details
This is a CRNN sound event detection model pre-trained on [AudioSet](https://research.google.com/audioset/download.html) and then finetuned on [AudioSet-strong](https://research.google.com/audioset/download_strong.html).
It contains 8 convolutional layers and a GRU, with a time resolution of 40 ms and about 6.4 million parameters in total.
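Given the 40 ms time resolution, each index along the frame axis of the output maps directly to a timestamp. A minimal sketch (the helper name is illustrative, not part of the model API):

```python
def frame_to_seconds(frame_idx: int, time_resolution_ms: float = 40.0) -> float:
    """Map a frame index in the framewise output to a timestamp in seconds,
    assuming the model's stated 40 ms resolution."""
    return frame_idx * time_resolution_ms / 1000.0

print(frame_to_seconds(25))  # frame 25 corresponds to 1.0 s
```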

# Usage
```python
import torch
import torchaudio
from transformers import AutoModel

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = AutoModel.from_pretrained(
    "wsntxxn/cnn8rnn-audioset-sed",
    trust_remote_code=True
).to(device)

# Load each waveform, resample to the model's sample rate, and downmix to mono
wav1, sr1 = torchaudio.load("/path/to/file1.wav")
wav1 = torchaudio.functional.resample(wav1, sr1, model.config.sample_rate)
wav1 = wav1.mean(0) if wav1.size(0) > 1 else wav1[0]

wav2, sr2 = torchaudio.load("/path/to/file2.wav")
wav2 = torchaudio.functional.resample(wav2, sr2, model.config.sample_rate)
wav2 = wav2.mean(0) if wav2.size(0) > 1 else wav2[0]

# Pad to equal length, batch, and move to the model's device
wav_batch = torch.nn.utils.rnn.pad_sequence([wav1, wav2], batch_first=True).to(device)

with torch.no_grad():
    output = model(waveform=wav_batch)
    # output: {
    #     "framewise_output": (2, 447, n_frames),
    #     "clipwise_output": (2, 447)
    # }

# The class list is in `model.classes`.
# For example, the framewise probability sequence of male speech is:
male_speech_prob = output["framewise_output"][:, model.classes.index("Male speech, man speaking"), :]
```
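To turn a per-frame probability sequence into event segments, a common post-processing step is to threshold the sequence and read off onset/offset times. A minimal pure-Python sketch (the helper name and the 0.5 threshold are illustrative choices, not part of the model API; pass a single class's sequence, e.g. `male_speech_prob[0].tolist()`):

```python
def probs_to_segments(probs, time_resolution=0.04, threshold=0.5):
    """Convert a per-frame probability sequence into a list of
    (onset, offset) pairs in seconds, using a fixed threshold.

    `probs` is an iterable of floats, e.g. one row of the framewise
    output converted with .tolist(); 0.04 s matches the model's 40 ms
    frame resolution.
    """
    segments = []
    onset = None
    for i, p in enumerate(probs):
        if p >= threshold and onset is None:
            onset = i  # event starts at this frame
        elif p < threshold and onset is not None:
            segments.append((onset * time_resolution, i * time_resolution))
            onset = None
    if onset is not None:  # event still active at the end of the clip
        segments.append((onset * time_resolution, len(probs) * time_resolution))
    return segments

# Two active regions: frames 1-2 and frames 4-5
print(probs_to_segments([0.1, 0.7, 0.9, 0.2, 0.6, 0.8, 0.1]))
```

More elaborate post-processing (median filtering, per-class thresholds, hysteresis) is common in sound event detection, but simple thresholding is enough to illustrate how the framewise output is consumed.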