---
title: MusiConGen
emoji: 🪩 
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: 4.39.0
app_file: app.py
pinned: false
---
# MusiConGen


This is the official implementation of the paper "MusiConGen: Rhythm and Chord Control for Transformer-Based Text-to-Music Generation," in Proc. of the International Society for Music Information Retrieval Conference (ISMIR), 2024.

MusiConGen is built on the pretrained [MusicGen](https://github.com/facebookresearch/audiocraft) with two additional controls: rhythm and chords. This repository contains the inference code, the training code, and the training data (a list of YouTube links).

<br />

[arXiv Paper](https://arxiv.org/abs/2407.15060) | [Demo](https://musicongen.github.io/musicongen_demo/) 

<br />

## Installation
MusiConGen requires Python 3.9 and PyTorch 2.0.0. Install the dependencies with:
```bash
pip install -r requirements.txt
```

We also recommend having `ffmpeg` installed, either through your system or Anaconda:
```bash
sudo apt-get install ffmpeg
# Or if you are using Anaconda or Miniconda
conda install 'ffmpeg<5' -c conda-forge
```

<br />

## Model
The model is based on the pretrained MusicGen-melody (1.5B). For inference, a GPU with more than 12 GB of VRAM is recommended; for training, more than 24 GB.
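
A quick way to check the available GPU memory (a minimal sketch using PyTorch, assuming a CUDA device is visible):

```python
import torch

# Report the name and total memory of the first CUDA device (assumes CUDA is available).
props = torch.cuda.get_device_properties(0)
print(f"{props.name}: {props.total_memory / 2**30:.1f} GiB VRAM")
```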

## Inference

First, download the model weights from [this link](https://huggingface.co/Cyan0731/MusiConGen/tree/main).
Move the weight files `compression_state_dict.bin` and `state_dict.bin` into the directory `audiocraft/ckpt/musicongen`.
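
For example, assuming the two weight files were downloaded to the current working directory:

```shell
# from the repository root
mkdir -p audiocraft/ckpt/musicongen
mv compression_state_dict.bin state_dict.bin audiocraft/ckpt/musicongen/
```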

You can then run the inference script to generate music with chord and rhythm conditioning:
```shell
cd audiocraft
python generate_chord_beat.py
``` 

<br />


## Training 

### Training Data
The training data is provided in JSON format in `5_genre_songs_list.json`; the listed suffixes are the YouTube link suffixes of the source songs.
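
For reference, a small sketch for expanding the listed suffixes into full YouTube URLs; it assumes each suffix is a standard YouTube video ID, and it handles either a flat list or a genre-to-list mapping, since the exact JSON layout may differ:

```python
import json

with open("5_genre_songs_list.json") as f:
    data = json.load(f)

# The file may be a flat list of suffixes or a mapping such as genre -> list of suffixes.
suffixes = data if isinstance(data, list) else [s for group in data.values() for s in group]
urls = [f"https://www.youtube.com/watch?v={s}" for s in suffixes]
print(f"{len(urls)} songs, e.g. {urls[0]}")
```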

### Data Preprocessing
Before training, place your audio data in `audiocraft/dataset/$DIR_OF_YOUR_DATA$/full`.
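For example, from the repository root, with `my_dataset` as a placeholder directory name:

```shell
mkdir -p audiocraft/dataset/my_dataset/full
# copy your source audio (mp3 or wav) into the `full` folder
cp /path/to/your/audio/*.mp3 audiocraft/dataset/my_dataset/full/
```

Then run the preprocessing steps in order, starting from the `preproc` directory: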

```shell
cd preproc
```

### 1. demixing tracks
To remove the vocal stem from each track, we use [Demucs](https://github.com/facebookresearch/demucs).
In `main.py`, change `path_rootdir` to your dataset directory and `ext_src` to the audio extension of your dataset (`'mp3'` or `'wav'`).

```shell
cd 0_demix
python main.py
```
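
If you want to sanity-check the separation on a single file first, Demucs can also be run directly from its CLI, independently of the provided script; `--two-stems=vocals` produces a vocals/no-vocals split:

```shell
# quick single-file check (not part of the provided pipeline)
python -m demucs --two-stems=vocals /path/to/song.mp3
# results are written to ./separated/<model_name>/song/{vocals,no_vocals}.wav
```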

<br />

### 2. beat/downbeat detection and cropping
To extract the beats and downbeats of the songs, you can use [BeatNet](https://github.com/mjhydri/BeatNet) or [Madmom](https://github.com/CPJKU/madmom) as the beat extractor.
If you use BeatNet, change `path_rootdir` to your dataset directory in `main_beat_nn.py`; if you use Madmom, change it in `main_beat.py`.

Then, according to the extracted beats and downbeats, each song is cropped into clips by `main_crop.py`; `path_rootdir` should again be set to your dataset directory.

The last stage, `main_filter.py`, filters out clips with low volume; here `path_rootdir` should point to the `clip` directory.

```shell
cd 1_beats-crop
python main_beat.py
python main_crop.py
python main_filter.py
```
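
For reference, a minimal sketch of standalone downbeat extraction with Madmom (the provided scripts additionally batch over the dataset and store the results used for cropping):

```python
from madmom.features.downbeats import RNNDownBeatProcessor, DBNDownBeatTrackingProcessor

# Frame-wise beat/downbeat activations, then decoding into (time, beat_position) pairs.
activations = RNNDownBeatProcessor()("clip.wav")
tracker = DBNDownBeatTrackingProcessor(beats_per_bar=[3, 4], fps=100)
beats = tracker(activations)  # rows of (time_in_seconds, position); position 1 marks a downbeat
downbeat_times = beats[beats[:, 1] == 1, 0]
```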

<br />

### 3. chord extraction
To extract chord progressions, we use [BTC-ISMIR2019](https://github.com/jayg996/BTC-ISMIR19).
The `root_dir` in `main.py` should be changed to your clips data directory.

```shell
cd 2_chord/BTC-ISMIR19
python main.py
```
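
BTC typically writes its chord annotations as MIREX-style `.lab` files, i.e. one `start_time end_time chord_label` segment per line; a small parsing sketch, assuming that format:

```python
def read_lab(path):
    """Parse a MIREX-style .lab chord file into (start, end, label) tuples."""
    segments = []
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) >= 3:
                segments.append((float(parts[0]), float(parts[1]), parts[2]))
    return segments

print(read_lab("some_clip.lab")[:5])
```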

<br />

### 4. tags/description labeling (optional)
For datasets crawled from a website (e.g. YouTube), the description of each song can be obtained from the crawled information in `crawl_info.json` (you can change the file name in `3_1_ytjsons2tags/main.py`). We use the title of the YouTube video as the description. The `root_dir` in `main.py` should be changed to your clips data directory.

```shell
cd 3_1_ytjsons2tags
python main.py
```

For datasets without descriptive metadata, you can use [Essentia](https://github.com/MTG/essentia) to extract instrument and genre tags.
```shell
cd 3_tags/essentia
python main.py
```

After the JSON files are created, run `dump_jsonl.py` to generate the JSONL manifest in the training directory.
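
For reference, audiocraft-style training manifests are `.jsonl` files with one JSON object per clip; an entry typically looks roughly like the line below (paths and values are placeholders, and the exact fields written by `dump_jsonl.py` may differ):

```json
{"path": "dataset/my_dataset/clip/song_0001.wav", "duration": 30.0, "sample_rate": 32000, "amplitude": null, "weight": null, "info_path": null}
```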

<br />

### Training stage
The training weights of MusiConGen are available at [this link](https://huggingface.co/Cyan0731/MusiConGen_training/tree/main). Please place them in the directory `MusiConGen/audiocraft/training_weights/xps/musicongen`.

Before training, set your username as an environment variable:
```shell
export USER=$YOUR_USER_NAME
```

If you are fine-tuning on a single GPU, use the following command:
```shell
dora run solver=musicgen/single_finetune \
    conditioner=chord2music_inattn.yaml \
    continue_from=//sig/musicongen \
    compression_model_checkpoint=//pretrained/facebook/encodec_32khz \
    model/lm/model_scale=medium dset=audio/example \
    transformer_lm.n_q=4 transformer_lm.card=2048
```
The `continue_from` argument can also be given the absolute path of your own checkpoint (e.g. `continue_from=/absolute/path/to/checkpoint`).

If you are fine-tuning on multiple (e.g. 4) GPUs, use the following command:
```shell
dora run -d solver=musicgen/multigpu_finetune \
    conditioner=chord2music_inattn.yaml \
    continue_from=//sig/musicongen \
    compression_model_checkpoint=//pretrained/facebook/encodec_32khz \
    model/lm/model_scale=medium dset=audio/example \
    transformer_lm.n_q=4 transformer_lm.card=2048
```

<br />

### export weight
Use `export_weight.py` with your training signature `sig` to export your trained weights to `output_dir`.

<br />

## License
The code and model weights are released under the [LICENSE file](https://github.com/Cyan0731/MusiConGen/blob/main/LICENSE); they also follow MusicGen's [LICENSE](https://github.com/facebookresearch/audiocraft/blob/main/LICENSE) and [LICENSE_weights](https://github.com/facebookresearch/audiocraft/blob/main/LICENSE_weights).