Can ZymCTRL inference be run on multiple GPUs?
Thank you Prof. Noelia Ferruz for your excellent work!
I tried Example 1 on a single V100-32G GPU and it took a very long time: about 204 minutes for the default enzyme, nitrilase (3.5.5.1). We also tried Example 1 on a single RTX 6000 Ada (48G) GPU, which still took 44 minutes. Since my team has a 4x V100-32G NVIDIA DGX machine, we wonder whether the Example 1 script can be modified to use all 4 GPUs and speed up inference. nn.parallel.DistributedDataParallel seemed like it would do, but when I tried the following:
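import torch
from transformers import GPT2LMHeadModel
# `device` is assumed to be defined earlier, as in the Example 1 script
model = GPT2LMHeadModel.from_pretrained('/my/path/to/zymCTRL').to(device)
if torch.cuda.device_count() > 1:
    print(f"Use {torch.cuda.device_count()} GPUs")
    # Fails unless torch.distributed.init_process_group() has been called first
    model = torch.nn.parallel.DistributedDataParallel(model)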
I got this error: "RuntimeError: Default process group has not been initialized, please make sure to call init_process_group"
guruace
Hi guruace,
How many sequences were you generating during that time? With your GPUs, I'd expect it to generate more than 2000 sequences in that time (possibly many more).
Certainly the first batch does not take more than 2-5 minutes when I use an A40.
Are you sure the GPU is being used?
Alternatively, I've never tried, but I think HuggingFace supports inference on multiple GPUs: https://huggingface.co/docs/transformers/perf_infer_gpu_many
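As an untested sketch of one way to use all four GPUs: DistributedDataParallel is built to synchronise gradients during training and needs a process group, so for pure generation it may be simpler to start one independent process per GPU, each with its own copy of the model and its own share of the sequences. The names split_evenly, generate_on_gpu and launch, the output file names, and the sampling settings are all placeholders of mine, not part of the ZymCTRL scripts; AutoTokenizer is assumed to load the tokenizer from the same path as the model.

```python
import multiprocessing as mp


def split_evenly(total, n_workers):
    """Split `total` sequences into shares that differ by at most one."""
    base, extra = divmod(total, n_workers)
    return [base + (1 if i < extra else 0) for i in range(n_workers)]


def generate_on_gpu(gpu_id, n_sequences, model_path, label, batch_size=20):
    """Worker: load one model copy on one GPU and sample n_sequences from it."""
    # Imported here so each spawned process sets up CUDA on its own.
    import torch
    from transformers import AutoTokenizer, GPT2LMHeadModel

    device = torch.device(f"cuda:{gpu_id}")
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = GPT2LMHeadModel.from_pretrained(model_path).to(device).eval()
    input_ids = tokenizer.encode(label, return_tensors="pt").to(device)

    # Hypothetical output file, one per GPU.
    with open(f"generated_gpu{gpu_id}.txt", "w") as out, torch.no_grad():
        remaining = n_sequences
        while remaining > 0:
            n = min(batch_size, remaining)
            # Placeholder sampling settings; reuse the ones from Example 1.
            outputs = model.generate(
                input_ids, do_sample=True, max_length=1024, num_return_sequences=n
            )
            for seq in outputs:
                out.write(tokenizer.decode(seq, skip_special_tokens=True) + "\n")
            remaining -= n


def launch(model_path, label, total_sequences):
    """Start one independent generation process per visible GPU."""
    import torch

    n_gpus = torch.cuda.device_count()
    if n_gpus == 0:
        raise RuntimeError("No CUDA device is visible to PyTorch")
    shares = split_evenly(total_sequences, n_gpus)
    ctx = mp.get_context("spawn")  # "spawn" avoids sharing CUDA state via fork
    procs = [
        ctx.Process(target=generate_on_gpu, args=(gpu, share, model_path, label))
        for gpu, share in enumerate(shares)
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

It would be called from under an `if __name__ == "__main__":` guard, for example `launch('/my/path/to/zymCTRL', '3.5.5.1', 1300)`. Because the processes never talk to each other, no init_process_group call is needed; the per-GPU files can simply be concatenated afterwards.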
Hope this helps,
Noelia
Dear Noelia,
Yes, I am quite sure it was using the GPU, though only a single one (which is also clear from your script). I also tested on my MacBook Pro M1 (64G), where it presumably ran on the CPU only, and it took 36 hours to produce just 572 sequences. On the RTX 6000 Ada and the V100-32G, 1300 and 1290 sequences were generated, respectively. Given the M1 results, I am confident the V100-32G run was using the GPU rather than the CPU alone.
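To put the numbers from this thread side by side, here is a quick throughput calculation. It assumes the sequence counts correspond to the run times quoted earlier (204 minutes on the V100-32G, 44 minutes on the RTX 6000 Ada, 36 hours on the M1); the helper name sequences_per_minute is just for illustration.

```python
# Figures reported in this thread: (sequences generated, minutes taken).
runs = {
    "V100-32G": (1290, 204),
    "RTX 6000 Ada": (1300, 44),
    "M1 (CPU)": (572, 36 * 60),
}


def sequences_per_minute(n_sequences, minutes):
    """Average generation throughput over the whole run."""
    return n_sequences / minutes


rates = {name: sequences_per_minute(n, m) for name, (n, m) in runs.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.2f} sequences/min")

# How much faster the V100 run was than the CPU-only M1 run.
speedup_v100_vs_m1 = rates["V100-32G"] / rates["M1 (CPU)"]
print(f"V100-32G vs M1: {speedup_v100_vs_m1:.1f}x")
```

This gives roughly 6.3 sequences per minute on the V100-32G, 29.5 on the RTX 6000 Ada, and 0.26 on the M1, so the V100 run was about 24 times faster than the CPU-only run.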
Thank you!
guruace