Can multiple NVIDIA T4 GPUs be used to deploy Gemma2-27B-IT?
#36 · by armanZhou
If so, how many T4 GPUs are needed?
Deploying Gemma2-27B-IT across multiple T4 GPUs is not recommended because of the model's size, the inter-GPU communication overhead, and the need to choose and tune a parallelism strategy (tensor, pipeline, or data parallelism). As a rough estimate, the 27B parameters alone take about 54 GB in float16, so at least four 16 GB T4s would be needed just to hold the weights, before accounting for the KV cache and activations. Gemma 2 27B is designed to run inference efficiently at full precision on a single Google Cloud TPU host, an NVIDIA A100 80GB Tensor Core GPU, or an NVIDIA H100 Tensor Core GPU. Please refer to the Gemma 2 blog for more details.
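If you still want to experiment on multiple T4s, here is a minimal sketch (not an official recipe) that shards the model layer-by-layer across all visible GPUs with `device_map="auto"` and 4-bit quantization to fit in memory. It assumes you have `transformers`, `accelerate`, and `bitsandbytes` installed and have accepted the license for the gated `google/gemma-2-27b-it` checkpoint; expect low throughput, since this naive sharding keeps only one GPU busy at a time.

```python
# Sketch: shard Gemma2-27B-IT across whatever GPUs are visible,
# using 4-bit quantization to reduce the memory footprint.
# Assumes transformers, accelerate, and bitsandbytes are installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "google/gemma-2-27b-it"

# 4-bit NF4 quantization; compute in float16 because T4 GPUs do not support bfloat16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # accelerate places layers across all visible GPUs
)

inputs = tokenizer("Why is the sky blue?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

At 4-bit precision the weights shrink to roughly 13.5 GB, so two to four T4s (16 GB each) can hold the model plus KV cache, but the quantization and the serial, layer-by-layer execution trade away both quality and speed compared with a single A100/H100 or a TPU host.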