Seems like the GPTQ versions are broken.
For the bigger models I get:
RuntimeError: cannot reshape tensor of 0 elements into shape [-1, 1, 0] because the unspecified dimension size -1 can be any value and is ambiguous in self.gate...
For this test one I get:
...
File "/home/nepe/.local/lib/python3.10/site-packages/transformers/models/mixtral/modeling_mixtral.py", line 708, in forward
router_logits = self.gate(hidden_states)
File "/home/nepe/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/nepe/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/nepe/.local/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = module._old_forward(*args, **kwargs)
File "/home/nepe/.local/lib/python3.10/site-packages/auto_gptq/nn_modules/qlinear/qlinear_cuda.py", line 227, in forward
zeros = zeros.reshape(self.scales.shape)
RuntimeError: shape '[8, 8]' is invalid for input of size 0
The non-GPTQ version of the test model works perfectly.
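Both errors look like the same root cause: some tensor inside self.gate ends up with 0 elements, and PyTorch refuses to reshape an empty tensor. A quick standalone repro of just the PyTorch behaviour (nothing model-specific):

```python
import torch

empty = torch.empty(0)  # stands in for the gate layer's 0-element tensor

# -1 next to a 0 dim is ambiguous for an empty tensor, so reshape refuses:
try:
    empty.reshape(-1, 1, 0)
except RuntimeError as e:
    print(e)  # cannot reshape tensor of 0 elements into shape [-1, 1, 0] ...

# and a fixed shape such as scales.shape == (8, 8) can never hold 0 elements:
try:
    empty.reshape(8, 8)
except RuntimeError as e:
    print(e)  # shape '[8, 8]' is invalid for input of size 0
```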
Yeah, see the READMEs of the proper GPTQs for how to load them; you still need an AutoGPTQ PR at the moment.
I tried both the old and the fix branches: same error. I even tried to quantize this model myself: same error.
As far as I understand, there are still a few more things to do.
Based on this:
https://github.com/PanQiWei/AutoGPTQ/pull/480
you also have to apply this:
https://github.com/huggingface/transformers/pull/27956
and maybe this one too:
https://github.com/huggingface/optimum/pull/1585
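If it helps anyone, here is a minimal sanity check (just a sketch; the package names are the PyPI ones) to confirm which builds you actually have installed before retrying:

```python
from importlib.metadata import version

# "auto-gptq" is the PyPI name behind the auto_gptq module.
for pkg in ("transformers", "optimum", "auto-gptq"):
    print(pkg, version(pkg))

# Raises ImportError if your transformers build predates Mixtral support:
from transformers.models.mixtral import MixtralForCausalLM
```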
My mistake: I tried it with AutoModelForCausalLM.from_pretrained instead of AutoGPTQForCausalLM.from_quantized.
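For anyone else hitting this, a minimal sketch of the load path that worked, assuming a patched AutoGPTQ from the PR above is installed; the repo name is just a placeholder for whichever Mixtral GPTQ you are loading:

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_id = "TheBloke/Mixtral-8x7B-v0.1-GPTQ"  # placeholder repo name

tokenizer = AutoTokenizer.from_pretrained(model_id)

# from_quantized wires up AutoGPTQ's quantized linear layers itself;
# plain AutoModelForCausalLM.from_pretrained was the wrong entry point here.
model = AutoGPTQForCausalLM.from_quantized(
    model_id,
    device="cuda:0",
    use_safetensors=True,
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda:0")
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```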