cognitivecomputations/dolphin-2.7-mixtral-8x7b · fine-tuning with axolotl not working

I am trying to fine-tune with axolotl (using axolotl's Docker image), but I get either

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:6 and cuda:0!

or, when I change this setting in config.json:

  "output_router_logits": false,
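For anyone reproducing this, here is a minimal sketch of making that same edit programmatically instead of by hand. It assumes the model has already been downloaded to a local directory; the function name `set_output_router_logits` and the path in the usage line are hypothetical, and only the standard library is used.

```python
import json
from pathlib import Path


def set_output_router_logits(model_dir: str, enabled: bool) -> dict:
    """Set output_router_logits in <model_dir>/config.json and return the updated config.

    Assumption: model_dir is a local copy of the model, not a Hub repo ID.
    """
    config_path = Path(model_dir) / "config.json"
    config = json.loads(config_path.read_text())
    # Python False is serialized as JSON false; all other keys are left untouched.
    config["output_router_logits"] = enabled
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Usage (hypothetical path): `set_output_router_logits("/models/dolphin-2.7-mixtral-8x7b", False)`.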

(as suggested in https://huggingface.co/mistralai/Mixtral-8x7B-v0.1/discussions/5)

I get:

RuntimeError: !grad_accumulator_.expired() INTERNAL ASSERT FAILED at "../torch/csrc/autograd/saved_variable.cpp":226, please report a bug to PyTorch. No grad accumulator for a saved leaf

Any hints?

I'm not using accelerate; I'm just running the training directly with Python.

It worked with 2.5. I diffed the 2.5 and 2.7 config.json files, and output_router_logits (along with the transformers version) is the only difference.
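A comparison like the one described above can be sketched with the standard library alone. This is a minimal, hypothetical helper (`diff_configs` is not part of axolotl or transformers) that reports every top-level key whose value differs between two local config.json files. Because Hugging Face config files record a `transformers_version` key, a version change would show up alongside any real setting change.

```python
import json
from pathlib import Path
from typing import Any

# Placeholder shown when a key exists in only one of the two files.
_MISSING = "<missing>"


def diff_configs(path_a: str, path_b: str) -> dict[str, tuple[Any, Any]]:
    """Return {key: (value_in_a, value_in_b)} for top-level keys that differ."""
    a = json.loads(Path(path_a).read_text())
    b = json.loads(Path(path_b).read_text())
    diffs = {}
    # Walk the union of keys in sorted order so the report is stable.
    for key in sorted(a.keys() | b.keys()):
        va, vb = a.get(key, _MISSING), b.get(key, _MISSING)
        if va != vb:
            diffs[key] = (va, vb)
    return diffs
```

Usage (hypothetical paths): `diff_configs("dolphin-2.5/config.json", "dolphin-2.7/config.json")`. Only top-level keys are compared; nested dictionaries are compared as whole values.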