When will config.json be available?
I wanted to download the raw weights and use them locally, but without config.json I am running into errors.
Is there any other way to use the weights? I don't want to go through mistral-inference.
@pandora-s
I want to use the model like:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
local_directory = "/home/admin/mamba_codestral/model_weights"

model = AutoModelForCausalLM.from_pretrained(local_directory, device_map=device)
tokenizer = AutoTokenizer.from_pretrained(local_directory)

user_query = "some_sample_query"
inputs = tokenizer(user_query, return_tensors="pt").to(device)

# Generate text based on the user query
output = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    pad_token_id=tokenizer.eos_token_id,
    max_new_tokens=8160,
    num_return_sequences=1,
    do_sample=True,
    top_p=0.9,
    temperature=0.1,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
Hope this gives you an idea of how I intend to use the model.
Hi @DevJain7, you can use the weights via an HF revision with the most recent version of transformers; we ported the model to transformers recently (the latest version is necessary). The weights are in this repo on a different revision than main, namely refs/pr/9. I don't think it has the AutoModel mapping just yet, but this should work:
from transformers import Mamba2ForCausalLM, AutoTokenizer

model_id = 'mistralai/Mamba-Codestral-7B-v0.1'

# Load the tokenizer and model from the refs/pr/9 revision of the repo
tokenizer = AutoTokenizer.from_pretrained(model_id, revision='refs/pr/9', from_slow=True, legacy=False)
model = Mamba2ForCausalLM.from_pretrained(model_id, revision='refs/pr/9')

# Quick generation sanity check
input_ids = tokenizer("Hey how are you doing?", return_tensors="pt")["input_ids"]
out = model.generate(input_ids, max_new_tokens=10)
print(tokenizer.batch_decode(out))
If you want to download the weights locally, I suggest using something like:
pip install --upgrade huggingface_hub
huggingface_cli login # add your token here when prompted
huggingface_cli download 'mistralai/Mamba-Codestral-7B-v0.1' --local_dir . --revision="refs/pr/9"
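If you would rather script the download from Python instead of the CLI, a minimal sketch using huggingface_hub's snapshot_download would look like this (the target directory is just an example):

from huggingface_hub import snapshot_download

# Download the refs/pr/9 revision into a local folder
# (requires being logged in, or pass token=... explicitly).
local_dir = snapshot_download(
    repo_id="mistralai/Mamba-Codestral-7B-v0.1",
    revision="refs/pr/9",
    local_dir="./mamba_codestral_weights",  # hypothetical target path
)
print(local_dir)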
That being said, mistral-inference should work as well!
Hey,
I am able to run it with mistral-inference.
Since I want to connect it with continue-dev, I am exploring other options.
The commands above are incorrect w.r.t. the Hugging Face CLI; the working versions are:
pip install -U "huggingface_hub[cli]"
huggingface-cli login --token $HF_TOKEN
huggingface-cli download 'mistralai/Mamba-Codestral-7B-v0.1' --local-dir . --revision="refs/pr/9"
The transformers port is good news for those of us who normally use things like text-gen instead of custom code.
Current progress on running codestral-mamba locally: https://github.com/slabstech/llm-recipes/tree/main/tutorials/mamba
I need to quantise the model; it does not fit on a 24GB 4090 card, and there is some issue getting it running on 4x 4090s. I hope to solve it tonight.
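For reference, one common way to shrink the footprint with transformers is 4-bit loading via bitsandbytes. This is only a sketch; I have not verified that the Mamba2 layers quantise cleanly:

import torch
from transformers import Mamba2ForCausalLM, BitsAndBytesConfig

# Sketch: 4-bit NF4 quantisation via bitsandbytes (pip install bitsandbytes).
# Whether every Mamba2 block quantises correctly is an assumption to verify.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = Mamba2ForCausalLM.from_pretrained(
    "mistralai/Mamba-Codestral-7B-v0.1",
    revision="refs/pr/9",
    quantization_config=bnb_config,
    device_map="auto",
)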
I tried to run a Space with the weights at https://huggingface.co/spaces/gaganyatri/codestral-api, using the weights from https://huggingface.co/gaganyatri/codestral-7B.
Waiting for a free GPU to verify the modifications.
What is the expected VRAM for the mamba model? It does not fit on a 24GB card.
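As a rough back-of-envelope estimate, assuming roughly 7B parameters: fp32 weights alone are about 7e9 x 4 bytes ≈ 28 GB, which already exceeds 24 GB, while bf16 is around 14 GB plus activation and state overhead. If from_pretrained is defaulting to fp32, explicitly requesting bfloat16 may be enough to fit a single 24GB card:

import torch
from transformers import Mamba2ForCausalLM

# Sketch: load in bfloat16 so ~7B parameters take roughly 14-15 GB of VRAM
# instead of ~28 GB in fp32 (exact numbers depend on the checkpoint).
model = Mamba2ForCausalLM.from_pretrained(
    "mistralai/Mamba-Codestral-7B-v0.1",
    revision="refs/pr/9",
    torch_dtype=torch.bfloat16,
    device_map="cuda:0",
)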