
Regarding Roberta Large

#3
by Atulad - opened

I see that Unbabel COMET downloads the models--xlm-roberta-large folder every time. Is there any way to load it from a local path? If so, please share the workaround.

Unbabel org

It usually uses the HF cache to store those models. In any case, you can always clone this repo and then use the "load_from_checkpoint" method to load the checkpoint from a local path.
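For reference, loading a COMET checkpoint from a local path looks roughly like this (a minimal sketch assuming unbabel-comet >= 2.0; the checkpoint path is hypothetical):

from comet import load_from_checkpoint

# Point directly at a checkpoint on disk instead of letting COMET
# resolve a model name through the Hugging Face Hub.
model = load_from_checkpoint("/models/wmt22-comet-da/checkpoints/model.ckpt")

data = [{"src": "Hello world", "mt": "Hallo Welt", "ref": "Hallo Welt"}]
output = model.predict(data, batch_size=8, gpus=0)
print(output.system_score)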

RicardoRei changed discussion status to closed

My point is that I am passing a local COMET model path, but it still downloads models--xlm-roberta-large from the Hugging Face URL. That URL is blocked in my production environment, so how can I use a local copy of xlm-roberta-large?

Unbabel org

Ah, I see, I misunderstood what you were saying. But that should not happen: for trained models, the encoder weights are not downloaded. We prevent that here and here.

I would have to investigate that further... but it should not happen.

RicardoRei changed discussion status to open
Unbabel org

The only thing that might be happening is that you still need to download some small things like the tokenizer and the model configs... In that case, the best option for you is probably to modify the functions I pointed to above so they load directly from a local xlm-roberta-large directory.
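A minimal sketch of what that modification amounts to, using the standard transformers API (the local path is hypothetical, and the directory must contain the files from the xlm-roberta-large repo, e.g. config.json, sentencepiece.bpe.model, tokenizer.json):

from transformers import AutoConfig, AutoTokenizer

LOCAL_XLMR = "/opt/models/xlm-roberta-large"  # hypothetical local copy

# from_pretrained accepts a local directory anywhere it accepts a Hub
# model name, so no network access is needed at this point.
tokenizer = AutoTokenizer.from_pretrained(LOCAL_XLMR)
config = AutoConfig.from_pretrained(LOCAL_XLMR)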

Unbabel org

It's probably that... it's attempting to download the configs and tokenizer.

Can't load tokenizer for 'xlm-roberta-large'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'xlm-roberta-large' is the correct path to a directory containing all relevant files for a XLMRobertaTokenizerFast tokenizer

This is the error I am facing in the production environment, because https://huggingface.co/models is blocked... If the xlm-roberta-large model is present in the cache it works perfectly; if not, it tries to download it.
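One standard workaround (not mentioned in this thread, but a documented transformers/huggingface_hub feature) is to enable offline mode, which forces everything to resolve from the local cache instead of reaching the blocked URL; the checkpoint path below is hypothetical:

import os

# Must be set before transformers/comet are imported.
os.environ["HF_HUB_OFFLINE"] = "1"
os.environ["TRANSFORMERS_OFFLINE"] = "1"

from comet import load_from_checkpoint

model = load_from_checkpoint("/models/wmt22-comet-da/checkpoints/model.ckpt")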

Hey, I'm facing the same issue as Atulad. What I did was load the 'xlm-roberta-large' tokenizer separately, but I still hit the same error. Right now I have all the relevant files for the tokenizer, but it seems the directory is incorrect. May I ask for the syntax, or the preset local directory for the tokenizer, so it will not trigger this error:

Can't load tokenizer for 'xlm-roberta-large'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'xlm-roberta-large' is the correct path to a directory containing all relevant files for a XLMRobertaTokenizerFast tokenizer
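For what it's worth, the "preset" location is the Hub cache itself, which defaults to ~/.cache/huggingface/hub and can be relocated with the HF_HOME environment variable. Since the cache has an internal layout (blobs/, refs/, snapshots/), the safest way to pre-populate it on a machine with internet access is snapshot_download rather than copying files by hand; the allow_patterns filter below is an assumption about which small files are needed:

from huggingface_hub import snapshot_download

# Fetch only the configs and tokenizer files (a few MB), not the
# multi-GB encoder weights, which the COMET checkpoint already contains.
snapshot_download(
    repo_id="xlm-roberta-large",
    allow_patterns=["*.json", "*.txt", "sentencepiece.bpe.model"],
)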

Hi, I solved this issue. When pushing the code to GitLab/GitHub, also push the models--xlm-roberta-large directory (about 15 MB), then create a Dockerfile and copy the directory into the Hugging Face cache under your root/home directory:
RUN mkdir -p $WORKING_DIR/.cache/huggingface/hub
ADD models--xlm-roberta-large $WORKING_DIR/.cache/huggingface/hub/models--xlm-roberta-large
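Note that this works because the default Hub cache lives under ~/.cache/huggingface/hub, so $WORKING_DIR has to be the home directory of the user that runs COMET (or HF_HOME has to be pointed at $WORKING_DIR/.cache/huggingface). The internal layout of models--xlm-roberta-large (blobs/, refs/, snapshots/) also has to be committed intact, since huggingface_hub resolves files through the snapshots directory.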

Hey Atulad, I fixed the issue by just moving my directory up a level. Easy fix!
