The tokenizer suddenly fails to load. Is this a version incompatibility caused by an upgrade?
AttributeError Traceback (most recent call last)
测试.ipynb Cell 19 line 4
1 # vpm_resampler_embedtokens_weight = torch.load(f"{model_dir}/vpm_resampler_embedtokens.pt")
2
3 # msg = model.load_state_dict(vpm_resampler_embedtokens_weight, strict=False)
----> 4 tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
File /usr/local/lib/python3.9/dist-packages/transformers/models/auto/tokenization_auto.py:877, in AutoTokenizer.from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
875 else:
876 class_ref = tokenizer_auto_map[0]
--> 877 tokenizer_class = get_class_from_dynamic_module(class_ref, pretrained_model_name_or_path, **kwargs)
878 _ = kwargs.pop("code_revision", None)
879 if os.path.isdir(pretrained_model_name_or_path):
File /usr/local/lib/python3.9/dist-packages/transformers/dynamic_module_utils.py:514, in get_class_from_dynamic_module(class_reference, pretrained_model_name_or_path, cache_dir, force_download, resume_download, proxies, token, revision, local_files_only, repo_type, code_revision, **kwargs)
501 # And lastly we get the class inside our newly created module
502 final_module = get_cached_module_file(
503 repo_id,
504 module_file + ".py",
    (...)
512 repo_type=repo_type,
513 )
--> 514 return get_class_in_module(class_name, final_module)
File /usr/local/lib/python3.9/dist-packages/transformers/dynamic_module_utils.py:213, in get_class_in_module(class_name, module_path)
211 # reload in both cases
212 module_spec.loader.exec_module(module)
--> 213 return getattr(module, class_name)
AttributeError: module 'transformers_modules.openbmb.MiniCPM-Llama3-V-2_5.287e3f85192a7c4acf2564fc6bda0637439a9d78.modeling_minicpmv' has no attribute 'PreTrainedTokenizerFastWrapper'
Does your model_dir contain a dot, like MiniCPM-Llama3-V-2_5.287e3f85192a7c4acf2564fc6bda0637439a9d78? Because of how Hugging Face's dynamic module mechanism works, a dot in model_dir breaks the dynamic import; please replace the dot with another character.
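For reference, a minimal sketch of that workaround, assuming model_dir points to a local checkpoint directory (the paths below are made up for illustration): rename the directory so no dot appears in the path that transformers turns into a dynamic module name.

import os

# Hypothetical paths, for illustration only.
old_dir = "checkpoints/MiniCPM-Llama3-V-2_5.finetune"    # name contains a dot
new_dir = "checkpoints/MiniCPM-Llama3-V-2_5_finetune"    # dot replaced by an underscore

# Rename the checkpoint directory so the dynamically generated module path stays importable.
if os.path.isdir(old_dir) and not os.path.isdir(new_dir):
    os.rename(old_dir, new_dir)

model_dir = new_dir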
Hi, thanks for your reply!
There is no dot in model_dir. My loading code is as follows:
# Load the PEFT fine-tuned model with remote code enabled
self.model = AutoPeftModelForCausalLM.from_pretrained(
    model_dir,
    device_map='cuda:0',
    trust_remote_code=True,
    torch_dtype=torch.float16,
).eval()
# Restore the vpm / resampler / embed_tokens weights saved during fine-tuning
vpm_resampler_embedtokens_weight = torch.load(f"{model_dir}/vpm_resampler_embedtokens.pt")
self.msg = self.model.load_state_dict(vpm_resampler_embedtokens_weight, strict=False)
# Loading the tokenizer is the step that fails
self.tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
Since the commit from 7 days ago (287e3f8), this code raises the error above. My current understanding is that because loading the model with PEFT effectively requires trust_remote_code=True, the latest remote code is pulled on every load, i.e. transformers_modules.openbmb.MiniCPM-Llama3-V-2_5.287e3f85192a7c4acf2564fc6bda0637439a9d78.modeling_minicpmv. In that recent commit the tokenizer definition no longer includes PreTrainedTokenizerFastWrapper, so the tokenizer fails to load. To patch this myself I would have to clone the whole repository and fix it locally, which means I could no longer track the latest code. Could you fix this upstream, or is there a way to avoid the problem?
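If it helps while waiting for an upstream fix, here is a sketch of one possible workaround, not an official recommendation: pin the remote code to a known-good revision via the code_revision keyword (it appears in the AutoTokenizer.from_pretrained frame of the traceback above), so that trust_remote_code does not always pull the latest modeling_minicpmv. The commit hash below is a placeholder you would need to replace with an actual known-good commit of openbmb/MiniCPM-Llama3-V-2_5.

from transformers import AutoTokenizer

# Placeholder: fill in a commit of openbmb/MiniCPM-Llama3-V-2_5 that still defines
# PreTrainedTokenizerFastWrapper (check the repo's commit history on the Hub).
GOOD_CODE_REVISION = "<known-good-commit-hash>"

# code_revision pins which revision of the remote code is fetched for the
# dynamically imported tokenizer class, independent of the local model_dir.
tokenizer = AutoTokenizer.from_pretrained(
    model_dir,
    trust_remote_code=True,
    code_revision=GOOD_CODE_REVISION,
)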