baichuan-inc
/

Baichuan-13B-Chat

Text Generation

text-generation-inference

Inference Endpoints

Model card Files Files and versions Community

英文推理报错

#26

by hanswang73 - opened Aug 10, 2023

Aug 10, 2023

Traceback (most recent call last):
File "/root/miniconda3/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/root/miniconda3/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/root/miniconda3/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/root/miniconda3/lib/python3.8/site-packages/transformers/generation/utils.py", line 1588, in generate
return self.sample(
File "/root/miniconda3/lib/python3.8/site-packages/transformers/generation/utils.py", line 2678, in sample
next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either inf, nan or element < 0

transformer == 4.31.0
刚下载的最新的模型文件
prompt = “Do you know ifax ?” (其它的英文提问也出类似报错)

Baichuan Intelligent Technology org Aug 11, 2023

是不是修改了哪里？推理和输入是英文还是中文没有关系的

Baichuan Intelligent Technology org Aug 11, 2023

•

edited Aug 12, 2023

能不能贴一下代码和说明一下使用的平台？你的prompt我这边验证没问题

Aug 17, 2023

之前用的是 8bit 量化，就会固定出现问题，后来改成不量化的版本，英文没问题了，但如果中文 prompt 比较长的话，还是会偶尔出现（概率小很多）

Aug 17, 2023

ubuntu，pytorch 2.0.0，transformer == 4.31.0，GPU is A5000(8bit量化) 和 A40(不量化)

Baichuan Intelligent Technology org Aug 17, 2023

ubuntu，pytorch 2.0.0，transformer == 4.31.0，GPU is A5000(8bit量化) 和 A40(不量化)

这就很奇怪了。如果能构建出一个必然出问题的case就好了。

Aug 17, 2023

今天有一个 prompt 很长的例子，复现了错误，然后我把 config 文件里的 do_sample 改为 false，这个 prompt 就没复现错误了

Baichuan Intelligent Technology org Aug 17, 2023

你是单卡还是多卡？fp16推的？

Aug 17, 2023

但 do_sample 改为 false 后，刚开始还行，提几次问题，生成速度就会明显变慢 ........

Aug 17, 2023

你是单卡还是多卡？fp16推的？

13b，A40单卡，fp16

Baichuan Intelligent Technology org Aug 17, 2023

你是单卡还是多卡？fp16推的？

13b，A40单卡，fp16

你可以把你长的prompt发我，我看看能不能复现。

Baichuan Intelligent Technology org Aug 17, 2023

但 do_sample 改为 false 后，刚开始还行，提几次问题，生成速度就会明显变慢 ........

变慢是因为多轮对话会把之前的内容拼接起来，导致context很长。

Aug 17, 2023

但 do_sample 改为 false 后，刚开始还行，提几次问题，生成速度就会明显变慢 ........

变慢是因为多轮对话会把之前的内容拼接起来，导致context很长。

不是这样，chat函数的messages参数，我会控制整体的长度，包括history和注入的prompt，同样的messages长度，开始和之后，速度不一样

Aug 17, 2023

而且，这个现象，在do_sample没改之前，是不存在的

Aug 17, 2023

你是单卡还是多卡？fp16推的？

13b，A40单卡，fp16

你可以把你长的prompt发我，我看看能不能复现。

不太方便

Aug 17, 2023

https://github.com/THUDM/ChatGLM-6B/issues/31

Aug 17, 2023

这个链接可以参考下，我根据其中的建议，加了half和改了do_sample

Aug 17, 2023

但 do_sample 改为 false 后，刚开始还行，提几次问题，生成速度就会明显变慢 ........

我发现这个问题，是因为用stream方式中途取消生成后（即stream方式未生成完就停止了），新的生成就会变慢 .......

Aug 18, 2023

好像就是 do_sample = True 引起的 ...

Baichuan Intelligent Technology org Aug 19, 2023

这个就不清楚了，我是do_sample测试的。

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment