Example fails to run: ValueError: not enough values to unpack (expected 2, got 1)
#1 opened by arnocandel
model=huggingface/falcon-40b-gptq
num_shard=2
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run
docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:0.8 --model-id $model --num-shard $num_shard --quantize gptq
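For scripting the launch, the same command can be assembled in Python. This is a minimal sketch: the helper name tgi_docker_command is hypothetical, "/home/user/data" stands in for $PWD/data, and the image tag is a parameter so the same call can target 0.8 or latest. Only the flags already used in the command above are passed.

```python
import shlex


def tgi_docker_command(model: str, num_shard: int, volume: str, tag: str = "latest") -> list:
    """Hypothetical helper: build the docker argv for the command in this thread."""
    return [
        "docker", "run",
        "--gpus", "all",
        "--shm-size", "1g",
        "-p", "8080:80",          # host port 8080 -> container port 80
        "-v", f"{volume}:/data",  # reuse downloaded weights across runs
        f"ghcr.io/huggingface/text-generation-inference:{tag}",
        "--model-id", model,
        "--num-shard", str(num_shard),
        "--quantize", "gptq",
    ]


# "/home/user/data" is a placeholder for $PWD/data.
cmd = tgi_docker_command("huggingface/falcon-40b-gptq", 2, "/home/user/data", tag="0.8")
print(shlex.join(cmd))
# To actually launch the container: subprocess.run(cmd, check=True)
```

Passing an argument list (rather than one shell string) avoids quoting problems if the volume path contains spaces.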
2023-06-28T23:51:56.447415Z INFO text_generation_launcher: Args { model_id: "huggingface/falcon-40b-gptq", revision: None, sharded: None, num_shard: Some(2), quantize: Some(Gptq), trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_input_length: 1000, max_total_tokens: 1512, max_batch_size: None, waiting_served_ratio: 1.2, max_batch_total_tokens: 32000, max_waiting_tokens: 20, port: 80, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, env: false }
2023-06-28T23:51:56.447443Z INFO text_generation_launcher: Sharding model on 2 processes
2023-06-28T23:51:56.447520Z INFO text_generation_launcher: Starting download process.
2023-06-28T23:51:58.197443Z INFO download: text_generation_launcher: Files are already present on the host. Skipping download.
2023-06-28T23:51:58.450323Z INFO text_generation_launcher: Successfully downloaded weights.
2023-06-28T23:51:58.450498Z INFO text_generation_launcher: Starting shard 0
2023-06-28T23:51:58.450721Z INFO text_generation_launcher: Starting shard 1
2023-06-28T23:52:04.793519Z ERROR shard-manager: text_generation_launcher: Error when initializing model
Traceback (most recent call last):
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.9/site-packages/typer/main.py", line 311, in __call__
    return get_command(self)(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/typer/core.py", line 778, in main
    return _main(
  File "/opt/conda/lib/python3.9/site-packages/typer/core.py", line 216, in _main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/typer/main.py", line 683, in wrapper
    return callback(**use_params)  # type: ignore
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 67, in serve
    server.serve(model_id, revision, sharded, quantize, trust_remote_code, uds_path)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 155, in serve
    asyncio.run(serve_inner(model_id, revision, sharded, quantize, trust_remote_code))
  File "/opt/conda/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 634, in run_until_complete
    self.run_forever()
  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 601, in run_forever
    self._run_once()
  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 1905, in _run_once
    handle._run()
  File "/opt/conda/lib/python3.9/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
> File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 124, in serve_inner
    model = get_model(model_id, revision, sharded, quantize, trust_remote_code)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/__init__.py", line 209, in get_model
    return FlashRWSharded(
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/flash_rw.py", line 150, in __init__
    self.load_weights(
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/flash_rw.py", line 186, in load_weights
    module_name, param_name = name.rsplit(".", 1)
ValueError: not enough values to unpack (expected 2, got 1)
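The failing line in flash_rw.py is module_name, param_name = name.rsplit(".", 1). str.rsplit(".", 1) returns a single-element list when the string contains no ".", so unpacking into two names raises exactly this ValueError; some weight name seen by load_weights therefore had no dot, though the log does not say which one. The sketch below reproduces the error with illustrative names and shows a hypothetical defensive variant based on str.rpartition. It is not the change made in the linked PR.

```python
def naive_split(name: str) -> tuple:
    """Mirror of the failing line: requires at least one '.' in name."""
    module_name, param_name = name.rsplit(".", 1)
    return module_name, param_name


def safe_split(name: str) -> tuple:
    """Hypothetical defensive variant: a dot-free name becomes a
    top-level parameter with an empty module path."""
    module_name, _, param_name = name.rpartition(".")
    return module_name, param_name


# Illustrative dotted parameter name: splits into (module path, parameter).
print(naive_split("transformer.h.0.mlp.weight"))

# Hypothetical dot-free tensor name: reproduces the error from the log.
try:
    naive_split("top_level_tensor")
except ValueError as err:
    print(err)

print(safe_split("top_level_tensor"))
```

str.rpartition always returns a 3-tuple (with empty strings when the separator is missing), which is why it never fails to unpack.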
docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:latest --model-id $model --num-shard $num_shard --quantize gptq
Running the latest image instead of 0.8 works now that https://github.com/huggingface/text-generation-inference/pull/438 has been merged.
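Once the container is up, a quick request confirms that the sharded GPTQ model actually serves. This is a minimal standard-library sketch, assuming text-generation-inference's /generate endpoint with an {"inputs": ..., "parameters": {...}} JSON body, and the host port 8080 from -p 8080:80; the helper names build_generate_request and generate are hypothetical.

```python
import json
import urllib.request

# Host side of the `-p 8080:80` mapping used in the docker command.
TGI_URL = "http://localhost:8080/generate"


def build_generate_request(prompt: str, max_new_tokens: int = 20) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /generate endpoint."""
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    return urllib.request.Request(
        TGI_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, max_new_tokens: int = 20) -> str:
    """Send the request and return the generated text (needs a live server)."""
    with urllib.request.urlopen(build_generate_request(prompt, max_new_tokens)) as resp:
        return json.loads(resp.read())["generated_text"]


# Only works while the server is running:
# print(generate("What is Deep Learning?"))
```

Keep max_new_tokens small: the launcher Args above cap input plus generated tokens at max_total_tokens (1512).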
arnocandel changed discussion status to closed