Python Error When Interacting With Model
Hello,
I am receiving a handful of warnings/errors in the model worker logs when uploading .WAV files from the Gradio controller. They seem to be a fairly common issue with no clear solution. Is this a Transformers version issue? If so, how can it be resolved, given that the LLaMA model requires a current or near-current Transformers version?
I have seen deleting the generation_config.json in the model_name_or_path folder suggested as a fix, but that was not successful for me. I hope someone has some insights here, as I am still fairly new to Python and the realm of NLP. Basically, where my head is at: the items listed below read like warnings, yet something seems to be preventing the model from running at all. Transformers? Something else?
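In case it helps narrow things down, this is the kind of sanity check I can run on my end (the model path is just a placeholder for wherever the Llama-3.1-8B-Omni checkpoint lives locally, so treat the exact layout as an assumption):

```python
# Quick environment / generation-config sanity check.
# NOTE: model_dir is a placeholder for my local checkpoint folder, not a known layout.
import json
from pathlib import Path

import torch
import transformers

print("transformers:", transformers.__version__)
print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())

model_dir = Path("~/models/Llama-3.1-8B-Omni").expanduser()  # placeholder path
gen_cfg = model_dir / "generation_config.json"
if gen_cfg.exists():
    # Show which do_sample / temperature / top_p values the checkpoint ships with.
    print(json.dumps(json.loads(gen_cfg.read_text()), indent=2))
else:
    print("no generation_config.json found in", model_dir)
```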
Suspected errors/warnings
1) ~/miniconda3/envs/llama-omni/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:567: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.0` -- this flag is only used in sample-based generation mo>
2024-10-27 20:48:23 | ERROR | stderr | warnings.warn(
2) ~/miniconda3/envs/llama-omni/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:572: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.7` -- this flag is only used in sample-based generation modes. Y>
2024-10-27 20:48:23 | ERROR | stderr | warnings.warn(
3) WARNING | transformers.generation.utils | The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
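My reading of warnings 1-3 is that they are only about generation kwargs, not the crash itself. As a sanity check on that reading, here is a minimal sketch with plain transformers (not the LLaMA-Omni wrapper; the model name and the sampling values are placeholders) of how I understand those arguments are supposed to be passed so the warnings go away:

```python
# Minimal sketch with plain transformers, only to confirm how do_sample / temperature /
# top_p / attention_mask / pad_token_id fit together. Model name is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder, not the Omni checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)

outputs = model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],  # warning 3: pass the attention mask explicitly
    pad_token_id=tokenizer.eos_token_id,      # warning 3: set a pad token id
    do_sample=True,                           # warnings 1-2: sampling kwargs need do_sample=True
    temperature=0.7,                          # example values, not taken from my config
    top_p=0.7,
    max_new_tokens=64,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

If that understanding is right, then the warnings are cosmetic and the NameError further down in the log is the real problem.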
Here is a fuller look at the log as well:

2024-10-27 20:47:50 | INFO | model_worker | args: Namespace(host='0.0.0.0', port=40000, worker_address='http://localhost:40000', controller_address='http://localhost:10000', model_pat>
2024-10-27 20:47:53 | ERROR | stderr | Loading checkpoint shards:   0%| >
2024-10-27 20:47:54 | ERROR | stderr | Loading checkpoint shards:  25%|████████ >
2024-10-27 20:47:54 | ERROR | stderr | Loading checkpoint shards:  75%|████████████████████████ >
2024-10-27 20:47:54 | ERROR | stderr | Loading checkpoint shards: 100%|████████████████████████████████>
2024-10-27 20:47:54 | ERROR | stderr | Loading checkpoint shards: 100%|████████████████████████████████>
2024-10-27 20:47:54 | ERROR | stderr |
2024-10-27 20:48:14 | INFO | model_worker | Register to controller
2024-10-27 20:48:14 | ERROR | stderr | INFO:     Started server process [30772]
2024-10-27 20:48:14 | ERROR | stderr | INFO:     Waiting for application startup.
2024-10-27 20:48:14 | ERROR | stderr | INFO:     Application startup complete.
2024-10-27 20:48:14 | ERROR | stderr | INFO:     Uvicorn running on http://0.0.0.0:40000 (Press CTRL+C to quit)
2024-10-27 20:48:15 | INFO | stdout | INFO:     127.0.0.1:37698 - "POST /worker_get_status HTTP/1.1" 200 OK
2024-10-27 20:48:15 | INFO | stdout | INFO:     127.0.0.1:37702 - "POST /worker_get_status HTTP/1.1" 200 OK
2024-10-27 20:48:15 | INFO | stdout | INFO:     127.0.0.1:37712 - "POST /worker_get_status HTTP/1.1" 200 OK
2024-10-27 20:48:22 | INFO | model_worker | Send heart beat. Models: ['Llama-3.1-8B-Omni']. Semaphore: Semaphore(value=4, locked=False). global_counter: 1
2024-10-27 20:48:22 | INFO | stdout | INFO:     127.0.0.1:57020 - "POST /worker_generate_stream HTTP/1.1" 200 OK
2024-10-27 20:48:23 | ERROR | stderr | /home/cookinsteve/miniconda3/envs/llama-omni/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:567: UserWarning: do_s>
2024-10-27 20:48:23 | ERROR | stderr | warnings.warn(
2024-10-27 20:48:23 | ERROR | stderr | /home/cookinsteve/miniconda3/envs/llama-omni/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:572: UserWarning: do_s>
2024-10-27 20:48:23 | ERROR | stderr | warnings.warn(
2024-10-27 20:49:40 | INFO | stdout | INFO:     127.0.0.1:56502 - "POST /worker_generate_stream HTTP/1.1" 200 OK
2024-10-27 20:49:40 | WARNING | transformers.generation.utils | The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pas>
2024-10-27 20:49:40 | WARNING | transformers.generation.utils | The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pas>
2024-10-27 20:49:40 | WARNING | transformers.generation.utils | Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
2024-10-27 20:49:40 | WARNING | transformers.generation.utils | Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
2024-10-27 20:49:40 | ERROR | stderr | Exception in thread Thread-4 (generate):
2024-10-27 20:49:40 | ERROR | stderr | Traceback (most recent call last):
2024-10-27 20:49:40 | ERROR | stderr | File "/home/cookinsteve/miniconda3/envs/llama-omni/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
2024-10-27 20:49:40 | ERROR | stderr | self.run()
2024-10-27 20:49:40 | ERROR | stderr | File "/home/cookinsteve/miniconda3/envs/llama-omni/lib/python3.10/threading.py", line 953, in run
2024-10-27 20:49:40 | ERROR | stderr | self._target(*self._args, **self._kwargs)
2024-10-27 20:49:40 | ERROR | stderr | File "/home/cookinsteve/miniconda3/envs/llama-omni/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
2024-10-27 20:49:40 | ERROR | stderr | return func(*args, **kwargs)
2024-10-27 20:49:40 | ERROR | stderr | File "/home/cookinsteve/LLaMA-Omni/omni_speech/model/language_model/omni_speech2s_llama.py", line 167, in generate
2024-10-27 20:49:40 | ERROR | stderr | outputs = GenerationWithCTC.generate(
2024-10-27 20:49:40 | ERROR | stderr | File "/home/cookinsteve/miniconda3/envs/llama-omni/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
2024-10-27 20:49:40 | ERROR | stderr | return func(*args, **kwargs)
2024-10-27 20:49:40 | ERROR | stderr | File "/home/cookinsteve/LLaMA-Omni/omni_speech/model/speech_generator/generation.py", line 281, in generate
2024-10-27 20:49:40 | ERROR | stderr | return self._sample_streaming_unit(
2024-10-27 20:49:40 | ERROR | stderr | File "/home/cookinsteve/LLaMA-Omni/omni_speech/model/speech_generator/generation.py", line 551, in _sample_streaming_unit
2024-10-27 20:49:40 | ERROR | stderr | ctc_pred = self.speech_generator.predict(hidden_states.squeeze(0))
2024-10-27 20:49:40 | ERROR | stderr | File "/home/cookinsteve/LLaMA-Omni/omni_speech/model/speech_generator/speech_generator.py", line 100, in predict
2024-10-27 20:49:40 | ERROR | stderr | layer_outputs = layer(
2024-10-27 20:49:40 | ERROR | stderr | File "/home/cookinsteve/miniconda3/envs/llama-omni/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
2024-10-27 20:49:40 | ERROR | stderr | return self._call_impl(*args, **kwargs)
2024-10-27 20:49:40 | ERROR | stderr | File "/home/cookinsteve/miniconda3/envs/llama-omni/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
2024-10-27 20:49:40 | ERROR | stderr | return forward_call(*args, **kwargs)
2024-10-27 20:49:40 | ERROR | stderr | File "/home/cookinsteve/miniconda3/envs/llama-omni/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 500, in for>
2024-10-27 20:49:40 | ERROR | stderr | attn_output = _flash_attention_forward(
2024-10-27 20:49:40 | ERROR | stderr | File "/home/cookinsteve/miniconda3/envs/llama-omni/lib/python3.10/site-packages/transformers/modeling_flash_attention_utils.py", line 180, in >
2024-10-27 20:49:40 | ERROR | stderr | _flash_supports_window_size and sliding_window is not None and key_states.shape[1] > sliding_window
2024-10-27 20:49:40 | ERROR | stderr | NameError: name '_flash_supports_window_size' is not defined
2024-10-27 20:49:44 | INFO | model_worker | Send heart beat. Models: ['Llama-3.1-8B-Omni']. Semaphore: Semaphore(value=4, locked=False). global_counter: 2
2024-10-27 20:49:55 | INFO | stdout | Caught Unknown Error
2024-10-27 20:49:55 | INFO | model_worker | Send heart beat. Models: ['Llama-3.1-8B-Omni']. Semaphore: Semaphore(value=5, locked=False). global_counter: 2
2024-10-27 20:49:59 | INFO | model_worker | Send heart beat. Models: ['Llama-3.1-8B-Omni']. Semaphore: Semaphore(value=5, locked=False). global_counter: 2
2024-10-27 20:50:14 | INFO | model_worker | Send heart beat. Models: ['Llama-3.1-8B-Omni']. Semaphore: Semaphore(value=5, locked=False). global_counter: 2
2024-10-27 20:50:29 | INFO | model_worker | Send heart beat. Models: ['Llama-3.1-8B-Omni']. Semaphore: Semaphore(value=5, locked=False). global_counter: 2
2024-10-27 20:50:44 | INFO | model_worker | Send heart beat. Models: ['Llama-3.1-8B-Omni']. Semaphore: Semaphore(value=5, locked=False). global_counter: 2
2024-10-27 20:50:59 | INFO | model_worker | Send heart beat. Models: ['Llama-3.1-8B-Omni']. Semaphore: Semaphore(value=5, locked=False). global_counter: 2
2024-10-27 20:51:14 | INFO | model_worker | Send heart beat. Models: ['Llama-3.1-8B-Omni']. Semaphore: Semaphore(value=5, locked=False). global_counter: 2
2024-10-27 20:51:29 | INFO | model_worker | Send heart beat. Models: ['Llama-3.1-8B-Omni']. Semaphore: Semaphore(value=5, locked=False). global_counter: 2
2024-10-27 20:51:44 | INFO | model_worker | Send heart beat. Models: ['Llama-3.1-8B-Omni']. Semaphore: Semaphore(value=5, locked=False). global_counter: 2
2024-10-27 20:51:51 | ERROR | stderr | INFO:     Shutting down
2024-10-27 20:51:51 | ERROR | stderr | INFO:     Waiting for application shutdown.
2024-10-27 20:51:51 | ERROR | stderr | INFO:     Application shutdown complete.
2024-10-27 20:51:51 | ERROR | stderr | INFO:     Finished server process [30772]
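The line that actually kills the request seems to be the NameError: name '_flash_supports_window_size' is not defined inside transformers' modeling_flash_attention_utils.py. As far as I can tell, that symbol is only defined when flash-attn imports cleanly, so my working theory is a flash-attn / transformers mismatch in this env rather than anything on the Gradio side. Is something like this the right way to confirm that (just a diagnostic sketch)?

```python
# Diagnostic sketch: is flash-attn actually importable in the llama-omni env?
import transformers
from transformers.utils import is_flash_attn_2_available

print("transformers:", transformers.__version__)
print("flash_attn_2 available:", is_flash_attn_2_available())

try:
    import flash_attn
    from flash_attn import flash_attn_func  # the function transformers inspects for window_size
    print("flash_attn:", flash_attn.__version__)
except Exception as exc:
    # Surfaces the real import error if the flash-attn wheel / CUDA combo is broken.
    print("flash_attn import failed:", exc)
```

If the import does fail, I assume the workaround would be to load the model with attn_implementation="sdpa" (or "eager") instead of flash attention, but I am not sure where LLaMA-Omni exposes that option.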