m3_ai / output.log
[2024-05-29 00:46:30,045] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-dev package with apt
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
 [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
 [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.2
 [WARNING]  using untested triton version (2.2.0), only 1.0.0 is known to be compatible
[2024-05-29 00:46:30,940] [WARNING] [runner.py:202:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2024-05-29 00:46:30,940] [INFO] [runner.py:568:main] cmd = /opt/conda/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMSwgMiwgMywgNCwgNSwgNiwgN119 --master_addr=127.0.0.1 --master_port=29500 --module --enable_each_rank_log=None FlagEmbedding.BGE_M3.run --knowledge_distillation True --output_dir new_mode --model_name_or_path BAAI/bge-m3 --normlized True --temperature 0.02 --do_train --train_data new_dataset --cache_path /home/datasets/.cache --per_device_train_batch_size 6 --query_max_len 512 --passage_max_len 8192 --small_threshold 200 --drop_threshold 200 --bf16 --save_steps 1500 --train_group_size 6 --learning_rate 5e-6 --num_train_epochs 3 --max_steps -1 --negatives_cross_device False --logging_steps 10 --warmup_ratio 0.1 --weight_decay 0.01 --overwrite_output_dir True --gradient_checkpointing --sentence_pooling_method cls --same_task_within_batch True --shuffle_ratio 0.002 --enable_sub_batch True --deepspeed train/ds_config.json --unified_finetuning True --use_self_distill True
[2024-05-29 00:46:33,003] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-dev package with apt
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
 [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
 [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.2
 [WARNING]  using untested triton version (2.2.0), only 1.0.0 is known to be compatible
[2024-05-29 00:46:33,888] [INFO] [launch.py:139:main] 0 NV_LIBNCCL_DEV_PACKAGE=libnccl-dev=2.17.1-1+cuda12.1
[2024-05-29 00:46:33,888] [INFO] [launch.py:139:main] 0 NV_LIBNCCL_DEV_PACKAGE_VERSION=2.17.1-1
[2024-05-29 00:46:33,888] [INFO] [launch.py:139:main] 0 NCCL_VERSION=2.17.1-1
[2024-05-29 00:46:33,888] [INFO] [launch.py:139:main] 0 NV_LIBNCCL_DEV_PACKAGE_NAME=libnccl-dev
[2024-05-29 00:46:33,888] [INFO] [launch.py:139:main] 0 NV_LIBNCCL_PACKAGE=libnccl2=2.17.1-1+cuda12.1
[2024-05-29 00:46:33,888] [INFO] [launch.py:139:main] 0 NV_LIBNCCL_PACKAGE_NAME=libnccl2
[2024-05-29 00:46:33,888] [INFO] [launch.py:139:main] 0 NV_LIBNCCL_PACKAGE_VERSION=2.17.1-1
[2024-05-29 00:46:33,888] [INFO] [launch.py:146:main] WORLD INFO DICT: {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]}
[2024-05-29 00:46:33,888] [INFO] [launch.py:152:main] nnodes=1, num_local_procs=8, node_rank=0
[2024-05-29 00:46:33,888] [INFO] [launch.py:163:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]})
[2024-05-29 00:46:33,888] [INFO] [launch.py:164:main] dist_world_size=8
[2024-05-29 00:46:33,888] [INFO] [launch.py:168:main] Setting CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
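Note on the `--world_info` value in the launcher command: it appears to be the WORLD INFO DICT serialized as JSON and then encoded as URL-safe base64. The following is a minimal sketch, under that assumption, for decoding it when inspecting a log like this one. The variable names are illustrative and are not part of DeepSpeed.

```python
import base64
import json

# Value copied verbatim from the --world_info flag in the launcher command.
world_info_b64 = "eyJsb2NhbGhvc3QiOiBbMCwgMSwgMiwgMywgNCwgNSwgNiwgN119"

# Assumption: URL-safe base64 of a JSON object mapping hostname -> list of GPU ids.
world_info = json.loads(base64.urlsafe_b64decode(world_info_b64))

# Total number of worker processes across all hosts.
world_size = sum(len(gpu_ids) for gpu_ids in world_info.values())

print(world_info)
print(world_size)
```

The decoded mapping should agree with the WORLD INFO DICT line, and the computed size with `dist_world_size=8`.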
[2024-05-29 00:46:33,889] [INFO] [launch.py:256:main] process 16380 spawned with command: ['/opt/conda/bin/python', '-u', '-m', 'FlagEmbedding.BGE_M3.run', '--local_rank=0', '--knowledge_distillation', 'True', '--output_dir', 'new_mode', '--model_name_or_path', 'BAAI/bge-m3', '--normlized', 'True', '--temperature', '0.02', '--do_train', '--train_data', 'new_dataset', '--cache_path', '/home/datasets/.cache', '--per_device_train_batch_size', '6', '--query_max_len', '512', '--passage_max_len', '8192', '--small_threshold', '200', '--drop_threshold', '200', '--bf16', '--save_steps', '1500', '--train_group_size', '6', '--learning_rate', '5e-6', '--num_train_epochs', '3', '--max_steps', '-1', '--negatives_cross_device', 'False', '--logging_steps', '10', '--warmup_ratio', '0.1', '--weight_decay', '0.01', '--overwrite_output_dir', 'True', '--gradient_checkpointing', '--sentence_pooling_method', 'cls', '--same_task_within_batch', 'True', '--shuffle_ratio', '0.002', '--enable_sub_batch', 'True', '--deepspeed', 'train/ds_config.json', '--unified_finetuning', 'True', '--use_self_distill', 'True']
[2024-05-29 00:46:33,889] [INFO] [launch.py:256:main] process 16381 spawned with command: ['/opt/conda/bin/python', '-u', '-m', 'FlagEmbedding.BGE_M3.run', '--local_rank=1', '--knowledge_distillation', 'True', '--output_dir', 'new_mode', '--model_name_or_path', 'BAAI/bge-m3', '--normlized', 'True', '--temperature', '0.02', '--do_train', '--train_data', 'new_dataset', '--cache_path', '/home/datasets/.cache', '--per_device_train_batch_size', '6', '--query_max_len', '512', '--passage_max_len', '8192', '--small_threshold', '200', '--drop_threshold', '200', '--bf16', '--save_steps', '1500', '--train_group_size', '6', '--learning_rate', '5e-6', '--num_train_epochs', '3', '--max_steps', '-1', '--negatives_cross_device', 'False', '--logging_steps', '10', '--warmup_ratio', '0.1', '--weight_decay', '0.01', '--overwrite_output_dir', 'True', '--gradient_checkpointing', '--sentence_pooling_method', 'cls', '--same_task_within_batch', 'True', '--shuffle_ratio', '0.002', '--enable_sub_batch', 'True', '--deepspeed', 'train/ds_config.json', '--unified_finetuning', 'True', '--use_self_distill', 'True']
[2024-05-29 00:46:33,890] [INFO] [launch.py:256:main] process 16382 spawned with command: ['/opt/conda/bin/python', '-u', '-m', 'FlagEmbedding.BGE_M3.run', '--local_rank=2', '--knowledge_distillation', 'True', '--output_dir', 'new_mode', '--model_name_or_path', 'BAAI/bge-m3', '--normlized', 'True', '--temperature', '0.02', '--do_train', '--train_data', 'new_dataset', '--cache_path', '/home/datasets/.cache', '--per_device_train_batch_size', '6', '--query_max_len', '512', '--passage_max_len', '8192', '--small_threshold', '200', '--drop_threshold', '200', '--bf16', '--save_steps', '1500', '--train_group_size', '6', '--learning_rate', '5e-6', '--num_train_epochs', '3', '--max_steps', '-1', '--negatives_cross_device', 'False', '--logging_steps', '10', '--warmup_ratio', '0.1', '--weight_decay', '0.01', '--overwrite_output_dir', 'True', '--gradient_checkpointing', '--sentence_pooling_method', 'cls', '--same_task_within_batch', 'True', '--shuffle_ratio', '0.002', '--enable_sub_batch', 'True', '--deepspeed', 'train/ds_config.json', '--unified_finetuning', 'True', '--use_self_distill', 'True']
[2024-05-29 00:46:33,890] [INFO] [launch.py:256:main] process 16383 spawned with command: ['/opt/conda/bin/python', '-u', '-m', 'FlagEmbedding.BGE_M3.run', '--local_rank=3', '--knowledge_distillation', 'True', '--output_dir', 'new_mode', '--model_name_or_path', 'BAAI/bge-m3', '--normlized', 'True', '--temperature', '0.02', '--do_train', '--train_data', 'new_dataset', '--cache_path', '/home/datasets/.cache', '--per_device_train_batch_size', '6', '--query_max_len', '512', '--passage_max_len', '8192', '--small_threshold', '200', '--drop_threshold', '200', '--bf16', '--save_steps', '1500', '--train_group_size', '6', '--learning_rate', '5e-6', '--num_train_epochs', '3', '--max_steps', '-1', '--negatives_cross_device', 'False', '--logging_steps', '10', '--warmup_ratio', '0.1', '--weight_decay', '0.01', '--overwrite_output_dir', 'True', '--gradient_checkpointing', '--sentence_pooling_method', 'cls', '--same_task_within_batch', 'True', '--shuffle_ratio', '0.002', '--enable_sub_batch', 'True', '--deepspeed', 'train/ds_config.json', '--unified_finetuning', 'True', '--use_self_distill', 'True']
[2024-05-29 00:46:33,891] [INFO] [launch.py:256:main] process 16384 spawned with command: ['/opt/conda/bin/python', '-u', '-m', 'FlagEmbedding.BGE_M3.run', '--local_rank=4', '--knowledge_distillation', 'True', '--output_dir', 'new_mode', '--model_name_or_path', 'BAAI/bge-m3', '--normlized', 'True', '--temperature', '0.02', '--do_train', '--train_data', 'new_dataset', '--cache_path', '/home/datasets/.cache', '--per_device_train_batch_size', '6', '--query_max_len', '512', '--passage_max_len', '8192', '--small_threshold', '200', '--drop_threshold', '200', '--bf16', '--save_steps', '1500', '--train_group_size', '6', '--learning_rate', '5e-6', '--num_train_epochs', '3', '--max_steps', '-1', '--negatives_cross_device', 'False', '--logging_steps', '10', '--warmup_ratio', '0.1', '--weight_decay', '0.01', '--overwrite_output_dir', 'True', '--gradient_checkpointing', '--sentence_pooling_method', 'cls', '--same_task_within_batch', 'True', '--shuffle_ratio', '0.002', '--enable_sub_batch', 'True', '--deepspeed', 'train/ds_config.json', '--unified_finetuning', 'True', '--use_self_distill', 'True']
[2024-05-29 00:46:33,891] [INFO] [launch.py:256:main] process 16385 spawned with command: ['/opt/conda/bin/python', '-u', '-m', 'FlagEmbedding.BGE_M3.run', '--local_rank=5', '--knowledge_distillation', 'True', '--output_dir', 'new_mode', '--model_name_or_path', 'BAAI/bge-m3', '--normlized', 'True', '--temperature', '0.02', '--do_train', '--train_data', 'new_dataset', '--cache_path', '/home/datasets/.cache', '--per_device_train_batch_size', '6', '--query_max_len', '512', '--passage_max_len', '8192', '--small_threshold', '200', '--drop_threshold', '200', '--bf16', '--save_steps', '1500', '--train_group_size', '6', '--learning_rate', '5e-6', '--num_train_epochs', '3', '--max_steps', '-1', '--negatives_cross_device', 'False', '--logging_steps', '10', '--warmup_ratio', '0.1', '--weight_decay', '0.01', '--overwrite_output_dir', 'True', '--gradient_checkpointing', '--sentence_pooling_method', 'cls', '--same_task_within_batch', 'True', '--shuffle_ratio', '0.002', '--enable_sub_batch', 'True', '--deepspeed', 'train/ds_config.json', '--unified_finetuning', 'True', '--use_self_distill', 'True']
[2024-05-29 00:46:33,892] [INFO] [launch.py:256:main] process 16386 spawned with command: ['/opt/conda/bin/python', '-u', '-m', 'FlagEmbedding.BGE_M3.run', '--local_rank=6', '--knowledge_distillation', 'True', '--output_dir', 'new_mode', '--model_name_or_path', 'BAAI/bge-m3', '--normlized', 'True', '--temperature', '0.02', '--do_train', '--train_data', 'new_dataset', '--cache_path', '/home/datasets/.cache', '--per_device_train_batch_size', '6', '--query_max_len', '512', '--passage_max_len', '8192', '--small_threshold', '200', '--drop_threshold', '200', '--bf16', '--save_steps', '1500', '--train_group_size', '6', '--learning_rate', '5e-6', '--num_train_epochs', '3', '--max_steps', '-1', '--negatives_cross_device', 'False', '--logging_steps', '10', '--warmup_ratio', '0.1', '--weight_decay', '0.01', '--overwrite_output_dir', 'True', '--gradient_checkpointing', '--sentence_pooling_method', 'cls', '--same_task_within_batch', 'True', '--shuffle_ratio', '0.002', '--enable_sub_batch', 'True', '--deepspeed', 'train/ds_config.json', '--unified_finetuning', 'True', '--use_self_distill', 'True']
[2024-05-29 00:46:33,892] [INFO] [launch.py:256:main] process 16387 spawned with command: ['/opt/conda/bin/python', '-u', '-m', 'FlagEmbedding.BGE_M3.run', '--local_rank=7', '--knowledge_distillation', 'True', '--output_dir', 'new_mode', '--model_name_or_path', 'BAAI/bge-m3', '--normlized', 'True', '--temperature', '0.02', '--do_train', '--train_data', 'new_dataset', '--cache_path', '/home/datasets/.cache', '--per_device_train_batch_size', '6', '--query_max_len', '512', '--passage_max_len', '8192', '--small_threshold', '200', '--drop_threshold', '200', '--bf16', '--save_steps', '1500', '--train_group_size', '6', '--learning_rate', '5e-6', '--num_train_epochs', '3', '--max_steps', '-1', '--negatives_cross_device', 'False', '--logging_steps', '10', '--warmup_ratio', '0.1', '--weight_decay', '0.01', '--overwrite_output_dir', 'True', '--gradient_checkpointing', '--sentence_pooling_method', 'cls', '--same_task_within_batch', 'True', '--shuffle_ratio', '0.002', '--enable_sub_batch', 'True', '--deepspeed', 'train/ds_config.json', '--unified_finetuning', 'True', '--use_self_distill', 'True']
[2024-05-29 00:46:38,585] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-dev package with apt
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
 [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
 [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.2
 [WARNING]  using untested triton version (2.2.0), only 1.0.0 is known to be compatible
[2024-05-29 00:46:39,041] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
[2024-05-29 00:46:39,152] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-05-29 00:46:39,156] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
 [WARNING]  async_io: please install the libaio-dev package with apt
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
 [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[2024-05-29 00:46:39,240] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
[2024-05-29 00:46:39,253] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-05-29 00:46:39,254] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-05-29 00:46:39,295] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
 [WARNING]  async_io: please install the libaio-dev package with apt
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
 [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[2024-05-29 00:46:39,310] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-dev package with apt
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
 [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
 [WARNING]  async_io: please install the libaio-dev package with apt
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
 [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-dev package with apt
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
 [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-dev package with apt
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
 [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
 [WARNING]  async_io: please install the libaio-dev package with apt
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
 [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
 [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.2
 [WARNING]  using untested triton version (2.2.0), only 1.0.0 is known to be compatible
[2024-05-29 00:46:39,583] [INFO] [comm.py:637:init_distributed] cdb=None
 [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.2
 [WARNING]  using untested triton version (2.2.0), only 1.0.0 is known to be compatible
 [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.2
 [WARNING]  using untested triton version (2.2.0), only 1.0.0 is known to be compatible
 [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.2
 [WARNING]  using untested triton version (2.2.0), only 1.0.0 is known to be compatible
[2024-05-29 00:46:39,710] [INFO] [comm.py:637:init_distributed] cdb=None
 [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.2
 [WARNING]  using untested triton version (2.2.0), only 1.0.0 is known to be compatible
 [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.2
 [WARNING]  using untested triton version (2.2.0), only 1.0.0 is known to be compatible
[2024-05-29 00:46:39,810] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-05-29 00:46:39,816] [INFO] [comm.py:637:init_distributed] cdb=None
 [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.2
 [WARNING]  using untested triton version (2.2.0), only 1.0.0 is known to be compatible
[2024-05-29 00:46:39,839] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-05-29 00:46:39,889] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-05-29 00:46:39,890] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2024-05-29 00:46:39,947] [INFO] [comm.py:637:init_distributed] cdb=None
05/29/2024 00:46:40 - WARNING - __main__ - Process rank: 7, device: cuda:7, n_gpu: 1, distributed training: True, 16-bits training: False
05/29/2024 00:46:40 - WARNING - __main__ - Process rank: 6, device: cuda:6, n_gpu: 1, distributed training: True, 16-bits training: False
05/29/2024 00:46:41 - WARNING - __main__ - Process rank: 4, device: cuda:4, n_gpu: 1, distributed training: True, 16-bits training: False
05/29/2024 00:46:41 - WARNING - __main__ - Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: True, 16-bits training: False
05/29/2024 00:46:41 - INFO - __main__ - Training/evaluation parameters RetrieverTrainingArguments(
_n_gpu=1,
accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None},
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
batch_eval_metrics=False,
bf16=True,
bf16_full_eval=False,
colbert_dim=-1,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_persistent_workers=False,
dataloader_pin_memory=True,
dataloader_prefetch_factor=None,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=train/ds_config.json,
disable_tqdm=False,
dispatch_batches=None,
do_eval=False,
do_predict=False,
do_train=True,
enable_sub_batch=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_do_concat_batches=True,
eval_steps=None,
eval_strategy=no,
evaluation_strategy=None,
fix_encoder=False,
fix_position_embedding=False,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
gradient_accumulation_steps=1,
gradient_checkpointing=True,
gradient_checkpointing_kwargs=None,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_always_push=False,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_inputs_for_metrics=False,
include_num_input_tokens_seen=False,
include_tokens_per_second=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=5e-06,
length_column_name=length,
load_best_model_at_end=False,
local_rank=0,
log_level=passive,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=new_mode/runs/May29_00-46-37_09cfdefd9b99,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=10,
logging_strategy=steps,
lr_scheduler_kwargs={},
lr_scheduler_type=linear,
max_grad_norm=1.0,
max_steps=-1,
metric_for_best_model=None,
mp_parameters=,
neftune_noise_alpha=None,
negatives_cross_device=False,
no_cuda=False,
normlized=True,
num_train_epochs=3.0,
optim=adamw_torch,
optim_args=None,
optim_target_modules=None,
output_dir=new_mode,
overwrite_output_dir=True,
past_index=-1,
per_device_eval_batch_size=8,
per_device_train_batch_size=6,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
remove_unused_columns=True,
report_to=['wandb'],
restore_callback_states_from_checkpoint=False,
resume_from_checkpoint=None,
run_name=new_mode,
save_on_each_node=False,
save_only_model=False,
save_safetensors=True,
save_steps=1500,
save_strategy=steps,
save_total_limit=None,
seed=42,
self_distill_start_step=-1,
sentence_pooling_method=cls,
skip_memory_metrics=True,
split_batches=None,
temperature=0.02,
tf32=None,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
unified_finetuning=True,
use_cpu=False,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
use_self_distill=True,
warmup_ratio=0.1,
warmup_steps=0,
weight_decay=0.01,
)
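Note on batch arithmetic: the arguments printed above, combined with `dist_world_size=8` from the launcher, imply the per-step counts sketched below. This assumes the usual data-parallel convention (per-device batch times world size times accumulation steps). The FlagEmbedding data pipeline with `same_task_within_batch=True` may assemble batches differently, so treat the numbers as an estimate rather than a statement of what the run actually did.

```python
# Values copied from the logged training and data arguments.
per_device_train_batch_size = 6
gradient_accumulation_steps = 1
train_group_size = 6   # passages per query (assumed: one positive plus negatives)
world_size = 8         # dist_world_size reported by the launcher

# Queries contributing to one optimizer step across all ranks.
queries_per_optimizer_step = (
    per_device_train_batch_size * world_size * gradient_accumulation_steps
)

# Passages encoded on a single device in one forward pass.
passages_per_device_step = per_device_train_batch_size * train_group_size

print(queries_per_optimizer_step)
print(passages_per_device_step)
```

With `passage_max_len=8192`, the per-device passage count is the figure that matters most for memory, which is consistent with the run enabling `gradient_checkpointing` and `enable_sub_batch`.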
05/29/2024 00:46:41 - INFO - __main__ - Model parameters ModelArguments(model_name_or_path='BAAI/bge-m3', config_name=None, tokenizer_name=None, cache_dir=None)
05/29/2024 00:46:41 - INFO - __main__ - Data parameters DataArguments(knowledge_distillation=True, train_data=['new_dataset'], cache_path='/home/datasets/.cache', train_group_size=6, query_max_len=512, passage_max_len=8192, max_example_num_per_dataset=None, query_instruction_for_retrieval=None, passage_instruction_for_retrieval=None, same_task_within_batch=True, shuffle_ratio=0.002, small_threshold=200, drop_threshold=200)
05/29/2024 00:46:41 - WARNING - __main__ - Process rank: 5, device: cuda:5, n_gpu: 1, distributed training: True, 16-bits training: False
05/29/2024 00:46:41 - WARNING - __main__ - Process rank: 1, device: cuda:1, n_gpu: 1, distributed training: True, 16-bits training: False
05/29/2024 00:46:41 - WARNING - __main__ - Process rank: 2, device: cuda:2, n_gpu: 1, distributed training: True, 16-bits training: False
05/29/2024 00:46:41 - WARNING - __main__ - Process rank: 3, device: cuda:3, n_gpu: 1, distributed training: True, 16-bits training: False
/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
Fetching 30 files: 0%| | 0/30 [00:00<?, ?it/s]
Fetching 30 files: 100%|██████████| 30/30 [00:00<00:00, 111748.77it/s]
Fetching 30 files: 0%| | 0/30 [00:00<?, ?it/s]
Fetching 30 files: 100%|██████████| 30/30 [00:00<00:00, 148208.62it/s]
/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
05/29/2024 00:46:43 - INFO - __main__ - Config: XLMRobertaConfig {
"_name_or_path": "BAAI/bge-m3",
"architectures": [
"XLMRobertaModel"
],
"attention_probs_dropout_prob": 0.1,
"bos_token_id": 0,
"classifier_dropout": null,
"eos_token_id": 2,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 1024,
"id2label": {
"0": "LABEL_0"
},
"initializer_range": 0.02,
"intermediate_size": 4096,
"label2id": {
"LABEL_0": 0
},
"layer_norm_eps": 1e-05,
"max_position_embeddings": 8194,
"model_type": "xlm-roberta",
"num_attention_heads": 16,
"num_hidden_layers": 24,
"output_past": true,
"pad_token_id": 1,
"position_embedding_type": "absolute",
"torch_dtype": "float32",
"transformers_version": "4.41.1",
"type_vocab_size": 1,
"use_cache": true,
"vocab_size": 250002
}
Fetching 30 files: 0%| | 0/30 [00:00<?, ?it/s]
Fetching 30 files: 100%|██████████| 30/30 [00:00<00:00, 210065.31it/s]
/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
Fetching 30 files: 0%| | 0/30 [00:00<?, ?it/s]
Fetching 30 files: 100%|██████████| 30/30 [00:00<00:00, 173557.41it/s]
Fetching 30 files: 0%| | 0/30 [00:00<?, ?it/s]
Fetching 30 files: 100%|██████████| 30/30 [00:00<00:00, 131758.24it/s]
Fetching 30 files: 0%| | 0/30 [00:00<?, ?it/s]
Fetching 30 files: 100%|██████████| 30/30 [00:00<00:00, 157878.44it/s]
Fetching 30 files: 0%| | 0/30 [00:00<?, ?it/s]
Fetching 30 files: 100%|██████████| 30/30 [00:00<00:00, 177474.08it/s]
Fetching 30 files: 0%| | 0/30 [00:00<?, ?it/s]
Fetching 30 files: 100%|██████████| 30/30 [00:00<00:00, 147860.31it/s]
05/29/2024 00:47:09 - INFO - FlagEmbedding.BGE_M3.modeling - loading existing colbert_linear and sparse_linear---------
=========================
Batch Size Dict:
['0-500: 6',
'500-1000: 6',
'1000-2000: 6',
'2000-3000: 6',
'3000-4000: 6',
'4000-5000: 6',
'5000-6000: 6',
'6000-7000: 6',
'7000-inf: 6']
=========================
loading data from new_dataset/train_dataset_len-0-500.jsonl ...
loading data from new_dataset/train_dataset_len-500-1000.jsonl ...
loading data from new_dataset/train_dataset_len-1000-2000.jsonl ...
loading data from new_dataset/train_dataset_len-2000-3000.jsonl ...
loading data from new_dataset/train_dataset_len-3000-4000.jsonl ...
loading data from new_dataset/train_dataset_len-4000-5000.jsonl ...
loading data from new_dataset/train_dataset_len-5000-6000.jsonl ...
loading data from new_dataset/train_dataset_len-6000-7000.jsonl ...
---------------------------*Rank 6: refresh data---------------------------
---------------------------*Rank 7: refresh data---------------------------
loading data from new_dataset/train_dataset_len-7000-inf.jsonl ...
---------------------------*Rank 2: refresh data---------------------------
---------------------------*Rank 0: refresh data---------------------------
---------------------------*Rank 5: refresh data---------------------------
---------------------------*Rank 4: refresh data---------------------------
---------------------------*Rank 1: refresh data---------------------------
---------------------------*Rank 3: refresh data---------------------------
Using /root/.cache/torch_extensions/py310_cu121 as PyTorch extensions root...
Using /root/.cache/torch_extensions/py310_cu121 as PyTorch extensions root...
Using /root/.cache/torch_extensions/py310_cu121 as PyTorch extensions root...
Using /root/.cache/torch_extensions/py310_cu121 as PyTorch extensions root...
Using /root/.cache/torch_extensions/py310_cu121 as PyTorch extensions root...
Using /root/.cache/torch_extensions/py310_cu121 as PyTorch extensions root...
Using /root/.cache/torch_extensions/py310_cu121 as PyTorch extensions root...
Using /root/.cache/torch_extensions/py310_cu121 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /root/.cache/torch_extensions/py310_cu121/fused_adam/build.ninja...
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_adam...
Time to load fused_adam op: 0.08090758323669434 seconds
Loading extension module fused_adam...
Time to load fused_adam op: 0.10301613807678223 seconds
Loading extension module fused_adam...
Loading extension module fused_adam...
Time to load fused_adam op: 0.10164332389831543 seconds
Time to load fused_adam op: 0.10223555564880371 seconds
Loading extension module fused_adam...
Time to load fused_adam op: 0.10178637504577637 seconds
Loading extension module fused_adam...
Time to load fused_adam op: 0.10168576240539551 seconds
Loading extension module fused_adam...
Time to load fused_adam op: 0.10281610488891602 seconds
Loading extension module fused_adam...
Time to load fused_adam op: 0.10135197639465332 seconds
/opt/conda/lib/python3.10/site-packages/torch/utils/checkpoint.py:460: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
warnings.warn(
/opt/conda/lib/python3.10/site-packages/torch/utils/checkpoint.py:460: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
warnings.warn(
/opt/conda/lib/python3.10/site-packages/torch/utils/checkpoint.py:460: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
warnings.warn(
/opt/conda/lib/python3.10/site-packages/torch/utils/checkpoint.py:460: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
warnings.warn(
/opt/conda/lib/python3.10/site-packages/torch/utils/checkpoint.py:460: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
warnings.warn(
/opt/conda/lib/python3.10/site-packages/torch/utils/checkpoint.py:460: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
warnings.warn(
/opt/conda/lib/python3.10/site-packages/torch/utils/checkpoint.py:460: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
warnings.warn(
wandb: WARNING The `run_name` is currently set to the same value as `TrainingArguments.output_dir`. If this was not intended, please specify a different run name by setting the `TrainingArguments.run_name` parameter.
wandb: Currently logged in as: andrewohdang (dangfutures). Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.17.0
wandb: Run data is saved locally in /FlagEmbedding/wandb/run-20240529_004739-5f9gceeo
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run new_mode
wandb: ⭐️ View project at https://wandb.ai/dangfutures/huggingface
wandb: 🚀 View run at https://wandb.ai/dangfutures/huggingface/runs/5f9gceeo
0%| | 0/1788 [00:00<?, ?it/s]
/opt/conda/lib/python3.10/site-packages/torch/utils/checkpoint.py:460: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
warnings.warn(
0%| | 1/1788 [00:03<1:52:44, 3.79s/it] 0%| | 2/1788 [00:09<2:20:26, 4.72s/it] 0%| | 3/1788 [00:14<2:26:13, 4.92s/it] 0%| | 4/1788 [00:19<2:29:25, 5.03s/it] 0%| | 5/1788 [00:24<2:33:42, 5.17s/it] 0%| | 6/1788 [00:27<2:04:36, 4.20s/it] 0%| | 7/1788 [00:32<2:16:04, 4.58s/it] 0%| | 8/1788 [00:37<2:21:00, 4.75s/it] 1%| | 9/1788 [00:40<1:58:06, 3.98s/it] 1%| | 10/1788 [00:43<1:55:04, 3.88s/it] {'loss': 0.2636, 'grad_norm': 12.310157775878906, 'learning_rate': 2.219407982341937e-06, 'epoch': 0.02}
1%| | 10/1788 [00:46<1:55:04, 3.88s/it] 1%| | 11/1788 [00:49<2:08:14, 4.33s/it] 1%| | 12/1788 [00:51<1:49:51, 3.71s/it] 1%| | 13/1788 [00:55<1:53:45, 3.85s/it] 1%| | 14/1788 [00:59<1:54:09, 3.86s/it] 1%| | 15/1788 [01:04<2:07:09, 4.30s/it] 1%| | 16/1788 [01:10<2:18:32, 4.69s/it] 1%| | 17/1788 [01:12<1:56:54, 3.96s/it] 1%| | 18/1788 [01:18<2:14:09, 4.55s/it] 1%| | 19/1788 [01:20<1:53:32, 3.85s/it] 1%| | 20/1788 [01:25<2:05:12, 4.25s/it] {'loss': 0.2521, 'grad_norm': 12.777554512023926, 'learning_rate': 2.887516357642935e-06, 'epoch': 0.03}
1%| | 20/1788 [01:25<2:05:12, 4.25s/it] 1%| | 21/1788 [01:31<2:19:28, 4.74s/it] 1%| | 22/1788 [01:35<2:09:12, 4.39s/it] 1%|▏ | 23/1788 [01:37<1:50:37, 3.76s/it] 1%|▏ | 24/1788 [01:39<1:37:56, 3.33s/it] 1%|▏ | 25/1788 [01:42<1:28:27, 3.01s/it] 1%|▏ | 26/1788 [01:47<1:49:44, 3.74s/it] 2%|▏ | 27/1788 [01:52<2:02:10, 4.16s/it] 2%|▏ | 28/1788 [01:55<1:45:14, 3.59s/it] 2%|▏ | 29/1788 [01:57<1:33:30, 3.19s/it] 2%|▏ | 30/1788 [01:59<1:25:14, 2.91s/it] {'loss': 0.2234, 'grad_norm': 6.998483657836914, 'learning_rate': 3.278334703611756e-06, 'epoch': 0.05}
2%|▏ | 30/1788 [01:59<1:25:14, 2.91s/it] 2%|▏ | 31/1788 [02:05<1:50:12, 3.76s/it] 2%|▏ | 32/1788 [03:25<13:03:28, 26.77s/it] 2%|▏ | 33/1788 [03:28<9:29:31, 19.47s/it] 2%|▏ | 34/1788 [03:30<6:58:31, 14.32s/it] 2%|▏ | 35/1788 [03:35<5:37:53, 11.56s/it] 2%|▏ | 36/1788 [03:40<4:38:09, 9.53s/it] 2%|▏ | 37/1788 [03:44<3:50:48, 7.91s/it] 2%|▏ | 38/1788 [03:46<3:01:44, 6.23s/it] 2%|▏ | 39/1788 [03:54<3:10:34, 6.54s/it] 2%|▏ | 40/1788 [03:59<2:57:42, 6.10s/it] {'loss': 0.2513, 'grad_norm': 10.56714153289795, 'learning_rate': 3.555624732943933e-06, 'epoch': 0.07}
2%|▏ | 40/1788 [03:59<2:57:42, 6.10s/it] 2%|▏ | 41/1788 [04:01<2:24:40, 4.97s/it] 2%|▏ | 42/1788 [04:06<2:22:06, 4.88s/it] 2%|▏ | 43/1788 [04:10<2:16:03, 4.68s/it] 2%|▏ | 44/1788 [04:15<2:15:48, 4.67s/it] 3%|▎ | 45/1788 [04:18<2:05:11, 4.31s/it] 3%|▎ | 46/1788 [04:23<2:13:25, 4.60s/it] 3%|▎ | 47/1788 [04:29<2:20:17, 4.84s/it] 3%|▎ | 48/1788 [04:31<1:57:57, 4.07s/it] 3%|▎ | 49/1788 [04:35<1:57:14, 4.04s/it] 3%|▎ | 50/1788 [04:38<1:51:58, 3.87s/it] {'loss': 0.1757, 'grad_norm': 8.475292205810547, 'learning_rate': 3.770707589382874e-06, 'epoch': 0.08}
3%|▎ | 50/1788 [04:38<1:51:58, 3.87s/it] 3%|▎ | 51/1788 [04:44<2:05:14, 4.33s/it] 3%|▎ | 52/1788 [04:49<2:12:26, 4.58s/it] 3%|▎ | 53/1788 [04:51<1:52:39, 3.90s/it] 3%|▎ | 54/1788 [04:55<1:46:50, 3.70s/it] 3%|▎ | 55/1788 [04:58<1:44:31, 3.62s/it] 3%|▎ | 56/1788 [05:02<1:44:26, 3.62s/it] 3%|▎ | 57/1788 [05:07<2:01:07, 4.20s/it] 3%|▎ | 58/1788 [05:12<2:09:21, 4.49s/it] 3%|▎ | 59/1788 [05:16<2:00:10, 4.17s/it] 3%|▎ | 60/1788 [05:18<1:44:08, 3.62s/it] {'loss': 0.1527, 'grad_norm': 8.238431930541992, 'learning_rate': 3.946443078912754e-06, 'epoch': 0.1}
3%|▎ | 60/1788 [05:18<1:44:08, 3.62s/it] 3%|▎ | 61/1788 [05:21<1:41:40, 3.53s/it] 3%|▎ | 62/1788 [05:25<1:41:29, 3.53s/it] 4%|▎ | 63/1788 [05:27<1:31:15, 3.17s/it] 4%|▎ | 64/1788 [05:33<1:50:07, 3.83s/it] 4%|▎ | 65/1788 [05:37<1:56:10, 4.05s/it] 4%|▎ | 66/1788 [05:42<1:59:06, 4.15s/it] 4%|▎ | 67/1788 [05:47<2:08:48, 4.49s/it] 4%|▍ | 68/1788 [05:52<2:17:02, 4.78s/it] 4%|▍ | 69/1788 [05:56<2:03:49, 4.32s/it] 4%|▍ | 70/1788 [06:01<2:13:02, 4.65s/it] {'loss': 0.2172, 'grad_norm': 7.161587715148926, 'learning_rate': 4.095025318211104e-06, 'epoch': 0.12}
4%|▍ | 70/1788 [06:01<2:13:02, 4.65s/it] 4%|▍ | 71/1788 [06:07<2:24:08, 5.04s/it] 4%|▍ | 72/1788 [06:12<2:28:55, 5.21s/it] 4%|▍ | 73/1788 [06:15<2:03:50, 4.33s/it] 4%|▍ | 74/1788 [06:21<2:18:03, 4.83s/it] 4%|▍ | 75/1788 [06:23<1:56:12, 4.07s/it] 4%|▍ | 76/1788 [06:28<2:07:04, 4.45s/it] 4%|▍ | 77/1788 [06:34<2:19:21, 4.89s/it] 4%|▍ | 78/1788 [06:38<2:05:57, 4.42s/it] 4%|▍ | 79/1788 [06:41<1:59:09, 4.18s/it] 4%|▍ | 80/1788 [06:44<1:43:17, 3.63s/it] {'loss': 0.1594, 'grad_norm': 5.243127346038818, 'learning_rate': 4.2237331082449316e-06, 'epoch': 0.13}
4%|▍ | 80/1788 [06:44<1:43:17, 3.63s/it] 5%|▍ | 81/1788 [06:55<2:53:41, 6.11s/it] 5%|▍ | 82/1788 [07:01<2:45:18, 5.81s/it] 5%|▍ | 83/1788 [07:03<2:14:52, 4.75s/it] 5%|▍ | 84/1788 [07:08<2:18:23, 4.87s/it] 5%|▍ | 85/1788 [07:10<1:56:20, 4.10s/it] 5%|▍ | 86/1788 [07:15<2:04:40, 4.40s/it] 5%|▍ | 87/1788 [07:18<1:46:36, 3.76s/it] 5%|▍ | 88/1788 [07:22<1:54:44, 4.05s/it] 5%|▍ | 89/1788 [07:25<1:39:36, 3.52s/it] 5%|▌ | 90/1788 [07:27<1:29:03, 3.15s/it] {'loss': 0.1583, 'grad_norm': 7.849573612213135, 'learning_rate': 4.337261424881575e-06, 'epoch': 0.15}
5%|▌ | 90/1788 [07:27<1:29:03, 3.15s/it] 5%|▌ | 91/1788 [07:31<1:35:31, 3.38s/it] 5%|▌ | 92/1788 [07:36<1:52:30, 3.98s/it] 5%|▌ | 93/1788 [07:39<1:38:08, 3.47s/it] 5%|▌ | 94/1788 [07:41<1:28:40, 3.14s/it] 5%|▌ | 95/1788 [07:44<1:30:24, 3.20s/it] 5%|▌ | 96/1788 [07:49<1:46:41, 3.78s/it] 5%|▌ | 97/1788 [07:55<2:04:33, 4.42s/it] 5%|▌ | 98/1788 [08:01<2:11:05, 4.65s/it] 6%|▌ | 99/1788 [08:06<2:16:59, 4.87s/it] 6%|▌ | 100/1788 [08:08<1:55:08, 4.09s/it] {'loss': 0.1383, 'grad_norm': 11.401176452636719, 'learning_rate': 4.438815964683874e-06, 'epoch': 0.17}
6%|▌ | 100/1788 [08:08<1:55:08, 4.09s/it] 6%|▌ | 101/1788 [08:14<2:06:13, 4.49s/it] 6%|▌ | 102/1788 [08:19<2:13:52, 4.76s/it] 6%|▌ | 103/1788 [08:24<2:11:46, 4.69s/it] 6%|▌ | 104/1788 [08:29<2:17:00, 4.88s/it] 6%|▌ | 105/1788 [08:31<1:54:52, 4.10s/it] 6%|▌ | 106/1788 [08:33<1:39:58, 3.57s/it] 6%|▌ | 107/1788 [08:37<1:42:49, 3.67s/it] 6%|▌ | 108/1788 [08:39<1:29:38, 3.20s/it] 6%|▌ | 109/1788 [08:42<1:21:11, 2.90s/it] 6%|▌ | 110/1788 [08:46<1:32:44, 3.32s/it] {'loss': 0.1518, 'grad_norm': 5.522379398345947, 'learning_rate': 4.530683220534605e-06, 'epoch': 0.18}
6%|▌ | 110/1788 [08:46<1:32:44, 3.32s/it] 6%|▌ | 111/1788 [08:51<1:47:34, 3.85s/it] 6%|▋ | 112/1788 [08:53<1:34:27, 3.38s/it] 6%|▋ | 113/1788 [08:59<1:50:42, 3.97s/it] 6%|▋ | 114/1788 [09:01<1:36:35, 3.46s/it] 6%|▋ | 115/1788 [09:03<1:26:53, 3.12s/it] 6%|▋ | 116/1788 [09:09<1:51:08, 3.99s/it] 7%|▋ | 117/1788 [09:14<2:00:24, 4.32s/it] 7%|▋ | 118/1788 [09:18<1:52:57, 4.06s/it] 7%|▋ | 119/1788 [09:20<1:38:26, 3.54s/it] 7%|▋ | 120/1788 [09:25<1:47:47, 3.88s/it] {'loss': 0.1389, 'grad_norm': 7.7420806884765625, 'learning_rate': 4.614551454213752e-06, 'epoch': 0.2}
7%|▋ | 120/1788 [09:25<1:47:47, 3.88s/it] 7%|▋ | 121/1788 [09:27<1:35:04, 3.42s/it] 7%|▋ | 122/1788 [09:32<1:49:51, 3.96s/it] 7%|▋ | 123/1788 [09:37<1:55:43, 4.17s/it] 7%|▋ | 124/1788 [09:42<2:03:33, 4.46s/it] 7%|▋ | 125/1788 [09:47<2:04:14, 4.48s/it] 7%|▋ | 126/1788 [09:49<1:45:47, 3.82s/it] 7%|▋ | 127/1788 [09:51<1:33:10, 3.37s/it] 7%|▋ | 128/1788 [09:56<1:44:39, 3.78s/it] 7%|▋ | 129/1788 [10:01<1:52:20, 4.06s/it] 7%|▋ | 130/1788 [10:06<2:01:23, 4.39s/it] {'loss': 0.1206, 'grad_norm': 17.252323150634766, 'learning_rate': 4.691702750328465e-06, 'epoch': 0.22}
7%|▋ | 130/1788 [10:06<2:01:23, 4.39s/it] 7%|▋ | 131/1788 [10:08<1:44:03, 3.77s/it] 7%|▋ | 132/1788 [10:12<1:47:50, 3.91s/it] 7%|▋ | 133/1788 [10:16<1:44:21, 3.78s/it] 7%|▋ | 134/1788 [10:22<2:02:07, 4.43s/it] 8%|▊ | 135/1788 [10:24<1:44:15, 3.78s/it] 8%|▊ | 136/1788 [10:26<1:31:34, 3.33s/it] 8%|▊ | 137/1788 [10:32<1:47:36, 3.91s/it] 8%|▊ | 138/1788 [11:52<12:17:29, 26.82s/it] 8%|▊ | 139/1788 [11:54<8:55:45, 19.49s/it] 8%|▊ | 140/1788 [11:57<6:33:21, 14.32s/it] {'loss': 0.1468, 'grad_norm': 10.098404884338379, 'learning_rate': 4.763133693512101e-06, 'epoch': 0.23}
8%|▊ | 140/1788 [11:57<6:33:21, 14.32s/it] 8%|▊ | 141/1788 [11:59<4:53:40, 10.70s/it] 8%|▊ | 142/1788 [12:01<3:44:18, 8.18s/it] 8%|▊ | 143/1788 [12:06<3:19:06, 7.26s/it] 8%|▊ | 144/1788 [12:11<2:57:03, 6.46s/it] 8%|▊ | 145/1788 [12:13<2:22:24, 5.20s/it] 8%|▊ | 146/1788 [12:15<1:58:38, 4.34s/it] 8%|▊ | 147/1788 [12:21<2:05:25, 4.59s/it] 8%|▊ | 148/1788 [12:27<2:16:08, 4.98s/it] 8%|▊ | 149/1788 [12:29<1:54:22, 4.19s/it] 8%|▊ | 150/1788 [12:31<1:38:31, 3.61s/it] {'loss': 0.1291, 'grad_norm': 4.6663994789123535, 'learning_rate': 4.8296343106526936e-06, 'epoch': 0.25}
8%|▊ | 150/1788 [12:31<1:38:31, 3.61s/it] 8%|▊ | 151/1788 [12:37<1:57:11, 4.30s/it] 9%|▊ | 152/1788 [12:42<2:04:25, 4.56s/it] 9%|▊ | 153/1788 [12:54<3:03:44, 6.74s/it] 9%|▊ | 154/1788 [12:56<2:27:30, 5.42s/it] 9%|▊ | 155/1788 [13:01<2:17:01, 5.03s/it] 9%|▊ | 156/1788 [13:06<2:19:36, 5.13s/it] 9%|▉ | 157/1788 [13:10<2:08:41, 4.73s/it] 9%|▉ | 158/1788 [13:12<1:49:02, 4.01s/it] 9%|▉ | 159/1788 [13:17<1:59:11, 4.39s/it] 9%|▉ | 160/1788 [13:20<1:41:46, 3.75s/it] {'loss': 0.1602, 'grad_norm': 9.340888977050781, 'learning_rate': 4.89184148354593e-06, 'epoch': 0.27}
9%|▉ | 160/1788 [13:20<1:41:46, 3.75s/it] 9%|▉ | 161/1788 [13:22<1:29:15, 3.29s/it] 9%|▉ | 162/1788 [13:24<1:20:40, 2.98s/it] 9%|▉ | 163/1788 [13:29<1:37:06, 3.59s/it] 9%|▉ | 164/1788 [13:34<1:44:44, 3.87s/it] 9%|▉ | 165/1788 [13:38<1:46:47, 3.95s/it] 9%|▉ | 166/1788 [13:43<1:53:46, 4.21s/it] 9%|▉ | 167/1788 [13:47<1:51:58, 4.14s/it] 9%|▉ | 168/1788 [13:49<1:37:02, 3.59s/it] 9%|▉ | 169/1788 [13:51<1:26:25, 3.20s/it] 10%|▉ | 170/1788 [13:58<1:58:38, 4.40s/it] {'loss': 0.1237, 'grad_norm': 6.990540027618408, 'learning_rate': 4.950276140312903e-06, 'epoch': 0.29}
10%|▉ | 170/1788 [13:58<1:58:38, 4.40s/it] 10%|▉ | 171/1788 [14:01<1:41:27, 3.76s/it] 10%|▉ | 172/1788 [14:03<1:29:54, 3.34s/it] 10%|▉ | 173/1788 [15:23<11:52:43, 26.48s/it] 10%|▉ | 174/1788 [15:26<8:37:57, 19.26s/it] 10%|▉ | 175/1788 [15:30<6:37:57, 14.80s/it] 10%|▉ | 176/1788 [15:35<5:16:25, 11.78s/it] 10%|▉ | 177/1788 [15:39<4:15:16, 9.51s/it] 10%|▉ | 178/1788 [15:44<3:41:43, 8.26s/it] 10%|█ | 179/1788 [15:47<2:53:19, 6.46s/it] 10%|█ | 180/1788 [15:51<2:33:08, 5.71s/it] {'loss': 0.1419, 'grad_norm': 3.2124414443969727, 'learning_rate': 5e-06, 'epoch': 0.3}
10%|█ | 180/1788 [15:51<2:33:08, 5.71s/it] 10%|█ | 181/1788 [15:55<2:25:03, 5.42s/it] 10%|█ | 182/1788 [16:01<2:27:56, 5.53s/it] 10%|█ | 183/1788 [16:04<2:02:04, 4.56s/it] 10%|█ | 184/1788 [16:09<2:06:19, 4.73s/it] 10%|█ | 185/1788 [16:11<1:46:45, 4.00s/it] 10%|█ | 186/1788 [16:13<1:32:49, 3.48s/it] 10%|█ | 187/1788 [16:18<1:40:31, 3.77s/it] 11%|█ | 188/1788 [16:21<1:39:54, 3.75s/it] 11%|█ | 189/1788 [16:25<1:38:16, 3.69s/it] 11%|█ | 190/1788 [16:27<1:26:35, 3.25s/it] {'loss': 0.1106, 'grad_norm': 4.708935737609863, 'learning_rate': 4.968924798011188e-06, 'epoch': 0.32}
11%|█ | 190/1788 [16:27<1:26:35, 3.25s/it] 11%|█ | 191/1788 [16:31<1:31:30, 3.44s/it] 11%|█ | 192/1788 [16:33<1:21:53, 3.08s/it] 11%|█ | 193/1788 [16:36<1:15:17, 2.83s/it] 11%|█ | 194/1788 [16:40<1:30:15, 3.40s/it] 11%|█ | 195/1788 [16:45<1:39:57, 3.77s/it] 11%|█ | 196/1788 [16:47<1:28:06, 3.32s/it] 11%|█ | 197/1788 [16:49<1:19:48, 3.01s/it] 11%|█ | 198/1788 [16:55<1:36:45, 3.65s/it] 11%|█ | 199/1788 [17:03<2:12:01, 4.98s/it] 11%|█ | 200/1788 [17:08<2:14:36, 5.09s/it] {'loss': 0.1295, 'grad_norm': 11.444799423217773, 'learning_rate': 4.937849596022375e-06, 'epoch': 0.34}
11%|█ | 200/1788 [17:08<2:14:36, 5.09s/it] 11%|█ | 201/1788 [17:14<2:21:57, 5.37s/it] 11%|█▏ | 202/1788 [17:19<2:15:51, 5.14s/it] 11%|█▏ | 203/1788 [17:21<1:53:07, 4.28s/it] 11%|█▏ | 204/1788 [17:23<1:37:37, 3.70s/it] 11%|█▏ | 205/1788 [17:29<1:51:26, 4.22s/it] 12%|█▏ | 206/1788 [17:33<1:55:32, 4.38s/it] 12%|█▏ | 207/1788 [18:54<11:54:49, 27.13s/it] 12%|█▏ | 208/1788 [19:00<9:08:45, 20.84s/it] 12%|█▏ | 209/1788 [19:02<6:41:42, 15.26s/it] 12%|█▏ | 210/1788 [19:07<5:18:18, 12.10s/it] {'loss': 0.1212, 'grad_norm': 7.033312797546387, 'learning_rate': 4.906774394033561e-06, 'epoch': 0.35}
12%|█▏ | 210/1788 [19:07<5:18:18, 12.10s/it] 12%|█▏ | 211/1788 [19:09<3:59:44, 9.12s/it] 12%|█▏ | 212/1788 [19:11<3:05:32, 7.06s/it] 12%|█▏ | 213/1788 [19:14<2:27:50, 5.63s/it] 12%|█▏ | 214/1788 [19:19<2:29:34, 5.70s/it] 12%|█▏ | 215/1788 [19:22<2:02:56, 4.69s/it] 12%|█▏ | 216/1788 [19:34<2:59:28, 6.85s/it] 12%|█▏ | 217/1788 [19:36<2:24:00, 5.50s/it] 12%|█▏ | 218/1788 [19:39<2:08:37, 4.92s/it] 12%|█▏ | 219/1788 [19:42<1:48:09, 4.14s/it] 12%|█▏ | 220/1788 [19:47<1:57:17, 4.49s/it] {'loss': 0.0886, 'grad_norm': 7.490005016326904, 'learning_rate': 4.875699192044749e-06, 'epoch': 0.37}
12%|█▏ | 220/1788 [19:47<1:57:17, 4.49s/it] 12%|█▏ | 221/1788 [19:51<1:52:58, 4.33s/it] 12%|█▏ | 222/1788 [19:55<1:46:47, 4.09s/it] 12%|█▏ | 223/1788 [19:57<1:32:18, 3.54s/it] 13%|█▎ | 224/1788 [19:59<1:22:12, 3.15s/it] 13%|█▎ | 225/1788 [20:01<1:15:33, 2.90s/it] 13%|█▎ | 226/1788 [20:07<1:35:16, 3.66s/it] 13%|█▎ | 227/1788 [20:12<1:46:24, 4.09s/it] 13%|█▎ | 228/1788 [20:15<1:40:58, 3.88s/it] 13%|█▎ | 229/1788 [20:18<1:28:21, 3.40s/it] 13%|█▎ | 230/1788 [20:23<1:41:55, 3.93s/it] {'loss': 0.1029, 'grad_norm': 1.6660929918289185, 'learning_rate': 4.844623990055936e-06, 'epoch': 0.39}
13%|█▎ | 230/1788 [20:23<1:41:55, 3.93s/it] 13%|█▎ | 231/1788 [20:25<1:29:08, 3.44s/it] 13%|█▎ | 232/1788 [20:27<1:19:53, 3.08s/it] 13%|█▎ | 233/1788 [20:30<1:13:28, 2.83s/it] 13%|█▎ | 234/1788 [20:34<1:26:43, 3.35s/it] 13%|█▎ | 235/1788 [20:36<1:18:41, 3.04s/it] 13%|█▎ | 236/1788 [20:42<1:40:36, 3.89s/it] 13%|█▎ | 237/1788 [20:45<1:28:15, 3.41s/it] 13%|█▎ | 238/1788 [20:50<1:42:53, 3.98s/it] 13%|█▎ | 239/1788 [20:55<1:53:57, 4.41s/it] 13%|█▎ | 240/1788 [20:59<1:48:01, 4.19s/it] {'loss': 0.0883, 'grad_norm': 7.962102890014648, 'learning_rate': 4.813548788067122e-06, 'epoch': 0.4}
13%|█▎ | 240/1788 [20:59<1:48:01, 4.19s/it] 13%|█▎ | 241/1788 [21:03<1:43:47, 4.03s/it] 14%|█▎ | 242/1788 [21:05<1:30:13, 3.50s/it] 14%|█▎ | 243/1788 [21:09<1:30:58, 3.53s/it] 14%|█▎ | 244/1788 [21:11<1:21:12, 3.16s/it] 14%|█▎ | 245/1788 [21:13<1:14:15, 2.89s/it] 14%|█▍ | 246/1788 [21:16<1:17:13, 3.00s/it] 14%|█▍ | 247/1788 [21:19<1:11:35, 2.79s/it] 14%|█▍ | 248/1788 [21:21<1:08:07, 2.65s/it] 14%|█▍ | 249/1788 [21:25<1:15:21, 2.94s/it] 14%|█▍ | 250/1788 [21:28<1:20:32, 3.14s/it] {'loss': 0.1061, 'grad_norm': 4.872562885284424, 'learning_rate': 4.78247358607831e-06, 'epoch': 0.42}
14%|█▍ | 250/1788 [21:28<1:20:32, 3.14s/it] 14%|█▍ | 251/1788 [21:30<1:13:57, 2.89s/it] 14%|█▍ | 252/1788 [21:33<1:09:24, 2.71s/it] 14%|█▍ | 253/1788 [21:35<1:06:10, 2.59s/it] 14%|█▍ | 254/1788 [21:40<1:20:08, 3.13s/it] 14%|█▍ | 255/1788 [21:45<1:41:28, 3.97s/it] 14%|█▍ | 256/1788 [21:48<1:28:41, 3.47s/it] 14%|█▍ | 257/1788 [21:52<1:35:56, 3.76s/it] 14%|█▍ | 258/1788 [21:55<1:25:07, 3.34s/it] 14%|█▍ | 259/1788 [22:00<1:40:42, 3.95s/it] 15%|█▍ | 260/1788 [22:04<1:40:50, 3.96s/it] {'loss': 0.1114, 'grad_norm': 7.391303539276123, 'learning_rate': 4.751398384089497e-06, 'epoch': 0.44}
15%|█▍ | 260/1788 [22:04<1:40:50, 3.96s/it] 15%|█▍ | 261/1788 [22:06<1:28:11, 3.47s/it] 15%|█▍ | 262/1788 [22:10<1:33:16, 3.67s/it] 15%|█▍ | 263/1788 [22:13<1:22:38, 3.25s/it] 15%|█▍ | 264/1788 [22:15<1:15:04, 2.96s/it] 15%|█▍ | 265/1788 [22:17<1:09:58, 2.76s/it] 15%|█▍ | 266/1788 [22:21<1:14:45, 2.95s/it] 15%|█▍ | 267/1788 [22:24<1:20:41, 3.18s/it] 15%|█▍ | 268/1788 [22:28<1:24:18, 3.33s/it] 15%|█▌ | 269/1788 [22:32<1:27:33, 3.46s/it] 15%|█▌ | 270/1788 [22:34<1:18:32, 3.10s/it] {'loss': 0.1109, 'grad_norm': 4.072627067565918, 'learning_rate': 4.720323182100684e-06, 'epoch': 0.45}
15%|█▌ | 270/1788 [22:34<1:18:32, 3.10s/it] 15%|█▌ | 271/1788 [22:39<1:35:23, 3.77s/it] 15%|█▌ | 272/1788 [22:45<1:47:12, 4.24s/it] 15%|█▌ | 273/1788 [22:47<1:32:29, 3.66s/it] 15%|█▌ | 274/1788 [22:51<1:37:04, 3.85s/it] 15%|█▌ | 275/1788 [22:54<1:25:10, 3.38s/it] 15%|█▌ | 276/1788 [22:56<1:16:51, 3.05s/it] 15%|█▌ | 277/1788 [22:59<1:18:32, 3.12s/it] 16%|█▌ | 278/1788 [23:01<1:12:17, 2.87s/it] 16%|█▌ | 279/1788 [23:13<2:20:46, 5.60s/it] 16%|█▌ | 280/1788 [23:19<2:18:52, 5.53s/it] {'loss': 0.1039, 'grad_norm': 4.37451171875, 'learning_rate': 4.689247980111871e-06, 'epoch': 0.47}
16%|█▌ | 280/1788 [23:19<2:18:52, 5.53s/it] 16%|█▌ | 281/1788 [23:23<2:10:13, 5.19s/it] 16%|█▌ | 282/1788 [23:27<1:59:07, 4.75s/it] 16%|█▌ | 283/1788 [23:32<1:58:47, 4.74s/it] 16%|█▌ | 284/1788 [23:34<1:39:44, 3.98s/it] 16%|█▌ | 285/1788 [23:36<1:26:51, 3.47s/it] 16%|█▌ | 286/1788 [23:40<1:32:27, 3.69s/it] 16%|█▌ | 287/1788 [23:45<1:38:13, 3.93s/it] 16%|█▌ | 288/1788 [23:47<1:25:44, 3.43s/it] 16%|█▌ | 289/1788 [23:51<1:27:00, 3.48s/it] 16%|█▌ | 290/1788 [23:53<1:18:11, 3.13s/it] {'loss': 0.0831, 'grad_norm': 6.223955154418945, 'learning_rate': 4.658172778123058e-06, 'epoch': 0.49}
16%|█▌ | 290/1788 [23:53<1:18:11, 3.13s/it] 16%|█▋ | 291/1788 [23:59<1:42:44, 4.12s/it] 16%|█▋ | 292/1788 [24:05<1:53:01, 4.53s/it] 16%|█▋ | 293/1788 [24:07<1:35:35, 3.84s/it] 16%|█▋ | 294/1788 [24:09<1:23:19, 3.35s/it] 16%|█▋ | 295/1788 [24:12<1:15:20, 3.03s/it] 17%|█▋ | 296/1788 [24:14<1:09:09, 2.78s/it] 17%|█▋ | 297/1788 [24:16<1:04:51, 2.61s/it] 17%|█▋ | 298/1788 [24:18<1:02:10, 2.50s/it] 17%|█▋ | 299/1788 [24:23<1:17:44, 3.13s/it] 17%|█▋ | 300/1788 [24:26<1:20:04, 3.23s/it] {'loss': 0.0896, 'grad_norm': 5.713517665863037, 'learning_rate': 4.627097576134245e-06, 'epoch': 0.5}
17%|█▋ | 300/1788 [24:26<1:20:04, 3.23s/it] 17%|█▋ | 301/1788 [24:29<1:12:48, 2.94s/it] 17%|█▋ | 302/1788 [24:31<1:08:15, 2.76s/it] 17%|█▋ | 303/1788 [24:33<1:04:50, 2.62s/it] 17%|█▋ | 304/1788 [24:35<1:02:10, 2.51s/it] 17%|█▋ | 305/1788 [24:38<1:00:01, 2.43s/it] 17%|█▋ | 306/1788 [24:41<1:09:02, 2.80s/it] 17%|█▋ | 307/1788 [24:45<1:19:23, 3.22s/it] 17%|█▋ | 308/1788 [24:49<1:18:45, 3.19s/it] 17%|█▋ | 309/1788 [24:51<1:12:03, 2.92s/it] 17%|█▋ | 310/1788 [24:56<1:28:16, 3.58s/it] {'loss': 0.1053, 'grad_norm': 5.372918128967285, 'learning_rate': 4.596022374145433e-06, 'epoch': 0.52}
17%|█▋ | 310/1788 [24:56<1:28:16, 3.58s/it] 17%|█▋ | 311/1788 [24:59<1:24:44, 3.44s/it] 17%|█▋ | 312/1788 [25:01<1:16:10, 3.10s/it] 18%|█▊ | 313/1788 [25:07<1:33:09, 3.79s/it] 18%|█▊ | 314/1788 [25:12<1:46:13, 4.32s/it] 18%|█▊ | 315/1788 [25:17<1:51:10, 4.53s/it] 18%|█▊ | 316/1788 [25:20<1:34:37, 3.86s/it] 18%|█▊ | 317/1788 [25:24<1:36:21, 3.93s/it] 18%|█▊ | 318/1788 [25:28<1:40:17, 4.09s/it] 18%|█▊ | 319/1788 [25:31<1:27:24, 3.57s/it] 18%|█▊ | 320/1788 [25:34<1:27:23, 3.57s/it] {'loss': 0.0849, 'grad_norm': 3.0682899951934814, 'learning_rate': 4.5649471721566195e-06, 'epoch': 0.54}
18%|█▊ | 320/1788 [25:34<1:27:23, 3.57s/it] 18%|█▊ | 321/1788 [25:37<1:23:17, 3.41s/it] 18%|█▊ | 322/1788 [25:40<1:15:10, 3.08s/it] 18%|█▊ | 323/1788 [25:44<1:23:03, 3.40s/it] 18%|█▊ | 324/1788 [25:47<1:21:39, 3.35s/it] 18%|█▊ | 325/1788 [25:53<1:38:10, 4.03s/it] 18%|█▊ | 326/1788 [25:57<1:39:29, 4.08s/it] 18%|█▊ | 327/1788 [26:01<1:43:13, 4.24s/it] 18%|█▊ | 328/1788 [26:04<1:28:53, 3.65s/it] 18%|█▊ | 329/1788 [26:09<1:41:32, 4.18s/it] 18%|█▊ | 330/1788 [26:14<1:50:23, 4.54s/it] {'loss': 0.0964, 'grad_norm': 5.97300386428833, 'learning_rate': 4.533871970167806e-06, 'epoch': 0.55}
18%|█▊ | 330/1788 [26:14<1:50:23, 4.54s/it] 19%|█▊ | 331/1788 [26:17<1:33:48, 3.86s/it] 19%|█▊ | 332/1788 [26:19<1:22:22, 3.39s/it] 19%|█▊ | 333/1788 [26:21<1:14:48, 3.08s/it] 19%|█▊ | 334/1788 [26:24<1:09:00, 2.85s/it] 19%|█▊ | 335/1788 [26:28<1:22:57, 3.43s/it] 19%|█▉ | 336/1788 [26:33<1:32:13, 3.81s/it] 19%|█▉ | 337/1788 [26:39<1:43:17, 4.27s/it] 19%|█▉ | 338/1788 [26:42<1:39:16, 4.11s/it] 19%|█▉ | 339/1788 [26:45<1:25:51, 3.56s/it] 19%|█▉ | 340/1788 [26:47<1:17:07, 3.20s/it] {'loss': 0.1085, 'grad_norm': 5.460120677947998, 'learning_rate': 4.502796768178994e-06, 'epoch': 0.57}
19%|█▉ | 340/1788 [26:47<1:17:07, 3.20s/it] 19%|█▉ | 341/1788 [26:50<1:20:03, 3.32s/it] 19%|█▉ | 342/1788 [26:53<1:12:20, 3.00s/it] 19%|█▉ | 343/1788 [26:55<1:07:22, 2.80s/it] 19%|█▉ | 344/1788 [26:59<1:15:52, 3.15s/it] 19%|█▉ | 345/1788 [27:05<1:35:32, 3.97s/it] 19%|█▉ | 346/1788 [27:07<1:23:21, 3.47s/it] 19%|█▉ | 347/1788 [27:10<1:14:49, 3.12s/it] 19%|█▉ | 348/1788 [27:12<1:09:06, 2.88s/it] 20%|█▉ | 349/1788 [27:16<1:21:52, 3.41s/it] 20%|█▉ | 350/1788 [27:22<1:39:36, 4.16s/it] {'loss': 0.0909, 'grad_norm': 10.999938011169434, 'learning_rate': 4.4717215661901805e-06, 'epoch': 0.59}
20%|█▉ | 350/1788 [27:22<1:39:36, 4.16s/it] 20%|█▉ | 351/1788 [27:27<1:43:02, 4.30s/it] 20%|█▉ | 352/1788 [27:30<1:36:18, 4.02s/it] 20%|█▉ | 353/1788 [27:35<1:39:32, 4.16s/it] 20%|█▉ | 354/1788 [27:40<1:42:44, 4.30s/it] 20%|█▉ | 355/1788 [27:45<1:54:23, 4.79s/it] 20%|█▉ | 356/1788 [27:51<1:58:24, 4.96s/it] 20%|█▉ | 357/1788 [27:54<1:49:06, 4.57s/it] 20%|██ | 358/1788 [27:57<1:32:42, 3.89s/it] 20%|██ | 359/1788 [27:59<1:21:09, 3.41s/it] 20%|██ | 360/1788 [28:03<1:22:40, 3.47s/it] {'loss': 0.0992, 'grad_norm': 4.350360870361328, 'learning_rate': 4.440646364201367e-06, 'epoch': 0.6}
20%|██ | 360/1788 [28:03<1:22:40, 3.47s/it] 20%|██ | 361/1788 [28:08<1:36:25, 4.05s/it] 20%|██ | 362/1788 [28:13<1:45:59, 4.46s/it] 20%|██ | 363/1788 [28:16<1:30:22, 3.81s/it] 20%|██ | 364/1788 [28:18<1:19:00, 3.33s/it] 20%|██ | 365/1788 [28:20<1:11:07, 3.00s/it] 20%|██ | 366/1788 [28:23<1:06:20, 2.80s/it] 21%|██ | 367/1788 [28:28<1:24:39, 3.57s/it] 21%|██ | 368/1788 [28:32<1:27:45, 3.71s/it] 21%|██ | 369/1788 [28:34<1:17:28, 3.28s/it] 21%|██ | 370/1788 [28:39<1:25:32, 3.62s/it] {'loss': 0.0722, 'grad_norm': 6.475912094116211, 'learning_rate': 4.409571162212555e-06, 'epoch': 0.62}
21%|██ | 370/1788 [28:39<1:25:32, 3.62s/it] 21%|██ | 371/1788 [28:41<1:16:12, 3.23s/it] 21%|██ | 372/1788 [28:46<1:26:18, 3.66s/it] 21%|██ | 373/1788 [28:49<1:22:37, 3.50s/it] 21%|██ | 374/1788 [28:51<1:14:09, 3.15s/it] 21%|██ | 375/1788 [28:57<1:34:06, 4.00s/it] 21%|██ | 376/1788 [28:59<1:22:02, 3.49s/it] 21%|██ | 377/1788 [29:04<1:28:39, 3.77s/it] 21%|██ | 378/1788 [29:09<1:41:13, 4.31s/it] 21%|██ | 379/1788 [29:15<1:49:04, 4.64s/it] 21%|██▏ | 380/1788 [29:20<1:52:07, 4.78s/it] {'loss': 0.1005, 'grad_norm': 15.557165145874023, 'learning_rate': 4.378495960223742e-06, 'epoch': 0.64}
21%|██▏ | 380/1788 [29:20<1:52:07, 4.78s/it] 21%|██▏ | 381/1788 [29:25<1:55:58, 4.95s/it] 21%|██▏ | 382/1788 [29:29<1:46:49, 4.56s/it] 21%|██▏ | 383/1788 [29:31<1:30:57, 3.88s/it] 21%|██▏ | 384/1788 [29:35<1:31:48, 3.92s/it] 22%|██▏ | 385/1788 [29:41<1:43:34, 4.43s/it] 22%|██▏ | 386/1788 [29:43<1:28:13, 3.78s/it] 22%|██▏ | 387/1788 [29:47<1:27:21, 3.74s/it] 22%|██▏ | 388/1788 [29:52<1:39:12, 4.25s/it] 22%|██▏ | 389/1788 [29:56<1:36:24, 4.14s/it] 22%|██▏ | 390/1788 [29:58<1:23:28, 3.58s/it] {'loss': 0.0963, 'grad_norm': 5.6789350509643555, 'learning_rate': 4.347420758234929e-06, 'epoch': 0.65}
22%|██▏ | 390/1788 [29:58<1:23:28, 3.58s/it] 22%|██▏ | 391/1788 [30:00<1:13:38, 3.16s/it] 22%|██▏ | 392/1788 [30:03<1:07:19, 2.89s/it] 22%|██▏ | 393/1788 [30:05<1:02:49, 2.70s/it] 22%|██▏ | 394/1788 [30:10<1:17:05, 3.32s/it] 22%|██▏ | 395/1788 [30:15<1:29:35, 3.86s/it] 22%|██▏ | 396/1788 [30:17<1:18:47, 3.40s/it] 22%|██▏ | 397/1788 [30:21<1:18:58, 3.41s/it] 22%|██▏ | 398/1788 [30:27<1:36:41, 4.17s/it] 22%|██▏ | 399/1788 [30:29<1:23:43, 3.62s/it] 22%|██▏ | 400/1788 [30:33<1:27:47, 3.80s/it] {'loss': 0.0855, 'grad_norm': 2.9128787517547607, 'learning_rate': 4.316345556246116e-06, 'epoch': 0.67}
22%|██▏ | 400/1788 [30:33<1:27:47, 3.80s/it] 22%|██▏ | 401/1788 [30:35<1:16:49, 3.32s/it] 22%|██▏ | 402/1788 [30:38<1:09:40, 3.02s/it] 23%|██▎ | 403/1788 [30:43<1:27:12, 3.78s/it] 23%|██▎ | 404/1788 [30:46<1:17:52, 3.38s/it] 23%|██▎ | 405/1788 [30:51<1:29:52, 3.90s/it] 23%|██▎ | 406/1788 [30:55<1:34:21, 4.10s/it] 23%|██▎ | 407/1788 [31:00<1:37:17, 4.23s/it] 23%|██▎ | 408/1788 [31:04<1:36:40, 4.20s/it] 23%|██▎ | 409/1788 [31:09<1:43:12, 4.49s/it] 23%|██▎ | 410/1788 [31:14<1:42:45, 4.47s/it] {'loss': 0.11, 'grad_norm': 5.32689905166626, 'learning_rate': 4.285270354257303e-06, 'epoch': 0.69}
23%|██▎ | 410/1788 [31:14<1:42:45, 4.47s/it] 23%|██▎ | 411/1788 [31:19<1:47:41, 4.69s/it] 23%|██▎ | 412/1788 [31:25<1:56:29, 5.08s/it] 23%|██▎ | 413/1788 [31:29<1:48:23, 4.73s/it] 23%|██▎ | 414/1788 [31:34<1:52:30, 4.91s/it] 23%|██▎ | 415/1788 [31:38<1:43:37, 4.53s/it] 23%|██▎ | 416/1788 [31:40<1:28:08, 3.85s/it] 23%|██▎ | 417/1788 [31:42<1:17:32, 3.39s/it] 23%|██▎ | 418/1788 [31:47<1:26:02, 3.77s/it] 23%|██▎ | 419/1788 [31:49<1:15:46, 3.32s/it] 23%|██▎ | 420/1788 [31:53<1:21:09, 3.56s/it] {'loss': 0.0562, 'grad_norm': 2.710726737976074, 'learning_rate': 4.25419515226849e-06, 'epoch': 0.7}
23%|██▎ | 420/1788 [31:53<1:21:09, 3.56s/it] 24%|██▎ | 421/1788 [31:56<1:12:33, 3.18s/it] 24%|██▎ | 422/1788 [31:58<1:05:56, 2.90s/it] 24%|██▎ | 423/1788 [32:00<1:01:30, 2.70s/it] 24%|██▎ | 424/1788 [32:02<58:23, 2.57s/it] 24%|██▍ | 425/1788 [32:05<56:41, 2.50s/it] 24%|██▍ | 426/1788 [32:08<1:02:30, 2.75s/it] 24%|██▍ | 427/1788 [32:10<58:48, 2.59s/it] 24%|██▍ | 428/1788 [32:12<54:18, 2.40s/it] 24%|██▍ | 429/1788 [32:14<51:48, 2.29s/it] 24%|██▍ | 430/1788 [32:18<1:01:42, 2.73s/it] {'loss': 0.1113, 'grad_norm': 4.67482852935791, 'learning_rate': 4.223119950279677e-06, 'epoch': 0.72}
24%|██▍ | 430/1788 [32:18<1:01:42, 2.73s/it] 24%|██▍ | 431/1788 [32:23<1:17:52, 3.44s/it] 24%|██▍ | 432/1788 [32:25<1:09:12, 3.06s/it] 24%|██▍ | 433/1788 [32:27<1:03:28, 2.81s/it] 24%|██▍ | 434/1788 [32:31<1:07:48, 3.00s/it] 24%|██▍ | 435/1788 [32:37<1:25:21, 3.79s/it] 24%|██▍ | 436/1788 [32:42<1:35:47, 4.25s/it] 24%|██▍ | 437/1788 [32:47<1:39:06, 4.40s/it] 24%|██▍ | 438/1788 [32:49<1:24:58, 3.78s/it] 25%|██▍ | 439/1788 [32:54<1:30:47, 4.04s/it] 25%|██▍ | 440/1788 [32:56<1:19:03, 3.52s/it] {'loss': 0.1013, 'grad_norm': 7.16488790512085, 'learning_rate': 4.192044748290864e-06, 'epoch': 0.74}
25%|██▍ | 440/1788 [32:56<1:19:03, 3.52s/it] 25%|██▍ | 441/1788 [33:02<1:35:40, 4.26s/it] 25%|██▍ | 442/1788 [33:06<1:37:53, 4.36s/it] 25%|██▍ | 443/1788 [33:12<1:48:18, 4.83s/it] 25%|██▍ | 444/1788 [33:18<1:51:33, 4.98s/it] 25%|██▍ | 445/1788 [33:21<1:42:15, 4.57s/it] 25%|██▍ | 446/1788 [33:24<1:26:52, 3.88s/it] 25%|██▌ | 447/1788 [33:30<1:40:12, 4.48s/it] 25%|██▌ | 448/1788 [33:32<1:25:18, 3.82s/it] 25%|██▌ | 449/1788 [33:38<1:39:10, 4.44s/it] 25%|██▌ | 450/1788 [33:40<1:24:45, 3.80s/it] {'loss': 0.0858, 'grad_norm': 5.816267013549805, 'learning_rate': 4.160969546302051e-06, 'epoch': 0.76}
25%|██▌ | 450/1788 [33:40<1:24:45, 3.80s/it] 25%|██▌ | 451/1788 [33:46<1:39:25, 4.46s/it] 25%|██▌ | 452/1788 [33:48<1:25:01, 3.82s/it] 25%|██▌ | 453/1788 [33:52<1:26:12, 3.87s/it] 25%|██▌ | 454/1788 [33:55<1:15:35, 3.40s/it] 25%|██▌ | 455/1788 [33:57<1:08:16, 3.07s/it] 26%|██▌ | 456/1788 [34:02<1:21:53, 3.69s/it] 26%|██▌ | 457/1788 [34:07<1:29:43, 4.04s/it] 26%|██▌ | 458/1788 [34:12<1:37:03, 4.38s/it] 26%|██▌ | 459/1788 [34:17<1:43:49, 4.69s/it] 26%|██▌ | 460/1788 [34:21<1:35:03, 4.30s/it] {'loss': 0.0921, 'grad_norm': 7.29515266418457, 'learning_rate': 4.1298943443132386e-06, 'epoch': 0.77}
26%|██▌ | 460/1788 [34:21<1:35:03, 4.30s/it] 26%|██▌ | 461/1788 [34:23<1:21:40, 3.69s/it] 26%|██▌ | 462/1788 [34:25<1:12:27, 3.28s/it] 26%|██▌ | 463/1788 [34:31<1:27:46, 3.98s/it] 26%|██▌ | 464/1788 [34:37<1:40:35, 4.56s/it] 26%|██▌ | 465/1788 [34:42<1:45:48, 4.80s/it] 26%|██▌ | 466/1788 [34:47<1:44:58, 4.76s/it] 26%|██▌ | 467/1788 [34:51<1:37:36, 4.43s/it] 26%|██▌ | 468/1788 [34:53<1:22:57, 3.77s/it] 26%|██▌ | 469/1788 [34:55<1:13:08, 3.33s/it] 26%|██▋ | 470/1788 [34:57<1:05:55, 3.00s/it] {'loss': 0.1031, 'grad_norm': 4.7909836769104, 'learning_rate': 4.098819142324425e-06, 'epoch': 0.79}
26%|β–ˆβ–ˆβ–‹ | 470/1788 [34:57<1:05:55, 3.00s/it] 26%|β–ˆβ–ˆβ–‹ | 471/1788 [35:00<1:00:37, 2.76s/it] 26%|β–ˆβ–ˆβ–‹ | 472/1788 [35:02<57:07, 2.60s/it] 26%|β–ˆβ–ˆβ–‹ | 473/1788 [35:07<1:15:43, 3.46s/it] 27%|β–ˆβ–ˆβ–‹ | 474/1788 [35:13<1:28:30, 4.04s/it] 27%|β–ˆβ–ˆβ–‹ | 475/1788 [35:15<1:16:49, 3.51s/it] 27%|β–ˆβ–ˆβ–‹ | 476/1788 [35:17<1:09:06, 3.16s/it] 27%|β–ˆβ–ˆβ–‹ | 477/1788 [35:29<2:03:52, 5.67s/it] 27%|β–ˆβ–ˆβ–‹ | 478/1788 [35:31<1:42:15, 4.68s/it] 27%|β–ˆβ–ˆβ–‹ | 479/1788 [35:36<1:39:50, 4.58s/it] 27%|β–ˆβ–ˆβ–‹ | 480/1788 [35:38<1:24:39, 3.88s/it] {'loss': 0.0994, 'grad_norm': 9.43322467803955, 'learning_rate': 4.067743940335612e-06, 'epoch': 0.81}
27%|β–ˆβ–ˆβ–‹ | 480/1788 [35:38<1:24:39, 3.88s/it] 27%|β–ˆβ–ˆβ–‹ | 481/1788 [35:42<1:28:36, 4.07s/it] 27%|β–ˆβ–ˆβ–‹ | 482/1788 [35:48<1:36:45, 4.45s/it] 27%|β–ˆβ–ˆβ–‹ | 483/1788 [36:00<2:25:02, 6.67s/it] 27%|β–ˆβ–ˆβ–‹ | 484/1788 [36:02<1:56:36, 5.37s/it] 27%|β–ˆβ–ˆβ–‹ | 485/1788 [36:07<1:56:31, 5.37s/it] 27%|β–ˆβ–ˆβ–‹ | 486/1788 [36:12<1:52:25, 5.18s/it] 27%|β–ˆβ–ˆβ–‹ | 487/1788 [36:14<1:34:03, 4.34s/it] 27%|β–ˆβ–ˆβ–‹ | 488/1788 [36:20<1:43:56, 4.80s/it] 27%|β–ˆβ–ˆβ–‹ | 489/1788 [36:23<1:27:33, 4.04s/it] 27%|β–ˆβ–ˆβ–‹ | 490/1788 [36:28<1:36:00, 4.44s/it] {'loss': 0.0971, 'grad_norm': 6.166449069976807, 'learning_rate': 4.0366687383467996e-06, 'epoch': 0.82}
27%|β–ˆβ–ˆβ–‹ | 490/1788 [36:28<1:36:00, 4.44s/it] 27%|β–ˆβ–ˆβ–‹ | 491/1788 [36:30<1:22:17, 3.81s/it] 28%|β–ˆβ–ˆβ–Š | 492/1788 [36:36<1:32:34, 4.29s/it] 28%|β–ˆβ–ˆβ–Š | 493/1788 [36:38<1:19:44, 3.69s/it] 28%|β–ˆβ–ˆβ–Š | 494/1788 [36:43<1:28:51, 4.12s/it] 28%|β–ˆβ–ˆβ–Š | 495/1788 [36:48<1:36:58, 4.50s/it] 28%|β–ˆβ–ˆβ–Š | 496/1788 [36:54<1:42:51, 4.78s/it] 28%|β–ˆβ–ˆβ–Š | 497/1788 [36:59<1:47:11, 4.98s/it] 28%|β–ˆβ–ˆβ–Š | 498/1788 [37:04<1:44:22, 4.85s/it] 28%|β–ˆβ–ˆβ–Š | 499/1788 [37:06<1:27:36, 4.08s/it] 28%|β–ˆβ–ˆβ–Š | 500/1788 [37:10<1:27:42, 4.09s/it] {'loss': 0.0866, 'grad_norm': 7.390334606170654, 'learning_rate': 4.005593536357987e-06, 'epoch': 0.84}
28%|β–ˆβ–ˆβ–Š | 500/1788 [37:10<1:27:42, 4.09s/it] 28%|β–ˆβ–ˆβ–Š | 501/1788 [37:15<1:34:03, 4.38s/it] 28%|β–ˆβ–ˆβ–Š | 502/1788 [37:19<1:29:20, 4.17s/it] 28%|β–ˆβ–ˆβ–Š | 503/1788 [37:24<1:32:41, 4.33s/it] 28%|β–ˆβ–ˆβ–Š | 504/1788 [37:28<1:34:19, 4.41s/it] 28%|β–ˆβ–ˆβ–Š | 505/1788 [37:31<1:21:24, 3.81s/it] 28%|β–ˆβ–ˆβ–Š | 506/1788 [37:36<1:31:46, 4.30s/it] 28%|β–ˆβ–ˆβ–Š | 507/1788 [37:41<1:33:06, 4.36s/it] 28%|β–ˆβ–ˆβ–Š | 508/1788 [37:43<1:20:06, 3.76s/it] 28%|β–ˆβ–ˆβ–Š | 509/1788 [37:48<1:30:26, 4.24s/it] 29%|β–ˆβ–ˆβ–Š | 510/1788 [37:54<1:37:49, 4.59s/it] {'loss': 0.1007, 'grad_norm': 5.356869697570801, 'learning_rate': 3.974518334369174e-06, 'epoch': 0.86}
29%|β–ˆβ–ˆβ–Š | 510/1788 [37:54<1:37:49, 4.59s/it] 29%|β–ˆβ–ˆβ–Š | 511/1788 [37:56<1:23:18, 3.91s/it] 29%|β–ˆβ–ˆβ–Š | 512/1788 [38:01<1:32:22, 4.34s/it] 29%|β–ˆβ–ˆβ–Š | 513/1788 [38:04<1:19:18, 3.73s/it] 29%|β–ˆβ–ˆβ–Š | 514/1788 [38:07<1:16:46, 3.62s/it] 29%|β–ˆβ–ˆβ–‰ | 515/1788 [38:12<1:24:23, 3.98s/it] 29%|β–ˆβ–ˆβ–‰ | 516/1788 [38:14<1:13:41, 3.48s/it] 29%|β–ˆβ–ˆβ–‰ | 517/1788 [38:20<1:25:26, 4.03s/it] 29%|β–ˆβ–ˆβ–‰ | 518/1788 [38:24<1:26:32, 4.09s/it] 29%|β–ˆβ–ˆβ–‰ | 519/1788 [38:29<1:34:06, 4.45s/it] 29%|β–ˆβ–ˆβ–‰ | 520/1788 [38:41<2:21:19, 6.69s/it] {'loss': 0.1029, 'grad_norm': 8.45686149597168, 'learning_rate': 3.9434431323803606e-06, 'epoch': 0.87}
29%|β–ˆβ–ˆβ–‰ | 520/1788 [38:41<2:21:19, 6.69s/it] 29%|β–ˆβ–ˆβ–‰ | 521/1788 [38:46<2:11:37, 6.23s/it] 29%|β–ˆβ–ˆβ–‰ | 522/1788 [38:51<2:01:07, 5.74s/it] 29%|β–ˆβ–ˆβ–‰ | 523/1788 [38:55<1:49:17, 5.18s/it] 29%|β–ˆβ–ˆβ–‰ | 524/1788 [38:58<1:36:11, 4.57s/it] 29%|β–ˆβ–ˆβ–‰ | 525/1788 [39:00<1:21:38, 3.88s/it] 29%|β–ˆβ–ˆβ–‰ | 526/1788 [39:05<1:29:17, 4.25s/it] 29%|β–ˆβ–ˆβ–‰ | 527/1788 [39:09<1:25:20, 4.06s/it] 30%|β–ˆβ–ˆβ–‰ | 528/1788 [39:13<1:24:13, 4.01s/it] 30%|β–ˆβ–ˆβ–‰ | 529/1788 [39:15<1:13:41, 3.51s/it] 30%|β–ˆβ–ˆβ–‰ | 530/1788 [39:20<1:24:44, 4.04s/it] {'loss': 0.0878, 'grad_norm': 4.9290452003479, 'learning_rate': 3.912367930391548e-06, 'epoch': 0.89}
30%|β–ˆβ–ˆβ–‰ | 530/1788 [39:20<1:24:44, 4.04s/it] 30%|β–ˆβ–ˆβ–‰ | 531/1788 [39:23<1:13:39, 3.52s/it] 30%|β–ˆβ–ˆβ–‰ | 532/1788 [39:25<1:06:05, 3.16s/it] 30%|β–ˆβ–ˆβ–‰ | 533/1788 [39:30<1:21:28, 3.90s/it] 30%|β–ˆβ–ˆβ–‰ | 534/1788 [39:36<1:29:24, 4.28s/it] 30%|β–ˆβ–ˆβ–‰ | 535/1788 [39:42<1:39:33, 4.77s/it] 30%|β–ˆβ–ˆβ–‰ | 536/1788 [39:45<1:32:37, 4.44s/it] 30%|β–ˆβ–ˆβ–ˆ | 537/1788 [39:51<1:37:52, 4.69s/it] 30%|β–ˆβ–ˆβ–ˆ | 538/1788 [39:55<1:38:08, 4.71s/it] 30%|β–ˆβ–ˆβ–ˆ | 539/1788 [39:59<1:31:59, 4.42s/it] 30%|β–ˆβ–ˆβ–ˆ | 540/1788 [40:04<1:37:53, 4.71s/it] {'loss': 0.1081, 'grad_norm': 5.403077125549316, 'learning_rate': 3.881292728402735e-06, 'epoch': 0.91}
30%|β–ˆβ–ˆβ–ˆ | 540/1788 [40:04<1:37:53, 4.71s/it] 30%|β–ˆβ–ˆβ–ˆ | 541/1788 [40:07<1:22:47, 3.98s/it] 30%|β–ˆβ–ˆβ–ˆ | 542/1788 [40:11<1:25:36, 4.12s/it] 30%|β–ˆβ–ˆβ–ˆ | 543/1788 [40:13<1:14:21, 3.58s/it] 30%|β–ˆβ–ˆβ–ˆ | 544/1788 [40:19<1:26:58, 4.20s/it] 30%|β–ˆβ–ˆβ–ˆ | 545/1788 [40:21<1:15:18, 3.64s/it] 31%|β–ˆβ–ˆβ–ˆ | 546/1788 [40:27<1:25:44, 4.14s/it] 31%|β–ˆβ–ˆβ–ˆ | 547/1788 [40:29<1:14:05, 3.58s/it] 31%|β–ˆβ–ˆβ–ˆ | 548/1788 [40:33<1:14:54, 3.62s/it] 31%|β–ˆβ–ˆβ–ˆ | 549/1788 [40:38<1:24:44, 4.10s/it] 31%|β–ˆβ–ˆβ–ˆ | 550/1788 [40:43<1:32:45, 4.50s/it] {'loss': 0.0838, 'grad_norm': 4.801890850067139, 'learning_rate': 3.8502175264139215e-06, 'epoch': 0.92}
31%|β–ˆβ–ˆβ–ˆ | 550/1788 [40:43<1:32:45, 4.50s/it] 31%|β–ˆβ–ˆβ–ˆ | 551/1788 [40:48<1:34:07, 4.57s/it] 31%|β–ˆβ–ˆβ–ˆ | 552/1788 [40:54<1:40:25, 4.88s/it] 31%|β–ˆβ–ˆβ–ˆ | 553/1788 [40:58<1:35:27, 4.64s/it] 31%|β–ˆβ–ˆβ–ˆ | 554/1788 [41:00<1:21:03, 3.94s/it] 31%|β–ˆβ–ˆβ–ˆ | 555/1788 [41:12<2:09:48, 6.32s/it] 31%|β–ˆβ–ˆβ–ˆ | 556/1788 [41:14<1:45:03, 5.12s/it] 31%|β–ˆβ–ˆβ–ˆ | 557/1788 [41:17<1:28:46, 4.33s/it] 31%|β–ˆβ–ˆβ–ˆ | 558/1788 [41:21<1:31:05, 4.44s/it] 31%|β–ˆβ–ˆβ–ˆβ– | 559/1788 [41:24<1:18:20, 3.82s/it] 31%|β–ˆβ–ˆβ–ˆβ– | 560/1788 [41:28<1:21:48, 4.00s/it] {'loss': 0.1168, 'grad_norm': 6.135326862335205, 'learning_rate': 3.819142324425109e-06, 'epoch': 0.94}
31%|β–ˆβ–ˆβ–ˆβ– | 560/1788 [41:28<1:21:48, 4.00s/it] 31%|β–ˆβ–ˆβ–ˆβ– | 561/1788 [41:31<1:11:24, 3.49s/it] 31%|β–ˆβ–ˆβ–ˆβ– | 562/1788 [41:34<1:10:21, 3.44s/it] 31%|β–ˆβ–ˆβ–ˆβ– | 563/1788 [41:38<1:16:19, 3.74s/it] 32%|β–ˆβ–ˆβ–ˆβ– | 564/1788 [41:43<1:21:28, 3.99s/it] 32%|β–ˆβ–ˆβ–ˆβ– | 565/1788 [41:45<1:11:05, 3.49s/it] 32%|β–ˆβ–ˆβ–ˆβ– | 566/1788 [41:51<1:22:37, 4.06s/it] 32%|β–ˆβ–ˆβ–ˆβ– | 567/1788 [41:56<1:29:15, 4.39s/it] 32%|β–ˆβ–ˆβ–ˆβ– | 568/1788 [41:58<1:16:25, 3.76s/it] 32%|β–ˆβ–ˆβ–ˆβ– | 569/1788 [42:00<1:06:27, 3.27s/it] 32%|β–ˆβ–ˆβ–ˆβ– | 570/1788 [42:02<1:00:10, 2.96s/it] {'loss': 0.0566, 'grad_norm': 4.115556716918945, 'learning_rate': 3.7880671224362962e-06, 'epoch': 0.96}
32%|β–ˆβ–ˆβ–ˆβ– | 570/1788 [42:02<1:00:10, 2.96s/it] 32%|β–ˆβ–ˆβ–ˆβ– | 571/1788 [42:06<1:06:23, 3.27s/it] 32%|β–ˆβ–ˆβ–ˆβ– | 572/1788 [42:09<1:00:06, 2.97s/it] 32%|β–ˆβ–ˆβ–ˆβ– | 573/1788 [42:11<55:49, 2.76s/it] 32%|β–ˆβ–ˆβ–ˆβ– | 574/1788 [42:14<59:13, 2.93s/it] 32%|β–ˆβ–ˆβ–ˆβ– | 575/1788 [42:20<1:13:23, 3.63s/it] 32%|β–ˆβ–ˆβ–ˆβ– | 576/1788 [42:24<1:16:31, 3.79s/it] 32%|β–ˆβ–ˆβ–ˆβ– | 577/1788 [42:30<1:29:23, 4.43s/it] 32%|β–ˆβ–ˆβ–ˆβ– | 578/1788 [42:34<1:27:18, 4.33s/it] 32%|β–ˆβ–ˆβ–ˆβ– | 579/1788 [42:37<1:21:44, 4.06s/it] 32%|β–ˆβ–ˆβ–ˆβ– | 580/1788 [42:39<1:10:46, 3.52s/it] {'loss': 0.1073, 'grad_norm': 11.748637199401855, 'learning_rate': 3.756991920447483e-06, 'epoch': 0.97}
32%|β–ˆβ–ˆβ–ˆβ– | 580/1788 [42:39<1:10:46, 3.52s/it] 32%|β–ˆβ–ˆβ–ˆβ– | 581/1788 [42:43<1:12:11, 3.59s/it] 33%|β–ˆβ–ˆβ–ˆβ–Ž | 582/1788 [42:48<1:21:41, 4.06s/it] 33%|β–ˆβ–ˆβ–ˆβ–Ž | 583/1788 [42:51<1:11:01, 3.54s/it] 33%|β–ˆβ–ˆβ–ˆβ–Ž | 584/1788 [42:56<1:21:03, 4.04s/it] 33%|β–ˆβ–ˆβ–ˆβ–Ž | 585/1788 [43:00<1:24:14, 4.20s/it] 33%|β–ˆβ–ˆβ–ˆβ–Ž | 586/1788 [43:04<1:20:02, 4.00s/it] 33%|β–ˆβ–ˆβ–ˆβ–Ž | 587/1788 [43:09<1:28:10, 4.40s/it] 33%|β–ˆβ–ˆβ–ˆβ–Ž | 588/1788 [43:13<1:22:45, 4.14s/it] 33%|β–ˆβ–ˆβ–ˆβ–Ž | 589/1788 [43:15<1:11:39, 3.59s/it] 33%|β–ˆβ–ˆβ–ˆβ–Ž | 590/1788 [43:19<1:12:25, 3.63s/it] {'loss': 0.1032, 'grad_norm': 6.442657947540283, 'learning_rate': 3.7259167184586705e-06, 'epoch': 0.99}
 33%|β–ˆβ–ˆβ–ˆβ–Ž | 590/1788 [43:19<1:12:25, 3.63s/it] 33%|β–ˆβ–ˆβ–ˆβ–Ž | 591/1788 [43:22<1:09:33, 3.49s/it] 33%|β–ˆβ–ˆβ–ˆβ–Ž | 592/1788 [43:24<1:02:23, 3.13s/it] 33%|β–ˆβ–ˆβ–ˆβ–Ž | 593/1788 [43:28<1:02:55, 3.16s/it] 33%|β–ˆβ–ˆβ–ˆβ–Ž | 594/1788 [43:33<1:15:08, 3.78s/it] 33%|β–ˆβ–ˆβ–ˆβ–Ž | 595/1788 [43:38<1:25:02, 4.28s/it]
---------------------------*Rank 3: refresh data---------------------------
 33%|β–ˆβ–ˆβ–ˆβ–Ž | 596/1788 [43:49<2:05:56, 6.34s/it]
---------------------------*Rank 7: refresh data---------------------------
---------------------------*Rank 2: refresh data---------------------------
---------------------------*Rank 0: refresh data---------------------------
---------------------------*Rank 5: refresh data---------------------------
---------------------------*Rank 6: refresh data---------------------------
---------------------------*Rank 4: refresh data---------------------------
---------------------------*Rank 1: refresh data---------------------------
33%|β–ˆβ–ˆβ–ˆβ–Ž | 597/1788 [43:54<1:53:48, 5.73s/it] 33%|β–ˆβ–ˆβ–ˆβ–Ž | 598/1788 [43:59<1:49:57, 5.54s/it] 34%|β–ˆβ–ˆβ–ˆβ–Ž | 599/1788 [44:01<1:31:04, 4.60s/it] 34%|β–ˆβ–ˆβ–ˆβ–Ž | 600/1788 [44:06<1:31:56, 4.64s/it] {'loss': 0.0808, 'grad_norm': 4.0611467361450195, 'learning_rate': 3.6948415164698577e-06, 'epoch': 1.01}
34%|β–ˆβ–ˆβ–ˆβ–Ž | 600/1788 [44:06<1:31:56, 4.64s/it] 34%|β–ˆβ–ˆβ–ˆβ–Ž | 601/1788 [44:13<1:43:54, 5.25s/it] 34%|β–ˆβ–ˆβ–ˆβ–Ž | 602/1788 [44:15<1:26:04, 4.35s/it] 34%|β–ˆβ–ˆβ–ˆβ–Ž | 603/1788 [44:17<1:13:33, 3.72s/it] 34%|β–ˆβ–ˆβ–ˆβ– | 604/1788 [44:23<1:23:39, 4.24s/it] 34%|β–ˆβ–ˆβ–ˆβ– | 605/1788 [44:25<1:11:50, 3.64s/it] 34%|β–ˆβ–ˆβ–ˆβ– | 606/1788 [44:27<1:03:41, 3.23s/it] 34%|β–ˆβ–ˆβ–ˆβ– | 607/1788 [44:29<58:32, 2.97s/it] 34%|β–ˆβ–ˆβ–ˆβ– | 608/1788 [44:32<54:26, 2.77s/it] 34%|β–ˆβ–ˆβ–ˆβ– | 609/1788 [44:36<1:05:49, 3.35s/it] 34%|β–ˆβ–ˆβ–ˆβ– | 610/1788 [44:39<59:44, 3.04s/it] {'loss': 0.057, 'grad_norm': 3.64019513130188, 'learning_rate': 3.6637663144810444e-06, 'epoch': 1.02}
34%|β–ˆβ–ˆβ–ˆβ– | 610/1788 [44:39<59:44, 3.04s/it] 34%|β–ˆβ–ˆβ–ˆβ– | 611/1788 [44:42<1:03:21, 3.23s/it] 34%|β–ˆβ–ˆβ–ˆβ– | 612/1788 [44:48<1:14:49, 3.82s/it] 34%|β–ˆβ–ˆβ–ˆβ– | 613/1788 [44:53<1:24:13, 4.30s/it] 34%|β–ˆβ–ˆβ–ˆβ– | 614/1788 [44:57<1:25:12, 4.35s/it] 34%|β–ˆβ–ˆβ–ˆβ– | 615/1788 [45:03<1:29:25, 4.57s/it] 34%|β–ˆβ–ˆβ–ˆβ– | 616/1788 [45:08<1:33:19, 4.78s/it] 35%|β–ˆβ–ˆβ–ˆβ– | 617/1788 [45:10<1:18:33, 4.03s/it] 35%|β–ˆβ–ˆβ–ˆβ– | 618/1788 [45:14<1:14:53, 3.84s/it] 35%|β–ˆβ–ˆβ–ˆβ– | 619/1788 [45:18<1:20:13, 4.12s/it] 35%|β–ˆβ–ˆβ–ˆβ– | 620/1788 [45:24<1:26:48, 4.46s/it] {'loss': 0.0755, 'grad_norm': 3.229475736618042, 'learning_rate': 3.6326911124922315e-06, 'epoch': 1.04}
35%|β–ˆβ–ˆβ–ˆβ– | 620/1788 [45:24<1:26:48, 4.46s/it] 35%|β–ˆβ–ˆβ–ˆβ– | 621/1788 [45:27<1:19:58, 4.11s/it] 35%|β–ˆβ–ˆβ–ˆβ– | 622/1788 [45:31<1:21:52, 4.21s/it] 35%|β–ˆβ–ˆβ–ˆβ– | 623/1788 [45:36<1:27:25, 4.50s/it] 35%|β–ˆβ–ˆβ–ˆβ– | 624/1788 [45:41<1:27:09, 4.49s/it] 35%|β–ˆβ–ˆβ–ˆβ– | 625/1788 [45:43<1:14:18, 3.83s/it] 35%|β–ˆβ–ˆβ–ˆβ–Œ | 626/1788 [45:46<1:05:18, 3.37s/it] 35%|β–ˆβ–ˆβ–ˆβ–Œ | 627/1788 [45:51<1:18:00, 4.03s/it] 35%|β–ˆβ–ˆβ–ˆβ–Œ | 628/1788 [45:57<1:26:17, 4.46s/it] 35%|β–ˆβ–ˆβ–ˆβ–Œ | 629/1788 [47:17<8:47:04, 27.29s/it] 35%|β–ˆβ–ˆβ–ˆβ–Œ | 630/1788 [47:20<6:22:35, 19.82s/it] {'loss': 0.0844, 'grad_norm': 4.590421676635742, 'learning_rate': 3.6016159105034186e-06, 'epoch': 1.06}
35%|β–ˆβ–ˆβ–ˆβ–Œ | 630/1788 [47:20<6:22:35, 19.82s/it] 35%|β–ˆβ–ˆβ–ˆβ–Œ | 631/1788 [47:25<4:57:06, 15.41s/it] 35%|β–ˆβ–ˆβ–ˆβ–Œ | 632/1788 [47:30<3:57:23, 12.32s/it] 35%|β–ˆβ–ˆβ–ˆβ–Œ | 633/1788 [47:35<3:15:52, 10.18s/it] 35%|β–ˆβ–ˆβ–ˆβ–Œ | 634/1788 [47:37<2:30:16, 7.81s/it] 36%|β–ˆβ–ˆβ–ˆβ–Œ | 635/1788 [47:39<1:58:14, 6.15s/it] 36%|β–ˆβ–ˆβ–ˆβ–Œ | 636/1788 [47:43<1:42:25, 5.33s/it] 36%|β–ˆβ–ˆβ–ˆβ–Œ | 637/1788 [47:45<1:24:39, 4.41s/it] 36%|β–ˆβ–ˆβ–ˆβ–Œ | 638/1788 [47:48<1:12:57, 3.81s/it] 36%|β–ˆβ–ˆβ–ˆβ–Œ | 639/1788 [47:53<1:22:16, 4.30s/it] 36%|β–ˆβ–ˆβ–ˆβ–Œ | 640/1788 [47:58<1:28:40, 4.63s/it] {'loss': 0.0772, 'grad_norm': 4.31036901473999, 'learning_rate': 3.5705407085146054e-06, 'epoch': 1.07}
36%|β–ˆβ–ˆβ–ˆβ–Œ | 640/1788 [47:58<1:28:40, 4.63s/it] 36%|β–ˆβ–ˆβ–ˆβ–Œ | 641/1788 [48:10<2:08:00, 6.70s/it] 36%|β–ˆβ–ˆβ–ˆβ–Œ | 642/1788 [48:14<1:50:03, 5.76s/it] 36%|β–ˆβ–ˆβ–ˆβ–Œ | 643/1788 [48:16<1:30:14, 4.73s/it] 36%|β–ˆβ–ˆβ–ˆβ–Œ | 644/1788 [48:20<1:27:25, 4.59s/it] 36%|β–ˆβ–ˆβ–ˆβ–Œ | 645/1788 [48:25<1:30:59, 4.78s/it] 36%|β–ˆβ–ˆβ–ˆβ–Œ | 646/1788 [48:31<1:34:44, 4.98s/it] 36%|β–ˆβ–ˆβ–ˆβ–Œ | 647/1788 [48:36<1:35:08, 5.00s/it] 36%|β–ˆβ–ˆβ–ˆβ–Œ | 648/1788 [48:38<1:19:33, 4.19s/it] 36%|β–ˆβ–ˆβ–ˆβ–‹ | 649/1788 [48:43<1:25:39, 4.51s/it] 36%|β–ˆβ–ˆβ–ˆβ–‹ | 650/1788 [48:46<1:13:04, 3.85s/it] {'loss': 0.088, 'grad_norm': 1.6511598825454712, 'learning_rate': 3.5394655065257925e-06, 'epoch': 1.09}
36%|β–ˆβ–ˆβ–ˆβ–‹ | 650/1788 [48:46<1:13:04, 3.85s/it] 36%|β–ˆβ–ˆβ–ˆβ–‹ | 651/1788 [48:48<1:04:05, 3.38s/it] 36%|β–ˆβ–ˆβ–ˆβ–‹ | 652/1788 [48:50<57:49, 3.05s/it] 37%|β–ˆβ–ˆβ–ˆβ–‹ | 653/1788 [48:53<53:21, 2.82s/it] 37%|β–ˆβ–ˆβ–ˆβ–‹ | 654/1788 [48:56<58:17, 3.08s/it] 37%|β–ˆβ–ˆβ–ˆβ–‹ | 655/1788 [49:00<1:00:16, 3.19s/it] 37%|β–ˆβ–ˆβ–ˆβ–‹ | 656/1788 [49:06<1:15:43, 4.01s/it] 37%|β–ˆβ–ˆβ–ˆβ–‹ | 657/1788 [49:08<1:05:52, 3.49s/it] 37%|β–ˆβ–ˆβ–ˆβ–‹ | 658/1788 [49:10<58:54, 3.13s/it] 37%|β–ˆβ–ˆβ–ˆβ–‹ | 659/1788 [49:12<53:56, 2.87s/it] 37%|β–ˆβ–ˆβ–ˆβ–‹ | 660/1788 [49:16<1:00:23, 3.21s/it] {'loss': 0.0577, 'grad_norm': 4.515713214874268, 'learning_rate': 3.50839030453698e-06, 'epoch': 1.11}
37%|β–ˆβ–ˆβ–ˆβ–‹ | 660/1788 [49:16<1:00:23, 3.21s/it] 37%|β–ˆβ–ˆβ–ˆβ–‹ | 661/1788 [49:22<1:12:52, 3.88s/it] 37%|β–ˆβ–ˆβ–ˆβ–‹ | 662/1788 [49:24<1:03:34, 3.39s/it] 37%|β–ˆβ–ˆβ–ˆβ–‹ | 663/1788 [49:26<57:07, 3.05s/it] 37%|β–ˆβ–ˆβ–ˆβ–‹ | 664/1788 [49:29<52:52, 2.82s/it] 37%|β–ˆβ–ˆβ–ˆβ–‹ | 665/1788 [49:33<1:00:17, 3.22s/it] 37%|β–ˆβ–ˆβ–ˆβ–‹ | 666/1788 [49:38<1:13:42, 3.94s/it] 37%|β–ˆβ–ˆβ–ˆβ–‹ | 667/1788 [49:42<1:10:05, 3.75s/it] 37%|β–ˆβ–ˆβ–ˆβ–‹ | 668/1788 [49:46<1:11:18, 3.82s/it] 37%|β–ˆβ–ˆβ–ˆβ–‹ | 669/1788 [49:48<1:03:14, 3.39s/it] 37%|β–ˆβ–ˆβ–ˆβ–‹ | 670/1788 [49:50<56:58, 3.06s/it] {'loss': 0.0732, 'grad_norm': 4.7008585929870605, 'learning_rate': 3.4773151025481668e-06, 'epoch': 1.12}
37%|β–ˆβ–ˆβ–ˆβ–‹ | 670/1788 [49:50<56:58, 3.06s/it] 38%|β–ˆβ–ˆβ–ˆβ–Š | 671/1788 [49:56<1:10:52, 3.81s/it] 38%|β–ˆβ–ˆβ–ˆβ–Š | 672/1788 [49:58<1:02:16, 3.35s/it] 38%|β–ˆβ–ˆβ–ˆβ–Š | 673/1788 [50:02<1:05:44, 3.54s/it] 38%|β–ˆβ–ˆβ–ˆβ–Š | 674/1788 [50:04<58:28, 3.15s/it] 38%|β–ˆβ–ˆβ–ˆβ–Š | 675/1788 [50:07<53:39, 2.89s/it] 38%|β–ˆβ–ˆβ–ˆβ–Š | 676/1788 [50:10<57:55, 3.13s/it] 38%|β–ˆβ–ˆβ–ˆβ–Š | 677/1788 [50:15<1:05:41, 3.55s/it] 38%|β–ˆβ–ˆβ–ˆβ–Š | 678/1788 [50:17<58:09, 3.14s/it] 38%|β–ˆβ–ˆβ–ˆβ–Š | 679/1788 [50:19<52:43, 2.85s/it] 38%|β–ˆβ–ˆβ–ˆβ–Š | 680/1788 [50:22<49:29, 2.68s/it] {'loss': 0.0748, 'grad_norm': 4.698460578918457, 'learning_rate': 3.446239900559354e-06, 'epoch': 1.14}
38%|β–ˆβ–ˆβ–ˆβ–Š | 680/1788 [50:22<49:29, 2.68s/it] 38%|β–ˆβ–ˆβ–ˆβ–Š | 681/1788 [50:24<47:17, 2.56s/it] 38%|β–ˆβ–ˆβ–ˆβ–Š | 682/1788 [50:30<1:06:41, 3.62s/it] 38%|β–ˆβ–ˆβ–ˆβ–Š | 683/1788 [50:32<59:26, 3.23s/it] 38%|β–ˆβ–ˆβ–ˆβ–Š | 684/1788 [50:38<1:11:04, 3.86s/it] 38%|β–ˆβ–ˆβ–ˆβ–Š | 685/1788 [50:43<1:20:46, 4.39s/it] 38%|β–ˆβ–ˆβ–ˆβ–Š | 686/1788 [50:46<1:09:11, 3.77s/it] 38%|β–ˆβ–ˆβ–ˆβ–Š | 687/1788 [50:51<1:17:53, 4.24s/it] 38%|β–ˆβ–ˆβ–ˆβ–Š | 688/1788 [50:53<1:06:59, 3.65s/it] 39%|β–ˆβ–ˆβ–ˆβ–Š | 689/1788 [50:59<1:17:07, 4.21s/it] 39%|β–ˆβ–ˆβ–ˆβ–Š | 690/1788 [51:01<1:06:40, 3.64s/it] {'loss': 0.064, 'grad_norm': 3.243607759475708, 'learning_rate': 3.415164698570541e-06, 'epoch': 1.16}
39%|β–ˆβ–ˆβ–ˆβ–Š | 690/1788 [51:01<1:06:40, 3.64s/it] 39%|β–ˆβ–ˆβ–ˆβ–Š | 691/1788 [51:06<1:16:14, 4.17s/it] 39%|β–ˆβ–ˆβ–ˆβ–Š | 692/1788 [51:09<1:05:53, 3.61s/it] 39%|β–ˆβ–ˆβ–ˆβ–‰ | 693/1788 [51:14<1:15:14, 4.12s/it] 39%|β–ˆβ–ˆβ–ˆβ–‰ | 694/1788 [51:18<1:12:06, 3.96s/it] 39%|β–ˆβ–ˆβ–ˆβ–‰ | 695/1788 [51:23<1:19:41, 4.37s/it] 39%|β–ˆβ–ˆβ–ˆβ–‰ | 696/1788 [51:26<1:14:03, 4.07s/it] 39%|β–ˆβ–ˆβ–ˆβ–‰ | 697/1788 [51:32<1:20:08, 4.41s/it] 39%|β–ˆβ–ˆβ–ˆβ–‰ | 698/1788 [51:36<1:21:34, 4.49s/it] 39%|β–ˆβ–ˆβ–ˆβ–‰ | 699/1788 [51:39<1:09:46, 3.84s/it] 39%|β–ˆβ–ˆβ–ˆβ–‰ | 700/1788 [51:44<1:17:42, 4.29s/it] {'loss': 0.0733, 'grad_norm': 5.237815856933594, 'learning_rate': 3.3840894965817278e-06, 'epoch': 1.17}
39%|β–ˆβ–ˆβ–ˆβ–‰ | 700/1788 [51:44<1:17:42, 4.29s/it] 39%|β–ˆβ–ˆβ–ˆβ–‰ | 701/1788 [51:50<1:26:18, 4.76s/it] 39%|β–ˆβ–ˆβ–ˆβ–‰ | 702/1788 [51:55<1:30:45, 5.01s/it] 39%|β–ˆβ–ˆβ–ˆβ–‰ | 703/1788 [51:58<1:16:08, 4.21s/it] 39%|β–ˆβ–ˆβ–ˆβ–‰ | 704/1788 [52:02<1:17:30, 4.29s/it] 39%|β–ˆβ–ˆβ–ˆβ–‰ | 705/1788 [52:07<1:18:44, 4.36s/it] 39%|β–ˆβ–ˆβ–ˆβ–‰ | 706/1788 [52:13<1:26:56, 4.82s/it] 40%|β–ˆβ–ˆβ–ˆβ–‰ | 707/1788 [52:17<1:22:36, 4.58s/it] 40%|β–ˆβ–ˆβ–ˆβ–‰ | 708/1788 [52:21<1:22:14, 4.57s/it] 40%|β–ˆβ–ˆβ–ˆβ–‰ | 709/1788 [52:25<1:17:09, 4.29s/it] 40%|β–ˆβ–ˆβ–ˆβ–‰ | 710/1788 [52:29<1:15:39, 4.21s/it] {'loss': 0.072, 'grad_norm': 5.627130508422852, 'learning_rate': 3.353014294592915e-06, 'epoch': 1.19}
40%|β–ˆβ–ˆβ–ˆβ–‰ | 710/1788 [52:29<1:15:39, 4.21s/it] 40%|β–ˆβ–ˆβ–ˆβ–‰ | 711/1788 [52:34<1:18:18, 4.36s/it] 40%|β–ˆβ–ˆβ–ˆβ–‰ | 712/1788 [52:39<1:23:31, 4.66s/it] 40%|β–ˆβ–ˆβ–ˆβ–‰ | 713/1788 [52:43<1:22:16, 4.59s/it] 40%|β–ˆβ–ˆβ–ˆβ–‰ | 714/1788 [52:46<1:09:43, 3.90s/it] 40%|β–ˆβ–ˆβ–ˆβ–‰ | 715/1788 [52:51<1:17:12, 4.32s/it] 40%|β–ˆβ–ˆβ–ˆβ–ˆ | 716/1788 [52:53<1:06:29, 3.72s/it] 40%|β–ˆβ–ˆβ–ˆβ–ˆ | 717/1788 [52:57<1:08:13, 3.82s/it] 40%|β–ˆβ–ˆβ–ˆβ–ˆ | 718/1788 [53:00<59:57, 3.36s/it] 40%|β–ˆβ–ˆβ–ˆβ–ˆ | 719/1788 [53:04<1:06:51, 3.75s/it] 40%|β–ˆβ–ˆβ–ˆβ–ˆ | 720/1788 [53:09<1:10:21, 3.95s/it] {'loss': 0.0586, 'grad_norm': 3.6567986011505127, 'learning_rate': 3.3219390926041025e-06, 'epoch': 1.21}
40%|β–ˆβ–ˆβ–ˆβ–ˆ | 720/1788 [53:09<1:10:21, 3.95s/it] 40%|β–ˆβ–ˆβ–ˆβ–ˆ | 721/1788 [53:14<1:17:50, 4.38s/it] 40%|β–ˆβ–ˆβ–ˆβ–ˆ | 722/1788 [53:20<1:26:00, 4.84s/it] 40%|β–ˆβ–ˆβ–ˆβ–ˆ | 723/1788 [53:22<1:12:27, 4.08s/it] 40%|β–ˆβ–ˆβ–ˆβ–ˆ | 724/1788 [53:26<1:09:50, 3.94s/it] 41%|β–ˆβ–ˆβ–ˆβ–ˆ | 725/1788 [53:30<1:12:04, 4.07s/it] 41%|β–ˆβ–ˆβ–ˆβ–ˆ | 726/1788 [53:36<1:18:27, 4.43s/it] 41%|β–ˆβ–ˆβ–ˆβ–ˆ | 727/1788 [53:38<1:07:07, 3.80s/it] 41%|β–ˆβ–ˆβ–ˆβ–ˆ | 728/1788 [53:44<1:17:57, 4.41s/it] 41%|β–ˆβ–ˆβ–ˆβ–ˆ | 729/1788 [53:46<1:06:57, 3.79s/it] 41%|β–ˆβ–ˆβ–ˆβ–ˆ | 730/1788 [53:50<1:09:10, 3.92s/it] {'loss': 0.0487, 'grad_norm': 6.544238567352295, 'learning_rate': 3.290863890615289e-06, 'epoch': 1.22}
41%|β–ˆβ–ˆβ–ˆβ–ˆ | 730/1788 [53:50<1:09:10, 3.92s/it] 41%|β–ˆβ–ˆβ–ˆβ–ˆ | 731/1788 [53:54<1:08:26, 3.88s/it] 41%|β–ˆβ–ˆβ–ˆβ–ˆ | 732/1788 [53:56<1:00:00, 3.41s/it] 41%|β–ˆβ–ˆβ–ˆβ–ˆ | 733/1788 [54:01<1:06:11, 3.76s/it] 41%|β–ˆβ–ˆβ–ˆβ–ˆ | 734/1788 [54:03<58:35, 3.34s/it] 41%|β–ˆβ–ˆβ–ˆβ–ˆ | 735/1788 [54:07<1:01:42, 3.52s/it] 41%|β–ˆβ–ˆβ–ˆβ–ˆ | 736/1788 [54:12<1:07:56, 3.87s/it] 41%|β–ˆβ–ˆβ–ˆβ–ˆ | 737/1788 [54:18<1:20:51, 4.62s/it] 41%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 738/1788 [54:22<1:16:06, 4.35s/it] 41%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 739/1788 [54:24<1:05:09, 3.73s/it] 41%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 740/1788 [54:27<57:21, 3.28s/it] {'loss': 0.0711, 'grad_norm': 3.0342986583709717, 'learning_rate': 3.2597886886264763e-06, 'epoch': 1.24}
41%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 740/1788 [54:27<57:21, 3.28s/it] 41%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 741/1788 [54:30<1:00:06, 3.44s/it] 41%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 742/1788 [55:50<7:40:55, 26.44s/it] 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 743/1788 [55:56<5:53:32, 20.30s/it] 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 744/1788 [56:02<4:36:05, 15.87s/it] 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 745/1788 [56:04<3:24:51, 11.79s/it] 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 746/1788 [56:06<2:35:15, 8.94s/it] 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 747/1788 [56:09<2:00:33, 6.95s/it] 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 748/1788 [56:15<1:54:53, 6.63s/it] 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 749/1788 [56:18<1:39:31, 5.75s/it] 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 750/1788 [56:23<1:35:57, 5.55s/it] {'loss': 0.0848, 'grad_norm': 3.8024253845214844, 'learning_rate': 3.2287134866376635e-06, 'epoch': 1.26}
42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 750/1788 [56:23<1:35:57, 5.55s/it] 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 751/1788 [56:27<1:26:28, 5.00s/it] 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 752/1788 [56:29<1:12:21, 4.19s/it] 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 753/1788 [56:33<1:09:24, 4.02s/it] 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 754/1788 [56:35<1:00:09, 3.49s/it] 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 755/1788 [56:39<59:36, 3.46s/it] 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 756/1788 [56:41<53:54, 3.13s/it] 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 757/1788 [56:43<49:20, 2.87s/it] 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 758/1788 [56:46<46:14, 2.69s/it] 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 759/1788 [56:51<58:17, 3.40s/it] 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 760/1788 [56:53<52:34, 3.07s/it] {'loss': 0.0671, 'grad_norm': 3.3138482570648193, 'learning_rate': 3.1976382846488506e-06, 'epoch': 1.28}
43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 760/1788 [56:53<52:34, 3.07s/it] 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 761/1788 [56:55<48:27, 2.83s/it] 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 762/1788 [56:58<45:24, 2.66s/it] 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 763/1788 [57:00<43:20, 2.54s/it] 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 764/1788 [57:03<49:10, 2.88s/it] 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 765/1788 [57:09<1:02:24, 3.66s/it] 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 766/1788 [57:11<55:26, 3.26s/it] 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 767/1788 [57:14<50:34, 2.97s/it] 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 768/1788 [57:16<47:17, 2.78s/it] 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 769/1788 [57:18<44:44, 2.63s/it] 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 770/1788 [57:24<58:53, 3.47s/it] {'loss': 0.0771, 'grad_norm': 4.9509992599487305, 'learning_rate': 3.1665630826600373e-06, 'epoch': 1.29}
43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 770/1788 [57:24<58:53, 3.47s/it] 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 771/1788 [57:28<1:02:28, 3.69s/it] 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 772/1788 [57:34<1:14:04, 4.37s/it] 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 773/1788 [57:39<1:18:05, 4.62s/it] 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 774/1788 [57:41<1:06:18, 3.92s/it] 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 775/1788 [57:44<57:44, 3.42s/it] 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 776/1788 [57:47<59:39, 3.54s/it] 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 777/1788 [57:52<1:05:16, 3.87s/it] 44%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 778/1788 [57:55<1:01:53, 3.68s/it] 44%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 779/1788 [58:00<1:06:51, 3.98s/it] 44%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 780/1788 [58:03<1:04:59, 3.87s/it] {'loss': 0.0844, 'grad_norm': 5.096800327301025, 'learning_rate': 3.135487880671225e-06, 'epoch': 1.31}
44%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 780/1788 [58:03<1:04:59, 3.87s/it] 44%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 781/1788 [58:06<57:06, 3.40s/it] 44%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 782/1788 [58:08<51:26, 3.07s/it] 44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 783/1788 [58:12<54:42, 3.27s/it] 44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 784/1788 [58:14<49:45, 2.97s/it] 44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 785/1788 [58:19<58:37, 3.51s/it] 44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 786/1788 [58:22<58:22, 3.50s/it] 44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 787/1788 [58:28<1:11:28, 4.28s/it] 44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 788/1788 [59:49<7:33:06, 27.19s/it] 44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 789/1788 [59:52<5:29:24, 19.78s/it] 44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 790/1788 [59:57<4:17:08, 15.46s/it] {'loss': 0.0855, 'grad_norm': 5.035813808441162, 'learning_rate': 3.104412678682412e-06, 'epoch': 1.33}
44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 790/1788 [59:57<4:17:08, 15.46s/it] 44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 791/1788 [59:59<3:11:15, 11.51s/it] 44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 792/1788 [1:00:04<2:39:47, 9.63s/it] 44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 793/1788 [1:00:09<2:15:32, 8.17s/it] 44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 794/1788 [1:00:14<2:00:35, 7.28s/it] 44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 795/1788 [1:00:18<1:39:53, 6.04s/it] 45%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 796/1788 [1:00:22<1:29:54, 5.44s/it] 45%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 797/1788 [1:00:28<1:32:05, 5.58s/it] 45%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 798/1788 [1:00:30<1:16:16, 4.62s/it] 45%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 799/1788 [1:00:35<1:20:19, 4.87s/it] 45%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 800/1788 [1:00:40<1:18:59, 4.80s/it] {'loss': 0.0794, 'grad_norm': 3.542637348175049, 'learning_rate': 3.0733374766935987e-06, 'epoch': 1.34}
45%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 800/1788 [1:00:40<1:18:59, 4.80s/it] 45%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 801/1788 [1:00:42<1:06:38, 4.05s/it] 45%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 802/1788 [1:00:48<1:13:10, 4.45s/it] 45%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 803/1788 [1:00:52<1:10:53, 4.32s/it] 45%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 804/1788 [1:00:54<59:24, 3.62s/it] 45%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 805/1788 [1:00:58<1:02:42, 3.83s/it] 45%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 806/1788 [1:01:00<55:13, 3.37s/it] 45%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 807/1788 [1:01:03<49:43, 3.04s/it] 45%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 808/1788 [1:01:07<56:06, 3.44s/it] 45%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 809/1788 [1:01:09<50:34, 3.10s/it] 45%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 810/1788 [1:01:13<53:33, 3.29s/it] {'loss': 0.0742, 'grad_norm': 4.676824569702148, 'learning_rate': 3.042262274704786e-06, 'epoch': 1.36}
45%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 810/1788 [1:01:13<53:33, 3.29s/it] 45%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 811/1788 [1:01:18<1:02:18, 3.83s/it] 45%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 812/1788 [1:01:20<54:36, 3.36s/it] 45%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 813/1788 [1:01:26<1:05:26, 4.03s/it] 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 814/1788 [1:01:32<1:12:51, 4.49s/it] 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 815/1788 [1:01:36<1:12:52, 4.49s/it] 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 816/1788 [1:01:41<1:16:01, 4.69s/it] 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 817/1788 [1:01:46<1:18:03, 4.82s/it] 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 818/1788 [1:01:49<1:05:25, 4.05s/it] 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 819/1788 [1:01:51<56:46, 3.52s/it] 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 820/1788 [1:01:53<50:19, 3.12s/it] {'loss': 0.0711, 'grad_norm': 1.9335194826126099, 'learning_rate': 3.011187072715973e-06, 'epoch': 1.38}
46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 820/1788 [1:01:53<50:19, 3.12s/it] 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 821/1788 [1:01:55<46:14, 2.87s/it] 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 822/1788 [1:01:59<49:12, 3.06s/it] 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 823/1788 [1:02:01<45:24, 2.82s/it] 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 824/1788 [1:02:05<49:40, 3.09s/it] 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 825/1788 [1:02:17<1:32:08, 5.74s/it] 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 826/1788 [1:02:21<1:25:02, 5.30s/it] 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 827/1788 [1:02:25<1:19:18, 4.95s/it] 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 828/1788 [1:02:30<1:20:21, 5.02s/it] 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 829/1788 [1:02:33<1:07:44, 4.24s/it] 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 830/1788 [1:02:37<1:09:18, 4.34s/it] {'loss': 0.0884, 'grad_norm': 3.765441656112671, 'learning_rate': 2.9801118707271597e-06, 'epoch': 1.39}
46%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 830/1788 [1:02:37<1:09:18, 4.34s/it] 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 831/1788 [1:02:43<1:14:06, 4.65s/it] 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 832/1788 [1:02:45<1:03:29, 3.98s/it] 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 833/1788 [1:02:50<1:10:13, 4.41s/it] 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 834/1788 [1:02:54<1:07:02, 4.22s/it] 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 835/1788 [1:02:58<1:05:46, 4.14s/it] 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 836/1788 [1:03:03<1:10:31, 4.44s/it] 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 837/1788 [1:03:07<1:08:24, 4.32s/it] 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 838/1788 [1:03:10<58:32, 3.70s/it] 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 839/1788 [1:03:12<51:45, 3.27s/it] 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 840/1788 [1:03:18<1:03:59, 4.05s/it] {'loss': 0.0721, 'grad_norm': 6.309963703155518, 'learning_rate': 2.949036668738347e-06, 'epoch': 1.41}
47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 840/1788 [1:03:18<1:03:59, 4.05s/it] 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 841/1788 [1:03:20<55:30, 3.52s/it] 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 842/1788 [1:03:24<58:35, 3.72s/it] 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 843/1788 [1:03:28<58:48, 3.73s/it] 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 844/1788 [1:03:30<52:02, 3.31s/it] 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 845/1788 [1:03:34<52:56, 3.37s/it] 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 846/1788 [1:03:38<57:53, 3.69s/it] 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 847/1788 [1:03:44<1:05:44, 4.19s/it] 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 848/1788 [1:03:46<56:55, 3.63s/it] 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 849/1788 [1:03:49<55:27, 3.54s/it] 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 850/1788 [1:03:55<1:05:17, 4.18s/it] {'loss': 0.0762, 'grad_norm': 6.720387935638428, 'learning_rate': 2.9179614667495344e-06, 'epoch': 1.43}
48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 850/1788 [1:03:55<1:05:17, 4.18s/it] 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 851/1788 [1:05:16<7:04:46, 27.20s/it] 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 852/1788 [1:05:18<5:08:23, 19.77s/it] 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 853/1788 [1:05:24<4:00:55, 15.46s/it] 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 854/1788 [1:05:29<3:13:25, 12.43s/it] 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 855/1788 [1:05:31<2:26:01, 9.39s/it] 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 856/1788 [1:05:36<2:05:47, 8.10s/it] 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 857/1788 [1:05:39<1:38:09, 6.33s/it] 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 858/1788 [1:05:41<1:19:21, 5.12s/it] 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 859/1788 [1:05:46<1:19:13, 5.12s/it] 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 860/1788 [1:05:48<1:05:59, 4.27s/it] {'loss': 0.0841, 'grad_norm': 18.289173126220703, 'learning_rate': 2.886886264760721e-06, 'epoch': 1.44}
48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 860/1788 [1:05:48<1:05:59, 4.27s/it] 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 861/1788 [1:05:52<1:02:58, 4.08s/it] 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 862/1788 [1:05:54<54:38, 3.54s/it] 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 863/1788 [1:05:59<1:01:40, 4.00s/it] 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 864/1788 [1:06:05<1:07:52, 4.41s/it] 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 865/1788 [1:06:10<1:11:54, 4.67s/it] 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 866/1788 [1:06:12<1:00:50, 3.96s/it] 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 867/1788 [1:06:17<1:04:04, 4.17s/it] 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 868/1788 [1:06:22<1:08:25, 4.46s/it] 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 869/1788 [1:06:24<58:29, 3.82s/it] 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 870/1788 [1:06:30<1:08:16, 4.46s/it] {'loss': 0.0616, 'grad_norm': 4.238964080810547, 'learning_rate': 2.8558110627719083e-06, 'epoch': 1.46}
49%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 870/1788 [1:06:30<1:08:16, 4.46s/it] 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 871/1788 [1:06:34<1:05:55, 4.31s/it] 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 872/1788 [1:06:37<56:31, 3.70s/it] 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 873/1788 [1:06:39<49:57, 3.28s/it] 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 874/1788 [1:06:41<45:33, 2.99s/it] 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 875/1788 [1:06:47<58:55, 3.87s/it] 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 876/1788 [1:06:49<51:34, 3.39s/it] 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 877/1788 [1:06:55<1:00:07, 3.96s/it] 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 878/1788 [1:06:58<58:48, 3.88s/it] 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 879/1788 [1:07:10<1:35:02, 6.27s/it] 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 880/1788 [1:07:13<1:17:10, 5.10s/it] {'loss': 0.0725, 'grad_norm': 3.0316343307495117, 'learning_rate': 2.8247358607830954e-06, 'epoch': 1.48}
49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 880/1788 [1:07:13<1:17:10, 5.10s/it] 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 881/1788 [1:07:18<1:18:21, 5.18s/it] 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 882/1788 [1:07:24<1:21:37, 5.41s/it] 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 883/1788 [1:07:28<1:17:09, 5.12s/it] 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 884/1788 [1:07:33<1:14:19, 4.93s/it] 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 885/1788 [1:07:35<1:02:13, 4.13s/it] 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 886/1788 [1:07:40<1:07:18, 4.48s/it] 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 887/1788 [1:07:43<57:20, 3.82s/it] 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 888/1788 [1:07:45<50:21, 3.36s/it] 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 889/1788 [1:07:50<59:00, 3.94s/it] 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 890/1788 [1:07:55<1:04:38, 4.32s/it] {'loss': 0.0897, 'grad_norm': 4.183451175689697, 'learning_rate': 2.793660658794282e-06, 'epoch': 1.49}
50%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 890/1788 [1:07:55<1:04:38, 4.32s/it] 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 891/1788 [1:08:01<1:11:50, 4.81s/it] 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 892/1788 [1:08:05<1:07:51, 4.54s/it] 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 893/1788 [1:08:10<1:08:16, 4.58s/it] 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 894/1788 [1:08:15<1:10:27, 4.73s/it] 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 895/1788 [1:08:17<59:28, 4.00s/it] 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 896/1788 [1:08:22<1:04:03, 4.31s/it] 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 897/1788 [1:08:25<54:51, 3.69s/it] 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 898/1788 [1:08:27<48:41, 3.28s/it] 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 899/1788 [1:08:29<43:56, 2.97s/it] 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 900/1788 [1:08:33<49:03, 3.31s/it] {'loss': 0.05, 'grad_norm': 5.280019760131836, 'learning_rate': 2.7625854568054692e-06, 'epoch': 1.51}
50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 900/1788 [1:08:33<49:03, 3.31s/it] 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 901/1788 [1:08:39<57:18, 3.88s/it] 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 902/1788 [1:08:41<50:28, 3.42s/it] 51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 903/1788 [1:08:43<45:30, 3.09s/it] 51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 904/1788 [1:08:46<46:05, 3.13s/it] 51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 905/1788 [1:08:52<54:39, 3.71s/it] 51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 906/1788 [1:08:54<48:14, 3.28s/it] 51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 907/1788 [1:08:58<52:20, 3.56s/it] 51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 908/1788 [1:09:03<57:19, 3.91s/it] 51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 909/1788 [1:09:05<50:10, 3.43s/it] 51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 910/1788 [1:09:09<51:52, 3.54s/it] {'loss': 0.0583, 'grad_norm': 1.187185525894165, 'learning_rate': 2.731510254816657e-06, 'epoch': 1.53}
51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 910/1788 [1:09:09<51:52, 3.54s/it] 51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 911/1788 [1:09:11<46:13, 3.16s/it] 51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 912/1788 [1:09:16<55:37, 3.81s/it] 51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 913/1788 [1:09:19<49:03, 3.36s/it] 51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 914/1788 [1:09:22<50:32, 3.47s/it] 51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 915/1788 [1:09:26<51:21, 3.53s/it] 51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 916/1788 [1:09:28<45:59, 3.16s/it] 51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 917/1788 [1:09:34<55:30, 3.82s/it] 51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 918/1788 [1:09:36<48:50, 3.37s/it] 51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 919/1788 [1:09:40<51:35, 3.56s/it] 51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 920/1788 [1:09:45<58:25, 4.04s/it] {'loss': 0.0804, 'grad_norm': 20.968687057495117, 'learning_rate': 2.7004350528278435e-06, 'epoch': 1.54}
51%|█████▏ | 920/1788 [1:09:45<58:25, 4.04s/it] 52%|█████▏ | 921/1788 [1:09:49<56:41, 3.92s/it] 52%|█████▏ | 922/1788 [1:09:51<49:32, 3.43s/it] 52%|█████▏ | 923/1788 [1:09:56<55:16, 3.83s/it] 52%|█████▏ | 924/1788 [1:10:01<1:02:08, 4.32s/it] 52%|█████▏ | 925/1788 [1:10:07<1:09:17, 4.82s/it] 52%|█████▏ | 926/1788 [1:10:12<1:08:57, 4.80s/it] 52%|█████▏ | 927/1788 [1:10:15<58:31, 4.08s/it] 52%|█████▏ | 928/1788 [1:10:17<50:43, 3.54s/it] 52%|█████▏ | 929/1788 [1:10:19<45:37, 3.19s/it] 52%|█████▏ | 930/1788 [1:10:24<50:59, 3.57s/it] {'loss': 0.0827, 'grad_norm': 4.996553421020508, 'learning_rate': 2.6693598508390307e-06, 'epoch': 1.56}
52%|█████▏ | 930/1788 [1:10:24<50:59, 3.57s/it] 52%|█████▏ | 931/1788 [1:10:28<53:05, 3.72s/it] 52%|█████▏ | 932/1788 [1:10:32<54:21, 3.81s/it] 52%|█████▏ | 933/1788 [1:10:36<55:27, 3.89s/it] 52%|█████▏ | 934/1788 [1:10:40<56:41, 3.98s/it] 52%|█████▏ | 935/1788 [1:10:42<49:54, 3.51s/it] 52%|█████▏ | 936/1788 [1:10:48<58:33, 4.12s/it] 52%|█████▏ | 937/1788 [1:10:54<1:06:05, 4.66s/it] 52%|█████▏ | 938/1788 [1:10:58<1:04:50, 4.58s/it] 53%|█████▎ | 939/1788 [1:11:01<54:58, 3.89s/it] 53%|█████▎ | 940/1788 [1:11:03<48:15, 3.41s/it] {'loss': 0.0541, 'grad_norm': 1.915022611618042, 'learning_rate': 2.638284648850218e-06, 'epoch': 1.58}
53%|█████▎ | 940/1788 [1:11:03<48:15, 3.41s/it] 53%|█████▎ | 941/1788 [1:11:06<48:21, 3.43s/it] 53%|█████▎ | 942/1788 [1:11:11<53:54, 3.82s/it] 53%|█████▎ | 943/1788 [1:11:13<47:30, 3.37s/it] 53%|█████▎ | 944/1788 [1:11:19<55:04, 3.92s/it] 53%|█████▎ | 945/1788 [1:11:22<51:05, 3.64s/it] 53%|█████▎ | 946/1788 [1:11:24<45:26, 3.24s/it] 53%|█████▎ | 947/1788 [1:11:29<54:31, 3.89s/it] 53%|█████▎ | 948/1788 [1:11:32<51:27, 3.68s/it] 53%|█████▎ | 949/1788 [1:11:35<45:32, 3.26s/it] 53%|█████▎ | 950/1788 [1:11:39<49:13, 3.52s/it] {'loss': 0.0576, 'grad_norm': 3.948878288269043, 'learning_rate': 2.6072094468614045e-06, 'epoch': 1.59}
53%|█████▎ | 950/1788 [1:11:39<49:13, 3.52s/it] 53%|█████▎ | 951/1788 [1:11:44<57:15, 4.10s/it] 53%|█████▎ | 952/1788 [1:11:50<1:01:36, 4.42s/it] 53%|█████▎ | 953/1788 [1:11:55<1:04:56, 4.67s/it] 53%|█████▎ | 954/1788 [1:12:00<1:05:07, 4.69s/it] 53%|█████▎ | 955/1788 [1:12:02<55:04, 3.97s/it] 53%|█████▎ | 956/1788 [1:12:04<48:14, 3.48s/it] 54%|█████▎ | 957/1788 [1:12:09<53:12, 3.84s/it] 54%|█████▎ | 958/1788 [1:12:14<59:27, 4.30s/it] 54%|█████▎ | 959/1788 [1:12:19<59:31, 4.31s/it] 54%|█████▎ | 960/1788 [1:12:23<1:01:10, 4.43s/it] {'loss': 0.0616, 'grad_norm': 2.889896869659424, 'learning_rate': 2.5761342448725917e-06, 'epoch': 1.61}
54%|█████▎ | 960/1788 [1:12:23<1:01:10, 4.43s/it] 54%|█████▎ | 961/1788 [1:12:29<1:07:11, 4.87s/it] 54%|█████▍ | 962/1788 [1:12:35<1:09:30, 5.05s/it] 54%|█████▍ | 963/1788 [1:12:39<1:05:21, 4.75s/it] 54%|█████▍ | 964/1788 [1:12:41<55:11, 4.02s/it] 54%|█████▍ | 965/1788 [1:12:46<1:01:08, 4.46s/it] 54%|█████▍ | 966/1788 [1:12:52<1:03:42, 4.65s/it] 54%|█████▍ | 967/1788 [1:12:56<1:03:56, 4.67s/it] 54%|█████▍ | 968/1788 [1:12:59<54:07, 3.96s/it] 54%|█████▍ | 969/1788 [1:13:04<59:55, 4.39s/it] 54%|█████▍ | 970/1788 [1:13:09<1:04:10, 4.71s/it] {'loss': 0.0979, 'grad_norm': 5.543972015380859, 'learning_rate': 2.5450590428837792e-06, 'epoch': 1.63}
54%|█████▍ | 970/1788 [1:13:09<1:04:10, 4.71s/it] 54%|█████▍ | 971/1788 [1:13:15<1:08:56, 5.06s/it] 54%|█████▍ | 972/1788 [1:13:21<1:11:13, 5.24s/it] 54%|█████▍ | 973/1788 [1:13:26<1:10:56, 5.22s/it] 54%|█████▍ | 974/1788 [1:13:30<1:03:24, 4.67s/it] 55%|█████▍ | 975/1788 [1:13:41<1:32:30, 6.83s/it] 55%|█████▍ | 976/1788 [1:13:46<1:23:27, 6.17s/it] 55%|█████▍ | 977/1788 [1:13:48<1:07:43, 5.01s/it] 55%|█████▍ | 978/1788 [1:13:54<1:09:15, 5.13s/it] 55%|█████▍ | 979/1788 [1:13:57<1:02:00, 4.60s/it] 55%|█████▍ | 980/1788 [1:14:01<58:09, 4.32s/it] {'loss': 0.0781, 'grad_norm': 3.9589877128601074, 'learning_rate': 2.5139838408949664e-06, 'epoch': 1.64}
55%|█████▍ | 980/1788 [1:14:01<58:09, 4.32s/it] 55%|█████▍ | 981/1788 [1:14:09<1:13:12, 5.44s/it] 55%|█████▍ | 982/1788 [1:14:11<1:00:35, 4.51s/it] 55%|█████▍ | 983/1788 [1:14:14<51:50, 3.86s/it] 55%|█████▌ | 984/1788 [1:14:16<45:26, 3.39s/it] 55%|█████▌ | 985/1788 [1:14:18<40:53, 3.06s/it] 55%|█████▌ | 986/1788 [1:14:20<37:41, 2.82s/it] 55%|█████▌ | 987/1788 [1:14:24<40:37, 3.04s/it] 55%|█████▌ | 988/1788 [1:14:29<47:02, 3.53s/it] 55%|█████▌ | 989/1788 [1:14:32<48:28, 3.64s/it] 55%|█████▌ | 990/1788 [1:14:38<54:30, 4.10s/it] {'loss': 0.0802, 'grad_norm': 10.734256744384766, 'learning_rate': 2.482908638906153e-06, 'epoch': 1.66}
55%|█████▌ | 990/1788 [1:14:38<54:30, 4.10s/it] 55%|█████▌ | 991/1788 [1:14:43<59:02, 4.45s/it] 55%|█████▌ | 992/1788 [1:14:45<50:22, 3.80s/it] 56%|█████▌ | 993/1788 [1:14:51<56:41, 4.28s/it] 56%|█████▌ | 994/1788 [1:14:54<54:06, 4.09s/it] 56%|█████▌ | 995/1788 [1:15:00<58:55, 4.46s/it] 56%|█████▌ | 996/1788 [1:15:04<57:05, 4.32s/it] 56%|█████▌ | 997/1788 [1:15:08<57:53, 4.39s/it] 56%|█████▌ | 998/1788 [1:15:12<57:37, 4.38s/it] 56%|█████▌ | 999/1788 [1:15:15<49:29, 3.76s/it] 56%|█████▌ | 1000/1788 [1:15:27<1:21:29, 6.20s/it] {'loss': 0.0659, 'grad_norm': 9.078588485717773, 'learning_rate': 2.45183343691734e-06, 'epoch': 1.68}
56%|█████▌ | 1000/1788 [1:15:27<1:21:29, 6.20s/it] 56%|█████▌ | 1001/1788 [1:15:30<1:11:32, 5.45s/it] 56%|█████▌ | 1002/1788 [1:15:33<58:56, 4.50s/it] 56%|█████▌ | 1003/1788 [1:15:38<1:02:10, 4.75s/it] 56%|█████▌ | 1004/1788 [1:15:43<1:01:33, 4.71s/it] 56%|█████▌ | 1005/1788 [1:15:45<52:01, 3.99s/it] 56%|█████▋ | 1006/1788 [1:15:47<45:16, 3.47s/it] 56%|█████▋ | 1007/1788 [1:15:49<40:30, 3.11s/it] 56%|█████▋ | 1008/1788 [1:15:52<37:19, 2.87s/it] 56%|█████▋ | 1009/1788 [1:15:57<44:41, 3.44s/it] 56%|█████▋ | 1010/1788 [1:16:01<49:33, 3.82s/it] {'loss': 0.0703, 'grad_norm': 4.64329195022583, 'learning_rate': 2.4207582349285273e-06, 'epoch': 1.69}
56%|█████▋ | 1010/1788 [1:16:01<49:33, 3.82s/it] 57%|█████▋ | 1011/1788 [1:16:04<43:34, 3.36s/it] 57%|█████▋ | 1012/1788 [1:16:07<43:47, 3.39s/it] 57%|█████▋ | 1013/1788 [1:16:09<39:31, 3.06s/it] 57%|█████▋ | 1014/1788 [1:16:12<36:23, 2.82s/it] 57%|█████▋ | 1015/1788 [1:16:16<43:53, 3.41s/it] 57%|█████▋ | 1016/1788 [1:16:19<39:32, 3.07s/it] 57%|█████▋ | 1017/1788 [1:16:21<36:47, 2.86s/it] 57%|█████▋ | 1018/1788 [1:16:26<43:52, 3.42s/it] 57%|█████▋ | 1019/1788 [1:16:29<43:47, 3.42s/it] 57%|█████▋ | 1020/1788 [1:16:31<39:27, 3.08s/it] {'loss': 0.0645, 'grad_norm': 4.664200305938721, 'learning_rate': 2.3896830329397145e-06, 'epoch': 1.71}
57%|█████▋ | 1020/1788 [1:16:31<39:27, 3.08s/it] 57%|█████▋ | 1021/1788 [1:16:36<46:58, 3.67s/it] 57%|█████▋ | 1022/1788 [1:16:40<46:33, 3.65s/it] 57%|█████▋ | 1023/1788 [1:16:44<46:31, 3.65s/it] 57%|█████▋ | 1024/1788 [1:16:50<55:12, 4.34s/it] 57%|█████▋ | 1025/1788 [1:16:56<1:01:12, 4.81s/it] 57%|█████▋ | 1026/1788 [1:17:00<57:50, 4.55s/it] 57%|█████▋ | 1027/1788 [1:17:04<55:56, 4.41s/it] 57%|█████▋ | 1028/1788 [1:17:08<57:31, 4.54s/it] 58%|█████▊ | 1029/1788 [1:17:14<1:00:54, 4.82s/it] 58%|█████▊ | 1030/1788 [1:17:18<59:56, 4.74s/it] {'loss': 0.0696, 'grad_norm': 4.186990737915039, 'learning_rate': 2.358607830950901e-06, 'epoch': 1.73}
58%|█████▊ | 1030/1788 [1:17:18<59:56, 4.74s/it] 58%|█████▊ | 1031/1788 [1:17:23<58:43, 4.65s/it] 58%|█████▊ | 1032/1788 [1:17:28<1:01:21, 4.87s/it] 58%|█████▊ | 1033/1788 [1:17:34<1:03:12, 5.02s/it] 58%|█████▊ | 1034/1788 [1:17:36<52:58, 4.22s/it] 58%|█████▊ | 1035/1788 [1:17:41<55:35, 4.43s/it] 58%|█████▊ | 1036/1788 [1:17:43<47:34, 3.80s/it] 58%|█████▊ | 1037/1788 [1:17:49<53:46, 4.30s/it] 58%|█████▊ | 1038/1788 [1:17:54<58:39, 4.69s/it] 58%|█████▊ | 1039/1788 [1:17:58<54:57, 4.40s/it] 58%|█████▊ | 1040/1788 [1:18:00<46:54, 3.76s/it] {'loss': 0.0783, 'grad_norm': 5.212913990020752, 'learning_rate': 2.3275326289620883e-06, 'epoch': 1.74}
58%|█████▊ | 1040/1788 [1:18:00<46:54, 3.76s/it] 58%|█████▊ | 1041/1788 [1:18:06<52:33, 4.22s/it] 58%|█████▊ | 1042/1788 [1:18:08<45:25, 3.65s/it] 58%|█████▊ | 1043/1788 [1:18:13<51:50, 4.17s/it] 58%|█████▊ | 1044/1788 [1:18:19<57:08, 4.61s/it] 58%|█████▊ | 1045/1788 [1:18:23<53:19, 4.31s/it] 59%|█████▊ | 1046/1788 [1:18:25<45:52, 3.71s/it] 59%|█████▊ | 1047/1788 [1:18:29<47:42, 3.86s/it] 59%|█████▊ | 1048/1788 [1:18:34<50:25, 4.09s/it] 59%|█████▊ | 1049/1788 [1:18:38<52:28, 4.26s/it] 59%|█████▊ | 1050/1788 [1:18:44<56:27, 4.59s/it] {'loss': 0.0686, 'grad_norm': 2.900564432144165, 'learning_rate': 2.2964574269732755e-06, 'epoch': 1.76}
59%|█████▊ | 1050/1788 [1:18:44<56:27, 4.59s/it] 59%|█████▉ | 1051/1788 [1:18:48<54:50, 4.46s/it] 59%|█████▉ | 1052/1788 [1:18:50<46:42, 3.81s/it] 59%|█████▉ | 1053/1788 [1:18:54<46:57, 3.83s/it] 59%|█████▉ | 1054/1788 [1:18:58<46:39, 3.81s/it] 59%|█████▉ | 1055/1788 [1:19:03<52:25, 4.29s/it] 59%|█████▉ | 1056/1788 [1:19:08<53:54, 4.42s/it] 59%|█████▉ | 1057/1788 [1:19:11<49:20, 4.05s/it] 59%|█████▉ | 1058/1788 [1:19:14<43:10, 3.55s/it] 59%|█████▉ | 1059/1788 [1:19:25<1:13:31, 6.05s/it] 59%|█████▉ | 1060/1788 [1:19:31<1:10:57, 5.85s/it] {'loss': 0.0802, 'grad_norm': 2.744717597961426, 'learning_rate': 2.2653822249844626e-06, 'epoch': 1.78}
59%|█████▉ | 1060/1788 [1:19:31<1:10:57, 5.85s/it] 59%|█████▉ | 1061/1788 [1:19:37<1:11:18, 5.89s/it] 59%|█████▉ | 1062/1788 [1:19:39<58:17, 4.82s/it] 59%|█████▉ | 1063/1788 [1:19:51<1:23:57, 6.95s/it] 60%|█████▉ | 1064/1788 [1:19:56<1:18:23, 6.50s/it] 60%|█████▉ | 1065/1788 [1:19:59<1:03:13, 5.25s/it] 60%|█████▉ | 1066/1788 [1:20:04<1:02:38, 5.21s/it] 60%|█████▉ | 1067/1788 [1:20:06<51:58, 4.33s/it] 60%|█████▉ | 1068/1788 [1:20:08<44:14, 3.69s/it] 60%|█████▉ | 1069/1788 [1:20:11<38:42, 3.23s/it] 60%|█████▉ | 1070/1788 [1:20:13<35:09, 2.94s/it] {'loss': 0.0828, 'grad_norm': 4.751100540161133, 'learning_rate': 2.2343070229956497e-06, 'epoch': 1.8}
60%|█████▉ | 1070/1788 [1:20:13<35:09, 2.94s/it] 60%|█████▉ | 1071/1788 [1:20:15<32:40, 2.73s/it] 60%|█████▉ | 1072/1788 [1:20:17<31:07, 2.61s/it] 60%|██████ | 1073/1788 [1:20:20<29:57, 2.51s/it] 60%|██████ | 1074/1788 [1:20:22<29:08, 2.45s/it] 60%|██████ | 1075/1788 [1:20:25<32:59, 2.78s/it] 60%|██████ | 1076/1788 [1:20:32<44:56, 3.79s/it] 60%|██████ | 1077/1788 [1:20:38<52:53, 4.46s/it] 60%|██████ | 1078/1788 [1:20:40<45:14, 3.82s/it] 60%|██████ | 1079/1788 [1:20:46<52:37, 4.45s/it] 60%|██████ | 1080/1788 [1:20:48<44:55, 3.81s/it] {'loss': 0.0662, 'grad_norm': 3.732292890548706, 'learning_rate': 2.203231821006837e-06, 'epoch': 1.81}
60%|██████ | 1080/1788 [1:20:48<44:55, 3.81s/it] 60%|██████ | 1081/1788 [1:20:51<39:28, 3.35s/it] 61%|██████ | 1082/1788 [1:21:02<1:09:36, 5.92s/it] 61%|██████ | 1083/1788 [1:21:08<1:07:28, 5.74s/it] 61%|██████ | 1084/1788 [1:21:13<1:05:55, 5.62s/it] 61%|██████ | 1085/1788 [1:21:18<1:04:05, 5.47s/it] 61%|██████ | 1086/1788 [1:21:23<1:01:39, 5.27s/it] 61%|██████ | 1087/1788 [1:21:28<1:01:17, 5.25s/it] 61%|██████ | 1088/1788 [1:21:32<55:40, 4.77s/it] 61%|██████ | 1089/1788 [1:21:34<46:57, 4.03s/it] 61%|██████ | 1090/1788 [1:21:36<40:37, 3.49s/it] {'loss': 0.0648, 'grad_norm': 5.761747360229492, 'learning_rate': 2.1721566190180236e-06, 'epoch': 1.83}
61%|██████ | 1090/1788 [1:21:36<40:37, 3.49s/it] 61%|██████ | 1091/1788 [1:21:39<36:13, 3.12s/it] 61%|██████ | 1092/1788 [1:21:41<33:19, 2.87s/it] 61%|██████ | 1093/1788 [1:21:44<35:25, 3.06s/it] 61%|██████ | 1094/1788 [1:21:50<45:14, 3.91s/it] 61%|██████ | 1095/1788 [1:21:53<39:38, 3.43s/it] 61%|██████▏ | 1096/1788 [1:21:56<40:35, 3.52s/it] 61%|██████▏ | 1097/1788 [1:21:59<36:20, 3.15s/it] 61%|██████▏ | 1098/1788 [1:22:01<33:15, 2.89s/it] 61%|██████▏ | 1099/1788 [1:22:05<35:32, 3.09s/it] 62%|██████▏ | 1100/1788 [1:22:10<43:01, 3.75s/it] {'loss': 0.0577, 'grad_norm': 2.7222814559936523, 'learning_rate': 2.1410814170292107e-06, 'epoch': 1.85}
62%|██████▏ | 1100/1788 [1:22:10<43:01, 3.75s/it] 62%|██████▏ | 1101/1788 [1:22:12<37:51, 3.31s/it] 62%|██████▏ | 1102/1788 [1:22:18<45:03, 3.94s/it] 62%|██████▏ | 1103/1788 [1:22:23<49:16, 4.32s/it] 62%|██████▏ | 1104/1788 [1:22:27<50:20, 4.42s/it] 62%|██████▏ | 1105/1788 [1:22:30<43:01, 3.78s/it] 62%|██████▏ | 1106/1788 [1:22:36<50:17, 4.42s/it] 62%|██████▏ | 1107/1788 [1:22:40<50:39, 4.46s/it] 62%|██████▏ | 1108/1788 [1:22:46<55:51, 4.93s/it] 62%|██████▏ | 1109/1788 [1:22:51<56:50, 5.02s/it] 62%|██████▏ | 1110/1788 [1:22:57<57:58, 5.13s/it] {'loss': 0.0754, 'grad_norm': 5.658672332763672, 'learning_rate': 2.110006215040398e-06, 'epoch': 1.86}
62%|██████▏ | 1110/1788 [1:22:57<57:58, 5.13s/it] 62%|██████▏ | 1111/1788 [1:22:59<48:36, 4.31s/it] 62%|██████▏ | 1112/1788 [1:23:04<49:52, 4.43s/it] 62%|██████▏ | 1113/1788 [1:23:08<50:14, 4.47s/it] 62%|██████▏ | 1114/1788 [1:23:11<42:53, 3.82s/it] 62%|██████▏ | 1115/1788 [1:23:17<49:58, 4.46s/it] 62%|██████▏ | 1116/1788 [1:23:22<52:25, 4.68s/it] 62%|██████▏ | 1117/1788 [1:23:25<48:22, 4.33s/it] 63%|██████▎ | 1118/1788 [1:23:37<1:13:42, 6.60s/it] 63%|██████▎ | 1119/1788 [1:23:41<1:04:36, 5.79s/it] 63%|██████▎ | 1120/1788 [1:23:43<52:48, 4.74s/it] {'loss': 0.0726, 'grad_norm': 8.959904670715332, 'learning_rate': 2.078931013051585e-06, 'epoch': 1.88}
63%|██████▎ | 1120/1788 [1:23:43<52:48, 4.74s/it] 63%|██████▎ | 1121/1788 [1:23:48<52:13, 4.70s/it] 63%|██████▎ | 1122/1788 [1:23:50<44:12, 3.98s/it] 63%|██████▎ | 1123/1788 [1:23:56<48:11, 4.35s/it] 63%|██████▎ | 1124/1788 [1:24:00<47:03, 4.25s/it] 63%|██████▎ | 1125/1788 [1:24:02<40:34, 3.67s/it] 63%|██████▎ | 1126/1788 [1:24:06<43:07, 3.91s/it] 63%|██████▎ | 1127/1788 [1:24:10<41:04, 3.73s/it] 63%|██████▎ | 1128/1788 [1:24:12<36:22, 3.31s/it] 63%|██████▎ | 1129/1788 [1:24:15<36:42, 3.34s/it] 63%|██████▎ | 1130/1788 [1:24:18<33:09, 3.02s/it] {'loss': 0.079, 'grad_norm': 5.814493179321289, 'learning_rate': 2.047855811062772e-06, 'epoch': 1.9}
63%|██████▎ | 1130/1788 [1:24:18<33:09, 3.02s/it] 63%|██████▎ | 1131/1788 [1:24:20<30:53, 2.82s/it] 63%|██████▎ | 1132/1788 [1:24:22<28:57, 2.65s/it] 63%|██████▎ | 1133/1788 [1:24:28<40:11, 3.68s/it] 63%|██████▎ | 1134/1788 [1:24:31<35:58, 3.30s/it] 63%|██████▎ | 1135/1788 [1:24:34<36:59, 3.40s/it] 64%|██████▎ | 1136/1788 [1:24:37<33:21, 3.07s/it] 64%|██████▎ | 1137/1788 [1:24:39<30:54, 2.85s/it] 64%|██████▎ | 1138/1788 [1:24:41<29:06, 2.69s/it] 64%|██████▎ | 1139/1788 [1:24:44<27:42, 2.56s/it] 64%|██████▍ | 1140/1788 [1:24:48<34:36, 3.20s/it] {'loss': 0.0761, 'grad_norm': 6.705260276794434, 'learning_rate': 2.0167806090739593e-06, 'epoch': 1.91}
64%|██████▍ | 1140/1788 [1:24:48<34:36, 3.20s/it] 64%|██████▍ | 1141/1788 [1:24:54<43:29, 4.03s/it] 64%|██████▍ | 1142/1788 [1:25:00<47:07, 4.38s/it] 64%|██████▍ | 1143/1788 [1:25:04<47:51, 4.45s/it] 64%|██████▍ | 1144/1788 [1:25:06<40:46, 3.80s/it] 64%|██████▍ | 1145/1788 [1:25:12<46:32, 4.34s/it] 64%|██████▍ | 1146/1788 [1:25:14<39:57, 3.73s/it] 64%|██████▍ | 1147/1788 [1:25:18<39:42, 3.72s/it] 64%|██████▍ | 1148/1788 [1:25:23<44:17, 4.15s/it] 64%|██████▍ | 1149/1788 [1:25:25<38:17, 3.60s/it] 64%|██████▍ | 1150/1788 [1:25:28<34:12, 3.22s/it] {'loss': 0.0842, 'grad_norm': 2.0799975395202637, 'learning_rate': 1.985705407085146e-06, 'epoch': 1.93}
64%|██████▍ | 1150/1788 [1:25:28<34:12, 3.22s/it] 64%|██████▍ | 1151/1788 [1:25:32<36:34, 3.45s/it] 64%|██████▍ | 1152/1788 [1:25:36<37:46, 3.56s/it] 64%|██████▍ | 1153/1788 [1:25:41<43:22, 4.10s/it] 65%|██████▍ | 1154/1788 [1:25:43<37:28, 3.55s/it] 65%|██████▍ | 1155/1788 [1:25:48<42:33, 4.03s/it] 65%|██████▍ | 1156/1788 [1:25:54<46:32, 4.42s/it] 65%|██████▍ | 1157/1788 [1:25:57<43:39, 4.15s/it] 65%|██████▍ | 1158/1788 [1:26:00<37:41, 3.59s/it] 65%|██████▍ | 1159/1788 [1:26:02<33:25, 3.19s/it] 65%|██████▍ | 1160/1788 [1:26:04<30:21, 2.90s/it] {'loss': 0.0668, 'grad_norm': 4.491452217102051, 'learning_rate': 1.9546302050963336e-06, 'epoch': 1.95}
65%|██████▍ | 1160/1788 [1:26:04<30:21, 2.90s/it] 65%|██████▍ | 1161/1788 [1:26:06<28:31, 2.73s/it] 65%|██████▍ | 1162/1788 [1:26:10<31:19, 3.00s/it] 65%|██████▌ | 1163/1788 [1:26:13<31:39, 3.04s/it] 65%|██████▌ | 1164/1788 [1:26:18<37:15, 3.58s/it] 65%|██████▌ | 1165/1788 [1:26:22<37:21, 3.60s/it] 65%|██████▌ | 1166/1788 [1:26:24<33:48, 3.26s/it] 65%|██████▌ | 1167/1788 [1:26:26<30:49, 2.98s/it] 65%|██████▌ | 1168/1788 [1:26:32<37:40, 3.65s/it] 65%|██████▌ | 1169/1788 [1:26:34<33:20, 3.23s/it] 65%|██████▌ | 1170/1788 [1:26:38<37:29, 3.64s/it] {'loss': 0.0745, 'grad_norm': 4.367901802062988, 'learning_rate': 1.9235550031075203e-06, 'epoch': 1.96}
65%|██████▌ | 1170/1788 [1:26:38<37:29, 3.64s/it] 65%|██████▌ | 1171/1788 [1:26:41<33:33, 3.26s/it] 66%|██████▌ | 1172/1788 [1:26:46<39:17, 3.83s/it] 66%|██████▌ | 1173/1788 [1:26:48<34:12, 3.34s/it] 66%|██████▌ | 1174/1788 [1:26:50<30:50, 3.01s/it] 66%|██████▌ | 1175/1788 [1:26:53<28:31, 2.79s/it] 66%|██████▌ | 1176/1788 [1:26:58<35:41, 3.50s/it] 66%|██████▌ | 1177/1788 [1:27:03<41:11, 4.05s/it] 66%|██████▌ | 1178/1788 [1:27:05<35:47, 3.52s/it] 66%|██████▌ | 1179/1788 [1:27:10<39:48, 3.92s/it] 66%|██████▌ | 1180/1788 [1:27:14<39:26, 3.89s/it] {'loss': 0.0709, 'grad_norm': 3.3935399055480957, 'learning_rate': 1.8924798011187074e-06, 'epoch': 1.98}
66%|██████▌ | 1180/1788 [1:27:14<39:26, 3.89s/it] 66%|██████▌ | 1181/1788 [1:27:16<34:22, 3.40s/it] 66%|██████▌ | 1182/1788 [1:27:19<30:58, 3.07s/it] 66%|██████▌ | 1183/1788 [1:27:21<28:36, 2.84s/it] 66%|██████▌ | 1184/1788 [1:27:23<26:48, 2.66s/it] 66%|██████▋ | 1185/1788 [1:27:26<25:29, 2.54s/it] 66%|██████▋ | 1186/1788 [1:27:28<25:06, 2.50s/it] 66%|██████▋ | 1187/1788 [1:27:31<27:40, 2.76s/it] 66%|██████▋ | 1188/1788 [1:27:34<26:18, 2.63s/it] 66%|██████▋ | 1189/1788 [1:27:36<25:04, 2.51s/it] 67%|██████▋ | 1190/1788 [1:27:38<24:23, 2.45s/it] {'loss': 0.0538, 'grad_norm': 3.4000391960144043, 'learning_rate': 1.8614045991298946e-06, 'epoch': 2.0}
67%|██████▋ | 1190/1788 [1:27:38<24:23, 2.45s/it] 67%|██████▋ | 1191/1788 [1:27:44<33:11, 3.34s/it] 67%|██████▋ | 1192/1788 [1:27:45<28:40, 2.89s/it]
---------------------------*Rank 7: refresh data---------------------------
---------------------------*Rank 0: refresh data---------------------------
---------------------------*Rank 2: refresh data---------------------------
---------------------------*Rank 6: refresh data---------------------------
---------------------------*Rank 3: refresh data---------------------------
---------------------------*Rank 5: refresh data---------------------------
---------------------------*Rank 1: refresh data---------------------------
---------------------------*Rank 4: refresh data---------------------------
67%|██████▋ | 1193/1788 [1:27:51<36:45, 3.71s/it] 67%|██████▋ | 1194/1788 [1:27:55<37:09, 3.75s/it] 67%|██████▋ | 1195/1788 [1:28:00<41:59, 4.25s/it] 67%|██████▋ | 1196/1788 [1:28:06<46:57, 4.76s/it] 67%|██████▋ | 1197/1788 [1:28:10<44:48, 4.55s/it] 67%|██████▋ | 1198/1788 [1:28:14<43:12, 4.39s/it] 67%|██████▋ | 1199/1788 [1:28:17<37:01, 3.77s/it] 67%|██████▋ | 1200/1788 [1:28:21<38:03, 3.88s/it] {'loss': 0.064, 'grad_norm': 1.7232855558395386, 'learning_rate': 1.8303293971410815e-06, 'epoch': 2.01}
67%|██████▋ | 1200/1788 [1:28:21<38:03, 3.88s/it] 67%|██████▋ | 1201/1788 [1:28:25<37:48, 3.86s/it] 67%|██████▋ | 1202/1788 [1:28:27<33:13, 3.40s/it] 67%|██████▋ | 1203/1788 [1:28:31<33:46, 3.46s/it] 67%|██████▋ | 1204/1788 [1:28:33<30:15, 3.11s/it] 67%|██████▋ | 1205/1788 [1:28:35<27:50, 2.87s/it] 67%|██████▋ | 1206/1788 [1:28:41<35:35, 3.67s/it] 68%|██████▊ | 1207/1788 [1:30:01<4:18:51, 26.73s/it] 68%|██████▊ | 1208/1788 [1:30:05<3:11:42, 19.83s/it] 68%|██████▊ | 1209/1788 [1:30:11<2:31:03, 15.65s/it] 68%|██████▊ | 1210/1788 [1:30:13<1:52:07, 11.64s/it] {'loss': 0.07, 'grad_norm': 5.242465496063232, 'learning_rate': 1.7992541951522686e-06, 'epoch': 2.03}
68%|██████▊ | 1210/1788 [1:30:13<1:52:07, 11.64s/it] 68%|██████▊ | 1211/1788 [1:30:18<1:33:53, 9.76s/it] 68%|██████▊ | 1212/1788 [1:30:23<1:17:08, 8.04s/it] 68%|██████▊ | 1213/1788 [1:30:26<1:05:19, 6.82s/it] 68%|██████▊ | 1214/1788 [1:30:29<52:18, 5.47s/it] 68%|██████▊ | 1215/1788 [1:30:33<49:49, 5.22s/it] 68%|██████▊ | 1216/1788 [1:30:39<51:18, 5.38s/it] 68%|██████▊ | 1217/1788 [1:30:42<42:28, 4.46s/it] 68%|██████▊ | 1218/1788 [1:30:47<46:35, 4.90s/it] 68%|██████▊ | 1219/1788 [1:30:53<49:26, 5.21s/it] 68%|██████▊ | 1220/1788 [1:30:58<47:54, 5.06s/it] {'loss': 0.0565, 'grad_norm': 36.61115264892578, 'learning_rate': 1.7681789931634558e-06, 'epoch': 2.05}
68%|██████▊ | 1220/1788 [1:30:58<47:54, 5.06s/it] 68%|██████▊ | 1221/1788 [1:31:00<39:57, 4.23s/it] 68%|██████▊ | 1222/1788 [1:31:03<34:31, 3.66s/it] 68%|██████▊ | 1223/1788 [1:31:05<30:31, 3.24s/it] 68%|██████▊ | 1224/1788 [1:31:11<37:11, 3.96s/it] 69%|██████▊ | 1225/1788 [1:31:15<37:16, 3.97s/it] 69%|██████▊ | 1226/1788 [1:31:20<41:08, 4.39s/it] 69%|██████▊ | 1227/1788 [1:31:24<38:57, 4.17s/it] 69%|██████▊ | 1228/1788 [1:31:28<38:30, 4.13s/it] 69%|██████▊ | 1229/1788 [1:31:31<37:27, 4.02s/it] 69%|██████▉ | 1230/1788 [1:31:34<32:34, 3.50s/it] {'loss': 0.04, 'grad_norm': 13.257882118225098, 'learning_rate': 1.7371037911746427e-06, 'epoch': 2.06}
69%|██████▉ | 1230/1788 [1:31:34<32:34, 3.50s/it] 69%|██████▉ | 1231/1788 [1:31:36<29:04, 3.13s/it] 69%|██████▉ | 1232/1788 [1:31:38<26:40, 2.88s/it] 69%|██████▉ | 1233/1788 [1:31:40<24:48, 2.68s/it] 69%|██████▉ | 1234/1788 [1:31:43<23:38, 2.56s/it] 69%|██████▉ | 1235/1788 [1:31:48<31:17, 3.39s/it] 69%|██████▉ | 1236/1788 [1:31:50<28:06, 3.06s/it] 69%|██████▉ | 1237/1788 [1:31:53<25:50, 2.81s/it] 69%|██████▉ | 1238/1788 [1:31:55<24:49, 2.71s/it] 69%|██████▉ | 1239/1788 [1:32:00<32:05, 3.51s/it] 69%|██████▉ | 1240/1788 [1:32:03<28:27, 3.12s/it] {'loss': 0.0496, 'grad_norm': 1.6459987163543701, 'learning_rate': 1.70602858918583e-06, 'epoch': 2.08}
69%|██████▉ | 1240/1788 [1:32:03<28:27, 3.12s/it] 69%|██████▉ | 1241/1788 [1:32:06<28:28, 3.12s/it] 69%|██████▉ | 1242/1788 [1:32:11<33:43, 3.71s/it] 70%|██████▉ | 1243/1788 [1:32:13<29:54, 3.29s/it] 70%|██████▉ | 1244/1788 [1:32:18<33:07, 3.65s/it] 70%|██████▉ | 1245/1788 [1:32:20<29:27, 3.25s/it] 70%|██████▉ | 1246/1788 [1:32:32<52:44, 5.84s/it] 70%|██████▉ | 1247/1788 [1:32:44<1:08:58, 7.65s/it] 70%|██████▉ | 1248/1788 [1:32:47<56:43, 6.30s/it] 70%|██████▉ | 1249/1788 [1:32:49<45:47, 5.10s/it] 70%|██████▉ | 1250/1788 [1:32:54<46:09, 5.15s/it] {'loss': 0.0748, 'grad_norm': 3.4806902408599854, 'learning_rate': 1.674953387197017e-06, 'epoch': 2.1}
70%|██████▉ | 1250/1788 [1:32:54<46:09, 5.15s/it] 70%|██████▉ | 1251/1788 [1:32:58<41:54, 4.68s/it] 70%|███████ | 1252/1788 [1:33:03<41:46, 4.68s/it] 70%|███████ | 1253/1788 [1:33:08<43:39, 4.90s/it] 70%|███████ | 1254/1788 [1:33:13<43:06, 4.84s/it] 70%|███████ | 1255/1788 [1:33:18<44:01, 4.96s/it] 70%|███████ | 1256/1788 [1:33:24<45:19, 5.11s/it] 70%|███████ | 1257/1788 [1:33:30<47:37, 5.38s/it] 70%|███████ | 1258/1788 [1:33:35<46:56, 5.31s/it] 70%|███████ | 1259/1788 [1:33:39<43:20, 4.92s/it] 70%|███████ | 1260/1788 [1:33:43<42:41, 4.85s/it] {'loss': 0.0807, 'grad_norm': 3.7441959381103516, 'learning_rate': 1.6438781852082039e-06, 'epoch': 2.11}
70%|███████ | 1260/1788 [1:33:43<42:41, 4.85s/it] 71%|███████ | 1261/1788 [1:33:49<44:27, 5.06s/it] 71%|███████ | 1262/1788 [1:33:51<37:09, 4.24s/it] 71%|███████ | 1263/1788 [1:33:57<40:40, 4.65s/it] 71%|███████ | 1264/1788 [1:34:01<37:53, 4.34s/it] 71%|███████ | 1265/1788 [1:34:03<32:28, 3.73s/it] 71%|███████ | 1266/1788 [1:34:08<36:35, 4.21s/it] 71%|███████ | 1267/1788 [1:34:10<31:41, 3.65s/it] 71%|███████ | 1268/1788 [1:34:16<35:43, 4.12s/it] 71%|███████ | 1269/1788 [1:34:20<36:19, 4.20s/it] 71%|███████ | 1270/1788 [1:34:22<31:24, 3.64s/it] {'loss': 0.0674, 'grad_norm': 4.848166465759277, 'learning_rate': 1.6128029832193912e-06, 'epoch': 2.13}
71%|███████ | 1270/1788 [1:34:22<31:24, 3.64s/it] 71%|███████ | 1271/1788 [1:34:26<30:56, 3.59s/it] 71%|███████ | 1272/1788 [1:34:31<35:30, 4.13s/it] 71%|███████ | 1273/1788 [1:34:37<38:51, 4.53s/it] 71%|███████▏ | 1274/1788 [1:34:43<42:27, 4.96s/it] 71%|███████▏ | 1275/1788 [1:34:46<38:13, 4.47s/it] 71%|███████▏ | 1276/1788 [1:34:50<36:10, 4.24s/it] 71%|███████▏ | 1277/1788 [1:34:55<39:17, 4.61s/it] 71%|███████▏ | 1278/1788 [1:34:58<33:17, 3.92s/it] 72%|███████▏ | 1279/1788 [1:35:03<36:08, 4.26s/it] 72%|███████▏ | 1280/1788 [1:35:05<31:08, 3.68s/it] {'loss': 0.0667, 'grad_norm': 4.819372653961182, 'learning_rate': 1.5817277812305782e-06, 'epoch': 2.15}
72%|███████▏ | 1280/1788 [1:35:05<31:08, 3.68s/it] 72%|███████▏ | 1281/1788 [1:35:10<33:33, 3.97s/it] 72%|███████▏ | 1282/1788 [1:35:14<34:08, 4.05s/it] 72%|███████▏ | 1283/1788 [1:35:16<29:42, 3.53s/it] 72%|███████▏ | 1284/1788 [1:35:21<33:37, 4.00s/it] 72%|███████▏ | 1285/1788 [1:35:24<29:18, 3.50s/it] 72%|███████▏ | 1286/1788 [1:35:29<33:30, 4.00s/it] 72%|███████▏ | 1287/1788 [1:35:33<34:57, 4.19s/it] 72%|███████▏ | 1288/1788 [1:35:38<37:21, 4.48s/it] 72%|███████▏ | 1289/1788 [1:35:43<37:56, 4.56s/it] 72%|███████▏ | 1290/1788 [1:35:46<32:42, 3.94s/it] {'loss': 0.0446, 'grad_norm': 2.3166191577911377, 'learning_rate': 1.550652579241765e-06, 'epoch': 2.16}
72%|███████▏ | 1290/1788 [1:35:46<32:42, 3.94s/it] 72%|███████▏ | 1291/1788 [1:35:48<28:20, 3.42s/it] 72%|███████▏ | 1292/1788 [1:35:50<25:28, 3.08s/it] 72%|███████▏ | 1293/1788 [1:35:55<29:27, 3.57s/it] 72%|███████▏ | 1294/1788 [1:36:00<33:16, 4.04s/it] 72%|███████▏ | 1295/1788 [1:36:05<36:02, 4.39s/it] 72%|███████▏ | 1296/1788 [1:36:10<36:38, 4.47s/it] 73%|███████▎ | 1297/1788 [1:36:12<31:13, 3.82s/it] 73%|███████▎ | 1298/1788 [1:36:18<34:50, 4.27s/it] 73%|███████▎ | 1299/1788 [1:36:20<29:55, 3.67s/it] 73%|███████▎ | 1300/1788 [1:36:22<26:33, 3.27s/it] {'loss': 0.0521, 'grad_norm': 4.0214924812316895, 'learning_rate': 1.5195773772529524e-06, 'epoch': 2.18}
73%|███████▎ | 1300/1788 [1:36:22<26:33, 3.27s/it] 73%|███████▎ | 1301/1788 [1:36:26<27:39, 3.41s/it] 73%|███████▎ | 1302/1788 [1:36:28<24:41, 3.05s/it] 73%|███████▎ | 1303/1788 [1:36:30<22:51, 2.83s/it] 73%|███████▎ | 1304/1788 [1:36:35<26:40, 3.31s/it] 73%|███████▎ | 1305/1788 [1:36:40<31:40, 3.93s/it] 73%|███████▎ | 1306/1788 [1:36:43<29:59, 3.73s/it] 73%|███████▎ | 1307/1788 [1:36:46<26:38, 3.32s/it] 73%|███████▎ | 1308/1788 [1:36:50<28:15, 3.53s/it] 73%|███████▎ | 1309/1788 [1:36:54<30:47, 3.86s/it] 73%|███████▎ | 1310/1788 [1:37:00<35:02, 4.40s/it] {'loss': 0.0476, 'grad_norm': 5.766311168670654, 'learning_rate': 1.4885021752641394e-06, 'epoch': 2.2}
73%|███████▎ | 1310/1788 [1:37:00<35:02, 4.40s/it] 73%|███████▎ | 1311/1788 [1:37:02<29:55, 3.76s/it] 73%|███████▎ | 1312/1788 [1:37:05<26:16, 3.31s/it] 73%|███████▎ | 1313/1788 [1:37:07<23:53, 3.02s/it] 73%|███████▎ | 1314/1788 [1:37:12<29:33, 3.74s/it] 74%|███████▎ | 1315/1788 [1:37:15<26:05, 3.31s/it] 74%|███████▎ | 1316/1788 [1:37:17<23:33, 3.00s/it] 74%|███████▎ | 1317/1788 [1:37:19<21:49, 2.78s/it] 74%|███████▎ | 1318/1788 [1:37:25<27:30, 3.51s/it] 74%|███████▍ | 1319/1788 [1:37:36<47:05, 6.02s/it] 74%|███████▍ | 1320/1788 [1:37:42<47:00, 6.03s/it] {'loss': 0.057, 'grad_norm': 3.2234015464782715, 'learning_rate': 1.4574269732753263e-06, 'epoch': 2.21}
74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1320/1788 [1:37:42<47:00, 6.03s/it] 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1321/1788 [1:37:46<41:22, 5.31s/it] 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1322/1788 [1:37:51<39:44, 5.12s/it] 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1323/1788 [1:37:53<33:05, 4.27s/it] 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1324/1788 [1:37:57<31:50, 4.12s/it] 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1325/1788 [1:37:59<27:35, 3.58s/it] 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1326/1788 [1:38:05<32:57, 4.28s/it] 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1327/1788 [1:38:10<34:59, 4.55s/it] 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1328/1788 [1:38:15<34:41, 4.53s/it] 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1329/1788 [1:38:19<33:23, 4.37s/it] 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1330/1788 [1:38:25<36:46, 4.82s/it] {'loss': 0.0689, 'grad_norm': 3.8104610443115234, 'learning_rate': 1.4263517712865136e-06, 'epoch': 2.23}
74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1330/1788 [1:38:25<36:46, 4.82s/it] 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1331/1788 [1:38:27<30:58, 4.07s/it] 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1332/1788 [1:38:31<30:57, 4.07s/it] 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1333/1788 [1:38:36<32:21, 4.27s/it] 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1334/1788 [1:38:38<27:37, 3.65s/it] 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1335/1788 [1:38:40<24:32, 3.25s/it] 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1336/1788 [1:38:44<25:18, 3.36s/it] 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1337/1788 [1:38:46<22:39, 3.01s/it] 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1338/1788 [1:38:51<27:55, 3.72s/it] 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1339/1788 [1:38:56<29:22, 3.93s/it] 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1340/1788 [1:39:00<30:14, 4.05s/it] {'loss': 0.0633, 'grad_norm': 2.6560747623443604, 'learning_rate': 1.3952765692977006e-06, 'epoch': 2.25}
75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1340/1788 [1:39:00<30:14, 4.05s/it] 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1341/1788 [1:39:02<26:09, 3.51s/it] 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1342/1788 [1:39:05<23:17, 3.13s/it] 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1343/1788 [1:39:08<23:23, 3.15s/it] 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1344/1788 [1:39:12<25:40, 3.47s/it] 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1345/1788 [1:39:17<29:24, 3.98s/it] 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1346/1788 [1:39:21<29:21, 3.99s/it] 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1347/1788 [1:39:24<25:33, 3.48s/it] 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1348/1788 [1:39:28<26:50, 3.66s/it] 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1349/1788 [1:39:32<29:14, 4.00s/it] 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1350/1788 [1:39:35<25:24, 3.48s/it] {'loss': 0.0526, 'grad_norm': 3.2441322803497314, 'learning_rate': 1.3642013673088877e-06, 'epoch': 2.27}
76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1350/1788 [1:39:35<25:24, 3.48s/it] 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1351/1788 [1:39:37<22:51, 3.14s/it] 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1352/1788 [1:39:42<27:45, 3.82s/it] 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1353/1788 [1:39:46<28:10, 3.89s/it] 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1354/1788 [1:39:52<31:25, 4.34s/it] 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1355/1788 [1:39:54<27:00, 3.74s/it] 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1356/1788 [1:39:59<28:45, 3.99s/it] 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1357/1788 [1:40:04<31:30, 4.39s/it] 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1358/1788 [1:40:06<27:07, 3.79s/it] 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1359/1788 [1:40:12<31:39, 4.43s/it] 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1360/1788 [1:40:18<33:45, 4.73s/it] {'loss': 0.056, 'grad_norm': 4.1473517417907715, 'learning_rate': 1.3331261653200746e-06, 'epoch': 2.28}
76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1360/1788 [1:40:18<33:45, 4.73s/it] 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1361/1788 [1:40:24<36:05, 5.07s/it] 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1362/1788 [1:40:26<30:05, 4.24s/it] 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1363/1788 [1:40:31<31:03, 4.38s/it] 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1364/1788 [1:40:36<33:07, 4.69s/it] 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1365/1788 [1:40:41<34:19, 4.87s/it] 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1366/1788 [1:40:44<28:30, 4.05s/it] 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1367/1788 [1:40:46<24:37, 3.51s/it] 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1368/1788 [1:40:48<21:56, 3.13s/it] 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1369/1788 [1:40:53<26:22, 3.78s/it] 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1370/1788 [1:40:56<23:11, 3.33s/it] {'loss': 0.065, 'grad_norm': 3.064603567123413, 'learning_rate': 1.3020509633312618e-06, 'epoch': 2.3}
77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1370/1788 [1:40:56<23:11, 3.33s/it] 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1371/1788 [1:41:01<26:52, 3.87s/it] 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1372/1788 [1:41:04<26:32, 3.83s/it] 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1373/1788 [1:41:07<23:22, 3.38s/it] 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1374/1788 [1:41:11<24:39, 3.57s/it] 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1375/1788 [1:41:16<26:57, 3.92s/it] 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1376/1788 [1:41:20<28:37, 4.17s/it] 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1377/1788 [1:41:24<27:22, 4.00s/it] 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1378/1788 [1:41:26<23:48, 3.48s/it] 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1379/1788 [1:41:28<21:13, 3.11s/it] 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1380/1788 [1:41:31<19:19, 2.84s/it] {'loss': 0.0455, 'grad_norm': 2.379159688949585, 'learning_rate': 1.270975761342449e-06, 'epoch': 2.32}
77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1380/1788 [1:41:31<19:19, 2.84s/it] 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1381/1788 [1:41:34<20:59, 3.09s/it] 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1382/1788 [1:41:39<24:15, 3.59s/it] 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1383/1788 [1:41:43<25:30, 3.78s/it] 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1384/1788 [1:41:49<28:42, 4.26s/it] 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1385/1788 [1:41:53<28:07, 4.19s/it] 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1386/1788 [1:41:58<30:34, 4.56s/it] 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1387/1788 [1:42:03<30:47, 4.61s/it] 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1388/1788 [1:42:05<26:11, 3.93s/it] 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1389/1788 [1:42:11<30:06, 4.53s/it] 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1390/1788 [1:42:15<29:17, 4.41s/it] {'loss': 0.0629, 'grad_norm': 2.8858001232147217, 'learning_rate': 1.2399005593536358e-06, 'epoch': 2.33}
78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1390/1788 [1:42:15<29:17, 4.41s/it] 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1391/1788 [1:42:17<24:50, 3.75s/it] 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1392/1788 [1:42:20<21:46, 3.30s/it] 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1393/1788 [1:42:22<19:44, 3.00s/it] 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1394/1788 [1:42:26<22:08, 3.37s/it] 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1395/1788 [1:42:30<22:01, 3.36s/it] 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1396/1788 [1:42:32<19:48, 3.03s/it] 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1397/1788 [1:42:34<18:20, 2.82s/it] 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1398/1788 [1:42:36<17:17, 2.66s/it] 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1399/1788 [1:42:41<21:19, 3.29s/it] 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1400/1788 [1:42:43<18:47, 2.91s/it] {'loss': 0.0602, 'grad_norm': 3.2534356117248535, 'learning_rate': 1.208825357364823e-06, 'epoch': 2.35}
78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1400/1788 [1:42:43<18:47, 2.91s/it] 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1401/1788 [1:42:48<22:05, 3.42s/it] 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1402/1788 [1:42:52<24:00, 3.73s/it] 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1403/1788 [1:43:04<39:34, 6.17s/it] 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1404/1788 [1:43:08<34:37, 5.41s/it] 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1405/1788 [1:43:10<28:29, 4.46s/it] 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1406/1788 [1:43:12<24:13, 3.80s/it] 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1407/1788 [1:43:18<27:09, 4.28s/it] 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1408/1788 [1:43:23<29:11, 4.61s/it] 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1409/1788 [1:43:27<27:57, 4.43s/it] 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1410/1788 [1:43:32<28:16, 4.49s/it] {'loss': 0.0617, 'grad_norm': 2.6328916549682617, 'learning_rate': 1.1777501553760099e-06, 'epoch': 2.37}
79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1410/1788 [1:43:32<28:16, 4.49s/it] 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1411/1788 [1:43:36<28:13, 4.49s/it] 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1412/1788 [1:43:39<24:03, 3.84s/it] 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1413/1788 [1:43:41<21:08, 3.38s/it] 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1414/1788 [1:43:46<24:18, 3.90s/it] 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1415/1788 [1:43:50<24:50, 4.00s/it] 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1416/1788 [1:43:55<26:05, 4.21s/it] 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1417/1788 [1:43:59<26:31, 4.29s/it] 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1418/1788 [1:44:04<27:22, 4.44s/it] 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1419/1788 [1:44:09<28:39, 4.66s/it] 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1420/1788 [1:44:12<24:12, 3.95s/it] {'loss': 0.0541, 'grad_norm': 3.0707194805145264, 'learning_rate': 1.146674953387197e-06, 'epoch': 2.38}
79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1420/1788 [1:44:12<24:12, 3.95s/it] 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1421/1788 [1:44:17<26:58, 4.41s/it] 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1422/1788 [1:44:19<22:53, 3.75s/it] 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1423/1788 [1:44:22<20:09, 3.31s/it] 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1424/1788 [1:44:24<18:11, 3.00s/it] 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1425/1788 [1:44:29<22:13, 3.67s/it] 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1426/1788 [1:44:31<19:38, 3.26s/it] 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1427/1788 [1:44:34<17:52, 2.97s/it] 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1428/1788 [1:44:36<16:26, 2.74s/it] 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1429/1788 [1:44:43<23:53, 3.99s/it] 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1430/1788 [1:44:47<23:44, 3.98s/it] {'loss': 0.0494, 'grad_norm': 10.068917274475098, 'learning_rate': 1.1155997513983842e-06, 'epoch': 2.4}
80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1430/1788 [1:44:47<23:44, 3.98s/it] 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1431/1788 [1:44:49<21:03, 3.54s/it] 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1432/1788 [1:46:10<2:38:00, 26.63s/it] 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1433/1788 [1:46:15<1:58:51, 20.09s/it] 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1434/1788 [1:46:17<1:27:12, 14.78s/it] 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1435/1788 [1:46:22<1:09:53, 11.88s/it] 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1436/1788 [1:46:27<56:46, 9.68s/it] 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1437/1788 [1:46:29<43:33, 7.45s/it] 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1438/1788 [1:46:31<34:20, 5.89s/it] 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1439/1788 [1:46:37<33:22, 5.74s/it] 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1440/1788 [1:46:39<27:11, 4.69s/it] {'loss': 0.0884, 'grad_norm': 2.2381155490875244, 'learning_rate': 1.0845245494095713e-06, 'epoch': 2.42}
81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1440/1788 [1:46:39<27:11, 4.69s/it] 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1441/1788 [1:46:41<22:56, 3.97s/it] 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1442/1788 [1:46:45<23:19, 4.05s/it] 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1443/1788 [1:46:48<20:17, 3.53s/it] 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1444/1788 [1:46:53<22:57, 4.00s/it] 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1445/1788 [1:46:55<19:56, 3.49s/it] 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1446/1788 [1:46:57<17:48, 3.12s/it] 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1447/1788 [1:47:02<21:11, 3.73s/it] 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1448/1788 [1:47:07<22:22, 3.95s/it] 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1449/1788 [1:47:11<21:45, 3.85s/it] 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1450/1788 [1:47:13<19:00, 3.37s/it] {'loss': 0.0702, 'grad_norm': 3.9701359272003174, 'learning_rate': 1.0534493474207582e-06, 'epoch': 2.43}
81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1450/1788 [1:47:13<19:00, 3.37s/it] 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1451/1788 [1:47:15<17:11, 3.06s/it] 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1452/1788 [1:47:20<19:25, 3.47s/it] 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1453/1788 [1:47:25<22:38, 4.06s/it] 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1454/1788 [1:47:29<23:08, 4.16s/it] 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1455/1788 [1:47:32<19:58, 3.60s/it] 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1456/1788 [1:47:35<19:37, 3.55s/it] 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1457/1788 [1:47:37<17:28, 3.17s/it] 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1458/1788 [1:47:40<15:59, 2.91s/it] 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1459/1788 [1:47:45<19:48, 3.61s/it] 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1460/1788 [1:47:50<22:35, 4.13s/it] {'loss': 0.0515, 'grad_norm': 4.6914262771606445, 'learning_rate': 1.0223741454319454e-06, 'epoch': 2.45}
82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1460/1788 [1:47:50<22:35, 4.13s/it] 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1461/1788 [1:47:53<19:31, 3.58s/it] 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1462/1788 [1:47:57<21:02, 3.87s/it] 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1463/1788 [1:48:01<20:36, 3.80s/it] 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1464/1788 [1:48:06<22:54, 4.24s/it] 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1465/1788 [1:48:08<19:43, 3.67s/it] 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1466/1788 [1:48:11<17:25, 3.25s/it] 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1467/1788 [1:48:15<19:41, 3.68s/it] 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1468/1788 [1:48:20<20:48, 3.90s/it] 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1469/1788 [1:48:22<18:07, 3.41s/it] 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1470/1788 [1:48:24<16:16, 3.07s/it] {'loss': 0.071, 'grad_norm': 4.725838661193848, 'learning_rate': 9.912989434431325e-07, 'epoch': 2.47}
82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1470/1788 [1:48:24<16:16, 3.07s/it] 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1471/1788 [1:48:28<16:29, 3.12s/it] 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1472/1788 [1:48:30<15:03, 2.86s/it] 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1473/1788 [1:48:32<14:05, 2.68s/it] 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1474/1788 [1:48:36<15:36, 2.98s/it] 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1475/1788 [1:48:38<14:25, 2.76s/it] 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1476/1788 [1:48:40<13:34, 2.61s/it] 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1477/1788 [1:48:44<15:00, 2.90s/it] 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1478/1788 [1:48:46<13:58, 2.71s/it] 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1479/1788 [1:48:52<18:54, 3.67s/it] 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1480/1788 [1:48:56<19:39, 3.83s/it] {'loss': 0.0597, 'grad_norm': 4.217200756072998, 'learning_rate': 9.602237414543196e-07, 'epoch': 2.48}
83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1480/1788 [1:48:56<19:39, 3.83s/it] 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1481/1788 [1:49:01<21:32, 4.21s/it] 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1482/1788 [1:49:06<22:55, 4.49s/it] 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1483/1788 [1:49:11<22:15, 4.38s/it] 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1484/1788 [1:49:13<18:57, 3.74s/it] 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1485/1788 [1:49:15<16:43, 3.31s/it] 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1486/1788 [1:49:21<19:55, 3.96s/it] 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1487/1788 [1:49:26<21:49, 4.35s/it] 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1488/1788 [1:49:28<18:40, 3.74s/it] 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1489/1788 [1:49:32<18:17, 3.67s/it] 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1490/1788 [1:49:36<19:37, 3.95s/it] {'loss': 0.0599, 'grad_norm': 4.229434490203857, 'learning_rate': 9.291485394655066e-07, 'epoch': 2.5}
83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1490/1788 [1:49:36<19:37, 3.95s/it] 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1491/1788 [1:49:38<17:00, 3.44s/it] 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1492/1788 [1:49:41<15:17, 3.10s/it] 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1493/1788 [1:49:46<17:39, 3.59s/it] 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1494/1788 [1:49:48<15:41, 3.20s/it] 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1495/1788 [1:49:53<18:25, 3.77s/it] 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1496/1788 [1:49:55<16:08, 3.32s/it] 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1497/1788 [1:49:59<16:19, 3.37s/it] 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1498/1788 [1:50:03<17:48, 3.68s/it] 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1499/1788 [1:50:08<18:51, 3.92s/it] 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1500/1788 [1:50:11<18:18, 3.81s/it] {'loss': 0.0568, 'grad_norm': 4.094692230224609, 'learning_rate': 8.980733374766937e-07, 'epoch': 2.52}
 84%|████████▍ | 1500/1788 [1:50:11<18:18, 3.81s/it]
05/29/2024 02:38:02 - INFO - sentence_transformers.SentenceTransformer - Save model to new_mode/checkpoint-1500
84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1501/1788 [1:50:27<35:50, 7.49s/it] 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1502/1788 [1:50:31<29:50, 6.26s/it] 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1503/1788 [1:50:36<28:26, 5.99s/it] 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1504/1788 [1:50:38<23:04, 4.87s/it] 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1505/1788 [1:50:41<19:21, 4.10s/it] 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1506/1788 [1:50:46<20:56, 4.46s/it] 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1507/1788 [1:50:49<19:26, 4.15s/it] 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1508/1788 [1:50:52<17:06, 3.67s/it] 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1509/1788 [1:50:55<16:26, 3.53s/it] 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1510/1788 [1:51:00<18:48, 4.06s/it] {'loss': 0.074, 'grad_norm': 3.6952600479125977, 'learning_rate': 8.669981354878807e-07, 'epoch': 2.53}
84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1510/1788 [1:51:00<18:48, 4.06s/it] 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1511/1788 [1:51:03<16:19, 3.54s/it] 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1512/1788 [1:51:06<15:53, 3.46s/it] 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1513/1788 [1:51:08<14:17, 3.12s/it] 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1514/1788 [1:51:10<13:04, 2.86s/it] 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1515/1788 [1:51:14<13:40, 3.00s/it] 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1516/1788 [1:51:16<12:40, 2.80s/it] 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1517/1788 [1:51:18<11:57, 2.65s/it] 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1518/1788 [1:51:24<15:39, 3.48s/it] 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1519/1788 [1:51:29<17:14, 3.84s/it] 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1520/1788 [1:51:34<19:56, 4.47s/it] {'loss': 0.0646, 'grad_norm': 6.393181800842285, 'learning_rate': 8.359229334990678e-07, 'epoch': 2.55}
85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1520/1788 [1:51:34<19:56, 4.47s/it] 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1521/1788 [1:51:37<16:54, 3.80s/it] 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1522/1788 [1:51:42<18:57, 4.28s/it] 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1523/1788 [1:51:46<17:51, 4.04s/it] 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1524/1788 [1:51:48<15:36, 3.55s/it] 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1525/1788 [1:51:51<15:08, 3.45s/it] 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1526/1788 [1:51:53<13:28, 3.09s/it] 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1527/1788 [1:51:59<16:26, 3.78s/it] 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1528/1788 [1:52:04<18:07, 4.18s/it] 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1529/1788 [1:52:06<15:36, 3.62s/it] 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1530/1788 [1:52:09<13:50, 3.22s/it] {'loss': 0.0606, 'grad_norm': 5.374364376068115, 'learning_rate': 8.048477315102549e-07, 'epoch': 2.57}
86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1530/1788 [1:52:09<13:50, 3.22s/it] 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1531/1788 [1:52:17<20:03, 4.68s/it] 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1532/1788 [1:52:19<16:57, 3.97s/it] 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1533/1788 [1:52:23<16:54, 3.98s/it] 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1534/1788 [1:52:28<18:36, 4.40s/it] 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1535/1788 [1:52:33<19:29, 4.62s/it] 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1536/1788 [1:52:37<18:10, 4.33s/it] 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1537/1788 [1:52:39<15:35, 3.73s/it] 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1538/1788 [1:52:45<17:24, 4.18s/it] 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1539/1788 [1:52:50<18:45, 4.52s/it] 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1540/1788 [1:52:52<15:58, 3.86s/it] {'loss': 0.0572, 'grad_norm': 2.1056995391845703, 'learning_rate': 7.737725295214419e-07, 'epoch': 2.58}
86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1540/1788 [1:52:52<15:58, 3.86s/it] 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1541/1788 [1:52:55<14:02, 3.41s/it] 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1542/1788 [1:53:01<17:04, 4.16s/it] 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1543/1788 [1:53:04<16:13, 3.97s/it] 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1544/1788 [1:53:08<16:03, 3.95s/it] 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1545/1788 [1:53:10<14:00, 3.46s/it] 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1546/1788 [1:53:15<15:59, 3.96s/it] 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1547/1788 [1:53:21<17:22, 4.33s/it] 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1548/1788 [1:53:23<14:47, 3.70s/it] 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1549/1788 [1:53:28<16:32, 4.15s/it] 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1550/1788 [1:53:33<17:55, 4.52s/it] {'loss': 0.0649, 'grad_norm': 4.441219329833984, 'learning_rate': 7.426973275326291e-07, 'epoch': 2.6}
87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1550/1788 [1:53:33<17:55, 4.52s/it] 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1551/1788 [1:53:39<18:43, 4.74s/it] 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1552/1788 [1:53:44<19:13, 4.89s/it] 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1553/1788 [1:53:49<19:30, 4.98s/it] 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1554/1788 [1:53:54<19:01, 4.88s/it] 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1555/1788 [1:53:56<15:56, 4.10s/it] 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1556/1788 [1:54:01<17:19, 4.48s/it] 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1557/1788 [1:54:04<14:36, 3.80s/it] 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1558/1788 [1:54:06<12:43, 3.32s/it] 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1559/1788 [1:54:08<11:28, 3.01s/it] 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1560/1788 [1:54:12<12:47, 3.37s/it] {'loss': 0.053, 'grad_norm': 2.090217113494873, 'learning_rate': 7.11622125543816e-07, 'epoch': 2.62}
87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1560/1788 [1:54:12<12:47, 3.37s/it] 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1561/1788 [1:54:18<14:46, 3.91s/it] 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1562/1788 [1:54:22<14:55, 3.96s/it] 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1563/1788 [1:54:24<12:56, 3.45s/it] 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1564/1788 [1:54:29<14:50, 3.97s/it] 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1565/1788 [1:54:34<16:05, 4.33s/it] 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1566/1788 [1:54:37<13:46, 3.72s/it] 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1567/1788 [1:54:39<12:04, 3.28s/it] 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1568/1788 [1:54:41<10:53, 2.97s/it] 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1569/1788 [1:54:45<12:21, 3.39s/it] 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1570/1788 [1:54:51<14:34, 4.01s/it] {'loss': 0.052, 'grad_norm': 1.7646933794021606, 'learning_rate': 6.805469235550031e-07, 'epoch': 2.63}
88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1570/1788 [1:54:51<14:34, 4.01s/it] 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1571/1788 [1:54:57<16:38, 4.60s/it] 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1572/1788 [1:54:59<14:04, 3.91s/it] 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1573/1788 [1:55:03<13:43, 3.83s/it] 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1574/1788 [1:55:08<14:41, 4.12s/it] 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1575/1788 [1:55:13<15:47, 4.45s/it] 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1576/1788 [1:55:18<16:28, 4.66s/it] 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1577/1788 [1:55:23<17:05, 4.86s/it] 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1578/1788 [1:55:26<14:18, 4.09s/it] 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1579/1788 [1:55:28<12:21, 3.55s/it] 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1580/1788 [1:55:32<13:15, 3.82s/it] {'loss': 0.0506, 'grad_norm': 4.3889241218566895, 'learning_rate': 6.494717215661903e-07, 'epoch': 2.65}
88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1580/1788 [1:55:32<13:15, 3.82s/it] 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1581/1788 [1:55:35<11:37, 3.37s/it] 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1582/1788 [1:55:37<10:28, 3.05s/it] 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1583/1788 [1:55:39<09:40, 2.83s/it] 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1584/1788 [1:55:44<12:04, 3.55s/it] 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1585/1788 [1:55:49<13:09, 3.89s/it] 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1586/1788 [1:55:54<14:32, 4.32s/it] 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1587/1788 [1:55:57<12:21, 3.69s/it] 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1588/1788 [1:56:02<14:10, 4.25s/it] 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1589/1788 [1:56:05<12:09, 3.66s/it] 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1590/1788 [1:56:07<10:48, 3.28s/it] {'loss': 0.0653, 'grad_norm': 3.8383901119232178, 'learning_rate': 6.183965195773773e-07, 'epoch': 2.67}
89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1590/1788 [1:56:07<10:48, 3.28s/it] 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1591/1788 [1:56:12<12:50, 3.91s/it] 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1592/1788 [1:56:17<13:34, 4.16s/it] 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1593/1788 [1:56:23<14:54, 4.59s/it] 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1594/1788 [1:56:28<16:05, 4.97s/it] 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1595/1788 [1:56:33<15:39, 4.87s/it] 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1596/1788 [1:56:45<22:18, 6.97s/it] 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1597/1788 [1:56:47<17:47, 5.59s/it] 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1598/1788 [1:56:51<15:43, 4.96s/it] 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1599/1788 [1:56:56<15:50, 5.03s/it] 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1600/1788 [1:57:01<15:15, 4.87s/it] {'loss': 0.0782, 'grad_norm': 4.66818904876709, 'learning_rate': 5.873213175885645e-07, 'epoch': 2.68}
89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1600/1788 [1:57:01<15:15, 4.87s/it] 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1601/1788 [1:57:03<12:46, 4.10s/it] 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1602/1788 [1:57:08<13:40, 4.41s/it] 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1603/1788 [1:57:14<15:02, 4.88s/it] 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1604/1788 [1:57:19<15:29, 5.05s/it] 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1605/1788 [1:57:23<13:56, 4.57s/it] 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1606/1788 [1:57:26<12:58, 4.28s/it] 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1607/1788 [1:57:30<11:55, 3.95s/it] 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1608/1788 [1:57:35<12:57, 4.32s/it] 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1609/1788 [1:57:39<12:28, 4.18s/it] 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1610/1788 [1:57:41<10:38, 3.59s/it] {'loss': 0.0484, 'grad_norm': 5.30470609664917, 'learning_rate': 5.562461155997515e-07, 'epoch': 2.7}
90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1610/1788 [1:57:41<10:38, 3.59s/it] 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1611/1788 [1:57:43<09:27, 3.21s/it] 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1612/1788 [1:57:48<10:44, 3.66s/it] 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1613/1788 [1:57:50<09:28, 3.25s/it] 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1614/1788 [1:57:52<08:35, 2.96s/it] 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1615/1788 [1:57:58<10:42, 3.71s/it] 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1616/1788 [1:58:04<12:36, 4.40s/it] 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1617/1788 [1:58:06<10:44, 3.77s/it] 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1618/1788 [1:58:11<11:29, 4.06s/it] 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1619/1788 [1:58:13<09:57, 3.54s/it] 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1620/1788 [1:58:19<11:37, 4.15s/it] {'loss': 0.0491, 'grad_norm': 4.228532314300537, 'learning_rate': 5.251709136109385e-07, 'epoch': 2.72}
91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1620/1788 [1:58:19<11:37, 4.15s/it] 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1621/1788 [1:58:21<10:01, 3.60s/it] 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1622/1788 [1:58:26<11:13, 4.06s/it] 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1623/1788 [1:58:30<10:53, 3.96s/it] 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1624/1788 [1:58:36<12:29, 4.57s/it] 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1625/1788 [1:58:41<12:33, 4.62s/it] 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1626/1788 [1:58:44<11:41, 4.33s/it] 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1627/1788 [1:58:47<09:58, 3.72s/it] 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1628/1788 [1:58:51<10:02, 3.77s/it] 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1629/1788 [1:58:56<11:06, 4.19s/it] 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1630/1788 [1:59:01<11:28, 4.36s/it] {'loss': 0.0447, 'grad_norm': 9.895421028137207, 'learning_rate': 4.940957116221255e-07, 'epoch': 2.73}
91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1630/1788 [1:59:01<11:28, 4.36s/it] 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1631/1788 [1:59:06<12:14, 4.68s/it] 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 1632/1788 [1:59:08<10:15, 3.94s/it] 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 1633/1788 [1:59:10<08:48, 3.41s/it] 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 1634/1788 [1:59:13<07:51, 3.06s/it] 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 1635/1788 [1:59:16<08:18, 3.26s/it] 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 1636/1788 [1:59:19<07:33, 2.98s/it] 92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 1637/1788 [1:59:22<08:01, 3.19s/it] 92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 1638/1788 [1:59:27<09:03, 3.63s/it] 92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 1639/1788 [1:59:29<08:00, 3.22s/it] 92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 1640/1788 [1:59:34<09:23, 3.81s/it] {'loss': 0.0534, 'grad_norm': 5.228079319000244, 'learning_rate': 4.6302050963331263e-07, 'epoch': 2.75}
92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 1640/1788 [1:59:34<09:23, 3.81s/it] 92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 1641/1788 [1:59:37<08:13, 3.35s/it] 92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 1642/1788 [1:59:41<08:28, 3.48s/it] 92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 1643/1788 [2:01:01<1:04:11, 26.56s/it] 92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 1644/1788 [2:01:06<48:22, 20.15s/it] 92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 1645/1788 [2:01:08<35:17, 14.81s/it] 92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 1646/1788 [2:01:14<28:21, 11.98s/it] 92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 1647/1788 [2:01:17<22:15, 9.47s/it] 92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 1648/1788 [2:01:20<17:04, 7.32s/it] 92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 1649/1788 [2:01:24<14:45, 6.37s/it] 92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 1650/1788 [2:01:28<13:23, 5.82s/it] {'loss': 0.0797, 'grad_norm': 3.3605363368988037, 'learning_rate': 4.319453076444997e-07, 'epoch': 2.77}
92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 1650/1788 [2:01:28<13:23, 5.82s/it] 92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 1651/1788 [2:01:34<13:21, 5.85s/it] 92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 1652/1788 [2:01:39<12:15, 5.41s/it] 92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 1653/1788 [2:01:45<12:29, 5.55s/it] 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 1654/1788 [2:01:47<10:10, 4.55s/it] 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 1655/1788 [2:01:49<08:36, 3.88s/it] 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 1656/1788 [2:01:54<09:00, 4.09s/it] 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 1657/1788 [2:01:56<07:46, 3.56s/it] 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 1658/1788 [2:01:58<06:52, 3.18s/it] 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 1659/1788 [2:02:04<08:07, 3.78s/it] 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 1660/1788 [2:02:09<09:23, 4.40s/it] {'loss': 0.0557, 'grad_norm': 3.7053415775299072, 'learning_rate': 4.008701056556868e-07, 'epoch': 2.79}
93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 1660/1788 [2:02:09<09:23, 4.40s/it] 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 1661/1788 [2:02:12<07:59, 3.77s/it] 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 1662/1788 [2:02:15<07:49, 3.73s/it] 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 1663/1788 [2:02:18<06:50, 3.28s/it] 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 1664/1788 [2:02:20<06:10, 2.99s/it] 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 1665/1788 [2:02:25<07:33, 3.69s/it] 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 1666/1788 [2:02:29<07:39, 3.77s/it] 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 1667/1788 [2:02:31<06:42, 3.32s/it] 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 1668/1788 [2:02:34<06:02, 3.02s/it] 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 1669/1788 [2:02:36<05:32, 2.79s/it] 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 1670/1788 [2:02:38<05:12, 2.65s/it] {'loss': 0.0604, 'grad_norm': 3.3942348957061768, 'learning_rate': 3.697949036668739e-07, 'epoch': 2.8}
93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 1670/1788 [2:02:38<05:12, 2.65s/it] 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 1671/1788 [2:02:50<10:35, 5.43s/it] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 1672/1788 [2:02:54<09:41, 5.01s/it] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 1673/1788 [2:03:00<10:04, 5.25s/it] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 1674/1788 [2:03:02<08:15, 4.35s/it] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 1675/1788 [2:03:05<07:01, 3.73s/it] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 1676/1788 [2:03:09<07:08, 3.83s/it] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 1677/1788 [2:03:13<07:32, 4.08s/it] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 1678/1788 [2:03:16<06:31, 3.56s/it] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 1679/1788 [2:03:22<07:48, 4.29s/it] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 1680/1788 [2:03:24<06:39, 3.70s/it] {'loss': 0.0557, 'grad_norm': 2.8816325664520264, 'learning_rate': 3.387197016780609e-07, 'epoch': 2.82}
94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 1680/1788 [2:03:24<06:39, 3.70s/it] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 1681/1788 [2:03:29<07:26, 4.17s/it] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 1682/1788 [2:03:32<06:22, 3.61s/it] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 1683/1788 [2:03:36<06:55, 3.96s/it] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 1684/1788 [2:03:40<06:39, 3.84s/it] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 1685/1788 [2:03:45<07:11, 4.19s/it] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 1686/1788 [2:03:47<06:06, 3.59s/it] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 1687/1788 [2:03:49<05:23, 3.20s/it] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 1688/1788 [2:03:54<06:09, 3.70s/it] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 1689/1788 [2:04:00<07:03, 4.28s/it] 95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 1690/1788 [2:04:05<07:37, 4.67s/it] {'loss': 0.0574, 'grad_norm': 4.5696892738342285, 'learning_rate': 3.07644499689248e-07, 'epoch': 2.84}
95%|█████████▍| 1690/1788 [2:04:05<07:37, 4.67s/it] 95%|█████████▍| 1691/1788 [2:04:08<06:24, 3.97s/it] 95%|█████████▍| 1692/1788 [2:04:12<06:28, 4.05s/it] 95%|█████████▍| 1693/1788 [2:04:17<06:44, 4.25s/it] 95%|█████████▍| 1694/1788 [2:04:19<05:44, 3.67s/it] 95%|█████████▍| 1695/1788 [2:04:21<05:01, 3.25s/it] 95%|█████████▍| 1696/1788 [2:04:24<04:32, 2.96s/it] 95%|█████████▍| 1697/1788 [2:04:26<04:11, 2.77s/it] 95%|█████████▍| 1698/1788 [2:04:28<03:56, 2.63s/it] 95%|█████████▌| 1699/1788 [2:04:32<04:24, 2.97s/it] 95%|█████████▌| 1700/1788 [2:04:35<04:32, 3.10s/it] {'loss': 0.0413, 'grad_norm': 3.842710256576538, 'learning_rate': 2.765692977004351e-07, 'epoch': 2.85}
95%|█████████▌| 1700/1788 [2:04:35<04:32, 3.10s/it] 95%|█████████▌| 1701/1788 [2:04:40<05:05, 3.51s/it] 95%|█████████▌| 1702/1788 [2:04:45<05:49, 4.06s/it] 95%|█████████▌| 1703/1788 [2:04:50<06:08, 4.33s/it] 95%|█████████▌| 1704/1788 [2:04:53<05:12, 3.73s/it] 95%|█████████▌| 1705/1788 [2:05:05<08:36, 6.23s/it] 95%|█████████▌| 1706/1788 [2:05:08<07:27, 5.46s/it] 95%|█████████▌| 1707/1788 [2:05:13<07:11, 5.32s/it] 96%|█████████▌| 1708/1788 [2:05:18<06:44, 5.05s/it] 96%|█████████▌| 1709/1788 [2:05:20<05:35, 4.25s/it] 96%|█████████▌| 1710/1788 [2:05:25<05:53, 4.53s/it] {'loss': 0.0479, 'grad_norm': 3.4598591327667236, 'learning_rate': 2.454940957116221e-07, 'epoch': 2.87}
96%|█████████▌| 1710/1788 [2:05:25<05:53, 4.53s/it] 96%|█████████▌| 1711/1788 [2:05:29<05:33, 4.33s/it] 96%|█████████▌| 1712/1788 [2:05:31<04:42, 3.72s/it] 96%|█████████▌| 1713/1788 [2:05:43<07:44, 6.19s/it] 96%|█████████▌| 1714/1788 [2:05:49<07:24, 6.01s/it] 96%|█████████▌| 1715/1788 [2:05:55<07:18, 6.00s/it] 96%|█████████▌| 1716/1788 [2:05:59<06:19, 5.27s/it] 96%|█████████▌| 1717/1788 [2:06:01<05:09, 4.36s/it] 96%|█████████▌| 1718/1788 [2:06:06<05:19, 4.56s/it] 96%|█████████▌| 1719/1788 [2:06:08<04:27, 3.88s/it] 96%|█████████▌| 1720/1788 [2:06:10<03:51, 3.41s/it] {'loss': 0.0541, 'grad_norm': 3.019392728805542, 'learning_rate': 2.1441889372280923e-07, 'epoch': 2.89}
96%|█████████▌| 1720/1788 [2:06:10<03:51, 3.41s/it] 96%|█████████▋| 1721/1788 [2:06:16<04:28, 4.00s/it] 96%|█████████▋| 1722/1788 [2:06:21<04:45, 4.33s/it] 96%|█████████▋| 1723/1788 [2:06:23<04:01, 3.72s/it] 96%|█████████▋| 1724/1788 [2:06:28<04:15, 3.99s/it] 96%|█████████▋| 1725/1788 [2:06:32<04:24, 4.19s/it] 97%|█████████▋| 1726/1788 [2:06:35<03:44, 3.63s/it] 97%|█████████▋| 1727/1788 [2:06:40<04:09, 4.09s/it] 97%|█████████▋| 1728/1788 [2:08:00<26:57, 26.97s/it] 97%|█████████▋| 1729/1788 [2:08:05<19:58, 20.31s/it] 97%|█████████▋| 1730/1788 [2:08:09<14:52, 15.39s/it] {'loss': 0.0519, 'grad_norm': 4.948524475097656, 'learning_rate': 1.833436917339963e-07, 'epoch': 2.9}
97%|█████████▋| 1730/1788 [2:08:09<14:52, 15.39s/it] 97%|█████████▋| 1731/1788 [2:08:14<11:34, 12.18s/it] 97%|█████████▋| 1732/1788 [2:08:16<08:36, 9.23s/it] 97%|█████████▋| 1733/1788 [2:08:21<07:24, 8.09s/it] 97%|█████████▋| 1734/1788 [2:08:27<06:32, 7.27s/it] 97%|█████████▋| 1735/1788 [2:08:29<05:06, 5.78s/it] 97%|█████████▋| 1736/1788 [2:08:33<04:26, 5.13s/it] 97%|█████████▋| 1737/1788 [2:08:38<04:26, 5.23s/it] 97%|█████████▋| 1738/1788 [2:08:40<03:38, 4.36s/it] 97%|█████████▋| 1739/1788 [2:08:44<03:23, 4.15s/it] 97%|█████████▋| 1740/1788 [2:08:46<02:52, 3.59s/it] {'loss': 0.0597, 'grad_norm': 2.3276920318603516, 'learning_rate': 1.5226848974518335e-07, 'epoch': 2.92}
97%|█████████▋| 1740/1788 [2:08:46<02:52, 3.59s/it] 97%|█████████▋| 1741/1788 [2:08:51<03:03, 3.91s/it] 97%|█████████▋| 1742/1788 [2:08:53<02:37, 3.43s/it] 97%|█████████▋| 1743/1788 [2:08:57<02:36, 3.47s/it] 98%|█████████▊| 1744/1788 [2:09:00<02:27, 3.34s/it] 98%|█████████▊| 1745/1788 [2:09:06<02:57, 4.12s/it] 98%|█████████▊| 1746/1788 [2:09:08<02:30, 3.58s/it] 98%|█████████▊| 1747/1788 [2:09:20<04:08, 6.07s/it] 98%|█████████▊| 1748/1788 [2:09:26<03:56, 5.92s/it] 98%|█████████▊| 1749/1788 [2:09:28<03:08, 4.82s/it] 98%|█████████▊| 1750/1788 [2:09:33<03:10, 5.00s/it] {'loss': 0.0529, 'grad_norm': 1.118607759475708, 'learning_rate': 1.2119328775637043e-07, 'epoch': 2.94}
98%|█████████▊| 1750/1788 [2:09:33<03:10, 5.00s/it] 98%|█████████▊| 1751/1788 [2:09:36<02:34, 4.18s/it] 98%|█████████▊| 1752/1788 [2:09:41<02:46, 4.62s/it] 98%|█████████▊| 1753/1788 [2:09:46<02:47, 4.79s/it] 98%|█████████▊| 1754/1788 [2:09:49<02:17, 4.03s/it] 98%|█████████▊| 1755/1788 [2:09:54<02:23, 4.36s/it] 98%|█████████▊| 1756/1788 [2:10:00<02:34, 4.82s/it] 98%|█████████▊| 1757/1788 [2:10:02<02:06, 4.07s/it] 98%|█████████▊| 1758/1788 [2:10:04<01:45, 3.52s/it] 98%|█████████▊| 1759/1788 [2:10:10<02:00, 4.17s/it] 98%|█████████▊| 1760/1788 [2:10:15<02:02, 4.37s/it] {'loss': 0.0535, 'grad_norm': 2.108508586883545, 'learning_rate': 9.011808576755749e-08, 'epoch': 2.95}
98%|█████████▊| 1760/1788 [2:10:15<02:02, 4.37s/it] 98%|█████████▊| 1761/1788 [2:10:19<01:53, 4.19s/it] 99%|█████████▊| 1762/1788 [2:10:23<01:52, 4.34s/it] 99%|█████████▊| 1763/1788 [2:10:26<01:33, 3.72s/it] 99%|█████████▊| 1764/1788 [2:10:28<01:19, 3.29s/it] 99%|█████████▊| 1765/1788 [2:10:32<01:21, 3.53s/it] 99%|█████████▉| 1766/1788 [2:10:34<01:09, 3.16s/it] 99%|█████████▉| 1767/1788 [2:10:37<01:00, 2.90s/it] 99%|█████████▉| 1768/1788 [2:10:42<01:11, 3.59s/it] 99%|█████████▉| 1769/1788 [2:10:47<01:17, 4.09s/it] 99%|█████████▉| 1770/1788 [2:10:52<01:20, 4.47s/it] {'loss': 0.0561, 'grad_norm': 5.242223739624023, 'learning_rate': 5.904288377874457e-08, 'epoch': 2.97}
99%|█████████▉| 1770/1788 [2:10:52<01:20, 4.47s/it] 99%|█████████▉| 1771/1788 [2:10:56<01:13, 4.34s/it] 99%|█████████▉| 1772/1788 [2:11:02<01:17, 4.81s/it] 99%|█████████▉| 1773/1788 [2:11:08<01:14, 4.97s/it] 99%|█████████▉| 1774/1788 [2:11:12<01:05, 4.66s/it] 99%|█████████▉| 1775/1788 [2:11:16<01:00, 4.62s/it] 99%|█████████▉| 1776/1788 [2:11:18<00:47, 3.92s/it] 99%|█████████▉| 1777/1788 [2:11:21<00:37, 3.43s/it] 99%|█████████▉| 1778/1788 [2:11:23<00:30, 3.08s/it] 99%|█████████▉| 1779/1788 [2:11:25<00:25, 2.84s/it] 100%|█████████▉| 1780/1788 [2:11:28<00:21, 2.67s/it] {'loss': 0.0456, 'grad_norm': 2.8926992416381836, 'learning_rate': 2.7967681789931638e-08, 'epoch': 2.99}
100%|█████████▉| 1780/1788 [2:11:28<00:21, 2.67s/it] 100%|█████████▉| 1781/1788 [2:11:31<00:20, 2.96s/it] 100%|█████████▉| 1782/1788 [2:11:36<00:20, 3.43s/it] 100%|█████████▉| 1783/1788 [2:11:38<00:15, 3.08s/it] 100%|█████████▉| 1784/1788 [2:11:40<00:11, 2.85s/it] 100%|█████████▉| 1785/1788 [2:11:46<00:11, 3.68s/it] 100%|█████████▉| 1786/1788 [2:11:49<00:07, 3.57s/it] 100%|█████████▉| 1787/1788 [2:11:52<00:03, 3.19s/it]
---------------------------*Rank 1: refresh data---------------------------
---------------------------*Rank 6: refresh data---------------------------
---------------------------*Rank 2: refresh data---------------------------
---------------------------*Rank 7: refresh data---------------------------
---------------------------*Rank 4: refresh data---------------------------
---------------------------*Rank 3: refresh data---------------------------
100%|██████████| 1788/1788 [2:11:53<00:00, 2.80s/it]
---------------------------*Rank 0: refresh data---------------------------
---------------------------*Rank 5: refresh data---------------------------
{'train_runtime': 7919.4526, 'train_samples_per_second': 1.806, 'train_steps_per_second': 0.226, 'train_loss': 0.08343580933558595, 'epoch': 3.0}
100%|██████████| 1788/1788 [2:11:53<00:00, 2.80s/it] 100%|██████████| 1788/1788 [2:11:53<00:00, 4.43s/it]
[2024-05-29 02:59:40,890] [INFO] [launch.py:351:main] Process 16385 exits successfully.
[2024-05-29 02:59:40,896] [INFO] [launch.py:351:main] Process 16386 exits successfully.
[2024-05-29 02:59:40,897] [INFO] [launch.py:351:main] Process 16383 exits successfully.
[2024-05-29 02:59:40,897] [INFO] [launch.py:351:main] Process 16384 exits successfully.
[2024-05-29 02:59:41,898] [INFO] [launch.py:351:main] Process 16382 exits successfully.
05/29/2024 02:59:44 - INFO - sentence_transformers.SentenceTransformer - Save model to new_mode
[2024-05-29 02:59:44,902] [INFO] [launch.py:351:main] Process 16387 exits successfully.
[2024-05-29 02:59:44,902] [INFO] [launch.py:351:main] Process 16381 exits successfully.
wandb: - 0.012 MB of 0.012 MB uploaded
wandb: \ 0.012 MB of 0.046 MB uploaded
wandb: | 0.037 MB of 0.046 MB uploaded
wandb:
wandb: Run history:
wandb: train/epoch ▁▁▁▁▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▆▇▇▇▇▇███
wandb: train/global_step ▁▁▁▁▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▆▇▇▇▇▇███
wandb: train/grad_norm ▃▂▃▃▃▂▁▁▄▁▂▂▂▁▁▂▁▂▁▄▅▁▂▂▁▂▁█▂▁▁▁▂▁▁▁▂▁▁▁
wandb: train/learning_rate ▅▇▇████▇▇▇▇▇▆▆▆▆▆▅▅▅▅▅▄▄▄▄▄▃▃▃▃▃▂▂▂▂▂▁▁▁
wandb: train/loss █▅▄▄▄▂▃▂▃▁▃▂▃▂▂▂▂▂▂▂▂▂▂▂▁▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁
wandb:
wandb: Run summary:
wandb: total_flos 0.0
wandb: train/epoch 3.0
wandb: train/global_step 1788
wandb: train/grad_norm 2.8927
wandb: train/learning_rate 0.0
wandb: train/loss 0.0456
wandb: train_loss 0.08344
wandb: train_runtime 7919.4526
wandb: train_samples_per_second 1.806
wandb: train_steps_per_second 0.226
wandb:
wandb: 🚀 View run new_mode at: https://wandb.ai/dangfutures/huggingface/runs/5f9gceeo
wandb: ⭐️ View project at: https://wandb.ai/dangfutures/huggingface
wandb: Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20240529_004739-5f9gceeo/logs
[2024-05-29 03:00:16,935] [INFO] [launch.py:351:main] Process 16380 exits successfully.