owl_vit training problem: unable to connect to Google during training, so the data cannot be retrieved. How can the data be stored locally and read from there?
lxyzler changed the title on Aug 8, 2024. Previous title: "Unable to connect to Google during training, unable to retrieve data. How can data be stored locally and read in"
python -m scenic.projects.owl_vit.main --alsologtostderr=true --workdir=/tmp/training --config=scenic/projects/owl_vit/configs/clip_b32_finetune.py
2024-08-08 01:14:33.266603: W external/xla/xla/service/gpu/nvptx_compiler.cc:836] The NVIDIA driver's CUDA version is 12.2 which is older than the PTX compiler version (12.5.82). Because the driver is older than the PTX compiler version, XLA is disabling parallel compilation, which may slow down compilation. You should update your NVIDIA driver or use the NVIDIA-provided CUDA forward compatibility packages.
I0808 01:14:37.952140 140605538547520 app.py:92] JAX host: 0 / 1
I0808 01:14:37.952368 140605538547520 app.py:93] JAX devices: [CudaDevice(id=0), CudaDevice(id=1), CudaDevice(id=2), CudaDevice(id=3), CudaDevice(id=4), CudaDevice(id=5), CudaDevice(id=6), CudaDevice(id=7)]
I0808 01:14:37.952456 140605538547520 local.py:45] Setting task status: host_id: 0, host_count: 1
I0808 01:14:37.952512 140605538547520 local.py:50] Created artifact Workdir of type ArtifactType.DIRECTORY and value /tmp/training.
I0808 01:14:37.954501 140605538547520 app.py:104] RNG: [0 0]
I0808 01:14:38.603692 140605538547520 checkpoints.py:1101] Found no checkpoint files in /tmp/training with prefix checkpoint_
&&&&&&&&&&&&&&&&&&&&&&&&&&&&&
I0808 01:14:38.604115 140605538547520 train_utils.py:380] device_count: 8
I0808 01:14:38.604308 140605538547520 train_utils.py:381] num_hosts : 1
I0808 01:14:38.604445 140605538547520 train_utils.py:382] host_id : 0
I0808 01:14:38.605386 140605538547520 train_utils.py:405] local_batch_size : 256
I0808 01:14:38.605548 140605538547520 train_utils.py:406] device_batch_size : 32
2024-08-08 01:14:38.973571: W external/local_tsl/tsl/platform/cloud/google_auth_provider.cc:184] All attempts to get a Google authentication bearer token failed, returning an empty token. Retrieving token from files failed with "NOT_FOUND: Could not locate the credentials file.". Retrieving token from GCE failed with "FAILED_PRECONDITION: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Could not resolve host: metadata.google.internal".
2024-08-08 01:15:39.983812: E external/local_tsl/tsl/platform/cloud/curl_http_request.cc:610] The transmission of request 0xdc1a0d0 (URI: https://www.googleapis.com/storage/v1/b/tfds-data/o/dataset_info%2Flvis%2F1.3.0?fields=size%2Cgeneration%2Cupdated) has been stuck at 0 of 0 bytes for 61 seconds and will be aborted. CURL timing information: lookup time: 0.010952 (No error), connect time: 0 (No error), pre-transfer time: 0 (No error), start-transfer time: 0 (No error)
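The last log line shows TensorFlow Datasets (TFDS) requesting `dataset_info/lvis/1.3.0` from the public `tfds-data` bucket, which hangs without network access. One possible way around this, sketched below, is to prepare the dataset once on a machine that can reach the network, copy the resulting directory over, and load it from disk only. This is a minimal sketch rather than what the scenic config does: the helper names (`prepared_dir`, `is_prepared`, `load_local`) are hypothetical, and it assumes the prepared data lives under `<data_dir>/lvis/1.3.0`, the layout TFDS uses for a dataset without named configs.

```python
import os
from pathlib import Path

# TFDS reads its default data directory from TFDS_DATA_DIR and otherwise
# falls back to ~/tensorflow_datasets. Point this at the copied directory.
LOCAL_DATA_DIR = Path(
    os.environ.get("TFDS_DATA_DIR", "~/tensorflow_datasets")
).expanduser()


def prepared_dir(data_dir, name="lvis", version="1.3.0"):
    """Return <data_dir>/<name>/<version>, where TFDS keeps prepared files."""
    return Path(data_dir) / name / version


def is_prepared(data_dir, name="lvis", version="1.3.0"):
    """Return True if dataset_info.json exists in the prepared directory."""
    return (prepared_dir(data_dir, name, version) / "dataset_info.json").is_file()


def load_local(data_dir, split="train", name="lvis", version="1.3.0"):
    """Load a dataset that was already prepared on disk (hypothetical helper).

    Raises FileNotFoundError before TFDS is called if the local files are missing.
    """
    if not is_prepared(data_dir, name, version):
        raise FileNotFoundError(
            f"No prepared dataset under {prepared_dir(data_dir, name, version)}"
        )
    # Imported lazily so the path checks above work without TFDS installed.
    import tensorflow_datasets as tfds

    # download=False asks TFDS not to download or prepare anything;
    # try_gcs=False asks it not to read the dataset from the public GCS bucket.
    return tfds.load(
        f"{name}:{version}",
        split=split,
        data_dir=str(data_dir),
        download=False,
        try_gcs=False,
    )
```

Calling `is_prepared(LOCAL_DATA_DIR)` before starting training gives a quick check that the copied files are where TFDS will look. Whether the owl_vit input pipeline picks up the same directory depends on how its config passes `data_dir`, so exporting `TFDS_DATA_DIR` before running the training command may also be needed.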