This repository was archived by the owner on May 28, 2024. It is now read-only.

Skipping cancelled dequeue attempt with queue not closed #36

@Liang-ZX

Description

  1. ERROR LOG (first epoch)
    [1210 18:09:10 @param.py:158] [HyperParamSetter] At global_step=0, learning_rate is set to 0.001000
    [1210 18:09:11 @prof.py:294] [HostMemoryTracker] Free RAM in before_train() is 238.12 GB.
    [1210 18:09:11 @stac_helper.py:83] ----------------------------------------------------------------------------------------------------
    [1210 18:09:11 @stac_helper.py:84] Model save path: result/VOC2007/instances_trainval
    [1210 18:09:11 @stac_helper.py:85] ----------------------------------------------------------------------------------------------------
    [1210 18:09:11 @eval.py:313] [EvalCallback] Will evaluate every 20 epochs
    [1210 18:09:28 @base.py:273] Start Epoch 1 ...
    0%| |0/500[00:00<?,?it/s]
    2021-12-10 18:09:43.544891: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
    2021-12-10 18:10:23.596973: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
    0%| |0/500[02:46<?,?it/s]
    2021-12-10 18:12:16.766932: W tensorflow/core/kernels/queue_base.cc:277] _0_QueueInput/input_queue: Skipping cancelled enqueue attempt with queue not closed
    Traceback (most recent call last):
      File "/mnt/lustre/liangzhixuan/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
        return fn(*args)
      File "/mnt/lustre/liangzhixuan/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
        options, feed_dict, fetch_list, target_list, run_metadata)
      File "/mnt/lustre/liangzhixuan/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
        run_metadata)
    tensorflow.python.framework.errors_impl.DeadlineExceededError: Timed out waiting for notification

  2. Environment Information:


    sys.platform          linux
    Python                3.6.9 |Anaconda, Inc.| (default, Jul 30 2019, 19:07:31) [GCC 7.3.0]
    Tensorpack            v0.9.8-61-g4ac2e22b-dirty
    Numpy                 1.16.4
    TensorFlow            1.14.0/v1.14.0-rc1-22-gaf24dc91b5
    TF Compiler Version   4.8.5
    TF CUDA support       True
    TF MKL support        False
    TF XLA support        False
    Nvidia Driver         /usr/lib64/libnvidia-ml.so.460.73.01
    CUDA                  /mnt/lustre/share/cuda-10.0/lib64/libcudart.so.10.0.130
    CUDNN                 /mnt/lustre/share/cuda-10.0/lib64/libcudnn.so.7.4.1
    NCCL
    CUDA_VISIBLE_DEVICES  1,2,3,4
    GPU 0,1,2,3,4,5,6,7   Tesla V100-SXM2-32GB
    Free RAM              344.40/376.39 GB
    CPU Count             48
    cv2                   4.1.1
    msgpack               1.0.3
    python-prctl          False

