
how to train segmentation in win10 #19

@jsong0041

Dear author, how can I train the segmentation model on Windows 10?
I ran "python train.py configs/fpn_crossformer_s_ade20k_40k.py --cfg-options pretrained/backbone-corssformer-s.pth --work-dir output --launcher pytorch" but got the following error message:

Traceback (most recent call last):
  File "train.py", line 152, in <module>
    main()
  File "train.py", line 65, in main
    args = parse_args()
  File "train.py", line 57, in parse_args
    args = parser.parse_args()
  File "C:\Python37\lib\argparse.py", line 1755, in parse_args
    args, argv = self.parse_known_args(args, namespace)
  File "C:\Python37\lib\argparse.py", line 1787, in parse_known_args
    namespace, args = self._parse_known_args(args, namespace)
  File "C:\Python37\lib\argparse.py", line 1993, in _parse_known_args
    start_index = consume_optional(start_index)
  File "C:\Python37\lib\argparse.py", line 1933, in consume_optional
    take_action(action, args, option_string)
  File "C:\Python37\lib\argparse.py", line 1861, in take_action
    action(self, namespace, argument_values, option_string)
  File "C:\Python37\lib\site-packages\mmcv\utils\config.py", line 739, in __call__
    key, val = kv.split('=', maxsplit=1)
ValueError: not enough values to unpack (expected 2, got 1)
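
The failing statement in the traceback above, kv.split('=', maxsplit=1), suggests that each value passed to --cfg-options is expected in key=value form, while the command passes a bare path with no "=". The following is a minimal sketch of that behaviour, not the mmcv implementation: KeyValueAction is a hypothetical stand-in, and the key name model.pretrained is only an assumption used for illustration; the key that this repository's config actually expects may differ.

```python
import argparse


class KeyValueAction(argparse.Action):
    """Hypothetical stand-in for the action behind --cfg-options."""

    def __call__(self, parser, namespace, values, option_string=None):
        options = {}
        for kv in values:
            # Same statement as the failing line in the traceback.
            key, val = kv.split('=', maxsplit=1)
            options[key] = val
        setattr(namespace, self.dest, options)


def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument('config')
    parser.add_argument('--cfg-options', nargs='+', action=KeyValueAction)
    return parser


def try_parse(argv):
    """Return the parsed options, or the error text if parsing fails."""
    try:
        return build_parser().parse_args(argv).cfg_options
    except ValueError as exc:
        return f'ValueError: {exc}'


# A bare path has no '=', so the two-name unpacking fails.
bare_path = try_parse(['cfg.py', '--cfg-options', 'pretrained/backbone.pth'])

# 'model.pretrained' is an assumed key, used here only for illustration.
key_value = try_parse(
    ['cfg.py', '--cfg-options', 'model.pretrained=pretrained/backbone.pth'])

print(bare_path)
print(key_value)
```

In this sketch the first call reproduces the same ValueError text as the traceback, and the second call parses into a one-entry dictionary.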

I also tried running your shell script (dist_train.sh) directly, but that failed with a different error:

$ /bin/sh E:/project_c/crossformer-debug/segmentation/dist_train.sh
NOTE: Redirects are currently not supported in Windows or MacOs.
C:\Python37\lib\site-packages\torch\distributed\launch.py:186: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use_env is set by default in torchrun.
If your script expects --local_rank argument to be set, please
change it to read from os.environ['LOCAL_RANK'] instead. See
https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions

FutureWarning,
Traceback (most recent call last):
  File "C:\Python37\lib\site-packages\torch\distributed\run.py", line 564, in determine_local_world_size
    return int(nproc_per_node)
ValueError: invalid literal for int() with base 10: ''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Python37\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Python37\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Python37\lib\site-packages\torch\distributed\launch.py", line 193, in <module>
    main()
  File "C:\Python37\lib\site-packages\torch\distributed\launch.py", line 189, in main
    launch(args)
  File "C:\Python37\lib\site-packages\torch\distributed\launch.py", line 174, in launch
    run(args)
  File "C:\Python37\lib\site-packages\torch\distributed\run.py", line 709, in run
    config, cmd, cmd_args = config_from_args(args)
  File "C:\Python37\lib\site-packages\torch\distributed\run.py", line 617, in config_from_args
    nproc_per_node = determine_local_world_size(args.nproc_per_node)
  File "C:\Python37\lib\site-packages\torch\distributed\run.py", line 582, in determine_local_world_size
    raise ValueError(f"Unsupported nproc_per_node value: {nproc_per_node}")
ValueError: Unsupported nproc_per_node value:
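
The first exception in this traceback comes from int() being called on an empty string, which suggests that nproc_per_node reached the launcher with no value. One possible cause, assuming the script fills that option from a command-line argument, is that the script was invoked without arguments. A minimal sketch of an early check follows; parse_nproc_per_node is a hypothetical helper and is not part of torch or of this repository.

```python
def parse_nproc_per_node(raw):
    """Hypothetical pre-check: turn a process-count string into an int."""
    text = raw.strip()
    if not text:
        # Fail early with a clearer message than int('') would give.
        raise ValueError(
            "nproc_per_node is empty; pass a process count such as '1'")
    return int(text)


def describe(raw):
    """Return the parsed count, or the error text if parsing fails."""
    try:
        return parse_nproc_per_node(raw)
    except ValueError as exc:
        return f'ValueError: {exc}'


empty_result = describe('')   # the case seen in the traceback
one_result = describe('1')    # an explicit single-process count

print(empty_result)
print(one_result)
```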

Could you suggest a solution?
Thanks.
