autoscaler not handling well bad formatted JSON qstat output #43

@xpillons

Description

cyclecloud-pbspro version 2.0.10

With OpenPBS 19.1.1, the JSON output of qstat can be malformed when a job has complex environment variables.
For example, qstat -f <job_id> -F json | jq '.' returns an error, which means the JSON is malformed.
As a result, the autoscaler stops working, so a single bad job can hang the whole system and no new nodes can be added.
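
One way to keep a single bad job from blocking everything is to parse each job's output separately and skip the ones that fail. The following is only a minimal sketch of that idea, not the cyclecloud-pbspro implementation: `parse_jobs_leniently` and the sample payloads are hypothetical, and it assumes the per-job qstat output has already been collected as text.

```python
import json
from typing import Dict, List, Tuple


def parse_jobs_leniently(raw_by_job: Dict[str, str]) -> Tuple[Dict[str, dict], List[str]]:
    """Parse per-job JSON text, skipping jobs whose output is not valid JSON.

    Hypothetical helper: raw_by_job maps a job id to the text that
    `qstat -f <job_id> -F json` printed for that job.
    """
    parsed: Dict[str, dict] = {}
    skipped: List[str] = []
    for job_id, raw in raw_by_job.items():
        try:
            parsed[job_id] = json.loads(raw)
        except json.JSONDecodeError:
            # Malformed output for this one job: record it and keep going,
            # so the remaining jobs can still be considered.
            skipped.append(job_id)
    return parsed, skipped


# Made-up sample data; the second entry has an unescaped quote inside a value.
raw_outputs = {
    "1.server": '{"Jobs": {"1.server": {"job_state": "Q"}}}',
    "2.server": '{"Jobs": {"2.server": {"Variable_List": {"X": "a"b"}}}}',
}
jobs, bad = parse_jobs_leniently(raw_outputs)
print("parsed:", sorted(jobs))
print("skipped:", bad)
```

With this approach the malformed job would be reported and ignored, rather than causing the whole run to fail.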

Here is the output of azpbs autoscale:

File "/opt/cycle/pbspro/venv/lib/python3.6/site-packages/pbspro/environment.py", line 58, in from_driver
--- Logging error ---
Traceback (most recent call last):
  File "/usr/lib64/python3.6/logging/__init__.py", line 996, in emit
    stream.write(msg)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 47623-47628: ordinal not in range(128)
Call stack:
  File "/root/bin/azpbs", line 4, in <module>
    main()
  File "/opt/cycle/pbspro/venv/lib/python3.6/site-packages/pbspro/cli.py", line 284, in main
    clilib.main(argv or sys.argv[1:], "pbspro", PBSCLI())
  File "/opt/cycle/pbspro/venv/lib/python3.6/site-packages/hpc/autoscale/clilib.py", line 1739, in main
    args.func(**kwargs)
  File "/opt/cycle/pbspro/venv/lib/python3.6/site-packages/hpc/autoscale/clilib.py", line 1315, in analyze
    dcalc = self._demand(config, ctx_handler=ctx_handler)
  File "/opt/cycle/pbspro/venv/lib/python3.6/site-packages/hpc/autoscale/clilib.py", line 360, in _demand
    dcalc, jobs = self._demand_calc(config, driver)
  File "/opt/cycle/pbspro/venv/lib/python3.6/site-packages/pbspro/cli.py", line 113, in _demand_calc
    pbs_env = self._pbs_env(pbs_driver)
  File "/opt/cycle/pbspro/venv/lib/python3.6/site-packages/pbspro/cli.py", line 106, in _pbs_env
    self.__pbs_env = environment.from_driver(pbs_driver.config, pbs_driver)
  File "/opt/cycle/pbspro/venv/lib/python3.6/site-packages/pbspro/environment.py", line 58, in from_driver
    jobs = pbs_driver.parse_jobs(queues, default_scheduler.resources_for_scheduling)
  File "/opt/cycle/pbspro/venv/lib/python3.6/site-packages/pbspro/driver.py", line 414, in parse_jobs
    self.pbscmd, self.resource_definitions, queues, resources_for_scheduling
  File "/opt/cycle/pbspro/venv/lib/python3.6/site-packages/pbspro/driver.py", line 530, in parse_jobs
    response: Dict = pbscmd.qstat_json("-f", "-t")
  File "/opt/cycle/pbspro/venv/lib/python3.6/site-packages/pbspro/pbscmd.py", line 31, in qstat_json
    response = self.qstat(*args)
  File "/opt/cycle/pbspro/venv/lib/python3.6/site-packages/pbspro/pbscmd.py", line 25, in qstat
    return self._check_output(cmd)
  File "/opt/cycle/pbspro/venv/lib/python3.6/site-packages/pbspro/pbscmd.py", line 76, in _check_output
    logger.info("Response: %s", ret)
  File "/usr/lib64/python3.6/logging/__init__.py", line 1308, in info
    self._log(INFO, msg, args, **kwargs)
  File "/opt/cycle/pbspro/venv/lib/python3.6/site-packages/hpc/autoscale/hpclogging.py", line 45, in _log
    **stacklevelkw
Message: 'Response: %s'
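
The logging error above is a UnicodeEncodeError raised while writing the qstat response to an ASCII-only stream. As a rough sketch of one possible workaround (an assumption, not what the package does), the response could be escaped to pure ASCII before it is handed to the logger. `ascii_safe` is a hypothetical helper name.

```python
import logging


def ascii_safe(text: str) -> str:
    """Return text with every non-ASCII character replaced by a backslash escape."""
    return text.encode("ascii", errors="backslashreplace").decode("ascii")


logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("example")

# Made-up response containing a non-ASCII character.
response = "Variable_List: NAME=caf\u00e9"
logger.info("Response: %s", ascii_safe(response))
```

The escaped string contains only ASCII characters, so it can be written by a handler whose stream uses the 'ascii' codec.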
