I am working through issue #224 and I think the current BaseRunner.run output API could be simplified.
My understanding of the current behavior is:
capture=True uses subprocess.PIPE so stdout and stderr are returned on the CompletedProcess.
silent=True redirects output to os.devnull, unless overridden.
stdout / stderr can be set to Path objects to write the corresponding stream to disk.
This creates a few ambiguous combinations:
capture=True, silent=True silently ignores silent.
capture=False, stdout=path or stderr=path still redirects that stream to a file, despite capture being disabled.
stdout / stderr are currently path-oriented, so users cannot pass an open file object, in-memory buffer, callback, logger, or multiple destinations.
- Internal code that needs to inspect stderr must either force
capture=True or write stderr to a temporary file and read it back.
This is particularly awkward for OrcaMmRunner.run_orca_mm, which needs to inspect stderr to decide whether to raise OrcaMmException. At the moment, that requires coupling the internal error-checking behavior to the public output-routing behavior.
I think the issue is that capture, silent, stdout, and stderr are all competing controls for the same underlying concern: stream handling.
Proposal
Replace the current flag-based API with a stream-target API.
The runner would always capture stdout and stderr internally and return them on the result. The stdout and stderr arguments would only describe additional destinations for each stream.
For example:
StreamTarget = (
None
| Path
| IO[str]
| Callable[[str], None]
| Sequence["StreamTarget"]
)
Proposed semantics:
None: no external output destination
Path: write the stream to a file
IO[str]: write the stream to an open file-like object
Callable[[str], None]: stream chunks to a callback
Sequence[StreamTarget]: tee the stream to multiple destinations
The run method could then look something like:
def run(
self,
binary: OrcaBinary,
args: Sequence[str] = (),
*,
stdout: StreamTarget = None,
stderr: StreamTarget = None,
stdin: str | None = None,
cwd: Path | None = None,
timeout: int | None = None,
) -> RunResult:
...
Where RunResult exposes the captured output:
@dataclass
class RunResult:
binary: str
args: Sequence[str]
returncode: int
stdout: str
stderr: str
This removes the need for both capture and silent:
# Current silent behavior
runner.run(..., stdout=None, stderr=None)
# Write stderr to a file while still allowing internal inspection
result = runner.run(..., stderr=Path("orca_mm.err"))
# Print stdout live
result = runner.run(..., stdout=lambda chunk: print(chunk, end=""))
# Print stdout live and write it to disk
result = runner.run(
...,
stdout=[
lambda chunk: print(chunk, end=""),
Path("orca.out"),
],
)
For OrcaMmRunner.run_orca_mm, this would simplify the implementation because stderr would always be available on the returned result:
result = self.run(
OrcaBinary.ORCA_MM,
args,
stdout=stdout,
stderr=stderr,
)
if raise_on_error and result.stderr:
raise OrcaMmException(command, arguments, result.stderr)
This would decouple internal validation from user-facing output routing. Users can decide where output should go, while the runner can still inspect process output consistently.
One possible caveat is memory usage for very large outputs, since this proposal implies always retaining stdout and stderr in memory. If that becomes a concern, we could add a later escape hatch such as max_capture_bytes, but I think the simpler default API is preferable unless large-output cases become a real problem.
I am working through issue #224 and I think the current
BaseRunner.runoutput API could be simplified.My understanding of the current behavior is:
capture=Trueusessubprocess.PIPEsostdoutandstderrare returned on theCompletedProcess.silent=Trueredirects output toos.devnull, unless overridden.stdout/stderrcan be set toPathobjects to write the corresponding stream to disk.This creates a few ambiguous combinations:
capture=True, silent=Truesilently ignoressilent.capture=False, stdout=pathorstderr=pathstill redirects that stream to a file, despite capture being disabled.stdout/stderrare currently path-oriented, so users cannot pass an open file object, in-memory buffer, callback, logger, or multiple destinations.capture=Trueor write stderr to a temporary file and read it back.This is particularly awkward for
OrcaMmRunner.run_orca_mm, which needs to inspect stderr to decide whether to raiseOrcaMmException. At the moment, that requires coupling the internal error-checking behavior to the public output-routing behavior.I think the issue is that
capture,silent,stdout, andstderrare all competing controls for the same underlying concern: stream handling.Proposal
Replace the current flag-based API with a stream-target API.
The runner would always capture
stdoutandstderrinternally and return them on the result. Thestdoutandstderrarguments would only describe additional destinations for each stream.For example:
Proposed semantics:
None: no external output destinationPath: write the stream to a fileIO[str]: write the stream to an open file-like objectCallable[[str], None]: stream chunks to a callbackSequence[StreamTarget]: tee the stream to multiple destinationsThe run method could then look something like:
Where
RunResultexposes the captured output:This removes the need for both
captureandsilent:For
OrcaMmRunner.run_orca_mm, this would simplify the implementation because stderr would always be available on the returned result:This would decouple internal validation from user-facing output routing. Users can decide where output should go, while the runner can still inspect process output consistently.
One possible caveat is memory usage for very large outputs, since this proposal implies always retaining
stdoutandstderrin memory. If that becomes a concern, we could add a later escape hatch such asmax_capture_bytes, but I think the simpler default API is preferable unless large-output cases become a real problem.