backend/processing/segmenter: Simplify deployment/usage for batch processing #378

@ethanjli

Description

This is a tracking issue for our project to make the PlanktoScope segmenter easier to deploy/use for batch processing outside the PlanktoScope's Raspberry Pi.

Motivation

Currently, the segmenter is implemented as a worker/server which requires an MQTT broker, as well as something to generate an MQTT command for the segmenter. As a result, deploying the PlanktoScope segmenter via https://github.com/PlanktoScope/pallet-segmenter is somewhat complicated, since the deployment has to bring up both an MQTT broker and a GUI to generate that command. This is unnecessary complexity for a user who just wants to process some datasets without a GUI. It may also make deployment on HPC clusters more challenging, because at a minimum port 1883 must be available for MQTT communication, a requirement that batch processing does not inherently need. If we could initiate batch processing by launching the segmenter differently (e.g. with different command-line arguments), we could avoid these complexities and constraints in a common use-case: running the segmenter headlessly outside the PlanktoScope's Raspberry Pi.

The relevance of this use-case is reflected by what Katie Crider & Margaret Mulholland are trying to do with PlanktoScopes for HABs monitoring (they want to batch-process datasets on an HPC cluster), and the issue of unnecessary complexity with MQTT for running the segmenter on other computers is validated by the changes Salima Rafai made in her version of the segmenter to run as a Python script without depending on MQTT.

Goals

  • Make it possible to invoke the segmenter for batch processing via command-line arguments, without any MQTT involved
  • Make it possible to run the segmenter for batch processing in a Jupyter Notebook
  • Enable multiple instances of the segmenter to be launched for batch-processing different directories in parallel
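
To make the second and third goals more concrete, here is a rough sketch of what an MQTT-free Python API could look like when called from a notebook or script. Everything in it is an assumption for illustration: `segment_directory`, `segment_many`, and the returned dictionary are hypothetical names, not existing segmenter functions, and the actual image processing is replaced by a placeholder that only counts image files.

```python
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def segment_directory(input_dir, output_dir):
    """Hypothetical entry point: process one dataset directory, no MQTT.

    The real image-processing code would replace the placeholder below;
    here we only count the images so that the sketch stays runnable.
    """
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    images = sorted(
        p for p in input_dir.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES
    )
    # Placeholder for the actual segmentation of each image.
    return {"input_dir": str(input_dir), "images": len(images)}


def segment_many(input_dirs, output_root, max_workers=None,
                 executor_cls=ProcessPoolExecutor):
    """Process several directories in parallel, one task per directory.

    Each output directory is named after its input directory, so this
    sketch assumes the input directory names are unique.
    """
    input_dirs = [Path(d) for d in input_dirs]
    output_dirs = [Path(output_root) / d.name for d in input_dirs]
    with executor_cls(max_workers=max_workers) as pool:
        # map() returns results in the same order as input_dirs.
        return list(pool.map(segment_directory, input_dirs, output_dirs))
```

In a notebook, a single dataset could then be processed with `segment_directory("dataset-a", "out/dataset-a")`, and several with `segment_many(["dataset-a", "dataset-b"], "out")`. On an HPC cluster, the same effect could instead be achieved by launching one independent process per directory through the scheduler.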

Steps

  • Refactor the segmenter to separate MQTT from image-processing functionality
  • Make a nice Python API for image-processing functionality
  • Refactor the segmenter so that it can be launched as either a worker/server or a batch-processing command (i.e. make a command-line interface for the segmenter)
  • Ensure that the segmenter won't get blocked if nothing is receiving its object preview MJPEG stream (or if it does get blocked, fix that!)
  • Make it possible to install & launch the segmenter via pipx? (though maybe system libraries would still be needed for numpy, opencv, etc.)
  • ???
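
As a possible starting point for the command-line interface step, the sketch below uses `argparse` subcommands to choose between the two launch modes. The subcommand names (`batch`, `worker`), the option names, and the `run_batch`/`run_worker` functions are all hypothetical placeholders, not part of the current segmenter; only the default MQTT port 1883 comes from the discussion above.

```python
import argparse
from pathlib import Path


def run_batch(input_dir, output_dir):
    """Placeholder for the MQTT-free image-processing API (hypothetical)."""
    return f"batch: {input_dir} -> {output_dir}"


def run_worker(mqtt_host, mqtt_port):
    """Placeholder for the existing MQTT worker/server mode (hypothetical)."""
    return f"worker: waiting for commands from {mqtt_host}:{mqtt_port}"


def build_parser():
    parser = argparse.ArgumentParser(prog="segmenter")
    # A subcommand is mandatory, so the launch mode is always explicit.
    sub = parser.add_subparsers(dest="mode", required=True)

    batch = sub.add_parser("batch", help="process one dataset directory and exit")
    batch.add_argument("input_dir", type=Path)
    batch.add_argument("output_dir", type=Path)

    worker = sub.add_parser("worker", help="wait for commands over MQTT")
    worker.add_argument("--mqtt-host", default="localhost")
    worker.add_argument("--mqtt-port", type=int, default=1883)
    return parser


def main(argv=None):
    # argv=None makes argparse read sys.argv; passing a list helps testing.
    args = build_parser().parse_args(argv)
    if args.mode == "batch":
        return run_batch(args.input_dir, args.output_dir)
    return run_worker(args.mqtt_host, args.mqtt_port)
```

With this layout, `main(["batch", "in", "out"])` never touches any MQTT code path, so batch mode would need neither a broker nor an open port. MQTT-specific imports could also be deferred into `run_worker` so that a batch-only installation does not require an MQTT client library.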

Unresolved Questions

  • Maybe it would also be useful to launch the segmenter to handle HTTP requests instead of MQTT commands? I haven't yet seen any concrete use-case (or any request for help with such a use-case) where this would be the simplest solution though, so this doesn't seem like an important thing to do yet.
