This repository contains the backend asynchronous worker for the KintaGen platform. Built to handle heavy scientific computations, it combines the speed of Node.js for API routing and asynchronous job management with the analytical power of R for processing raw scientific data (like NMR and GC-MS).
It is designed to be deployed as a Docker container, often managed alongside Vercel serverless functions, Redis, and QStash, acting as the dedicated muscle for tasks that exceed standard serverless execution limits.
The worker is a Node.js Express server (`server.js`) wrapped around native R scripts. It securely receives job payloads from the KintaGen frontend (via a task queue), downloads the necessary data from IPFS (Pinata/Lighthouse) or Vercel Blob into a secure temporary environment, and spawns synchronous R child processes to perform the calculations.
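As a rough illustration of the "Node wrapper around R" idea, the sketch below runs a script synchronously with `child_process.spawnSync` and returns its standard output. It is not taken from `server.js`: the helper name `runScript`, the argument layout, the timeout, and the buffer size are all assumptions made for the example.

```javascript
const { spawnSync } = require("node:child_process");

// Hypothetical helper: runs `command scriptPath ...args` synchronously and
// returns stdout as a string. `command` defaults to "Rscript" but is a
// parameter so the helper can be exercised on a machine without R.
function runScript(scriptPath, args = [], options = {}) {
  const { command = "Rscript", timeoutMs = 10 * 60 * 1000 } = options;

  const result = spawnSync(command, [scriptPath, ...args], {
    encoding: "utf8",            // return stdout/stderr as strings
    timeout: timeoutMs,          // kill the child if it runs too long
    maxBuffer: 64 * 1024 * 1024, // base64-encoded plots can be large
  });

  // Set when the command could not be started or the timeout fired.
  if (result.error) {
    throw result.error;
  }
  // A non-zero (or null, if killed by a signal) status means the script failed.
  if (result.status !== 0) {
    throw new Error(`${command} exited with code ${result.status}: ${result.stderr}`);
  }
  return result.stdout;
}
```

A call such as `runScript("drc_analysis.R", [inputPath])` would block the event loop until R finishes, which is acceptable only if the worker handles one job at a time; an asynchronous `spawn` would be the alternative otherwise.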
- Job Reception: The `/process-job` endpoint receives a POST request containing the job ID, analysis type (`xcms`, `drc`, `nmr`), and the target file URLs. Access is strictly controlled via `WORKER_SECRET`.
- Data Fetching: The Node.js server downloads the target datasets to temporary local storage (`os.tmpdir()`).
- R Execution: The appropriate native R script is executed as a child process, parsing the data and performing complex calculations.
- Post-Processing & Identification (GC-MS): For GC-MS workflows, the Node worker intercepts the top extracted spectra and asynchronously calls the MoNA (MassBank of North America) API (`identifySinglePeak`) to match spectra against known library compounds.
- Status Reporting: Upon completion (or failure), the worker updates the Vercel/Redis status API and cleans up all local temporary files to ensure stateless execution.
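The reception, fetching, and cleanup steps above can be sketched with Node's standard library alone. This is a minimal illustration under stated assumptions, not the worker's actual code: the payload field names (`jobId`, `type`, `fileUrls`) and all three helper names are hypothetical, and how the secret reaches the server (header, body, or signature) is not specified in this README.

```javascript
const crypto = require("node:crypto");
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

const ANALYSIS_TYPES = new Set(["xcms", "drc", "nmr"]);

// Compares the presented secret with WORKER_SECRET in constant time.
// Both values are hashed first so the buffers have equal length, which
// crypto.timingSafeEqual requires.
function isAuthorized(presented, expected) {
  if (typeof presented !== "string" || typeof expected !== "string" || expected === "") {
    return false;
  }
  const a = crypto.createHash("sha256").update(presented).digest();
  const b = crypto.createHash("sha256").update(expected).digest();
  return crypto.timingSafeEqual(a, b);
}

// Validates a job payload. The field names are assumptions for this sketch.
function validateJob(payload) {
  const { jobId, type, fileUrls } = payload ?? {};
  if (typeof jobId !== "string" || jobId === "") {
    throw new Error("jobId is required");
  }
  if (!ANALYSIS_TYPES.has(type)) {
    throw new Error(`unsupported analysis type: ${type}`);
  }
  if (!Array.isArray(fileUrls) || fileUrls.length === 0) {
    throw new Error("fileUrls must be a non-empty array");
  }
  for (const u of fileUrls) new URL(u); // throws TypeError on a malformed URL
  return { jobId, type, fileUrls };
}

// Creates a per-job directory under os.tmpdir(), passes it to `fn`, and
// removes it afterwards whether `fn` resolved or threw.
async function withTempWorkspace(fn) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), "job-"));
  try {
    return await fn(dir);
  } finally {
    fs.rmSync(dir, { recursive: true, force: true });
  }
}
```

Putting the cleanup in a `finally` block is one way to get the stateless behaviour described above: the temporary directory disappears on both the success and the failure path.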
Currently, the worker natively supports three major pipelines via dedicated R scripts:
- `drc_analysis.R` (LD50): Performs classical LD₅₀ / ED₅₀ dose-response modelling using the `drc` package. Emits statistical estimates, confidence intervals, and a base64-encoded PNG plot.
- `xcms_analysis.R` (GC-MS): Executes a full untargeted GC-MS pipeline on `.mzML` or `.CDF` files leveraging the `xcms` Bioconductor package. It handles peak detection, centroiding, and retention time alignment.
- `nmr1d_analysis.R` (NMR): Processes raw 1H-NMR Bruker directories. It performs automatic phase and baseline correction, generating fitted plots and calibration zooms.
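On the Node side, two small pieces follow naturally from this list: choosing the script for a given analysis type, and turning a base64-encoded plot back into PNG bytes. The sketch below assumes that the `xcms`, `drc`, and `nmr` job types map one-to-one onto the three scripts; the helper names `scriptFor` and `decodePngPlot` are hypothetical, and the shape of the R scripts' output is not documented here.

```javascript
// Assumed mapping from a job's analysis type to the R script that handles it.
const SCRIPTS = Object.freeze({
  drc: "drc_analysis.R",
  xcms: "xcms_analysis.R",
  nmr: "nmr1d_analysis.R",
});

function scriptFor(type) {
  // hasOwnProperty guards against inherited keys such as "toString".
  if (!Object.prototype.hasOwnProperty.call(SCRIPTS, type)) {
    throw new Error(`no R script registered for analysis type: ${type}`);
  }
  return SCRIPTS[type];
}

// Every PNG file starts with these eight bytes.
const PNG_SIGNATURE = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

// Decodes a base64 string (optionally prefixed as a data URL) into a Buffer
// and checks that the result looks like a PNG.
function decodePngPlot(encoded) {
  const base64 = encoded.replace(/^data:image\/png;base64,/, "");
  const bytes = Buffer.from(base64, "base64");
  if (!bytes.subarray(0, 8).equals(PNG_SIGNATURE)) {
    throw new Error("payload is not a PNG image");
  }
  return bytes;
}
```

Checking the signature is a cheap sanity test before storing or forwarding a plot; it does not validate the rest of the image.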
Because scientific R packages often require complex system-level dependencies, this API is packaged in a Docker image based on `rocker/r-ver:4.3.2`.
The Dockerfile handles the installation of:
- System Libraries: `libnetcdf-dev`, `libhdf5-dev`, `libglpk-dev`, `libfftw3-dev` (essential for mass-spec and Fourier transform operations).
- CRAN & Bioconductor Packages: Standard scientific packages including `xcms`, `drc`, `ggplot2`, and `Rnmr1D` (installed directly from GitHub).
- Node.js Runtime: Integrates Node 20.x alongside R to run the Express API wrapper safely within the same container.
To run this worker locally for development or testing:
- Environment Setup: Create a `.env` file that follows the `.env.example` structure in the main application. At minimum you will need a dummy `WORKER_SECRET` and a `PORT`.
- Install Node Dependencies: Run `npm install`.
- Install R Dependencies: Ensure you have R (v4.3+) installed locally with the required packages (`drc`, `xcms`, `BiocManager`, etc.) as defined in the Dockerfile.
- Start the Server: Run `node server.js`.
The API will expose a `/healthcheck` route and the secured `/process-job` endpoint on the configured port.
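Once the server is running, a quick smoke test can be sketched with Node's built-in `fetch` (available in Node 18 and later). Several details here are assumptions rather than documented behaviour: that the secret is sent as a Bearer token in the `Authorization` header, that the body uses `jobId`, `type`, and `fileUrls` fields, and the example file URL. Check `server.js` for the real contract.

```javascript
// Builds the URL and fetch options for submitting a job to the worker.
// The header scheme and the body field names are assumptions.
function buildJobRequest(baseUrl, secret, job) {
  return {
    url: new URL("/process-job", baseUrl).toString(),
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${secret}`,
      },
      body: JSON.stringify(job),
    },
  };
}

// Calls /healthcheck, then submits a dummy job and logs both status codes.
async function smokeTest(baseUrl, secret) {
  const health = await fetch(new URL("/healthcheck", baseUrl));
  console.log("healthcheck:", health.status);

  const { url, options } = buildJobRequest(baseUrl, secret, {
    jobId: "local-test-1",
    type: "drc",
    fileUrls: ["https://example.com/dose-response.csv"], // placeholder URL
  });
  const res = await fetch(url, options);
  console.log("process-job:", res.status);
}
```

For a local run, call `smokeTest` with the base URL matching your `PORT` and the value of `WORKER_SECRET` from your `.env` file.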