Basic principle

Jean-Baptiste Lugagne edited this page Aug 16, 2018 · 6 revisions

We describe the basic steps to follow in the training_script.m and prediction_script.m Matlab scripts. You can use the training sets and stacks from our Zenodo archive for testing purposes, or you can use your own Z-stacks.

Z-pixels and signatures

We define a z-pixel as the intensity of a given pixel throughout all frames of a Z-stack. We noted that the "signature" of a z-pixel differs depending on the object being observed. The basic idea is that by manually labelling z-pixels in Z-stacks and then feeding those training sets into machine learning classifiers, we can predict the labelling of other datasets for image segmentation purposes.
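The notion of a z-pixel signature can be sketched as follows. The actual scripts are Matlab; this is an illustrative Python/NumPy sketch with hypothetical array shapes (20 frames of 64x64 pixels):

```python
import numpy as np

# Hypothetical Z-stack: 20 frames of 64x64 pixels, indexed (frame, row, col).
rng = np.random.default_rng(0)
stack = rng.random((20, 64, 64))

def zpixel(stack, i, j):
    """Return the 'signature' of pixel (i, j): its intensity across all frames."""
    return stack[:, i, j]

sig = zpixel(stack, 10, 30)
print(sig.shape)  # one value per frame -> (20,)
```

Each such signature becomes one observation for the classifier, with one feature per frame in the stack.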

Therefore the first step for the user will be to design training sets using our custom graphical interface.

Preprocessing and principal component analysis

Once manually or semi-automatically labelled training sets have been generated, data can be extracted and formatted for training.

Before formatting, preprocessing functions can be applied to the frames in the Z-stacks: through various possible operations, this incorporates information about the neighborhood of each z-pixel. Each preprocessing operation multiplies the number of elements per z-pixel, which can cause memory issues. We therefore advise downsampling the data used for PCA estimation.
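How preprocessing grows the feature count, and how downsampling keeps PCA tractable, can be sketched like this (Python/NumPy for illustration; the 3x3 mean filter, stack shape, and downsampling factor are all hypothetical examples, not the tool's actual operations):

```python
import numpy as np

rng = np.random.default_rng(1)
stack = rng.random((20, 64, 64))          # (frames, rows, cols)

def mean3x3(frame):
    """Crude 3x3 neighborhood mean via shifted copies (illustrative only)."""
    acc = np.zeros_like(frame)
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            acc += np.roll(np.roll(frame, di, axis=0), dj, axis=1)
    return acc / 9.0

# Each preprocessing operation adds another copy of the stack, so a z-pixel's
# feature vector grows from n_frames to n_frames * (1 + n_ops):
smoothed = np.stack([mean3x3(f) for f in stack])
features = np.concatenate([stack, smoothed], axis=0)   # (40, 64, 64)

# Flatten to (n_zpixels, n_features), then downsample for PCA estimation:
X = features.reshape(features.shape[0], -1).T          # (4096, 40)
X_sub = X[::16]                                        # keep every 16th z-pixel
print(X.shape, X_sub.shape)
```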

Example of preprocessing operations

Principal component analysis (PCA) is then performed on the extracted data to reduce its dimensionality. The user specifies the number of principal components to keep, which retains a large percentage of the information in the datasets while drastically reducing the number of components per observation. This procedure is common in hyperspectral image analysis, which inspired our approach to microscopy images, as a way to circumvent the curse of dimensionality. Other feature extraction techniques exist, but PCA is probably the easiest to implement and use; exploring alternatives would be a clear axis of improvement.
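The PCA step can be sketched in a few lines (Python/NumPy stands in for the Matlab implementation; the data and the choice of 3 components are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.random((4096, 40))   # (n_zpixels, n_features), e.g. after preprocessing

def pca_fit_transform(X, n_components):
    """Project observations onto the top principal components (SVD-based PCA)."""
    Xc = X - X.mean(axis=0)                       # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = (S ** 2) / (S ** 2).sum()          # variance ratio per component
    return Xc @ Vt[:n_components].T, explained[:n_components].sum()

scores, kept_variance = pca_fit_transform(X, 3)
print(scores.shape)  # (4096, 3)
```

Each z-pixel now has 3 features instead of 40, and `kept_variance` reports the fraction of total variance retained by the components kept.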

Example of class clustering on our datasets with only the first 3 principal components

Training

Once data has been extracted and formatted, machine learning classifiers can be trained on it. We currently support two types of classifiers, Support Vector Machines and Random Forests.

While the master branch of this repository is dedicated to SVM-based classification, we recently observed that Random Forests consistently outperform SVMs in terms of computational time. We therefore recommend that users switch to the Random Forests branch. The two branches are maintained in parallel, with updates to one usually patched into the other immediately, unless they specifically concern the classifiers themselves.
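The training step can be sketched with either classifier family. The actual training happens in Matlab; this is a scikit-learn sketch on synthetic labelled z-pixel scores (the class means, component count, and hyperparameters are hypothetical):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(3)
# Hypothetical PCA scores for labelled z-pixels: two classes, 3 components each.
X = np.concatenate([rng.normal(0, 1, (200, 3)), rng.normal(4, 1, (200, 3))])
y = np.array([0] * 200 + [1] * 200)

# The two supported classifier types, Random Forests and SVMs:
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
svm = SVC(kernel="rbf").fit(X, y)
print(rf.score(X, y), svm.score(X, y))
```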

Since the software was initially developed with SVMs in mind, users on the random forests branch might run into confusing messages that mention SVMs. We will remove those over time.

Once the classifiers have been trained, they can be saved to disk and later be used for prediction. The user can also run evaluation scripts to try to optimize certain parameters.

Prediction

With a trained classifier the user can launch predictions on stacks, either after the experiment has been performed or on-the-fly, as stacks are being acquired.

We developed a GUI that simplifies the prediction step, although it can be easily scripted for more flexibility.
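The prediction step amounts to classifying every z-pixel of a new stack and folding the labels back into an image. A scikit-learn/NumPy sketch (the toy classifier stands in for one loaded from disk; all shapes are hypothetical):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(4)
# Toy trained classifier (stands in for one previously saved to disk):
X_train = np.concatenate([rng.normal(0, 1, (100, 5)), rng.normal(3, 1, (100, 5))])
y_train = np.array([0] * 100 + [1] * 100)
clf = RandomForestClassifier(n_estimators=25, random_state=0).fit(X_train, y_train)

# New stack: flatten z-pixels to rows, predict, reshape into a label image.
stack = rng.normal(0, 1, (5, 32, 32))           # (frames, rows, cols)
X_new = stack.reshape(stack.shape[0], -1).T     # (1024 z-pixels, 5 features)
labels = clf.predict(X_new).reshape(32, 32)
print(labels.shape)  # (32, 32)
```

The same loop can run on-the-fly by feeding each stack to `clf.predict` as soon as it is acquired.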

Segmentation

Although we do not provide solutions for segmentation, in many cases a simple watershed step will suffice to segment cells after the stacks have been classified. We will provide a few simple example scripts soon.
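As a minimal sketch of what post-classification segmentation can look like: for cells that do not touch, connected-component labelling of the classified mask is already enough (a full watershed, e.g. skimage.segmentation.watershed, would additionally split touching cells). The mask below is a hypothetical toy example:

```python
import numpy as np
from scipy import ndimage

# Hypothetical classified image: 1 = cell interior, 0 = background.
mask = np.zeros((10, 10), dtype=int)
mask[1:4, 1:4] = 1     # first cell
mask[6:9, 5:9] = 1     # second cell

# Assign a distinct integer label to each connected blob of 1s:
labels, n_cells = ndimage.label(mask)
print(n_cells)  # 2
```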
