Specification

Louis VASLIN edited this page Jan 9, 2026 · 2 revisions

Model Specification

The model for Unsupervised Defect Detection is built as a deep convolutional Auto-Encoder (AE).
The figure below shows a global schematic of the model architecture.

[Figure: global schematic]

The details of the encoding and decoding blocks are shown in the following schematic.

[Figure: blocks schematic]

The latent space is produced by a final convolution layer with a stride of 2.
Its output is then passed through a transposed convolution layer, also with a stride of 2, before being fed into the first decoding block.
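
The interplay between the stride-2 convolution and the stride-2 transposed convolution can be illustrated with the standard output-size formulas for these layers (dilation 1). This is only a sketch: the function names, the input sizes and the `padding` value are assumptions chosen for illustration and are not taken from the model code.

```python
def conv_out(size, kernel, stride=2, padding=0):
    """Spatial output size of a strided convolution (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_transpose_out(size, kernel, stride=2, padding=0, output_padding=0):
    """Spatial output size of a transposed convolution (dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


def mirror_output_padding(size, kernel, stride=2, padding=0):
    """Return (latent_size, output_padding) such that the transposed
    convolution restores the original spatial size."""
    if size + 2 * padding < kernel:
        raise ValueError("input is smaller than the kernel")
    latent = conv_out(size, kernel, stride, padding)
    # Whatever the strided convolution dropped must be added back
    # by the output padding of the transposed convolution.
    opad = size - conv_transpose_out(latent, kernel, stride, padding)
    return latent, opad


# Hypothetical sizes: a 3x3 kernel with padding 1 (assumed values).
print(mirror_output_padding(64, kernel=3, padding=1))  # even input size
print(mirror_output_padding(65, kernel=3, padding=1))  # odd input size
```

With these assumed values, an even input size needs an output padding of 1 to be restored exactly, while an odd input size needs none. The appropriate output padding therefore depends on the image size and layer settings actually used.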

Most of the hyperparameters of the model must be defined in a JSON-like configuration file before training.
The configuration format is detailed in the next section.

Configuration specification

The JSON-like file must contain the following fields to fully define a model:

  • block_ker
    The kernel size of the convolution layers used in each encoding block.
    The decoding blocks follow the same list of kernel sizes in reverse order.
    Example: 'block_ker': [5, 5] → 2 encoding blocks with kernel size 5x5

  • drop
    The dropout rate to apply in each encoding block.
    The decoding blocks follow the same list of dropout rates in reverse order.
    Example: 'drop': [0.01, 0.01] → 2 encoding blocks with a dropout rate of 1%

  • block_size_in
    The number of features for each convolution layer in the encoding blocks.
    Each block must have 3 separate numbers.
    The first is the number of input features (input size of the first layer).
    The second is the number of intermediate features (output size of the first layer).
    The last is the number of output features (output size of the second layer).
    Example: 'block_size_in': [[3, 25, 20], [20, 15, 10]] → 2 encoding blocks

  • block_size_out
    The number of features for each convolution layer in the decoding blocks.
    It can differ from the sizes defined for the encoding blocks (asymmetric AE).
    The specification is the same as for block_size_in.
    Example: 'block_size_out': [[5, 10, 15], [15, 20, 3]] → 2 decoding blocks

  • latent_size
    The number of output features of the latent space convolution layer.
    Example: 'latent_size': 3 → latent space with feature size of 3

  • latent_ker
    The kernel size of the latent space convolution layer.
    Example: 'latent_ker': 3 → kernel size of latent space convolution is 3x3

  • latent_opad
    The output padding of the transposed convolution layer of the latent space.
    It must be set so that the output shape of this layer mirrors the input shape of the latent space convolution.
    Example: 'latent_opad': [1, 1]

  • out_pad
    The output padding of each decoding block.
    It must be set so that the output shape of each decoding block mirrors the input shape of the corresponding encoding block.
    Example: 'out_pad': [[1, 1], [1, 1]]
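
The per-field examples above can be assembled into one complete configuration. The sketch below does so and runs a few consistency checks. The helpers `check_config` and `decoder_settings` are hypothetical and not part of the project. The checks encode assumptions inferred from the examples: every per-block list has one entry per block, and the output features of a block match the input features of the next one. The key `latent_opad` is spelled as in its example. The source does not state whether the loader requires strict JSON, so the double-quoted output of `json.dumps` is only one possible on-disk form.

```python
import json

# Illustrative configuration built from the per-field examples above.
config = {
    "block_ker": [5, 5],
    "drop": [0.01, 0.01],
    "block_size_in": [[3, 25, 20], [20, 15, 10]],
    "block_size_out": [[5, 10, 15], [15, 20, 3]],
    "latent_size": 3,
    "latent_ker": 3,
    "latent_opad": [1, 1],
    "out_pad": [[1, 1], [1, 1]],
}

REQUIRED = ["block_ker", "drop", "block_size_in", "block_size_out",
            "latent_size", "latent_ker", "latent_opad", "out_pad"]


def check_config(cfg):
    """Return a list of problems found in cfg (empty list if none)."""
    missing = [key for key in REQUIRED if key not in cfg]
    if missing:
        return [f"missing fields: {missing}"]
    problems = []
    n_blocks = len(cfg["block_ker"])
    # Assumption: every per-block list has one entry per block.
    for key in ("drop", "block_size_in", "block_size_out", "out_pad"):
        if len(cfg[key]) != n_blocks:
            problems.append(
                f"{key} has {len(cfg[key])} entries, expected {n_blocks}")
    for key in ("block_size_in", "block_size_out"):
        blocks = cfg[key]
        if any(len(block) != 3 for block in blocks):
            problems.append(f"{key}: each block needs exactly 3 numbers")
            continue
        # Assumption: output features of a block feed the next block.
        for i in range(len(blocks) - 1):
            if blocks[i][2] != blocks[i + 1][0]:
                problems.append(f"{key}: block {i} output features do not "
                                f"match block {i + 1} input features")
    return problems


def decoder_settings(cfg):
    """Kernel sizes and dropout rates of the decoding blocks,
    i.e. the encoder lists in reverse order."""
    return cfg["block_ker"][::-1], cfg["drop"][::-1]


print(check_config(config))
print(decoder_settings(config))
print(json.dumps(config, indent=2))
```

A check of this kind is cheap to run before training and catches mismatched list lengths early, before they surface as shape errors inside the model.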
