- for ad hoc workflows and tools: docker, singularity, make
- for lightweight applications: Jupyter notebooks
- for ad hoc pipelines: shell scripts in github
- for (routine) pipelines: snakemake, Galaxy, nextflow (with tools from bioconda)
- for software tools: bioconda/biocontainers
- ...
- VM hypervisors are heavy in terms of system requirements
- a container is a small, neat capsule containing your application
- enables continuous integration (CI) and continuous delivery (CD)
- containers give you instant application portability and easy deployment to the cloud
- make applications and workloads more portable in an effective, standardized, and repeatable way

pros and cons of virtual machines (VMs)
- ++ very similar to a full OS
- ++ high OS diversity (many different guest OSes can run on one host)
- -- need more disk space and resources
- -- slower than containers
- -- harder to automate
pros and cons of containers
- ++ faster
- ++ no need for a full OS
- ++ recipes are easy to distribute; high portability
- ++ easy to automate
- -- still OS-dependent solutions
- -- not a real OS in some cases
- platform for developing, shipping, and running applications
- infrastructure as application/code
- Open Container Initiative
- Docker community edition
- read-only templates
- containers are run from them
- images are not run
- can be built from existing images
- common base images: ubuntu, alpine
- base images can be created with tools such as debootstrap
- each instruction that modifies the base image (e.g. RUN, COPY, ADD) adds a new layer (tip: chain commands with && inside a single RUN)
- images have several layers
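A minimal sketch of the `&&` tip, assuming an `ubuntu:16.04` base and `wget` as a stand-in package. Because every `RUN` produces its own read-only layer, files removed by a later `RUN` still occupy space in the layer that created them; chaining the commands keeps download, install and cleanup in one layer.

```dockerfile
FROM ubuntu:16.04

# Anti-pattern (kept as comments): three RUN instructions create three layers,
# and the apt package lists fetched by the first one stay in the image
# even though the third one deletes them.
#   RUN apt-get update
#   RUN apt-get install -y wget
#   RUN rm -rf /var/lib/apt/lists/*

# Preferred: one RUN, one layer, so the cleanup really shrinks the image.
RUN apt-get update && \
    apt-get install -y --no-install-recommends wget && \
    rm -rf /var/lib/apt/lists/*
```

Building each variant and running `docker history <image>` on it lists the layers and their sizes, which makes the difference visible.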
- Recipe: Dockerfile
- Instructions
- FROM
- ADD, COPY
- RUN
- ENV PATH, ARG
- USER, WORKDIR, LABEL
- VOLUME, EXPOSE
- CMD, (ENTRYPOINT)
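A minimal sketch that uses each instruction from the list once. `mytool`, `mytool.py` (assumed to be a Python 3 script with a `#!/usr/bin/env python3` shebang, sitting in the build context), `TOOL_VERSION` and the user `biouser` are hypothetical names for illustration only.

```dockerfile
# FROM: pinned base image
FROM ubuntu:16.04

# ARG: build-time variable (hypothetical version, override with --build-arg)
ARG TOOL_VERSION=1.0.0

# LABEL: metadata stored in the image
LABEL software="mytool" \
      software.version="${TOOL_VERSION}"

# RUN: executed at build time; each RUN creates a layer
RUN apt-get update && \
    apt-get install -y --no-install-recommends python3 && \
    rm -rf /var/lib/apt/lists/*

# COPY: add a file from the build context
# (ADD can additionally fetch remote URLs and auto-extract local tar archives)
COPY mytool.py /opt/mytool/bin/mytool
RUN chmod 755 /opt/mytool/bin/mytool

# ENV: persistent environment variable; extending PATH makes `mytool` callable by name
ENV PATH=/opt/mytool/bin:$PATH

# USER, WORKDIR: run as an unprivileged user inside /data
RUN useradd --create-home biouser
USER biouser
WORKDIR /data

# VOLUME: mount point for data kept outside the image
VOLUME ["/data"]

# EXPOSE: documents a listening port; only meaningful for network services,
# shown here for completeness
EXPOSE 8080

# CMD: default command, replaced by any command given to `docker run`
CMD ["mytool", "--help"]
```

Such an image would be built with, for example, `docker build --build-arg TOOL_VERSION=1.0.1 -t mytool:1.0.1 .`.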
- start from packages, e.g. pip/PyPI, CPAN, or CRAN
- use versions for tools and images
- use ENV PATH instead of ENTRYPOINT
- reduce size as much as possible
- keep data outside the image/container
- check the license
- make your container discoverable, e.g. via BioContainers, quay.io, Docker Hub
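A sketch applying several of these recommendations, assuming a Python tool distributed on PyPI; `biopython==1.79` is only an example of a pinned package and `/opt/tools/bin` is a hypothetical install prefix.

```dockerfile
# Small base image with a pinned tag instead of a full OS image or "latest"
FROM python:3.9-slim

# Start from a package (PyPI) and pin its version; skip the pip cache to save space
RUN pip install --no-cache-dir biopython==1.79

# pip places console scripts in /usr/local/bin, which is already on PATH.
# Tools unpacked elsewhere (hypothetical prefix below) are exposed by
# extending PATH rather than by fixing an ENTRYPOINT.
ENV PATH=/opt/tools/bin:$PATH

# Data is mounted at run time, not copied into the image
WORKDIR /data
VOLUME ["/data"]
```

With no `ENTRYPOINT`, whatever follows the image name in `docker run` is executed, so every tool on `PATH` stays reachable, e.g. `docker run --rm -v "$(pwd)":/data <image> python -c "import Bio; print(Bio.__version__)"`.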
################## BASE IMAGE ######################
FROM biocontainers/biocontainers:v1.0.0_cv4
################## METADATA ######################
LABEL base_image="biocontainers:v1.0.0_cv4"
LABEL version="2"
LABEL software="NCBI BLAST+"
LABEL software.version="2.2.31"
LABEL about.summary="basic local alignment search tool"
LABEL about.home="http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastHome"
LABEL about.documentation="http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastHome"
LABEL about.license_file="https://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/lxr/source/scripts/projects/blast/LICENSE"
LABEL about.license="SPDX:MIT"
LABEL extra.identifiers.biotools="BLAST"
LABEL about.tags="Genomics"
################## MAINTAINER ######################
LABEL maintainer="Saulo Alves Aflitos <sauloal@gmail.com>"
################## INSTALLATION ######################
RUN conda install -y blast=2.2.31
WORKDIR /data/
Dockerfile of the BioContainers base image (from the BioContainers repository):
# Base image
FROM ubuntu:16.04
################## METADATA ######################
LABEL base_image="ubuntu:16.04"
LABEL version="4"
LABEL software="Biocontainers base Image"
LABEL software.version="1.0.0"
LABEL about.summary="Base image for BioDocker"
LABEL about.home="http://biocontainers.pro"
LABEL about.documentation="https://github.com/BioContainers/specs/wiki"
LABEL about.license_file="https://github.com/BioContainers/containers/blob/master/LICENSE"
LABEL about.license="SPDX:Apache-2.0"
LABEL about.tags="Genomics,Proteomics,Transcriptomics,General,Metabolomics"
################## MAINTAINER ######################
LABEL maintainer="Felipe da Veiga Leprevost <felipe@leprevost.com.br>"
ENV DEBIAN_FRONTEND=noninteractive
RUN mv /etc/apt/sources.list /etc/apt/sources.list.bkp && \
bash -c 'echo -e "deb mirror://mirrors.ubuntu.com/mirrors.txt xenial main restricted universe multiverse\n\
deb mirror://mirrors.ubuntu.com/mirrors.txt xenial-updates main restricted universe multiverse\n\
deb mirror://mirrors.ubuntu.com/mirrors.txt xenial-backports main restricted universe multiverse\n\
deb mirror://mirrors.ubuntu.com/mirrors.txt xenial-security main restricted universe multiverse\n\n" > /etc/apt/sources.list' && \
cat /etc/apt/sources.list.bkp >> /etc/apt/sources.list && \
cat /etc/apt/sources.list
RUN apt-get clean && \
apt-get update && \
apt-get upgrade -y && \
apt-get install -y \
autotools-dev \
automake \
cmake \
curl \
grep \
sed \
dpkg \
fuse \
git \
wget \
zip \
openjdk-8-jre \
build-essential \
pkg-config \
python \
python-dev \
python-pip \
bzip2 \
ca-certificates \
libglib2.0-0 \
libxext6 \
libsm6 \
libxrender1 \
mercurial \
subversion \
zlib1g-dev && \
apt-get autoremove -y --purge && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
RUN echo 'export PATH=/opt/conda/bin:$PATH' > /etc/profile.d/conda.sh && \
wget --quiet https://repo.continuum.io/miniconda/Miniconda2-4.0.5-Linux-x86_64.sh -O ~/miniconda.sh && \
/bin/bash ~/miniconda.sh -b -p /opt/conda && \
rm ~/miniconda.sh
RUN TINI_VERSION=`curl https://github.com/krallin/tini/releases/latest | grep -o "/v.*\"" | sed 's:^..\(.*\).$:\1:'` && \
curl -L "https://github.com/krallin/tini/releases/download/v${TINI_VERSION}/tini_${TINI_VERSION}.deb" > tini.deb && \
dpkg -i tini.deb && \
rm tini.deb && \
apt-get clean
RUN mkdir /data /config
# Add user biodocker with password biodocker
RUN groupadd fuse && \
useradd --create-home --shell /bin/bash --user-group --uid 1000 --groups sudo,fuse biodocker && \
echo "biodocker:biodocker" | chpasswd && \
chown biodocker:biodocker /data && \
chown biodocker:biodocker /config
# give the biodocker user write access to the conda folder
RUN chown -R biodocker:biodocker /opt/conda/
# Change user
USER biodocker
ENV PATH=$PATH:/opt/conda/bin
ENV PATH=$PATH:/home/biodocker/bin
ENV HOME=/home/biodocker
RUN mkdir /home/biodocker/bin
RUN conda config --add channels r && \
conda config --add channels bioconda && \
conda upgrade -y conda
VOLUME ["/data", "/config"]
# Overwrite this with 'CMD []' in a dependent Dockerfile
CMD ["/bin/bash"]
WORKDIR /data
$ cd /home/user/workplace
$ docker pull biocontainers/blast
$ docker run biocontainers/blast blastp -help
$ wget http://www.uniprot.org/uniprot/P04156.fasta
$ curl -O ftp://ftp.ncbi.nih.gov/refseq/D_rerio/mRNA_Prot/zebrafish.1.protein.faa.gz
$ gunzip zebrafish.1.protein.faa.gz
$ docker run -v /home/user/workplace:/data/ biocontainers/blast makeblastdb -in zebrafish.1.protein.faa -dbtype prot
$ docker run -v /home/user/workplace:/data/ biocontainers/blast blastp -query P04156.fasta -db zebrafish.1.protein.faa -out results.txt
- Carefully define a set of tools for a given analysis
- Use tools from the Bioconda registry
- Adopt containers to guarantee consistency of results
- Use virtualization to make analyses “resistant to time”.
- impact of docker containers on performance
- container-based virtualization for HPC environments
- recommendations on containers
- practical computational reproducibility in life sciences
- software development + software operations
- automate and monitor


