Interest Group Meeting

Containers and Workflows

25.01.2019 Clemenspoort


what options are available


  • for ad hoc workflows and tools: Docker, Singularity, make
  • for lightweight applications: Jupyter notebooks
  • for ad hoc pipelines: shell scripts on GitHub
  • for (routine) pipelines: Snakemake, Galaxy, Nextflow (with tools from Bioconda)
  • for software tools: Bioconda/BioContainers
  • ...

So why does everyone love containers and Docker?


  • VM hypervisors are heavy in terms of system requirements
  • a container is a small, neat capsule holding your application
  • containers enable CI and CD
  • containers give you instant application portability and are easy to deploy in a cloud
  • they make applications and workloads more portable in an effective, standardized, and repeatable way

Figure: containers vs. VMs (source: https://www.zdnet.com/article/what-is-docker-and-why-is-it-so-darn-popular/)


virtualisation

pros and cons

  • ++ very similar to a full OS
  • ++ high OS diversity
  • -- needs more space and resources
  • -- slower than containers
  • -- harder to automate

containers

pros and cons

  • ++ faster
  • ++ no need for full OS
  • ++ easy distribution of recipes, high portability
  • ++ easy to automate
  • -- still OS-dependent solutions
  • -- not a real OS in some cases

Docker


  • platform for developing, shipping, and running applications
  • infrastructure as application/code
  • Open Container Initiative
  • Docker community edition

Docker components


Docker image

  • read-only templates
  • containers are run from them
  • images are not run

Docker image - building

  • can be built from existing images
    • ubuntu, alpine
  • base images can be created with tools such as Debootstrap
  • each instruction that modifies the base image adds a new layer (tip: chain commands with && in one RUN)
  • images have several layers
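
The && tip can be illustrated with a small sketch. The shell snippet below writes a hypothetical Dockerfile (the file name, the ubuntu:16.04 base and the curl package are assumptions chosen for illustration) in which update, install and cleanup share one RUN instruction, so they end up in a single layer and the apt cache is never stored in an earlier one.

```shell
# Write an illustrative Dockerfile; base image and package are assumptions.
cat > Dockerfile.layers <<'EOF'
FROM ubuntu:16.04

# One RUN instruction, hence one layer: update, install and clean up together.
RUN apt-get update && \
    apt-get install -y curl && \
    rm -rf /var/lib/apt/lists/*
EOF

# Build it (requires a running Docker daemon):
#   docker build -f Dockerfile.layers -t layers-demo .
```

Splitting the three commands into three RUN instructions would give the same file system but three layers, and the files deleted in the last step would still take up space in the first two.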

Docker image - instructions

  • Recipe: Dockerfile
  • Instructions
    • FROM
    • ADD, COPY
    • RUN
    • ENV PATH, ARG
    • USER, WORKDIR, LABEL
    • VOLUME, EXPOSE
    • CMD, (ENTRYPOINT)

Reference
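
As a rough sketch of how these instructions fit together, the snippet below writes a skeleton Dockerfile. Everything specific in it is a placeholder invented for illustration: the tool name mytool, the script mytool.sh, the user analyst and the version number.

```shell
# Write a hypothetical skeleton Dockerfile; "mytool" and its paths are placeholders.
cat > Dockerfile.skeleton <<'EOF'
# Base image to start from
FROM ubuntu:16.04
# Build-time variable, overridable with --build-arg
ARG TOOL_VERSION=1.0.0
# Metadata attached to the image
LABEL software="mytool" software.version="${TOOL_VERSION}"
# Copy a file from the build context into the image
COPY mytool.sh /opt/mytool/bin/mytool
# Run commands at build time (one layer)
RUN chmod +x /opt/mytool/bin/mytool && useradd --create-home analyst
# Make the tool available on the PATH
ENV PATH=/opt/mytool/bin:$PATH
# Drop root for everything that follows
USER analyst
# Default working directory and mount point for data
WORKDIR /data
VOLUME ["/data"]
# Default command when the container starts
CMD ["mytool", "--help"]
EOF

# Build it next to a mytool.sh script (requires a running Docker daemon):
#   docker build -f Dockerfile.skeleton -t mytool:1.0.0 .
```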


One tool, one image

  • start from packages e.g. pip/PyPI, CPAN, or CRAN
  • use versions for tools and images
  • use ENV PATH instead of ENTRYPOINT
  • reduce size as much as possible
  • keep data outside the image/container
  • check the license
  • make your container discoverable e.g. biocontainers, quay.io, docker hub

Example

################## BASE IMAGE ######################

FROM biocontainers/biocontainers:v1.0.0_cv4

################## METADATA ######################

LABEL base_image="biocontainers:v1.0.0_cv4"
LABEL version="2"
LABEL software="NCBI BLAST+"
LABEL software.version="2.2.31"
LABEL about.summary="basic local alignment search tool"
LABEL about.home="http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastHome"
LABEL about.documentation="http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastHome"
LABEL about.license_file="https://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/lxr/source/scripts/projects/blast/LICENSE"
LABEL about.license="SPDX:MIT"
LABEL extra.identifiers.biotools="BLAST"
LABEL about.tags="Genomics"

################## MAINTAINER ######################

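# MAINTAINER is deprecated in current Docker releases; a label carries the same information
LABEL maintainer="Saulo Alves Aflitos <sauloal@gmail.com>"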

################## INSTALLATION ######################

# -y answers the confirmation prompt, since a build has no interactive terminal
RUN conda install -y blast=2.2.31

WORKDIR /data/

Dockerfile from the repo for the biocontainers base image

# Base image
FROM ubuntu:16.04

################## METADATA ######################

LABEL base_image="ubuntu:16.04"
LABEL version="4"
LABEL software="Biocontainers base Image"
LABEL software.version="1.0.0"
LABEL about.summary="Base image for BioDocker"
LABEL about.home="http://biocontainers.pro"
LABEL about.documentation="https://github.com/BioContainers/specs/wiki"
LABEL about.license_file="https://github.com/BioContainers/containers/blob/master/LICENSE"
LABEL about.license="SPDX:Apache-2.0"
LABEL about.tags="Genomics,Proteomics,Transcriptomics,General,Metabolomics"

################## MAINTAINER ######################
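# MAINTAINER is deprecated in current Docker releases; a label carries the same information
LABEL maintainer="Felipe da Veiga Leprevost <felipe@leprevost.com.br>"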

ENV DEBIAN_FRONTEND noninteractive

RUN mv /etc/apt/sources.list /etc/apt/sources.list.bkp && \
    bash -c 'echo -e "deb mirror://mirrors.ubuntu.com/mirrors.txt xenial main restricted universe multiverse\n\
deb mirror://mirrors.ubuntu.com/mirrors.txt xenial-updates main restricted universe multiverse\n\
deb mirror://mirrors.ubuntu.com/mirrors.txt xenial-backports main restricted universe multiverse\n\
deb mirror://mirrors.ubuntu.com/mirrors.txt xenial-security main restricted universe multiverse\n\n" > /etc/apt/sources.list' && \
    cat /etc/apt/sources.list.bkp >> /etc/apt/sources.list && \
    cat /etc/apt/sources.list

RUN apt-get clean all && \
    apt-get update && \
    apt-get upgrade -y && \
    apt-get install -y  \
        autotools-dev   \
        automake        \
        cmake           \
        curl            \
        grep            \
        sed             \
        dpkg            \
        fuse            \
        git             \
        wget            \
        zip             \
        openjdk-8-jre   \
        build-essential \
        pkg-config      \
        python          \
        python-dev      \
        python-pip      \
        bzip2           \
        ca-certificates \
        libglib2.0-0    \
        libxext6        \
        libsm6          \
        libxrender1     \
        mercurial       \
        subversion      \
        zlib1g-dev   && \
    apt-get clean && \
    apt-get purge && \
    rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

RUN echo 'export PATH=/opt/conda/bin:$PATH' > /etc/profile.d/conda.sh && \
    wget --quiet https://repo.continuum.io/miniconda/Miniconda2-4.0.5-Linux-x86_64.sh -O ~/miniconda.sh && \
    /bin/bash ~/miniconda.sh -b -p /opt/conda && \
    rm ~/miniconda.sh

RUN TINI_VERSION=`curl https://github.com/krallin/tini/releases/latest | grep -o "/v.*\"" | sed 's:^..\(.*\).$:\1:'` && \
    curl -L "https://github.com/krallin/tini/releases/download/v${TINI_VERSION}/tini_${TINI_VERSION}.deb" > tini.deb && \
    dpkg -i tini.deb && \
    rm tini.deb && \
    apt-get clean

RUN mkdir /data /config

# Add user biodocker with password biodocker
RUN groupadd fuse && \
    useradd --create-home --shell /bin/bash --user-group --uid 1000 --groups sudo,fuse biodocker && \
    echo `echo "biodocker\nbiodocker\n" | passwd biodocker` && \
    chown biodocker:biodocker /data && \
    chown biodocker:biodocker /config

# give write permissions to conda folder
RUN chmod 777 -R /opt/conda/

# Change user
USER biodocker

ENV PATH=$PATH:/opt/conda/bin
ENV PATH=$PATH:/home/biodocker/bin
ENV HOME=/home/biodocker

RUN mkdir /home/biodocker/bin

RUN conda config --add channels r
RUN conda config --add channels bioconda

RUN conda upgrade conda

VOLUME ["/data", "/config"]

# Overwrite this with 'CMD []' in a dependent Dockerfile
CMD ["/bin/bash"]

WORKDIR /data

How to run the docker image

 $ cd /home/user/workplace
 $ docker pull biocontainers/blast
 $ docker run biocontainers/blast blastp -help
 $ wget http://www.uniprot.org/uniprot/P04156.fasta
 $ curl -O ftp://ftp.ncbi.nih.gov/refseq/D_rerio/mRNA_Prot/zebrafish.1.protein.faa.gz
 $ gunzip zebrafish.1.protein.faa.gz
 $ docker run -v /home/user/workplace:/data/ biocontainers/blast makeblastdb -in zebrafish.1.protein.faa -dbtype prot
 $ docker run -v /home/user/workplace:/data/ biocontainers/blast blastp -query P04156.fasta -db zebrafish.1.protein.faa -out results.txt
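
To avoid retyping the mount option, the run commands can be wrapped in a small shell function. This is only a sketch: the function name blast is a hypothetical helper, not something shipped with the image, and it assumes the input files sit in the current directory.

```shell
# Hypothetical wrapper: run any BLAST+ tool from the biocontainers/blast image
# with the current host directory mounted at /data/ (the image's working directory).
blast() {
    # --rm removes the container once the tool exits;
    # -v bind-mounts the host directory so results are written outside the container.
    docker run --rm -v "$PWD":/data/ biocontainers/blast "$@"
}

# Usage (requires a running Docker daemon and the input files in $PWD):
#   blast makeblastdb -in zebrafish.1.protein.faa -dbtype prot
#   blast blastp -query P04156.fasta -db zebrafish.1.protein.faa -out results.txt
```

Because the function uses "$PWD", it works from whichever directory holds the data, which keeps the data outside the image and the container.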

Figure: reproducibility stack (Reference)


recommendations

  • Carefully define a set of tools for a given analysis
  • Use tools from the Bioconda registry
  • Adopt containers to guarantee consistency of results
  • Use virtualization to make analyses “resistant to time”.
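
One possible way to pin down the tool set for an analysis is a conda environment file that lists Bioconda packages with explicit versions. The sketch below writes such a file; the environment name blast-analysis is a placeholder, and the BLAST version is taken from the example image above. Whether that exact version still resolves against the current channels is an assumption.

```shell
# Write an illustrative conda environment file with a pinned tool version.
cat > environment.yml <<'EOF'
# "blast-analysis" is a placeholder name for this sketch.
name: blast-analysis
channels:
  - conda-forge
  - bioconda
dependencies:
  - blast=2.2.31
EOF

# Create the environment (requires conda):
#   conda env create -f environment.yml
```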

Further reading


Thanks

  • Bioinfo Core at CRG slides
  • based on recommendations from F1000

devops

  • software development + software operations
  • automate and monitor

Figure: DevOps explained