
Whitepaper: Containerized Genomic Workflows with Singularity – Part 2

Authors: Gwendolyn Kurtzer, Keith Cunningham, Eduardo Arango

Singularity Containers

As mentioned in the first post, “The quick uptake of nf-core amongst the bioinformatics community is testament to the success of Singularity and Nextflow.” This post introduces Singularity containers and explains why Singularity has become the go-to container technology among bioinformatics researchers.

One of the biggest problems in scientific computing is creating an environment for reproducible results. That is, an application stack and its data must be able to run identically on any computational resource. Until recently, the job of ensuring scientific reproducibility fell to system administrators, who had to manage a complex set of tools, applications, data, and related resource dependencies. However, with the introduction of Singularity, a container platform designed specifically to simplify the application of statistical techniques, achieving reproducibility has never been easier.

Within just the past few years, containers have revolutionized the way industries and enterprises develop and deploy computational software and distributed systems. The containerization model is gaining traction because it provides improved reliability, reproducibility, and levels of customization that were not previously possible. From the onset of containerization in high performance computing, Singularity has led the way in providing container services on systems ranging from small clusters to massive supercomputers.

Container computing has revolutionized the way groups develop, share, and run software. This shift has been led by growing acceptance among corporate DevOps teams, which has produced an ecosystem of tools that enable container computing. It made inroads in the high performance computing community via new tools such as Singularity, which allow users to securely run containers in environments where other container platforms were not feasible.

Today Singularity is the most widely used container solution in High-Performance Computing (HPC) centers. Enterprise users interested in AI, deep learning, compute-driven analytics, and IoT are increasingly demanding HPC-like tools and resources. Singularity has many features that make it the preferred container solution for these new types of enterprise workloads. Instead of a layered filesystem, a Singularity container is stored in a single file. This simplifies the container management lifecycle and facilitates features such as image signing and encryption to produce a trusted container environment.

The Singularity container system started as an open source project in 2015 and was created because scientists wanted a new method of packaging analytics applications for mobility and repeatability. By combining its success in HPC environments with the rapid expansion of artificial intelligence, deep learning, and machine learning in the enterprise, Singularity is uniquely qualified to address the needs of a new market called Enterprise Performance Computing (EPC).

Instead of a layered file system, the Singularity image format encapsulates applications, data, scripts, and supporting libraries in a single file. This simplifies the container management lifecycle and facilitates features such as image signing and encryption to produce trusted containers, which also enhance reproducibility and portability.

At runtime, Singularity blurs the lines between the container and the host system, allowing users to read and write persistent data and leverage hardware like GPUs and InfiniBand with ease.
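As a minimal sketch of what this looks like in practice, the Python helper below assembles a `singularity exec` command line that bind-mounts a host directory and enables NVIDIA GPU support through the `--bind` and `--nv` options. The image name, paths, and training script are hypothetical placeholders, and the helper only builds the command rather than running it.

```python
import shlex


def exec_argv(image, command, binds=(), use_nvidia=False):
    """Build the argument list for `singularity exec`.

    binds is a sequence of (host_path, container_path) pairs.
    """
    argv = ["singularity", "exec"]
    if use_nvidia:
        # --nv exposes the host's NVIDIA driver libraries and devices.
        argv.append("--nv")
    for host_path, container_path in binds:
        # --bind makes a host directory visible inside the container.
        argv += ["--bind", f"{host_path}:{container_path}"]
    argv.append(image)
    argv += list(command)
    return argv


# Hypothetical image, script, and paths, used for illustration only.
argv = exec_argv(
    "analysis.sif",
    ["python3", "train.py"],
    binds=[("/scratch/data", "/data")],
    use_nvidia=True,
)
print(shlex.join(argv))
# singularity exec --nv --bind /scratch/data:/data analysis.sif python3 train.py
```

Because the bound directory is part of the host file system, anything the containerized program writes under `/data` remains on the host after the container exits.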

The Singularity security model is also unique among container solutions. Users can build containers on resources they control, or by using the Sylabs container library. They can then move their containers to a production environment where the Linux kernel enforces privileges as it does with any other application. These features make Singularity a simple, secure container solution well suited to HPC and EPC workloads.

Singularity blocks privilege escalation within the container, so a user who wants to be root inside the container must also be root outside the container. This usage paradigm mitigates many of the security concerns that exist with containers on multi-tenant shared resources. You can directly call programs inside the container from outside the container, fully incorporating pipes, standard IO, file system access, X11, and MPI.

One of Singularity’s architecturally defined features is the ability to execute containers as if they were native programs or scripts on a host computer. All standard input, output, error, pipes, IPC, and other communication pathways used by locally running programs are synchronized with the applications running locally within the container.
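The sketch below illustrates this behavior from Python, under the assumption that a containerized program can be driven through `subprocess` exactly like a native one: text is written to its standard input and its standard output is read back. The image name `tools.sif` is hypothetical, and the container call is skipped unless Singularity and the image are actually present.

```python
import os
import shutil
import subprocess


def containerized(image, command):
    """Prefix a command so it runs inside the given container image."""
    return ["singularity", "exec", image, *command]


def pipe_through(argv, text):
    """Feed text to the process's stdin and return what it writes to stdout."""
    result = subprocess.run(
        argv, input=text, capture_output=True, text=True, check=True
    )
    return result.stdout


# Hypothetical image name; the call is skipped when it cannot work.
IMAGE = "tools.sif"
if shutil.which("singularity") and os.path.exists(IMAGE):
    # Equivalent to: printf 'chr2\nchr1\n' | singularity exec tools.sif sort
    print(pipe_through(containerized(IMAGE, ["sort"]), "chr2\nchr1\n"))
```

The same `pipe_through` helper works unchanged for a native command, which is the point: the calling code does not need to know whether the program lives in a container.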

The key functions of Singularity are:

  • Designed specifically for compute-based workflows
  • Uses Singularity Image Format (SIF)
  • Portable containers that natively leverage GPUs
  • Works with Mellanox and Intel interconnects
  • MPI and PMIx Workflows compatible
  • Runs on ARM, Power, and x86 platforms
  • Service and Batch Job compatibility
  • Native integration with batch scheduling systems and resource managers (Slurm, PBS, LSF, etc.)
  • Can use other containers as source to build Singularity Images (Docker Hub, Quay, Registries, OCI, etc.)
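
To make the last point concrete, here is a small sketch of building a SIF image from a Docker Hub image with `singularity build` and a `docker://` source URI. The helper names, the output file name, and the chosen repository and tag are illustrative assumptions. The code only constructs the command, so it can be inspected before being run on a system with Singularity installed.

```python
import shlex


def docker_source(repository, tag="latest"):
    """Return a docker:// URI that points at an image on Docker Hub."""
    return f"docker://{repository}:{tag}"


def build_argv(output_sif, source):
    """Build the argument list for `singularity build <output> <source>`."""
    return ["singularity", "build", output_sif, source]


# Hypothetical output name and source image, used for illustration only.
argv = build_argv("ubuntu.sif", docker_source("ubuntu", "22.04"))
print(shlex.join(argv))
# singularity build ubuntu.sif docker://ubuntu:22.04
```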

Sylabs provides licensing, enterprise-level support, professional services, cloud functionality, and value-added plugins for the Singularity container platform.

Part 3 of this three-part series is planned for publication on November 8.
