[GSoC2020][Bonding period] What is a Singularity container? How does it differ from Docker?

The goal of the project is to implement a pipeline and deploy it on a High-Performance Computing (HPC) cluster, so Red Hen Lab asked us to use Singularity as the container platform to facilitate deployment. I only had a very basic idea of what a container is. It is high time for me to better understand this popular concept and to answer some questions in my head:

  • What is the difference between a container and a Virtual Machine?
  • What is Singularity?
  • I have heard about Docker and it seems that Docker is more popular. What is the difference between Docker and Singularity?
  • Why should we use Singularity?

What is the difference between containers and Virtual Machines?

Let’s start with the container. I think the name itself is a very apt analogy that explains the functionality up front.

Figure 1: Container[1]

As shown in Figure 1, a shipping container isolates its goods from the others, so that each load can have its own specific setting/environment and none of them interferes with the others while being carried on the same freighter. However, notice that a container has no engine: it cannot run by itself, so it needs to be hosted on a working machine.

I think this analogy explains most of the difference between a container and a virtual machine. Figure 2 points out the main difference between the two structures. Containers share the host kernel and access it directly, whereas each VM runs its own guest operating system on top of a hypervisor and reaches the host only indirectly, which adds overhead and can cost a significant amount of performance.

Figure 2: Difference between Containers (left) and Virtual Machines (right)[1]

What is Singularity?

First of all, Singularity is a container platform, so it keeps the advantages of containers over VMs. More formally, according to Wikipedia[2], Singularity is a free, cross-platform, and open-source computer program that performs operating-system-level virtualization.

Figure 3: Logo of Singularity

I think the most abstract part of this definition is operating-system-level virtualization. It means that at execution time the container runs on top of the host’s operating system, where it is treated as a standard process. In short, it is another way of expressing the direct access to the host kernel.
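One way to see this in practice is to compare the kernel release reported on the host with the one reported inside a container: since no guest kernel is booted, the two should match. The sketch below is only an illustration, assuming the `singularity` command is installed and that an image exists at the hypothetical path `my_image.sif`; neither detail comes from the project setup.

```python
import os
import platform
import shutil
import subprocess

IMAGE = "my_image.sif"  # hypothetical image path, used only for illustration


def kernel_check_command(image):
    """Build the command that prints the kernel release from inside the container."""
    return ["singularity", "exec", image, "uname", "-r"]


def container_kernel(image):
    """Return the kernel release seen inside the container.

    Returns None when Singularity or the image is not available, so the
    sketch stays harmless on machines without either.
    """
    if shutil.which("singularity") is None or not os.path.exists(image):
        return None
    result = subprocess.run(
        kernel_check_command(image), capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


if __name__ == "__main__":
    host = platform.release()  # kernel release reported by the host
    inside = container_kernel(IMAGE)
    print("host kernel:     ", host)
    if inside is None:
        print("container kernel: not checked (Singularity or image not found)")
    else:
        print("container kernel:", inside)
        print("same kernel:", host == inside)
```

Running the same comparison against a VM would normally show two different kernels, because the guest boots its own.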

Second, some features distinguish Singularity from other containers: scientific reproducibility, single-file container images, a no-trust security model, etc. I am not going to list and explain every feature, which is beyond the scope of this post and also beyond my expertise. However, it is interesting to explain why Singularity is more suitable for HPC, since that is the reason Red Hen Lab chose Singularity instead of Docker. To do so, we need to compare Singularity and Docker.

What is the difference between Docker and Singularity?

Docker, the most popular container implementation in the world, is facing a competitor among one specific group of users: HPC centers. Why is that?

Basically, Docker has some drawbacks in an HPC setting:

  • Security concerns due to the shared resources in HPC
    • Users can escalate to root
    • Unauthorized users having root access to any of the production networks is problematic
    • Access to local file systems cannot be limited
  • No native GPU support, which is important for scientific research
  • The root-owned Docker daemon is outside the reach and control of the resource manager
  • Inefficient support for new patches/implementations specific to HPC

In summary, Docker is designed, built, and maintained for enterprises, not for HPC centers.

To tackle these problems, an HPC-oriented container platform was born: Singularity. In contrast to the Docker drawbacks listed above, Singularity has the following advantages:

  • Security by design: inside user == outside user
    • If you want to be root inside the container, you must first be root outside the container
    • No root escalation allowed
  • Architected specifically for scientific reproducibility
  • Portable, shareable, and distributable containers: single-file container images
  • Compatibility with existing shared resources
  • Support for HPC hardware: GPU, InfiniBand
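The "inside user == outside user" point can be checked with a small script that compares the numeric user ID on the host with the one reported inside the container. This is a rough sketch, assuming `singularity` is installed on a Linux host and that an image exists at the hypothetical path `my_image.sif`. The `gpu` switch adds the `--nv` option of `singularity exec`, which is how NVIDIA GPU support is usually requested; the helper names are my own.

```python
import os
import shutil
import subprocess

IMAGE = "my_image.sif"  # hypothetical image path, used only for illustration


def exec_command(image, args, gpu=False):
    """Build a `singularity exec` command, optionally with NVIDIA GPU support."""
    cmd = ["singularity", "exec"]
    if gpu:
        cmd.append("--nv")  # expose the host's NVIDIA driver and devices
    return cmd + [image] + list(args)


def uid_inside(image):
    """Return the numeric user ID seen inside the container.

    Returns None when Singularity or the image is not available.
    """
    if shutil.which("singularity") is None or not os.path.exists(image):
        return None
    result = subprocess.run(
        exec_command(image, ["id", "-u"]), capture_output=True, text=True, check=True
    )
    return int(result.stdout.strip())


if __name__ == "__main__":
    outside = os.getuid()  # user ID on the host (Unix only)
    inside = uid_inside(IMAGE)
    print("uid outside:", outside)
    if inside is None:
        print("uid inside: not checked (Singularity or image not found)")
    else:
        print("uid inside:", inside)
        print("same user:", inside == outside)
```

With a default Docker setup the same check would typically report user ID 0 (root) inside the container, which is exactly the escalation concern listed earlier.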

Well, given these improvements compared with Docker, it is quite clear why Singularity is welcome in many HPC centers.

How to use:

There is detailed documentation on creating, building, and updating a Singularity container, so I will not repeat it here.

Link for building a recipe: https://github.com/singularityhub/singularityhub.github.io/wiki/Build-A-Container#building-your-container
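To give a rough idea of what a recipe looks like without repeating the documentation, here is a minimal sketch that generates a definition file and the matching build command. The base image, package list, and file names (`pipeline.def`, `pipeline.sif`) are hypothetical, and the layout assumes the Singularity 3.x definition-file format; this is not the recipe used in the project.

```python
def render_definition(base_image, packages):
    """Render a minimal Singularity definition file bootstrapped from a Docker image."""
    install = " ".join(packages)
    lines = [
        "Bootstrap: docker",
        "From: " + base_image,
        "",
        "%post",
        "    # Commands run once, at build time, inside the container",
        "    apt-get update && apt-get install -y " + install,
        "",
        "%runscript",
        "    # Command executed by `singularity run`",
        '    exec python3 "$@"',
        "",
    ]
    return "\n".join(lines)


def build_command(image_path, definition_path):
    """Build the command that turns a definition file into a single-file image."""
    return ["singularity", "build", image_path, definition_path]


if __name__ == "__main__":
    # Hypothetical base image and package, chosen only for illustration
    recipe = render_definition("ubuntu:18.04", ["python3"])
    with open("pipeline.def", "w") as handle:
        handle.write(recipe)
    print(recipe)
    print("build with:", " ".join(build_command("pipeline.sif", "pipeline.def")))
```

Building from a definition file typically needs root privileges (for example via `sudo`), so on a shared cluster the image is often built elsewhere and then copied over as a single file.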

YouTube tutorial for writing a recipe:

As a side note, I noticed one issue in the documentation and helped correct it. For more details, please see:

https://www.kangzhiq.com/2020/05/11/gsoc2020-troubleshooting-for-singularity-hub

One final remark: never overlook the scientific community. If you don’t treat them well, they can surely live without you. Ahahah 🙂

References:

[1] https://www.sdsc.edu/Events/training/singularity_on_comet_2019/introduction-to-singularity.pdf

[2] https://en.wikipedia.org/wiki/Singularity_(software)

[3] https://www.slideshare.net/IntelSoftware/introduction-to-highperformance-computing-hpc-containers-and-singularity
