Containers are one of those things that many people hear about, or even use, without quite understanding what they are exactly. And even if you do know what they are, do you understand why they're so popular? I'm going to attempt to explain a bit about what containers are and why people are scrambling to jump on the bandwagon. My intention is that someone who has some background in technology can read this and go, "Oh, okay. Cool. I get it now." At the very least, I hope you'll understand why I have a picture of a stuffed suitcase for my article.


Bit of Background

If you're already familiar with Virtual Servers and Hypervisors, you can skip this section.

Hosting applications isn't as simple as just installing an app and running it like you would install a text editor on your PC. You have to worry about things like, "Crap, if this lonely server that runs our application dies, no one can use our service." Or even, "Yay! So many people are using our application that we need more powerful servers to keep up, or even just more servers." In the old days (and even still today for some), you had to buy an actual server, rack it, network it, install the OS, install the application, configure the application, and run it. And since you can't afford downtime, you need more than one server, each hosting the application, in case one of them dies. If demand is high, you may need even more servers. Being able to keep adding servers, each running a copy of the application, to distribute the work is called Horizontal Scaling. What happens when you have 2 applications that work together to provide your service? It's either buy 2 servers or install them on the same server. What if one application was Windows based and the other Linux? Now you definitely have to have 2 servers. Putting the cost of the servers aside (yeah, they're expensive), you have costs in just maintaining them. You also have to spend money on new ones when the old ones get...well...old. I could go on, but I think everyone pretty much agrees that maintaining physical servers for your application kinda sucks.

Then, along the way, someone made server virtualization a thing.

Server Virtualization allows you to simulate the hardware of a physical server and create many "virtual" servers on it, each with its own OS, each thinking it's running on dedicated hardware. Now, instead of buying 2 physical servers, I can create 2 (or more) virtual servers on the same machine. This gives each virtual server its own little bubble where it can operate independently of the other virtual servers. This is called Isolation. Isolation is really key because it means that, from each operating system's perspective, running its stuff won't interfere with the other servers. While the virtual servers are using the same physical memory and storage, they're isolated so they can't see each other. If one runs out of disk space, or has a kernel panic, or has package dependencies that another server can't meet, it's all good because the virtual servers are isolated from each other. Also, since we can simulate the hardware, we can standardize that simulation so that the virtual server's OS sees the same simulated hardware, no matter how different the real hardware is. This is called Abstraction, and the thing that simulates the hardware and provides that abstraction is called a Hypervisor. Now that the entire thing is virtual, I can package it up and transfer it someplace else to run it. This is called Portability. Portability is really important as well because it means that no matter what's underneath or where it is, I can run my virtual servers on it as long as the hypervisor is the same. VMware is an example of such a hypervisor. There are others as well (KVM, Xen, etc.).

Containers

Just as hypervisor technology abstracts the operating system from the physical server, container technology abstracts the application from the operating system. A hypervisor is to a VM as a container engine is to a container. The most popular container engine out there is Docker. The same awesome characteristics of isolation and portability that we gave to operating systems running on VMs, we now give to applications by running them in containers. This is HUGE.

When you think about it, an operating system isn't a destination. It isn't something someone wants by itself. No one says, "You know? I love my operating system...just sitting there...existing." Okay, maybe some of you do. But most people don't. Most people need an operating system to act as a facilitator between all of the applications that they really are after. People want applications; they don't want an operating system. This is why containers are so great: they're focused on the applications, the things we're really interested in interacting with.

When a container starts, it's just an application, so all that's really starting is a process. Contrast this to starting a VM, where you have to wait until the operating system is all booted up before your application can be started. Applications in containers start nearly instantaneously, whereas VMs do not.
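If you have Docker installed, you can see this for yourself. Something like the following (alpine is just a tiny image that makes a convenient example) starts a container, runs a command, and removes the container again, usually in around a second or less once the image has been pulled:

$ time docker run --rm alpine echo "hello from a container"

Compare that to how long it takes a VM to boot before it can run anything.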

What's in a Container?

Containers are images (just like VM images) that contain just the application and its dependencies. This makes them generally much smaller than your typical VM image. The image is built following a specification called the OCI Image Format. Instead of forcing a developer to start with nothing and write a script to add all of the application dependencies into the image, one can start with a pre-built image and just add their application to it. For example, if you have a Java application, you can start with the OpenJDK image, copy your application files to a path inside the image, and build it. The instructions for building an image go into a file named Dockerfile. So what exactly does that look like? Here's a Dockerfile for an application that requires Node.js as a dependency.

FROM node:8
WORKDIR /app
COPY ./artifacts ./
ENTRYPOINT ["npm", "start"]

Line 1: Start with the docker image node:8 (read: node.js, version 8).
Line 2: Set the current working directory (within the image) to /app.
Line 3: Copy ./artifacts from my local machine to ./ (which is now /app) inside the image.
Line 4: Execute npm start when someone runs the container.
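From the directory containing that Dockerfile, building the image and then running it looks something like this (my-node-app is just a hypothetical name I'm giving the image):

$ docker build -t my-node-app .
$ docker run my-node-app

The build step produces the image; the run step starts a container from it, which kicks off npm start as described above.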

What's really nice is that Docker created a big public registry (called Docker Hub) that the community can upload images to. That's why all I had to do was ask for node:8 in my Dockerfile. Docker knew to go out to the Docker public registry and pull down the node image tagged with version 8.
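You can also pull an image from the registry explicitly, without referencing it from a Dockerfile:

$ docker pull node:8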

NOTE: Container storage is ephemeral, meaning any data written from within the container while it's running will be lost when it exits.

People are creating images all the time and publishing them to the public registry, so there's a lot of reuse going on. This leads to a layering effect vs everyone just doing their own thing in one shot. You can look here at the node:8 image and see that it's built from the buildpack-deps:jessie image. Following that, you can see the buildpack-deps:jessie Dockerfile here. You can keep tracing the chain back to a root image called scratch. Scratch is the starting point with pretty much the bare minimum on it. It's important to note that you can also have private registries for your own images. After installing and setting up the private registry, you can push and pull images by prefixing the image name with the registry hostname. For example, if you set up a registry that's accessible as registry.my-domain.net, you can push and pull images by including that just before the image name, i.e. registry.my-domain.net/my-image:1 will pull the image from registry.my-domain.net.
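As a rough sketch using the example registry.my-domain.net hostname and my-image name from above, pushing to and pulling from a private registry looks something like this:

$ docker tag my-image:1 registry.my-domain.net/my-image:1
$ docker push registry.my-domain.net/my-image:1
$ docker pull registry.my-domain.net/my-image:1

The tag step just gives the existing local image a name that includes the registry hostname, so Docker knows where to push it.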


Key Point

Just like VMs allow you to pack more operating systems (read VMs) onto a physical server, containers allow you to pack more applications (read containers) onto an operating system. This makes more efficient use of your server resources. So how is it possible to pack many operating systems onto a server? Isolation. Each operating system gets its own little bubble called a VM. Take this a level higher and the same rule applies. You can pack more applications onto an operating system because each application gets its own little bubble called a container. 

Let's say suitcases cost $50k and you need to pack as many clothes as possible into a suitcase so you don't have to spend money on a 2nd suitcase. However, for some reason the laws of travel physics say you must place the clothes into these strange rectangular boxes first, before putting them into the suitcase. Because you can only fit so many clothes into a box, and only so many boxes fit into a suitcase, you're forced to spend money on a few more suitcases. THEN someone comes along and says, "To hell with those boxes! Just put the clothes into these vacuum bags!" WOW! Look at how many clothes you can fit into a single suitcase.


What's Next?

Now that we have the ability to publish containers to registries, we can run as many as we'd like on the same host by just running the docker command. For example, to run the nginx container:
$ docker run --name my-nginx -d nginx

This pulls the nginx image down from the registry and starts the container in the background (thanks to the -d option). If you want persistent storage from within the container, no problem. Just map some directory on your host to a path inside the container like this:
$ docker run --name my-nginx -d -v /var/my-host/data:/var/www/static nginx

This tells docker to take the local directory /var/my-host/data and mount that into the container as /var/www/static. Now every file written from inside the container to /var/www/static will show up on the local host at /var/my-host/data.
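If you want to convince yourself the mapping works, you can write a file from inside the running container and look for it on the host. Something like this (docker exec runs a command inside an already-running container):

$ docker exec my-nginx sh -c 'echo hello > /var/www/static/test.txt'
$ cat /var/my-host/data/test.txt
hello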

That's all there is to it really. There are many other odds and ends you can do to customize the behavior but these are the basics.



So What Are Swarm & Kubernetes Then?

With containers, you can do all sorts of interesting things like expose ports and map storage volumes in (so you can actually store data that sticks around after the container exits). But in order to really go crazy with containers, you need some sort of orchestration that allows you to auto-scale up/down, assign IP addresses, allocate dynamic storage, load balance, etc. This is where container orchestration platforms such as Kubernetes and Swarm come in handy. There are entire books that cover this topic, so I'll leave that for another post.
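Just to give the tiniest taste before that post: Docker's built-in Swarm mode lets you turn a host (or a set of hosts) into a cluster and scale a containerized service with a couple of commands. A rough sketch, assuming a single-node cluster:

$ docker swarm init
$ docker service create --name web --replicas 3 --publish 80:80 nginx
$ docker service scale web=5

Kubernetes addresses the same kinds of problems (and many more), just with its own objects and tooling.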
