Containerization
The following is a project that I completed in January 2024. It was an exploration of containers and AWS. The objective was to take a simple application from code all the way to loading it into a Kubernetes (k8s) cluster in AWS.
This resulted in some great material on containers and how to build them.
The full PowerPoint can be found here:
History and Evolution
Centralized to Distributed to Centralized
Computing has evolved from single-user, single-machine systems, to multi-user systems, to distributed systems, and is now being consolidated back into data centers. We had big single-user machines that turned into big multi-user mainframes - this is where UNIX, and ultimately Linux, came from. We then went distributed, with everyone having a computer and a server on their local network. We are now in a cycle of consolidating those servers back into data centers.
The Latest Technology (1940’s)
In colloquial usage, the terms "Turing-complete" and "Turing-equivalent" are used to mean that any real-world general-purpose computer or computer language can approximately simulate the computational aspects of any other real-world general-purpose computer or computer language.
Emulation (1980's)
Microsoft wrote BASIC for the Altair 8800 on an Intel 8080 emulator running on a DEC PDP-10.
Virtualization
Modern virtualization stems from emulation - we are effectively emulating the CPU and hardware the software runs on. Most servers in the classic bare metal model would only be utilized to 25% or 50%, but never up to 100%, meaning the remaining unused capacity is a waste of money, power, and resources.
Virtualization has the following benefits:
- Better utilization approaching 100%
- Separation and clear boundaries between software packages by no longer sharing operating systems
- Abstracting the system from the hardware making moving to new hardware a simple file copy
New Problem
Emulating all hardware is inefficient, as computing cycles are used to provide the basic function of the hardware itself.
Solution
Hardware and software working together allows the guest system to use the underlying hardware to simulate and separate different ‘virtual computers.’ While this gives a good performance increase, bringing virtualized performance closer to bare metal performance, it requires that the system being virtualized be compatible with the underlying hardware.
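On a Linux host, you can check whether the CPU exposes these virtualization extensions; Intel VT-x shows up as the vmx flag and AMD-V as svm, and a count greater than zero means hardware-assisted virtualization is available:

grep -Ec 'vmx|svm' /proc/cpuinfo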
Architecture
As we've gone from centralized to distributed to centralized, these cycles bear similarities, but lessons learned, lower costs, and more powerful hardware have changed the landscape. Software has changed with the internet and the continued integration of systems and networks.
The Next Problem
Virtualization is heavy: we are running thousands of operating systems; computing resources are spent running the same operating system over and over; memory and disk space are wasted; and human effort is needed to manage and orchestrate it all.
Solution
The solution started in the 1970’s in the UNIX world. A system called ‘chroot’ would isolate the filesystem. This isolation, also known as sandboxing, was later extended to the application execution space. This allowed applications to be safely separated from other applications for testing and security reasons in UNIX-like environments.
Ultimately this led to containerization. Rather than virtualize the hardware and spin up another copy of the operating system, we can reuse parts of the operating system in a secure manner that looks like its own system.
- https://www.opensourceforu.com/2016/07/many-approaches-sandboxing-linux/
- https://medium.com/an-idea/a-brief-history-of-container-virtualization-57fc96c02924
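To make the chroot idea concrete, here is a minimal sketch; the /srv/jail path and its contents are illustrative assumptions, not part of the original project:

# Build a tiny root filesystem to act as the jail (paths are illustrative)
mkdir -p /srv/jail/bin /srv/jail/lib /srv/jail/lib64
cp /bin/bash /srv/jail/bin/
# Copy in the shared libraries bash needs (ldd /bin/bash lists them)
# Then start a shell whose "/" is the jail - it cannot see the host filesystem
sudo chroot /srv/jail /bin/bash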
From Physical to Containers
On a bare metal system, everything is together. The OS is configured for specific hardware with specific drivers - impossible to move. If an application or the OS goes haywire, there is no separation, so it all goes down.
Virtualizing solved those problems, but each OS instance requires 1 to 2 GB of RAM for Windows, a minimum of 1-2 CPUs, and a 40 GB disk - and that is per application to get full isolation. This gets expensive.
Containerization is the next iteration in lighter-weight isolation. It looks like its own computer, and it's secure - that's where the technology started.
Container Runtimes
Docker was the first to market and is synonymous with "containers". But there exists an open standard for containers, and there are free alternatives.
Open Container Initiative
The Open Container Initiative is an open governance structure for the express purpose of creating open industry standards around container formats and runtimes.
Established in June 2015 by Docker and other leaders in the container industry, the OCI currently contains three specifications: the Runtime Specification (runtime-spec), the Image Specification (image-spec) and the Distribution Specification (distribution-spec). The Runtime Specification outlines how to run a “filesystem bundle” that is unpacked on disk. At a high-level an OCI implementation would download an OCI Image then unpack that image into an OCI Runtime filesystem bundle. At this point the OCI Runtime Bundle would be run by an OCI Runtime.
Docker vs Podman
For getting some basic understanding of containerization, we will start simple and then get more complicated. The following folders have steps to walk through containerization.
When referring to containerization, the choice of Docker or Podman makes little difference; Podman uses the same parameters and syntax as Docker. Simply replace docker with podman:
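For example, these two commands are interchangeable (hello-world is the standard smoke-test image):

docker run hello-world
podman run hello-world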
Networking
The following is a conceptualized container runtime environment. The key thing to understand is that containers are isolated on their own network, which cannot talk to your local computer.
You need to do “port forwarding” to allow access into the docker containers.
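As a minimal sketch of port forwarding (the container name web and the nginx image are illustrative, not part of this project), publish container port 80 on host port 8080, then reach it from the local machine:

docker run -d --name web -p 8080:80 nginx
curl http://localhost:8080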
For the next steps, we will explore containers and networking. First, we will discuss networking and then containers. Again, when we don’t specify a network, the container engine will use the “default” network. For most instances, we will want to isolate our applications and workloads on their own network.
List Network
docker network list
The network type we typically use is “bridge”. A bridge is a network device that links containers together like a virtual switch, and to route traffic from that network out to the world, we’ll use NAT.
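To see the driver and the NAT’d subnet for yourself, inspect the default network (the output is JSON and varies by version):

docker network inspect bridge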
Create Network
docker network create houston
If we don’t specify a network type, it’ll give us a bridge. The output on success is a big long SHA hash. When referencing by hash, you only need to type enough of the first part to be unique - otherwise the prefix would match many hashes.
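For illustration, the same command with the driver made explicit is equivalent to the default:

docker network create --driver bridge houston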
Delete Network
docker network rm houston
This will delete the network named “houston”. The output on success is simply the network name.
Working with Images
Next, we will create an ubuntu container from the standard ubuntu image. This container will map the “/root” folder to a location on your drive.
Create Container
docker create \
--name htown-ubuntu \
--network houston \
-v ~/Projects/Houston/htown-ubuntu/root:/root \
-i \
-t \
ubuntu
Parameters:
- create – tells docker to create a container
- --name – name of the container
- --network – network to attach the container to
- -v – mount a local folder to a path inside of the container
- -i – interactive
- -t – use a tty (terminal / console)
- ubuntu – image to use
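To confirm the container exists before starting it, you can list all containers, including those created but not yet running (a quick sanity check, not one of the original steps):

docker ps -a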
Start Container
docker start htown-ubuntu
Parameters:
- start – indicates to start a container
- htown-ubuntu – name of the container to start
On success, it lists the container's name.
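Since the container was created with -i and -t, you can also start it attached so you land directly in its shell; the -a and -i flags attach your terminal and keep stdin open:

docker start -ai htown-ubuntu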