Introduction to Storage in Docker

After reading this post, you will understand:

  • How Docker stores data
  • How Docker manages the file system

Let us start with how Docker stores data on the local file system.

When you install Docker on a system, the following folder structure is created at /var/lib/docker:

aufs, containers, image, volumes, and so on.

This is where Docker stores all its data by default.

When I say data, I mean files related to images and containers running on the Docker host. For example, all files related to containers are stored under the containers folder, and the files related to images are stored under the image folder.

Any volumes created by Docker containers are created under the volumes folder.

Well, don't worry about that for now. We will come back to it in a bit. For now, let's just understand where Docker stores its files and in what format.

So how exactly does Docker store the files of an image and a container? To understand that, we need to understand Docker's layered architecture.

Let's quickly recap something we learned: when Docker builds images, it builds them in a layered architecture.

Each line of instruction in the Dockerfile creates a new layer in the Docker image, with just the changes from the previous layer.

For example, the first layer is a base Ubuntu operating system, followed by a second instruction that creates a second layer which installs all the apt packages. Then the third instruction creates a third layer with the Python packages, followed by a fourth layer that copies the source code over, and finally a fifth layer that updates the entry point of the image. Since each layer only stores the changes from the previous layer, this is reflected in the size as well. If you look at the base Ubuntu image, it is around 120 MB in size.

The apt packages that are installed are around 300 MB, and the remaining layers are small.
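As a rough sketch, such a Dockerfile might look like this (the exact package names and the /opt/app.py path are illustrative assumptions, not taken from a real project):

    # Layer 1: the base Ubuntu operating system (~120 MB)
    FROM ubuntu
    # Layer 2: the apt packages (~300 MB)
    RUN apt-get update && apt-get install -y python3 python3-pip
    # Layer 3: the Python packages
    RUN pip3 install flask
    # Layer 4: the application source code
    COPY app.py /opt/app.py
    # Layer 5: the entry point
    ENTRYPOINT ["python3", "/opt/app.py"]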

To understand the advantages of this layered architecture, let's consider a second application. This application has a different Dockerfile but is very similar to our first application: it uses the same base Ubuntu image and the same Python and Flask dependencies, but uses different source code to create a different application, and so a different entry point as well.

When I run the docker build command to build a new image for this application, since the first three layers of both applications are the same, Docker is not going to build the first three layers. Instead, it reuses the same three layers it built for the first application from the cache, and only creates the last two layers with the new sources and the new entry point. This way, Docker builds images faster and efficiently saves disk space.
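As a sketch, the second application's Dockerfile could be identical except for its last two instructions (the file name app2.py is, again, an illustrative assumption):

    # Layers 1-3: identical instructions, so Docker reuses the cached layers
    FROM ubuntu
    RUN apt-get update && apt-get install -y python3 python3-pip
    RUN pip3 install flask
    # Layers 4-5: different source code and entry point, so only these are rebuilt
    COPY app2.py /opt/app2.py
    ENTRYPOINT ["python3", "/opt/app2.py"]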

This is also applicable if you were to update your application code. Whenever you update your application code, such as app.py in this case, Docker simply reuses all the previous layers from the cache and quickly rebuilds the application image by updating the latest source code, thus saving us a lot of time during rebuilds and updates.

Let's rearrange the layers bottom-up so we can understand it better. At the bottom we have the base Ubuntu layer, then the apt packages, then the Python dependencies, then the source code of the application, and then the entry point. All of these layers are created when we run the docker build command to form the final Docker image, so all of these are the Docker image layers.

Once the build is complete, you cannot modify the contents of these layers, and so they are read-only; you can only modify them by initiating a new build. When you run a container based off of this image using the docker run command, Docker creates a container based off of these layers and creates a new writable layer on top of the image layers.

The writable layer is used to store data created by the container, such as log files written by the applications, any temporary files generated by the container, or just any file modified by the user on that container. The life of this layer, though, is only as long as the container is alive. When the container is destroyed, this layer and all of the changes stored in it are also destroyed.

Remember that the same image layers are shared by all containers created using this image. If I were to log into the newly created container and, say, create a new file called temp.txt, it would create that file in the container layer, which is read-write.
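For example, assuming a running container named webapp (a hypothetical name):

    docker exec -it webapp bash    # log into the running container
    touch temp.txt                 # created in the writable container layer, not in the image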

We just said that the files in the image layers are read-only, meaning you cannot edit anything in those layers.

Let's take an example of our application code. Since we bake our code into the image, the code is part of the image layer and as such is read-only. After running a container, what if I wish to modify the source code to, say, test a change? Remember, the same image layer may be shared between multiple containers created from this image. So does that mean I cannot modify this file inside the container?

Now, I can still modify this file, but before I save the modified file, Docker automatically creates a copy of the file in the read-write layer, and I will then be modifying a different version of the file in the read-write layer. All future modifications will be done on this copy of the file in the read-write layer.

This is called the copy-on-write mechanism. The image layers being read-only just means that the files in these layers will not be modified in the image itself, so the image will remain the same all the time, until you rebuild it using the docker build command.
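One way to observe copy-on-write in action is the docker diff command, which lists files added (A) or changed (C) in the container's writable layer relative to the image. Using the hypothetical webapp container from earlier, the output would look along these lines:

    docker diff webapp
    # C /opt/app.py    <- copied up into the read-write layer and modified there
    # A /temp.txt      <- created directly in the read-write layer

(The exact paths depend on what was changed inside your container.)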

What happens when we get rid of the container? All of the data that was stored in the container layer also gets deleted. The change we made to app.py and the new temp.txt file we created will also get removed. So what if we wish to persist this data?

For example, if we were working with a database and we would like to preserve the data created by the container, we could add a persistent volume to the container. To do this, first create a volume using the docker volume create command. When we run the docker volume create data_volume command, it creates a folder called data_volume under the /var/lib/docker/volumes directory.
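In other words:

    docker volume create data_volume
    ls /var/lib/docker/volumes    # the data_volume folder now shows up here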

Then, when I run the container using the docker run command, I can mount this volume inside the container's read-write layer using the -v option. I would run docker run -v, then specify my newly created volume name, followed by a colon and the location inside my container, which is the default location where MySQL stores its data, that is /var/lib/mysql, and then the image name, mysql. This will create a new container and mount the data_volume we created into the /var/lib/mysql folder inside the container, so all data written by the database is in fact stored on the volume created on the Docker host.
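Putting that together, the full command is:

    docker run -v data_volume:/var/lib/mysql mysql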

Even if the container is destroyed, the data is still intact.

Now, what if you didn't run the docker volume create command to create the volume before the docker run command? For example, if I run the docker run command to create a new instance of a MySQL container with the volume data_volume2, which I have not created yet, Docker will automatically create a volume named data_volume2 and mount it to the container.

You should be able to see all these volumes if you list the contents of the /var/lib/docker/volumes folder.
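For example:

    docker run -v data_volume2:/var/lib/mysql mysql    # data_volume2 is created automatically
    ls /var/lib/docker/volumes                         # now lists data_volume and data_volume2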

This is called volume mounting, as we are mounting a volume created by Docker under the /var/lib/docker/volumes folder.

But what if we had our data already at another location? For example, let's say we have some external storage on the Docker host at /data, and we would like to store database data on that storage and not in the default /var/lib/docker/volumes folder.

In that case, we would run a container using the docker run -v command, but this time we will provide the complete path to the folder we would like to mount, that is /data/mysql. This will create a container and mount the folder into the container.

This is called bind mounting.
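The full command in this case is:

    docker run -v /data/mysql:/var/lib/mysql mysql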

So there are two types of mounts: a volume mount and a bind mount. A volume mount mounts a volume from the volumes directory, and a bind mount mounts a directory from any location on the Docker host.

One final point to note before I let you go: using -v is the old style; the new way is to use the --mount option. The --mount option is the preferred way, as it is more verbose: you have to specify each parameter in a key=value format.

For example, the previous command can be written with the --mount option using the type, source, and target options. The type in this case is bind, the source is the location on my host, and the target is the location on my container:
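    docker run --mount type=bind,source=/data/mysql,target=/var/lib/mysql mysql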

So who is responsible for doing all of these operations? Maintaining the layered architecture, creating a writable layer, moving files across layers to enable copy-on-write, and so on. It's the storage drivers.

So Docker uses storage drivers to enable the layered architecture.

Some of the common storage drivers are AUFS, BTRFS, ZFS, Device Mapper, Overlay, and Overlay2. The selection of the storage driver depends on the underlying OS being used. For example, with Ubuntu, the default storage driver is AUFS, whereas this storage driver is not available on other operating systems like Fedora or CentOS.

In that case, Device Mapper may be a better option. Docker will choose the best storage driver available automatically based on the operating system. The different storage drivers also provide different performance and stability characteristics, so you may want to choose one that fits the needs of your application and your organisation.
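You can check which storage driver Docker has selected on your system with the docker info command:

    docker info | grep "Storage Driver"
    # Storage Driver: overlay2    (on a recent Ubuntu system, for example)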

If you would like to read more on any of these storage drivers, please refer to the links in the attached documentation.

For now, that is all on Docker's storage architecture concepts.

See you in the next lecture.