Introduction to Storage in Docker
After reading this post, you will understand:
- How Docker stores data
- How Docker manages the file system

Let us start with how Docker stores data on the local file system.
When you install Docker on a system, the following folder structure is created at /var/lib/docker: aufs, containers, image, volumes, etc.
This is where Docker stores all its data by default.
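As a rough illustration, listing that directory on a Linux host might look like this (the exact folders vary with your Docker version and storage driver, and reading the directory usually requires root):

```bash
# Inspect Docker's default data directory on the host (may need sudo)
ls /var/lib/docker
# aufs  containers  image  volumes  ...
```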
When I say data, I mean files related to images and containers running on the Docker host. For example, all files related to containers are stored under the containers folder, and the files related to images are stored under the image folder.
Any volumes created by Docker containers are created under the volumes folder.
Don't worry about that for now; we will come back to it in a bit.
For now, let's just understand where Docker stores its files and in what format.
So how exactly does Docker store the files of an image and a container? To understand that, we need to understand Docker's layered architecture.
Let's quickly recap something we learned: when Docker builds images, it builds them in a layered architecture. Each instruction in the Dockerfile creates a new layer in the Docker image with just the changes from the previous layer.
For example, the first layer is a base Ubuntu operating system, followed by the second instruction that creates a second layer which installs all the apt packages. The third instruction creates a third layer with the Python packages, followed by the fourth layer that copies the source code over, and finally the fifth layer that updates the entry point of the image. Since each layer only stores the changes from the previous layer, this is reflected in the size of each layer as well.
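As a sketch of what such a Dockerfile could look like (the package names and the file name app.py are illustrative assumptions, not taken from a specific project):

```Dockerfile
# Layer 1: base Ubuntu operating system
FROM ubuntu

# Layer 2: apt packages
RUN apt-get update && apt-get install -y python3 python3-pip

# Layer 3: Python packages
RUN pip3 install flask

# Layer 4: application source code
COPY app.py /opt/app.py

# Layer 5: entry point
ENTRYPOINT ["python3", "/opt/app.py"]
```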
If you look at the base Ubuntu image, it is around 120 MB in size. The apt packages that are installed add around 300 MB, and the remaining layers are small.
To understand the advantages of this layered architecture, let's consider a second application.
This application has a different Dockerfile but is very similar to our first application: it uses the same base Ubuntu image and the same Python and Flask dependencies, but it has different source code to create a different application, and so a different entry point as well.
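A sketch of the second application's Dockerfile would then differ only in its last two instructions (app2.py is again an assumed file name):

```Dockerfile
# Layers 1-3: identical to the first application's Dockerfile
FROM ubuntu
RUN apt-get update && apt-get install -y python3 python3-pip
RUN pip3 install flask

# Layers 4-5: different source code and a different entry point
COPY app2.py /opt/app2.py
ENTRYPOINT ["python3", "/opt/app2.py"]
```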
When I run the docker build command to build a new image for this application, Docker is not going to build the first three layers, since they are the same for both applications. Instead, it reuses the same three layers it built for the first application from its cache and only creates the last two layers with the new source code and the new entry point. This way, Docker builds images faster and efficiently saves disk space.
This is also applicable if you were to update your application code. Whenever you update your application code, such as app.py in this case, Docker simply reuses all the previous layers from the cache and quickly rebuilds the application image by adding the latest source code, thus saving us a lot of time during rebuilds and updates.
Let's rearrange the layers bottom up so we can understand it better. At the bottom we have the base Ubuntu layer, then the apt packages, then the Python dependencies, then the source code of the application, and then the entry point. All of these layers are created when we run the docker build command to form the final Docker image, so all of these are the Docker image layers.
Once the build is complete, you cannot modify the contents of these layers; they are read-only, and you can only modify them by initiating a new build. When you run a container based off of this image using the docker run command, Docker creates a container based on these layers and creates a new writable layer on top of the image layers.
The writable layer is used to store data created by the container, such as log files written by the application, any temporary files generated by the container, or just any file modified by the user on that container. The life of this layer, though, is only as long as the container is alive.
When the container is destroyed, this layer and all of the changes stored in it are also destroyed.
Remember that the same image layers are shared by all containers created using this image. If I were to log into the newly created container and, say, create a new file called temp.txt, it would create that file in the container layer, which is read-write.
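For example, assuming a running container named webapp (a hypothetical name), that could look like:

```bash
# On the host: open a shell inside the running container
docker exec -it webapp bash

# Inside the container: this file lands in the writable container layer,
# not in the read-only image layers
touch temp.txt
```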
We just said that the files in the image layers are read-only, meaning you cannot edit anything in those layers.
Let's take an example of our application code. Since we bake our code into the image, the code is part of the image layers and as such is read-only. After running a container, what if I wish to modify the source code, say to test a change? Remember, the same image layers may be shared between multiple containers created from this image. So does that mean I cannot modify this file inside the container?
No: I can still modify this file, but before I save the modified file, Docker automatically creates a copy of the file in the read-write layer, and I will then be modifying a different version of the file in the read-write layer. All future modifications will be done on this copy of the file in the read-write layer.
This is called the copy-on-write mechanism. The image layers being read-only just means that the files in these layers will not be modified in the image itself, so the image will remain the same at all times until you rebuild it using the docker build command.
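One way to observe this copy-on-write behaviour is the docker diff command, which lists files added or changed in a container's writable layer relative to its image (the container name webapp is again an assumed example):

```bash
# Show files created (A) or changed (C) in the container's read-write layer
docker diff webapp
```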
What happens when we get rid of the container? All of the data that was stored in the container layer also gets deleted. The change we made to app.py and the new temp.txt file we created will also get removed.
So what if we wish to persist this data? For example, if we were working with a database and we would like to preserve the data created by the container, we could add a persistent volume to the container. To do this, first create a volume using the docker volume create command.
So when we run the docker volume create data_volume command, it creates a folder called data_volume under the /var/lib/docker/volumes directory.
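In command form, that step looks like this:

```bash
# Create a named volume on the Docker host
docker volume create data_volume

# The volume appears under Docker's volumes directory (may need sudo)
ls /var/lib/docker/volumes
```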
Then, when I run the Docker container using the docker run command, I could mount this volume inside the container's read-write layer using the -v option, like this: docker run -v, then my newly created volume name, followed by a colon and the location inside the container, which is the default location where MySQL stores data, and that is /var/lib/mysql, and then the image name, mysql. This will create a new container and mount the data_volume we created into the /var/lib/mysql folder inside the container, so all data written by the database is in fact stored on the volume created on the Docker host.
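As a command, that run would look something like this:

```bash
# Mount the data_volume volume into MySQL's default data directory
docker run -v data_volume:/var/lib/mysql mysql
```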
Even if the container is destroyed, the data remains intact.
Now what if you didn't run the docker volume create command to create the volume before the docker run command? For example, if I run the docker run command to create a new instance of a MySQL container with the volume data_volume2, which I have not created yet, Docker will automatically create a volume named data_volume2 and mount it to the container.
You should be able to see all these volumes if you list the contents of the /var/lib/docker/volumes folder.
This is called volume mounting, as we are mounting a volume created by Docker under the /var/lib/docker/volumes folder.
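For example, data_volume2 does not need to exist beforehand:

```bash
# Docker creates data_volume2 automatically before mounting it
docker run -v data_volume2:/var/lib/mysql mysql

# Both volumes now show up on the host (may need sudo)
ls /var/lib/docker/volumes
```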
But what if we had our data already at another location? For example, let's say we have some external storage on the Docker host at /data, and we would like to store database data there and not in the default /var/lib/docker/volumes folder.
In that case, we would run a container using the docker run -v command, but this time we will provide the complete path to the folder we would like to mount, that is /data/mysql. It will create a container and mount that folder into the container.
This is called bind mounting.
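In command form, the bind mount version would be (assuming the external storage is at /data/mysql on the host):

```bash
# Bind-mount a host directory instead of a named volume
docker run -v /data/mysql:/var/lib/mysql mysql
```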
So there are two types of mounts: a volume mount and a bind mount. A volume mount mounts a volume from the volumes directory, and a bind mount mounts a directory from any location on the Docker host.
One final point to note before I let you go: using -v is the old style; the new way is to use the --mount option. The --mount option is the preferred way as it is more verbose, so you have to specify each parameter in a key=value format.
For example, the previous command can be written with the --mount option using the type, source, and target parameters. The type in this case is bind, the source is the location on the host, and the target is the location in the container.
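The same bind mount expressed with --mount would be:

```bash
# Equivalent bind mount using the more explicit --mount syntax
docker run \
  --mount type=bind,source=/data/mysql,target=/var/lib/mysql \
  mysql
```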
So who is responsible for doing all of these operations: maintaining the layered architecture, creating a writable layer, moving files across layers to enable copy-on-write, and so on? It's the storage drivers. Docker uses storage drivers to enable the layered architecture.
Some of the common storage drivers are AUFS, BTRFS, ZFS, Device Mapper, Overlay, and Overlay2. The selection of the storage driver depends on the underlying OS being used. For example, with Ubuntu the default storage driver is AUFS, whereas this storage driver is not available on other operating systems like Fedora or CentOS.
In that case, Device Mapper may be a better option. Docker will choose the best storage driver available automatically based on the operating system. The different storage drivers also provide different performance and stability characteristics, so you may want to choose one that fits the needs of your application and your organisation.
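If you want to check which storage driver a host has selected, docker info reports it:

```bash
# Show the storage driver in use on this Docker host
docker info | grep "Storage Driver"
```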
If you would like to read more on any of these storage drivers, please refer to the links in the attached documentation.
For now, that is all on the Docker architecture concepts.
See you in the next lecture.