DataTipss | Data Engineering, DevOps & Cybersecurity Blog

How to scan vulnerabilities for Docker images

Vulnerability scanning for Docker

Docker is everywhere today. It lets developers package an application into a container: a standardized executable component that combines application source code with the OS libraries and dependencies required to run that code in any environment. We build a Docker image and distribute it to others, but how sure are we that the image is secure and free of known vulnerabilities?

Suppose an image with many vulnerabilities is running in your production system. An attacker can find those weaknesses and exploit them easily, so identifying the vulnerabilities in your images is an important part of securing your system.

Vulnerability scanning

Vulnerability scanning is the process of identifying security weaknesses and flaws in a system. It is an integral part of a vulnerability management program, whose goal is to protect organizations from data breaches.

Vulnerability scanning for local Docker images lets teams review the security state of their container images and fix the issues identified during the scan.

Docker scan runs on the Snyk engine. It gives users visibility into the security posture of their Dockerfiles and images. Users trigger vulnerability scans through the CLI and use the CLI to view the results. The scan results list the Common Vulnerabilities and Exposures (CVEs) that were found.

I recommend upgrading to the latest version of the Docker scan tool.

Let's check the options available for docker scan using the help command.

docker scan --help

Usage: docker scan [OPTIONS] IMAGE

A tool to scan your images

Options:
    --accept-license     Accept using a third party scanning provider
    --dependency-tree    Show dependency tree with scan results
    --exclude-base       Exclude base image from vulnerability scanning (requires --file)
-f, --file string        Dockerfile associated with image, provides more detailed results
    --group-issues       Aggregate duplicated vulnerabilities and group them to a single one (requires --json)
    --json               Output results in JSON format
    --login              Authenticate to the scan provider using an optional token (with --token), or web base token if empty
    --reject-license     Reject using a third party scanning provider
    --severity string    Only report vulnerabilities of provided level or higher (low|medium|high)
    --token string       Authentication token to login to the third party scanning provider
    --version            Display version of the scan plugin

Now you can see all the options available with docker scan. Let's check the version using the command below.

docker scan --accept-license --version

If your version is earlier than v0.11.0, docker scan cannot detect Log4j CVE-2021-44228.

In that case you must update Docker Desktop to 4.3.1 or higher.
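The version rule above can be checked mechanically. Here is a minimal sketch, assuming the version is available as a string of the form v0.11.0; the helper names parse_version and can_detect_log4shell are hypothetical and not part of docker scan.

```python
import re

# Minimum docker scan plugin version that detects Log4j CVE-2021-44228,
# as stated in the text above.
MIN_VERSION = (0, 11, 0)


def parse_version(text):
    """Extract the first MAJOR.MINOR.PATCH triple found in text."""
    match = re.search(r"v?(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def can_detect_log4shell(version_text):
    """Return True when the plugin version is v0.11.0 or newer."""
    return parse_version(version_text) >= MIN_VERSION
```

Comparing integer tuples rather than raw strings matters here: compared as plain strings, "v0.9.0" would sort after "v0.11.0".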

How to scan

You can run the docker scan command just by passing the image name.

docker scan my-image

The command above prints a scan report to the terminal.

Scan images during Development and Production

Building an image from a Dockerfile, or rebuilding it, can introduce new vulnerabilities into the system, so scanning the image should be a normal part of the development workflow. You can automate the process like this:

image_building ==> docker scan image ==> Push to dockerhub/private registry
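The pipeline above can be sketched as a small script that only pushes when the scan step succeeds. This is a minimal sketch under two assumptions: the function names are hypothetical, and the scan step is assumed to exit with a non-zero code when it reports vulnerabilities (verify this for your docker scan version before relying on it).

```python
import subprocess


def pipeline_commands(image, severity="high"):
    """Return the build, scan and push commands in pipeline order."""
    return [
        ["docker", "build", "-t", image, "."],
        ["docker", "scan", "--severity", severity, image],
        ["docker", "push", image],
    ]


def run_pipeline(image, runner=subprocess.call):
    """Run each step in order and stop at the first non-zero exit code.

    Assumption: the scan step exits non-zero when it reports
    vulnerabilities, so a failed scan prevents the push.
    Returns the list of commands that were actually executed.
    """
    executed = []
    for command in pipeline_commands(image):
        executed.append(command)
        if runner(command) != 0:
            break
    return executed
```

The runner parameter exists so the flow can be exercised without Docker installed, for example by passing a function that returns fixed exit codes.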

For production systems, rerun the scan whenever a new vulnerability is disclosed so you can tell whether it affects you. Scanning your images periodically is a good habit as well.

Ending thoughts

Building secure images is a continuous process. Follow best practices to build efficient, scalable and secure images. Start with your base images, and always choose images from official and verified publishers, because otherwise you don't know what's inside the image.


Running your first Pod on Kubernetes

What is Kubernetes

Kubernetes is an open source, cloud native infrastructure tool that automates the scaling, deployment and management of containerized applications.

Kubernetes was originally developed by Google and later handed over to the Cloud Native Computing Foundation (CNCF) for enhancement and maintenance. It is the most popular and most in-demand orchestration tool. Kubernetes is also a complex tool and somewhat harder to learn than Docker Swarm.

Here are a few of the main architecture components of Kubernetes:

Cluster

A collection of nodes, typically at least one control plane (master) node and several worker nodes (historically called minions).

Node

A physical machine or a virtual machine (VM).

Control Plane

The set of components that schedules and deploys application instances across all nodes.

Kubelet

An agent process running on each node. It is responsible for managing the state of its node and takes action to maintain the desired state.

Pods

Pods are the basic scheduling unit. A Pod consists of one or more containers that are co-located on the same host machine and share the same resources.

How to run your first Pod on Kubernetes

Before you begin, you need a Kubernetes cluster running on your system, with kubectl configured to talk to it. kubectl is the command line tool that communicates with your cluster.

The easiest way to get started is to install Docker Desktop on Windows or Mac. Start Docker Desktop, open Settings and select the Kubernetes tab. Enable Kubernetes there and Docker Desktop will install it on your system.

Once that is done, run the command below to check that the Kubernetes cluster is running.

kubectl cluster-info

This command prints information about your Kubernetes cluster. Now that we have confirmed the cluster is up and running, we can deploy our first Pod.

To check which Pods are running, use the command below:

kubectl get pods

No Pods are running yet, so the command reports that no resources were found. To run a Pod, execute the command below:

kubectl run ng --image=nginx

Here ng is the name I gave the Pod; you can give it any name. Now check whether the Pod is running:

kubectl get pods

NAME   READY   STATUS    RESTARTS   AGE
ng     1/1     Running   0          98s

So our first Pod is running.

A Pod can run more than one container. Behind the scenes, you are actually running a container wrapped in an additional abstraction layer called a Pod. Remember that two containers inside the same Pod cannot have the same name.
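To make the container-naming rule concrete, here is a minimal sketch that builds a Pod manifest and rejects duplicate container names before anything is sent to the cluster. The function name pod_manifest and the container name "web" are illustrative choices, not taken from the steps above.

```python
import json


def pod_manifest(pod_name, containers):
    """Build a v1 Pod manifest from (container_name, image) pairs."""
    names = [name for name, _ in containers]
    if len(names) != len(set(names)):
        raise ValueError("container names must be unique within a Pod")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": pod_name},
        "spec": {
            "containers": [
                {"name": name, "image": image} for name, image in containers
            ]
        },
    }


if __name__ == "__main__":
    # The same Pod as "kubectl run ng --image=nginx", written as a manifest.
    print(json.dumps(pod_manifest("ng", [("web", "nginx")]), indent=2))
```

kubectl accepts JSON manifests as well as YAML, so the printed output can be saved to a file and applied with kubectl apply -f.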

You can add -o wide to the get pods command to get more information about running Pods.

kubectl get pods -o wide

NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES

The extra columns show, among other things, the Pod's IP address and the node it is running on.

Note:

kubectl get pods lists the Pods in the default namespace. Kubernetes uses namespaces to partition a cluster, and you can create as many as you need. A freshly installed cluster usually comes with these namespaces:

default
kube-system
kube-public
kube-node-lease

kubectl get pods --all-namespaces -o wide

Running the command above shows the Pods in all namespaces.

What are some more flags/options in running a pod?

# Start a single instance of busybox and keep it in the foreground; don't restart it if it exits.

Command Below:

kubectl run -i --tty busybox --image=busybox --restart=Never

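# Start a replicated instance of nginx. Recent kubectl releases removed the --replicas flag from kubectl run, which now creates a single Pod, so create a Deployment instead.

Command Below:

kubectl create deployment nginx --image=nginx --replicas=3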

Sometimes you need to stop and start a Pod the way you stop and start a container in Docker. In Kubernetes, however, it is not possible to stop a Pod and resume it later. You can edit the Pod's YAML file and redeploy your changes, or you can delete the Pod and easily recreate it.

kubectl delete pod ng

pod "ng" deleted

We have successfully deleted the Pod.

That's how you start your first Pod on Kubernetes. Kubernetes is the most popular container orchestrator; it lets you run many Pods at scale and monitor them easily. Pods are an essential part of Kubernetes because they are how Kubernetes controls containers indirectly. This post covered the basics of starting a Pod and deleting it.

Containers orchestration: Kubernetes vs Docker swarm

When deploying applications at scale, you need to plan all your architecture components with current and future strategies in mind. Container orchestration tools help achieve this by automating the management of application microservices across clusters.

A few of the major container orchestration tools are listed below:

Docker Swarm
Kubernetes
OpenShift
HashiCorp Nomad
Apache Mesos

Today we'll talk about Docker Swarm and Kubernetes and we'll compare them in terms of features.

What is container orchestration

Container orchestration is a set of practices for managing containers at large scale. Once a containerized application grows to a large number of containers, you need container management capabilities such as provisioning containers, scaling up and down, managing networking, load balancing, security and more.

Let's talk Kubernetes

Kubernetes is an open source, cloud native infrastructure tool that automates scaling, deployment and management of containerized applications.

Here are a few of the main architecture components of Kubernetes:

Cluster

A collection of nodes, typically at least one control plane (master) node and several worker nodes (historically called minions).

Node

A physical machine or a virtual machine (VM).

Control Plane

The set of components that schedules and deploys application instances across all nodes.

Kubelet

An agent process running on each node. It is responsible for managing the state of its node and takes action to maintain the desired state.

Pods

Pods are the basic scheduling unit. A Pod consists of one or more containers that are co-located on the same host machine and share the same resources.

Deployments, Replicas and ReplicaSets

Docker Swarm

Docker Swarm is native to the Docker platform. It was developed to maintain application efficiency and availability across different runtime environments by deploying containerized application microservices across multiple nodes.

A mix of docker-compose, swarm mode and overlay networks can be used to manage a cluster of Docker containers.

Docker Swarm is still maturing in terms of functionality when compared to other open source container orchestration tools.

Here are a few of the main architecture components of Docker Swarm:

Swarm

A collection of nodes that includes at least one manager and several worker nodes.

Service

The definition of the tasks that manager or worker nodes are required to execute on the swarm.

Manager node

A node tasked with delivering work. It manages the swarm and distributes tasks among the worker nodes.

Worker node

A node responsible for running the tasks distributed by the swarm's manager node.

Tasks

A task carries a container and the set of commands to run inside it.

Choosing the right Orchestrator for your containers

Kubernetes focuses on open-source and modular orchestration, offering an efficient container orchestration solution for high demand applications with complex configuration.

Docker swarm emphasises ease of use, making it most suitable for simple applications that are quick to deploy and easy to manage.

Some fundamental differences between both

GUI:

Kubernetes offers an easy web user interface (the Kubernetes Dashboard) that helps you:

Deploy containerized applications on a cluster
Manage cluster resources
View error logs, deployments and jobs

Unlike Kubernetes, Docker Swarm does not come with a web UI for deploying applications and orchestrating containers, although some third-party tools provide one for Docker.

Availability:

Kubernetes ensures high availability by running multiple control plane nodes to eliminate single points of failure. You can use stacked control plane nodes, where each etcd member is co-located with a control plane node, or you can run an external etcd cluster on separate hosts and manage the control plane nodes independently, typically with a load balancer in front of them.

To maintain high availability, Docker Swarm relies on service replication at the swarm node level. A swarm manager schedules multiple replicas of the same service, each running as a container, across the available nodes.
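Both etcd in Kubernetes and the manager nodes in Docker Swarm rely on the Raft consensus algorithm, which needs a majority of members to stay reachable. The arithmetic behind "how many nodes can I lose" can be sketched as follows (the function names are illustrative):

```python
def quorum(members):
    """Smallest majority of a Raft group with the given member count."""
    return members // 2 + 1


def tolerated_failures(members):
    """How many members can fail while a majority still remains."""
    return members - quorum(members)


if __name__ == "__main__":
    for count in (1, 3, 4, 5, 7):
        print(count, quorum(count), tolerated_failures(count))
```

This is why odd member counts are usually recommended: four members tolerate no more failures than three do.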

Scalability:

Kubernetes supports autoscaling at both the cluster level and the Pod level. Docker Swarm, by contrast, deploys containers quickly, which gives it faster reaction times for on-demand scaling.

Monitoring:

Kubernetes offers multiple native logging and monitoring solutions for services deployed within a cluster. It also supports third-party integrations to help with event-based monitoring.

Docker Swarm, on the other hand, doesn't offer a monitoring solution the way Kubernetes does, so you need to rely on third-party applications for monitoring. As a result, monitoring a Docker Swarm is considered to be more complex than monitoring Kubernetes.

How and why container monitoring is so important

What is container monitoring?

Because containers are ephemeral, they are harder to monitor than applications running on bare-metal servers or even on virtualized servers. Monitoring is critical to ensure the availability, performance and security of containers, so container infrastructure requires new monitoring tools and strategies.

Container observability

Visibility and monitoring are essential for operating a running environment and for optimizing resource usage and costs.

Each container image can have a large number of running instances, and new images and versions are introduced at a high pace, so problems can easily spread across containers and applications and disrupt the entire architecture. This makes it critical to identify the root cause of a problem as soon as it occurs.

In large-scale containerized environments, this is only possible through dedicated cloud native monitoring tools.

If you are unable to achieve observability, the consequences include:

Developers and operations teams find it very difficult to understand what is running and how it is performing. Without observability it is hard to troubleshoot problems and to meet the SLA of a production system.
Scalability is also a major challenge without observability. Scaling your application on demand can improve your users' experience, but scaling that is too slow makes the experience poor.

Challenges with container monitoring

Container monitoring comes with a few challenges:

Containers are ephemeral, so provisioning and destroying a container is a very quick process. This is one of their biggest advantages, but in a large, complex production system it makes issues very difficult to pin down.
Containers share resources: they consume CPU and memory from the host machine. If the host's resources are not monitored, a sudden CPU or memory spike can take you by surprise and bring your production application to a stop.

Then how can we monitor containers

You can use an alerting system to monitor your containers. Setting up alerts across the delivery pipeline helps you catch the risk of system failure at an early stage.
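As an illustration of what an alert rule does, here is a minimal sketch that fires only when usage stays above a threshold for several consecutive samples, so a single short spike does not page anyone. The function name should_alert and its parameters are hypothetical and not tied to any particular monitoring tool.

```python
def should_alert(samples, threshold, min_consecutive=3):
    """Return True if usage exceeds threshold for min_consecutive samples in a row."""
    streak = 0
    for value in samples:
        # Extend the streak on a breach, reset it otherwise.
        streak = streak + 1 if value > threshold else 0
        if streak >= min_consecutive:
            return True
    return False
```

For example, with CPU percentages sampled once a minute, should_alert(cpu_samples, threshold=90) would fire after three minutes of sustained load above 90%.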

What are the common features in monitoring tools

Real time monitoring
Performance baseline
Anomaly detection
Network Performance monitoring
Config monitoring
Dashboards
API monitoring
Alerting
Automation

Here are some well-known container monitoring tools used across the industry

Prometheus

Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud. It collects and stores its metrics as time series data, i.e. each metric value is stored with the timestamp at which it was recorded, alongside optional key-value pairs called labels.

Features:

A multi-dimensional data model with time series data identified by metric name and key/value pairs
PromQL, a flexible query language that leverages this dimensionality
Multiple modes of graphing and dashboard support
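The data model in the first feature can be illustrated with a toy in-memory store. This is only a sketch of the idea that a series is identified by its metric name plus its label set; it is not how Prometheus stores data, and the class and method names are hypothetical.

```python
from collections import defaultdict


class TimeSeriesStore:
    """Toy store in which each series is keyed by metric name plus label set."""

    def __init__(self):
        self._series = defaultdict(list)

    def record(self, name, labels, timestamp, value):
        """Append a (timestamp, value) sample to the matching series."""
        key = (name, frozenset(labels.items()))
        self._series[key].append((timestamp, value))

    def select(self, name, **matchers):
        """Return the series for name whose labels include every matcher."""
        wanted = set(matchers.items())
        return {
            key: samples
            for key, samples in self._series.items()
            if key[0] == name and wanted <= key[1]
        }
```

Calling select("http_requests_total", method="GET") loosely mirrors what a label selector does in PromQL: it narrows one metric name down to the series whose labels match.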

Grafana

With Grafana you can visualise, analyse and alert on your system. No matter where your data is stored, you can create dashboards and monitor it. Your data source can be almost anything, such as PostgreSQL, MySQL or Redis.

Apart from these two, there are a few more popular tools such as Elasticsearch and Kibana, Zabbix and Datadog.

How to run PostgreSQL on Docker

Postgres on Docker

PostgreSQL (Postgres) is an advanced object-relational database management system (ORDBMS). It implements the majority of the SQL:2011 standard, is ACID compliant, and avoids many locking issues by using multiversion concurrency control (MVCC). Today we are going to run Postgres on Docker.

To start with Postgres, we first need to pull the image from Docker Hub, a public registry for container images. Let's run the command below to pull the image:

docker pull postgres

Using default tag: latest

latest: Pulling from library/postgres

a9eb63951c1c: Pull complete

b192c7f382df: Pull complete

e7ce3f587986: Pull complete

4098744a1414: Pull complete

4c98d6f3399d: Pull complete

65e57fefc38a: Pull complete

d61d9528cfd5: Pull complete

de6b20f44659: Pull complete

25db13ff0bef: Pull complete

7f74f4b0e936: Pull complete

144c847b11fb: Pull complete

cf0afd1be009: Pull complete

fe0c14991327: Pull complete

Now let's check that we have downloaded the image.

docker images

REPOSITORY   TAG      IMAGE ID       CREATED      SIZE
postgres     latest   83ce63c594ee   5 days ago   355MB

Let's run the image and start a container.

docker run --name test -e POSTGRES_PASSWORD=Test@123 -d postgres
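If you script this step, it helps to assemble the command as an argument list instead of a hand-typed string. The sketch below is one way to do that; the function name and the PG_PASSWORD environment variable are hypothetical, and publishing port 5432 with -p is an optional extra that the command above does not use.

```python
import os
import shlex


def postgres_run_command(name, password, host_port=None, image="postgres"):
    """Build the docker run argument list for a Postgres container."""
    command = ["docker", "run", "--name", name,
               "-e", f"POSTGRES_PASSWORD={password}", "-d"]
    if host_port is not None:
        # Publish the container's default Postgres port (5432) on the host.
        command += ["-p", f"{host_port}:5432"]
    command.append(image)
    return command


if __name__ == "__main__":
    # PG_PASSWORD is a hypothetical variable name; fall back to the
    # password used in this walkthrough.
    password = os.environ.get("PG_PASSWORD", "Test@123")
    print(shlex.join(postgres_run_command("test", password, host_port=5432)))
```

Publishing the port lets a psql client on the host connect to the database, rather than only a client running inside the container.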

Run the docker ps command to check that the container is running.

docker ps

CONTAINER ID   IMAGE      COMMAND                  CREATED         STATUS
83ec4a222      postgres   "docker-entrypoint.s…"   2 minutes ago   Up

Let's open a bash shell inside the container by running the command below.

docker exec -it 83ec4a222 bash

root@83ec4a222:/#

Connect to Postgres now:

psql -h localhost -p 5432 -U postgres -w

psql (14.0 (Debian 14.0-1.pgdg110+1))

Type "help" for help.

You are connected to Postgres now. Let's run a few commands and queries.

postgres=# \l

List of databases

-----------+----------+----------+------------+------------+-----------------------

(3 rows)

postgres=#

Let's check the current database name by running below command.

postgres=# select current_database();

current_database

------------------

postgres

(1 row)

So the current database is postgres. Next we'll check how many databases there are on the system.

postgres=# select datname from pg_catalog.pg_database;

datname

-----------

postgres

template1

template0

(3 rows)

There are 3 databases on the system in total.

You can list the tables in a database by querying the information schema. Without a filter on table_schema, the result also includes system catalog tables.

postgres=# select table_name from information_schema.tables limit 10;

table_name

-----------------------

pg_statistic

pg_type

pg_foreign_table

pg_authid

pg_shadow

pg_statistic_ext_data

pg_roles

pg_settings

pg_file_settings

pg_hba_file_rules

(10 rows)

There is a lot more you can do with Postgres; this was just a small taste. The information schema alone gives you details about all tables and databases. Docker is very useful here when you don't want to install Postgres on your system and would rather run it inside a container.

How to dockerize your python application in docker

Dockerize your python application:

Docker is a technology that lets you build, deploy and run your applications. It enables you to separate your infrastructure from your application. With Docker, all you need to do is write your code, dockerize it and distribute it in the form of an image. That way anyone who is running Docker can use your application.

What do you mean by Dockerize application?

Dockerizing means you write your code on your system, then prepare an image and distribute it over the internet or on Docker Hub. You don't have to worry about the underlying infrastructure and dependencies.

Let's write a Python program that counts the occurrences of words in a given string.

# Count occurrences of words in a given string.
# Input: string = "Docker is a technology which
# lets you build, deploy and run your applications."


def findFreq(s):
    dictt = {}
    strng = s.split(" ")
    strr1 = set(strng)
    for word in strr1:
        # Note: str.count matches substrings, not whole words.
        dictt[word] = s.count(word)
    return dictt


if __name__ == "__main__":
    # input() in Python 3.x (raw_input in Python 2.x)
    x = input("Enter your string:")
    print(findFreq(x))


# Output: {'a': 4, 'and': 1, 'run': 1, '': 80,
# 'deploy': 1, 'technology': 1, 'is': 1,
# 'you': 2, 'lets': 1, 'applications.': 1,
# 'which': 1, 'build,': 1, 'Docker': 1, 'your': 1}

Save this file as findfrequency.py. I am saving it in the current directory for convenience, but you can save it anywhere and pass the absolute path.
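The sample output of the script contains entries such as '': 80 and 'a': 4 because s.count(word) counts substring matches rather than whole words. If you want whole-word counts instead, a minimal alternative sketch could look like this, assuming words are separated by whitespace (the function name word_frequencies is hypothetical):

```python
from collections import Counter


def word_frequencies(text):
    """Count whole, whitespace-separated words in text."""
    # split() with no argument collapses runs of whitespace and never
    # yields empty strings, unlike split(" ").
    return dict(Counter(text.split()))
```

Punctuation still sticks to the words, as in the original script. The rest of this walkthrough keeps using findfrequency.py exactly as written.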

Now let's create a Dockerfile.

FROM python:3

We need Python inside the container, so the FROM keyword creates the first layer from the python image. In other words, your image is based on the python image.

Next, the Python file has to be available inside the image, so we add it in the Dockerfile.

ADD findfrequency.py /

Use CMD to set the command that runs when a container starts from the image.

CMD ["python", "./findfrequency.py"]

Combine the lines above into a Dockerfile.

FROM python:3
ADD findfrequency.py /
CMD ["python", "./findfrequency.py"]

We have now created a Dockerfile. I saved it with the name "Dockerfile" in the current directory. When you run the docker build . command, Docker looks for a file named Dockerfile in that directory. If the file doesn't exist, or its name is wrong (for example, it has an extra extension), you'll get a file-not-found error.

Now we are ready to build an image from the Dockerfile.

Open the terminal, make sure you are in the directory where you saved both the Dockerfile and the Python file, and run the command below.

docker build -t myapp .

-t : Tags the image with a name. In this case I named my image "myapp".

.(dot) : The build context, here the current directory.

You have now successfully built your image. Let's check what's inside the image by inspecting it.

docker inspect myapp

[

{

"Id": "sha256:c4595feabbd0b9aba4ae67037ea3c43a8c0aaf2abe6f6fd28d25b22a7cf9",

"RepoTags": [

"myapp:latest"

"RepoDigests": [],

"Parent": "",

"Comment": "buildkit.dockerfile.v0",

"Created": "2021-10-01T08:42:53.450488763Z",

"Container": "",

"ContainerConfig": {

"Hostname": "",

"Domainname": "",

"User": "",

"AttachStdin": false,

"AttachStdout": false,

"AttachStderr": false,

"Tty": false,

"OpenStdin": false,

"StdinOnce": false,

"Env": null,

"Cmd": null,

"Image": "",

"Volumes": null,

"WorkingDir": "",

"Entrypoint": null,

"OnBuild": null,

"Labels": null

"DockerVersion": "",

"Author": "",

"Config": {

"Hostname": "",

"Domainname": "",

"User": "",

"AttachStdin": false,

"AttachStdout": false,

"AttachStderr": false,

"Tty": false,

"OpenStdin": false,

"StdinOnce": false,

"Env": [

"LANG=C.UTF-8",

"PYTHON_VERSION=3.9.7",

"PYTHON_PIP_VERSION=21.2.4",

"PYTHON_SETUPTOOLS_VERSION=57.5.0",

"PYTHON_GET_PIP_SHA256=fa6f3fb93cce234cd4e8dd2be9c247653b52855a48dd44e6b21ff28b"

"Cmd": [

"python",

"./findfrequency.py"

You'll see output similar to the above. Our Python script appears in the output under the Cmd key.

Let's run the image.

docker run -it myapp

Enter your string: This is my test to test dockerfile.

{'': 37, 'is': 2, 'dockerfile.': 1, 'to': 1, 'my': 1, 'test': 2, 'This': 1}

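In the output above, the '': 37 and 'is': 2 entries appear because s.count() counts substring matches: the leading space in the input produces an empty-string token, and "is" also occurs inside "This". You can pass any string you like to count its words.

We have successfully dockerized our application. You can send this image to others so that they can use your program without worrying about installing dependencies, whose absence could otherwise cause the program to crash.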
