When building containerized applications, logging is one of the most important things to get right from a DevOps standpoint. Log management helps DevOps teams debug and troubleshoot issues faster, making it easier to identify patterns, spot bugs, and resolve them.
In this article, we’ll cover how to generate logs from containers and how to explore and view them in a central place.
Docker Logging: Why Are Logs Important When Using Docker
The importance of logging applies to a much larger extent to Dockerized applications. When an application in a Docker container emits logs, they are sent to the application’s stdout and stderr output streams.
The container’s logging driver can access these streams and send the logs to a file, a log collector running on the host, or a log management service endpoint.
By default, Docker uses a JSON-file driver, which writes JSON-formatted logs to a container-specific file on the host where the container is running.
The example below shows JSON logs created using the JSON-file driver:
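For illustration, a single entry in one of these files looks like this (the timestamp and message are made-up values):
{"log":"Hello from my container\n","stream":"stdout","time":"2022-03-30T12:34:56.789012345Z"}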
Before moving on, let’s go over the basics.
What Is a Docker Container
A container is a unit of software that packages an application, making it easy to deploy and manage no matter the host. Say goodbye to the infamous “it works on my machine” statement!
How? Containers are isolated and stateless, which enables them to behave the same regardless of the differences in infrastructure. A Docker container is a runtime instance of an image that’s like a template for creating the environment you want.
What Is a Docker Image
A Docker image is an executable package that includes everything that the application needs to run. This includes code, libraries, configuration files, and environment variables.
Why Do You Need Containers
Containers allow breaking down applications into microservices – multiple small parts of the app that can interact with each other via functional APIs. Each microservice is responsible for a single feature so development teams can work on different parts of the application at the same time. That makes building an application easier and faster.
How Is Docker Logging Different
Most conventional log analysis methods don’t work for containerized logging; troubleshooting becomes more complex than with traditional hardware-centric apps that run on a single node. You have more data to work with, so you must extend your search to get to the root of the problem.
Here’s why:
Containers are Ephemeral
Docker containers emit logs to the stdout and stderr output streams. Because containers are stateless, the logs are stored on the Docker host in JSON files by default. Why?
The default logging driver is JSON-file. The logs are then annotated with the log origin, either stdout or stderr, and a timestamp. Each log file contains information about only one container.
You can find these JSON log files in the /var/lib/docker/containers/ directory on a Linux Docker host. Here’s how you can access them:
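For example, to print a container’s log file directly from the host (the container ID below is a placeholder):
cat /var/lib/docker/containers/<container_id>/<container_id>-json.log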
That’s where logging comes into play. You can collect the logs with a log aggregator and store them in a place where they’ll be available forever. It’s dangerous to keep logs on the Docker host because they can build up over time and eat into your disk space. That’s why you should use a central location for your logs and enable log rotation for your Docker containers.
Get Started with Docker Container Logs
When you’re using Docker, you work with two different types of logs: daemon logs and container logs.
What Are Docker Container Logs?
Docker container logs are generated by the Docker containers. They need to be collected directly from the containers. Any messages that a container sends to stdout or stderr are logged, then passed on to a logging driver that forwards them to a remote destination of your choosing.
Here are a few basic Docker commands to help you get started with Docker logs and metrics:
- Show container logs:
docker logs containerName
- Follow the log output (stream new log lines as they arrive):
docker logs -f containerName
- Show CPU and memory usage:
docker stats
- Show CPU and memory usage for specific containers:
docker stats containerName1 containerName2
- Show running processes in a container:
docker top containerName
- Show Docker events:
docker events
- Show storage usage:
docker system df
Watching logs in the console is nice for development and debugging; in production, however, you want to store the logs in a central location for search, analysis, troubleshooting, and alerting.
Filebeat with Elasticsearch provides a simple way to do exactly that.
What is Filebeat
Filebeat is a log shipper belonging to the Beats family — a group of lightweight shippers installed on hosts for shipping different kinds of data into the ELK Stack for analysis. Each beat is dedicated to shipping different types of information — Winlogbeat, for example, ships Windows event logs, Metricbeat ships host metrics, and so forth. Filebeat, as the name implies, ships log files.
In an ELK-based logging pipeline, Filebeat plays the role of the logging agent—installed on the machine generating the log files, tailing them, and forwarding the data to either Logstash for more advanced processing or directly into Elasticsearch for indexing. Filebeat is, therefore, not a replacement for Logstash, but can and should in most cases, be used in tandem.
Written in Go and based on the Lumberjack protocol, Filebeat was designed to have a low memory footprint, handle large bulks of data, support encryption, and deal efficiently with back pressure. For example, Filebeat records the last successful line indexed in the registry, so in case of network issues or interruptions in transmissions, Filebeat will remember where it left off when re-establishing a connection. If there is an ingestion issue with the output, Logstash or Elasticsearch, Filebeat will slow down the reading of files.
Installing Filebeat
You can download and install Filebeat using various methods and on a variety of platforms. It only requires that you have a running ELK Stack to be able to ship the data that Filebeat collects. I will outline two methods, using Apt and Docker, but you can refer to the official docs for more options.
Install Filebeat using Apt
For an easier way of updating to a newer version, and depending on your Linux distro, you can use Apt or Yum to install Filebeat from Elastic’s repositories:
First, you need to add Elastic’s signing key so that the downloaded package can be verified (skip this step if you’ve already installed packages from Elastic). Elastic signs all of its packages with the PGP key D88E42B4 (Elasticsearch Signing Key), fingerprint 4609 5ACC 8548 582C 1A26 99A9 D27D 666C D88E 42B4, which is available from https://pgp.mit.edu.
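On Apt-based systems, the standard command from Elastic’s documentation is:
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -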
The next step is to add the repository definition to your system:
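For example, for the 7.x release track (adjust the version to match your ELK Stack):
echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-7.x.list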
All that’s left to do is update your repositories and install Filebeat:
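sudo apt-get update && sudo apt-get install filebeat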
Install Filebeat on Docker
If you’re running Docker, you can install Filebeat as a container on your host and configure it to collect container logs or log files from your host.
Pull Elastic’s Filebeat image with:
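docker pull docker.elastic.co/beats/filebeat:7.15.0
(The version tag here is just an example; pick the one that matches your ELK Stack.)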
Logs from Standard Output
Filebeat with Docker
Filebeat fetches and ships logs from Docker containers:
- Deploy one Filebeat per Docker host.
- The Docker logs host folder (/var/lib/docker/containers) is the one monitored by Filebeat.
- Filebeat starts an input for the files and begins harvesting them as soon as they appear in the folder.
- Everything happens before line filtering, multiline, and JSON decoding, so this input can be used in combination with those settings.
Filebeat Container Input
Docker config example – docker.yml
filebeat.inputs:
- type: container
paths:
- '/var/lib/docker/containers/*/*.log'
Kubernetes
Kubernetes is a production-grade, open-source container orchestrator. It automates the distribution and scheduling of application containers across a cluster, solving the problem of managing containers at scale. It is also self-healing, as it handles container and node failures.
Kubernetes Architecture
A Kubernetes cluster consists of a master and nodes. Each node runs a container runtime (Docker or rkt). A node contains one or more pods, the scheduling units, each of which can contain one or more containers with shared namespaces and shared volumes. Major components of Kubernetes include:
- API Server: master component that exposes the Kubernetes API and is the central management entity;
- kubelet: ensures containers are running on each node.
Filebeat with Kubernetes
filebeat.inputs:
- type: container
stream: stdout
paths:
- '/var/log/containers/*.log'
Kubernetes Logs
Deploy Filebeat as a DaemonSet to ensure there’s a running instance on each node of the cluster. The Docker logs host folder (/var/lib/docker/containers) is mounted on the Filebeat container. Filebeat starts an input for the files and begins harvesting them as soon as they appear in the folder. To download the manifest file, run:
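curl -L -O https://raw.githubusercontent.com/elastic/beats/7.15/deploy/kubernetes/filebeat-kubernetes.yaml
(The 7.15 branch is just an example; use the branch matching your Filebeat version.)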
Metadata Processors
Define processors in your configuration to process events before they are sent to the configured output for:
- reducing the number of exported fields
- enhancing events with additional metadata
- performing additional processing and decoding
Filebeat has processors for enhancing your data from the environment, such as add_docker_metadata, add_kubernetes_metadata, and add_cloud_metadata.
Docker Metadata Processors
The add_docker_metadata processor annotates each event with relevant metadata from Docker containers.
Example of metadata:
- docker.container.id
- docker.container.image
- docker.container.name
- docker.container.labels
Docker config example – docker.yml
- You need to provide access to Docker’s Unix socket.
- You may also need to add --user=root to the docker run flags if Filebeat is running as non-root.
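A minimal sketch of the processor configuration (the socket path shown is Docker’s default):
processors:
  - add_docker_metadata:
      host: "unix:///var/run/docker.sock"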
Kubernetes Metadata Processors
The add_kubernetes_metadata processor annotates each event with metadata based on which Kubernetes pod the event originated from.
Example of metadata:
- kubernetes.pod.name
- kubernetes.namespace
- kubernetes.labels
- kubernetes.annotations
- kubernetes.container.name
- kubernetes.container.image
Kubernetes config example
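A minimal sketch, following the pattern from Elastic’s documentation (it assumes NODE_NAME is exposed as an environment variable in the pod spec):
processors:
  - add_kubernetes_metadata:
      host: ${NODE_NAME}
      matchers:
        - logs_path:
            logs_path: "/var/log/containers/"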
Filebeat Autodiscover
Filebeat Autodiscover watches events and reacts to changes. It scans existing containers and launches the proper configurations for them, then watches for new start/stop events.
To enable it, define the settings in the filebeat.autodiscover section of the filebeat.yml config file, specifying a list of providers.
- You need to provide access to Docker’s Unix socket.
- You may also need to add --user=root to the docker run flags if Filebeat is running as non-root.
Autodiscover Providers
Providers watch for events on the system and translate those events into internal autodiscover events with a common format. Providers are available for:
- Docker
- Kubernetes
- Jolokia
Fields from the autodiscover event can be used to set conditions using templates.
Autodiscover Providers Templates
Filebeat supports templates for inputs and modules. A template defines a condition to match on autodiscover events, along with a list of configurations to launch when the condition happens. Conditions are built from the operators equals, contains, regexp, range, has_fields, or, and, and not.
Templates can contain variables under the data namespace. For example, “${data.port}” resolves to 6379.
Docker Autodiscover Provider
Example of available fields:
- host
- port
- docker.container.id
- docker.container.image
- docker.container.name
- docker.container.labels
Docker: Example Autodiscover Config
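A minimal sketch, adapted from Elastic’s documentation; the redis image condition is just an example:
filebeat.autodiscover:
  providers:
    - type: docker
      templates:
        - condition:
            contains:
              docker.container.image: redis
          config:
            - type: container
              paths:
                - /var/lib/docker/containers/${data.docker.container.id}/*.log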
Kubernetes Autodiscover Provider
Example of available fields:
- host
- port (if exposed)
- kubernetes.container.id
- kubernetes.container.image
- kubernetes.container.name
- kubernetes.labels
- kubernetes.namespace
- kubernetes.node.name
- kubernetes.pod.name
- kubernetes.pod.uid
Kubernetes: Example Autodiscover Config
Configuration (Redis running under K8s)
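A minimal sketch, adapted from Elastic’s documentation, that starts a container input for any pod running a Redis image:
filebeat.autodiscover:
  providers:
    - type: kubernetes
      templates:
        - condition:
            contains:
              kubernetes.container.image: redis
          config:
            - type: container
              paths:
                - /var/log/containers/*-${data.kubernetes.container.id}.log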
Log Inside a Container
Some applications don’t write to the standard output (stdout/stderr); instead, their logs are located in files inside the container. Containers are transient by nature, meaning that any files inside a container will be lost if the container shuts down. Reading the logs from inside a container is also not recommended, as performance is worse.
Reading Logs from Inside a Container
- Configure the logs to be written to the standard output (stdout and/or stderr)
- Mount a shared volume, make it available to the container, and configure the logs to be written to it
- Or configure a workaround:
  - use symbolic links to link the log files to the standard output
  - write logs to /proc/self/fd/1 (which is stdout) and errors to /proc/self/fd/2 (which is stderr)
  - more information can be found at: https://docs.docker.com/config/containers/logging/configure/
- With Kubernetes, stream the logs to sidecar containers
Reading Logs from Volume
Use shared data volumes to log events; the log data persists and can be shared with other containers.
- If the volume is local, add it to the filebeat.yml configuration
- If using a volume, use a single Filebeat outside your container architecture to avoid duplicate data
Kubernetes – Streaming Sidecar Container
- Read logs from a file, a socket, or journald
- Each sidecar container tails a particular log file from a shared volume
- Each sidecar redirects the logs to its own stdout stream
- This separates several log streams from different parts of your application
Solution Examples
Symbolic Link Example – Dockerfile
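A minimal sketch of the symbolic-link approach; this is how the official nginx image exposes its log files (the paths are nginx’s defaults):
FROM nginx
# Forward the log files to the container's stdout/stderr
RUN ln -sf /dev/stdout /var/log/nginx/access.log \
    && ln -sf /dev/stderr /var/log/nginx/error.log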
Docker: Reading Log from Volume – filebeat.yml
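A minimal sketch, assuming the application writes its logs to a shared volume mounted at /var/log/myapp (the path and the Elasticsearch address are hypothetical):
filebeat.inputs:
  - type: log
    paths:
      - /var/log/myapp/*.log
output.elasticsearch:
  hosts: ["elasticsearch:9200"]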
Configuration File for a Kubernetes Pod – Sidecar
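A minimal sketch following the streaming-sidecar pattern from the Kubernetes documentation: the application container writes to a file on a shared emptyDir volume, and the sidecar tails that file to its own stdout:
apiVersion: v1
kind: Pod
metadata:
  name: counter
spec:
  containers:
    - name: count
      image: busybox
      args:
        - /bin/sh
        - -c
        - >
          i=0;
          while true; do
            echo "$i: $(date)" >> /var/log/app.log;
            i=$((i+1));
            sleep 1;
          done
      volumeMounts:
        - name: varlog
          mountPath: /var/log
    # Sidecar: streams the shared log file to its own stdout
    - name: count-log
      image: busybox
      args: [/bin/sh, -c, 'tail -n+1 -F /var/log/app.log']
      volumeMounts:
        - name: varlog
          mountPath: /var/log
  volumes:
    - name: varlog
      emptyDir: {}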
Exploring logs in Kibana
Once logs start flowing into Elasticsearch, you can start exploring them from the Kibana interface. Let’s have a look at one of them. Below is an example of an event reported by Filebeat, corresponding to a new log line from an NGINX server running in our Docker scenario:
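An abridged, illustrative version of such an event (the values are made up; the field names follow the docker.container.* scheme listed earlier):
{
  "@timestamp": "2022-03-30T12:34:56.789Z",
  "stream": "stdout",
  "message": "172.17.0.1 - - [30/Mar/2022:12:34:56 +0000] \"GET / HTTP/1.1\" 200 615",
  "docker": {
    "container": {
      "id": "4bb56a9aa4fb...",
      "name": "nginx-proxy",
      "image": "nginx:latest",
      "labels": {}
    }
  }
}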
Thanks to add_docker_metadata, we get not only the log output but also a series of fields enriching it with useful context from Docker, like the container name, ID, Docker image, and labels!
For example, if you want to debug what’s going on in a specific container, you just need to filter your search results by that container’s name.
Conclusion
While Docker containerization allows developers to encapsulate a program and its file system into a single portable package, that certainly doesn’t mean containerization is free of maintenance. Docker logging is a bit more complex than traditional methods, so teams using Docker must familiarize themselves with it to support full-stack visibility, troubleshooting, performance improvements, root cause analysis, and more.
As we have seen in this post, configuring Filebeat to send logs from Docker to Elasticsearch is quite easy, and the configuration can be adapted to the needs of your own applications without much effort. Filebeat also has a small footprint and can be deployed painlessly no matter what your production environment looks like.
Read More…
https://skillfield.com.au/easier-security-detections-with-elasticsearch-machine-learning/