Prometheus: Open-Source System Monitoring and Observability Platform

In this post we discuss Prometheus, an open-source monitoring and alerting toolkit that collects metrics from systems and applications.

As system architectures grow more complex, tracking down issues becomes far more challenging.

There's a greater need for observability as we move towards distributed systems and microservices-based applications.

When it comes to troubleshooting issues, we need more information than just what is wrong.

We need to know why our application entered a specific state, which component is responsible, and how we can avoid it in the future:

  • Why are error rates rising?
  • Why is there high latency?
  • Why are services timing out?

Observability gives you the flexibility to understand unpredictable events.

How do we accomplish observability?

  • Logging
  • Metrics
  • Tracing

Logging

Logs are records of events that have occurred and encapsulate information about the specific event.

A log entry typically contains:

  • Timestamp of when the event occurred
  • Message containing information

Logs are the most common form of telemetry produced by systems.

However, they can be difficult to use because applications often emit them in large volumes.

Log lines from one process are also likely to be interleaved with those of other concurrent processes spread across multiple systems.


Traces

Traces allow you to follow operations as they traverse through various systems and services. So, we can follow an individual request and see it flow through our system hop by hop.

Traces help us connect the dots on how processes and services work together.

Each trace has a trace-id that can be used to identify a request as it traverses the system.

Individual events forming a trace are called spans.

Each span tracks the following:

  • Start time
  • Duration
  • Parent-Id
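
To make the span fields concrete, here is a minimal sketch in plain shell and awk. The span IDs, service names, and timings are made up for illustration; a real tracing system would produce them automatically.

```shell
# Hypothetical spans of one trace, one per line:
# span_id parent_id start_ms duration_ms name
spans='a - 0 120 gateway
b a 5 80 orders-service
c b 20 40 database-query'

# For every span, print its name, its parent, and when it started and ended.
span_report() {
  printf '%s\n' "$spans" | awk '{
    parent = ($2 == "-") ? "root" : $2
    printf "%s parent=%s start=%dms end=%dms\n", $5, parent, $3, $3 + $4
  }'
}

span_report
```

Following the parent IDs from database-query back to gateway reconstructs the path the request took through the system.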

Metrics

Metrics provide information about the state of a system using numerical values:

  • CPU Load
  • Number of open files
  • HTTP response times
  • Number of errors

The collected data can be aggregated and graphed with visualization tools to identify trends over time.

Prometheus focuses on this third pillar: it collects and stores metrics, not logs or traces. It is primarily written in Go.


Prometheus Architecture

A Node Exporter on each worker node exposes that node's metrics, and a retrieval process on the Prometheus server pulls (scrapes) those metrics from the exporters at regular intervals. Prometheus therefore follows a pull model.

Short-lived jobs are a special case: they may start and finish before the retrieval process has a chance to scrape them. Such jobs push their metrics to the Pushgateway instead, and the retrieval process then scrapes the Pushgateway like any other target.
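
As a rough sketch of the push side, a short-lived job could send its metrics like this. The job name, the metric name, and the PUSHGATEWAY_URL variable are assumptions made for this example; the /metrics/job/<job_name> path is the one the Pushgateway documents for pushes.

```shell
# Hypothetical job name, used only for illustration.
JOB_NAME="nightly_backup"

# Build the payload in the Prometheus text exposition format.
build_payload() {
  # $1 = Unix timestamp of the last successful run
  printf '# TYPE backup_last_success_timestamp_seconds gauge\n'
  printf 'backup_last_success_timestamp_seconds %s\n' "$1"
}

# Push only when PUSHGATEWAY_URL is set (for example http://localhost:9091),
# so the sketch is safe to run on a machine without a Pushgateway.
push_metrics() {
  if [ -n "${PUSHGATEWAY_URL:-}" ]; then
    build_payload "$1" | curl --silent --show-error --data-binary @- \
      "${PUSHGATEWAY_URL}/metrics/job/${JOB_NAME}"
  else
    echo "PUSHGATEWAY_URL not set; would push:"
    build_payload "$1"
  fi
}

push_metrics "$(date +%s)"
```

Prometheus then scrapes the Pushgateway on its normal schedule, so the value stays available after the job has exited.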

Prometheus needs to know its targets in order to know where to retrieve metrics from. Targets can be listed statically in the configuration file, but in dynamic environments such as an Auto Scaling group or Kubernetes, where instances come and go, we need a dynamic mechanism. For this we use service discovery.
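
One of the simplest discovery mechanisms is file-based discovery (file_sd_configs): Prometheus watches a JSON or YAML file and picks up changes without a restart. The sketch below generates such a target file; the host names, the env label, and the file path in the comment are assumptions for illustration.

```shell
# Hypothetical hosts running node_exporter on its default port 9100.
HOSTS="web-1 web-2 db-1"

# Emit a file_sd target list: a JSON array containing one target group.
make_targets_json() {
  printf '[\n  {\n    "labels": {"env": "demo"},\n    "targets": ['
  sep=""
  for host in $HOSTS; do
    printf '%s"%s:9100"' "$sep" "$host"
    sep=", "
  done
  printf ']\n  }\n]\n'
}

make_targets_json

# A scrape job would then reference the generated file, for example:
#   scrape_configs:
#     - job_name: "node"
#       file_sd_configs:
#         - files: ["/etc/prometheus/targets/*.json"]
```

A script or configuration-management tool can regenerate the file whenever instances are added or removed.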

For alerting, we define rules with thresholds. When a rule fires, Prometheus sends the alert to Alertmanager, which delivers notifications by email or as Slack messages.
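
A minimal sketch of an alerting rule file is shown below. The group name, alert name, severity label, and the 5 minute window are example choices, and the file is written to a temporary directory so the snippet can run without root privileges.

```shell
# On a real server this file would sit next to prometheus.yml and be
# listed under rule_files in the Prometheus configuration.
RULES_FILE="$(mktemp -d)/alert_rules.yml"

# The quoted 'EOF' keeps the shell from expanding $labels in the template.
cat > "$RULES_FILE" <<'EOF'
groups:
  - name: example
    rules:
      - alert: InstanceDown
        # "up" is 0 when the last scrape of a target failed.
        expr: up == 0
        # The condition must hold for 5 minutes before the alert fires.
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
EOF

echo "wrote $RULES_FILE"
```

The file can be validated with promtool check rules before Prometheus loads it. Routing the resulting alert to email or Slack is configured separately in Alertmanager.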

For visualization we use either the Prometheus web UI or Grafana.


Node Exporters

Node exporters are agents that run on target machines and expose system metrics in a format that Prometheus can scrape. They collect various system-level metrics such as:

  • CPU usage
  • Memory consumption
  • Disk I/O
  • Network statistics
  • File system metrics

Node exporters make it easy to monitor infrastructure components without modifying application code.


Installing Prometheus

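Let's download a Prometheus release from the official GitHub releases page, extract it, and change into the extracted directory:

wget https://github.com/prometheus/prometheus/releases/download/v3.2.1/prometheus-3.2.1.linux-amd64.tar.gz
tar -xvf prometheus-3.2.1.linux-amd64.tar.gz
cd prometheus-3.2.1.linux-amd64/

The archive contains, among other files:

  • prometheus -> Application executable
  • prometheus.yml -> Configuration file
  • promtool -> Command-line utility (for example, for validating configuration)

Start the server:

./prometheus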

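Now open http://localhost:9090 in the browser. By default, Prometheus also scrapes its own metrics endpoint, so the server's own process metrics (such as CPU and memory usage) are available right away. Type up (lowercase) in the expression bar of the Prometheus UI to see which targets are reachable.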
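
The same query can also be sent to the HTTP API at /api/v1/query. The sketch below only prints the request URL unless RUN_LIVE=1 is set; PROM_URL and RUN_LIVE are variable names invented for this example.

```shell
# Base URL of the Prometheus server; override it if yours differs.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Build the instant-query URL for a simple expression.
# Note: expressions containing spaces or special characters would need
# URL encoding, which this sketch does not do.
query_url() {
  printf '%s/api/v1/query?query=%s\n' "$PROM_URL" "$1"
}

if [ "${RUN_LIVE:-0}" = "1" ]; then
  # Requires a running Prometheus server; the response is JSON.
  curl --silent --show-error "$(query_url up)"
else
  echo "would request: $(query_url up)"
fi
```

For each target, up is 1 if the last scrape succeeded and 0 if it failed.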


Prometheus SystemD Unit

Starting Prometheus manually from a terminal every time is not practical. Let's run it on the Linux machine as a systemd service instead. First we create a dedicated prometheus user for the service, without a home directory or a login shell:

sudo useradd --no-create-home --shell /bin/false prometheus

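Create /etc/prometheus for the configuration file and /var/lib/prometheus for the collected metric data, copy the binaries to /usr/local/bin, and give the prometheus user ownership of the two directories. Run these commands from the extracted release directory:

sudo mkdir /etc/prometheus
sudo mkdir /var/lib/prometheus
# install the binaries and the configuration file
sudo cp prometheus promtool /usr/local/bin/
sudo cp prometheus.yml /etc/prometheus/
# optional: keep the data from the earlier test run (the data directory
# exists only if ./prometheus was started from this directory)
sudo cp -r data/. /var/lib/prometheus/
sudo chown -R prometheus:prometheus /etc/prometheus
sudo chown -R prometheus:prometheus /var/lib/prometheus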

Run the Prometheus server from the command line as the prometheus user to verify the setup:

sudo -u prometheus /usr/local/bin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus

Create the file /etc/systemd/system/prometheus.service (for example with sudo vim) with the following content to run Prometheus as a systemd service:

[Unit]
Description=Prometheus
Wants=network-online.target
# startup after network is up
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
# the same command we used to start the server on the command line
ExecStart=/usr/local/bin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus

[Install]
# Start the service as part of normal system start-up, whether or not
# local GUI is active.
WantedBy=multi-user.target

Then reload systemd and start the service:

# reload systemd daemon
sudo systemctl daemon-reload
# start the daemon
sudo systemctl start prometheus
# enable to start on system boot
sudo systemctl enable prometheus
# Check status
sudo systemctl status prometheus

Node Exporter

Download the binary and run it from the command line:

wget https://github.com/prometheus/node_exporter/releases/download/v1.9.0/node_exporter-1.9.0.linux-amd64.tar.gz
tar -xvf node_exporter-1.9.0.linux-amd64.tar.gz
cd node_exporter-1.9.0.linux-amd64/
./node_exporter

Deploying it as a systemd service:

# copy the binary
sudo cp node_exporter /usr/local/bin
# create node_exporter user
sudo useradd --no-create-home --shell /bin/false node_exporter
# change the ownership of binary executable copied to node_exporter
sudo chown node_exporter:node_exporter /usr/local/bin/node_exporter

Create the systemd service file /etc/systemd/system/node_exporter.service:

[Unit]
Description="Node Exporter"
Wants=network-online.target
After=network-online.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter \
  --web.config.file=/etc/node_exporter/config.yaml \
  --web.listen-address=:9100

[Install]
WantedBy=multi-user.target

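Now enable and start the service. Note that the unit references /etc/node_exporter/config.yaml, which is created in the next section; until that file exists, the service is expected to fail on startup: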

sudo systemctl enable node_exporter
sudo systemctl start node_exporter
sudo systemctl status node_exporter

Self-Signed Certificates Generation

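Let's create a self-signed certificate for the host running node_exporter (localhost here), then move it into /etc/node_exporter, where the service expects it:

sudo openssl req -new -newkey rsa:2048 -days 365 -nodes -x509 \
  -keyout node_exporter.key \
  -out node_exporter.crt \
  -subj "/C=US/ST=California/L=Oakland/O=MyOrg/CN=localhost" \
  -addext "subjectAltName = DNS:localhost"
# move the certificate and key to where config.yaml expects them
sudo mkdir -p /etc/node_exporter
sudo mv node_exporter.crt node_exporter.key /etc/node_exporter/
sudo chown -R node_exporter:node_exporter /etc/node_exporter

Add the TLS settings to /etc/node_exporter/config.yaml: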

tls_server_config:
  cert_file: /etc/node_exporter/node_exporter.crt
  key_file: /etc/node_exporter/node_exporter.key

Restart the node_exporter service so that it picks up the new configuration (daemon-reload is only required if the unit file itself changed). The certificate is read at startup, so HTTPS should be available as soon as the service is running:

sudo systemctl daemon-reload
sudo systemctl restart node_exporter
sudo systemctl status node_exporter

Now copy the certificate from the node_exporter host to the Prometheus server:

rsync -aurvz /etc/node_exporter/node_exporter.crt alex@worker:/etc/prometheus/

In /etc/prometheus/prometheus.yml, switch the scrape job to HTTPS and reference the certificate copied from the node_exporter host:

scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    scheme: https
    tls_config:
      ca_file: /etc/prometheus/node_exporter.crt
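      # Disables certificate verification. Only needed when the certificate
      # cannot be validated against ca_file (for example, a hostname mismatch).
      insecure_skip_verify: true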

To generate the password hash used for basic authentication, install apache2-utils:

sudo apt install apache2-utils

Run htpasswd and enter a password when prompted; it prints the bcrypt hash:

htpasswd -nBC 12 "" | tr -d ":\n"

On the node_exporter host, add the user and the hash to /etc/node_exporter/config.yaml:

basic_auth_users:
  prometheus: <hash-code-generated-above>

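After restarting node_exporter, open the Prometheus targets page. The target now shows as down, because Prometheus is scraping without credentials and node_exporter rejects the requests.

Now add the username and the plain-text password (not the hash) to the scrape job in /etc/prometheus/prometheus.yml: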

scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    scheme: https
    basic_auth:
      username: <user-name>
      password: <password>
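
Putting the pieces together, a complete scrape job for the secured exporter might look like the sketch below. The job name "node" and the target localhost:9100 are assumptions (the fragments above do not show a target list), and the file is written to a temporary directory rather than /etc/prometheus so the snippet can run without root privileges.

```shell
CONFIG_FILE="$(mktemp -d)/prometheus.yml"

cat > "$CONFIG_FILE" <<'EOF'
scrape_configs:
  - job_name: "node"
    scheme: https
    tls_config:
      # Trust the self-signed certificate copied from the exporter host.
      ca_file: /etc/prometheus/node_exporter.crt
    basic_auth:
      username: prometheus
      # Placeholder: the plain-text password, not the bcrypt hash.
      password: <password>
    static_configs:
      # Assumed target; the certificate above was issued for "localhost".
      - targets: ["localhost:9100"]
EOF

echo "wrote $CONFIG_FILE"
```

On the server, promtool check config can be used to validate the real configuration file before Prometheus is restarted.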

The Prometheus server is still serving over HTTP, while node_exporter now serves over HTTPS. Let's switch the Prometheus server to HTTPS as well. In the Prometheus systemd unit /etc/systemd/system/prometheus.service, add the --web.config.file flag pointing to a web-config.yml file:

ExecStart=/usr/local/bin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus \
  --web.config.file=/etc/prometheus/web-config.yml

In /etc/prometheus/web-config.yml, set the TLS certificate and key for the Prometheus server:

tls_server_config:
  cert_file: /etc/prometheus/example.com.crt
  key_file: /etc/prometheus/example.com.key

Reload systemd and restart the Prometheus service:

sudo systemctl daemon-reload
sudo systemctl restart prometheus
sudo systemctl status prometheus

Finally, check the target health on the targets page of the web UI, which is now served over HTTPS.


Prometheus Metrics

Prometheus collects metrics in a time-series format. Each metric has:

  • Metric name: Identifies what is being measured
  • Labels: Key-value pairs that provide additional context
  • Timestamp: When the metric was collected
  • Value: The actual measurement

Common metric types include:

  • Counter: A cumulative metric that only increases
  • Gauge: A metric that can go up or down
  • Histogram: Samples observations and counts them in configurable buckets
  • Summary: Similar to a histogram, but calculates quantiles on the client side
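
Because a counter only increases, its raw value is rarely interesting; what matters is how fast it grows. The sketch below computes a per-second rate from two samples and treats a drop in value as a counter reset. It is a simplified illustration of the idea behind PromQL's rate() function, not a reimplementation: the real function works over all samples in a time window and extrapolates.

```shell
# counter_rate PREVIOUS CURRENT SECONDS_BETWEEN_SAMPLES
counter_rate() {
  awk -v prev="$1" -v curr="$2" -v secs="$3" 'BEGIN {
    # A counter only goes up; a lower value means the process restarted
    # and the counter started again from zero.
    increase = (curr >= prev) ? curr - prev : curr
    printf "%.2f\n", increase / secs
  }'
}

counter_rate 1000 1300 60   # 300 more requests in 60 s -> 5.00
counter_rate 1300 40 60     # reset detected, 40 since restart -> 0.67
```

A gauge needs no such treatment, since its current value is meaningful on its own.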

Metrics are exposed via HTTP endpoints, typically at /metrics, in a text-based format that Prometheus can scrape.
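
The text format is simple enough to read with standard tools. The sketch below parses a hand-written sample (the metric values are made up) into name, labels, and value. It assumes label values contain no spaces, which a full parser would have to handle.

```shell
# Sample of what a /metrics endpoint might return.
sample_metrics() {
  cat <<'EOF'
# HELP http_requests_total Total number of HTTP requests.
# TYPE http_requests_total counter
http_requests_total{method="get",code="200"} 1027
http_requests_total{method="post",code="500"} 3
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 42
EOF
}

# Skip comment lines, then split each sample into name, labels and value.
parse_metrics() {
  awk '!/^#/ && NF {
    series = $1
    value = $NF
    name = series
    labels = "-"
    brace = index(series, "{")
    if (brace > 0) {
      name = substr(series, 1, brace - 1)
      labels = substr(series, brace + 1, length(series) - brace - 1)
    }
    printf "name=%s labels=%s value=%s\n", name, labels, value
  }'
}

sample_metrics | parse_metrics
```

The timestamp is usually absent from the exposed text; Prometheus assigns the scrape time when it stores the sample.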


Summary

Prometheus is a powerful open-source observability tool that helps monitor and alert on system metrics. Key points to remember:

  • Observability consists of logging, metrics, and tracing.
  • Prometheus uses a pull-based model to collect metrics from exporters.
  • Node exporters expose system-level metrics for infrastructure monitoring.
  • Service discovery helps Prometheus dynamically find targets to scrape.
  • Alerting can be configured to notify when thresholds are exceeded.
  • Visualization is typically done through Prometheus's web UI or Grafana.

Prometheus provides a robust foundation for monitoring distributed systems and microservices, enabling teams to understand system behavior and troubleshoot issues effectively.
