Kubernetes Monitoring¶
Overview¶
This guide covers monitoring strategies in Kubernetes, including metrics collection, logging, tracing, and best practices for monitoring cluster and application health.
Prerequisites¶
- Basic understanding of Kubernetes concepts
- Knowledge of monitoring concepts
- Familiarity with Prometheus and Grafana
- Understanding of logging systems
Learning Objectives¶
- Understand monitoring concepts
- Learn metrics collection
- Master logging strategies
- Implement tracing
- Configure alerts
Table of Contents¶
Metrics Collection¶
Prometheus Configuration¶
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
name: prometheus
spec:
serviceAccountName: prometheus
serviceMonitorSelector:
matchLabels:
team: frontend
resources:
requests:
memory: 400Mi
enableAdminAPI: false
Service Monitor¶
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: app-monitor
labels:
team: frontend
spec:
selector:
matchLabels:
app: web
endpoints:
- port: metrics
interval: 15s
Metrics Endpoint¶
apiVersion: v1
kind: Service
metadata:
name: app-service
annotations:
prometheus.io/scrape: 'true'
prometheus.io/port: '9090'
spec:
selector:
app: web
ports:
- name: metrics
port: 9090
targetPort: metrics
Logging¶
Fluentd Configuration¶
apiVersion: v1
kind: ConfigMap
metadata:
name: fluentd-config
data:
fluent.conf: |
<source>
@type tail
path /var/log/containers/*.log
pos_file /var/log/fluentd-containers.log.pos
tag kubernetes.*
read_from_head true
<parse>
@type json
time_key time
time_format %Y-%m-%dT%H:%M:%S.%NZ
</parse>
</source>
<match kubernetes.**>
@type elasticsearch
host elasticsearch-logging
port 9200
logstash_format true
logstash_prefix k8s
</match>
Elasticsearch Configuration¶
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: elasticsearch
spec:
serviceName: elasticsearch
replicas: 3
selector:
matchLabels:
app: elasticsearch
template:
metadata:
labels:
app: elasticsearch
spec:
containers:
- name: elasticsearch
image: elasticsearch:7.9.3
env:
- name: discovery.type
value: single-node
ports:
- containerPort: 9200
name: http
- containerPort: 9300
name: transport
Distributed Tracing¶
Jaeger Configuration¶
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
name: jaeger
spec:
strategy: production
storage:
type: elasticsearch
options:
es:
server-urls: http://elasticsearch:9200
ingress:
enabled: true
OpenTelemetry Configuration¶
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
name: otel
spec:
config: |
receivers:
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:4317
processors:
batch:
exporters:
jaeger:
endpoint: jaeger-collector:14250
tls:
insecure: true
service:
pipelines:
traces:
receivers: [otlp]
processors: [batch]
exporters: [jaeger]
Alerting¶
Alert Manager Configuration¶
apiVersion: monitoring.coreos.com/v1
kind: AlertmanagerConfig
metadata:
name: alert-config
spec:
route:
receiver: 'slack'
group_by: ['alertname', 'cluster', 'service']
receivers:
- name: 'slack'
slack_configs:
- api_url: 'https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX'
channel: '#alerts'
send_resolved: true
Alert Rules¶
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
name: prometheus-rules
spec:
groups:
- name: node
rules:
- alert: HighCPUUsage
expr: node_cpu_usage_percentage > 80
for: 5m
labels:
severity: warning
annotations:
description: CPU usage is above 80% for 5 minutes
Monitoring Tools¶
Grafana Dashboard¶
apiVersion: integreatly.org/v1alpha1
kind: GrafanaDashboard
metadata:
name: kubernetes-dashboard
spec:
json: |
{
"dashboard": {
"id": null,
"title": "Kubernetes Cluster Monitoring",
"panels": [
{
"title": "CPU Usage",
"type": "graph",
"datasource": "Prometheus",
"targets": [
{
"expr": "sum(rate(container_cpu_usage_seconds_total[5m])) by (pod)"
}
]
}
]
}
}
Prometheus Operator¶
apiVersion: apps/v1
kind: Deployment
metadata:
name: prometheus-operator
spec:
replicas: 1
selector:
matchLabels:
app: prometheus-operator
template:
metadata:
labels:
app: prometheus-operator
spec:
containers:
- name: prometheus-operator
image: quay.io/prometheus-operator/prometheus-operator:v0.50.0
args:
- --kubelet-service=kube-system/kubelet
- --config-reloader-image=jimmidyson/configmap-reload:v0.5.0
Best Practices¶
- Implement comprehensive monitoring
- Configure proper log retention
- Set up distributed tracing
- Implement meaningful alerts
- Use appropriate monitoring tools
- Monitor cluster health
- Implement proper dashboards
Common Pitfalls¶
- Insufficient monitoring coverage
- Poor log management
- Missing alerts
- Inadequate resource monitoring
- Poor visualization
- Alert fatigue
Implementation Examples¶
Complete Monitoring Stack¶
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
name: prometheus
spec:
serviceAccountName: prometheus
serviceMonitorSelector:
matchLabels:
team: frontend
resources:
requests:
memory: 400Mi
alerting:
alertmanagers:
- name: alertmanager-main
namespace: monitoring
port: web
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: app-monitor
spec:
selector:
matchLabels:
app: web
endpoints:
- port: metrics
interval: 15s
---
apiVersion: monitoring.coreos.com/v1
kind: AlertmanagerConfig
metadata:
name: alert-config
spec:
route:
receiver: 'slack'
group_by: ['alertname', 'cluster', 'service']
receivers:
- name: 'slack'
slack_configs:
- api_url: 'https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX'
channel: '#alerts'
Logging Configuration¶
apiVersion: v1
kind: ConfigMap
metadata:
name: fluentd-config
data:
fluent.conf: |
<source>
@type tail
path /var/log/containers/*.log
pos_file /var/log/fluentd-containers.log.pos
tag kubernetes.*
read_from_head true
<parse>
@type json
</parse>
</source>
<match kubernetes.**>
@type elasticsearch
host elasticsearch-logging
port 9200
logstash_format true
</match>
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: fluentd
spec:
selector:
matchLabels:
name: fluentd
template:
metadata:
labels:
name: fluentd
spec:
containers:
- name: fluentd
image: fluent/fluentd-kubernetes-daemonset:v1.12.0-debian-elasticsearch7-1.1
volumeMounts:
- name: varlog
mountPath: /var/log
- name: config
mountPath: /fluentd/etc
volumes:
- name: varlog
hostPath:
path: /var/log
- name: config
configMap:
name: fluentd-config
Resources for Further Learning¶
Practice Exercises¶
- Set up Prometheus monitoring
- Configure logging with ELK stack
- Implement distributed tracing
- Create Grafana dashboards
- Configure alerting rules