Microservices Monitoring
Overview
This guide covers monitoring strategies, tools, and implementation approaches for microservices architecture, including metrics collection, logging, tracing, and alerting.
Prerequisites
- Basic understanding of microservices architecture
- Knowledge of Spring Boot Actuator
- Familiarity with monitoring tools
- Understanding of distributed systems
Learning Objectives
- Understand microservices monitoring patterns
- Learn metrics collection and visualization
- Master distributed tracing
- Implement centralized logging
- Set up effective alerting
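Metrics Collection
Spring Boot Actuator Configuration
management:
  endpoints:
    web:
      exposure:
        # Expose only the endpoints the monitoring stack needs
        include: health,metrics,prometheus
  endpoint:
    health:
      # "always" shows component details to every caller; consider "when-authorized" outside development
      show-details: always
  metrics:
    tags:
      # Adds an "application" tag to every metric
      application: ${spring.application.name}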
Prometheus Configuration
scrape_configs:
  - job_name: 'spring-actuator'
    metrics_path: '/actuator/prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:8080']
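Custom Metrics
import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.stereotype.Component;
import java.util.concurrent.TimeUnit;

@Component
public class CustomMetricsService {
    private final MeterRegistry registry;

    // Spring injects the auto-configured MeterRegistry
    public CustomMetricsService(MeterRegistry registry) {
        this.registry = registry;
    }

    // Records one order-processing duration in the "order.processing.time" timer
    public void recordOrderProcessingTime(long timeInMs) {
        registry.timer("order.processing.time")
                .record(timeInMs, TimeUnit.MILLISECONDS);
    }

    // Increments the "order.processed" counter by one
    public void incrementOrderCounter() {
        registry.counter("order.processed").increment();
    }
}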
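When these meters are scraped through `/actuator/prometheus`, they appear under Prometheus-style names rather than the dotted names used in the code: dots become underscores, counters gain a `_total` suffix, and timers are exposed in seconds as `_count`, `_sum`, and `_max` series. The sketch below imitates that mapping for the two meters above so it is clear which names to query. `PrometheusNames` and its methods are hypothetical helpers written for this illustration; they are not part of Micrometer and simplify its real naming rules.

```java
import java.util.List;

// Hypothetical helper that imitates how dotted meter names appear in Prometheus.
// It is a simplified illustration, not Micrometer's actual naming code.
class PrometheusNames {

    // Dots become underscores: "order.processed" -> "order_processed"
    static String snakeCase(String meterName) {
        return meterName.replace('.', '_');
    }

    // Counters are exposed with a "_total" suffix
    static String counterSeries(String meterName) {
        String base = snakeCase(meterName);
        return base.endsWith("_total") ? base : base + "_total";
    }

    // Timers are exposed in seconds as count, sum, and max series
    static List<String> timerSeries(String meterName) {
        String base = snakeCase(meterName) + "_seconds";
        return List.of(base + "_count", base + "_sum", base + "_max");
    }

    public static void main(String[] args) {
        System.out.println(counterSeries("order.processed"));
        timerSeries("order.processing.time").forEach(System.out::println);
    }
}
```

With this mapping in mind, a dashboard query for processed orders would target the counter series name rather than `order.processed`.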
Distributed Tracing
Sleuth Configuration
spring:
  sleuth:
    sampler:
      probability: 1.0
  zipkin:
    base-url: http://localhost:9411
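A sampler `probability` of 1.0 exports every trace, which is convenient during development but can be costly under production load; a lower value exports only that fraction of traces. The following is a minimal sketch of what a probability-based sampling decision means. `ProbabilitySampler` is a hypothetical class for illustration and does not reproduce Sleuth's actual sampler.

```java
import java.util.Random;

// Hypothetical sampler illustrating what a sampling probability means.
// Sleuth's real sampler differs in detail; this only shows the idea.
class ProbabilitySampler {
    private final double probability;
    private final Random random;

    ProbabilitySampler(double probability, Random random) {
        if (probability < 0.0 || probability > 1.0) {
            throw new IllegalArgumentException("probability must be between 0.0 and 1.0");
        }
        this.probability = probability;
        this.random = random;
    }

    // Returns true when the trace should be exported
    boolean shouldSample() {
        // nextDouble() is in [0.0, 1.0), so 1.0 always samples and 0.0 never does
        return random.nextDouble() < probability;
    }

    public static void main(String[] args) {
        ProbabilitySampler sampler = new ProbabilitySampler(0.1, new Random());
        int sampled = 0;
        for (int i = 0; i < 10_000; i++) {
            if (sampler.shouldSample()) {
                sampled++;
            }
        }
        System.out.println("Sampled " + sampled + " of 10000 traces");
    }
}
```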
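Trace Implementation
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.cloud.sleuth.Span;   // Sleuth 3.x API; Sleuth 2.x uses brave.Span
import org.springframework.cloud.sleuth.Tracer; // and brave.Tracer
import org.springframework.stereotype.Service;

@Service
public class OrderService {
    private static final Logger log = LoggerFactory.getLogger(OrderService.class);

    @Autowired
    private Tracer tracer;

    public Order processOrder(OrderRequest request) {
        // currentSpan() returns null when no trace is active, so guard before tagging
        Span span = tracer.currentSpan();
        if (span != null) {
            span.tag("orderId", String.valueOf(request.getOrderId()));
        }
        log.info("Processing order: {}", request.getOrderId());
        // Process order (elided here); this step is expected to produce `order`
        return order;
    }
}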
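Behind the scenes, the trace follows a request from one service to the next through HTTP headers. Sleuth's default is the B3 format, with the headers `X-B3-TraceId`, `X-B3-SpanId`, `X-B3-ParentSpanId`, and `X-B3-Sampled`. The sketch below shows, under simplifying assumptions, how the headers of an outgoing call relate to those of the incoming request. `B3Propagation` is a hypothetical class; Sleuth performs this step automatically for instrumented HTTP clients.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical illustration of B3 header propagation between two services.
// Sleuth does this automatically for instrumented HTTP clients.
class B3Propagation {

    // Span IDs are 64-bit values written as 16 lowercase hex characters
    static String newSpanId() {
        return String.format("%016x", ThreadLocalRandom.current().nextLong());
    }

    // Builds the headers for an outgoing call made while handling `incoming`
    static Map<String, String> childHeaders(Map<String, String> incoming) {
        Map<String, String> outgoing = new LinkedHashMap<>();
        // The trace ID stays the same for the whole request flow
        outgoing.put("X-B3-TraceId", incoming.get("X-B3-TraceId"));
        // The caller's span becomes the parent of the new span
        outgoing.put("X-B3-ParentSpanId", incoming.get("X-B3-SpanId"));
        outgoing.put("X-B3-SpanId", newSpanId());
        // The sampling decision, if already made, is inherited downstream
        String sampled = incoming.get("X-B3-Sampled");
        if (sampled != null) {
            outgoing.put("X-B3-Sampled", sampled);
        }
        return outgoing;
    }

    public static void main(String[] args) {
        Map<String, String> incoming = new LinkedHashMap<>();
        incoming.put("X-B3-TraceId", "463ac35c9f6413ad");
        incoming.put("X-B3-SpanId", "a2fb4a1d1a96d312");
        incoming.put("X-B3-Sampled", "1");
        childHeaders(incoming).forEach((name, value) -> System.out.println(name + ": " + value));
    }
}
```

Because every hop keeps the same trace ID, Zipkin can stitch the spans from all services into a single request timeline.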
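Centralized Logging
Logback Configuration
<configuration>
    <!-- Makes spring.application.name available as ${springApplicationName};
         springProperty requires this file to be named logback-spring.xml -->
    <springProperty scope="context" name="springApplicationName" source="spring.application.name"/>
    <appender name="ELK" class="net.logstash.logback.appender.LogstashTcpSocketAppender">
        <destination>localhost:5000</destination>
        <encoder class="net.logstash.logback.encoder.LogstashEncoder">
            <customFields>{"app":"${springApplicationName}"}</customFields>
        </encoder>
    </appender>
    <root level="INFO">
        <appender-ref ref="ELK" />
    </root>
</configuration>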
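Structured Logging
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Service;
import static net.logstash.logback.argument.StructuredArguments.kv;

@Slf4j
@Service
public class PaymentService {
    public void processPayment(Payment payment) {
        // The {} placeholder consumes the first argument; the kv(...) arguments have no
        // placeholder, so they appear only as fields in the JSON output
        log.info("Processing payment: {}", payment.getId(),
                kv("paymentId", payment.getId()),
                kv("amount", payment.getAmount()),
                kv("status", payment.getStatus()));
    }
}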
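Health Checks
Custom Health Indicator
import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.stereotype.Component;
import javax.sql.DataSource;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

@Component
public class DatabaseHealthIndicator implements HealthIndicator {
    private final DataSource dataSource;

    // Constructor injection initializes the final field
    public DatabaseHealthIndicator(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    @Override
    public Health health() {
        // try-with-resources closes both the connection and the statement
        try (Connection conn = dataSource.getConnection();
             PreparedStatement ps = conn.prepareStatement("SELECT 1")) {
            ps.executeQuery();
            return Health.up()
                    .withDetail("database", "PostgreSQL")
                    .withDetail("status", "Connected")
                    .build();
        } catch (SQLException ex) {
            return Health.down()
                    .withDetail("error", ex.getMessage())
                    .build();
        }
    }
}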
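Composite Health Check
import org.springframework.boot.actuate.health.CompositeHealthContributor;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import java.util.HashMap;
import java.util.Map;

@Configuration
public class HealthCheckConfig {
    // Combines the database and cache indicators into one composite contributor.
    // CacheHealthIndicator is assumed to be defined elsewhere in the application.
    @Bean
    public CompositeHealthContributor healthContributor(
            DatabaseHealthIndicator dbHealth,
            CacheHealthIndicator cacheHealth) {
        Map<String, HealthIndicator> indicators = new HashMap<>();
        indicators.put("database", dbHealth);
        indicators.put("cache", cacheHealth);
        return CompositeHealthContributor.fromMap(indicators);
    }
}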
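A composite reports one aggregated status for all of its components. Spring Boot's default aggregation picks the most severe status present, using the order DOWN, OUT_OF_SERVICE, UP, UNKNOWN. The sketch below reproduces that rule in plain Java to show why a failing cache marks the whole composite as down. `StatusAggregation` and its `Status` enum are hypothetical stand-ins, not Spring Boot classes.

```java
import java.util.Collection;
import java.util.EnumSet;
import java.util.List;

// Hypothetical stand-in for health status aggregation in a composite health check.
class StatusAggregation {

    // Declared from most to least severe, mirroring Spring Boot's default order
    enum Status { DOWN, OUT_OF_SERVICE, UP, UNKNOWN }

    // The aggregate is the most severe status reported by any component
    static Status aggregate(Collection<Status> componentStatuses) {
        if (componentStatuses.isEmpty()) {
            return Status.UNKNOWN;
        }
        // EnumSet iterates in declaration order, so the first element is the most severe
        return EnumSet.copyOf(componentStatuses).iterator().next();
    }

    public static void main(String[] args) {
        // The database is UP but the cache is DOWN, so the composite is DOWN
        System.out.println(aggregate(List.of(Status.UP, Status.DOWN)));
    }
}
```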
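Alerting
Alert Configuration
# Prometheus rule file, loaded through rule_files in prometheus.yml
groups:
  - name: service-alerts  # group name chosen for illustration
    rules:
      - alert: HighErrorRate
        # Micrometer reports the numeric status code, so match the 5xx range with a regex
        expr: rate(http_server_requests_seconds_count{status=~"5.."}[5m]) > 0.1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: High error rate detected
          # Assumes the series carry a "service" label (for example, added by relabeling)
          description: "Service {{ $labels.service }} has high error rate"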
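Note what the `rate(...[5m]) > 0.1` condition measures: `rate` is the average per-second increase of the counter over the window, so 0.1 means more than one server error every ten seconds, not a 10% error ratio. The sketch below works through that arithmetic. `ErrorRateCheck` is a hypothetical class, and it deliberately ignores counter resets and the extrapolation Prometheus applies at the window edges.

```java
// Hypothetical illustration of how a rate()-based alert condition is evaluated.
// It ignores counter resets and the extrapolation Prometheus applies at window edges.
class ErrorRateCheck {

    // Per-second increase of a counter between the first and last sample in the window
    static double perSecondRate(double firstValue, double lastValue, double windowSeconds) {
        if (windowSeconds <= 0) {
            throw new IllegalArgumentException("windowSeconds must be positive");
        }
        return (lastValue - firstValue) / windowSeconds;
    }

    // Mirrors the alert condition over a 5-minute (300-second) window
    static boolean breachesThreshold(double firstValue, double lastValue) {
        return perSecondRate(firstValue, lastValue, 300.0) > 0.1;
    }

    public static void main(String[] args) {
        // 45 server errors in 5 minutes: the counter moved from 100 to 145
        System.out.println(perSecondRate(100, 145, 300.0));
        System.out.println(breachesThreshold(100, 145));
    }
}
```

The `for: 5m` clause adds a second condition on top of this: the expression must stay true for five consecutive minutes before the alert fires.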
Alertmanager Configuration
route:
  group_by: ['alertname', 'service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: 'team-emails'
receivers:
  - name: 'team-emails'
    email_configs:
      - to: 'team@example.com'
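The `group_by` setting controls how many notifications the team receives: alerts that share the same values for `alertname` and `service` are batched into one notification. The sketch below illustrates that grouping rule in plain Java. `AlertGrouping` is a hypothetical class, and the `orders`, `payments`, and `instance` values are made-up sample data.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration of Alertmanager's group_by: ['alertname', 'service'].
class AlertGrouping {

    // Alerts with the same alertname and service share a key and are notified together
    static String groupKey(Map<String, String> labels) {
        return labels.getOrDefault("alertname", "") + "/" + labels.getOrDefault("service", "");
    }

    static Map<String, List<Map<String, String>>> group(List<Map<String, String>> alerts) {
        return alerts.stream().collect(Collectors.groupingBy(AlertGrouping::groupKey));
    }

    public static void main(String[] args) {
        List<Map<String, String>> alerts = List.of(
                Map.of("alertname", "HighErrorRate", "service", "orders", "instance", "orders-1"),
                Map.of("alertname", "HighErrorRate", "service", "orders", "instance", "orders-2"),
                Map.of("alertname", "HighErrorRate", "service", "payments", "instance", "payments-1"));
        // Three alerts collapse into two groups: one per service
        group(alerts).forEach((key, members) ->
                System.out.println(key + " -> " + members.size() + " alert(s)"));
    }
}
```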
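Best Practices
- Collect metrics from every service and tag them consistently (for example, with the application name)
- Use distributed tracing to follow requests across service boundaries
- Centralize logs and attach context such as application name and business identifiers
- Implement health checks that verify real dependencies such as the database and cache
- Tune alert thresholds and `for` durations so alerts are actionable rather than noisy
- Monitor system resources alongside application metrics
- Build dashboards around the key metrics of each service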
Common Pitfalls
- Insufficient monitoring coverage
- Poor log aggregation
- Missing important metrics
- Inadequate alerting
- Resource-heavy monitoring
- Poor visualization
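Implementation Examples
Complete Monitoring Setup
import io.micrometer.core.aop.TimedAspect;
import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.actuate.autoconfigure.metrics.MeterRegistryCustomizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class MonitoringConfig {
    // Customizes the auto-configured registry instead of replacing it with an empty composite.
    // @Value resolves the property; a "${...}" string literal would be used as the tag verbatim.
    @Bean
    public MeterRegistryCustomizer<MeterRegistry> commonTags(
            @Value("${spring.application.name}") String applicationName) {
        return registry -> registry.config().commonTags("application", applicationName);
    }

    // Enables @Timed on Spring beans (requires Spring AOP on the classpath)
    @Bean
    public TimedAspect timedAspect(MeterRegistry registry) {
        return new TimedAspect(registry);
    }
}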
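Metrics Aspect
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.springframework.stereotype.Component;

// Custom alternative to Micrometer's TimedAspect; registering both times each method twice
@Aspect
@Component
public class MetricsAspect {
    private final MeterRegistry registry;

    // Constructor injection initializes the final field
    public MetricsAspect(MeterRegistry registry) {
        this.registry = registry;
    }

    // Use the fully qualified annotation name in the pointcut expression
    @Around("@annotation(io.micrometer.core.annotation.Timed)")
    public Object timeMethod(ProceedingJoinPoint joinPoint) throws Throwable {
        Timer.Sample sample = Timer.start(registry);
        try {
            return joinPoint.proceed();
        } finally {
            // Runs on success and failure, so failed calls are timed too
            sample.stop(Timer.builder("method.execution.time")
                    .tag("class", joinPoint.getSignature().getDeclaringTypeName())
                    .tag("method", joinPoint.getSignature().getName())
                    .register(registry));
        }
    }
}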
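The core pattern in the aspect above is independent of Spring and Micrometer: start a clock, run the call, and record the elapsed time in a `finally` block so that failed calls are measured as well. The sketch below shows that pattern in plain Java. `MethodTimer` is a hypothetical class that stores results in in-memory maps instead of a `MeterRegistry`.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stand-in for the aspect: times a call and records it even when it throws.
class MethodTimer {
    private final Map<String, LongAdder> callCounts = new ConcurrentHashMap<>();
    private final Map<String, LongAdder> totalNanos = new ConcurrentHashMap<>();

    <T> T time(String name, Callable<T> call) throws Exception {
        long start = System.nanoTime();
        try {
            return call.call();
        } finally {
            // finally runs on both success and failure, like sample.stop(...) in the aspect
            long elapsed = System.nanoTime() - start;
            callCounts.computeIfAbsent(name, key -> new LongAdder()).increment();
            totalNanos.computeIfAbsent(name, key -> new LongAdder()).add(elapsed);
        }
    }

    // Number of recorded calls for the given name
    long count(String name) {
        LongAdder adder = callCounts.get(name);
        return adder == null ? 0 : adder.sum();
    }

    // Total recorded time in nanoseconds for the given name
    long totalNanos(String name) {
        LongAdder adder = totalNanos.get(name);
        return adder == null ? 0 : adder.sum();
    }

    public static void main(String[] args) throws Exception {
        MethodTimer timer = new MethodTimer();
        String result = timer.time("OrderService.processOrder", () -> "done");
        System.out.println(result + ", calls=" + timer.count("OrderService.processOrder"));
    }
}
```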
Resources for Further Learning
- Spring Boot Actuator Documentation
- Prometheus Documentation
- Grafana Documentation
- ELK Stack Documentation
Practice Exercises
- Set up Spring Boot Actuator with custom metrics
- Implement distributed tracing with Sleuth and Zipkin
- Configure centralized logging with ELK stack
- Create custom health indicators
- Set up Prometheus and Grafana dashboards