Add observability stack: ServiceMonitors, Tempo, OTel API instrumentation, dashboards

- Add ServiceMonitors for Traefik, ArgoCD, and Longhorn
- Enable cert-manager ServiceMonitor via helm values
- Deploy Grafana Tempo for distributed tracing (single-binary, Longhorn PVC)
- Add Tempo datasource with trace-to-logs and trace-to-metrics correlation
- Instrument API with OpenTelemetry SDK (Prometheus metrics + OTLP traces)
- Replace console.log with pino structured logging + pino-http middleware
- Add Grafana dashboards for Traefik, API overview, and PostgreSQL (CNPG)
Julia McGhee
2026-03-20 21:00:48 +00:00
parent 8a23d5d5f6
commit 051c957347
23 changed files with 2259 additions and 11 deletions

@@ -15,3 +15,22 @@ data:
       url: http://loki.observability.svc:3100
       jsonData:
         maxLines: 1000
+        derivedFields:
+          - datasourceUid: tempo
+            matcherRegex: '"traceID":"(\w+)"'
+            name: TraceID
+            url: "$${__value.raw}"
+    - name: Tempo
+      type: tempo
+      uid: tempo
+      access: proxy
+      url: http://tempo.observability.svc:3100
+      jsonData:
+        tracesToLogs:
+          datasourceUid: loki
+          filterByTraceID: true
+          filterBySpanID: false
+        tracesToMetrics:
+          datasourceUid: prometheus
+        serviceMap:
+          datasourceUid: prometheus
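
The Loki derived field above only produces a link to Tempo when a log line contains the key `traceID` in exactly the form the regex expects. A minimal sketch of that check follows. It assumes the API writes compact JSON log lines, and the sample lines and the `extractTraceId` helper are hypothetical, not taken from the API's code. If the logger emits a different key (for example `trace_id`), the `matcherRegex` would need to be adjusted to match.

```typescript
// Sketch only: exercises the matcherRegex from the datasource config
// against hypothetical log lines. Field names and values are assumptions.
const matcherRegex = /"traceID":"(\w+)"/;

// Returns the captured trace ID, or null when the line has no matching field.
function extractTraceId(logLine: string): string | null {
  const match = matcherRegex.exec(logLine);
  return match?.[1] ?? null;
}

// Hypothetical structured log line with a `traceID` key.
const matchingLine = JSON.stringify({
  level: 30,
  msg: "request completed",
  traceID: "4bf92f3577b34da6a3ce929d0e0e4736",
});

// Hypothetical line that uses `trace_id` instead; the regex does not match it.
const nonMatchingLine = JSON.stringify({
  level: 30,
  msg: "request completed",
  trace_id: "4bf92f3577b34da6a3ce929d0e0e4736",
});

console.log(extractTraceId(matchingLine));
console.log(extractTraceId(nonMatchingLine));
```

Running a check like this against a real log line from the API is a quick way to confirm that the TraceID link will appear in Grafana's log view.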