Add observability stack: ServiceMonitors, Tempo, OTel API instrumentation, dashboards
- Add ServiceMonitors for Traefik, ArgoCD, and Longhorn
- Enable cert-manager ServiceMonitor via Helm values
- Deploy Grafana Tempo for distributed tracing (single-binary, Longhorn PVC)
- Add Tempo datasource with trace-to-logs and trace-to-metrics correlation
- Instrument API with OpenTelemetry SDK (Prometheus metrics + OTLP traces)
- Replace console.log with pino structured logging + pino-http middleware
- Add Grafana dashboards for Traefik, API overview, and PostgreSQL (CNPG)
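The ServiceMonitor manifests are not part of this diff. The block below is a minimal sketch of what the Traefik one might look like. It rests on several assumptions that are not taken from this commit:

- the Prometheus Operator CRDs are installed;
- Prometheus selects ServiceMonitors by a release: kube-prometheus-stack label;
- Traefik's Service lives in a traefik namespace, carries the app.kubernetes.io/name: traefik label, and exposes a port named metrics.

```yaml
# Hypothetical ServiceMonitor for Traefik. The namespace, labels, and port
# name below are assumptions; adjust them to match the real Traefik Service.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: traefik
  namespace: observability
  labels:
    release: kube-prometheus-stack   # assumed Prometheus serviceMonitorSelector label
spec:
  namespaceSelector:
    matchNames:
      - traefik                      # assumed namespace of the Traefik Service
  selector:
    matchLabels:
      app.kubernetes.io/name: traefik
  endpoints:
    - port: metrics                  # assumed name of the Service port serving metrics
      path: /metrics
      interval: 30s
```

The ArgoCD and Longhorn monitors would follow the same shape with different selectors and ports.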
infra/kubernetes/observability/tempo/application.yaml
@@ -0,0 +1,41 @@
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: tempo
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://grafana.github.io/helm-charts
    chart: tempo
    targetRevision: 1.12.0
    helm:
      valuesObject:
        tempo:
          receivers:
            otlp:
              protocols:
                grpc:
                  endpoint: "0.0.0.0:4317"
                http:
                  endpoint: "0.0.0.0:4318"
          retention: 168h
          resources:
            requests:
              memory: 256Mi
              cpu: 100m
            limits:
              memory: 1Gi
        persistence:
          enabled: true
          storageClassName: longhorn
          size: 10Gi
  destination:
    server: https://kubernetes.default.svc
    namespace: observability
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
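The Tempo datasource mentioned in the commit message is not shown in this file. The block below is a hedged sketch of a Grafana datasource provisioning entry for the Tempo instance deployed above. The assumptions behind it are:

- The URL assumes the chart's Service is named tempo (the release name), sits in the observability namespace, and serves queries on Tempo's default HTTP port 3100.
- The loki and prometheus datasource UIDs are hypothetical placeholders. This diff does not show which backends the trace-to-logs and trace-to-metrics links target.

```yaml
# Hypothetical Grafana datasource provisioning for Tempo.
# Service URL and the linked datasource UIDs are assumptions.
apiVersion: 1
datasources:
  - name: Tempo
    type: tempo
    uid: tempo
    access: proxy
    url: http://tempo.observability.svc:3100   # assumed Service name and query port
    jsonData:
      tracesToLogsV2:
        datasourceUid: loki          # placeholder UID of the logs datasource
        spanStartTimeShift: "-5m"    # widen the log search window around each span
        spanEndTimeShift: "5m"
        filterByTraceID: true
      tracesToMetrics:
        datasourceUid: prometheus    # placeholder UID of the metrics datasource
      serviceMap:
        datasourceUid: prometheus
      nodeGraph:
        enabled: true
```

Instrumented services would then export OTLP traces to the same Service on port 4317 (gRPC) or 4318 (HTTP), the receiver endpoints configured in the Application above.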