Commit Graph

21 Commits

Author SHA1 Message Date
Julia McGhee
a3c73dccb0 Fix Gitea DB auth: use additionalConfigFromEnvs for password
The _secret/_key syntax doesn't work in Gitea Helm values. Use
additionalConfigFromEnvs to inject GITEA__database__PASSWD from
the sealed secret, which the chart translates into app.ini config.
2026-03-21 15:56:18 +00:00
Julia McGhee
7db7bc916e Fix longhorn-nvme: add storageclass.yaml to Longhorn kustomization
The longhorn-nvme StorageClass was defined but never included in the
Longhorn kustomization, so it was never deployed. Add it and revert
Gitea manifests back to longhorn-nvme as intended.
2026-03-21 15:51:24 +00:00
Julia McGhee
aed0bff28a Fix storage class: use longhorn instead of longhorn-nvme
The longhorn-nvme storage class doesn't exist yet in the cluster.
Use the available longhorn class to unblock PVC provisioning.
2026-03-21 15:49:49 +00:00
Julia McGhee
5b4086e71f Revert ArgoCD repoURLs to GitHub temporarily
Gitea needs to be deployed before ArgoCD can read from it.
Keep GitHub repoURLs so ArgoCD can discover and deploy the
new gitea-pg, gitea, and gitea-runner directories. Switch
to Gitea repoURLs after Gitea is running and repo is pushed.
2026-03-21 15:46:41 +00:00
Julia McGhee
f04ecbf5cd Add Gitea self-hosted git/CI/registry to replace GitHub
Deploy Gitea via Helm with dedicated CloudNativePG database,
in-cluster Actions runner (DinD), and built-in container registry.
ArgoCD repoURLs updated to use in-cluster Gitea SSH. Preview
ApplicationSet switched from GitHub PR generator to Gitea PR
generator. App images now pull from gitea.coreworlds.io registry.

Remaining setup after deploy: seal runner token, ArgoCD API token,
and registry pull secret once Gitea is running. Add ArgoCD deploy
key to Gitea repo settings.
2026-03-21 15:43:30 +00:00
Julia McGhee
6dde7c8aef Add harness app: agent orchestrator with cluster deployment
- Next.js app for orchestrating coding agent benchmarks (Claude Code, Codex, OpenCode)
- Dockerfile installs git, gh CLI, and agent CLIs for headless execution
- K8s deployment with workspace volume, sealed credentials for Claude + OpenCode
- Traefik IngressRoute at harness.coreworlds.io with internal-only middleware + TLS
- CI pipeline path filter for harness builds
- Fix OpenCode runtime flags (subcommand-based headless mode)
2026-03-21 15:26:09 +00:00
Julia McGhee
9e7077cd82 Add Grafana admin sealed secret 2026-03-21 13:19:08 +00:00
Julia McGhee
c6ce40a557 Add Ansible storage role for NVMe setup and Longhorn dual-disk config
Automates LV expansion, NVMe mount, and Longhorn node disk tagging
(hdd/nvme) via Ansible instead of Kustomize-managed manifests.
2026-03-21 13:19:04 +00:00
Julia McGhee
3b8fd4afd2 expand disk storage 2026-03-21 09:53:50 +00:00
Julia McGhee
051c957347 Add observability stack: ServiceMonitors, Tempo, OTel API instrumentation, dashboards
- Add ServiceMonitors for Traefik, ArgoCD, and Longhorn
- Enable cert-manager ServiceMonitor via helm values
- Deploy Grafana Tempo for distributed tracing (single-binary, Longhorn PVC)
- Add Tempo datasource with trace-to-logs and trace-to-metrics correlation
- Instrument API with OpenTelemetry SDK (Prometheus metrics + OTLP traces)
- Replace console.log with pino structured logging + pino-http middleware
- Add Grafana dashboards for Traefik, API overview, and PostgreSQL (CNPG)
2026-03-20 21:01:05 +00:00
Julia McGhee
e863ebed9b Set Longhorn default replica count to 1 for single-node cluster
With only one node, 2 replicas can never be scheduled — volumes report
as degraded. Match the replica count to the node count.
2026-03-20 19:39:15 +00:00
Julia McGhee
04fc7c7576 Disable ArgoCD internal TLS to fix redirect loop behind Traefik
Traefik terminates TLS, so ArgoCD server must run in insecure mode.
Also update ArgoCD URL from homelab.local to coreworlds.io.
2026-03-20 19:33:17 +00:00
Julia McGhee
11c9c0f1bc Add Certificate resources for internal IngressRoutes
cert-manager annotations don't work on Traefik IngressRoutes — explicit
Certificate resources are needed to trigger Let's Encrypt issuance.
2026-03-20 19:26:25 +00:00
Julia McGhee
71442a0405 Switch from homelab.local to coreworlds.io with split-horizon DNS and LAN-only access controls
- Migrate all ingress hostnames from *.homelab.local to *.coreworlds.io
- Remove broken Traefik certresolver config (cert-manager handles TLS)
- Add internal-only IP allowlist middleware for platform services
- Add IngressRoutes for ArgoCD, Grafana, Longhorn (LAN-only via middleware)
- Seal and add Cloudflare API token for cert-manager DNS-01 challenges
- Update cert-manager ClusterIssuers with real email
- Update k3s TLS SAN to k3s.coreworlds.io
- Rewrite Ubiquiti docs for single-node topology and split-horizon DNS
- Fix seal-secret.sh controller name to match Helm release
- Add UCG DNS setup script using API key auth
2026-03-20 19:21:46 +00:00
Julia McGhee
4135d2102e Bump CNPG chart to 0.23.2 for missing Pooler CRD 2026-03-20 18:51:15 +00:00
Julia McGhee
9867129eff Add retry/ServerSideApply to CNPG helm app 2026-03-20 18:49:37 +00:00
Julia McGhee
6f1418d0c6 Disable Longhorn pre-upgrade checker job for ArgoCD compatibility 2026-03-20 18:48:06 +00:00
Julia McGhee
b359cc9560 Separate CRD-dependent resources from operator installs
cert-manager and CloudNativePG operator installs must complete before
their custom resources (ClusterIssuer, CNPG Cluster) can be applied.

Split into separate kustomize dirs so the ApplicationSet creates
independent ArgoCD apps that can sync in order:
- platform-cert-manager → installs operator
- platform-cert-manager-config → creates ClusterIssuers (after CRDs exist)
- platform-cloudnativepg → installs operator
- platform-cloudnativepg-cluster → creates PG cluster (after CRDs exist)
2026-03-20 18:43:01 +00:00
Julia McGhee
4aff69d0e6 Add Helm-based ArgoCD Applications for platform operators
- Longhorn: Helm chart v1.7.2 (sync-wave -2, installs first)
- cert-manager: Helm chart v1.16.3 with CRDs enabled
- CloudNativePG: Helm chart v0.23.0
- Sealed Secrets: Helm chart v2.16.2
- Remove custom StorageClass (Helm chart manages it)

Previously only config resources were deployed without the actual
operators, causing PVCs to pend and CRDs to be missing.
2026-03-20 18:40:16 +00:00
Julia McGhee
7f3585a013 Configure ArgoCD for private repo access
- Update repo URLs from HTTPS placeholder to git@github.com:lazorgurl/homelab.git
- Update container image refs from OWNER to lazorgurl
- Set KUBECONFIG env in Taskfile
- Fix kubeconfig-fetch.sh to auto-detect server IP from inventory
- Fix Ansible: callback plugin, br_netfilter ordering, ssh service name
2026-03-20 18:33:30 +00:00
Julia McGhee
96e3f32f28 Initial monorepo scaffold
Turborepo + pnpm monorepo for k3s homelab cluster on Intel NUCs.

- Apps: Next.js web frontend, Express API (TypeScript, Dockerfiles, k8s manifests)
- Packages: shared UI, ESLint config, TypeScript config, Drizzle DB schemas
- Infra/Ansible: bare-metal provisioning with roles for common, k3s-server, k3s-agent, hardening
- Infra/Kubernetes: ArgoCD GitOps (app-of-apps + ApplicationSets), platform components
  (cert-manager, Traefik, CloudNativePG, Valkey, Longhorn, Sealed Secrets), namespaces
- Observability: kube-prometheus-stack, Loki, Promtail as ArgoCD Applications
- CI/CD: GitHub Actions for PR builds, preview deploys, production deploys
- DX: Taskfile, utility scripts, copier templates, Ubiquiti network docs
2026-03-19 22:24:56 +00:00