47 Commits

Author SHA1 Message Date
Julia McGhee
9a40240bd2 Enable ServerSideApply for app-of-apps to fix CRD annotation size limit
All checks were successful
CI / lint-and-test (push) Successful in 23s
Deploy Production / deploy (push) Successful in 25s
CI / build (push) Successful in 24s
ArgoCD v3.3 ApplicationSet CRD exceeds the 262144-byte client-side apply
annotation limit. ServerSideApply=true avoids this.
2026-03-21 19:33:24 +00:00
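
For context on commit 9a40240bd2: client-side apply stores the full manifest in the `kubectl.kubernetes.io/last-applied-configuration` annotation, and Kubernetes caps the total size of an object's annotations at 262144 bytes. Server-side apply tracks field ownership in `managedFields` instead, so the oversized CRD no longer has to fit in an annotation. A minimal sketch of the sync option on an Argo CD Application follows; the metadata, repo URL, and path are placeholders, not values taken from this repository.

```yaml
# Sketch only: name, repoURL, and path are hypothetical placeholders.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: app-of-apps
  namespace: argocd
spec:
  project: default
  source:
    repoURL: <git-repo-url>        # placeholder
    targetRevision: HEAD
    path: <path-to-manifests>      # placeholder
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    syncOptions:
      # Apply on the API server side; avoids the last-applied-configuration
      # annotation that client-side apply would write.
      - ServerSideApply=true
```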
Julia McGhee
cfa9699926 Upgrade ArgoCD v2.13.3 → v3.3.4
Some checks failed
CI / lint-and-test (push) Successful in 28s
Deploy Production / deploy (push) Successful in 24s
CI / build (push) Has been cancelled
Stepped through v2.14.12 → v3.0.7 → v3.1.6 → v3.2.5 → v3.3.4.
Use server-side apply with force-conflicts for CRD size limits in v3.3+.
2026-03-21 19:32:09 +00:00
Julia McGhee
fccf749598 Set Gitea deployment strategy to Recreate to avoid LevelDB lock conflicts
All checks were successful
CI / lint-and-test (push) Successful in 23s
Deploy Production / deploy (push) Successful in 15s
CI / build (push) Successful in 17s
2026-03-21 19:14:32 +00:00
Julia McGhee
0d7fa44577 Fix Gitea admin: use existing lazorgurl account and matching email
All checks were successful
CI / lint-and-test (push) Successful in 26s
CI / build (push) Successful in 22s
2026-03-21 19:06:41 +00:00
Julia McGhee
8eefb12c97 Fix Gitea admin init: set email explicitly to avoid default conflict
All checks were successful
CI / lint-and-test (push) Successful in 19s
CI / build (push) Successful in 16s
2026-03-21 19:05:32 +00:00
Julia McGhee
76cda86791 Fix Gitea upgrade: disable bundled valkey (renamed from redis in chart v12)
All checks were successful
CI / lint-and-test (push) Successful in 21s
CI / build (push) Successful in 23s
2026-03-21 19:03:20 +00:00
Julia McGhee
f7ffc91a4c Upgrade Gitea Helm chart 10.6.0 → 12.5.0 for workflow_dispatch UI
All checks were successful
CI / lint-and-test (push) Successful in 22s
CI / build (push) Successful in 21s
2026-03-21 19:00:58 +00:00
Julia McGhee
1dd93aa5a3 Disable telemetry for turbo, next.js in runner image
Some checks failed
CI / lint-and-test (push) Failing after 0s
CI / build (push) Has been skipped
2026-03-21 17:54:10 +00:00
Julia McGhee
0a8b65a496 Mount Docker socket into job containers for docker build
Some checks failed
CI / lint-and-test (push) Failing after 8s
CI / build (push) Has been skipped
Job containers need access to the DinD daemon for docker build/push.
Mount /var/run/docker.sock from DinD into job containers and set
docker_host in runner config.
2026-03-21 17:32:53 +00:00
Julia McGhee
64baf319fe Fix runner: use explicit register + daemon with --config flag
All checks were successful
CI / changes (push) Successful in 1s
CI / lint-and-test (push) Successful in 32s
CI / build (push) Has been skipped
The act_runner entrypoint ignores CONFIG_FILE for the daemon
command, so container.options (pnpm cache volume) never loads.
Use a custom command that registers manually then runs daemon
with --config explicitly.
2026-03-21 17:23:25 +00:00
Julia McGhee
e57f458058 Fix runner: use CONFIG_FILE env var instead of command override
All checks were successful
CI / changes (push) Successful in 14s
CI / lint-and-test (push) Successful in 37s
CI / build (push) Has been skipped
The command override bypasses the entrypoint that handles
registration. Use CONFIG_FILE env var which the entrypoint
respects, keeping the registration flow intact.
2026-03-21 17:14:30 +00:00
Julia McGhee
ab52874970 Fix pnpm cache: use explicit /pnpm-store path and env vars
Some checks are pending
CI / build (push) Blocked by required conditions
CI / changes (push) Successful in 15s
CI / lint-and-test (push) Successful in 21s
Mount volume at /pnpm-store and set PNPM_STORE_DIR and
COREPACK_HOME env vars in job containers so pnpm and corepack
both write to the cached volume. Corepack cache avoids
re-downloading pnpm binary each run.
2026-03-21 16:52:46 +00:00
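
The runner commits around ab52874970 all revolve around the act_runner config file. As a rough sketch, assuming the standard `runner.timeout` and `container.options` keys from the act_runner example config, the relevant part might look like the fragment below. The timeout value is illustrative, and the env vars mentioned in the commit (PNPM_STORE_DIR, COREPACK_HOME) are left out because the log does not say how they are injected.

```yaml
# act_runner config.yaml (sketch; only the keys discussed in this log)
runner:
  # Must be a duration string such as "3h", not a bare integer (see 65abed3426).
  timeout: 3h            # illustrative value
container:
  # Extra docker-run style options applied to every job container.
  # Mounts the named volume "pnpm-store" at the explicit /pnpm-store path.
  options: "-v pnpm-store:/pnpm-store"
```

Per commit 64baf319fe, this file only takes effect when the daemon is started with `--config` pointing at it.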
Julia McGhee
14cf33f57f Bake pnpm into runner image, fix config loading with --config flag
Some checks are pending
CI / build (push) Blocked by required conditions
CI / changes (push) Successful in 2s
CI / lint-and-test (push) Successful in 27s
Deploy Production / deploy (push) Successful in 24s
Pre-install pnpm 9.15.4 via corepack in the image so it doesn't
download every run. Use --config CLI flag instead of CONFIG_FILE
env var to ensure container.options volume mount is applied.
2026-03-21 16:49:14 +00:00
Julia McGhee
65abed3426 Fix runner config: timeout needs duration string not int
All checks were successful
CI / changes (push) Successful in 10s
CI / lint-and-test (push) Successful in 51s
CI / build (push) Has been skipped
Deploy Production / deploy (push) Successful in 22s
2026-03-21 16:43:50 +00:00
Julia McGhee
eced4c1473 Add pnpm store cache to runner via persistent Docker volume
Some checks failed
CI / changes (push) Successful in 2s
CI / lint-and-test (push) Successful in 49s
Deploy Production / deploy (push) Failing after 20s
CI / build (push) Has been skipped
Mount a named Docker volume (pnpm-store) into every job container
at the default pnpm store path. The volume persists in the DinD
sidecar across job runs, so pnpm install reuses cached packages.
2026-03-21 16:41:37 +00:00
Julia McGhee
98ab851b60 Use custom runner image with jq, kustomize, docker pre-installed
Some checks failed
CI / changes (push) Successful in 1s
Deploy Production / deploy (push) Failing after 26s
CI / build (push) Has been skipped
CI / lint-and-test (push) Successful in 35s
Build a runner-image based on node:20-bookworm with all CI tools
baked in, avoiding apt-get install in every workflow run. Runner
labels now point to gitea.coreworlds.io/lazorgurl/runner-image.
2026-03-21 16:39:34 +00:00
Julia McGhee
9c02fd7f4c Add Gitea SSH host key to ArgoCD known_hosts via kustomize patch
Some checks failed
CI / build (push) Blocked by required conditions
Deploy Production / deploy (push) Waiting to run
CI / changes (push) Successful in 2s
CI / lint-and-test (push) Has been cancelled
Without this, ArgoCD rejects SSH connections to the in-cluster
Gitea service. Uses a patch file to replace the known_hosts
ConfigMap with defaults + Gitea key.
2026-03-21 16:23:49 +00:00
Julia McGhee
b8ef09359d Re-seal ArgoCD repo secret with insecure flag for in-cluster SSH
Some checks failed
CI / build (push) Blocked by required conditions
Deploy Production / deploy (push) Waiting to run
CI / changes (push) Successful in 2s
CI / lint-and-test (push) Has been cancelled
2026-03-21 16:19:29 +00:00
Julia McGhee
1d98d6e131 Cut over ArgoCD to Gitea: update all repoURLs and PR generator
Some checks failed
CI / build (push) Blocked by required conditions
Deploy Production / deploy (push) Waiting to run
CI / changes (push) Successful in 1s
CI / lint-and-test (push) Has been cancelled
Switch app-of-apps, platform, apps, and previews ApplicationSets
to read from in-cluster Gitea (gitea-helm-ssh.platform.svc:2222).
Previews now use Gitea PR generator instead of GitHub.
2026-03-21 16:15:22 +00:00
Julia McGhee
e6f8054055 Fix runner DinD: disable TLS between sidecar containers
Some checks failed
CI / build (push) Blocked by required conditions
Deploy Production / deploy (push) Waiting to run
CI / changes (push) Successful in 19s
CI / lint-and-test (push) Has been cancelled
TLS between act_runner and DinD in the same pod is unnecessary
and causes race conditions with cert generation. Use port 2375
(no TLS) and set DOCKER_TLS_CERTDIR="" on the DinD sidecar.
2026-03-21 16:13:19 +00:00
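
A sketch of the pod layout that commit e6f8054055 describes, assuming a two-container pod where the runner talks to the DinD sidecar over localhost. Container names and images are assumptions (tags are deliberately left out); only `DOCKER_TLS_CERTDIR=""` and port 2375 come from the commit message.

```yaml
# Fragment of a pod spec (sketch; names and images are assumptions).
containers:
  - name: runner
    image: gitea/act_runner
    env:
      # Plain TCP to the sidecar in the same pod; 2375 is the non-TLS port.
      - name: DOCKER_HOST
        value: tcp://localhost:2375
  - name: dind
    image: docker:dind
    securityContext:
      privileged: true   # DinD needs a privileged container
    env:
      # An empty value stops the dind entrypoint from generating certs
      # and makes the daemon listen without TLS.
      - name: DOCKER_TLS_CERTDIR
        value: ""
```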
Julia McGhee
30c6f89f20 Seal remaining Gitea secrets: API token, runner token, pull secret
Some checks are pending
CI / changes (push) Waiting to run
CI / lint-and-test (push) Waiting to run
CI / build (push) Blocked by required conditions
Deploy Production / deploy (push) Waiting to run
All placeholder secrets replaced with real sealed values:
- argocd-gitea-token: API token for ArgoCD PR generator
- gitea-runner-token: registration token for in-cluster runner
- gitea-pull-secret: registry credentials for app image pulls
2026-03-21 16:09:19 +00:00
Julia McGhee
cb733c92a0 Add internal-only middleware to Gitea IngressRoute
Some checks are pending
CI / changes (push) Waiting to run
CI / lint-and-test (push) Waiting to run
CI / build (push) Blocked by required conditions
Deploy Production / deploy (push) Waiting to run
Restrict Gitea web UI to LAN access only, matching other
platform services. SSH NodePort (30022) is unaffected.
2026-03-21 16:02:24 +00:00
Julia McGhee
a4553fbeae Fix Gitea service names: gitea-http → gitea-helm-http
The Gitea Helm chart names services as gitea-helm-http and
gitea-helm-ssh, not gitea-http/gitea-ssh. Update IngressRoute
and runner deployment to match.
2026-03-21 16:00:08 +00:00
Julia McGhee
e78807bff1 Fix Gitea Valkey auth: inject password via env var interpolation
Valkey requires authentication. Use additionalConfigFromEnvs to
read the password from valkey-credentials secret and interpolate
it into the Redis URLs for cache and session config.
2026-03-21 15:58:48 +00:00
Julia McGhee
a3c73dccb0 Fix Gitea DB auth: use additionalConfigFromEnvs for password
The _secret/_key syntax doesn't work in Gitea Helm values. Use
additionalConfigFromEnvs to inject GITEA__database__PASSWD from
the sealed secret, which the chart translates into app.ini config.
2026-03-21 15:56:18 +00:00
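
The approach in commit a3c73dccb0 can be sketched as the Helm values fragment below. The `gitea.additionalConfigFromEnvs` key and the env var name come from the commit message; the secret name and key are hypothetical stand-ins for whatever the sealed secret actually contains.

```yaml
# Gitea Helm values (sketch; secret name and key are hypothetical).
gitea:
  additionalConfigFromEnvs:
    # GITEA__<section>__<KEY> is mapped into app.ini,
    # here the PASSWD key of the [database] section.
    - name: GITEA__database__PASSWD
      valueFrom:
        secretKeyRef:
          name: gitea-db-credentials   # hypothetical
          key: password                # hypothetical
```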
Julia McGhee
7db7bc916e Fix longhorn-nvme: add storageclass.yaml to Longhorn kustomization
The longhorn-nvme StorageClass was defined but never included in the
Longhorn kustomization, so it was never deployed. Add it and revert
Gitea manifests back to longhorn-nvme as intended.
2026-03-21 15:51:24 +00:00
Julia McGhee
aed0bff28a Fix storage class: use longhorn instead of longhorn-nvme
The longhorn-nvme storage class doesn't exist yet in the cluster.
Use the available longhorn class to unblock PVC provisioning.
2026-03-21 15:49:49 +00:00
Julia McGhee
5b4086e71f Revert ArgoCD repoURLs to GitHub temporarily
Gitea needs to be deployed before ArgoCD can read from it.
Keep GitHub repoURLs so ArgoCD can discover and deploy the
new gitea-pg, gitea, and gitea-runner directories. Switch
to Gitea repoURLs after Gitea is running and repo is pushed.
2026-03-21 15:46:41 +00:00
Julia McGhee
f04ecbf5cd Add Gitea self-hosted git/CI/registry to replace GitHub
Deploy Gitea via Helm with dedicated CloudNativePG database,
in-cluster Actions runner (DinD), and built-in container registry.
ArgoCD repoURLs updated to use in-cluster Gitea SSH. Preview
ApplicationSet switched from GitHub PR generator to Gitea PR
generator. App images now pull from gitea.coreworlds.io registry.

Remaining setup after deploy: seal runner token, ArgoCD API token,
and registry pull secret once Gitea is running. Add ArgoCD deploy
key to Gitea repo settings.
2026-03-21 15:43:30 +00:00
Julia McGhee
6dde7c8aef Add harness app: agent orchestrator with cluster deployment
- Next.js app for orchestrating coding agent benchmarks (Claude Code, Codex, OpenCode)
- Dockerfile installs git, gh CLI, and agent CLIs for headless execution
- K8s deployment with workspace volume, sealed credentials for Claude + OpenCode
- Traefik IngressRoute at harness.coreworlds.io with internal-only middleware + TLS
- CI pipeline path filter for harness builds
- Fix OpenCode runtime flags (subcommand-based headless mode)
2026-03-21 15:26:09 +00:00
Julia McGhee
9e7077cd82 Add Grafana admin sealed secret
2026-03-21 13:19:08 +00:00
Julia McGhee
c6ce40a557 Add Ansible storage role for NVMe setup and Longhorn dual-disk config
Automates LV expansion, NVMe mount, and Longhorn node disk tagging
(hdd/nvme) via Ansible instead of Kustomize-managed manifests.
2026-03-21 13:19:04 +00:00
Julia McGhee
3b8fd4afd2 expand disk storage
2026-03-21 09:53:50 +00:00
Julia McGhee
051c957347 Add observability stack: ServiceMonitors, Tempo, OTel API instrumentation, dashboards
- Add ServiceMonitors for Traefik, ArgoCD, and Longhorn
- Enable cert-manager ServiceMonitor via helm values
- Deploy Grafana Tempo for distributed tracing (single-binary, Longhorn PVC)
- Add Tempo datasource with trace-to-logs and trace-to-metrics correlation
- Instrument API with OpenTelemetry SDK (Prometheus metrics + OTLP traces)
- Replace console.log with pino structured logging + pino-http middleware
- Add Grafana dashboards for Traefik, API overview, and PostgreSQL (CNPG)
2026-03-20 21:01:05 +00:00
Julia McGhee
e863ebed9b Set Longhorn default replica count to 1 for single-node cluster
With only one node, 2 replicas can never be scheduled — volumes report
as degraded. Match the replica count to the node count.
2026-03-20 19:39:15 +00:00
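
One way the change in commit e863ebed9b might be expressed, assuming it is made through the Longhorn Helm chart's values. The commit does not say which keys were touched, so both keys shown here are an assumption about where the setting lives.

```yaml
# Longhorn Helm values (sketch; the keys actually changed are not stated in the log).
persistence:
  # Replica count baked into the default "longhorn" StorageClass.
  defaultClassReplicaCount: 1
defaultSettings:
  # Default for volumes created without an explicit replica count.
  defaultReplicaCount: 1
```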
Julia McGhee
04fc7c7576 Disable ArgoCD internal TLS to fix redirect loop behind Traefik
Traefik terminates TLS, so ArgoCD server must run in insecure mode.
Also update ArgoCD URL from homelab.local to coreworlds.io.
2026-03-20 19:33:17 +00:00
Julia McGhee
1bafc75429 Enable servicelb for LoadBalancer IP assignment on single-node cluster
Without servicelb, Traefik had no external IP and was only reachable via
NodePort. Klipper LB binds ports 80/443 directly to the node.
2026-03-20 19:31:27 +00:00
Julia McGhee
11c9c0f1bc Add Certificate resources for internal IngressRoutes
cert-manager annotations don't work on Traefik IngressRoutes — explicit
Certificate resources are needed to trigger Let's Encrypt issuance.
2026-03-20 19:26:25 +00:00
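
To illustrate commit 11c9c0f1bc: the cert-manager ingress annotations are handled for standard Ingress resources, not for Traefik's IngressRoute CRD, so the certificate has to be requested explicitly. A minimal sketch follows; the resource name, namespace, hostname, and issuer name are all hypothetical.

```yaml
# Sketch only: every name below is a hypothetical example.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: argocd-tls
  namespace: argocd
spec:
  # Secret that cert-manager writes the key pair into; the IngressRoute
  # then references this secret in its tls.secretName.
  secretName: argocd-tls
  dnsNames:
    - argocd.coreworlds.io
  issuerRef:
    kind: ClusterIssuer
    name: letsencrypt-prod
```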
Julia McGhee
71442a0405 Switch from homelab.local to coreworlds.io with split-horizon DNS and LAN-only access controls
- Migrate all ingress hostnames from *.homelab.local to *.coreworlds.io
- Remove broken Traefik certresolver config (cert-manager handles TLS)
- Add internal-only IP allowlist middleware for platform services
- Add IngressRoutes for ArgoCD, Grafana, Longhorn (LAN-only via middleware)
- Seal and add Cloudflare API token for cert-manager DNS-01 challenges
- Update cert-manager ClusterIssuers with real email
- Update k3s TLS SAN to k3s.coreworlds.io
- Rewrite Ubiquiti docs for single-node topology and split-horizon DNS
- Fix seal-secret.sh controller name to match Helm release
- Add UCG DNS setup script using API key auth
2026-03-20 19:21:46 +00:00
Julia McGhee
4135d2102e Bump CNPG chart to 0.23.2 for missing Pooler CRD
2026-03-20 18:51:15 +00:00
Julia McGhee
9867129eff Add retry/ServerSideApply to CNPG helm app
2026-03-20 18:49:37 +00:00
Julia McGhee
6f1418d0c6 Disable Longhorn pre-upgrade checker job for ArgoCD compatibility
2026-03-20 18:48:06 +00:00
Julia McGhee
b359cc9560 Separate CRD-dependent resources from operator installs
cert-manager and CloudNativePG operator installs must complete before
their custom resources (ClusterIssuer, CNPG Cluster) can be applied.

Split into separate kustomize dirs so the ApplicationSet creates
independent ArgoCD apps that can sync in order:
- platform-cert-manager → installs operator
- platform-cert-manager-config → creates ClusterIssuers (after CRDs exist)
- platform-cloudnativepg → installs operator
- platform-cloudnativepg-cluster → creates PG cluster (after CRDs exist)
2026-03-20 18:43:01 +00:00
Julia McGhee
4aff69d0e6 Add Helm-based ArgoCD Applications for platform operators
- Longhorn: Helm chart v1.7.2 (sync-wave -2, installs first)
- cert-manager: Helm chart v1.16.3 with CRDs enabled
- CloudNativePG: Helm chart v0.23.0
- Sealed Secrets: Helm chart v2.16.2
- Remove custom StorageClass (Helm chart manages it)

Previously only config resources were deployed without the actual
operators, causing PVCs to pend and CRDs to be missing.
2026-03-20 18:40:16 +00:00
Julia McGhee
9cb517fcbe Remove accidentally committed secrets, harden .gitignore
- Remove vault.yaml and kubeconfig from tracking
- Add vault files and kubeconfig to .gitignore everywhere
- Clean up stray infra/ansible/infra/ directory
2026-03-20 18:33:55 +00:00
Julia McGhee
7f3585a013 Configure ArgoCD for private repo access
- Update repo URLs from HTTPS placeholder to git@github.com:lazorgurl/homelab.git
- Update container image refs from OWNER to lazorgurl
- Set KUBECONFIG env in Taskfile
- Fix kubeconfig-fetch.sh to auto-detect server IP from inventory
- Fix Ansible: callback plugin, br_netfilter ordering, ssh service name
2026-03-20 18:33:30 +00:00
Julia McGhee
96e3f32f28 Initial monorepo scaffold
Turborepo + pnpm monorepo for k3s homelab cluster on Intel NUCs.

- Apps: Next.js web frontend, Express API (TypeScript, Dockerfiles, k8s manifests)
- Packages: shared UI, ESLint config, TypeScript config, Drizzle DB schemas
- Infra/Ansible: bare-metal provisioning with roles for common, k3s-server, k3s-agent, hardening
- Infra/Kubernetes: ArgoCD GitOps (app-of-apps + ApplicationSets), platform components
  (cert-manager, Traefik, CloudNativePG, Valkey, Longhorn, Sealed Secrets), namespaces
- Observability: kube-prometheus-stack, Loki, Promtail as ArgoCD Applications
- CI/CD: GitHub Actions for PR builds, preview deploys, production deploys
- DX: Taskfile, utility scripts, copier templates, Ubiquiti network docs
2026-03-19 22:24:56 +00:00