Deployment
The production cluster, its networking, and how data is laid out. For how a code change gets there, see Shipping.
Production architecture
    EC2 (t3.medium, 4 GB RAM, 30 GB disk)
    ├── k3s
    │   ├── Traefik (ingress, TLS)
    │   ├── Pod: api      ghcr.io/katforge/api.katforge.com
    │   ├── Pod: web      ghcr.io/katforge/katforge.com
    │   ├── Pod: lextris  ghcr.io/katforge/lextris.com
    │   └── Pod: stumper  ghcr.io/katforge/stumper.gg
    └── PostgreSQL 16 (native on the host, not containerized)
Postgres runs natively on the host instead of in k3s. That keeps memory pressure off the kubelet, simplifies backups (pg_dump against 127.0.0.1), and avoids the operational overhead of a clustered PG operator for a single-instance studio.
Network topology
| Component | Address |
|---|---|
| Instance (Elastic IP) | 3.19.192.157 |
| Instance (private) | 172.31.14.98 |
| k3s API | k3s.katforge.com:6443 |
| PostgreSQL | 172.31.14.98:5432 |
| VPN | vpn.katforge.com (3.15.140.218) |
k3s
| Property | Value |
|---|---|
| Distribution | k3s v1.34 |
| Namespaces | default (prod), qa, ci |
| Ingress | Traefik (bundled with k3s) |
| TLS | Let's Encrypt via Route53 DNS challenge |
| Container registry | ghcr.io/katforge/ |
The ci namespace holds nothing but the GitHub Actions service account. See Shipping → Service account for CI.
AWS resources
| Service | Usage |
|---|---|
| EC2 | k3s instance (t3.medium), VPN (t2.nano) |
| Route53 | DNS for katforge.com, stumper.gg |
| SES | Transactional email (us-east-2) |
| Elastic IP | 3.19.192.157 (k3s instance) |
DNS and TLS
Domain routing
| Domain | Type | Target | Stage |
|---|---|---|---|
| katforge.com | A | 3.19.192.157 | prod |
| www.katforge.com | CNAME | katforge.com | prod (301 redirect) |
| api.katforge.com | CNAME | katforge.com | prod |
| stumper.gg | A | 3.19.192.157 | prod |
| k3s.katforge.com | A | 3.19.192.157 | infra (kubectl) |
| qa.katforge.com | A | 172.31.14.98 | qa (VPN-only) |
| qa.api.katforge.com | A | 172.31.14.98 | qa (VPN-only) |
| vpn.katforge.com | A | 3.15.140.218 | infra |
Lextris is served as a path under katforge.com/play/lextris/, not a separate hostname.
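To confirm that the routing table still matches live DNS, each name can be resolved and compared with its expected address. The sketch below assumes `dig` is installed; `resolves_to` is a hypothetical helper for illustration and is not part of hearth.

```shell
#!/bin/sh
# Succeed when the A-record lookup for NAME includes EXPECTED.
# `dig +short` prints any CNAME target first and then the final address,
# so a CNAME such as api.katforge.com still matches on its resolved IP.
resolves_to() {
    name=$1
    expected=$2
    # -x matches whole lines only, -F treats the address as a literal string.
    dig +short "$name" A | grep -qxF "$expected"
}

# Example (run by hand; needs network access):
#   for name in katforge.com api.katforge.com stumper.gg k3s.katforge.com; do
#       resolves_to "$name" 3.19.192.157 || echo "MISMATCH: $name"
#   done
```

Matching whole lines keeps a near miss such as 3.19.192.15 from passing as 3.19.192.157.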
TLS provisioning
Traefik provisions Let's Encrypt certificates using the Route53 DNS challenge. Certificates are stored in Traefik's persistent volume and auto-renewed.
The DNS challenge uses an IAM user with AmazonRoute53FullAccess on hosted zone Z0292557263L1FJT7JXXS. Credentials are passed to Traefik via the HelmChartConfig at /var/lib/rancher/k3s/server/manifests/traefik-config.yaml.
Database
Postgres 16 runs on the k3s host (native, not in a pod).
| Property | Value |
|---|---|
| Version | PostgreSQL 16 |
| Host | 172.31.14.98 (node IP, accessible from k3s pods) |
| Port | 5432 |
| Database | katforge |
| User | postgres |
Pods connect via the node IP. pg_hba.conf allows the k3s pod network (10.0.0.0/8), Docker/k3s networks (172.16.0.0/12), and the VPC (172.31.0.0/16).
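When a pod cannot reach Postgres, one of the first things to rule out is whether its source address falls inside any of the allowed ranges. The following is a minimal sketch of that check in POSIX shell; `ip_to_int` and `in_cidr` are hypothetical helpers, and the pod address in the usage note assumes the k3s default pod CIDR (10.42.0.0/16), which may not match this cluster.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    # Split the address on dots; `set --` assigns the octets to $1..$4.
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed (exit 0) when address $1 falls inside CIDR block $2.
in_cidr() {
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")   # network part, before the slash
    bits=${2#*/}                 # prefix length, after the slash
    mask=$(( (0xFFFFFFFF << (32 - $bits)) & 0xFFFFFFFF ))
    [ $(( addr & mask )) -eq $(( net & mask )) ]
}

# Report which of the documented ranges cover the node IP.
for cidr in 10.0.0.0/8 172.16.0.0/12 172.31.0.0/16; do
    if in_cidr 172.31.14.98 "$cidr"; then
        echo "172.31.14.98 is covered by $cidr"
    fi
done
```

The same call works for a pod address, for example `in_cidr 10.42.0.15 10.0.0.0/8`. Note that 172.16.0.0/12 spans 172.16.0.0 through 172.31.255.255, so it already contains the VPC range; listing 172.31.0.0/16 separately is redundant but harmless.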
Migrations
Schema migrations run automatically on pod start via a Doctrine migrate init container. A failed migration fails the rollout, so kubectl rollout status doubles as the migration health check. See Shipping for how the init container is built and pinned.
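Using the rollout as the migration health check can be sketched as a small wrapper around `kubectl rollout status`. This is an illustration, not the pipeline's actual implementation: `wait_for_rollout` is a hypothetical helper, and the Deployment name `api` and the 180s timeout are assumptions.

```shell
#!/bin/sh
# Block until the rollout of a Deployment completes or the timeout expires.
# If the migrate init container fails, the new pods never become Ready, so
# `kubectl rollout status` exits non-zero once the timeout is reached.
wait_for_rollout() {
    deployment=$1
    namespace=${2:-default}   # default is the prod namespace
    timeout=${3:-180s}
    kubectl rollout status "deployment/$deployment" \
        --namespace "$namespace" --timeout "$timeout"
}

# Example (run by hand with the personal kubeconfig):
#   export KUBECONFIG=~/.kube/katforge-config
#   wait_for_rollout api || echo "rollout failed; check the init container logs"
```

Passing `qa` as the second argument runs the same check against the QA namespace.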
Backups
# Manual snapshot from any host with kubectl access. Streams pg_dump from a
# one-off pod inside the cluster, no SSH to the node required.
KUBECONFIG=~/.kube/katforge-config kubectl run --rm -i --restart=Never \
--image=postgres:16-alpine pgdump -- \
sh -c 'PGPASSWORD=<pw> pg_dump -h 172.31.14.98 -U postgres katforge' | \
gzip > backup-$(date -u +%Y%m%dT%H%M%SZ).sql.gz
There is no automated backup schedule today. The manual snapshot above is the only backup path, so production data can be restored only up to the most recent ad-hoc dump; anything written after that dump is lost. Wire up a cron job (or a managed backup product) before any data on this cluster becomes load-bearing for users.
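One possible shape for the cron job is a script on the k3s host that dumps to a local directory and prunes old files. Everything here is a sketch under stated assumptions: the function names, the script path in the cron line, the backup directory, and the retention count are hypothetical, and the password is assumed to come from `~/.pgpass` rather than the command line.

```shell
#!/bin/sh
# Dump the katforge database into directory $1 as a gzip-compressed SQL file.
# The dump is written to a .partial file first, so an interrupted or failed
# run never leaves a file that looks like a complete backup.
dump_database() {
    dir=$1
    out="$dir/backup-$(date -u +%Y%m%dT%H%M%SZ).sql.gz"
    tmp="$out.partial"
    # -Z 6 makes pg_dump gzip its plain-text output; -f names the output file.
    if pg_dump -h 127.0.0.1 -U postgres -Z 6 -f "$tmp" katforge; then
        mv "$tmp" "$out"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Delete all but the newest $2 backups in directory $1.
# The UTC timestamp in the file name sorts lexicographically by age.
prune_backups() {
    dir=$1
    keep=$2
    ls -1 "$dir"/backup-*.sql.gz 2>/dev/null | sort -r | tail -n +"$((keep + 1))" |
    while IFS= read -r old; do
        rm -f -- "$old"
    done
}

# Example crontab entry (hypothetical script path), daily at 03:15:
#   15 3 * * * /usr/local/bin/katforge-backup.sh
# where the script sources these functions and runs:
#   dump_database /var/backups/katforge && prune_backups /var/backups/katforge 14
```

Pruning only after a successful dump means a run of failed dumps cannot age out the last good backup. Dumps kept on the same disk as the database do not protect against losing the instance, so copying them off-host would still be needed.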
Status and observability
hearth status # deployed versions and pod health
hearth status --qa # check QA
hearth logs api # tail one service
hearth logs # tail every service, color-prefixed
hearth status is read-only against the cluster. It uses your personal kubeconfig at ~/.kube/katforge-config, not the CI service account.
Rollback
hearth rollback api
Runs kubectl rollout undo, which reverts the deployment to its previous ReplicaSet. The CI-managed Release on GitHub still marks the broken version as latest, so for a clean record either re-ship a fix or edit the broken release in the GitHub UI: mark it as a draft and set the previous good release as latest, which re-fires the workflow. See Shipping → Rollback.
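If hearth is unavailable, roughly the same rollback can be done by hand with kubectl. The sketch below is not hearth's implementation: `rollback_deployment` is a hypothetical helper, and it assumes the Deployment shares its name with the service (for example `api`).

```shell
#!/bin/sh
# Revert a Deployment to its previous ReplicaSet, then wait for the
# reverted pods to become Ready so a bad rollback is noticed immediately.
rollback_deployment() {
    deployment=$1
    namespace=${2:-default}   # default is the prod namespace
    kubectl rollout undo "deployment/$deployment" --namespace "$namespace" &&
    kubectl rollout status "deployment/$deployment" \
        --namespace "$namespace" --timeout 120s
}

# Example (run by hand with the personal kubeconfig):
#   export KUBECONFIG=~/.kube/katforge-config
#   rollback_deployment api
```

`kubectl rollout history deployment/api` lists the available revisions before undoing.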
How code gets here
For the ship pipeline (pre-ship tests → imp ship → GitHub Release → Actions workflow → kubectl), see Shipping.