Deployment

The production cluster, its networking, and how data is laid out. For how a code change gets there, see Shipping.

Production architecture

text

EC2 (t3.medium, 4 GB RAM, 30 GB disk)
   k3s
      Traefik (ingress, TLS)
      Pod: api         ghcr.io/katforge/api.katforge.com
      Pod: web         ghcr.io/katforge/katforge.com
      Pod: lextris     ghcr.io/katforge/lextris.com
      Pod: stumper     ghcr.io/katforge/stumper.gg
   PostgreSQL 16 (native on the host, not containerized)

Postgres runs natively on the host instead of in k3s. That keeps memory pressure off the kubelet, simplifies backups (pg_dump against 127.0.0.1), and avoids the operational overhead of a clustered PG operator for a single-instance studio.

Network topology

Component	Address
Instance (Elastic IP)	`3.19.192.157`
Instance (private)	`172.31.14.98`
k3s API	`k3s.katforge.com:6443`
PostgreSQL	`172.31.14.98:5432`
VPN	`vpn.katforge.com` (`3.15.140.218`)

k3s

Property	Value
Distribution	k3s v1.34
Namespaces	`default` (prod), `qa`, `ci`
Ingress	Traefik (bundled with k3s)
TLS	Let's Encrypt via Route53 DNS challenge
Container registry	`ghcr.io/katforge/`

The ci namespace holds nothing but the GitHub Actions service account. See Shipping → Service account for CI.

AWS resources

Service	Usage
EC2	k3s instance (t3.medium), VPN (t2.nano)
Route53	DNS for katforge.com, stumper.gg
SES	Transactional email (us-east-2)
Elastic IP	`3.19.192.157` (k3s instance)

DNS and TLS

Domain routing

Domain	Type	Target	Stage
`katforge.com`	A	`3.19.192.157`	prod
`www.katforge.com`	CNAME	`katforge.com`	prod (301 redirect)
`api.katforge.com`	CNAME	`katforge.com`	prod
`stumper.gg`	A	`3.19.192.157`	prod
`k3s.katforge.com`	A	`3.19.192.157`	infra (kubectl)
`qa.katforge.com`	A	`172.31.14.98`	qa (VPN-only)
`qa.api.katforge.com`	A	`172.31.14.98`	qa (VPN-only)
`vpn.katforge.com`	A	`3.15.140.218`	infra

Lextris is served as a path under katforge.com/play/lextris/, not a separate hostname.
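The records in the table above can be spot-checked from a shell. The following is a minimal sketch, assuming `dig` is installed; `resolves_to` and `check_records` are hypothetical helper names, and the QA hosts are included only because they resolve to the private address whether or not you are on the VPN.

```shell
# Hypothetical helpers for spot-checking the DNS table. Names and addresses
# are copied from the table; nothing here changes DNS.

# Succeed if the final A record returned for $1 equals $2. `tail -n 1` skips
# the CNAME target that dig prints first for www. and api.
resolves_to() {
   actual=$(dig +short "$1" A | tail -n 1)
   [ "$actual" = "$2" ]
}

# Check every record and return non-zero if any of them is off.
check_records() {
   status=0
   while read -r name expected; do
      if resolves_to "$name" "$expected"; then
         echo "ok   $name"
      else
         echo "FAIL $name (expected $expected)"
         status=1
      fi
   done <<EOF
katforge.com 3.19.192.157
www.katforge.com 3.19.192.157
api.katforge.com 3.19.192.157
stumper.gg 3.19.192.157
k3s.katforge.com 3.19.192.157
qa.katforge.com 172.31.14.98
qa.api.katforge.com 172.31.14.98
vpn.katforge.com 3.15.140.218
EOF
   return $status
}
```

Calling `check_records` prints one line per hostname and exits non-zero on the first drift it finds in the whole set, which makes it usable as a quick check after a Route53 change.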

TLS provisioning

Traefik provisions Let's Encrypt certificates using the Route53 DNS challenge. Certificates are stored in Traefik's persistent volume and auto-renewed.

The DNS challenge uses an IAM user with AmazonRoute53FullAccess on hosted zone Z0292557263L1FJT7JXXS. Credentials are passed to Traefik via the HelmChartConfig at /var/lib/rancher/k3s/server/manifests/traefik-config.yaml.

Database

Postgres 16 runs on the k3s host (native, not in a pod).

Property	Value
Version	PostgreSQL 16
Host	`172.31.14.98` (node IP, accessible from k3s pods)
Port	`5432`
Database	`katforge`
User	`postgres`

Pods connect via the node IP. pg_hba.conf allows the k3s pod network (10.0.0.0/8), Docker/k3s networks (172.16.0.0/12), and the VPC (172.31.0.0/16).
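To see which of those ranges a given client address falls into, a small address-math sketch can help. It works only on the three CIDR blocks named above and does not read the real pg_hba.conf; `ip_to_int`, `in_cidr`, and `allowed_by` are hypothetical helper names.

```shell
# Hypothetical helpers: test an IPv4 address against the CIDR ranges that
# pg_hba.conf is described as allowing. Address math only.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
# The ( ) body runs in a subshell, so the IFS change stays local.
ip_to_int() (
   IFS=.
   set -- $1
   echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# Succeed if address $1 lies inside CIDR block $2 (for example 10.0.0.0/8).
in_cidr() (
   ip=$(ip_to_int "$1")
   net=$(ip_to_int "${2%/*}")
   bits=${2#*/}
   mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
   [ $(( ip & mask )) -eq $(( net & mask )) ]
)

# Ranges taken from the paragraph above.
ALLOWED_CIDRS="10.0.0.0/8 172.16.0.0/12 172.31.0.0/16"

# Print the first allowed range containing $1, or fail if none does.
allowed_by() {
   for cidr in $ALLOWED_CIDRS; do
      if in_cidr "$1" "$cidr"; then
         echo "$cidr"
         return 0
      fi
   done
   return 1
}
```

For example, `allowed_by 172.31.14.98` reports `172.16.0.0/12` rather than the VPC range, because 172.31.0.0/16 is itself contained in 172.16.0.0/12 and the broader block is listed first.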

Migrations

Schema migrations run automatically on pod start via a Doctrine migrate init container. A failed migration fails the rollout, so kubectl rollout status doubles as the migration health check. See Shipping for how the init container is built and pinned.
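The health check described above can be wrapped in a small function. This is a sketch under assumptions the doc does not confirm: that each Deployment is named after its service (for example `api`), and that 180 seconds is a reasonable timeout. `migration_gate` is a hypothetical name.

```shell
# Sketch: use rollout status as the migration gate.
# Assumptions (not from the doc): Deployment name == service name, and a
# 180s timeout. The namespace defaults to "default" (prod); pass "qa" for QA.
migration_gate() {
   # Blocks until the rollout completes, and exits non-zero if it times out
   # or stalls, which is what a failing migrate init container causes.
   kubectl rollout status "deployment/$1" \
      --namespace "${2:-default}" --timeout=180s
}
```

With `KUBECONFIG=~/.kube/katforge-config` exported, `migration_gate api` returns success only once the new pods, migrations included, are up; `migration_gate api qa` does the same against QA.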

Backups

shell

# Manual snapshot from any host with kubectl access. Streams pg_dump from a
# one-off pod inside the cluster, no SSH to the node required.
KUBECONFIG=~/.kube/katforge-config kubectl run --rm -i --restart=Never \
   --image=postgres:16-alpine pgdump -- \
   sh -c 'PGPASSWORD=<pw> pg_dump -h 172.31.14.98 -U postgres katforge' | \
   gzip > backup-$(date -u +%Y%m%dT%H%M%SZ).sql.gz

There is no automated backup schedule today. The manual snapshot above is the only backup path, so production data can be restored only as far back as the most recent ad-hoc dump. Wire up a cron job (or a managed backup product) before any data on this cluster becomes load-bearing for users.
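As a starting point for the cron job suggested above, here is a minimal sketch meant to run on the host itself, where Postgres listens on 127.0.0.1. Everything specific in it is an assumption rather than existing setup: the `/var/backups/katforge` directory, the 14-day retention, the `backup_katforge` name, and credentials coming from `~/.pgpass` instead of `PGPASSWORD`.

```shell
# Sketch of a nightly backup for the host's crontab. Assumed, not from the
# doc: the backup directory, the retention window, and ~/.pgpass for auth.
BACKUP_DIR=${BACKUP_DIR:-/var/backups/katforge}
RETENTION_DAYS=${RETENTION_DAYS:-14}

backup_katforge() {
   mkdir -p "$BACKUP_DIR" || return 1
   out="$BACKUP_DIR/backup-$(date -u +%Y%m%dT%H%M%SZ).sql.gz"
   tmp="$out.partial"

   # Dump uncompressed first. Without pipefail, `pg_dump | gzip` reports only
   # gzip's exit status, so a failed dump would look like a success.
   if ! pg_dump -h 127.0.0.1 -U postgres katforge > "$tmp"; then
      rm -f "$tmp"
      return 1
   fi
   gzip -c "$tmp" > "$out" && rm -f "$tmp" || return 1

   # Drop dumps older than the retention window, then report the new file.
   find "$BACKUP_DIR" -name 'backup-*.sql.gz' -mtime +"$RETENTION_DAYS" -delete
   echo "$out"
}
```

A crontab entry could source this file and call `backup_katforge` once a night. Two caveats: the uncompressed temporary file needs room on the 30 GB disk, and dumps kept on the same instance do not survive losing that instance, so copying them off-host is still necessary.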

Status and observability

shell

hearth status              # deployed versions and pod health
hearth status --qa         # check QA
hearth logs api            # tail one service
hearth logs                # tail every service, color-prefixed

hearth status is read-only against the cluster. It uses your personal kubeconfig at ~/.kube/katforge-config, not the CI service account.

Rollback

shell

hearth rollback api

hearth rollback runs kubectl rollout undo, which reverts the Deployment to its previous ReplicaSet. The CI-managed Release on GitHub still marks the broken version as latest, so for a clean record either re-ship a fix or edit the releases in the UI: mark the broken release as a draft and set the previous good release as latest, which re-fires the workflow. See Shipping → Rollback.
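For cases where hearth is not at hand, the same rollback can be approximated with kubectl directly. This is a sketch, not hearth's actual implementation: it assumes the Deployment is named after the service, and `rollback_service` is a hypothetical name.

```shell
# Approximation of what `hearth rollback <service>` is described as doing.
# Assumption: Deployment name == service name; namespace defaults to prod.
rollback_service() {
   deployment="deployment/$1"
   namespace=${2:-default}

   # Show the revision history first, so it is clear what "previous" means.
   kubectl rollout history "$deployment" --namespace "$namespace" || return 1

   # Revert to the previous ReplicaSet.
   kubectl rollout undo "$deployment" --namespace "$namespace" || return 1

   # Wait until the reverted pods are ready (180s is an assumed timeout).
   kubectl rollout status "$deployment" --namespace "$namespace" --timeout=180s
}
```

`rollback_service api` stops at the first failing step, so a missing Deployment or a failed undo never reaches the status wait.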

How code gets here

For the ship pipeline (pre-ship tests → imp ship → GitHub Release → Actions workflow → kubectl), see Shipping.
