Deployment

The production cluster, its networking, and how data is laid out. For how a code change gets there, see Shipping.

Production architecture

text
EC2 (t3.medium, 4 GB RAM, 30 GB disk)
   k3s
      Traefik (ingress, TLS)
      Pod: api         ghcr.io/katforge/api.katforge.com
      Pod: web         ghcr.io/katforge/katforge.com
      Pod: lextris     ghcr.io/katforge/lextris.com
      Pod: stumper     ghcr.io/katforge/stumper.gg
   PostgreSQL 16 (native on the host, not containerized)

Postgres runs natively on the host instead of in k3s. That keeps memory pressure off the kubelet, simplifies backups (pg_dump against 127.0.0.1), and avoids the operational overhead of a clustered PG operator for a single-instance studio.

Network topology

| Component | Address |
| --- | --- |
| Instance (Elastic IP) | 3.19.192.157 |
| Instance (private) | 172.31.14.98 |
| k3s API | k3s.katforge.com:6443 |
| PostgreSQL | 172.31.14.98:5432 |
| VPN | vpn.katforge.com (3.15.140.218) |

k3s

| Property | Value |
| --- | --- |
| Distribution | k3s v1.34 |
| Namespaces | default (prod), qa, ci |
| Ingress | Traefik (bundled with k3s) |
| TLS | Let's Encrypt via Route53 DNS challenge |
| Container registry | ghcr.io/katforge/ |

The ci namespace holds nothing but the GitHub Actions service account. See Shipping → Service account for CI.

AWS resources

| Service | Usage |
| --- | --- |
| EC2 | k3s instance (t3.medium), VPN (t2.nano) |
| Route53 | DNS for katforge.com, stumper.gg |
| SES | Transactional email (us-east-2) |
| Elastic IP | 3.19.192.157 (k3s instance) |

DNS and TLS

Domain routing

| Domain | Type | Target | Stage |
| --- | --- | --- | --- |
| katforge.com | A | 3.19.192.157 | prod |
| www.katforge.com | CNAME | katforge.com | prod (301 redirect) |
| api.katforge.com | CNAME | katforge.com | prod |
| stumper.gg | A | 3.19.192.157 | prod |
| k3s.katforge.com | A | 3.19.192.157 | infra (kubectl) |
| qa.katforge.com | A | 172.31.14.98 | qa (VPN-only) |
| qa.api.katforge.com | A | 172.31.14.98 | qa (VPN-only) |
| vpn.katforge.com | A | 3.15.140.218 | infra |

Lextris is served as a path under katforge.com/play/lextris/, not a separate hostname.

TLS provisioning

Traefik provisions Let's Encrypt certificates using the Route53 DNS challenge. Certificates are stored in Traefik's persistent volume and auto-renewed.

The DNS challenge uses an IAM user with AmazonRoute53FullAccess on hosted zone Z0292557263L1FJT7JXXS. Credentials are passed to Traefik via the HelmChartConfig at /var/lib/rancher/k3s/server/manifests/traefik-config.yaml.
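For orientation, here is a rough sketch of what such a HelmChartConfig could look like, wrapped in a shell function so it can be rendered and inspected before being written to the manifests directory. This is not the file that is actually deployed. The resolver name `le`, the contact email, the Secret name `route53-credentials` with its keys, and the `AWS_REGION` value are all assumptions made for illustration; only the hosted zone ID and the target path come from this page.

```shell
#!/bin/sh
# Sketch only: prints a HelmChartConfig for the Traefik chart bundled with k3s.
# Hypothetical values: resolver name "le", ops@example.com, the Secret
# "route53-credentials" and its keys, and the AWS_REGION setting.
render_traefik_config() {
  cat <<'EOF'
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: traefik
  namespace: kube-system
spec:
  valuesContent: |-
    persistence:
      enabled: true
    additionalArguments:
      - "--certificatesresolvers.le.acme.email=ops@example.com"
      - "--certificatesresolvers.le.acme.storage=/data/acme.json"
      - "--certificatesresolvers.le.acme.dnschallenge.provider=route53"
    env:
      - name: AWS_REGION
        value: us-east-2
      - name: AWS_HOSTED_ZONE_ID
        value: Z0292557263L1FJT7JXXS
      - name: AWS_ACCESS_KEY_ID
        valueFrom:
          secretKeyRef:
            name: route53-credentials
            key: access-key-id
      - name: AWS_SECRET_ACCESS_KEY
        valueFrom:
          secretKeyRef:
            name: route53-credentials
            key: secret-access-key
EOF
}

# On the k3s host, the rendered manifest would be installed with something like:
#   render_traefik_config | sudo tee /var/lib/rancher/k3s/server/manifests/traefik-config.yaml
```

k3s applies manifests placed in that directory automatically, so writing the file is enough to reconfigure Traefik. Reading the keys from a Secret rather than inlining them is a design choice of this sketch, not necessarily how the live file is written.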

Database

Postgres 16 runs on the k3s host (native, not in a pod).

| Property | Value |
| --- | --- |
| Version | PostgreSQL 16 |
| Host | 172.31.14.98 (node IP, accessible from k3s pods) |
| Port | 5432 |
| Database | katforge |
| User | postgres |

Pods connect via the node IP. pg_hba.conf allows the k3s pod network (10.0.0.0/8), Docker/k3s networks (172.16.0.0/12), and the VPC (172.31.0.0/16).
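When a pod cannot connect, a quick first question is whether its source address falls inside one of those three ranges. The sketch below answers that with plain shell arithmetic. The function names are hypothetical helpers, not part of hearth, and the CIDR list is copied from the paragraph above rather than read from the live pg_hba.conf.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1            # split the address on dots into $1..$4
  IFS=$old_ifs
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeed if address $1 lies inside CIDR block $2 (for example 10.0.0.0/8).
ip_in_cidr() {
  net=${2%/*}
  bits=${2#*/}
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}

# Succeed if address $1 matches any range documented for pg_hba.conf.
pg_hba_allows() {
  for cidr in 10.0.0.0/8 172.16.0.0/12 172.31.0.0/16; do
    ip_in_cidr "$1" "$cidr" && return 0
  done
  return 1
}

# Usage: the address below is a made-up pod IP.
#   pg_hba_allows 10.42.0.7 && echo "allowed" || echo "not covered by pg_hba.conf"
```

Note that 172.31.0.0/16 sits inside 172.16.0.0/12, so the node IP 172.31.14.98 matches both of those ranges.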

Migrations

Schema migrations run automatically on pod start via a Doctrine migrate init container. A failed migration fails the rollout, so kubectl rollout status doubles as the migration health check. See Shipping for how the init container is built and pinned.
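A minimal sketch of using the rollout as the migration check follows. It assumes the Deployment is named after the service (for example `api`) and picks a 180-second timeout arbitrarily; neither detail is confirmed by this page, and `migration_ok` is a hypothetical helper name.

```shell
#!/bin/sh
# Treat a failed rollout as a failed migration.
# Assumptions: Deployment name equals the service name; timeout is arbitrary.
migration_ok() {
  deploy=${1:-api}
  ns=${2:-default}
  if kubectl --kubeconfig "$HOME/.kube/katforge-config" -n "$ns" \
       rollout status "deployment/$deploy" --timeout=180s; then
    echo "rollout complete: init container (migrations) succeeded"
  else
    echo "rollout failed or timed out: check the init container logs" >&2
    return 1
  fi
}

# Usage:
#   migration_ok api          # prod (default namespace)
#   migration_ok api qa       # QA namespace
```

A non-zero exit here does not prove the migration itself failed (an image pull error would look the same), so the init container logs remain the place to confirm the cause.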

Backups

shell
# Manual snapshot from any host with kubectl access. Streams pg_dump from a
# one-off pod inside the cluster, no SSH to the node required.
KUBECONFIG=~/.kube/katforge-config kubectl run --rm -i --restart=Never \
   --image=postgres:16-alpine pgdump -- \
   sh -c 'PGPASSWORD=<pw> pg_dump -h 172.31.14.98 -U postgres katforge' | \
   gzip > backup-$(date -u +%Y%m%dT%H%M%SZ).sql.gz

There is no automated backup schedule today. The manual snapshot above is the only backup path, so production data can be restored only up to the most recent ad-hoc dump; anything written after that dump would be lost. Wire up a cron job (or a managed backup product) before any data on this cluster becomes load-bearing for users.
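One possible shape for that cron job is sketched below, as a script that runs on the k3s host and dumps over 127.0.0.1. Nothing in it exists today. The backup directory, the retention count, the install path in the crontab comment, and the use of a `~/.pgpass` entry for the password are all assumptions.

```shell
#!/bin/sh
# Sketch of a nightly backup script for the k3s host.
# Assumptions: BACKUP_DIR, KEEP, and a ~/.pgpass entry for the postgres user.
BACKUP_DIR=${BACKUP_DIR:-/var/backups/katforge}
KEEP=${KEEP:-14}

# Same naming scheme as the manual snapshot: UTC timestamp in the file name.
backup_name() {
  echo "backup-$(date -u +%Y%m%dT%H%M%SZ).sql.gz"
}

# Delete all but the newest $2 dumps in directory $1. The timestamped names
# sort chronologically, so a reverse lexical sort lists the newest first.
prune_backups() {
  ls -1 "$1"/backup-*.sql.gz 2>/dev/null | sort -r | tail -n +"$(( $2 + 1 ))" |
    while IFS= read -r old; do rm -f -- "$old"; done
}

run_backup() {
  mkdir -p "$BACKUP_DIR" || return 1
  target="$BACKUP_DIR/$(backup_name)"
  # -Z 6 gzip-compresses the plain-format dump inside pg_dump, so its exit
  # status alone tells us whether the dump succeeded (no pipe to mask it).
  if pg_dump -h 127.0.0.1 -U postgres -Z 6 -f "$target.partial" katforge; then
    mv "$target.partial" "$target" && prune_backups "$BACKUP_DIR" "$KEEP"
  else
    rm -f "$target.partial"
    return 1
  fi
}

# Only dump when invoked with --run, so the file can also be sourced safely.
case "${1:-}" in
  --run) run_backup ;;
esac

# Example crontab entry (hypothetical install path), daily at 03:15 host time:
#   15 3 * * * /usr/local/bin/katforge-backup.sh --run
```

Dumps written this way live on the same disk as the database, so copying them off the instance would still be needed for them to survive a lost volume.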

Status and observability

shell
hearth status              # deployed versions and pod health
hearth status --qa         # check QA
hearth logs api            # tail one service
hearth logs                # tail every service, color-prefixed

hearth status is read-only against the cluster. It uses your personal kubeconfig at ~/.kube/katforge-config, not the CI service account.
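If hearth is not installed, roughly the same information can be pulled with plain kubectl. The sketch below is an approximation under assumptions, not hearth's implementation: it assumes each service is a Deployment whose first container carries the versioned image, and `cluster_status` is a hypothetical name.

```shell
#!/bin/sh
KATFORGE_KUBECONFIG=${KATFORGE_KUBECONFIG:-$HOME/.kube/katforge-config}

# List image versions and pod health for one namespace using only `get`,
# which keeps the check read-only.
cluster_status() {
  ns=${1:-default}    # default is prod; pass "qa" for the QA namespace
  kubectl --kubeconfig "$KATFORGE_KUBECONFIG" -n "$ns" get deployments \
    -o 'custom-columns=NAME:.metadata.name,IMAGE:.spec.template.spec.containers[0].image,READY:.status.readyReplicas' &&
  kubectl --kubeconfig "$KATFORGE_KUBECONFIG" -n "$ns" get pods
}

# Usage:
#   cluster_status        # prod
#   cluster_status qa     # QA
```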

Rollback

shell
hearth rollback api

Runs kubectl rollout undo, which reverts the Deployment to its previous ReplicaSet. The CI-managed Release on GitHub still marks the broken version as latest, so for a clean record either re-ship a fix or edit the releases in the GitHub UI: mark the broken release as a draft and set the previous good release as latest, after which the workflow re-fires. See Shipping → Rollback.
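For reference, the underlying kubectl steps might look like the sketch below. It assumes the Deployment shares the service's name and adds a history listing and a status wait that hearth rollback may or may not perform; `kc` and `rollback_service` are hypothetical helper names.

```shell
#!/bin/sh
# Thin wrapper so every call uses the personal kubeconfig.
kc() {
  kubectl --kubeconfig "$HOME/.kube/katforge-config" "$@"
}

# Revert one service to its previous ReplicaSet and wait for it to settle.
rollback_service() {
  svc=$1
  ns=${2:-default}
  # Show the revision history first, so it is clear what "previous" means.
  kc -n "$ns" rollout history "deployment/$svc" || return 1
  kc -n "$ns" rollout undo "deployment/$svc" || return 1
  # Block until the reverted pods are ready, or fail after the timeout.
  kc -n "$ns" rollout status "deployment/$svc" --timeout=120s
}

# Usage:
#   rollback_service api
```

Rolling back the Deployment does not roll back the database: any schema migration applied by the broken release stays applied.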

How code gets here

For the ship pipeline (pre-ship tests → imp ship → GitHub Release → Actions workflow → kubectl), see Shipping.