Dark and Darker extraction

The codex extraction pipeline for Dark and Darker lives at ~/.katforge/codex/ as a Docker Compose stack. It runs in five stages:

  1. fetch — pull the game depots from Steam via DepotDownloader.
  2. usmap — dump UE5 type metadata (.usmap) by booting Tavern.exe under Proton with the UE4SS AutoUSMAP mod.
  3. decode — decode paks to JSON + PNG + locres via a thin .NET CLI built on CUE4Parse.
  4. adapt — walk the per-patch dump tree with a PHP adapter (KAT\Codex\Adapter\DarkAndDarker) that emits a normalized envelope.
  5. import — land the envelope into the codex DB via bin/console codex:import.

Stages 1–3 run inside the dark-and-darker compose service; stages 4–5 run inside the hearth api container. The hearth codex CLI orchestrates both halves.

Terms used above:

  • Tavern.exe — the Dark and Darker game executable.
  • Proton — Valve's Wine fork, lets Linux run Windows games.
  • UE4SS — a Lua-scriptable plugin host for Unreal Engine games.
  • AutoUSMAP — a UE4SS Lua mod that dumps type metadata at game-thread entry.
  • .usmap — Unreal Engine "mapping" file describing the type layout the cooked data uses.
  • CUE4Parse — open-source .NET library for reading cooked Unreal Engine .pak and .uasset files.
  • .locres — Unreal localization resource: a binary key/value store of translated strings.

Operator UX

shell
# Extract: fetch + usmap + decode (defaults to latest upstream BuildVersion.txt).
hearth codex extract dark-and-darker
hearth codex extract dark-and-darker --patch=0.16.135.8645-2663

# Import: run the adapter, then codex:import. Defaults to the most recently
# extracted patch on disk.
hearth codex import dark-and-darker

# One-shot: extract + import in sequence.
hearth codex update dark-and-darker

# Read: query a single entity from the local hearth api.
hearth codex get dark-and-darker id.item.longsword_5001

Both hearth codex import and the underlying codex:import are idempotent: re-running with the same patch + branch yields added=0 changed=0 removed=0 unchanged=N.
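
As a rough illustration of checking that guarantee from a script, the sketch below parses a summary line of the shape shown above and reports whether the run was a no-op. Both function names are hypothetical helpers, not part of hearth, and the only assumption about the output format is the key=value layout quoted above.

```php
<?php
// Hypothetical helper (not part of hearth): parse the importer's summary line
// into integer counters keyed by name.
function parseImportSummary (string $line): array
{
   $counts = [];
   preg_match_all ('/\b(added|changed|removed|unchanged)=(\d+)/', $line, $matches, PREG_SET_ORDER);
   foreach ($matches as [, $key, $value]) {
      $counts [$key] = (int) $value;
   }
   return $counts;
}

// A re-run counts as a no-op when all four counters are present and only
// "unchanged" is non-zero.
function isNoOp (array $counts): bool
{
   foreach (['added', 'changed', 'removed', 'unchanged'] as $key) {
      if (!array_key_exists ($key, $counts)) return false;
   }
   return $counts ['added'] === 0 && $counts ['changed'] === 0 && $counts ['removed'] === 0;
}
```

A patch-day script could feed the last line of the import output through parseImportSummary() and fail loudly when a supposedly repeated import is not a no-op.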

Per-patch layout

Each patch is self-contained under the per-game dir:

text
~/.katforge/codex/dark-and-darker/
   patches/
      <version>/
         source/                            DepotDownloader output (paks, exe, dlls)
         data/
            json/                           ~90k extracted .uasset → JSON
            icons/                          ~3k PNGs (icons + portraits)
            locales/                        .locres → flat namespace/key/value JSON
         manifest/decode.json               counts + paths
         logs/decode.*.{log,err}            per-pass diagnostics
         logs/proton.log                    usmap stage stderr
         Mappings.usmap                     UE5 type metadata
         BuildVersion.txt                   the upstream version this patch was tagged with
         import-envelope.json               generated by codex:adapt; consumed by codex:import
   steam-state/                             persistent DepotDownloader cache (~10 GB)
   dd-auth/                                 DepotDownloader account.config (Steam Guard token)

The mount inside the dark-and-darker container is ./dark-and-darker/patches:/work, so scripts see paths like /work/<version>/source/.... The hearth api container mounts ../codex:/codex so codex:adapt can read /codex/<slug>/patches/<version>/data/....

Sequence

sequenceDiagram
    autonumber
    participant Op       as Operator
    participant Hearth   as hearth CLI
    participant DnD      as dark-and-darker container
    participant API      as hearth api container
    participant Codex    as codex DB

    Op->>Hearth: hearth codex update dark-and-darker
    Hearth->>DnD: docker exec extract <version>
    DnD->>DnD: fetch (DepotDownloader)
    DnD->>DnD: usmap (Proton + UE4SS)
    DnD->>DnD: decode (CUE4Parse + locres)
    DnD-->>Hearth: patches/<v>/{source, data, manifest, logs}
    Hearth->>API: docker exec codex:adapt
    API->>API: walk data/json/, register icons, emit envelope
    API-->>Hearth: patches/<v>/import-envelope.json
    Hearth->>API: docker exec codex:import
    API->>Codex: INSERT entities, assets, locale_strings, diffs
    Codex-->>API: report
    API-->>Hearth: added=N changed=M removed=K unchanged=L

Stages in detail

1. fetch (DepotDownloader)

scripts/fetch.sh invokes DepotDownloader against app 2016590, writing depots to <patch>/source/. Steam credentials are read from a darkerdb config.ini mounted at /run/secrets/darkerdb-config.ini. Steam Guard 2FA codes come from steamguard-cli against a maFile mounted at /run/secrets/steamguard. If steamguard-cli or the maFile is unavailable, the operator can pass STEAM_GUARD_CODE=XXXXX for one run; the token cached by that run covers subsequent runs.

2. usmap (Proton + UE4SS)

scripts/usmap.sh boots Tavern.exe under Proton Experimental with xvfb and the headless-darker UE4SS payload. The AutoUSMAP Lua mod calls DumpUSMAP() on game-thread entry; Mappings.usmap lands at <patch>/Mappings.usmap. Proton stderr → <patch>/logs/proton.log. Default timeout 300s.

3. decode (CUE4Parse)

scripts/decode.sh runs the .NET exporter (/opt/unrealexporter/UnrealExporter, built from dark-and-darker/exporter/) in three configs:

  • dark-and-darker-json-{0..31} — chunked JSON dump of every .uasset / .umap under DungeonCrawler/Content/DungeonCrawler/. Chunked because a single pass OOMs on the ~90k file working set; each chunk runs in a fresh exporter process and handles the files whose FNV-1a(path) % 32 equals the chunk index.
  • dark-and-darker-icons — PNG export of *Icon_*.uasset and Portrait_*.uasset files via the AssetRipper texture decoder (Linux-safe; Detex.dll isn't loadable here).
  • dark-and-darker-locres — flat JSON dump of every .locres file via CUE4Parse's FTextLocalizationResource.

Per-pass output is sorted into data/{json,icons,locales}/ after the run; per-pass logs go to logs/decode.*.{log,err}.
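
The chunk assignment for the JSON passes can be sketched in a few lines. This is an illustration only: the real sharding happens inside the .NET exporter, and the hash width (32-bit) and the byte encoding of the path are assumptions here. chunkFor() and partition() are hypothetical names.

```php
<?php
// Assumption: 32-bit FNV-1a over the raw bytes of the path. The real exporter
// is .NET and may hash a different width or encoding.
function chunkFor (string $path, int $chunks = 32): int
{
   // hash('fnv1a32') returns 8 hex digits; hexdec() turns them into an integer.
   return hexdec (hash ('fnv1a32', $path)) % $chunks;
}

// Group paths into $chunks buckets so each pass only touches its own share.
function partition (array $paths, int $chunks = 32): array
{
   $buckets = array_fill (0, $chunks, []);
   foreach ($paths as $path) {
      $buckets [chunkFor ($path, $chunks)][] = $path;
   }
   return $buckets;
}
```

Because the bucket depends only on the path, a given file always lands in the same chunk, which keeps re-runs of a single failed pass cheap.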

The exporter is our thin .NET CLI under dark-and-darker/exporter/, not luk-gg/UnrealExporter — that upstream is unmaintained and pinned to a CUE4Parse version too stale to handle current DaD cooked data. We pin CUE4Parse directly via CUE4PARSE_REF in the Dockerfile.

4. adapt (codex:adapt)

bin/console codex:adapt dark-and-darker --dump-dir=<path> --output=<path> runs in the api container. Internally it instantiates KAT\Codex\Adapter\DarkAndDarker\Normalizer, walks every registered type's V2 directory, and writes a single envelope:

jsonc
{
  "entities": [ { "id": "id.item.longsword_5001", "type": "item", ... }, ... ],
  "assets":   [ { "hash": "...", "media_type": "image/png", "bytes": 8868, "path": "..." }, ... ],
  "locales":  [ { "locale": "en", "namespace": "DC", "key": "...", "value": "Longsword" }, ... ]
}
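
To make the envelope shape concrete, here is a minimal sketch of a reader that checks the three top-level sections and counts their rows. summarizeEnvelope() is a hypothetical helper, not the importer's own validation, and it assumes PHP 8.1+ for array_is_list().

```php
<?php
// Hypothetical sanity check for an import envelope: each of the three
// top-level sections must be a JSON array. Returns the row count per section.
function summarizeEnvelope (string $json): array
{
   $envelope = json_decode ($json, true, 512, JSON_THROW_ON_ERROR);
   if (!is_array ($envelope)) {
      throw new InvalidArgumentException ('envelope must be a JSON object');
   }

   $summary = [];
   foreach (['entities', 'assets', 'locales'] as $section) {
      $rows = $envelope [$section] ?? null;
      if (!is_array ($rows) || !array_is_list ($rows)) {
         throw new InvalidArgumentException ("envelope section '$section' must be a JSON array");
      }
      $summary [$section] = count ($rows);
   }
   return $summary;
}
```

Running this against import-envelope.json before codex:import gives a quick signal when a section is missing or empty.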

Per-type modeling lives in KAT\Codex\Adapter\DarkAndDarker\Type\<Name> — one class per V2 entity type. The base Type class handles everything generic (file glob, projection, locale lookup); subclasses override tagsFor(), linksFor(), iconFor() for per-type behavior. See Adding a type.

5. import (codex:import)

bin/console codex:import dark-and-darker --patch-version-raw=<v> --source=<envelope> lands the envelope. KAT\Codex\Import\Importer runs three sub-importers in order:

  1. AssetRegistrar — copies icons into var/codex-assets/<hash[0..2]>/<hash>.<ext> and upserts codex.assets.
  2. Entity loop — snapshot-on-change inserts into codex.entities, plus entity_resolved_current, entity_links, entity_diffs, source_keys.
  3. LocaleImporter — bulk INSERT into codex.locale_strings (1k-row batches; one prior-snapshot fetch).
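
Two details from the list above lend themselves to a short sketch: the content-addressed asset path and the 1k-row batching. The code below is illustrative rather than the importer's own; it assumes <hash[0..2]> means the first two hex characters of a SHA-256 digest, and both function names are hypothetical.

```php
<?php
// Assumption: the shard directory is the first two hex characters of the
// SHA-256 of the file contents.
function assetPath (string $bytes, string $ext, string $root = 'var/codex-assets'): string
{
   $hash = hash ('sha256', $bytes);
   return sprintf ('%s/%s/%s.%s', $root, substr ($hash, 0, 2), $hash, $ext);
}

// Split rows into batches of at most $size so each bulk INSERT stays bounded.
function batches (array $rows, int $size = 1000): array
{
   return array_chunk ($rows, $size);
}
```

Content addressing means two icons with identical bytes resolve to the same path, so copying them twice is harmless.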

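A register_shutdown_function closes the codex.imports row as failed even on fatal errors such as memory_limit exhaustion, so an OOM mid-run leaves a diagnosable trail instead of a stuck running row. A hard crash (segfault, SIGKILL) bypasses PHP shutdown functions, so such a run can still leave its row in running.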
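
The shutdown-handler pattern looks roughly like the sketch below. ImportRun is a hypothetical in-memory stand-in for the codex.imports row, and closeIfStillRunning() is an illustrative name; the real importer presumably issues an UPDATE instead.

```php
<?php
// Hypothetical stand-in for a codex.imports row.
final class ImportRun
{
   public string  $status = 'running';
   public ?string $error  = null;
}

// Marks the run failed unless it already reached a final status.
// Safe to call more than once.
function closeIfStillRunning (ImportRun $run): void
{
   if ($run->status !== 'running') return;

   // error_get_last() returns the fatal error (if any) that ended the script.
   $last = error_get_last ();
   $run->status = 'failed';
   $run->error  = $last ['message'] ?? 'terminated before completion';
}

$run = new ImportRun ();
// Extra arguments to register_shutdown_function are passed to the callback.
register_shutdown_function ('closeIfStillRunning', $run);
```

The normal code path sets a final status before exiting, which turns the handler into a no-op; only an interrupted run is rewritten to failed.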

Adding a type

V2 currently has ~55 entity types under Data/Generated/V2/<TypeDir>/<TypeDir>/Id_*.json. Each one becomes a Type/*.php class:

PHP
namespace KAT\Codex\Adapter\DarkAndDarker\Type;

use KAT\Codex\Adapter\DarkAndDarker\AssetRef;
use KAT\Codex\Adapter\DarkAndDarker\Id;
use KAT\Codex\Adapter\DarkAndDarker\IdTag;
use KAT\Codex\Import\NormalizedAsset;

final class MyThing extends Type
{
   public function codexType (): string { return 'my_thing'; }
   public function v2Subdir  (): string { return 'MyThing'; }

   protected function tagsFor (array $props): array
   {
      return IdTag::tagsFor ([ $props ['Tier'] ?? null, $props ['Category'] ?? null ]);
   }

   protected function linksFor (array $props): array
   {
      $links = [];
      foreach ($props ['Abilities'] ?? [] as $entry) {
         $name = AssetRef::name ($entry);
         if ($name === null) continue;
         $links [] = [ 'rel' => 'has_ability', 'target' => Id::for ('my_thing_ability', $name) ];
      }
      return $links;
   }

   protected function iconFor (array $props, string $assetName): ?NormalizedAsset
   {
      return $this->icons->viaArtData ($props ['ArtData'] ?? null);
   }
}

Then register it in Normalizer::dump():

PHP
$types = [
   // ... previously registered types ...
   new MyThing ($dump, $icons, $locale)
];

For types whose upstream id prefix doesn't match the directory name (e.g. Id_DesignDataReligion_* vs religion):

PHP
public function v2Subdir  (): string { return 'Religion'; }
protected function idPrefix (): string { return 'DesignDataReligion'; }

Id::for('religion', $name, 'DesignDataReligion') strips the right prefix.
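
To show why the prefix override matters, here is a hypothetical reimplementation of the id scheme. It is a guess based on the examples in this document (id.item.longsword_5001, Id_DesignDataReligion_*), not the actual Id::for; the default prefix rule (the type in StudlyCase) and the sample asset names are assumptions. It requires PHP 8.0+ for str_starts_with().

```php
<?php
// Hypothetical sketch of the id scheme; the real Id::for may differ.
function idFor (string $type, string $assetName, ?string $prefix = null): string
{
   // Assumed default: the codex type in StudlyCase, e.g. my_thing -> MyThing.
   $prefix ??= str_replace ('_', '', ucwords ($type, '_'));

   // Strip the leading "Id_<Prefix>_" when present, then lowercase the rest.
   $lead = 'Id_' . $prefix . '_';
   $name = str_starts_with ($assetName, $lead)
      ? substr ($assetName, strlen ($lead))
      : $assetName;

   return sprintf ('id.%s.%s', $type, strtolower ($name));
}
```

Under these assumptions, idFor ('religion', 'Id_DesignDataReligion_Example') without the third argument would leave the upstream prefix embedded in the id, while passing 'DesignDataReligion' yields id.religion.example.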

Idempotency

Every step is idempotent on its own:

  • extract overwrites the per-version dir; re-running on an unchanged version produces byte-identical output.
  • codex:adapt is pure (same dump → same envelope; SHA-256s of icons are the dedup key).
  • codex:import is snapshot-on-change keyed by per-entity content_hash. Re-running with no changes: added=0 changed=0 removed=0 unchanged=N.
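
The snapshot-on-change idea can be sketched as follows. This is an illustration under stated assumptions: the canonicalization (recursive key sort) and the SHA-256 digest are guesses at how a content_hash could be defined, and contentHash() and diffSnapshot() are hypothetical names. It requires PHP 8.1+ for array_is_list().

```php
<?php
// Assumed canonical form: sort object keys recursively so key order never
// changes the hash; list order is preserved.
function contentHash (array $entity): string
{
   $sort = function (array $a) use (&$sort): array {
      foreach ($a as &$v) {
         if (is_array ($v)) $v = $sort ($v);
      }
      unset ($v);
      if (!array_is_list ($a)) ksort ($a);
      return $a;
   };
   return hash ('sha256', json_encode ($sort ($entity), JSON_THROW_ON_ERROR | JSON_UNESCAPED_SLASHES));
}

// $previous: id => content_hash from the prior snapshot.
// $incoming: id => entity array from the new envelope.
function diffSnapshot (array $previous, array $incoming): array
{
   $report = ['added' => 0, 'changed' => 0, 'removed' => 0, 'unchanged' => 0];
   foreach ($incoming as $id => $entity) {
      $hash = contentHash ($entity);
      if (!isset ($previous [$id]))      $report ['added']++;
      elseif ($previous [$id] !== $hash) $report ['changed']++;
      else                               $report ['unchanged']++;
   }
   $report ['removed'] = count (array_diff_key ($previous, $incoming));
   return $report;
}
```

Feeding the same entities in twice produces only unchanged rows, which is the behaviour the summary line reports.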

Caveat: when an adapter change updates an entity's content (e.g. new tag derivation, more icons resolved), re-importing the same patch version is a no-op by design, because the importer refuses to overwrite an existing row at the same patch (the "history rewrite" guard). To force a backfill, wipe the patch's rows in codex.entities (and the dependent tables) and re-import. Otherwise, the next patch import picks up the change naturally.

Diagnostics

shell
# Pipeline state for a patch.
cat ~/.katforge/codex/dark-and-darker/patches/<v>/manifest/decode.json

# Per-pass diagnostics (if the decode manifest reports non-zero failed counts).
ls ~/.katforge/codex/dark-and-darker/patches/<v>/logs/

# DB-side import log.
docker exec katforge-hearth-api-1 bin/console dbal:run-sql --force-fetch \
   "SELECT id, status, added, changed, removed, unchanged, error, started_at
    FROM codex.imports
    ORDER BY id DESC LIMIT 10"

# Per-type counts on a branch.
docker exec katforge-hearth-api-1 bin/console dbal:run-sql --force-fetch \
   "SELECT type, count(*) AS n
    FROM codex.entity_resolved_current
    WHERE game = 'dark-and-darker' AND branch = 'main'
    GROUP BY type ORDER BY n DESC"

# Single entity via the local hearth api.
hearth codex get dark-and-darker id.item.longsword_5001

Auth

Steam credentials come from $DARKERDB_CONFIG (default ~/Projects/darkerdb/config.ini), parsed for [darkerdb.steam].username and .password. Steam Guard 2FA codes are generated automatically by steamguard-cli from the maFile at $STEAMGUARD_MAFILES (default ~/.config/steamguard-cli/maFiles). After the first successful run, the DepotDownloader token is cached in dark-and-darker/dd-auth/ for weeks, so subsequent fetches need no codes.
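
Reading the credentials section can be sketched with PHP's built-in INI parser. The pipeline's own fetch script may parse the file differently; steamCredentials() is a hypothetical helper that only assumes the [darkerdb.steam] section and key names given above.

```php
<?php
// Hypothetical reader for the [darkerdb.steam] section of config.ini text.
function steamCredentials (string $ini): array
{
   // INI_SCANNER_RAW keeps values as literal strings instead of interpreting
   // words such as on/off/null.
   $config = parse_ini_string ($ini, true, INI_SCANNER_RAW);
   $steam  = is_array ($config) ? ($config ['darkerdb.steam'] ?? null) : null;

   if (!is_array ($steam) || !isset ($steam ['username'], $steam ['password'])) {
      throw new RuntimeException ('missing [darkerdb.steam] username/password');
   }
   return [$steam ['username'], $steam ['password']];
}
```

Failing early with a clear message is friendlier than letting DepotDownloader prompt for credentials inside a non-interactive container.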

Protobuf extraction (protodump)

A separate sub-pipeline pulls the game's network proto definitions out of the binary so the DarkerDB ghost bot can stay synchronized with patches:

  1. bin/protodump <build> runs protodump against DungeonCrawler.exe in the patch's source/ tree.
  2. Output lands in codex/dark-and-darker/protos/<build>/*.proto — the full proto set (Account, Ranking, MarketPlace, Merchant, Friend, Party, etc.).
  3. The ghost bot symlinks realms/darkerdb.com/ghost/src/packets/ to the current build's protos so a bot rebuild picks up the new schema.

The script logs a diff against the previous build's protos so patch-day reviews show exactly which packets changed.

3D model export (gltf)

The Unreal exporter (codex/dark-and-darker/exporter/) supports a gltf action alongside json | locres | png:

JSON
{
   "Export": [
      ".*/Characters/Monster/.*SK_.*:gltf",
      ".*/Characters/Monster/.*SM_.*:gltf"
   ]
}

Skeletal and static meshes go through CUE4Parse's MeshExporter and land as binary glTF (.glb) files under patches/<build>/data/models/. The DarkerDB API serves them directly via GET /v2/monsters/{id}/model with Content-Type: model/gltf-binary; frontends render with <model-viewer> or three.js's GLTFLoader. First release ships static T-pose meshes; animations land in a follow-up.
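
The Export rules above pair a regular expression with an action. A minimal sketch of how such rules could be resolved follows; it is not the exporter's code (which is .NET), and it assumes that the action follows the last colon, that matching is unanchored, and that the first matching rule wins. The asset paths used to exercise it are made up.

```php
<?php
// Split "regex:action" on the last colon, since the regex may contain colons.
function parseRule (string $rule): array
{
   $pos = strrpos ($rule, ':');
   if ($pos === false) {
      throw new InvalidArgumentException ("rule '$rule' has no :action suffix");
   }
   return [substr ($rule, 0, $pos), substr ($rule, $pos + 1)];
}

// Return the action of the first rule whose regex matches the asset path,
// or null when no rule applies. Assumes patterns never contain "~", which is
// used as the regex delimiter here.
function actionFor (string $assetPath, array $rules): ?string
{
   foreach ($rules as $rule) {
      [$pattern, $action] = parseRule ($rule);
      if (preg_match ('~' . $pattern . '~', $assetPath) === 1) return $action;
   }
   return null;
}
```

With the two monster rules from the config, any path under Characters/Monster/ that contains SK_ or SM_ resolves to gltf, and everything else falls through to null.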