Dark and Darker extraction
The codex extraction pipeline for Dark and Darker lives at ~/.katforge/codex/ as a Docker Compose stack. It runs in five stages:
- fetch: pull the game depots from Steam via DepotDownloader.
- usmap: dump UE5 type metadata (.usmap) by booting Tavern.exe under Proton with the UE4SS AutoUSMAP mod.
- decode: decode paks to JSON + PNG + locres via a thin .NET CLI built on CUE4Parse.
- adapt: walk the per-patch dump tree with a PHP adapter (KAT\Codex\Adapter\DarkAndDarker) that emits a normalized envelope.
- import: land the envelope into the codex DB via bin/console codex:import.
Steps 1–3 run inside the dark-and-darker compose service; steps 4–5 run inside the hearth api container. The hearth codex CLI orchestrates both.
- Tavern.exe: the Dark and Darker game executable.
- Proton: Valve's Wine fork, lets Linux run Windows games.
- UE4SS: a Lua-scriptable plugin host for Unreal Engine games.
- AutoUSMAP: a UE4SS Lua mod that dumps type metadata at game-thread entry.
- .usmap: Unreal Engine "mapping" file describing the type layout the cooked data uses.
- CUE4Parse: open-source .NET library for reading cooked Unreal Engine .pak and .uasset files.
- .locres: Unreal localization resource, a binary key/value store of translated strings.
Operator UX
# Extract: fetch + usmap + decode (defaults to latest upstream BuildVersion.txt).
hearth codex extract dark-and-darker
hearth codex extract dark-and-darker --patch=0.16.135.8645-2663
# Import: run the adapter, then codex:import. Defaults to the most recently
# extracted patch on disk.
hearth codex import dark-and-darker
# One-shot: extract + import in sequence.
hearth codex update dark-and-darker
# Read: query a single entity from the local hearth api.
hearth codex get dark-and-darker id.item.longsword_5001
import and the underlying codex:import are idempotent — re-running with the same patch + branch yields added=0 changed=0 removed=0 unchanged=N.
Per-patch layout
Each patch is self-contained under the per-game dir:
~/.katforge/codex/dark-and-darker/
  patches/
    <version>/
      source/                  DepotDownloader output (paks, exe, dlls)
      data/
        json/                  ~90k extracted .uasset → JSON
        icons/                 ~3k PNGs (icons + portraits)
        locales/               .locres → flat namespace/key/value JSON
      manifest/decode.json     counts + paths
      logs/decode.*.{log,err}  per-pass diagnostics
      logs/proton.log          usmap stage stderr
      Mappings.usmap           UE5 type metadata
      BuildVersion.txt         the upstream version this patch was tagged with
      import-envelope.json     generated by codex:adapt; consumed by codex:import
  steam-state/                 persistent DepotDownloader cache (~10 GB)
  dd-auth/                     DepotDownloader account.config (Steam Guard token)
The mount inside the dark-and-darker container is ./dark-and-darker/patches:/work, so scripts see paths like /work/<version>/source/.... The hearth api container mounts ../codex:/codex so codex:adapt can read /codex/<slug>/patches/<version>/data/....
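As a rough illustration of the two mounts above, the sketch below maps a patch-relative path to what each container sees. The function names are hypothetical and exist only for this example; the mount targets /work and /codex are the ones named in the text.

```python
from pathlib import PurePosixPath


def extract_view(version: str, relative: str) -> PurePosixPath:
    """Path as seen inside the dark-and-darker container (patches/ is mounted at /work)."""
    return PurePosixPath("/work") / version / relative


def api_view(slug: str, version: str, relative: str) -> PurePosixPath:
    """Path as seen inside the hearth api container (codex/ is mounted at /codex)."""
    return PurePosixPath("/codex") / slug / "patches" / version / relative


if __name__ == "__main__":
    print(extract_view("0.16.135.8645-2663", "source"))
    print(api_view("dark-and-darker", "0.16.135.8645-2663", "data/json"))
```

The same files are therefore reachable under two different prefixes, which matters when a path from one container's logs is looked up in the other.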
Sequence
sequenceDiagram
autonumber
participant Op as Operator
participant Hearth as hearth CLI
participant DnD as dark-and-darker container
participant API as hearth api container
participant Codex as codex DB
Op->>Hearth: hearth codex update dark-and-darker
Hearth->>DnD: docker exec extract <version>
DnD->>DnD: fetch (DepotDownloader)
DnD->>DnD: usmap (Proton + UE4SS)
DnD->>DnD: decode (CUE4Parse + locres)
DnD-->>Hearth: patches/<v>/{source, data, manifest, logs}
Hearth->>API: docker exec codex:adapt
API-->>API: walk data/json/, register icons, emit envelope
API-->>Hearth: patches/<v>/import-envelope.json
Hearth->>API: docker exec codex:import
API->>Codex: INSERT entities, assets, locale_strings, diffs
Codex-->>API: report
API-->>Hearth: added=N changed=M removed=K unchanged=L
Stages in detail
1. fetch (DepotDownloader)
scripts/fetch.sh invokes DepotDownloader against app 2016590, writing depots to <patch>/source/. Steam credentials are read from a darkerdb config.ini mounted at /run/secrets/darkerdb-config.ini. Steam Guard 2FA codes come from steamguard-cli against a maFile mounted at /run/secrets/steamguard. If steamguard-cli or the maFile is unavailable, the operator can pass STEAM_GUARD_CODE=XXXXX for one run; the cached token then covers subsequent runs.
2. usmap (Proton + UE4SS)
scripts/usmap.sh boots Tavern.exe under Proton Experimental with xvfb and the headless-darker UE4SS payload. The AutoUSMAP Lua mod calls DumpUSMAP() on game-thread entry; Mappings.usmap lands at <patch>/Mappings.usmap. Proton stderr → <patch>/logs/proton.log. Default timeout 300s.
3. decode (CUE4Parse)
scripts/decode.sh runs the .NET exporter (/opt/unrealexporter/UnrealExporter, built from dark-and-darker/exporter/) in three configs:
- dark-and-darker-json-{0..31}: chunked JSON dump of every .uasset/.umap under DungeonCrawler/Content/DungeonCrawler/. Chunked because a single pass OOMs on the ~90k file working set; each chunk is a fresh exporter process keyed by FNV-1a(path) % 32.
- dark-and-darker-icons: PNG export of *Icon_*.uasset and Portrait_*.uasset files via the AssetRipper texture decoder (Linux-safe; Detex.dll isn't loadable here).
- dark-and-darker-locres: flat JSON dump of every .locres file via CUE4Parse's FTextLocalizationResource.
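The chunk assignment for the JSON pass can be sketched as follows. This is an illustrative Python version, not the exporter's code: it assumes the 32-bit FNV-1a variant computed over the UTF-8 bytes of the path, and the exporter's actual hash width and path normalization are not stated here.

```python
FNV_OFFSET_BASIS_32 = 0x811C9DC5
FNV_PRIME_32 = 0x01000193


def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a: XOR each byte into the hash, then multiply by the prime."""
    h = FNV_OFFSET_BASIS_32
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME_32) & 0xFFFFFFFF
    return h


def chunk_for(path: str, chunks: int = 32) -> int:
    """Map an asset path to a chunk index in [0, chunks)."""
    return fnv1a_32(path.encode("utf-8")) % chunks


if __name__ == "__main__":
    # Hypothetical asset path, used only to show the call.
    print(chunk_for("DungeonCrawler/Content/DungeonCrawler/Example.uasset"))
```

Because the index depends only on the path, every run assigns a given file to the same chunk, so a failed chunk can be retried on its own.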
Per-pass output is sorted into data/{json,icons,locales}/ after the run; per-pass logs go to logs/decode.*.{log,err}.
The exporter is our thin .NET CLI under dark-and-darker/exporter/, not luk-gg/UnrealExporter — that upstream is unmaintained and pinned to a CUE4Parse version too stale to handle current DaD cooked data. We pin CUE4Parse directly via CUE4PARSE_REF in the Dockerfile.
4. adapt (codex:adapt)
bin/console codex:adapt dark-and-darker --dump-dir=<path> --output=<path> runs in the api container. Internally it instantiates KAT\Codex\Adapter\DarkAndDarker\Normalizer, walks every registered type's V2 directory, and writes a single envelope:
{
"entities": [ { "id": "id.item.longsword_5001", "type": "item", ... }, ... ],
"assets": [ { "hash": "...", "media_type": "image/png", "bytes": 8868, "path": "..." }, ... ],
"locales": [ { "locale": "en", "namespace": "DC", "key": "...", "value": "Longsword" }, ... ]
}
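A quick structural check of an envelope can be sketched like this. It only knows the fields visible in the example above (the elided "..." fields are not checked), and the function name is made up for this illustration; it is not part of codex:adapt or codex:import.

```python
import json

# Keys taken from the example envelope; real rows carry more fields.
REQUIRED_KEYS = {
    "entities": {"id", "type"},
    "assets": {"hash", "media_type", "bytes", "path"},
    "locales": {"locale", "namespace", "key", "value"},
}


def envelope_problems(envelope: dict) -> list[str]:
    """Return human-readable problems; an empty list means the shape looks right."""
    problems = []
    for section, required in REQUIRED_KEYS.items():
        rows = envelope.get(section)
        if not isinstance(rows, list):
            problems.append(f"{section}: missing or not a list")
            continue
        for index, row in enumerate(rows):
            missing = required - set(row) if isinstance(row, dict) else required
            if missing:
                problems.append(f"{section}[{index}]: missing {sorted(missing)}")
    return problems


if __name__ == "__main__":
    sample = json.loads('{"entities": [{"id": "id.item.longsword_5001", "type": "item"}],'
                        ' "assets": [], "locales": []}')
    print(envelope_problems(sample))
```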
Per-type modeling lives in KAT\Codex\Adapter\DarkAndDarker\Type\<Name> — one class per V2 entity type. The base Type class handles everything generic (file glob, projection, locale lookup); subclasses override tagsFor(), linksFor(), iconFor() for per-type behavior. See Adding a type.
5. import (codex:import)
bin/console codex:import dark-and-darker --patch-version-raw=<v> --source=<envelope> lands the envelope. KAT\Codex\Import\Importer runs three sub-importers in order:
- AssetRegistrar: copies icons into var/codex-assets/<hash[0..2]>/<hash>.<ext> and upserts codex.assets.
- Entity loop: snapshot-on-change inserts into codex.entities, plus entity_resolved_current, entity_links, entity_diffs, source_keys.
- LocaleImporter: bulk INSERT into codex.locale_strings (1k-row batches; one prior-snapshot fetch).
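The content-addressed layout used by AssetRegistrar can be sketched as below. Two assumptions are baked in and may not match the real importer: the hash is the hex SHA-256 of the file bytes, and <hash[0..2]> means the first two hex characters.

```python
import hashlib
from pathlib import PurePosixPath


def asset_path(content: bytes, ext: str,
               root: str = "var/codex-assets", prefix_len: int = 2) -> PurePosixPath:
    """Build <root>/<hash prefix>/<hash>.<ext> from the file contents."""
    digest = hashlib.sha256(content).hexdigest()
    # prefix_len=2 is an assumption about what <hash[0..2]> denotes.
    return PurePosixPath(root) / digest[:prefix_len] / f"{digest}.{ext}"


if __name__ == "__main__":
    print(asset_path(b"example icon bytes", "png"))
```

Identical bytes always map to the same path, so an icon shared by several entities is stored once, and the prefix directory keeps any single directory from growing too large.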
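A register_shutdown_function closes the codex.imports row as failed even on PHP fatal errors such as an exhausted memory_limit, so an OOM mid-run leaves a diagnosable trail instead of a stuck running row. A crash that kills the process outright (segfault, SIGKILL) bypasses shutdown functions and can still leave the row in running.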
Adding a type
V2 currently has ~55 entity types under Data/Generated/V2/<TypeDir>/<TypeDir>/Id_*.json. Each one becomes a Type/*.php class:
namespace KAT\Codex\Adapter\DarkAndDarker\Type;

use KAT\Codex\Adapter\DarkAndDarker\AssetRef;
use KAT\Codex\Adapter\DarkAndDarker\Id;
use KAT\Codex\Adapter\DarkAndDarker\IdTag;
use KAT\Codex\Import\NormalizedAsset;

final class MyThing extends Type
{
    public function codexType (): string { return 'my_thing'; }
    public function v2Subdir (): string { return 'MyThing'; }

    protected function tagsFor (array $props): array
    {
        return IdTag::tagsFor ([ $props ['Tier'] ?? null, $props ['Category'] ?? null ]);
    }

    protected function linksFor (array $props): array
    {
        $links = [];
        foreach ($props ['Abilities'] ?? [] as $entry) {
            $name = AssetRef::name ($entry);
            if ($name === null) continue;
            $links [] = [ 'rel' => 'has_ability', 'target' => Id::for ('my_thing_ability', $name) ];
        }
        return $links;
    }

    protected function iconFor (array $props, string $assetName): ?NormalizedAsset
    {
        return $this->icons->viaArtData ($props ['ArtData'] ?? null);
    }
}
Then register it in Normalizer::dump():
$types = [
...,
new MyThing ($dump, $icons, $locale)
];
For types whose upstream id prefix doesn't match the directory name (e.g. Id_DesignDataReligion_* vs religion):
public function v2Subdir (): string { return 'Religion'; }
protected function idPrefix (): string { return 'DesignDataReligion'; }
Id::for('religion', $name, 'DesignDataReligion') strips the right prefix.
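To make the prefix handling concrete, here is a guess at the shape of the transformation, written in Python for illustration. It is not the Id::for implementation: the Id_<Prefix>_ marker, the lowercasing, and the id.<type>.<name> layout are inferred from the example id id.item.longsword_5001 and the Id_DesignDataReligion_* pattern, and the asset names below are hypothetical.

```python
def make_id(codex_type: str, asset_name: str, id_prefix: str) -> str:
    """Strip the assumed 'Id_<prefix>_' marker and build 'id.<type>.<name>'."""
    marker = f"Id_{id_prefix}_"
    if not asset_name.startswith(marker):
        raise ValueError(f"{asset_name!r} does not start with {marker!r}")
    return f"id.{codex_type}.{asset_name[len(marker):].lower()}"


if __name__ == "__main__":
    print(make_id("item", "Id_Item_Longsword_5001", "Item"))
    print(make_id("religion", "Id_DesignDataReligion_Example", "DesignDataReligion"))
```

Without the explicit prefix, a religion asset would keep the DesignData part in its id, which is the mismatch the idPrefix() override is there to avoid.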
Idempotency
Every step is idempotent on its own:
- extract overwrites the per-version dir; re-running on an unchanged version produces byte-identical output.
- codex:adapt is pure (same dump → same envelope; SHA-256s of icons are the dedup key).
- codex:import is snapshot-on-change keyed by per-entity content_hash. Re-running with no changes: added=0 changed=0 removed=0 unchanged=N.
Caveat: when an adapter change updates an entity's content (e.g. new tag derivation, more icons resolved), re-importing the same patch version is a no-op by design — the importer refuses to overwrite an existing row at the same patch (the "history rewrite" guard). To force a backfill, wipe the patch's rows in codex.entities (and the dependent tables) and re-import. The next patch import picks up the change naturally.
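The snapshot-on-change bookkeeping described in this section can be sketched as a comparison of per-entity hashes. This is a simplified model, not the Importer: it assumes content_hash is a SHA-256 over canonical JSON, keeps the stored state in a plain dict, and does not model the history-rewrite guard.

```python
import hashlib
import json


def content_hash(entity: dict) -> str:
    """Hash a canonical JSON form so key order does not affect the result."""
    canonical = json.dumps(entity, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def classify(stored: dict[str, str], incoming: list[dict]) -> dict[str, int]:
    """Compare incoming entities against stored {id: content_hash} and count outcomes."""
    report = {"added": 0, "changed": 0, "removed": 0, "unchanged": 0}
    seen = set()
    for entity in incoming:
        entity_id = entity["id"]
        seen.add(entity_id)
        if entity_id not in stored:
            report["added"] += 1
        elif stored[entity_id] != content_hash(entity):
            report["changed"] += 1
        else:
            report["unchanged"] += 1
    # Anything stored but absent from this envelope counts as removed.
    report["removed"] = len(set(stored) - seen)
    return report


if __name__ == "__main__":
    entities = [{"id": "id.item.longsword_5001", "type": "item"}]
    stored = {e["id"]: content_hash(e) for e in entities}
    print(classify({}, entities))      # first import
    print(classify(stored, entities))  # re-run with no changes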
Diagnostics
# Pipeline state for a patch.
cat ~/.katforge/codex/dark-and-darker/patches/<v>/manifest/decode.json
# Per-pass diagnostics (if the importer reports non-zero failed counts).
ls ~/.katforge/codex/dark-and-darker/patches/<v>/logs/
# DB-side import log.
docker exec katforge-hearth-api-1 bin/console dbal:run-sql --force-fetch \
"SELECT id, status, added, changed, removed, unchanged, error, started_at
FROM codex.imports
ORDER BY id DESC LIMIT 10"
# Per-type counts on a branch.
docker exec katforge-hearth-api-1 bin/console dbal:run-sql --force-fetch \
"SELECT type, count(*) AS n
FROM codex.entity_resolved_current
WHERE game = 'dark-and-darker' AND branch = 'main'
GROUP BY type ORDER BY n DESC"
# Single entity via the public endpoint.
hearth codex get dark-and-darker id.item.longsword_5001
Auth
Steam credentials come from $DARKERDB_CONFIG (default ~/Projects/darkerdb/config.ini), parsed for [darkerdb.steam].username and .password. Steam Guard 2FA codes auto-generate from steamguard-cli against the maFile at $STEAMGUARD_MAFILES (default ~/.config/steamguard-cli/maFiles). After the first successful run the DepotDownloader token caches in dark-and-darker/dd-auth/ for weeks; subsequent fetches need no codes.
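Reading the two credential fields can be sketched with Python's standard configparser. This assumes config.ini uses ordinary INI syntax that configparser accepts; the pipeline's own parser is not shown in this document, and the sample values are placeholders.

```python
import configparser


def read_steam_credentials(ini_text: str) -> tuple[str, str]:
    """Return (username, password) from the [darkerdb.steam] section."""
    # interpolation=None keeps '%' characters in passwords literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    section = parser["darkerdb.steam"]
    return section["username"], section["password"]


if __name__ == "__main__":
    sample = "[darkerdb.steam]\nusername = example-user\npassword = example-pass\n"
    print(read_steam_credentials(sample)[0])
```

A missing section or key raises KeyError, so a misconfigured file fails before DepotDownloader is started.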
Protobuf extraction (protodump)
A separate sub-pipeline pulls the game's network proto definitions out of the binary so the DarkerDB ghost bot can stay synchronized with patches:
- bin/protodump <build> runs protodump against DungeonCrawler.exe in the patch's source/ tree.
- Output lands in codex/dark-and-darker/protos/<build>/*.proto: the full proto set (Account, Ranking, MarketPlace, Merchant, Friend, Party, etc.).
- The ghost bot symlinks realms/darkerdb.com/ghost/src/packets/ to the current build's protos so a bot rebuild picks up the new schema.
The script logs a diff against the previous build's protos so patch-day reviews show exactly which packets changed.
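A build-to-build proto diff of this kind can be sketched with Python's difflib. How bin/protodump actually produces its diff is not described here, so this is only one plausible approach; the inputs are {filename: text} maps and the proto contents are invented for the example.

```python
import difflib


def diff_protos(previous: dict[str, str], current: dict[str, str]) -> str:
    """Unified diff across two {filename: proto text} maps; empty if nothing changed."""
    out = []
    for name in sorted(previous.keys() | current.keys()):
        before = previous.get(name, "").splitlines(keepends=True)
        after = current.get(name, "").splitlines(keepends=True)
        out.extend(difflib.unified_diff(
            before, after,
            fromfile=f"previous/{name}", tofile=f"current/{name}",
        ))
    return "".join(out)


if __name__ == "__main__":
    old = {"Account.proto": "message A {\n  int32 id = 1;\n}\n"}
    new = {"Account.proto": "message A {\n  int32 id = 1;\n  string name = 2;\n}\n"}
    print(diff_protos(old, new))
```

Files present in only one build appear as all-added or all-removed, so new and deleted packet definitions show up in the same report as edits.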
3D model export (gltf)
The Unreal exporter (codex/dark-and-darker/exporter/) supports a gltf action alongside json | locres | png:
{
"Export": [
".*/Characters/Monster/.*SK_.*:gltf",
".*/Characters/Monster/.*SM_.*:gltf"
]
}
Skeletal and static meshes go through CUE4Parse's MeshExporter and land as binary glTF (.glb) files under patches/<build>/data/models/. The
DarkerDB API serves them directly via GET /v2/monsters/{id}/model with Content-Type: model/gltf-binary; frontends render with <model-viewer> or three.js's GLTFLoader. First release ships static T-pose meshes; animations land in a follow-up.