ARD-0012: dbx restore integration via the `restore:` profile schema
- Status: Accepted
- Date: 2026-05-23
- Deciders: Tom (Claude facilitating)
- Activates: ARD-0001 (the `data_sensitivity:` machinery, parsed since v0.2 but designed-only per ARD-0004's "parsed but a no-op for v1", becomes operationally meaningful for the first time).
- Related: [[ard-0001-v1-architecture]], [[ard-0002-dbx-as-runtime-dependency]], [[ard-0004-shopify-first-as-dogfood-path]], [[ard-0007-django-node-and-multi-service-compose]], [[ard-0008-v03-to-v10-release-plan-and-thesis-evolution]], [[ard-0011-egress-enforcement-via-iptables]]
Context
ARD-0002 named dbx a runtime CLI dependency and listed two dbx-side PRs as prerequisites for boring's restore integration: dbx restore --transform=<script> (streaming sanitization) and dbx restore --into <container-name> (restore into a named running container). ARD-0004 deferred the whole restore path to v1.x while v1 shipped the Shopify case (which doesn't need it). ARD-0007 shipped the django-node preset but kept dbx integration deferred — content-infrastructure runs against an empty Postgres seeded by bootstrap_data.
ARD-0008 pins dbx restore to v0.5. The v1.0 thinking-medium demo gets meaningfully better when the marketer-designer-engineer-PM trio is iterating against real-shape data, not seed-data fixtures. "What if the buying-guide page had inline product comparisons" is a different conversation when you can see the comparison against an actual product catalog instead of three test products from bootstrap_data.
This ARD does three things:
- Names the new `restore:` profile schema field that drives the integration (independent of the existing `data_sensitivity:` field, but interlocked with it).
- Pins the interlock with `data_sensitivity:` so the original ARD-0001 design's safety contract is honored: nothing sensitive lands on disk unscrubbed.
- Documents the two dbx-side PRs that gate the v0.5 ship (both Tom's own work).
Decision
1. New profile schema field: `restore:`
A new top-level profile field, declared as a list of restore sources:
```yaml
restore:
  - source: dbx://prod/content-infra-postgres@latest
    target: postgres                 # compose service name from `services:`
    transform: scripts/sanitize.py   # required when data_sensitivity is sanitized
    when: first_up                   # default; alternatives below
```
Field semantics:
- `source:` (required): a dbx-resolvable URI naming the backup to restore. boring does not parse it beyond passing to `dbx`; dbx owns the URI grammar. v0.5 supports any source dbx supports.
- `target:` (required): the name of a compose service (declared in `services:` per ARD-0007) into which dbx will restore. The named service must exist and be `service_healthy` before restore runs.
- `transform:` (optional in schema, required when the profile's `data_sensitivity` is `sanitized`): a host-side script path (relative to the repo root) that dbx invokes with `--transform=<script>` for streaming sanitization. The script reads SQL/data from stdin and writes sanitized output to stdout. boring does not author the script; it's per-project, lives in the repo, and is the team's responsibility. boring just wires it.
- `when:` (optional, default `first_up`): when the restore runs in the container lifecycle. Enum:
  - `first_up`: once, on first `boring open` for this profile (idempotency tracked via a marker file at `/var/lib/boring/restore-complete.<source-hash>`, same mechanism as the `setup:` marker from ARD-0007).
  - `every_up`: every `boring open`. Useful for sources that change quickly or for demo scenarios where fresh data per session is the point.
  - `manual`: never automatically; only when the user runs `boring restore --refresh`.
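To make the `transform:` contract concrete (read the dump from stdin, write the scrubbed dump to stdout), here is a minimal Python sketch of what a project's `scripts/sanitize.py` might look like. Everything in it is an assumption for illustration: the email-masking rule, the regex, and the function names are hypothetical, and the exact stream format dbx hands to the script is not specified in this ARD. A real scrubber depends on the project's own schema.

```python
import hashlib
import re
import sys

# Assumption: anything shaped like an email address counts as PII.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def mask_email(match):
    """Replace an address with a stable, non-reversible placeholder."""
    digest = hashlib.sha256(match.group(0).lower().encode("utf-8")).hexdigest()[:10]
    return f"user-{digest}@example.invalid"


def sanitize_line(line):
    """Scrub one line of the dump; lines without a match pass through unchanged."""
    return EMAIL_RE.sub(mask_email, line)


def sanitize_lines(lines):
    """Yield scrubbed lines one at a time so the dump is never fully buffered."""
    for line in lines:
        yield sanitize_line(line)
```

Wired up as a script, the entry point would be a single call such as `sys.stdout.writelines(sanitize_lines(sys.stdin))`. Hashing the address rather than blanking it keeps rows distinguishable, which matters if the column carries a uniqueness constraint.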
2. `boring restore --refresh` is the manual-override surface
A new top-level subcommand: `boring restore --refresh [--source <uri>]`.
- No `--source`: re-runs every `restore:` entry in the active profile, ignoring the `when:` and the idempotency marker. Pulls fresh data into each declared target.
- `--source <uri>`: re-runs only the matching entry. Useful when one source has updated and the rest don't need touching.
The container must be running for restore --refresh to work; the command fails clearly otherwise with an instruction to boring open first. It's safe to run while the workload is active (dbx restore semantics handle in-flight connections per --into's contract).
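The two invocation modes can be sketched as a small selection function. This is a Python illustration under stated assumptions, not boring's implementation: the entry dictionaries mirror the `restore:` schema, `select_for_refresh` is a hypothetical name, and treating an unmatched `--source` as an error is an assumption this ARD does not spell out.

```python
from typing import Optional


def select_for_refresh(entries: list, source: Optional[str] = None) -> list:
    """Pick the restore: entries that a --refresh run should re-run.

    when: and the idempotency markers are deliberately not consulted,
    because --refresh ignores both.
    """
    if source is None:
        return list(entries)
    matched = [entry for entry in entries if entry["source"] == source]
    if not matched:
        # Assumption: an unknown URI is reported rather than silently ignored.
        raise ValueError(f"no restore: entry with source {source!r}")
    return matched
```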
3. `data_sensitivity:` interlock: `transform:` is required when sensitivity is `sanitized`
The original ARD-0001 design encoded the safety contract: internal data never leaves prod (receiver gets empty DB), sanitized data must stream through a transform so unscrubbed bytes never land on disk, public data restores raw. v0.5 honors this contract by making transform: a required field when data_sensitivity is sanitized.
The validator behavior:
| `data_sensitivity` | `restore:` entries allowed | `transform:` requirement |
|---|---|---|
| `internal` (default) | None. Validator rejects any `restore:` entries; "internal" means no real data in this container. | n/a |
| `sanitized` | Any. | Required. Validator rejects entries without `transform:`. |
| `public` | Any. | Optional. |
This is enforced in lib/profile.sh's validator (the same validator at line 151 that already handles the services: and guardrails: checks). The error message names the required field and the rationale: "restore[0]: transform: required when data_sensitivity is 'sanitized' (per ARD-0012). Add 'transform: scripts/<your-sanitizer>' or set data_sensitivity: public if the data is non-sensitive."
This finally activates the `data_sensitivity` field the schema has parsed since v0.2. Until v0.5 it sat as design-only; in v0.5 it becomes load-bearing for whether `restore:` validates.
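The interlock rules can be sketched in a few lines. The real check lives in the shell validator in `lib/profile.sh`; the Python below only illustrates the decision logic, `validate_restore` is a hypothetical name, and the wording of the `internal` rejection message is an assumption (only the `sanitized` message is quoted in this ARD, and it is abbreviated here).

```python
def validate_restore(profile: dict) -> list:
    """Return validation errors for the restore: / data_sensitivity interlock."""
    sensitivity = profile.get("data_sensitivity", "internal")  # internal is the default
    entries = profile.get("restore") or []
    errors = []
    if sensitivity == "internal" and entries:
        # Assumed wording; the ARD only states that such entries are rejected.
        errors.append("restore: entries are not allowed when data_sensitivity is 'internal'")
    elif sensitivity == "sanitized":
        for index, entry in enumerate(entries):
            if not entry.get("transform"):
                errors.append(
                    f"restore[{index}]: transform: required when data_sensitivity "
                    "is 'sanitized' (per ARD-0012)"
                )
    # public: transform is optional, so there is nothing to check.
    return errors
```

Returning a list rather than raising on the first problem lets a profile with several bad entries report all of them in one pass.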
4. Two upstream dbx PRs gate v0.5
dbx must ship two features before boring's v0.5 lands:
- `dbx restore --transform=<script>`: streaming sanitization. Tom's own work in the dbx repo. Without this, `transform:` in the profile schema has nothing to invoke.
- `dbx restore --into <container-name>`: restore into a named running container (i.e., a compose sidecar). Tom's own work in the dbx repo. Without this, `target:` cannot point at a `services:` entry.
These are not boring-side TODOs; they're upstream PRs. boring's v0.5 release ships after both dbx features are merged and released. If either slips, v0.5 slips with it (or the corresponding fields in the schema ship as unimplemented-but-validated, with cmd_open failing with a clear "requires dbx ≥ X.Y.Z" error if a profile actually declares restore — same pattern as ARD-0002 anticipated).
boring doctor already reports dbx version per ARD-0002; v0.5 raises the minimum-supported-dbx-version constant to the release that contains both PRs.
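A version gate of this kind might look like the sketch below. It is hedged heavily: the minimum version is still "X.Y.Z" in this ARD, how dbx reports its version is not specified here, and both function names are hypothetical. The sketch assumes a plain dotted numeric version string that has already been extracted from dbx's output.

```python
def parse_version(text: str) -> tuple:
    """Turn a dotted version such as '1.4.0' or 'v1.4' into a comparable tuple."""
    parts = tuple(int(part) for part in text.strip().lstrip("v").split("."))
    # Pad to three components so '1.4' and '1.4.0' compare as equal.
    return parts + (0,) * (3 - len(parts))


def require_dbx(found: str, minimum: str) -> None:
    """Raise with a clear message when the installed dbx is older than required."""
    if parse_version(found) < parse_version(minimum):
        raise RuntimeError(f"restore: requires dbx ≥ {minimum} (found {found})")
```

Pre-release suffixes such as `-rc1` would make `int()` fail in this sketch; a real gate would need to decide how to treat them.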
5. The restore integration uses the existing `setup:` lifecycle, not a new one
ARD-0007 shipped the setup: lifecycle hook with a /var/lib/boring/setup-complete marker. Restore reuses the same machinery:
- `lib/compose.sh` extends `postCreateCommand` generation to emit `boring-restore-run` (a new host-side helper) after the existing `setup:` shell concatenation and before the `setup-complete` marker write.
- `boring-restore-run` walks the profile's `restore:` entries, checks each `when:` against the idempotency marker, and invokes `dbx restore --into <target-service> --transform <transform-script> <source-uri>` for entries that should run.
- Each successful restore writes its own marker at `/var/lib/boring/restore-complete.<source-hash>` so subsequent `boring open`s with `when: first_up` skip the already-restored entry.
Restore happens before cmd_open verifies the setup-complete marker, so the ARD-0007 belt-and-suspenders re-run path catches restore failures the same way it catches setup: failures.
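The per-entry decision and the dbx command line described above can be sketched as follows. This is a Python illustration, not the `boring-restore-run` helper itself: both function names are hypothetical, the `force` parameter stands in for the `--refresh` path, and omitting `--transform` when an entry has none (the `public` case) is an assumption. The dbx flags are the ones this ARD names; they depend on the upstream PRs landing.

```python
from pathlib import Path


def should_run(when: str, marker: Path, force: bool = False) -> bool:
    """Decide whether one restore: entry runs during this `boring open`."""
    if force:  # the --refresh path ignores when: and markers
        return True
    if when == "every_up":
        return True
    if when == "manual":
        return False
    if when == "first_up":
        return not marker.exists()
    raise ValueError(f"unknown when: value {when!r}")


def dbx_argv(entry: dict) -> list:
    """Build the dbx command line in the shape quoted in this section."""
    argv = ["dbx", "restore", "--into", entry["target"]]
    if entry.get("transform"):
        argv += ["--transform", entry["transform"]]
    argv.append(entry["source"])
    return argv
```

Building an argument list instead of a shell string avoids quoting problems when a source URI or script path contains spaces.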
Consequences
Positive
- The thesis-pivot demo gets dramatically better. Iterating on a buying-guide page against the actual product catalog beats iterating against seed data; the conversation in the room shifts from "imagine the data" to "look at the data."
- The `data_sensitivity` field finally means something. Three years (of design conversation) of it being parsed-but-ignored ends here. The interlock turns it into an enforced safety boundary.
- Loose coupling with dbx survives. boring still doesn't fork dbx, doesn't extract its libraries, doesn't reimplement restore; it just invokes the CLI with the right args per ARD-0002. All v0.5 adds is the wiring.
- The `restore:` schema is reusable beyond Postgres. dbx's URI scheme handles whatever dbx handles (S3, GCS, postgres dumps, snapshot files); boring's role is to pass the URI and the target along. New backup sources land in dbx and become available to boring users with no boring release.
- `boring restore --refresh` matches a real workflow. Tom's content-infrastructure work occasionally needs fresh data mid-session (when prod ships a content update that affects what the team is iterating on). Without `--refresh`, the only option is teardown + re-open, which loses session state.
Negative
- Upstream dependency on two dbx PRs. v0.5 cannot ship without both. If dbx work slips, boring's v0.5 slips. Mitigation: both PRs are Tom's own work; their schedule is the same person's schedule as boring's; risk is integration risk, not coordination risk.
- `transform:` scripts are project-authored, not boring-provided. A team using `sanitized` data has to write the scrubber. Mitigation: the scrubber is project-domain work (only the project knows what's PII in its schema); boring can't author it. v0.5 docs ship a template scrubber for Postgres + a "common patterns" reference (truncate email columns, hash user IDs, etc.).
- Idempotency marker hashing of the `source:` URI is heuristic. If a source URI is opaque (e.g., always `dbx://prod/latest`) but the underlying snapshot changes, `when: first_up` won't notice and won't re-restore. Mitigation: `when: every_up` and `boring restore --refresh` are both available for the "snapshot changed but URI didn't" case; documented in `boring restore --help`.
- Restoring before container `setup:` finishes (or interleaved with it) is a real failure mode. `setup:` for django-node runs `migrate`, which assumes an empty (or migrated) DB; if restore lands a populated DB that's already at a different migration head, `migrate` might no-op or might do unexpected things. Mitigation: the restore step in the generated `postCreateCommand` runs after `setup:`'s explicit commands, before the marker write, and dbx's `--into` contract gives us a usable DB at the end. Migrate-then-restore is the default order; teams with weird flows can author `setup:` accordingly.
Neutral
- Sidecar credentials remain literal in the compose file. ARD-0007 deferred sidecar secret URI resolution to "when dbx-restore-into-sidecar lands"; now that it's landing, the natural follow-up is to wire sidecar credentials through the secret resolver too. Deferred to v1.x. The v0.5 scope is restore integration itself; layering sidecar-secret indirection on top of it is a separate ARD.
- `when: first_up` vs `when: every_up` matches the v0.2 `setup:` semantics. `setup:` is implicitly "first_up only" (gated by the marker); `restore:`'s explicit `when:` makes the gating choice visible per entry. Same mental model, more granular control.
Alternatives Considered (rejected)
- Make `restore:` a sub-field of `services:` instead of a top-level field. Co-locate the restore source with the sidecar that receives it. Rejected: the target compose service might not be a boring-declared sidecar at all (it could be `database: { mode: external, dsn_secret: ... }` per ARD-0001, or a future "restore into the dev container itself" use case). Top-level `restore:` with explicit `target:` keeps the surface flexible.
- Trigger restore via a separate `boring restore` subcommand only (no automatic `when:`). Rejected: loses the "git clone → boring open → working environment with real data" property that's the whole point. Manual-only restore means every contributor remembers to run a second command, every time, which is exactly the kind of step humans forget.
- Auto-run restore on every `boring open` (no `when:` field). Rejected: snapshot pulls aren't free (multi-GB backups take minutes). A team running `boring open` five times a day during active work doesn't want to wait for a restore every time. `when: first_up` is the right default; `every_up` is opt-in for cases where it's wanted.
- Default `transform:` to a no-op (identity passthrough) when missing. Rejected: silently passing through unsanitized prod data when the profile claims `data_sensitivity: sanitized` is the exact safety violation the ARD-0001 contract exists to prevent. Required-field error is loud and correct.
- Allow `restore:` entries with `data_sensitivity: internal`. Rejected: "internal" means "no real data ever in this container," per ARD-0001. Allowing restore here would contradict the field's meaning. If a profile needs real data, set `data_sensitivity` to `sanitized` (with a transform) or `public` (if non-sensitive).
- Defer the interlock; ship `restore:` decoupled from `data_sensitivity:` in v0.5 and tighten the interlock later. Rejected: the interlock is the whole point of `data_sensitivity`. Shipping without it means users who declare `sanitized` get no enforcement of the sanitization contract, and tightening later is a breaking schema change. Lock the safety in at the first ship.
- Restore inline in `cmd_open`'s shell (Python/host-side), not via the in-container `postCreateCommand`. Rejected: dbx writes into the sidecar via the container network namespace; running it from the host either means dbx-on-host has network access to the sidecar (which Docker for Mac makes awkward) or means an extra network hop. In-container `postCreateCommand` keeps dbx on the same network as its target.
- Ship sidecar-secret URI resolution in v0.5. Rejected: distinct concern; expands v0.5 scope from "restore integration" to "restore + secret-indirected sidecars." Separate ARD when v1.x picks it up.
Implementation Order
Prerequisite (upstream, parallelizable with #1–#3 below):
- dbx PR #1: `dbx restore --transform=<script>` (Tom's own dbx work). Streams the restore through the named script.
- dbx PR #2: `dbx restore --into <container-name>` (Tom's own dbx work). Targets a named running container instead of the local default.
boring-side, gated on the dbx PRs landing:
1. Profile schema: extend `lib/profile.sh`'s `_profile_validate_json` validator (line 151) to handle `restore:` (list of objects with `source`, `target`, `transform`, `when`). Enforce the `data_sensitivity` interlock from §3. Extend `_profile_normalize` (line 309) to normalize the restore list.
2. Source-hash helper: small function in `lib/restore.sh` (new module) that derives a stable hash of a restore entry for idempotency markers. SHA256 of `source||target||transform` truncated to 12 hex chars.
3. `boring-restore-run`: new host-side helper (or in-container script, depending on dbx invocation pattern). Walks the normalized profile's `restore:` entries, checks each `when:` against the marker file, invokes `dbx restore --into <target> --transform <script> <source>` for entries that should run, writes the per-entry marker on success.
4. `compose.sh` integration: `postCreateCommand` generation grows a step between the existing `setup:` concatenation and the `setup-complete` marker write that calls `boring-restore-run`. The marker semantics remain: `setup-complete` written last, re-verified post-up, re-run on missing.
5. `boring restore --refresh` subcommand: added to the `boring` dispatcher. Verifies container is up; calls `boring-restore-run` with `--force` (which ignores `when:` and markers); supports `--source <uri>` to scope to one entry.
6. `boring doctor` updates: verify minimum dbx version (raised to the version containing both PRs); for the active profile, list each `restore:` entry's last-restored timestamp if known.
7. content-infrastructure profile migration: author the actual `restore:` block for content-infrastructure pointing at the real prod-postgres backup with a real sanitization transform (the transform is project work in content-infrastructure, not boring work; the schema additions in the profile are the boring-visible change).
8. End-to-end smoke: open content-infrastructure with the new `restore:` block; verify Postgres sidecar comes up healthy; verify dbx restore runs after `setup:`'s migrate step; verify the marker is written and a second `boring open` doesn't re-restore; verify `boring restore --refresh` does re-restore; verify a profile declaring `data_sensitivity: sanitized` without a `transform:` fails validation at parse time.
9. Docs: README section on the restore lifecycle; `boring restore --help`; sample sanitization scripts for Postgres (Python + dbx-transform contract).
10. CHANGELOG entry referencing this ARD and the two dbx PRs.
lib/restore.sh is the new module; lib/dbx.sh (the existing 30-line wrapper) gets the dbx_restore_into helper added.
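The source-hash helper from the list above is small enough to sketch in full. The real helper is planned as shell in `lib/restore.sh`; this Python version only illustrates the derivation. Two details are assumptions: the fields are joined with a literal `||` (the ARD writes `source||target||transform` without defining a separator), and a missing `transform` hashes as an empty string. `source_hash` and `marker_path` are hypothetical names.

```python
import hashlib


def source_hash(entry: dict) -> str:
    """SHA256 over source, target, and transform, truncated to 12 hex chars."""
    fields = (entry["source"], entry["target"], entry.get("transform") or "")
    payload = "||".join(fields).encode("utf-8")  # separator is an assumption
    return hashlib.sha256(payload).hexdigest()[:12]


def marker_path(entry: dict, root: str = "/var/lib/boring") -> str:
    """Per-entry idempotency marker, in the path shape named in this ARD."""
    return f"{root}/restore-complete.{source_hash(entry)}"
```

Because `target` and `transform` are part of the hash, changing the sanitizer path or pointing the same source at a different service produces a new marker and therefore a fresh `first_up` restore.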