Design: Comprehensive HA Pre-Deploy Test Suite
Date: 2026-06-07
Status: Draft, awaiting review
Goal: A layered, growable test suite that validates every change against a containerized Home Assistant instance (matching the live version) before it deploys to the live HAOS instance, so that a change which fails validation never reaches the live system. Tests gate deployment.
1. Background & current state
- Config is edited in this repo and auto-deployed to live HAOS by GitHub Actions on push to main (deploy.yml, deploy-grafana.yml, deploy-lovelace.yml).
- A validate.yml workflow already runs containerized check_config (via frenck/action-home-assistant, with dummy secrets), and it passes, but it runs in parallel with the deploy workflows. It does not block a broken config from deploying. This is the core gap.
- Deployable YAML uses only core integrations (default_config, frontend, automation, script, scene, homekit, influxdb, template, mqtt, input_boolean, input_button) and no custom_components, so containerized check_config works against a clean image.
- Live instance: HA 2026.6.1.
- The HA MCP server is connected in the authoring environment (ha_list_entity_registry, ha_list_states), so the real entity registry can be snapshotted without new credentials.
2. Requirements
- Gate deploys on tests: no deploy job runs unless the test job is green.
- Containerized HA check before live deploy: run check_config on the official HA image pinned to the live version.
- Comprehensive + growable: organized so new checks are cheap to add over time.
- Catch real regressions: especially renamed/deleted entity references and the classes of bug seen during the Grafana work (unresolved datasource templates, hardcoded time windows).
- Run locally: a single command reproduces the CI suite before pushing.
- No new mandatory access for the initial scope (Tiers 0–3).
Out of scope (initial)
- Full-boot smoke test with a sanitized .storage snapshot (Tier 4, future; needs a backup upload).
- Runtime/behavioral assertions of automation logic (would require a booted instance with real or mocked entities + state injection).
3. Scope decisions (approved)
- Tiers 0–3.
- Consolidate validate.yml + the three deploy workflows into a single ci.yml (test → gated deploys).
- Local runner: scripts/test.sh + Makefile.
4. Architecture
4.1 File layout
.github/workflows/ci.yml        # replaces validate.yml, deploy.yml, deploy-grafana.yml, deploy-lovelace.yml
tests/
  conftest.py                   # fixtures: parse all yaml once; load entity snapshot
  test_static.py                # Tier 0
  test_structure.py             # Tier 2
  test_entity_references.py     # Tier 3
  fixtures/
    entities.txt                # committed snapshot of real entity_ids (one per line)
    ci_secrets.yaml             # dummy secrets used by check_config
  requirements.txt              # pytest, pyyaml, jsonschema, yamllint
.yamllint                       # relaxed config (tolerates HA !include/!secret tags)
scripts/
  test.sh                       # local: Tiers 0-3 incl. docker check_config
  refresh-entity-snapshot.sh    # regenerate entities.txt from live HA REST API (run occasionally w/ token)
Makefile                        # make test / make check-config / make snapshot / make lint
4.2 The tiers
Tier 0: Static (pytest + yamllint, no container).
- yamllint across all YAML (relaxed: line length off, HA custom tags allowed).
- JSON validity: lovelace/home_command_dashboard.json, grafana/dashboard.json.
- Grafana regression guards (codify today's bugs):
  - no literal ${DS_INFLUXDB} remaining in grafana/dashboard.json;
  - no __inputs import block in the provisioned dashboard;
  - no panel query containing now()- (must use $timeFilter);
  - every panel target has a non-empty datasource uid.
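The four Grafana guards above could be expressed as one pure function over the parsed dashboard. This is a minimal sketch, not the suite's actual code: the function names are hypothetical, and it assumes each panel target keeps its raw query text under a "query" key and that row panels nest their children under "panels".

```python
import json
from typing import Any, Iterator


def iter_targets(dashboard: dict[str, Any]) -> Iterator[tuple[str, dict[str, Any]]]:
    """Yield (panel title, target) pairs, descending into nested row panels."""
    stack = list(dashboard.get("panels", []))
    while stack:
        panel = stack.pop()
        stack.extend(panel.get("panels", []))  # assumption: rows nest panels here
        for target in panel.get("targets", []):
            yield panel.get("title", "<untitled>"), target


def grafana_violations(dashboard: dict[str, Any]) -> list[str]:
    """Return one message per violated guard; an empty list means clean."""
    problems = []
    if "${DS_INFLUXDB}" in json.dumps(dashboard):
        problems.append("unresolved ${DS_INFLUXDB} placeholder")
    if "__inputs" in dashboard:
        problems.append("__inputs import block present")
    for title, target in iter_targets(dashboard):
        # assumption: the raw query string lives under the "query" key
        query = str(target.get("query", "")).replace(" ", "")
        if "now()-" in query:
            problems.append(f"{title}: hardcoded time window (use $timeFilter)")
        datasource = target.get("datasource")
        uid = datasource.get("uid") if isinstance(datasource, dict) else None
        if not uid:
            problems.append(f"{title}: target has no datasource uid")
    return problems
```

A pytest case in test_static.py would then load grafana/dashboard.json with json.load and assert that grafana_violations(...) returns an empty list, so a failure prints every offending panel at once.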
Tier 1: Containerized HA check_config (version-pinned).
- docker run --rm -v <repo>:/config homeassistant/home-assistant:${HA_VERSION} python -m homeassistant --script check_config --config /config
- ${HA_VERSION} is defined once (pinned to the live version, 2026.6.1); bump it when the live instance upgrades. Falls back to the 2026.6 minor stream if a patch tag is unavailable.
- tests/fixtures/ci_secrets.yaml is copied to secrets.yaml first; the themes/ placeholder already exists in the repo.
- Identical command in CI and scripts/test.sh so local == CI.
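One way to keep the command identical everywhere is to build it in a single place. The sketch below is illustrative only: the helper names are hypothetical, and the design itself places this logic in scripts/test.sh and ci.yml rather than in Python.

```python
import shutil
from pathlib import Path

HA_VERSION = "2026.6.1"  # the single pin; bump when the live instance upgrades


def prepare_secrets(repo: Path) -> None:
    """Copy the dummy CI secrets into place before check_config runs."""
    shutil.copyfile(repo / "tests/fixtures/ci_secrets.yaml", repo / "secrets.yaml")


def check_config_argv(repo: Path, version: str = HA_VERSION) -> list[str]:
    """Build the docker invocation for Tier 1 as an argument list."""
    return [
        "docker", "run", "--rm",
        "-v", f"{repo.resolve()}:/config",
        f"homeassistant/home-assistant:{version}",
        "python", "-m", "homeassistant",
        "--script", "check_config", "--config", "/config",
    ]
```

Passing the list to subprocess.run(..., check=True) would make a non-zero exit from the container fail the caller.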
Tier 2: Structural pytest (parses YAML, no container).
- Every automation has a unique id and a non-empty alias; no duplicate ids/aliases.
- Helpers referenced by automations/scripts exist in configuration.yaml (input_boolean.*, input_button.*).
- Known-bad references fail the build (seeded from CLAUDE.md, e.g. person.lindsay_saady).
- TOU template sensors (sensor.tou_period, sensor.tou_rate) and input_boolean.peak_mode are present.
- Scenes/scripts referenced by automations exist.
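The id/alias rule in the first bullet might look like the following. It is a sketch under the assumption that automations.yaml has already been parsed (for example by a conftest.py fixture) into a list of dicts; the function name is hypothetical.

```python
from collections import Counter
from typing import Any


def automation_problems(automations: list[dict[str, Any]]) -> list[str]:
    """Report missing, empty, or duplicated id/alias values."""
    problems = []
    for index, automation in enumerate(automations):
        for key in ("id", "alias"):
            if not str(automation.get(key) or "").strip():
                problems.append(f"automation #{index}: missing or empty {key}")
    for key in ("id", "alias"):
        # only count values that are actually set; blanks are reported above
        values = [str(a[key]) for a in automations if a.get(key)]
        for value, count in Counter(values).items():
            if count > 1:
                problems.append(f"duplicate {key} {value!r} used {count} times")
    return problems
```

Returning messages instead of asserting inside the loop lets a single test run report every broken automation rather than stopping at the first.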
Tier 3: Entity-reference regression (pytest + committed snapshot).
- tests/fixtures/entities.txt = snapshot of the live entity registry (generated via MCP now; refreshed via refresh-entity-snapshot.sh).
- Extract every entity_id referenced across automations.yaml, scripts.yaml, configuration*.yaml templates, and the dashboards.
- Assert each referenced entity exists in the snapshot. Failures list the offending entity + file.
- Allowlist mechanism (tests/fixtures/entities_allow.txt) for intentional references not yet in the registry (e.g., not-yet-paired devices) so the suite never blocks legitimate forward work, while every allowlisted item stays explicit and visible.
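A rough sketch of the extract-and-compare step follows. It scans raw text with a regular expression rather than walking parsed YAML, which is a simplification. The domain list is an illustrative assumption, not the repo's real set, and lines that name a service (service: or action:) are skipped so that light.turn_on is not mistaken for an entity.

```python
import re
from typing import Iterable

# assumption: illustrative subset of domains; the real list would be longer
DOMAINS = ("sensor", "binary_sensor", "light", "switch", "person",
           "input_boolean", "input_button", "scene", "script", "climate")
ENTITY_RE = re.compile(r"(?<![\w.])(?:%s)\.[a-z0-9_]+(?![\w.])" % "|".join(DOMAINS))
SERVICE_KEY_RE = re.compile(r"^\s*-?\s*(?:service|action):")


def referenced_entities(text: str) -> set[str]:
    """Collect entity-like tokens, ignoring lines that name a service."""
    found: set[str] = set()
    for line in text.splitlines():
        if SERVICE_KEY_RE.match(line):
            continue  # "service: light.turn_on" is a service call, not an entity
        found.update(ENTITY_RE.findall(line))
    return found


def missing_entities(text: str, snapshot: Iterable[str],
                     allow: Iterable[str] = ()) -> list[str]:
    """Entities referenced in text that are in neither snapshot nor allowlist."""
    return sorted(referenced_entities(text) - set(snapshot) - set(allow))
```

Running missing_entities once per file, with the snapshot and allowlist read from the two fixture files, gives the "offending entity + file" pairs the tier is meant to report.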
4.3 Pipeline (ci.yml)
Triggers: push to main, pull_request to main, workflow_dispatch.
jobs:
  changes:          # dorny/paths-filter@v3 → outputs: ha_config, grafana, lovelace
  test:             # ALWAYS runs. python setup → pip install tests/requirements.txt
                    # → pytest tests/ (Tiers 0,2,3) → docker check_config (Tier 1)
  deploy-ha:        needs: [changes, test]  if: changes.ha_config   # ports deploy.yml logic (reload vs restart)
  deploy-grafana:   needs: [changes, test]  if: changes.grafana     # ports deploy-grafana.yml
  deploy-lovelace:  needs: [changes, test]  if: changes.lovelace    # ports deploy-lovelace.yml
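Read together with the notes that follow, the job graph reduces to a small decision rule. The function below is only a model of that rule for reasoning about the gate (its name and signature are hypothetical); the real behaviour comes from needs: and if: in ci.yml.

```python
def jobs_to_run(event: str, changed: set[str], tests_green: bool) -> list[str]:
    """Model which ci.yml jobs execute for a given trigger and change set."""
    jobs = ["changes", "test"]  # these two always run
    if event == "pull_request" or not tests_green:
        return jobs  # PRs validate only; a red test job blocks every deploy
    for area, job in (("ha_config", "deploy-ha"),
                      ("grafana", "deploy-grafana"),
                      ("lovelace", "deploy-lovelace")):
        if area in changed:
            jobs.append(job)
    return jobs
```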
- On pull_request, only changes + test run (no deploy), so PRs get validated without deploying.
- Concurrency group preserved so deploys never overlap.
- All existing secrets reused (TAILSCALE_*, HAOS_SSH_KEY, HAOS_HOST, HA_TOKEN).
4.4 Local runner
- make test → runs scripts/test.sh: pip install (venv), yamllint, pytest tests/, then Docker check_config.
- make check-config → Tier 1 only.
- make lint → Tier 0 only.
- make snapshot → refresh-entity-snapshot.sh.
- scripts/test.sh degrades gracefully if Docker is absent (skips Tier 1 with a clear warning; CI always runs it).
5. What it does and does not catch (honesty)
Catches: YAML syntax errors; invalid automation/script/template/integration schemas (against the real HA version); JSON corruption in dashboards; the Grafana datasource/time-window regressions; duplicate/missing automation ids; references to helpers/entities that don't exist; known-bad entity names.
Does NOT catch (initially): automation logic correctness; entities that exist but are unavailable at runtime; anything configured only in .storage (UI integrations) — those aren't in the deployed YAML; behavior requiring a booted instance. Tier 4 (future) addresses the boot/runtime gap.
6. Access required
- Tiers 0–3: none. Entity snapshot is generated via the already-connected MCP; refresh-entity-snapshot.sh (for later refreshes) uses the HA REST API with the existing HA_TOKEN (verified working against core/api/ → 200).
- Grafana verification (add-on layer): a Grafana service-account token as repo secret GRAFANA_TOKEN.
  - Verified by experiment: a Home Assistant token does not reach Grafana's API. Grafana is behind HA ingress + an auth-proxy IP allowlist, and a bearer token gets 401 at ingress. So Grafana needs its own credential.
  - A Grafana service-account bearer token authenticates on Grafana's normal auth path and works directly against the internal API (http://a0d7b954-grafana.local.hass.io:3000) from CI-over-SSH, with no ingress and no IP whitelist involved.
  - Created in Grafana UI: Administration → Users and access → Service accounts → Add (role Viewer) → Add token; stored as repo secret GRAFANA_TOKEN.
- Future Tier 4: a sanitized HA backup/.storage snapshot to seed a full boot.
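For the REST-based refresh mentioned in the first bullet, a Python equivalent of what refresh-entity-snapshot.sh needs to do might look like this. It is a sketch with hypothetical function names. It relies on the Home Assistant REST endpoint GET /api/states, which returns a JSON array of state objects that each carry an entity_id. Note the caveat that this lists entities with a current state, which may differ from the full entity registry (disabled entities have no state).

```python
import json
import urllib.request


def fetch_states(base_url: str, token: str) -> list[dict]:
    """GET /api/states using a long-lived access token as a bearer token."""
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/states",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def snapshot_text(states: list[dict]) -> str:
    """Render entities.txt content: unique entity_ids, sorted, one per line."""
    entity_ids = sorted({state["entity_id"] for state in states})
    return "\n".join(entity_ids) + "\n"
```

Sorting and de-duplicating keeps the committed file stable, so a refresh produces a diff only when the set of entities really changed.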
6a. Grafana post-deploy verification (add-on layer)
After deploy-grafana, a CI step SSHes to HAOS and queries Grafana's internal API with GRAFANA_TOKEN to assert that the dashboard actually provisioned correctly:
- dashboard home-intelligence-v1 exists and is current;
- every panel target resolves to a real datasource uid (no unresolved ${DS_INFLUXDB});
- (optional) a sample panel query returns rows.
This catches the "No data / panels missing" class of regression automatically. It is not required for Tiers 0–3 and is skipped if GRAFANA_TOKEN is absent.
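The first two assertions could be checked roughly as follows. This is a sketch under stated assumptions: it treats home-intelligence-v1 as the dashboard uid (the document does not say whether it is a uid or a title), it uses Grafana's GET /api/dashboards/uid/:uid endpoint, which wraps the model under a "dashboard" key, and it only inspects top-level panels. In the design the request is issued from the HAOS host over SSH, since that is where the internal hostname resolves.

```python
import json
import urllib.request

GRAFANA_URL = "http://a0d7b954-grafana.local.hass.io:3000"
DASHBOARD_UID = "home-intelligence-v1"  # assumption: used as the uid


def fetch_dashboard(token: str, uid: str = DASHBOARD_UID,
                    base_url: str = GRAFANA_URL) -> dict:
    """Fetch the provisioned dashboard; an HTTP 404 raises, failing 'exists'."""
    request = urllib.request.Request(
        f"{base_url}/api/dashboards/uid/{uid}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def unresolved_panels(payload: dict) -> list[str]:
    """Titles of panels with a target whose datasource uid is empty or templated."""
    problems = []
    for panel in payload.get("dashboard", {}).get("panels", []):
        for target in panel.get("targets", []):
            datasource = target.get("datasource")
            uid = datasource.get("uid") if isinstance(datasource, dict) else None
            if not uid or "${" in uid:
                problems.append(panel.get("title", "<untitled>"))
    return problems
```

The CI step would fail when unresolved_panels(fetch_dashboard(token)) is non-empty, and skip entirely when GRAFANA_TOKEN is unset.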
7. Verification plan
- Run scripts/test.sh locally; confirm Tiers 0–3 pass on current main.
- Confirm Tier 1 check_config passes on homeassistant/home-assistant:2026.6.
- Negative tests: temporarily introduce (a) a YAML syntax error, (b) a duplicate automation id, (c) a now()-24h query, (d) a bogus entity reference; confirm each fails the right tier. Revert.
- Push to a branch / open a PR; confirm test runs and no deploy job runs on the PR.
- Confirm on merge to main that deploys run only after test is green.
8. Risks & mitigations
- Consolidation breaks deploys. Mitigation: port the exact deploy logic verbatim; verify each deploy job triggers via path filter; keep the old files in git history; first run watched.
- Tier 3 false positives (legit references to unpaired devices). Mitigation: explicit allowlist file.
- Snapshot staleness (entity renamed on the instance → repo references now "wrong"). Mitigation: make snapshot refresh + documented cadence; allowlist for transitional states.
- HA version drift (container ≠ live after an upgrade). Mitigation: single HA_VERSION variable; documented in CLAUDE.md to bump on upgrade.
- check_config network/hardware integrations. check_config validates schema without starting integrations; current config already passes, so low risk.
9. Growth path
- Tier 4 full-boot smoke (sanitized snapshot).
- Per-automation logic tests via pytest-homeassistant-custom-component or a booted container with state injection.
- Lovelace deeper schema checks (card types, referenced entities per card).
- Auto-refresh of the entity snapshot on a schedule (cron workflow opening a PR).