Design: Comprehensive HA Pre-Deploy Test Suite

Date: 2026-06-07
Status: Draft, awaiting review
Goal: A layered, growable test suite that validates every change against a containerized Home Assistant instance (matching the live version) before it deploys to the live HAOS instance, so that a change failing any check is never deployed to the live system. Tests gate deployment.

1. Background & current state

Config is edited in this repo and auto-deployed to live HAOS by GitHub Actions on push to main (deploy.yml, deploy-grafana.yml, deploy-lovelace.yml).
A validate.yml workflow already runs containerized check_config (via frenck/action-home-assistant, with dummy secrets), and it passes. However, it runs in parallel with the deploy workflows, so it does not block a broken config from deploying. This is the core gap.
Deployable YAML uses only core integrations (default_config, frontend, automation, script, scene, homekit, influxdb, template, mqtt, input_boolean, input_button) — no custom_components. So containerized check_config works against a clean image.
Live instance: HA 2026.6.1.
The HA MCP server is connected in the authoring environment (ha_list_entity_registry, ha_list_states), so the real entity registry can be snapshotted without new credentials.

2. Requirements

Gate deploys on tests — no deploy job runs unless the test job is green.
Containerized HA check before live deploy — run check_config on the official HA image pinned to the live version.
Comprehensive + growable — organized so new checks are cheap to add over time.
Catch real regressions — especially renamed/deleted entity references and the classes of bug seen during the Grafana work (unresolved datasource templates, hardcoded time windows).
Run locally — a single command reproduces the CI suite before pushing.
No new mandatory access for the initial scope (Tiers 0–3).

Out of scope (initial)

Full-boot smoke test with a sanitized .storage snapshot (Tier 4 — future; needs a backup upload).
Runtime/behavioral assertions of automation logic (would require a booted instance with real or mocked entities + state injection).

3. Scope decisions (approved)

Tiers 0–3.
Consolidate validate.yml + the three deploy workflows into a single ci.yml (test → gated deploys).
Local runner: scripts/test.sh + Makefile.

4. Architecture

4.1 File layout

.github/workflows/ci.yml          # replaces validate.yml, deploy.yml, deploy-grafana.yml, deploy-lovelace.yml
tests/
  conftest.py                      # fixtures: parse all yaml once; load entity snapshot
  test_static.py                   # Tier 0
  test_structure.py                # Tier 2
  test_entity_references.py        # Tier 3
  fixtures/
    entities.txt                   # committed snapshot of real entity_ids (one per line)
    entities_allow.txt             # allowlist of intentional references not yet in the registry (Tier 3)
    ci_secrets.yaml                # dummy secrets used by check_config
  requirements.txt                 # pytest, pyyaml, jsonschema, yamllint
.yamllint                          # relaxed config (tolerates HA !include/!secret tags)
scripts/
  test.sh                          # local: Tiers 0-3 incl. docker check_config
  refresh-entity-snapshot.sh       # regenerate entities.txt from live HA REST API (run occasionally w/ token)
Makefile                           # make test / make check-config / make snapshot / make lint

4.2 The tiers

Tier 0: Static (pytest + yamllint, no container).
- yamllint across all YAML (relaxed: line length off, HA custom tags allowed).
- JSON validity: lovelace/home_command_dashboard.json, grafana/dashboard.json.
- Grafana regression guards (codify today's bugs):
  - no literal ${DS_INFLUXDB} remaining in grafana/dashboard.json;
  - no __inputs import block in the provisioned dashboard;
  - no panel query containing now()- (must use $timeFilter);
  - every panel target has a non-empty datasource uid.
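The four Grafana guards above could be expressed as a single pure function that a pytest case calls on the dashboard file. This is a minimal sketch, not the suite's actual code: the function names are hypothetical, and it assumes each panel target keeps its raw query text under a `query` key and its datasource as an object with a `uid` field.

```python
import json
import re

# Matches a hardcoded relative window such as "now()-24h" or "now() - 7d".
NOW_WINDOW = re.compile(r"now\(\)\s*-")


def iter_targets(dashboard):
    """Yield (panel_title, target) pairs, descending into nested row panels."""
    stack = list(dashboard.get("panels", []))
    while stack:
        panel = stack.pop()
        stack.extend(panel.get("panels", []))  # collapsed rows nest their panels
        for target in panel.get("targets", []):
            yield panel.get("title", "<untitled>"), target


def grafana_guard_violations(raw_json):
    """Return a list of human-readable guard violations for a dashboard JSON string."""
    violations = []
    if "${DS_INFLUXDB}" in raw_json:
        violations.append("literal ${DS_INFLUXDB} placeholder present")
    dashboard = json.loads(raw_json)
    if "__inputs" in dashboard:
        violations.append("__inputs import block present")
    for title, target in iter_targets(dashboard):
        # Assumption: the raw query string lives under the "query" key.
        query = target.get("query") or ""
        if NOW_WINDOW.search(query):
            violations.append(f"{title}: hardcoded now()- window (use $timeFilter)")
        datasource = target.get("datasource")
        uid = datasource.get("uid") if isinstance(datasource, dict) else None
        if not uid:
            violations.append(f"{title}: target has no datasource uid")
    return violations
```

A test in tests/test_static.py could then read grafana/dashboard.json and assert that the returned list is empty, so a failure message names every violated guard at once.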

Tier 1: Containerized HA check_config (version-pinned).
- docker run --rm -v <repo>:/config homeassistant/home-assistant:${HA_VERSION} python -m homeassistant --script check_config --config /config
- ${HA_VERSION} is defined once (pinned to the live version, 2026.6.1); bump it when the live instance upgrades. Falls back to the 2026.6 minor stream if a patch tag is unavailable.
- tests/fixtures/ci_secrets.yaml is copied to secrets.yaml first; the themes/ placeholder already exists in the repo.
- Identical command in CI and scripts/test.sh, so local == CI.
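One way to keep the Tier 1 command identical everywhere is to build it in one place. The sketch below is illustrative only: `check_config_command` and `DEFAULT_HA_VERSION` are hypothetical names, and it assumes the version is supplied through an `HA_VERSION` environment variable with the documented live version as the default.

```python
import os

# Assumption: default mirrors the live version stated in this document.
DEFAULT_HA_VERSION = "2026.6.1"


def check_config_command(repo_dir, ha_version=None):
    """Build the docker argv for the containerized check_config run."""
    version = ha_version or os.environ.get("HA_VERSION", DEFAULT_HA_VERSION)
    return [
        "docker", "run", "--rm",
        "-v", f"{os.path.abspath(repo_dir)}:/config",
        f"homeassistant/home-assistant:{version}",
        "python", "-m", "homeassistant",
        "--script", "check_config", "--config", "/config",
    ]
```

A caller could pass the result to `subprocess.run(..., check=True)`; building the argument list rather than a shell string avoids quoting problems when the repo path contains spaces.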

Tier 2: Structural pytest (parses YAML, no container).
- Every automation has a unique id and a non-empty alias; no duplicate ids/aliases.
- Helpers referenced by automations/scripts exist in configuration.yaml (input_boolean.*, input_button.*).
- Known-bad references fail the build (seeded from CLAUDE.md, e.g. person.lindsay_saady).
- TOU template sensors (sensor.tou_period, sensor.tou_rate) and input_boolean.peak_mode are present.
- Scenes/scripts referenced by automations exist.
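The id/alias rule above is easy to check once automations.yaml has been parsed into a list of dicts. The following is a sketch under that assumption; the function name is hypothetical, and YAML loading (including tolerance for HA's !include and !secret tags) is left to the conftest.py fixture.

```python
from collections import Counter


def automation_id_alias_problems(automations):
    """Report missing or duplicated id/alias values in a parsed automations list."""
    problems = []
    for index, automation in enumerate(automations):
        if not automation.get("id"):
            problems.append(f"automation #{index}: missing id")
        if not str(automation.get("alias") or "").strip():
            problems.append(f"automation #{index}: missing or empty alias")
    for field in ("id", "alias"):
        # str() so that numeric ids parsed from YAML compare like string ids.
        counts = Counter(str(a[field]) for a in automations if a.get(field))
        for value, count in sorted(counts.items()):
            if count > 1:
                problems.append(f"duplicate {field}: {value!r} appears {count} times")
    return problems
```

Returning every problem rather than failing on the first keeps one test run useful when several automations are affected.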

Tier 3: Entity-reference regression (pytest + committed snapshot).
- tests/fixtures/entities.txt = snapshot of the live entity registry (generated via MCP now; refreshed via refresh-entity-snapshot.sh).
- Extract every entity_id referenced across automations.yaml, scripts.yaml, configuration*.yaml templates, and the dashboards.
- Assert each referenced entity exists in the snapshot. Failures list the offending entity + file.
- Allowlist mechanism (tests/fixtures/entities_allow.txt) for intentional references not yet in the registry (e.g., not-yet-paired devices), so the suite never blocks legitimate forward work; every allowlisted item remains explicit and visible.
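The extract-and-compare step might look roughly like the sketch below. It is a heuristic, not the definitive extractor: the function names are hypothetical, it scans raw text with a regular expression instead of walking parsed YAML, it skips `service:`/`action:` lines so that service names such as light.turn_on are not mistaken for entities, and it only recognizes domains that already appear in the snapshot or allowlist.

```python
import re

# A dotted token that is not part of a longer dotted path (e.g. trigger.to_state.state).
CANDIDATE = re.compile(r"(?<![\w.])([a-z_]+\.[a-z0-9_]+)(?![\w.])")
# Lines that name a service call rather than an entity.
SERVICE_LINE = re.compile(r"^\s*-?\s*(service|action):")


def referenced_entities(text, domains):
    """Collect entity-like tokens whose domain is one of the known domains."""
    found = set()
    for line in text.splitlines():
        if SERVICE_LINE.match(line):
            continue
        for token in CANDIDATE.findall(line):
            if token.split(".", 1)[0] in domains:
                found.add(token)
    return found


def missing_entities(files, snapshot, allowlist=()):
    """Return sorted (path, entity_id) pairs not covered by snapshot or allowlist.

    files maps a file path to its text content.
    """
    known = set(snapshot) | set(allowlist)
    # Limitation of this sketch: a reference in a domain absent from the
    # snapshot is not detected, because its domain is unknown here.
    domains = {entity.split(".", 1)[0] for entity in known}
    missing = []
    for path, text in sorted(files.items()):
        for entity in sorted(referenced_entities(text, domains) - known):
            missing.append((path, entity))
    return missing
```

Because the result pairs each entity with its file, a failing assertion can print exactly the "offending entity + file" list the tier calls for.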

4.3 Pipeline (`ci.yml`)

Triggers: push to main, pull_request to main, workflow_dispatch.

jobs:
  changes:          # dorny/paths-filter@v3 → outputs: ha_config, grafana, lovelace
  test:             # ALWAYS runs. python setup → pip install -r tests/requirements.txt
                    #   → pytest tests/ (Tiers 0,2,3) → docker check_config (Tier 1)
  deploy-ha:        # ports deploy.yml logic (reload vs restart)
    needs: [changes, test]
    if: github.event_name != 'pull_request' && needs.changes.outputs.ha_config == 'true'
  deploy-grafana:   # ports deploy-grafana.yml
    needs: [changes, test]
    if: github.event_name != 'pull_request' && needs.changes.outputs.grafana == 'true'
  deploy-lovelace:  # ports deploy-lovelace.yml
    needs: [changes, test]
    if: github.event_name != 'pull_request' && needs.changes.outputs.lovelace == 'true'

- On pull_request, only changes + test run (no deploy), so PRs are validated without deploying.
- Concurrency group preserved so deploys never overlap.
- All existing secrets reused (TAILSCALE_*, HAOS_SSH_KEY, HAOS_HOST, HA_TOKEN).

4.4 Local runner

make test → runs scripts/test.sh: pip install (venv), yamllint, pytest tests/, then Docker check_config.
make check-config → Tier 1 only. make lint → Tier 0 only. make snapshot → refresh-entity-snapshot.sh.
scripts/test.sh degrades gracefully if Docker is absent (skips Tier 1 with a clear warning; CI always runs it).

5. What it does and does not catch (honesty)

Catches: YAML syntax errors; invalid automation/script/template/integration schemas (against the real HA version); JSON corruption in dashboards; the Grafana datasource/time-window regressions; duplicate/missing automation ids; references to helpers/entities that don't exist; known-bad entity names.

Does NOT catch (initially): automation logic correctness; entities that exist but are unavailable at runtime; anything configured only in .storage (UI integrations) — those aren't in the deployed YAML; behavior requiring a booted instance. Tier 4 (future) addresses the boot/runtime gap.

6. Access required

Tiers 0–3: none. Entity snapshot is generated via the already-connected MCP; refresh-entity-snapshot.sh (for later refreshes) uses the HA REST API with the existing HA_TOKEN (verified working against core /api/ → 200).
Grafana verification (add-on layer): a Grafana service-account token as repo secret GRAFANA_TOKEN.
Verified by experiment: a Home Assistant token does not reach Grafana's API — Grafana is behind HA ingress + an auth-proxy IP allowlist, and a bearer token gets 401 at ingress. So Grafana needs its own credential.
A Grafana service-account bearer token authenticates on Grafana's normal auth path and works directly against the internal API (http://a0d7b954-grafana.local.hass.io:3000) from CI-over-SSH, bypassing both HA ingress and the auth-proxy IP allowlist.
Created in Grafana UI: Administration → Users and access → Service accounts → Add (role Viewer) → Add token; stored as repo secret GRAFANA_TOKEN.
Future Tier 4: a sanitized HA backup/.storage snapshot to seed a full boot.
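The later snapshot refresh over the HA REST API, mentioned in the Tiers 0–3 item above, could follow the sketch below. It assumes the documented `/api/states` endpoint with a bearer token; the function names and the `base_url` parameter are hypothetical. Note that `/api/states` returns entities that currently have a state, so its output may differ slightly from the entity registry listing obtained via MCP.

```python
import json
import urllib.request


def fetch_states(base_url, token):
    """GET /api/states from Home Assistant using a long-lived access token."""
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/states",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def snapshot_text(states):
    """Render state objects as sorted, de-duplicated entity_ids, one per line."""
    entity_ids = sorted({state["entity_id"] for state in states})
    return "\n".join(entity_ids) + "\n"
```

Sorting the output keeps diffs of tests/fixtures/entities.txt small and reviewable when the snapshot is regenerated.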

6a. Grafana post-deploy verification (add-on layer)

After deploy-grafana, a CI step SSHes to HAOS and queries Grafana's internal API with GRAFANA_TOKEN to assert the dashboard actually provisioned correctly:
- dashboard home-intelligence-v1 exists and is current;
- every panel target resolves to a real datasource uid (no unresolved ${DS_INFLUXDB});
- (optional) a sample panel query returns rows.

This catches the "No data / panels missing" class of regression automatically. It is not required for Tiers 0–3 and is skipped if GRAFANA_TOKEN is absent.
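The datasource-resolution assertion could be a small pure function applied to the API responses. This sketch assumes the dashboard payload has the shape returned by Grafana's `GET /api/dashboards/uid/<uid>` (a top-level `dashboard` object) and that the datasource list comes from `GET /api/datasources`; the function name is hypothetical, and nested row panels are ignored for brevity.

```python
def unresolved_targets(dashboard_payload, datasources):
    """Return (panel_title, uid) pairs whose datasource uid is unknown to Grafana."""
    known_uids = {ds["uid"] for ds in datasources}
    dashboard = dashboard_payload["dashboard"]
    problems = []
    for panel in dashboard.get("panels", []):
        for target in panel.get("targets", []):
            datasource = target.get("datasource")
            # A datasource may be an object with a uid or a bare string placeholder.
            uid = datasource.get("uid") if isinstance(datasource, dict) else datasource
            if uid not in known_uids:
                problems.append((panel.get("title", "<untitled>"), uid))
    return problems
```

An unresolved `${DS_INFLUXDB}` placeholder shows up here as a uid that matches no provisioned datasource, which is the "No data" symptom this step is meant to detect.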

7. Verification plan

Run scripts/test.sh locally; confirm Tiers 0–3 pass on current main.
Confirm Tier 1 check_config passes on homeassistant/home-assistant:${HA_VERSION} (2026.6.1, or the 2026.6 fallback tag if the patch tag is unavailable).
Negative tests: temporarily introduce (a) a YAML syntax error, (b) a duplicate automation id, (c) a now()-24h query, (d) a bogus entity reference — confirm each fails the right tier. Revert.
Push to a branch / open a PR; confirm test runs and no deploy job runs on the PR.
Confirm on merge to main that deploys run only after test is green.

8. Risks & mitigations

Consolidation breaks deploys. Mitigation: port the exact deploy logic verbatim; verify each deploy job triggers via its path filter; keep the old files in git history; watch the first run closely.
Tier 3 false positives (legit references to unpaired devices). Mitigation: explicit allowlist file.
Snapshot staleness (entity renamed on the instance → repo references now "wrong"). Mitigation: make snapshot refresh + documented cadence; allowlist for transitional states.
HA version drift (container ≠ live after an upgrade). Mitigation: single HA_VERSION variable; documented in CLAUDE.md to bump on upgrade.
check_config fails on network/hardware integrations. Mitigation: check_config validates schema without starting integrations, and the current config already passes, so the risk is low.

9. Growth path

Tier 4 full-boot smoke (sanitized snapshot).
Per-automation logic tests via pytest-homeassistant-custom-component or a booted container with state injection.
Lovelace deeper schema checks (card types, referenced entities per card).
Auto-refresh of the entity snapshot on a schedule (cron workflow opening a PR).