Schema Change Policy

How to change a production SQL schema without breaking the app mid-deploy or mid-rollback. Applies to PRGJSMES, project-explorer, erp-migration-tool, and any other PSI app running against Azure SQL.

Why this policy exists

Our deploy pattern is blue-green via slot swap: a new build goes to staging, we swap it into production atomically, and if anything goes wrong we swap back — which leaves the previous build still alive in the staging slot for ~30-second rollback.

This works perfectly for code changes. For schema changes, it only works if both the old and new code versions work against the new schema simultaneously. If a migration drops a column the old code still reads, the rollback slot will immediately crash when traffic hits it.

Schema changes are the single biggest cause of “we can’t roll back this release” incidents. This policy exists to keep rollback always available.

The two categories every change falls into

✅ Additive — one deploy, always safe

ChangeWhy it’s safe
CREATE TABLE NewThing (...)Old code never references it; new code does.
ALTER TABLE X ADD NewCol ... NULLNULL or default provided → old code doesn’t need to know about it.
CREATE INDEX IX_X_Y ON X(Y)Transparent to code; queries just get faster (or slower if badly designed).
New row in a seed tableReads pick it up; writes unaffected.

Ship it. No special ceremony.

❌ Destructive / type-changing — requires expand-contract across multiple deploys

ChangeWhy it’s dangerous
ALTER TABLE X DROP COLUMN BarAny code referencing Bar crashes instantly.
DROP TABLE XSame, at table level.
ALTER TABLE X ALTER COLUMN Bar NVARCHAR(50) → INTType change breaks reads and writes that used the old type.
EXEC sp_rename 'X.OldCol', 'NewCol'Same as drop from the old name’s perspective.
Adding a NOT NULL column without a defaultOld code doing INSERT without that column fails.
ALTER TABLE X ADD CONSTRAINT ... that rejects existing dataFails midway through deploy.

None of these can ship in one deploy without breaking rollback.

The expand-contract pattern

For any destructive or type-changing schema update, split it into at least two deploys:

┌──────────────┐       ┌──────────────┐       ┌──────────────┐
│ Deploy 1     │       │ Deploy 2     │       │ Deploy 3     │
│ EXPAND       │   →   │ MIGRATE      │   →   │ CONTRACT     │
│              │       │              │       │              │
│ Add the new, │       │ Backfill.    │       │ Drop the old │
│ keep the old.│       │ Flip reads.  │       │ thing.       │
│ Write both.  │       │ Still write  │       │              │
│              │       │ both.        │       │              │
└──────────────┘       └──────────────┘       └──────────────┘
   rollback-safe          rollback-safe          rollback-safe

Key property: at every point, both the old code and the new code can run against the schema. Rollback always works.

Worked example: renaming PartsReceivedQtyReceived

Deploy 1 (EXPAND):

-- Migration
ALTER TABLE Orders ADD QtyReceived INT NULL;   -- ADDITIVE: widening
UPDATE Orders SET QtyReceived = PartsReceived; -- backfill once
// Code writes to both, reads from the old column
order.PartsReceived = qty;
order.QtyReceived = qty;  // dual-write
// ... reads: ctx.Orders.Select(o => o.PartsReceived) ...

After Deploy 1: both columns exist and match. Old code still works (ignores QtyReceived); new code dual-writes. Rollback = re-swap slots; previous code still reads PartsReceived which is still populated. ✅

Deploy 2 (MIGRATE — switch reads):

// Code now reads from the new column; still writes to both
// ... reads: ctx.Orders.Select(o => o.QtyReceived) ...
// Still writes both (in case we need to roll back)
order.PartsReceived = qty;
order.QtyReceived = qty;

After Deploy 2: new code reads QtyReceived. Rollback = previous deploy, which also dual-writes. ✅

Deploy 3 (CONTRACT — drop dual-write, then drop column):

This deploy can be split further into 3a (stop writing the old column) and 3b (actually drop it), but often it’s fine to combine them after a stable period.

// Code only writes the new column
order.QtyReceived = qty;
-- BREAKING: removing old column after 2 deploys of dual-write
ALTER TABLE Orders DROP COLUMN PartsReceived;

After Deploy 3: column gone. Rollback to Deploy 2 code is still safe because Deploy 2’s dual-write writes to both, but now PartsReceived doesn’t exist — so the dual-write fails silently or throws. Rollback window closes at this point.

Minimum time between phases: at least one stable day per phase, preferably a week.

When in doubt, choose the safer path

SituationPolicy
”It’s just a typo fix in a column name”Still expand-contract. Code that reads the old name will break.
”Nobody uses that column”Are you sure? Check wwwroot search, audit logs, reports, Power BI queries, integrations. Then still do expand-contract in case your audit missed something.
”We need this change today for a P1 incident”Do the minimum additive fix now. Full migration later under normal process.
”The old code is the broken one — I want it gone”Expand-contract is still required. Broken-old-code rollback is still preferable to a hosed database.

The SQL linter enforces this

scripts/lint-sql-scripts.js (runs on every PR) fails CI on:

  • ALTER TABLE ... DROP COLUMN ... without a preceding -- BREAKING: <reason> comment
  • DROP TABLE ... without -- BREAKING:
  • ALTER TABLE ... ALTER COLUMN ... without either -- BREAKING: or -- ADDITIVE: widening only
  • EXEC sp_rename ... without -- BREAKING:
  • Missing IF [NOT] EXISTS / IF OBJECT_ID guards on CREATE/DROP

The -- BREAKING: comment is a forcing function — you can’t accidentally ship a destructive migration; you have to acknowledge it in the SQL file itself. Reviewers (and Copilot) then see the comment and can challenge whether the expand-contract plan is actually followed.

The PR template prompts for this

When you open a PR touching sql/, the PR template asks:

  • Is this additive-only, or breaking?
  • If breaking: list the expand-contract deploys
  • Is a rollback SQL script attached?
  • Has it been tested on a sandbox DB?

Answer these honestly — they’re the checklist that catches the kind of mistake that takes down Line 1.

CODEOWNERS forces extra review on schema changes

Any PR touching sql/**, PRGJSMES.API/Models/**, or PRGJSMES.API/Data/** auto-requests @ADevereaux as required reviewer (in addition to Dakota for default ownership). Schema changes can’t merge without extra eyes.

Special cases that have bitten us before

Permission seed drift

Adding a permission check in code ([RequirePermission("foo.bar")] or <ProtectedRoute permission="foo.bar">) without a matching SQL seed row causes admins to silently lose that capability. The scripts/check-permission-sync.js CI check catches this automatically — it fails if code references a permission code that isn’t in sql/fix_missing_permissions.sql + sql/FULL_MIGRATION.sql.

Rule: every new permission check needs 2 SQL updates (fix_missing_permissions.sql + FULL_MIGRATION.sql) + role assignments in both.

Permission seeds ARE seed data, not schema

Technically permission seeds are adding rows, not changing schema. But they’re treated as “schema-critical” because missing rows cause silent breakage. Same review level.

ASP.NET Core EF Core “shadow properties”

EF Core can silently infer columns that aren’t in the SQL schema (shadow properties — when you reference a nav without an FK column). If the migration doesn’t create the actual column, queries fail at runtime with cryptic “invalid column name” errors. Always verify EF model changes have matching SQL.

DELETE FROM X without WHERE

Treat bulk DELETEs as destructive — add a -- BREAKING: comment even though the linter doesn’t strictly require it for DELETE. Consider a soft-delete (add DeletedAt column) if the data might be needed for audit.

If you’re not sure, ask before coding

The checklist:

  1. Can this be done additively? Yes → ship.
  2. If no, can I decompose into 2–3 expand-contract deploys? Yes → plan them out in the PR description.
  3. If no, what am I missing? Bring it to Adam / Dakota before writing the migration.
  • PRGJSMES application
  • PRGJSMES/sql/README.md — where migrations live + naming convention
  • PRGJSMES/docs/DEVELOPMENT.md — full developer workflow
  • PRGJSMES/scripts/lint-sql-scripts.js — what the linter catches, with code

Last updated: 2026-04-24.