CSM Board

The PSI Claude Work Board: a hosted dashboard for every developer’s Claude Code work across all of PSI. It covers open Issues, PRs, and Azure DevOps work items, spawned Claude sessions, per-project todos, and a one-click “start Claude” on any issue.

  • Live: board.progressivesurface.com
  • How to join: install PSI.CSM via WinGet, run csm agent.
  • Repos: hosted board → ProgressiveSurface/csm-board. Per-developer agent → ProgressiveSurface/claude-session-manager.

What it shows you

Sign in at board.progressivesurface.com with your PSI account. The top nav has Today (personal todos due now, cross-project), Projects, Epics, By Attention, and Mission (the master agent). The main ones:

Projects (default)

Repo-list sidebar (filter, sort by open work / name / activity, “hide empty” toggle) → detail pane for the selected project:

  1. Repo header — branch, dirty / unpushed counts, last commit
  2. Open Pull Requests — pulled live from GHE via gh pr list. Each PR row shows “opened by @<login>” alongside its draft flag, labels, and assignees.
  3. Open Issues: same, via gh issue list. If the agent can’t reach GHE (auth, rate-limit, gh not on PATH, timeout), the pane shows a “⚠ Couldn’t fetch issues from GHE: <reason>” warning instead of a misleading empty list, so a genuine “0 open issues” reads differently from a failed fetch.
  4. Azure DevOps Work Items — active items from ProgSurface / Pro App development, matched to projects by tag / title (or via an optional ~/.claude/board-ado-map.json on the agent). ADO is best-effort: if az is missing or broken it degrades to empty silently and never blocks the Issues / PRs above it.
  5. Todos — per-project, per-user, persisted on the board
  6. Spawned Sessions — live status of any Claude sessions you’ve started on this project, with one-click Open in Claude Code ↗ and ↻ Reconnect ↗ buttons

By Attention

Five-column kanban of every Claude session across all your projects, ordered by attention needed:

Mid-task (interrupted) · Waiting on you (ended on a question) · Uncommitted (work not yet pushed) · Idle (project with no sessions yet) · Done

For Monday-morning triage.

Stuck sessions clean themselves up. The board automatically retires a session that’s gone quiet — one whose workstation stopped reporting it (~15 min), or one that’s still “running” but has made no progress for hours (e.g. spawned, printed one line, then parked). These flip to ended so they stop cluttering the live view. Nothing is killed on your machine: the session’s bridge URL survives, you can ↻ resume it, and if you reopen its bridge and it actually starts working again, the board brings it back to running on its own.

Epics

An epic is a goal or initiative that groups work spanning repos — one level above a project (which on this board means a single repo or a manual project). Use epics to organize related todos and GHE issues under a shared objective and rank them by what matters now.

Each epic has:

  • a priority (P0–P3, same scale as todos — click the chip to cycle),
  • a status lifecycle — planned · active · paused · done · cancelled,
  • a manual order (rank the list with the ▲▼ buttons; top = work first),
  • a progress roll-up — done vs. total across its todos and issues,
  • an optional primary repo — see “Giving an epic a repo” below.

Membership is set from where the work lives, not from the epic form:

  • Todos can be created directly on an epic — expand the epic and type a line into “Add a todo to this epic”; it’s captured under the goal with no repo required. These repo-less todos still schedule and surface in Today (with a (no repo) chip). They can ▶ start a session right away: with no repo bound the session runs in a scratch workspace; bind a primary repo to run in real code instead (see below).
  • Existing todos also join via the 🎯 epic chip on any todo row (in Today, in a project’s Todos, or inside the epic itself). Pick an epic to file it, or “remove from epic” to unfile. Deleting an epic never deletes its todos — they’re just unfiled.
  • Issues / PRs join via the “+ epic” link on any issue row in a project’s issue list. The epic stores a reference; the board resolves each attached issue’s live state against the current overlay: a green dot means still open, grey means closed/gone, amber means “no agent currently reports that repo”.

Click anywhere on an epic’s header row to expand it (not just the little chevron) and see/curate its members; the same spawn / promote / priority / schedule controls work on the todo rows there. Hover over the epic to rename it.

Starting work — with or without a repo

A Claude session needs a working directory, but a repo-less epic todo doesn’t need you to set one up first:

  • ▶ start on any repo-less epic todo starts a session immediately. With no repo bound to the epic it runs in a scratch workspace (a plain C:\git\<epic-slug>\ folder the agent creates — no GHE repo, no clone), pre-briefed with the epic goal as context plus the todo as the task.
  • ▶ start whole epic (in the Todos header) starts one session that works all open todos on the epic together under the goal.
  • Starting work on a planned epic auto-moves it to active; when every todo is closed, the board offers a one-click “mark epic done”.

Giving an epic a repo (optional)

Binding a primary repo is optional — it gives the epic a real code home so sessions run there instead of a scratch folder. Expand the epic and use the Repo row:

  • Create repo — makes a brand-new private GHE repo under ProgressiveSurface (name pre-filled from the epic title; edit freely). Your connected csm agent runs gh repo create … --clone, so the repo is created on GHE and cloned to your machine in one step, then bound. (Needs a recent PSI.CSM agent build; until that ships, use Link existing.)
  • Link existing — pick any repo the board already knows and bind it.
  • unlink — detach the repo (the repo itself is never deleted).

Once a repo is bound, ▶ start on the epic’s todos runs there instead of a scratch workspace.

Sessions on an epic

Expand an epic and it tracks the sessions spawned for it — both those started from its todos and the “whole epic” session. Each shows ↗ open (jump back into a live session’s bridge) or ↻ resume (relaunch an ended one via claude --resume). Because it reads board session state, the list survives page reloads and dropped bridges, so you can always get back into the work.

Spawning Claude on an issue

Click “▶ start Claude ↗” on any open Issue or PR. The board sends a spawn request through your connected csm agent, which runs claude --remote-control "<project> #<issue>" in the repo directory on your machine, pre-briefed with the issue title and URL.

A new browser tab opens to https://claude.ai/code/session_<id> — that’s the live session, driveable from any device you’re signed into claude.ai on. The actual claude process runs on your workstation, under your Anthropic auth, with full access to your file system — the board never runs Claude itself.

Repo not cloned on your machine? The button reads “↓ clone & start ↗”. Click it, and the agent runs gh repo clone ProgressiveSurface/<repo> into your project root (typically C:\git\<repo>) before spawning. Single click.

Alt-click for an unattended (claude -p) headless run instead — useful for “go fix this and report back” workflows. Bounded by --max-turns 15 and --max-budget-usd 2.00 by default.
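As a rough illustration of how the two launch modes differ, the sketch below assembles an argument list from the flags named above. The function name and the argv layout are hypothetical, and passing the short "<project> #<issue>" label as the headless prompt is an assumption; the real agent sends a fuller briefing.

```python
def build_claude_argv(project, issue, headless=False,
                      max_turns=15, max_budget_usd=2.00):
    """Return an argv list for an interactive or headless claude run (sketch)."""
    label = f"{project} #{issue}"
    if headless:
        # Alt-click path: unattended run, bounded by the default turn and budget caps.
        return ["claude", "-p", label,
                "--max-turns", str(max_turns),
                "--max-budget-usd", f"{max_budget_usd:.2f}"]
    # Default path: interactive session reachable over a claude.ai/code bridge.
    return ["claude", "--remote-control", label]
```

The caps are plain parameters here so that a caller could tighten them per spawn; whether the board exposes that is not stated above.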

What every spawn prompt carries

Whatever you (or the Board Manager) type as the task, the board wraps it with standing context before the worker sees it, so a cold session starts oriented:

  • About this project — a short, board-maintained blurb of current orientation a worker should know before planning (e.g. “active pilot — the agent half ships to main, not the release branch”). Edit it with the about button on a project card; the Board Manager can keep it current too (set_project_about). It’s deliberately short — a few sentences, not a second onboarding doc.
  • Read the repo’s docs first — a standing pointer telling the worker to read the repo’s CLAUDE_ONBOARDING.md (or CLAUDE.md) for orientation and BUILD_LOG.md for what shipped recently. The build log is pointed to, never pasted in — it belongs in the worker’s first read, not its prompt.
  • Ownership framing + PSI standards — chain-of-custody framing (own pre-existing code, don’t stop at “not my code”) and the GHE-only / compliance / finish-to-deployment baseline.

The “About” blurb is the half you maintain per project; the rest is automatic. Master-spawned workers additionally get the non-negotiable git guardrails (see below).

Picking a model (difficulty)

Next to every start / resume button is a small model picker. It sets the Claude model the spawned session runs on (claude --model), framed by how hard the task is:

Pick | Model | Use for
Auto | (Claude Code default) | When you don’t care; leaves the choice to Claude Code
Fable · hardest | Fable 5 | Architectural / multi-repo / subtle work where a wrong call is expensive
Opus · complex | Opus | Real feature work or debugging
Sonnet · std | Sonnet | Well-scoped, clearly-specified work (a solid default)
Haiku · simple | Haiku | Trivial mechanical edits: renames, doc tweaks, one-liners

Your choice is remembered across buttons, so set it once and every spawn you start uses it until you change it. The session card shows which model a session was launched on. Picking a smaller model for simple work saves cost and time; reserve Fable/Opus for genuinely hard tasks. Resuming a session can pick a different model than the original run — handy to escalate to a bigger model when work turns out harder than expected.

The same picker drives the Board Manager launch in Mission Control: choose which model the manager session itself runs on. The Board Manager, in turn, picks an appropriate model for each worker it dispatches (it assesses the task’s difficulty and passes a model to its spawn tools).

Spawn progress

While a spawn is in flight, the row under the issue shows a live progress chip (routing → queued → cloning → launching → bridge-wait) plus a heartbeat (“Still waiting on <machine>… 32s”) so you can tell the difference between “the agent is busy cloning a 500 MB repo” and “the agent fell over”. A fresh clone of a large PSI repo takes the full ~30–60 s; the heartbeat updates every 5 s so you know the channel is alive.

The same progress signal surfaces on the Reconnect, Resume, and “Improve the Board” buttons.

Spawning Claude from a Todo

Click “▶ start” on any open Todo to spawn a Claude session with the todo text as the first prompt. Once the session is live, the row flips to show a pulsing live session chip and the button switches to “↗ open” linking straight to the live bridge, so the todo is visibly tied to its session instead of orphaned next to a “Spawned Sessions” panel below.

Bundling multiple items into one session

Each open Issue, PR, and Todo row has a small select checkbox (labeled so it isn’t confused with the row’s ✓ Close button, which marks a single todo done). Tick any combination within the same project and a green action bar appears at the top of the pane:

  • ▶ Start Claude (N) — spawns one session with a single onboarding prompt that lists every selected item and asks Claude for one combined plan. Faster than launching N separate sessions, and the agent only clones the repo once.
  • ↳ Assign to session ▾ — drops down a list of live sessions in the same project. Picking one routes the bundled items into that session (see “Assigning work into a running session” below). The button only appears when there’s at least one running session to target.
  • ✕ clear — drops the selection.

Bundling is single-project on purpose — a Claude session runs in one repo, so Today’s cross-project view doesn’t expose bundle checkboxes. The selection is per-project too; switching projects in the sidebar resets it.

Assigning work into a running session

The board has two paths for delivering new work into a session that’s already running:

  1. Resume-with-prompt (always available). Spawns claude --resume <id> "<bundled prompt>". The previous claude.ai/code bridge URL goes away and a fresh one opens with the conversation continued. Works whether the target session is busy or idle. “↳ Assign to session” uses this path whenever the target session is not running.

  2. Stop-hook inbox (pilot, csm-board only at the moment). Each project that opts in has a .claude/hooks/check-inbox.py Stop hook plus a .claude-inbox/ directory. Writing a .md or .txt file there causes the running session to pick it up as a new user message at its next turn boundary — no new process, no bridge churn.

    Operable by hand today — echo "do the thing" > .claude-inbox/$(date -u +%Y%m%d-%H%M%S)-note.md in any csm-board checkout. The full board-driven flow (UI button → csm agent → file write on the right machine) is tracked work; see open Issues on csm-board.

The original plan was for “↳ Assign to session” to pick the path automatically: inbox when the target session is busy (no bridge disruption), resume-with-prompt when it’s idle (the Stop hook won’t re-fire on its own). That selection has since shipped, as described below.

The agent-side write path landed in PSI.CSM 1.10.19: POST /api/sessions/{id}/inject with body {"text": "..."} routes an inbox_inject over WS to the user’s agent, which drops a timestamped .md into the right project’s .claude-inbox/. Sessions spawned from the board are told (when the target project has the inbox installed) to expect mid-conversation inbox messages and fold them into the current plan instead of treating them as a new conversation.

The “↳ Assign to session” dropdown now picks the path automatically (shipped 2026-05-28). Each live session in the dropdown carries a small chip telling you up-front which path the click will use:

  • via inbox (blue) — target session’s status === "running". The agent writes into .claude-inbox/; the Stop hook re-prompts at the next turn boundary. The existing claude.ai/code bridge URL stays alive.
  • via resume (amber) — anything else (idle / starting / ended / unknown). Spawns claude --resume <id> "<prompt>" and opens a fresh bridge. The previous bridge URL goes away.

On an idle session inbox doesn’t work (the Stop hook can’t re-fire on its own to drain a queued file), so the UI deliberately routes through resume — predictable + immediate beats “queued indefinitely”. The same chip-and-tooltip pair appears in the Todos bundle bar.
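The chip rule above reduces to a single branch on session status. A minimal sketch, with a hypothetical function name:

```python
def delivery_path(status):
    """Pick how new work reaches a session, following the chip rules above."""
    # Only a running session will hit a turn boundary where its Stop hook
    # fires and drains the inbox.
    if status == "running":
        return "inbox"
    # idle / starting / ended / unknown: resume with the prompt instead,
    # which opens a fresh bridge.
    return "resume"
```

Treating every non-running status the same way keeps the behaviour predictable: an unrecognized status falls back to resume rather than queueing a file that may never be drained.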

All four delivery surfaces into a running session — this UI dropdown, the MCP inject_into_session(run_id, text) tool, the raw POST .../inject REST endpoint, and a hand-written file in the inbox dir — converge on the same .claude-inbox/<utc-ts>-<slug>.md write, drained by the same Stop hook.
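The shared write can be sketched as follows. The timestamp layout mirrors the hand-written echo example; the slug rule, the helper names, and the "note" fallback are assumptions made for illustration, not the agent's actual code.

```python
import re
from datetime import datetime, timezone
from pathlib import Path


def slugify(text, max_len=40):
    """Lowercase, hyphen-separated slug (hypothetical rule: no uppercase, no underscores)."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return slug[:max_len].strip("-") or "note"


def write_inbox_drop(project_dir, text, now=None):
    """Write text to <project_dir>/.claude-inbox/<utc-ts>-<slug>.md and return the path."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%d-%H%M%S")  # same layout as the manual example
    inbox = Path(project_dir) / ".claude-inbox"
    inbox.mkdir(parents=True, exist_ok=True)
    stripped = text.strip()
    first_line = stripped.splitlines()[0] if stripped else ""
    path = inbox / f"{stamp}-{slugify(first_line)}.md"
    path.write_text(text, encoding="utf-8")
    return path
```

Leading the filename with a UTC timestamp means a plain lexical sort of the directory is also chronological order.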

Session-targeted drops (csm-board#13). The .claude-inbox/ is project-scoped: any running session in that repo drains an ordinary <ts>-<slug>.md file at its next turn boundary, which is fine when the work just needs some session in the project. But the master agent’s event-nudged wake targets one specific session — the master — and must not be intercepted by a sibling worker in the same repo. Those drops are named <ts>-FOR_<run_id>.md; the Stop hook only drains a FOR_<run_id> file when its own CSM_RUN_ID matches, and leaves it untouched otherwise so the addressed session still gets it. The FOR_ marker is uppercase + underscore — which the slug path can never emit — so ordinary messages are never mistaken for targeted ones. Agent-side writer shipped in PSI.CSM 1.11.12.
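The drain decision described above might look like the sketch below. The filename pattern (timestamp layout, run-id characters) and the function name are assumptions; in the real hook the second argument would come from the CSM_RUN_ID environment variable.

```python
import re

# <ts>-FOR_<run_id>.md; the timestamp layout and run-id characters are assumed.
TARGETED = re.compile(r"^\d{8}-\d{6}-FOR_(?P<run_id>[^.]+)\.md$")


def should_drain(filename, my_run_id):
    """Decide whether this session's Stop hook should consume an inbox file."""
    match = TARGETED.match(filename)
    if match is None:
        # Ordinary project-scoped drop: any running session may take it.
        return filename.endswith((".md", ".txt"))
    # Targeted drop: only the addressed session drains it; others leave it in place.
    return match.group("run_id") == my_run_id
```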

Background: Anthropic’s claude inject feature request (#24947) covers the same need with a first-party CLI flag. When it ships, the inbox pilot collapses into a thin wrapper around it.

Reporting work done — the outbox (.claude-outbox/)

The inbox delivers work into a session; the outbox is the return path — a finished session telling the board “I’m done.”

When a session completes, it drops a marker in its repo’s .claude-outbox/ directory. The csm agent watches every project’s outbox, drains the marker, and relays it to the board, which then:

  • flips the session card to a green ✓ Session reported done banner showing the one-paragraph summary the session wrote; and

  • if the marker asked to close the originating issue, surfaces a one-click ✓ Close <repo>#N button. Closing the GHE issue posts the session summary as a closing comment.

    Two paths to that close: the user clicks it (always available), or the master agent does it autonomously at spawn_capped+ tier — but only when the session’s outbox marker set close_issue: true (so the trust signal still comes from the session, not the master’s judgment). At the default Groom tier and below, the master never touches GHE — the click stays the human’s.

Close is a true close. Closing a session (either the row’s ✓ Close button or the ✓ Close <repo>#N issue button) now also reaps the session on your workstation: it kills the claude process and closes the terminal window that was opened for it, rather than just hiding the card. (Each board-spawned session runs in a console the agent owns via conhost, so the window can be closed deterministically regardless of whether your default terminal is Windows Terminal or the classic console host.) If your agent is offline or too old to reap, the close still applies as a hide and the window may linger until you close it manually. Reopening a closed card brings the card back but does not respawn the process.

Sessions spawned from the board already know to do this: the spawn prompt tells them to write .claude-outbox/done.json as their final action, in the form:

{
  "summary": "One short paragraph on what got accomplished.",
  "close_issue": true,
  "issue_number": 2
}

(close_issue/issue_number are optional — included only when the issue is genuinely resolved. A plain .md/.txt file is also accepted as a summary-only marker.)
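For a session or test script that writes the marker programmatically, a minimal sketch could look like this. The function name is hypothetical; the field names are the ones shown in the JSON above.

```python
import json
from pathlib import Path


def write_done_marker(repo_dir, summary, issue_number=None):
    """Write .claude-outbox/done.json; request an issue close only if a number is given."""
    marker = {"summary": summary}
    if issue_number is not None:
        # Optional fields: include them only when the issue is genuinely resolved.
        marker["close_issue"] = True
        marker["issue_number"] = issue_number
    outbox = Path(repo_dir) / ".claude-outbox"
    outbox.mkdir(parents=True, exist_ok=True)
    path = outbox / "done.json"
    path.write_text(json.dumps(marker, indent=2), encoding="utf-8")
    return path
```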

Git self-report (git_actions). Sessions spawned by the master agent are asked to add an honest summary of their git activity to done.json:

"git_actions": {"branch": "feature/x", "commits_added": 3,
                "pushed": true, "push_target": "feature/x", "force": false}

The board uses it as a master-autonomy guardrail (csm-board#10 layer 2): for master-spawned sessions only, a force-push (force: true), a push to main/master, or a commit onto main/master flags the card as a policy violation — the issue-close is withheld, the master may not auto-close or groom the card away, and a red policy_violation row lands in the WORM audit log. User-direct sessions report git_actions too but are never flagged (this is a master-autonomy guardrail, not a global lockout). The field is optional; omitting it just skips the check.
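The three flagged conditions can be expressed as a small check. This is a sketch under assumptions: the function name and reason strings are invented here, and "a commit onto main/master" is read as commits_added > 0 while branch is main or master.

```python
PROTECTED = {"main", "master"}


def policy_violations(git_actions, master_spawned):
    """Return the reasons a git_actions self-report breaks the master guardrail."""
    if not master_spawned or not git_actions:
        # User-direct sessions are never flagged; a missing field skips the check.
        return []
    reasons = []
    if git_actions.get("force"):
        reasons.append("force-push")
    if git_actions.get("pushed") and git_actions.get("push_target") in PROTECTED:
        reasons.append("push to protected branch")
    if (git_actions.get("commits_added") or 0) > 0 and git_actions.get("branch") in PROTECTED:
        reasons.append("commit onto protected branch")
    return reasons
```

An empty list means the card is clean; any entry would withhold the issue close and produce a policy_violation row.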

Blocked, not done. If a session stops on a genuine blocker it writes .claude-outbox/blocked.json instead — {"summary": "...", "blockers": ["..."]}. The board flags the card ⚠ Session reported blocked (amber) rather than “done” and leaves it open; no issue close is offered.

Completion self-check. A one-shot Stop hook (.claude/hooks/check-completion.py) nudges a session, the first time it tries to end without a marker, to self-review: was the original request carried to deployment standards? If done → run /wrap-up + write done.json; if stuck → write blocked.json. It fires once per session (a per-session sentinel guards it) — a reminder, not a gate, so it never loops or traps an interactive session.

Operable by hand for testing — echo '{"summary":"manual test"}' > .claude-outbox/done.json in a checkout the agent is watching; the session card flips to “reported done” within one agent state cycle (~30 s).

Fully wired end-to-end as of agent 1.10.18 (winget upgrade PSI.CSM --source PSI). The watcher (claude-session-manager board/outbox.py) drains every project’s outbox on each state-pump cycle (~30 s), so a marker dropped now surfaces as a “reported done” banner within one cycle. Older agents still work — they just never relay session_done, and the board handler stays dormant for that user until they upgrade.

Work-item lineage (which session served which issue/todo)

A session spawned from a Todo or a GHE Issue now carries a durable link back to the work item it serves (csm-board#14). The link is recorded board-side at spawn — it survives an agent disconnect, a board redeploy, and a session resume — so the connection is never lost.

What it buys you:

  • Session → work item. Each session card shows a chip naming the Todo or Issue it serves (the issue chip links straight to GHE).
  • Work item → session. A Todo or Issue currently being worked shows a pulsing live session chip with an ↗ open link to its bridge — the reverse of the card chip.
  • Resolve-on-done. When a session reports done (its .claude-outbox/done.json), a session linked to a Todo auto-completes that todo — finishing the session resolves its work item instead of silently leaving it open. A session linked to a GHE Issue keeps the existing rule: if its marker set close_issue, the ✓ Close <repo>#N button is staged for your (or the master’s, at spawn_capped+) confirm; the link just makes the association explicit even for summary-only markers. A blocked report resolves nothing.

Sessions started outside the spawn flow can be retro-linked right from the session card (csm-board#19): an unlinked card shows a ⛓ Link to… control that opens a picker — choose any project, then one of its open issues or todos — and a linked card shows change · unlink next to its chip. The picker defaults to the session’s own project but lets you target any project, so a mis-linked session can be relinked across repos or cleared entirely. (Under the hood it’s POST /api/sessions/{id}/attach; the reverse index is queryable at GET /api/sessions/for-work-item.) Sessions from before this shipped simply carry no link and behave exactly as before.

The Board Manager dispatch path links too (csm-board#17). The MCP spawn_session / resume_session tools take an optional work_item argument ({kind:"issue"|"todo", number|todo_id, project_key, slug?}); the Board Manager passes it whenever it dispatches a worker to serve a known issue or todo, so dispatched sessions show their connection just like UI-spawned ones. As a fallback, when no explicit work_item is given the board infers an <owner/repo>#N reference from the spawn prompt and links that — so even a dispatch that only names its issue in prose shows up under the issue.
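The fallback inference could be sketched with a regular expression. The pattern, the function name, and the choice of "owner/repo" as project_key are assumptions for illustration; the board's actual matching rules are not documented here.

```python
import re

# <owner/repo>#N; the set of characters allowed in names is an assumption.
ISSUE_REF = re.compile(r"\b([A-Za-z0-9_.-]+)/([A-Za-z0-9_.-]+)#(\d+)\b")


def infer_work_item(prompt, explicit=None):
    """Prefer an explicit work_item; otherwise use the first owner/repo#N in the prompt."""
    if explicit is not None:
        return explicit
    match = ISSUE_REF.search(prompt)
    if match is None:
        return None
    owner, repo, number = match.groups()
    # project_key format is hypothetical.
    return {"kind": "issue", "number": int(number), "project_key": f"{owner}/{repo}"}
```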

Reconnecting to an existing session

Spawned-session rows show a ↻ Reconnect ↗ button only once the session is no longer live with a working bridge — i.e. it ended (completed / failed) or its bridge URL was lost. While the session is running or starting with a bridge, the row shows the single Open in Claude Code ↗ button; Reconnect would be confusing because the session is already reachable.

Reconnect tells the agent to run claude --resume <id> --remote-control "<name>", which publishes a fresh claude.ai/code bridge URL and opens it.

A resumed session stays in the same Spawned Sessions row — the board treats records that share a claude_session_id as one logical session, so closing a terminal and reconnecting later doesn’t pile up duplicate cards.

Session status is board-authoritative

The board — not your workstation agent — owns each session’s lifecycle. A session moves spawning → running → ended/failed, and once it reaches a terminal state the board never lets a stale agent message flip it back to running. This fixes the old annoyance where a session that had clearly finished still showed running forever, and dead ones rotted to an unknown state.

Three things keep the board honest:

  • Heartbeat. Every agent update refreshes a “last seen” timestamp on the session. If a running session goes quiet for ~15 minutes (its terminal was closed, the machine slept, the process crashed), the board marks it ended on its own — no manual cleanup needed.
  • Reconnect reconciliation. When your agent reconnects (e.g. after a reboot or a network drop), it tells the board which sessions it still has. Any session the board thought was running but the agent no longer tracks is marked ended. Sessions still alive stay live — and any session the board had ended during a brief disconnect but that your agent still has running is revived back to running automatically.
  • Blackout-safe. A transient WebSocket drop used to look, to the board, like every one of your sessions going quiet at once — which could falsely end live sessions. The heartbeat cleanup now only ever ends a session when your agent is actually connected and simply isn’t reporting it; while your agent is offline, your sessions are left untouched until it reconnects and reconciles. Keep your csm agent on 1.11.11+ for the reconnect reconciliation to work both ways.

You don’t configure any of this — it just keeps the board’s session view true to reality. A session that ended this way can still be resumed with ↻ Reconnect ↗ like any other.
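The heartbeat and reconciliation rules can be sketched as two small functions. The data model, the names, and the exact 15-minute threshold are assumptions, and the revive rule is simplified: this sketch revives any ended session the agent still reports, whereas the board only revives ones it ended during a disconnect.

```python
from dataclasses import dataclass

QUIET_LIMIT_S = 15 * 60  # "~15 minutes" in the text; the exact value is assumed


@dataclass
class Session:
    run_id: str
    status: str       # "spawning" | "running" | "ended" | "failed"
    last_seen: float  # epoch seconds of the last agent update


def heartbeat_sweep(sessions, now, agent_connected):
    """End running sessions that a connected agent has stopped reporting."""
    if not agent_connected:
        return  # blackout-safe: leave everything untouched while the agent is offline
    for s in sessions:
        if s.status == "running" and now - s.last_seen > QUIET_LIMIT_S:
            s.status = "ended"


def reconcile(sessions, agent_run_ids, now):
    """On agent reconnect, align board state with the sessions the agent still has."""
    for s in sessions:
        alive = s.run_id in agent_run_ids
        if s.status == "running" and not alive:
            s.status = "ended"
        elif s.status == "ended" and alive:
            s.status = "running"  # revived: the agent still has it
            s.last_seen = now
```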

On mobile

The board is responsive and usable from a phone. The header wraps onto multiple rows instead of overflowing; the Projects view becomes a master/detail flow — you see the full-width project list, tap a project to open its detail, and tap ← All projects to go back; the By Attention kanban collapses from five columns down to one or two.

Open in Claude Code ↗ opens a normal browser tab, not the native Claude mobile app. The bridge is a claude.ai/code URL that the app would otherwise claim via deep linking, but the app can’t reach a developer’s locally-tunnelled bridge — so the board forces the link to stay in the browser.

Manual (non-repo) projects

Click “+ new project (non-repo)” in the Projects sidebar to create a project that doesn’t correspond to a GHE repo, which is useful for proofs of concept, scratchpads, and ad-hoc tracking. Manual projects show Todos in place of Issues/PRs. Delete them from the bottom of their detail pane.

All repos vs Local only

The header toggle 🌐 All repos / 📍 Local only controls whether the board shows every PSI GHE org repo (default — useful for PM / discovery) or just the ones cloned on at least one of your connected agents. Persisted in your browser; doesn’t affect what your agent streams.

Drive the board from Claude Code (MCP)

The board ships an MCP server (in the mcp/ directory of the csm-board repo) that exposes every board action as a tool, so a “project-manager” Claude Code session can do anything you’d do in the web UI — read and write todos, manage projects, list and close sessions, and even spawn or resume Claude sessions on your agent.

Setup (requires a logged-in csm agent on the same machine):

cd csm-board/mcp
pip install -e .

It’s pre-wired into Claude Code via the repo-root .mcp.json. The server reuses your agent’s token cache (~/.claude/csm-board-token.json) silent-only — it never prompts for a login of its own. Because Claude Code launches it as your OS user, it can only ever read your own token; no other user’s board is reachable. If you’re not signed in, tools fail with a hint to run csm agent first.

Configure (Claude Code)

Pre-wired via the repo-root .mcp.json:

{
  "mcpServers": {
    "csm-board": {
      "command": "python",
      "args": ["-m", "csm_board_mcp"],
      "env": { "CSM_BOARD_URL": "https://board.progressivesurface.com" }
    }
  }
}

CSM_BOARD_URL is optional — defaults to production; point it at http://localhost:8000 to develop against a local backend.

Tool reference

The tools, grouped by area (57 are listed here). Everything is scoped to the signed-in user (the OID behind the reused token):

Area | Tools
Identity / health | whoami, board_health
Agents | list_agents, get_agent_logs
Projects | list_projects, list_manual_projects, create_manual_project, delete_manual_project
Project flags | snooze_project, pin_project, set_project_note
Todos (read) | list_todos (per project), list_todos_today (cross-project)
Todos (write) | add_todo, update_todo, complete_todo, delete_todo, promote_todo_to_issue
Epics | list_epics, get_epic, create_epic, update_epic, create_epic_repo, delete_epic, reorder_epics, assign_todo_to_epic, add_issue_to_epic, remove_issue_from_epic
Sessions | list_sessions, tail_session, close_session, close_session_issue, reopen_session, spawn_session, resume_session, terminate_session, inject_into_session
Change feed (WORM) | get_changes, verify_audit
Master agent | get_master, start_master, stop_master, set_master_caps, act_as_master, act_as_user, advance_master_cursor, get_master_log, post_master_log
Actions log (WORM) | get_actions, verify_actions
Digests | list_digests, post_digest
Approvals queue | list_approvals, enqueue_approval, approve_proposal, deny_proposal, mark_proposal_executed

spawn_session / resume_session consume the board’s NDJSON progress stream and block until the agent returns a result or the board’s 5-minute deadline trips, then return {ok, run_id, bridge_url, status, stages}. bridge_url is the claude.ai/code link to drive the spawned session.

promote_todo_to_issue, spawn_session, and resume_session all route through a connected csm agent — they fail with a 503-style error if none is online (check list_agents).

tail_session(run_id, n=1) returns a running session’s latest assistant message — what it’s currently doing, not just its status — so you can confirm progress, spot a stuck or wandering session, or report what each session is working on without opening its bridge URL. It’s a cheap read of board state (no agent round-trip): each connected csm agent tails its live sessions’ transcripts and pushes the snapshot every ~30 s, so the response includes stale_seconds telling you how fresh it is. The same snapshot shows on the session card as a pulsing “Doing now” block while a session is live.

Master agent (autonomous grooming)

The board ships a Board Manager subagent at .claude/agents/board-manager.md in the csm-board repo. It’s a Claude Code agent that uses the MCP server to groom the board on a schedule — read the change feed, close outbox-confirmed-done sessions, reprioritize today’s todos, post a digest, and enqueue spawn proposals for human approval.

Autonomy tiers (set with start_master(tier=...)):

Tier | Reads | Reversible writes | GHE writes + terminate | Spawn
observe | ✓ | | |
groom (default) | ✓ | ✓ close_session (no pending_issue), reopen, todo CRUD, pin/snooze/note, digest, log, cursor, enqueue approval | |
spawn_capped | ✓ | ✓ | ✓ close_session_issue (when pending_issue set), terminate (when closed/terminal), promote_todo_to_issue, inject_into_session | ✓ within parallelism + repo-allowlist caps
autonomous | ✓ | ✓ | ✓ | ✓ without per-action approval

Trust-the-marker, not blanket bans. At spawn_capped+ the master can close a GHE issue or terminate a process — but only when the session itself set the trust signal:

  • close_session_issue requires the card to carry pending_issue — the session’s .claude-outbox/done.json asked for the close via close_issue: true. Master can’t fabricate that. The full round-trip (post done_summary as closing comment, close GHE issue, close card) runs in one call via the user’s gh auth on their agent.
  • terminate_session requires the card to be closed or in terminal status (ended/completed/failed). Never reaps live running work. The natural lifecycle: close (or close_session_issue), then terminate.
  • promote_todo_to_issue is tier-only; legitimate user-delegate work.
  • inject_into_session is tier-only (the inbox writer); lets the master push new turns into a running session via .claude-inbox/. Used to: answer a waiting-on-you session’s question, fold a new P0 todo into a running worker’s plan, or relay user redirection (“focus PRGJSMES today”) to active workers. Half of the master↔worker channel (the outbox is the other half); without this the master could only spawn new workers, never steer existing ones.
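The gating rules in the list above can be sketched as one predicate. The function name and the card fields closed and status are assumptions; pending_issue and the tier names come from the text.

```python
TIERS = ["observe", "groom", "spawn_capped", "autonomous"]
TERMINAL = {"ended", "completed", "failed"}


def master_may(action, tier, card):
    """Check a master action against its tier and the card's trust signals."""
    capped_plus = TIERS.index(tier) >= TIERS.index("spawn_capped")
    if action == "close_session_issue":
        # Needs the session's own close_issue request, surfaced as pending_issue.
        return capped_plus and bool(card.get("pending_issue"))
    if action == "terminate_session":
        # Never reaps live work: the card must be closed or in a terminal status.
        finished = bool(card.get("closed")) or card.get("status") in TERMINAL
        return capped_plus and finished
    if action in ("promote_todo_to_issue", "inject_into_session"):
        return capped_plus  # tier-only
    raise ValueError(f"unhandled action: {action}")
```

Keeping the trust signal on the card (rather than in the master's arguments) is what prevents the master from fabricating a close request.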

Groom tier is the conservative default — the master can leave it on indefinitely without GHE side effects. Opt up to spawn_capped to grant end-to-end session lifecycle ownership.

WORM accountability. Every master action lands in an append-only, hash-chained action log per user with actor=master:<oid> and authorized_by=tier:<tier> (or approval:<id>). get_actions(actor=...) / verify_actions give you full audit. The change-event log (events.jsonl) sits alongside; the master’s actions show in both — change for “what changed” with no actor, action for “who did it under what authorization”.
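To illustrate what "append-only, hash-chained" means in general, here is a generic sketch of the technique. The entry layout, the SHA-256 choice, and the function names are assumptions; the board's real log format and verify_actions implementation are not shown in this document.

```python
import hashlib
import json


def _digest(body):
    """Hash an entry body deterministically (sorted keys)."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_entry(log, actor, action, authorized_by):
    """Append an entry whose hash covers its body and the previous entry's hash."""
    body = {"actor": actor, "action": action, "authorized_by": authorized_by,
            "prev": log[-1]["hash"] if log else ""}
    log.append({**body, "hash": _digest(body)})


def verify_chain(log):
    """Return True if every entry's hash and back-link check out."""
    prev = ""
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev"] != prev or entry["hash"] != _digest(body):
            return False
        prev = entry["hash"]
    return True
```

Because each hash covers the previous one, editing or reordering an earlier entry invalidates every entry after it.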

Approval queue. When the master wants to spawn, it enqueue_approval with a kind=spawn proposal (summary + est_cost + payload). You approve in the UI or via MCP (approve_proposal); on the master’s next wake it consumes the approved entry, spawns, and marks it executed. The board refuses any master spawn that would exceed parallelism_cap or target a repo outside repo_allowlist.
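The refusal rule at the end of that paragraph can be sketched as a single guard. This is an assumption-laden illustration: `MasterPolicy` and `check_spawn` are hypothetical names, and only `parallelism_cap` and `repo_allowlist` come from the text.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple


@dataclass
class MasterPolicy:
    """Hypothetical container for the two caps named above."""
    parallelism_cap: int
    repo_allowlist: Set[str] = field(default_factory=set)


def check_spawn(policy: MasterPolicy, repo: str, running_master_sessions: int) -> Tuple[bool, str]:
    """Return (allowed, reason) for a proposed master spawn."""
    if repo not in policy.repo_allowlist:
        return False, f"repo {repo!r} is outside repo_allowlist"
    if running_master_sessions + 1 > policy.parallelism_cap:
        return False, "spawn would exceed parallelism_cap"
    return True, "ok"
```

The check counts the proposed session itself, so a cap of 2 with 2 sessions already running refuses the third.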

Scheduling. The user picks: /schedule for remote cron, /loop for in-session repeats, or ad-hoc. The master state persists per-user, so each wake reads master.cursor and get_changes(since=cursor) rather than the full session list. Cold-start is bounded (the feed returns a compact digest, not 200 records).
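The cursor pattern can be sketched in a few lines. The in-memory `events` list and the local `get_changes` stand-in are hypothetical; the real feed is an MCP/REST call, and the event shape (a `seq` field) is assumed for illustration.

```python
def get_changes(events: list, since: int) -> list:
    """Hypothetical stand-in for the change feed: events newer than the cursor."""
    return [e for e in events if e["seq"] > since]


def wake(state: dict, events: list) -> list:
    """One master wake: read only what changed, then advance the cursor."""
    changes = get_changes(events, state["cursor"])
    if changes:
        state["cursor"] = max(e["seq"] for e in changes)
    return changes
```

Each wake costs work proportional to what changed since the last one, not to the total number of sessions.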

Orchestration tree. Mission Control renders the live session hierarchy: the master at the root, the workers it spawned beneath it, and — when a worker spawns its own sub-workers — those sub-spawns nested under their parent, n levels deep. The nesting is driven by a board-owned parent_run_id on each session: a spawning session advertises its own board run id (via the CSM_RUN_ID env the agent injects) on the spawn call, and the board threads it onto the new session. Sessions with no resolvable parent fall into the two top-level groups — under the master root (if master-spawned) or the “Your direct spawns” cluster.
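As a rough sketch of how a flat session list becomes that nested view, the function below groups sessions by `parent_run_id`. The dict shape (`run_id`, `parent_run_id`) is assumed for illustration, and the sketch does not distinguish the two top-level groups; it only shows that a session with no resolvable parent ends up at the top level.

```python
from collections import defaultdict


def build_tree(sessions: list) -> list:
    """Nest sessions under their parent; unresolvable parents become roots."""
    by_id = {s["run_id"]: s for s in sessions}
    children = defaultdict(list)
    roots = []
    for s in sessions:
        parent = s.get("parent_run_id")
        if parent in by_id:
            children[parent].append(s["run_id"])
        else:
            roots.append(s["run_id"])  # missing or unknown parent: top level

    def node(run_id: str) -> dict:
        # Recurse so sub-spawns nest n levels deep.
        return {"run_id": run_id, "children": [node(c) for c in children[run_id]]}

    return [node(r) for r in roots]
```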

Title-bar control (every view). You don’t have to open Mission Control to reach the Board Manager — the main title bar carries a persistent control (csm-board#30):

  • When a master is running it shows a green Board Manager indicator that links straight to the current run’s bridge URL — one click opens the live master session in Claude Code to chat with it or interject directions. It always follows the latest run.
  • When none is running it shows a Launch Board Manager button that spawns a real claude --remote-control master session (the same boot prompt and start path as Mission Control’s “spawn master session” — shared, never duplicated), then flips to the link state on the next poll.
  • If your csm agent is offline the launch can’t host the session and 503s; the bar surfaces a plain “agent offline — start csm agent” hint inline instead of breaking. If the master’s session dies under you, the control offers a Relaunch instead.

Design: docs/MASTER_AGENT.md in the csm-board repo.

Digest → Teams (PSI Notify Bot)

Each groom cycle the Board Manager posts an executive digest (the post_digest MCP tool → POST /api/digests). Optionally, the board pushes that same digest into a persistent Microsoft Teams chat — “CSM Board Manager” — via PSI’s shared Notify Bot, so exec summaries and Action needed items reach Adam in Teams without opening the board (csm-board#24).

  • Second sink, never the primary. The board-side digest is unchanged and authoritative; the Teams push is an additional, best-effort sink. It’s fail-safe: any notify error (flag off, missing secret, token/HTTP failure, timeout) is caught and logged — it can never break the groom or the board digest.
  • Config-gated by the app setting CSM_TEAMS_NOTIFY (defaults to OFF in code). Provisioned and live in production since 2026-06-02 (CSM_TEAMS_NOTIFY=1).
  • Adaptive Card by default. The digest renders as an Adaptive Card: a title, an “as of seq N” subtitle, the repo/epic body, and the contract’s closing Action needed section lifted into its own attention-styled container. Set CSM_TEAMS_NOTIFY_FORMAT=markdown to fall back to a plain markdown message.
  • Deep-link buttons (csm-board#33, Tier 1). The card carries Action.OpenUrl buttons so you can act on a digest from Teams: “Open on the board ↗” (the Mission view, via ?view=mission) plus one button per GHE issue/PR the digest references (repo#N / owner/repo#N → the GHE issue). These are links only — no inbound bot endpoint, so the shared Notify Bot stays notification-only. Two-way (reply-to-steer) is Conversational Teams below (csm-board#41). Board base URL is CSM_BOARD_PUBLIC_URL.
  • De-duplicated. An identical back-to-back digest is not re-sent (grooms are ~2 h apart, so volume is low).
  • Python, not PowerShell. Every other Notify-Bot consumer uses the PSI.Notify PowerShell module. csm-board is Python/FastAPI on Linux and cannot use that module, so api/notify.py ports the token flow and the Bot Framework send call to Python (httpx). It is send-only: the persistent chat is bootstrapped out of band by an operator (the three Graph chat-creation pitfalls live in that PowerShell bootstrap, not here).
  • Secrets come from ps-certificates-kv (psi-notify--*) as Key Vault references on App Service app settings, resolved by the csm-board App Service managed identity. Nothing is hardcoded and the bot token is never logged.
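The "second sink" behaviour described in the bullets above (config-gated, de-duplicated, never raising) can be sketched as follows. This is not api/notify.py: the `TeamsSink` class and its injected `send` callable are hypothetical, and treating only the exact value "1" as enabled is an assumption.

```python
import hashlib
import logging
import os

log = logging.getLogger("notify-sketch")


class TeamsSink:
    """Best-effort secondary sink; failures are logged, never propagated."""

    def __init__(self, send):
        self._send = send             # hypothetical transport: callable(str) -> None
        self._last_digest_hash = None

    def push(self, digest_text: str) -> bool:
        """Return True only if the digest was actually sent."""
        try:
            if os.environ.get("CSM_TEAMS_NOTIFY") != "1":
                return False          # flag off: stay dark
            digest_hash = hashlib.sha256(digest_text.encode("utf-8")).hexdigest()
            if digest_hash == self._last_digest_hash:
                return False          # identical back-to-back digest: skip
            self._send(digest_text)
            self._last_digest_hash = digest_hash
            return True
        except Exception:
            # Any notify error is swallowed so the groom and board digest continue.
            log.exception("Teams notify failed; board digest is unaffected")
            return False
```

Recording the hash only after a successful send means a failed push is retried on the next cycle rather than being mistaken for a duplicate.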

Conversational Teams (two-way) — csm-board#41

Takes the integration from send-only to a bi-directional chat loop: message the Board Manager from inside the “CSM Board Manager” chat — ask questions, steer it, act on Action-needed items — and get its replies back in the same thread. LIVE as of 2026-06-24 — the dedicated bot is provisioned and the round-trip is verified end-to-end (DM the Board Manager, it replies in-thread). The dedicated bot is csm-board-bot (App ID 2bb0664c-5ec8-4390-95af-1d08c12fb6a8), single-tenant, messaging endpoint POST /api/teams/messages, gated by the app setting CSM_TEAMS_INBOUND=1 (returns 404 when unset). Setup steps in the operator runbook.

  • Dedicated interactive bot, not the shared one. The shared PSI Notify Bot stays isNotificationOnly; csm-board stands up its own bot so making it interactive can’t regress deploy-proapps/egnyte/prgjsmes. The dedicated bot owns csm-board’s chat in both directions (so Action.Submit card replies route to our endpoint). Decision rationale in the #41 design comment.
  • Inbound endpoint POST /api/teams/messages on the existing FastAPI app, not Entra-authed. Each activity carries a Bot Framework JWT that is validated per request (api/teams_auth.py: RS256 against the BF JWKS, audience = our bot app id, BF issuer, plus a serviceUrl host allowlist and channel binding so the bot’s bearer token can never be sent to a spoofed host). This makes Teams a third client class, and a non-Entra one, alongside the SPA and the agent (see ADR-0028); it lives at the app layer, never behind an Entra edge pre-auth.
  • Identity = from.aadObjectId. The Teams sender’s Entra Object ID is the board OID, so an inbound action only ever steers that user’s master and touches that user’s partition (blast-radius scoping for free). No aadObjectId → declined.
  • Routing + active-wake. Delivery depends on the master’s liveness (mirrors the UI’s chooseAssignPath). A running master gets a FOR_<run_id> file in its .claude-inbox/, drained at the next turn, and the bridge survives. A dormant master is resumed with the message (claude --resume <claude_session_id> …) so it wakes and replies in seconds. Resume is the only re-prompt lever here: an idle session’s Stop hook won’t re-fire to drain an inbox file, and you can’t inject input into a live detached remote-control session. Either way the master answers via the reply_to_teams MCP tool → POST /api/teams/reply, and the reply threads back into the captured conversation. The conversation reference is stored on master.teams_conversation (survives run_id churn); resume keys off the stable claude_session_id, not the volatile board run_id.
  • Structured quick-reply verbs (Phase B). Digest cards carry Action.Submit buttons (only when the interactive bot is provisioned). focus <repo> is a master steer (→ inbox); snooze <project> is a direct board write that mirrors the UI click (attributed to the user, authorized_by=teams). Unknown verbs fall back to the master inbox. resolve and spawn-on-item are Phase C.
  • Fail-safe + dark. Disabled/unprovisioned ⇒ the endpoint returns 404 (reveals nothing); the reply path never raises.
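The identity gate, the liveness-based delivery choice, and the snooze verb from the bullets above can be combined into one routing sketch. Everything here is illustrative: `route_inbound` and the returned dict keys are hypothetical, and verbs are parsed from plain text for simplicity, whereas the bullets describe them as Action.Submit buttons.

```python
from typing import Optional


def route_inbound(aad_object_id: Optional[str], text: str, master_running: bool) -> dict:
    """Decide what to do with one inbound Teams message (sketch only)."""
    # No Entra Object ID on the sender: decline.
    if not aad_object_id:
        return {"action": "decline"}

    parts = text.strip().split(maxsplit=1)
    verb = parts[0].lower() if parts else ""
    arg = parts[1].strip() if len(parts) > 1 else ""

    # snooze is a direct board write attributed to the user.
    if verb == "snooze" and arg:
        return {"action": "board_write", "oid": aad_object_id,
                "snooze": arg, "authorized_by": "teams"}

    # focus, free text and unknown verbs all go to that user's master.
    delivery = "inbox_file" if master_running else "resume"
    return {"action": "steer_master", "oid": aad_object_id,
            "delivery": delivery, "message": text}
```

Keying every result on the sender's object ID is what keeps an inbound message scoped to that user's master and partition.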

Provisioning (one-time, Azure/Teams-side — done 2026-06-02 for csm-board#24):

  1. Key Vault access — access policy, not RBAC. ps-certificates-kv runs in access-policy mode (enableRbacAuthorization: false), so the “Key Vault Secrets User” RBAC role has no effect; the csm-board App Service managed identity was granted get/list on secrets via az keyvault set-policy.
  2. Network path to the vault. ps-certificates-kv is set to defaultAction: Deny, and (per the 2026-06-01 KV/winget incident) App Service KV-reference resolution does not get the AzureServices firewall bypass, so it has to reach the vault’s private endpoint (ps-certificates-kv-pe, 10.160.140.23). csm-board was therefore given regional VNet integration into ps-vnmain/ps-webapps (the subnet allowlisted on the vault). RFC1918 traffic routes through the VNet by default, and DNS resolves the vault to the PE IP via the privatelink.vaultcore.azure.net zone (Azure DNS + the AD-integrated zone on PS-AZ-DC01). KV references then show Resolved.
  3. Teams chat. The “CSM Board Manager” persistent group chat was bootstrapped with psi-notify-bot/scripts/Initialize-NotifyChat.ps1 (members: ADevereaux + PowerOperative — a group chat needs ≥2 humans), the Notify Bot installed in it, and its id stored as psi-notify--chat-csm-board.
  4. App settings. Five settings on the csm-board App Service: CSM_TEAMS_NOTIFY=1 plus the four PSI_NOTIFY_* Key Vault references (tenant-id / client-id / client-secret / chat-csm-board).

Example: a PM-Claude triage pass

You:  Use the csm-board tools to plan my morning.
Claude→ list_todos_today()           # 20 todos across projects
      → list_sessions()              # what's mid-task / waiting on me
      → "3 sessions are waiting on you. redbook-web has 4 unscheduled
         P1 todos. Want me to schedule them and spawn the top one?"
You:  Yes.
Claude→ update_todo(scheduled_for=today, priority=1) ×4
      → spawn_session(project_key="redbook-web", prompt="<top todo>")
      → "Spawned — bridge: https://claude.ai/code/session_…"

How to join (for a PSI developer)

# 1. Install (or upgrade) the agent
winget install PSI.CSM --source PSI
# or:  winget upgrade PSI.CSM --source PSI
 
# 2. Start the agent (first run prompts for a device code)
csm agent

Complete the device-code prompt in any browser; the token caches to ~/.claude/csm-board-token.json and subsequent runs are silent. The agent connects over outbound WSS to board.progressivesurface.com, registers under your Entra identity, and starts streaming your projects + sessions every 30 s.

Then open board.progressivesurface.com in any browser; your projects appear within ~30 s.

Optional: opt out of org-wide discovery

By default the agent reports every PSI GHE repo (whether cloned on your machine or not) so the team’s PM view is complete. To restrict your agent to just what you’ve cloned:

  • BOARD_LOCAL_ONLY=1 in the agent’s environment, or
  • "local_only": true in ~/.claude/board.json

(Equivalent to the 📍 Local only UI toggle, but applied at the agent level.)
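A minimal sketch of how an agent could resolve that setting from the two sources. The function name, the precedence (environment first, then the JSON file), and the "missing or unreadable file means off" behaviour are all assumptions, not the agent's documented logic.

```python
import json
import os
from pathlib import Path


def local_only(env=None, config_path=None) -> bool:
    """Return True if either opt-out source asks for local-only reporting."""
    env = os.environ if env is None else env
    if env.get("BOARD_LOCAL_ONLY") == "1":
        return True
    path = Path(config_path) if config_path else Path.home() / ".claude" / "board.json"
    try:
        config = json.loads(path.read_text(encoding="utf-8"))
        return config.get("local_only") is True
    except (OSError, ValueError, AttributeError):
        # Missing file, invalid JSON, or a non-object document: treat as not set.
        return False
```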

Architecture

Azure · PS-WEBAPPS RG                          Your workstation
┌─────────────────────────────────┐
│  csm-board App Service (Linux)   │           ┌──────────────────────────┐
│  React 19 + Vite + Tailwind v4   │ ◄──WSS───┤  csm agent (PSI.CSM)     │
│  Python 3.11 + FastAPI           │           │  scans repos, sessions,  │
│  MSAL/Entra auth                 │           │  fetches GHE+ADO,        │
│  /home/data state                │           │  spawns claude,          │
│  board.progressivesurface.com    │           │  device-code MSAL auth   │
└─────────────────────────────────┘           └──────────────────────────┘
        ▲
        │  HTTPS · MSAL redirect
        │  (claude.ai/code in a new tab for spawned sessions)
   [PSI devs]
  • Federated. Each developer’s projects and sessions live on their machine. Sessions bill to their Claude subscription. Nothing leaves their box except metadata over an authenticated WSS.
  • Agent ↔ board speaks newline-JSON over a single long-lived WSS. Bearer token in the upgrade URL.
  • Spawn requests flow from the hosted board → agent → local claude --remote-control → bridge URL → back to the browser.
  • MCP server (csm-board/mcp/) is a fourth client of the same REST/NDJSON API the web UI uses — a local stdio process that reuses the agent’s token cache, giving a Claude Code session full UI parity.
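For readers unfamiliar with newline-delimited JSON, the framing used on the agent ↔ board link can be sketched like this. The message shapes are made up for illustration, and how frames map onto individual WebSocket messages is not specified here.

```python
import json


def encode_frame(message: dict) -> str:
    """One JSON object per line; json.dumps escapes any embedded newlines."""
    return json.dumps(message, separators=(",", ":")) + "\n"


def decode_frames(buffer: str):
    """Split a buffer into complete messages plus any trailing partial line."""
    *lines, rest = buffer.split("\n")
    return [json.loads(line) for line in lines if line.strip()], rest
```

The trailing partial line is returned rather than parsed, so a reader can prepend it to the next chunk it receives.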

Auth

  • App Registration: CSM Board — Client ID 9eeff376-82ba-40cf-a4b9-d2ed4970d82d (see Azure Resource Map).
  • Sign-in audience: PSI tenant only.
  • Frontend: MSAL.js v4, SPA platform, PKCE, redirect flow, localStorage cache. No “Sign In” button — auto-redirect on unauthenticated load, per the PSI auth standard.
  • Backend: validates V2 access tokens against the tenant JWKS; audience = bare Client ID; required scope access_as_user.
  • Agent: the same App Registration, with isFallbackPublicClient: true so it can use device-code flow on first run. Token cached to ~/.claude/csm-board-token.json.
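The backend rule above (V2 token, audience = bare Client ID, scope access_as_user) amounts to a few claim comparisons once the signature has been verified. The sketch below covers only that claims step; signature and expiry validation against the tenant JWKS is deliberately omitted, and `claims_ok` is a hypothetical name. The claim names (`ver`, `aud`, `tid`, `scp`) are the standard Entra v2 access-token claims.

```python
CLIENT_ID = "9eeff376-82ba-40cf-a4b9-d2ed4970d82d"  # from the App Registration above
REQUIRED_SCOPE = "access_as_user"


def claims_ok(claims: dict, tenant_id: str) -> bool:
    """Check already-verified token claims (signature/expiry checks not shown)."""
    scopes = claims.get("scp", "").split()   # scp is a space-separated string
    return (
        claims.get("ver") == "2.0"           # V2 access token
        and claims.get("aud") == CLIENT_ID   # bare Client ID, not an api:// URI
        and claims.get("tid") == tenant_id   # PSI tenant only
        and REQUIRED_SCOPE in scopes
    )
```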

Network exposure & external-access hardening

The App Service is publicly reachable at the network layer (publicNetworkAccess=Enabled, inbound rules = Allow-all; re-verified 2026-06-24). Identity, however, is already gated by Entra: a 2026-06-24 audit of live Conditional Access found the tenant-wide “Require MFA” policy (All apps, browser + mobile/desktop clients) covers csm-board — it is not in the policy’s excluded apps — so every interactive browser sign-in already requires MFA, and the federated agent’s device-code flow completes MFA in-browser at redemption (which is why it works today). No new CA policy and no device-code carve-out are required — no authentication-flows policy blocks device code in this tenant. The gate the original plan would have created already exists.

The remaining gap addressed by ADR-0028 (issue #28) is therefore edge exposure — the open origin has no WAF or rate limiting:

  • Azure Front Door Standard + WAF fronts the site, with the App Service origin locked to Front Door (AzureFrontDoor.Backend service tag + X-Azure-FDID header) so *.azurewebsites.net is no longer openly reachable. Front Door passes the agent’s WebSocket through unchanged.
  • Conditional Access is already satisfied by the tenant baseline; the only optional add is an app-specific require-compliant-device policy scoped to browser sign-ins (never the device-code path).

Entra auth stays at the app layer — no edge pre-authentication (App Proxy / Easy Auth), because the headless agent (device-code over WSS) can’t satisfy an interactive edge challenge. Private Endpoint / publicNetworkAccess=Disabled is not the path here: the shared asp-erp-migration-tool plan is B3 Basic (Private Endpoints unsupported, re-verified 2026-06-24) and a fully-private board would break external access and off-network agents. csm-board is therefore a documented public-exposure exception (see azure-security → exceptions). Design, Bicep, the live-posture verification script, and the operator runbook live in csm-board/docs/ADR-0028-secure-external-access.md, csm-board/infra/ (front-door.bicep, scripts/verify-entra-posture.sh), and csm-board/docs/runbooks/.

Stack

| Layer | Choice |
| --- | --- |
| Frontend | React 19 + TypeScript + Vite 6 + Tailwind v4 + @azure/msal-react v3. Visual language follows psi-design-system: green primary, light surface, Inter + JetBrains Mono, semantic status palette. Tokens mirrored into web/src/index.css @theme rather than importing ps.css directly (csm-board keeps dev-dashboard density). |
| Backend | Python 3.11 + FastAPI + uvicorn (WebSocket-native). Documented exception to PSI’s ASP.NET Core 8 standard; see csm-board/docs/IMPLEMENTATION_PLAN.md § “Exceptions to PSI standards”. |
| Hosting | Azure App Service Linux, shared asp-erp-migration-tool plan (B3), PS-WEBAPPS RG, North Central US |
| Auth | Entra ID app-layer MSAL |
| DNS | Azure DNS zone progressivesurface.com; board CNAME → csm-board.azurewebsites.net |
| SSL | Wildcard cert *.progressivesurface.com from ps-certificates-kv, bound SNI |
| State | Per-user JSON under /home/data/users/<oid>.json (Azure Files-backed). Phase 5e target: Azure Table Storage |
| CI/CD | GitHub Actions on psi-internal self-hosted runner; identity-based deploy via az login --identity |
| MCP | csm-board-mcp (Python 3.11+, mcp SDK, stdio) in csm-board/mcp/; reuses the agent’s MSAL cache silent-only |

Status

| Phase | What | State |
| --- | --- | --- |
| 1 | Repo scaffold | |
| 2 | Entra app reg + MSAL frontend | |
| 3 | Backend JWT validation + csm agent harness | |
| 4 | UI: Projects + By Attention; spawn proxy; auto-clone; org-wide GHE discovery; manual projects + todos | |
| 5a/b | Azure App Service + identity-based GHA deploy | |
| 5c | Custom domain board.progressivesurface.com + wildcard SSL | |
| 5d | External-access hardening: Conditional Access + Front Door/WAF origin-lock (ADR-0028, #28). Supersedes the old private-endpoint plan | 🚧 design done, provisioning operator-gated |
| 5e | State store → Azure Table Storage (currently /home/data JSON) | 🚧 |
| 5g | MS Planner overlay | ⏸ deferred; local Todos cover the personal use case today |

Privacy

Each developer’s data is scoped to their Entra OID server-side; one user’s projects are NOT visible to other users today. The agent never sends Claude transcript contents to the board — only metadata (project names, git state, session IDs, prompts as the user wrote them).
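One way to picture OID scoping is that every read and write resolves to a path derived from the caller's Entra Object ID. The sketch below assumes the /home/data/users/<oid>.json layout from the Stack section; the function name and the GUID validation step are illustrative additions, not a description of the board's actual code.

```python
import uuid
from pathlib import Path


def user_state_path(data_root: str, oid: str) -> Path:
    """Map an Entra Object ID to its per-user state file.

    uuid.UUID raises ValueError for anything that is not a GUID, so a
    crafted value such as "../other-user" can never become a path segment.
    """
    canonical = str(uuid.UUID(oid))  # normalises case and formatting
    return Path(data_root) / "users" / f"{canonical}.json"
```

Because the path is computed from the authenticated identity rather than from request input, one user's request has no way to name another user's file.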

If a “team view” is added later (everyone sees what’s open across the team), it’ll be opt-in at the agent and per-user-data level — not a default surface.