CSM Board
The PSI Claude Work Board: a hosted dashboard for every developer’s Claude Code work across all of PSI. It shows open Issues, PRs, and Azure DevOps work items, spawned Claude sessions, and per-project todos, and offers a one-click “start Claude” on any issue.
- Live: board.progressivesurface.com
- How to join: install PSI.CSM via WinGet, run csm agent.
- Repos: hosted board → ProgressiveSurface/csm-board. Per-developer agent → ProgressiveSurface/claude-session-manager.
What it shows you
Sign in at board.progressivesurface.com with your PSI account. The top nav has Today (personal todos due now, cross-project), Projects, Epics, By Attention, and Mission (the master agent). The main ones:
Projects (default)
Repo-list sidebar (filter, sort by open work / name / activity, “hide empty” toggle) → detail pane for the selected project:
- Repo header — branch, dirty / unpushed counts, last commit
- Open Pull Requests — pulled live from GHE via gh pr list. Each PR row shows “opened by @<login>” alongside its draft flag, labels, and assignees.
- Open Issues — same, via gh issue list. If the agent can’t reach GHE (auth, rate-limit, gh not on PATH, timeout), the pane shows a “⚠ Couldn’t fetch issues from GHE: <reason>” warning instead of a misleading empty list — so a genuine “0 open issues” reads differently from a failed fetch.
- Azure DevOps Work Items — active items from ProgSurface / Pro App development, matched to projects by tag / title (or via an optional ~/.claude/board-ado-map.json on the agent). ADO is best-effort: if az is missing or broken it degrades to empty silently and never blocks the Issues / PRs above it.
- Todos — per-project, per-user, persisted on the board
- Spawned Sessions — live status of any Claude sessions you’ve started on this project, with one-click Open in Claude Code ↗ and ↻ Reconnect ↗ buttons
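The "failed fetch versus genuinely empty" distinction described for Open Issues can be sketched as below. This is a minimal illustration, not the agent's actual code: the function name `fetch_open_issues`, the result shape, and the 30-second timeout are assumptions; only `gh issue list` itself comes from the text above.

```python
import json
import subprocess


def fetch_open_issues(repo_dir, run=subprocess.run):
    """Return {"ok": True, "issues": [...]} or {"ok": False, "reason": "..."}.

    Hypothetical helper: a failed fetch is reported explicitly so the UI
    never has to render it as an empty list.
    """
    cmd = ["gh", "issue", "list", "--state", "open", "--json", "number,title"]
    try:
        proc = run(cmd, cwd=repo_dir, capture_output=True, text=True, timeout=30)
    except FileNotFoundError:
        return {"ok": False, "reason": "gh not on PATH"}
    except subprocess.TimeoutExpired:
        return {"ok": False, "reason": "timeout"}
    if proc.returncode != 0:
        # Auth and rate-limit failures surface here with gh's own message.
        reason = proc.stderr.strip() or f"gh exited {proc.returncode}"
        return {"ok": False, "reason": reason}
    return {"ok": True, "issues": json.loads(proc.stdout or "[]")}
```

A caller would render the warning row when `ok` is false and the normal list (possibly empty) otherwise.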
By Attention
Five-column kanban of every Claude session across all your projects, ordered by attention needed:
Mid-task (interrupted) · Waiting on you (ended on a question) · Uncommitted (work not yet pushed) · Idle (project with no sessions yet) · Done
For Monday-morning triage.
Stuck sessions clean themselves up. The board automatically retires a session that’s gone quiet — one whose workstation stopped reporting it (~15 min), or one that’s still “running” but has made no progress for hours (e.g. spawned, printed one line, then parked). These flip to ended so they stop cluttering the live view. Nothing is killed on your machine: the session’s bridge URL survives, you can ↻ resume it, and if you reopen its bridge and it actually starts working again, the board brings it back to running on its own.
Epics
An epic is a goal or initiative that groups work spanning repos — one level above a project (which on this board means a single repo or a manual project). Use epics to organize related todos and GHE issues under a shared objective and rank them by what matters now.
Each epic has:
- a priority (P0–P3, same scale as todos — click the chip to cycle),
- a status lifecycle — planned · active · paused · done · cancelled,
- a manual order (rank the list with the ▲▼ buttons; top = work first),
- a progress roll-up — done vs. total across its todos and issues,
- an optional primary repo — see “Giving an epic a repo” below.
Membership is set from where the work lives, not from the epic form:
- Todos can be created directly on an epic — expand the epic and type a line into “Add a todo to this epic”; it’s captured under the goal with no repo required. These repo-less todos still schedule and surface in Today (with a (no repo) chip). They can ▶ start a session right away: with no repo bound the session runs in a scratch workspace; bind a primary repo to run in real code instead (see below).
- Existing todos also join via the 🎯 epic chip on any todo row (in Today, in a project’s Todos, or inside the epic itself). Pick an epic to file it, or “remove from epic” to unfile. Deleting an epic never deletes its todos — they’re just unfiled.
- Issues / PRs join via the “+ epic” link on any issue row in a project’s issue list. The epic stores a reference; the board resolves each attached issue’s live state against the current overlay — a green dot means still open, grey means closed/gone, amber means “no agent currently reports that repo”.
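The dot colours for attached issues amount to a three-way lookup. The sketch below assumes a hypothetical overlay shape (`{repo: {issue_number: state}}`) and a hypothetical function name; the board's real data model is not shown in this document.

```python
def issue_dot(ref, overlay):
    """Map an epic's stored issue reference to a dot colour.

    overlay (assumed shape): {repo: {issue_number: "open" | "closed"}},
    containing only repos that some connected agent currently reports.
    """
    repo_issues = overlay.get(ref["repo"])
    if repo_issues is None:
        return "amber"  # no agent currently reports that repo
    if repo_issues.get(ref["number"]) == "open":
        return "green"  # still open
    return "grey"  # closed, or no longer present
```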
Click anywhere on an epic’s header row to expand it (not just the little chevron) and see/curate its members; the same spawn / promote / priority / schedule controls work on the todo rows there. A hover ✎ renames it.
Starting work — with or without a repo
A Claude session needs a working directory, but a repo-less epic todo doesn’t need you to set one up first:
- ▶ start on any repo-less epic todo starts a session immediately. With no repo bound to the epic it runs in a scratch workspace (a plain C:\git\<epic-slug>\ folder the agent creates — no GHE repo, no clone), pre-briefed with the epic goal as context plus the todo as the task.
- ▶ start whole epic (in the Todos header) starts one session that works all open todos on the epic together under the goal.
- Starting work on a planned epic auto-moves it to active; when every todo is closed, the board offers a one-click “mark epic done”.
Giving an epic a repo (optional)
Binding a primary repo is optional — it gives the epic a real code home so sessions run there instead of a scratch folder. Expand the epic and use the Repo row:
- Create repo — makes a brand-new private GHE repo under ProgressiveSurface (name pre-filled from the epic title; edit freely). Your connected csm agent runs gh repo create … --clone, so the repo is created on GHE and cloned to your machine in one step, then bound. (Needs a recent PSI.CSM agent build; until that ships, use Link existing.)
- Link existing — pick any repo the board already knows and bind it.
- unlink — detach the repo (the repo itself is never deleted).
Once a repo is bound, ▶ start on the epic’s todos runs there instead of a scratch workspace.
Sessions on an epic
Expand an epic and it tracks the sessions spawned for it — both those started
from its todos and the “whole epic” session. Each shows ↗ open (jump back
into a live session’s bridge) or ↻ resume (relaunch an ended one via
claude --resume). Because it reads board session state, the list survives
page reloads and dropped bridges, so you can always get back into the work.
Spawning Claude on an issue
Click “▶ start Claude ↗” on any open Issue or PR. The board sends a
spawn request through your connected csm agent, which runs
claude --remote-control "<project> #<issue>" in the repo directory on
your machine, pre-briefed with the issue title and URL.
A new browser tab opens to https://claude.ai/code/session_<id> —
that’s the live session, driveable from any device you’re signed into
claude.ai on. The actual claude process runs on your workstation,
under your Anthropic auth, with full access to your file system —
the board never runs Claude itself.
Repo not cloned on your machine? The button reads
“↓ clone & start ↗”. Click it, and the agent runs
gh repo clone ProgressiveSurface/<repo> into your project root
(typically C:\git\<repo>) before spawning. Single click.
Alt-click for an unattended (claude -p) headless run instead —
useful for “go fix this and report back” workflows. Bounded by
--max-turns 15 and --max-budget-usd 2.00 by default.
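The two launch modes above differ only in the command line the agent runs. A minimal sketch of how that argv might be assembled follows; the function name and parameters are hypothetical, the flags are the ones quoted in this section, and how the issue title and URL are delivered as a brief in the interactive case is not shown here.

```python
def build_claude_argv(project, issue, prompt, headless=False,
                      max_turns=15, max_budget_usd="2.00"):
    """Assemble the claude command line for a spawn (illustrative only)."""
    if headless:
        # Alt-click: unattended run, bounded by turn and budget caps.
        return ["claude", "-p", prompt,
                "--max-turns", str(max_turns),
                "--max-budget-usd", str(max_budget_usd)]
    # Normal click: interactive session published as a remote-control bridge.
    return ["claude", "--remote-control", f"{project} #{issue}"]
```

Passing the result as a list to `subprocess.Popen(argv, cwd=repo_dir)` avoids shell quoting problems with issue titles.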
What every spawn prompt carries
Whatever you (or the Board Manager) type as the task, the board wraps it with standing context before the worker sees it, so a cold session starts oriented:
- About this project — a short, board-maintained blurb of current
orientation a worker should know before planning (e.g. “active pilot —
the agent half ships to
main, not the release branch”). Edit it with the about button on a project card; the Board Manager can keep it current too (set_project_about). It’s deliberately short — a few sentences, not a second onboarding doc. - Read the repo’s docs first — a standing pointer telling the worker
to read the repo’s
CLAUDE_ONBOARDING.md(orCLAUDE.md) for orientation andBUILD_LOG.mdfor what shipped recently. The build log is pointed to, never pasted in — it belongs in the worker’s first read, not its prompt. - Ownership framing + PSI standards — chain-of-custody framing (own pre-existing code, don’t stop at “not my code”) and the GHE-only / compliance / finish-to-deployment baseline.
The “About” blurb is the half you maintain per project; the rest is automatic. Master-spawned workers additionally get the non-negotiable git guardrails (see below).
Picking a model (difficulty)
Next to every start / resume button is a small model picker. It
sets the Claude model the spawned session runs on (claude --model),
framed by how hard the task is:
| Pick | Model | Use for |
|---|---|---|
| Auto | (Claude Code default) | When you don’t care — leaves the choice to Claude Code |
| Fable · hardest | Fable 5 | Architectural / multi-repo / subtle work where a wrong call is expensive |
| Opus · complex | Opus | Real feature work or debugging |
| Sonnet · std | Sonnet | Well-scoped, clearly-specified work — a solid default |
| Haiku · simple | Haiku | Trivial mechanical edits: renames, doc tweaks, one-liners |
Your choice is remembered across buttons, so set it once and every spawn you start uses it until you change it. The session card shows which model a session was launched on. Picking a smaller model for simple work saves cost and time; reserve Fable/Opus for genuinely hard tasks. Resuming a session can pick a different model than the original run — handy to escalate to a bigger model when work turns out harder than expected.
The same picker drives the Board Manager launch in Mission Control:
choose which model the manager session itself runs on. The Board Manager,
in turn, picks an appropriate model for each worker it dispatches (it
assesses the task’s difficulty and passes a model to its spawn tools).
Spawn progress
While a spawn is in flight, the row under the issue shows a live
progress chip (routing → queued → cloning → launching → bridge-wait)
plus a heartbeat (“Still waiting on <machine>… 32s”) so you can tell
the difference between “the agent is busy cloning a 500 MB repo” and
“the agent fell over”. A fresh clone of a large PSI repo can take
~30–60 s; the heartbeat updates every 5 s so you know the channel
is alive.
The same progress signal surfaces on the Reconnect, Resume, and “Improve the Board” buttons.
Spawning Claude from a Todo
Click “▶ start” on any open Todo to spawn a Claude session with the todo text as the first prompt. Once the session is live, the row flips to show a pulsing live session chip and the button switches to “↗ open” linking straight to the live bridge — so the todo is visibly tied to its session instead of orphaned next to a “Spawned Sessions” panel below.
Bundling multiple items into one session
Each open Issue, PR, and Todo row has a small select checkbox (labeled so it isn’t confused with the row’s ✓ Close button, which marks a single todo done). Tick any combination within the same project and a green action bar appears at the top of the pane:
- ▶ Start Claude (N) — spawns one session with a single onboarding prompt that lists every selected item and asks Claude for one combined plan. Faster than launching N separate sessions, and the agent only clones the repo once.
- ↳ Assign to session ▾ — drops down a list of live sessions in the same project. Picking one routes the bundled items into that session (see “Assigning work into a running session” below). The button only appears when there’s at least one running session to target.
- ✕ clear — drops the selection.
Bundling is single-project on purpose — a Claude session runs in one repo, so Today’s cross-project view doesn’t expose bundle checkboxes. The selection is per-project too; switching projects in the sidebar resets it.
Assigning work into a running session
The board has two paths for delivering new work into a session that’s already running:
- Resume-with-prompt (always available). Spawns claude --resume <id> "<bundled prompt>". The previous claude.ai/code bridge URL goes away, a fresh one opens with the conversation continued. Works whether the target session is busy or idle. This is what “↳ Assign to session” does today.
- Stop-hook inbox (pilot, csm-board only at the moment). Each project that opts in has a .claude/hooks/check-inbox.py Stop hook plus a .claude-inbox/ directory. Writing a .md or .txt file there causes the running session to pick it up as a new user message at its next turn boundary — no new process, no bridge churn.

  Operable by hand today — echo "do the thing" > .claude-inbox/$(date -u +%Y%m%d-%H%M%S)-note.md in any csm-board checkout. The full board-driven flow (UI button → csm agent → file write on the right machine) is tracked work; see open Issues on csm-board.
The original plan was for “↳ Assign to session” to pick the path
automatically: inbox when the target session is busy (no bridge
disruption), resume-with-prompt when it’s idle (the Stop hook
won’t re-fire on its own).
The agent-side write path landed in PSI.CSM 1.10.19 —
POST /api/sessions/{id}/inject body {"text": "..."} routes an
inbox_inject over WS to the user’s agent, which drops a timestamped
.md into the right project’s .claude-inbox/. Sessions spawned from
the board are told (when the target project has the inbox installed)
to expect mid-conversation inbox messages and fold them into the
current plan instead of treating them as a new conversation.
The “↳ Assign to session” dropdown now picks the path automatically (shipped 2026-05-28). Each live session in the dropdown carries a small chip telling you up-front which path the click will use:
- via inbox (blue) — target session’s status === "running". The agent writes into .claude-inbox/; the Stop hook re-prompts at the next turn boundary. The existing claude.ai/code bridge URL stays alive.
- via resume (amber) — anything else (idle / starting / ended / unknown). Spawns claude --resume <id> "<prompt>" and opens a fresh bridge. The previous bridge URL goes away.
On an idle session inbox doesn’t work (the Stop hook can’t re-fire on its own to drain a queued file), so the UI deliberately routes through resume — predictable + immediate beats “queued indefinitely”. The same chip-and-tooltip pair appears in the Todos bundle bar.
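The routing rule above reduces to one comparison on the session status. The sketch below is illustrative: `plan_delivery` and the session dictionary keys are hypothetical names, while the `claude --resume <id> "<prompt>"` form is taken from the list above.

```python
def plan_delivery(session, prompt):
    """Pick the delivery path for new work into an existing session."""
    if session["status"] == "running":
        # Busy session: queue a file for the Stop hook, bridge URL unchanged.
        return {"path": "inbox", "text": prompt}
    # Idle, starting, ended or unknown: relaunch with the prompt attached.
    return {"path": "resume",
            "argv": ["claude", "--resume", session["claude_session_id"], prompt]}
```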
All four delivery surfaces into a running session — this UI dropdown,
the MCP inject_into_session(run_id, text) tool, the raw POST .../inject REST endpoint, and a hand-written file in the inbox dir —
converge on the same .claude-inbox/<utc-ts>-<slug>.md write, drained
by the same Stop hook.
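Since every surface ends in the same file write, that write is worth seeing on its own. This is a sketch under assumptions: the timestamp format mirrors the hand-written `date -u +%Y%m%d-%H%M%S` example earlier in this section, and `slugify` is a hypothetical stand-in for whatever slug rule the agent really uses.

```python
import re
from datetime import datetime, timezone
from pathlib import Path


def slugify(text, max_len=40):
    """Lowercase, hyphen-separated slug (assumed rule, not the agent's)."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return slug[:max_len].strip("-") or "note"


def drop_inbox_message(project_dir, text, now=None):
    """Write <utc-ts>-<slug>.md into the project's .claude-inbox/ directory."""
    now = now or datetime.now(timezone.utc)
    inbox = Path(project_dir) / ".claude-inbox"
    inbox.mkdir(parents=True, exist_ok=True)
    path = inbox / f"{now.strftime('%Y%m%d-%H%M%S')}-{slugify(text)}.md"
    path.write_text(text, encoding="utf-8")
    return path
```

The Stop hook then picks the file up at the next turn boundary; nothing here talks to the running process directly.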
Session-targeted drops (csm-board#13). The .claude-inbox/ is
project-scoped: any running session in that repo drains an ordinary
<ts>-<slug>.md file at its next turn boundary, which is fine when the
work just needs some session in the project. But the master agent’s
event-nudged wake targets one
specific session — the master — and must not be intercepted by a sibling
worker in the same repo. Those drops are named <ts>-FOR_<run_id>.md;
the Stop hook only drains a FOR_<run_id> file when its own
CSM_RUN_ID matches, and leaves it untouched otherwise so the addressed
session still gets it. The FOR_ marker is uppercase + underscore — which
the slug path can never emit — so ordinary messages are never mistaken
for targeted ones. Agent-side writer shipped in PSI.CSM 1.11.12.
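The drain decision for targeted drops can be sketched as a filename check. The helper names are hypothetical and the real check-inbox.py may differ; the `FOR_<run_id>` convention, the accepted extensions, and the `CSM_RUN_ID` variable come from this section.

```python
import os
from pathlib import PurePath


def addressed_run_id(filename):
    """Return the run id a drop is addressed to, or None for ordinary drops."""
    stem = PurePath(filename).stem
    marker = "-FOR_"
    if marker not in stem:
        return None
    return stem.split(marker, 1)[1]


def should_drain(filename, own_run_id):
    """True if this session may consume the file at its next turn boundary."""
    if PurePath(filename).suffix not in (".md", ".txt"):
        return False
    target = addressed_run_id(filename)
    # Ordinary drops go to any session; targeted ones only to their addressee.
    return target is None or target == own_run_id


# In a hook this would typically be called with the injected environment value:
OWN_RUN_ID = os.environ.get("CSM_RUN_ID")
```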
Background: Anthropic’s claude inject feature request
(#24947)
covers the same need with a first-party CLI flag. When it ships, the
inbox pilot collapses into a thin wrapper around it.
Reporting work done — the outbox (.claude-outbox/)
The inbox delivers work into a session; the outbox is the return path — a finished session telling the board “I’m done.”
When a session completes, it drops a marker in its repo’s
.claude-outbox/ directory. The csm agent watches every project’s
outbox, drains the marker, and relays it to the board, which then:
- flips the session card to a green ✓ Session reported done banner showing the one-paragraph summary the session wrote; and
- if the marker asked to close the originating issue, surfaces a one-click ✓ Close <repo>#N button. Closing the GHE issue posts the session summary as a closing comment.

  Two paths to that close: the user clicks it (always available), or the master agent does it autonomously at spawn_capped+ tier — but only when the session’s outbox marker set close_issue: true (so the trust signal still comes from the session, not the master’s judgment). At the default Groom tier and below, the master never touches GHE — the click stays the human’s.
Close is a true close. Closing a session — either the row’s ✓ Close
button or the ✓ Close <repo>#N issue button — now also reaps the
session on your workstation: it kills the claude process and closes the
terminal window that was opened for it, rather than just hiding the card. (Each
board-spawned session runs in a console the agent owns via conhost, so the
window can be closed deterministically regardless of whether your default
terminal is Windows Terminal or the classic console host.) If your agent is
offline or too old to reap, the close still applies as a hide and the window
may linger until you close it manually. Reopening a closed card brings the
card back but does not respawn the process.
Sessions spawned from the board already know to do this: the spawn
prompt tells them to write .claude-outbox/done.json as their final
action, in the form:
{
  "summary": "One short paragraph on what got accomplished.",
  "close_issue": true,
  "issue_number": 2
}

(close_issue/issue_number are optional — included only when the
issue is genuinely resolved. A plain .md/.txt file is also accepted
as a summary-only marker.)
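For reference, writing and reading such a marker could look like the following. The function names are hypothetical and the agent's own parser (board/outbox.py) is not reproduced here; only the file location and field names come from the text above.

```python
import json
from pathlib import Path


def write_done_marker(repo_dir, summary, issue_number=None):
    """Write .claude-outbox/done.json; close fields only when an issue is given."""
    marker = {"summary": summary}
    if issue_number is not None:
        marker["close_issue"] = True
        marker["issue_number"] = issue_number
    outbox = Path(repo_dir) / ".claude-outbox"
    outbox.mkdir(parents=True, exist_ok=True)
    path = outbox / "done.json"
    path.write_text(json.dumps(marker, indent=2), encoding="utf-8")
    return path


def read_marker(path):
    """Parse a JSON marker, or treat a .md/.txt file as summary-only."""
    path = Path(path)
    text = path.read_text(encoding="utf-8")
    if path.suffix == ".json":
        return json.loads(text)
    return {"summary": text.strip()}
```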
Git self-report (git_actions). Sessions spawned by the master
agent are asked to add an honest summary of their git activity to
done.json:

"git_actions": {"branch": "feature/x", "commits_added": 3,
                "pushed": true, "push_target": "feature/x", "force": false}

The board uses it as a master-autonomy guardrail (csm-board#10 layer 2):
for master-spawned sessions only, a force-push (force: true), a push
to main/master, or a commit onto main/master flags the card as a
policy violation — the issue-close is withheld, the master may not
auto-close or groom the card away, and a red policy_violation row lands
in the WORM audit log. User-direct sessions report git_actions too but
are never flagged (this is a master-autonomy guardrail, not a global
lockout). The field is optional; omitting it just skips the check.
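The three flagged conditions can be expressed as a small check over the self-reported fields. This is an illustrative sketch: the function name and reason strings are hypothetical, and reading "a commit onto main/master" as `commits_added > 0` on a protected `branch` is an assumption about how the board interprets the report.

```python
PROTECTED_BRANCHES = {"main", "master"}


def git_policy_violations(git_actions, master_spawned):
    """Return a list of violation reasons; empty means nothing to flag."""
    if not master_spawned or not git_actions:
        return []  # user-direct sessions and missing reports are never flagged
    reasons = []
    if git_actions.get("force"):
        reasons.append("force-push")
    if git_actions.get("pushed") and git_actions.get("push_target") in PROTECTED_BRANCHES:
        reasons.append("push to protected branch")
    if git_actions.get("commits_added", 0) > 0 and git_actions.get("branch") in PROTECTED_BRANCHES:
        reasons.append("commit onto protected branch")
    return reasons
```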
Blocked, not done. If a session stops on a genuine blocker it writes
.claude-outbox/blocked.json instead — {"summary": "...", "blockers": ["..."]}. The board flags the card ⚠ Session reported blocked (amber)
rather than “done” and leaves it open; no issue close is offered.
Completion self-check. A one-shot Stop hook
(.claude/hooks/check-completion.py) nudges a session, the first time it
tries to end without a marker, to self-review: was the original request
carried to deployment standards? If done → run /wrap-up + write
done.json; if stuck → write blocked.json. It fires once per session
(a per-session sentinel guards it) — a reminder, not a gate, so it never
loops or traps an interactive session.
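The "fires once, then gets out of the way" behaviour can be sketched with a sentinel file. Everything here is an assumption made for illustration: the function name, the sentinel location, and the returned `{"decision": "block", "reason": ...}` shape, which reflects my understanding of how a Claude Code Stop hook asks the session to continue and should be checked against the hooks reference before reuse.

```python
from pathlib import Path


def completion_nudge(session_id, repo_dir, sentinel_dir):
    """Return a one-time reminder payload, or None to let the session end."""
    outbox = Path(repo_dir) / ".claude-outbox"
    if (outbox / "done.json").exists() or (outbox / "blocked.json").exists():
        return None  # a marker is already present
    sentinel = Path(sentinel_dir) / f"{session_id}.nudged"
    if sentinel.exists():
        return None  # already reminded once; never loop
    sentinel.parent.mkdir(parents=True, exist_ok=True)
    sentinel.touch()
    return {
        "decision": "block",
        "reason": ("Before ending: was the original request carried to deployment "
                   "standards? If done, run /wrap-up and write done.json; "
                   "if stuck, write blocked.json."),
    }
```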
Operable by hand for testing —
echo '{"summary":"manual test"}' > .claude-outbox/done.json in a
checkout the agent is watching; the session card flips to “reported
done” within one agent state cycle (~30 s).
Fully wired end-to-end as of agent 1.10.18
(winget upgrade PSI.CSM --source PSI). The watcher
(claude-session-manager board/outbox.py) drains every project’s
outbox on each state-pump cycle (~30 s), so a marker dropped now
surfaces as a “reported done” banner within one cycle. Older agents
still work — they just never relay session_done, and the board
handler stays dormant for that user until they upgrade.
Work-item lineage (which session served which issue/todo)
A session spawned from a Todo or a GHE Issue now carries a durable link back to the work item it serves (csm-board#14). The link is recorded board-side at spawn — it survives an agent disconnect, a board redeploy, and a session resume — so the connection is never lost.
What it buys you:
- Session → work item. Each session card shows a chip naming the Todo or Issue it serves (the issue chip links straight to GHE).
- Work item → session. A Todo or Issue currently being worked shows a pulsing live session chip with an ↗ open link to its bridge — the reverse of the card chip.
- Resolve-on-done. When a session reports done (its .claude-outbox/done.json), a session linked to a Todo auto-completes that todo — finishing the session resolves its work item instead of silently leaving it open. A session linked to a GHE Issue keeps the existing rule: if its marker set close_issue, the ✓ Close <repo>#N button is staged for your (or the master’s, at spawn_capped+) confirmation; the link just makes the association explicit even for summary-only markers. A blocked report resolves nothing.
Sessions started outside the spawn flow can be retro-linked right from the
session card (csm-board#19): an unlinked card shows a ⛓ Link to… control
that opens a picker — choose any project, then one of its open issues or todos
— and a linked card shows change · unlink next to its chip. The picker
defaults to the session’s own project but lets you target any project, so a
mis-linked session can be relinked across repos or cleared entirely. (Under the
hood it’s POST /api/sessions/{id}/attach; the reverse index is queryable at
GET /api/sessions/for-work-item.) Sessions from before this shipped simply
carry no link and behave exactly as before.
The Board Manager dispatch path links too (csm-board#17). The MCP
spawn_session / resume_session tools take an optional work_item
argument ({kind:"issue"|"todo", number|todo_id, project_key, slug?}); the
Board Manager passes it whenever it dispatches a worker to serve a known
issue or todo, so dispatched sessions show their connection just like
UI-spawned ones. As a fallback, when no explicit work_item is given the
board infers an <owner/repo>#N reference from the spawn prompt and links
that — so even a dispatch that only names its issue in prose shows up under
the issue.
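The prose fallback is essentially a pattern match on the prompt. A minimal sketch, assuming the first `<owner/repo>#N` occurrence wins and that `project_key` is the `owner/repo` string (both are assumptions; the board's real inference rule is not documented here):

```python
import re

# owner/repo#123, with the character set GitHub-style names usually allow
WORK_ITEM_REF = re.compile(r"\b([A-Za-z0-9_.-]+)/([A-Za-z0-9_.-]+)#(\d+)\b")


def infer_work_item(prompt):
    """Return a work_item dict for the first issue reference, else None."""
    match = WORK_ITEM_REF.search(prompt)
    if match is None:
        return None
    owner, repo, number = match.groups()
    return {"kind": "issue", "number": int(number),
            "project_key": f"{owner}/{repo}"}
```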
Reconnecting to an existing session
Spawned-session rows show a ↻ Reconnect ↗ button only once the
session is no longer live with a working bridge — i.e. it ended
(completed / failed) or its bridge URL was lost. While the session
is running or starting with a bridge, the row shows the single
Open in Claude Code ↗ button; Reconnect would be confusing
because the session is already reachable.
Reconnect tells the agent to
claude --resume <id> --remote-control "<name>", which publishes a
fresh claude.ai/code bridge URL and opens it.
A resumed session stays in the same Spawned Sessions row — the
board treats records that share a claude_session_id as one logical
session, so closing a terminal and reconnecting later doesn’t pile up
duplicate cards.
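Collapsing records into one card per logical session is a group-by on `claude_session_id`. In this sketch the `started_at` field and the "latest record represents the card" rule are assumptions for illustration.

```python
def logical_sessions(records):
    """Group run records by claude_session_id; keep the newest per group."""
    grouped = {}
    for record in records:
        grouped.setdefault(record["claude_session_id"], []).append(record)
    return {sid: max(runs, key=lambda r: r["started_at"])
            for sid, runs in grouped.items()}
```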
Session status is board-authoritative
The board — not your workstation agent — owns each session’s lifecycle.
A session moves spawning → running → ended/failed, and once it reaches
a terminal state the board never lets a stale agent message flip it back
to running. This fixes the old annoyance where a session that had
clearly finished still showed running forever, and dead ones rotted
to an unknown state.
Three things keep the board honest:
- Heartbeat. Every agent update refreshes a “last seen” timestamp on the session. If a running session goes quiet for ~15 minutes (its terminal was closed, the machine slept, the process crashed), the board marks it ended on its own — no manual cleanup needed.
- Reconnect reconciliation. When your agent reconnects (e.g. after a reboot or a network drop), it tells the board which sessions it still has. Any session the board thought was running but the agent no longer tracks is marked ended. Sessions still alive stay live — and any session the board had ended during a brief disconnect but that your agent still has running is revived back to running automatically.
- Blackout-safe. A transient WebSocket drop used to look, to the board, like every one of your sessions going quiet at once — which could falsely end live sessions. The heartbeat cleanup now only ever ends a session when your agent is actually connected and simply isn’t reporting it; while your agent is offline, your sessions are left untouched until it reconnects and reconciles. Keep your csm agent on 1.11.11+ for the reconnect reconciliation to work both ways.
You don’t configure any of this — it just keeps the board’s session view true to reality. A session that ended this way can still be resumed with ↻ Reconnect ↗ like any other.
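The lifecycle rules in this section can be sketched as three small functions over a session record. This is a simplified model, not the board's code: the record fields, function names, and use of plain epoch seconds are assumptions, and revival is shown only for sessions in the ended state.

```python
TERMINAL = {"ended", "failed"}
HEARTBEAT_TIMEOUT_S = 15 * 60  # "~15 minutes" from the text


def apply_agent_update(session, status, now):
    """Refresh last-seen; never let a stale message undo a terminal state."""
    session["last_seen"] = now
    if session["status"] not in TERMINAL:
        session["status"] = status


def sweep(sessions, now, agent_connected):
    """End quiet running sessions, but only while the agent is connected."""
    if not agent_connected:
        return  # blackout-safe: wait for reconnect and reconcile
    for s in sessions:
        if s["status"] == "running" and now - s["last_seen"] > HEARTBEAT_TIMEOUT_S:
            s["status"] = "ended"


def reconcile(sessions, agent_running_ids, now):
    """On reconnect, align board state with what the agent still runs."""
    for s in sessions:
        if s["run_id"] in agent_running_ids:
            if s["status"] == "ended":
                s["status"] = "running"  # revived after a brief disconnect
            s["last_seen"] = now
        elif s["status"] == "running":
            s["status"] = "ended"  # agent no longer tracks it
```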
On mobile
The board is responsive and usable from a phone. The header wraps onto multiple rows instead of overflowing; the Projects view becomes a master/detail flow — you see the full-width project list, tap a project to open its detail, and tap ← All projects to go back; the By Attention kanban collapses from five columns down to one or two.
Open in Claude Code ↗ opens a normal browser tab, not the native
Claude mobile app. The bridge is a claude.ai/code URL that the app
would otherwise claim via deep linking, but the app can’t reach a
developer’s locally-tunnelled bridge — so the board forces the link to
stay in the browser.
Manual (non-repo) projects
Click “+ new project (non-repo)” in the Projects sidebar to create a project that doesn’t correspond to a GHE repo — useful for proofs of concept, scratchpads, ad-hoc tracking. Manual projects show Todos in place of Issues/PRs. Delete them from the bottom of their detail pane.
All repos vs Local only
The header toggle 🌐 All repos / 📍 Local only controls whether the board shows every PSI GHE org repo (default — useful for PM / discovery) or just the ones cloned on at least one of your connected agents. Persisted in your browser; doesn’t affect what your agent streams.
Drive the board from Claude Code (MCP)
The board ships an MCP server (in the mcp/ directory of the
csm-board repo) that exposes every board action as a tool, so a
“project-manager” Claude Code session can do anything you’d do in the
web UI — read and write todos, manage projects, list and close
sessions, and even spawn or resume Claude sessions on your agent.
Setup (requires a logged-in csm agent on the same machine):
cd csm-board/mcp
pip install -e .

It’s pre-wired into Claude Code via the repo-root .mcp.json. The
server reuses your agent’s token cache (~/.claude/csm-board-token.json)
silent-only — it never prompts for a login of its own. Because Claude
Code launches it as your OS user, it can only ever read your own token;
no other user’s board is reachable. If you’re not signed in, tools fail
with a hint to run csm agent first.
Configure (Claude Code)
Pre-wired via the repo-root .mcp.json:
{
  "mcpServers": {
    "csm-board": {
      "command": "python",
      "args": ["-m", "csm_board_mcp"],
      "env": { "CSM_BOARD_URL": "https://board.progressivesurface.com" }
    }
  }
}

CSM_BOARD_URL is optional — defaults to production; point it at
http://localhost:8000 to develop against a local backend.
Tool reference
The tools, grouped by area (57 are listed in this table). Everything is scoped to the signed-in user (the OID behind the reused token):
| Area | Tools |
|---|---|
| Identity / health | whoami, board_health |
| Agents | list_agents, get_agent_logs |
| Projects | list_projects, list_manual_projects, create_manual_project, delete_manual_project |
| Project flags | snooze_project, pin_project, set_project_note |
| Todos (read) | list_todos (per project), list_todos_today (cross-project) |
| Todos (write) | add_todo, update_todo, complete_todo, delete_todo, promote_todo_to_issue |
| Epics | list_epics, get_epic, create_epic, update_epic, create_epic_repo, delete_epic, reorder_epics, assign_todo_to_epic, add_issue_to_epic, remove_issue_from_epic |
| Sessions | list_sessions, tail_session, close_session, close_session_issue, reopen_session, spawn_session, resume_session, terminate_session, inject_into_session |
| Change feed (WORM) | get_changes, verify_audit |
| Master agent | get_master, start_master, stop_master, set_master_caps, act_as_master, act_as_user, advance_master_cursor, get_master_log, post_master_log |
| Actions log (WORM) | get_actions, verify_actions |
| Digests | list_digests, post_digest |
| Approvals queue | list_approvals, enqueue_approval, approve_proposal, deny_proposal, mark_proposal_executed |
spawn_session / resume_session consume the board’s NDJSON progress
stream and block until the agent returns a result or the board’s
5-minute deadline trips, then return
{ok, run_id, bridge_url, status, stages}. bridge_url is the
claude.ai/code link to drive the spawned session.
promote_todo_to_issue, spawn_session, and resume_session all
route through a connected csm agent — they fail with a 503-style
error if none is online (check list_agents).
tail_session(run_id, n=1) returns a running session’s latest
assistant message — what it’s currently doing, not just its status —
so you can confirm progress, spot a stuck or wandering session, or report
what each session is working on without opening its bridge URL. It’s a
cheap read of board state (no agent round-trip): each connected csm agent tails its live sessions’ transcripts and pushes the snapshot every
~30 s, so the response includes stale_seconds telling you how fresh it
is. The same snapshot shows on the session card as a pulsing “Doing
now” block while a session is live.
Master agent (autonomous grooming)
The board ships a Board Manager subagent at
.claude/agents/board-manager.md in the csm-board repo. It’s a Claude Code
agent that uses the MCP server to groom the board on a schedule — read the
change feed, close outbox-confirmed-done sessions, reprioritize today’s
todos, post a digest, and enqueue spawn proposals for human approval.
Autonomy tiers (set with start_master(tier=...)):
| Tier | Reads | Reversible writes | GHE writes + terminate | Spawn |
|---|---|---|---|---|
| observe | ✓ | ✗ | ✗ | ✗ |
| groom (default) | ✓ | ✓ close_session (no pending_issue), reopen, todo CRUD, pin/snooze/note, digest, log, cursor, enqueue approval | ✗ | ✗ |
| spawn_capped | ✓ | ✓ | ✓ close_session_issue (when pending_issue set), terminate (when closed/terminal), promote_todo_to_issue, inject_into_session | ✓ within parallelism + repo-allowlist caps |
| autonomous | ✓ | ✓ | ✓ | ✓ without per-action approval |
Trust-the-marker, not blanket bans. At spawn_capped+ the master can close a GHE issue or terminate a process — but only when the session itself set the trust signal:
- close_session_issue requires the card to carry pending_issue — the session’s .claude-outbox/done.json asked for the close via close_issue: true. Master can’t fabricate that. The full round-trip (post done_summary as closing comment, close GHE issue, close card) runs in one call via the user’s gh auth on their agent.
- terminate_session requires the card to be closed or in terminal status (ended / completed / failed). Never reaps live running work. The natural lifecycle: close (or close_session_issue), then terminate.
- promote_todo_to_issue is tier-only; legitimate user-delegate work.
- inject_into_session is tier-only (the inbox writer); lets the master push new turns into a running session via .claude-inbox/. Used to: answer a waiting-on-you session’s question, fold a new P0 todo into a running worker’s plan, or relay user redirection (“focus PRGJSMES today”) to active workers. Half of the master↔worker channel (the outbox is the other half); without this the master could only spawn new workers, never steer existing ones.
Groom tier is the conservative default — you can leave the master on it indefinitely without GHE side effects. Opt up to spawn_capped to grant end-to-end session lifecycle ownership.
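The "tier plus trust signal" rule can be sketched as two predicates. The tier names and the `pending_issue` field come from this section; the function names and the card's `closed` and `status` keys are hypothetical shapes chosen for the example.

```python
TIERS = ["observe", "groom", "spawn_capped", "autonomous"]
TERMINAL_STATUSES = {"ended", "completed", "failed"}


def tier_at_least(tier, minimum):
    """True if `tier` grants at least the autonomy of `minimum`."""
    return TIERS.index(tier) >= TIERS.index(minimum)


def may_close_session_issue(tier, card):
    """Needs spawn_capped+ AND a close request that came from the session."""
    return tier_at_least(tier, "spawn_capped") and bool(card.get("pending_issue"))


def may_terminate(tier, card):
    """Needs spawn_capped+ AND a card that is closed or already terminal."""
    finished = bool(card.get("closed")) or card.get("status") in TERMINAL_STATUSES
    return tier_at_least(tier, "spawn_capped") and finished
```

Keeping the tier check and the marker check as separate conditions makes it clear that raising the tier alone never unlocks a close.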
WORM accountability. Every master action lands in an append-only,
hash-chained action log per user with actor=master:<oid> and
authorized_by=tier:<tier> (or approval:<id>). get_actions(actor=...) /
verify_actions give you full audit. The change-event log (events.jsonl)
sits alongside; the master’s actions show in both — change for “what
changed” with no actor, action for “who did it under what authorization”.
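To illustrate what “append-only, hash-chained” buys, here is a minimal sketch of such a log. The entry layout, the SHA-256 choice, and the function names append_action / verify_chain are assumptions for illustration; the board’s real get_actions / verify_actions tools may work differently.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash for the first entry's predecessor


def _entry_hash(prev_hash: str, body: dict) -> str:
    """Hash the previous entry's hash together with this entry's canonical JSON."""
    payload = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_action(log: list, actor: str, authorized_by: str, action: str) -> dict:
    """Append one action, linking it to the hash of the entry before it."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    body = {"actor": actor, "authorized_by": authorized_by, "action": action}
    entry = {**body, "prev": prev_hash, "hash": _entry_hash(prev_hash, body)}
    log.append(entry)
    return entry


def verify_chain(log: list) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        body = {k: entry[k] for k in ("actor", "authorized_by", "action")}
        if entry["prev"] != prev_hash or entry["hash"] != _entry_hash(prev_hash, body):
            return False
        prev_hash = entry["hash"]
    return True
```

Because each hash covers its predecessor, rewriting an old entry invalidates every entry after it, which is what makes after-the-fact tampering detectable.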
Approval queue. When the master wants to spawn, it enqueue_approval
with a kind=spawn proposal (summary + est_cost + payload). You approve
in the UI or via MCP (approve_proposal); on the master’s next wake it
consumes the approved entry, spawns, and marks it executed. The board
refuses any master spawn that would exceed parallelism_cap or target a
repo outside repo_allowlist.
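The cap enforcement and the approve-then-execute step might look roughly like the sketch below. The proposal dictionary shape (status, payload["repo"]) and both function names are hypothetical; parallelism_cap, repo_allowlist, kind=spawn, and the approved / executed states are taken from the paragraph above.

```python
def check_master_spawn(repo: str, running_master_spawns: int,
                       parallelism_cap: int, repo_allowlist: set) -> tuple:
    """Return (allowed, reason) for one proposed master spawn."""
    if repo not in repo_allowlist:
        return (False, "repo outside repo_allowlist")
    if running_master_spawns + 1 > parallelism_cap:
        return (False, "would exceed parallelism_cap")
    return (True, "ok")


def consume_approved(proposals: list, running: int,
                     parallelism_cap: int, repo_allowlist: set) -> list:
    """On wake: execute approved spawn proposals that still fit the caps."""
    spawned = []
    for proposal in proposals:
        if proposal["kind"] != "spawn" or proposal["status"] != "approved":
            continue  # unapproved entries are never executed
        ok, _reason = check_master_spawn(
            proposal["payload"]["repo"], running + len(spawned),
            parallelism_cap, repo_allowlist,
        )
        if ok:
            proposal["status"] = "executed"
            spawned.append(proposal["payload"]["repo"])
    return spawned
```

In this sketch an approved proposal that no longer fits the caps simply stays approved for a later wake, which is one plausible policy rather than a documented one.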
Scheduling. The user picks: /schedule for remote cron, /loop for
in-session repeats, or ad-hoc. The master state persists per-user, so each
wake reads master.cursor and get_changes(since=cursor) rather than the
full session list. Cold-start is bounded (the feed returns a compact
digest, not 200 records).
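The cursor pattern can be sketched in a few lines. The event shape (a monotonically increasing seq field) and the in-memory list are assumptions for illustration; the real get_changes is an MCP tool backed by the board’s change-event log.

```python
def get_changes(events: list, since: int) -> list:
    """Return only the events recorded after the given cursor."""
    return [event for event in events if event["seq"] > since]


def wake(state: dict, events: list) -> list:
    """One master wake: read the delta since the cursor, then advance the cursor."""
    changes = get_changes(events, state["cursor"])
    if changes:
        state["cursor"] = changes[-1]["seq"]
    return changes
```

A second wake with no new events returns an empty list and leaves the cursor untouched, so the cost of each wake scales with what changed rather than with the total number of sessions.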
Orchestration tree. Mission Control renders the live session hierarchy:
the master at the root, the workers it spawned beneath it, and — when a
worker spawns its own sub-workers — those sub-spawns nested under their
parent, n levels deep. The nesting is driven by a board-owned
parent_run_id on each session: a spawning session advertises its own board
run id (via the CSM_RUN_ID env the agent injects) on the spawn call, and the
board threads it onto the new session. Sessions with no resolvable parent fall
into the two top-level groups — under the master root (if master-spawned) or
the “Your direct spawns” cluster.
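A rough sketch of how a flat session list could be folded into that hierarchy. Only parent_run_id comes from the description above; the run_id and master_spawned field names, the "direct" group key, and both functions are hypothetical.

```python
from collections import defaultdict


def build_tree(sessions: list, master_run_id: str) -> dict:
    """Group session run ids under their parent; unresolved parents go top-level."""
    known = {s["run_id"] for s in sessions}
    children = defaultdict(list)
    for s in sessions:
        parent = s.get("parent_run_id")
        if parent in known or parent == master_run_id:
            children[parent].append(s["run_id"])          # resolvable parent: nest
        elif s.get("master_spawned"):
            children[master_run_id].append(s["run_id"])   # fall back to master root
        else:
            children["direct"].append(s["run_id"])        # "Your direct spawns"
    return dict(children)


def render(children: dict, root: str, depth: int = 0) -> list:
    """Depth-first listing with two-space indentation per nesting level."""
    lines = ["  " * depth + root]
    for child in children.get(root, []):
        lines.extend(render(children, child, depth + 1))
    return lines
```

The sketch assumes parent links never form a cycle; a production version would need to guard against that before recursing.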
Title-bar control (every view). You don’t have to open Mission Control to reach the Board Manager — the main title bar carries a persistent control (csm-board#30):
- When a master is running it shows a green Board Manager indicator that links straight to the current run’s bridge URL — one click opens the live master session in Claude Code to chat with it or interject directions. It always follows the latest run.
- When none is running it shows a Launch Board Manager button that spawns a real claude --remote-control master session (the same boot prompt and start path as Mission Control’s “spawn master session”: shared, never duplicated), then flips to the link state on the next poll.
- If your csm agent is offline the launch can’t host the session and 503s; the bar surfaces a plain “agent offline — start csm agent” hint inline instead of breaking. If the master’s session dies under you, the control offers a Relaunch instead.
Design: docs/MASTER_AGENT.md in the csm-board repo.
Digest → Teams (PSI Notify Bot)
Each groom cycle the Board Manager posts an executive digest (the
post_digest MCP tool → POST /api/digests). Optionally, the board pushes that
same digest into a persistent Microsoft Teams chat — “CSM Board Manager” —
via PSI’s shared Notify Bot, so exec
summaries and Action needed items reach Adam in Teams without opening the
board (csm-board#24).
- Second sink, never the primary. The board-side digest is unchanged and authoritative; the Teams push is an additional, best-effort sink. It’s fail-safe: any notify error (flag off, missing secret, token/HTTP failure, timeout) is caught and logged — it can never break the groom or the board digest.
- Config-gated by the app setting CSM_TEAMS_NOTIFY (defaults OFF in code). Provisioned and live in production since 2026-06-02 (=1).
- Adaptive Card by default. The digest renders as an Adaptive Card: a title, an “as of seq N” subtitle, the repo/epic body, and the contract’s closing Action needed section lifted into its own attention-styled container. Set CSM_TEAMS_NOTIFY_FORMAT=markdown to fall back to a plain markdown message.
- Deep-link buttons (csm-board#33, Tier 1). The card carries Action.OpenUrl buttons so you can act on a digest from Teams: “Open on the board ↗” (the Mission view, via ?view=mission) plus one button per GHE issue/PR the digest references (repo#N / owner/repo#N → the GHE issue). These are links only: there is no inbound bot endpoint, so the shared Notify Bot stays notification-only. Two-way (reply-to-steer) is Conversational Teams below (csm-board#41). Board base URL is CSM_BOARD_PUBLIC_URL.
- De-duplicated. An identical back-to-back digest is not re-sent (grooms are ~2 h apart, so volume is low).
- Python, not PowerShell. Every other Notify-Bot consumer uses the PSI.Notify PowerShell module; csm-board is Python/FastAPI on Linux and cannot, so api/notify.py ports the token flow + Bot Framework send call to Python (httpx). It is send-only; the persistent chat is bootstrapped out of band by an operator (the three Graph chat-creation pitfalls live in that PowerShell bootstrap, not here).
- Secrets come from ps-certificates-kv (psi-notify--*) as Key Vault references on App Service app settings, resolved by the csm-board App Service managed identity. Nothing is hardcoded and the bot token is never logged.
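As an illustration of the deep-link step, the sketch below scans digest text for repo#N and owner/repo#N references and emits Adaptive Card Action.OpenUrl entries. The regular expression, the function name, the default-owner fallback, and the GHE host passed in are all assumptions; the real api/notify.py may extract references differently.

```python
import re

# Optional "owner/" prefix, then repo name, then "#<number>".
REF_RE = re.compile(r"(?<![\w/.-])(?:([\w.-]+)/)?([\w.-]+)#(\d+)\b")


def deep_link_actions(digest_text: str, board_url: str,
                      ghe_url: str, default_owner: str) -> list:
    """Build Action.OpenUrl buttons: one for the board, one per referenced issue/PR."""
    actions = [{
        "type": "Action.OpenUrl",
        "title": "Open on the board ↗",
        "url": f"{board_url.rstrip('/')}/?view=mission",
    }]
    seen = set()
    for owner, repo, number in REF_RE.findall(digest_text):
        key = (owner or default_owner, repo, number)
        if key in seen:
            continue  # one button per distinct reference
        seen.add(key)
        actions.append({
            "type": "Action.OpenUrl",
            "title": f"{key[0]}/{repo}#{number}",
            "url": f"{ghe_url.rstrip('/')}/{key[0]}/{repo}/issues/{number}",
        })
    return actions
```

Pointing every reference at the issues path is a simplification that relies on the GitHub behavior of redirecting an issue URL to the pull request when the number belongs to a PR.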
Conversational Teams (two-way) — csm-board#41
Takes the integration from send-only to a bi-directional chat loop: message
the Board Manager from inside the “CSM Board Manager” chat — ask questions,
steer it, act on Action-needed items — and get its replies back in the same
thread. LIVE as of 2026-06-24 — the dedicated bot is provisioned and the
round-trip is verified end-to-end (DM the Board Manager, it replies in-thread).
The dedicated bot is csm-board-bot (App ID 2bb0664c-5ec8-4390-95af-1d08c12fb6a8),
single-tenant, messaging endpoint POST /api/teams/messages, gated by the app
setting CSM_TEAMS_INBOUND=1 (returns 404 when unset). Setup steps in the
operator runbook.
- Dedicated interactive bot, not the shared one. The shared PSI Notify Bot stays isNotificationOnly; csm-board stands up its own bot so making it interactive can’t regress deploy-proapps/egnyte/prgjsmes. The dedicated bot owns csm-board’s chat in both directions (so Action.Submit card replies route to our endpoint). Decision rationale in the #41 design comment.
- Inbound endpoint POST /api/teams/messages on the existing FastAPI app, not Entra-authed. Each activity carries a Bot Framework JWT validated per request (api/teams_auth.py: RS256/BF-JWKS, audience = our bot app id, BF issuer, serviceUrl host allowlist + channel binding so the bot’s bearer token can never be sent to a spoofed host). A third non-Entra client class alongside the SPA and the agent (see ADR-0028); it lives at the app layer, never behind an Entra edge pre-auth.
- Identity = from.aadObjectId. The Teams sender’s Entra Object ID is the board OID, so an inbound action only ever steers that user’s master and touches that user’s partition (blast-radius scoping for free). No aadObjectId → declined.
- Routing + active-wake. Delivery depends on the master’s liveness (mirrors the UI’s chooseAssignPath): a running master gets a FOR_<run_id> file in its .claude-inbox/ (drained at the next turn; bridge survives); a dormant master is resumed with the message (claude --resume <claude_session_id> …) so it wakes and replies in seconds. An idle session’s Stop hook won’t re-fire to drain an inbox file, and you can’t inject input into a live detached remote-control session, so resume is the only re-prompt lever. Either way the master answers via the reply_to_teams MCP tool → POST /api/teams/reply → the reply threads back into the captured conversation. The conversation reference is stored on master.teams_conversation (survives run_id churn); resume keys off the stable claude_session_id, not the volatile board run_id.
- Structured quick-reply verbs (Phase B). Action.Submit buttons on digest cards (only when the interactive bot is provisioned). focus <repo> is a master steer (→ inbox); snooze <project> is a direct board write that mirrors the UI click (attributed to the user, authorized_by=teams). Unknown verbs fall back to the master inbox. resolve + spawn-on-item are Phase C.
- Fail-safe + dark. Disabled/unprovisioned ⇒ the endpoint returns 404 (reveals nothing); the reply path never raises.
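The identity and routing rules above reduce to two small decisions, sketched here. The dictionary shapes, the status value "running" as the liveness test, and the function names are assumptions; the FOR_<run_id> file name and the resume-by-claude_session_id rule come from the list above. The remaining arguments of the resume command are elided in the source, so the sketch returns the session id and message rather than a full command line.

```python
from typing import Optional


def sender_oid(activity: dict) -> Optional[str]:
    """Extract the Teams sender's Entra Object ID; None means the message is declined."""
    return (activity.get("from") or {}).get("aadObjectId") or None


def choose_delivery(master: dict, text: str) -> dict:
    """Pick the delivery path from the master's liveness."""
    if master.get("status") == "running":
        # Live master: drop a file into its inbox, drained at the next turn.
        return {
            "path": "inbox",
            "file": f".claude-inbox/FOR_{master['run_id']}",
            "body": text,
        }
    # Dormant master: resume by the stable Claude session id, not the board run id.
    return {
        "path": "resume",
        "session_id": master["claude_session_id"],
        "message": text,
    }
```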
Provisioning (one-time, Azure/Teams-side — done 2026-06-02 for csm-board#24):
- Key Vault access: access policy, not RBAC. ps-certificates-kv runs in access-policy mode (enableRbacAuthorization: false), so the “Key Vault Secrets User” RBAC role has no effect; the csm-board App Service managed identity was granted get/list on secrets via az keyvault set-policy.
- Network path to the vault. ps-certificates-kv is defaultAction: Deny and (per the 2026-06-01 KV/winget incident) App Service KV-reference resolution does not get the AzureServices firewall bypass; it needs to reach the vault’s private endpoint (ps-certificates-kv-pe, 10.160.140.23). csm-board was given regional VNet integration into ps-vnmain/ps-webapps (the subnet allowlisted on the vault); RFC1918 traffic routes through the VNet by default and DNS resolves the vault to the PE IP via the privatelink.vaultcore.azure.net zone (Azure DNS + the AD-integrated zone on PS-AZ-DC01). KV references then show Resolved.
- Teams chat. The “CSM Board Manager” persistent group chat was bootstrapped with psi-notify-bot/scripts/Initialize-NotifyChat.ps1 (members: ADevereaux + PowerOperative; a group chat needs ≥2 humans), the Notify Bot installed in it, and its id stored as psi-notify--chat-csm-board.
- App settings. Five settings on the csm-board App Service: CSM_TEAMS_NOTIFY=1 plus the four PSI_NOTIFY_* Key Vault references (tenant-id / client-id / client-secret / chat-csm-board).
Example: a PM-Claude triage pass
You: Use the csm-board tools to plan my morning.
Claude→ list_todos_today() # 20 todos across projects
→ list_sessions() # what's mid-task / waiting on me
→ "3 sessions are waiting on you. redbook-web has 4 unscheduled
P1 todos. Want me to schedule them and spawn the top one?"
You: Yes.
Claude→ update_todo(scheduled_for=today, priority=1) ×4
→ spawn_session(project_key="redbook-web", prompt="<top todo>")
→ "Spawned — bridge: https://claude.ai/code/session_…"
How to join (for a PSI developer)
# 1. Install (or upgrade) the agent
winget install PSI.CSM --source PSI
# or: winget upgrade PSI.CSM --source PSI
# 2. Start the agent (first run prompts for a device code)
csm agent
Complete the device-code prompt in any browser; the token caches to
~/.claude/csm-board-token.json and subsequent runs are silent. The
agent connects over outbound WSS to board.progressivesurface.com,
registers under your Entra identity, and starts streaming your
projects + sessions every 30 s.
Then open board.progressivesurface.com in any browser; your projects appear within ~30 s.
Optional: opt out of org-wide discovery
By default the agent reports every PSI GHE repo (whether cloned on your machine or not) so the team’s PM view is complete. To restrict your agent to just what you’ve cloned:
- BOARD_LOCAL_ONLY=1 in the agent’s environment, or
- "local_only": true in ~/.claude/board.json
(Equivalent to the 📍 Local only UI toggle, but applied at the agent level.)
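One plausible way for an agent to resolve the two opt-out switches is sketched below. The function name, the env-first precedence, and the treat-errors-as-off behavior are assumptions; only the BOARD_LOCAL_ONLY variable, the local_only key, and the ~/.claude/board.json path come from this page.

```python
import json
import os
from pathlib import Path


def local_only_enabled(env=None, config_path=None) -> bool:
    """True when either the environment variable or the config file opts out."""
    env = os.environ if env is None else env
    if env.get("BOARD_LOCAL_ONLY") == "1":
        return True
    path = Path(config_path) if config_path else Path.home() / ".claude" / "board.json"
    try:
        config = json.loads(path.read_text(encoding="utf-8"))
        return config.get("local_only") is True
    except (OSError, ValueError, AttributeError):
        # Missing, unreadable, or malformed config: keep the default (org-wide).
        return False
```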
Architecture
Azure · PS-WEBAPPS RG Your workstation
┌─────────────────────────────────┐
│ csm-board App Service (Linux) │ ┌──────────────────────────┐
│ React 19 + Vite + Tailwind v4 │ ◄──WSS───┤ csm agent (PSI.CSM) │
│ Python 3.11 + FastAPI │ │ scans repos, sessions, │
│ MSAL/Entra auth │ │ fetches GHE+ADO, │
│ /home/data state │ │ spawns claude, │
│ board.progressivesurface.com │ │ device-code MSAL auth │
└─────────────────────────────────┘ └──────────────────────────┘
▲
│ HTTPS · MSAL redirect
│ (claude.ai/code in a new tab for spawned sessions)
[PSI devs]
- Federated. Each developer’s projects and sessions live on their machine. Sessions bill to their Claude subscription. Nothing leaves their box except metadata over an authenticated WSS.
- Agent ↔ board speaks newline-JSON over a single long-lived WSS. Bearer token in the upgrade URL.
- Spawn requests flow from the hosted board → agent → local claude --remote-control → bridge URL → back to the browser.
- MCP server (csm-board/mcp/) is a fourth client of the same REST/NDJSON API the web UI uses: a local stdio process that reuses the agent’s token cache, giving a Claude Code session full UI parity.
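Newline-delimited JSON framing, as used on the agent link, can be sketched independently of the WebSocket transport. The message types and field names in the usage are invented for illustration; the actual agent protocol is defined in the repos.

```python
import json


def encode_frame(message: dict) -> str:
    """Serialize one message as a single line; json.dumps escapes embedded newlines."""
    return json.dumps(message, separators=(",", ":")) + "\n"


def decode_stream(buffer: str) -> tuple:
    """Split a receive buffer into complete messages plus any trailing partial line."""
    *lines, rest = buffer.split("\n")
    messages = [json.loads(line) for line in lines if line.strip()]
    return messages, rest
```

The leftover partial line is carried into the next read, so a message split across two network reads is only parsed once its terminating newline arrives.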
Auth
- App Registration: CSM Board, Client ID 9eeff376-82ba-40cf-a4b9-d2ed4970d82d (see Azure Resource Map).
- Sign-in audience: PSI tenant only.
- Frontend: MSAL.js v4, SPA platform, PKCE, redirect flow, localStorage cache. No “Sign In” button: auto-redirect on unauthenticated load, per the PSI auth standard.
- Backend: validates V2 access tokens against the tenant JWKS; audience = bare Client ID; required scope access_as_user.
- Agent: the same App Registration, with isFallbackPublicClient: true so it can use device-code flow on first run. Token cached to ~/.claude/csm-board-token.json.
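The backend’s claim checks can be illustrated as below. This sketch covers only the claims step and assumes the token signature has already been verified against the tenant JWKS; the function name and the tenant id used in the example are hypothetical, while ver, aud, tid, and scp are standard Entra access-token claims.

```python
CLIENT_ID = "9eeff376-82ba-40cf-a4b9-d2ed4970d82d"  # CSM Board app registration


def claims_acceptable(claims: dict, tenant_id: str, client_id: str = CLIENT_ID) -> bool:
    """Check already-verified token claims: v2 token, bare-GUID audience, tenant, scope."""
    scopes = (claims.get("scp") or "").split()  # scp is a space-separated string
    return (
        claims.get("ver") == "2.0"
        and claims.get("aud") == client_id
        and claims.get("tid") == tenant_id
        and "access_as_user" in scopes
    )
```

For example, a token whose aud is the api:// URI form rather than the bare Client ID would be rejected by this check, matching the “audience = bare Client ID” rule.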
Network exposure & external-access hardening
The App Service is publicly reachable at the network layer
(publicNetworkAccess=Enabled, inbound rules = Allow-all; re-verified
2026-06-24). Identity, however, is already gated by Entra: a 2026-06-24
audit of live Conditional Access found the tenant-wide “Require MFA” policy
(All apps, browser + mobile/desktop clients) covers csm-board — it is not in
the policy’s excluded apps — so every interactive browser sign-in already
requires MFA, and the federated agent’s device-code flow completes MFA
in-browser at redemption (which is why it works today). No new CA policy and
no device-code carve-out are required — no authentication-flows policy blocks
device code in this tenant. The gate the original plan would have created
already exists.
The remaining gap addressed by ADR-0028 (issue #28) is therefore edge exposure — the open origin has no WAF or rate limiting:
- Azure Front Door Standard + WAF fronts the site, with the App Service origin locked to Front Door (AzureFrontDoor.Backend service tag + X-Azure-FDID header) so *.azurewebsites.net is no longer openly reachable. Front Door passes the agent’s WebSocket through unchanged.
- Conditional Access is already satisfied by the tenant baseline; the only optional add is an app-specific require-compliant-device policy scoped to browser sign-ins (never the device-code path).
Entra auth stays at the app layer — no edge pre-authentication (App Proxy /
Easy Auth), because the headless agent (device-code over WSS) can’t satisfy an
interactive edge challenge. Private Endpoint / publicNetworkAccess=Disabled
is not the path here: the shared asp-erp-migration-tool plan is B3
Basic (Private Endpoints unsupported, re-verified 2026-06-24) and a
fully-private board would break external access and off-network agents.
csm-board is therefore a documented public-exposure exception (see
azure-security → exceptions). Design, Bicep,
the live-posture verification script, and the operator runbook live in
csm-board/docs/ADR-0028-secure-external-access.md, csm-board/infra/
(front-door.bicep, scripts/verify-entra-posture.sh), and
csm-board/docs/runbooks/.
Stack
| Layer | Choice |
|---|---|
| Frontend | React 19 + TypeScript + Vite 6 + Tailwind v4 + @azure/msal-react v3. Visual language follows psi-design-system: green primary, light surface, Inter + JetBrains Mono, semantic status palette. Tokens mirrored into web/src/index.css @theme rather than importing ps.css directly (csm-board keeps dev-dashboard density). |
| Backend | Python 3.11 + FastAPI + uvicorn (WebSocket-native). Documented exception to PSI’s ASP.NET Core 8 standard — see csm-board/docs/IMPLEMENTATION_PLAN.md § “Exceptions to PSI standards” |
| Hosting | Azure App Service Linux, shared asp-erp-migration-tool plan (B3), PS-WEBAPPS RG, North Central US |
| Auth | Entra ID app-layer MSAL |
| DNS | Azure DNS zone progressivesurface.com; board CNAME → csm-board.azurewebsites.net |
| SSL | Wildcard cert *.progressivesurface.com from ps-certificates-kv, bound SNI |
| State | Per-user JSON under /home/data/users/<oid>.json (Azure Files-backed). Phase 5e target: Azure Table Storage |
| CI/CD | GitHub Actions on psi-internal self-hosted runner; identity-based deploy via az login --identity |
| MCP | csm-board-mcp (Python 3.11+, mcp SDK, stdio) in csm-board/mcp/; reuses the agent’s MSAL cache silent-only |
Status
| Phase | What | State |
|---|---|---|
| 1 | Repo scaffold | ✅ |
| 2 | Entra app reg + MSAL frontend | ✅ |
| 3 | Backend JWT validation + csm agent harness | ✅ |
| 4 | UI: Projects + By Attention; spawn proxy; auto-clone; org-wide GHE discovery; manual projects + todos | ✅ |
| 5a/b | Azure App Service + identity-based GHA deploy | ✅ |
| 5c | Custom domain board.progressivesurface.com + wildcard SSL | ✅ |
| 5d | External-access hardening — Conditional Access + Front Door/WAF origin-lock (ADR-0028, #28). Supersedes the old private-endpoint plan | 🚧 design done, provisioning operator-gated |
| 5e | State store → Azure Table Storage (currently /home/data JSON) | 🚧 |
| 5g | MS Planner overlay | ⏸ deferred — local Todos cover the personal use case today |
Privacy
Each developer’s data is scoped to their Entra OID server-side; one user’s projects are NOT visible to other users today. The agent never sends Claude transcript contents to the board — only metadata (project names, git state, session IDs, prompts as the user wrote them).
If a “team view” is added later (everyone sees what’s open across the team), it’ll be opt-in at the agent and per-user-data level — not a default surface.
Related
- PSI.CSM in WinGet — install / upgrade flow
- PSI webapp compliance standard
- Azure deploy patterns
- csm-board Azure resources
- claude-session-manager repo: agent + local board + TUI source
- csm-board repo: hosted board source