ARGO Analytics Dashboard

Interactive analytics dashboard for PSI’s ARGO inspection machines — yield monitoring, SPC control charts, defect analysis, surface heatmaps, predictive analytics, and shift correlation across 22M+ inspected SOFC interconnect plates from 3 machines.

Overview

The ARGO Analytics Dashboard provides real-time visibility into the quality inspection process for solid oxide fuel cell (SOFC) interconnect plates. Three ARGO machines (ARGO4, ARGO5, ARGO6) collectively inspect ~38,000 plates per day across 382 measurement dimensions including surface curvature, thickness, defect detection, and coating quality. The dashboard replaces manual CSV spot-checking with interactive visualizations, statistical process control, and machine learning-driven insights.

Feature	Description
Production URL	https://argo.progressivesurface.com
Backend	Python FastAPI + DuckDB on Azure Linux VM
Authentication	Azure AD (PSI credentials) — planned
Hosting	Azure VM `ps-argo-etl` (B2as_v2, Ubuntu 24.04) behind nginx with SSL
Repository	ProgressiveSurface/argo-analytics
Data Source	Azure File Share `psargostorage/argodatastore` (ARGO4, ARGO5, ARGO6)
Data Volume	22M parts across 3 machines, 2,500 daily stat files, May 2023–present
Access	VPN/onsite only (private IP `10.160.0.20`)

Features

Dashboard Pages

Page	Route	Description
Executive Overview	`/`	KPI cards (yield, throughput, errors), yield trend chart, defect Pareto, hourly yield heatmap
Time-Series & SPC	`/spc`	X-bar control charts with Western Electric rules, Cp/Cpk capability, parameter selector
Batch Analysis	`/batch`	Batch comparison table (color-coded yield), recipe timeline
Defect Deep Dive	`/defects`	Failure mode breakdown, defects by hour (shift patterns), defects by batch
Part Lookup	`/parts`	Search by part/batch, full measurement profile, 57x57 surface heatmaps (canvas-rendered)
Predictive Analytics	`/predictive`	Anomaly detection (Isolation Forest), feature importance (Random Forest), yield forecast
Shift & Labor	`/shifts`	Day vs Night yield comparison, 24-hour yield profile with shift boundary markers
Multi-Machine	`/machines`	Cross-machine yield comparison, defect comparison, per-machine KPI cards

All pages include a machine picker (ARGO4, ARGO5, ARGO6, or All) and date range picker (presets: 7d, 30d, 90d, or custom).

Machines

Machine	Parquet Files	Date Range	Parts (total)	Notes
ARGO4	872	Oct 2022–present	~8M	Legacy `Daily stats dump/` path, some schema errors in old files
ARGO5	854	Feb 2023–present	~7.6M	Primary machine, cleanest data
ARGO6	773	Jul 2023–present	~6.5M	Lowest yield (~86%) — under investigation

ETL Pipeline

Parses 382-column DailyStats CSVs with latin-1 encoding and schema version detection (v1–v4)
Writes compressed Parquet files (5:1 compression ratio) partitioned by machine and date
Builds DuckDB materialized aggregation tables for fast dashboard queries
Reads 57x57 surface grid files on-demand for Part Lookup
Runs every 4 hours via cron on the ETL VM
union_by_name handles schema mismatches across machines and time periods
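
As a conceptual illustration of what `union_by_name` does, the sketch below aligns rows from two sources by column name and fills columns that a source lacks with `None`. It is plain Python rather than DuckDB's implementation, and the function name and sample column names are hypothetical.

```python
from typing import Any


def union_by_name(*tables: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Stack row dictionaries by column name, filling gaps with None."""
    # Collect column names in first-seen order across all inputs.
    columns: list[str] = []
    for table in tables:
        for row in table:
            for name in row:
                if name not in columns:
                    columns.append(name)
    # Re-emit every row with the full column set.
    return [
        {name: row.get(name) for name in columns}
        for table in tables
        for row in table
    ]


# Hypothetical rows: the older file lacks the "powder_lot" column.
old_file = [{"part_no": 1, "result": "+"}]
new_file = [{"part_no": 2, "result": "-", "powder_lot": "A17"}]
merged = union_by_name(old_file, new_file)
```

Matching by name rather than by position is what lets files with different column counts be queried together.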

Analytics

SPC: X-bar/R control charts with Western Electric sensitizing rules (all 4 rules), Cp/Cpk process capability
PCA: Principal Component Analysis on failed parts for multivariate defect clustering
Anomaly Detection: Isolation Forest on 6 core measurement columns, flags unusual parts even if they pass individual checks
Feature Importance: Random Forest classifier identifies which measurements most predict pass/fail (curvature = 55% of predictive power)
Yield Prediction: Trend-based forecast using weighted recent hourly data
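
The four Western Electric rules can be sketched in a few lines of plain Python. This is a minimal illustration, not the code in `analytics/spc.py`: the function name and sample values are assumptions, and the centerline and sigma are passed in rather than derived from subgroup ranges.

```python
def western_electric_violations(values, center, sigma):
    """Return {rule: [indices]}, where each index ends a violating window."""
    # Signed distance of each point from the centerline, in sigma units.
    z = [(v - center) / sigma for v in values]
    hits = {1: [], 2: [], 3: [], 4: []}

    def check(rule, window, needed, threshold):
        # Flag index i when at least `needed` of the last `window` points
        # lie beyond `threshold` sigma on the same side of the centerline.
        for i in range(window - 1, len(z)):
            recent = z[i - window + 1 : i + 1]
            above = sum(1 for x in recent if x > threshold)
            below = sum(1 for x in recent if x < -threshold)
            if above >= needed or below >= needed:
                hits[rule].append(i)

    check(1, 1, 1, 3.0)  # Rule 1: one point beyond 3 sigma
    check(2, 3, 2, 2.0)  # Rule 2: two of three beyond 2 sigma
    check(3, 5, 4, 1.0)  # Rule 3: four of five beyond 1 sigma
    check(4, 8, 8, 0.0)  # Rule 4: eight in a row on one side
    return hits


# Made-up subgroup means, already centered on 0 with sigma 1.
means = [0.2, -0.1, 0.3, 3.4, 0.1, 1.2, 1.5, 1.1, 1.3, -0.2]
violations = western_electric_violations(means, center=0.0, sigma=1.0)
```

In this sample, Rule 1 fires at index 3 and Rule 3 fires at indices 7 to 9; Rules 2 and 4 do not fire.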

Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│                         AZURE CLOUD                                     │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  ┌─────────────────────────────────────────┐                           │
│  │  Azure File Share (psargostorage)        │                           │
│  │  ├── ARGO4/Daily stats dump/*.txt        │  Raw CSVs from machines   │
│  │  ├── ARGO5/DailyStats/*.txt              │  382 cols, ~33MB/day each │
│  │  ├── ARGO5/Curvature57/**/*.csv          │  57x57 grids, ~39KB each │
│  │  ├── ARGO5/Thickness57/**/*.csv          │  57x57 grids, ~29KB each │
│  │  ├── ARGO6/DailyStats/*.txt              │                           │
│  │  └── argo-analytics-data/parquet/        │  Processed Parquet files  │
│  │      ├── argo4/daily_stats/date=*/       │  872 files                │
│  │      ├── argo5/daily_stats/date=*/       │  854 files                │
│  │      └── argo6/daily_stats/date=*/       │  773 files                │
│  └──────────────────┬──────────────────────┘                           │
│                     │ CIFS mount (/mnt/argodatastore)                   │
│                     ▼                                                   │
│  ┌─────────────────────────────────────────┐                           │
│  │  VM: ps-argo-etl (10.160.0.20)          │                           │
│  │  Ubuntu 24.04, B2as_v2 (2 vCPU, 8GB)   │                           │
│  │                                          │                           │
│  │  ┌──────────────────────────────────┐   │                           │
│  │  │ ETL Pipeline (cron, every 4 hrs) │   │                           │
│  │  │ Python 3.12 + Polars + PyArrow   │   │                           │
│  │  │ Reads CSVs → Writes Parquet      │   │                           │
│  │  │ Rebuilds DuckDB cache            │   │                           │
│  │  └──────────────────────────────────┘   │                           │
│  │                                          │                           │
│  │  ┌──────────────────────────────────┐   │                           │
│  │  │ DuckDB (local disk, ~25MB)       │   │                           │
│  │  │ 22M rows, 16 columns             │   │                           │
│  │  │ Materialized: daily_yield,        │   │                           │
│  │  │   hourly_yield, defect_pareto,    │   │                           │
│  │  │   batch_summary                   │   │                           │
│  │  └──────────────────────────────────┘   │                           │
│  │                                          │                           │
│  │  ┌──────────────────────────────────┐   │                           │
│  │  │ FastAPI (uvicorn, port 8000)     │   │                           │
│  │  │ 28 REST API endpoints             │   │                           │
│  │  │ DuckDB read-only mode             │   │                           │
│  │  │ scikit-learn analytics modules    │   │                           │
│  │  │ Serves React frontend (dist/)     │   │                           │
│  │  └──────────────────────────────────┘   │                           │
│  │                                          │                           │
│  │  ┌──────────────────────────────────┐   │                           │
│  │  │ nginx (port 80/443)              │   │                           │
│  │  │ SSL: *.progressivesurface.com    │   │                           │
│  │  │ Reverse proxy → localhost:8000   │   │                           │
│  │  │ HTTP → HTTPS redirect            │   │                           │
│  │  └──────────────────────────────────┘   │                           │
│  └──────────────────┬──────────────────────┘                           │
│                     │                                                   │
│  ┌──────────────────┴──────────────────────┐                           │
│  │  DNS: argo.progressivesurface.com        │                           │
│  │  A record → 10.160.0.20                  │                           │
│  │  SSL: Wildcard cert from ps-certificates-kv                         │
│  └─────────────────────────────────────────┘                           │
│                                                                         │
└──────────────────────────────┬──────────────────────────────────────────┘
                               │ VPN / Site-to-Site
┌──────────────────────────────┴──────────────────────────────────────────┐
│  PSI On-Premises / VPN Users                                            │
│  Browser → https://argo.progressivesurface.com                          │
└─────────────────────────────────────────────────────────────────────────┘

Technology Stack

Component	Technology
ETL	Python 3.12, Polars, PyArrow
Data Lake	Apache Parquet (Snappy compression) on Azure File Share
Query Engine	DuckDB 1.5 (embedded, columnar, read-only for API)
Backend	FastAPI 0.115, Pydantic, uvicorn
Analytics	scikit-learn (Random Forest, Isolation Forest, PCA), scipy
Frontend	React 18, TypeScript, Vite 5, TailwindCSS 3
Charts	Recharts 2.12 (SVG), HTML Canvas (57x57 heatmaps)
State	Zustand 4 (filters/machine), TanStack React Query 5 (data)
Auth	Azure AD / MSAL (planned)
Hosting	Azure VM `ps-argo-etl` (B2as_v2, Ubuntu 24.04)
Reverse Proxy	nginx 1.24 with wildcard SSL
CI/CD	GitHub Actions on `[self-hosted, psi-internal]` runner

Key Architecture Decisions

Decision	Rationale
VM instead of App Service	VM has CIFS mount access to Azure File Share; App Service cannot mount file shares for ETL reads
DuckDB on local disk	CIFS-mounted DuckDB files have locking issues; local disk eliminates lock conflicts
DuckDB read-only for API	Prevents lock conflicts between ETL writes and API reads
Parquet on file share	Persistent storage survives VM restarts; shareable across services
`union_by_name` in DuckDB	ARGO4’s older Parquet files have different schemas; DuckDB reconciles automatically
Single uvicorn worker	DuckDB’s single-writer model conflicts with gunicorn’s multi-process workers on network storage
Symlinks for Parquet	`/opt/argo-analytics/data/parquet/{machine}` → file share; keeps config simple

Data Sources

DailyStats (382 Columns)

Each row is one inspected part. The 382 columns are organized into 17 groups:

Group	Columns	Description
Identity	1–6	Part number, timestamp, result, barcode
Pass/Fail Flags	7–21	Individual quality gate results (`+`/`-` for 15 defect types)
Plenum Depth	22–24	Plenum depth results
IC Geometry	25–33	Interconnect dimensional checks
Inspection Status	34–40	System health (timing, image acquisition, belt stop)
Defect Counts	41–60	Counts and dimensions of detected defects
Curvature Top	61–112	50 measurement points + summary (µm)
Curvature Bottom	113–164	50 measurement points + summary (µm)
S7 Topology	165–182	Regional flatness metrics (Air/Fuel sides)
Thickness	183–255	50 measurement points + averages (µm)
A² / Inertia	256–296	Structural stiffness measurements
Plenum Depth Stats	297–302	Per-area average, min, max
IC Dimensions	303–325	Hole positions, diameters, symmetry (mm)
Production Meta	326–329	Recipe, batch, cage, powder lot
Coating Quality	330–350	Stddev/offset of coating registration per region
System Validation	351–372	3D rotation, temperatures (7 sensors), eval times
Edge/Calibration	373–382	Calibration offsets, z-score alarms

Schema evolved from 375 columns (v1, 2023) to 382 (v4, 2026). The ETL auto-detects version by column count and normalizes to the canonical v4 schema.
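
A minimal sketch of version detection by column count follows. It is not the code in `etl/schema.py`: only the v1 and v4 counts are stated in this document, so v2 and v3 are left out rather than guessed. The comma delimiter and the idea that newer columns are appended at the end are both assumptions.

```python
import csv
import io

# Column counts stated in this document; v2 and v3 are intentionally absent.
KNOWN_COLUMN_COUNTS = {375: "v1", 382: "v4"}
CANONICAL_WIDTH = 382


def detect_version(row):
    """Map a parsed CSV row to a schema version by its column count."""
    version = KNOWN_COLUMN_COUNTS.get(len(row))
    if version is None:
        raise ValueError(f"unrecognized column count: {len(row)}")
    return version


def normalize(row):
    """Pad a shorter row with None up to the canonical width.

    Assumes newer columns were appended at the end of the row.
    """
    return list(row) + [None] * (CANONICAL_WIDTH - len(row))


# Synthetic 375-column line, decoded as latin-1 like the real files.
raw_bytes = ",".join(["0"] * 375).encode("latin-1")
row = next(csv.reader(io.StringIO(raw_bytes.decode("latin-1"))))
version = detect_version(row)
padded = normalize(row)
```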

Data Locations

Machine	Raw Data Path	Parquet Output
ARGO4	`/mnt/argodatastore/ARGO4/Daily stats dump/`	`parquet/argo4/daily_stats/date=*/`
ARGO5	`/mnt/argodatastore/ARGO5/DailyStats/`	`parquet/argo5/daily_stats/date=*/`
ARGO6	`/mnt/argodatastore/ARGO6/DailyStats/`	`parquet/argo6/daily_stats/date=*/`

Note: ARGO4 uses the legacy directory name Daily stats dump/ (with spaces).

Per-Part Surface Grids (57x57)

Each part has three high-resolution surface scan files (ARGO5 only currently):

Grid Type	Location	Values	Range
Curvature Top	`Curvature57/{date}/CurvatureTop/`	Surface height (µm)	~4800–5100
Curvature Bottom	`Curvature57/{date}/CurvatureBottom/`	Surface height (µm)	~4800–5100
Thickness	`Thickness57/{date}/Thickness/`	Plate thickness (mm)	~1.3–2.1
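
A grid file can be read and shape-checked with the standard library alone. The sketch below is hypothetical (it is not `etl/grid_parser.py`) and assumes comma-delimited numeric cells with no header row.

```python
import csv
import io

GRID_SIZE = 57


def parse_grid(text, delimiter=","):
    """Parse grid CSV text into a 57x57 list of floats."""
    rows = []
    for raw_row in csv.reader(io.StringIO(text), delimiter=delimiter):
        # Skip empty cells left by trailing delimiters, and blank lines.
        cells = [float(cell) for cell in raw_row if cell.strip()]
        if cells:
            rows.append(cells)
    if len(rows) != GRID_SIZE or any(len(r) != GRID_SIZE for r in rows):
        raise ValueError(f"expected a {GRID_SIZE}x{GRID_SIZE} grid")
    return rows


# Synthetic flat surface at 4950 um, inside the documented curvature range.
flat_line = ",".join(["4950.0"] * GRID_SIZE)
grid = parse_grid("\n".join([flat_line] * GRID_SIZE))
```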

Filename pattern: N000{PartNo}BID{BatchID}CID{CageID}D{YYYYMMDD}U{HHMMSS}.csv
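
The filename pattern can be decoded with a regular expression. The sketch below makes three assumptions: the digits after `N` are a zero-padded part number, batch and cage IDs never contain the literals `CID` or `D`+8 digits, and the `U` field is a time of day. The sample filename is invented.

```python
import re
from datetime import datetime

GRID_FILENAME = re.compile(
    r"^N(?P<part>\d+)BID(?P<batch>.+?)CID(?P<cage>.+?)"
    r"D(?P<date>\d{8})U(?P<time>\d{6})\.csv$"
)


def parse_grid_filename(name):
    """Split a surface-grid filename into its documented fields."""
    match = GRID_FILENAME.match(name)
    if match is None:
        raise ValueError(f"unexpected grid filename: {name}")
    return {
        "part_no": int(match["part"]),  # leading zeros treated as padding
        "batch_id": match["batch"],
        "cage_id": match["cage"],
        "timestamp": datetime.strptime(
            match["date"] + match["time"], "%Y%m%d%H%M%S"
        ),
    }


info = parse_grid_filename("N00012345BID678CID9D20240315U142530.csv")
```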

External Data

Source	Table	Description	Status
Azure SQL `PSI_Analytics`	`labor_detail`	Timesheet data (1.37M rows)	Planned
Azure SQL `PSI_Analytics`	`tslabor2`	All timesheets (5.25M rows)	Planned

Until the timesheet tables are integrated, shift analysis uses simulated shift windows (Day 06:00–17:59, Night 18:00–05:59) derived from ARGO inspection timestamps.
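
The simulated shift windows reduce to a single comparison on the hour of the inspection timestamp. A minimal sketch, with a hypothetical function name:

```python
from datetime import datetime


def shift_for(timestamp: datetime) -> str:
    """Classify a timestamp as Day (06:00-17:59) or Night (18:00-05:59)."""
    return "Day" if 6 <= timestamp.hour < 18 else "Night"


early = shift_for(datetime(2026, 3, 1, 5, 59))   # last minute of Night
start = shift_for(datetime(2026, 3, 1, 6, 0))    # first minute of Day
late = shift_for(datetime(2026, 3, 1, 18, 0))    # first minute of Night
```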

API Endpoints

Endpoints that list machine among their parameters accept argo4, argo5, argo6, or all; the parameter is optional and defaults to all.

Overview (`/api/overview`)

Method	Endpoint	Description
GET	`/kpi?date_start&date_end&machine`	Aggregate KPIs (yield, throughput, errors)
GET	`/yield-trend?date_start&date_end&machine`	Daily yield time series
GET	`/defect-pareto?date_start&date_end&machine`	Defect type counts ranked
GET	`/hourly-heatmap?date_start&date_end&machine`	Yield by date x hour grid
GET	`/throughput?date_start&date_end&machine`	Parts per hour by date
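
A request URL for any overview endpoint can be assembled as sketched below. The helper name is hypothetical, and the ISO `YYYY-MM-DD` date format is an assumption, since the date format is not specified here.

```python
from urllib.parse import urlencode

BASE_URL = "https://argo.progressivesurface.com"


def overview_url(endpoint, date_start, date_end, machine="all"):
    """Build a URL for an /api/overview endpoint with its query string."""
    query = urlencode(
        {"date_start": date_start, "date_end": date_end, "machine": machine}
    )
    return f"{BASE_URL}/api/overview/{endpoint}?{query}"


kpi_url = overview_url("kpi", "2026-03-01", "2026-03-07", machine="argo5")
```

The resulting URL is reachable only from the VPN or onsite network.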

SPC (`/api/spc`)

Method	Endpoint	Description
GET	`/parameters`	Available measurement parameters for SPC
GET	`/control-chart?parameter&date_start&date_end&subgroup_size&machine`	X-bar/R chart with WE rules
GET	`/cpk?parameter&date_start&date_end&usl&lsl&machine`	Process capability indices
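
For reference, X-bar/R control limits follow from the grand mean, the average range, and the standard Shewhart constants for the subgroup size. The sketch below is not the `analytics/spc.py` implementation; the function name and sample data are made up, and only subgroup sizes 2 to 5 are tabulated.

```python
from statistics import mean

# Shewhart constants (A2, D3, D4) keyed by subgroup size.
XBAR_R_CONSTANTS = {
    2: (1.880, 0.0, 3.267),
    3: (1.023, 0.0, 2.574),
    4: (0.729, 0.0, 2.282),
    5: (0.577, 0.0, 2.114),
}


def xbar_r_limits(subgroups):
    """Compute center lines and control limits for X-bar and R charts."""
    n = len(subgroups[0])
    if any(len(group) != n for group in subgroups):
        raise ValueError("all subgroups must have the same size")
    a2, d3, d4 = XBAR_R_CONSTANTS[n]
    grand_mean = mean(mean(group) for group in subgroups)
    r_bar = mean(max(group) - min(group) for group in subgroups)
    return {
        "xbar": {
            "center": grand_mean,
            "ucl": grand_mean + a2 * r_bar,
            "lcl": grand_mean - a2 * r_bar,
        },
        "r": {"center": r_bar, "ucl": d4 * r_bar, "lcl": d3 * r_bar},
    }


limits = xbar_r_limits([[10, 12, 11], [11, 13, 12], [9, 11, 10]])
```

With these three subgroups the grand mean is 11 and the average range is 2, so the X-bar limits are 11 ± 1.023 × 2.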

Batch (`/api/batch`)

Method	Endpoint	Description
GET	`/list?date_start&date_end&machine`	Batch summaries with yield and measurements
GET	`/{batch_no}/detail`	Single batch with percentile distributions
POST	`/compare`	Side-by-side batch comparison
GET	`/recipe-timeline?date_start&date_end&machine`	Recipe changes over time

Defect (`/api/defect`)

Method	Endpoint	Description
GET	`/failure-modes?date_start&date_end&machine`	Failure type stats with trends
GET	`/by-hour?date_start&date_end&machine`	Defect counts by hour of day
GET	`/by-batch?date_start&date_end&machine`	Defect counts by batch

Part Lookup (`/api/part`)

Method	Endpoint	Description
GET	`/search?q&limit&machine`	Search by part number or batch ID
GET	`/{part_no}`	Full measurement profile
GET	`/{part_no}/grid/{grid_type}`	57x57 surface grid data (on-demand read)
GET	`/{part_no}/neighbors?n`	Adjacent parts (±n sequential)

Predictive (`/api/predictive`)

Method	Endpoint	Description
GET	`/anomalies?date_start&date_end&contamination&machine`	Isolation Forest anomaly detection
GET	`/feature-importance?date_start&date_end&machine`	Random Forest feature importance
GET	`/yield-forecast?date_start&date_end&machine`	Trend-based yield prediction
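
The weighting scheme behind the yield forecast is not described here. As one simple reading of "weighted recent hourly data", the sketch below computes a linearly weighted moving average in which the newest hour gets the largest weight. The function name, window size, and sample values are all assumptions.

```python
def weighted_yield_forecast(hourly_yields, window=24):
    """Linearly weighted moving average over the most recent hours."""
    recent = hourly_yields[-window:]
    if not recent:
        raise ValueError("at least one hourly yield is required")
    # Weights 1, 2, ..., k so that newer hours count more.
    weights = range(1, len(recent) + 1)
    weighted_sum = sum(w * y for w, y in zip(weights, recent))
    return weighted_sum / sum(weights)


forecast = weighted_yield_forecast([90.0, 92.0, 94.0])
```

For the three sample hours the result is (90 + 2 × 92 + 3 × 94) / 6, which sits closer to the latest value than a plain mean would.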

Labor/Shift (`/api/labor`)

Method	Endpoint	Description
GET	`/shift-yield?date_start&date_end&machine`	Yield by Day/Night shift
GET	`/hourly-profile?date_start&date_end&machine`	Average yield by hour (0–23)
GET	`/day-vs-night?date_start&date_end&machine`	Day vs Night aggregate comparison

Cross-Machine Comparison (`/api/compare`)

Method	Endpoint	Description
GET	`/summary?date_start&date_end`	One-row aggregate per machine
GET	`/yield?date_start&date_end`	Yield by machine by date
GET	`/defects?date_start&date_end`	Top defects per machine
GET	`/throughput?date_start&date_end`	Parts per hour per machine

Meta (`/api`)

Method	Endpoint	Description
GET	`/health`	System health, DuckDB status, row counts
GET	`/meta/machines`	List of available machine identifiers

VM Infrastructure

VM: `ps-argo-etl`

Property	Value
Name	ps-argo-etl
Resource Group	PS-RG-01
Size	Standard_B2as_v2 (2 vCPU, 8GB RAM)
OS	Ubuntu 24.04 LTS
Private IP	10.160.0.20 (PS-SERVERS subnet)
Disk	64GB OS disk

File System Layout

/opt/argo-analytics/                    # Application root (git repo)
├── backend/                            # Python backend + ETL
│   ├── .venv/                          # Python virtual environment
│   ├── .env                            # Environment config
│   ├── etl/                            # ETL pipeline modules
│   ├── api/                            # FastAPI application
│   └── analytics/                      # ML/SPC modules
├── frontend/dist/                      # Built React frontend
├── data/
│   ├── duckdb/argo.duckdb              # DuckDB cache (~25MB, LOCAL disk)
│   └── parquet/                        # Symlinks to file share:
│       ├── argo4 → /mnt/argodatastore/argo-analytics-data/parquet/argo4
│       ├── argo5 → /mnt/argodatastore/argo-analytics-data/parquet/argo5
│       └── argo6 → /mnt/argodatastore/argo-analytics-data/parquet/argo6
├── run-etl.sh                          # ETL execution script (cron)
└── backfill-a4a6.py                    # Multi-machine backfill script

/mnt/argodatastore/                     # Azure File Share (CIFS mount, persistent)
├── ARGO4/Daily stats dump/             # Raw CSVs from ARGO4
├── ARGO5/DailyStats/                   # Raw CSVs from ARGO5
├── ARGO5/Curvature57/                  # 57x57 surface grids
├── ARGO5/Thickness57/                  # 57x57 thickness grids
├── ARGO6/DailyStats/                   # Raw CSVs from ARGO6
└── argo-analytics-data/parquet/        # Processed Parquet output

Services

Service	Type	Config
`argo-api`	systemd	`/etc/systemd/system/argo-api.service` — uvicorn on port 8000, auto-restart
nginx	systemd	Reverse proxy 80/443 → 8000, SSL with wildcard cert
ETL cron	crontab	`0 */4 * * *` (runs `/opt/argo-analytics/run-etl.sh`)

ETL Cron Script (`run-etl.sh`)

#!/bin/bash
cd /opt/argo-analytics/backend
export PATH="/opt/argo-analytics/backend/.venv/bin:$PATH"
 
# Backfill last 2 days (catches today + late-arriving yesterday data)
python3 -m etl.backfill --days 2
 
# Rebuild DuckDB cache from all machine Parquet files
python3 -m etl.duckdb_cache
 
# Restart API to pick up fresh DuckDB (read_only mode needs reconnect)
systemctl restart argo-api

The file share credentials are stored at /etc/smbcredentials/psargostorage.cred. The mount persists across reboots via /etc/fstab:

//psargostorage.file.core.windows.net/argodatastore /mnt/argodatastore cifs nofail,credentials=/etc/smbcredentials/psargostorage.cred,dir_mode=0755,file_mode=0644,serverino,nosharesock,actimeo=30

Project Structure

argo-analytics/
├── CLAUDE.md                           # Claude Code instructions
├── .env.example                        # Environment variable template
├── startup.sh                          # Azure App Service startup (legacy)
├── requirements.txt                    # Root requirements (→ backend)
├── deploy.sh                           # Azure resource creation script
├── run-etl.sh                          # ETL cron execution script
├── .github/workflows/deploy.yml        # CI/CD via self-hosted runner
│
├── backend/
│   ├── pyproject.toml                  # Python project metadata + deps
│   ├── requirements.txt                # Pinned production dependencies
│   ├── etl/                            # ETL Pipeline
│   │   ├── config.py                   # Paths, constants, env loading
│   │   ├── schema.py                   # 382-column canonical schema (v1–v4)
│   │   ├── daily_stats_parser.py       # CSV → normalized DataFrame
│   │   ├── parquet_writer.py           # DataFrame → partitioned Parquet
│   │   ├── duckdb_cache.py             # Multi-machine DuckDB aggregation
│   │   ├── grid_parser.py              # 57x57 surface grid CSV reader
│   │   ├── file_scanner.py             # New file discovery
│   │   ├── backfill.py                 # Historical ingest CLI
│   │   └── logging_config.py           # Structured logging setup
│   ├── api/                            # FastAPI Application
│   │   ├── main.py                     # App entry, lifespan, SPA serving
│   │   ├── deps.py                     # DuckDB + machine validation helpers
│   │   ├── models.py                   # Pydantic response models
│   │   └── routers/
│   │       ├── health.py               # /api/health + /api/meta/machines
│   │       ├── overview.py             # /api/overview/* (5 endpoints)
│   │       ├── timeseries.py           # /api/spc/* (3 endpoints)
│   │       ├── batch.py                # /api/batch/* (4 endpoints)
│   │       ├── defect.py               # /api/defect/* (3 endpoints)
│   │       ├── part_lookup.py          # /api/part/* (4 endpoints)
│   │       ├── predictive.py           # /api/predictive/* (3 endpoints)
│   │       ├── labor.py                # /api/labor/* (3 endpoints)
│   │       ├── compare.py              # /api/compare/* (4 endpoints)
│   │       └── analytics.py            # /api/analytics/pca
│   └── analytics/                      # ML/Statistics Modules
│       ├── spc.py                      # Control charts, Cp/Cpk, WE rules
│       ├── pca.py                      # PCA dimensionality reduction
│       ├── anomaly.py                  # Isolation Forest
│       ├── prediction.py               # Yield forecasting
│       └── feature_importance.py       # Random Forest importances
│
├── frontend/
│   ├── package.json
│   ├── vite.config.ts                  # Dev proxy /api → :8000
│   ├── tailwind.config.js              # PSI brand colors
│   ├── playwright.config.ts            # E2E test config
│   ├── e2e/                            # 12 Playwright E2E tests
│   └── src/
│       ├── main.tsx                    # React root + QueryClient
│       ├── App.tsx                     # Routes (8 pages)
│       ├── api/client.ts              # fetchApi wrapper
│       ├── stores/filterStore.ts      # Zustand: date range + machine
│       ├── hooks/                      # 8 TanStack React Query hook files
│       ├── components/
│       │   ├── layout/                 # AppLayout, Sidebar, PageHeader
│       │   ├── shared/                 # KpiCard, DateRangePicker, MachinePicker
│       │   └── charts/                 # 15 chart components (Recharts + Canvas)
│       ├── pages/                      # 8 route pages
│       └── types/                      # TypeScript interfaces
│
└── data/                               # Data lake (gitignored, on VM)
    ├── parquet/argo{4,5,6} → symlinks  # → Azure File Share
    └── duckdb/argo.duckdb              # ~25MB local aggregation cache

Development

Prerequisites

Python 3.11+
Node.js 20+
Access to W:\ARGO5\ network share (maps to psargostorage/argodatastore)
Azure CLI (for deployment)

Local Development

# Clone
git clone https://progressivesurface.ghe.com/ProgressiveSurface/argo-analytics.git
cd argo-analytics
 
# Backend
cd backend
python -m venv .venv
.venv/Scripts/activate      # Windows (Linux/macOS: source .venv/bin/activate)
pip install -e ".[dev]"
 
# Initial data backfill (90 days, ARGO5 only)
python -m etl.backfill --days 90
 
# Refresh DuckDB cache
python -m etl.duckdb_cache --machines argo5
 
# Start API server
uvicorn api.main:app --reload --port 8000
 
# Frontend (separate terminal)
cd frontend
npm install
npm run dev                 # Vite dev server on :5173, proxies /api → :8000

Environment Variables

# Data paths (local dev)
ARGO5_DATA_ROOT=W:/ARGO5
DATA_LAKE_ROOT=C:/GIT/argo-analytics/data
 
# Data paths (VM production)
# ARGO5_DATA_ROOT=/mnt/argodatastore/ARGO5
# DATA_LAKE_ROOT=/opt/argo-analytics/data
 
# Logging
LOG_LEVEL=INFO

Running Tests

# Frontend E2E (requires both servers running)
cd frontend
npx playwright install chromium
npm run test:e2e

Deployment

VM Deployment (Production)

The app runs on ps-argo-etl (10.160.0.20). To update:

# Run on the VM via az vm run-command (SSH is not configured)
cd /opt/argo-analytics
git pull origin main
 
# Rebuild frontend
cd frontend && npm run build
 
# Restart API
sudo systemctl restart argo-api

CI/CD Pipeline

GitHub Actions workflow (.github/workflows/deploy.yml):

Triggered on push to main (path-filtered)
Runs on [self-hosted, psi-internal] runner
Builds React frontend
Packages backend + frontend dist into zip
Deploys via az webapp deployment source config-zip (App Service) or direct git pull (VM)

Azure Resources

Resource	Name	IP / Config
VM	`ps-argo-etl`	10.160.0.20, B2as_v2, Ubuntu 24.04
Storage	`psargostorage`	File share `argodatastore`
DNS	`argo.progressivesurface.com`	A record → 10.160.0.20
SSL	Wildcard cert	From `ps-certificates-kv`
App Service	`ps-argo-analytics`	Created but unused (VM serves instead)
Private Endpoint	`ps-argo-analytics-pe`	10.160.140.16 (App Service, unused)

How to Use the Dashboard

Getting Started

Connect to the PSI VPN
Navigate to https://argo.progressivesurface.com
The Executive Overview page loads by default showing the last 30 days across all machines

Navigating

The left sidebar provides access to all 8 pages. Every page has:

Machine picker (top right) — filter to ARGO4, ARGO5, ARGO6, or All Machines
Date range picker — presets (7d, 30d, 90d) or custom start/end dates
Changing either filter instantly refreshes all charts on the page

Common Tasks

“What’s our yield today?” Go to Executive Overview (/). The KPI cards at the top show total parts, yield %, parts/hour, and inspection errors for the selected date range and machine.

“Which defect is causing the most rejects?” Go to Executive Overview → Defect Pareto chart (bottom left). Bars are ranked by frequency. Highpoints is typically the dominant defect. For more detail, go to Defect Deep Dive (/defects) to see defects by hour and by batch.
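
A Pareto ranking is a descending sort with a running cumulative share. The sketch below uses invented counts; apart from Highpoints, the defect names are placeholders.

```python
def defect_pareto(counts):
    """Rank defects by count and attach the cumulative percentage."""
    total = sum(counts.values())
    if total == 0:
        return []
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    rows = []
    cumulative = 0
    for name, count in ranked:
        cumulative += count
        rows.append((name, count, round(100 * cumulative / total, 1)))
    return rows


# Hypothetical counts for illustration only.
pareto = defect_pareto({"Thickness": 250, "Highpoints": 600, "Coating": 150})
```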

“Is one machine worse than the others?” Go to Multi-Machine (/machines). The summary cards show yield per machine. The yield timeline chart shows daily trends side-by-side. ARGO6 currently has the lowest yield (~86%).

“Did something change on a specific shift?” Go to Shift & Labor (/shifts). The hourly profile chart shows average yield for each hour (0–23) with shift boundaries marked at 06:00 and 18:00. The Day vs Night KPI cards show the aggregate comparison.

“What does a specific part look like?” Go to Part Lookup (/parts). Type a part number (minimum 3 digits) in the search bar. Click a result to see the full measurement profile and three 57x57 surface heatmaps (curvature top, curvature bottom, thickness). The neighbors table shows surrounding parts — if many neighbors also failed, it suggests a batch-level issue.

“Is the process in control?” Go to Time-Series & SPC (/spc). Select a measurement parameter (Curvature Top, Thickness, etc.) from the dropdown. The X-bar control chart shows subgroup means with UCL/LCL limits. Red dots indicate Western Electric rule violations. Enter USL and LSL values to compute Cp/Cpk process capability.
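
The capability indices follow the textbook definitions: Cp = (USL − LSL) / 6σ and Cpk = min(USL − μ, μ − LSL) / 3σ. The sketch below is not the dashboard's code and uses the overall sample standard deviation for σ, which is an assumption; a within-subgroup estimate would give different values.

```python
from statistics import mean, stdev


def process_capability(values, lsl, usl):
    """Return (Cp, Cpk) for measurements against spec limits."""
    mu = mean(values)
    sigma = stdev(values)  # overall sample standard deviation
    cp = (usl - lsl) / (6 * sigma)
    cpk = min(usl - mu, mu - lsl) / (3 * sigma)
    return cp, cpk


# Made-up measurements with mean 10 and standard deviation 1.
cp, cpk = process_capability([9.0, 10.0, 11.0], lsl=4.0, usl=13.0)
```

Here Cp is 1.5 while Cpk is 1.0, because the mean sits closer to the upper limit than to the lower one.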

“Which batches are problematic?” Go to Batch Analysis (/batch). The table shows all batches sorted by yield (worst first). Red badges indicate yield below 85%, yellow for 85–92%, green for 92%+. Click column headers to sort.
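
The badge thresholds amount to a two-step comparison. In the sketch below, a yield of exactly 92% is treated as green; that boundary choice is an assumption, since the stated ranges overlap at 92.

```python
def yield_badge(yield_pct: float) -> str:
    """Map a batch yield percentage to its badge color."""
    if yield_pct < 85:
        return "red"
    if yield_pct < 92:
        return "yellow"
    return "green"


badges = [yield_badge(y) for y in (84.9, 85.0, 91.9, 92.0)]
```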

“What predicts whether a part will pass or fail?” Go to Predictive Analytics (/predictive). The Feature Importance chart shows which measurements matter most (curvature = 55% of predictive power). The Anomaly Timeline shows parts flagged by the Isolation Forest model. The yield forecast card shows the predicted next-period yield.

Operations Guide

Access Control

Operational access is managed through the ARGO Analytics Admins Entra ID security group. Members get:

VM Contributor on ps-argo-etl (start/stop, run commands)
Reader on psargostorage storage account
Reader on PS-RG-01 resource group

To add a new team member: add them to the ARGO Analytics Admins group in Entra ID. They also need collaborator access on the GHE repo.

Current Members	UPN
Adam Devereaux	ADevereaux@progressivesurface.com
Dakota Cooper	DCooper@progressivesurface.com

Managing the VM

All management is done via az vm run-command (no SSH configured):

# Run a command on the VM
az vm run-command invoke \
  --resource-group PS-RG-01 \
  --name ps-argo-etl \
  --command-id RunShellScript \
  --scripts "your-command-here"
 
# Check service status
az vm run-command invoke --resource-group PS-RG-01 --name ps-argo-etl \
  --command-id RunShellScript \
  --scripts "systemctl status argo-api --no-pager && systemctl status nginx --no-pager"
 
# View recent API logs
az vm run-command invoke --resource-group PS-RG-01 --name ps-argo-etl \
  --command-id RunShellScript \
  --scripts "journalctl -u argo-api --no-pager -n 30"
 
# View ETL cron logs
az vm run-command invoke --resource-group PS-RG-01 --name ps-argo-etl \
  --command-id RunShellScript \
  --scripts "journalctl -t argo-etl --no-pager -n 20"
 
# Check data status
az vm run-command invoke --resource-group PS-RG-01 --name ps-argo-etl \
  --command-id RunShellScript \
  --scripts "curl -s http://localhost:8000/api/health"

Important: Only one az vm run-command can execute at a time per VM. If you get a “Conflict” error, a previous command is still running — wait and retry.
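
The wait-and-retry advice can be scripted. The sketch below wraps the same `az vm run-command invoke` call shown above. It assumes the word "Conflict" appears in the CLI's error output, and the function names, attempt count, and delay are illustrative.

```python
import subprocess
import time


def build_run_command(script, resource_group="PS-RG-01", vm_name="ps-argo-etl"):
    """Return the az CLI argument list for running a shell script on the VM."""
    return [
        "az", "vm", "run-command", "invoke",
        "--resource-group", resource_group,
        "--name", vm_name,
        "--command-id", "RunShellScript",
        "--scripts", script,
    ]


def run_with_retry(script, attempts=5, delay_s=30.0, runner=subprocess.run):
    """Run a script on the VM, retrying while another command is in flight."""
    for attempt in range(1, attempts + 1):
        result = runner(build_run_command(script), capture_output=True, text=True)
        if result.returncode == 0:
            return result.stdout
        # Assumption: a busy VM reports "Conflict" somewhere in stderr.
        if "Conflict" not in result.stderr or attempt == attempts:
            raise RuntimeError(result.stderr.strip())
        time.sleep(delay_s)


command = build_run_command("curl -s http://localhost:8000/api/health")
```

The `runner` parameter exists so the retry logic can be exercised without the Azure CLI installed.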

Deploying Code Updates

# Pull latest code and rebuild (via run-command)
az vm run-command invoke --resource-group PS-RG-01 --name ps-argo-etl \
  --command-id RunShellScript \
  --scripts '#!/bin/bash
    export PATH="/opt/argo-analytics/backend/.venv/bin:$PATH"
    cd /opt/argo-analytics && git pull origin main
    cd frontend && npm run build
    sudo systemctl restart argo-api
    sleep 3 && curl -s http://localhost:8000/api/health'

Running a Manual ETL

az vm run-command invoke --resource-group PS-RG-01 --name ps-argo-etl \
  --command-id RunShellScript \
  --scripts '#!/bin/bash
    export PATH="/opt/argo-analytics/backend/.venv/bin:$PATH"
    cd /opt/argo-analytics/backend
    python3 -m etl.backfill --days 2
    systemctl stop argo-api
    python3 -m etl.duckdb_cache
    systemctl start argo-api'

Adding a New ARGO Machine

Verify the machine’s data exists on the Azure file share (/mnt/argodatastore/ARGO{N}/)
Create symlink on the VM: ln -sfn /mnt/argodatastore/argo-analytics-data/parquet/argo{n} /opt/argo-analytics/data/parquet/argo{n}
Run backfill: ARGO5_DATA_ROOT=/mnt/argodatastore/ARGO{N} python3 -m etl.backfill --days 9999 --machine argo{n}
Add argo{n} to VALID_MACHINES in backend/api/deps.py
Rebuild DuckDB: python3 -m etl.duckdb_cache --machines argo4 argo5 argo6 argo{n}
Update the ETL cron script to include the new machine
Restart the API
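
The role of `VALID_MACHINES` in the steps above can be pictured with a small validation helper. This is a hypothetical sketch, not the contents of `backend/api/deps.py`; the function name and error text are assumptions.

```python
VALID_MACHINES = {"argo4", "argo5", "argo6"}


def resolve_machines(machine: str = "all") -> list[str]:
    """Expand the machine query parameter into a list of machine IDs."""
    key = machine.lower()
    if key == "all":
        return sorted(VALID_MACHINES)
    if key not in VALID_MACHINES:
        raise ValueError(f"machine not found: {machine}")
    return [key]


selected = resolve_machines("ARGO5")
everything = resolve_machines()
```

Until a new identifier is added to the set, requests for it fail validation even if its Parquet files already exist.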

Troubleshooting

Symptom	Likely Cause	Fix
Dashboard shows no data	DuckDB cache empty or API not running	Check `systemctl status argo-api` and `curl localhost:8000/api/health`
“500 Internal Server Error” on page load	DuckDB lock conflict during ETL	Wait for ETL to finish, or restart API: `systemctl restart argo-api`
ETL cron not running	Cron job lost after VM restart	Verify: `crontab -l`, re-add if missing
File share not mounted	VM restarted, mount failed	Check: `mount | grep argodatastore`, re-mount: `mount /mnt/argodatastore`
Yield shows 0% for a machine	No Parquet files for that machine	Check: `find /mnt/argodatastore/argo-analytics-data/parquet/{machine} -name '*.parquet' | wc -l`
“machine not found” error	Machine not in VALID_MACHINES	Add to `backend/api/deps.py` and redeploy
Old data (>4 hours stale)	Cron job failed or stuck	Check ETL logs: `journalctl -t argo-etl -n 20`
VM unresponsive	Out of memory (DuckDB rebuild with too many machines)	Restart VM: `az vm restart --resource-group PS-RG-01 --name ps-argo-etl`

Key Findings

Analysis of 22M parts across 3 machines (May 2023–March 2026):

Finding	Detail
Total parts	22,047,056 across ARGO4/5/6
Best machine	ARGO4 (90.17% yield)
Worst machine	ARGO6 (86.08% yield), about 4.1 percentage points below ARGO4
Top defect	Highpoints (dominant across all machines)
Predictive power	Curvature measurements = 55% of pass/fail prediction
Shift effect	Day shift ~1% better than Night shift
Worst hour	Hour 23 (86.0%), during the night shift
Best hour	Hour 13 (93.1%) — mid-day shift
Schema evolution	375 cols (v1, 2023) → 382 cols (v4, 2026)

Related Documentation

PSI Data Brain — Master data reference including labor_detail tables
Terminology — PSI glossary (Redbook, AFTEC, BOM, etc.)
Deploy to Azure — PSI deployment guide and checklist
Azure Resources — VM, storage, DNS inventory

Last updated: March 2026
