ARGO Analytics Dashboard

Interactive analytics dashboard for PSI’s ARGO inspection machines — yield monitoring, SPC control charts, defect analysis, surface heatmaps, predictive analytics, and shift correlation across 22M+ inspected SOFC interconnect plates from 3 machines.


Overview

The ARGO Analytics Dashboard provides real-time visibility into the quality inspection process for solid oxide fuel cell (SOFC) interconnect plates. Three ARGO machines (ARGO4, ARGO5, ARGO6) collectively inspect ~38,000 plates per day across 382 measurement dimensions including surface curvature, thickness, defect detection, and coating quality. The dashboard replaces manual CSV spot-checking with interactive visualizations, statistical process control, and machine learning-driven insights.

| Feature | Description |
| --- | --- |
| Production URL | https://argo.progressivesurface.com |
| Backend | Python FastAPI + DuckDB on Azure Linux VM |
| Authentication | Azure AD (PSI credentials), planned |
| Hosting | Azure VM ps-argo-etl (B2as_v2, Ubuntu 24.04) behind nginx with SSL |
| Repository | ProgressiveSurface/argo-analytics |
| Data Source | Azure File Share psargostorage/argodatastore (ARGO4, ARGO5, ARGO6) |
| Data Volume | 22M parts across 3 machines, 2,500 daily stat files, May 2023–present |
| Access | VPN/onsite only (private IP 10.160.0.20) |

Features

Dashboard Pages

| Page | Route | Description |
| --- | --- | --- |
| Executive Overview | / | KPI cards (yield, throughput, errors), yield trend chart, defect Pareto, hourly yield heatmap |
| Time-Series & SPC | /spc | X-bar control charts with Western Electric rules, Cp/Cpk capability, parameter selector |
| Batch Analysis | /batch | Batch comparison table (color-coded yield), recipe timeline |
| Defect Deep Dive | /defects | Failure mode breakdown, defects by hour (shift patterns), defects by batch |
| Part Lookup | /parts | Search by part/batch, full measurement profile, 57x57 surface heatmaps (canvas-rendered) |
| Predictive Analytics | /predictive | Anomaly detection (Isolation Forest), feature importance (Random Forest), yield forecast |
| Shift & Labor | /shifts | Day vs Night yield comparison, 24-hour yield profile with shift boundary markers |
| Multi-Machine | /machines | Cross-machine yield comparison, defect comparison, per-machine KPI cards |

All pages include a machine picker (ARGO4, ARGO5, ARGO6, or All) and date range picker (presets: 7d, 30d, 90d, or custom).

Machines

| Machine | Parquet Files | Date Range | Parts (total) | Notes |
| --- | --- | --- | --- | --- |
| ARGO4 | 872 | Oct 2022–present | ~8M | Legacy Daily stats dump/ path, some schema errors in old files |
| ARGO5 | 854 | Feb 2023–present | ~7.6M | Primary machine, cleanest data |
| ARGO6 | 773 | Jul 2023–present | ~6.5M | Lowest yield (~86%), under investigation |

ETL Pipeline

  • Parses 382-column DailyStats CSVs with latin-1 encoding and schema version detection (v1–v4)
  • Writes compressed Parquet files (5:1 compression ratio) partitioned by machine and date
  • Builds DuckDB materialized aggregation tables for fast dashboard queries
  • Reads 57x57 surface grid files on-demand for Part Lookup
  • Runs every 4 hours via cron on the ETL VM
  • Uses DuckDB’s union_by_name option to reconcile schema mismatches across machines and time periods
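The union_by_name behavior in the last bullet can be illustrated without DuckDB. The sketch below is plain Python that mimics the semantics only (columns are matched by name and missing ones are filled with None). It is not how DuckDB implements the option, and the column names and values are made up for illustration.

```python
from typing import Any


def union_by_name(*tables: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Concatenate row lists, aligning columns by name and padding gaps with None."""
    # Collect every column name, preserving first-seen order.
    columns: list[str] = []
    for table in tables:
        for row in table:
            for name in row:
                if name not in columns:
                    columns.append(name)
    # Re-emit each row with the full column set.
    return [
        {name: row.get(name) for name in columns}
        for table in tables
        for row in table
    ]


# Hypothetical rows: an older file lacks a column that a newer file has.
old_rows = [{"part_no": 1, "result": "+"}]
new_rows = [{"part_no": 2, "result": "-", "powder_lot": "A17"}]
merged = union_by_name(old_rows, new_rows)
```

Rows from the older layout come back with powder_lot set to None instead of causing a schema error, which is the property the ETL relies on.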

Analytics

  • SPC: X-bar/R control charts with Western Electric sensitizing rules (all 4 rules), Cp/Cpk process capability
  • PCA: Principal Component Analysis on failed parts for multivariate defect clustering
  • Anomaly Detection: Isolation Forest on 6 core measurement columns; flags unusual parts even when they pass every individual check
  • Feature Importance: Random Forest classifier identifies which measurements most predict pass/fail (curvature = 55% of predictive power)
  • Yield Prediction: Trend-based forecast using weighted recent hourly data
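As a rough illustration of the SPC bullet, the sketch below computes X-bar and R chart limits with the standard textbook constants for subgroups of five. The function name and the fixed subgroup size are assumptions made for this example; backend/analytics/spc.py may support other subgroup sizes and may estimate limits differently.

```python
from statistics import mean

# Standard control-chart constants for subgroup size n = 5.
A2, D3, D4 = 0.577, 0.0, 2.114


def xbar_r_limits(subgroups: list[list[float]]) -> dict[str, float]:
    """Return center lines and control limits for X-bar and R charts (n = 5)."""
    if any(len(g) != 5 for g in subgroups):
        raise ValueError("this sketch only carries constants for subgroups of 5")
    xbars = [mean(g) for g in subgroups]            # subgroup means
    ranges = [max(g) - min(g) for g in subgroups]   # subgroup ranges
    grand_mean, mean_range = mean(xbars), mean(ranges)
    return {
        "xbar_center": grand_mean,
        "xbar_ucl": grand_mean + A2 * mean_range,
        "xbar_lcl": grand_mean - A2 * mean_range,
        "r_center": mean_range,
        "r_ucl": D4 * mean_range,
        "r_lcl": D3 * mean_range,
    }
```

A subgroup mean outside xbar_lcl..xbar_ucl corresponds to the first Western Electric rule (one point beyond three sigma); the remaining rules look at runs of points relative to the one- and two-sigma zones.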

Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│                         AZURE CLOUD                                     │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  ┌─────────────────────────────────────────┐                           │
│  │  Azure File Share (psargostorage)        │                           │
│  │  ├── ARGO4/Daily stats dump/*.txt        │  Raw CSVs from machines   │
│  │  ├── ARGO5/DailyStats/*.txt              │  382 cols, ~33MB/day each │
│  │  ├── ARGO5/Curvature57/**/*.csv          │  57x57 grids, ~39KB each │
│  │  ├── ARGO5/Thickness57/**/*.csv          │  57x57 grids, ~29KB each │
│  │  ├── ARGO6/DailyStats/*.txt              │                           │
│  │  └── argo-analytics-data/parquet/        │  Processed Parquet files  │
│  │      ├── argo4/daily_stats/date=*/       │  872 files                │
│  │      ├── argo5/daily_stats/date=*/       │  854 files                │
│  │      └── argo6/daily_stats/date=*/       │  773 files                │
│  └──────────────────┬──────────────────────┘                           │
│                     │ CIFS mount (/mnt/argodatastore)                   │
│                     ▼                                                   │
│  ┌─────────────────────────────────────────┐                           │
│  │  VM: ps-argo-etl (10.160.0.20)          │                           │
│  │  Ubuntu 24.04, B2as_v2 (2 vCPU, 8GB)   │                           │
│  │                                          │                           │
│  │  ┌──────────────────────────────────┐   │                           │
│  │  │ ETL Pipeline (cron, every 4 hrs) │   │                           │
│  │  │ Python 3.12 + Polars + PyArrow   │   │                           │
│  │  │ Reads CSVs → Writes Parquet      │   │                           │
│  │  │ Rebuilds DuckDB cache            │   │                           │
│  │  └──────────────────────────────────┘   │                           │
│  │                                          │                           │
│  │  ┌──────────────────────────────────┐   │                           │
│  │  │ DuckDB (local disk, ~25MB)       │   │                           │
│  │  │ 22M rows, 16 columns             │   │                           │
│  │  │ Materialized: daily_yield,        │   │                           │
│  │  │   hourly_yield, defect_pareto,    │   │                           │
│  │  │   batch_summary                   │   │                           │
│  │  └──────────────────────────────────┘   │                           │
│  │                                          │                           │
│  │  ┌──────────────────────────────────┐   │                           │
│  │  │ FastAPI (uvicorn, port 8000)     │   │                           │
│  │  │ 28 REST API endpoints             │   │                           │
│  │  │ DuckDB read-only mode             │   │                           │
│  │  │ scikit-learn analytics modules    │   │                           │
│  │  │ Serves React frontend (dist/)     │   │                           │
│  │  └──────────────────────────────────┘   │                           │
│  │                                          │                           │
│  │  ┌──────────────────────────────────┐   │                           │
│  │  │ nginx (port 80/443)              │   │                           │
│  │  │ SSL: *.progressivesurface.com    │   │                           │
│  │  │ Reverse proxy → localhost:8000   │   │                           │
│  │  │ HTTP → HTTPS redirect            │   │                           │
│  │  └──────────────────────────────────┘   │                           │
│  └──────────────────┬──────────────────────┘                           │
│                     │                                                   │
│  ┌──────────────────┴──────────────────────┐                           │
│  │  DNS: argo.progressivesurface.com        │                           │
│  │  A record → 10.160.0.20                  │                           │
│  │  SSL: Wildcard cert from ps-certificates-kv                         │
│  └─────────────────────────────────────────┘                           │
│                                                                         │
└──────────────────────────────┬──────────────────────────────────────────┘
                               │ VPN / Site-to-Site
┌──────────────────────────────┴──────────────────────────────────────────┐
│  PSI On-Premises / VPN Users                                            │
│  Browser → https://argo.progressivesurface.com                          │
└─────────────────────────────────────────────────────────────────────────┘

Technology Stack

| Component | Technology |
| --- | --- |
| ETL | Python 3.12, Polars, PyArrow |
| Data Lake | Apache Parquet (Snappy compression) on Azure File Share |
| Query Engine | DuckDB 1.5 (embedded, columnar, read-only for API) |
| Backend | FastAPI 0.115, Pydantic, uvicorn |
| Analytics | scikit-learn (Random Forest, Isolation Forest, PCA), scipy |
| Frontend | React 18, TypeScript, Vite 5, TailwindCSS 3 |
| Charts | Recharts 2.12 (SVG), HTML Canvas (57x57 heatmaps) |
| State | Zustand 4 (filters/machine), TanStack React Query 5 (data) |
| Auth | Azure AD / MSAL (planned) |
| Hosting | Azure VM ps-argo-etl (B2as_v2, Ubuntu 24.04) |
| Reverse Proxy | nginx 1.24 with wildcard SSL |
| CI/CD | GitHub Actions on [self-hosted, psi-internal] runner |

Key Architecture Decisions

| Decision | Rationale |
| --- | --- |
| VM instead of App Service | VM has CIFS mount access to Azure File Share; App Service cannot mount file shares for ETL reads |
| DuckDB on local disk | CIFS-mounted DuckDB files have locking issues; local disk eliminates lock conflicts |
| DuckDB read-only for API | Prevents lock conflicts between ETL writes and API reads |
| Parquet on file share | Persistent storage survives VM restarts; shareable across services |
| union_by_name in DuckDB | ARGO4’s older Parquet files have different schemas; DuckDB reconciles automatically |
| Single uvicorn worker | DuckDB’s single-writer model conflicts with gunicorn’s multi-process workers on network storage |
| Symlinks for Parquet | /opt/argo-analytics/data/parquet/{machine} → file share; keeps config simple |

Data Sources

DailyStats (382 Columns)

Each row is one inspected part. The 382 columns are organized into 17 groups:

| Group | Columns | Description |
| --- | --- | --- |
| Identity | 1–6 | Part number, timestamp, result, barcode |
| Pass/Fail Flags | 7–21 | Individual quality gate results (+/- for 15 defect types) |
| Plenum Depth | 22–24 | Plenum depth results |
| IC Geometry | 25–33 | Interconnect dimensional checks |
| Inspection Status | 34–40 | System health (timing, image acquisition, belt stop) |
| Defect Counts | 41–60 | Counts and dimensions of detected defects |
| Curvature Top | 61–112 | 50 measurement points + summary (µm) |
| Curvature Bottom | 113–164 | 50 measurement points + summary (µm) |
| S7 Topology | 165–182 | Regional flatness metrics (Air/Fuel sides) |
| Thickness | 183–255 | 50 measurement points + averages (µm) |
| A² / Inertia | 256–296 | Structural stiffness measurements |
| Plenum Depth Stats | 297–302 | Per-area average, min, max |
| IC Dimensions | 303–325 | Hole positions, diameters, symmetry (mm) |
| Production Meta | 326–329 | Recipe, batch, cage, powder lot |
| Coating Quality | 330–350 | Stddev/offset of coating registration per region |
| System Validation | 351–372 | 3D rotation, temperatures (7 sensors), eval times |
| Edge/Calibration | 373–382 | Calibration offsets, z-score alarms |

Schema evolved from 375 columns (v1, 2023) to 382 (v4, 2026). The ETL auto-detects version by column count and normalizes to the canonical v4 schema.
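Detection by column count can be sketched as follows. Only the v1 and v4 counts are stated in this README, so the lookup table below is intentionally incomplete; the comma delimiter and the function name are also assumptions, and the real logic lives in backend/etl/schema.py.

```python
# Column counts stated in this README. The counts for v2 and v3 are not
# given here, so they are left out rather than guessed.
KNOWN_COLUMN_COUNTS = {375: "v1", 382: "v4"}


def detect_schema_version(header_line: str, delimiter: str = ",") -> str:
    """Map the number of delimited fields in a header line to a schema version."""
    count = len(header_line.rstrip("\r\n").split(delimiter))
    try:
        return KNOWN_COLUMN_COUNTS[count]
    except KeyError:
        raise ValueError(f"unrecognized DailyStats layout: {count} columns") from None
```

Failing loudly on an unknown count is a deliberate choice in this sketch: a silently misaligned 382-column row would corrupt every downstream aggregate.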

Data Locations

| Machine | Raw Data Path | Parquet Output |
| --- | --- | --- |
| ARGO4 | /mnt/argodatastore/ARGO4/Daily stats dump/ | parquet/argo4/daily_stats/date=*/ |
| ARGO5 | /mnt/argodatastore/ARGO5/DailyStats/ | parquet/argo5/daily_stats/date=*/ |
| ARGO6 | /mnt/argodatastore/ARGO6/DailyStats/ | parquet/argo6/daily_stats/date=*/ |

Note: ARGO4 uses the legacy directory name Daily stats dump/ (with spaces).
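The date=*/ directories follow the usual key=value partition naming, so the ingested dates for a machine can be listed without opening any Parquet file. The helper below is a sketch; its name is hypothetical, and it returns whatever text follows date= without assuming a particular date format.

```python
from pathlib import Path


def partition_dates(parquet_root: Path, machine: str) -> list[str]:
    """Return the sorted date keys found under <root>/<machine>/daily_stats/date=*/."""
    base = parquet_root / machine / "daily_stats"
    return sorted(
        p.name.split("=", 1)[1]      # keep the text after "date="
        for p in base.glob("date=*")
        if p.is_dir()
    )
```

On the VM this could be called as partition_dates(Path("/opt/argo-analytics/data/parquet"), "argo5"), which resolves through the symlinks to the file share.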

Per-Part Surface Grids (57x57)

Each part has three high-resolution surface scan files (ARGO5 only currently):

| Grid Type | Location | Values | Range |
| --- | --- | --- | --- |
| Curvature Top | Curvature57/{date}/CurvatureTop/ | Surface height (µm) | ~4800–5100 |
| Curvature Bottom | Curvature57/{date}/CurvatureBottom/ | Surface height (µm) | ~4800–5100 |
| Thickness | Thickness57/{date}/Thickness/ | Plate thickness (mm) | ~1.3–2.1 |

Filename pattern: N000{PartNo}BID{BatchID}CID{CageID}D{YYYYMMDD}U{HHMMSS}.csv
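A filename in this pattern can be split into its fields with a regular expression. The sketch below assumes that N000 is a literal prefix, that the part, batch, and cage IDs are purely numeric, and that the D/U fields are a local date and time; none of this is confirmed by the pattern alone, and the names GridFile and parse_grid_filename are hypothetical.

```python
import re
from datetime import datetime
from typing import NamedTuple

# Assumes numeric IDs and a literal "N000" prefix (see lead-in).
GRID_NAME = re.compile(
    r"^N000(?P<part>\d+)BID(?P<batch>\d+)CID(?P<cage>\d+)"
    r"D(?P<date>\d{8})U(?P<time>\d{6})\.csv$"
)


class GridFile(NamedTuple):
    part_no: str
    batch_id: str
    cage_id: str
    scanned_at: datetime


def parse_grid_filename(name: str) -> GridFile:
    """Split a surface grid filename into part, batch, cage, and timestamp."""
    m = GRID_NAME.match(name)
    if m is None:
        raise ValueError(f"not a surface grid filename: {name!r}")
    stamp = datetime.strptime(m["date"] + m["time"], "%Y%m%d%H%M%S")
    return GridFile(m["part"], m["batch"], m["cage"], stamp)
```

The IDs are kept as strings so that any zero padding survives a round trip back to a filename.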

External Data

| Source | Table | Description | Status |
| --- | --- | --- | --- |
| Azure SQL PSI_Analytics | labor_detail | Timesheet data (1.37M rows) | Planned |
| Azure SQL PSI_Analytics | tslabor2 | All timesheets (5.25M rows) | Planned |

Currently shift analysis uses simulated shift windows (Day 06:00–17:59, Night 18:00–05:59) derived from ARGO inspection timestamps.
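The simulated shift windows reduce to a check on the hour of the inspection timestamp. A minimal sketch (the function name is an assumption, and timestamps are taken to be in plant-local time):

```python
from datetime import datetime


def shift_for(ts: datetime) -> str:
    """Day covers 06:00-17:59; every other hour belongs to Night."""
    return "Day" if 6 <= ts.hour < 18 else "Night"
```

Note that a single Night shift spans two calendar dates (18:00 through 05:59 the next morning), so grouping by date and shift splits it in two unless the date is shifted for the early-morning hours.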


API Endpoints

Endpoints whose query string lists machine accept it as an optional parameter (argo4, argo5, argo6, or all; default: all). The cross-machine comparison endpoints and /health do not list it.

Overview (/api/overview)

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /kpi?date_start&date_end&machine | Aggregate KPIs (yield, throughput, errors) |
| GET | /yield-trend?date_start&date_end&machine | Daily yield time series |
| GET | /defect-pareto?date_start&date_end&machine | Defect type counts ranked |
| GET | /hourly-heatmap?date_start&date_end&machine | Yield by date x hour grid |
| GET | /throughput?date_start&date_end&machine | Parts per hour by date |
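The query strings above can be assembled with the standard library. The sketch below only builds the URL; it assumes that date_start and date_end take ISO dates (YYYY-MM-DD), which this README does not state, and overview_url is a hypothetical helper name.

```python
from datetime import date
from urllib.parse import urlencode

BASE_URL = "https://argo.progressivesurface.com"


def overview_url(endpoint: str, start: date, end: date, machine: str = "all") -> str:
    """Build a URL for one of the /api/overview endpoints."""
    query = urlencode({
        "date_start": start.isoformat(),   # ISO format is an assumption
        "date_end": end.isoformat(),
        "machine": machine,
    })
    return f"{BASE_URL}/api/overview/{endpoint}?{query}"
```

For example, overview_url("kpi", date(2026, 3, 1), date(2026, 3, 7), "argo5") yields a URL that can be fetched from any VPN-connected host.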

SPC (/api/spc)

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /parameters | Available measurement parameters for SPC |
| GET | /control-chart?parameter&date_start&date_end&subgroup_size&machine | X-bar/R chart with WE rules |
| GET | /cpk?parameter&date_start&date_end&usl&lsl&machine | Process capability indices |

Batch (/api/batch)

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /list?date_start&date_end&machine | Batch summaries with yield and measurements |
| GET | /{batch_no}/detail | Single batch with percentile distributions |
| POST | /compare | Side-by-side batch comparison |
| GET | /recipe-timeline?date_start&date_end&machine | Recipe changes over time |

Defect (/api/defect)

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /failure-modes?date_start&date_end&machine | Failure type stats with trends |
| GET | /by-hour?date_start&date_end&machine | Defect counts by hour of day |
| GET | /by-batch?date_start&date_end&machine | Defect counts by batch |

Part Lookup (/api/part)

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /search?q&limit&machine | Search by part number or batch ID |
| GET | /{part_no} | Full measurement profile |
| GET | /{part_no}/grid/{grid_type} | 57x57 surface grid data (on-demand read) |
| GET | /{part_no}/neighbors?n | Adjacent parts (±n sequential) |

Predictive (/api/predictive)

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /anomalies?date_start&date_end&contamination&machine | Isolation Forest anomaly detection |
| GET | /feature-importance?date_start&date_end&machine | Random Forest feature importance |
| GET | /yield-forecast?date_start&date_end&machine | Trend-based yield prediction |

Labor/Shift (/api/labor)

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /shift-yield?date_start&date_end&machine | Yield by Day/Night shift |
| GET | /hourly-profile?date_start&date_end&machine | Average yield by hour (0–23) |
| GET | /day-vs-night?date_start&date_end&machine | Day vs Night aggregate comparison |

Cross-Machine Comparison (/api/compare)

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /summary?date_start&date_end | One-row aggregate per machine |
| GET | /yield?date_start&date_end | Yield by machine by date |
| GET | /defects?date_start&date_end | Top defects per machine |
| GET | /throughput?date_start&date_end | Parts per hour per machine |

Meta (/api)

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /health | System health, DuckDB status, row counts |
| GET | /meta/machines | List of available machine identifiers |

VM Infrastructure

VM: ps-argo-etl

| Property | Value |
| --- | --- |
| Name | ps-argo-etl |
| Resource Group | PS-RG-01 |
| Size | Standard_B2as_v2 (2 vCPU, 8GB RAM) |
| OS | Ubuntu 24.04 LTS |
| Private IP | 10.160.0.20 (PS-SERVERS subnet) |
| Disk | 64GB OS disk |

File System Layout

/opt/argo-analytics/                    # Application root (git repo)
├── backend/                            # Python backend + ETL
│   ├── .venv/                          # Python virtual environment
│   ├── .env                            # Environment config
│   ├── etl/                            # ETL pipeline modules
│   ├── api/                            # FastAPI application
│   └── analytics/                      # ML/SPC modules
├── frontend/dist/                      # Built React frontend
├── data/
│   ├── duckdb/argo.duckdb              # DuckDB cache (~25MB, LOCAL disk)
│   └── parquet/                        # Symlinks to file share:
│       ├── argo4 → /mnt/argodatastore/argo-analytics-data/parquet/argo4
│       ├── argo5 → /mnt/argodatastore/argo-analytics-data/parquet/argo5
│       └── argo6 → /mnt/argodatastore/argo-analytics-data/parquet/argo6
├── run-etl.sh                          # ETL execution script (cron)
└── backfill-a4a6.py                    # Multi-machine backfill script

/mnt/argodatastore/                     # Azure File Share (CIFS mount, persistent)
├── ARGO4/Daily stats dump/             # Raw CSVs from ARGO4
├── ARGO5/DailyStats/                   # Raw CSVs from ARGO5
├── ARGO5/Curvature57/                  # 57x57 surface grids
├── ARGO5/Thickness57/                  # 57x57 thickness grids
├── ARGO6/DailyStats/                   # Raw CSVs from ARGO6
└── argo-analytics-data/parquet/        # Processed Parquet output

Services

| Service | Type | Config |
| --- | --- | --- |
| argo-api | systemd | /etc/systemd/system/argo-api.service: uvicorn on port 8000, auto-restart |
| nginx | systemd | Reverse proxy 80/443 → 8000, SSL with wildcard cert |
| ETL cron | crontab | 0 */4 * * * (runs /opt/argo-analytics/run-etl.sh) |
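The cron expression 0 */4 * * * fires at minute 0 of hours 0, 4, 8, 12, 16, and 20. When judging whether dashboard data is stale, it can help to know when the next run is due; the sketch below computes that from a timestamp in the VM's local time (the helper name is hypothetical, and daylight-saving transitions are ignored).

```python
from datetime import datetime, timedelta


def next_etl_run(now: datetime) -> datetime:
    """Next firing of `0 */4 * * *` strictly after `now`."""
    top_of_hour = now.replace(minute=0, second=0, microsecond=0)
    # Hours remaining until the next multiple of 4 (always 1 to 4).
    hours_ahead = 4 - (top_of_hour.hour % 4)
    return top_of_hour + timedelta(hours=hours_ahead)
```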

ETL Cron Script (run-etl.sh)

#!/bin/bash
cd /opt/argo-analytics/backend
export PATH="/opt/argo-analytics/backend/.venv/bin:$PATH"
 
# Backfill last 2 days (catches today + late-arriving yesterday data)
python3 -m etl.backfill --days 2
 
# Rebuild DuckDB cache from all machine Parquet files
python3 -m etl.duckdb_cache
 
# Restart API to pick up fresh DuckDB (read_only mode needs reconnect)
systemctl restart argo-api

Azure File Share Mount

Credentials stored at /etc/smbcredentials/psargostorage.cred. Persistent mount via /etc/fstab:

//psargostorage.file.core.windows.net/argodatastore /mnt/argodatastore cifs nofail,credentials=/etc/smbcredentials/psargostorage.cred,dir_mode=0755,file_mode=0644,serverino,nosharesock,actimeo=30

Project Structure

argo-analytics/
├── CLAUDE.md                           # Claude Code instructions
├── .env.example                        # Environment variable template
├── startup.sh                          # Azure App Service startup (legacy)
├── requirements.txt                    # Root requirements (→ backend)
├── deploy.sh                           # Azure resource creation script
├── run-etl.sh                          # ETL cron execution script
├── .github/workflows/deploy.yml        # CI/CD via self-hosted runner
│
├── backend/
│   ├── pyproject.toml                  # Python project metadata + deps
│   ├── requirements.txt                # Pinned production dependencies
│   ├── etl/                            # ETL Pipeline
│   │   ├── config.py                   # Paths, constants, env loading
│   │   ├── schema.py                   # 382-column canonical schema (v1–v4)
│   │   ├── daily_stats_parser.py       # CSV → normalized DataFrame
│   │   ├── parquet_writer.py           # DataFrame → partitioned Parquet
│   │   ├── duckdb_cache.py             # Multi-machine DuckDB aggregation
│   │   ├── grid_parser.py              # 57x57 surface grid CSV reader
│   │   ├── file_scanner.py             # New file discovery
│   │   ├── backfill.py                 # Historical ingest CLI
│   │   └── logging_config.py           # Structured logging setup
│   ├── api/                            # FastAPI Application
│   │   ├── main.py                     # App entry, lifespan, SPA serving
│   │   ├── deps.py                     # DuckDB + machine validation helpers
│   │   ├── models.py                   # Pydantic response models
│   │   └── routers/
│   │       ├── health.py               # /api/health + /api/meta/machines
│   │       ├── overview.py             # /api/overview/* (5 endpoints)
│   │       ├── timeseries.py           # /api/spc/* (3 endpoints)
│   │       ├── batch.py                # /api/batch/* (4 endpoints)
│   │       ├── defect.py               # /api/defect/* (3 endpoints)
│   │       ├── part_lookup.py          # /api/part/* (4 endpoints)
│   │       ├── predictive.py           # /api/predictive/* (3 endpoints)
│   │       ├── labor.py                # /api/labor/* (3 endpoints)
│   │       ├── compare.py              # /api/compare/* (4 endpoints)
│   │       └── analytics.py            # /api/analytics/pca
│   └── analytics/                      # ML/Statistics Modules
│       ├── spc.py                      # Control charts, Cp/Cpk, WE rules
│       ├── pca.py                      # PCA dimensionality reduction
│       ├── anomaly.py                  # Isolation Forest
│       ├── prediction.py               # Yield forecasting
│       └── feature_importance.py       # Random Forest importances
│
├── frontend/
│   ├── package.json
│   ├── vite.config.ts                  # Dev proxy /api → :8000
│   ├── tailwind.config.js              # PSI brand colors
│   ├── playwright.config.ts            # E2E test config
│   ├── e2e/                            # 12 Playwright E2E tests
│   └── src/
│       ├── main.tsx                    # React root + QueryClient
│       ├── App.tsx                     # Routes (8 pages)
│       ├── api/client.ts              # fetchApi wrapper
│       ├── stores/filterStore.ts      # Zustand: date range + machine
│       ├── hooks/                      # 8 TanStack React Query hook files
│       ├── components/
│       │   ├── layout/                 # AppLayout, Sidebar, PageHeader
│       │   ├── shared/                 # KpiCard, DateRangePicker, MachinePicker
│       │   └── charts/                 # 15 chart components (Recharts + Canvas)
│       ├── pages/                      # 8 route pages
│       └── types/                      # TypeScript interfaces
│
└── data/                               # Data lake (gitignored, on VM)
    ├── parquet/argo{4,5,6} → symlinks  # → Azure File Share
    └── duckdb/argo.duckdb              # ~25MB local aggregation cache

Development

Prerequisites

  • Python 3.11+
  • Node.js 20+
  • Access to W:\ARGO5\ network share (maps to psargostorage/argodatastore)
  • Azure CLI (for deployment)

Local Development

# Clone
git clone https://progressivesurface.ghe.com/ProgressiveSurface/argo-analytics.git
cd argo-analytics
 
# Backend
cd backend
python -m venv .venv
.venv/Scripts/activate      # Windows
pip install -e ".[dev]"
 
# Initial data backfill (90 days, ARGO5 only)
python -m etl.backfill --days 90
 
# Refresh DuckDB cache
python -m etl.duckdb_cache --machines argo5
 
# Start API server
uvicorn api.main:app --reload --port 8000
 
# Frontend (separate terminal)
cd frontend
npm install
npm run dev                 # Vite dev server on :5173, proxies /api → :8000

Environment Variables

# Data paths (local dev)
ARGO5_DATA_ROOT=W:/ARGO5
DATA_LAKE_ROOT=C:/GIT/argo-analytics/data
 
# Data paths (VM production)
# ARGO5_DATA_ROOT=/mnt/argodatastore/ARGO5
# DATA_LAKE_ROOT=/opt/argo-analytics/data
 
# Logging
LOG_LEVEL=INFO
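backend/etl/config.py is described as handling paths and env loading, but its contents are not shown in this README. The sketch below is one way such a loader could look; the Settings class, the load_settings name, and the choice of the production paths as defaults are all assumptions.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    argo5_data_root: Path
    data_lake_root: Path
    log_level: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the three documented variables, falling back to the VM paths."""
    return Settings(
        argo5_data_root=Path(env.get("ARGO5_DATA_ROOT", "/mnt/argodatastore/ARGO5")),
        data_lake_root=Path(env.get("DATA_LAKE_ROOT", "/opt/argo-analytics/data")),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
    )
```

Passing the environment as a mapping keeps the loader testable: load_settings({"LOG_LEVEL": "debug"}) works without touching the real process environment.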

Running Tests

# Frontend E2E (requires both servers running)
cd frontend
npx playwright install chromium
npm run test:e2e

Deployment

VM Deployment (Production)

The app runs on ps-argo-etl (10.160.0.20). To update:

# Run on the VM via az vm run-command (SSH is not configured)
cd /opt/argo-analytics
git pull origin main
 
# Rebuild frontend
cd frontend && npm run build
 
# Restart API
sudo systemctl restart argo-api

CI/CD Pipeline

GitHub Actions workflow (.github/workflows/deploy.yml):

  1. Triggered on push to main (path-filtered)
  2. Runs on [self-hosted, psi-internal] runner
  3. Builds React frontend
  4. Packages backend + frontend dist into zip
  5. Deploys via az webapp deployment source config-zip (App Service) or direct git pull (VM)

Azure Resources

| Resource | Name | IP / Config |
| --- | --- | --- |
| VM | ps-argo-etl | 10.160.0.20, B2as_v2, Ubuntu 24.04 |
| Storage | psargostorage | File share argodatastore |
| DNS | argo.progressivesurface.com | A record → 10.160.0.20 |
| SSL | Wildcard cert | From ps-certificates-kv |
| App Service | ps-argo-analytics | Created but unused (VM serves instead) |
| Private Endpoint | ps-argo-analytics-pe | 10.160.140.16 (App Service, unused) |

How to Use the Dashboard

Getting Started

  1. Connect to the PSI VPN
  2. Navigate to https://argo.progressivesurface.com
  3. The Executive Overview page loads by default showing the last 30 days across all machines

The left sidebar provides access to all 8 pages. Every page has:

  • Machine picker (top right) — filter to ARGO4, ARGO5, ARGO6, or All Machines
  • Date range picker — presets (7d, 30d, 90d) or custom start/end dates
  • Changing either filter instantly refreshes all charts on the page

Common Tasks

“What’s our yield today?” Go to Executive Overview (/). The KPI cards at the top show total parts, yield %, parts/hour, and inspection errors for the selected date range and machine.

“Which defect is causing the most rejects?” Go to Executive Overview → Defect Pareto chart (bottom left). Bars are ranked by frequency. Highpoints is typically the dominant defect. For more detail, go to Defect Deep Dive (/defects) to see defects by hour and by batch.

“Is one machine worse than the others?” Go to Multi-Machine (/machines). The summary cards show yield per machine. The yield timeline chart shows daily trends side-by-side. ARGO6 currently has the lowest yield (~86%).

“Did something change on a specific shift?” Go to Shift & Labor (/shifts). The hourly profile chart shows average yield for each hour (0–23) with shift boundaries marked at 06:00 and 18:00. The Day vs Night KPI cards show the aggregate comparison.

“What does a specific part look like?” Go to Part Lookup (/parts). Type a part number (minimum 3 digits) in the search bar. Click a result to see the full measurement profile and three 57x57 surface heatmaps (curvature top, curvature bottom, thickness). The neighbors table shows surrounding parts — if many neighbors also failed, it suggests a batch-level issue.

“Is the process in control?” Go to Time-Series & SPC (/spc). Select a measurement parameter (Curvature Top, Thickness, etc.) from the dropdown. The X-bar control chart shows subgroup means with UCL/LCL limits. Red dots indicate Western Electric rule violations. Enter USL and LSL values to compute Cp/Cpk process capability.
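For reference, the two capability indices are Cp = (USL − LSL) / 6σ and Cpk = min(USL − μ, μ − LSL) / 3σ. The sketch below uses the overall sample standard deviation as σ; the dashboard may estimate σ differently (for example from subgroup ranges), so treat this as an approximation of what the /cpk endpoint returns. The function name is hypothetical.

```python
from statistics import mean, stdev


def process_capability(values: list[float], lsl: float, usl: float) -> tuple[float, float]:
    """Return (Cp, Cpk) using the overall sample standard deviation."""
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        raise ValueError("capability is undefined for zero variation")
    cp = (usl - lsl) / (6 * sigma)                 # spec width vs. process spread
    cpk = min(usl - mu, mu - lsl) / (3 * sigma)    # penalizes an off-center mean
    return cp, cpk
```

Cp and Cpk are equal when the process mean sits exactly between the limits; a Cpk well below Cp indicates the process is off-center rather than too variable.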

“Which batches are problematic?” Go to Batch Analysis (/batch). The table shows all batches sorted by yield (worst first). Red badges indicate yield below 85%, yellow for 85–92%, green for 92%+. Click column headers to sort.
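The badge thresholds amount to a three-way comparison. The sketch below is written in Python for consistency with the backend examples (the actual badge is rendered by the React frontend), and it assumes that exactly 85% counts as yellow and exactly 92% as green, which the wording above leaves open.

```python
def yield_badge(yield_pct: float) -> str:
    """Map a batch yield percentage to the badge color used in the table."""
    if yield_pct < 85.0:
        return "red"
    if yield_pct < 92.0:
        return "yellow"
    return "green"
```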

“What predicts whether a part will pass or fail?” Go to Predictive Analytics (/predictive). The Feature Importance chart shows which measurements matter most (curvature = 55% of predictive power). The Anomaly Timeline shows parts flagged by the Isolation Forest model. The yield forecast card shows the predicted next-period yield.


Operations Guide

Access Control

Operational access is managed through the ARGO Analytics Admins Entra ID security group. Members get:

  • VM Contributor on ps-argo-etl (start/stop, run commands)
  • Reader on psargostorage storage account
  • Reader on PS-RG-01 resource group

To add a new team member: add them to the ARGO Analytics Admins group in Entra ID. They also need collaborator access on the GHE repo.

| Current Members | UPN |
| --- | --- |
| Adam Devereaux | ADevereaux@progressivesurface.com |
| Dakota Cooper | DCooper@progressivesurface.com |

Managing the VM

All management is done via az vm run-command (no SSH configured):

# Run a command on the VM
az vm run-command invoke \
  --resource-group PS-RG-01 \
  --name ps-argo-etl \
  --command-id RunShellScript \
  --scripts "your-command-here"
 
# Check service status
az vm run-command invoke --resource-group PS-RG-01 --name ps-argo-etl \
  --command-id RunShellScript \
  --scripts "systemctl status argo-api --no-pager && systemctl status nginx --no-pager"
 
# View recent API logs
az vm run-command invoke --resource-group PS-RG-01 --name ps-argo-etl \
  --command-id RunShellScript \
  --scripts "journalctl -u argo-api --no-pager -n 30"
 
# View ETL cron logs
az vm run-command invoke --resource-group PS-RG-01 --name ps-argo-etl \
  --command-id RunShellScript \
  --scripts "journalctl -t argo-etl --no-pager -n 20"
 
# Check data status
az vm run-command invoke --resource-group PS-RG-01 --name ps-argo-etl \
  --command-id RunShellScript \
  --scripts "curl -s http://localhost:8000/api/health"

Important: Only one az vm run-command can execute at a time per VM. If you get a “Conflict” error, a previous command is still running — wait and retry.
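Scripts that call az vm run-command repeatedly can wrap the call in a small retry loop. The sketch below is generic: invoke is any callable supplied by the caller, and RunCommandConflict is a hypothetical exception that the caller's wrapper would raise when Azure reports a Conflict. Neither is part of the Azure CLI or SDK.

```python
import time
from typing import Callable


class RunCommandConflict(Exception):
    """Hypothetical: raised by the caller's wrapper when Azure reports a Conflict."""


def run_with_retry(invoke: Callable[[], str], attempts: int = 5, wait_s: float = 30.0) -> str:
    """Call `invoke`, waiting and retrying while another run-command is in flight."""
    for attempt in range(1, attempts + 1):
        try:
            return invoke()
        except RunCommandConflict:
            if attempt == attempts:
                raise               # out of retries; surface the conflict
            time.sleep(wait_s)
    raise ValueError("attempts must be at least 1")
```

A fixed wait is used here because a conflict means a previous command is still running rather than that the service is overloaded, so backing off exponentially would add delay without much benefit.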

Deploying Code Updates

# Pull latest code and rebuild (via run-command)
az vm run-command invoke --resource-group PS-RG-01 --name ps-argo-etl \
  --command-id RunShellScript \
  --scripts '#!/bin/bash
    export PATH="/opt/argo-analytics/backend/.venv/bin:$PATH"
    cd /opt/argo-analytics && git pull origin main
    cd frontend && npm run build
    sudo systemctl restart argo-api
    sleep 3 && curl -s http://localhost:8000/api/health'

Running a Manual ETL

az vm run-command invoke --resource-group PS-RG-01 --name ps-argo-etl \
  --command-id RunShellScript \
  --scripts '#!/bin/bash
    export PATH="/opt/argo-analytics/backend/.venv/bin:$PATH"
    cd /opt/argo-analytics/backend
    python3 -m etl.backfill --days 2
    systemctl stop argo-api
    python3 -m etl.duckdb_cache
    systemctl start argo-api'

Adding a New ARGO Machine

  1. Verify the machine’s data exists on the Azure file share (/mnt/argodatastore/ARGO{N}/)
  2. Create symlink on the VM: ln -sfn /mnt/argodatastore/argo-analytics-data/parquet/argo{n} /opt/argo-analytics/data/parquet/argo{n}
  3. Run backfill: ARGO5_DATA_ROOT=/mnt/argodatastore/ARGO{N} python3 -m etl.backfill --days 9999 --machine argo{n}
  4. Add argo{n} to VALID_MACHINES in backend/api/deps.py
  5. Rebuild DuckDB: python3 -m etl.duckdb_cache --machines argo4 argo5 argo6 argo{n}
  6. Update the ETL cron script to include the new machine
  7. Restart the API
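Step 4 refers to VALID_MACHINES in backend/api/deps.py, whose contents are not shown here. The sketch below illustrates the kind of check that list supports; the tuple values mirror the three current machines, while the resolve_machines name and the use of ValueError (rather than an HTTP error response) are assumptions.

```python
VALID_MACHINES = ("argo4", "argo5", "argo6")


def resolve_machines(machine: str = "all") -> list[str]:
    """Expand the machine query parameter into a list of machine identifiers."""
    key = machine.strip().lower()
    if key == "all":
        return list(VALID_MACHINES)
    if key in VALID_MACHINES:
        return [key]
    raise ValueError(f"machine not found: {machine!r}")
```

With a check like this, a machine whose Parquet files exist but which has not been added to VALID_MACHINES would be rejected, which matches the "machine not found" symptom in the troubleshooting table.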

Troubleshooting

| Symptom | Likely Cause | Fix |
| --- | --- | --- |
| Dashboard shows no data | DuckDB cache empty or API not running | Check systemctl status argo-api and curl localhost:8000/api/health |
| “500 Internal Server Error” on page load | DuckDB lock conflict during ETL | Wait for ETL to finish, or restart API: systemctl restart argo-api |
| ETL cron not running | Cron job lost after VM restart | Verify: crontab -l, re-add if missing |
| File share not mounted | VM restarted, mount failed | Check: mount \| grep argodatastore, re-mount: mount /mnt/argodatastore |
| Yield shows 0% for a machine | No Parquet files for that machine | Check: find /mnt/argodatastore/argo-analytics-data/parquet/{machine} -name '*.parquet' \| wc -l |
| “machine not found” error | Machine not in VALID_MACHINES | Add to backend/api/deps.py and redeploy |
| Old data (>4 hours stale) | Cron job failed or stuck | Check ETL logs: journalctl -t argo-etl -n 20 |
| VM unresponsive | Out of memory (DuckDB rebuild with too many machines) | Restart VM: az vm restart --resource-group PS-RG-01 --name ps-argo-etl |

Key Findings

Analysis of 22M parts across 3 machines (May 2023–March 2026):

| Finding | Detail |
| --- | --- |
| Total parts | 22,047,056 across ARGO4/5/6 |
| Best machine | ARGO4 (90.17% yield) |
| Worst machine | ARGO6 (86.08% yield), about 4 percentage points below ARGO4 |
| Top defect | Highpoints (dominant across all machines) |
| Predictive power | Curvature measurements = 55% of pass/fail prediction |
| Shift effect | Day shift ~1% better than Night shift |
| Worst hour | Hour 23 (86.0%), during the night shift (18:00–05:59) |
| Best hour | Hour 13 (93.1%), mid-day shift |
| Schema evolution | 375 cols (v1, 2023) → 382 cols (v4, 2026) |


Last updated: March 2026