Analytics Methodology

How PSI quality metrics are calculated - formulas, AI classification, and data quality adjustments.

Core Metrics

Quality %

The primary quality metric: Quality Cost as a percentage of Project Value.

Quality % = (Total Quality Cost for Project) / Project Value x 100

Target: <2%

Calculation Options:

Mean (weighted): Total Quality Cost / Total Project Value across all projects
Median: Median of per-project Quality % values

When to use which:

Mean: Overall portfolio view
Median: More robust to outliers (single bad project doesn’t skew)

Quality Cost

Total estimated cost to address a quality issue.

Quality Cost = Labor Cost + Material Cost

Labor Cost = SUM (Hours x Rate x Calibration x Stage Multiplier)

Components:

Component	Calculation
Hours	AI-estimated by department
Rate	$150/hr eng, $125/hr shop
Calibration	Eng 1.0x, Shop 0.87x
Stage Multiplier	Based on detection timing

Stage Multipliers

Later detection = higher cost to fix.

Detection Stage	Multiplier	Rationale
Early (>90 days)	1.0x	Baseline - found during design
Mid-Build (30-90 days)	1.25x	Some rework, schedule pressure
Late Build (<30 days)	1.5x	Significant rework, overtime
Near/At Ship	2.0x	Emergency fixes, expediting
Post-Ship (Field)	2.0x	Travel costs, customer impact

Detection Timing

Based on days between redbook entry date and planned ship date.

Days_Before_Ship = Ship_Date - Ent_Date

if Days_Before_Ship > 90:       "Early (>90 days before ship)"
elif Days_Before_Ship > 30:     "Mid-Build (30-90 days)"
elif Days_Before_Ship > 0:      "Late Build (<30 days)"
elif Days_Before_Ship == 0:     "Near/At Ship"
else:                           "Post-Ship (Field)"

Caveat: Uses planned ship date. Projects shipping late may show issues as “Post-Ship” even if caught in-house.

AI Classification

How It Works

Azure OpenAI (GPT model) analyzes each redbook’s Problem, Cause, and Solution text to estimate:

Output	Options
Category	10 categories (Design, Drawing, Assembly, etc.)
Severity	Critical, Major, Moderate, Minor
Root Cause	25 categories
Hours by Dept	7 department buckets
Material Impact	Low/Medium/High + flags

Single-Call Classification

All fields estimated in one API call for consistency. The AI sees the full issue context and makes correlated predictions.

Calibration

AI estimates are calibrated based on human validation:

Engineering hours: Generally accurate (1.0x)
Shop hours: Tend to be overestimated (0.87x)
Long tail boost: Issues >4 base hours get 1.5x multiplier

Material Cost Estimation

Levels

Level	Estimated Cost
None	$0
Low	$50-200
Medium	$200-1000
High	$1000+

Flags

Additional indicators:

Involves_Rework: Requires redoing existing work
Involves_New_Parts: Need to order new parts
Involves_Scrap: Material will be scrapped

Design Reuse Metrics

Deployment Number

For each PHYS. level part, tracks which deployment this is.

Deployment_Number = count of times this part shipped before + 1

Is_First_Time = (Deployment_Number == 1)

Reuse Rate

Project Reuse Rate = (Reused Parts) / (Total Parts) x 100

where Reused Parts = parts with Deployment_Number > 1

PSI shows ~82% reuse rate at PHYS. level.

First-Time Quality Analysis

Comparing first-time vs reused deployments:

Issue rate: Issues per deployment
Average cost: Cost per issue when they occur

Finding: First-time parts have 1.09x higher average cost per issue.

Quality Risk Score

A 0-100 score for individual products based on quality history.

Components

Factor	Weight	Description
Issue Rate	30%	Issues / Deployments
Project Spread	25%	Projects with issues / Total projects
Cost Impact	25%	Total quality cost
First-Time Rate	20%	First-time issue rate

Score Interpretation

Score	Risk Level
80-100	Critical
60-79	High
40-59	Medium
0-39	Low

Recurring Escape Detection

Definition

A recurring escape is the same Product x Root Cause combination appearing across multiple projects.

Is_Recurring_Escape = (
    Product_Number + AI_RootCause combination
    appears in 2+ different projects
)

Preventable Cost

Preventable Cost = Cost of 2nd+ occurrences

Logic: First occurrence discovered the issue
       Subsequent occurrences could have been prevented

SLA Compliance

Targets

Severity	Resolution Target
Critical	⇐3 days
Major	⇐14 days
Moderate	⇐21 days
Minor	⇐30 days

Calculation

SLA Met = (Resolution_Days <= Target_Days)

SLA Compliance % = (Issues Meeting SLA) / (Total Issues) x 100

Systemic Issue Detection

Definition

A systemic issue is a root cause appearing across >50% of projects.

Project Spread % = (Projects with Root Cause) / (Total Projects) x 100

Is_Systemic = (Project Spread % > 50%)

Systemic issues indicate organizational patterns, not project-specific problems. They warrant process-level interventions.

Data Quality Adjustments

Employee ID Normalization

# XML has leading zeros, userlist doesn't
normalized_id = id.lstrip('0')
 
# Handle "Initals" typo in userlist
initials = row.get('Initals') or row.get('Initials')

Project Number Normalization

# Strip leading zeros
project = project.lstrip('0')
 
# Handle stock items
if project.lower() in ['0stock', '000stk']:
    project = 'STOCK'
 
# Remove suffixes
project = project.split('-')[0]

Drawing to Product Linkage

Drawing -> BOM Hierarchy -> PHYS. Ancestor -> Product Classification

Coverage: ~55% (many drawings are schematics, not PHYS parts)

Lead Time Causal Analysis Methodology

Added: February 2026 — C:\git\schedule\build_comprehensive_dataset.py

Approach

The comprehensive causal analysis tests 10 hypotheses about why PSI’s lead time nearly doubled (180d → 330d) and OTD collapsed (86% → 37%) starting 2022.

Causation standard (not just correlation):

Temporal precedence: Did the cause precede the effect?
Dose-response: More cause = more effect?
Mechanism: Is there a plausible mechanism?
Alternatives ruled out: Can other explanations account for the finding?

Era Definitions

Era	Years	Characteristics
Baseline	2017-2021	Normal operations, OTD ~85%
Overload	2022-2023	Active projects surged to 74, OTD collapsed to 37%
Recovery	2024-2025	Load reducing, OTD recovering to ~52%

Key Metrics (from comprehensive_dataset.csv)

Metric	Source	What It Measures
`ActiveProjectsAtShip`	project_customers.csv	Concurrent active projects when this one shipped
`TS_Avg_Concurrent`	tslabor2.csv	Avg concurrent projects per employee-month on this job
`TS_Avg_Weekly_Switches`	tslabor2.csv	Avg unique jobs per employee per week on this job
`DM_110E_Span`	fabric_dept_milestones	Controls Engineering duration (first to last labor day)
`DM_CE_MVI_Overlap`	fabric_dept_milestones	Days CE (110E) labor overlaps MVI (120) start
`RB_RFC_Count`	redbook_coq.db	Redbook RFC count per project
`ECN_Count`	ecn.csv	Engineering change notice count per project
`JP_Part_Count`	jobparts.csv	BOM part count per project
`PO_Days_From_Order`	purchaseorders.csv	Days from project order to first PO placement
`TS_Op2162_Hrs`	tslabor2.csv	Floor Engineer hours (rework indicator)
`WO_Avg_Cycle_Days`	openwo.csv	Average work order cycle time
`TS_Emp_Hrs_CV`	tslabor2.csv	Coefficient of variation in employee hours (time variance)

Correlation Ranking (Feb 2026 results)

Top correlators with DaysLate (2017+ external projects, n=660):

r	Metric	Interpretation
+0.203	DM_110E_Span	Longer CE duration → more late
+0.191	WO_Avg_Cycle_Days	Longer WO cycles → more late
+0.166	ActiveProjectsAtShip	More concurrent projects → more late
+0.138	Tot_Actual	More total hours → more late
+0.118	TS_Avg_Concurrent	More context switching → more late
+0.093	TS_Emp_Hrs_CV	More employee variance → more late
+0.090	RB_Total_Cost	Higher quality cost → more late

Hypothesis Verdicts

Hypothesis	Verdict	Evidence Strength
H1: Capacity Overload	STRONG	Dose-response confirmed: >60 active projects = 42% OTD vs <30 = 73%
H2: Machine Complexity	Contributing	Part count up ~10%, hours up 17-41%, but can’t explain full 54% LT increase
H3: Supply Chain	Contributing	Vendor delays avg 3-5 days, project delays avg 34 days
H4: Context Switching	Pending	Concurrent projects jumped 10.8 → 14.2 in overload era
H5: Quality/Rework	Pending	RFC count stable (~38-42) but ECN count rose 11 → 15
H7: Controls Late	Pending	CE finishes before MVI only 33-43% of the time
H8: Time Variance	Pending	Employee CV slightly higher in overload (1.44 → 1.53)
H11: Planning Accuracy	Pending	Budget accuracy slightly worse in overload (91% → 94%)

Dashboards - Dashboard UI and tab descriptions
PSI Data Brain - Master data source map (start here for any data question)
Data Dictionary - Field definitions
Data Brain - Data sources
Terminology - PSI vocabulary

Last updated: February 27, 2026

PSI Knowledge Base

Explorer

methodology

Analytics Methodology

Core Metrics

Quality %

Quality Cost

Stage Multipliers

Detection Timing

AI Classification

How It Works

Single-Call Classification

Calibration

Material Cost Estimation

Levels

Flags

Design Reuse Metrics

Deployment Number

Reuse Rate

First-Time Quality Analysis

Quality Risk Score

Components

Score Interpretation

Recurring Escape Detection

Definition

Preventable Cost

SLA Compliance

Targets

Calculation

Systemic Issue Detection

Definition

Data Quality Adjustments

Employee ID Normalization

Project Number Normalization

Drawing to Product Linkage

Lead Time Causal Analysis Methodology

Approach

Era Definitions

Key Metrics (from comprehensive_dataset.csv)

Correlation Ranking (Feb 2026 results)

Hypothesis Verdicts

Related Pages

Graph View

Table of Contents

Backlinks