Analytics Methodology
How PSI quality metrics are calculated - formulas, AI classification, and data quality adjustments.
Core Metrics
Quality %
The primary quality metric: Quality Cost as a percentage of Project Value.
Quality % = (Total Quality Cost for Project) / Project Value x 100
Target: <2%
Calculation Options:
- Mean (weighted): Total Quality Cost / Total Project Value across all projects
- Median: Median of per-project Quality % values
When to use which:
- Mean: Overall portfolio view
- Median: More robust to outliers (single bad project doesn’t skew)
Quality Cost
Total estimated cost to address a quality issue.
Quality Cost = Labor Cost + Material Cost
Labor Cost = SUM (Hours x Rate x Calibration x Stage Multiplier)
Components:
| Component | Calculation |
|---|---|
| Hours | AI-estimated by department |
| Rate | $150/hr eng, $125/hr shop |
| Calibration | Eng 1.0x, Shop 0.87x |
| Stage Multiplier | Based on detection timing |
Stage Multipliers
Later detection = higher cost to fix.
| Detection Stage | Multiplier | Rationale |
|---|---|---|
| Early (>90 days) | 1.0x | Baseline - found during design |
| Mid-Build (30-90 days) | 1.25x | Some rework, schedule pressure |
| Late Build (<30 days) | 1.5x | Significant rework, overtime |
| Near/At Ship | 2.0x | Emergency fixes, expediting |
| Post-Ship (Field) | 2.0x | Travel costs, customer impact |
Detection Timing
Based on days between redbook entry date and planned ship date.
Days_Before_Ship = Ship_Date - Ent_Date
if Days_Before_Ship > 90: "Early (>90 days before ship)"
elif Days_Before_Ship > 30: "Mid-Build (30-90 days)"
elif Days_Before_Ship > 0: "Late Build (<30 days)"
elif Days_Before_Ship == 0: "Near/At Ship"
else: "Post-Ship (Field)"
Caveat: Uses planned ship date. Projects shipping late may show issues as “Post-Ship” even if caught in-house.
AI Classification
How It Works
Azure OpenAI (GPT model) analyzes each redbook’s Problem, Cause, and Solution text to estimate:
| Output | Options |
|---|---|
| Category | 10 categories (Design, Drawing, Assembly, etc.) |
| Severity | Critical, Major, Moderate, Minor |
| Root Cause | 25 categories |
| Hours by Dept | 7 department buckets |
| Material Impact | Low/Medium/High + flags |
Single-Call Classification
All fields estimated in one API call for consistency. The AI sees the full issue context and makes correlated predictions.
Calibration
AI estimates are calibrated based on human validation:
- Engineering hours: Generally accurate (1.0x)
- Shop hours: Tend to be overestimated (0.87x)
- Long tail boost: Issues >4 base hours get 1.5x multiplier
Material Cost Estimation
Levels
| Level | Estimated Cost |
|---|---|
| None | $0 |
| Low | $50-200 |
| Medium | $200-1000 |
| High | $1000+ |
Flags
Additional indicators:
Involves_Rework: Requires redoing existing workInvolves_New_Parts: Need to order new partsInvolves_Scrap: Material will be scrapped
Design Reuse Metrics
Deployment Number
For each PHYS. level part, tracks which deployment this is.
Deployment_Number = count of times this part shipped before + 1
Is_First_Time = (Deployment_Number == 1)
Reuse Rate
Project Reuse Rate = (Reused Parts) / (Total Parts) x 100
where Reused Parts = parts with Deployment_Number > 1
PSI shows ~82% reuse rate at PHYS. level.
First-Time Quality Analysis
Comparing first-time vs reused deployments:
- Issue rate: Issues per deployment
- Average cost: Cost per issue when they occur
Finding: First-time parts have 1.09x higher average cost per issue.
Quality Risk Score
A 0-100 score for individual products based on quality history.
Components
| Factor | Weight | Description |
|---|---|---|
| Issue Rate | 30% | Issues / Deployments |
| Project Spread | 25% | Projects with issues / Total projects |
| Cost Impact | 25% | Total quality cost |
| First-Time Rate | 20% | First-time issue rate |
Score Interpretation
| Score | Risk Level |
|---|---|
| 80-100 | Critical |
| 60-79 | High |
| 40-59 | Medium |
| 0-39 | Low |
Recurring Escape Detection
Definition
A recurring escape is the same Product x Root Cause combination appearing across multiple projects.
Is_Recurring_Escape = (
Product_Number + AI_RootCause combination
appears in 2+ different projects
)
Preventable Cost
Preventable Cost = Cost of 2nd+ occurrences
Logic: First occurrence discovered the issue
Subsequent occurrences could have been prevented
SLA Compliance
Targets
| Severity | Resolution Target |
|---|---|
| Critical | ⇐3 days |
| Major | ⇐14 days |
| Moderate | ⇐21 days |
| Minor | ⇐30 days |
Calculation
SLA Met = (Resolution_Days <= Target_Days)
SLA Compliance % = (Issues Meeting SLA) / (Total Issues) x 100
Systemic Issue Detection
Definition
A systemic issue is a root cause appearing across >50% of projects.
Project Spread % = (Projects with Root Cause) / (Total Projects) x 100
Is_Systemic = (Project Spread % > 50%)
Systemic issues indicate organizational patterns, not project-specific problems. They warrant process-level interventions.
Data Quality Adjustments
Employee ID Normalization
# XML has leading zeros, userlist doesn't
normalized_id = id.lstrip('0')
# Handle "Initals" typo in userlist
initials = row.get('Initals') or row.get('Initials')Project Number Normalization
# Strip leading zeros
project = project.lstrip('0')
# Handle stock items
if project.lower() in ['0stock', '000stk']:
project = 'STOCK'
# Remove suffixes
project = project.split('-')[0]Drawing to Product Linkage
Drawing -> BOM Hierarchy -> PHYS. Ancestor -> Product Classification
Coverage: ~55% (many drawings are schematics, not PHYS parts)
Lead Time Causal Analysis Methodology
Added: February 2026 —
C:\git\schedule\build_comprehensive_dataset.py
Approach
The comprehensive causal analysis tests 10 hypotheses about why PSI’s lead time nearly doubled (180d → 330d) and OTD collapsed (86% → 37%) starting 2022.
Causation standard (not just correlation):
- Temporal precedence: Did the cause precede the effect?
- Dose-response: More cause = more effect?
- Mechanism: Is there a plausible mechanism?
- Alternatives ruled out: Can other explanations account for the finding?
Era Definitions
| Era | Years | Characteristics |
|---|---|---|
| Baseline | 2017-2021 | Normal operations, OTD ~85% |
| Overload | 2022-2023 | Active projects surged to 74, OTD collapsed to 37% |
| Recovery | 2024-2025 | Load reducing, OTD recovering to ~52% |
Key Metrics (from comprehensive_dataset.csv)
| Metric | Source | What It Measures |
|---|---|---|
ActiveProjectsAtShip | project_customers.csv | Concurrent active projects when this one shipped |
TS_Avg_Concurrent | tslabor2.csv | Avg concurrent projects per employee-month on this job |
TS_Avg_Weekly_Switches | tslabor2.csv | Avg unique jobs per employee per week on this job |
DM_110E_Span | fabric_dept_milestones | Controls Engineering duration (first to last labor day) |
DM_CE_MVI_Overlap | fabric_dept_milestones | Days CE (110E) labor overlaps MVI (120) start |
RB_RFC_Count | redbook_coq.db | Redbook RFC count per project |
ECN_Count | ecn.csv | Engineering change notice count per project |
JP_Part_Count | jobparts.csv | BOM part count per project |
PO_Days_From_Order | purchaseorders.csv | Days from project order to first PO placement |
TS_Op2162_Hrs | tslabor2.csv | Floor Engineer hours (rework indicator) |
WO_Avg_Cycle_Days | openwo.csv | Average work order cycle time |
TS_Emp_Hrs_CV | tslabor2.csv | Coefficient of variation in employee hours (time variance) |
Correlation Ranking (Feb 2026 results)
Top correlators with DaysLate (2017+ external projects, n=660):
| r | Metric | Interpretation |
|---|---|---|
| +0.203 | DM_110E_Span | Longer CE duration → more late |
| +0.191 | WO_Avg_Cycle_Days | Longer WO cycles → more late |
| +0.166 | ActiveProjectsAtShip | More concurrent projects → more late |
| +0.138 | Tot_Actual | More total hours → more late |
| +0.118 | TS_Avg_Concurrent | More context switching → more late |
| +0.093 | TS_Emp_Hrs_CV | More employee variance → more late |
| +0.090 | RB_Total_Cost | Higher quality cost → more late |
Hypothesis Verdicts
| Hypothesis | Verdict | Evidence Strength |
|---|---|---|
| H1: Capacity Overload | STRONG | Dose-response confirmed: >60 active projects = 42% OTD vs <30 = 73% |
| H2: Machine Complexity | Contributing | Part count up ~10%, hours up 17-41%, but can’t explain full 54% LT increase |
| H3: Supply Chain | Contributing | Vendor delays avg 3-5 days, project delays avg 34 days |
| H4: Context Switching | Pending | Concurrent projects jumped 10.8 → 14.2 in overload era |
| H5: Quality/Rework | Pending | RFC count stable (~38-42) but ECN count rose 11 → 15 |
| H7: Controls Late | Pending | CE finishes before MVI only 33-43% of the time |
| H8: Time Variance | Pending | Employee CV slightly higher in overload (1.44 → 1.53) |
| H11: Planning Accuracy | Pending | Budget accuracy slightly worse in overload (91% → 94%) |
Related Pages
- Dashboards - Dashboard UI and tab descriptions
- PSI Data Brain - Master data source map (start here for any data question)
- Data Dictionary - Field definitions
- Data Brain - Data sources
- Terminology - PSI vocabulary
Last updated: February 27, 2026