Redbook Analysis Project

The Redbook Analysis System is a Python data analytics pipeline that processes quality issue tickets (“Redbooks”) to analyze defect patterns, cost impacts, and detection timing.

Project Overview

Purpose

Leadership wants to understand:

Where quality costs come from (root causes)
Which projects have the most issues
Whether issues are being caught early or late
What’s systemic vs. project-specific
What could be prevented with better processes

Key Outputs

Output	Description
SQLite Database	Enriched redbook data with AI classification
Streamlit Dashboard	Interactive analysis tool
Analysis CSVs	Export files for specific analyses

Live Dashboard

URL: https://quality.progressivesurface.com

Authentication: Microsoft Entra ID (PSI credentials required)

The legacy URL https://ps-redbook-dashboard.azurewebsites.net redirects automatically to the canonical URL above.

System Architecture

┌─────────────────────────────────────────────────────────────────┐
│                       DATA SOURCES                               │
├─────────────────────────────────────────────────────────────────┤
│  redbook_export.csv    │  Project1287List.xml  │  BOM_Exports/  │
│  (49,758 records)      │  (3,084 projects)     │  (1,704 files) │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                      PIPELINE PHASES                             │
├─────────────────────────────────────────────────────────────────┤
│  Phase 1: Data Prep                                              │
│    • Parse dates, normalize project numbers                      │
│    • Map employee IDs, extract ECN/NCN                          │
│    • Calculate detection timing                                  │
│                                                                  │
│  Phase 2: AI Classification (Azure OpenAI)                       │
│    • Category, Severity, Root Cause                             │
│    • Labor hour estimates by department                         │
│    • Material cost estimation                                    │
│                                                                  │
│  Phase 3: Product Enrichment                                     │
│    • Drawing → BOM → PHYS. linkage                              │
│    • Product classification                                      │
│    • Deployment tracking                                         │
│                                                                  │
│  Phase 4: Analytics                                              │
│    • Preventability analysis                                     │
│    • Recurring escape detection                                  │
│    • Quality risk scoring                                        │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                         OUTPUT                                   │
├─────────────────────────────────────────────────────────────────┤
│  Processed/redbook_coq.db → Streamlit Dashboard (Azure)         │
└─────────────────────────────────────────────────────────────────┘

Key Scripts

Script	Purpose
`redbook_unified_pipeline.py`	Main pipeline (all phases)
`redbook_coq_enrich.py`	Add metadata without AI
`build_product_deployment_history.py`	Build deployment history
`product_quality_analysis.py`	Product-level analysis
`recurring_escape_analysis.py`	Find recurring escapes
`preventability_analysis.py`	Deep dive on top issues
`app.py`	Streamlit dashboard (v7.3, 10 tabs)
`build_machine_dna.py`	Machine DNA feature extraction (82 features)
`train_machine_dna_model.py`	Multi-tier predictive quality model (time-aware CV)
`build_machine_type_profiles.py`	Per-type physical/quality profiles
`subsystem_quality_attribution.py`	Subsystem x root cause quality matrix
`generate_risk_recommendations.py`	Per-project risk scoring

Running the Pipeline

Full Pipeline (All Years)

# Full run (~$430-500 AI cost)
python Scripts/redbook_unified_pipeline.py
 
# Phased by year range (recommended)
python Scripts/redbook_unified_pipeline.py --years=2020-2025  # ~$150-180
python Scripts/redbook_unified_pipeline.py --years=2010-2019  # ~$180-200
python Scripts/redbook_unified_pipeline.py --years=1999-2009  # ~$100-120
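
The three phased estimates should add up to the full-run estimate. A quick sanity check in Python, using only the dollar figures quoted in the comments above (the dictionary and function names are illustrative, not part of the pipeline):

```python
# Phased AI cost estimates in USD as (low, high), copied from the commands above.
PHASED_ESTIMATES = {
    "2020-2025": (150, 180),
    "2010-2019": (180, 200),
    "1999-2009": (100, 120),
}


def total_range(estimates):
    """Sum the low ends and the high ends of each (low, high) estimate."""
    low = sum(lo for lo, _ in estimates.values())
    high = sum(hi for _, hi in estimates.values())
    return low, high


low, high = total_range(PHASED_ESTIMATES)
print(f"Phased total: ${low}-{high}")  # Phased total: $430-500
```

The phased total matches the ~$430-500 quoted for the full run, so running by year range spreads the spend rather than reducing it.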

Individual Phases

# Data preparation only (free)
python Scripts/redbook_unified_pipeline.py --phase=prep
 
# AI classification with resume
python Scripts/redbook_unified_pipeline.py --phase=ai --resume
 
# Product linkage only
python Scripts/redbook_unified_pipeline.py --phase=products
 
# Validate output
python Scripts/redbook_unified_pipeline.py --validate

Enrichment (No AI Cost)

python Scripts/redbook_coq_enrich.py

Local Dashboard

streamlit run Scripts/survey_app_v6.py

Dashboard Tabs

Tab	Purpose
Overview	Data exploration starting point, summary metrics
By Project	Project rankings by Quality %, scatter plots
Root Causes	Pareto analysis, systemic issues
Trends	Time series, SLA compliance
Deep Dive	Preventability analysis (94% finding)
Products	Product-level repeat offenders, risk scores
Explorer	Full data access, filters, search
Lead Time	Lead time analysis with comprehensive dataset
Machine DNA	First-principles quality profiling (see below)
Project Explorer	Per-project quality deep dive with DNA radar, RFC timeline, workforce profile, comp machine comparison

Admin Features

Password: Contact IT for credentials.

Engineer Analysis tab
Calibration controls
Feedback analysis

Key Findings

Quality Cost

Quality % target: <2% of project value
Average: Varies by industry and project type
Top cost driver: Design Function root cause
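
As a rough illustration of the Quality % metric, the sketch below assumes it is quality cost divided by project value, expressed as a percentage. The function names and the exact definition are assumptions; the Methodology page is the authority on how the dashboard computes it.

```python
QUALITY_TARGET_PCT = 2.0  # target stated on this page: under 2% of project value


def quality_pct(quality_cost, project_value):
    """Quality cost as a percentage of project value (assumed definition)."""
    if project_value <= 0:
        raise ValueError("project_value must be positive")
    return 100.0 * quality_cost / project_value


def meets_target(quality_cost, project_value, target=QUALITY_TARGET_PCT):
    """True when the project's Quality % is strictly below the target."""
    return quality_pct(quality_cost, project_value) < target


# Hypothetical project: $15,000 of quality cost on a $1,000,000 machine.
print(quality_pct(15_000, 1_000_000))   # 1.5
print(meets_target(15_000, 1_000_000))  # True
```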

Detection Timing

Stage (days before ship)	% of Issues	Cost Multiplier
Early (>90 days)	~25%	1.0x
Mid-Build (30-90)	~30%	1.25x
Late Build (<30)	~20%	1.5x
Near/At Ship	~10%	2.0x
Post-Ship	~15%	2.0x
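
The table can be read as a lookup from days before ship to a stage and cost multiplier. The sketch below is one possible encoding, not the pipeline's actual logic. It assumes that negative values mean the issue was found after ship, and it uses a hypothetical 7-day window for "Near/At Ship" because this page does not define that boundary.

```python
NEAR_SHIP_DAYS = 7  # assumption: the "Near/At Ship" window is not defined on this page


def detection_stage(days_before_ship):
    """Return (stage, cost_multiplier) for an issue found N days before ship.

    Assumes a negative value means the issue was found after the ship date.
    """
    if days_before_ship < 0:
        return "Post-Ship", 2.0
    if days_before_ship <= NEAR_SHIP_DAYS:
        return "Near/At Ship", 2.0
    if days_before_ship < 30:
        return "Late Build", 1.5
    if days_before_ship <= 90:
        return "Mid-Build", 1.25
    return "Early", 1.0


for days in (120, 45, 20, 3, -10):
    print(days, detection_stage(days))
```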

Preventability

From Deep Dive analysis of top 100 Design Function issues:

94% were preventable with better processes
Top prevention methods: Design review, supplier management, communication

Design Reuse

Metric	Value
PHYS. level reuse rate	~82%
First-time cost multiplier	1.09x (average cost per issue)
Recurring escapes identified	127

Machine DNA: First-Principles Quality Profiling

Machine DNA extracts physical and design characteristics from each machine’s BOM, control components, and configuration history to predict quality risk before the build starts.

Six Dimensions (82 Features)

Dimension	Features	Source
Structural Complexity	BOM size, depth, make/buy ratio	project_boms (4.3M items)
Subsystem Composition	Product class diversity, entropy	parts_master join
Design Novelty	First-time part %, known-problem parts	reuse metrics
Automation & Control	Robot/servo/VFD counts, PLC complexity	control_components
Configuration Lineage	Prior builds, comp machine quality	project_complexity
Workforce	Team tenure, HHI, corrective hours	labor_detail
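
Two of the features named in the table, entropy and HHI, are standard concentration measures. The sketch below shows the textbook formulas only. It assumes that entropy is taken over the product classes on a BOM and that HHI (Herfindahl-Hirschman Index) is taken over each person's share of labor hours; the actual inputs used by `build_machine_dna.py` may differ.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy in bits; higher means a more diverse mix of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def hhi(hours_by_person):
    """Herfindahl-Hirschman Index on a 0-1 scale; 1.0 means one person did all the work."""
    total = sum(hours_by_person.values())
    if total == 0:
        return 0.0
    return sum((h / total) ** 2 for h in hours_by_person.values())


# Hypothetical inputs for illustration only.
print(shannon_entropy(["robot", "robot", "booth", "booth"]))  # 1.0
print(hhi({"alice": 50, "bob": 50}))                          # 0.5
```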

Predictive Model (March 2026)

Tier	When Available	AUC	Key Insight
Order Time	At order	0.507	Config identity alone is barely better than chance (0.5)
At Staffing	2-3 weeks	0.631	Workforce features add signal
BOM Release	2-4 weeks	0.639	Novelty and BOM features add strong signal
Early Build	4-8 weeks	0.630	Matches normalized model using only leading indicators
Field Escape	Post-ship	0.614	Predicts post-ship quality escapes

All AUCs above come from time-aware expanding-window cross-validation. Earlier shuffled baselines (0.749–0.806) were inflated by temporal leakage.
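
For readers unfamiliar with the technique, here is a minimal sketch of expanding-window splitting. It assumes the samples are already sorted by time. The function name and parameters are illustrative and are not taken from `train_machine_dna_model.py`; scikit-learn's `TimeSeriesSplit` offers a comparable splitter.

```python
def expanding_window_splits(n_samples, n_folds, min_train):
    """Yield (train_idx, test_idx) pairs over time-sorted samples.

    Each fold trains on everything before its test window, so the model
    never sees data from the future of the rows it is scored on.
    """
    if min_train < 1 or min_train >= n_samples:
        raise ValueError("min_train must be between 1 and n_samples - 1")
    test_size = (n_samples - min_train) // n_folds
    if test_size < 1:
        raise ValueError("not enough samples for the requested number of folds")
    for fold in range(n_folds):
        train_end = min_train + fold * test_size
        # The last fold absorbs any remainder so that no sample is dropped.
        test_end = n_samples if fold == n_folds - 1 else train_end + test_size
        yield list(range(train_end)), list(range(train_end, test_end))


for train_idx, test_idx in expanding_window_splits(10, n_folds=3, min_train=4):
    print(len(train_idx), test_idx)
```

A shuffled split, by contrast, can place later builds in the training set and earlier builds in the test set, which is the leakage described above.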

Per-Machine-Type Signatures

Std Robot: VFD/control complexity (r=0.72) — more motion control = more issues
Swing Door: Control components (r=0.71) — automation integration risk
Acoustical Booth: Value tier + novelty (r=0.58) — scope and new designs
Special Machine: BOM size (r=0.57) — raw complexity drives risk

Risk Recommendations

The system generates per-project recommendations based on DNA features. Example triggers: >40% first-time parts, >3 known-problem parts, first-time configuration, high subsystem diversity.
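
The example triggers can be sketched as simple threshold rules. The class, field names, recommendation wording, and the entropy cutoff for "high subsystem diversity" are all hypothetical; only the 40% and 3-part thresholds come from this page, and `generate_risk_recommendations.py` may work differently.

```python
from dataclasses import dataclass

# Assumption: this page gives no numeric cutoff for "high subsystem diversity".
HIGH_DIVERSITY_ENTROPY = 3.0


@dataclass
class MachineDNA:
    """Hypothetical container for the DNA features the triggers look at."""
    first_time_part_pct: float   # percent of BOM parts never shipped before
    known_problem_parts: int     # count of parts with a history of issues
    is_first_time_config: bool   # no prior build of this configuration
    subsystem_entropy: float     # diversity of product classes, in bits


def risk_recommendations(dna):
    """Return one illustrative recommendation per trigger that fires."""
    recs = []
    if dna.first_time_part_pct > 40:
        recs.append("Over 40% first-time parts: schedule extra design review")
    if dna.known_problem_parts > 3:
        recs.append("More than 3 known-problem parts: review prior Redbooks")
    if dna.is_first_time_config:
        recs.append("First-time configuration: plan for added integration time")
    if dna.subsystem_entropy > HIGH_DIVERSITY_ENTROPY:
        recs.append("High subsystem diversity: confirm cross-team ownership")
    return recs


print(risk_recommendations(MachineDNA(55.0, 4, True, 3.4)))
```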

See Documentation/MACHINE_DNA_METHODOLOGY.md in the repo for full details.

Design Reuse Analysis

What is Design Reuse?

PSI tracks whether PHYS. level parts are being deployed for the first time or have been used in previous projects.

Key Metrics

Metric	Definition
Deployment Number	Nth time this product has shipped
Is_First_Time	True if Deployment_Number = 1
Reuse Rate	% of parts not first-time

Findings

First-time parts have higher issue rates
- 1.09x higher average cost per issue
- More common root causes: Design Function, Drawing Error
Learning curve effect
- Issues decrease as products mature
- 0.033 → 0.005 issues per product by 5th deployment
Recurring escapes
- 127 product × root cause combinations across multiple projects
- $93,688 in preventable cost
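
A recurring escape, as described above, is a product × root cause combination that shows up in more than one project. The grouping could look roughly like this; the `Project` and `Cost` keys are assumed names, and `recurring_escape_analysis.py` may apply additional filters.

```python
from collections import defaultdict


def recurring_escapes(issues, min_projects=2):
    """Group issues by (product, root cause); keep combos seen in several projects."""
    projects = defaultdict(set)
    cost = defaultdict(float)
    for issue in issues:
        key = (issue["Product_Number"], issue["AI_RootCause"])
        projects[key].add(issue["Project"])
        cost[key] += issue["Cost"]
    return {
        key: {"projects": len(seen), "cost": cost[key]}
        for key, seen in projects.items()
        if len(seen) >= min_projects
    }


# Hypothetical issues for illustration only.
issues = [
    {"Product_Number": "P-100", "AI_RootCause": "Drawing Error", "Project": "A", "Cost": 400.0},
    {"Product_Number": "P-100", "AI_RootCause": "Drawing Error", "Project": "B", "Cost": 600.0},
    {"Product_Number": "P-200", "AI_RootCause": "Design Function", "Project": "B", "Cost": 900.0},
]
print(recurring_escapes(issues))
```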

Deep Dive Analysis

Methodology

Filter to Design Function root cause (top cost driver)
Sort by cost, take top 100
AI analyzes each for preventability
Human validation via feedback system
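
The first two steps amount to a filter and a sort. A minimal sketch, assuming each issue is a dict with the `AI_RootCause` field and a hypothetical `Cost` key (the AI preventability step and the feedback system are not shown):

```python
def top_design_function_issues(issues, n=100):
    """Keep Design Function issues and return the n most expensive."""
    design = [i for i in issues if i["AI_RootCause"] == "Design Function"]
    return sorted(design, key=lambda i: i["Cost"], reverse=True)[:n]


# Hypothetical issues for illustration only.
sample = [
    {"id": 1, "AI_RootCause": "Design Function", "Cost": 1200.0},
    {"id": 2, "AI_RootCause": "Drawing Error", "Cost": 5000.0},
    {"id": 3, "AI_RootCause": "Design Function", "Cost": 3400.0},
]
print([i["id"] for i in top_design_function_issues(sample, n=2)])  # [3, 1]
```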

Key Finding

94% of top Design Function issues were preventable

Prevention Categories

Method	% of Preventable
Design Review	~40%
Better Requirements	~25%
Supplier Management	~20%
Process Control	~15%

Data Model

Main Tables

Table	Purpose
`redbooks`	Main issues table (100+ columns)
`product_deployment_history`	Deployment tracking
`project_reuse_metrics`	Per-project reuse rates
`part_quality_metrics`	Product risk scores

See Data Dictionary for field details.

Key Fields

AI Classification: AI_Category, AI_Severity, AI_RootCause, Hours_, Cost__
Product Linkage: Product_Number, Product_Class_Name, Deployment_Number
Timing: Ship_Timing_Category, Days_Before_Ship, Resolution_Days
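
To show how these fields might be queried, the sketch below builds a tiny in-memory stand-in for the `redbooks` table and counts issues by timing category. The column types, the sample rows, and the category values are assumptions; against the real database you would connect to `Processed/redbook_coq.db` instead of `:memory:`.

```python
import sqlite3


def issues_by_timing(conn):
    """Count issues and average resolution time per Ship_Timing_Category."""
    return conn.execute(
        "SELECT Ship_Timing_Category, COUNT(*) AS n, AVG(Resolution_Days) "
        "FROM redbooks "
        "GROUP BY Ship_Timing_Category "
        "ORDER BY n DESC, Ship_Timing_Category"
    ).fetchall()


# In-memory stand-in with only the columns used here (types are assumed).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE redbooks ("
    "AI_RootCause TEXT, Ship_Timing_Category TEXT, "
    "Days_Before_Ship INTEGER, Resolution_Days INTEGER)"
)
conn.executemany(
    "INSERT INTO redbooks VALUES (?, ?, ?, ?)",
    [
        ("Design Function", "Early", 120, 4),
        ("Drawing Error", "Early", 95, 8),
        ("Design Function", "Post-Ship", -10, 20),
    ],
)
rows = issues_by_timing(conn)
print(rows)
```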

Deployment

Azure App Service

# Trigger deploy workflow (GitHub Enterprise)
GH_HOST=progressivesurface.ghe.com gh workflow run deploy.yml \
  -R ProgressiveSurface/redbook-analysis --ref main

Deployment Standard

App Service deployments are executed by GitHub Actions (identity-based deployment).
Publish-profile / local-git deployment patterns are deprecated for production.

Configuration

Python 3.8
Streamlit
Microsoft Entra ID authentication

See main project DEPLOYMENT.md for full details.

Known Limitations

Data Quality

Closer data unavailable: Raw export limitation
Planned vs actual ship dates: Affects detection timing
~55% product linkage: Many drawings aren’t PHYS parts

AI Classification

Estimates are directional: Not precise figures
Calibration varies: Shop hours tend to be overestimated
Single-call classification: Category, severity, root cause, hours in one API call

Repository

Location: C:\git\redbook-dashboard

Key Directories

Directory	Contents
`Scripts/`	Python pipeline and dashboard
`RawData/`	Source data files
`Processed/`	Output database and CSVs
`Documentation/`	Additional docs

Data Brain - Data sources
Data Dictionary - Field definitions
Quality Process - Redbook system
Methodology - Calculation methods

Last updated: March 2026
