Redbook Analysis Project
The Redbook Analysis System is a Python data analytics pipeline that processes quality issue tickets (“Redbooks”) to analyze defect patterns, cost impacts, and detection timing.
Project Overview
Purpose
Leadership wants to understand:
- Where quality costs come from (root causes)
- Which projects have the most issues
- Whether issues are being caught early or late
- What’s systemic vs. project-specific
- What could be prevented with better processes
Key Outputs
| Output | Description |
|---|---|
| SQLite Database | Enriched redbook data with AI classification |
| Streamlit Dashboard | Interactive analysis tool |
| Analysis CSVs | Export files for specific analyses |
Live Dashboard
URL: https://quality.progressivesurface.com
Authentication: Microsoft Entra ID (PSI credentials required)
The legacy URL https://ps-redbook-dashboard.azurewebsites.net redirects automatically to the canonical URL above.
System Architecture
┌─────────────────────────────────────────────────────────────────┐
│ DATA SOURCES │
├─────────────────────────────────────────────────────────────────┤
│ redbook_export.csv │ Project1287List.xml │ BOM_Exports/ │
│ (49,758 records) │ (3,084 projects) │ (1,704 files) │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ PIPELINE PHASES │
├─────────────────────────────────────────────────────────────────┤
│ Phase 1: Data Prep │
│ • Parse dates, normalize project numbers │
│ • Map employee IDs, extract ECN/NCN │
│ • Calculate detection timing │
│ │
│ Phase 2: AI Classification (Azure OpenAI) │
│ • Category, Severity, Root Cause │
│ • Labor hour estimates by department │
│ • Material cost estimation │
│ │
│ Phase 3: Product Enrichment │
│ • Drawing → BOM → PHYS. linkage │
│ • Product classification │
│ • Deployment tracking │
│ │
│ Phase 4: Analytics │
│ • Preventability analysis │
│ • Recurring escape detection │
│ • Quality risk scoring │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ OUTPUT │
├─────────────────────────────────────────────────────────────────┤
│ Processed/redbook_coq.db → Streamlit Dashboard (Azure) │
└─────────────────────────────────────────────────────────────────┘
Key Scripts
| Script | Purpose |
|---|---|
| redbook_unified_pipeline.py | Main pipeline (all phases) |
| redbook_coq_enrich.py | Add metadata without AI |
| build_product_deployment_history.py | Build deployment history |
| product_quality_analysis.py | Product-level analysis |
| recurring_escape_analysis.py | Find recurring escapes |
| preventability_analysis.py | Deep dive on top issues |
| app.py | Streamlit dashboard (v7.3, 10 tabs) |
| build_machine_dna.py | Machine DNA feature extraction (82 features) |
| train_machine_dna_model.py | Multi-tier predictive quality model (time-aware CV) |
| build_machine_type_profiles.py | Per-type physical/quality profiles |
| subsystem_quality_attribution.py | Subsystem x root cause quality matrix |
| generate_risk_recommendations.py | Per-project risk scoring |
Running the Pipeline
Full Pipeline (All Years)
# Full run (~$430-500 AI cost)
python Scripts/redbook_unified_pipeline.py
# Phased by year range (recommended)
python Scripts/redbook_unified_pipeline.py --years=2020-2025 # ~$150-180
python Scripts/redbook_unified_pipeline.py --years=2010-2019 # ~$180-200
python Scripts/redbook_unified_pipeline.py --years=1999-2009 # ~$100-120
Individual Phases
# Data preparation only (free)
python Scripts/redbook_unified_pipeline.py --phase=prep
# AI classification with resume
python Scripts/redbook_unified_pipeline.py --phase=ai --resume
# Product linkage only
python Scripts/redbook_unified_pipeline.py --phase=products
# Validate output
python Scripts/redbook_unified_pipeline.py --validate
Enrichment (No AI Cost)
python Scripts/redbook_coq_enrich.py
Local Dashboard
streamlit run Scripts/survey_app_v6.py
Dashboard Tabs
| Tab | Purpose |
|---|---|
| Overview | Data exploration starting point, summary metrics |
| By Project | Project rankings by Quality %, scatter plots |
| Root Causes | Pareto analysis, systemic issues |
| Trends | Time series, SLA compliance |
| Deep Dive | Preventability analysis (94% finding) |
| Products | Product-level repeat offenders, risk scores |
| Explorer | Full data access, filters, search |
| Lead Time | Lead time analysis with comprehensive dataset |
| Machine DNA | First-principles quality profiling (see below) |
| Project Explorer | Per-project quality deep dive with DNA radar, RFC timeline, workforce profile, comp machine comparison |
Admin Features
Password: Contact IT for credentials.
- Engineer Analysis tab
- Calibration controls
- Feedback analysis
Key Findings
Quality Cost
- Quality % target: <2% of project value
- Average: Varies by industry and project type
- Top cost driver: Design Function root cause
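The Quality % target can be checked with a small helper. This is a minimal sketch, assuming Quality % is total quality cost divided by project value; the function names and sample figures are hypothetical and not taken from the pipeline.

```python
TARGET_PCT = 2.0  # target stated on this page: Quality % below 2% of project value


def quality_pct(quality_cost, project_value):
    """Return quality cost as a percentage of project value (assumed definition)."""
    if project_value <= 0:
        raise ValueError("project_value must be positive")
    return 100.0 * quality_cost / project_value


def meets_target(quality_cost, project_value, target_pct=TARGET_PCT):
    """True when the project's Quality % is under the target."""
    return quality_pct(quality_cost, project_value) < target_pct


# Hypothetical project: $15,000 of quality cost on a $1,000,000 project.
print(quality_pct(15_000, 1_000_000), meets_target(15_000, 1_000_000))
```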
Detection Timing
| Stage (days before ship) | % of Issues | Cost Multiplier |
|---|---|---|
| Early (>90 days) | ~25% | 1.0x |
| Mid-Build (30-90) | ~30% | 1.25x |
| Late Build (<30) | ~20% | 1.5x |
| Near/At Ship | ~10% | 2.0x |
| Post-Ship | ~15% | 2.0x |
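The stage boundaries in the table can be expressed as a lookup on days before ship. The sketch below is illustrative only: the function name is hypothetical, and the 7-day cutoff for "Near/At Ship" is an assumption, since the table does not say where that stage begins.

```python
from typing import Tuple

# Assumption: the page does not define where "Near/At Ship" starts.
NEAR_SHIP_DAYS = 7


def timing_stage(days_before_ship: int) -> Tuple[str, float]:
    """Map days between detection and ship date to (stage, cost multiplier).

    Negative values mean the issue was found after shipment.
    """
    if days_before_ship < 0:
        return ("Post-Ship", 2.0)
    if days_before_ship <= NEAR_SHIP_DAYS:
        return ("Near/At Ship", 2.0)
    if days_before_ship < 30:
        return ("Late Build", 1.5)
    if days_before_ship <= 90:
        return ("Mid-Build", 1.25)
    return ("Early", 1.0)


for days in (120, 45, 12, 0, -20):
    print(days, timing_stage(days))
```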
Preventability
From Deep Dive analysis of top 100 Design Function issues:
- 94% were preventable with better processes
- Top prevention methods: Design review, supplier management, communication
Design Reuse
| Metric | Value |
|---|---|
| PHYS. level reuse rate | ~82% |
| First-time cost multiplier | 1.09x higher |
| Recurring escapes identified | 127 |
Machine DNA: First-Principles Quality Profiling
Machine DNA extracts physical and design characteristics from each machine’s BOM, control components, and configuration history to predict quality risk before the build starts.
Six Dimensions (82 Features)
| Dimension | Features | Source |
|---|---|---|
| Structural Complexity | BOM size, depth, make/buy ratio | project_boms (4.3M items) |
| Subsystem Composition | Product class diversity, entropy | parts_master join |
| Design Novelty | First-time part %, known-problem parts | reuse metrics |
| Automation & Control | Robot/servo/VFD counts, PLC complexity | control_components |
| Configuration Lineage | Prior builds, comp machine quality | project_complexity |
| Workforce | Team tenure, HHI, corrective hours | labor_detail |
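Two of the features named above, entropy and HHI, have standard definitions that are easy to sketch. The exact inputs used by build_machine_dna.py are not described on this page, so the example assumes entropy is computed over the product class labels of BOM items and HHI over labor hours per team member; both function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy in bits of a list of category labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def hhi(amounts):
    """Herfindahl-Hirschman index: sum of squared shares (1.0 = fully concentrated)."""
    total = sum(amounts)
    if total == 0:
        return 0.0
    return sum((a / total) ** 2 for a in amounts)


# Hypothetical inputs: product classes on a BOM, and hours logged per team member.
print(shannon_entropy(["frame", "frame", "controls", "controls"]))
print(hhi([50, 50]))
```

Higher entropy means BOM items are spread across more product classes; higher HHI means the work is concentrated in fewer people.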
Predictive Model (March 2026)
| Tier | When Available | AUC | Key Insight |
|---|---|---|---|
| Order Time | At order | 0.507 | Config identity alone is close to chance (AUC ≈ 0.5) |
| At Staffing | 2-3 weeks | 0.631 | Workforce features add signal |
| BOM Release | 2-4 weeks | 0.639 | Novelty and BOM features add strong signal |
| Early Build | 4-8 weeks | 0.630 | Matches normalized model using only leading indicators |
| Field Escape | Post-ship | 0.614 | Predicts post-ship quality escapes |
Time-aware expanding-window CV. Earlier shuffled baselines (0.749–0.806) were inflated by temporal leakage.
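An expanding-window split trains each fold only on rows older than its test block, which is what prevents the temporal leakage seen with shuffled folds. The generator below is a minimal sketch of that idea, not the code in train_machine_dna_model.py; it assumes rows are already sorted oldest-first, and the function name is hypothetical. scikit-learn's `TimeSeriesSplit` provides a comparable splitter, though this page does not say which implementation the project uses.

```python
def expanding_window_splits(n_samples, n_splits):
    """Yield (train_indices, test_indices) over rows sorted oldest-first.

    Each fold trains on everything before its test block, so the training
    window grows and never contains rows newer than the test rows.
    """
    if n_splits < 1 or n_samples < n_splits + 1:
        raise ValueError("need at least n_splits + 1 samples")
    fold_size = n_samples // (n_splits + 1)
    for k in range(1, n_splits + 1):
        train_end = k * fold_size
        # The last fold absorbs any remainder rows.
        test_end = n_samples if k == n_splits else train_end + fold_size
        yield list(range(train_end)), list(range(train_end, test_end))


for train_idx, test_idx in expanding_window_splits(10, 4):
    print(train_idx, test_idx)
```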
Per-Machine-Type Signatures
- Std Robot: VFD/control complexity (r=0.72) — more motion control = more issues
- Swing Door: Control components (r=0.71) — automation integration risk
- Acoustical Booth: Value tier + novelty (r=0.58) — scope and new designs
- Special Machine: BOM size (r=0.57) — raw complexity drives risk
Risk Recommendations
The system generates per-project recommendations based on DNA features. Example triggers: >40% first-time parts, >3 known-problem parts, first-time configuration, high subsystem diversity.
See Documentation/MACHINE_DNA_METHODOLOGY.md in the repo for full details.
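The example triggers can be read as simple threshold rules. The sketch below is hypothetical: the `MachineDNA` container and field names are invented for illustration, and the diversity threshold is an assumption because the text only says "high". The 40% and 3-part thresholds come from the triggers listed above.

```python
from dataclasses import dataclass

FIRST_TIME_PCT_THRESHOLD = 40.0    # from the text: >40% first-time parts
KNOWN_PROBLEM_PARTS_THRESHOLD = 3  # from the text: >3 known-problem parts
DIVERSITY_THRESHOLD = 0.8          # assumption: the page does not define "high"


@dataclass
class MachineDNA:  # hypothetical container, not the pipeline's own schema
    first_time_part_pct: float
    known_problem_parts: int
    is_first_time_config: bool
    subsystem_diversity: float


def risk_triggers(dna):
    """Return the list of triggered recommendation reasons for one project."""
    triggers = []
    if dna.first_time_part_pct > FIRST_TIME_PCT_THRESHOLD:
        triggers.append("high share of first-time parts")
    if dna.known_problem_parts > KNOWN_PROBLEM_PARTS_THRESHOLD:
        triggers.append("multiple known-problem parts")
    if dna.is_first_time_config:
        triggers.append("first-time configuration")
    if dna.subsystem_diversity > DIVERSITY_THRESHOLD:
        triggers.append("high subsystem diversity")
    return triggers


print(risk_triggers(MachineDNA(45.0, 4, True, 0.9)))
```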
Design Reuse Analysis
What is Design Reuse?
PSI tracks whether PHYS. level parts are being deployed for the first time or have been used in previous projects.
Key Metrics
| Metric | Definition |
|---|---|
| Deployment Number | Nth time this product has shipped |
| Is_First_Time | True if Deployment_Number = 1 |
| Reuse Rate | % of parts not first-time |
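The three metrics follow directly from a shipment history ordered by date. The following is a sketch under assumptions: the input row layout (`product`, `project`, `ship_date`) and function names are hypothetical, and build_product_deployment_history.py may compute these differently.

```python
from collections import defaultdict


def add_deployment_numbers(shipments):
    """Add Deployment_Number and Is_First_Time to each shipment row.

    shipments: list of dicts with 'product', 'project', and 'ship_date'
    (ISO date strings, so plain string sorting is chronological).
    """
    counts = defaultdict(int)
    out = []
    for row in sorted(shipments, key=lambda r: r["ship_date"]):
        counts[row["product"]] += 1
        n = counts[row["product"]]
        out.append({**row, "Deployment_Number": n, "Is_First_Time": n == 1})
    return out


def reuse_rate(rows):
    """Percentage of rows that are not first-time deployments."""
    if not rows:
        return 0.0
    reused = sum(1 for r in rows if not r["Is_First_Time"])
    return 100.0 * reused / len(rows)


history = add_deployment_numbers([
    {"product": "P1", "project": "A", "ship_date": "2020-01-01"},
    {"product": "P1", "project": "B", "ship_date": "2021-01-01"},
    {"product": "P2", "project": "B", "ship_date": "2021-01-01"},
])
print(history, reuse_rate(history))
```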
Findings
- First-time parts have higher issue rates
  - 1.09x higher average cost per issue
  - More common root causes: Design Function, Drawing Error
- Learning curve effect
  - Issues decrease as products mature
  - 0.033 → 0.005 issues per product by 5th deployment
- Recurring escapes
  - 127 product × root cause combinations across multiple projects
  - $93,688 in preventable cost
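A recurring escape, as described above, is a product × root cause pair that appears in more than one project. A minimal sketch of that grouping follows; the field names and the two-project minimum are assumptions, and recurring_escape_analysis.py may apply additional filters.

```python
from collections import defaultdict


def recurring_escapes(issues, min_projects=2):
    """Return {(product, root_cause): [projects]} for pairs seen in several projects.

    min_projects=2 is an assumed reading of "across multiple projects".
    """
    projects = defaultdict(set)
    for issue in issues:
        key = (issue["product"], issue["root_cause"])
        projects[key].add(issue["project"])
    return {k: sorted(v) for k, v in projects.items() if len(v) >= min_projects}


# Hypothetical issue records.
sample = [
    {"product": "P1", "root_cause": "Design Function", "project": "A"},
    {"product": "P1", "root_cause": "Design Function", "project": "B"},
    {"product": "P2", "root_cause": "Drawing Error", "project": "A"},
]
print(recurring_escapes(sample))
```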
Deep Dive Analysis
Methodology
- Filter to Design Function root cause (top cost driver)
- Sort by cost, take top 100
- AI analyzes each for preventability
- Human validation via feedback system
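The first two steps of the methodology (filter, then rank by cost) can be sketched as follows. The record layout and function name are hypothetical; the AI preventability analysis and human validation steps are not shown.

```python
def top_issues(issues, root_cause="Design Function", n=100):
    """Filter issues to one root cause and return the n most costly."""
    matching = [i for i in issues if i["root_cause"] == root_cause]
    return sorted(matching, key=lambda i: i["cost"], reverse=True)[:n]


# Hypothetical issue records.
sample = [
    {"id": 1, "root_cause": "Design Function", "cost": 500.0},
    {"id": 2, "root_cause": "Drawing Error", "cost": 9000.0},
    {"id": 3, "root_cause": "Design Function", "cost": 1200.0},
]
print([i["id"] for i in top_issues(sample, n=2)])
```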
Key Finding
94% of top Design Function issues were preventable
Prevention Categories
| Method | % of Preventable |
|---|---|
| Design Review | ~40% |
| Better Requirements | ~25% |
| Supplier Management | ~20% |
| Process Control | ~15% |
Data Model
Main Tables
| Table | Purpose |
|---|---|
| redbooks | Main issues table (100+ columns) |
| product_deployment_history | Deployment tracking |
| project_reuse_metrics | Per-project reuse rates |
| part_quality_metrics | Product risk scores |
See Data Dictionary for field details.
Key Fields
- AI Classification: AICategory, AI_Severity, AI_RootCause, Hours_, Cost__
- Product Linkage: Product_Number, Product_Class_Name, Deployment_Number
- Timing: Ship_Timing_Category, Days_Before_Ship, Resolution_Days
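The fields above can be queried with Python's built-in `sqlite3` module. This sketch uses an in-memory database as a stand-in for Processed/redbook_coq.db so that it runs without the real file; the three-column table is a simplification of the real `redbooks` table, and the category values shown are assumptions based on the stage names in the Detection Timing table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for Processed/redbook_coq.db
conn.execute(
    "CREATE TABLE redbooks ("
    "AI_RootCause TEXT, Ship_Timing_Category TEXT, Days_Before_Ship INTEGER)"
)
# Hypothetical sample rows; real data comes from the pipeline.
conn.executemany(
    "INSERT INTO redbooks VALUES (?, ?, ?)",
    [
        ("Design Function", "Early", 120),
        ("Design Function", "Early", 95),
        ("Design Function", "Post-Ship", -10),
        ("Drawing Error", "Late Build", 12),
    ],
)

# Count Design Function issues by detection stage.
rows = conn.execute(
    "SELECT Ship_Timing_Category, COUNT(*) AS n "
    "FROM redbooks WHERE AI_RootCause = ? "
    "GROUP BY Ship_Timing_Category "
    "ORDER BY n DESC, Ship_Timing_Category",
    ("Design Function",),
).fetchall()
print(rows)
conn.close()
```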
Deployment
Azure App Service
# Trigger deploy workflow (GitHub Enterprise)
GH_HOST=progressivesurface.ghe.com gh workflow run deploy.yml \
-R ProgressiveSurface/redbook-analysis --ref main
Deployment Standard
- App Service deployments are executed by GitHub Actions (identity-based deployment).
- Publish-profile / local-git deployment patterns are deprecated for production.
Configuration
- Python 3.8
- Streamlit
- Microsoft Entra ID authentication
See main project DEPLOYMENT.md for full details.
Known Limitations
Data Quality
- Closer data unavailable: Raw export limitation
- Planned vs actual ship dates: Affects detection timing
- ~55% product linkage: Many drawings aren’t PHYS parts
AI Classification
- Estimates are directional: Not precise figures
- Calibration varies: Shop hours tend to be overestimated
- Single-call classification: Category, severity, root cause, hours in one API call
Repository
Location: C:\git\redbook-dashboard
Key Directories
| Directory | Contents |
|---|---|
| Scripts/ | Python pipeline and dashboard |
| RawData/ | Source data files |
| Processed/ | Output database and CSVs |
| Documentation/ | Additional docs |
Related Pages
- Data Brain - Data sources
- Data Dictionary - Field definitions
- Quality Process - Redbook system
- Methodology - Calculation methods
Last updated: March 2026