Redbook Analysis Project

The Redbook Analysis System is a Python data analytics pipeline that processes quality issue tickets (“Redbooks”) to analyze defect patterns, cost impacts, and detection timing.


Project Overview

Purpose

Leadership wants to understand:

  1. Where quality costs come from (root causes)
  2. Which projects have the most issues
  3. Whether issues are being caught early or late
  4. What’s systemic vs. project-specific
  5. What could be prevented with better processes

Key Outputs

OutputDescription
SQLite DatabaseEnriched redbook data with AI classification
Streamlit DashboardInteractive analysis tool
Analysis CSVsExport files for specific analyses

Live Dashboard

URL: https://quality.progressivesurface.com

Authentication: Microsoft Entra ID (PSI credentials required)

The legacy URL https://ps-redbook-dashboard.azurewebsites.net redirects automatically to the canonical URL above.


System Architecture

┌─────────────────────────────────────────────────────────────────┐
│                       DATA SOURCES                               │
├─────────────────────────────────────────────────────────────────┤
│  redbook_export.csv    │  Project1287List.xml  │  BOM_Exports/  │
│  (49,758 records)      │  (3,084 projects)     │  (1,704 files) │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                      PIPELINE PHASES                             │
├─────────────────────────────────────────────────────────────────┤
│  Phase 1: Data Prep                                              │
│    • Parse dates, normalize project numbers                      │
│    • Map employee IDs, extract ECN/NCN                          │
│    • Calculate detection timing                                  │
│                                                                  │
│  Phase 2: AI Classification (Azure OpenAI)                       │
│    • Category, Severity, Root Cause                             │
│    • Labor hour estimates by department                         │
│    • Material cost estimation                                    │
│                                                                  │
│  Phase 3: Product Enrichment                                     │
│    • Drawing → BOM → PHYS. linkage                              │
│    • Product classification                                      │
│    • Deployment tracking                                         │
│                                                                  │
│  Phase 4: Analytics                                              │
│    • Preventability analysis                                     │
│    • Recurring escape detection                                  │
│    • Quality risk scoring                                        │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                         OUTPUT                                   │
├─────────────────────────────────────────────────────────────────┤
│  Processed/redbook_coq.db → Streamlit Dashboard (Azure)         │
└─────────────────────────────────────────────────────────────────┘

Key Scripts

ScriptPurpose
redbook_unified_pipeline.pyMain pipeline (all phases)
redbook_coq_enrich.pyAdd metadata without AI
build_product_deployment_history.pyBuild deployment history
product_quality_analysis.pyProduct-level analysis
recurring_escape_analysis.pyFind recurring escapes
preventability_analysis.pyDeep dive on top issues
app.pyStreamlit dashboard (v7.3, 10 tabs)
build_machine_dna.pyMachine DNA feature extraction (82 features)
train_machine_dna_model.pyMulti-tier predictive quality model (time-aware CV)
build_machine_type_profiles.pyPer-type physical/quality profiles
subsystem_quality_attribution.pySubsystem x root cause quality matrix
generate_risk_recommendations.pyPer-project risk scoring

Running the Pipeline

Full Pipeline (All Years)

# Full run (~$430-500 AI cost)
python Scripts/redbook_unified_pipeline.py
 
# Phased by year range (recommended)
python Scripts/redbook_unified_pipeline.py --years=2020-2025  # ~$150-180
python Scripts/redbook_unified_pipeline.py --years=2010-2019  # ~$180-200
python Scripts/redbook_unified_pipeline.py --years=1999-2009  # ~$100-120

Individual Phases

# Data preparation only (free)
python Scripts/redbook_unified_pipeline.py --phase=prep
 
# AI classification with resume
python Scripts/redbook_unified_pipeline.py --phase=ai --resume
 
# Product linkage only
python Scripts/redbook_unified_pipeline.py --phase=products
 
# Validate output
python Scripts/redbook_unified_pipeline.py --validate

Enrichment (No AI Cost)

python Scripts/redbook_coq_enrich.py

Local Dashboard

streamlit run Scripts/survey_app_v6.py

Dashboard Tabs

TabPurpose
OverviewData exploration starting point, summary metrics
By ProjectProject rankings by Quality %, scatter plots
Root CausesPareto analysis, systemic issues
TrendsTime series, SLA compliance
Deep DivePreventability analysis (94% finding)
ProductsProduct-level repeat offenders, risk scores
ExplorerFull data access, filters, search
Lead TimeLead time analysis with comprehensive dataset
Machine DNAFirst-principles quality profiling (see below)
Project ExplorerPer-project quality deep dive with DNA radar, RFC timeline, workforce profile, comp machine comparison

Admin Features

Password: Contact IT for credentials.

  • Engineer Analysis tab
  • Calibration controls
  • Feedback analysis

Key Findings

Quality Cost

  • Quality % target: <2% of project value
  • Average: Varies by industry and project type
  • Top cost driver: Design Function root cause

Detection Timing

Stage% of IssuesCost Multiplier
Early (>90 days)~25%1.0x
Mid-Build (30-90)~30%1.25x
Late Build (<30)~20%1.5x
Near/At Ship~10%2.0x
Post-Ship~15%2.0x

Preventability

From Deep Dive analysis of top 100 Design Function issues:

  • 94% were preventable with better processes
  • Top prevention methods: Design review, supplier management, communication

Design Reuse

MetricValue
PHYS. level reuse rate~82%
First-time cost multiplier1.09x higher
Recurring escapes identified127

Machine DNA: First-Principles Quality Profiling

Machine DNA extracts physical and design characteristics from each machine’s BOM, control components, and configuration history to predict quality risk before the build starts.

Six Dimensions (82 Features)

DimensionFeaturesSource
Structural ComplexityBOM size, depth, make/buy ratioproject_boms (4.3M items)
Subsystem CompositionProduct class diversity, entropyparts_master join
Design NoveltyFirst-time part %, known-problem partsreuse metrics
Automation & ControlRobot/servo/VFD counts, PLC complexitycontrol_components
Configuration LineagePrior builds, comp machine qualityproject_complexity
WorkforceTeam tenure, HHI, corrective hourslabor_detail

Predictive Model (March 2026)

TierWhen AvailableAUCKey Insight
Order TimeAt order0.507Config identity alone is predictive
At Staffing2-3 weeks0.631Workforce features add signal
BOM Release2-4 weeks0.639Novelty and BOM features add strong signal
Early Build4-8 weeks0.630Matches normalized model using only leading indicators
Field EscapePost-ship0.614Predicts post-ship quality escapes

Time-aware expanding-window CV. Earlier shuffled baselines (0.749–0.806) were inflated by temporal leakage.

Per-Machine-Type Signatures

  • Std Robot: VFD/control complexity (r=0.72) — more motion control = more issues
  • Swing Door: Control components (r=0.71) — automation integration risk
  • Acoustical Booth: Value tier + novelty (r=0.58) — scope and new designs
  • Special Machine: BOM size (r=0.57) — raw complexity drives risk

Risk Recommendations

The system generates per-project recommendations based on DNA features. Example triggers: >40% first-time parts, >3 known-problem parts, first-time configuration, high subsystem diversity.

See Documentation/MACHINE_DNA_METHODOLOGY.md in the repo for full details.


Design Reuse Analysis

What is Design Reuse?

PSI tracks whether PHYS. level parts are being deployed for the first time or have been used in previous projects.

Key Metrics

MetricDefinition
Deployment NumberNth time this product has shipped
Is_First_TimeTrue if Deployment_Number = 1
Reuse Rate% of parts not first-time

Findings

  1. First-time parts have higher issue rates

    • 1.09x higher average cost per issue
    • More common root causes: Design Function, Drawing Error
  2. Learning curve effect

    • Issues decrease as products mature
    • 0.033 → 0.005 issues per product by 5th deployment
  3. Recurring escapes

    • 127 product × root cause combinations across multiple projects
    • $93,688 in preventable cost

Deep Dive Analysis

Methodology

  1. Filter to Design Function root cause (top cost driver)
  2. Sort by cost, take top 100
  3. AI analyzes each for preventability
  4. Human validation via feedback system

Key Finding

94% of top Design Function issues were preventable

Prevention Categories

Method% of Preventable
Design Review~40%
Better Requirements~25%
Supplier Management~20%
Process Control~15%

Data Model

Main Tables

TablePurpose
redbooksMain issues table (100+ columns)
product_deployment_historyDeployment tracking
project_reuse_metricsPer-project reuse rates
part_quality_metricsProduct risk scores

See Data Dictionary for field details.

Key Fields

  • AI Classification: AICategory, AI_Severity, AI_RootCause, Hours_, Cost__
  • Product Linkage: Product_Number, Product_Class_Name, Deployment_Number
  • Timing: Ship_Timing_Category, Days_Before_Ship, Resolution_Days

Deployment

Azure App Service

# Trigger deploy workflow (GitHub Enterprise)
GH_HOST=progressivesurface.ghe.com gh workflow run deploy.yml \
  -R ProgressiveSurface/redbook-analysis --ref main

Deployment Standard

  • App Service deployments are executed by GitHub Actions (identity-based deployment).
  • Publish-profile / local-git deployment patterns are deprecated for production.

Configuration

  • Python 3.8
  • Streamlit
  • Microsoft Entra ID authentication

See main project DEPLOYMENT.md for full details.


Known Limitations

Data Quality

  1. Closer data unavailable: Raw export limitation
  2. Planned vs actual ship dates: Affects detection timing
  3. ~55% product linkage: Many drawings aren’t PHYS parts

AI Classification

  1. Estimates are directional: Not precise figures
  2. Calibration varies: Shop hours tend to be overestimated
  3. Single-call classification: Category, severity, root cause, hours in one API call

Repository

Location: C:\git\redbook-dashboard

Key Directories

DirectoryContents
Scripts/Python pipeline and dashboard
RawData/Source data files
Processed/Output database and CSVs
Documentation/Additional docs


Last updated: March 2026