Project1287 Data Patterns

Practical reference for anyone analyzing PSI project data from Project1287List.xml or the underlying PROJECT.1287 AFTEC table. Catalogs the non-obvious patterns and data-hygiene quirks that have tripped up multiple analyses — sister jobs, post-ship labor, missing ship dates, internal projects, ghost records, and how to handle each.


Authoritative source	`\\ad.ptihome.com\dfs\Data\APPS\Settings\PSI.FileCache\Project1287List.xml` (live, updated by PSI.All apps)
Underlying ERP table	`PROJECT.1287` in AFTEC (UniData on MRP-PROD)
Pipeline-extracted form	`projects` table in `\\ad.ptihome.com\DFS\Schedule\SS123\LEADTIME\psi_analytics.db` (nightly, shipped-projects-only)
Companion	Quarterly Business Metrics applies all these patterns

Why this page exists

A naïve read of Project1287List.xml looks straightforward — order date, ship date, customer, value. But several non-obvious patterns will give wrong answers if you don’t account for them:

A project’s labor isn’t always on the project’s own job number (sister jobs)
A project’s ShipDate isn’t reliably populated, especially on internal jobs
A project’s labor doesn’t end at ship (install, training)
A backlog calculation will be ~2× too high without filtering data-hygiene cases
“On-time” looks different depending on which date you anchor to

This page documents each pattern, why it exists, and how to handle it in analytics code.

Core principles to keep in mind

These three unstated assumptions drive how to read PSI’s project data correctly. Get them wrong and your numbers will be off by ~2× or worse.

1. PSI doesn’t abandon projects

Every booked machine eventually ships. There is no real “abandoned” or “permanently stuck” category in PSI’s business model. So when you see an Active-status project with OrderDate from 3+ years ago, no ShipDate, no WarrentyStartDate, and zero recent labor — that’s almost certainly a data-hygiene case (shipped but never flagged, or closed without status update), not a real machine sitting somewhere unbuilt.

How to apply: Trust labor signals over date-field completeness. If labor has stopped for an extended period on an old project, infer completion even though no formal ship event was recorded.

2. ShipDate is a manual data-entry field, not a guaranteed signal

It gets populated when someone in operations remembers to type it in. For commercial projects this is usually consistent, but ~280 historical records have it blank when the machine demonstrably shipped. For PSI internal projects, ShipDate is essentially never populated — there’s no customer delivery event to trigger the data entry.

How to apply: Use effective_ship_date = ShipDate OR WarrentyStartDate as your primary ship signal. For PSI internal jobs, use labor cessation as the completion signal instead.

3. Labor data is the ground truth, AFTEC fields are interpretations of it

When something in Project1287.xml looks wrong, check the underlying labor_detail table in psi_analytics.db. The 1.38M timesheet rows have a far cleaner picture of “is this project actually being worked, and when did work start/stop” than the metadata fields do.

How to apply: For “is this active right now?” questions, query labor_detail for activity in the last 60-90 days. For “when did substantial work begin?” questions, find when cumulative labor crossed max(40 hrs absolute, 2% of total) — not just 2% alone (see Common Pitfalls below).

Pattern 1: Sister jobs (engineering shared across job numbers)

PSI sometimes books multiple machines from the same customer as separate jobs but pools the engineering on one of them. The “sister” jobs have legitimate OrderValue and ShipDate but minimal labor (especially in 110E / 110M — controls and mech engineering) because the design work was paid for under a different job.

Detection pattern: same customer, ordered within ~60 days, one project has <20% of the other’s total labor.

Confirmed example — jobs 2260 vs 2266, both shipped 2022:

2260 (sister): 660 total hours, only 10 hrs 110M + 54 hrs 110E — sparse labor
2266 (primary): 1,814 total hours, 535 hrs 110M + 250 hrs 110E — full design effort
Same machine class, similar OrderValue, similar lead time

Why it matters:

Per-job labor cost / hours look misleadingly low on sister jobs — they aren’t abandoned or under-built
Cumulative-labor classifiers (e.g., “has substantial work begun?”) may mis-classify sisters as not-started when they’re actively being built on the primary’s engineering
Fleet-wide labor totals are accurate (the engineering was paid for, just on the primary)

Authoritative link: Project1287List.xml has a REF.PROJ.NO lineage field. The MCP tool get_project_lineage reads it. To handle sister jobs correctly, follow lineage and treat the primary’s labor curve as authoritative for all members of the cluster.

Subtle gotcha — “substantial work begun” needs an absolute floor

A naïve “labor has crossed 2% of total” classifier breaks for fresh-but-barely-touched projects. If a project has only logged 1 hour in its entire life so far (e.g., proposal review), 2% = 0.02 hours and is trivially crossed by the first labor entry — the project is misclassified as “actively being built” when really nothing has started.

Use max(40 hours absolute, 2% of total) instead — the 40-hr floor (≈ 1 person-week of real engineering) catches the pathological case while 2% still drives classification on real projects (where 40 hrs is the 2% point for a typical 2000-hr ETO build anyway).

Pattern 2: Post-ship labor (install, training, commissioning)

PSI’s project lifecycle does not end at ShipDate. Significant labor continues post-ship for on-site install, commissioning, customer training, and punch-list resolution.

Quantified (161 projects shipped 2022-2025):

Metric	Value
Aggregate post-ship hours	27,193 hrs (~6,800/yr, ≈ 3-4 FTEs in the field continuously)
Post-ship % of total project labor	7.5% aggregate, 6.3% median per project, p90 = 16%
Median tail (ship → last labor entry)	101 days (p75 = 153 days, p90 = 239 days)

Dominant post-ship departments:

120 (MVI): 76% of post-ship hours — field install, commissioning, customer training
122 (Test/Startup, MVI subdivision): 6%
106 / 108 (mech / elec assembly): 5–6% combined — field rework
110E / 110M (controls / mech eng): 4% / 2% — PLC tweaks, design fixes during install

Why it matters:

“Last labor date” is NOT a reliable “machine is ready to ship” marker. Labor keeps flowing post-ship — service hours, MVI follow-up, warranty work. To find when build was complete, use only shop+eng depts (110/102/104/106/108/111), AND filter to dates ≤ ShipDate.
Customer-facing lead time ≠ ship lead time. Order→ship is currently ~330d median; order→install-complete is ~430d. Customers experience a 14-month engagement, not 11 months.
Field service / Dept 120 capacity is a separate constraint from build-bay capacity. Plan for it separately.

See also: Quarterly Business Metrics explicitly scopes On-Time % to ShipDate ≤ PlannedShipDate because post-ship duration is customer-paced. Build-org accountability ends at handoff.

Pattern 3: PSI internal projects (no ship event ever)

PSI builds machines for its own internal divisions — Process Services, Grand Rapids, occasional R&D capability builds. These appear in Project1287List.xml with AccountName starting “Progressive Surface” (e.g., “Progressive Surface, Process Services” or “Progressive Surface, Grand Rapids”).

Key difference vs commercial projects:

No customer delivery event → ShipDate never gets populated
No warranty obligation → WarrentyStartDate never populated either
No commercial billing → OrderValue and OrderTotal are typically $0

This means standard “in backlog if no ship date” logic catches internal projects as still-open forever, even when the machine is done and in service.

Detection — labor cessation = completion: For an internal project, if labor has stopped for more than a few months AND the project is old enough that a normal build would have finished, it’s effectively complete. Use tighter age + dormancy thresholds:

Project type	”Effectively closed” threshold
Commercial	age > 24 months AND no labor in 9+ months
PSI Internal	age > 12 months AND no labor in 4+ months

Confirmed example (verified by Adam Devereaux 2026-05-12): Job 2358 — “Waterjet abrasive milling machine,” Progressive Surface Grand Rapids, $0 OrderValue, 4,300 hours of labor over 18 months. Labor ramped Nov 2024 → tailed off Oct 2025. User confirmed the machine is “done and installed” — but ShipDate is blank in AFTEC because there’s no commercial ship event to mark.

Why it matters: Internal projects consume engineering and shop capacity while they’re being worked, and they belong in capacity-planning views during that period. They do not belong in commercial backlog forever after build completes.

Pattern 4: ShipDate hygiene — use WarrentyStartDate as fallback

Even on commercial machines, ShipDate is not reliably populated. ~280 historical records in current AFTEC data have blank ShipDate but did in fact ship.

Recovery strategy — use WarrentyStartDate as fallback: Warranty doesn’t start until a machine ships, so WarrentyStartDate being populated is definitive proof of shipment. This recovers about 31 historical records that the date-only logic missed.

effective_ship_date = ShipDate or WarrentyStartDate

Records still missing both (~250 across all history):

~110 are PSI internal (Pattern 3 above)
~135 are commercial but old, mostly with ProjectStatus = "Out of Warranty" (shipped pre-1997, before warranty-start tracking was introduced as a field)
~10 are Active-status with no recent labor — likely shipped but never flagged, see Pattern 5

Pattern 5: Stale ghost records (PSI doesn’t actually abandon projects)

The principle: PSI does not abandon projects — every booked machine eventually ships. Therefore, any “Active” project that has been sitting open for years with no recent labor activity is almost certainly:

Shipped, but ShipDate was never typed in
Closed informally (especially install-service piggyback tickets — see Pattern 6)
On an extended hold so long that the next labor entry will be the post-ship install months from now

These are ghost records, not real backlog. Filter them out with age + dormancy criteria (see Pattern 3 for thresholds).

Current ghost-record drops (as of 2026-05-12, customer-type only):

Job	Age	Dormant	Customer / Description
95765	4.7y	3.6y	PSI Process Services — Plasma Heat Exchanger
95809	3.4y	3.4y	Pratt & Whitney — Plasma Power Supply for 2177
95812 / 95813	3.4y	3.0y	PSI Process Services — SCARA retrofits
95855	2.6y	1.7y	Medtronic Costa Rica — Shot Peen Move
95857 / 95859 / 95860	2.5y	1.5-2.3y	Install-service tickets (see Pattern 6)
95869	2.2y	1.6y	PSI Process Services — Argo Line 4 mod
2342 / 2348	2.6-2.8y	1.5-1.7y	Linamar Driveline — $1.4M combined, 1,600 hrs — judgement call, may be on long customer hold

The Linamar pair is the only case where the filter makes a meaningful judgement call. If a customer hold is suspected, override with a manual flag rather than relying on the filter.

Pattern 6: Install-service piggyback jobs

A subset of 5-digit retrofit jobs are install-service tickets booked as separate jobs, where the actual machine work happens on a different (parent) job. Recognizable by the Description field.

Examples seen:

95857 — “Installation of 2228/Line 21”
95858 — “Installation of 2229/LINE 5”
95859 — “Install for #95781: 1518 PRIMS CNC Retrofit”
95860 — “Install for Job #95802 Upgrades for 1256 WJ”

These jobs accumulate labor only when the field-service team deploys to the customer site. They can look abandoned for long stretches while waiting on the parent build to finish.

Detection: Description typically starts with “Install for”, “Installation of”, or “Install” + parent job number.

How to handle: They’re real backlog (a real commitment to deploy field service), but their labor curves don’t follow the normal build pattern. Don’t apply standard soup-line / active-build classifiers to them; treat them as a separate cohort if doing build-stage analysis.

Pattern 7: ProjectStatus values and what they mean

ProjectStatus is a controlled vocabulary with five values:

Value	Meaning	What it implies
Active	Project is open in AFTEC	Currently being worked OR data-hygiene case (see Pattern 5)
Shipped	Recently shipped, still being followed	Post-ship install / training may be ongoing
In Warranty	Shipped, within warranty period	Field service may still be active
Out of Warranty	Shipped, warranty period expired	Historical record; do not count in current backlog
Cancelled	Project was cancelled	Rare; exclude from backlog (PSI rarely cancels — these are real exceptions)

Filter rule for current backlog: ProjectStatus == 'Active' (along with no effective ship date and not a ghost record).

Pattern 8: OrderValue vs OrderTotal

Two dollar-value fields with different semantics:

Field	Meaning	When to use
`OrderValue`	Original signed value at PO receipt	Booked sales (sales pipeline)
`OrderTotal`	Final billed value, including change orders	Shipped revenue (finance recognition)

About 17% of projects have OrderTotal > OrderValue — change orders were added during the build. Use the right field for the right metric:

Booked sales reflects what sales sold
Shipped revenue reflects what finance booked

Pattern 9: On-Time measured at handoff, not install

PSI’s On-Time KPI is ShipDate ≤ PlannedShipDate, deliberately not extended to install or acceptance. Once a machine ships, downstream activity is customer-controlled:

Site readiness
Install crew scheduling
Travel coordination
Customer training availability

The build org can only be fairly held accountable for the handoff moment. Don’t conflate PSI execution with customer-side variation by extending the metric.

See: Quarterly Business Metrics — Lead Time & On-Time for the full reasoning.

Note on customer-requested holds: When a customer asks PSI to hold a finished machine and ship later, the current on-time number wrongly counts that as PSI being late. Detection path: look at the gap between MVI/runoff completion ops (4800-TEST/STARTUP, 4860-CUSTOMER TRAINING) and the actual ShipDate. A long gap with no further build labor indicates the hold was customer-side. This filter isn’t yet implemented but is the known refinement path.

Filter chain for accurate current backlog

Putting it all together, the validated current-backlog query looks like:

project ∈ Backlog[as_of] iff
    OrderDate ≤ as_of
    AND (
        # Has effective ship date in the future:
        (EffectiveShipDate > as_of)
        OR
        # No ship signal at all — is it real backlog or a ghost?
        (
            NOT EffectiveShipDate
            AND ProjectStatus == 'Active'
            AND NOT (
                age > 24 months  # commercial
                AND no labor in last 9 months
            )
            AND NOT (
                is_internal  # AccountName starts "Progressive Surface"
                AND age > 12 months
                AND no labor in last 4 months
            )
        )
    )
 
where EffectiveShipDate = ShipDate OR WarrentyStartDate
      labor = SUM(labor) from labor_detail
              WHERE dept IN ('110M','110E','102','104','106','108','111')

This filter chain reduces a raw 184-project naïve count to 97 projects of validated commercial + internal-active backlog (as of 2026-05-12), which matches the lead-time analysis “81 active jobs” anchor + recent Q4 booking surge.

Common implementation pitfalls

Real traps that have caused real wrong numbers — all of which were discovered the hard way during the May 2026 quarterly-metrics work session.

Pitfall: SQL `MAX(date)` against `MM/DD/YY` strings

The date column in labor_detail is stored as MM/DD/YY strings. SQLite/SQL Server text comparison is lexicographic — so "12/31/22" > "01/15/26" because '1' > '0' at the first character. This silently returns the wrong “max date” for nearly every project, biasing toward whichever Dec/Nov entry happens to exist regardless of year.

Symptom: A project with labor from 2025-10 shows MAX(date) = '12/29/22' because that entry sorts later as a string.

Fix: Pull all dates and parse them with Python (datetime.strptime(s, '%m/%d/%y')), then find max in code. Don’t rely on SQL date aggregation against this column.

Pitfall: percentile-only thresholds against current-snapshot totals

A “labor crossed N% of total” classifier breaks for in-flight projects whose total is still small. If a project has 5 hrs total so far, 2% = 0.1 hrs — trivially crossed by the first labor entry. Result: barely-touched projects (proposal review, scope clarification) get classified as “actively being built.”

Symptom: A list of “active” projects in which 28% have fewer than 50 total hours — including some with 0.2 hours.

Fix: Use max(N hours absolute, X% of total) as the threshold. The absolute floor catches the pathological small-total case; the percentile catches the historical case where projects have settled totals. For PSI’s typical 2000-hr ETO build, 40 hrs ≈ 2% anyway, so the floor doesn’t change classification of normal projects.

Pitfall: assuming `ShipDate` blank = unshipped

About 280 historical records have blank ShipDate but were definitely shipped. ~31 of those have WarrentyStartDate populated (definitive ship signal — warranty doesn’t start otherwise). Another ~135 are old enough that they shipped pre-1997 before warranty-start tracking was a field.

Symptom: Backlog count is ~2× the realistic value (184 instead of ~100).

Fix: EffectiveShipDate = ShipDate OR WarrentyStartDate, then layer on a ProjectStatus == 'Active' filter to drop the pre-warranty-tracking-era records.

Pitfall: counting department headcount as `COUNT(DISTINCT emp)`

The naïve query “how many engineers in dept X this year?” run as SELECT COUNT(DISTINCT emp) FROM labor_detail WHERE dept = 'X' AND year(date) = Y overcounts by ~2× because it picks up:

Transfer-ins / transfer-outs who logged a handful of hours
Admin staff whose hours flow through department codes
Technicians who logged single training or meeting entries
Anyone who attended a single review that got booked to the dept

Symptom: A Controls Engineering deep dive that said “27-36 distinct CE engineers active per year.” The real working team was ~12 engineers logging ≥1,000 hrs/year.

Fix: Filter by substantive hours. For PSI engineering departments:

≥1,000 hrs/year = real full-time engineering workforce
≥500 hrs/year = substantive contributor (full-time + half-timers)
Any hours = noise, ~2× the real number

For Controls Engineering (110E + 111) in 2025: 16 with any hours, 15 with ≥500, 12 with ≥1,000. The 12 is what you should plan around.

Pitfall: counting “Active” status as authoritative

ProjectStatus = 'Active' doesn’t necessarily mean “currently being worked.” It means “this record hasn’t been moved to a closed status yet.” Some Active records are dormant ghost data (see Pattern 5).

Symptom: Even after filtering by Active status, backlog still includes projects with no labor for 1-3+ years.

Fix: Add an age + dormancy filter on top of the Active check — 24 mo / 9 mo for commercial, 12 mo / 4 mo for internal.

Authoritative analytics that apply these patterns

Quarterly Business Metrics — Shipped / booked / backlog / soup-line by quarter
Lead Time Analysis (Feb 2026) — Stage-by-stage breakdown of order-to-ship; informs the on-time scoping decision
PSI Data Brain — Complete catalog of all PSI data sources, including psi_analytics.db schema

Closing data-coverage gaps with `redbook_coq.db` + `labor_detail`

The nightly comprehensive_dataset.csv has several known coverage gaps that limit historical analysis. The May 2026 root-cause work built a per-project enrichment dataset (psi-eto-analysis/data/project_enrichment.csv) that closes most of them by joining alternative sources:

Field	`comprehensive_dataset.csv` coverage	Enriched source	Result
BOM parts (`JP_Part_Count`)	51% of projects, reliable 2014+	`redbook_coq.db.project_boms` (4.3M rows / 1,712 projects)	Coverage ~95%, back to 2000
Total project hours (`Tot_Actual`)	Reliable 2014+ only	Sum `labor_detail` by project	Back to 1995 (full labor history)
ECN count (`ECN_Count`)	Basic count	`redbook_coq.db.ecn_details` with `ChangeCode` breakdown	Same volume + Customer-vs-Engineering split
NCN count	Not present	`redbook_coq.db.ncn_details` (17,789 NCNs)	NEW quality dimension back to 1991
Customer / Industry / Machine type	Not aggregated	Live `Project1287List.xml`	NEW strategic-mix analysis

How to apply: Any time you’re tempted to limit an analysis to “2014+ where comprehensive_dataset has BOM/Hours data,” check the enrichment first — it almost certainly extends further back. The enrichment is rebuildable from scripts/build_enrichment.py in the psi-eto-analysis project.

Pipeline integration opportunity: The enrichment is currently project-local. To make it the team default, the nightly psi-data-pipeline could absorb the same joins and write the enriched fields back into comprehensive_dataset.csv. Roughly half a day of pipeline work — would benefit every consumer of the flat dataset.

Cross-validated finding: PLC complexity is a real driver, plus system overload

May 2026 root-cause analysis (C:\git\psi_root_cause_analysis.html) tested 6 hypotheses for why lead times grew. Conclusion is a two-factor model:

PLC complexity has nearly doubled at the p75/max (35 → 55 / 36 → 73 between 2010 and 2025), while BOM part counts stayed flat (~400 median for 12 years). The LDS methodology works for mechanical reuse but there is no equivalent reuse framework for controls software, HMI panels, or recipe modules. The strongest individual-project effect: high-PLC projects have r=+0.40 with LeadTimeActual and are 2.2-2.5× more likely to end in the Pain or Cholesterol “trouble” cohorts (cross-validated against the independent psi-pain-analysis dataset).
System overload breached capacity in 2022. Engineering capacity per engineer is stable (hours/person rock-steady at 36-38/week for years), but concurrent projects per engineer jumped 23% (2.2 → 2.7). The multitasking penalty cascades across all departments. Pre-2022, within-year r(PLC, Pain) was +0.4 to +0.6 — complexity differentiated outcomes. Post-2022, within-year r collapses to near zero — system overload affects all projects regardless of complexity, drowning out the signal.

Implication for fixes: WIP limits + re-baselined quoting address the system-overload factor (fast, zero-cost). PLC framework + HIL test rig address the individual-project complexity factor (medium-term investment). Hiring more controls engineers is the last lever, not the first — adding people to an overloaded queue just creates more multitasking.

LDS Methodology — How PSI structures the project lifecycle
AFTEC subroutines — How to query PROJECT.1287 directly
PSI Machine MCP Server — Tools that expose this data to AI agents (get_project_info, get_project_lineage, etc.)

Created: 2026-05-12 Source: Analytics work session investigating quarterly business metrics (Adam Devereaux + Claude Code) Validation: All patterns confirmed against live data in Project1287List.xml and labor_detail in psi_analytics.db

PSI Knowledge Base

Explorer

project1287-data-patterns

Project1287 Data Patterns

Why this page exists

Core principles to keep in mind

1. PSI doesn’t abandon projects

2. ShipDate is a manual data-entry field, not a guaranteed signal

3. Labor data is the ground truth, AFTEC fields are interpretations of it

Pattern 1: Sister jobs (engineering shared across job numbers)

Subtle gotcha — “substantial work begun” needs an absolute floor

Pattern 2: Post-ship labor (install, training, commissioning)

Pattern 3: PSI internal projects (no ship event ever)

Pattern 4: ShipDate hygiene — use WarrentyStartDate as fallback

Pattern 5: Stale ghost records (PSI doesn’t actually abandon projects)

Pattern 6: Install-service piggyback jobs

Pattern 7: ProjectStatus values and what they mean

Pattern 8: OrderValue vs OrderTotal

Pattern 9: On-Time measured at handoff, not install

Filter chain for accurate current backlog

Common implementation pitfalls

Pitfall: SQL `MAX(date)` against `MM/DD/YY` strings

Pitfall: percentile-only thresholds against current-snapshot totals

Pitfall: assuming `ShipDate` blank = unshipped

Pitfall: counting department headcount as `COUNT(DISTINCT emp)`

Pitfall: counting “Active” status as authoritative

Authoritative analytics that apply these patterns

Closing data-coverage gaps with `redbook_coq.db` + `labor_detail`

Cross-validated finding: PLC complexity is a real driver, plus system overload

Graph View

Table of Contents

PSI Knowledge Base

Explorer

project1287-data-patterns

Project1287 Data Patterns

Why this page exists

Core principles to keep in mind

1. PSI doesn’t abandon projects

2. ShipDate is a manual data-entry field, not a guaranteed signal

3. Labor data is the ground truth, AFTEC fields are interpretations of it

Pattern 1: Sister jobs (engineering shared across job numbers)

Subtle gotcha — “substantial work begun” needs an absolute floor

Pattern 2: Post-ship labor (install, training, commissioning)

Pattern 3: PSI internal projects (no ship event ever)

Pattern 4: ShipDate hygiene — use WarrentyStartDate as fallback

Pattern 5: Stale ghost records (PSI doesn’t actually abandon projects)

Pattern 6: Install-service piggyback jobs

Pattern 7: ProjectStatus values and what they mean

Pattern 8: OrderValue vs OrderTotal

Pattern 9: On-Time measured at handoff, not install

Filter chain for accurate current backlog

Common implementation pitfalls

Pitfall: SQL MAX(date) against MM/DD/YY strings

Pitfall: percentile-only thresholds against current-snapshot totals

Pitfall: assuming ShipDate blank = unshipped

Pitfall: counting department headcount as COUNT(DISTINCT emp)

Pitfall: counting “Active” status as authoritative

Authoritative analytics that apply these patterns

Closing data-coverage gaps with redbook_coq.db + labor_detail

Cross-validated finding: PLC complexity is a real driver, plus system overload

Related concepts

Graph View

Table of Contents

Pitfall: SQL `MAX(date)` against `MM/DD/YY` strings

Pitfall: assuming `ShipDate` blank = unshipped

Pitfall: counting department headcount as `COUNT(DISTINCT emp)`

Closing data-coverage gaps with `redbook_coq.db` + `labor_detail`