FLX-ENG-RFC-006 — Week 3 · Defect Catalog, Lifecycle Demos & Baseline Report

| Field | Value |
|-------|-------|
| RFC ID | FLX-ENG-RFC-006 |
| Status | Active — Week 3 (2026-07-13 → 2026-07-19) |
| Author | Arun Singh, Senior Distinguished Engineer / Architect (Consulting) |
| Reviewers | Raja Choudhary (sign-off), Rahul, Tushar, Shrikant |
| Scope | DORA baseline read, target calibration, defect catalog production, mSORT extension, lifecycle demos |
| Parent Epic | GitHub Issue #3 — [EPIC] Week 3 · Defect Catalog, Lifecycle Demos & Baseline Report |
| Milestone | MS#3 — due 2026-07-19 |
| Priority | P0-CRITICAL |
| Related Issues | #35 (baseline read), #36 (calibrate targets), #37 (mSORT extension), #38 (idempotency), #39 (API gaps), #40 (defect catalog) |

TL;DR

Week 3 is the highest-value week of the engagement: it converts 7+ days of raw DORA data into a baseline report, identifies the specific code defects driving the metrics, and delivers live demos of QA/QC, deployment, and monitoring tooling. The defect catalog is the primary deliverable that informs the 99.99% availability roadmap.

1. Step-by-Step Tasks

Task 1 · US-3.1 — Read CodePulse Dashboard and Document DORA Baseline (GitHub Issue #35)

Priority: P1 | Effort: 1.5 hrs | Owner: Arun Singh

Prerequisite: ≥7 days of CodePulse collection complete (US-2.1 #17)

Step-by-step:

1. Open CodePulse dashboard → Overview tab
2. Set date range to cover full collection window (Week 1 CodePulse activation → today)
3. Change Lead Time:
   - Note P50, P90, P99 values in hours
   - Note sample size (number of PRs)
   - Identify any outliers (PRs with >1 week lead time — investigate)
4. Deployment Frequency:
   - Note average deploys per week
   - List deployment dates (to identify gaps)
5. Recovery Time:
   - Note MTTR if CodePulse captured incidents
   - Cross-reference with manual incident list from Week 1 sync (Task 4 of RFC-002)
6. Change Fail Rate:
   - Note % value and sample size
   - Identify which deployments had hotfixes
7. Rework Rate:
   - Run scripts/dora/rework-rate.sh (from RFC-004 Metric 5)
   - Note result
8. Fill in docs/dora-baseline/BASELINE-NUMBERS.md template (RFC-004 §3)
9. Screenshot each CodePulse metric view → save to docs/dora-baseline/screenshots/
10. Exit signal: All 5 metric values documented with sample sizes; screenshots committed
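
As a cross-check on the percentiles CodePulse reports in step 3, the lead-time figures can be recomputed from raw PR data. The following is a minimal sketch, assuming a hypothetical text file with one PR lead time in hours per line; how that file is exported is not specified in this RFC. The helper names `percentile` and `outliers` are illustrative. It uses the nearest-rank method, so small differences from CodePulse are expected if CodePulse interpolates.

```shell
# percentile P FILE: nearest-rank percentile of the numbers in FILE (one per line).
# P is an integer such as 50, 90, or 99.
percentile() {
  sort -n "$2" | awk -v p="$1" '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      rank = int((p * NR + 99) / 100)   # ceil(p/100 * NR) for integer p
      if (rank < 1) rank = 1
      print v[rank]
    }'
}

# outliers FILE: print lead times above one week (168 hours) for follow-up.
outliers() {
  awk '$1 > 168' "$1"
}
```

For example, `percentile 90 lead-times.txt` prints the P90 value and `outliers lead-times.txt` lists the PRs that step 3 asks to investigate, where `lead-times.txt` is the assumed input file.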

Task 2 · US-3.2 — Calibrate 3-Month DORA Targets with Raja (GitHub Issue #36)

Priority: P1 | Effort: 1 hr | Owner: Arun Singh (facilitates) + Raja (approves)

Step-by-step:

1. Prepare a one-page summary of current baseline (from Task 1) + DORA elite thresholds
2. Schedule 45-minute meeting with Raja
3. Meeting agenda:

00:00 – Present current baseline numbers (5 min)
05:00 – Explain DORA performance bands (Elite/High/Medium/Low) (5 min)
10:00 – Propose 3-month targets for each metric (10 min):
          - Lead Time: current P50 → target P50 (suggest 50% improvement)
          - Deploy Freq: current → target (suggest next band up)
          - MTTR: current → target (suggest < 1 hour)
          - Change Fail Rate: current → target < 15%
          - Rework Rate: current → target < 10%
20:00 – Raja adjusts targets based on team capacity (15 min)
35:00 – Final targets agreed and recorded (10 min)

4. Commit agreed targets to docs/dora-baseline/BASELINE-NUMBERS.md (target column)
5. Add targets as milestone acceptance criteria in GitHub (link to this issue)
6. Exit signal: All 5 targets agreed, documented, and signed off by Raja
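
The proposals in the meeting agenda reduce to simple arithmetic on the Task 1 numbers. The following is a minimal sketch, assuming the baseline values are entered by hand; `half_of` and `below` are hypothetical helper names, and any values passed to them are placeholders rather than measured results.

```shell
# half_of CURRENT: proposed lead-time target, i.e. a 50% improvement on CURRENT.
half_of() {
  awk -v cur="$1" 'BEGIN { printf "%.1f\n", cur * 0.5 }'
}

# below CURRENT LIMIT: succeeds when CURRENT is already under LIMIT
# (e.g. change fail rate vs. 15, rework rate vs. 10, MTTR in hours vs. 1).
below() {
  awk -v cur="$1" -v lim="$2" 'BEGIN { exit !(cur + 0 < lim + 0) }'
}
```

A call such as `below "$change_fail_rate" 15 || echo "needs a target"` shows which metrics need an improvement target and which only need to be held where they are.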

Task 3 · US-3.3 — Extend DORA Metrics and CI to mSORT Dashboard (GitHub Issue #37)

Priority: P1 | Effort: 2 hrs | Owner: Arun Singh

Step-by-step:

1. Confirm mSORT Dashboard repo access (US-1.1 covered DMS + mSORT)
2. Run the same Week 1 scans on mSORT:

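# Build check (.NET)
cd path/to/msort-dashboard && dotnet build -c Release

# If the repo is Python instead (the ** glob needs bash globstar):
# shopt -s globstar && python -m py_compile **/*.py

# SAST: match the ruleset to the repo language (p/csharp for .NET, p/python for Python)
semgrep --config "p/python" src/

# CVE scan of the repo working tree
trivy fs .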

3. Compare mSORT baseline vs. DMS baseline:
   - Which repo has higher change fail rate?
   - Which has more SAST findings?
   - Does mSORT have CI at all?
4. Extend CodePulse configuration to include mSORT (Settings → Add Repository → mSORT Dashboard)
5. Configure the same 10 CI gates for mSORT (create .github/workflows/build.yml in mSORT repo)
6. Document differences between DMS and mSORT in docs/dora-baseline/msort-vs-dms-comparison.md
7. Exit signal: CodePulse collecting from mSORT; mSORT scan report committed
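
The question "Does mSORT have CI at all?" in step 3 can be answered mechanically. The following is a minimal sketch, assuming CI means GitHub Actions workflows under `.github/workflows/` (the location step 5 uses). `has_ci` is a hypothetical helper name, and any other CI system would go undetected.

```shell
# has_ci REPO_DIR: succeeds if REPO_DIR contains at least one GitHub Actions workflow file.
has_ci() {
  find "$1/.github/workflows" -maxdepth 1 -type f \
    \( -name '*.yml' -o -name '*.yaml' \) 2>/dev/null | grep -q .
}
```

Running `has_ci path/to/msort-dashboard && echo "CI present" || echo "no CI found"` against both repos gives one line each for the comparison document.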

Task 4 · US-3.4 — Idempotency and Race-Condition Hunt (GitHub Issue #38)

Priority: P0 | Effort: 3 hrs | Owner: Arun Singh

Context: The "ghost-parcel" incident was diagnosed as likely a race condition or idempotency gap during parcel transfer. This task systematically hunts for all such gaps across DMS endpoints.

Step-by-step investigation:

Phase A — Identify all state-modifying endpoints:

# Find all POST, PUT, PATCH, and DELETE endpoints in InfeedController and the other controllers
grep -n "HttpPost\|HttpPut\|HttpPatch\|HttpDelete" \
  src/distribution-management-server-layered/Core/Controller/*.cs
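
To carry the Phase A results into the Phase C audit table, the grep output can be turned into table rows. The following is a sketch under stated assumptions: `endpoint_rows` is a hypothetical helper, the first column holds `file:line` rather than the route (routes still have to be read from the attribute arguments by hand), and each `?` marks a cell to fill in during Phase B.

```shell
# endpoint_rows DIR: emit one Markdown table row per state-modifying action attribute
# found in the C# files under DIR.
endpoint_rows() {
  grep -rnE --include='*.cs' '\[Http(Post|Put|Patch|Delete)' "$1" |
    sed -E 's/^([^:]+):([0-9]+):.*\[Http(Post|Put|Patch|Delete).*/| \1:\2 | \3 | ? | ? | ? | ? |/'
}
```

The output can be pasted under the header row of the audit table, so that no endpoint found in Phase A is skipped in Phase B.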

Phase B — For each state-modifying endpoint, check:

1. Idempotency key: Does the endpoint accept/require an idempotency key header? (Idempotency-Key: <uuid>)
2. Duplicate detection: Is there a unique constraint on the database operation? Check EF migrations for HasAlternateKey() or IsUnique()
3. Transaction scope: Is the entire operation wrapped in a DB transaction? Check for using var transaction = await _context.Database.BeginTransactionAsync()
4. Race condition pattern: Is there any "check-then-act" pattern without locking?

// DANGEROUS: two concurrent requests can both pass the check before either one inserts
if (!await _context.Parcels.AnyAsync(p => p.TrackingId == id))
    await _context.Parcels.AddAsync(new Parcel { TrackingId = id });

// SAFE: insert-if-absent enforced by a unique constraint (ON CONFLICT is PostgreSQL/SQLite syntax)
await _context.Database.ExecuteSqlRawAsync(
    "INSERT INTO parcels ... ON CONFLICT (tracking_id) DO NOTHING");

5. Scan-zone-clear gap: For scanner operations, is there a race between a scanner clearing a zone and DMS reading that zone?
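
The check-then-act pattern from item 4 can be pre-screened before reading each controller by hand. The following is a rough heuristic sketch, not a substitute for review. It assumes the existence check and the insert use EF Core's `Any`/`AnyAsync` and `Add`/`AddAsync` and sit within three lines of each other; `check_then_act` is a hypothetical helper name. Expect both false positives and misses.

```shell
# check_then_act FILE: print "FILE:LINE" for each Add/AddAsync call that follows
# an existence check (Any/AnyAsync) within 3 lines.
check_then_act() {
  awk -v f="$1" '
    /\.Any(Async)?\(/ { seen = NR }
    /\.Add(Async)?\(/ { if (seen && NR - seen <= 3) print f ":" NR }
  ' "$1"
}
```

Each hit is a candidate for the Race Condition Risk column and still needs a manual check for a surrounding transaction or unique constraint.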

Phase C — Document findings:

# Idempotency & Race Condition Audit
Date: YYYY-MM-DD

## Endpoints Reviewed
| Endpoint | Method | Idempotency Key | DB Unique Constraint | Transaction | Race Condition Risk |
|----------|--------|-----------------|---------------------|-------------|---------------------|
| /api/infeed | POST | No | No (tracking_id) | Yes | HIGH |
| ... | | | | | |

## P0 Findings
[List specific endpoints with HIGH risk]

## Recommendations
[For each P0 finding: specific code change recommendation]

Exit signal: Audit table committed to docs/developer-guide/idempotency-audit.md; all P0 findings have GitHub issues

Task 5 · US-3.5 — API Validation, Exception Handling, and Auth Gap Inventory (GitHub Issue #39)

Priority: P0 | Effort: 2 hrs | Owner: Arun Singh

Step-by-step audit:

A — API Validation gaps:

# Find all controller actions — do they validate input?
grep -rn "\[ApiController\]\|ModelState\|Validate\|FluentValidation" \
  src/distribution-management-server-layered/Core/Controller/

Checklist for each endpoint:

- [ ] Input model has [Required], [Range], [StringLength] annotations
- [ ] Returns 400 Bad Request on validation failure (not 500)
- [ ] Does not accept null for required fields
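
In ASP.NET Core, the `[ApiController]` attribute is what makes a failed model validation return 400 automatically, so controllers without it are a natural place to start the checklist. The following is a minimal sketch; `missing_apicontroller` is a hypothetical helper name, and a controller that inherits the attribute from a base class would be listed as a false positive.

```shell
# missing_apicontroller DIR: list C# files under DIR that never mention [ApiController].
missing_apicontroller() {
  grep -rLF --include='*.cs' '[ApiController]' "$1" | sort
}
```

Pointing it at `src/distribution-management-server-layered/Core/Controller/` gives the list of files to inspect first for the "Returns 400 Bad Request" item.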

B — Exception handling gaps:

# Find try/catch coverage at controller boundary
grep -rn "try\|catch\|Exception\|ProblemDetails" \
  src/distribution-management-server-layered/Core/Controller/

Checklist:

- [ ] Global exception middleware registered in Program.cs
- [ ] Each controller action has at minimum a catch for Exception returning 500 + problem details
- [ ] No empty catch blocks: catch (Exception) { } (swallows errors silently)
- [ ] Exceptions logged with correlation ID
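
The broad grep above matches every `try` and `catch`, so the empty-catch item is easier to check with a narrower pattern. The following is a sketch that only catches the single-line form shown in the checklist; empty catch blocks spread over several lines are not detected, and `empty_catches` is a hypothetical helper name.

```shell
# empty_catches DIR: report single-line empty catch blocks such as `catch (Exception) { }`
# in the C# files under DIR, as file:line:content.
empty_catches() {
  grep -rnE --include='*.cs' \
    'catch[[:space:]]*(\([^)]*\))?[[:space:]]*\{[[:space:]]*\}' "$1"
}
```

Every hit belongs in the Exception Handling Gaps table, because the error is swallowed without a log entry or response.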

C — Auth gaps:

# Find all endpoints — are they protected?
grep -rn "\[Authorize\|\[AllowAnonymous\|\[Route\|\[Http" \
  src/distribution-management-server-layered/Core/Controller/

Checklist:

- [ ] All endpoints require authentication except explicitly public ones
- [ ] JWT validation configured correctly (issuer, audience, lifetime)
- [ ] No hardcoded API keys or bypass tokens
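
For the first checklist item, a per-file summary is quicker to scan than raw grep output. The following is a minimal sketch that labels each controller file by the attributes it contains; `auth_status` is a hypothetical helper name. A file reported as UNMARKED is not necessarily open, since a global authorization filter or fallback policy in Program.cs would still protect it, so confirm there before logging a gap.

```shell
# auth_status DIR: print "<file> <status>" for each C# file under DIR, where status is
# authorize, anonymous-only, or UNMARKED (neither attribute appears in the file).
auth_status() {
  find "$1" -type f -name '*.cs' | sort | while IFS= read -r file; do
    if grep -qF '[Authorize' "$file"; then
      echo "$file authorize"
    elif grep -qF '[AllowAnonymous' "$file"; then
      echo "$file anonymous-only"
    else
      echo "$file UNMARKED"
    fi
  done
}
```

Files labelled `authorize` still need a per-action look, because a single `[AllowAnonymous]` action inside an authorized controller is reported the same way.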

Document findings:

# API/Auth/Exception Gap Inventory

## Validation Gaps
| Endpoint | Missing Validation | Severity |
|----------|-------------------|----------|

## Exception Handling Gaps
| Location | Issue | Severity |
|----------|-------|----------|

## Auth Gaps
| Endpoint | Issue | Severity |
|----------|-------|----------|

Exit signal: Inventory committed; all Critical gaps have P0 GitHub issues opened

Task 6 · US-3.6 — Produce P0/P1/P2 Defect Catalog (GitHub Issue #40)

Priority: P1 | Effort: 2 hrs | Owner: Arun Singh

This is the primary engagement deliverable. It aggregates the findings from Tasks 4 and 5 and from the CI gate scans into a structured catalog mapped to the 99.99% availability commitment.
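
To make the 99.99% commitment concrete when rating availability impact, it helps to express it as a downtime budget. The following is a small worked calculation; `downtime_minutes` is a hypothetical helper name.

```shell
# downtime_minutes AVAILABILITY_PCT DAYS: minutes of downtime allowed in a window of DAYS days.
downtime_minutes() {
  awk -v a="$1" -v d="$2" 'BEGIN { printf "%.2f\n", (100 - a) / 100 * d * 24 * 60 }'
}
```

`downtime_minutes 99.99 365` prints 52.56 and `downtime_minutes 99.99 30` prints 4.32. The whole budget is therefore about 52 minutes per year, or a little over 4 minutes in a 30-day month, which is the yardstick for calling a defect class High risk.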

Step-by-step:

1. Collect all findings from:
   - Week 1 scan report (build errors, SAST, CVE, secret scan)
   - Task 4 idempotency audit
   - Task 5 API/auth/exception inventory
   - Week 2 pair session calibration notes
2. Classify each finding by severity:
   - P0: Will cause production outage or data corruption. Fix in current sprint.
   - P1: Degraded availability or security risk. Fix in next sprint.
   - P2: Code quality / maintainability. Fix in next 3 months.
3. Map each P0/P1 to availability impact:
   - "This race condition on the infeed endpoint could cause ghost parcels (as seen in prior incident)"
   - "CVE-2024-XXXX in PackageName could allow RCE if exploited"
4. Create catalog document:

# DMS P0/P1/P2 Defect Catalog
Date: YYYY-MM-DD
Author: Arun Singh

## P0 — Production Risk (Fix immediately)
| ID | Finding | Location | Availability Impact | Fix Recommendation | Effort |
|----|---------|---------|--------------------|--------------------|--------|
| P0-001 | Race condition on infeed endpoint | InfeedController.cs:L47 | Ghost parcels | Add idempotency key + DB unique constraint | M |

## P1 — Security/Reliability Risk
[table]

## P2 — Quality Debt
[table]

## Availability Impact Summary
| Defect Class | Count | Risk to 99.99% Availability |
|--------------|-------|------------------------------|
| Race conditions | N | High |
| Missing validation | N | Medium |
| Unhandled exceptions | N | Medium |
| CVE vulnerabilities | N | High |

5. Open GitHub issues for each P0 defect (or confirm existing issue covers it)
6. Exit signal: Defect catalog committed to docs/developer-guide/defect-catalog-YYYY-MM-DD.md; shared with Raja for review
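
The Count column of the Availability Impact Summary can be derived rather than tallied by hand. The following is a minimal sketch, assuming the collected findings are first listed in a hypothetical CSV file with the columns `severity,defect_class,location` and no commas inside fields; `count_by` is an illustrative helper name.

```shell
# count_by FILE COLUMN: count rows per distinct value of the given 1-based CSV column,
# printed as "value,count" and sorted by value.
count_by() {
  awk -F, -v col="$2" '{ n[$col]++ } END { for (k in n) print k "," n[k] }' "$1" | sort
}
```

`count_by findings.csv 1` gives the totals per severity and `count_by findings.csv 2` gives the totals per defect class, where `findings.csv` is the assumed input file.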

2. Dependencies

| Dependency | Required for |
|------------|--------------|
| CodePulse data (≥7 days) | Task 1 |
| Week 1 scan results (#13, #16) | Task 6 |
| Idempotency audit (Task 4) | Task 6 |
| API gap inventory (Task 5) | Task 6 |
| Raja availability (approx 45 min) | Task 2 |

3. Success Criteria

- DORA baseline numbers (all 5 metrics) documented with sample sizes
- 3-month targets agreed with Raja and committed
- mSORT Dashboard onboarded to CodePulse and scanned
- Idempotency audit table committed
- API/auth/exception inventory committed
- Defect catalog committed with P0/P1/P2 classification and availability impact mapping
- All P0 defects have GitHub issues opened (or existing issues confirmed)
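
Several of these criteria come down to a file being committed at a path named earlier in this RFC, which can be verified in one pass. The following is a minimal sketch; `check_deliverables` is a hypothetical helper name, and it covers only the paths this document states explicitly. The API/auth/exception inventory is left out because no path is given for it.

```shell
# check_deliverables REPO_ROOT: report which Week 3 artifacts named in this RFC exist.
# Returns non-zero if any of them is missing.
check_deliverables() {
  root="$1"; missing=0
  for path in \
    docs/dora-baseline/BASELINE-NUMBERS.md \
    docs/dora-baseline/msort-vs-dms-comparison.md \
    docs/developer-guide/idempotency-audit.md
  do
    if [ -f "$root/$path" ]; then
      echo "OK      $path"
    else
      echo "MISSING $path"; missing=1
    fi
  done
  # The defect catalog file name carries its date, so match it by pattern.
  if ls "$root"/docs/developer-guide/defect-catalog-*.md >/dev/null 2>&1; then
    echo "OK      docs/developer-guide/defect-catalog-*.md"
  else
    echo "MISSING docs/developer-guide/defect-catalog-*.md"; missing=1
  fi
  return "$missing"
}
```

Running `check_deliverables .` from the repository root before requesting sign-off shows at a glance which artifacts are still outstanding.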