FLX-ENG-RFC-006 — Week 3 · Defect Catalog, Lifecycle Demos & Baseline Report
| Field | Value |
|---|---|
| RFC ID | FLX-ENG-RFC-006 |
| Status | Active — Week 3 (2026-07-13 → 2026-07-19) |
| Author | Arun Singh, Senior Distinguished Engineer / Architect (Consulting) |
| Reviewers | Raja Choudhary (sign-off), Rahul, Tushar, Shrikant |
| Scope | DORA baseline read, target calibration, defect catalog production, mSORT extension, lifecycle demos |
| Parent Epic | GitHub Issue #3 — [EPIC] Week 3 · Defect Catalog, Lifecycle Demos & Baseline Report |
| Milestone | MS#3 — due 2026-07-19 |
| Priority | P0-CRITICAL |
| Related Issues | #35 (baseline read), #36 (calibrate targets), #37 (mSORT extension), #38 (idempotency), #39 (API gaps), #40 (defect catalog) |
TL;DR
Week 3 is the highest-value week of the engagement: it converts 7+ days of raw DORA data into a baseline report, identifies the specific code defects driving the metrics, and delivers live demos of QA/QC, deployment, and monitoring tooling. The defect catalog is the primary deliverable that informs the 99.99% availability roadmap.
1. Step-by-Step Tasks
Task 1 · US-3.1 — Read CodePulse Dashboard and Document DORA Baseline (GitHub Issue #35)
Priority: P1 | Effort: 1.5 hrs | Owner: Arun Singh
Prerequisite: ≥7 days of CodePulse collection complete (US-2.1 #17)
Step-by-step:
1. Open CodePulse dashboard → Overview tab
2. Set date range to cover the full collection window (Week 1 CodePulse activation → today)
3. Change Lead Time:
   - Note P50, P90, P99 values in hours
   - Note sample size (number of PRs)
   - Identify any outliers (PRs with >1 week lead time) and investigate them
4. Deployment Frequency:
   - Note average deploys per week
   - List deployment dates (to identify gaps)
5. Recovery Time:
   - Note MTTR if CodePulse captured incidents
   - Cross-reference with manual incident list from Week 1 sync (Task 4 of RFC-002)
6. Change Fail Rate:
   - Note % value and sample size
   - Identify which deployments had hotfixes
7. Rework Rate:
   - Run scripts/dora/rework-rate.sh (from RFC-004 Metric 5)
   - Note result
8. Fill in docs/dora-baseline/BASELINE-NUMBERS.md template (RFC-004 §3)
9. Screenshot each CodePulse metric view → save to docs/dora-baseline/screenshots/
10. Exit signal: All 5 metric values documented with sample sizes; screenshots committed
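If per-PR lead times can be exported from CodePulse, the P50/P90/P99 values noted in step 3 can be cross-checked locally. The following is a minimal sketch, assuming a hypothetical export with one lead time in hours per line. The `percentile` and `outliers` helpers are illustrative names, not part of CodePulse or the repo scripts. It uses the nearest-rank definition, which may differ slightly from whatever interpolation CodePulse applies.

```shell
# percentile P: nearest-rank percentile of numeric values (one per line) on stdin.
percentile() {
  sort -n | awk -v p="$1" '
    { v[NR] = $1 }
    END {
      if (NR == 0) { print "n/a"; exit }
      r = int((p * NR + 99) / 100)   # ceil(p/100 * N) for integer p
      if (r < 1) r = 1
      print v[r]
    }'
}

# outliers: print values above one week (168 hours), the threshold named in step 3.
outliers() {
  awk '$1 > 168'
}
```

Example use, with `lead_times_hours.txt` as the assumed export file: `percentile 90 < lead_times_hours.txt` and `outliers < lead_times_hours.txt`. Small sample sizes make P99 collapse onto the maximum value, which is one reason the sample size is recorded next to each percentile.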
Task 2 · US-3.2 — Calibrate 3-Month DORA Targets with Raja (GitHub Issue #36)
Priority: P1 | Effort: 1 hr | Owner: Arun Singh (facilitates) + Raja (approves)
Step-by-step:
1. Prepare a one-page summary of the current baseline (from Task 1) + DORA elite thresholds
2. Schedule a 45-minute meeting with Raja
3. Meeting agenda:
00:00 – Present current baseline numbers (5 min)
05:00 – Explain DORA performance bands (Elite/High/Medium/Low) (5 min)
10:00 – Propose 3-month targets for each metric (10 min):
- Lead Time: current P50 → target P50 (suggest 50% improvement)
- Deploy Freq: current → target (suggest next band up)
- MTTR: current → target (suggest < 1 hour)
- Change Fail Rate: current → target < 15%
- Rework Rate: current → target < 10%
20:00 – Raja adjusts targets based on team capacity (15 min)
35:00 – Final targets agreed and recorded (10 min)
4. Record the agreed targets in docs/dora-baseline/BASELINE-NUMBERS.md (target column)
5. Add targets as milestone acceptance criteria in GitHub (link to this issue)
6. Exit signal: All 5 targets agreed, documented, and signed off by Raja
Task 3 · US-3.3 — Extend DORA Metrics and CI to mSORT Dashboard (GitHub Issue #37)
Priority: P1 | Effort: 2 hrs | Owner: Arun Singh
Step-by-step:
1. Confirm mSORT Dashboard repo access (US-1.1 covered DMS + mSORT)
2. Run the same Week 1 scans on mSORT:
# Build check
cd path/to/msort-dashboard && dotnet build -c Release
# If Python: python -m py_compile **/*.py
# SAST
semgrep --config "p/python" src/
# CVE
trivy fs .
.github/workflows/build.yml in mSORT repo)
6. Document differences between DMS and mSORT in docs/dora-baseline/msort-vs-dms-comparison.md
7. Exit signal: CodePulse collecting from mSORT; mSORT scan report committed
Task 4 · US-3.4 — Idempotency and Race-Condition Hunt (GitHub Issue #38)
Priority: P0 | Effort: 3 hrs | Owner: Arun Singh
Context: The "ghost-parcel" incident was diagnosed as likely a race condition or idempotency gap during parcel transfer. This task systematically hunts for all such gaps across DMS endpoints.
Step-by-step investigation:
Phase A — Identify all state-modifying endpoints:
# Find all POST, PUT, PATCH, and DELETE endpoints in InfeedController and others
grep -n "HttpPost\|HttpPut\|HttpPatch\|HttpDelete" \
src/distribution-management-server-layered/Core/Controller/*.cs
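The grep output from Phase A can be turned directly into starter rows for the audit table in Phase C. The sketch below is one possible way to do that, not an existing repo script: `list_state_endpoints` is a hypothetical helper, and the `TODO` cells are placeholders to be filled in by hand during Phase B. It reports the file and line of each attribute rather than the route, since routes cannot be resolved reliably with grep alone.

```shell
# list_state_endpoints DIR: emit one audit-table row per state-modifying action.
# Matches [HttpPost...], [HttpPut...], [HttpPatch...], [HttpDelete...] in *.cs files.
list_state_endpoints() {
  grep -rnoE --include='*.cs' '\[Http(Post|Put|Patch|Delete)' "$1" \
    | sort \
    | awk -F: '{
        method = toupper(substr($3, 6))   # strip the leading "[Http"
        printf "| %s:%s | %s | TODO | TODO | TODO | TODO |\n", $1, $2, method
      }'
}
```

Example use: `list_state_endpoints src/distribution-management-server-layered/Core/Controller`. Paths containing a colon would confuse the field split, which is acceptable for a one-off audit but worth knowing.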
Phase B — For each state-modifying endpoint, check:
1. Idempotency key: Does the endpoint accept/require an idempotency key header? (Idempotency-Key: <uuid>)
2. Duplicate detection: Is there a unique constraint on the database operation? Check EF migrations for HasAlternateKey() or IsUnique()
3. Transaction scope: Is the entire operation wrapped in a DB transaction? Check for using var transaction = await _context.Database.BeginTransactionAsync()
4. Race condition pattern: Is there any "check-then-act" pattern without locking?
// DANGEROUS: race condition between check and insert
if (!await _context.Parcels.AnyAsync(p => p.TrackingId == id))
await _context.Parcels.AddAsync(new Parcel { TrackingId = id });
// SAFE: upsert with unique constraint
await _context.Database.ExecuteSqlRawAsync(
"INSERT INTO parcels ... ON CONFLICT (tracking_id) DO NOTHING");
Phase C — Document findings:
# Idempotency & Race Condition Audit
Date: YYYY-MM-DD
## Endpoints Reviewed
| Endpoint | Method | Idempotency Key | DB Unique Constraint | Transaction | Race Condition Risk |
|----------|--------|-----------------|---------------------|-------------|---------------------|
| POST /api/infeed | POST | No | No (tracking_id) | Yes | HIGH |
| ... | | | | | |
## P0 Findings
[List specific endpoints with HIGH risk]
## Recommendations
[For each P0 finding: specific code change recommendation]
Exit signal: Audit table committed to docs/developer-guide/idempotency-audit.md; all P0 findings have GitHub issues
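The check-then-act pattern from Phase B (an existence check followed shortly by a write) can be pre-screened mechanically before manual review. The following is a heuristic sketch only: `find_check_then_act` is a hypothetical helper, the method names it looks for (`AnyAsync`, `FirstOrDefaultAsync`, `Add`, `AddAsync`, `Update`) are assumptions about how the DMS code is written, and both false positives and misses are expected. Every hit still needs a human to confirm whether a unique constraint or lock makes it safe.

```shell
# find_check_then_act DIR [WINDOW]: flag writes that follow an existence check
# within WINDOW lines (default 5) in any *.cs file under DIR.
find_check_then_act() {
  local dir="$1" window="${2:-5}"
  find "$dir" -type f -name '*.cs' -exec awk -v w="$window" '
    FNR == 1 { pending = 0 }                      # reset at the start of each file
    /(AnyAsync|FirstOrDefaultAsync)\(/ { check = FNR; pending = w; next }
    pending > 0 {
      if ($0 ~ /\.(AddAsync|Add|Update)\(/) {
        printf "%s:%d: write within %d lines of existence check at line %d\n", FILENAME, FNR, w, check
        pending = 0
      } else {
        pending--
      }
    }' {} +
}
```

Example use: `find_check_then_act src/distribution-management-server-layered 8`. Hits can be copied into the Race Condition Risk column of the audit table as candidates.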
Task 5 · US-3.5 — API Validation, Exception Handling, and Auth Gap Inventory (GitHub Issue #39)
Priority: P0 | Effort: 2 hrs | Owner: Arun Singh
Step-by-step audit:
A — API Validation gaps:
# Find all controller actions — do they validate input?
grep -rn "\[ApiController\]\|ModelState\|Validate\|FluentValidation" \
src/distribution-management-server-layered/Core/Controller/
- [ ] [Required], [Range], [StringLength] annotations
- [ ] Returns 400 Bad Request on validation failure (not 500)
- [ ] Does not accept null for required fields
B — Exception handling gaps:
# Find try/catch coverage at controller boundary
grep -rn "try\|catch\|Exception\|ProblemDetails" \
src/distribution-management-server-layered/Core/Controller/
Program.cs
- [ ] Each controller action has at minimum a catch for Exception returning 500 + problem details
- [ ] No empty catch blocks: catch (Exception) { } (swallows errors silently)
- [ ] Exceptions logged with correlation ID
C — Auth gaps:
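# Find all endpoints and check whether they are protected.
# Patterns omit the closing bracket so attributes with arguments, such as
# [Route("api/...")] or [Authorize(Roles = "...")], are matched too.
grep -rn "\[Authorize\|\[AllowAnonymous\|\[Route\|\[Http" \
src/distribution-management-server-layered/Core/Controller/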
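A quick first pass for the auth inventory is to list controller files that contain no `[Authorize` attribute at all. This is a rough sketch, with `controllers_without_authorize` as a hypothetical helper name. It assumes authorization is applied through attributes; if the application configures a global or fallback authorization policy instead, files listed here may still be protected and need to be checked against that configuration.

```shell
# controllers_without_authorize DIR: list *.cs files under DIR that never
# mention an [Authorize...] attribute. grep -L prints files with no match.
controllers_without_authorize() {
  grep -rL --include='*.cs' '\[Authorize' "$1" | sort
}
```

Example use: `controllers_without_authorize src/distribution-management-server-layered/Core/Controller/`. Files that do contain `[Authorize` still need a per-action look for `[AllowAnonymous]` overrides.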
Document findings:
# API/Auth/Exception Gap Inventory
## Validation Gaps
| Endpoint | Missing Validation | Severity |
|----------|-------------------|----------|
## Exception Handling Gaps
| Location | Issue | Severity |
|----------|-------|----------|
## Auth Gaps
| Endpoint | Issue | Severity |
|----------|-------|----------|
Exit signal: Inventory committed; all Critical gaps have P0 GitHub issues opened
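For the "no empty catch blocks" item in checklist B, single-line empty catches can be located with one grep. The sketch below uses a hypothetical helper, `find_empty_catches`, and deliberately covers only the single-line form shown in the checklist; catch blocks whose braces span several lines will be missed and still need a manual or analyzer-based pass.

```shell
# find_empty_catches DIR: report single-line empty catch blocks in *.cs files,
# e.g. "catch { }" or "catch (Exception) { }".
find_empty_catches() {
  grep -rnE --include='*.cs' \
    'catch[[:space:]]*(\([^)]*\))?[[:space:]]*\{[[:space:]]*\}' "$1"
}
```

Example use: `find_empty_catches src/distribution-management-server-layered`. Each hit is a candidate row for the Exception Handling Gaps table.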
Task 6 · US-3.6 — Produce P0/P1/P2 Defect Catalog (GitHub Issue #40)
Priority: P1 | Effort: 2 hrs | Owner: Arun Singh
This is the primary engagement deliverable. It aggregates findings from Tasks 4, 5, and the CI gate scans into a structured catalog mapped to the 99.99% availability commitment.
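To make the 99.99% commitment concrete when writing the Availability Impact column, it helps to express it as a downtime budget. The arithmetic is (1 − availability) × period; the sketch below wraps it in a hypothetical helper, `downtime_budget_minutes`, which is illustrative and not an existing repo script.

```shell
# downtime_budget_minutes AVAILABILITY_PERCENT DAYS:
# allowed downtime in minutes over DAYS at the given availability.
downtime_budget_minutes() {
  LC_ALL=C awk -v a="$1" -v d="$2" \
    'BEGIN { printf "%.2f\n", (1 - a / 100) * d * 24 * 60 }'
}
```

`downtime_budget_minutes 99.99 365` gives 52.56 minutes per year, and `downtime_budget_minutes 99.99 30` gives 4.32 minutes per 30-day month. A single P0 incident that takes longer than a few minutes to recover from can therefore consume a full month's budget.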
Step-by-step:
1. Collect all findings from:
   - Week 1 scan report (build errors, SAST, CVE, secret scan)
   - Task 4 idempotency audit
   - Task 5 API/auth/exception inventory
   - Week 2 pair session calibration notes
2. Classify each finding by severity:
   - P0: Will cause production outage or data corruption. Fix in current sprint.
   - P1: Degraded availability or security risk. Fix in next sprint.
   - P2: Code quality / maintainability. Fix in next 3 months.
3. Map each P0/P1 to availability impact:
   - "This race condition on the infeed endpoint could cause ghost parcels (as seen in prior incident)"
   - "CVE-2024-XXXX in PackageName could allow RCE if exploited"
4. Create catalog document:
# DMS P0/P1/P2 Defect Catalog
Date: YYYY-MM-DD
Author: Arun Singh
## P0 — Production Risk (Fix immediately)
| ID | Finding | Location | Availability Impact | Fix Recommendation | Effort |
|----|---------|---------|--------------------|--------------------|--------|
| P0-001 | Race condition on infeed endpoint | InfeedController.cs:L47 | Ghost parcels | Add idempotency key + DB unique constraint | M |
## P1 — Security/Reliability Risk
[table]
## P2 — Quality Debt
[table]
## Availability Impact Summary
| Defect Class | Count | Risk to 99.99% Availability |
|--------------|-------|------------------------------|
| Race conditions | N | High |
| Missing validation | N | Medium |
| Unhandled exceptions | N | Medium |
| CVE vulnerabilities | N | High |
5. Exit signal: Catalog committed to docs/developer-guide/defect-catalog-YYYY-MM-DD.md; shared with Raja for review
2. Dependencies
| Dependency | Required for |
|---|---|
| CodePulse data (≥7 days) | Task 1 |
| Week 1 scan results (#13, #16) | Task 6 |
| Idempotency audit (Task 4) | Task 6 |
| API gap inventory (Task 5) | Task 6 |
| Raja availability (approx 45 min) | Task 2 |
3. Success Criteria
- DORA baseline numbers (all 5 metrics) documented with sample sizes
- 3-month targets agreed with Raja and committed
- mSORT Dashboard onboarded to CodePulse and scanned
- Idempotency audit table committed
- API/auth/exception inventory committed
- Defect catalog committed with P0/P1/P2 classification and availability impact mapping
- All P0 defects have GitHub issues opened (or existing issues confirmed)
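The file-based criteria above can be checked mechanically before requesting sign-off. This is a minimal sketch, assuming it is run against a checkout of the repo that holds the deliverables; `check_deliverables` is a hypothetical helper. It covers only the paths this RFC names explicitly, so the API/auth/exception inventory (whose path is not specified here) and the GitHub-side criteria still need a manual check.

```shell
# check_deliverables [ROOT]: report which Week 3 deliverable files exist under ROOT.
# Returns non-zero if any are missing.
check_deliverables() {
  local root="${1:-.}" missing=0 f
  for f in \
    docs/dora-baseline/BASELINE-NUMBERS.md \
    docs/dora-baseline/msort-vs-dms-comparison.md \
    docs/developer-guide/idempotency-audit.md
  do
    if [ -f "$root/$f" ]; then
      echo "ok      $f"
    else
      echo "MISSING $f"
      missing=$((missing + 1))
    fi
  done
  # The catalog file name carries its date, so match it with a glob.
  if ls "$root"/docs/developer-guide/defect-catalog-*.md > /dev/null 2>&1; then
    echo "ok      docs/developer-guide/defect-catalog-*.md"
  else
    echo "MISSING docs/developer-guide/defect-catalog-*.md"
    missing=$((missing + 1))
  fi
  [ "$missing" -eq 0 ]
}
```

Example use from the repo root: `check_deliverables . || echo "Week 3 deliverables incomplete"`. File presence says nothing about content quality, so this complements the review with Raja rather than replacing it.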