FLX-ENG-RFC-004 — DORA Metrics Framework · 5 Metrics × Baseline × Threshold × Cadence

Field	Value
RFC ID	FLX-ENG-RFC-004
Status	Active — Weeks 2–3
Author	Arun Singh, Senior Distinguished Engineer / Architect (Consulting)
Reviewers	Raja Choudhary (sign-off), Rahul (Eng Lead)
Scope	Defines all 5 DORA metrics: collection method, baseline, 3-month target, reporting cadence
Parent Epic	GitHub Issue #4 — [EPIC] DORA Metrics Framework
Priority	P0-CRITICAL
Related Issues	#30 (Lead Time), #31 (Deploy Freq), #32 (Recovery Time), #33 (Fail Rate), #34 (Rework Rate)

TL;DR

This RFC is the single source of truth for all DORA measurements in this engagement. It defines what each metric means, how it is collected (passive CodePulse collection plus manual supplement), what the baseline looks like, the elite-performer target, and the reporting cadence. Each metric maps to one GitHub issue for tracking.

1. Why DORA

DORA (DevOps Research & Assessment) is the industry standard for measuring software delivery performance. More than 10 years of research across 33,000+ respondents shows that teams that perform well on all 4 DORA metrics deliver better business outcomes: they are 2× more likely to exceed profitability goals and have a 50% lower change fail rate.

For Flexli, the 5-metric suite (4 standard DORA + 1 custom Rework Rate) provides:

- An objective before/after comparison for the engagement's value
- A shared language between engineering (Rahul, Tushar) and business (Raja)
- An input to the defect catalog (high change fail rate → find root cause)

2. Metric Definitions & Collection

Metric 1 · Change Lead Time (GitHub Issue #30)

Priority: P1 | Effort: 0.5 hr active

Property	Value
Definition	Time from first commit on a feature branch to that commit running in production
Measurement point	PR merge timestamp → production deployment timestamp (time between first commit and PR merge is not captured)
Data source	CodePulse passive collection (GitHub PR events + deployment events)
Elite threshold	< 1 hour
High performer	1 hour – 1 day
Medium performer	1 day – 1 month
Low performer	> 1 month
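The band lookup in the table above can be sketched as a small shell function, which may help when filling in the "DORA Band" column of the baseline report. The function name `lead_time_band` is hypothetical, and two boundary choices are assumptions: one month is taken as 30 days, and any value slower than 1 day and up to 1 month is reported as Medium.

```shell
# Sketch only: lead_time_band is a hypothetical helper, not part of CodePulse.
# Assumptions: 1 month = 30 days; anything above 1 day and up to 1 month is Medium.

# lead_time_band HOURS: map a lead time in hours to a performance band.
lead_time_band() {
  awk -v h="$1" 'BEGIN {
    if (h < 1)              print "Elite"
    else if (h <= 24)       print "High"
    else if (h <= 24 * 30)  print "Medium"
    else                    print "Low"
  }'
}
```

For example, `lead_time_band 6` prints `High`.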

Collection steps:

1. CodePulse automatically captures PR merge and deployment events
2. Lead time = deployment_timestamp - pr_merge_timestamp per PR
3. Week 3 action (US-3.1 #35): read the P50, P90, and P99 values from the CodePulse dashboard
4. Document in the baseline report as: Current P50 lead time: X hours
5. Target (set in US-3.2 #36): Raja to approve the 3-month target (suggested: current P50 → current P50 × 0.5)
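To sanity-check the dashboard figures, the lead-time arithmetic and the percentile read-out can be reproduced by hand. The sketch below assumes a hypothetical export with one `pr,merge_epoch,deploy_epoch` line per PR (timestamps in epoch seconds); this RFC does not specify a CodePulse export format, so the input layout and the function names are illustrative only. Percentiles use the nearest-rank method, which may differ slightly from whatever interpolation CodePulse applies.

```shell
# Sketch only: the input layout (pr,merge_epoch,deploy_epoch) is an assumption,
# not a documented CodePulse export format.

# lead_times_hours: read "pr,merge_epoch,deploy_epoch" lines on stdin and
# print one lead time in hours per PR (deploy time minus merge time).
lead_times_hours() {
  awk -F, 'NF == 3 && $3 >= $2 { printf "%.2f\n", ($3 - $2) / 3600 }'
}

# percentile P: read one number per line on stdin and print the P-th
# percentile (P is an integer) using nearest rank: rank = ceil(P/100 * N).
percentile() {
  sort -n | awk -v p="$1" '
    { v[NR] = $1 }
    END {
      if (NR == 0) { print "n/a"; exit }
      rank = int((p * NR + 99) / 100)   # integer ceiling of p*NR/100
      if (rank < 1) rank = 1
      print v[rank]
    }'
}
```

A typical call would be `lead_times_hours < export.csv | percentile 50`, where `export.csv` is a placeholder file name.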

Baseline capture (US-5.1 step-by-step):

1. Open CodePulse → Metrics → Change Lead Time
2. Set date range: last 30 days (or all available data if < 30 days)
3. Note: P50 (median), P90, P99 values in hours
4. Note: number of PRs in sample
5. Screenshot dashboard → save to docs/dora-baseline/lead-time-baseline.png
6. Commit numbers to docs/dora-baseline/BASELINE-NUMBERS.md

Metric 2 · Deployment Frequency (GitHub Issue #31)

Priority: P1 | Effort: 0.5 hr active

Property	Value
Definition	How often the team deploys to production
Measurement point	Merge to `main` branch (or tag push if semver tagging is in use)
Data source	CodePulse passive collection
Elite threshold	Multiple per day
High performer	Once per day – once per week
Medium performer	Once per week – once per month
Low performer	< once per month

Collection steps:

1. CodePulse automatically counts deployments per time period
2. Week 3 action: read the "deployments per week" average from CodePulse
3. Document: Current deployment frequency: X deploys per week (averaged over N weeks)
4. Target: Raja approves; the suggested target is a 50% increase over 3 months
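The weekly average, and the gap check used later to spot multi-week pauses, can be cross-checked with a short script. This is a sketch under assumptions: the input is one deployment timestamp per line in epoch seconds, and both function names are hypothetical.

```shell
# Sketch only: input is assumed to be one deployment time (epoch seconds) per line.

# deploys_per_week WINDOW_DAYS: count deployment lines on stdin and average
# them over the collection window, expressed as deployments per week.
deploys_per_week() {
  awk -v days="$1" '
    NF { n++ }
    END {
      if (days <= 0) { print "n/a"; exit }
      printf "%.2f\n", n / (days / 7)
    }'
}

# longest_gap_days: print the longest gap between consecutive deployments,
# in days (0.0 when there are fewer than two deployments).
longest_gap_days() {
  sort -n | awk '
    NR > 1 { gap = ($1 - prev) / 86400; if (gap > max) max = gap }
    { prev = $1 }
    END { printf "%.1f\n", max + 0 }'
}
```

Dividing by the window length rather than by the number of weeks that contain a deployment keeps quiet weeks in the average, which is what the "averaged over N weeks" wording suggests.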

Baseline capture (US-5.2 step-by-step):

1. Open CodePulse → Metrics → Deployment Frequency
2. Set date range: last 30 days
3. Note: average deployments per week
4. Note: deployment dates (to identify any multi-week gaps = deployment risk)
5. Screenshot → docs/dora-baseline/deploy-freq-baseline.png
6. Commit to BASELINE-NUMBERS.md

Metric 3 · Failed Deployment Recovery Time (GitHub Issue #32)

Priority: P1 | Effort: 1 hr active (some manual research needed)

Property	Value
Definition	Mean time to restore service after a production incident or failed deployment
Measurement point	Incident open timestamp → incident closed (service restored) timestamp
Data source	GitHub issues with `incident` label + CodePulse hotfix detection
Elite threshold	< 1 hour
High performer	< 1 day
Medium performer	1 day – 1 week
Low performer	> 1 week

Important: This metric requires linking production incidents to code fixes. If no incident tracking system exists yet, this metric must be collected partially manually.

Collection steps:

1. Ask Rahul: "What production incidents occurred in the last 3 months? How were they resolved?"
2. For each incident: record incident_start, incident_end, hotfix_PR_number
3. CodePulse will detect PRs from hotfix/* branches; confirm these match the manual list
4. If no formal incident log exists: the baseline for this metric will be self-reported → note as "self-reported, not yet instrumented"
5. Note the raw times and compute the mean: MTTR = sum(incident_durations) / count(incidents)
6. If there are zero incidents in the period: record "0 incidents in baseline window", not MTTR = 0
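The MTTR formula and the zero-incident rule can be sketched as follows. The input layout (`incident_start_epoch,incident_end_epoch`, one incident per line) and the function name `mttr_hours` are assumptions made for illustration; the manual incident list may well be kept in a different form.

```shell
# Sketch only: assumes one "incident_start_epoch,incident_end_epoch" line per incident.

# mttr_hours: print the incident count and the mean time to restore in hours.
# With no incidents it reports that fact instead of printing MTTR=0.
mttr_hours() {
  awk -F, '
    NF == 2 && $2 >= $1 { total += $2 - $1; n++ }
    END {
      if (n == 0) { print "0 incidents in baseline window"; exit }
      printf "N=%d MTTR=%.2f hours\n", n, total / n / 3600
    }'
}
```

Reporting the count alongside the mean keeps the sample size visible, since a mean over one or two incidents is easy to over-read.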

Baseline capture (US-5.3 step-by-step):

1. Open CodePulse → Metrics → Mean Time to Restore
2. Note: automated hotfix detection results
3. Cross-reference with manual incident list from Rahul
4. Compute mean time if CodePulse value is incomplete
5. Document: "N incidents in baseline window; MTTR = X hours (Y% from CodePulse, Z% manual)"
6. Commit to BASELINE-NUMBERS.md

Metric 4 · Change Fail Rate (GitHub Issue #33)

Priority: P1 | Effort: 0.5 hr active

Property	Value
Definition	% of deployments that result in a rollback or hotfix within 24 hours
Measurement point	Any `hotfix/*` PR merged within 24 hours of a production deployment
Data source	CodePulse passive collection (branch name pattern detection)
Elite threshold	< 5%
High performer	5% – 10%
Medium performer	> 10% – 25%
Low performer	> 25%

Collection steps:

1. CodePulse automatically identifies hotfix/* branches and links them to preceding deployments
2. This requires the hotfix/ branch naming convention; confirm with Rahul during the Week 1 sync
3. If the hotfix/ convention is not used: manually identify emergency fixes from git log
4. Change Fail Rate = count(deployments_followed_by_hotfix_within_24h) / count(total_deployments)
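When the rate has to be computed by hand, the formula can be sketched as below. It assumes two hypothetical input files, one with deployment times and one with hotfix merge times, each holding one epoch-seconds timestamp per line; the function name `change_fail_rate` is also illustrative. A deployment counts as failed if at least one hotfix was merged in the 24 hours after it, and each deployment is counted at most once.

```shell
# Sketch only: both input files are assumed to hold one epoch-seconds
# timestamp per line; neither is a documented CodePulse export.

# change_fail_rate DEPLOYS_FILE HOTFIX_FILE
change_fail_rate() {
  awk -v hotfix_file="$2" '
    BEGIN {
      # Load all hotfix merge times up front.
      while ((getline line < hotfix_file) > 0)
        if (line != "") hotfix[++h] = line + 0
    }
    NF {
      total++
      for (i = 1; i <= h; i++) {
        delta = hotfix[i] - $1
        # Failed if a hotfix landed within 24 hours after this deployment.
        if (delta > 0 && delta <= 86400) { failed++; break }
      }
    }
    END {
      if (total == 0) { print "n/a"; exit }
      printf "%d/%d = %.1f%%\n", failed, total, 100 * failed / total
    }' "$1"
}
```

The output keeps the numerator and denominator next to the percentage so the sample size can be copied straight into the baseline report.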

Baseline capture (US-5.4 step-by-step):

1. Open CodePulse → Metrics → Change Fail Rate
2. Note: % value and sample size (N deployments in window)
3. If hotfix branch naming is non-standard: git log --oneline main | grep -iE "fix|revert|rollback"
4. Document rate and whether it was CodePulse-computed or manually calculated
5. Commit to BASELINE-NUMBERS.md

Metric 5 · Deployment Rework Rate (GitHub Issue #34)

Priority: P2 | Effort: 1 hr active

Property	Value
Definition	% of deployments where a file changed in deployment N is also changed in deployment N+1 (within 7 days)
Measurement point	File-level overlap between consecutive deployments
Data source	Hotfix branch detection (CodePulse) + manual git analysis
Elite threshold	< 10%
Target	< 10% at 3-month mark

Custom metric: This is a Flexli-specific fifth metric beyond the standard 4 DORA metrics. It measures rework: how often a just-deployed change needs to be fixed immediately.

Collection steps:

# Manual computation via git log
git log --oneline --merges main | head -20  # list the last 20 merges to main

# Replace the <...> placeholders with real commit SHAs before running.
# Files changed by deployment N (diff against the deployment before it):
git diff --name-only <deploy_prev_sha> <deploy_N_sha> | sort > /tmp/deploy-N-files.txt
# Files changed by deployment N+1 (diff against deployment N):
git diff --name-only <deploy_N_sha> <deploy_N1_sha> | sort > /tmp/deploy-N1-files.txt

# Find overlap: files touched by both deployments (inputs are already sorted)
comm -12 /tmp/deploy-N-files.txt /tmp/deploy-N1-files.txt
# Rework rate for deployment N = overlapping files / total files in deployment N

Baseline capture:

1. Identify last 10 deployment pairs from git log
2. Compute rework rate for each pair using the script above
3. Average across all pairs
4. Document: "Rework rate = X% (averaged over N deployment pairs)"
5. Commit to BASELINE-NUMBERS.md + commit script to scripts/dora/rework-rate.sh
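A possible starting point for scripts/dora/rework-rate.sh is sketched below. The function names `deploy_files` and `rework_rate` are hypothetical, and the sketch does not enforce the 7-day window from the definition; pairs further apart than 7 days would need to be filtered out beforehand, for example by checking commit dates.

```shell
# Sketch only: function names are illustrative, and the 7-day window from the
# metric definition is NOT checked here.

# deploy_files PREV_SHA SHA: list the files changed by the deployment at SHA,
# relative to the previous deployment at PREV_SHA.
deploy_files() {
  git diff --name-only "$1" "$2"
}

# rework_rate FILES_N FILES_N1: each argument is a file with one path per line.
# Prints the percentage of deployment N's files that deployment N+1 changed again.
rework_rate() {
  awk -v next_list="$2" '
    BEGIN { while ((getline path < next_list) > 0) again[path] = 1 }
    NF && !seen[$0]++ { total++; if ($0 in again) overlap++ }
    END {
      if (total == 0) { print "n/a"; exit }
      printf "%.1f\n", 100 * overlap / total
    }' "$1"
}

# Example for one deployment pair (SHAs are placeholders to fill in):
#   deploy_files "$prev_sha" "$n_sha"  > /tmp/deploy-N-files.txt
#   deploy_files "$n_sha"    "$n1_sha" > /tmp/deploy-N1-files.txt
#   rework_rate /tmp/deploy-N-files.txt /tmp/deploy-N1-files.txt
```

Running this over each of the 10 deployment pairs and averaging the printed values gives the figure for the baseline report.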

3. Baseline Report Template

The output of all 5 metrics feeds into docs/dora-baseline/BASELINE-NUMBERS.md:

# DMS DORA Baseline Numbers
Date: YYYY-MM-DD
Collection window: YYYY-MM-DD → YYYY-MM-DD (N days)
Tool: CodePulse SaaS + manual supplement

| Metric | Baseline Value | Sample Size | DORA Band | 3-Month Target |
|--------|---------------|-------------|-----------|----------------|
| Change Lead Time (P50) | X hours | N PRs | Low/Medium/High/Elite | Y hours |
| Deployment Frequency | X/week | N deployments | | Y/week |
| Recovery Time (MTTR) | X hours | N incidents | | Y hours |
| Change Fail Rate | X% | N deployments | | Y% |
| Rework Rate | X% | N deploy pairs | | Y% |

## Notes
- Recovery Time: N incidents in baseline window (self-reported / CodePulse)
- Rework Rate: computed manually using scripts/dora/rework-rate.sh

4. Dependencies

Dependency	Required for
CodePulse active (US-1.6 #14)	All 5 metrics
Week 2 collection window (US-2.1 #17)	Data availability in Week 3
Hotfix branch naming convention confirmed	Metrics 4 and 5
Raja sign-off on 3-month targets (US-3.2 #36)	Final report

5. Success Criteria

- All 5 DORA metrics have baseline values documented in BASELINE-NUMBERS.md
- Each metric value includes its sample size (not just a number)
- DORA performance band identified for each metric (Elite/High/Medium/Low)
- 3-month targets agreed with Raja and committed
- Collection methodology documented (CodePulse vs. manual) for each metric