FLX-ENG-RFC-001: Branching, Build & Release Engineering

Field	Value
RFC ID	FLX-ENG-RFC-001
Status	WIP — target sign-off EoD 22 May 2026
Author	Arun Singh, Senior Distinguished Engineer / Architect (Consulting)
Reviewers	Raja Choudhary (Founder), Rahul (Eng Lead), Kanchan, Tushar
Scope	mSort application — version control, build, release, multi-tenant rollout, PoC plan, tech stack
Supersedes	Ad-hoc dev/prod branching; Meesho-specific default behaviour in baseline

TL;DR: The decision

Maintain exactly one product codebase on a trunk-based branching model. Handle client-specific behaviour through tenant-aware configuration and feature flags, not through client-specific branches or builds. For offline / semi-connected client sites, ship the same artifact with a Flexli-signed deployment manifest that activates the correct feature set per customer.

Branches manage software lifecycle; tenancy manages customer variation; the deployment manifest carries entitlement.

1. Problem

Flexli is moving from a single-customer baseline (currently carrying Meesho-specific defaults) to a multi-client product line. Without a deliberate engineering process, the natural drift is toward per-customer code forks, per-customer branches, and per-customer builds. That path looks fast for the first three customers and becomes the single largest source of incidents, merge debt, and release risk by customer ten.

Current constraints

The baseline codebase carries Meesho-specific defaults — not scalable to other clients
Only prod and dev branches with short-lived feature branches — no formal branching model
Gaps in publishing, bundling, and deployment — no CI/CD, no container image, no artifact signing
Client variation is in business logic itself, not only configuration values

Design principles (each future proposal that violates one needs an explicit waiver)

Branches model lifecycle, tenants model customers. Branching expresses change delivery; tenancy expresses business segmentation. Mixing them creates merge debt.
One codebase, one artifact, multiple activations. The pipeline produces a single versioned artifact; customer differences are activated at deployment time through governed configuration.
Control lives with Flexli, not the customer. Feature entitlement is generated, signed, and shipped by Flexli. Customer-editable plain settings are not a control surface; operability and rollback are first-class concerns.

2. Branching Model

Trunk-based development with short-lived feature branches and on-demand release branches.

Branches

Branch	Purpose	Lifetime	Merges into
`main`	Single source of truth. Always releasable. Protected; no direct push.	Permanent	—
`feature/<id>-<slug>`	One unit of work, scoped to a user story or task ID.	Hours to a few days	`main` via PR + CI
`release/<version>`	Stabilisation window for an upcoming UAT or production cut. Only fixes allowed.	Days to two weeks	`main` (and tag)
`hotfix/<id>`	Urgent production fix on the currently released tag.	Hours	`main` + release tag

Rules

Direct pushes to main are blocked; every change lands through a PR with green CI and at least one reviewer (two for release/* and hotfix/*)
Feature branches rebase on main daily and merge within five working days
Tags follow semantic versioning; release candidates (`vX.Y.0-rc1`) are tagged from main, and final release tags are cut from a release/* branch
We do not create: customer/<name> branches, long-lived develop branches, or environment branches
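
The naming rules above lend themselves to an automated check, for example in a CI step or a server-side hook. The sketch below is illustrative Python and not part of the mSort codebase; the character classes chosen for `<id>`, `<slug>` and `<version>` are assumptions, since the RFC does not define them.

```python
import re

# Allowed branch shapes, taken from the branch table in this section.
# The exact character classes are assumptions made for illustration.
ALLOWED_BRANCH_PATTERNS = [
    re.compile(r"^main$"),
    re.compile(r"^feature/[A-Za-z0-9]+-[a-z0-9]+(?:-[a-z0-9]+)*$"),
    re.compile(r"^release/\d+\.\d+(?:\.\d+)?$"),
    re.compile(r"^hotfix/[A-Za-z0-9-]+$"),
]


def is_allowed_branch(name: str) -> bool:
    """Return True if the branch name matches one of the allowed shapes."""
    return any(pattern.match(name) for pattern in ALLOWED_BRANCH_PATTERNS)


if __name__ == "__main__":
    for candidate in ["main", "feature/123-fix-login", "customer/meesho", "develop"]:
        print(candidate, is_allowed_branch(candidate))
```

Because the check is an allow-list, customer, develop and environment branches are rejected without having to enumerate them.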

3. Multi-tenant Architecture

Client-specific behaviour is expressed through four mechanisms, in this order of preference:

Mechanism	When to use	Carrier
Tenant configuration	Values differ per client: endpoints, limits, regex, SLAs, branding	Signed deployment manifest
Feature flags	Same capability, enable/disable per client or per environment	Feature registry in manifest
Strategy / plugin interfaces	Business logic genuinely differs (e.g., routing, validation, notification provider)	Interface + per-client implementation, selected by config
Deployment topology	Isolation required: regulatory, on-prem, dedicated SLO, noisy-neighbour risk	IaC module variant, same artifact

What we do not do: edit source per client; maintain client forks; rebuild the artifact per client.
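
As a rough illustration of the third mechanism (strategy interface plus per-client implementation, selected by config), the following Python sketch picks an implementation from a configuration key. The class names, the `routing_strategy` key and the return values are hypothetical; the real mSort strategies are .NET types and are not reproduced here.

```python
from typing import Callable, Dict, Protocol


class RoutingStrategy(Protocol):
    """Hypothetical strategy interface; stands in for a real business seam."""

    def route(self, parcel_id: str) -> str: ...


class DefaultRouting:
    def route(self, parcel_id: str) -> str:
        return f"default-bin:{parcel_id}"


class SingleScanRouting:
    def route(self, parcel_id: str) -> str:
        return f"single-scan-bin:{parcel_id}"


# One shared table of implementations; every tenant runs the same code.
STRATEGIES: Dict[str, Callable[[], RoutingStrategy]] = {
    "default": DefaultRouting,
    "single_scan": SingleScanRouting,
}


def select_strategy(config: dict) -> RoutingStrategy:
    """Pick the implementation named in the tenant config (assumed key)."""
    key = config.get("routing_strategy", "default")
    try:
        return STRATEGIES[key]()
    except KeyError:
        raise ValueError(f"unknown routing strategy: {key!r}") from None


if __name__ == "__main__":
    strategy = select_strategy({"routing_strategy": "single_scan"})
    print(strategy.route("p1"))
```

All implementations ship in the same artifact; only the configuration value differs per tenant, which is what keeps the build count at one.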

Feature registry and manifest

All flags live in a single feature registry in code, each entry declaring key, owner, default, and dependencies. The per-client manifest references those keys. A manifest is a small signed JSON document, for example:

```json
{
  "tenantId": "meesho-blr-wh-01",
  "site": "BLR_WAREHOUSE",
  "version": "1.2.3",
  "features": {
    "awb_data_sync": true,
    "manifest_close": true,
    "manual_sort": false
  },
  "config": {
    "api_base_url": "https://api.meesho.com",
    "data_sync_interval_seconds": 300
  },
  "validityWindow": {
    "notBefore": "2026-06-01T00:00:00Z",
    "notAfter": "2026-12-31T23:59:59Z"
  },
  "signature": "<cosign-detached-signature>"
}
```

At startup the application verifies the manifest signature and validity window, and refuses to start if the manifest has been tampered with or has expired.
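
To make the registry-and-manifest split in this section more concrete, here is a minimal Python sketch of resolving a manifest's `features` block against a registry. The feature keys come from the example manifest; the owners, defaults and the single dependency are hypothetical, as is the rule that unknown keys are rejected.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Feature:
    key: str
    owner: str
    default: bool
    depends_on: Tuple[str, ...] = ()


# Owners, defaults and the dependency below are hypothetical examples.
REGISTRY: Dict[str, Feature] = {
    f.key: f
    for f in (
        Feature("awb_data_sync", owner="integration", default=False),
        Feature("manifest_close", owner="operations", default=False,
                depends_on=("awb_data_sync",)),
        Feature("manual_sort", owner="sorting", default=False),
    )
}


def resolve_features(manifest_features: Dict[str, bool]) -> Dict[str, bool]:
    """Merge manifest values over registry defaults and check dependencies."""
    unknown = set(manifest_features) - set(REGISTRY)
    if unknown:
        raise ValueError(f"unknown feature keys: {sorted(unknown)}")
    resolved = {
        key: manifest_features.get(key, feature.default)
        for key, feature in REGISTRY.items()
    }
    for key, feature in REGISTRY.items():
        if resolved[key]:
            missing = [dep for dep in feature.depends_on if not resolved[dep]]
            if missing:
                raise ValueError(f"{key} requires {missing}")
    return resolved


if __name__ == "__main__":
    print(resolve_features({"awb_data_sync": True, "manifest_close": True}))
```

Rejecting unknown keys means a typo in a manifest fails loudly at load time instead of silently leaving a feature at its default.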

4. Build, Artifact, and Deployment

Pipeline stages

Build → Test → Scan → Package → Promote → Deploy → Verify

Stage	What happens
Build	`dotnet build` — compile
Test	`dotnet test` — unit + integration
Scan	Static analysis + dependency vulnerability scan
Package	One immutable OCI image + CycloneDX SBOM + Cosign signature
Promote	Same artifact moves QA → Staging → UAT → Production. No rebuild on promotion.
Deploy	Orchestrator validates manifest signature and binds it to the artifact
Verify	Smoke tests + health checks + canary window
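
The "no rebuild on promotion" rule in the table above can be enforced mechanically by tracking the image digest per environment. The Python sketch below is one possible shape for such a guard; the environment names mirror the Promote row, while the class name, the in-memory ledger and the placeholder digests are assumptions for illustration.

```python
from typing import Dict

# Promotion order taken from the Promote stage; lower-case names are assumed.
ENVIRONMENTS = ["qa", "staging", "uat", "production"]


class PromotionLedger:
    """Records which image digest was promoted to which environment."""

    def __init__(self) -> None:
        self._deployed: Dict[str, str] = {}

    def promote(self, env: str, digest: str) -> None:
        index = ENVIRONMENTS.index(env)  # raises ValueError for unknown env
        if index > 0:
            previous = ENVIRONMENTS[index - 1]
            # The exact same digest must already sit in the previous stage.
            if self._deployed.get(previous) != digest:
                raise RuntimeError(
                    f"{digest} has not passed {previous}; "
                    f"refusing to promote to {env}"
                )
        self._deployed[env] = digest


if __name__ == "__main__":
    ledger = PromotionLedger()
    ledger.promote("qa", "sha256:aaa")
    ledger.promote("staging", "sha256:aaa")
    print("promoted sha256:aaa to staging")
```

Comparing digests rather than version tags is deliberate: a tag can be moved to a rebuilt image, whereas a digest identifies the exact bytes that were tested.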

Online vs offline client deployment

Topology	Manifest delivery
Flexli cloud / managed	Pulled from Flexli config service at startup and on refresh; hot-reload or rolling restart
Client on-prem, connected	Pulled through site agent; cached locally; rolling restart per site
Client on-prem, air-gapped	Signed manifest bundled with the artifact at deployment time; new bundle on the next site visit

Critical: Even for air-gapped sites, the manifest is generated and signed by Flexli. The client does not edit toggles by hand. A missing, tampered, or expired manifest puts the app into a documented safe state and emits an alert.
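
The three failure cases named above (missing, tampered, expired) can be sketched as a single startup check. The Python below is a simplified illustration only: it uses an HMAC-SHA256 over canonicalised JSON as a stand-in for Cosign verification, and it assumes the signature covers every field except `signature` itself. The function names and the returned state strings are hypothetical.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone
from typing import Optional


def canonical_payload(manifest: dict) -> bytes:
    """Serialise the manifest deterministically, excluding the signature."""
    body = {k: v for k, v in manifest.items() if k != "signature"}
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign(manifest: dict, key: bytes) -> str:
    """HMAC stand-in for the real signing step (not Cosign)."""
    return hmac.new(key, canonical_payload(manifest), hashlib.sha256).hexdigest()


def parse_utc(value: str) -> datetime:
    # Accept the trailing "Z" used in the example manifest.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def check_manifest(manifest: Optional[dict], key: bytes, now: datetime) -> str:
    """Return "OK" or a safe-state reason; the caller raises the alert."""
    if manifest is None:
        return "SAFE_STATE:missing"
    expected = sign(manifest, key).encode("ascii")
    provided = str(manifest.get("signature", "")).encode("utf-8")
    if not hmac.compare_digest(expected, provided):
        return "SAFE_STATE:tampered"
    window = manifest["validityWindow"]
    if not (parse_utc(window["notBefore"]) <= now <= parse_utc(window["notAfter"])):
        return "SAFE_STATE:outside_validity_window"
    return "OK"


if __name__ == "__main__":
    key = b"demo-key"
    manifest = {
        "tenantId": "meesho-blr-wh-01",
        "validityWindow": {
            "notBefore": "2026-06-01T00:00:00Z",
            "notAfter": "2026-12-31T23:59:59Z",
        },
    }
    manifest["signature"] = sign(manifest, key)
    print(check_manifest(manifest, key, datetime(2026, 7, 1, tzinfo=timezone.utc)))
```

The signature is checked before the validity window is read, so a manifest whose dates were edited by hand is reported as tampered rather than being trusted long enough to be parsed.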

Rollback

Code rollback: redeploy the previous tagged artifact (retained at least three releases back)
Config rollback: redeploy the previous signed manifest (manifests are versioned and immutable)
Every release has a documented rollback criterion (error rate, latency, business KPI) and a named on-call owner
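
A documented rollback criterion is easier to act on when it is expressed as data the on-call owner can evaluate. The following Python sketch shows one possible encoding of the three signal types listed above; the field names, the choice of p95 latency and the "parcels per hour" KPI are assumptions, not values from this RFC.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RollbackCriterion:
    max_error_rate: float        # fraction of failed requests, e.g. 0.02
    max_p95_latency_ms: float    # assumed latency measure
    min_parcels_per_hour: float  # assumed business KPI


def rollback_reasons(
    criterion: RollbackCriterion,
    error_rate: float,
    p95_latency_ms: float,
    parcels_per_hour: float,
) -> List[str]:
    """Return the breached signals; an empty list means no rollback."""
    reasons = []
    if error_rate > criterion.max_error_rate:
        reasons.append("error_rate")
    if p95_latency_ms > criterion.max_p95_latency_ms:
        reasons.append("latency")
    if parcels_per_hour < criterion.min_parcels_per_hour:
        reasons.append("business_kpi")
    return reasons


if __name__ == "__main__":
    criterion = RollbackCriterion(0.02, 500.0, 1000.0)
    print(rollback_reasons(criterion, 0.05, 300.0, 1200.0))
```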

5. Technology Stack

Layer	Recommended choice
Source control & CI	Git (GitHub); GitHub Actions
Build & test	Existing .NET / `dotnet test`; add static-analysis and dependency-scan steps
Artifact format & registry	OCI container image + SBOM (CycloneDX); GHCR
Signing	Cosign (Sigstore) with keys in cloud KMS or HSM
Tenant manifest	JSON Schema v1 + detached Cosign signature; bundled with artifact for offline sites
Feature flags	OpenFeature SDK with backend adapter selectable per tenant
Observability	OpenTelemetry → Grafana stack (Loki, Tempo, Prometheus / Mimir)
Secrets & keys	HashiCorp Vault or cloud KMS (AWS KMS / Azure Key Vault)
Deployment	Docker Compose (single-box sites) or k3s (multi-service sites)
IaC	Terraform (cloud) + Ansible (on-prem)

6. Release Engineering Process

Two cadences

Continuous to Flexli-managed environments: every merge to main that passes CI is deployed automatically to dev → staging → Flexli-internal pilot site. If canary metrics hold for 30 minutes, the build is promoted further.
Scheduled to client sites: fixed train tied to the sprint. One MINOR release per sprint, PATCH hotfixes only when needed.
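
The "canary metrics hold for 30 minutes" condition in the continuous cadence can be read as: the metric was observed across the whole window and never crossed its threshold. The Python sketch below encodes that reading; the error-rate metric, the 1% threshold and the coverage rule are assumptions for illustration.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def canary_holds(
    samples: List[Tuple[datetime, float]],
    now: datetime,
    window: timedelta = timedelta(minutes=30),
    max_error_rate: float = 0.01,  # assumed threshold
) -> bool:
    """True if the window is fully observed and every sample stays healthy."""
    start = now - window
    in_window = [(t, v) for t, v in samples if start <= t <= now]
    if not in_window:
        return False
    # Require at least one sample at or before the window start, so that a
    # canary deployed ten minutes ago cannot pass a thirty-minute gate.
    covered = any(t <= start for t, _ in samples)
    return covered and all(v <= max_error_rate for _, v in in_window)


if __name__ == "__main__":
    now = datetime(2026, 5, 22, 12, 0)
    healthy = [(now - timedelta(minutes=m), 0.0) for m in range(35, -1, -5)]
    print(canary_holds(healthy, now))
```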

Sprint-aligned release train (2-week sprint)

Day	What happens	Owner
1–8	Normal development. PRs merge to main continuously. Each merge auto-deploys to dev → staging → Flexli pilot. Canary metrics are watched.	All engineers
9	Release cut: tag `vX.Y.0-rc1` from main. QA owns regression sweep on the rc tag in staging.	Release DRI
10	Soft freeze on rc tag: only release-blocker fixes (cherry-picked to `release/X.Y`). New feature work continues on main for next sprint.	Release DRI + QA
11	Promote to first pilot client (5–10% of fleet). Canary watch for 24 h on per-tenant SLIs.	Release DRI + SRE
12	If canary green: promote to remaining client sites. Publish release notes and manifest diff per client. Update audit log.	Release DRI
13–14	Retrospective on the release: DORA metrics for the sprint, any rollback or freeze events, runbook gaps, next-sprint corrections.	Release DRI + Eng Lead
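
The tag arithmetic implied by the train (one MINOR bump per sprint, PATCH for hotfixes, an `-rc1` suffix at the day-9 cut) is small enough to sketch. The helper names below are hypothetical, and the sketch assumes tags of the form `vX.Y.Z` with no other pre-release or build metadata.

```python
import re

SEMVER_TAG = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def next_tag(current: str, kind: str) -> str:
    """Bump a vX.Y.Z tag; kind is "minor" (sprint release) or "patch" (hotfix)."""
    match = SEMVER_TAG.match(current)
    if match is None:
        raise ValueError(f"not a release tag: {current!r}")
    major, minor, patch = (int(part) for part in match.groups())
    if kind == "minor":
        return f"v{major}.{minor + 1}.0"
    if kind == "patch":
        return f"v{major}.{minor}.{patch + 1}"
    raise ValueError(f"unsupported bump kind: {kind!r}")


def rc_tag(release: str, number: int) -> str:
    """Release-candidate tag for a planned release, e.g. v1.3.0-rc1."""
    return f"{release}-rc{number}"


if __name__ == "__main__":
    planned = next_tag("v1.2.3", "minor")
    print(rc_tag(planned, 1))
```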

DORA + one Flexli metric

Deploy frequency — many merges/day to Flexli envs; ≥ 1 client release per sprint
Lead time — commit → first production tenant; target < 2 weeks
Change-failure rate — < 15%
MTTR — < 1 h cloud, < 1 day on-prem
Tenant-attribution rate — share of incidents diagnosed via per-tenant telemetry without manual log diving
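
Two of the metrics above, change-failure rate and MTTR, can be derived from a plain list of deployment records. The Python sketch below shows the arithmetic; the record shape is hypothetical, and it assumes for simplicity that a failure starts at deployment time, which overstates MTTR when an incident surfaces later.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Deployment:
    deployed_at: datetime
    failed: bool = False
    restored_at: Optional[datetime] = None


def change_failure_rate(deployments: List[Deployment]) -> float:
    """Share of deployments that caused a failure (0.0 for an empty list)."""
    if not deployments:
        return 0.0
    return sum(1 for d in deployments if d.failed) / len(deployments)


def mean_time_to_restore(deployments: List[Deployment]) -> Optional[timedelta]:
    """Average restore duration over failed deployments that were restored."""
    durations = [
        d.restored_at - d.deployed_at
        for d in deployments
        if d.failed and d.restored_at is not None
    ]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)


if __name__ == "__main__":
    t0 = datetime(2026, 5, 1, 9, 0)
    history = [
        Deployment(t0),
        Deployment(t0 + timedelta(days=1), failed=True,
                   restored_at=t0 + timedelta(days=1, minutes=30)),
        Deployment(t0 + timedelta(days=2)),
        Deployment(t0 + timedelta(days=3)),
    ]
    print(change_failure_rate(history), mean_time_to_restore(history))
```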

What we deliberately don't do

No release branches kept alive between sprints (release/X.Y is archived once shipped)
No manual QA sign-off on every PR — automation is the gate; manual QA owns the rc and canary
No release committee — a rotating Release DRI per sprint owns the call

7. PoC: Distribution Management Server

What already exists (§10.1 from RFC)

A working strategy seam. Program.cs already registers IStrategy and IDropOffStrategy implementations for Flexli, Myntra, MyntraSingleScan, CsvClient, LiveClientServer, X1G3, WTM. Client variation is already an interface, not a branch.
A clean DI container. ASP.NET Core DI is the natural place to bind a tenant manifest to the right strategy at startup.
Configuration is already file-driven. appsettings.json and configuration.json exist; today they are flat and client-editable. The PoC converts them into a Flexli-signed manifest validated at startup.

What is missing (§10.2 from RFC)

No CI, no Dockerfile for production, no signed artifact
No tenant manifest (strategy selection is driven by configuration values the client can edit on disk)
No per-tenant observability (App.Metrics is present but emits only global counters; nothing is tagged with tenantId)
Framework is .NET 6 (out of support since November 2024)

PoC week plan

Week	Work	Exit signal
1	Port project to .NET 8 LTS; resolve breaking packages; tests green locally	`dotnet build` + `dotnet test` pass
1–2	Branch protection on `main`; GitHub Actions workflow (restore → test → scan)	Green CI on a sample PR
2–3	Dockerfile (multi-stage); push to GHCR; semver tagging from CI	Image runs locally with `docker run`
3	SBOM (CycloneDX) + Cosign keyless signing of image	`cosign verify` succeeds against the pushed image
4	`tenant.manifest.json` schema; signed-manifest loader in `Program.cs`	Tampered manifest is rejected at startup
4–5	Wire OpenFeature; bind `IStrategy` / `IDropOffStrategy` selection to manifest keys	Switch Flexli↔Myntra by manifest change, no rebuild
5–6	OpenTelemetry SDK; `tenantId` tag on metrics, logs, traces; local Grafana + Loki + Tempo via compose	Per-tenant dashboard shows the right tenant's traffic
6–7	`docker-compose.yml` for offline-style install (app + Postgres + manifest); rollback drill against previous tag	Rollback completes within documented RTO
7–8	Second pilot tenant: generate a second signed manifest; run both tenants on the same image	Two tenants live on identical artifact, divergent manifests
8–9	Buffer for issues, runbook, demo polish, internal review	Demo to leadership; G1–G5 gates closed
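
The weeks 5–6 exit signal depends on every measurement carrying a `tenantId`. As a toy illustration of the difference between a global counter and a tenant-tagged one, the Python sketch below keys counts by metric name and tenant. It is a stand-in for attribute-tagged OpenTelemetry instruments, not an example of that SDK; the class, the metric name and the second tenant ID are hypothetical.

```python
from collections import Counter
from typing import Dict, Tuple


class TenantCounters:
    """In-memory counters keyed by (metric name, tenantId)."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()

    def increment(self, name: str, tenant_id: str, amount: int = 1) -> None:
        self._counts[(name, tenant_id)] += amount

    def by_tenant(self, name: str) -> Dict[str, int]:
        """Per-tenant breakdown for one metric."""
        return {
            tenant: count
            for (metric, tenant), count in self._counts.items()
            if metric == name
        }

    def total(self, name: str) -> int:
        """The global view is still available as a sum over tenants."""
        return sum(self.by_tenant(name).values())


if __name__ == "__main__":
    counters = TenantCounters()
    counters.increment("parcels_sorted", "meesho-blr-wh-01", 3)
    counters.increment("parcels_sorted", "tenant-b", 2)  # hypothetical tenant
    print(counters.by_tenant("parcels_sorted"), counters.total("parcels_sorted"))
```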

Decision gates (G1–G5)

Gate	Condition
G1	CI green on `main`; direct push blocked
G2	Signed artifact published with SBOM
G3	Signed manifest validated at startup; tampered or expired rejected
G4	Rollback drill passes within documented RTO
G5	Two tenants on one image, with per-tenant telemetry

Kill criterion: if porting to .NET 8 or extracting tenant variation from configuration.json requires touching more than ~30% of services or controllers, pause and re-scope the manifest model before continuing.

8. Definition of Success

The RFC counts as successfully implemented when all of the following hold simultaneously:

One main branch in active use; no per-customer branches exist or are required
Every release is a tagged, signed, immutable artifact paired with a signed tenant manifest
Onboarding a new tenant is a manifest change, not a code change
Per-tenant telemetry queryable by tenantId across logs, metrics, and traces
Rollback drills run quarterly and pass within the documented RTO
Two tenants live on identical artifacts as evidence the model scales without forks