How to Write an Engineering RFC: Getting Buy-In for Technical Decisions

At some point in every developer's career, they have a good technical idea that gets shot down — not because the idea was wrong, but because it was communicated wrong. The decision was made too quickly, the wrong people were in the room, the trade-offs weren't articulated, and the team defaulted to the status quo.

The RFC (Request for Comments) process is the engineering practice that prevents this. It separates "thinking about a change" from "building the change" — creating a structured document that explains the problem, proposes a solution, acknowledges alternatives, and explicitly invites criticism before any code is written.

Rust and Next.js (RFCs), Python (PEPs), Kubernetes (KEPs), Google (design docs), and many other engineering organizations run some version of this process for significant decisions. This post teaches you how to write an RFC that builds genuine consensus.

When to Write an RFC

Not every decision needs an RFC. Write one when the change:

Affects multiple teams or services — others need to weigh in.
Is architecturally significant — adding a new database, changing the auth system.
Has meaningful alternatives — a decision others might reasonably disagree with.
Will be difficult or costly to reverse — migrations, protocol changes, API contracts.
Involves significant trade-offs — performance vs. consistency, simplicity vs. flexibility.

Don't write an RFC for:

Adding a new endpoint to an existing API.
Choosing between two npm packages with similar features.
A bug fix, even a complex one.
An internal implementation detail no one else depends on.

RFC Structure

A good RFC has seven sections:

1. Summary (2–3 sentences)

## Summary

We propose migrating our session management from JWTs stored in localStorage 
to HttpOnly cookies backed by a server-side session store (Redis). This keeps 
session tokens out of JavaScript's reach, so an XSS bug can no longer be used 
to steal them, and it enables instant session revocation.

2. Motivation

## Motivation

Our current implementation stores JWTs in localStorage, which means any 
XSS vulnerability in our application lets an attacker steal the session token 
of every user who loads the affected page. This risk became concrete in Q2, 
when a security audit discovered a third-party script injection on our 
checkout page; fortunately, it was caught before it was exploited.

Additionally, our current JWTs cannot be revoked. If a token is stolen or a 
user wants to log out of all devices, we have no mechanism to invalidate 
existing tokens before their 7-day expiry.

The motivation section should explain why this is urgent. Cite specific incidents, metrics, or constraints.

3. Detailed Design

This is the core of the RFC. Be specific:

## Detailed Design

### Session Storage
Sessions will be stored in Redis with the following structure:

```
Key:   session:{uuid}
Value: { userId, createdAt, expiresAt, ipAddress, userAgent }
TTL:   24 hours (rolling)
```

### Cookie Configuration
```typescript
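// cookieStore: a server-side cookie API (e.g. Next.js `await cookies()`);
// the browser's Cookie Store API cannot set HttpOnly cookies.
cookieStore.set('session_id', sessionUuid, {
  httpOnly: true,
  secure: true,         // HTTPS only
  sameSite: 'strict',   // CSRF protection
  path: '/',
  maxAge: 60 * 60 * 24, // 24 hours, in seconds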
});
```
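Whatever framework sets it, the configuration above becomes a single `Set-Cookie` response header. A framework-agnostic sketch of that serialization follows; `serializeSessionCookie` is a hypothetical helper, not a library API, and the attribute order is an arbitrary choice. Spelling out the header lets reviewers check exactly what the browser will receive.

```typescript
// Options mirroring the cookie configuration above.
interface SessionCookieOptions {
  httpOnly: boolean;
  secure: boolean;
  sameSite: "strict" | "lax" | "none";
  path: string;
  maxAge: number; // seconds
}

// Serializes one cookie into a Set-Cookie header value.
function serializeSessionCookie(
  name: string,
  value: string,
  opts: SessionCookieOptions,
): string {
  const parts = [`${name}=${encodeURIComponent(value)}`];
  parts.push(`Path=${opts.path}`);
  parts.push(`Max-Age=${opts.maxAge}`);
  if (opts.httpOnly) parts.push("HttpOnly");
  if (opts.secure) parts.push("Secure");
  // SameSite values are case-insensitive; capitalized is the usual spelling.
  const sameSite = opts.sameSite.charAt(0).toUpperCase() + opts.sameSite.slice(1);
  parts.push(`SameSite=${sameSite}`);
  return parts.join("; ");
}

const header = serializeSessionCookie("session_id", "3f1c9a", {
  httpOnly: true,
  secure: true,
  sameSite: "strict",
  path: "/",
  maxAge: 60 * 60 * 24,
});
// header → "session_id=3f1c9a; Path=/; Max-Age=86400; HttpOnly; Secure; SameSite=Strict"
```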

### Request Validation Flow
Every authenticated request:
1. Read the `session_id` cookie
2. Look up the session in Redis (typically under 1ms)
3. If not found or expired → 401
4. Reset the TTL to 24 hours and re-issue the cookie with a fresh maxAge (rolling session)
5. Attach userId to request context
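The five steps above could be sketched as a single function. This is illustrative only: `SessionStore`, `InMemorySessionStore`, `readCookie`, and `authenticate` are hypothetical names, the in-memory store stands in for Redis `GET` and `EXPIRE`, and it is kept synchronous for brevity even though a real Redis client is asynchronous.

```typescript
// Session fields this check needs; the full record also has createdAt, ipAddress, userAgent.
interface StoredSession {
  userId: string;
  expiresAt: number; // epoch milliseconds
}

// Stand-in for the Redis calls used here (GET and EXPIRE).
interface SessionStore {
  get(key: string): StoredSession | undefined;
  expire(key: string, ttlSeconds: number, now: number): void;
}

class InMemorySessionStore implements SessionStore {
  private data = new Map<string, StoredSession>();

  set(key: string, session: StoredSession): void {
    this.data.set(key, session);
  }

  get(key: string): StoredSession | undefined {
    return this.data.get(key);
  }

  expire(key: string, ttlSeconds: number, now: number): void {
    const session = this.data.get(key);
    if (session) session.expiresAt = now + ttlSeconds * 1000;
  }
}

const TTL_SECONDS = 60 * 60 * 24;

type AuthResult = { status: 200; userId: string } | { status: 401 };

// Returns the value of one cookie from a Cookie request header, if present.
function readCookie(header: string | undefined, name: string): string | undefined {
  if (!header) return undefined;
  for (const pair of header.split(";")) {
    const [key, ...rest] = pair.trim().split("=");
    if (key === name) return decodeURIComponent(rest.join("="));
  }
  return undefined;
}

function authenticate(
  cookieHeader: string | undefined,
  store: SessionStore,
  now: number = Date.now(),
): AuthResult {
  // 1. Read the session_id cookie.
  const sessionId = readCookie(cookieHeader, "session_id");
  if (!sessionId) return { status: 401 };

  // 2. Look up the session.
  const key = `session:${sessionId}`;
  const session = store.get(key);

  // 3. Not found or expired: reject. Redis never returns expired keys;
  //    the explicit check is for the in-memory stand-in.
  if (!session || session.expiresAt <= now) return { status: 401 };

  // 4. Reset the TTL to a full 24 hours (rolling session). The caller would
  //    also re-send the cookie so its Max-Age rolls forward too.
  store.expire(key, TTL_SECONDS, now);

  // 5. Return the userId for the request context.
  return { status: 200, userId: session.userId };
}
```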

### Migration Strategy
Week 1: Issue new HttpOnly cookies alongside existing JWTs (dual-session)
Week 2: Verify all clients are using the new session system
Week 3: Stop accepting JWT tokens

Total migration window: 3 weeks with zero forced logouts.
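During the dual-session window, the authentication path has to accept either credential. A minimal sketch, assuming the legacy JWT arrives in an `Authorization: Bearer` header and that session lookup and JWT verification are injected as the hypothetical `lookupSession` and `verifyLegacyJwt` functions; the hypothetical `acceptLegacyJwt` flag is the switch flipped in week 3.

```typescript
// Who the caller is, and which mechanism authenticated them.
type Identity = { userId: string; via: "session" | "legacy-jwt" } | null;

interface MigrationDeps {
  // Looks up a session ID (e.g. in Redis); returns the userId or undefined.
  lookupSession: (sessionId: string) => string | undefined;
  // Verifies a legacy JWT; returns the userId or undefined (hypothetical hook).
  verifyLegacyJwt: (token: string) => string | undefined;
  // Feature flag: true during weeks 1-2, false from week 3.
  acceptLegacyJwt: boolean;
}

function resolveIdentity(
  sessionId: string | undefined,
  authorizationHeader: string | undefined,
  deps: MigrationDeps,
): Identity {
  // Prefer the new cookie-based session whenever it is present and valid.
  if (sessionId) {
    const userId = deps.lookupSession(sessionId);
    if (userId) return { userId, via: "session" };
  }
  // Fall back to the legacy JWT only while the flag allows it.
  if (
    deps.acceptLegacyJwt &&
    authorizationHeader !== undefined &&
    authorizationHeader.startsWith("Bearer ")
  ) {
    const userId = deps.verifyLegacyJwt(authorizationHeader.slice("Bearer ".length));
    if (userId) return { userId, via: "legacy-jwt" };
  }
  return null;
}
```

Recording the `via` field per request (for example, as a metric) is one way to gather the week-2 evidence that clients have moved to the new session system.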

4. Drawbacks

## Drawbacks

1. **Redis dependency**: We're adding a new infrastructure dependency. 
   If Redis is unavailable, all authenticated requests fail.
   Mitigation: Redis Cluster with replication; circuit breaker for fallback.

2. **Session state complexity**: Stateful sessions are harder to scale 
   horizontally than stateless JWTs, and every request requires a Redis lookup.
   Mitigation: latency is not a concern. Current JWT verification takes 
   ~0.5ms; the Redis lookup that replaces it takes ~0.3ms.

3. **Migration complexity**: 3-week dual-session period requires maintaining 
   two authentication systems simultaneously.
   Mitigation: Feature flag to control rollout; existing JWT code is unchanged.
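The circuit breaker mentioned under the Redis dependency could look roughly like this. It is a sketch, not a library API: `CircuitBreaker`, its threshold, and its cooldown are hypothetical, and the RFC does not specify what the fallback should be, so the sketch only shows the fail-fast part. After `threshold` consecutive failures, calls are rejected immediately until `cooldownMs` has elapsed; then a trial call is let through.

```typescript
// Fails fast once the wrapped dependency looks unhealthy.
class CircuitBreaker {
  private readonly threshold: number;
  private readonly cooldownMs: number;
  private failures = 0;
  private openedAt: number | null = null;

  constructor(threshold: number, cooldownMs: number) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
  }

  call<T>(fn: () => T, now: number = Date.now()): T {
    if (this.openedAt !== null) {
      if (now - this.openedAt < this.cooldownMs) {
        // Open: reject without touching the dependency.
        throw new Error("circuit open: session store unavailable");
      }
      // Cooldown over (half-open): allow a trial call. Because `failures`
      // is still at the threshold, a single failure re-opens the circuit.
      this.openedAt = null;
    }
    try {
      const result = fn();
      this.failures = 0; // success closes the circuit
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.failures >= this.threshold) this.openedAt = now;
      throw err;
    }
  }
}
```

Wrapping each Redis call in `breaker.call(...)` keeps an outage from tying up request handlers on timeouts; the caller still decides what to return while the circuit is open.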

The drawbacks section is critical. RFC reviewers will think of objections — if you acknowledge them first, you demonstrate thorough thinking and preempt resistance.

5. Alternatives Considered

## Alternatives Considered

### A: Keep JWT in localStorage, add CSP
We could mitigate XSS with a strict Content Security Policy (CSP), reducing 
the impact of script injection. However:
- CSP is a mitigation, not a prevention — it reduces but doesn't eliminate XSS risk.
- Revocation is still impossible.
- Rejected: addresses symptoms, not root cause.

### B: JWT in Memory (JavaScript variable)
Storing JWT in a JavaScript variable prevents localStorage theft but loses 
the token on page refresh — unacceptable UX.
- Rejected: poor user experience.

### C: Use a third-party auth service (Auth0, Clerk)
Eliminates custom session management entirely.
- We evaluated this in RFC-012 and decided to keep authentication in-house 
  for compliance reasons. This RFC does not reopen that decision.
- Rejected: out of scope.

6. Open Questions

## Open Questions

1. **Session invalidation on password change**: Should changing a password 
   invalidate all active sessions? Relatedly, do we need a session management 
   UI so users can see and revoke active sessions? Is this in scope for this 
   RFC or a follow-up?

2. **Mobile app compatibility**: Our iOS/Android apps use the REST API with 
   Authorization headers (not cookies). Does this RFC affect them?

3. **Redis persistence**: Should session data be persisted to disk (potential 
   recovery on Redis restart) or ephemeral (all sessions lost on restart)? 
   Ephemeral is simpler; persistent increases Redis cost and complexity.
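The first open question (and "log out of all devices" in the Motivation) depends on finding every session that belongs to a user, which the `session:{uuid}` key layout alone does not support. One common approach, assumed here rather than taken from the RFC, is a per-user index such as a Redis set `user_sessions:{userId}`. The in-memory sketch below mirrors that layout; `SessionIndex` and its methods are hypothetical.

```typescript
// In-memory stand-in for two Redis structures (hypothetical layout):
//   session:{uuid}          -> session record
//   user_sessions:{userId}  -> set of that user's session UUIDs (a Redis SET)
class SessionIndex {
  private sessions = new Map<string, { userId: string }>();
  private byUser = new Map<string, Set<string>>();

  create(sessionId: string, userId: string): void {
    this.sessions.set(`session:${sessionId}`, { userId });
    // Equivalent to SADD user_sessions:{userId} {sessionId}
    const key = `user_sessions:${userId}`;
    if (!this.byUser.has(key)) this.byUser.set(key, new Set<string>());
    this.byUser.get(key)!.add(sessionId);
  }

  has(sessionId: string): boolean {
    return this.sessions.has(`session:${sessionId}`);
  }

  // Revokes every session for a user; returns how many IDs were indexed.
  revokeAll(userId: string): number {
    const key = `user_sessions:${userId}`;
    const ids = this.byUser.get(key) ?? new Set<string>();
    // Equivalent to DEL session:{id} for each member, then DEL on the set.
    ids.forEach((id) => this.sessions.delete(`session:${id}`));
    this.byUser.delete(key);
    return ids.size;
  }
}
```

With Redis TTLs, the per-user set can keep IDs of sessions that have already expired. Deleting a key that no longer exists is a no-op in Redis, so stale entries are harmless during revocation.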

7. Unresolved Questions

## Unresolved Questions

- What is the correct session TTL? 24 hours of inactivity vs. 7 days absolute?
  This is a UX vs. security trade-off that requires product team input.

- Should we implement session fingerprinting (IP + user-agent binding)?
  This reduces session hijacking risk but will cause unexpected logouts for 
  users on mobile networks, whose IP addresses change frequently.
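The two TTL options in the first question are not mutually exclusive; many systems enforce both an idle timeout and an absolute timeout. A sketch of such a check, with hypothetical names and the 24-hour and 7-day values from the question used only as placeholders:

```typescript
interface SessionTimes {
  createdAt: number;  // epoch ms, when the user logged in
  lastSeenAt: number; // epoch ms, last authenticated request
}

interface TimeoutPolicy {
  idleMs: number;     // maximum inactivity
  absoluteMs: number; // maximum age since login, regardless of activity
}

const HOUR = 60 * 60 * 1000;

// A session stays valid only while it is within BOTH limits.
function isSessionValid(s: SessionTimes, policy: TimeoutPolicy, now: number): boolean {
  const idleOk = now - s.lastSeenAt < policy.idleMs;
  const absoluteOk = now - s.createdAt < policy.absoluteMs;
  return idleOk && absoluteOk;
}

// Placeholder values: 24 hours idle, 7 days absolute.
const policy: TimeoutPolicy = { idleMs: 24 * HOUR, absoluteMs: 7 * 24 * HOUR };
```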

The RFC Review Process

Before Submitting

Share a draft with 2–3 people who will be affected. Get informal feedback first.
Ensure the document is complete — open questions are fine, but avoid missing sections.
Keep it to a readable length — aim for under 1,500 words for the main design.

During Review (1–2 weeks)

Post the RFC in your team's design documents channel.
Explicitly request reviews from affected team leads.
Respond to all comments, even if you disagree. Unanswered comments kill RFCs.
Edit the document based on feedback — the RFC should evolve.

After Review

Mark status as Accepted, Rejected, or Withdrawn.
If accepted, link the implementation PRs to the RFC.
If rejected, document why — it is valuable institutional memory.

RFC Status Labels

RFC-042: HttpOnly Cookie Sessions

Status: ACCEPTED ✅
Author: @sabaoon
Created: 2026-06-01
Last Updated: 2026-06-08
Implementation: PR #1847, PR #1851

[Previous statuses: DRAFT → IN REVIEW → ACCEPTED]
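If your team tracks RFCs in a repository, the lifecycle can be checked mechanically. A sketch, assuming the five statuses named in this post; the transition table is an assumption, in particular allowing `IN REVIEW` to return to `DRAFT` and treating the last three statuses as terminal.

```typescript
type RfcStatus = "DRAFT" | "IN REVIEW" | "ACCEPTED" | "REJECTED" | "WITHDRAWN";

// Allowed next statuses (assumed); ACCEPTED, REJECTED, WITHDRAWN are terminal.
const TRANSITIONS: Record<RfcStatus, RfcStatus[]> = {
  DRAFT: ["IN REVIEW", "WITHDRAWN"],
  "IN REVIEW": ["DRAFT", "ACCEPTED", "REJECTED", "WITHDRAWN"],
  ACCEPTED: [],
  REJECTED: [],
  WITHDRAWN: [],
};

function canTransition(from: RfcStatus, to: RfcStatus): boolean {
  return TRANSITIONS[from].includes(to);
}

// Checks a recorded history such as DRAFT → IN REVIEW → ACCEPTED.
function isValidHistory(history: RfcStatus[]): boolean {
  let prev: RfcStatus | undefined = undefined;
  for (const status of history) {
    const ok = prev === undefined ? status === "DRAFT" : canTransition(prev, status);
    if (!ok) return false;
    prev = status;
  }
  return prev !== undefined; // an empty history is invalid
}
```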

Conclusion

The RFC process is one of the highest-leverage practices in software engineering because it forces the most valuable activity in technical leadership: structured, documented thinking before action. An RFC that takes three hours to write can prevent three months of implementation work on the wrong solution. More importantly, it builds genuine consensus — engineers who contributed to the design are far more committed to making the implementation succeed than those who had something handed down to them. Write the RFC before you write the code.