The Security & Bug Gate: Two-Tier Code Review for AI-Generated Features
AI can ship features fast, but it can also ship vulnerabilities. Here's our quality firewall.
AI-generated code is unsafe until proven otherwise.
Let me tell you about a close call we had three months ago.
We were using Claude to generate a Supabase RLS policy for a new feature. The feature seemed to work perfectly in development. The AI-generated code was clean, well-commented, and passed our basic tests. We shipped it to staging.
Two days later, during a security audit, we discovered the RLS policy had a subtle flaw: users could read data from any organization they'd ever been a member of—including ones they'd been removed from. The policy checked organization_members but didn't verify the membership was still active.
One bad WHERE clause away from a major data leak.
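To make the flaw concrete, here's a minimal sketch of the two policies. The schema is simplified and the `status` column is an assumption about how revoked memberships are recorded; the point is the missing membership-state filter:

```sql
-- Flawed (sketch): any row in organization_members grants access,
-- including memberships that were later revoked.
create policy "org members can read projects"
  on projects for select
  using (
    organization_id in (
      select organization_id
      from organization_members
      where user_id = auth.uid()
    )
  );

-- Fixed (sketch): only currently active memberships grant access.
-- The status column and 'active' value are illustrative assumptions.
create policy "active org members can read projects"
  on projects for select
  using (
    organization_id in (
      select organization_id
      from organization_members
      where user_id = auth.uid()
        and status = 'active'
    )
  );
```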
This wasn't a fluke. AI code generation tools (Claude, ChatGPT, Copilot, Cursor) are incredible productivity multipliers. But they're also confidently wrong in ways that are hard to spot. They'll generate authentication checks that look right but have holes. They'll write database queries that work for the happy path but leak data on edge cases. They'll skip input validation because "it's just a quick script."
That's when we built the Security & Bug Gate: a two-tier code review system that treats all AI-generated code as unsafe until proven otherwise.
In this article, I'll walk you through both tiers—the everyday prompt for rapid iteration and the pre-production gate for release candidates—and show you exactly how to implement them in your own workflow.
The Core Insight: Default to Distrust
Here's the mental model shift we had to make:
Old mindset: "This AI code looks good, let's ship it."
New mindset: "This AI code is unsafe until it passes the security checklist."
AI tools are optimized for helpfulness and plausibility, not correctness or security. They'll generate code that seems right, that compiles, that passes surface-level tests. But they don't reason about:
- Multi-tenant data isolation: Can user A see user B's data?
- Authorization edge cases: What if a user is invited to an org but hasn't accepted yet?
- Input validation gaps: What if someone sends malformed JSON or a 10MB string?
- Race conditions: What if two requests modify the same record simultaneously?
- Secrets management: Did the API key get hardcoded or loaded from env?
These failure modes aren't unique to AI; humans miss them too. But AI fails confidently, without the "wait, does this feel right?" gut check that experienced developers have.
So we systematize the gut check.
Tier 1: The Everyday Prompt (Development Review)
For rapid iteration during development, we use a lightweight 5-minute review checklist. This runs before code gets committed to a feature branch.
The Prompt Template
Here's the core structure (full template at docs/templates/SECURITY-DAY-TO-DAY-PROMPT.md):
```markdown
## Quick Security & Logic Check
**Feature**: [Brief description]
**Files changed**: [List]
### 1. Auth & Permissions
- ✅ Verified: User identity (auth.uid or session)
- ✅ Verified: Organization membership (if multi-tenant)
- ✅ Verified: Role-based access (if applicable)
- ⚠️ Issue: [Describe any problems]
### 2. Data Scoping & RLS
- ✅ All queries scoped by organization_id
- ✅ RLS enabled on relevant tables
- ⚠️ Issue: [Describe any problems]
### 3. Input Validation
- ✅ Required fields validated
- ✅ Type checking (strings, numbers, arrays, etc.)
- ✅ Size limits (e.g., max string length, array length)
- ⚠️ Issue: [Describe any problems]
### 4. Error Handling
- ✅ Try/catch blocks where needed
- ✅ Errors logged (not silenced)
- ⚠️ Issue: [Describe any problems]
### 5. Secrets & Config
- ✅ No secrets in code
- ✅ Service role keys only used server-side
- ⚠️ Issue: [Describe any problems]
**Verdict**: ✅ Safe to commit | ⚠️ Fix issues first
```
When to Use It
- Before every commit during feature development
- When adding new API routes or Edge Functions
- When modifying database queries or RLS policies
- When integrating third-party services
What It Catches
The everyday prompt is optimized for speed and practicality. It catches:
- Missing auth checks
- Forgotten `organization_id` scoping
- Obvious input validation gaps
- Hardcoded secrets
- Silent error swallowing
It won't catch subtle logic bugs or complex race conditions—that's what Tier 2 is for.
Example: Catching an Auth Gap
Here's a real example from last week. AI generated this API route:
```typescript
// AI-generated code (BEFORE review)
export async function POST(request: Request) {
  const { projectId, name } = await request.json();

  const newTask = await db.insert(tasks).values({
    project_id: projectId,
    name: name,
  });

  return Response.json(newTask);
}
```
Everyday prompt review:
- ❌ Auth & Permissions: No verification of user identity. Anyone can create tasks for any project.
- ❌ Data Scoping: No check that the project belongs to the user's organization.
- ❌ Input Validation: No validation of `name` length or required fields.
Verdict: ⚠️ Fix issues first.
Fixed version:
```typescript
// After Security & Bug Gate review
import { and, eq } from 'drizzle-orm';
import { getServerSession } from 'next-auth';   // or your app's auth helper
import { db } from '@/lib/db';                  // app-local Drizzle client (assumed path)
import { projects, tasks } from '@/lib/schema'; // app-local table definitions (assumed path)

export async function POST(request: Request) {
  const session = await getServerSession();
  if (!session?.user) {
    return Response.json({ error: 'Unauthorized' }, { status: 401 });
  }

  const body = await request.json();

  // Input validation
  if (!body.projectId || typeof body.name !== 'string' || body.name.length > 500) {
    return Response.json({ error: 'Invalid input' }, { status: 400 });
  }

  // Verify project belongs to user's org
  const project = await db.query.projects.findFirst({
    where: and(
      eq(projects.id, body.projectId),
      eq(projects.organization_id, session.user.organizationId)
    ),
  });

  if (!project) {
    return Response.json({ error: 'Project not found' }, { status: 404 });
  }

  // .returning() gives us the inserted row, not just the query result
  const [newTask] = await db
    .insert(tasks)
    .values({
      project_id: body.projectId,
      name: body.name.trim(),
      created_by: session.user.id,
      organization_id: project.organization_id,
    })
    .returning();

  return Response.json(newTask);
}
```
Time to review & fix: ~8 minutes. Data leak prevented.
Tier 2: The Pre-Production Gate (Release Review)
Before any feature ships to production, it goes through a comprehensive security and logic audit. This is the full checklist—the one that takes 30-60 minutes and is built to catch what the everyday prompt misses.
The Full Checklist
The pre-production gate (docs/templates/SECURITY-PRE-PRODUCTION-GATE-PROMPT.md) has 8 sections:
1. High-Level Change Understanding
- Summarize what the code does and where it fits (frontend, backend, Edge Function, DB)
- Identify high-risk areas (auth, billing, org permissions, data writes, RLS)
2. Auth & Authorization / Permissions
- Missing or weak checks of auth status, user identity, role/permissions, org membership
- Code paths where users can access/modify data for orgs they don't belong to
- Non-admins performing admin actions
- Supabase-specific: Client SDK never handles sensitive operations alone
3. Supabase & RLS / Data Tenancy
- Every query scoped by `organization_id`
- RLS enabled on all tables
- RLS policies use `auth.uid()` and org membership joins correctly
- No bypass policies like `USING (true)` unless the data is non-sensitive
4. Input Validation & API Surface
- Input validation (types, ranges, allowed values)
- Protection against malformed data and abuse
- `organization_id` and `user_id` derived from auth/session, not blindly trusted from the client
5. Logic Bugs, Edge Cases, and Reliability
- Null / undefined risks
- Race conditions (async, parallel calls, state updates)
- Error handling (missing try/catch, silent failures, ignored promises)
- Off-by-one, empty arrays, timezones
6. Secrets, Config, and Env
- Secrets not committed
- Loaded from env/config
- Service role keys only server-side
- No secrets in browser runtime
7. Tests & Future Regression Protection
- List specific unit and integration tests to catch the issues found
- Critical user flows for e2e testing
8. Risk Score & Deploy Verdict
- Risk Score (1-10):
  - 1-3: Low risk, minor issues
  - 4-6: Medium risk, must fix key items before production
  - 7-10: High risk, do NOT deploy
- Verdict:
  - "SAFE TO SHIP"
  - "SHIP WITH FIXES APPLIED FIRST"
  - "DO NOT SHIP"
Example: Full Review of a Billing Integration
Here's a condensed example of a pre-production review for a Stripe integration:
1. High-Level Change:
- Adds subscription management endpoints
- Handles webhook events from Stripe
- Updates `subscriptions` table and user permissions
- High-risk areas: Billing logic, webhook signature verification, org-level permissions
2. Auth & Authorization:
- ✅ Webhook signature verified via `stripe.webhooks.constructEvent`
- ✅ User endpoints verify session and org membership
- ⚠️ Issue: Cancel subscription endpoint doesn't verify user is org owner
3. Supabase & RLS:
- ✅ RLS enabled on `subscriptions` table
- ⚠️ Issue: RLS policy allows any org member to read subscription data (should be owner/admin only)
4. Input Validation:
- ✅ Webhook payload validated by Stripe library
- ⚠️ Issue: User-facing endpoints trust `priceId` from the client without verification
5. Logic Bugs:
- ⚠️ Issue: Race condition if two subscription events arrive simultaneously (no locking)
- ⚠️ Issue: No handling for Stripe event duplicates (idempotency; a fix is sketched after this review)
6. Secrets:
- ✅ Stripe keys loaded from env
- ✅ Webhook secret stored securely
7. Tests:
- Missing: Unit test for webhook signature failure
- Missing: Integration test for subscription upgrade path
- Missing: E2e test for downgrade and cancellation
8. Risk Score & Verdict:
- Risk Score: 7/10 (High)
- Verdict: DO NOT SHIP
- Reasoning:
  - Authorization gap on cancel endpoint = users can cancel other orgs' subscriptions
  - RLS policy leak = members see billing details they shouldn't
  - Race condition + no idempotency = potential double-charges or missed events
Time to review: 45 minutes. Multiple critical bugs caught before production.
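For reference, the race condition and the missing idempotency share a common fix: persist each Stripe event ID before acting on it, and let a unique constraint arbitrate concurrent deliveries. Here's a minimal sketch, assuming a `processed_stripe_events` table with a unique `event_id` column (an illustrative schema, not our exact one):

```typescript
import Stripe from 'stripe';
import { db } from '@/lib/db';                        // app-local Drizzle client (assumed path)
import { processedStripeEvents } from '@/lib/schema'; // assumed table with UNIQUE(event_id)

export async function handleStripeEvent(event: Stripe.Event) {
  // Claim the event ID first. The unique constraint makes this atomic,
  // so two concurrent deliveries of the same event can't both proceed.
  const claimed = await db
    .insert(processedStripeEvents)
    .values({ event_id: event.id })
    .onConflictDoNothing()
    .returning();

  if (claimed.length === 0) {
    return; // Duplicate delivery: already processed (or in flight).
  }

  switch (event.type) {
    case 'customer.subscription.updated':
      // ...apply the subscription change here
      break;
    // ...other event types
  }
}
```

Stripe retries webhook deliveries, so treating the event ID as the idempotency key is the standard pattern.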
Practical Application: How to Implement This
Here's how to add the Security & Bug Gate to your workflow:
1. Create the Prompt Templates
Save both templates:
- `SECURITY-DAY-TO-DAY-PROMPT.md` (5-minute checklist)
- `SECURITY-PRE-PRODUCTION-GATE-PROMPT.md` (full audit)
Make them copy-pasteable. We keep ours in docs/templates/ so they're always available.
2. Make Them Mandatory Gates
Everyday prompt:
- Required before committing to feature branch
- Add to your pull request template: "Did you run the everyday security checklist?" (a sketch follows at the end of this step)
- Use a git pre-commit hook to remind developers
Pre-production gate:
- Required before merging to `main` or releasing
- Add to your CI/CD pipeline as a manual approval step
- Assign a "security reviewer" role (human or AI-assisted)
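For the PR-template line, a sketch of what that section might look like (adapt the paths to your repo):

```markdown
### Security & Bug Gate
- [ ] Everyday checklist run (docs/templates/SECURITY-DAY-TO-DAY-PROMPT.md)
- [ ] Issues found were fixed or noted in the PR description
- [ ] Pre-production gate completed (release PRs only)
```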
3. Train Your Team (or AI)
If you're using AI to generate code:
- Give it the everyday prompt and ask it to self-review before outputting code
- Use the pre-production prompt as a final review step
If you have a human team:
- Run a workshop on common security gaps (auth, RLS, input validation)
- Share examples of bugs caught by the gate
- Make it a habit, not a checkbox
4. Track What You Catch
Keep a log of issues caught by the gate:
- Missing auth checks
- RLS leaks
- Input validation gaps
- Logic bugs
Review this log monthly to identify patterns. If you keep catching the same issue, add guardrails (linters, type checkers, architecture rules) to prevent it by default.
5. Automate What You Can
Some checks can be automated:
- Secrets scanning: Use tools like `gitleaks` or `truffleHog`
- Linting: ESLint rules for missing error handling, unused promises
- Type checking: Strict TypeScript mode catches many input validation gaps
- RLS verification: Write tests that attempt cross-org data access (see the sketch below)
Automation catches the easy stuff, leaving humans (or AI) to focus on the subtle logic issues.
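For that last item, here's a minimal sketch of a cross-org access test using Vitest and the Supabase JS client. The seeded user, password variable, and org ID are test fixtures you'd set up in your own environment:

```typescript
import { test, expect } from 'vitest';
import { createClient } from '@supabase/supabase-js';

// Fixture: an organization the signed-in user does NOT belong to.
const ORG_B_ID = '00000000-0000-0000-0000-000000000002';

test('RLS blocks cross-org reads', async () => {
  const supabase = createClient(
    process.env.SUPABASE_URL!,
    process.env.SUPABASE_ANON_KEY! // anon key, so RLS applies
  );

  // Sign in as a seeded member of org A.
  await supabase.auth.signInWithPassword({
    email: 'member-of-org-a@example.com',
    password: process.env.TEST_USER_PASSWORD!,
  });

  // Attempt to read org B's data.
  const { data, error } = await supabase
    .from('projects')
    .select('*')
    .eq('organization_id', ORG_B_ID);

  // RLS should silently filter the rows out, not throw.
  expect(error).toBeNull();
  expect(data).toEqual([]);
});
```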
Real-World Results
Since implementing the Security & Bug Gate, here's what we've caught:
In the past 3 months:
- 12 missing auth checks (everyday prompt)
- 5 RLS policy leaks (pre-production gate)
- 8 input validation gaps (everyday prompt)
- 3 race conditions (pre-production gate)
- 2 hardcoded API keys (everyday prompt)
- 1 critical billing logic bug (pre-production gate)
Estimated value:
- Zero security incidents in production (previously: 2-3 per quarter)
- ~40 hours saved debugging production issues
- Confidence to move faster with AI code generation
Trade-Offs and Limitations
The Security & Bug Gate isn't free:
Time Investment:
- Everyday prompt: 5-10 minutes per feature
- Pre-production gate: 30-60 minutes per release
- For a team shipping daily, this adds up
Requires Expertise:
- Running the checklists effectively requires security and architecture knowledge
- Junior developers might miss subtle issues
- AI reviewers need good prompts to be effective
Can Slow Velocity:
- If the gate becomes a bottleneck (one person reviewing everything), shipping slows
- Need to balance thoroughness with speed
False Negatives:
- The checklist doesn't catch everything
- Novel attack vectors or complex logic bugs can slip through
- Still need penetration testing and security audits for high-stakes systems
Can Create Review Fatigue:
- If every tiny change requires a full audit, developers get annoyed
- Need to calibrate: everyday prompt for most changes, full gate for releases
When to Skip It
We use lighter reviews for:
- Internal tools (no public exposure)
- Prototype features (not production-ready yet)
- Low-risk changes (copy updates, styling tweaks)
But anything touching auth, billing, or user data goes through the full gate.
Takeaways
Here's what to remember about the Security & Bug Gate:
- Default to Distrust: Treat AI-generated code as unsafe until it passes a security checklist. AI is optimized for plausibility, not correctness.
- Two-Tier Review: Use a lightweight everyday prompt for rapid iteration and a comprehensive pre-production gate for releases.
- The Checklist Covers 8 Areas: High-level change understanding, auth, RLS, input validation, logic bugs, secrets, and tests, plus a risk score and deploy verdict.
- Make It Mandatory: Add gates to your pull request and CI/CD process. Don't let code ship without review.
- Track and Learn: Log what the gate catches and use patterns to add automated guardrails (linters, type checkers, architecture rules).
- Automate the Easy Stuff: Use secrets scanning, linting, and type checking to catch surface-level issues, leaving humans to focus on subtle logic.
What's Next?
The Security & Bug Gate has become one of our most valuable practices. It's the reason we can confidently use AI to generate code at scale without constantly shipping vulnerabilities.
We're now working on:
- AI-powered review: Using Claude to run the pre-production checklist automatically
- Expanded checklist: Adding sections for performance, accessibility, and observability
- Integration with CI: Automating parts of the gate in GitHub Actions
AI can ship features fast, but it can also ship vulnerabilities. The Security & Bug Gate is our quality firewall.
If you're using AI code generation (or just want more rigorous code review), I encourage you to implement a two-tier review system. Start with the everyday prompt, add the pre-production gate for releases, and track what you catch.
What if every line of code passed a security checklist before shipping? That's the promise of systematic code review.