
Webhook Best Practices: Retry Logic, Idempotency, and Error Handling

DEV Community, by Henry Hang, April 2, 2026, 8 min read

Most webhook integrations fail silently. A handler returns 500, the provider retries a few times, then stops. Your system never processed the event and no one knows.

Webhook delivery is not guaranteed by default. How reliably your integration works depends almost entirely on how you write the receiver. This guide covers the patterns that make webhook handlers production-grade: proper retry handling, idempotency, error response codes, and queue-based processing.

Understand the Delivery Model

Before building handlers, understand what you are dealing with:

  • Providers send webhook events as HTTP POST requests

  • They expect a 2xx response within a timeout (typically 5-30 seconds)

  • If they do not receive 2xx, they retry on a schedule (often exponential backoff over hours or days)

  • Most providers have a maximum retry count after which the event is dropped

  • Some providers allow you to manually retry from their dashboard

Stripe retry schedule (illustrative; Stripe documents exponential backoff for up to three days in live mode, and the exact intervals may differ):

    Attempt 1: immediate
    Attempt 2: 5 minutes
    Attempt 3: 30 minutes
    Attempt 4: 2 hours
    Attempt 5: 5 hours
    Attempt 6: 10 hours
    Attempt 7: 24 hours
    ... continues for ~72 hours total

This retry behavior is your safety net -- but only if your handler is idempotent.
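To get a feel for how quickly a retry window closes, here is a minimal sketch of an exponential backoff schedule. The function name and every parameter value (`baseMs`, `factor`, `maxRetries`, `capMs`) are assumptions chosen for illustration, not any provider's real settings.

```javascript
// Sketch of a provider-side exponential backoff schedule.
// All values below are illustrative assumptions, not a real provider's configuration.
function backoffSchedule({ baseMs, factor, maxRetries, capMs }) {
  const delays = [];
  for (let retry = 0; retry < maxRetries; retry++) {
    // The delay grows geometrically but never exceeds the cap.
    delays.push(Math.min(baseMs * factor ** retry, capMs));
  }
  return delays;
}

const delays = backoffSchedule({
  baseMs: 5 * 60 * 1000, // first retry after 5 minutes
  factor: 3, // each wait is 3x the previous one
  maxRetries: 6,
  capMs: 24 * 60 * 60 * 1000, // never wait longer than 24 hours
});

const totalHours = delays.reduce((sum, d) => sum + d, 0) / 3_600_000;
console.log(delays.map((d) => `${d / 60000} min`));
console.log(`retry window: about ${totalHours.toFixed(1)} hours`);
```

Once the last delay has elapsed the event is gone, so an outage longer than the total window loses events unless you can replay them from the provider's dashboard.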

Rule 1: Respond Fast, Process Async

Your webhook handler should acknowledge receipt immediately and do the actual work in the background. If you do database writes, call external APIs, or send emails synchronously inside the handler, you risk timing out.

    // BAD: synchronous processing risks a timeout
    app.post('/webhook/stripe', async (req, res) => {
      // Assumes a raw-body parser (e.g. express.raw), so req.body is not yet parsed
      const event = JSON.parse(req.body);

      if (event.type === 'payment_intent.succeeded') {
        // This could take several seconds
        await fulfillOrder(event.data.object);
        await sendConfirmationEmail(event.data.object.metadata.email);
        await updateInventory(event.data.object.metadata.items);
      }

      res.json({ received: true }); // never reached if anything above throws
    });

    // GOOD: acknowledge immediately, process async
    app.post('/webhook/stripe', async (req, res) => {
      const event = JSON.parse(req.body);

      // Queue the work and respond in milliseconds
      await queue.add('stripe-webhook', { event });

      res.json({ received: true }); // returns 200 as soon as the job is queued
    });

    // Worker processes the queue
    queue.process('stripe-webhook', async (job) => {
      const { event } = job.data;
      if (event.type === 'payment_intent.succeeded') {
        await fulfillOrder(event.data.object);
        await sendConfirmationEmail(event.data.object.metadata.email);
        await updateInventory(event.data.object.metadata.items);
      }
    });

The queue gives you retry logic, failure visibility, and async processing without blocking the HTTP response.
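The following in-memory sketch shows what those three properties look like in the smallest possible form. `MemoryQueue` is a hypothetical stand-in written for this example, not a real library, and the handler is synchronous for brevity; a production system would use a durable queue.

```javascript
// Minimal in-memory stand-in for a job queue (illustration only).
// A real deployment needs a durable queue so jobs survive a restart.
class MemoryQueue {
  constructor({ maxAttempts = 3 } = {}) {
    this.maxAttempts = maxAttempts;
    this.pending = [];
    this.failed = []; // jobs that exhausted their attempts
  }

  // Called from the HTTP handler: constant time, no business logic runs here.
  add(data) {
    this.pending.push({ data, attemptsMade: 0 });
  }

  // Called from a worker loop, outside the request/response cycle.
  drain(handler) {
    while (this.pending.length > 0) {
      const job = this.pending.shift();
      job.attemptsMade += 1;
      try {
        handler(job.data);
      } catch (err) {
        if (job.attemptsMade < this.maxAttempts) {
          this.pending.push(job); // retry logic: put it back for another attempt
        } else {
          this.failed.push({ ...job, error: err.message }); // failure visibility
        }
      }
    }
  }
}

const queue = new MemoryQueue({ maxAttempts: 3 });
queue.add({ id: 'evt_1', type: 'payment_intent.succeeded' });
queue.add({ id: 'evt_2', type: 'payment_intent.succeeded' });

let calls = 0;
const processed = [];
queue.drain((event) => {
  calls += 1;
  if (event.id === 'evt_2') throw new Error('inventory service down');
  processed.push(event.id);
});

console.log({ processed, failed: queue.failed.map((job) => job.data.id), calls });
```

The point of the design is that `add` does nothing but store the job, so the HTTP response time stays independent of how long fulfillment takes.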

Rule 2: Make Handlers Idempotent

Since providers retry webhooks, your handler may receive the same event multiple times. You must make your handler safe to run more than once with the same event ID.

Without idempotency, a network blip that causes Stripe to retry a payment_intent.succeeded event could fulfill the same order twice, create duplicate records, or send duplicate emails.

Track Processed Event IDs

The simplest approach: store event IDs and skip events you have already processed.

    async function handleStripeEvent(event) {
      // Check if we already processed this event
      const existing = await db.query(
        'SELECT id FROM processed_webhooks WHERE event_id = $1',
        [event.id]
      );

      if (existing.rows.length > 0) {
        console.log(`Skipping duplicate event: ${event.id}`);
        return; // idempotent: no-op on duplicate
      }

      // Process the event.
      // Note: two concurrent deliveries can both pass the check above;
      // the transactional version later in this section closes that gap.
      await processEvent(event);

      // Record that we processed it
      await db.query(
        'INSERT INTO processed_webhooks (event_id, processed_at) VALUES ($1, NOW())',
        [event.id]
      );
    }
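The same idea can be exercised without a database. In this sketch a `Set` stands in for the processed_webhooks table and an array stands in for the orders table; both, along with the `handleEvent` name and the sample event, are assumptions made for the example.

```javascript
// In-memory sketch of ID-based deduplication.
// A Set stands in for the processed_webhooks table; real systems need durable storage.
const processedIds = new Set();
const orders = [];

function handleEvent(event) {
  if (processedIds.has(event.id)) {
    return 'duplicate'; // no-op on redelivery
  }
  orders.push({ paymentIntent: event.data.object.id }); // the side effect we must not repeat
  processedIds.add(event.id);
  return 'processed';
}

const event = {
  id: 'evt_123',
  type: 'payment_intent.succeeded',
  data: { object: { id: 'pi_abc' } },
};

// Simulate the provider delivering the same event three times.
const results = [handleEvent(event), handleEvent(event), handleEvent(event)];
console.log(results, `orders created: ${orders.length}`);
```

Only the first delivery produces an order; the two redeliveries are recognized by event ID and skipped.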

Upsert Instead of Insert

When creating records from webhook data, use upsert (insert-or-update) instead of plain insert:

    -- BAD: fails or creates duplicate on retry
    INSERT INTO subscriptions (stripe_id, user_id, status, plan)
    VALUES ($1, $2, $3, $4);

    -- GOOD: idempotent, safe to run multiple times
    INSERT INTO subscriptions (stripe_id, user_id, status, plan)
    VALUES ($1, $2, $3, $4)
    ON CONFLICT (stripe_id)
    DO UPDATE SET
      status = EXCLUDED.status,
      plan = EXCLUDED.plan;

Use Database Transactions with Idempotency Key

For more complex operations, wrap the idempotency check and business logic in a transaction:

    async function handleWebhookIdempotent(eventId, operation) {
      return await db.transaction(async (trx) => {
        // Atomic check-and-insert prevents race conditions on concurrent retries.
        // Requires a unique constraint on processed_webhooks.event_id.
        const result = await trx.raw(
          `INSERT INTO processed_webhooks (event_id, processed_at)
           VALUES (?, NOW())
           ON CONFLICT (event_id) DO NOTHING
           RETURNING id`,
          [eventId]
        );

        if (result.rows.length === 0) {
          // Already processed: skip
          return null;
        }

        // Run business logic inside the same transaction
        return await operation(trx);
      });
    }

Rule 3: Return the Right HTTP Status Codes

Your response code tells the provider whether to retry. Use it correctly:

    Status     Meaning                                     Provider behavior
    200-299    Success                                     No retry
    400        Bad request (your choice not to process)    Providers usually stop retrying
    401/403    Unauthorized                                Providers usually stop retrying
    500-503    Your server error                           Provider retries
    Timeout    No response in time                         Provider retries

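The key distinction: use 5xx when the error is transient (database temporarily down, external API timeout) and 4xx when the error is permanent (invalid payload format, failed signature verification). For event types you simply do not handle, return 200 so the provider stops resending them. Behavior on 4xx varies by provider, and some retry any non-2xx response, so check your provider's documentation.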

    app.post('/webhook', async (req, res) => {
      let event;

      // Signature verification failure: return 400, we don't want a retry
      try {
        event = verifyAndParseWebhook(req.body, req.headers);
      } catch (err) {
        return res.status(400).json({ error: 'Invalid signature' });
      }

      // Unknown event type: return 200 so the provider does not retry
      if (!supportedEvents.includes(event.type)) {
        return res.status(200).json({ received: true, skipped: true });
      }

      // Queue for async processing, return 200 fast
      try {
        await queue.add(event);
        return res.status(200).json({ received: true });
      } catch (err) {
        // Queue is down: return 503 so provider retries later
        return res.status(503).json({ error: 'Service unavailable' });
      }
    });
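One way to keep this decision in a single place is to classify errors by type and map the class to a status code. The sketch below is one possible arrangement; `PermanentError`, `TransientError`, and `statusFor` are hypothetical names introduced for this example.

```javascript
// Hypothetical error classes; the names are assumptions for this sketch.
class PermanentError extends Error {} // e.g. malformed payload, bad signature
class TransientError extends Error {} // e.g. database or queue unavailable

// Map an outcome to the status code that produces the desired provider behavior.
function statusFor(err) {
  if (!err) return 200; // success: no retry
  if (err instanceof PermanentError) return 400; // retrying cannot help
  if (err instanceof TransientError) return 503; // ask the provider to retry later
  return 500; // unknown failure: prefer a retry over a silently lost event
}

console.log(statusFor(null));
console.log(statusFor(new PermanentError('invalid payload')));
console.log(statusFor(new TransientError('queue unavailable')));
console.log(statusFor(new Error('unexpected')));
```

Defaulting unknown errors to 500 is a deliberate choice here: a spurious retry is harmless when the handler is idempotent, while a wrongly returned 4xx can drop an event for good.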

Rule 4: Handle Out-of-Order Delivery

Providers do not guarantee that webhooks arrive in the order events occurred. A customer.subscription.updated event might arrive before the customer.subscription.created event for the same subscription.

Design your handlers to work regardless of order:

    async function handleSubscriptionEvent(event) {
      const sub = event.data.object;

      if (event.type === 'customer.subscription.updated') {
        // Don't assume the subscription already exists in your DB
        await db.query(
          `INSERT INTO subscriptions (stripe_id, status, plan, updated_at)
           VALUES ($1, $2, $3, to_timestamp($4))
           ON CONFLICT (stripe_id)
           DO UPDATE SET
             status = EXCLUDED.status,
             plan = EXCLUDED.plan,
             updated_at = EXCLUDED.updated_at
           WHERE subscriptions.updated_at < EXCLUDED.updated_at`,
          // event.created is the event's own Unix timestamp (seconds)
          [sub.id, sub.status, sub.items.data[0].price.id, event.created]
        );
      }
    }

The WHERE subscriptions.updated_at < EXCLUDED.updated_at clause handles the case where an older event arrives after a newer one: it will not overwrite newer data with stale data. This only works when updated_at comes from the event itself (event.created here). If the column were set with NOW(), every incoming event would look newer than the stored row and the guard would never fire.
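The comparison rule can be checked in isolation with a small in-memory sketch. It assumes each event carries a `created` timestamp (Stripe events do, as Unix seconds); the `Map` standing in for the subscriptions table, the `applySubscriptionEvent` name, and the sample events are hypothetical.

```javascript
// In-memory sketch of "newest event wins, judged by the event's own timestamp".
// A Map stands in for the subscriptions table.
const subscriptions = new Map();

function applySubscriptionEvent(event) {
  const sub = event.data.object;
  const current = subscriptions.get(sub.id);
  // Skip events that are not newer than what is already stored.
  if (current && current.updatedAt >= event.created) {
    return false;
  }
  subscriptions.set(sub.id, { status: sub.status, updatedAt: event.created });
  return true;
}

const createdEvent = {
  created: 1000,
  type: 'customer.subscription.created',
  data: { object: { id: 'sub_1', status: 'incomplete' } },
};
const updatedEvent = {
  created: 1005,
  type: 'customer.subscription.updated',
  data: { object: { id: 'sub_1', status: 'active' } },
};

// Deliver in the "wrong" order: the update arrives before the creation.
const appliedUpdate = applySubscriptionEvent(updatedEvent);
const appliedCreate = applySubscriptionEvent(createdEvent);
console.log(appliedUpdate, appliedCreate, subscriptions.get('sub_1').status);
```

The late-arriving creation event is ignored because its timestamp is older than the stored row, so the subscription keeps the status from the update.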

Rule 5: Log Everything

Log enough to reconstruct what happened to any webhook event without going back to the provider's dashboard:

    const logger = require('pino')();

    app.post('/webhook', async (req, res) => {
      // Assumes a JSON body parser. Stripe carries the event ID in the payload, not in a header.
      const eventId = req.body?.id ?? 'unknown';
      const eventType = req.body?.type ?? 'unknown';

      logger.info({ eventId, eventType }, 'Webhook received');

      try {
        await queue.add({ event: req.body });
        logger.info({ eventId, eventType }, 'Webhook queued');
        res.json({ received: true });
      } catch (err) {
        logger.error({ eventId, eventType, err }, 'Failed to queue webhook');
        res.status(503).json({ error: 'Unavailable' });
      }
    });

    // In your queue worker
    queue.process(async (job) => {
      const { event } = job.data;
      logger.info(
        { eventId: event.id, type: event.type, attempt: job.attemptsMade },
        'Processing webhook'
      );

      try {
        await processEvent(event);
        logger.info({ eventId: event.id }, 'Webhook processed successfully');
      } catch (err) {
        logger.error({ eventId: event.id, err }, 'Webhook processing failed');
        throw err; // let the queue retry
      }
    });

Rule 6: Monitor Webhook Health

Failed webhooks are silent by default. Set up monitoring:

  • Check provider dashboards — Stripe, GitHub, and Shopify all show webhook delivery history. Check them regularly or set up alerts.

  • Alert on queue depth — If your webhook queue grows, something is wrong upstream.

  • Track error rates — Log a counter whenever a webhook handler fails. Alert if the error rate spikes.

  • Set up dead letter queues — Events that fail after all retries should go to a dead letter queue for manual inspection, not disappear silently.

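    // BullMQ dead letter queue example
    const { Queue, Worker } = require('bullmq');

    const connection = { host: 'localhost', port: 6379 }; // assumed local Redis

    // In BullMQ, attempts and backoff are job options, set here as queue defaults
    const queue = new Queue('webhooks', {
      connection,
      defaultJobOptions: {
        attempts: 5,
        backoff: { type: 'exponential', delay: 1000 },
      },
    });
    // The dead letter queue name is an arbitrary choice for this example
    const deadLetterQueue = new Queue('webhooks-dead-letter', { connection });

    const worker = new Worker('webhooks', processWebhook, { connection });

    worker.on('failed', async (job, err) => {
      // job can be undefined if it was removed before the event fired
      if (job && job.attemptsMade >= (job.opts.attempts ?? 1)) {
        // Move to dead letter queue
        await deadLetterQueue.add('failed-webhook', {
          event: job.data.event,
          error: err.message,
          failedAt: new Date().toISOString(),
        });
      }
    });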
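For the error-rate item in the list above, a sliding-window counter is one simple way to decide when to alert. This is a sketch under stated assumptions: `ErrorRateMonitor` is a hypothetical class, and the window size, threshold, and minimum sample count are illustrative values you would tune for your own traffic.

```javascript
// Sliding-window error-rate tracker (sketch).
// Window size, threshold, and minimum sample count are illustrative assumptions.
class ErrorRateMonitor {
  constructor({ windowMs = 5 * 60 * 1000, threshold = 0.1, minSamples = 20 } = {}) {
    this.windowMs = windowMs;
    this.threshold = threshold;
    this.minSamples = minSamples;
    this.samples = []; // entries of the form { at: timestampMs, ok: boolean }
  }

  // Record one handler outcome; `now` is injectable to keep the class testable.
  record(ok, now = Date.now()) {
    this.samples.push({ at: now, ok });
    const cutoff = now - this.windowMs;
    // Drop samples that have fallen out of the window.
    while (this.samples.length > 0 && this.samples[0].at < cutoff) {
      this.samples.shift();
    }
  }

  errorRate() {
    if (this.samples.length === 0) return 0;
    const failures = this.samples.filter((sample) => !sample.ok).length;
    return failures / this.samples.length;
  }

  // Require a minimum number of samples so one early failure does not page anyone.
  shouldAlert() {
    return this.samples.length >= this.minSamples && this.errorRate() > this.threshold;
  }
}

const monitor = new ErrorRateMonitor({ windowMs: 60_000, threshold: 0.1, minSamples: 10 });
for (let i = 0; i < 10; i++) {
  monitor.record(i % 4 !== 0, 1_000 + i); // events 0, 4, and 8 fail
}
console.log(monitor.errorRate(), monitor.shouldAlert());
```

In a worker you would call `monitor.record(true)` after a successful job and `monitor.record(false)` in the failure path, then check `shouldAlert()` on a timer or after each failure.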

Testing Webhook Handling with HookCap

HookCap makes it easy to test these patterns before production:

  • Capture real webhook payloads — Point your provider to a HookCap endpoint to collect real events. Inspect headers, body structure, and signature format.

  • Test retry handling — Use HookCap's replay feature to send the same event to your handler multiple times. Verify that your idempotency logic prevents duplicate processing.

  • Test error recovery — Replay a captured event to a handler you deliberately break (return 500). Watch how your queue retries it. Fix the handler and replay again.

  • Simulate out-of-order delivery — Capture a sequence of related events and replay them in reverse order to verify your handler processes them correctly.

The replay feature is especially useful for idempotency testing: you can replay the same event ID dozens of times and confirm your database shows exactly one processed record each time.

Summary

Production webhook handlers need:

  • Fast acknowledgment — Return 200 immediately, process async

  • Idempotency — Track event IDs, use upserts, handle duplicate deliveries

  • Correct status codes — 5xx for transient errors (retry-worthy), 4xx for permanent errors

  • Order independence — Design DB writes to handle out-of-order events

  • Comprehensive logging — Log receipt, queuing, processing, and failures

  • Dead letter queues — Capture events that exhaust all retries

Most webhook failures come down to missing one of these. Add them to your integration checklist before going to production.
