
Webhook Best Practices: Retry Logic, Idempotency, and Error Handling

DEV Community · by Henry Hang · April 2, 2026 · 8 min read

Most webhook integrations fail silently. A handler returns 500, the provider retries a few times, then stops. Your system never processed the event and no one knows.

Webhook delivery is not guaranteed by default. How reliably your integration works depends almost entirely on how you write the receiver. This guide covers the patterns that make webhook handlers production-grade: proper retry handling, idempotency, error response codes, and queue-based processing.

Understand the Delivery Model

Before building handlers, understand what you are dealing with:

Providers send webhook events as HTTP POST requests
They expect a 2xx response within a timeout (typically 5-30 seconds)
If they do not receive 2xx, they retry on a schedule (often exponential backoff over hours or days)
Most providers have a maximum retry count after which the event is dropped
Some providers allow you to manually retry from their dashboard

Stripe retry schedule:

    Attempt 1: immediate
    Attempt 2: 5 minutes
    Attempt 3: 30 minutes
    Attempt 4: 2 hours
    Attempt 5: 5 hours
    Attempt 6: 10 hours
    Attempt 7: 24 hours
    ... continues for ~72 hours total

This retry behavior is your safety net -- but only if your handler is idempotent.
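To get a feel for how exponential backoff spreads retries over time, here is a minimal sketch. The function name `backoffSchedule`, the 60-second base delay, and the 24-hour cap are illustrative assumptions, not any provider's actual algorithm.

```javascript
// Hypothetical helper: delay (in seconds) before each retry attempt,
// doubling from baseSeconds and capped at maxSeconds.
function backoffSchedule(attempts, baseSeconds = 60, maxSeconds = 24 * 60 * 60) {
  const delays = [];
  for (let i = 0; i < attempts; i++) {
    delays.push(Math.min(baseSeconds * 2 ** i, maxSeconds));
  }
  return delays;
}

console.log(backoffSchedule(6)); // [ 60, 120, 240, 480, 960, 1920 ]
```

Real providers usually add jitter and use their own intervals, so treat the numbers as a shape, not a contract.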

Rule 1: Respond Fast, Process Async

Your webhook handler should acknowledge receipt immediately and do the actual work in the background. If you do database writes, call external APIs, or send emails synchronously inside the handler, you risk timing out.

    // BAD: synchronous processing risks timeout
    app.post('/webhook/stripe', async (req, res) => {
      const event = JSON.parse(req.body);

      if (event.type === 'payment_intent.succeeded') {
        // This could take several seconds
        await fulfillOrder(event.data.object);
        await sendConfirmationEmail(event.data.object.metadata.email);
        await updateInventory(event.data.object.metadata.items);
      }

      res.json({ received: true }); // might never get here if above throws
    });

    // GOOD: acknowledge immediately, process async
    app.post('/webhook/stripe', async (req, res) => {
      const event = JSON.parse(req.body);

      // Queue the work and respond in milliseconds
      await queue.add('stripe-webhook', { event });

      res.json({ received: true }); // returns 200 as soon as the job is queued
    });

    // Worker processes the queue
    queue.process('stripe-webhook', async (job) => {
      const { event } = job.data;
      if (event.type === 'payment_intent.succeeded') {
        await fulfillOrder(event.data.object);
        await sendConfirmationEmail(event.data.object.metadata.email);
        await updateInventory(event.data.object.metadata.items);
      }
    });

The queue gives you retry logic, failure visibility, and async processing without blocking the HTTP response.
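If you want to see those three properties in isolation, the sketch below is a tiny in-memory stand-in for a job queue. `TinyQueue` is a hypothetical class written for illustration; it loses jobs on restart, which is exactly what a Redis- or database-backed queue avoids.

```javascript
// Minimal in-memory stand-in for a real job queue (illustrative only).
class TinyQueue {
  constructor(handler, maxAttempts = 3) {
    this.handler = handler;
    this.maxAttempts = maxAttempts;
    this.failed = []; // jobs that exhausted every attempt (failure visibility)
  }

  // Schedules the work on a later tick of the event loop, so the caller
  // (the HTTP handler) is never blocked. The returned promise resolves with
  // the final outcome and exists mainly to make testing easier.
  add(data) {
    return new Promise((resolve) => {
      setImmediate(() => this.run(data).then(resolve));
    });
  }

  // Retries the handler up to maxAttempts times.
  async run(data) {
    for (let attempt = 1; attempt <= this.maxAttempts; attempt++) {
      try {
        await this.handler(data, attempt);
        return 'completed';
      } catch (err) {
        if (attempt === this.maxAttempts) {
          this.failed.push({ data, error: err.message });
          return 'failed';
        }
      }
    }
  }
}

const demoQueue = new TinyQueue(async (event, attempt) => {
  if (attempt < 2) throw new Error('transient failure'); // fail once, then succeed
  console.log(`processed ${event.id} on attempt ${attempt}`);
});

demoQueue.add({ id: 'evt_1' }); // logs: processed evt_1 on attempt 2
```

A production queue adds persistence, backoff between attempts, and concurrency limits on top of this basic loop.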

Rule 2: Make Handlers Idempotent

Since providers retry webhooks, your handler may receive the same event multiple times. You must make your handler safe to run more than once with the same event ID.

Without idempotency, a network blip that causes Stripe to retry a payment_intent.succeeded event could fulfill the same order twice, create duplicate records, or send duplicate emails.

Track Processed Event IDs

The simplest approach: store event IDs and skip events you have already processed.

    async function handleStripeEvent(event) {
      // Check if we already processed this event
      const existing = await db.query(
        'SELECT id FROM processed_webhooks WHERE event_id = $1',
        [event.id]
      );

      if (existing.rows.length > 0) {
        console.log(`Skipping duplicate event: ${event.id}`);
        return; // idempotent: no-op on duplicate
      }

      // Process the event
      await processEvent(event);

      // Record that we processed it
      await db.query(
        'INSERT INTO processed_webhooks (event_id, processed_at) VALUES ($1, NOW())',
        [event.id]
      );
    }

Upsert Instead of Insert

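When creating records from webhook data, use an upsert (insert-or-update) instead of a plain insert. The ON CONFLICT form is PostgreSQL syntax and requires a unique constraint on the conflict column (stripe_id here):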

    -- BAD: fails or creates duplicate on retry
    INSERT INTO subscriptions (stripe_id, user_id, status, plan)
    VALUES ($1, $2, $3, $4);

    -- GOOD: idempotent, safe to run multiple times
    INSERT INTO subscriptions (stripe_id, user_id, status, plan)
    VALUES ($1, $2, $3, $4)
    ON CONFLICT (stripe_id)
    DO UPDATE SET
      status = EXCLUDED.status,
      plan = EXCLUDED.plan;

Use Database Transactions with Idempotency Key

For more complex operations, wrap the idempotency check and business logic in a transaction:

    async function handleWebhookIdempotent(eventId, operation) {
      return await db.transaction(async (trx) => {
        // Atomic check-and-insert prevents race conditions on concurrent retries
        const result = await trx.raw(`
          INSERT INTO processed_webhooks (event_id, processed_at)
          VALUES (?, NOW())
          ON CONFLICT (event_id) DO NOTHING
          RETURNING id
        `, [eventId]);

        if (result.rows.length === 0) {
          // Already processed: skip
          return null;
        }

        // Run business logic inside the same transaction
        return await operation(trx);
      });
    }
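The reason the atomic claim matters is easiest to see with two deliveries of the same event arriving at once. The sketch below uses an in-memory Set as a stand-in for the processed_webhooks table; `claimEvent` and `handleOnce` are hypothetical names, and the 10 ms delay simulates real work.

```javascript
// In-memory stand-in for the processed_webhooks table (illustrative only).
const claimed = new Set();

// Atomic claim: the check and the insert happen in one synchronous step,
// mirroring INSERT ... ON CONFLICT DO NOTHING.
function claimEvent(eventId) {
  if (claimed.has(eventId)) return false;
  claimed.add(eventId);
  return true;
}

let sideEffects = 0;

async function handleOnce(eventId) {
  if (!claimEvent(eventId)) return 'duplicate';
  await new Promise((resolve) => setTimeout(resolve, 10)); // simulated work
  sideEffects += 1;
  return 'processed';
}

// Two deliveries of the same event arriving at the same time.
Promise.all([handleOnce('evt_123'), handleOnce('evt_123')]).then((results) => {
  console.log(results, sideEffects); // [ 'processed', 'duplicate' ] 1
});
```

One difference from the database version: if the business logic fails inside a transaction, the claim row rolls back with it, so the provider's retry can try again. The in-memory Set has no such rollback.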

Rule 3: Return the Right HTTP Status Codes

Your response code tells the provider whether to retry. Use it correctly:

Status    Meaning                                     Provider behavior
200-299   Success                                     No retry
400       Bad request (your choice not to process)    Providers usually stop retrying
401/403   Unauthorized                                Providers usually stop retrying
500-503   Your server error                           Provider retries
Timeout   No response in time                         Provider retries

The key distinction: use 5xx when the error is transient (database temporarily down, external API timeout) and 4xx when the error is permanent (invalid payload format, unsupported event type).
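One way to keep that distinction consistent across handlers is to classify errors in a single place. This is a minimal sketch; `PermanentError`, `TransientError`, and `statusForError` are hypothetical names, and real code would map its own failure types onto them.

```javascript
// Hypothetical error classes for the two kinds of failure.
class PermanentError extends Error {}
class TransientError extends Error {}

// Maps an error to the HTTP status the webhook endpoint should return.
function statusForError(err) {
  if (err instanceof PermanentError) return 400; // retrying will not help
  if (err instanceof TransientError) return 503; // ask the provider to retry
  return 500; // unknown failures default to retryable
}

console.log(statusForError(new PermanentError('malformed payload'))); // 400
console.log(statusForError(new TransientError('database unavailable'))); // 503
```

Defaulting unknown errors to 500 errs on the side of a retry, which is the safer choice once the handler is idempotent.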

    app.post('/webhook', async (req, res) => {
      let event;

      // Signature verification failure: return 400, don't want retry
      try {
        event = verifyAndParseWebhook(req.body, req.headers);
      } catch (err) {
        return res.status(400).json({ error: 'Invalid signature' });
      }

      // Unknown event type: return 200, don't retry
      if (!supportedEvents.includes(event.type)) {
        return res.status(200).json({ received: true, skipped: true });
      }

      // Queue for async processing, return 200 fast
      try {
        await queue.add(event);
        return res.status(200).json({ received: true });
      } catch (err) {
        // Queue is down: return 503 so provider retries later
        return res.status(503).json({ error: 'Service unavailable' });
      }
    });
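The verification helper in that handler is left undefined, so here is a rough sketch of what one could look like. It assumes a generic scheme where the provider sends a hex-encoded HMAC-SHA256 of the raw request body in an `x-signature` header; the function name, header name, and secret are all placeholders, and real providers (Stripe included) define their own header format, so prefer the official SDK where one exists.

```javascript
const crypto = require('node:crypto');

// Hypothetical verifier for a generic HMAC-SHA256 scheme (an assumption,
// not any specific provider's format).
function verifySignatureAndParse(rawBody, headers, secret) {
  const expected = crypto.createHmac('sha256', secret).update(rawBody).digest();
  const received = Buffer.from(String(headers['x-signature'] ?? ''), 'hex');

  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (received.length !== expected.length ||
      !crypto.timingSafeEqual(received, expected)) {
    throw new Error('Invalid signature');
  }
  return JSON.parse(rawBody);
}

const secret = 'example_secret'; // placeholder
const body = JSON.stringify({ id: 'evt_1', type: 'ping' });
const signature = crypto.createHmac('sha256', secret).update(body).digest('hex');

const event = verifySignatureAndParse(body, { 'x-signature': signature }, secret);
console.log(event.type); // ping
```

The signature has to be computed over the exact bytes the provider sent, so the endpoint needs access to the raw, unparsed body rather than a re-serialized JSON object.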

Rule 4: Handle Out-of-Order Delivery

Providers do not guarantee that webhooks arrive in the order events occurred. A customer.subscription.updated event might arrive before the customer.subscription.created event for the same subscription.

Design your handlers to work regardless of order:

    async function handleSubscriptionEvent(event) {
      const sub = event.data.object;

      if (event.type === 'customer.subscription.updated') {
        // Don't assume the subscription already exists in your DB.
        // updated_at comes from the event's creation time (event.created, Unix seconds),
        // not NOW(), so a late-arriving older event fails the WHERE check.
        await db.query(`
          INSERT INTO subscriptions (stripe_id, status, plan, updated_at)
          VALUES ($1, $2, $3, to_timestamp($4))
          ON CONFLICT (stripe_id)
          DO UPDATE SET
            status = EXCLUDED.status,
            plan = EXCLUDED.plan,
            updated_at = EXCLUDED.updated_at
          WHERE subscriptions.updated_at < EXCLUDED.updated_at
        `, [sub.id, sub.status, sub.items.data[0].price.id, event.created]);
      }
    }

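The WHERE subscriptions.updated_at < EXCLUDED.updated_at clause handles the case where an older event arrives after a newer one: it will not overwrite newer data with stale data. For the comparison to be meaningful, updated_at must reflect when the event occurred at the provider (for Stripe, the event's created timestamp), not when your handler happened to run.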
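The same "only apply if newer" rule can be sketched without a database. In this illustration a Map stands in for the subscriptions table, and `occurredAt` is assumed to be the provider's event timestamp in Unix seconds; `applySubscriptionUpdate` is a hypothetical name.

```javascript
// In-memory stand-in for the subscriptions table, keyed by subscription ID.
const subscriptions = new Map();

// Applies an update only if it is newer than what is already stored.
function applySubscriptionUpdate({ id, status, occurredAt }) {
  const current = subscriptions.get(id);
  if (current && current.occurredAt >= occurredAt) {
    return false; // stale or duplicate event: keep the newer state
  }
  subscriptions.set(id, { status, occurredAt });
  return true;
}

// The newer event arrives first, then the older one.
applySubscriptionUpdate({ id: 'sub_1', status: 'active', occurredAt: 200 });
applySubscriptionUpdate({ id: 'sub_1', status: 'trialing', occurredAt: 100 });
console.log(subscriptions.get('sub_1').status); // active
```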

Rule 5: Log Everything

Log enough to reconstruct what happened to any webhook event without going back to the provider's dashboard:

    const logger = require('pino')();

    app.post('/webhook', async (req, res) => {
      // The event ID and type are read from the parsed JSON payload
      const eventId = req.body?.id ?? 'unknown';
      const eventType = req.body?.type ?? 'unknown';

      logger.info({ eventId, eventType }, 'Webhook received');

      try {
        await queue.add({ event: req.body });
        logger.info({ eventId, eventType }, 'Webhook queued');
        res.json({ received: true });
      } catch (err) {
        logger.error({ eventId, eventType, err }, 'Failed to queue webhook');
        res.status(503).json({ error: 'Unavailable' });
      }
    });

    // In your queue worker
    queue.process(async (job) => {
      const { event } = job.data;
      logger.info({ eventId: event.id, type: event.type, attempt: job.attemptsMade }, 'Processing webhook');

      try {
        await processEvent(event);
        logger.info({ eventId: event.id }, 'Webhook processed successfully');
      } catch (err) {
        logger.error({ eventId: event.id, err }, 'Webhook processing failed');
        throw err; // let the queue retry
      }
    });

Rule 6: Monitor Webhook Health

Failed webhooks are silent by default. Set up monitoring:

Check provider dashboards — Stripe, GitHub, and Shopify all show webhook delivery history. Check them regularly or set up alerts.
Alert on queue depth — If your webhook queue grows, something is wrong upstream.
Track error rates — Log a counter whenever a webhook handler fails. Alert if the error rate spikes.
Set up dead letter queues — Events that fail after all retries should go to a dead letter queue for manual inspection, not disappear silently.

    // BullMQ dead letter queue example
    const { Queue, Worker } = require('bullmq');

    const connection = { host: 'localhost', port: 6379 }; // assumed local Redis

    // In BullMQ, attempts and backoff are job options, so set them as queue defaults
    const queue = new Queue('webhooks', {
      connection,
      defaultJobOptions: {
        attempts: 5,
        backoff: { type: 'exponential', delay: 1000 },
      },
    });
    const deadLetterQueue = new Queue('webhooks-dead-letter', { connection });

    const worker = new Worker('webhooks', processWebhook, { connection });

    worker.on('failed', (job, err) => {
      if (job && job.attemptsMade >= job.opts.attempts) {
        // Move to dead letter queue
        deadLetterQueue.add('failed-webhook', {
          event: job.data.event,
          error: err.message,
          failedAt: new Date().toISOString(),
        });
      }
    });
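For the error-rate item in the list above, a sliding window over recent outcomes is often enough to start with. The sketch below is illustrative: `ErrorRateMonitor` is a hypothetical class, and the window size and threshold are assumptions you would tune for your own traffic before wiring `shouldAlert()` into a real alerting system.

```javascript
// Sliding-window error-rate tracker (illustrative only).
class ErrorRateMonitor {
  constructor(windowSize = 100, threshold = 0.2) {
    this.windowSize = windowSize;
    this.threshold = threshold;
    this.outcomes = []; // true = success, false = failure
  }

  // Records one handler outcome, dropping the oldest once the window is full.
  record(success) {
    this.outcomes.push(success);
    if (this.outcomes.length > this.windowSize) this.outcomes.shift();
  }

  // Fraction of failures among the outcomes currently in the window.
  errorRate() {
    if (this.outcomes.length === 0) return 0;
    const failures = this.outcomes.filter((ok) => !ok).length;
    return failures / this.outcomes.length;
  }

  shouldAlert() {
    return this.errorRate() > this.threshold;
  }
}

const monitor = new ErrorRateMonitor(10, 0.2);
for (let i = 0; i < 7; i++) monitor.record(true);
for (let i = 0; i < 3; i++) monitor.record(false);
console.log(monitor.errorRate(), monitor.shouldAlert()); // 0.3 true
```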

Testing Webhook Handling with HookCap

HookCap makes it easy to test these patterns before production:

Capture real webhook payloads — Point your provider to a HookCap endpoint to collect real events. Inspect headers, body structure, and signature format.
Test retry handling — Use HookCap's replay feature to send the same event to your handler multiple times. Verify that your idempotency logic prevents duplicate processing.
Test error recovery — Replay a captured event to a handler you deliberately break (return 500). Watch how your queue retries it. Fix the handler and replay again.
Simulate out-of-order delivery — Capture a sequence of related events and replay them in reverse order to verify your handler processes them correctly.

The replay feature is especially useful for idempotency testing: you can replay the same event ID dozens of times and confirm your database shows exactly one processed record each time.

Summary

Production webhook handlers need:

Fast acknowledgment — Return 200 immediately, process async
Idempotency — Track event IDs, use upserts, handle duplicate deliveries
Correct status codes — 5xx for transient errors (retry-worthy), 4xx for permanent errors
Order independence — Design DB writes to handle out-of-order events
Comprehensive logging — Log receipt, queuing, processing, and failures
Dead letter queues — Capture events that exhaust all retries

Most webhook failures come down to missing one of these. Add them to your integration checklist before going to production.

Original source

DEV Community

https://dev.to/henry_hang/webhook-best-practices-retry-logic-idempotency-and-error-handling-27i3
