Idempotency for Carrier APIs: Preventing Duplicates During Outages and Retries

Koen M. Vermeulen

10 Sep 2025 — 7 min read

Your carrier experiences an outage. You're left unable to calculate shipping rates and customers aren't impressed. Label generation duplicates when your retry logic fires twice. Rate shopping fails when your cache logic stores partial responses. Welcome to the carrier integration reliability challenge.

Standard idempotency patterns weren't designed for the specific chaos of carrier APIs. You're dealing with services that bill per transaction, partial failures that look successful, and webhook deliveries that arrive out of order. In 2024, 153 carrier outages occurred across major providers, making robust idempotency not just good engineering practice but business survival.

The Carrier Integration Idempotency Challenge

Most middleware vendors approach idempotency like a standard web API problem. Hash the request, store the response, call it done. But carrier integrations break this model in three specific ways.

First, carriers charge for every API call. A duplicate UPS label creation costs you $0.25 plus the actual shipping cost. At scale, poor idempotency design becomes an expensive lesson. All the marketing dollars spent to get customers to make it to the checkout page go to waste if your page fails to load shipping rates. Even an hour of downtime can result in significant revenue loss during peak season.

Second, carrier responses often contain partial success states. FedEx might return a tracking number but fail to generate the PDF label. USPS could create the shipment but timeout before confirming pickup scheduling. Your idempotency layer needs to understand these partial states, not just HTTP status codes.

Third, carrier webhooks don't respect your neat ordering assumptions. A "shipment delivered" event might arrive before "shipment in transit." Package status updates can come in batches, hours later, with duplicate IDs but different timestamps. Standard timestamp-based deduplication fails spectacularly here.

EasyPost handles this through outage detection processes that prevent their APIs from waiting for carrier timeouts. In the event of a high volume of requests without outage detection in place, processing those timeout responses would pose a significant risk to our system. For this reason, when outages are detected, we update API responses to send back immediate error messages indicating the outage. ShipStation, nShift, and Cargoson each have different approaches to this problem, but all recognise that carrier-specific idempotency needs go beyond generic patterns.

Failure Modes That Demand Idempotency

Carrier API failures cluster into patterns you can design around. Like any business, website, or service, carriers like UPS, USPS and FedEx are not immune to issues or the need for maintenance. During an outage, no one can access rates from a carrier. This includes both retail customers, and third-party services like ShipperHQ.

Network timeouts during label creation create the classic "did it work?" scenario. Your HTTP client times out after 30 seconds, but UPS actually processed the request and created the label. Your retry logic fires, creating a second label and doubling your shipping costs. This isn't theoretical: it happens during peak season when carrier APIs are under load.

Partial carrier responses are more insidious. DHL returns HTTP 200 with a shipment ID but includes an error flag buried in the JSON response indicating the pickup request failed. Your code sees success, stores the result, then retries the pickup creation. DHL now has two pickup requests for the same shipment.

Webhook delivery failures compound the problem. Carrier status updates arrive, your endpoint is down for deployment, the webhook gets queued for retry. Meanwhile, your polling job fetches the same status update. Now you have duplicate "delivered" events triggering duplicate customer notifications.

We are currently seeing increased errors with label generation appears regularly in status pages across shipping APIs. When these incidents resolve, the queued retry storms can overwhelm your idempotency systems if they're designed around web app patterns rather than carrier-specific failure modes.

Idempotency Key Design for Multi-Carrier Systems

The key insight: carrier integration idempotency keys need to be semantic, not just unique. Prevent duplicate requests by allowing the Consumer of a Service to send a value that represents the uniqueness of a request, so that no request with the same unique value is attempted more than once.

For label generation, your key should encode the shipper, recipient addresses, package weight, and carrier service. Hash these together with your tenant ID. This prevents creating duplicate labels for identical shipments even if the original request IDs differ. But include a time window: the same shipment requested a week later should generate a new label.

Rate shopping requires different key construction. Include the origin-destination pair, package details, and requested service types. But exclude timestamps shorter than your cache TTL. You want identical rate requests within a 5-minute window to return cached results, not trigger new carrier API calls.

Shippo's approach balances carrier variability with consistent key generation across their platform. ProShip focuses on enterprise routing complexity. Manhattan Active handles multi-tenant key namespacing for large retailers. Cargoson specialises in European carrier peculiarities where address validation rules differ significantly between countries.

Rate Shopping Idempotency Windows

Rate shopping presents a unique idempotency challenge: you want to avoid duplicate API calls within minutes, but rates can legitimately change throughout the day due to fuel surcharges or carrier pricing updates.

Design your keys with cascading TTLs. Short-term deduplication (5 minutes) uses request content hash. Medium-term caching (2 hours) includes date but excludes time. Long-term prevention (24 hours) adds carrier-specific pricing cycle indicators.

When a carrier API fails during rate shopping, your fallback logic needs idempotency awareness. Don't just return cached rates; check whether the cache entry was generated during a known carrier outage. Mark those entries differently and expire them when the carrier recovers.

Label Generation: The Critical Path

Label creation is the highest-stakes idempotency requirement in carrier integrations. For example, charging a customer money cannot be cleanly rolled back once completed. The same applies to shipping labels: once created, most carriers charge you whether you use them or not.

Pre-generation validation becomes crucial. Before calling the carrier API, store your idempotency key with a "PENDING" status. If the same key arrives while one is pending, return a 409 status. Only after successful carrier response do you update to "COMPLETED".

Handle carrier confirmation correlation carefully. Some carriers return a pre-allocated tracking number immediately but generate the actual PDF asynchronously. Your idempotency logic needs to track both the API response and the eventual label file generation as separate completion states.

Webhook Delivery Idempotency at Scale

Implement a queue-based system for handling webhook calls. Instead of processing webhook events synchronously, push them to a message queue (like RabbitMQ or AWS SQS) where they are processed asynchronously. This approach decouples the webhook delivery from event processing.

Webhook idempotency differs from request idempotency because you're the consumer, not the producer. You can't control retry timing or duplicate detection at the source. Your defence is processing-time deduplication based on event characteristics.

Event ordering challenges multiply the complexity. FedEx might send "shipment created," "in transit," and "delivered" webhooks, but they arrive at your endpoint in reverse order due to retry logic. Your idempotency key must account for event type hierarchy, not just timestamp ordering.

Duplicate webhook handling requires understanding carrier-specific ID schemes. UPS tracking numbers are consistent, but their internal event IDs can repeat across different shipments. USPS event IDs are unique globally, but their webhooks can arrive hours after the event timestamp due to internal processing delays.

Message queue patterns help manage this chaos. Webhooks are usually delivered at-least-once. You will get duplicates. Push incoming webhooks to a queue with deduplication at the queue level, not the processing level. Include carrier-specific event signatures in your deduplication key: tracking number, event type, carrier timestamp, and tenant ID.

Convoy handles this through sophisticated webhook buffering with carrier-aware deduplication windows. ClickPost uses Redis-based deduplication with configurable expiry per carrier. Cargoson implements event ordering reconstruction for European carriers that send webhook batches with internal sequencing.

Implementation Patterns with Code Examples

Multi-tenant key storage requires careful database design. Create an `idempotency_keys` table with composite indexes on `(tenant_id, key_hash)` and `(tenant_id, created_at)` for efficient cleanup queries.

CREATE TABLE idempotency_keys ( id BIGSERIAL PRIMARY KEY, tenant_id UUID NOT NULL, key_hash VARCHAR(64) NOT NULL, operation_type VARCHAR(32) NOT NULL, status VARCHAR(16) NOT NULL DEFAULT 'PENDING', request_data JSONB, response_data JSONB, created_at TIMESTAMP DEFAULT NOW(), updated_at TIMESTAMP DEFAULT NOW(), UNIQUE(tenant_id, key_hash) );

Redis patterns work well for high-throughput scenarios. Use Redis Sets for duplicate detection with automatic expiry. Structure your keys hierarchically: `idempotency:{tenant_id}:{operation}:{hash}`.

Error recovery flows need special handling. When a carrier API returns a 5xx error, don't store that as a completed idempotent operation. Mark it as failed but retryable. Only 4xx errors (except 429 rate limiting) should prevent retries with the same idempotency key.

Monitoring approaches should track key collision rates, cleanup efficiency, and carrier-specific failure patterns. High collision rates might indicate clock drift between services or poorly designed key algorithms. Cleanup backlogs suggest database performance issues or incorrect retention policies.

SLO Math: Measuring Idempotency Effectiveness

Quantify your idempotency implementation through business metrics, not just technical ones. Track cost avoidance: how many duplicate label charges did your system prevent? Monitor customer impact: how many duplicate notification emails were suppressed?

153 carrier outages occurred—but Shippo's API maintained 99.95% uptime in 2024, showing how proper idempotency design contributes to overall reliability metrics. Your SLOs should include duplicate prevention rates alongside traditional uptime measures.

Recovery time objectives become more complex with idempotency. It's not just "how fast can we process requests again" but "how fast can we safely retry failed operations without creating duplicates." Include idempotency cleanup time in your RTO calculations.

Cost avoidance metrics matter for business case justification. Calculate monthly savings from prevented duplicate label charges, avoided webhook processing costs, and reduced customer service overhead from duplicate notifications. EasyPost's published reliability metrics show how this translates to customer retention. ShipEngine's status page demonstrates transparency around outage impact. Cargoson's approach focuses on European compliance requirements that make duplicate prevention legally mandated, not just technically desirable.

Edge Cases and Gotchas

Clock drift across services creates subtle idempotency failures. Your application server thinks it's 14:30:15, but your database server thinks it's 14:30:17. Time-based idempotency windows become unpredictable. Use logical timestamps or sequence numbers where possible.

Partial carrier API responses break naive idempotency logic. Royal Mail might return HTTP 200 with a JSON response containing both successful tracking number assignment and a nested error object indicating address validation warnings. Store the full response structure, not just the HTTP status, when determining completion state.

Cross-tenant key collisions become possible with certain hash algorithms. Two tenants creating shipments with identical content might generate the same idempotency key if you're not including tenant context in the hash input. Always prefix or incorporate tenant ID in key generation.

Even a minor downtime can significantly impact businesses and customers during peak season traffic spikes. Your idempotency systems need to handle 10x normal volume without degradation. Pre-allocate database connections, use Redis clustering, and implement circuit breakers around idempotency key storage.

Carrier maintenance windows often aren't announced with enough notice for your systems to prepare. Sometimes, carriers need to take their API offline to perform updates or maintenance. Typically, carriers will let you know about these planned outages ahead of time so you can prepare. But "ahead of time" might be 2 hours for a system that needs 6 hours to properly drain processing queues and update cache invalidation rules.

The key insight: build idempotency systems that assume carrier chaos, not ideal conditions. Your patterns should work when carriers are flaky, documentation is wrong, and webhooks arrive in impossible orders. That's not pessimistic engineering; that's realistic carrier integration.

Idempotency for Carrier APIs: Preventing Duplicates During Outages and Retries

Koen M. Vermeulen

The Carrier Integration Idempotency Challenge

Failure Modes That Demand Idempotency

Idempotency Key Design for Multi-Carrier Systems

Rate Shopping Idempotency Windows

Label Generation: The Critical Path

Webhook Delivery Idempotency at Scale

Implementation Patterns with Code Examples

SLO Math: Measuring Idempotency Effectiveness

Edge Cases and Gotchas

Read more

API Deprecation Early Warning Systems for Carrier Integration: Detecting Breaking Changes Before They Break Shipments

Webhook Retry Patterns for Carrier Integration: Building Resilient Event Processing at Scale

Production-Ready API Monitoring for Carrier Integration: Detecting Version Changes and Outages Before They Break Shipments

Taming OpenTelemetry Complexity in Carrier Integration: Production Patterns for Managing Data Volumes Without Breaking the Budget