In a perfect world, every API request gets a 200 OK, every network connection is flawless, and every service is available 100% of the time. But we build software for the real world—a world of transient network hiccups, temporary service overloads, and unexpected blips that can derail a critical process. For agentic workflows that execute business logic, a single failed step can bring the entire operation to a halt.
This is where a smart failure handling strategy becomes non-negotiable. Simply hoping for the best isn't an option. The most effective first line of defense against these temporary, or transient, failures is a well-configured retry policy.
At action.do, we believe that building robust, reliable workflows shouldn't require you to write complex, boilerplate error-handling logic for every step. That's why we've built intelligent retry mechanisms directly into the platform, allowing you to make your atomic actions resilient by default.
Agentic workflows are composed of a series of individual tasks or atomic actions. This could be anything from sending a welcome email, to generating an invoice, to updating a record in your CRM. While atomicity ensures that a single action either succeeds or fails completely, it doesn't prevent the failure itself.
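Consider a user onboarding workflow with three steps: create the user account, provision a license through an external API call, and send a welcome email.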
If the API call in step 2 fails due to a momentary network issue, the entire workflow stops. The user has an account but no license and no welcome email. The process is left in an inconsistent state.
However, many of these failures are temporary. The external API might just be restarting a server or experiencing a brief traffic spike. Retrying the action after a short delay is often all that's needed to get the workflow back on track. This simple act of retrying transforms a fragile process into a resilient and self-healing one.
Implementing a good retry strategy involves more than just a for loop. You need to consider backoff delays (waiting longer between attempts) and jitter (randomizing the delay) to avoid overwhelming a struggling service.
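To make the backoff and jitter ideas concrete, here is a minimal sketch of a hand-rolled retry loop. It is not how action.do implements retries; `retryWithBackoff`, `RetryOptions`, and the injectable `sleep` are hypothetical names used only for illustration, and the doubling factor and "full jitter" strategy are assumptions.

```typescript
// Hypothetical helper for illustration; not part of the action.do SDK.
interface RetryOptions {
  attempts: number;    // total tries, including the first one
  baseDelayMs: number; // wait before the first retry
  jitter: boolean;     // randomize each wait to spread out retries
  // Injectable so tests can skip real waiting; defaults to setTimeout.
  sleep?: (ms: number) => Promise<void>;
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

async function retryWithBackoff<T>(
  operation: () => Promise<T>,
  options: RetryOptions
): Promise<T> {
  const sleep = options.sleep ?? defaultSleep;
  let lastError: unknown;

  for (let attempt = 1; attempt <= options.attempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      if (attempt === options.attempts) break; // no wait after the final try

      // Exponential backoff: base, 2x base, 4x base, ...
      const ceiling = options.baseDelayMs * 2 ** (attempt - 1);
      // "Full jitter": pick a random wait between 0 and the ceiling.
      const wait = options.jitter ? Math.random() * ceiling : ceiling;
      await sleep(wait);
    }
  }
  // Every attempt failed; surface the last error to the caller.
  throw lastError;
}
```

Even this small version has to decide when to stop, how long to wait, and how to report the final error, and it would need to be repeated or shared across every step of a workflow.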
action.do simplifies this by allowing you to declare your retry policy directly when you execute an action. We handle the complex orchestration of waiting, retrying, and logging for you.
Let's see it in action. Building on the basic execution example, here’s how you can add a robust retry policy to an atomic action:
import { Do } from '@do-sdk/core';

// Initialize the .do client with your API key
const a = new Do(process.env.DO_API_KEY);

// Execute an atomic action with a custom retry policy
const { result, error } = await a.action.execute({
  name: 'provision-user-license',
  params: {
    userId: 'usr_12345',
    plan: 'premium'
  },
  // NEW: Configure the retry policy for this action
  retry: {
    attempts: 5,            // Try up to 5 times (1 initial + 4 retries)
    backoff: 'exponential', // Use exponential backoff to increase delay
    delay: 1000,            // Start with a 1-second delay
    jitter: true            // Add randomness to prevent thundering herd issues
  }
});

if (error) {
  // This block only runs if the action has failed after all 5 attempts
  console.error('Action failed permanently after all retries:', error);
  // Here you can trigger a fallback or alert a human
} else {
  console.log('Action Succeeded:', result);
}
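Let's break down the retry configuration:
- attempts: the maximum number of times the action runs, counting the first try. A value of 5 means one initial attempt plus up to four retries.
- backoff: how the wait between attempts grows. With 'exponential', each wait is longer than the one before, giving a struggling service more time to recover.
- delay: the starting delay in milliseconds. Here the first retry waits about one second.
- jitter: when true, adds randomness to each delay so that many clients retrying at once don't all hit the service at the same moment (the thundering herd problem).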
With this configuration in place, the action retries automatically when it hits a transient error, and a brief outage in one step no longer takes down the entire workflow.
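To get a feel for the timing such a policy implies, the sketch below computes the wait before each retry for the configuration above. It assumes the delay doubles on every retry; the post does not state the exact multiplier the platform uses, so treat the numbers as illustrative. `RetryPolicy`, `backoffSchedule`, and the `'fixed'` backoff value are hypothetical and exist only for this example.

```typescript
// Illustrative only: mirrors the shape of the retry option above, but the
// doubling factor and the 'fixed' mode are assumptions, not platform behavior.
interface RetryPolicy {
  attempts: number;                 // total tries, including the first
  backoff: 'exponential' | 'fixed';
  delay: number;                    // starting delay in milliseconds
}

// Returns the un-jittered wait before each retry (attempts - 1 entries).
function backoffSchedule(policy: RetryPolicy): number[] {
  const schedule: number[] = [];
  for (let retry = 0; retry < policy.attempts - 1; retry++) {
    schedule.push(
      policy.backoff === 'exponential' ? policy.delay * 2 ** retry : policy.delay
    );
  }
  return schedule;
}

const waits = backoffSchedule({ attempts: 5, backoff: 'exponential', delay: 1000 });
const totalWaitMs = waits.reduce((sum, ms) => sum + ms, 0);

console.log(waits);       // [ 1000, 2000, 4000, 8000 ]
console.log(totalWaitMs); // 15000
```

Under these assumptions, five attempts spread over roughly 15 seconds of waiting before the action is reported as permanently failed. With jitter enabled, each individual wait would vary around these values.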
A crucial concept that makes retries safe is idempotency. An idempotent action can be executed multiple times with the same inputs and produce the same result as if it were executed only once.
This is a core principle for all actions built on the .do platform. Because actions are designed to be idempotent, you can enable retries without fear of dangerous side effects, such as sending the same welcome email twice or generating a duplicate invoice.
action.do is designed to help you build idempotent actions, giving you the confidence to apply retry policies aggressively and build truly fault-tolerant systems.
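One common way to make an action idempotent is to key each execution and remember the result. The sketch below shows that pattern with an in-memory map. It is an assumption about how such a guard could look, not a description of action.do internals; `IdempotencyStore`, `runOnce`, and the key format are hypothetical.

```typescript
// Hypothetical in-memory sketch; a real system would persist keys durably.
class IdempotencyStore {
  private readonly results = new Map<string, unknown>();

  // Runs `action` at most once per key; later calls return the stored result.
  // If `action` throws, nothing is stored, so a retry will run it again.
  runOnce<T>(key: string, action: () => T): T {
    if (this.results.has(key)) {
      return this.results.get(key) as T;
    }
    const outcome = action();
    this.results.set(key, outcome);
    return outcome;
  }
}

const store = new IdempotencyStore();
let licensesIssued = 0;

const provision = () => {
  licensesIssued++; // the side effect we must not repeat
  return { userId: 'usr_12345', plan: 'premium' };
};

// A retry reuses the same key, so the side effect happens only once.
const first = store.runOnce('provision-user-license:usr_12345', provision);
const second = store.runOnce('provision-user-license:usr_12345', provision);

console.log(licensesIssued);   // 1
console.log(first === second); // true
```

The key point is that the second call, which stands in for a retry, returns the stored result instead of issuing a second license.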
Finally, a retry policy is only as good as your ability to observe it. action.do doesn't just retry in the dark. Every single attempt—whether it succeeds or fails—is captured in an immutable audit log.
This gives you complete visibility into the health of your workflows. You can see exactly when an action failed, how many times it was retried, and when it ultimately succeeded. This detailed record is invaluable for debugging, monitoring, and proving the reliability of your Business-as-Code.
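As a rough illustration of what per-attempt records make possible, the sketch below summarizes a small list of attempts for one action. The record shape is hypothetical; the post does not show the actual audit log schema, so `AttemptRecord`, `AttemptSummary`, and `summarizeAttempts` are assumptions made for this example, as are the sample timestamps.

```typescript
// Hypothetical record shape; the real audit log schema is not shown in this post.
interface AttemptRecord {
  readonly action: string;
  readonly attempt: number; // 1-based attempt number
  readonly status: 'succeeded' | 'failed';
  readonly timestamp: string; // ISO 8601
}

interface AttemptSummary {
  totalAttempts: number;
  retries: number;
  succeeded: boolean;
  firstFailureAt: string | undefined;
  succeededAt: string | undefined;
}

function summarizeAttempts(records: readonly AttemptRecord[]): AttemptSummary {
  // Copy before sorting so the input log is never mutated.
  const ordered = [...records].sort((x, y) => x.attempt - y.attempt);
  const firstFailure = ordered.find((r) => r.status === 'failed');
  const success = ordered.find((r) => r.status === 'succeeded');
  return {
    totalAttempts: ordered.length,
    retries: Math.max(0, ordered.length - 1),
    succeeded: success !== undefined,
    firstFailureAt: firstFailure?.timestamp,
    succeededAt: success?.timestamp,
  };
}

// Made-up sample data for the example.
const attemptLog: AttemptRecord[] = [
  { action: 'provision-user-license', attempt: 1, status: 'failed', timestamp: '2024-01-01T10:00:00Z' },
  { action: 'provision-user-license', attempt: 2, status: 'failed', timestamp: '2024-01-01T10:00:01Z' },
  { action: 'provision-user-license', attempt: 3, status: 'succeeded', timestamp: '2024-01-01T10:00:03Z' },
];

const summary = summarizeAttempts(attemptLog);
console.log(summary.totalAttempts, summary.retries, summary.succeeded); // 3 2 true
```

A summary like this answers the three questions above directly: when the action first failed, how many retries it took, and when it finally succeeded.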
Stop letting transient errors derail your critical business processes. By combining atomic actions, idempotent design, and declarative retry policies, action.do provides the fundamental building blocks for unstoppable agentic workflows.