How to build deterministic agentic AI with state machines in n8n

The current agentic AI hype cycle often pitches the idea of autonomous agents: give an LLM a goal and a set of tools, and let it figure out the rest. While this works well for creative tasks, it can be dangerous for business processes that require strict logic.

For example: if you’re building an automated qualification system for high-ticket sales (like luxury travel, real estate, or insurance), “hallucination” isn’t just a quirky bug; it’s lost revenue. You can’t have an AI agent promising a custom itinerary to a lead with a $500 budget, or forgetting to collect an email address because it got distracted chatting about the weather.

In these high-value environments, we need probabilistic understanding (LLMs interpreting natural language) but deterministic routing (hard-coded logic controlling the flow).

This tutorial explores how to build a robust, state-machine-driven lead qualification system using n8n, a persistent data layer (n8n data tables), and an external CRM (GoHighLevel). We will move beyond stateless webhooks to create a system that “remembers” exactly where a user is in a complex application flow.

The architecture: Why state machines?

Most WhatsApp or SMS automation relies on stateless webhooks. When a user sends a message, your server receives a JSON payload. It doesn’t natively know if this is the user’s first “Hello” or if they are answering question #3 about their budget.

A naive approach involves sending the entire chat history to the OpenAI API and asking, “What should we do next?” This is expensive, slow, and prone to logic breaks (e.g., the user tricks the bot into skipping qualification).

A better architectural pattern is the Finite State Machine (FSM).

In this pattern, the AI is downgraded from “Decision Maker” to “Data Processor.” The application logic is handled by a router that checks a persistent database.

The data flow

Trigger: Webhook receives a message (via Evolution API).
Context Fetch: System queries the database using the sender’s phone number.
State Check: We retrieve the user’s current_state cursor.
Routing: A Switch node directs traffic based only on that state.
Execution: AI extracts data or classifies intent.
Persistence: We update the database with the new state.

Step 1: Designing the persistence layer

For this workflow, we use n8n’s internal data tables to act as our state store. If you are scaling to millions of rows, you would swap this for Supabase or Redis, but the logic remains identical.

We need a schema that tracks the user’s progress and the data collected so far.

The schema:

phone (primary key): The unique identifier
state: The cursor tracking position (e.g., waiting_for_type)
trip_type: The extracted intent (Custom, Resort, Flight)
budget: The integer value of the user’s spending power
is_qualified: Boolean flag based on budget threshold
email: The user’s contact info

When a webhook arrives, we perform an Upsert operation keyed on the phone number. If the phone number doesn’t exist, we create a new row with state: null. If it exists, we return the current row unchanged. This makes row creation idempotent: sending “Hello” twice never creates a duplicate row or resets a user’s progress.
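
The get-or-create behaviour can be sketched in plain JavaScript. This is a minimal illustration, not n8n's data table API: the `leads` Map stands in for the table, and the function name `getOrCreateLead` and the sample phone numbers are assumptions made for the example.

```javascript
// Hypothetical in-memory stand-in for the n8n data table, keyed by phone.
const leads = new Map();

// Returns the existing row for a phone number, or creates one with state: null.
// Calling it twice with the same phone never creates a second row.
function getOrCreateLead(phone) {
  if (!leads.has(phone)) {
    leads.set(phone, {
      phone,
      state: null,
      trip_type: null,
      budget: null,
      is_qualified: null,
      email: null,
    });
  }
  return leads.get(phone);
}

const first = getOrCreateLead("+15550100");
first.state = "waiting_for_type"; // the greeting step would advance the cursor
const second = getOrCreateLead("+15550100"); // a repeated message finds the same row
console.log(second.state, leads.size);
```

The second call returns the row that already carries the advanced state, which is what lets the router pick up where the user left off.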

Step 2: The router (Switch Node)

This is the brain of the operation. Unlike an AI agent, which guesses the next step, the Router knows the next step.

In n8n, we implement this using a Switch Node connected to the output of our data table lookup. We define the following strict routes based on the state column:

Empty / Null: Route to Greeting
waiting_for_type: Route to Trip Classification
waiting_for_budget: Route to Budget Extraction
waiting_for_email: Route to Email Validation
waiting_for_booking_link_sent: Route to Final Confirmation

This creates a linear dependency. A user cannot jump to the email step without passing through the budget check.
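
To make the routing rule concrete, here is a small sketch of the same decision written as a lookup table. The route labels (`trip_classification` and so on) are illustrative names for the Switch node outputs, not values n8n defines.

```javascript
// Illustrative labels for the Switch node outputs, keyed by stored state.
const ROUTES = {
  waiting_for_type: "trip_classification",
  waiting_for_budget: "budget_extraction",
  waiting_for_email: "email_validation",
  waiting_for_booking_link_sent: "final_confirmation",
};

// Picks a route from the stored state only; the message text is never consulted.
function route(state) {
  if (state === null || state === undefined || state === "") {
    return "greeting";
  }
  if (!Object.prototype.hasOwnProperty.call(ROUTES, state)) {
    // Fail loudly instead of guessing a step for an unrecognized state.
    throw new Error(`Unknown state: ${state}`);
  }
  return ROUTES[state];
}

console.log(route(null), route("waiting_for_budget"));
```

Throwing on an unknown state is a design choice for this sketch: a state that some step writes without a matching route shows up as an error rather than silently restarting the conversation.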

Step 3: Constraining AI (classification & extraction)

Now that we have routed the user to the correct step, we invoke the LLM. However, we strictly limit its scope. We don’t want a conversation; we want structured data.

State A: `waiting_for_type` (Classification)

The user has just received the greeting. They might reply, “I want to plan a honeymoon in Bali,” or “Just looking for cheap flights.”

We use an AI Text Classifier node (LangChain integration) with a strict schema. We map vague natural language to our internal Enums:

Input: “We want a fully organized tour of Japan.”
Categories:
- A: Description: “Full, custom-built itinerary”
- B: Description: “Luxury resort or cruise”
- C: Description: “Flight booking or simple hotel”

The Logic:

If the output is A (Custom), we update the data table:

trip_type: A
state: waiting_for_budget

We then send a hard-coded message: “Amazing! Custom itineraries are our specialty. What is the approximate budget per person you’ve set aside?”
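
The update step can be guarded so that only known categories advance the state. The sketch below is an assumption-heavy illustration: the function name `applyClassification` is hypothetical, and it assumes categories B and C move to the same next state as A, which the walkthrough above only spells out for A.

```javascript
// Category letters from the classifier mapped to the labels in our schema.
const TRIP_TYPES = { A: "Custom", B: "Resort", C: "Flight" };

// Returns an updated copy of the row if the category is a known enum value.
// Anything else leaves the row as-is so the user is asked again.
function applyClassification(row, category) {
  if (!Object.prototype.hasOwnProperty.call(TRIP_TYPES, category)) {
    return { row, advanced: false };
  }
  return {
    row: { ...row, trip_type: category, state: "waiting_for_budget" },
    advanced: true,
  };
}

const lead = { phone: "+15550100", state: "waiting_for_type", trip_type: null };
console.log(applyClassification(lead, "A").row.state);
console.log(applyClassification(lead, "Z").row.state);
```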

State B: `waiting_for_budget` (Extraction)

The user is now in the waiting_for_budget state. They reply: “We are thinking around 10k, maybe 12.”

We use an AI Information Extractor node. Crucially, we use a system prompt to force JSON output and handle edge cases (like “k” notation).

System prompt:

You are a data extraction specialist.
Input: User text.
Task: Extract 'budget' as an integer and 'qualified' status.
Qualification Rules:
1. If budget >= 10,000, qualified = true.
2. If budget < 10,000, qualified = false.
3. If user says "10k", output 10000.

Business logic (The guardrails):

Immediately after extraction, we update the data table. We then split the flow using an If Node:

Qualified (True): Update state to waiting_for_email. Ask for contact info to send the booking link.
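Unqualified (False): Update state to waiting_for_email_unqualified. We will still ask for an email, but we will send a “DIY Guide” instead of a meeting link. Because the router matches on exact state values, this state needs its own route in the Switch node.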

This keeps the decision-making logic in our control, avoiding the risk of an AI agent “being nice” and booking a meeting for an unqualified lead.
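
One way to keep the threshold fully under our control is to recompute it in code from the extracted number rather than trusting the model's qualified flag. The following is a sketch under assumptions: the helper names `normalizeBudget` and `nextStateForBudget` are hypothetical, and staying in waiting_for_budget when no single number can be read is a choice made for this example.

```javascript
const QUALIFICATION_THRESHOLD = 10000; // from the qualification rules above

// Normalizes an extracted budget such as 10000, "10,000", "$12,500" or "10k".
// Returns an integer, or null when the value cannot be read as one number.
function normalizeBudget(raw) {
  if (typeof raw === "number") {
    return Number.isFinite(raw) && raw >= 0 ? Math.round(raw) : null;
  }
  if (typeof raw !== "string") return null;
  const cleaned = raw.trim().toLowerCase().replace(/[$,\s]/g, "");
  const match = cleaned.match(/^(\d+(?:\.\d+)?)(k?)$/);
  if (!match) return null;
  return Math.round(parseFloat(match[1]) * (match[2] === "k" ? 1000 : 1));
}

// Decides the next state from the number alone, ignoring any model verdict.
function nextStateForBudget(raw) {
  const budget = normalizeBudget(raw);
  if (budget === null) {
    // Assumption: re-ask instead of guessing when the value is unreadable.
    return { budget: null, is_qualified: null, state: "waiting_for_budget" };
  }
  const is_qualified = budget >= QUALIFICATION_THRESHOLD;
  return {
    budget,
    is_qualified,
    state: is_qualified ? "waiting_for_email" : "waiting_for_email_unqualified",
  };
}

console.log(nextStateForBudget("10k"), nextStateForBudget(500));
```

With this in place the model only has to produce a number; the comparison against the threshold happens in code we can test.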

Step 4: Input sanitization and loops

The most fragile part of any bot is identifying specific strings like emails. Users make typos (e.g., alex@gmail,com). If we rely solely on AI extraction, it might hallucinate a valid email or pass the bad string to our CRM.

In the waiting_for_email state, we implement a validation loop.

The Code node

After the AI extracts the email candidate, we pass it through a JavaScript Code node running a Regex check:

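// n8n Code node (mode: Run Once for All Items)
const email = items[0].json.output?.email;
// Email pattern; every literal dot is escaped so "gmail,com" cannot pass as "gmail.com"
const tester = /^[-!#$%&'*+\/0-9=?A-Z^_a-z{|}~](\.?[-!#$%&'*+\/0-9=?A-Z^_a-z`{|}~])*@[a-zA-Z0-9](-*\.?[a-zA-Z0-9])*\.[a-zA-Z](-?[a-zA-Z0-9])+$/;

// Reject missing values and addresses longer than 254 characters
if (typeof email !== 'string' || email.length > 254) {
  return [{ json: { isValid: false } }];
}

// Return an array of items, which is what the Code node expects in this mode
const valid = tester.test(email);
return [{ json: { isValid: valid, email: email } }];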

The logic loop

We follow this with an If Node checking isValid:

If True: Update the database with the email, push data to the CRM, and advance the state.
If False: Send a re-prompt message: “Oops! It looks like that isn’t a valid email format. Could you check the spelling?”

Crucially, we do NOT update the state in the False path.

This means the user remains in waiting_for_email. Their next message will trigger the webhook, hit the Router, find waiting_for_email, and run the validation logic again. This creates an infinite retry loop until valid data is provided.
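
The retry behaviour comes entirely from which path writes to the database. A minimal sketch of the two paths follows; the function name `handleEmailReply` is hypothetical, `isValid` is the flag produced by the validation step, and the next state shown is the one used for qualified leads.

```javascript
// Applies the If Node logic to a lead row and returns the row to persist.
function handleEmailReply(row, candidate, isValid) {
  if (!isValid) {
    // False path: return the row untouched so the next message retries this step.
    return {
      row,
      reply: "Oops! It looks like that isn’t a valid email format. Could you check the spelling?",
    };
  }
  // True path: store the email and advance the cursor.
  return {
    row: { ...row, email: candidate, state: "waiting_for_booking_link_sent" },
    reply: null,
  };
}

const pending = { phone: "+15550100", state: "waiting_for_email", email: null };
console.log(handleEmailReply(pending, "alex@gmail,com", false).row.state);
console.log(handleEmailReply(pending, "alex@gmail.com", true).row.state);
```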

Step 5: CRM integration & OAuth scopes

Once the local state machine has collected all necessary data (trip_type, budget, email), we sync it to the external system—in this case, GoHighLevel (GHL).

While n8n has a built-in GHL node, advanced custom fields (like Trip Type) require careful API configuration.

The OAuth scope pitfall

When connecting n8n to GHL (or many modern CRMs), the default app permissions often only include standard contact read/write scopes. To sync the data collected by our state machine, you must explicitly request custom field scopes during the handshake.

Ensure your connected app includes:

custom_fields.readonly
custom_fields.write

Without these, your workflow will execute successfully (returning a 200 OK), but the custom fields in the CRM will remain empty, failing silently.
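
Because the failure is silent, it can help to check the granted scopes up front. The sketch below assumes the token response exposes a space-separated `scope` string, as OAuth 2.0 responses commonly do; the helper name `missingScopes` is hypothetical, and the scope strings are copied from the list above, so confirm the exact names in the CRM's documentation.

```javascript
// Scope names as listed above; verify the exact strings for your CRM.
const REQUIRED_SCOPES = ["custom_fields.readonly", "custom_fields.write"];

// Returns the required scopes that are absent from a space-separated scope string.
function missingScopes(grantedScopes, required = REQUIRED_SCOPES) {
  const granted = new Set(String(grantedScopes).split(/\s+/).filter(Boolean));
  return required.filter((scope) => !granted.has(scope));
}

console.log(missingScopes("contacts.readonly contacts.write"));
```

An empty result means the custom field scopes were granted; a non-empty one is a reason to stop before the sync step rather than discover empty fields later.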

Step 6: The async confirmation

The final state, waiting_for_booking_link_sent, handles an asynchronous dependency.

The booking link is generated by the CRM automation, not inside n8n:

n8n pushes the contact to GHL
n8n updates the local state to waiting_for_booking_link_sent
The workflow ends

We do not wait for the booking link inside this execution. Instead, we rely on a callback.

Inside GHL, an automation triggers when the Qualified tag is added. It generates the calendar link and fires a Webhook back to n8n.

n8n receives this new webhook, uses the phone number to find the user in our data table, verifies they are in the waiting_for_booking_link_sent state, and delivers the final WhatsApp message:

“Done! The booking link is on its way to your email, but you can also find it here: [Link]”
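
The state check on the callback is what stops a stray or replayed webhook from messaging the wrong user. A minimal sketch follows; the function name `handleBookingCallback` and the payload fields `phone` and `link` are hypothetical, and the Map stands in for the data table.

```javascript
// Handles the callback webhook from the CRM automation.
// `leads` is a Map keyed by phone number, standing in for the data table.
function handleBookingCallback(leads, payload) {
  const row = leads.get(payload.phone);
  if (!row || row.state !== "waiting_for_booking_link_sent") {
    // Unknown user or wrong step: do not send anything.
    return { send: false, reason: "unexpected_state" };
  }
  return {
    send: true,
    message: `Done! The booking link is on its way to your email, but you can also find it here: ${payload.link}`,
  };
}

const table = new Map([
  ["+15550100", { phone: "+15550100", state: "waiting_for_booking_link_sent" }],
  ["+15550101", { phone: "+15550101", state: "waiting_for_budget" }],
]);
console.log(handleBookingCallback(table, { phone: "+15550100", link: "https://example.com/book" }));
```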

Conclusion

By treating LLMs as functional components within a rigid architecture rather than autonomous agents, we gain:

Determinism: We know exactly what the bot will do in every state.
Resilience: If the API crashes, the state is saved in the DB. The user can pick up exactly where they left off.
Data integrity: Regex and validation loops ensure no garbage data enters the CRM.

This architecture turns a chatbot into a reliable engineering system, suitable for any high-ticket or regulated industry where precision matters more than conversation.

The post How to build deterministic agentic AI with state machines in n8n appeared first on LogRocket Blog.
