Custom Datasets

Custom Datasets are named, workspace-scoped tables that agents can read from and write to across runs. They are the persistent memory layer for AI Agents — the place where a run stores what it computed, checked, or decided so the next run can pick up where the last one left off.

Why agents need persistent memory

AI Agents are stateless by design. When a run finishes, the variables it computed — query results, LLM summaries, intermediate values — are discarded. The next run starts completely fresh.

For simple fire-and-forget automations this is fine. But real CSM workflows are inherently stateful:

- Deduplication: "Has this account already received a pricing-concern email this month?" A run with no memory has no way to know. It will send the same email on every trigger.
- Incremental processing: "Process only tickets that arrived since the last run." Without a stored cursor, the agent re-processes every ticket every time, wasting LLM calls and producing duplicate output.
- State accumulation: "Build a risk profile for this account that grows richer each week." Each run needs to read what prior runs recorded and add to it rather than starting from nothing.
- Cross-agent coordination: One agent produces a health classification; another agent later queries that classification to decide whether to send an alert. Without a shared store, the two runs cannot communicate.
- Audit and traceability: Storing the outcome of each run (which accounts were processed, what the LLM decided, when the last outreach was sent) gives you a queryable history that informs dashboards and downstream agents.

Custom Datasets solve all of these by giving agents a durable, queryable store that survives across runs.

Dataset types

Type	Written by	Read by	Best for
Records	AI Agents (at runtime)	Agents + SQL queries	Persistent agent state, deduplication keys, accumulated results
Query	External data connection (via SQL)	Agents + SQL queries	Enriching agents with external CRM, warehouse, or product data

Records datasets

A Records dataset is a key-value store. Each record has a string key and a JSON payload (the record object). Agents create and update records using CALL steps; records persist indefinitely until explicitly deleted.

The record field uses merge semantics: upserting a record only overwrites the keys you provide, leaving all other keys intact. This lets different agent steps update different fields of the same record without collisions.

key: "acct_0015XY"
record: {
  "last_email_sent":    "2026-04-10",
  "email_count":        3,
  "risk_classification": "high",
  "last_processed_at":  "2026-05-03T08:00:00Z"
}
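The merge behaviour can be modelled in a few lines of Python. This is only a sketch of the semantics described above, not the platform's implementation: `merge_upsert` is a hypothetical helper, the dict stands in for the dataset, and the sketch assumes the merge applies to top-level keys of the record.

```python
from typing import Any

def merge_upsert(store: dict, key: str, fields: dict) -> dict:
    """Create the record if it is missing; otherwise overwrite only the supplied keys."""
    record = store.setdefault(key, {})
    record.update(fields)  # keys not present in `fields` are left untouched
    return record

store: dict[str, dict[str, Any]] = {}

# Two different steps update different fields of the same record.
merge_upsert(store, "acct_0015XY", {"email_count": 3, "risk_classification": "high"})
merge_upsert(store, "acct_0015XY", {"last_email_sent": "2026-04-10"})
```

After both calls the record holds all three fields, because the second upsert did not mention, and therefore did not touch, the first two.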

Query datasets

A Query dataset is a view over an external data source — a SQL query against a configured Data Connection (Snowflake, BigQuery, HubSpot, etc.). The dataset does not store its own rows; it executes the query at read time.

Query datasets are useful for enriching agent runs with data that lives outside FunnelStory — for example, pulling product usage metrics from a warehouse or open opportunities from a CRM.
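To illustrate what "executes the query at read time" implies, the following sketch uses an in-memory SQLite database as a stand-in for an external Data Connection. The `QueryDataset` class and the `usage` table are hypothetical and exist only for this example; the point is that the dataset holds SQL text rather than rows, so each read reflects the current state of the source.

```python
import sqlite3

class QueryDataset:
    """Hypothetical stand-in for a Query dataset: it stores SQL, not rows."""

    def __init__(self, conn: sqlite3.Connection, sql: str) -> None:
        self.conn = conn
        self.sql = sql

    def read(self) -> list:
        # The query runs on every read, so results track the source data.
        return self.conn.execute(self.sql).fetchall()

conn = sqlite3.connect(":memory:")  # stands in for a warehouse or CRM connection
conn.execute("CREATE TABLE usage (account_id TEXT, events INTEGER)")
conn.execute("INSERT INTO usage VALUES ('acct_1', 10)")

dataset = QueryDataset(conn, "SELECT account_id, events FROM usage ORDER BY account_id")
first_read = dataset.read()

conn.execute("INSERT INTO usage VALUES ('acct_2', 25)")  # the source changes
second_read = dataset.read()                             # the new row shows up
```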

How agents interact with datasets

Writing to a dataset (CALL steps)

Agents write records using three CALL step function IDs:

`dataset.record.upsert`

Creates a record or merges new fields into an existing one.

{
  "function_id": "dataset.record.upsert",
  "args": {
    "dataset": "renewal_outreach",
    "key": "{{ $.account_id }}",
    "record": {
      "last_sent_at": "{{ $.now }}",
      "subject":      "{{ $.email_subject }}",
      "sent_by":      "renewal-agent"
    }
  }
}

If a record for this key already exists, only last_sent_at, subject, and sent_by are updated — all other fields are preserved.

`dataset.record.set_field`

Updates a single field without touching any others. Useful for updating counters (with the new value computed in an earlier step) or toggling flags:

{
  "function_id": "dataset.record.set_field",
  "args": {
    "dataset": "renewal_outreach",
    "key":     "{{ $.account_id }}",
    "field":   "email_count",
    "value":   "{{ $.new_count }}"
  }
}
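Since the call above receives a finished value (`$.new_count`), the increment itself has to happen in an earlier step. A rough Python model of that read, compute, write sequence is shown below. `set_field` and `next_count` are hypothetical helpers, and creating a missing record on write is an assumption of this sketch rather than documented behaviour.

```python
from typing import Optional

def set_field(store: dict, key: str, field: str, value) -> None:
    """Model of a single-field update: every other field is left as it was."""
    store.setdefault(key, {})[field] = value

def next_count(record: Optional[dict]) -> int:
    """Compute the incremented counter; a missing record or field counts as zero."""
    return int((record or {}).get("email_count", 0)) + 1

store = {"acct_1": {"email_count": 3, "subject": "Renewal"}}

new_count = next_count(store.get("acct_1"))           # read and compute
set_field(store, "acct_1", "email_count", new_count)  # write one field
```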

`dataset.record.delete`

Removes a record entirely by key. Use this to reset state after a terminal event (for example, after a renewal closes, remove the account's outreach tracking record):

{
  "function_id": "dataset.record.delete",
  "args": {
    "dataset": "renewal_outreach",
    "key":     "{{ $.account_id }}"
  }
}

Reading from a dataset (SQL queries)

Agents read datasets inside any SQL step using the dataset_records('dataset_name') virtual table. The dataset name is a required filter — queries that omit it are rejected.

-- Fetch this account's last outreach timestamp (compare it to the 30-day window in a later step)
SELECT key, record->>'last_sent_at' AS last_sent_at
FROM   dataset_records('renewal_outreach')
WHERE  key = '{{ $.account_id }}'

Because dataset_records behaves like a regular table in semantic SQL, you can join it against other tables:

-- Find accounts that have a high health score but no outreach record yet
SELECT a.account_id, a.name, a.health_score
FROM   accounts a
LEFT JOIN dataset_records('renewal_outreach') dr ON dr.key = a.account_id
WHERE  a.health_score > 80
AND    dr.key IS NULL

This LEFT JOIN pattern is the canonical way to implement incremental processing — process only rows that are not yet present in the dataset.
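The anti-join can be tried locally with SQLite. In this sketch an ordinary table named `outreach` stands in for `dataset_records('renewal_outreach')`, which is specific to the platform's semantic SQL; the table contents are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (account_id TEXT, name TEXT, health_score INTEGER);
    CREATE TABLE outreach (key TEXT PRIMARY KEY, record TEXT);
    INSERT INTO accounts VALUES
        ('a1', 'Acme',    92),
        ('a2', 'Globex',  85),
        ('a3', 'Initech', 40);
    -- a1 already has an outreach record, so it must not be selected again
    INSERT INTO outreach VALUES ('a1', '{"last_sent_at": "2026-04-10"}');
""")

# Keep only high-health accounts with no matching row on the right side.
pending = conn.execute("""
    SELECT a.account_id, a.name
    FROM   accounts a
    LEFT JOIN outreach dr ON dr.key = a.account_id
    WHERE  a.health_score > 80
    AND    dr.key IS NULL
    ORDER BY a.account_id
""").fetchall()
```

Only `a2` qualifies: `a1` is filtered out by the existing outreach row and `a3` by the health score threshold.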

Creating and managing datasets

Datasets are configured in Settings → Datasets (/configure/datasets).

Create a dataset

1. Click + Create dataset.
2. Enter a Name. This is the identifier used in agent CALL steps and dataset_records() queries. Names are immutable after creation.
3. Optionally add a Description to explain the dataset's purpose.
4. Choose a Source type:
   - Records: agent-writable key-value store (no SQL query needed).
   - Query: SQL-backed view; select a Data Connection and enter the SQL query.
5. Click Save.

Browse and edit records

For a Records dataset, click View records on the dataset list or inside the dataset's settings page. The records table shows every key and its JSON payload. You can sort and filter to inspect what agents have stored.

Delete a dataset

Deleting a dataset removes its configuration and all stored records permanently. Any agent CALL step that references it will start failing immediately.

Common patterns

Pattern 1 — Deduplication gate

Prevent an action from running more than once per account per time window.

1. CALL semantic.query
   → SELECT account_id FROM accounts WHERE <condition>

2. LOOP over accounts
   2a. CALL semantic.query
       → SELECT key FROM dataset_records('sent_this_month')
          WHERE key = '{{ $.account_id }}'
       Save as: existing_record

   2b. BRANCH: if existing_record is empty → proceed, else → skip

   2c. CALL slack.send_message   (or email.send)
       → send the notification

   2d. CALL dataset.record.upsert
       → dataset: "sent_this_month"
          key:    "{{ $.account_id }}"
          record: { "sent_at": "{{ $.now }}" }

A separate cleanup agent (on a monthly schedule) loops over the records in sent_this_month and calls dataset.record.delete for each key to reset the gate for the new period.
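The control flow of the gate can be simulated in plain Python to check the logic before building the agent. This is a sketch under stated assumptions: a dict stands in for the `sent_this_month` dataset, and `run_gate` and the `notify` callback are hypothetical names, not platform functions.

```python
from datetime import datetime, timezone

def run_gate(accounts: list, sent_this_month: dict, notify) -> list:
    """Notify each account at most once until the dataset is cleared."""
    notified = []
    for account_id in accounts:
        if account_id in sent_this_month:   # steps 2a and 2b: record exists, skip
            continue
        notify(account_id)                  # step 2c: send the notification
        sent_this_month[account_id] = {     # step 2d: record that it was sent
            "sent_at": datetime.now(timezone.utc).isoformat()
        }
        notified.append(account_id)
    return notified

sent_this_month: dict = {}
outbox: list = []

first_run = run_gate(["a1", "a2"], sent_this_month, outbox.append)
second_run = run_gate(["a1", "a2", "a3"], sent_this_month, outbox.append)
```

The second run notifies only `a3`, because `a1` and `a2` already have records from the first run.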

Pattern 2 — Incremental cursor

Process only new rows since the last run.

1. CALL semantic.query
   → SELECT record->>'last_run_at' AS cursor
     FROM dataset_records('etl_state')
     WHERE key = 'global_cursor'
     Save as: last_cursor

2. CALL semantic.query
   → SELECT * FROM source_table
     WHERE created_at > '{{ $.last_cursor }}'
     ORDER BY created_at ASC
     Save as: new_rows

3. LOOP over new_rows
   → process each row

4. CALL dataset.record.upsert
   → dataset: "etl_state"
      key:    "global_cursor"
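      record: { "last_run_at": "{{ $.now }}" }

If {{ $.now }} resolves to the time at which step 4 executes, rows created while the run is in progress (after step 2 but before step 4) fall before the new cursor and are skipped by the next run. If that gap matters, store the largest created_at value from new_rows as the cursor instead. On the first run no cursor record exists yet, so step 2 needs a fallback start value.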
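A small Python simulation of the cursor loop follows. It is a sketch, not the platform's behaviour: dicts stand in for `source_table` and the `etl_state` dataset, and `run_incremental` is a hypothetical name. It also deviates from step 4 in one respect, which is an assumption of this example: the cursor is advanced to the newest `created_at` that was processed rather than to the current time, and an empty string serves as the first-run cursor.

```python
def run_incremental(source_rows: list, etl_state: dict, process) -> int:
    """Process rows newer than the stored cursor, then advance the cursor."""
    cursor = etl_state.get("global_cursor", {}).get("last_run_at", "")
    # ISO-8601 UTC timestamps in one fixed format sort correctly as strings.
    new_rows = sorted(
        (row for row in source_rows if row["created_at"] > cursor),
        key=lambda row: row["created_at"],
    )
    for row in new_rows:
        process(row)
    if new_rows:
        # Advance to the newest processed row, not to the wall-clock time.
        etl_state.setdefault("global_cursor", {})["last_run_at"] = new_rows[-1]["created_at"]
    return len(new_rows)

rows = [
    {"id": 1, "created_at": "2026-05-01T08:00:00Z"},
    {"id": 2, "created_at": "2026-05-02T08:00:00Z"},
]
etl_state: dict = {}
seen: list = []

first = run_incremental(rows, etl_state, lambda row: seen.append(row["id"]))
rows.append({"id": 3, "created_at": "2026-05-03T08:00:00Z"})
second = run_incremental(rows, etl_state, lambda row: seen.append(row["id"]))
```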

Pattern 3 — Accumulating state

Build a richer profile over time without reprocessing historical data.

1. CALL semantic.query
   → Load existing record from dataset
     Save as: prior_state

2. CALL AI Agent
   → Analyze new signal in context of prior_state
     Save as: updated_analysis

3. CALL dataset.record.upsert
   → Merge updated_analysis fields into existing record
     (prior fields not mentioned are preserved by merge semantics)
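The three steps can be sketched in Python as follows. Everything here is illustrative: `analyze` is a stub standing in for the AI step, the field names (`signal_count`, `latest_signal`, `owner`) are invented, and a dict plays the role of the dataset.

```python
def analyze(prior_state: dict, signal: str) -> dict:
    """Stub for the AI step: returns only the fields that changed."""
    return {
        "signal_count": prior_state.get("signal_count", 0) + 1,
        "latest_signal": signal,
    }

def accumulate(profiles: dict, account_id: str, signal: str) -> dict:
    prior_state = dict(profiles.get(account_id, {}))              # step 1: load
    updated_analysis = analyze(prior_state, signal)               # step 2: analyze
    profiles.setdefault(account_id, {}).update(updated_analysis)  # step 3: merge
    return profiles[account_id]

profiles = {"a1": {"owner": "csm-team", "signal_count": 1}}
accumulate(profiles, "a1", "usage dropped 20%")
```

The `owner` field survives the update because the analysis never mentions it, which is the property the merge semantics provide.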

Pattern 4 — Cross-agent handoff

Agent A classifies accounts and stores results; Agent B reads them to decide on outreach.

Agent A (runs nightly):

{
  "function_id": "dataset.record.upsert",
  "args": {
    "dataset": "risk_classifications",
    "key": "{{ $.account_id }}",
    "record": {
      "risk_tier":        "{{ $.tier }}",
      "classified_at":    "{{ $.now }}",
      "classifier_notes": "{{ $.notes }}"
    }
  }
}

Agent B (runs on a needle mover trigger):

-- Read the tier Agent A stored for this account; escalate only if it is high-risk
SELECT rc.record->>'risk_tier' AS tier
FROM dataset_records('risk_classifications') rc
WHERE rc.key = '{{ $.account_id }}'
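The handoff can be modelled end to end in Python to make Agent B's decision explicit. This is a sketch with assumed details: the function names are hypothetical, a dict stands in for the `risk_classifications` dataset, and the tier value `"high"` is an assumption about what Agent A writes.

```python
def agent_a_classify(dataset: dict, account_id: str, tier: str, notes: str, now: str) -> None:
    """Agent A: upsert the classification for one account."""
    dataset.setdefault(account_id, {}).update(
        {"risk_tier": tier, "classified_at": now, "classifier_notes": notes}
    )

def agent_b_should_escalate(dataset: dict, account_id: str) -> bool:
    """Agent B: escalate only when a stored classification exists and is high."""
    record = dataset.get(account_id)
    return record is not None and record.get("risk_tier") == "high"

risk_classifications: dict = {}
agent_a_classify(risk_classifications, "a1", "high", "usage fell sharply", "2026-05-03T02:00:00Z")
agent_a_classify(risk_classifications, "a2", "low", "stable", "2026-05-03T02:00:00Z")

# a3 was never classified, so Agent B finds no record for it.
decisions = {
    account_id: agent_b_should_escalate(risk_classifications, account_id)
    for account_id in ("a1", "a2", "a3")
}
```

Treating a missing record as "do not escalate" is a design choice of this sketch; an agent could equally treat it as a reason to wait for the next nightly classification.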

Related

- Agents: Functions Reference, for full dataset.* function signatures
- Agents: Variables and data, for how to pass dataset records into prompts and SQL
- Agents: Examples, for Incremental ETL and other copy-paste patterns using datasets
- Data Connections, for setting up external connections for Query-type datasets
