Custom Datasets
Custom Datasets are named, workspace-scoped tables that agents can read from and write to across runs. They are the persistent memory layer for AI Agents — the place where a run stores what it computed, checked, or decided so the next run can pick up where the last one left off.
Why agents need persistent memory
AI Agents are stateless by design. When a run finishes, the variables it computed — query results, LLM summaries, intermediate values — are discarded. The next run starts completely fresh.
For simple fire-and-forget automations this is fine. But real CSM workflows are inherently stateful:
- Deduplication — "Has this account already received a pricing-concern email this month?" A run with no memory has no way to know. It will send the same email on every trigger.
- Incremental processing — "Process only tickets that arrived since the last run." Without a stored cursor, the agent re-processes every ticket every time, wasting LLM calls and producing duplicate output.
- State accumulation — "Build a risk profile for this account that grows richer each week." Each run needs to read what prior runs recorded and add to it rather than starting from nothing.
- Cross-agent coordination — One agent produces a health classification; another agent later queries that classification to decide whether to send an alert. Without a shared store, the two runs cannot communicate.
- Audit and traceability — Storing the outcome of each run (which accounts were processed, what the LLM decided, when the last outreach was sent) gives you a queryable history that informs dashboards and downstream agents.
Custom Datasets solve all of these by giving agents a durable, queryable store that survives across runs.
Dataset types
| Type | Written by | Read by | Best for |
|---|---|---|---|
| Records | AI Agents (at runtime) | Agents + SQL queries | Persistent agent state, deduplication keys, accumulated results |
| Query | External data connection (via SQL) | Agents + SQL queries | Enriching agents with external CRM, warehouse, or product data |
Records datasets
A Records dataset is a key-value store. Each record has a string key and a JSON payload (the record object). Agents create and update records using CALL steps; records persist indefinitely until explicitly deleted.
The record field uses merge semantics: upserting a record only overwrites the keys you provide, leaving all other keys intact. This lets different agent steps update different fields of the same record without collisions.
key: "acct_0015XY"
record: {
  "last_email_sent": "2026-04-10",
  "email_count": 3,
  "risk_classification": "high",
  "last_processed_at": "2026-05-03T08:00:00Z"
}
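The merge behavior described above can be modeled in a few lines of Python. This is a minimal in-memory sketch, not FunnelStory's implementation: the `upsert` helper and the `store` dict are hypothetical, and it assumes the merge applies to top-level keys only (how nested objects are merged is not stated here).

```python
from copy import deepcopy

def upsert(store: dict, key: str, record: dict) -> dict:
    """Merge `record` into the payload stored under `key` (top-level keys only)."""
    merged = deepcopy(store.get(key, {}))
    merged.update(record)  # provided keys overwrite; all other keys are kept
    store[key] = merged
    return merged

# Hypothetical stand-in for a Records dataset.
store = {
    "acct_0015XY": {
        "last_email_sent": "2026-04-10",
        "email_count": 3,
        "risk_classification": "high",
    }
}

# One step updates only the email fields.
upsert(store, "acct_0015XY", {"last_email_sent": "2026-05-03", "email_count": 4})

# risk_classification was not provided, so it is preserved.
result = store["acct_0015XY"]
```

Because each step touches only the keys it provides, two steps that write different fields of the same record do not overwrite each other's data.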
Query datasets
A Query dataset is a view over an external data source — a SQL query against a configured Data Connection (Snowflake, BigQuery, HubSpot, etc.). The dataset does not store its own rows; it executes the query at read time.
Query datasets are useful for enriching agent runs with data that lives outside FunnelStory — for example, pulling product usage metrics from a warehouse or open opportunities from a CRM.
How agents interact with datasets
Writing to a dataset (CALL steps)
Agents write records using three CALL step function IDs:
dataset.record.upsert
Creates a record or merges new fields into an existing one.
{
  "function_id": "dataset.record.upsert",
  "args": {
    "dataset": "renewal_outreach",
    "key": "{{ $.account_id }}",
    "record": {
      "last_sent_at": "{{ $.now }}",
      "subject": "{{ $.email_subject }}",
      "sent_by": "renewal-agent"
    }
  }
}
If a record for this key already exists, only last_sent_at, subject, and sent_by are updated — all other fields are preserved.
dataset.record.set_field
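Sets a single field without touching any others. Useful for updating counters or toggling flags. The function writes the value it is given, so the example below assumes an earlier step has already computed new_count: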
{
  "function_id": "dataset.record.set_field",
  "args": {
    "dataset": "renewal_outreach",
    "key": "{{ $.account_id }}",
    "field": "email_count",
    "value": "{{ $.new_count }}"
  }
}
dataset.record.delete
Removes a record entirely by key. Use this to reset state after a terminal event (for example, after a renewal closes, remove the account's outreach tracking record):
{
  "function_id": "dataset.record.delete",
  "args": {
    "dataset": "renewal_outreach",
    "key": "{{ $.account_id }}"
  }
}
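If you generate agent definitions programmatically, the three payload shapes above can be built with small helpers. The sketch below is illustrative only: the helper names are hypothetical, and it produces just the JSON shapes shown in this section. How a step is attached to an agent is not covered here.

```python
import json

def upsert_step(dataset: str, key: str, record: dict) -> dict:
    """Build a dataset.record.upsert CALL step payload."""
    return {"function_id": "dataset.record.upsert",
            "args": {"dataset": dataset, "key": key, "record": record}}

def set_field_step(dataset: str, key: str, field: str, value) -> dict:
    """Build a dataset.record.set_field CALL step payload."""
    return {"function_id": "dataset.record.set_field",
            "args": {"dataset": dataset, "key": key, "field": field, "value": value}}

def delete_step(dataset: str, key: str) -> dict:
    """Build a dataset.record.delete CALL step payload."""
    return {"function_id": "dataset.record.delete",
            "args": {"dataset": dataset, "key": key}}

steps = [
    upsert_step("renewal_outreach", "{{ $.account_id }}",
                {"last_sent_at": "{{ $.now }}", "sent_by": "renewal-agent"}),
    set_field_step("renewal_outreach", "{{ $.account_id }}",
                   "email_count", "{{ $.new_count }}"),
    delete_step("renewal_outreach", "{{ $.account_id }}"),
]

# Print the payloads as JSON, ready to paste into a step definition.
print(json.dumps(steps, indent=2))
```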
Reading from a dataset (SQL queries)
Agents read datasets inside any SQL step using the dataset_records('dataset_name') virtual table. The dataset name is a required filter — queries that omit it are rejected.
-- Look up when this account was last contacted (compare last_sent_at against a 30-day window in a later step)
SELECT key, record->>'last_sent_at' AS last_sent_at
FROM dataset_records('renewal_outreach')
WHERE key = '{{ $.account_id }}'
Because dataset_records behaves like a regular table in semantic SQL, you can join it against other tables:
-- Find accounts that have a high health score but no outreach record yet
SELECT a.account_id, a.name, a.health_score
FROM accounts a
LEFT JOIN dataset_records('renewal_outreach') dr ON dr.key = a.account_id
WHERE a.health_score > 80
AND dr.key IS NULL
This LEFT JOIN pattern is the canonical way to implement incremental processing — process only rows that are not yet present in the dataset.
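To see the anti-join behave as described, it can be reproduced locally with Python's built-in `sqlite3` module. This is only a sketch: an ordinary table named `renewal_outreach` stands in for `dataset_records('renewal_outreach')`, which exists only inside FunnelStory, and the sample accounts are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (account_id TEXT PRIMARY KEY, name TEXT, health_score INTEGER);
    -- Stand-in for dataset_records('renewal_outreach'); an assumption for local testing.
    CREATE TABLE renewal_outreach (key TEXT PRIMARY KEY, record TEXT);

    INSERT INTO accounts VALUES
        ('a1', 'Acme', 92),
        ('a2', 'Globex', 85),
        ('a3', 'Initech', 40);
    INSERT INTO renewal_outreach VALUES ('a1', '{"last_sent_at": "2026-04-10"}');
""")

# Keep high-scoring accounts that have no matching outreach record.
rows = conn.execute("""
    SELECT a.account_id, a.name, a.health_score
    FROM accounts a
    LEFT JOIN renewal_outreach dr ON dr.key = a.account_id
    WHERE a.health_score > 80
      AND dr.key IS NULL
    ORDER BY a.account_id
""").fetchall()

print(rows)
```

Acme is excluded because it already has an outreach record, and Initech is excluded by the health score filter, so only Globex is returned.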
Creating and managing datasets
Datasets are configured in Settings → Datasets (/configure/datasets).
Create a dataset
- Click + Create dataset.
- Enter a Name — this is the identifier used in agent CALL steps and dataset_records() queries. Names are immutable after creation.
- Optionally add a Description to explain the dataset's purpose.
- Choose a Source type:
- Records — agent-writable key-value store (no SQL query needed).
- Query — SQL-backed view; select a Data Connection and enter the SQL query.
- Click Save.
Browse and edit records
For a Records dataset, click View records on the dataset list or inside the dataset's settings page. The records table shows every key and its JSON payload. You can sort and filter to inspect what agents have stored.
Delete a dataset
Deleting a dataset removes its configuration and all stored records permanently. Any agent CALL step that references it will start failing immediately.
Common patterns
Pattern 1 — Deduplication gate
Prevent an action from running more than once per account per time window.
1. CALL semantic.query
   → SELECT account_id FROM accounts WHERE <condition>
2. LOOP over accounts
   2a. CALL semantic.query
       → SELECT key FROM dataset_records('sent_this_month')
         WHERE key = '{{ $.account_id }}'
       Save as: existing_record
   2b. BRANCH: if existing_record is empty → proceed, else → skip
   2c. CALL slack.send_message (or email.send)
       → send the notification
   2d. CALL dataset.record.upsert
       → dataset: "sent_this_month"
         key: "{{ $.account_id }}"
         record: { "sent_at": "{{ $.now }}" }
A separate cleanup agent (on a monthly schedule) calls dataset.record.delete on all records in sent_this_month to reset the gate for the new period.
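The control flow of this gate can be simulated in plain Python to check the logic before building the agent. This is a sketch under stated assumptions: a dict stands in for the `sent_this_month` dataset, and `run_dedup_gate`, `reset_gate`, and the `send` callback are hypothetical names, not FunnelStory functions.

```python
from datetime import datetime, timezone

def run_dedup_gate(account_ids, sent_this_month, send, now=None):
    """Notify each account at most once until the gate is reset."""
    now = now or datetime.now(timezone.utc)
    notified = []
    for account_id in account_ids:
        if account_id in sent_this_month:    # steps 2a/2b: record exists, skip
            continue
        send(account_id)                     # step 2c: send the notification
        sent_this_month[account_id] = {"sent_at": now.isoformat()}  # step 2d
        notified.append(account_id)
    return notified

def reset_gate(sent_this_month):
    """Cleanup run: delete every record so the next period starts fresh."""
    sent_this_month.clear()

store, outbox = {}, []
first = run_dedup_gate(["a1", "a2"], store, outbox.append)
second = run_dedup_gate(["a1", "a2", "a3"], store, outbox.append)
reset_gate(store)
third = run_dedup_gate(["a1"], store, outbox.append)
```

In the second run only `a3` is notified, because `a1` and `a2` already have records. After the reset, `a1` is eligible again.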
Pattern 2 — Incremental cursor
Process only new rows since the last run.
1. CALL semantic.query
   → SELECT record->>'last_run_at' AS cursor
     FROM dataset_records('etl_state')
     WHERE key = 'global_cursor'
   Save as: last_cursor
2. CALL semantic.query
   → SELECT * FROM source_table
     WHERE created_at > '{{ $.last_cursor }}'
     ORDER BY created_at ASC
   Save as: new_rows
3. LOOP over new_rows
   → process each row
4. CALL dataset.record.upsert
   → dataset: "etl_state"
     key: "global_cursor"
     record: { "last_run_at": "{{ $.now }}" }
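The cursor loop above can be sketched in Python to make its edge cases visible. Two things here are assumptions rather than part of the pattern as written: a missing cursor on the first run is treated as "process everything", and the cursor is advanced to the newest processed `created_at` instead of the current time, a variant that avoids skipping rows that arrive between the query and the cursor write. The function name and the dict-based `etl_state` are hypothetical.

```python
def run_incremental(source_rows, etl_state, process):
    """Process rows newer than the stored cursor, then advance the cursor."""
    # Step 1: read the cursor; an empty string sorts before any ISO 8601 timestamp.
    cursor = etl_state.get("global_cursor", {}).get("last_run_at", "")
    # Step 2: select new rows, oldest first (UTC timestamps in one format compare as strings).
    new_rows = sorted(
        (row for row in source_rows if row["created_at"] > cursor),
        key=lambda row: row["created_at"],
    )
    # Step 3: process each new row.
    for row in new_rows:
        process(row)
    # Step 4: advance the cursor only if something was processed.
    if new_rows:
        etl_state.setdefault("global_cursor", {})["last_run_at"] = new_rows[-1]["created_at"]
    return len(new_rows)

rows = [
    {"id": 1, "created_at": "2026-05-01T00:00:00Z"},
    {"id": 2, "created_at": "2026-05-02T00:00:00Z"},
]
state, seen = {}, []
count_first = run_incremental(rows, state, seen.append)
count_repeat = run_incremental(rows, state, seen.append)
rows.append({"id": 3, "created_at": "2026-05-03T00:00:00Z"})
count_after_new_row = run_incremental(rows, state, seen.append)
```

The repeat run processes nothing, and the run after the new row arrives processes only that row.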
Pattern 3 — Accumulating state
Build a richer profile over time without reprocessing historical data.
1. CALL semantic.query
   → Load existing record from dataset
   Save as: prior_state
2. CALL AI Agent
   → Analyze new signal in context of prior_state
   Save as: updated_analysis
3. CALL dataset.record.upsert
   → Merge updated_analysis fields into existing record
     (prior fields not mentioned are preserved by merge semantics)
Pattern 4 — Cross-agent handoff
Agent A classifies accounts and stores results; Agent B reads them to decide on outreach.
Agent A (runs nightly):
{
  "function_id": "dataset.record.upsert",
  "args": {
    "dataset": "risk_classifications",
    "key": "{{ $.account_id }}",
    "record": {
      "risk_tier": "{{ $.tier }}",
      "classified_at": "{{ $.now }}",
      "classifier_notes": "{{ $.notes }}"
    }
  }
}
Agent B (runs on a needle mover trigger):
-- Only escalate if Agent A classified this account as high-risk
SELECT rc.record->>'risk_tier' AS tier
FROM dataset_records('risk_classifications') rc
WHERE rc.key = '{{ $.account_id }}'
Related
- Agents: Functions Reference — full dataset.* function signatures
- Agents: Variables and data — how to pass dataset records into prompts and SQL
- Agents: Examples — Incremental ETL and other copy-paste patterns using datasets
- Data Connections — setting up external connections for Query-type datasets