Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs.suprsend.com/llms.txt

Use this file to discover all available pages before exploring further.

Amazon Athena reads Parquet files directly from S3 and bills per terabyte scanned, so it’s a fast way to explore the data the Amazon S3 v2.0 connector writes—no warehouse to provision, no ingestion pipeline to maintain. By the end of this guide you’ll have a database named suprsend_db with three tables (ss_requests, ss_workflow_executions, ss_messages), and you’ll have run your first query.

Prerequisites

You’ll need:
  • The S3 v2.0 connector running and writing Parquet files to your bucket.
  • Access to the AWS account that owns the bucket, with permission to use Athena and read the bucket.
  • An S3 location for Athena to store query results (a separate prefix or a different bucket).
Your connector also needs two specific settings—both are defaults for new connectors:
  • Path layout: per_type. The DDL in this guide creates a separate external table for each data point, with each one pointing at its own S3 prefix (<bucket>/workflow_executions/, .../requests/, .../messages/). That folder structure exists only with per_type.
  • Compression: any codec except lz4. Athena can’t reliably read lz4-encoded Parquet—this is a known Athena limitation, not something specific to SuprSend. snappy is the default compression which works well with Athena.
You can review or change either setting in Compression and path layout.
Connectors enabled before 21 May 2026 were on lz4 compression and shared path layout which had issues with Athena connector setup. So, we recommend changing your compression to snappy and path layout to per_type which works very well with Athena.When you change the setting, the new setting will only apply to files written after the switch. In order to sync the older data, you can reach out to suppport@suprsend.com and we can run one time sync of older data in your S3 bucket.

1. Set the Athena query result location

Open the Athena console in the same region as your S3 bucket. The first time you use Athena in a region you have to set a result location:
  1. Go to SettingsManage.
  2. Set Location of query result to a path you control, for example s3://YOUR_BUCKET_NAME/athena-results/.
  3. Save.

2. Create the database

In the query editor, run:
CREATE DATABASE IF NOT EXISTS suprsend_db;
The rest of the guide uses suprsend_db. If you pick a different name, update the CREATE EXTERNAL TABLE statements below to match.

3. Create the external tables

Run each statement to register one external table per data point. They use Athena partition projection on year/month/day/hour, so new hourly partitions are picked up automatically—you don’t need MSCK REPAIR TABLE or a Glue Crawler. Replace YOUR_BUCKET_NAME in LOCATION and storage.location.template with your bucket. If your connector writes under a path prefix (for example staging/), include it before the data-point folder—s3://YOUR_BUCKET_NAME/staging/workflow_executions/.
CREATE EXTERNAL TABLE suprsend_db.ss_workflow_executions (
  workspace_key                 string,
  created_at                    timestamp,
  updated_at                    timestamp,
  execution_id                  string,
  recipient_distinct_id         string,
  tenant_id                     string,
  idempotency_key               string,
  parent_object_execution_id    string,
  parent_object                 string,
  workflow_slug                 string,
  workflow_version              string,
  node_id                       string,
  node_name                     string,
  node_type                     string,
  execution_stage               string,
  status                        string,
  message                       string,
  properties                    string
)
PARTITIONED BY (
  year  string,
  month string,
  day   string,
  hour  string
)
STORED AS PARQUET
LOCATION 's3://YOUR_BUCKET_NAME/workflow_executions/'
TBLPROPERTIES (
  'projection.enabled'         = 'true',
  'projection.year.type'       = 'integer',
  'projection.year.range'      = '2022,2030',
  'projection.month.type'      = 'integer',
  'projection.month.range'     = '1,12',
  'projection.day.type'        = 'integer',
  'projection.day.range'       = '1,31',
  'projection.hour.type'       = 'integer',
  'projection.hour.range'      = '0,23',
  'storage.location.template'  = 's3://YOUR_BUCKET_NAME/workflow_executions/year=${year}/month=${month}/day=${day}/hour=${hour}/'
);
projection.year.range is set to 2022,2030. Widen the upper bound if you’ll be querying data beyond 2030.
For column meanings, see the table schemas in the S3 v2.0 doc.

4. Run your first query

You’re ready to query. The cardinal rule with Athena is to always filter on the partition columns (year, month, day, hour) where you can—Athena bills per TB scanned, so partition pruning is the main way to keep costs predictable.
SELECT *
FROM suprsend_db.ss_workflow_executions
WHERE year = '2026'
LIMIT 100;
Trace a single request across all three tables using idempotency_key, which is shared across Requests, Workflow Executions, and Messages:
SELECT
  r.idempotency_key,
  w.execution_id,
  m.message_id,
  m.message_status
FROM suprsend_db.ss_requests       AS r
LEFT JOIN suprsend_db.ss_workflow_executions AS w
  ON w.idempotency_key = r.idempotency_key
LEFT JOIN suprsend_db.ss_messages  AS m
  ON m.wf_execution_id = w.execution_id
WHERE r.year = '2026'
  AND r.idempotency_key = 'YOUR_IDEMPOTENCY_KEY';
For the canonical relational join (Requests → Workflow Executions via UNNEST(requests.executions).exec_id = workflow_executions.execution_id), see the linking columns reference.

Points to note

Some columns are stored as string in Parquet even though they hold JSON, booleans, integers, or timestamps. Cast them explicitly:
  • JSON (payload, response, metadata, properties, webhook_data, execution_failure_reason) → json_extract(column, '$.field')
  • Integer (vendor_fallback_level, wait_time_in_seconds) → CAST(column AS BIGINT)
  • Boolean (vendor_fallback_applicable, is_smart) → compare against 'true' / 'false'
  • Timestamp (success_achieved_at) → from_iso8601_timestamp(column) or date_parse(column, '<format>')
Full column list in the table schemas.
The connector rewrites hourly Parquet files when upstream data changes. The next Athena query picks up the new state automatically — no MSCK REPAIR TABLE or Glue Crawler refresh needed.
If you point the connector at a new bucket or prefix, existing tables stop returning new data. Drop and recreate each table with the updated LOCATION and storage.location.template.