De-Dupe shape

Last updated 2 months ago

Introduction

The de-dupe shape can be used to handle duplicate records found in incoming payloads. It can be used in three modes:

  • Filter. Filters out duplicated data so only new data continues through the flow.

  • Track. Tracks new data but does not check for duplicated data.

  • Filter & track. Filters out duplicated data and also tracks new data.

A process flow might include a single de-dupe shape set to one of these modes (e.g. filter & track), or multiple de-dupe shapes at different points in a flow, with different behaviours.

A single incoming payload for any process flow shape should not exceed 500MB.

We recommend processing multiple, smaller payloads rather than one single payload (1000 x 0.5MB payloads are more efficient than 1 x 500MB payload!).

For payloads up to 500MB, consider adding a flow control shape to batch data into multiple, smaller payloads. Payloads exceeding 500MB should be batched at source.
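
The idea of batching one large payload into several smaller ones can be sketched in a few lines of Python. This is only an illustration of the concept, not how the Patchworks platform performs batching; the function name `batch_records` and the `batch_size` parameter are hypothetical.

```python
from typing import Iterator


def batch_records(records: list[dict], batch_size: int) -> Iterator[list[dict]]:
    """Yield consecutive slices of at most `batch_size` records (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


# Seven illustrative order records split into batches of three.
orders = [{"order_id": n} for n in range(1, 8)]
batches = list(batch_records(orders, batch_size=3))
batch_sizes = [len(batch) for batch in batches]
```

Each batch can then be processed as its own, smaller payload.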

Need to know

  • Tracked de-dupe data is retained for 90 days after it's added to a data pool.

  • Tracked de-dupe data can be interrogated via the tracked data page - by default it's available here for 15 days.

  • The de-dupe shape is not atomic - as such we advise against multiple process flows attempting to update the same data pool at the same time.

  • The de-dupe shape works with incoming payloads from a connection shape, and also from a manual payload, API call, or webhook.

  • JSON and XML payloads are supported.
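
To see why concurrent updates to the same data pool are discouraged, consider the following sketch. It assumes that de-duplication is a "check the pool, then track the key" sequence, which is an interpretation of the non-atomic note in the list above and not a description of the Patchworks internals. The interleaving of two flows is simulated step by step, and all names are hypothetical.

```python
# Shared data pool, modelled here as a plain set of key values (assumption).
pool: set[int] = set()
incoming_key = 555  # the same record arrives in two process flows at once

# Step 1: flow A checks the pool and does not find the key.
flow_a_sees_duplicate = incoming_key in pool

# Step 2: flow B checks the pool BEFORE flow A has tracked the key.
flow_b_sees_duplicate = incoming_key in pool

# Step 3: both flows now track the key and let the record continue.
pool.add(incoming_key)
pool.add(incoming_key)

# Count how many flows treated the record as new.
flows_that_passed_record = [flow_a_sees_duplicate, flow_b_sees_duplicate].count(False)
```

In this interleaving both flows treat the record as new, so it is processed twice even though the pool ends up holding the key only once.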

How it works

Behaviour

As noted previously, the de-dupe shape can be used in three modes, which are summarised below.

Mode | Summary
Filter | Remove duplicate data from the incoming payload so only new data continues through the flow. New data is NOT tracked.
Track | Log each new key value received in the data pool.
Filter & track | Remove duplicate data from the incoming payload AND log each new key value received.
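
The three modes can be compared with a small sketch. It models a data pool as a Python set of key values and is an illustration of the behaviour summarised above, not the Patchworks implementation. The function name `dedupe` and the mode strings are hypothetical.

```python
def dedupe(records: list[dict], key_field: str, pool: set, behaviour: str) -> list[dict]:
    """Apply one of three behaviours: 'filter', 'track' or 'filter_and_track'."""
    if behaviour not in {"filter", "track", "filter_and_track"}:
        raise ValueError(f"unknown behaviour: {behaviour}")
    output = []
    for record in records:
        key = record[key_field]
        # Filtering modes drop records whose key is already in the pool.
        if behaviour != "track" and key in pool:
            continue
        output.append(record)
        # Tracking modes log the key. Assumption: keys are logged as they are
        # seen, so repeats within one payload are also caught when filtering.
        if behaviour != "filter":
            pool.add(key)
    return output


pool = {101}  # order 101 has been seen before
payload = [{"order_id": 101}, {"order_id": 102}]

filtered = dedupe(payload, "order_id", pool, "filter")
pool_after_filter = set(pool)   # unchanged: filter does not track

tracked = dedupe(payload, "order_id", pool, "track")
pool_after_track = set(pool)    # 102 is now logged

filtered_again = dedupe(payload, "order_id", pool, "filter_and_track")
```

After the track step both keys are in the pool, so the final filter & track call has nothing new to pass on.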

Data pools

You can have multiple de-dupe shapes (either in the same process flow or in different process flows) sharing the same data pool. Typically, you would create one data pool for each entity type that you are processing. For example, if you are syncing orders via an 'orders' endpoint and products via a 'products' endpoint, you'd create two data pools - one for orders and another for products.

Tracked de-dupe data is retained for 90 days after it's added to a data pool.
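
A data pool with 90-day retention can be pictured as a mapping from key value to the time it was added. The sketch below is a conceptual model only: the class `DataPool`, its methods, and the choice to keep the first-added timestamp when a key is tracked again are all assumptions, not documented platform behaviour.

```python
from datetime import datetime, timedelta
from typing import Hashable

RETENTION = timedelta(days=90)  # retention period stated above


class DataPool:
    """Hypothetical in-memory model of a de-dupe data pool."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._added_at: dict[Hashable, datetime] = {}

    def _purge(self, now: datetime) -> None:
        # Drop keys whose retention period has elapsed.
        expired = [k for k, added in self._added_at.items() if now - added >= RETENTION]
        for key in expired:
            del self._added_at[key]

    def track(self, key: Hashable, now: datetime) -> None:
        self._purge(now)
        # Assumption: re-tracking an existing key keeps its original timestamp.
        self._added_at.setdefault(key, now)

    def contains(self, key: Hashable, now: datetime) -> bool:
        self._purge(now)
        return key in self._added_at


# One pool per entity type, as suggested above.
orders_pool = DataPool("orders")
products_pool = DataPool("products")

day_0 = datetime(2025, 1, 1)
orders_pool.track(1001, day_0)

seen_on_day_30 = orders_pool.contains(1001, day_0 + timedelta(days=30))
seen_in_products = products_pool.contains(1001, day_0 + timedelta(days=30))
seen_on_day_91 = orders_pool.contains(1001, day_0 + timedelta(days=91))
```

Keeping separate pools means an order id and a product id that happen to share a value never match each other.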

Key field

The key field is the data field that should be used to match records. This would typically be some sort of id that uniquely identifies payload records - for example, an order id if you're processing orders, a customer id if you're processing customer data, etc.

How duplicate data is handled

When duplicate data is identified, it is removed from the payload. However, exactly what gets removed depends on the configured key field.

If your given key field is a top-level field for a simple payload, the entire record will be removed. However, if the payload structure is more complex and the key field is within an array, then duplicates will be removed from that array but the parent record will remain.

Let's look at a couple of examples.

More information

The de-dupe shape is configured with a behaviour, a data pool, and a key.

Why are there separate filter and track behaviour options?

These options provide flexibility for what happens to new/duplicate data, and when it happens. Let's look at an example.

Here, we receive an incoming payload from the first connection shape and send it into a de-dupe shape which is configured to filter & track. This means that any duplicate records (based on the key value) will be removed and the key value for any new records is logged in the data pool before the updated payload continues to the next shape.

Often this is fine - but let's take a closer look at our sample process flow. Following the de-dupe shape, we have another four shapes to process before completion. If this run were to fail for any reason, we'd want to re-send the data - but because we've already tracked the new data in this payload, those records wouldn't be sent again (they would be filtered out as duplicates).

To avoid this scenario, we could add TWO de-dupe shapes to our process flow, where:

  • Shape 1 is placed immediately after the first connection shape, with its behaviour set to filter. Any duplicate records are removed right at the start.

  • Shape 2 is placed at the very end of the process flow (after data has been pushed to the final endpoint), with its behaviour set to track. At this point, we know that the data has been pushed successfully, so we can safely log new records.

The above shows the kind of flexibility offered by the three behaviour modes for the de-dupe shape. The approach described may not be appropriate for every case, but it does illustrate the importance of considering where the de-dupe shape is placed in a process flow, and whether using multiple shapes with different behaviours could be beneficial.
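
The filter-first, track-last pattern described above can be sketched as follows. The data pool is modelled as a Python set, and `run_flow`, `filter_new`, `track` and the `push` callback are hypothetical names used only for illustration.

```python
from typing import Callable


def filter_new(records: list[dict], key_field: str, pool: set) -> list[dict]:
    """Shape 1 (filter): keep only records whose key is not in the pool."""
    return [r for r in records if r[key_field] not in pool]


def track(records: list[dict], key_field: str, pool: set) -> None:
    """Shape 2 (track): log the key of every record that reached this point."""
    pool.update(r[key_field] for r in records)


def run_flow(payload: list[dict], pool: set, push: Callable[[list[dict]], object]) -> list[dict]:
    new_records = filter_new(payload, "order_id", pool)
    push(new_records)                     # may raise if the endpoint fails
    track(new_records, "order_id", pool)  # only reached after a successful push
    return new_records


def failing_push(records: list[dict]) -> None:
    raise ConnectionError("endpoint offline")


pool: set = set()
payload = [{"order_id": 1}, {"order_id": 2}]
delivered: list[dict] = []

# First run fails before tracking, so nothing is logged in the pool.
try:
    run_flow(payload, pool, failing_push)
except ConnectionError:
    pass
pool_after_failure = set(pool)

# The retry still sees both records as new and delivers them.
retry_sent = run_flow(payload, pool, delivered.extend)

# A later run with the same payload is filtered out completely.
third_run_sent = run_flow(payload, pool, delivered.extend)
```

Because tracking happens only after the push succeeds, a failed run leaves the pool untouched and the same records can be re-sent.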

Data pools are created in general settings and are used to organise de-dupe data. Once a data pool has been created it becomes available for selection when configuring a de-dupe shape for a process flow.

When data passes through a de-dupe shape which is set for tracked behaviour, the value associated with the key field for each new record is logged in the data pool. So, the data pool will contain all unique key field values that have passed through the shape.

Example 1: Simple payload & top-level key field
[
    {
        "customerID": 10000201,
        "first_name": "Beyonce",
        "last_name": "Knowles",
        "item1": "pears",
        "item2": "apples",
        "item3": "oranges",
        "item4": "peaches"
    }
]

In the example above we have a simple payload with single-level customer records in an array - there are no nested arrays. If we were to specify customerID as the de-dupe key field and a match is found, the entire record will be removed.

Let's say that customerID of 10000201 passes through the same de-dupe shape (and therefore the same data pool) twice. In this case, a match would be made and the payload output from the de-dupe shape would be:

[]

Example 2: Complex payload & nested key field
[
    {
        "customerID": 10000201,
        "first_name": "Beyonce",
        "last_name": "Knowles",
        "orders": [
            {
                "customerID": 10000201,
                "order_id": 222222,
                "item1": "pears",
                "item2": "apples",
                "item3": "oranges",
                "item4": "peaches"
            },
            {
                "customerID": 10000201,
                "order_id": 333333,
                "item1": "grapes",
                "item2": "plums",
                "item3": "peaches",
                "item4": "lychees"
            }
        ]
    }
]

In the example above we have a more complex payload with orders in an array. If we were to specify orders.customerID as the de-dupe key field and a match is found, the associated order(s) will be removed but any parent data will remain.

Let's say that customerID of 10000201 passes through the same de-dupe shape (and therefore the same data pool) twice. In this case, a match would be made and the payload output from the de-dupe shape would be:

[
    {
        "customerID": 10000201,
        "first_name": "Beyonce",
        "last_name": "Knowles",
        "orders": []
    }
]
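
The difference between a top-level key field and a nested one can be reproduced with a short sketch. It assumes a dotted key path with at most one level of nesting (for example `orders.customerID`) and a data pool modelled as a set. The function `remove_duplicates` is hypothetical and does not represent the platform's own code.

```python
def remove_duplicates(payload: list[dict], key_path: str, pool: set) -> list[dict]:
    """Remove records (top-level key) or array items (nested key) found in the pool."""
    parts = key_path.split(".")
    if len(parts) == 1:
        # Top-level key field: the entire record is removed on a match.
        return [r for r in payload if r[parts[0]] not in pool]
    if len(parts) != 2:
        raise ValueError("this sketch supports one level of nesting only")
    array_field, key_field = parts
    output = []
    for record in payload:
        # Nested key field: matching items leave the array, the parent stays.
        kept = [item for item in record[array_field] if item[key_field] not in pool]
        output.append({**record, array_field: kept})
    return output


pool = {10000201}  # this customerID has already been tracked

simple_payload = [
    {"customerID": 10000201, "first_name": "Beyonce", "last_name": "Knowles", "item1": "pears"}
]
nested_payload = [
    {
        "customerID": 10000201,
        "first_name": "Beyonce",
        "last_name": "Knowles",
        "orders": [
            {"customerID": 10000201, "order_id": 222222, "item1": "pears"},
            {"customerID": 10000201, "order_id": 333333, "item1": "grapes"},
        ],
    }
]

simple_result = remove_duplicates(simple_payload, "customerID", pool)
nested_result = remove_duplicates(nested_payload, "orders.customerID", pool)
```

With the top-level key the whole record disappears, while with the nested key the customer record survives with an empty `orders` array, matching the outputs shown in the two examples.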

Related pages:

  • Adding a data pool

  • Adding & configuring a de-dupe shape