De-Dupe shape

Introduction

The de-dupe shape can be used to handle duplicate records found in incoming payloads. It can be used in three behaviour modes:

  • Filter. Filters out duplicated data so only new data continues through the flow.

  • Track. Tracks new data but does not check for duplicated data.

  • Filter & track. Filters out duplicated data and also tracks new data.

A process flow might include a single de-dupe shape set to one of these modes (e.g. filter & track), or multiple de-dupe shapes at different points in a flow, with different behaviours.

Need to know

  • Tracked de-dupe data is retained for 90 days after it's added to a data pool.

  • The de-dupe shape works with incoming payloads from a connection shape, and also from a manual payload, inbound API or webhook.

  • JSON and XML payloads are supported.

How it works

The de-dupe shape is configured with a behaviour, a data pool, and a key field:

Behaviour

As noted previously, the de-dupe shape can be used in three modes, which are summarised below.

Mode | Summary

Filter | Remove duplicate data from the incoming payload so only new data continues through the flow. New data is NOT tracked.

Track | Log each new key value received in the data pool.

Filter & track | Remove duplicate data from the incoming payload AND log each new key value received.
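To make the difference between the three modes concrete, here is a minimal Python sketch. It is an illustration only, not the shape's actual implementation: the function name `apply_dedupe`, the mode strings, and the use of an in-memory `set` to stand in for a data pool are all assumptions. The sketch also assumes each record carries the key as a top-level field.

```python
def apply_dedupe(records, key, pool, mode):
    """Model the three behaviours on a list of dict records.

    `pool` is a plain set standing in for a data pool (assumption).
    `mode` is one of "filter", "track" or "filter_and_track" (hypothetical names).
    """
    output = []
    for record in records:
        value = record[key]
        # Filter and filter & track drop records whose key value is already tracked.
        if mode in ("filter", "filter_and_track") and value in pool:
            continue
        # Track and filter & track log the key value; filter never writes to the pool.
        if mode in ("track", "filter_and_track"):
            pool.add(value)
        output.append(record)
    return output


incoming = [{"id": "1001"}, {"id": "1002"}]

# "1001" was tracked on an earlier run.
print(apply_dedupe(incoming, "id", {"1001"}, "filter"))
# [{'id': '1002'}]
print(apply_dedupe(incoming, "id", {"1001"}, "track"))
# [{'id': '1001'}, {'id': '1002'}]
print(apply_dedupe(incoming, "id", {"1001"}, "filter_and_track"))
# [{'id': '1002'}]
```

In this sketch, only track and filter & track leave "1002" in the pool afterwards, so a later run in filter mode would only treat "1002" as a duplicate if a tracking shape had logged it first.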

Data pools

Data pools are created in general settings and are used to organise de-dupe data. Once a data pool has been created it becomes available for selection when configuring a de-dupe shape for a process flow.

When data passes through a de-dupe shape that is set to a tracking behaviour (track or filter & track), the key field value for each new record is logged in the data pool. The data pool therefore contains every unique key field value that has passed through the shape.

You can have multiple de-dupe shapes (either in the same process flow or in different process flows) sharing the same data pool. Typically, you would create one data pool for each entity type that you are processing. For example, if you are syncing orders via an 'orders' endpoint and products via a 'products' endpoint, you'd create two data pools - one for orders and another for products.

Tracked de-dupe data is retained for 90 days after it's added to a data pool.
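As a rough mental model of a data pool with 90-day retention, consider the Python sketch below. The `DataPool` class, its method names, and the in-memory storage are hypothetical. The sketch also assumes that a value already in the pool keeps its original timestamp when it is seen again; the text above does not confirm whether that is the case.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=90)  # retention period stated above


class DataPool:
    """Hypothetical in-memory stand-in for a de-dupe data pool."""

    def __init__(self, name):
        self.name = name
        self._added_at = {}  # key field value -> time it was added

    def contains(self, value, now):
        added = self._added_at.get(value)
        return added is not None and now - added < RETENTION

    def track(self, value, now):
        # Assumption: only values not currently in the pool are logged,
        # so an existing value keeps its original timestamp.
        if not self.contains(value, now):
            self._added_at[value] = now


# One pool per entity type, as suggested above.
orders_pool = DataPool("orders")
products_pool = DataPool("products")

day_zero = datetime(2024, 1, 1)
orders_pool.track("SO-1", day_zero)

print(orders_pool.contains("SO-1", day_zero + timedelta(days=30)))    # True
print(orders_pool.contains("SO-1", day_zero + timedelta(days=91)))    # False
print(products_pool.contains("SO-1", day_zero + timedelta(days=30)))  # False
```

Under this model, a record whose key value was tracked more than 90 days ago would be treated as new again.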

Key field

The key field is the data field that should be used to match records. This would typically be some sort of id that uniquely identifies payload records - for example, an order id if you're processing orders, a customer id if you're processing customer data, etc.

How duplicate data is handled

When duplicate data is identified, it is removed from the payload. However, exactly what gets removed depends on the configured key field.

If the key field is a top-level field in a simple payload, the entire record is removed. However, if the payload structure is more complex and the key field is within an array, duplicates are removed from that array but the parent record remains.

Let's look at a couple of examples.
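As a rough illustration of both cases, the Python sketch below models a top-level key field and a key field inside an array. The function names, the sample field names (`id`, `lines`, `sku`) and the sample values are hypothetical, and the set of tracked values stands in for a data pool. How the key field is actually specified for nested data is not covered here.

```python
def filter_top_level(records, key, tracked):
    """Key field is top-level: duplicate records are removed entirely."""
    return [r for r in records if r[key] not in tracked]


def filter_nested(records, array_field, key, tracked):
    """Key field is inside an array: duplicates leave the array, the parent stays."""
    result = []
    for r in records:
        kept = [item for item in r[array_field] if item[key] not in tracked]
        result.append({**r, array_field: kept})
    return result


# Case 1: order id "1001" has been seen before, so the whole record is dropped.
orders = [{"id": "1001"}, {"id": "1002"}]
print(filter_top_level(orders, "id", {"1001"}))
# [{'id': '1002'}]

# Case 2: "SKU-A" has been seen before, so only that line is dropped.
nested = [{"id": "1001", "lines": [{"sku": "SKU-A"}, {"sku": "SKU-B"}]}]
print(filter_nested(nested, "lines", "sku", {"SKU-A"}))
# [{'id': '1001', 'lines': [{'sku': 'SKU-B'}]}]
```

In the second case the parent order "1001" continues through the flow with its remaining line.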

More information

The de-dupe shape supports JSON and XML payloads.
