If required, you can import existing data into a de-dupe pool. For example, you may have records that you know have been processed elsewhere and want to ensure that they aren't processed via Patchworks.
Conversely, you can export de-dupe pool data to a CSV file, for use outside of Patchworks.
De-dupe data exports are completed in CSV format, delimited ONLY with a single comma between fields.
The exported file includes two columns with `value` and `entity_type_id` headers. For example:
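A hedged sketch of an exported file (the values and ids shown are placeholders, not real entity type ids):

```csv
value,entity_type_id
1001,2
1002,2
```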
When de-dupe data values are imported:

- All records in the import file are added to the data pool as new items
- Any existing items in the data pool are unchecked and unchanged
To import de-dupe values, the import file must be in the same format as export files above, with the same headers. I.e.:
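For instance (a single hypothetical entry - your own values and ids will differ):

```csv
value,entity_type_id
1001,2
```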
Where:
- The `value` is the key field value that you are matching on.
- The `entity_type_id` is the internal Patchworks id for the entity type associated with the key field that you are using to match duplicates. This id must be present for every entry in your CSV file. You can download a list of ids by following the steps detailed later in this page.
Import files cannot exceed 5MB.
To export/download a de-dupe data pool, follow the steps below.
Step 1 Log into the Patchworks dashboard, then select the settings option:
...followed by the data pools option:
Step 2 Click the name of the data pool that you want to export:
Alternatively, you can create a new data pool.
Step 3 With the data pool in edit mode, move to the lower tracked de-dupe data panel and click the download button:
Step 4 The download job is added to a queue and a confirmation message is displayed:
Step 5 When your download is ready, you'll receive an email which includes a link to retrieve the file from the file downloads page. If you can't/don't want to use this link, you can access this page manually - click data pools in the breadcrumb trail at the top of the page:
...followed by the settings option:
Step 6 Select the file downloads option from the settings page:
Step 7 On the file downloads page, you'll find any exports that have been completed for your company profile in the last hour.
This list may include exports from different parts of the dashboard, not just data pools (for example, run log and cross-reference lookup data exports are added here).
Step 8 Click the download button for your job - the associated CSV file is saved to the default downloads folder for your browser.
Download files are cleared after one hour. If you don't manage to download your file within this time, don't worry - just run the export again to create a new one.
If you want to import data into a de-dupe data pool, you need to ensure that each record in your CSV file includes an entity_type_id. To find which id you should use, follow the steps below to download a current list.
Step 1 Log into the Patchworks dashboard, then select the settings option:
...followed by the data pools option:
Step 2 Click the download entity types button at the top of the page:
Step 3 A CSV file is saved to the default downloads folder for your browser.
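The exact layout of this file may vary, but you can expect each entity type to be listed alongside its id - a hypothetical sketch:

```csv
entity_type_id,name
1,Order
2,Product
```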
To import data into a de-dupe data pool, follow the steps below.
Step 1 Log into the Patchworks dashboard, then select the settings option:
...followed by the data pools option:
Step 2 If you want to import data into an existing data pool, click the name of the required data pool from the list:
Alternatively, you can create a new data pool.
Step 3 Move to the lower tracked de-dupe data panel and click the import button:
Step 4 Navigate to the CSV file that you want to import and select it:
Step 5 The file is uploaded and displayed as a button - click this button to complete the import:
Step 6 The import is completed - existing values are updated and new values are added:
You may need to refresh the page to view the updated data pool.
The de-dupe shape is used to identify and then remove duplicate entries from an incoming payload. For more background information please see our De-dupe shape page.
Tracked de-dupe data is retained for 90 days after it's added to a data pool.
Currently, the de-dupe shape supports JSON payloads.
To add and configure a new de-dupe shape, follow the steps below.
Step 1 In your process flow, add the de-dupe shape in the usual way:
Step 2 Select a source integration and endpoint to determine where the incoming payload to be de-duped originates - for example:
If your incoming data is via manual payload, API request, or webhook then you can remove any default source instance and endpoint selections:
Step 3 Move down to the behaviour field and select the required option.
For more information about these options please see our De-dupe shape behaviour section.
Step 4 Move down to the data pool field and select the required data pool.
If necessary, you can create a data pool 'on the fly' using the create data pool option. For more information please see Adding a new data pool via the de-dupe shape.
Step 5 In the key field, select/enter the data field to be used for matching duplicate records. How you do this depends on how the incoming data is being received - please see the options below:
The selection that you make here determines how the payload is adjusted when duplicate data is removed. For more information please see How duplicate data is handled.
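For instance, given an incoming payload like the hypothetical one below, you might set the key to `id`, since it uniquely identifies each record:

```json
[
  { "id": 1001, "customer": "A. Smith" },
  { "id": 1002, "customer": "B. Jones" }
]
```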
Step 6 Select the payload format:
Step 7 Save the shape.
The de-dupe shape can be used to handle duplicate records found in incoming payloads. It can be used in three behaviour modes:
- Filter. Filters out duplicated data so only new data continues through the flow.
- Track. Tracks new data but does not check for duplicated data.
- Filter & track. Filters out duplicated data and also tracks new data.
A process flow might include a single de-dupe shape set to one of these modes (e.g. filter & track), or multiple de-dupe shapes at different points in a flow, with different behaviours.
Tracked de-dupe data is retained for 90 days after it's added to a data pool.
The de-dupe shape is not atomic - as such we advise against multiple process flows attempting to update the same data pool at the same time.
The de-dupe shape works with incoming payloads from a connector, and also from a manual payload, API request, or webhook.
JSON and XML payloads are supported.
The de-dupe shape is configured with a behaviour, a data pool, and a key field:
As noted previously, the de-dupe shape can be used in three modes, which are summarised below.
You can have multiple de-dupe shapes (either in the same process flow or in different process flows) sharing the same data pool. Typically, you would create one data pool for each entity type that you are processing. For example, if you are syncing orders via an 'orders' endpoint and products via a 'products' endpoint, you'd create two data pools - one for orders and another for products.
Tracked de-dupe data is retained for 90 days after it's added to a data pool.
The `key field` is the data field that should be used to match records. This would typically be some sort of `id` that uniquely identifies payload records - for example, an order `id` if you're processing orders, a customer `id` if you're processing customer data, etc.
When duplicate data is identified, it is removed from the payload. However, exactly what gets removed depends on the configured `key field`.
If your given key field is a top-level field for a simple payload, the entire record will be removed. However, if the payload structure is more complex and the key field is within an array, then duplicates will be removed from that array but the parent record will remain.
Let's look at a couple of examples.
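First, a hedged sketch of the simple case (field names and values are hypothetical). Here the key field `id` sits at the top level of each record:

```json
[
  { "id": 1001, "status": "new" },
  { "id": 1002, "status": "new" }
]
```

If `1001` has already been tracked in the selected data pool, that entire record is removed and only the `1002` record continues through the flow. Now a hedged sketch of the nested case, where the key field `sku` sits within an `items` array:

```json
{
  "order_id": 5001,
  "items": [
    { "sku": "ABC-1" },
    { "sku": "XYZ-9" }
  ]
}
```

If `ABC-1` is already tracked, that entry is removed from the `items` array, but the parent order record remains in the payload.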
The de-dupe shape supports JSON and XML payloads.
Data pools store data entities that have been tracked via de-dupe shapes.
Data pools are created and managed via the data pools option in general settings. From here you can add a new data pool, or view/update an existing data pool.
For more background information on data pools please see our De-dupe shape and De-dupe shape behaviour pages.
Tracked de-dupe data is retained for 90 days after it's added to a data pool.
Data pools are created in general settings and are used to organise de-dupe data. Once a data pool has been created it becomes available for selection when configuring a de-dupe shape for a process flow.

When data passes through a de-dupe shape which is set for tracked behaviour, the value associated with the key field for each new record is logged in the data pool. So, the data pool will contain all unique key field values that have passed through the shape.

A de-dupe shape can be set to one of three behaviour modes, summarised below:

| Mode | Summary |
| --- | --- |
| Filter | Remove duplicate data from the incoming payload so only new data continues through the flow. New data is NOT tracked. |
| Track | Log each new key value received in the data pool. |
| Filter & track | Remove duplicate data from the incoming payload AND log each new key value received. |

De-dupe data pools can be created in two ways:

- via the data pools option in general settings
- 'on the fly', when configuring a de-dupe shape in a process flow
You can access existing data pools from general settings.
Step 1 Select the settings option from the bottom of the dashboard navigation bar:
Step 2 Select data pools:
...all existing data pools are displayed:
For each data pool you can see the creation date, and the date that it was last updated by a process flow run.
Step 3 To view details for a specific data pool, click the associated name in the list:
...details for the data pool are displayed:
In the top panel you can change the data pool name/description (click the update button to confirm changes) or - if the data pool is not currently in use by a process flow - you can choose to delete it.
In the lower panel you can see all data in the pool. This data is listed with the most recent entries first - the following details are shown:
| Column | Summary |
| --- | --- |
| Value | The value of the field that was identified as a match for duplicate records. This is the field defined as the `key` to be used for de-dupe shapes. |
| Created by | The name of the process flow where this entry was tracked into the data pool. Click this name to open the associated process flow. |
| Updated at | The date and time that the record was added to the pool (UTC time). |

For example, if the de-dupe `key` is set to `id`, the `value` field shown in the data pool will display `id` values.
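As a hedged sketch (field names and values are hypothetical): if the payload below passes through a de-dupe shape set to track, with `id` as the key:

```json
[
  { "id": 1001 },
  { "id": 1002 }
]
```

...the data pool would then list `1001` and `1002` in its value column.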