If required, you can import existing data into a de-dupe pool. For example, you may have records that you know have been processed elsewhere and want to ensure that they aren't processed via Patchworks.
Conversely, you can export de-dupe pool data to a CSV file, for use outside of Patchworks.
De-dupe data exports are completed in CSV format, delimited ONLY with a single comma between fields.
The exported file includes two columns with `value` and `entity_type_id` headers. For example:
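A hedged sketch of an exported file (the values and ids shown are placeholders, not real entity type ids):

```csv
value,entity_type_id
1001,2
1002,2
```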
When de-dupe data values are imported:

- All records in the import file are added to the data pool as new items
- Any existing items in the data pool are unchecked and unchanged
To import de-dupe values, the import file must be in the same format as export files above, with the same headers. I.e.:
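For instance (a single hypothetical entry - your own values and ids will differ):

```csv
value,entity_type_id
1001,2
```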
Where:
- The `value` is the key field value that you are matching on.
- The `entity_type_id` is the internal Patchworks id for the entity type associated with the key field that you are using to match duplicates. This id must be present for every entry in your CSV file. You can download a list of ids by following the steps detailed later in this page.
Import files cannot exceed 5MB.
To export/download a de-dupe data pool, follow the steps below.
Step 1 Log into the Patchworks dashboard, then select the settings option:
...followed by the data pools option:
Step 2 Click the name of the data pool that you want to export:
Alternatively, you can create a new data pool.
Step 3 With the data pool in edit mode, move to the lower tracked de-dupe data panel and click the download button:
Step 4 The download job is added to a queue and a confirmation message is displayed:
Step 5 When your download is ready, you'll receive an email which includes a link to retrieve the file from the file downloads page. If you can't/don't want to use this link, you can access this page manually - click data pools in the breadcrumb trail at the top of the page:
...followed by the settings option:
Step 6 Select the file downloads option from the settings page:
Step 7 On the file downloads page, you'll find any exports that have been completed for your company profile in the last hour.
This list may include exports from different parts of the dashboard, not just data pools (for example, run log and cross-reference lookup data exports are added here).
Step 8 Click the download button for your job - the associated CSV file is saved to the default downloads folder for your browser.
Download files are cleared after one hour. If you don't manage to download your file within this time, don't worry - just run the export again to create a new one.
If you want to import data into a de-dupe data pool, you need to ensure that each record in your CSV file includes an entity_type_id. To find which id you should use, follow the steps below to download a current list.
Step 1 Log into the Patchworks dashboard, then select the settings option:
...followed by the data pools option:
Step 2 Click the download entity types button at the top of the page:
Step 3 A CSV file is saved to the default downloads folder for your browser.
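The exact layout of this file may vary, but you can expect each entity type to be listed alongside its id - a hypothetical sketch:

```csv
entity_type_id,name
1,Order
2,Product
```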
To import data into a de-dupe data pool, follow the steps below.
Step 1 Log into the Patchworks dashboard, then select the settings option:
...followed by the data pools option:
Step 2 If you want to import data into an existing data pool, click the name of the required data pool from the list:
Alternatively, you can create a new data pool.
Step 3 Move to the lower tracked de-dupe data panel and click the import button:
Step 4 Navigate to the CSV file that you want to import and select it:
Step 5 The file is uploaded and displayed as a button - click this button to complete the import:
Step 6 The import is completed - existing values are updated and new values are added:
You may need to refresh the page to view the updated data pool.
The de-dupe shape is used to identify and then remove duplicate entries from an incoming payload. For more background information please see our De-dupe shape page.
Tracked de-dupe data is retained for 90 days after it's added to a data pool.
Currently, the de-dupe shape supports JSON payloads.
To add and configure a new de-dupe shape, follow the steps below.
Step 1 In your process flow, add the de-dupe shape in the usual way:
Step 2 Select a source integration and endpoint to determine where the incoming payload to be de-duped originates - for example:
If your incoming data is via manual payload, API request, or webhook then you can remove any default source instance and endpoint selections:
Step 3 Move down to the behaviour field and select the required option.
For more information about these options please see our De-dupe shape behaviour section.
Step 4 Move down to the data pool field and select the required data pool.
If necessary, you can create a data pool 'on the fly' using the create data pool option. For more information please see Adding a new data pool via the de-dupe shape.
Step 5 In the key field, select/enter the data field to be used for matching duplicate records. How you do this depends on how the incoming data is being received - please see the options below:
The selection that you make here determines how the payload is adjusted when duplicate data is removed. For more information please see How duplicate data is handled.
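For instance, given an incoming payload like the hypothetical one below, you might set the key to `id`, since it uniquely identifies each record:

```json
[
  { "id": 1001, "customer": "A. Smith" },
  { "id": 1002, "customer": "B. Jones" }
]
```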
Step 6 Select the payload format:
Step 7 Save the shape.
The de-dupe shape can be used to handle duplicate records found in incoming payloads. It can be used in three behaviour modes:
- Filter. Filters out duplicated data so only new data continues through the flow.
- Track. Tracks new data but does not check for duplicated data.
- Filter & track. Filters out duplicated data and also tracks new data.
A process flow might include a single de-dupe shape set to one of these modes (e.g. filter & track), or multiple de-dupe shapes at different points in a flow, with different behaviours.
Tracked de-dupe data is retained for 90 days after it's added to a data pool.
The de-dupe shape is not atomic - as such we advise against multiple process flows attempting to update the same data pool at the same time.
The de-dupe shape works with incoming payloads from a connector, and also from a manual payload, API request, or webhook.
JSON and XML payloads are supported.
The de-dupe shape is configured with a behaviour, a data pool, and a key field:
As noted previously, the de-dupe shape can be used in three modes, which are summarised below.
You can have multiple de-dupe shapes (either in the same process flow or in different process flows) sharing the same data pool. Typically, you would create one data pool for each entity type that you are processing. For example, if you are syncing orders via an 'orders' endpoint and products via a 'products' endpoint, you'd create two data pools - one for orders and another for products.
Tracked de-dupe data is retained for 90 days after it's added to a data pool.
The `key field` is the data field that should be used to match records. This would typically be some sort of `id` that uniquely identifies payload records - for example, an order `id` if you're processing orders, a customer `id` if you're processing customer data, etc.
When duplicate data is identified, it is removed from the payload. However, exactly what gets removed depends on the configured `key field`.
If your given key field is a top-level field for a simple payload, the entire record will be removed. However, if the payload structure is more complex and the key field is within an array, then duplicates will be removed from that array but the parent record will remain.
Let's look at a couple of examples.
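First, a hedged sketch of the simple case (field names and values are hypothetical). Here the key field `id` sits at the top level of each record:

```json
[
  { "id": 1001, "status": "new" },
  { "id": 1002, "status": "new" }
]
```

If `1001` has already been tracked in the selected data pool, that entire record is removed and only the `1002` record continues through the flow. Now a hedged sketch of the nested case, where the key field `sku` sits within an `items` array:

```json
{
  "order_id": 5001,
  "items": [
    { "sku": "ABC-1" },
    { "sku": "XYZ-9" }
  ]
}
```

If `ABC-1` is already tracked, that entry is removed from the `items` array, but the parent order record remains in the payload.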
The de-dupe shape supports JSON and XML payloads.
Data pools store data entities that have been tracked via de-dupe shapes.
Data pools are created and managed via the data pools option in general settings. From here you can add a new data pool, or view/update an existing data pool.
For more background information on data pools please see our De-dupe shape and De-dupe shape behaviour pages.
Tracked de-dupe data is retained for 90 days after it's added to a data pool.
Data pools are created in general settings and are used to organise de-dupe data. Once a data pool has been created it becomes available for selection when configuring a de-dupe shape for a process flow.

When data passes through a de-dupe shape which is set for tracked behaviour, the value associated with the key field for each new record is logged in the data pool. So, the data pool will contain all unique key field values that have passed through the shape.

A de-dupe shape can be set to one of three behaviour modes, summarised below:

| Mode | Summary |
| --- | --- |
| Filter | Remove duplicate data from the incoming payload so only new data continues through the flow. New data is NOT tracked. |
| Track | Log each new key value received in the data pool. |
| Filter & track | Remove duplicate data from the incoming payload AND log each new key value received. |

De-dupe data pools can be created in two ways:

- via the data pools option in general settings
- 'on the fly', when configuring a de-dupe shape in a process flow
You can access existing data pools from general settings.
Step 1 Select the settings option from the bottom of the dashboard navigation bar:
Step 2 Select data pools:
...all existing data pools are displayed:
For each data pool you can see the creation date, and the date that it was last updated by a process flow run.
Step 3 To view details for a specific data pool, click the associated name in the list:
...details for the data pool are displayed:
In the top panel you can change the data pool name/description (click the update button to confirm changes) or - if the data pool is not currently in use by a process flow - you can choose to delete it.
In the lower panel you can see all data in the pool. This data is listed with the most recent entries first - the following details are shown:
| Column | Summary |
| --- | --- |
| Value | The value of the field that was identified as a match for duplicate records. This is the field defined as the `key` to be used for de-dupe shapes. |
| Created by | The name of the process flow where this entry was tracked into the data pool. Click this name to open the associated process flow. |
| Updated at | The date and time that the record was added to the pool (UTC time). |

For example, if the de-dupe `key` is set to `id`, the `value` field shown in the data pool will display `id` values.
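As a hedged sketch (field names and values are hypothetical): if the payload below passes through a de-dupe shape set to track, with `id` as the key:

```json
[
  { "id": 1001 },
  { "id": 1002 }
]
```

...the data pool would then list `1001` and `1002` in its value column.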