# Adding & configuring a de-dupe shape

## Introduction

The **de-dupe** shape identifies and removes duplicate entries from an incoming payload. For background information, see our [De-dupe shape](https://doc.wearepatchworks.com/product-documentation/process-flows/building-process-flows/process-flow-shapes/advanced-shapes/de-dupe-shape) page.

## Need to know

{% hint style="danger" %}
Tracked de-dupe data is retained for 90 days after it's added to a data pool.
{% endhint %}
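
To make the retention rule concrete, here is a minimal Python sketch of a data pool that treats a key value as gone once 90 days have passed since it was added. The `DataPool` class and its methods are hypothetical and are not part of Patchworks. The sketch also makes two assumptions that this page does not state: the entry expires at exactly 90 days, and re-adding a key simply overwrites its timestamp.

```python
from datetime import datetime, timedelta, timezone

# Retention period stated above: tracked data is kept for 90 days.
RETENTION = timedelta(days=90)


class DataPool:
    """Hypothetical in-memory model of a de-dupe data pool with 90-day retention."""

    def __init__(self) -> None:
        # Maps each tracked key value to the time it was added to the pool.
        self._added_at: dict[str, datetime] = {}

    def add(self, key: str, now: datetime) -> None:
        """Track a key value, recording (or overwriting) when it was added."""
        self._added_at[key] = now

    def contains(self, key: str, now: datetime) -> bool:
        """Return True if the key was added less than 90 days before `now`."""
        added = self._added_at.get(key)
        if added is None:
            return False
        if now - added >= RETENTION:
            # Past the retention period: drop the entry and report it as absent.
            del self._added_at[key]
            return False
        return True


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
pool = DataPool()
pool.add("CUST-001", start)
print(pool.contains("CUST-001", start + timedelta(days=89)))  # True
print(pool.contains("CUST-001", start + timedelta(days=90)))  # False
```

In practice, this means a record whose key was last tracked more than 90 days ago is no longer recognised as a duplicate.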

{% hint style="info" %}
Currently, the de-dupe shape supports JSON payloads.
{% endhint %}

## Adding a de-dupe shape

To add and configure a new **de-dupe** shape, follow the steps below.

**Step 1**\
In your process flow, add the **de-dupe** shape in the usual way:

<div align="left"><figure><img src="https://2440044887-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FLYNcUBVQwSkOMG6KjZfz%2Fuploads%2FaTLUVzs6nj90wtnawoSS%2Fadd%20dedupe%20shape%201.png?alt=media&#x26;token=1848cc93-fe92-44a9-9dc3-46149c4ded75" alt="" width="375"><figcaption></figcaption></figure></div>

**Step 2**\
Select a **source integration** and **endpoint** to specify where the incoming payload to be de-duped originates. For example:

<div align="left"><figure><img src="https://2440044887-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FLYNcUBVQwSkOMG6KjZfz%2Fuploads%2FaKxxqp2ruZIWQU2TLzfk%2Fadd%20dedupe%20shape%20-%20endpoint.png?alt=media&#x26;token=a0869c65-5b8f-44f8-a529-6c1ceef42bd0" alt="" width="352"><figcaption></figcaption></figure></div>

{% hint style="info" %}
If your incoming data is received via [manual payload](https://doc.wearepatchworks.com/product-documentation/process-flows/building-process-flows/process-flow-shapes/standard-shapes/manual-payload-shape), [API request](https://doc.wearepatchworks.com/product-documentation/process-flows/building-process-flows/process-flow-shapes/advanced-shapes/de-dupe-shape/broken-reference), or [webhook](https://doc.wearepatchworks.com/product-documentation/process-flows/building-process-flows/process-flow-shapes/standard-shapes/trigger-shape/trigger-shape-webhook), you can remove any default source instance and endpoint selections:

![](https://2440044887-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FLYNcUBVQwSkOMG6KjZfz%2Fuploads%2FabgIh9eneWCBLkaOHks5%2Fdedupe%20endpoint%20removal.png?alt=media\&token=33ed1f77-121d-447c-996f-4d8b2c2d514f)
{% endhint %}

**Step 3**\
Move down to the **behaviour** field and select the required option.

{% hint style="info" %}
For more information about these options, see our [De-dupe shape behaviour](https://doc.wearepatchworks.com/product-documentation/process-flows/building-process-flows/process-flow-shapes/advanced-shapes/de-dupe-shape/..#behaviour) section.
{% endhint %}

**Step 4**\
Move down to the **data pool** field and select the required [data pool](https://doc.wearepatchworks.com/product-documentation/process-flows/building-process-flows/process-flow-shapes/advanced-shapes/de-dupe-shape/..#data-pools).

<div align="left"><figure><img src="https://2440044887-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FLYNcUBVQwSkOMG6KjZfz%2Fuploads%2F0dfnwBf1XuQK56u8aP2Y%2Fadd%20dedupe%20shape%20-%20data%20pool%20options.png?alt=media&#x26;token=c790448a-b541-4646-acbe-9d6d77cdbbb1" alt="" width="357"><figcaption></figcaption></figure></div>

{% hint style="info" %}
If necessary, you can create a data pool on the fly using the **create data pool** option. For more information, see [Adding a new data pool via the de-dupe shape](https://doc.wearepatchworks.com/product-documentation/process-flows/building-process-flows/process-flow-shapes/advanced-shapes/working-with-data-pools#adding-a-new-data-pool-via-the-de-dupe-shape).
{% endhint %}

**Step 5**\
In the **key field**, select or enter the data field used to match duplicate records. How you do this depends on how the incoming data is received. Choose the relevant option below:

<details>

<summary><img src="https://2440044887-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FLYNcUBVQwSkOMG6KjZfz%2Fuploads%2FPq80F8deQ3i4P3iazHRj%2Ficon%20decision.svg?alt=media&#x26;token=61be5bf3-7235-45fe-af5f-726229df8dd6" alt="" data-size="line"> I want to choose a field from the schema associated with a connector endpoint</summary>

If the incoming payload for the de-dupe shape is received from a connection shape, the de-dupe shape settings default to the same connection instance and endpoint. In this case, the `key field` lets you navigate the schema associated with that endpoint and select the required data item:

<img src="https://2440044887-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FLYNcUBVQwSkOMG6KjZfz%2Fuploads%2FkR5rgVo6KjivonlkcVMa%2Fdedupe%20key%20field%20-%20choose%20from%20schema.png?alt=media&#x26;token=844e0018-0424-4343-a4ea-462c08fd5e20" alt="" data-size="original">

</details>

<details>

<summary><img src="https://2440044887-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FLYNcUBVQwSkOMG6KjZfz%2Fuploads%2FPq80F8deQ3i4P3iazHRj%2Ficon%20decision.svg?alt=media&#x26;token=61be5bf3-7235-45fe-af5f-726229df8dd6" alt="" data-size="line"> I want to specify a field manually</summary>

If the incoming payload for the de-dupe shape is received via [manual payload](https://doc.wearepatchworks.com/product-documentation/process-flows/building-process-flows/process-flow-shapes/standard-shapes/manual-payload-shape), [API request](https://doc.wearepatchworks.com/product-documentation/process-flows/building-process-flows/process-flow-shapes/advanced-shapes/de-dupe-shape/broken-reference), or [webhook](https://doc.wearepatchworks.com/product-documentation/process-flows/building-process-flows/process-flow-shapes/standard-shapes/trigger-shape/trigger-shape-webhook), there is no associated instance or endpoint and therefore no known data schema. In this case, enter the `key field` value manually as the dot notation path to the required field in your data, for example `*.customerID`:

<img src="https://2440044887-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FLYNcUBVQwSkOMG6KjZfz%2Fuploads%2FrBFUUKAnMNuJxMC3hAb2%2Fdedupe%20key%20field%20-%20manual%20entry.png?alt=media&#x26;token=b9e05838-96c4-4962-8e60-e8607169097e" alt="" data-size="original">

</details>
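
The Python sketch below shows how a manual key field path such as `*.customerID` might be read against a JSON payload. It assumes that `*` matches every element of an array (or every value of an object) and that each other segment is an object property. These wildcard semantics and the `resolve_path` helper are illustrative assumptions, not a description of how Patchworks evaluates paths internally.

```python
import json
from typing import Any


def resolve_path(data: Any, path: str) -> list[Any]:
    """Collect every value matched by a dot notation path.

    Assumed semantics: `*` matches each item of a list (or each value of a
    dict); any other segment is a dict key. Missing keys are skipped.
    """
    current = [data]
    for segment in path.split("."):
        next_values: list[Any] = []
        for value in current:
            if segment == "*":
                if isinstance(value, list):
                    next_values.extend(value)
                elif isinstance(value, dict):
                    next_values.extend(value.values())
            elif isinstance(value, dict) and segment in value:
                next_values.append(value[segment])
        current = next_values
    return current


payload = json.loads("""
[
  {"customerID": "C1", "name": "Ada"},
  {"customerID": "C2", "name": "Alan"},
  {"customerID": "C1", "name": "Ada L."}
]
""")
print(resolve_path(payload, "*.customerID"))  # ['C1', 'C2', 'C1']
```

Here the repeated `C1` value is what would identify the third record as a duplicate of the first.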

<details>

<summary><img src="https://2440044887-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FLYNcUBVQwSkOMG6KjZfz%2Fuploads%2FPq80F8deQ3i4P3iazHRj%2Ficon%20decision.svg?alt=media&#x26;token=61be5bf3-7235-45fe-af5f-726229df8dd6" alt="" data-size="line"> I want to use variables to define a dynamic key field  </summary>

If the incoming payload for the de-dupe shape is received via [manual payload](https://doc.wearepatchworks.com/product-documentation/process-flows/building-process-flows/process-flow-shapes/standard-shapes/manual-payload-shape), [API request](https://doc.wearepatchworks.com/product-documentation/process-flows/building-process-flows/process-flow-shapes/advanced-shapes/de-dupe-shape/broken-reference), or [webhook](https://doc.wearepatchworks.com/product-documentation/process-flows/building-process-flows/process-flow-shapes/standard-shapes/trigger-shape/trigger-shape-webhook), you can generate the key field value dynamically using [payload](https://doc.wearepatchworks.com/product-documentation/process-flows/building-process-flows/dynamic-variables/payload-variables), [flow](https://doc.wearepatchworks.com/product-documentation/process-flows/building-process-flows/dynamic-variables/flow-variables) and [metadata](https://doc.wearepatchworks.com/product-documentation/process-flows/building-process-flows/dynamic-variables/metadata-variables) variables.

<img src="https://2440044887-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FLYNcUBVQwSkOMG6KjZfz%2Fuploads%2FfgkLhL95Yc5w3jXz3wDo%2Fdedupe%20key%20field%20-%20variables.png?alt=media&#x26;token=bcd237a2-7d65-4f7c-b818-be6d745efc1c" alt="" data-size="original">

Any combination of [payload](https://doc.wearepatchworks.com/product-documentation/process-flows/building-process-flows/dynamic-variables/payload-variables), [flow](https://doc.wearepatchworks.com/product-documentation/process-flows/building-process-flows/dynamic-variables/flow-variables) and [metadata](https://doc.wearepatchworks.com/product-documentation/process-flows/building-process-flows/dynamic-variables/metadata-variables) variables can be used to form the key field value. For more information, see our [Dynamic variables](https://doc.wearepatchworks.com/product-documentation/process-flows/building-process-flows/dynamic-variables) section.

</details>
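
With a dynamic key field, placeholders are replaced with values from the payload, flow variables, or metadata before matching takes place. The Python sketch below illustrates this substitution step using a hypothetical `${source.name}` placeholder syntax and a hypothetical `build_key` helper. It does not reproduce Patchworks' actual variable syntax, which is described in the Dynamic variables section.

```python
import re
from typing import Mapping

# Hypothetical placeholder syntax: ${payload.x}, ${flow.x} or ${meta.x}.
PLACEHOLDER = re.compile(r"\$\{(payload|flow|meta)\.([A-Za-z0-9_]+)\}")


def build_key(template: str, sources: Mapping[str, Mapping[str, object]]) -> str:
    """Replace each placeholder with the matching value from `sources`.

    A missing source or variable raises KeyError, so a bad template fails
    loudly instead of producing a key that never matches anything.
    """

    def substitute(match: re.Match) -> str:
        source, name = match.group(1), match.group(2)
        return str(sources[source][name])

    return PLACEHOLDER.sub(substitute, template)


sources = {
    "payload": {"customerID": "C1"},
    "flow": {"region": "EU"},
    "meta": {"channel": "web"},
}
print(build_key("${flow.region}-${payload.customerID}-${meta.channel}", sources))
# EU-C1-web
```

Combining several variables like this produces a composite key, so two records count as duplicates only if every component matches.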

{% hint style="info" %}
The selection that you make here determines how the payload is adjusted when duplicate data is removed. For more information, see [How duplicate data is handled](https://doc.wearepatchworks.com/product-documentation/process-flows/building-process-flows/process-flow-shapes/advanced-shapes/de-dupe-shape/..#how-duplicate-data-is-handled).
{% endhint %}
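
As a rough model of how a payload might be adjusted, the Python sketch below assumes a JSON array of records keyed by a top-level field. A record is dropped if its key is already in the data pool or has already appeared earlier in the same payload. Each new key is then tracked. The `dedupe` function, the plain `set` standing in for a data pool, and the choice to keep records without a key are all hypothetical. Actual results depend on the option selected in the **behaviour** field.

```python
import json
from typing import Any


def dedupe(records: list[dict[str, Any]], key_field: str, pool: set[str]) -> list[dict[str, Any]]:
    """Return records whose key value has not been seen, tracking new keys.

    Records missing the key field are kept unchanged (an assumption).
    """
    kept: list[dict[str, Any]] = []
    for record in records:
        if key_field not in record:
            kept.append(record)
            continue
        key = str(record[key_field])
        if key in pool:
            continue  # duplicate: seen in the pool or earlier in this payload
        pool.add(key)  # track the new key so later duplicates are removed
        kept.append(record)
    return kept


pool = {"C0"}  # key tracked by a previous run
payload = json.loads('[{"customerID": "C0"}, {"customerID": "C1"}, {"customerID": "C1"}, {"customerID": "C2"}]')
print(json.dumps(dedupe(payload, "customerID", pool)))
# [{"customerID": "C1"}, {"customerID": "C2"}]
```

Because the pool persists between runs, a record can be removed even when it is the only one with that key in the current payload, as happens to `C0` here.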

**Step 6**\
Select the payload format:

<div align="left"><figure><img src="https://2440044887-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FLYNcUBVQwSkOMG6KjZfz%2Fuploads%2F5bE0Yp7TCVQZZGChLQBN%2Fadd%20dedupe%20shape%20-%20payload%20format.png?alt=media&#x26;token=64b5e713-d230-448d-9ddc-a249ba90d4c1" alt="" width="359"><figcaption></figcaption></figure></div>

**Step 7**\
Save the shape.
