This approach assumes that the cache to be loaded was added with a payload variable for the cache key, and comprises multiple, single-record payloads (having been through a flow control shape). Each of these payloads has its own, unique cache key (when data was added to the cache, this key was generated dynamically by resolving a cache key payload variable).
For more information about this stage, please see Generating dynamic cache keys with payload variables.
When we come to load this data, we must target the required cache keys. In the same way that we use a payload variable to add data to a cache with dynamic cache keys, we can use a payload variable to load data from these keys.
To do this, you configure a load from cache shape with a 'multi-pick' payload variable in the cache key, and ensure that data passed into this shape contains the values required to resolve this variable.
In summary, you can drop a single load from cache shape into a process flow and specify a payload variable as the required cache key. This must be in the form:
[[payload.*.<element>]]
...where <element> should be replaced with whichever data element you will be passing in to resolve the cache key. For example:
order-[[payload.*.id]]
The <element> defined here will be the same data element that was specified in the payload variable for the corresponding add to cache shape.
You then need to pass in any <element> values that should be used to resolve required cache key names. This might be achieved via a connection shape (if values are being generated from another system), or perhaps a manual payload shape. Whichever shape you use must be placed immediately before the load from cache shape.
To help understand how this approach works, we will step through an example.
Suppose we have a scenario where one process flow has been built to receive incoming orders, and another process flow needs to target specific orders received by that flow.
Process flow 1: Add to cache
To allow the second process flow access to orders processed by the first, we must add all incoming orders to a company type cache in the first process flow (remember that company type caches can be accessed by any other process flow created for your company profile). To ensure that we can go on to target specific orders from this cache later, we will cache every order in its own cache key, using a payload variable.
Here, we will batch an 'orders' payload into single order payloads - then we'll add each payload to its own cache key, which is created dynamically from a payload variable. Let's break these steps down:
Process flow 2: Load from cache
Here, we will pass the required order ids into a load from cache shape. These ids are then used to resolve dynamic cache keys (via a payload variable) to determine which orders should be loaded. Let's break these steps down:
The add to cache shape is used to cache (i.e. store a copy of) the payload as it stands at that point in the process flow.
You can add as many add to cache shapes as you like in a process flow. For example, you might want to cache a payload as soon as it gets pulled from a source connection, and again later after it's been transformed:
How long a cached payload remains available depends on the cache level selected when you configured the add to cache shape in your process flow.
During routine platform maintenance, cached data may be cleared. While we make a best effort to retain data for up to 7 days, it could be cleared sooner. Please design your process flows accordingly.
The default behaviour is for the existing cache to be overwritten each time it is updated. Please see the Appending data to a cache page for information about appending data.
The maximum cache size is 50MB.
Cache names must not include full stop (.) or colon (:) characters.
Cached data is stored in Amazon S3.
To add an add to cache shape to a process flow, follow the steps below.
Step 1 Find the point in your process flow where you want to cache the payload - typically this would be after a 'GET' connection shape, or perhaps after data has been mapped or manipulated via a script.
Step 2 Select the add to cache shape from the shapes palette:
Step 3 Click the create cache option:
...cache options are displayed:
Step 4 Click in the cache level field to choose when/where this cache will be available:
Choose from the following options:
Flow run
The data associated with this cache is only available while the process flow is running. When a process flow run completes, existing cached data is deleted. How this happens depends on whether the process flow is enabled & deployed.
Enabled & deployed process flows
In this case, the flow run cache is cleared as soon as the process flow completes.
Draft/inactive process flows
In this case, we use a TTL (Time to Live) with a default of 2 hours to determine when the cached data is deleted. There's no chance that flow run cached data could be re-used in the TTL deletion window - each time a flow run cache is used, a unique flow run id is added to the cache key used to get and set data. Because every process flow run has a unique run id, there's no possibility for another flow run to access the data from a previous run.
Flow
Data in the cache is retained after the process flow is run, so it can be loaded again within this process flow if required.
Cache retention
When you choose to add a flow cache, retention options are available so you can decide how long cached data should be retained (you can set a time limit in seconds, minutes, hours, or days).
The default setting is 2 hours. This can be updated to a maximum of 7 days.
Company
The data associated with the latest update to this cache is available for use in this process flow and in any other process flows created within your company profile.
Cache retention
When you choose to add a company cache, retention options are available so you can decide how long cached data should be retained (you can set a time limit in seconds, minutes, hours, or days).
The default setting is 2 hours. This can be updated to a maximum of 7 days.
Step 5 Enter a name for this cache:
The cache name must not include full stop (.) or colon (:) characters.
Step 6 If you have chosen a flow-level or company-level cache, you can set a data retention period to determine when this data will expire - for example:
The data retention period for a flow run-level cache is always 2 hours - this cannot be changed. The maximum retention period for a flow-level or company-level cache is 7 days.
Step 7 Save changes to exit back to add to cache settings where you can continue with your newly created cache.
Step 8 Click in the select a cache field and select your new cache from the list:
Step 9 Enter a cache key to identify this cache object - for example:
Your cache key can be:
Static
Data is cached to the key exactly as it is specified. Typically used when your aim is to load the entire cache later in the flow (or in other flows). For example: orders
Dynamic
The cache key resolves dynamically based on a payload variable. Typically used when your aim is to load single or multiple items from the cache later in the flow (or in other flows). For example: order-[[payload.0.id]]
A cache key cannot exceed 128 characters.
If you are adding a company-level cache, you may want to make a note of the key that you specify here, so it can be shared with other users in your organisation who may want to reference this cache in their process flows.
Step 10 If you have multiple incoming payloads (typically where source data is paginated or has been through flow control), you should consider how these payloads are cached. The save all pages option determines cache behaviour for multiple incoming payloads:
Save all pages toggled ON. All incoming payloads are saved for your cache key. If you access the cache, you'll see each page listed with a page number - for example:
Save all pages toggled OFF. Data associated with the given cache key is overwritten each time one of the multiple payloads is saved - so only the final payload is saved - for example:
It's important to understand how the save all pages option works in conjunction with the append option. If you aren't sure, please see our Cache pagination options page before proceeding.
Step 11 Set the append option as required. If this option is toggled ON, incoming data is appended to the existing cache key each time an update is made. If this option is toggled OFF, the cache key is overwritten with new data each time.
For more information see our Appending data to a cache page.
Step 12 Save changes. The add to cache shape is added to your process flow, displaying the given name and key - for example:
Yes. As with any other process flow shape, you can view the associated payload for an add to cache shape after the process flow has run. To do this, click the shape's tick icon and then select the payload tab in the run log panel - for example:
If you place an add to cache shape before a shape which generates multiple payloads (typically, a flow control shape), you can see each payload that is created via the payload dropdown - for example:
Cached data can be loaded via our load from cache shape. Please refer to the Load from cache shape section for more information.
The load from cache shape is used to retrieve a stored payload from an existing cache key (created from an add to cache shape).
You might configure a load from cache shape in the same process flow as the original add to cache step or - if a cache was added and set to company level - you might choose to load it in a different process flow.
To add a load from cache shape to a process flow, follow the steps below.
Step 1 Find the point in your process flow where you want to load the payload from a cache - this could be at the very start of a process flow, or perhaps somewhere further down.
Step 2 Select the load from cache shape from the shapes palette:
Step 3 Click in the select cache field and choose which cache you want to retrieve:
In this list, you'll find any caches that have been added to this process flow (via the add to cache shape), together with any caches that have been added to other process flows and set to a cache level of company.
Step 4 Enter the cache key that you want to retrieve - for example:
Your given cache key might be static or dynamic, depending on how the cache was configured in the corresponding add to cache shape:
Static
Data is cached to the key exactly as it is specified. Typically used when your aim is to load the entire cache later in the flow (or in other flows). For example: orders
Dynamic
The cache key resolves dynamically using variables. Typically used when your aim is to load single or multiple items from the cache later in the flow (or in other flows). For example: order-[[payload.0.id]] OR order-[[payload.*.id]]
For detailed information about each of these approaches, please see What cached data do you want to load?
The cache key must be associated with an existing add to cache shape, either in the same process flow or (in the case of company-level caches) in another process flow.
Step 5 If you want this process flow to fail if this cache can't be retrieved for any reason, tick the fail on cache miss option:
If you leave this option un-ticked, the process flow will continue to run if the cache can't be loaded.
Step 6 If the cache that you're loading was created with the save all pages option toggled ON, you should toggle the load all pages option ON when loading this data:
When paginated data is pulled from a connection shape, a payload is created for each page. If the save all pages option is toggled ON when a cache is created, the payload for each page is saved to its own cache key (with key names generated dynamically from a specified key and page numbers). If the save all pages option is toggled OFF, all pages are saved to a single cache key. For more information please see our Cache pagination options.
Step 7 Save changes. The load from cache shape is added to your process flow, displaying the given name and key - for example:
Yes. As with any other process flow shape, you can view the associated payload for a load from cache shape after the process flow has run. To do this, click the shape's tick icon and then select the payload tab in the run log panel - for example:
Understanding how pagination options impact what data is cached.
When you drop an add to cache shape into a process flow, there are two options that you should consider if your selected endpoint paginates the data that is received OR you generate multiple payloads in some other way (for example, via the flow control shape). These options are: save all pages and append.
Together, these two options determine how multiple payloads are cached, so it's important to understand the implications of each.
On this page we focus on paginated data; however, the same principles apply whenever multiple payloads are cached, irrespective of whether those payloads are generated via pagination or some other means (for example, via the flow control shape).
When paginated data is pulled from a connection shape, a payload is created for each page - you can see these in the run log payload tab:
If you are caching paginated data and choose to toggle the save all pages option to ON, the payload for each page is saved with its page number and a unique key. For example:
The unique key is generated dynamically, by adding the page number to your specified cache key. If the cache is a flow run type, the unique key will also incorporate the flow run id.
It's important to note that every time a connection shape pulls paginated data, page numbers reset to 1.
When the append option is toggled ON, incoming payloads are appended to existing cache keys; when it is toggled OFF, existing data is overwritten. In each case, the exact behaviour depends on the save all pages option:
Append OFF, save all pages OFF
The given cache key is overwritten each time a payload is cached. As such, the cache key will only ever include data from the LAST payload received.
Append OFF, save all pages ON
The first time that multiple payloads are received, each one is saved to its own unique key, against your specified cache key. Next time the cache receives data, any existing unique keys associated with your specified cache key are overwritten (additional unique keys are created for new payloads/pages as needed). As such, each unique key will only ever contain the latest data for the correlating payload/page number.
Append ON, save all pages OFF
Each payload is appended to your specified cache key. As such, data in the cache key continues to grow with each data pull - nothing is overwritten.
Append ON, save all pages ON
The first time that multiple payloads are received, each one is saved to its own unique key, against your specified cache key. Next time the cache receives data, any existing unique keys associated with your specified cache key are appended with the latest data from correlating payload/page numbers (additional unique keys are created for new payloads/pages as needed). As such, data associated with existing unique keys continues to grow with each data pull.
The diagram below illustrates this:
For information about setting the append option, please see our Appending data to a cache page.
This approach is the simplest - all incoming data is cached with a static cache key.
In the example below, all incoming customer records will be added to a cache named ALLcustomers and a static cache key named customers:
When the data is cached, it's likely that the cache will include multiple records - for example:
To retrieve this cache, we simply drop a load from cache shape where required in the process flow and specify the same cache and cache key that were defined in the corresponding add to cache shape:
The load from cache shape works as normal to retrieve cached data where the cache was created with a payload variable - you choose the cache name and key to be loaded:
However, the important point to consider is that the cache key that you specify here will have been generated from the payload variable that was specified when the cache was created.
If a payload variable has been used to cache data, you would typically have included a flow control shape to create multiple payloads - for example:
So you will have multiple cache keys that can be loaded. To do this, you can add one load from cache shape for every cache key that you want to retrieve, specifying the required key in each case. For example:
Alternatively, you can add a single load from cache shape and target specific cache keys by passing in the required ids.
This approach assumes that the cache to be loaded was added with a payload variable for the cache key, and comprises multiple, single-record payloads (having been through a flow control shape). Each of these payloads has its own, unique cache key (when data was added to the cache, this key was generated dynamically by resolving a cache key payload variable).
For more information about this stage, please see Generating dynamic cache keys with payload variables.
When we come to load this data, we must target the required cache key. If you only want a single item, the quickest way is to specify the resolved cache key.
The load from cache shape works as normal - you choose the cache and cache key to be loaded:
However, the important point to consider is that the cache key that you specify here will have been generated dynamically by resolving the payload variable that was specified when the cache was added.
Consider the following process flow:
Here, our manual payload contains customer data as below:
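For illustration, a payload of this shape might be used here (the id values match those used throughout this example; the name fields are representative):

[
  { "id": 1000000001, "name": "Customer One" },
  { "id": 1000000002, "name": "Customer Two" },
  { "id": 1000000003, "name": "Customer Three" },
  { "id": 1000000004, "name": "Customer Four" }
]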
To allow us to target specific customer records from this payload, we send it through a flow control shape, which is set to create one payload per customer:
...so now we have lots of payloads to be cached:
If we look at the payload for the first of these, we can see it contains a single customer record - notice that there's an id field with a value of 1000000001. This field uniquely identifies each record.
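For illustration, that first single-record payload might look like this (name field representative):

{ "id": 1000000001, "name": "Customer One" }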
Next we define an add to cache shape - we create a new cache and use a payload variable to generate a dynamic cache key for each incoming payload:
Here, the payload variable is defined as:
customer-[[payload.0.id]]
where:
customer- is static text to prefix the resolved variable.
[[payload.]] instructs the shape that this variable should be resolved from the incoming payload.
0 denotes that the first occurrence of the following item found in the payload should be used to resolve this variable.
id is the name of the field in the payload to be used to resolve this variable.
So, if we take our first payload above:
...our payload variable would resolve to the following cache key:
customer-1000000001
This is what we use in our load from cache shape:
You can view and manage all existing caches from the data caches page - to access this page, select caches from the dashboard navigation menu.
During routine platform maintenance, cached data may be cleared. While we make a best effort to retain data for up to 7 days, it could be cleared sooner. Please design your process flows accordingly.
The data caches page is split into three sections: flow run caches, flow caches, and company caches:
Each cache is listed with the following details:
Name
The name that was specified when the cache was created. Caches are created via the add to cache shape.
Flow
For flow and flow run caches, this is the name of the process flow which is using the cache. This information is not shown for company caches, as the cache might be used in any process flow within a company profile.
Created
The date and time that the cache was created.
Last accessed
The date and time that the cache was last accessed by a process flow. The cache may or may not have been updated with data at this time (even if there is no data to be added, the access date/time is logged).
Keys
The number of cache keys associated with this cache. Cache keys are created via the add to cache shape.
Size
The current size of the cache in proportion to the limit.
Delete the cache (and all associated data).
If you have a lot of caches, you can search by name:
To access cache details for a particular cache, click on its name:
When you select a cache from the list, an edit cache page is displayed:
From here you can:
To change the name of the cache, simply update the name field in the upper cache details panel, then click the save button.
When the name is updated and saved, the change is immediately reflected in any add to cache shapes in process flows, where this cache is used.
The cache name must not include full stop (.) or colon (:) characters.
You can use the maximum age slider to change the cache retention period for a cache:
Note that:
The maximum age for a flow run cache is 2 hours - this cannot be changed
The maximum age for a flow or company cache can be changed to anything up to 7 days
The usage panel shows general usage information about the cache:
Here you can see:
Size
The current size of the cache, shown with a percentage use indicator. The maximum cache size is 50MB.
Created
The date and time that the cache was created.
Last accessed
The date and time that the cache was last accessed. This timestamp is updated even when no data was added to the cache.
Keys
The number of keys associated with this cache.
The cache contents panel displays an entry for each cache key update. Information shown varies, depending on the cache type.
The following details are displayed for each cache item in a flow run-level cache:
Flow Run ID
The unique id of the process flow run that updated the cache key.
Started at
The date and time that the process flow run was started.
Key
The cache key name.
Page
If multiple pages are added to a cache (for example, if incoming data is paginated or batched via flow control) and the save all pages option is toggled ON, each page is listed individually.
Unique key
The internal cache key.
Size
The size of the cache key.
Click the 'eye' icon to view the content associated with this key.
The following details are displayed for each cache item:
Key
The cache key name.
Page
If multiple pages are added to a cache (for example, if incoming data is paginated or batched via flow control) and the save all pages option is toggled ON, each page is listed individually.
Unique key
The internal cache key.
Size
The size of the cache key.
Click the 'eye' icon to view the content associated with this key.
To clear all current content in the cache, click the clear cache button:
This removes any existing data but leaves the cache in place so it can still be used in process flows.
If caches have been added to your process flow or company-level caches have been added for use in any process flow, you can reference these in field mapping transformations.
Using a cache lookup transformation function, you can look up values from a cache and map them to fields in a target system.
If you've added/updated a map shape before, you'll be used to selecting a source field and a target field. However, when referencing a cache we don't select a source field - the specified cache data is our source.
Step 1 In your process flow, access settings for the map shape that you want to update:
Step 2 Click the add mapping rule option - for example:
Step 3 Click the add transform button:
Step 4 Click the add transform button:
Step 5 Click in the name field to access a list of all available transform functions, then select cache lookup:
Step 6 Cache reference fields are displayed:
Complete these fields using the table below as a guide:
Cache
Use the dropdown list to select the cache that you want to reference. Available caches will be:
All flow-level caches added for this process flow
All company-level caches added from any process flow
All flow run-level caches created for this run
Key
Enter the key that was specified in the add to cache shape for the cache that you want to access. Alternatively, if this transformation is preceded by another transform function, you can leave this field blank and pick up a value from the output of the previous function. For further information please see the section on populating the cache key from a previous transform function, below.
Lookup
You can use dot notation to look up specific elements from the cached payload. If you leave this field blank, the full cached payload is retrieved (see the example after this table).
Default
If required, specify a default value to be used if the cache lookup transform doesn't find a value to return.
Load all pages
When paginated data is pulled from a connection shape, a payload is created for each page. If the save all pages option is toggled ON when a cache is created, the payload for each page is saved to its own cache key (with key names generated dynamically from a specified key and page numbers). The load all pages option can be used if you want to look up all of these pages.
Fail on miss
If this option is toggled ON, the map shape (and therefore the process flow) will fail if the cache lookup can't be fulfilled.
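To illustrate the lookup field above, suppose the cached payload is (illustrative):

[ { "id": 1000000001, "email": "one@example.com" } ]

A lookup of 0.email (using the same notation style as payload variables) would return just one@example.com, whereas leaving the lookup field blank would return the full cached payload.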
Step 7 Accept your changes:
...then save the transformation:
Step 8 Now you can select a target field in the usual way. Once your mapping is complete, the row should be displayed without a source field - for example:
From here you can save changes or add more mapping rules as needed. Next time the process flow runs, the specified cache values will be mapped to the target field.
The steps detailed above show how to configure the cache lookup transform with a known cache key. However, it's possible to populate the cache key automatically, using the output from a previous transform function.
To do this, you add a mapping row in the usual way and define any required transform functions to produce the required value for cache keys. Once this is done, add a cache lookup transform function (as shown above) but leave the key field blank.
When the key field is blank, output from the previous transform function for the mapping is applied.
Suppose you have a cache where multiple cache keys have been defined in the form:
itemref-last_name
For example:
1000021-Smith
Now suppose you want to define a cache lookup transformation which will determine the key by manipulating mapped fields. You would:
Add a mapping row with two source fields - one for itemref and another for last_name.
Select itemref as the target field.
Add a concatenate transform function to join itemref and last_name fields with a hyphen.
Add a cache lookup transform function as defined above, but leave the key field blank.
When the process flow runs, output from the concatenate transform function will be applied as the key for the cache lookup transform function.
The example above describes how you might use a concatenate transform function as the means to generate a cache key; however, the output from any transform function can be used.
When a process flow runs, the payload for received data flows through to subsequent steps. In a straightforward scenario we pull data from one connection, then perhaps apply filters and/or scripts before mapping/transforming data fields and finally pushing the payload into a target connection. This is a very linear example - we start with a payload and it flows all the way through to completion.
However, more complex scenarios might need to use a payload that was generated several steps previously, or even from a different process flow. This is where the add to cache and load from cache shapes come in.
Wherever you place an add to cache shape in a process flow, it will cache (i.e. store a copy of) the payload as it stands at that point in the process flow. You can then use a load from cache shape to reference this payload elsewhere in the same process flow and/or in other process flows for your organisation (depending on how the add to cache shape is configured).
For more information please see:
Data pools store data entities that have been tracked via the de-dupe and track data shapes.
Data pools are created and managed via the data pools option in general settings. From here you can add a new data pool, or view/update an existing data pool.
For more background information on data pools please see de-dupe shape and track data pages.
Tracked de-dupe data is retained for 90 days after it's added to a data pool.
De-dupe data pools can be created in two ways:
You can access existing data pools from general settings.
Step 1 Select the settings option from the bottom of the dashboard navigation bar:
Step 2 Select data pools:
...all existing data pools are displayed:
For each data pool you can see the creation date, and the date that it was last updated by a process flow run.
Step 3 To view details for a specific data pool, click the associated name in the list:
...details for the data pool are displayed:
In the top panel you can change the data pool name/description (click the update button to confirm changes) or - if the data pool is not currently in use by a process flow - you can choose to delete it.
In the lower panel you can see all data in the pool. This data is listed with the most recent entries first - the following details are shown:
Value
The value of the field that was identified as a match for duplicate records. This is the field defined as the key to be used for de-dupe shapes. For example, if the de-dupe key is set to id, the value field shown in the data pool will display id values.
Created by
The name of the process flow where this entry was tracked into the data pool. Click this name to open the associated process flow.
Updated at
The date and time that the record was added to the pool (UTC time).
If you have defined custom scripts for use in process flows, use the script shape to select a script to apply at a given point in a process flow.
You can use any version of a script which has been saved and deployed.
Creating a custom script is an advanced feature which requires some in-house development expertise.
Step 1 In your process flow, add the script shape in the usual way:
Step 2 You're prompted to select an existing script:
Step 3 Select the script that you want to use at this point in the process flow:
The list of available scripts only includes scripts which are currently deployed for use.
Step 4 The latest deployed version of the script is added to the shape - for example:
Code is displayed in view-mode. If you need to change the script, save your shape now and then use the left-hand navigation bar to access process flows > custom scripts.
Step 5 Unless you have a specific reason to do otherwise, we advise using the latest version of scripts. However, if you do need to use a previous version of the script, select the 'versions' dropdown field to make your selection - for example:
Step 6 Save the shape:
To view/change the selected script for an existing script shape, click the associated 'cog' icon:
From here, the existing script is displayed - you can either select a different script, or a different version of the existing script:
Remember that the script code can't be changed here. If you need to change the script, save your shape now and then use the left-hand navigation bar to access process flows > custom scripts.
A script will time out if it runs for more than 120 seconds.
Loading data from a cache is very straightforward using the load from cache shape, however you do need to consider what data you want to load. You can:
Load everything associated with a given cache key
Load a single item from a cache
Load multiple items from a cache
Each of these options requires a slightly different approach, as summarised in the diagram below and explained in subsequent sections:
It's not currently possible to access different versions of a cache. So, each time a process flow runs with the same add to cache shape, the payload for that cache is overwritten with the latest data, and it's this that will be available to load from a company cache.
Shapes detailed in this section are available for Professional and Enterprise core subscription tiers, or as a bolt-on for the Standard tier.
Step 1: Manual payload
The manual payload shape contains an 'orders' payload with 17 orders in total.
Step 2: Filter
The filter shape ensures that orders are only processed if the id field is not empty.
Step 3: Flow control
The flow control shape is set to create batches of 1 from the payload root level - so every order will be added to its own payload.
Step 4: Add to cache
The add to cache shape is defined to add to a company type cache, named CPT-722. The cache key is created dynamically, where the first part is always 'order-' followed by the value of the first id element found in the incoming payload (e.g. order-5697116045650). All data from the incoming payload will be added to this cache key. Taking our example using flow control, the incoming payload will only ever be a single order.
Step 5: Run flow
When this process flow runs, checking payload information for the add to cache shape shows that 17 payloads have been cached - one payload for each order.
Step 1: Manual payload
The manual payload shape contains two order ids that we want to load from our cache.
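For illustration, a manual payload of this shape would work with the cache key used in the next step (only the id values are significant):

[
  { "id": 5693105439058 },
  { "id": 5697116045650 }
]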
Step 2: Load from cache
The load from cache shape is configured to load data from our CPT-722 cache, targeting dynamic cache keys from order-[[payload.*.id]]. Here, the required cache key(s) will be resolved from all (*) ids found in the incoming payload - in this case order-5693105439058 and order-5697116045650.
Step 3: Run flow
When this process flow runs, checking payload information for the load from cache shape shows that two payloads have been loaded - one for each of our given ids.
If required, you can import existing data into a de-dupe pool. For example, you may have records that you know have been processed elsewhere and want to ensure that they aren't processed via Patchworks.
Conversely, you can export de-dupe pool data to a CSV file, for use outside of Patchworks.
De-dupe data exports are completed in CSV format, delimited ONLY with a single comma between fields.
The exported file includes two columns with value and entity_type_id headers. For example:
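value,entity_type_id
1000021,3
1000022,3

(The value and entity_type_id entries shown here are illustrative.)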
When de-dupe data values are imported:
All records in the import file are added to the data pool as new items
Any existing items in the data pool are unchecked and unchanged
To import de-dupe values, the import file must be in the same format as export files above, with the same headers. I.e.:
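value,entity_type_id
1000021,3

(Illustrative values - see below for how to find the correct entity_type_id.)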
Where:
The value is the key field value that you are matching on.
The entity_type_id is the internal Patchworks id for the entity type associated with the key field that you are using to match duplicates. This id must be present for every entry in your CSV file. You can download a list of ids by following the steps detailed later on this page.
Import files cannot exceed 5MB.
To export/download a de-dupe data pool, follow the steps below.
Step 1 Log into the Patchworks dashboard, then select the settings option:
...followed by the data pools option:
Step 2 Click the name of the data pool that you want to export:
Alternatively, you can create a new data pool.
Step 3 With the data pool in edit mode, move to the lower tracked de-dupe data panel and click the download button:
Step 4 The download job is added to a queue and a confirmation message is displayed:
Step 5 When your download is ready, you'll receive an email which includes a link to retrieve the file from the file downloads page. If you can't/don't want to use this link, you can access this page manually - click data pools in the breadcrumb trail at the top of the page:
...followed by the settings option:
Step 6 Select the file downloads option from the settings page:
Step 7 On the file downloads page, you'll find any exports that have been completed for your company profile in the last hour.
This list may include exports from different parts of the dashboard, not just data pools (for example, run log and cross-reference lookup data exports are added here).
Step 8 Click the download button for your job - the associated CSV file is saved to the default downloads folder for your browser.
Download files are cleared after one hour. If you don't manage to download your file within this time, don't worry - just run the export again to create a new one.
If you want to import data into a de-dupe data pool, you need to ensure that each record in your CSV file includes an entity_type_id. To find which id you should use, follow the steps below to download a current list.
Step 1 Log into the Patchworks dashboard, then select the settings option:
...followed by the data pools option:
Step 2 Click the download entity types button at the top of the page:
Step 3 A CSV file is saved to the default downloads folder for your browser.
To import data into a de-dupe data pool, follow the steps below.
Step 1 Log into the Patchworks dashboard, then select the settings option:
...followed by the data pools option:
Step 2 If you want to import data into an existing data pool, click the name of the required data pool from the list:
Alternatively, you can create a new data pool.
Step 3 Move to the lower tracked de-dupe data panel and click the import button:
Step 4 Navigate to the CSV file that you want to import and select it:
Step 5 The file is uploaded and displayed as a button - click this button to complete the import:
Step 6 The import is completed - existing values are updated and new values are added:
You may need to refresh the page to view the updated data pool.
When an add to cache shape is dropped into a process flow, the entire incoming payload is cached and associated with the given cache key. Depending on the cache type, you can load this cache later in the same flow or in a different flow.
In the simplest scenario, your given cache key would be a static value (e.g. customers) and you would use this to load the entire cache (containing perhaps tens, hundreds, even thousands of items) where required. But what if you want to load a specific item from a cache, rather than the whole thing?
This is where dynamic cache keys are so useful.
To load data from a cache, you configure a load from cache shape with the required cache and a single cache key. All data associated with your given cache key is loaded.
Consider the example incoming payload below, where four records are cached with a static cache key with a value of customers:
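[
  { "id": 1000000001, "name": "Customer One" },
  { "id": 1000000002, "name": "Customer Two" },
  { "id": 1000000003, "name": "Customer Three" },
  { "id": 1000000004, "name": "Customer Four" }
]

(Illustrative payload - the significant point is that all four records sit under the single customers cache key.)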
If we were to configure a load from cache shape to access the customers cache key, all four records would be loaded.
So, in order to load specific items from a cache, the incoming data must be added to a cache in such a way that we can easily target individual items. We need an efficient way to take incoming data, batch it into single-record payloads and add each of these to the cache with its own unique, identifying cache key - i.e.:
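For example, with a cache named customerData:

payload { "id": 1000000001 } → cache: customerData, cache key: customer-1000000001
payload { "id": 1000000002 } → cache: customerData, cache key: customer-1000000002
payload { "id": 1000000003 } → cache: customerData, cache key: customer-1000000003
payload { "id": 1000000004 } → cache: customerData, cache key: customer-1000000004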
We can achieve this as follows:
The incoming payload is batched into multiple payloads - one payload per data element (e.g. one order per payload, one customer per payload, one product per payload, etc.). To do this, place a flow control shape immediately before the add to cache shape and configure it to create batches of 1 at the appropriate level for your data.
Configure the add to cache shape and specify a payload variable as the cache key, where the variable looks for the first occurrence of a uniquely identifying element in the payload (typically an id or reference number).
The add to cache shape receives and caches single-record payloads from the flow control shape. The cache key for each payload is generated dynamically by resolving the payload variable from each incoming payload.
Flow control is an easy way to batch incoming data into single-record payloads, however you may prefer an alternative approach. The important point is that the add to cache shape must receive single-record payloads - how you achieve this is up to you.
When you specify a dynamic variable as the cache key, the value for that variable is injected into the key. To prevent the case where large amounts of data are passed into the key, there is a limit of 128 characters.
Follow the steps below to configure an add to cache shape with a payload variable for generating dynamic cache keys.
These steps assume that you have already defined a flow control shape (or some other means) to ensure that the add to cache shape receives single-record payloads.
Step 1 Drop an add to cache shape into your process flow, where required.
Step 2 In the add to cache shape settings, choose to create cache:
Step 3 Set the cache level and name as required and save changes.
For more information on these fields please see the add to cache shape page.
Step 4 Select the cache that you just created - for example:
Step 5 Move down to the cache key field and enter the required key. Here, you use standard payload variable syntax to define your target data element:
[[payload.<schema notation>]]
...where <schema notation> should be replaced with the notation path to the first occurrence of the required element in the payload which should be used to form the cache key. If required, you can also include a static prefix or suffix. For example:
customer-[[payload.0.id]]
The output of the payload variable will be used as the cache key.
Our example uses dynamic payload variables; however, you can also use metadata variables and/or flow variables. For more information please see the dynamic variables section.
Step 6 Save the add to cache shape settings.
Cached data can be loaded via our load from cache shape. Please refer to the Load from cache shape section for more information.
We've already noted how the add to cache shape can be added to a process flow to cache the entire payload at a given point in the flow. The default behaviour is that when a process flow runs and hits an add to cache shape, any existing data associated with that cache is overwritten with a new payload from the new run.
However, it is possible to append data to a cache, so each time the process flow runs and the add to cache shape is reached, the current cache is appended to the existing cache. This works for any cache type (flow, flow run, and company).
Paginated data. If your connection shape receives paginated data, it's important to understand how the save all pages option works in conjunction with append. For more information please see our cache pagination options page.
Cache size. Theoretically, if a cache is set to append data and then runs on a regular basis indefinitely, the cache size may grow to an unmanageable size. With this in mind, a limit is in place to ensure that a single cache cannot exceed 50MB.
Append data format. Appending cached data is supported for JSON only.
Shared caches. The append to cache operation is not atomic - as such we advise against multiple process flows attempting to update the same cache at the same time.
To use the append option, follow the steps below.
Step 1 Drop an add to cache shape into your process flow in the normal way - create your cache, then select it and add your cache key.
Step 2 Ensure that the save all pages option is set as needed. For more information about how this option affects appended data please see our cache pagination options page.
Step 3 Enable the append option:
Step 4 A path to append to field is displayed:
Here, you need to consider the structure of the payload that you're passing in and specify a path that ensures that each new payload is appended in the right place.
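For example, suppose each run produces a payload like {"orders": [ ... ]} and you want new orders merged into the existing cached orders array - in that case you might enter orders as the path to append to. (This payload structure and path are illustrative; use whatever path matches where repeating records sit in your own payload.)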
If required, flow variables can be specified here.
Step 5 Save the shape. Next time the process flow runs the data will be cached and appended.
If you choose to view the payload for an add to cache shape, the payload will always show data from the latest run - for example:
However, when you add a load from cache shape, the payload will show ALL appended data so far - for example:
The de-dupe shape can be used to handle duplicate records found in incoming payloads. It can be used in three behaviour modes:
Filter. Filters out duplicated data so only new data continues through the flow.
Track. Tracks new data but does not check for duplicated data.
Filter & track. Filters out duplicated data and also tracks new data.
A process flow might include a single de-dupe shape set to one of these modes (e.g. filter & track), or multiple de-dupe shapes at different points in a flow, with different behaviours.
Tracked de-dupe data is retained for 90 days after it's added to a data pool.
The de-dupe shape is not atomic - as such we advise against multiple process flows attempting to update the same data pool at the same time.
The de-dupe shape works with incoming payloads from a connection shape, and also from a manual payload, API call, or webhook.
JSON and XML payloads are supported.
The de-dupe shape is configured with a behaviour, a data pool, and a key:
As noted previously, the de-dupe shape can be used in three modes, which are summarised below.
Filter
Remove duplicate data from the incoming payload so only new data continues through the flow. New data is NOT tracked.
Track
Log each new key value received in the data pool.
Filter & track
Remove duplicate data from the incoming payload AND log each new key value received.
Data pools are created in general settings and are used to organise de-dupe data. Once a data pool has been created it becomes available for selection when configuring a de-dupe shape for a process flow.
When data passes through a de-dupe shape which is set for tracked behaviour, the value associated with the key field for each new record is logged in the data pool. So, the data pool will contain all unique key field values that have passed through the shape.
You can have multiple de-dupe shapes (either in the same process flow or in different process flows) sharing the same data pool. Typically, you would create one data pool for each entity type that you are processing. For example, if you are syncing orders via an 'orders' endpoint and products via a 'products' endpoint, you'd create two data pools - one for orders and another for products.
Tracked de-dupe data is retained for 90 days after it's added to a data pool.
The key field is the data field that should be used to match records. This would typically be some sort of id that uniquely identifies payload records - for example, an order id if you're processing orders, a customer id if you're processing customer data, etc.
When duplicate data is identified it is removed from the payload; however, exactly what gets removed depends on the configured key field.
If your given key field is a top-level field for a simple payload, the entire record will be removed. However, if the payload structure is more complex and the key field is within an array, then duplicates will be removed from that array but the parent record will remain.
Let's look at a couple of examples.
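For example (illustrative payloads), suppose the key field is set to id and the incoming payload is:

[
  { "id": 100, "name": "Order A" },
  { "id": 100, "name": "Order A" },
  { "id": 101, "name": "Order B" }
]

Here, id is a top-level field, so the entire duplicate record for id 100 is removed. Now suppose the key field is a sku field inside an items array:

{
  "order_ref": 500,
  "items": [ { "sku": "ABC" }, { "sku": "ABC" }, { "sku": "DEF" } ]
}

In this case, the duplicate sku entry is removed from the items array, but the parent order record remains in the payload.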
The de-dupe shape supports JSON and XML payloads.
The de-dupe shape is used to identify and then remove duplicate entries from an incoming payload. For more background information please see our De-dupe shape page.
Tracked de-dupe data is retained for 90 days after it's added to a data pool.
Currently, the de-dupe shape supports JSON payloads.
To add and configure a new de-dupe shape, follow the steps below.
Step 1 In your process flow, add the de-dupe shape in the usual way:
Step 2 Select a source integration and endpoint to determine where the incoming payload to be de-duped originates - for example:
If your incoming data is via manual payload, API request, or webhook then you can remove any default source instance and endpoint selections:
Step 3 Move down to the behaviour field and select the required option.
For more information about these options please see our De-dupe shape behaviour section.
Step 4 Move down to the data pool field and select the required data pool.
If necessary, you can create a data pool 'on the fly' using the create data pool option. For more information please see Adding a new data pool via the de-dupe shape.
Step 5 In the key field, select/enter the data field to be used for matching duplicate records. How you do this depends on how the incoming data is being received - please see the options below:
The selection that you make here determines how the payload is adjusted when duplicate data is removed. For more information please see How duplicate data is handled.
Step 6 Select the payload format:
Step 7 Save the shape.