Deduplicate Brick

Purpose: Learn about the various configurations and parameters required for the Deduplicate Brick to function correctly.
Last updated: August 05, 2024

What is the Deduplicate Brick?

As its name suggests, the Deduplicate Brick specializes in eliminating duplicate incoming data. This functionality ensures that only unique data is processed, enhancing efficiency and preventing redundant operations.


Deduplicate Brick Parameters

  1. Label: Allows the user to assign a name to the brick.

  2. Key: Specifies which fields of the input data are checked for duplicates.

    • Examples
      • %python.group
      • string!(%python.title) + (string(%python.website) ?? "")
      • .href
      • .ip

This Brick accepts VRL and Jinja2 syntax.

For more information on VRL and Jinja2 syntax, follow the links below:

Learn more about VRL

Learn more about Jinja2
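
To make the second Key example more concrete, here is a rough Python analogue of the VRL expression string!(%python.title) + (string(%python.website) ?? ""). It is only a sketch: it assumes %python refers to a record with title and website fields, and the function name composite_key is hypothetical. In VRL, string!(...) aborts when the value is not a string, while string(...) ?? "" falls back to an empty string.

```python
def composite_key(record: dict) -> str:
    """Illustrative analogue of the VRL key; not the Brick's own code."""
    title = record.get("title")
    if not isinstance(title, str):
        # Mirrors string!(...): a non-string title is an error.
        raise TypeError("title must be a string")

    website = record.get("website")
    if not isinstance(website, str):
        # Mirrors string(...) ?? "": fall back to an empty string.
        website = ""

    return title + website


# Hypothetical records, for illustration only.
first = composite_key({"title": "Acme", "website": "acme.example"})
second = composite_key({"title": "Acme"})
```

Under these assumptions, two records yield the same key only when both the title and the website (or its absence) match.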

  3. Period: How far back in time the Brick takes messages into consideration.

    • Example
      • 1y: Only messages from the past year are taken into consideration.
  4. Lookback: How many previous messages the Brick takes into consideration.

    • Example
      • 1000: Only the previous 1000 messages are taken into consideration.
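
How Key, Period, and Lookback might interact can be sketched in Python. This is a minimal model under stated assumptions, not the Brick's actual implementation: the class name Deduplicator and its is_new method are hypothetical, timestamps are assumed to arrive in non-decreasing order, and Lookback is assumed to count remembered (unique) messages.

```python
from collections import OrderedDict
from datetime import datetime, timedelta


class Deduplicator:
    """Hypothetical model of Key / Period / Lookback."""

    def __init__(self, key, period, lookback):
        self.key = key              # function: message -> hashable key
        self.period = period        # timedelta: how far back to remember
        self.lookback = lookback    # int: how many messages to remember
        self.seen = OrderedDict()   # key -> time the key was first seen

    def is_new(self, message, now):
        # Forget keys first seen longer ago than the period.
        while self.seen:
            _, seen_at = next(iter(self.seen.items()))
            if now - seen_at <= self.period:
                break
            self.seen.popitem(last=False)

        k = self.key(message)
        if k in self.seen:
            return False  # duplicate inside the window

        self.seen[k] = now
        # Remember at most `lookback` messages, dropping the oldest.
        while len(self.seen) > self.lookback:
            self.seen.popitem(last=False)
        return True


# Usage: Key = .ip, Period = 1y, Lookback = 1000 (values from the examples).
dedup = Deduplicator(lambda m: m["ip"], timedelta(days=365), 1000)
t0 = datetime(2024, 8, 5)
results = [
    dedup.is_new({"ip": "10.0.0.1"}, t0),
    dedup.is_new({"ip": "10.0.0.1"}, t0 + timedelta(days=1)),
    dedup.is_new({"ip": "10.0.0.2"}, t0 + timedelta(days=2)),
    dedup.is_new({"ip": "10.0.0.1"}, t0 + timedelta(days=400)),
]
```

In this model the second message is dropped as a duplicate, while the fourth passes again because its earlier occurrence is older than the period.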

Deduplicate Brick output format

This section describes where the Brick's outputs are stored and how they are formatted.

The output variables may vary depending on the input brick providing the data, for example:

  1. Name of variable: python
    • Format:
      • from (String): The sender's information, including the name and email address.
      • to (List): List of recipient email addresses, each represented as a string.
      • subject (String): The subject of the email, which contains text and may include special characters or emojis.
      • message_id (String): A unique identifier for the email message.
      • content-type (String): The MIME type of the email content and the boundary used to separate different parts of the email.
      • headers (List): A list of header objects, each representing an email header.
      • parts (List): A list of parts, each representing a section of the email content.
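
The shape of the python variable listed above can be sketched as a typed Python dictionary. All values below are placeholders for illustration only, and the internal structure of headers and parts is not specified here, so they are left as empty lists. Using message_id as a deduplication key is an assumption, chosen because it is described as a unique identifier.

```python
from typing import Any, List, TypedDict

# Functional syntax is used because "from" is a Python keyword and
# "content-type" is not a valid identifier.
EmailOutput = TypedDict(
    "EmailOutput",
    {
        "from": str,
        "to": List[str],
        "subject": str,
        "message_id": str,
        "content-type": str,
        "headers": List[Any],
        "parts": List[Any],
    },
)

# Placeholder values, not real output.
python: EmailOutput = {
    "from": "Example Sender <sender@example.com>",
    "to": ["recipient@example.com"],
    "subject": "Example subject",
    "message_id": "<example-id@example.com>",
    "content-type": 'multipart/mixed; boundary="example-boundary"',
    "headers": [],
    "parts": [],
}

# One plausible deduplication key for this kind of record.
dedup_key = python["message_id"]
```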