
Create Pageshot of URL Brick

Purpose: Learn about the various configurations and parameters required for the Create Pageshot of URL Brick to function correctly.
Last updated: August 05, 2024

What is the Create Pageshot of URL Brick?

The Create Pageshot of URL Brick is versatile: it can perform various actions on web pages, such as capturing screenshots of specific URLs and extracting information from HTML elements. It is useful for visually documenting web content and for retrieving specific data points from online sources.

Create Pageshot of URL Brick configuration supporting video


Pageshot of URL Brick parameters

  1. Use Proxy: A proxy is an intermediary server through which a client accesses a destination server.

Specify whether to use a proxy by entering its address. For example, you can use a SOCKS5 address such as socks5://127.0.0.1:9050 to access Tor pages.

For more information on Jinja2 syntax, follow the link below:

Learn more about Jinja2
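A proxy address like the one above is a URL: the scheme names the proxy protocol and is followed by the host and port. The Python sketch below splits such an address into those parts with the standard urllib.parse module, which can help when checking a value before entering it in the Use Proxy field. The describe_proxy helper is hypothetical and not part of the Brick.

```python
from urllib.parse import urlsplit


def describe_proxy(proxy_url: str) -> dict:
    """Split a proxy address such as socks5://127.0.0.1:9050 into its parts.

    Hypothetical helper for illustration only; the Brick does not expose it.
    """
    parts = urlsplit(proxy_url)
    # A usable proxy address needs a scheme (protocol), a host, and a port.
    if not parts.scheme or not parts.hostname or parts.port is None:
        raise ValueError(f"expected scheme://host:port, got {proxy_url!r}")
    return {"scheme": parts.scheme, "host": parts.hostname, "port": parts.port}


print(describe_proxy("socks5://127.0.0.1:9050"))
# {'scheme': 'socks5', 'host': '127.0.0.1', 'port': 9050}
```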

  2. User Agent: Enter a standard user agent header string. This string helps servers and network peers identify the application, operating system, vendor, and/or version of the requesting user agent.

Follow the link below to learn more about user agent header strings:

Learn more about user agent header strings

For more information on Jinja2 syntax, follow the link below:

Learn more about Jinja2
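The string entered here is sent in the User-Agent request header. As a point of comparison outside the Brick, this Python sketch attaches a user agent string to a request using the standard library. The Firefox string is only an example value, and the request is built but never sent.

```python
from urllib.request import Request

# Example user agent string (illustrative; any standard string works the same way).
UA = "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0"

# Build (but do not send) a request that carries the User-Agent header.
req = Request("https://dtact.com/", headers={"User-Agent": UA})

# urllib normalizes header names with str.capitalize(), so the key is "User-agent".
print(req.get_header("User-agent"))
```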

  3. Cookies: Configure cookies in the format cookie_name=cookie_value.

For more information on Jinja2 syntax, follow the link below:

Learn more about Jinja2
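Each cookie is written as a name=value pair. How the Brick separates several cookies is not stated here; the sketch below assumes the standard Cookie header convention of joining pairs with "; " and uses Python's http.cookies module to show how such a string splits into names and values. The cookie names and values are hypothetical.

```python
from http.cookies import SimpleCookie


def parse_cookies(raw: str) -> dict:
    """Parse cookie_name=cookie_value pairs (assumed to be joined by '; ') into a dict."""
    jar = SimpleCookie()
    jar.load(raw)
    # Each entry in the jar is a Morsel; .value holds the decoded cookie value.
    return {name: morsel.value for name, morsel in jar.items()}


print(parse_cookies("session_id=abc123"))           # {'session_id': 'abc123'}
print(parse_cookies("session_id=abc123; lang=en"))  # {'session_id': 'abc123', 'lang': 'en'}
```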

  4. Task:

This section of the Brick is where you configure the tasks to perform, for example:

```json
[
  {
    "type": "navigate",
    "url": "https://dtact.com/"
  },
  {
    "type": "wait-visible",
    "selector": ".main-content",
    "enabled": true
  },
  {
    "type": "evaluate",
    "name": "pageTitle",
    "script": "document.title",
    "enabled": true
  }
]
```

This task list tells the Brick to navigate to https://dtact.com/, wait until the element matching the .main-content selector is visible, and evaluate the JavaScript expression document.title, storing the page title in the pageTitle output field.
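When task lists grow, generating them can be less error-prone than writing JSON by hand. The Python sketch below rebuilds the example task list with small helper functions and prints JSON that could be pasted into the Task field. The helpers (navigate, wait_visible, evaluate) are hypothetical; only the field names come from the example.

```python
import json


# Hypothetical helpers that return task dictionaries using the field names
# from the example ("type", "url", "selector", "name", "script", "enabled").
def navigate(url):
    return {"type": "navigate", "url": url}


def wait_visible(selector, enabled=True):
    return {"type": "wait-visible", "selector": selector, "enabled": enabled}


def evaluate(name, script, enabled=True):
    return {"type": "evaluate", "name": name, "script": script, "enabled": enabled}


tasks = [
    navigate("https://dtact.com/"),
    wait_visible(".main-content"),
    evaluate("pageTitle", "document.title"),
]

# Serialize to JSON (Python True becomes JSON true) for the Task field.
print(json.dumps(tasks, indent=2))
```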

  • Most common tasks:
  1. Capturing a pageshot: Captures an image of the webpage at the given URL. The image, called a "pageshot" or "screenshot", records the website's appearance, text, and design at a specific point in time as an image file that can be stored or shared with others.

  2. 'navigate': Navigates to a specific URL.

  3. 'wait': Pauses before the next task. Use it when you are unsure of the exact name of the HTML element you are waiting for, or when you need to add a delay to avoid overloading the server.

  4. 'duration': Specifies how long to wait, for example in a 'wait' task.

  5. 'wait-visible': Takes a selector parameter, a CSS selector (class or ID) that identifies the element on the webpage to wait for. The task waits until that element is visible and only then continues with the rest of the tasks.

  6. 'timeout': Determines how long Pageshot waits for an action to complete. If the action takes longer, the task is canceled.

  7. 'click': Clicks the specified HTML element. This is usually an element of button type, but it can also be, for instance, a specific named button.

  8. 'or': This option contains two blocks: 'left' and 'right'. Pageshot will first attempt to execute the left action. If it times out or encounters another error, it will then try the right action.

  9. 'send-keys': Sends input to an HTML element, for example to fill in an email or password field.

  10. 'evaluate': Takes two parameters. The 'name' parameter sets the output field in which the result is stored, and the 'script' parameter contains the JavaScript code to run on the webpage. The value produced by the script is stored in the named field for further processing.
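The interplay of 'wait-visible', 'timeout', and 'or' described above can be modelled in a few lines. This Python sketch is only a model of that documented behaviour, not the Brick's implementation: wait_until polls until an element is visible or raises TimeoutError, and run_or tries the left action and falls back to the right one on a timeout or any other error. The page-state functions are hypothetical stand-ins, and treating the timeout as seconds is an assumption.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until(is_visible: Callable[[], bool], timeout: float, poll: float = 0.01) -> None:
    """Model of 'wait-visible' with a 'timeout' (assumed to be in seconds)."""
    deadline = time.monotonic() + timeout
    while not is_visible():
        if time.monotonic() >= deadline:
            raise TimeoutError(f"element not visible after {timeout}s")
        time.sleep(poll)


def run_or(left: Callable[[], T], right: Callable[[], T]) -> T:
    """Model of 'or': run left; on a timeout or any other error, run right."""
    try:
        return left()
    except Exception:  # TimeoutError is a subclass of Exception
        # If right also fails, its error propagates (behaviour assumed, not documented).
        return right()


# Hypothetical page state: a banner button that never appears, a close icon that does.
def banner_visible() -> bool:
    return False


def icon_visible() -> bool:
    return True


def click_banner() -> str:
    wait_until(banner_visible, timeout=0.05)
    return "clicked banner button"


def click_icon() -> str:
    wait_until(icon_visible, timeout=0.05)
    return "clicked close icon"


print(run_or(click_banner, click_icon))  # clicked close icon
```

The left action times out because its element never becomes visible, so the model falls back to the right action, mirroring the fallback order described for 'or'.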

For more information on Jinja2 syntax, follow the link below:

Learn more about Jinja2

Pageshot of URL Brick output format

This section describes where the Brick's outputs are stored and their format.

The output variables may vary depending on the input Brick providing the data. Here are some examples:

  1. Name of variable: pageshot

    • Format:
      • actions (List): List of actions taken.
      • console (Null): Console data, which is null in this case.
      • cookies (List): List of cookies, which is empty in this case.
      • createdAt (String): The timestamp when the data was created, in ISO 8601 format.
      • data (Dictionary): Contains request data.
      • meta (Dictionary): Metadata, which is empty in this case.
      • page (Dictionary): Contains page details.
      • pageshotId (String): The identifier for the page screenshot.
      • results (Dictionary): Contains the analysis results.
  2. Name of variable: python

    • Format:
      • from (String): The sender's email address.
      • to (List): List of recipient email addresses.
      • subject (String): The subject of the email.
      • message_id (String): The unique message identifier.
      • content-type (String): The content type of the email.
      • headers (List): List of email headers.
      • parts (List): List of parts of the email.
      • html_body (String): The HTML body content of the email.
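Downstream steps typically read individual fields from the pageshot variable. The Python sketch below works on a made-up, abbreviated pageshot object: only the top-level key names come from the list above, the contents of actions are placeholders, and placing the evaluate output under results["pageTitle"] is an assumption for illustration. It parses the ISO 8601 createdAt timestamp and summarizes the other fields.

```python
from datetime import datetime

# Hypothetical, abbreviated pageshot object: values are invented and only the
# top-level key names follow the documented format.
pageshot = {
    "actions": [{"type": "navigate"}, {"type": "evaluate"}],  # placeholder entries
    "console": None,
    "cookies": [],
    "createdAt": "2024-08-05T10:15:00Z",
    "data": {},
    "meta": {},
    "page": {},
    "pageshotId": "example-id",
    "results": {"pageTitle": "Example title"},  # assumed location of evaluate output
}


def summarize(ps: dict) -> dict:
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11 on,
    # so replace it with an explicit UTC offset first.
    created = datetime.fromisoformat(ps["createdAt"].replace("Z", "+00:00"))
    return {
        "id": ps["pageshotId"],
        "created": created,
        "action_count": len(ps["actions"]),
        "result_keys": sorted(ps["results"]),
    }


print(summarize(pageshot))
```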