GetScrapingParams

Overview

The GetScrapingParams object has the following structure:

  • url (required)
  • method (required)
  • response_type (optional)
  • body (optional)
  • js_rendering_options (optional)
    • render_js
    • wait_millis
    • wait_for_request
    • wait_for_selector
    • intercept_request
      • intercepted_url_regex
      • intercepted_url_method
      • request_number
      • return_json
    • programmable_browser
      • actions
        • type
        • selector
        • javascript
        • wait_millis
  • cookies (optional)
  • headers (optional)
  • omit_default_headers (optional)
  • use_isp_proxy (optional)
  • use_residential_proxy (optional)
  • use_mobile_proxy (optional)
  • use_own_proxy (optional)
  • retry_config (optional)
    • num_retries
    • success_status_codes
    • success_selector
  • timeout_millis (optional)
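
The structure above can be written down as TypeScript types. This is a rough sketch transcribed from the parameter descriptions on this page, not the official type definitions of any GetScraping client. The helper type names, the choice of `Record<string, string>` for headers, and the optionality of nested fields that the page does not mark as required are assumptions.

```typescript
// Sketch only: names other than GetScrapingParams are hypothetical.
type InterceptRequest = {
  intercepted_url_regex: string; // the only field marked required
  intercepted_url_method?: 'GET' | 'POST' | 'PUT'; // default: 'GET'
  request_number?: number; // default: 1
  return_json?: boolean; // default: false
};

type BrowserAction = {
  type: 'click' | 'hover' | 'wait_for_selector' | 'wait_millis' | 'scroll' | 'execute_js';
  selector?: string;
  javascript?: string;
  wait_millis?: number;
};

type JsRenderingOptions = {
  render_js?: boolean; // default: false
  wait_millis?: number;
  wait_for_request?: string;
  wait_for_selector?: string;
  intercept_request?: InterceptRequest;
  programmable_browser?: { actions: BrowserAction[] };
};

type RetryConfig = {
  num_retries: number;
  success_status_codes?: number[];
  success_selector?: string;
};

type GetScrapingParams = {
  url: string;
  method: 'GET' | 'POST';
  response_type?: 'json' | 'buffer' | 'text'; // default: 'text'
  body?: string;
  js_rendering_options?: JsRenderingOptions;
  cookies?: string[];
  headers?: Record<string, string>; // assumed shape; the page only says "object"
  omit_default_headers?: boolean;
  use_isp_proxy?: boolean;
  use_residential_proxy?: boolean;
  use_mobile_proxy?: boolean;
  use_own_proxy?: string;
  retry_config?: RetryConfig;
  timeout_millis?: number;
};

// Smallest valid object: only the two required fields are set.
const minimalParams: GetScrapingParams = {
  url: 'https://example.com',
  method: 'GET',
};
```

With types like these, a compiler can flag a missing url or an unsupported method before the request is sent.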

Parameters

Required Parameters

  • url: string

    • The URL to scrape, including the scheme (http:// or https://)
  • method: 'GET' | 'POST'

    • The HTTP method to use when requesting this URL: GET or POST
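
As an illustration of the two required fields, the sketch below checks a params object before it is sent. `checkRequiredParams` is a hypothetical helper written for this example and is not part of any GetScraping client. The scheme check reflects the note that the URL should include http:// or https://.

```typescript
type RequiredParams = {
  url: string;
  method: 'GET' | 'POST';
};

// Hypothetical helper: returns a list of problems with the required fields,
// or an empty list when both look valid.
function checkRequiredParams(input: { url?: unknown; method?: unknown }): string[] {
  const problems: string[] = [];
  // The URL must be a string that starts with an http:// or https:// scheme.
  if (typeof input.url !== 'string' || !/^https?:\/\//i.test(input.url)) {
    problems.push('url must be a string starting with http:// or https://');
  }
  // Only GET and POST are listed as supported methods.
  if (input.method !== 'GET' && input.method !== 'POST') {
    problems.push("method must be 'GET' or 'POST'");
  }
  return problems;
}

// A minimal request description using only the required parameters.
const params: RequiredParams = { url: 'https://example.com/products', method: 'GET' };
const problems = checkRequiredParams(params);
```

Returning a list of problems instead of throwing on the first one lets a caller report everything that is wrong with a request in a single pass.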

Optional Parameters

  • response_type: 'json' | 'buffer' | 'text'

    • The expected response type
    • Default: 'text'
  • body: string

    • The payload to include in a POST request
    • Only used when method = 'POST'
  • js_rendering_options: object

    • When defined, your GetScraping deployment routes the request through a browser that can render JavaScript and perform certain actions on the page
    • Properties:
      • render_js: boolean
        • Whether to render JavaScript or not
        • Default: false
      • wait_millis: number
        • The time in milliseconds to wait before returning the result
        • Only valid when render_js is true
      • wait_for_request: string
        • A URL (or a regex matching the URL) that the page must request during load before the response is returned
        • Only valid when render_js is true
      • wait_for_selector: string
        • A CSS or XPath selector that must be present on the page before the response is returned
        • Only valid when render_js is true
      • intercept_request: object
        • Causes the API to return the response from the request specified by intercepted_url_regex
        • Only valid when render_js is true
        • Properties:
          • intercepted_url_regex: string (required)
            • The regex matching the URL to be intercepted - be as specific as possible
          • intercepted_url_method: 'GET' | 'POST' | 'PUT'
            • The method of the request to intercept
            • Default: 'GET'
          • request_number: number
            • If the URL regex matches multiple requests while the page loads, set this to the position of the request whose response should be returned
            • Default: 1 (returns the response to the first matching request)
          • return_json: boolean
            • True if the response should be parsed and returned as JSON
            • Default: false
      • programmable_browser: object
        • Configuration for the programmable browser
        • Properties:
          • actions: array of objects
            • The actions to perform on the page
            • Each action object has the following properties:
              • type: 'click' | 'hover' | 'wait_for_selector' | 'wait_millis' | 'scroll' | 'execute_js'
              • selector: string (optional)
                • The selector that triggers the action, and the target of the action if the type is click, hover, or scroll
              • javascript: string (optional)
                • The JavaScript to execute
              • wait_millis: number (optional)
                • The amount of time to wait for the action to complete
  • cookies: array of strings

    • Define any cookies you need included in your request
    • Example: ['SID=1234', 'SUBID=abcd', 'otherCookie=5678']
  • headers: object

    • The headers to attach to the scrape request
    • We fill in missing/common headers by default
  • omit_default_headers: boolean

    • Set to true to send only the headers you define in headers, without the default headers
    • Default: false
  • use_isp_proxy: boolean

    • Set to true to route requests through our ISP proxies
    • Note: This may incur additional API credit usage
  • use_residential_proxy: boolean

    • Set to true to route requests through our residential proxies
    • Note: This may incur additional API credit usage
  • use_mobile_proxy: boolean

    • Set to true to route requests through our mobile proxies
    • Note: This may incur additional API credit usage
  • use_own_proxy: string

    • If you'd like to use your own proxy server for this request, include the URL here
    • If necessary, include any authentication information in the format: http://${user}:${password}@${proxyUrl}:${proxyPort}
  • retry_config: object

    • Configuration for when and how to retry a request
    • Properties:
      • num_retries: number (required)
        • How many times to retry unsuccessful requests
        • Default: 0 (only attempt the request once with no retries)
      • success_status_codes: array of numbers
        • The status codes that mark the request as successful
        • Default: any status code from 200 to 399
      • success_selector: string
        • A CSS selector that needs to be present for a request to be considered successful
        • If both success_status_codes and success_selector are defined, both will need to pass for the request to succeed
  • timeout_millis: number

    • How long to wait for the request to complete in milliseconds before returning a timeout error
    • Default: 30000 (30 seconds)
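
The retry success rule and the own-proxy URL format described above can be sketched as follows. Both helpers are hypothetical and only illustrate the documented behavior; they are not part of any GetScraping client. Two assumptions are made: the default status range still applies when only success_selector is defined, and proxy credentials are percent-encoded so that characters such as '@' do not break the URL. How the service evaluates success_selector is not described on this page, so the sketch takes the outcome as a boolean.

```typescript
type RetryConfig = {
  num_retries: number;
  success_status_codes?: number[];
  success_selector?: string;
};

// Hypothetical helper: applies the documented success rule to one attempt.
// `selectorFound` stands in for "success_selector matched the response".
function isSuccessfulAttempt(config: RetryConfig, status: number, selectorFound: boolean): boolean {
  // Without explicit codes, fall back to the documented default of 200 to 399.
  const statusOk = config.success_status_codes !== undefined
    ? config.success_status_codes.indexOf(status) !== -1
    : status >= 200 && status <= 399;
  // The selector check only applies when success_selector is defined.
  const selectorOk = config.success_selector === undefined || selectorFound;
  // When both checks are configured, both must pass.
  return statusOk && selectorOk;
}

// Hypothetical helper: builds a use_own_proxy value in the documented format
// http://${user}:${password}@${proxyUrl}:${proxyPort}.
function buildOwnProxyUrl(user: string, password: string, host: string, port: number): string {
  // Percent-encoding the credentials is an assumption, not a documented requirement.
  return `http://${encodeURIComponent(user)}:${encodeURIComponent(password)}@${host}:${port}`;
}

// Example request combining several optional parameters (all values are made up).
const exampleParams = {
  url: 'https://example.com/products',
  method: 'GET' as const,
  js_rendering_options: {
    render_js: true, // required for intercept_request to be valid
    intercept_request: {
      intercepted_url_regex: 'example\\.com/api/products',
      return_json: true,
    },
  },
  use_own_proxy: buildOwnProxyUrl('user', 'p@ss', 'proxy.example.com', 8080),
  retry_config: { num_retries: 2, success_status_codes: [200], success_selector: '.product' },
  timeout_millis: 30000, // 30 seconds, expressed in milliseconds
};
```

In this example an attempt counts as successful only if it returns status 200 and the .product selector is present; otherwise it is retried up to two more times.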