GetScrapingParams

Overview

The GetScrapingParams object has the following structure:

url (required)
method (required)
response_type (optional)
body (optional)
js_rendering_options (optional)
- render_js
- wait_millis
- wait_for_request
- wait_for_selector
- intercept_request
  - intercepted_url_regex
  - intercepted_url_method
  - request_number
  - return_json
- programmable_browser
  - actions
    - type
    - selector
    - javascript
    - wait_millis
cookies (optional)
headers (optional)
omit_default_headers (optional)
use_isp_proxy (optional)
use_residential_proxy (optional)
use_mobile_proxy (optional)
use_own_proxy (optional)
retry_config (optional)
- num_retries
- success_status_codes
- success_selector
timeout_millis (optional)
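
The outline above can also be read as a TypeScript type. The sketch below is a local declaration written from this page for illustration only; it is not imported from the client library, and the published typings may differ. The name ScrapingParamsSketch, the string-to-string shape of headers, and the optionality of the nested fields are assumptions inferred from the Parameters section.

```typescript
// Local, illustrative declarations mirroring the outline above.
// These are NOT the library's own typings.
interface InterceptRequestSketch {
  intercepted_url_regex: string;
  intercepted_url_method?: 'GET' | 'POST' | 'PUT';
  request_number?: number;
  return_json?: boolean;
}

interface BrowserActionSketch {
  type: 'click' | 'hover' | 'wait_for_selector' | 'wait_millis' | 'scroll' | 'execute_js';
  selector?: string;
  javascript?: string;
  wait_millis?: number;
}

interface JsRenderingOptionsSketch {
  render_js?: boolean;
  wait_millis?: number;
  wait_for_request?: string;
  wait_for_selector?: string;
  intercept_request?: InterceptRequestSketch;
  programmable_browser?: { actions: BrowserActionSketch[] };
}

interface RetryConfigSketch {
  num_retries: number;
  success_status_codes?: number[];
  success_selector?: string;
}

interface ScrapingParamsSketch {
  url: string;
  method: 'GET' | 'POST';
  response_type?: 'json' | 'buffer' | 'text';
  body?: string;
  js_rendering_options?: JsRenderingOptionsSketch;
  cookies?: string[];
  headers?: Record<string, string>; // assumed: header name -> header value
  omit_default_headers?: boolean;
  use_isp_proxy?: boolean;
  use_residential_proxy?: boolean;
  use_mobile_proxy?: boolean;
  use_own_proxy?: string;
  retry_config?: RetryConfigSketch;
  timeout_millis?: number;
}

// A small object that type-checks against the sketch.
const exampleParams: ScrapingParamsSketch = {
  url: 'https://example.com',
  method: 'GET',
  response_type: 'text',
  timeout_millis: 30000,
};
```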

Parameters

Required Parameters

url: string
- The URL to scrape - should include http:// or https://
method: 'GET' | 'POST'
- The HTTP method to use when requesting this URL: GET or POST
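
As a minimal sketch of the two required fields, the snippet below builds one GET and one POST parameter object and runs a simple client-side check on them. The helper checkRequiredFields is hypothetical and not part of the client library; the example URLs are placeholders.

```typescript
// Hypothetical helper: returns a list of problems with the required fields.
function checkRequiredFields(url: string, method: string): string[] {
  const problems: string[] = [];
  if (!/^https?:\/\//i.test(url)) {
    problems.push('url should start with http:// or https://');
  }
  if (method !== 'GET' && method !== 'POST') {
    problems.push("method must be 'GET' or 'POST'");
  }
  return problems;
}

// Smallest possible parameter objects: only url and method are required.
const getRequest = { url: 'https://example.com/products', method: 'GET' };

// A POST request would typically also carry a body.
const postRequest = {
  url: 'https://example.com/search',
  method: 'POST',
  body: JSON.stringify({ query: 'shoes' }),
};
```

Calling checkRequiredFields('example.com', 'PUT') reports both problems, since the scheme is missing and PUT is not an accepted method.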

Optional Parameters

response_type: 'json' | 'buffer' | 'text'
- The expected response type
- Default: 'text'
body: string
- The payload to include in a POST request
- Only used when method = 'POST'
js_rendering_options: object
- When defined, your GetScraping deployment will route the request through a browser with the ability to render JavaScript and perform certain actions on the webpage
- Properties:
  - render_js: boolean
    - Whether to render JavaScript or not
    - Default: false
  - wait_millis: number
    - The time in milliseconds to wait before returning the result
    - Only valid when render_js is true
  - wait_for_request: string
    - The URL (or regex matching the URL) that needs to be requested on page load before returning the response
    - Only valid when render_js is true
  - wait_for_selector: string
    - CSS or XPath selector that must be present before the response is returned
    - Only valid when render_js is true
  - intercept_request: object
    - Causes the API to return the response from the request specified by intercepted_url_regex
    - Only valid when render_js is true
    - Properties:
      - intercepted_url_regex: string (required)
        - The regex matching the URL to be intercepted - be as specific as possible
      - intercepted_url_method: 'GET' | 'POST' | 'PUT'
        - The method of the request to intercept
        - Default: 'GET'
      - request_number: number
        - If the URL regex matches multiple requests while the page loads, the number of the matching request whose response should be returned
        - Default: 1 (the first matching request)
      - return_json: boolean
        - True if the response should be parsed and returned as JSON
        - Default: false
  - programmable_browser: object
    - Configuration for the programmable browser
    - Properties:
      - actions: array of objects
        - The actions to perform on the page
        - Each action object has the following properties:
          - type: 'click' | 'hover' | 'wait_for_selector' | 'wait_millis' | 'scroll' | 'execute_js'
          - selector: string (optional)
            - The selector that triggers the action, and the target of the action if the type is click, hover, or scroll
          - javascript: string (optional)
            - The JavaScript to execute
          - wait_millis: number (optional)
            - The amount of time to wait for the action to complete
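
To make the nesting concrete, here is a sketch of two js_rendering_options values: one that intercepts a background request and one that drives the programmable browser. The field names come from the list above; the URLs, selectors, and regex are invented for illustration. The helper optionsNeedingRenderJs is hypothetical and simply flags options documented as "Only valid when render_js is true" when render_js is not set.

```typescript
// Intercept the first GET request matching the regex and return it as JSON.
const interceptOptions = {
  render_js: true,
  wait_for_request: 'https://example\\.com/api/search.*',
  intercept_request: {
    intercepted_url_regex: 'https://example\\.com/api/search.*',
    intercepted_url_method: 'GET',
    request_number: 1,
    return_json: true,
  },
};

// Run a short sequence of actions on the page before returning.
const browserOptions = {
  render_js: true,
  programmable_browser: {
    actions: [
      { type: 'click', selector: '#load-more' },
      { type: 'wait_for_selector', selector: '.result-row' },
      { type: 'execute_js', javascript: 'window.scrollTo(0, 0);' },
      { type: 'wait_millis', wait_millis: 500 },
    ],
  },
};

// Hypothetical client-side check, not part of the library:
// lists options that only take effect when render_js is true.
function optionsNeedingRenderJs(
  opts: { render_js?: boolean; [key: string]: unknown }
): string[] {
  if (opts.render_js === true) {
    return [];
  }
  const dependent = ['wait_millis', 'wait_for_request', 'wait_for_selector', 'intercept_request'];
  return dependent.filter((key) => opts[key] !== undefined);
}
```

A check like this can catch a forgotten render_js flag before the request is sent; whether the service itself rejects or silently ignores such options is not stated here.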
cookies: array of strings
- Define any cookies you need included in your request
- Example: ['SID=1234', 'SUBID=abcd', 'otherCookie=5678']
headers: object
- The headers to attach to the scrape request
- We fill in missing/common headers by default
omit_default_headers: boolean
- Set to true to pass only the headers you define in the scrape request
- Default: false
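
The cookies, headers, and omit_default_headers fields might be combined as in the sketch below. The helper toCookieStrings is hypothetical: it only converts a name-to-value map into the 'name=value' strings shown in the cookies example, and the header values are placeholders.

```typescript
// Hypothetical helper: turn a name -> value map into 'name=value' strings.
function toCookieStrings(cookieMap: Record<string, string>): string[] {
  return Object.keys(cookieMap).map((cookieName) => `${cookieName}=${cookieMap[cookieName]}`);
}

const headerParams = {
  url: 'https://example.com/account',
  method: 'GET',
  cookies: toCookieStrings({ SID: '1234', SUBID: 'abcd', otherCookie: '5678' }),
  headers: {
    'Accept-Language': 'en-US',
    'User-Agent': 'my-scraper/1.0',
  },
  // Send only the headers defined above, without the default fill-ins.
  omit_default_headers: true,
};
```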
use_isp_proxy: boolean
- Set to true to route requests through our ISP proxies
- Note: This may incur additional API credit usage
use_residential_proxy: boolean
- Set to true to route requests through our residential proxies
- Note: This may incur additional API credit usage
use_mobile_proxy: boolean
- Set to true to route requests through our mobile proxies
- Note: This may incur additional API credit usage
use_own_proxy: string
- If you'd like to use your own proxy server for this request, include the URL here
- If necessary, include any authentication information in the format: http://${user}:${password}@${proxyUrl}:${proxyPort}
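
The authenticated proxy format above can be assembled with a small helper. This is a sketch: buildOwnProxyUrl is a hypothetical function, and percent-encoding the user name and password is an assumption based on common URL practice (it keeps characters such as @ and : from breaking the URL), not something this page specifies.

```typescript
// Hypothetical helper: builds http://user:password@host:port.
// Credentials are percent-encoded so reserved characters stay unambiguous.
function buildOwnProxyUrl(
  user: string,
  password: string,
  proxyHost: string,
  proxyPort: number
): string {
  const encodedUser = encodeURIComponent(user);
  const encodedPassword = encodeURIComponent(password);
  return `http://${encodedUser}:${encodedPassword}@${proxyHost}:${proxyPort}`;
}

const ownProxyParams = {
  url: 'https://example.com',
  method: 'GET',
  use_own_proxy: buildOwnProxyUrl('scraper', 'p@ss:word', 'proxy.example.com', 8080),
};
```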
retry_config: object
- Configuration for when and how to retry a request
- Properties:
  - num_retries: number (required)
    - How many times to retry unsuccessful requests
    - Default: 0 (only attempt the request once with no retries)
  - success_status_codes: array of numbers
    - The status codes for which the request is considered successful
    - Default: any status code from 200 to 399
  - success_selector: string
    - A CSS selector that needs to be present for a request to be considered successful
    - If both success_status_codes and success_selector are defined, both will need to pass for the request to succeed
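
The success rule described above can be sketched as a small predicate. This illustrates the documented semantics only and is not the service's implementation: isAttemptSuccessful and RetrySketch are hypothetical names, the selector check is reduced to a boolean supplied by the caller, and the default range is read as 200 through 399 inclusive.

```typescript
interface RetrySketch {
  num_retries: number;
  success_status_codes?: number[];
  success_selector?: string;
}

// Returns true when every configured success condition passes.
function isAttemptSuccessful(
  config: RetrySketch,
  statusCode: number,
  selectorFound: boolean
): boolean {
  // Use the explicit list if given, otherwise the assumed default range.
  const statusOk =
    config.success_status_codes !== undefined
      ? config.success_status_codes.indexOf(statusCode) !== -1
      : statusCode >= 200 && statusCode <= 399;
  // The selector only matters when success_selector is defined.
  const selectorOk = config.success_selector !== undefined ? selectorFound : true;
  return statusOk && selectorOk;
}

const strictRetry: RetrySketch = {
  num_retries: 3,
  success_status_codes: [200],
  success_selector: '#content',
};
```

With strictRetry, a 200 response that lacks #content still counts as a failure and would be retried, up to num_retries times.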
timeout_millis: number
- How long to wait for the request to complete in milliseconds before returning a timeout error
- Default: 30000 (30 seconds)
