GetScrapingParams
Overview
The GetScrapingParams object has the following structure:
url (required)
method (required)
response_type (optional)
body (optional)
js_rendering_options (optional)
  render_js
  wait_millis
  wait_for_request
  wait_for_selector
  intercept_request
    intercepted_url_regex
    intercepted_url_method
    request_number
    return_json
  programmable_browser
    actions
      type
      selector
      javascript
      wait_millis
cookies (optional)
headers (optional)
omit_default_headers (optional)
use_isp_proxy (optional)
use_residential_proxy (optional)
use_mobile_proxy (optional)
use_own_proxy (optional)
retry_config (optional)
  num_retries
  success_status_codes
  success_selector
timeout_millis (optional)
Parameters
Required Parameters
- url: string
  - The URL to scrape. It should include the scheme (http:// or https://)
- method: 'GET' | 'POST'
  - The HTTP method to use when requesting this URL
  - Can be GET or POST
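As a rough illustration of the two required fields, the sketch below declares a local `MinimalParams` type and a small `hasHttpScheme` helper. Both names are hypothetical and exist only for this example; they are not part of any GetScraping client library.

```typescript
// Local sketch of the two required fields. MinimalParams is a hypothetical
// name used for illustration, not a type exported by a GetScraping client.
type MinimalParams = {
  url: string;
  method: 'GET' | 'POST';
};

// Hypothetical helper: true when the URL starts with http:// or https://.
function hasHttpScheme(url: string): boolean {
  return /^https?:\/\//i.test(url);
}

// The smallest params object the reference above allows.
const minimalParams: MinimalParams = {
  url: 'https://example.com',
  method: 'GET',
};
```

Checking the scheme before sending is optional, but it catches bare hostnames such as `example.com` early.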
Optional Parameters
- response_type: 'json' | 'buffer' | 'text'
  - The expected response type
  - Default: 'text'
- body: string
  - The payload to include in a POST request
  - Only used when method = 'POST'
- js_rendering_options: object
  - When defined, your GetScraping deployment will route the request through a browser with the ability to render JavaScript and perform certain actions on the webpage
  - Properties:
    - render_js: boolean
      - Whether to render JavaScript or not
      - Default: false
    - wait_millis: number
      - The time in milliseconds to wait before returning the result
      - Only valid when render_js is true
    - wait_for_request: string
      - The URL (or regex matching the URL) that needs to be requested on page load before returning the response
      - Only valid when render_js is true
    - wait_for_selector: string
      - CSS or XPath selector that needs to be present before returning the response
      - Only valid when render_js is true
    - intercept_request: object
      - Causes the API to return the response from the request specified by intercepted_url_regex
      - Only valid when render_js is true
      - Properties:
        - intercepted_url_regex: string (required)
          - The regex matching the URL to be intercepted - be as specific as possible
        - intercepted_url_method: 'GET' | 'POST' | 'PUT'
          - The method of the request to intercept
          - Default: 'GET'
        - request_number: number
          - If the URL regex will match multiple requests when loading a page, define the request number to return the correct response
          - Default: 1 (returning the first request)
        - return_json: boolean
          - True if the response should be parsed and returned as JSON
          - Default: false
    - programmable_browser: object
      - Configuration for the programmable browser
      - Properties:
        - actions: array of objects
          - The actions to perform on the page
          - Each action object has the following properties:
            - type: 'click' | 'hover' | 'wait_for_selector' | 'wait_millis' | 'scroll' | 'execute_js'
            - selector: string (optional)
              - The selector that triggers the action, and the target of the action if the type is click, hover, or scroll
            - javascript: string (optional)
              - The JavaScript to execute
            - wait_millis: number (optional)
              - The amount of time to wait for the action to complete
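The sketch below shows how the js_rendering_options fields might fit together. The type names, selectors, and URLs are all made up for illustration, and the types are local declarations rather than imports from a GetScraping client. The `pickInterceptedUrl` helper is only a guess at how request_number selects among matching requests (1-based, in the order the page issues them); the service's actual matching logic is not described here.

```typescript
// Local sketch of the documented shapes. These type names are hypothetical
// and are not imported from a GetScraping client library.
type BrowserAction = {
  type: 'click' | 'hover' | 'wait_for_selector' | 'wait_millis' | 'scroll' | 'execute_js';
  selector?: string;
  javascript?: string;
  wait_millis?: number;
};

type InterceptRequest = {
  intercepted_url_regex: string;
  intercepted_url_method?: 'GET' | 'POST' | 'PUT';
  request_number?: number;
  return_json?: boolean;
};

type JsRenderingOptions = {
  render_js?: boolean;
  wait_millis?: number;
  wait_for_request?: string;
  wait_for_selector?: string;
  intercept_request?: InterceptRequest;
  programmable_browser?: { actions: BrowserAction[] };
};

// Return the parsed JSON of the second request matching a made-up API path.
const interceptOptions: JsRenderingOptions = {
  render_js: true,
  intercept_request: {
    intercepted_url_regex: '/api/products\\?page=\\d+',
    intercepted_url_method: 'GET',
    request_number: 2,
    return_json: true,
  },
};

// Click a made-up button, wait for new rows, then scroll with JavaScript.
const browserOptions: JsRenderingOptions = {
  render_js: true,
  programmable_browser: {
    actions: [
      { type: 'click', selector: '#load-more' },
      { type: 'wait_for_selector', selector: '.product-row' },
      { type: 'execute_js', javascript: 'window.scrollTo(0, document.body.scrollHeight);' },
    ],
  },
};

// Assumed selection rule: the Nth (1-based) requested URL matching the regex.
function pickInterceptedUrl(
  requestedUrls: string[],
  urlRegex: string,
  requestNumber = 1,
): string | undefined {
  const pattern = new RegExp(urlRegex);
  const matches = requestedUrls.filter((candidate) => pattern.test(candidate));
  return matches[requestNumber - 1];
}
```

Running a candidate regex through a helper like this against the URLs you see in your browser's network tab is one way to confirm that the pattern is specific enough and that request_number points at the request you want.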
- cookies: array of strings
  - Define any cookies you need included in your request
  - Example: ['SID=1234', 'SUBID=abcd', 'otherCookie=5678']
- headers: object
  - The headers to attach to the scrape request
  - We fill in missing/common headers by default
- omit_default_headers: boolean
  - Set to true to pass only the headers you define in the scrape request
  - Default: false
- use_isp_proxy: boolean
  - Set to true to route requests through our ISP proxies
  - Note: This may incur additional API credit usage
- use_residential_proxy: boolean
  - Set to true to route requests through our residential proxies
  - Note: This may incur additional API credit usage
- use_mobile_proxy: boolean
  - Set to true to route requests through our mobile proxies
  - Note: This may incur additional API credit usage
- use_own_proxy: string
  - If you'd like to use your own proxy server for this request, include the URL here
  - If necessary, include any authentication information in the format: http://${user}:${password}@${proxyUrl}:${proxyPort}
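One way to assemble a use_own_proxy value in the format above is sketched below. `buildProxyUrl` is a hypothetical helper, and percent-encoding the credentials is an assumption based on general URL rules (so that characters such as `@` or `:` in a password do not break the URL), not something this reference specifies.

```typescript
// Hypothetical helper: builds a proxy URL in the documented format.
// Credentials are percent-encoded on the assumption that the proxy URL is
// parsed as a standard URL; this reference does not state that explicitly.
function buildProxyUrl(
  user: string,
  password: string,
  proxyHost: string,
  proxyPort: number,
): string {
  const encodedUser = encodeURIComponent(user);
  const encodedPassword = encodeURIComponent(password);
  return `http://${encodedUser}:${encodedPassword}@${proxyHost}:${proxyPort}`;
}

// Made-up credentials and host, for illustration only.
const ownProxy = buildProxyUrl('scraper', 'p@ss:word', 'proxy.example.com', 8080);
// ownProxy is 'http://scraper:p%40ss%3Aword@proxy.example.com:8080'
```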
- retry_config: object
  - Configuration for when and how to retry a request
  - Properties:
    - num_retries: number (required)
      - How many times to retry unsuccessful requests
      - Default: 0 (only attempt the request once with no retries)
    - success_status_codes: array of numbers
      - The status codes that will render the request successful
      - Default: [200-399]
    - success_selector: string
      - A CSS selector that needs to be present for a request to be considered successful
      - If both success_status_codes and success_selector are defined, both will need to pass for the request to succeed
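The success rules above can be read as the following sketch. It is an interpretation, not the service's implementation: the `RetryConfig` type and both helpers are hypothetical, the default is assumed to mean every status from 200 through 399 inclusive, and `selectorFound` stands in for the selector check that the service performs on the returned page.

```typescript
// Local sketch of the documented retry_config shape (hypothetical type name).
type RetryConfig = {
  num_retries: number;
  success_status_codes?: number[];
  success_selector?: string;
};

// Interpretation of the documented rule: every configured check must pass.
// The default status range is assumed to be 200 through 399 inclusive.
function isAttemptSuccessful(
  config: RetryConfig,
  statusCode: number,
  selectorFound: boolean,
): boolean {
  const statusOk =
    config.success_status_codes !== undefined
      ? config.success_status_codes.indexOf(statusCode) !== -1
      : statusCode >= 200 && statusCode <= 399;
  const selectorOk = config.success_selector !== undefined ? selectorFound : true;
  return statusOk && selectorOk;
}

// num_retries counts retries only, so the first attempt is added on top.
function maxAttempts(config: RetryConfig): number {
  return config.num_retries + 1;
}

// Made-up configuration: retry twice, require a 200 and a product title.
const strictRetry: RetryConfig = {
  num_retries: 2,
  success_status_codes: [200],
  success_selector: 'h1.product-title',
};
```

With `strictRetry`, a 200 response that lacks the selector still counts as a failed attempt, which is useful when a site serves block pages with a success status.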
- timeout_millis: number
  - How long to wait, in milliseconds, for the request to complete before returning a timeout error
  - Default: 30000 (30 seconds)