GetScrapingParams
Overview
The GetScrapingParams object has the following structure:
- url (required)
- method (required)
- response_type (optional)
- body (optional)
- js_rendering_options (optional)
  - render_js
  - wait_millis
  - wait_for_request
  - wait_for_selector
  - intercept_request
    - intercepted_url_regex
    - intercepted_url_method
    - request_number
    - return_json
  - programmable_browser
    - actions
      - type
      - selector
      - javascript
      - wait_millis
- cookies (optional)
- headers (optional)
- omit_default_headers (optional)
- use_isp_proxy (optional)
- use_residential_proxy (optional)
- use_mobile_proxy (optional)
- use_own_proxy (optional)
- retry_config (optional)
  - num_retries
  - success_status_codes
  - success_selector
- timeout_millis (optional)
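The structure can also be written down as a TypeScript type. The following is a sketch transcribed from the parameter descriptions on this page, not the published typings of any client library. The helper type names (HttpMethod, InterceptRequest, BrowserAction, JsRenderingOptions, RetryConfig) and the choice of Record<string, string> for headers are assumptions made for illustration.

```typescript
// Sketch of the parameter object, transcribed from this page.
// Helper type names are illustrative assumptions, not official typings.
type HttpMethod = 'GET' | 'POST';

interface InterceptRequest {
  intercepted_url_regex: string;
  intercepted_url_method?: 'GET' | 'POST' | 'PUT';
  request_number?: number;
  return_json?: boolean;
}

interface BrowserAction {
  type: 'click' | 'hover' | 'wait_for_selector' | 'wait_millis' | 'scroll' | 'execute_js';
  selector?: string;
  javascript?: string;
  wait_millis?: number;
}

interface JsRenderingOptions {
  render_js?: boolean;
  wait_millis?: number;
  wait_for_request?: string;
  wait_for_selector?: string;
  intercept_request?: InterceptRequest;
  programmable_browser?: { actions: BrowserAction[] };
}

interface RetryConfig {
  num_retries: number;
  success_status_codes?: number[];
  success_selector?: string;
}

interface GetScrapingParams {
  url: string;
  method: HttpMethod;
  response_type?: 'json' | 'buffer' | 'text';
  body?: string;
  js_rendering_options?: JsRenderingOptions;
  cookies?: string[];
  headers?: Record<string, string>; // assumed shape: header name to value
  omit_default_headers?: boolean;
  use_isp_proxy?: boolean;
  use_residential_proxy?: boolean;
  use_mobile_proxy?: boolean;
  use_own_proxy?: string;
  retry_config?: RetryConfig;
  timeout_millis?: number;
}

// Only url and method are required; everything else may be left out.
const smallestRequest: GetScrapingParams = {
  url: 'https://example.com',
  method: 'GET',
};
```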
Parameters
Required Parameters
- url: string - The URL to scrape. It should include http:// or https://
- method: 'GET' | 'POST' - The HTTP method to use when requesting this URL
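Because the URL is expected to carry its scheme, it can help to check that before sending a request. Here is a minimal sketch; hasHttpScheme and requiredParams are hypothetical helpers written for this example, and defaulting method to 'GET' is a convenience of the sketch rather than documented behavior.

```typescript
type HttpMethod = 'GET' | 'POST';

interface RequiredParams {
  url: string;
  method: HttpMethod;
}

// Returns true when the URL starts with http:// or https://
function hasHttpScheme(url: string): boolean {
  return /^https?:\/\//i.test(url);
}

// Hypothetical helper: builds the two required fields and rejects
// scheme-less URLs before any request is made.
function requiredParams(url: string, method: HttpMethod = 'GET'): RequiredParams {
  if (!hasHttpScheme(url)) {
    throw new Error(`url must start with http:// or https://, got: ${url}`);
  }
  return { url, method };
}

const productPage = requiredParams('https://example.com/products');
```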
Optional Parameters
- response_type: 'json' | 'buffer' | 'text' - The expected response type
  - Default: 'text'
- body: string - The payload to include in a POST request
  - Only used when method = 'POST'
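Since body is a string, a structured payload has to be serialized first. The sketch below shows one way to do that for a JSON endpoint. The jsonPost helper and the example URL are hypothetical, and setting a Content-Type header is an assumption about what the target server expects, not a documented requirement.

```typescript
interface PostParams {
  url: string;
  method: 'POST';
  body: string;
  response_type?: 'json' | 'buffer' | 'text';
  headers?: Record<string, string>;
}

// Hypothetical helper: serializes the payload because body must be a string.
function jsonPost(url: string, payload: unknown): PostParams {
  return {
    url,
    method: 'POST',
    body: JSON.stringify(payload),
    // Ask for a parsed JSON response instead of the default 'text'.
    response_type: 'json',
    // Assumption: the target server wants to be told the body is JSON.
    headers: { 'Content-Type': 'application/json' },
  };
}

const searchRequest = jsonPost('https://example.com/api/search', {
  query: 'laptops',
  page: 1,
});
```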
- js_rendering_options: object - When defined, your GetScraping deployment routes the request through a browser that can render JavaScript and perform certain actions on the webpage
  - Properties:
    - render_js: boolean - Whether to render JavaScript
      - Default: false
    - wait_millis: number - The time in milliseconds to wait before returning the result
      - Only valid when render_js is true
    - wait_for_request: string - The URL (or a regex matching the URL) that must be requested on page load before the response is returned
      - Only valid when render_js is true
    - wait_for_selector: string - CSS or XPath selector that must be present before the response is returned
      - Only valid when render_js is true
    - intercept_request: object - Causes the API to return the response from the request specified by intercepted_url_regex
      - Only valid when render_js is true
      - Properties:
        - intercepted_url_regex: string (required) - The regex matching the URL to be intercepted. Be as specific as possible
        - intercepted_url_method: 'GET' | 'POST' | 'PUT' - The method of the request to intercept
          - Default: 'GET'
        - request_number: number - If the URL regex matches multiple requests while the page loads, the number of the request whose response should be returned
          - Default: 1 (returning the first request)
        - return_json: boolean - True if the response should be parsed and returned as JSON
          - Default: false
    - programmable_browser: object - Configuration for the programmable browser
      - Properties:
        - actions: array of objects - The actions to perform on the page
          - Each action object has the following properties:
            - type: 'click' | 'hover' | 'wait_for_selector' | 'wait_millis' | 'scroll' | 'execute_js'
            - selector: string (optional) - The selector that triggers the action, and the target of the action if the type is click, hover, or scroll
            - javascript: string (optional) - The JavaScript to execute
            - wait_millis: number (optional) - The amount of time to wait for the action to complete
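The rendering options above combine in two common ways: intercepting a background request, and driving the page with a list of actions. The following sketch shows both, plus a small check for options set while render_js is off. The selectors, URLs, and the ignoredWithoutRenderJs helper are hypothetical, and which fields each action type actually reads is an assumption based on the property descriptions.

```typescript
interface BrowserAction {
  type: 'click' | 'hover' | 'wait_for_selector' | 'wait_millis' | 'scroll' | 'execute_js';
  selector?: string;
  javascript?: string;
  wait_millis?: number;
}

interface JsRenderingOptions {
  render_js?: boolean;
  wait_millis?: number;
  wait_for_request?: string;
  wait_for_selector?: string;
  intercept_request?: {
    intercepted_url_regex: string;
    intercepted_url_method?: 'GET' | 'POST' | 'PUT';
    request_number?: number;
    return_json?: boolean;
  };
  programmable_browser?: { actions: BrowserAction[] };
}

// Return the JSON a page fetches in the background instead of its HTML.
const interceptApiCall: JsRenderingOptions = {
  render_js: true,
  intercept_request: {
    // Kept specific so that unrelated requests do not match.
    intercepted_url_regex: 'https://example\\.com/api/products\\?page=\\d+',
    intercepted_url_method: 'GET',
    request_number: 1,
    return_json: true,
  },
};

// Click a "load more" button, wait for new items, scroll, then run a script.
const loadMoreItems: JsRenderingOptions = {
  render_js: true,
  wait_for_selector: '#product-list',
  programmable_browser: {
    actions: [
      { type: 'click', selector: 'button.load-more', wait_millis: 2000 },
      { type: 'wait_for_selector', selector: '.product-card:nth-child(20)' },
      { type: 'scroll', selector: 'footer' },
      { type: 'execute_js', javascript: 'window.scrollTo(0, 0);' },
    ],
  },
};

// Hypothetical check: lists the options marked "only valid when render_js
// is true" that are set while render_js is false or missing.
function ignoredWithoutRenderJs(options: JsRenderingOptions): string[] {
  if (options.render_js) {
    return [];
  }
  const dependent = ['wait_millis', 'wait_for_request', 'wait_for_selector', 'intercept_request'] as const;
  return dependent.filter((key) => options[key] !== undefined);
}
```

Running such a check on the client side is optional; it only surfaces a likely misconfiguration before a request is spent on it.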
- cookies: array of strings - Any cookies to include in your request
  - Example: ['SID=1234', 'SUBID=abcd', 'otherCookie=5678']
- headers: object - The headers to attach to the scrape request
  - We fill in missing common headers by default
- omit_default_headers: boolean - Set to true to send only the headers you define in the scrape request
  - Default: false
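Cookies are passed as 'name=value' strings and headers as an object, so a request that must send exactly the headers you chose could be assembled as below. This is a sketch under stated assumptions: toCookieList is a hypothetical helper, the header values are placeholders, and headers is assumed to map header names to string values.

```typescript
interface HeaderedParams {
  url: string;
  method: 'GET' | 'POST';
  cookies?: string[];
  headers?: Record<string, string>; // assumed shape: header name to value
  omit_default_headers?: boolean;
}

// Hypothetical helper: turns a name-to-value map into the 'name=value'
// strings that the cookies parameter expects.
function toCookieList(cookies: Record<string, string>): string[] {
  return Object.keys(cookies).map((cookieName) => `${cookieName}=${cookies[cookieName]}`);
}

const accountPage: HeaderedParams = {
  url: 'https://example.com/account',
  method: 'GET',
  cookies: toCookieList({ SID: '1234', SUBID: 'abcd' }),
  headers: {
    'Accept-Language': 'en-US',
    'User-Agent': 'my-crawler/1.0',
  },
  // With this flag set, only the two headers above are sent.
  omit_default_headers: true,
};
```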
- use_isp_proxy: boolean - Set to true to route requests through our ISP proxies
  - Note: This may incur additional API credit usage
- use_residential_proxy: boolean - Set to true to route requests through our residential proxies
  - Note: This may incur additional API credit usage
- use_mobile_proxy: boolean - Set to true to route requests through our mobile proxies
  - Note: This may incur additional API credit usage
- use_own_proxy: string - To use your own proxy server for this request, include its URL here
  - If necessary, include any authentication information in the format: http://${user}:${password}@${proxyUrl}:${proxyPort}
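The proxy URL format above can be assembled from its parts. The sketch below does so with a hypothetical ownProxyUrl helper. Percent-encoding the credentials is an assumption that follows general URL rules for characters such as @ or : in a password; this page does not say how the service parses them.

```typescript
interface OwnProxy {
  user?: string;
  password?: string;
  host: string;
  port: number;
}

// Hypothetical helper: builds http://user:password@host:port, leaving out
// the credentials when they are not supplied.
function ownProxyUrl(proxy: OwnProxy): string {
  const auth =
    proxy.user !== undefined && proxy.password !== undefined
      ? // Assumption: reserved characters in credentials are percent-encoded.
        `${encodeURIComponent(proxy.user)}:${encodeURIComponent(proxy.password)}@`
      : '';
  return `http://${auth}${proxy.host}:${proxy.port}`;
}

const viaOwnProxy = {
  url: 'https://example.com',
  method: 'GET' as const,
  use_own_proxy: ownProxyUrl({
    user: 'alice',
    password: 'p@ss',
    host: 'proxy.example.com',
    port: 8080,
  }),
};
```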
- retry_config: object - Configuration for when and how to retry a request
  - Properties:
    - num_retries: number (required) - How many times to retry unsuccessful requests
      - Default: 0 (only attempt the request once with no retries)
    - success_status_codes: array of numbers - The status codes that mark the request as successful
      - Default: [200-399]
    - success_selector: string - A CSS selector that must be present for a request to be considered successful
    - If both success_status_codes and success_selector are defined, both must pass for the request to succeed
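The success rule above can be modeled in a few lines. This is an illustrative sketch of the described behavior, not the service's implementation: isSuccess and maxAttempts are hypothetical names, the default range is read as status codes 200 through 399 inclusive, and the total attempt count of 1 + num_retries is inferred from the note on the default of 0.

```typescript
interface RetryConfig {
  num_retries: number;
  success_status_codes?: number[];
  success_selector?: string;
}

// Illustrative model of the success rule: every check that is defined
// must pass. The real evaluation happens inside the service.
function isSuccess(config: RetryConfig, statusCode: number, selectorPresent: boolean): boolean {
  const statusOk =
    config.success_status_codes !== undefined
      ? config.success_status_codes.indexOf(statusCode) !== -1
      : statusCode >= 200 && statusCode <= 399; // default range, read as inclusive
  const selectorOk = config.success_selector !== undefined ? selectorPresent : true;
  return statusOk && selectorOk;
}

// Inferred: one initial attempt plus num_retries further attempts.
function maxAttempts(config: RetryConfig): number {
  return 1 + config.num_retries;
}

const strictRetry: RetryConfig = {
  num_retries: 3,
  success_status_codes: [200],
  success_selector: '.product-title',
};
```

With strictRetry, a 200 response that lacks .product-title still counts as unsuccessful and would be retried, which is useful against block pages that are served with a success status.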
- timeout_millis: number - How long to wait, in milliseconds, for the request to complete before returning a timeout error
  - Default: 30 seconds (30000 milliseconds)