Want to use the Zenscrape API for web scraping? Then you need to know how it works. This article gives you a detailed guide. Let's go!
Documentation
Pro Tip: Register for a free API key, then replace YOUR-APIKEY in the code snippets below with your own key.
Postman Collection
To provide you with the best developer experience possible, we also created a Postman collection covering all of our endpoints, including plenty of examples.
Credit Costs & Failed Requests
The number of credits counted towards your quota depends on your request configuration and on the status code that the API endpoint returns, so a request can cost between 1 and 25 credits. You can configure your request with the request builder in your dashboard, which generates code snippets for the most common programming languages. The list of error codes is in the Error Codes section below.
premium | render | Cost in credits |
---|---|---|
false | false | 1 |
false | true | 5 |
true | false | 10 |
true | true | 25 |
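The table can be turned into a small lookup for budgeting. The sketch below uses a hypothetical helper (`credit_cost` is not part of any Zenscrape SDK) that returns the documented cost for a successful request. It covers only the premium/render combinations listed above. It does not model how failed requests or other status codes affect the count.

```python
# Credit costs copied from the table above: (premium, render) -> credits per request
CREDIT_COSTS = {
    (False, False): 1,
    (False, True): 5,
    (True, False): 10,
    (True, True): 25,
}


def credit_cost(premium: bool = False, render: bool = False) -> int:
    """Return the documented credit cost of one successful request."""
    return CREDIT_COSTS[(premium, render)]


def monthly_credits(requests_per_day: int, days: int = 30,
                    premium: bool = False, render: bool = False) -> int:
    """Estimate monthly credit usage, assuming every request succeeds."""
    return requests_per_day * days * credit_cost(premium, render)
```

For example, 1,000 rendered requests per day without premium proxies come to `monthly_credits(1000, render=True)`, which is 150,000 credits over 30 days.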
Basic Usage
This endpoint allows you to fetch the content of a website. For basic usage, only one parameter is required in addition to your apikey.
GET POST /get
Adding the url parameter to your request fetches the HTML content of the target website. This request configuration uses standard proxies and counts as 1 credit towards your monthly limit.
curl "https://app.zenscrape.com/api/v1/get?url=http://httpbin.org/ip" \
-H "apikey: YOUR-APIKEY"
This request generates the following response:
<html>
<head>
</head>
<body>
<pre style="word-wrap: break-word; white-space: pre-wrap;">
{
"origin": "80.102.66.13"
}
</pre>
</body>
</html>
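If you pass the apikey and url in the query string instead of a header, the target URL must be percent-encoded. The encoded example URLs in this article do this (`https%3A%2F%2Fhttpbin.org%2Fip`). Below is a minimal sketch that uses only Python's standard library. `build_get_url` is a hypothetical helper name, and the base URL is the one used throughout this article.

```python
from urllib.parse import urlencode

API_BASE = "https://app.zenscrape.com/api/v1"


def build_get_url(apikey: str, target_url: str, **options: str) -> str:
    """Build a /get URL with the apikey and all parameters in the query string.

    urlencode percent-encodes the target URL, so any '?', '&' or '=' inside it
    is not mistaken for a Zenscrape parameter.
    """
    params = {"apikey": apikey, "url": target_url, **options}
    return f"{API_BASE}/get?{urlencode(params)}"
```

Encoding matters most when the target URL has its own query string. Without it, a target such as `https://example.com/?a=1&b=2` would leak `b=2` into the Zenscrape parameters.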
Web Scraping API
GET POST /get
Example response:
<html>
<head></head>
<body>
<pre>
{
"origin": "223.233.44.142"
}
</pre>
</body>
</html>
This endpoint accepts the following parameters:
Parameter | Type | Description |
---|---|---|
url | required | target site you want to scrape |
premium | optional, boolean, counts as 10 credits towards your quota | Uses residential proxies, unlocks sites that are hard to scrape |
location | optional, default: worldwide | If premium=false, possible locations are 'na' (North America) and 'eu' (Europe). If premium=true, you can choose a location from our list of 230+ countries |
keep_headers | optional, boolean | Forwards your own request headers (e.g. user agents, cookies) to the target site |
device_type | optional, string | By default, a desktop user agent is set. When set to 'mobile', an iPhone or Android user agent is used instead |
render | optional, boolean, counts as 5 credits towards your quota | Uses a headless browser to fetch content that relies on JavaScript |
wait_for | optional, integer | Number of seconds the browser waits for content to render before it scrapes the HTML markup. Max value: 15. Only works together with render=true |
wait_for_css | optional, string | CSS selector; the browser waits until a matching element becomes visible. Only works together with render=true |
session | optional, string | Any random string, for example session=kdQ1VeQE; send the same value again to reuse the same IP |
scroll_to_bottom | optional, boolean | Scrolls to the bottom of the page before returning the page content. Only works together with render=true |
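The constraints in this table are easy to violate by accident. Below is a sketch of a client-side check. It assumes the rules exactly as listed: wait_for is capped at 15, wait_for, wait_for_css and scroll_to_bottom require render=true, and non-premium locations are limited to 'na' and 'eu'. `validate_params` is a hypothetical helper, and the API itself may validate differently.

```python
# Options that the parameter table says only work together with render=true
RENDER_ONLY = ("wait_for", "wait_for_css", "scroll_to_bottom")
# Locations documented for standard (non-premium) proxies
STANDARD_LOCATIONS = {"na", "eu"}


def validate_params(params: dict) -> list:
    """Return a list of problems found in a /get parameter dict (empty if none)."""
    problems = []
    if "url" not in params:
        problems.append("url is required")
    render = str(params.get("render", "false")).lower() == "true"
    premium = str(params.get("premium", "false")).lower() == "true"
    for name in RENDER_ONLY:
        if name in params and not render:
            problems.append(f"{name} only works together with render=true")
    if "wait_for" in params and int(params["wait_for"]) > 15:
        problems.append("wait_for must not exceed 15 seconds")
    location = params.get("location")
    if location is not None and not premium and location not in STANDARD_LOCATIONS:
        problems.append("without premium=true, location must be 'na' or 'eu'")
    return problems
```

Collecting all problems instead of stopping at the first one lets you fix a misconfigured request in a single pass.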
Zenscrape is a REST API and accepts HTTP requests from any programming language. The following example fetches https://httpbin.org/ip through a premium proxy in Germany and renders the content in a headless browser before returning the markup to you.
https://app.zenscrape.com/api/v1/get?apikey=YOUR-APIKEY&url=https%3A%2F%2Fhttpbin.org%2Fip&premium=true&country=de&render=true
curl "https://app.zenscrape.com/api/v1/get?apikey=YOUR-APIKEY&url=https%3A%2F%2Fhttpbin.org%2Fip&premium=true&country=de&render=true"
import requests
headers = {"apikey": "YOUR-APIKEY"}
params = (
    ("url", "https://httpbin.org/ip"),
    ("premium", "true"),
    ("country", "de"),
    ("render", "true"),
)
response = requests.get("https://app.zenscrape.com/api/v1/get", headers=headers, params=params)
print(response.text)
var request = require('request');
var options = {
  url: 'https://app.zenscrape.com/api/v1/get?apikey=YOUR-APIKEY&url=https://httpbin.org/ip&premium=true&country=de&render=true'
};
function callback(error, response, body) {
  if (!error && response.statusCode == 200) {
    console.log(body);
  }
}
request(options, callback);
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, false);
$data = [
    "url" => "https://httpbin.org/ip",
    "premium" => "true",
    "country" => "de",
    "render" => "true",
];
curl_setopt($ch, CURLOPT_URL, "https://app.zenscrape.com/api/v1/get?" . http_build_query($data));
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
    "Content-Type: application/json",
    "apikey: YOUR-APIKEY",
));
$response = curl_exec($ch);
curl_close($ch);
// The API returns the page's HTML rather than JSON, so print it as-is
echo $response;
Proxy Mode
Example response:
<html>
<head></head>
<body>
<pre>
{
"origin": "223.233.44.142"
}
</pre>
</body> </html>
In addition to the REST API, Zenscrape also provides an HTTP proxy interface, so you can integrate any application that already relies on proxies. Use your API key as the username and the parameters you would normally pass to the API (for example render=true&wait_for_css=.author) as the password.
The HTTP proxy returns HTTP/1.1 407 Proxy Authentication Required if your credentials are invalid.
curl -k -x "http://YOUR-APIKEY:render=true&wait_for_css=.author@proxy-server.zenscrape.com:8282" https://quotes.toscrape.com/js
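import requests
proxy = {
    "http": "http://YOUR-APIKEY:render=true&wait_for_css=.author@proxy-server.zenscrape.com:8282",
    "https": "http://YOUR-APIKEY:render=true&wait_for_css=.author@proxy-server.zenscrape.com:8282",
}
# verify=False mirrors curl's -k flag and skips TLS certificate verification
response = requests.get("https://quotes.toscrape.com/js", proxies=proxy, verify=False)
print(response.text)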
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// Mirrors curl's -k flag: skip TLS certificate verification
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_URL, "https://quotes.toscrape.com/js");
curl_setopt($ch, CURLOPT_PROXY, "proxy-server.zenscrape.com:8282");
// Username: your API key; password: the API parameters
curl_setopt($ch, CURLOPT_PROXYUSERPWD, "YOUR-APIKEY:render=true&wait_for_css=.author");
$response = curl_exec($ch);
curl_close($ch);
var_dump($response);
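When you build the proxy URL in code, the password is simply your parameters joined as key=value pairs. Below is a minimal sketch. It assumes the host and port from the PHP example above, and `build_proxy_url` and `proxies_for` are hypothetical helper names. Characters that are not valid in the user-info part of a URL, such as '#' in a CSS selector, are percent-encoded. The requests library decodes them again before it builds the Proxy-Authorization header.

```python
from urllib.parse import quote

# Host and port taken from the PHP proxy example above
PROXY_HOST = "proxy-server.zenscrape.com"
PROXY_PORT = 8282


def build_proxy_url(apikey: str, params: dict) -> str:
    """Build an HTTP proxy URL: API key as username, parameters as password."""
    # Join parameters as key=value pairs, e.g. "render=true&wait_for_css=.author"
    # (values containing '&' or '=' themselves are not handled by this sketch)
    raw_password = "&".join(f"{key}={value}" for key, value in params.items())
    # Percent-encode characters not allowed in URL user-info ('#', '@', ':', '/', spaces, ...)
    # while keeping '=' and '&' readable
    password = quote(raw_password, safe="=&")
    return f"http://{quote(apikey, safe='')}:{password}@{PROXY_HOST}:{PROXY_PORT}"


def proxies_for(apikey: str, params: dict) -> dict:
    """Return a proxies mapping in the shape the requests library expects."""
    url = build_proxy_url(apikey, params)
    return {"http": url, "https": url}
```

With requests, call `requests.get("https://quotes.toscrape.com/js", proxies=proxies_for("YOUR-APIKEY", {"render": "true"}), verify=False)`. This matches the Python example above.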
Premium Proxy Location List
The following list of locations can be used for the location parameter if premium is set to true.
Authentication & API Key Information
Zenscrape uses API keys to allow access to the API. You can register a new API key at our developer portal. The /status route returns the number of remaining credits.
You can pass your API key in any of the following ways:
GET POST /status
Zenscrape looks for the API key in an apikey header (recommended, works with all requests):
curl "https://app.zenscrape.com/api/v1/status" \
  -H "apikey: YOUR-APIKEY"
Or as a query parameter:
curl "https://app.zenscrape.com/api/v1/status?apikey=YOUR-APIKEY"
Or as a form field in a POST request:
curl "https://app.zenscrape.com/api/v1/status" -F "apikey=YOUR-APIKEY"
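The header and query-string variants can be prepared with Python's standard library alone. This sketch only builds the requests and does not send them. `status_request` is a hypothetical helper. Parsing is left out because this article does not document the body of the /status response.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_BASE = "https://app.zenscrape.com/api/v1"


def status_request(apikey: str, use_header: bool = True) -> Request:
    """Prepare (but do not send) a GET /status request."""
    if use_header:
        # Recommended variant: the key is sent in an "apikey" header, not in the URL
        return Request(f"{API_BASE}/status", headers={"apikey": apikey})
    # Alternative variant: the key is passed as a query parameter
    return Request(f"{API_BASE}/status?{urlencode({'apikey': apikey})}")
```

To send the request, pass it to `urllib.request.urlopen(...)`. urllib stores header names capitalized ("Apikey"). This is harmless because HTTP header names are case-insensitive.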
Error Codes
The Zenscrape API uses the following error codes:
HTTP Error Code | Meaning |
---|---|
403 | Forbidden: the API key is wrong, you don't have enough credits, or you don't have sufficient rights to access the resource. |
404 | Not Found: no results were found. |
429 | Too Many Requests: you have reached your concurrency limit. Please wait or upgrade your plan. |
500 | Internal Server Error |
The API returns errors in this template:
{
"errors": [{
"url": "missing"
}]
}
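Below is a small sketch that turns an error response into a readable message. It assumes the status codes and the `errors` template shown above. `describe_error` and `should_retry` are hypothetical helpers. Retrying only on 429 is a design choice of this sketch, not documented behaviour.

```python
import json

# Meanings taken from the error code table above
ERROR_MEANINGS = {
    403: "Forbidden: wrong API key, not enough credits or insufficient rights",
    404: "Not Found: no results found",
    429: "Too many requests: concurrency limit reached",
    500: "Internal Server Error",
}


def describe_error(status: int, body: str) -> str:
    """Combine the status code meaning with the field errors from the JSON body."""
    message = ERROR_MEANINGS.get(status, f"HTTP {status}")
    try:
        errors = json.loads(body).get("errors", [])
    except (ValueError, AttributeError):
        errors = []  # the body was not the documented JSON error template
    details = [f"{field}: {problem}" for entry in errors for field, problem in entry.items()]
    return f"{message} ({', '.join(details)})" if details else message


def should_retry(status: int) -> bool:
    """429 means the concurrency limit was hit, so waiting and retrying can help."""
    return status == 429
```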
Common Use Cases
Using Premium Proxies
Zenscrape offers a large pool of premium proxies, which are the preferred choice when scraping sites that are difficult to scrape. To use the pool, simply set premium=true. In addition, you may specify a location using the location parameter. We have chosen 'se' (Sweden) for this example. A list of all supported locations is in the Premium Proxy Location List section.
https://app.zenscrape.com/api/v1/get?apikey=YOUR-APIKEY&url=https%3A%2F%2Fhttpbin.org%2Fip&premium=true&country=se
curl "https://app.zenscrape.com/api/v1/get?apikey=YOUR-APIKEY&url=https%3A%2F%2Fhttpbin.org%2Fip&premium=true&country=se"
import requests
headers = {"apikey": "YOUR-APIKEY"}
params = (
    ("url", "https://httpbin.org/ip"),
    ("premium", "true"),
    ("country", "se"),
)
response = requests.get("https://app.zenscrape.com/api/v1/get", headers=headers, params=params)
print(response.text)
var request = require('request');
var options = {
  url: 'https://app.zenscrape.com/api/v1/get?apikey=YOUR-APIKEY&url=https://httpbin.org/ip&premium=true&country=se'
};
function callback(error, response, body) {
  if (!error && response.statusCode == 200) {
    console.log(body);
  }
}
request(options, callback);
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, false);
$data = [
    "url" => "https://httpbin.org/ip",
    "premium" => "true",
    "country" => "se",
];
curl_setopt($ch, CURLOPT_URL, "https://app.zenscrape.com/api/v1/get?" . http_build_query($data));
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
    "Content-Type: application/json",
    "apikey: YOUR-APIKEY",
));
$response = curl_exec($ch);
curl_close($ch);
// The API returns the page's HTML rather than JSON, so print it as-is
echo $response;
Setting a Custom Header
Setting a custom header to avoid being blocked is not necessary, since we manage headers on our end. If you still want to set a custom header, you can do so by setting keep_headers=true. In this example, we set a custom User-Agent.
https://app.zenscrape.com/api/v1/get?apikey=YOUR-APIKEY&url=https%3A%2F%2Fhttpbin.org%2Fheaders&keep_headers=true&country=us
curl -H "User-Agent: 123" \
  "https://app.zenscrape.com/api/v1/get?apikey=YOUR-APIKEY&url=https%3A%2F%2Fhttpbin.org%2Fheaders&keep_headers=true&country=us"
import requests
headers = {
    "apikey": "YOUR-APIKEY",
    "User-Agent": "123",  # forwarded to the target site because keep_headers=true
}
params = (
    ("url", "https://httpbin.org/headers"),
    ("keep_headers", "true"),
    ("country", "us"),
)
response = requests.get("https://app.zenscrape.com/api/v1/get", headers=headers, params=params)
print(response.text)
var request = require('request');
var headers = {
  'User-Agent': '123'
};
var options = {
  url: 'https://app.zenscrape.com/api/v1/get?apikey=YOUR-APIKEY&url=https://httpbin.org/headers&keep_headers=true&country=us',
  headers: headers
};
function callback(error, response, body) {
  if (!error && response.statusCode == 200) {
    console.log(body);
  }
}
request(options, callback);
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, false);
$data = [
    "url" => "https://httpbin.org/headers",
    "keep_headers" => "true",
    "country" => "us",
];
curl_setopt($ch, CURLOPT_URL, "https://app.zenscrape.com/api/v1/get?" . http_build_query($data));
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
    "Content-Type: application/json",
    "apikey: YOUR-APIKEY",
    "User-Agent: 123",
));
$response = curl_exec($ch);
curl_close($ch);
// Print the response body as returned by the API
echo $response;
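With keep_headers=true, the apikey header and the headers meant for the target site are sent in the same header set. The sketch below keeps them apart in your own code. It assumes the requests library, as in the Python examples above, and `forwarded_request_args` is a hypothetical helper. The apikey is merged last so a custom header cannot overwrite it by accident.

```python
API_URL = "https://app.zenscrape.com/api/v1/get"


def forwarded_request_args(apikey: str, target_url: str, custom_headers: dict, **options: str) -> dict:
    """Return keyword arguments for requests.get that forward custom_headers to the target."""
    # apikey goes last so a custom "apikey" entry cannot replace your real key
    headers = {**custom_headers, "apikey": apikey}
    params = {"url": target_url, "keep_headers": "true", **options}
    return {"url": API_URL, "headers": headers, "params": params}
```

Usage: `requests.get(**forwarded_request_args("YOUR-APIKEY", "https://httpbin.org/headers", {"User-Agent": "123"}, country="us"))`.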
Enable JS Rendering
Many websites use front-end frameworks such as Vue or React. To extract components that require JavaScript, set render=true.
https://app.zenscrape.com/api/v1/get?apikey=YOUR-APIKEY&url=https%3A%2F%2Fquotes.toscrape.com%2Fjs&render=true
curl "https://app.zenscrape.com/api/v1/get?apikey=YOUR-APIKEY&url=https%3A%2F%2Fquotes.toscrape.com%2Fjs&render=true"
import requests
headers = {"apikey": "YOUR-APIKEY"}
params = (
    ("url", "https://quotes.toscrape.com/js"),
    ("render", "true"),
)
response = requests.get("https://app.zenscrape.com/api/v1/get", headers=headers, params=params)
print(response.text)
var request = require('request');
var options = {
  url: 'https://app.zenscrape.com/api/v1/get?apikey=YOUR-APIKEY&url=https://quotes.toscrape.com/js&render=true'
};
function callback(error, response, body) {
  if (!error && response.statusCode == 200) {
    console.log(body);
  }
}
request(options, callback);
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, false);
$data = [
    "url" => "https://quotes.toscrape.com/js",
    "render" => "true",
];
curl_setopt($ch, CURLOPT_URL, "https://app.zenscrape.com/api/v1/get?" . http_build_query($data));
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
    "Content-Type: application/json",
    "apikey: YOUR-APIKEY",
));
$response = curl_exec($ch);
curl_close($ch);
// The API returns the rendered page's HTML, so print it as-is
echo $response;
Getting around Cloudflare DDoS Protection
Quite a few websites that offer interesting content use Cloudflare DDoS protection. Zenscrape automatically detects when the Cloudflare protection page appears and returns the page content once the protection layer has disappeared. As a result, Cloudflare DDoS protection is handled automatically and requires no action on your end.
Blocking Particular Resources
To increase speed or to suppress certain page behaviour, it can be useful to block certain resources from loading. In the following example, we block the stylesheet, image and media resource types from loading. Keep in mind that block_resources only works in combination with render=true.
https://app.zenscrape.com/api/v1/get?apikey=YOUR-APIKEY&url=https%3A%2F%2Fquotes.toscrape.com%2Fjs&render=true&block_resources=stylesheet%2Cimage%2Cmedia
curl "https://app.zenscrape.com/api/v1/get?apikey=YOUR-APIKEY&url=https%3A%2F%2Fquotes.toscrape.com%2Fjs&render=true&block_resources=stylesheet%2Cimage%2Cmedia"
import requests
headers = {"apikey": "YOUR-APIKEY"}
params = (
    ("url", "https://quotes.toscrape.com/js"),
    ("render", "true"),
    ("block_resources", "stylesheet,image,media"),
)
response = requests.get("https://app.zenscrape.com/api/v1/get", headers=headers, params=params)
print(response.text)
var request = require('request');
var options = {
  url: 'https://app.zenscrape.com/api/v1/get?apikey=YOUR-APIKEY&url=https://quotes.toscrape.com/js&render=true&block_resources=stylesheet,image,media'
};
function callback(error, response, body) {
  if (!error && response.statusCode == 200) {
    console.log(body);
  }
}
request(options, callback);
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, false);
$data = [
    "url" => "https://quotes.toscrape.com/js",
    "render" => "true",
    "block_resources" => "stylesheet,image,media",
];
curl_setopt($ch, CURLOPT_URL, "https://app.zenscrape.com/api/v1/get?" . http_build_query($data));
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
    "Content-Type: application/json",
    "apikey: YOUR-APIKEY",
));
$response = curl_exec($ch);
curl_close($ch);
// The API returns the rendered page's HTML, so print it as-is
echo $response;
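If the set of blocked resource types changes between requests, you can assemble the parameter in code. This sketch assumes the comma-separated format shown above, and `block_resources_params` is a hypothetical helper. Only the type names used in this example (stylesheet, image, media) come from this article. Because block_resources requires render=true, the helper always sets both.

```python
from urllib.parse import urlencode


def block_resources_params(resource_types: list) -> dict:
    """Return the parameters needed to block resource types while rendering."""
    if not resource_types:
        raise ValueError("at least one resource type is required")
    # block_resources only works in combination with render=true, so set both together
    return {"render": "true", "block_resources": ",".join(resource_types)}
```

Passing the result through `urlencode` produces the same `block_resources=stylesheet%2Cimage%2Cmedia` encoding used in the example URL above.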
Setting a Cookie
Cookies can also be passed to the target site using keep_headers=true. The Cookie header then simply needs to contain the cookie name and value.
https://app.zenscrape.com/api/v1/get?apikey=YOUR-APIKEY&url=https%3A%2F%2Fquotes.toscrape.com%2Fcookies&keep_headers=true
curl -H "Cookie: SESSIONID=27382738" \
  "https://app.zenscrape.com/api/v1/get?apikey=YOUR-APIKEY&url=https%3A%2F%2Fquotes.toscrape.com%2Fcookies&keep_headers=true"
import requests
headers = {
    "apikey": "YOUR-APIKEY",
    "Cookie": "SESSIONID=27382738",  # forwarded to the target site because keep_headers=true
}
params = (
    ("url", "https://quotes.toscrape.com/cookies"),
    ("keep_headers", "true"),
)
response = requests.get("https://app.zenscrape.com/api/v1/get", headers=headers, params=params)
print(response.text)
var request = require('request');
var headers = {
  'Cookie': 'SESSIONID=27382738'
};
var options = {
  url: 'https://app.zenscrape.com/api/v1/get?apikey=YOUR-APIKEY&url=https://quotes.toscrape.com/cookies&keep_headers=true',
  headers: headers
};
function callback(error, response, body) {
  if (!error && response.statusCode == 200) {
    console.log(body);
  }
}
request(options, callback);
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, false);
$data = [
    "url" => "https://quotes.toscrape.com/cookies",
    "keep_headers" => "true",
];
curl_setopt($ch, CURLOPT_URL, "https://app.zenscrape.com/api/v1/get?" . http_build_query($data));
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
    "Content-Type: application/json",
    "apikey: YOUR-APIKEY",
    "Cookie: SESSIONID=27382738",
));
$response = curl_exec($ch);
curl_close($ch);
// The API returns the page's HTML rather than JSON, so print it as-is
echo $response;
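To pass several cookies, put them all in a single Cookie header as name=value pairs separated by "; ". Below is a minimal sketch. `cookie_header` is a hypothetical helper, and it does no escaping, so the values must already be valid cookie values.

```python
def cookie_header(cookies: dict) -> str:
    """Join cookies into one Cookie header value, e.g. "SESSIONID=27382738; lang=en"."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


# Headers for a keep_headers=true request: the apikey plus the Cookie to forward
headers = {
    "apikey": "YOUR-APIKEY",
    "Cookie": cookie_header({"SESSIONID": "27382738"}),
}
```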
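Disclaimer: This content comes mainly from the vendor. If the vendor does not want it displayed on this site, please contact us to have it removed.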
Last updated on May 15, 2022