How to Use ScrapingRobot API

How to Use the Scraping Robot API: A Pre-Purchase Guide

Thinking of buying the Scraping Robot API for web scraping? First learn how to use it. This article walks through the API in detail.

Scraping Robot API

Basic Usage

This page covers Scraping Robot's basic functionality.

The Scraping Robot API exposes a single API endpoint. Send an HTTP request to https://api.scrapingrobot.com with your API key passed as a query parameter, and you get back the data you need.

All task parameters are passed in the POST body as a JSON object.

POST Sample Code

https://api.scrapingrobot.com?token=<YOUR_SR_TOKEN>

Sample Code

{  
  "url": "https://www.scrapingrobot.com/",  
  "module": "HtmlChromeScraper" 
}
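
As a minimal sketch of the request described above, the following Python builds the POST request without sending it. It uses only the standard library; the function name `build_scrape_request` and the `Content-Type`/`Accept` headers are illustrative assumptions, while the endpoint, the `token` query parameter, and the body fields come from the samples above.

```python
import json
import urllib.parse
import urllib.request

API_ENDPOINT = "https://api.scrapingrobot.com"  # endpoint quoted in the text above


def build_scrape_request(token: str, target_url: str,
                         module: str = "HtmlChromeScraper") -> urllib.request.Request:
    """Build (but do not send) a POST request for one scraping task.

    Hypothetical helper: the name and signature are not part of the API.
    """
    # The token travels in the query string, as in the sample URL.
    query = urllib.parse.urlencode({"token": token})
    # Task parameters travel in the POST body as a JSON object.
    body = json.dumps({"url": target_url, "module": module}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_ENDPOINT}?{query}",
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_scrape_request("<YOUR_SR_TOKEN>", "https://www.scrapingrobot.com/")
    print(req.get_method(), req.full_url)
    # To actually send it (needs a valid token and network access):
    # with urllib.request.urlopen(req, timeout=60) as resp:
    #     print(json.load(resp))
```

Building the request separately from sending it makes it easy to inspect the URL and body before spending credits.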

Authentication

Every request to the Scraping Robot API should contain an access token. If the token is missing or invalid, Scraping Robot responds with status 401 (Unauthorized).

The token can be passed in two ways:

  • As a query parameter:
    • ?token=<YOUR_SR_TOKEN>
  • In the Authorization header (bearer authentication):
    • Authorization: Bearer <YOUR_SR_TOKEN>

Note: To receive your Scraping Robot API access token you must register an account at https://dashboard.scrapingrobot.com/sign-up.
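
The two ways of passing the token can be sketched in Python as follows. The helper names (`with_query_token`, `bearer_headers`, `is_unauthorized`) are hypothetical and exist only for illustration; the `token` parameter, the bearer scheme, and the 401 status are taken from the text above.

```python
import urllib.parse

BASE_URL = "https://api.scrapingrobot.com"


def with_query_token(url: str, token: str) -> str:
    """Return `url` with the token appended as the `token` query parameter."""
    separator = "&" if "?" in url else "?"
    return f"{url}{separator}{urllib.parse.urlencode({'token': token})}"


def bearer_headers(token: str) -> dict:
    """Return headers that carry the token via bearer authentication."""
    return {"Authorization": f"Bearer {token}"}


def is_unauthorized(status_code: int) -> bool:
    """True when the API rejected a missing or invalid token (HTTP 401)."""
    return status_code == 401


if __name__ == "__main__":
    print(with_query_token(f"{BASE_URL}/balance", "<YOUR_SR_TOKEN>"))
    print(bearer_headers("<YOUR_SR_TOKEN>"))
```

The header form keeps the token out of URLs, which tend to end up in logs and browser history; either form should be accepted according to the description above.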


Error Handling

Authorization Parameter Missing

If you forget to include your API key from your Scraping Robot dashboard as the “token” query parameter in your API call, we return the error message shown below.

{
   "status": "FAIL",
   "date": "Thu, 01 Mar 2020 10:00:00 GMT",
   "error": "Token query parameter not found"
}

To rectify this error, please include your API key from your Scraping Robot dashboard as the “token” query parameter in your API call, so that the request can be authorized.


Invalid Authorization Parameter

If you pass an invalid API key as the “token” query parameter in your API call, we return the error message shown below.

{
   "status": "FAIL",
   "date": "Thu, 01 Mar 2020 10:00:00 GMT",
   "error": "Invalid client token"
}

To rectify this error, please check that the “token” query parameter in your API call contains the valid API key from your Scraping Robot dashboard, so that the request can be authorized.


Not Enough Credits

If you do not have enough credits in your account to perform the scraping task you're requesting, we return the error message shown below.

{
  "status": "FAIL",
  "date": "Thu, 01 Mar 2020 10:00:00 GMT",
  "error": "You do not have enough credits"
}

To rectify this error, please add scraping credits to your account through your Scraping Robot dashboard or contact our support team to get credits added to your account.


Request Body is not Valid JSON

If you send a request whose body is not valid JSON, we return the error message shown below.

Invalid JSON means any body that cannot be parsed as JSON. This is different from the “Invalid Request Body” error explained next, where the body parses as JSON but its parameters are wrong.

{
  "status": "FAIL",
  "date":"Thu, 01 Mar 2020 10:00:00 GMT",
  "error": "Request-body is not a valid JSON"
}

To rectify this error, please fix the JSON in the request body and try your request again. You can find JSON validators online to help you with getting your JSON correctly formatted.


Invalid Request Body

If you send a JSON body, but the body parameters are incorrect or out of order, we return the error message shown below.

{ 
  "status": "FAIL", 
  "date":"Thu, 01 Mar 2020 10:00:00 GMT", 
  "error": "Invalid json format: ..." 
}

To rectify this error, please review the API documentation for the request you're sending in to ensure all parameters are correct. If you cannot resolve the issue on your own, please contact our support team for assistance and we'll gladly help out!


Service Overloaded

If our service is overloaded at the time of your request, we return the error message shown below.

{ 
  "status": "FAIL", 
  "date":"Thu, 01 Mar 2020 10:00:00 GMT", 
  "error": "The service is overloaded. Try again later" 
}

To rectify this error, wait and retry the request later, as the message suggests. If the problem persists, please contact our support team for updates on the situation.


Page Load Failure

If Scraping Robot is unable to load the page you're trying to scrape, we return the error message shown below.

{ 
  "status": "FAIL", 
  "date":"Thu, 01 Mar 2020 10:00:00 GMT", 
  "error": "Please retry this task again later" 
}

To rectify this error, retry the task later, as the message suggests. If it keeps failing, please contact our support team for assistance.


Internal Error

If there is an issue with our system, we return the error message shown below.

{ 
  "error": "Internal server error", 
  "date": "Thu, 01 Mar 2020 10:00:00 GMT", 
  "status": "FAIL" 
}

To rectify this error, please contact our support team for assistance.
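
The error responses above share one shape (`status`, `date`, `error`), so a client can branch on the `error` string. The following Python is a rough sketch of that idea: the error strings are copied from the samples in this section, but the function name `suggest_action`, the returned labels, and the assumption that only `"FAIL"` marks an error are illustrative and not part of the API.

```python
import json

# Error strings copied from the sample responses in this section.
FIX_TOKEN = {"Token query parameter not found", "Invalid client token"}
ADD_CREDITS = {"You do not have enough credits"}
RETRY_LATER = {
    "The service is overloaded. Try again later",
    "Please retry this task again later",
}


def suggest_action(response_text: str) -> str:
    """Map an API response body to a coarse next step (hypothetical labels)."""
    try:
        payload = json.loads(response_text)
    except json.JSONDecodeError:
        return "contact_support"  # the response body was not JSON at all
    if not isinstance(payload, dict) or payload.get("status") != "FAIL":
        return "ok"  # assumption: only status "FAIL" marks an error
    error = str(payload.get("error", ""))
    if error in FIX_TOKEN:
        return "fix_token"
    if error in ADD_CREDITS:
        return "add_credits"
    if error == "Request-body is not a valid JSON" or error.startswith("Invalid json format"):
        return "fix_request"
    if error in RETRY_LATER:
        return "retry_later"
    return "contact_support"  # includes "Internal server error"


if __name__ == "__main__":
    sample = '{"status": "FAIL", "date": "Thu, 01 Mar 2020 10:00:00 GMT", "error": "Invalid client token"}'
    print(suggest_action(sample))
```

Matching on message text is brittle if the wording ever changes, so treat the unknown case as the default rather than an exception.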


Credits

Get Credit Balance

GET

Request (cURL)

curl --request GET \
--url https://api.scrapingrobot.com/balance \
--header 'Accept: application/json'

Request (Node)

const sdk = require('api')('@staging-docs-scrapingrobot/v1.0#1b382ykufox5uf');

sdk.get('/balance')
.then(res => console.log(res))
.catch(err => console.error(err));

Request (PHP)

 
<?php
require_once('vendor/autoload.php');

$client = new \GuzzleHttp\Client();

$response = $client->request('GET', 'https://api.scrapingrobot.com/balance', [
'headers' => [
'Accept' => 'application/json',
],
]);

echo $response->getBody();

Request (Python)

import requests

url = "https://api.scrapingrobot.com/balance"

headers = {"Accept": "application/json"}

response = requests.request("GET", url, headers=headers)

print(response.text)

Request (JavaScript)

const options = {method: 'GET', headers: {Accept: 'application/json'}};

fetch('https://api.scrapingrobot.com/balance', options)
.then(response => response.json())
.then(response => console.log(response))
.catch(err => console.error(err));

Request (C++)

CURL *hnd = curl_easy_init();

curl_easy_setopt(hnd, CURLOPT_CUSTOMREQUEST, "GET");
curl_easy_setopt(hnd, CURLOPT_URL, "https://api.scrapingrobot.com/balance");

struct curl_slist *headers = NULL;
headers = curl_slist_append(headers, "Accept: application/json");
curl_easy_setopt(hnd, CURLOPT_HTTPHEADER, headers);

CURLcode ret = curl_easy_perform(hnd);

Get Credit Balance (Token as Query Parameter)

The same balance request, with the access token passed as the token query parameter.

Request (cURL)

curl --request GET \
--url 'https://api.scrapingrobot.com/balance?token=<YOUR_SR_TOKEN>' \
--header 'Accept: application/json'

Request (Node)

const fetch = require('node-fetch');
const url = 'https://api.scrapingrobot.com/balance?token=<YOUR_SR_TOKEN>';
const options = {method: 'GET', headers: {Accept: 'application/json'}};
fetch(url, options)
  .then(res => res.json())
  .then(json => console.log(json))
  .catch(err => console.error('error:' + err));

Request (PHP)

<?php
require_once('vendor/autoload.php');

$client = new \GuzzleHttp\Client();

$response = $client->request('GET', 'https://api.scrapingrobot.com/balance?token=<YOUR_SR_TOKEN>', [
'headers' => [
'Accept' => 'application/json',
],
]);

echo $response->getBody(); 

Request (Python)

import requests

url = "https://api.scrapingrobot.com/balance?token=<YOUR_SR_TOKEN>"

headers = {"Accept": "application/json"}

response = requests.request("GET", url, headers=headers)

print(response.text)

Request (JavaScript)

const options = {method: 'GET', headers: {Accept: 'application/json'}};

fetch('https://api.scrapingrobot.com/balance?token=<YOUR_SR_TOKEN>', options)
.then(response => response.json())
.then(response => console.log(response))
.catch(err => console.error(err));

Request (C++)

CURL *hnd = curl_easy_init();

curl_easy_setopt(hnd, CURLOPT_CUSTOMREQUEST, "GET");
curl_easy_setopt(hnd, CURLOPT_URL, "https://api.scrapingrobot.com/balance?token=<YOUR_SR_TOKEN>");

struct curl_slist *headers = NULL;
headers = curl_slist_append(headers, "Accept: application/json");
curl_easy_setopt(hnd, CURLOPT_HTTPHEADER, headers);

CURLcode ret = curl_easy_perform(hnd);

Get Credit Usage Statistics

This endpoint shows your credit usage over the specified time period.

/stats

GET

Request (cURL)

curl --request GET \
--url 'https://api.scrapingrobot.com/stats?token=<YOUR_SR_TOKEN>&type=daily' \
--header 'Accept: application/json'

Request (Node)

const fetch = require('node-fetch');

const url = 'https://api.scrapingrobot.com/stats?token=<YOUR_SR_TOKEN>&type=daily';
const options = {method: 'GET', headers: {Accept: 'application/json'}};

fetch(url, options)
.then(res => res.json())
.then(json => console.log(json))
.catch(err => console.error('error:' + err));

Request (PHP)

<?php
require_once('vendor/autoload.php');

$client = new \GuzzleHttp\Client();

$response = $client->request('GET', 'https://api.scrapingrobot.com/stats?token=<YOUR_SR_TOKEN>&type=daily', [
'headers' => [
'Accept' => 'application/json',
],
]);

echo $response->getBody();

Request (Python)

import requests

url = "https://api.scrapingrobot.com/stats?token=<YOUR_SR_TOKEN>&type=daily"

headers = {"Accept": "application/json"}

response = requests.request("GET", url, headers=headers)

print(response.text)

Request (JavaScript)

const options = {method: 'GET', headers: {Accept: 'application/json'}};

fetch('https://api.scrapingrobot.com/stats?token=<YOUR_SR_TOKEN>&type=daily', options)
.then(response => response.json())
.then(response => console.log(response))
.catch(err => console.error(err));

Request (C++)

CURL *hnd = curl_easy_init();

curl_easy_setopt(hnd, CURLOPT_CUSTOMREQUEST, "GET");
curl_easy_setopt(hnd, CURLOPT_URL, "https://api.scrapingrobot.com/stats?token=<YOUR_SR_TOKEN>&type=daily");

struct curl_slist *headers = NULL;
headers = curl_slist_append(headers, "Accept: application/json");
curl_easy_setopt(hnd, CURLOPT_HTTPHEADER, headers);

CURLcode ret = curl_easy_perform(hnd);

Html API

Create a task using POST-request

/

POST

Create scraping task

Request (cURL)

curl --request POST \
--url 'https://api.scrapingrobot.com/?responseType=json&waitUntil=load&noScripts=false&noImages=true&noFonts=true&noCss=true&contentType=text%2Fplain' \
--header 'Accept: application/json' \
--header 'Content-Type: application/json'

Request (Node)

const sdk = require('api')('@staging-docs-scrapingrobot/v1.0#1b382ykufox5uf');

sdk.server('https://api.scrapingrobot.com/');
sdk.post('', {
responseType: 'json',
waitUntil: 'load',
noScripts: 'false',
noImages: 'true',
noFonts: 'true',
noCss: 'true',
contentType: 'text%2Fplain'
})
.then(res => console.log(res))
.catch(err => console.error(err));

Request (PHP)

<?php
require_once('vendor/autoload.php');

$client = new \GuzzleHttp\Client();

$response = $client->request('POST', 'https://api.scrapingrobot.com/?responseType=json&waitUntil=load&noScripts=false&noImages=true&noFonts=true&noCss=true&contentType=text%2Fplain', [
'headers' => [
'Accept' => 'application/json',
'Content-Type' => 'application/json',
],
]);

echo $response->getBody();

Request (Python)

import requests

url = "https://api.scrapingrobot.com/?responseType=json&waitUntil=load&noScripts=false&noImages=true&noFonts=true&noCss=true&contentType=text%2Fplain"

headers = {
"Accept": "application/json",
"Content-Type": "application/json"
}

response = requests.request("POST", url, headers=headers)

print(response.text)

Request (JavaScript)

const options = {
method: 'POST',
headers: {Accept: 'application/json', 'Content-Type': 'application/json'}
};

fetch('https://api.scrapingrobot.com/?responseType=json&waitUntil=load&noScripts=false&noImages=true&noFonts=true&noCss=true&contentType=text%2Fplain', options)
.then(response => response.json())
.then(response => console.log(response))
.catch(err => console.error(err));

Request (C++)

CURL *hnd = curl_easy_init();

curl_easy_setopt(hnd, CURLOPT_CUSTOMREQUEST, "POST");
curl_easy_setopt(hnd, CURLOPT_URL, "https://api.scrapingrobot.com/?responseType=json&waitUntil=load&noScripts=false&noImages=true&noFonts=true&noCss=true&contentType=text%2Fplain");

struct curl_slist *headers = NULL;
headers = curl_slist_append(headers, "Accept: application/json");
headers = curl_slist_append(headers, "Content-Type: application/json");
curl_easy_setopt(hnd, CURLOPT_HTTPHEADER, headers);

CURLcode ret = curl_easy_perform(hnd);
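
The task requests in this section pass rendering options (`responseType`, `waitUntil`, `noScripts`, `noImages`, `noFonts`, `noCss`, `contentType`) in the query string. As a sketch only, the Python below assembles such a URL with the standard library so that values like `text/plain` are percent-encoded automatically. The helper name `build_task_url` is hypothetical, and adding the `token` parameter here follows the authentication section rather than the samples above, which omit it.

```python
import urllib.parse

API_ENDPOINT = "https://api.scrapingrobot.com/"


def build_task_url(token: str, **options) -> str:
    """Return the endpoint URL with the token and task options as query parameters.

    Hypothetical helper; option names are passed through unchanged.
    """
    params = {"token": token}
    for name, value in options.items():
        # Booleans are written as lowercase "true"/"false", as in the samples.
        params[name] = str(value).lower() if isinstance(value, bool) else value
    # urlencode percent-encodes reserved characters, e.g. "/" becomes "%2F".
    return f"{API_ENDPOINT}?{urllib.parse.urlencode(params)}"


if __name__ == "__main__":
    print(build_task_url(
        "<YOUR_SR_TOKEN>",
        responseType="json",
        waitUntil="load",
        noScripts=False,
        noImages=True,
        noFonts=True,
        noCss=True,
        contentType="text/plain",
    ))
```

Letting the library do the encoding avoids hand-writing escapes such as `text%2Fplain`, which are easy to double-encode by mistake.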

Create a task using GET-request

/

GET

Create scraping task

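Request (cURL)

curl --request GET \
--url 'https://api.scrapingrobot.com/?responseType=json&waitUntil=load&noScripts=false&noImages=true&noFonts=true&noCss=true' \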
--header 'Accept: application/json'

Request (Node)

const sdk = require('api')('@staging-docs-scrapingrobot/v1.0#1b382ykufox5uf');

sdk.server('https://api.scrapingrobot.com/');
sdk.get('', {
responseType: 'json',
waitUntil: 'load',
noScripts: 'false',
noImages: 'true',
noFonts: 'true',
noCss: 'true'
})
.then(res => console.log(res))
.catch(err => console.error(err));

Request (PHP)

<?php
require_once('vendor/autoload.php');

$client = new \GuzzleHttp\Client();

$response = $client->request('GET', 'https://api.scrapingrobot.com/?responseType=json&waitUntil=load&noScripts=false&noImages=true&noFonts=true&noCss=true', [
'headers' => [
'Accept' => 'application/json',
],
]);

echo $response->getBody();

Request (Python)

import requests

url = "https://api.scrapingrobot.com/?responseType=json&waitUntil=load&noScripts=false&noImages=true&noFonts=true&noCss=true"

headers = {"Accept": "application/json"}

response = requests.request("GET", url, headers=headers)

print(response.text)

Request (JavaScript)

const options = {method: 'GET', headers: {Accept: 'application/json'}};

fetch('https://api.scrapingrobot.com/?responseType=json&waitUntil=load&noScripts=false&noImages=true&noFonts=true&noCss=true', options)
.then(response => response.json())
.then(response => console.log(response))
.catch(err => console.error(err));

Request (C++)

CURL *hnd = curl_easy_init();

curl_easy_setopt(hnd, CURLOPT_CUSTOMREQUEST, "GET");
curl_easy_setopt(hnd, CURLOPT_URL, "https://api.scrapingrobot.com/?responseType=json&waitUntil=load&noScripts=false&noImages=true&noFonts=true&noCss=true");

struct curl_slist *headers = NULL;
headers = curl_slist_append(headers, "Accept: application/json");
curl_easy_setopt(hnd, CURLOPT_HTTPHEADER, headers);

CURLcode ret = curl_easy_perform(hnd);


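Disclaimer: This content comes mainly from the vendor. If the vendor does not want it displayed on this site, please contact us to have it removed.

Last updated on May 15, 2022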
