Oxylabs Real-Time Crawler

How to Use the OxyLabs Real-Time Crawler [Part 1]: Real-Time Crawler for Google

This guide, based on the official OxyLabs documentation, explains how to use the OxyLabs Real-Time Crawler for Google.

Quick Start

Real-Time Crawler is built for heavy-duty data retrieval operations. You can use Real-Time Crawler to access various Google pages, including regular search, hotel availability and Google Shopping. It enables effortless web data extraction from search engines without any delays or errors.

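Real-Time Crawler for Google uses basic HTTP authentication, which requires sending a username and password.

This is by far the fastest way to start using Real-Time Crawler for Google. You will send the query adidas to google_search using the Realtime integration method. Don't forget to replace USERNAME and PASSWORD with your proxy user credentials.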

curl --user "USERNAME:PASSWORD" 'https://realtime.oxylabs.io/v1/queries' -H "Content-Type: application/json" -d '{"source": "google_search", "domain": "com", "query": "adidas"}'
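The same request can be sketched in Python. This is a minimal illustration using only the standard library, not an official client: the function names build_realtime_request and send are hypothetical, while the endpoint, payload fields, and basic authentication come from the cURL example above.

```python
import base64
import json
import urllib.request

# Endpoint taken from the cURL example above.
REALTIME_URL = "https://realtime.oxylabs.io/v1/queries"


def build_realtime_request(username: str, password: str, query: str) -> urllib.request.Request:
    """Build (but do not send) the POST request from the cURL example."""
    payload = {"source": "google_search", "domain": "com", "query": query}
    # Basic HTTP authentication: base64("username:password") in the Authorization header.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        REALTIME_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 150.0) -> dict:
    """Send the request and decode the JSON body (needs valid credentials and network access)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling send(build_realtime_request("USERNAME", "PASSWORD", "adidas")) would perform the same query as the cURL command, provided the placeholders are replaced with real credentials.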

If you have any questions not covered by this documentation, please contact your account manager or our support staff at [email protected].


Postman

Download and import this Postman collection to try out all of the Google crawler functions and data delivery methods documented on this page.


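Integration Methods

Real-Time Crawler for Google supports three integration methods, each with its own benefits:

  • Push-Pull. With this method, you do not need to maintain an active connection with our endpoint to retrieve the data. After you make a request, our system can automatically ping your server once the job is done (see Callback). This method saves computing resources and scales easily.
  • Realtime. This method requires you to maintain an active connection with our endpoint in order to get the results when the job is completed. It can be implemented in a single service, whereas Push-Pull is a two-step process.
  • SuperAPI. This method is very similar to Realtime, but instead of posting data to our endpoint, you use HTML Crawler as a proxy. To retrieve the data, you set up a proxy endpoint and make a GET request to the desired URL. Additional parameters must be added using headers.

Our recommended data extraction method is Push-Pull.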


Push-Pull

This is the simplest, most reliable, and most recommended data delivery method. With Push-Pull, you send us a query, we return a job id, and once the job is done, you can use that id to retrieve the content from the /results endpoint. You can check job completion status yourself, or you can set up a simple listener that accepts POST requests. This way, we send you a callback message once the job is ready to be retrieved. In this particular example, the results will be automatically uploaded to your S3 bucket named YOUR_BUCKET_NAME.

You can also try out the Push-Pull method via Postman. Download this file to get started.


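Single Query

The following endpoint handles single queries for one keyword or URL. The API returns a confirmation message containing job information, including the job id. You can use this id to check the job completion status, or you can ask us to ping your callback endpoint once the scraping task is finished by adding callback_url to the query.

POST https://data.oxylabs.io/v1/queries

You need to post the query parameters as data in the JSON body.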

curl --user user:pass1 'https://data.oxylabs.io/v1/queries' -H "Content-Type: application/json" \
 -d '{"source": "google_search", "domain": "com", "query": "adidas", "callback_url": "https://your.callback.url", "storage_type": "s3", "storage_url": "YOUR_BUCKET_NAME"}'

The API responds with the query information in JSON format, printed in the response body, similar to this:

{
  "callback_url": "https://your.callback.url",
  "client_id": 5,
  "context": [
    {
      "key": "results_language",
      "value": null
    },
    {
      "key": "safe_search",
      "value": null
    },
    {
      "key": "tbm",
      "value": null
    },
    {
      "key": "cr",
      "value": null
    },
    {
      "key": "filter",
      "value": null
    }
  ],
  "created_at": "2019-10-01 00:00:01",
  "domain": "com",
  "geo_location": null,
  "id": "12345678900987654321",
  "limit": 10,
  "locale": null,
  "pages": 1,
  "parse": false,
  "render": null,
  "query": "adidas",
  "source": "google_search",
  "start_page": 1,
  "status": "pending",
  "storage_type": "s3",
  "storage_url": "YOUR_BUCKET_NAME/12345678900987654321.json",
  "subdomain": "www",
  "updated_at": "2019-10-01 00:00:01",
  "user_agent_type": "desktop",
  "_links": [
    {
      "rel": "self",
      "href": "http://data.oxylabs.io/v1/queries/12345678900987654321",
      "method": "GET"
    },
    {
      "rel": "results",
      "href": "http://data.oxylabs.io/v1/queries/12345678900987654321/results",
      "method": "GET"
    }
  ]
}
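A small Python sketch of how a client might pick the follow-up URLs out of this response. The helper name link_for is hypothetical; the _links, rel, and href fields are taken from the response above.

```python
from typing import Optional


def link_for(job: dict, rel: str) -> Optional[str]:
    """Return the href of the first entry in job["_links"] whose rel matches, or None."""
    for link in job.get("_links", []):
        if link.get("rel") == rel:
            return link.get("href")
    return None


# Trimmed copy of the response shown above.
job = {
    "id": "12345678900987654321",
    "status": "pending",
    "_links": [
        {"rel": "self", "href": "http://data.oxylabs.io/v1/queries/12345678900987654321", "method": "GET"},
        {"rel": "results", "href": "http://data.oxylabs.io/v1/queries/12345678900987654321/results", "method": "GET"},
    ],
}
status_url = link_for(job, "self")
results_url = link_for(job, "results")
```

Reading the URLs from _links rather than assembling them by hand keeps the client independent of the URL layout.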

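Check Job Status

If your query had a callback_url, we will send you a message containing a link to the content once the scraping task is done. However, if there was no callback_url in the query, you will need to check the job status yourself. To do that, use the URL in the href under rel:self in the response message you received after submitting your query to our API. It should look similar to this: http://data.oxylabs.io/v1/queries/12345678900987654321.

GET https://data.oxylabs.io/v1/queries/{id}

Querying this link returns the job information, including its status. There are three possible status values:

pending: The job is still in the queue and has not been completed.
done: The job is completed. You can retrieve the result by querying the URL in the href under rel:results: http://data.oxylabs.io/v1/queries/12345678900987654321/results
faulted: There was an issue with the job and we could not complete it, most likely due to a server error on the target site's side.

curl --user user:pass1 'http://data.oxylabs.io/v1/queries/12345678900987654321'

The API responds with the query information in JSON format, printed in the response body. Note that the job status has changed to done. You can now retrieve the content by querying http://data.oxylabs.io/v1/queries/12345678900987654321/results.

You can also see that the job was updated_at 2019-10-01 00:00:15, so the query took 14 seconds to complete.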

{
  "client_id": 5,
  "context": [
    {
      "key": "results_language",
      "value": null
    },
    {
      "key": "safe_search",
      "value": null
    },
    {
      "key": "tbm",
      "value": null
    },
    {
      "key": "cr",
      "value": null
    },
    {
      "key": "filter",
      "value": null
    }
  ],
  "created_at": "2019-10-01 00:00:01",
  "domain": "com",
  "geo_location": null,
  "id": "12345678900987654321",
  "limit": 10,
  "locale": null,
  "pages": 1,
  "parse": false,
  "render": null,
  "query": "adidas",
  "source": "google_search",
  "start_page": 1,
  "status": "done",
  "subdomain": "www",
  "updated_at": "2019-10-01 00:00:15",
  "user_agent_type": "desktop",
  "_links": [
    {
      "rel": "self",
      "href": "http://data.oxylabs.io/v1/queries/12345678900987654321",
      "method": "GET"
    },
    {
      "rel": "results",
      "href": "http://data.oxylabs.io/v1/queries/12345678900987654321/results",
      "method": "GET"
    }
  ]
}
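The status check can be wrapped in a simple polling loop. The sketch below is illustrative only: wait_for_job and job_duration_seconds are hypothetical names, the poll interval and attempt limit are arbitrary assumptions, and the actual HTTP call is passed in as fetch_job so the loop stays independent of any HTTP library.

```python
import time
from datetime import datetime
from typing import Callable


def wait_for_job(fetch_job: Callable[[], dict], poll_interval: float = 5.0, max_attempts: int = 60) -> dict:
    """Call fetch_job() until the job status is done; raise on faulted or when attempts run out."""
    for _ in range(max_attempts):
        job = fetch_job()
        status = job.get("status")
        if status == "done":
            return job
        if status == "faulted":
            raise RuntimeError(f"job {job.get('id')} faulted")
        time.sleep(poll_interval)  # status is still pending
    raise TimeoutError("job did not finish within the allowed number of attempts")


def job_duration_seconds(job: dict) -> float:
    """Seconds between created_at and updated_at, using the timestamp format shown above."""
    fmt = "%Y-%m-%d %H:%M:%S"
    created = datetime.strptime(job["created_at"], fmt)
    updated = datetime.strptime(job["updated_at"], fmt)
    return (updated - created).total_seconds()
```

For the job shown above, job_duration_seconds gives the 14 seconds between 00:00:01 and 00:00:15.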

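Retrieve Job Content

Once you know the job is ready, either by checking its status or by receiving our callback, you can retrieve it using the URL in the href under rel:results in our initial response or in the callback message. It should look similar to this: http://data.oxylabs.io/v1/queries/12345678900987654321/results.

GET https://data.oxylabs.io/v1/queries/{id}/results

Results can be retrieved automatically, without periodically checking the job status, by setting up a Callback service. You need to specify the IP or domain of the server where the Callback service is running. When our system completes a job, it sends a message to the provided IP or domain, and the Callback service downloads the results, as described in the Callback implementation example.

curl --user user:pass1 'http://data.oxylabs.io/v1/queries/12345678900987654321/results'

The API will return the job content: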

{
  "results": [
    {
      "content": "<!doctype html>
        CONTENT      
      ",
      "created_at": "2019-10-01 00:00:01",
      "updated_at": "2019-10-01 00:00:15",
      "page": 1,
      "url": "https://www.google.com/search?q=adidas&hl=en&gl=US",
      "job_id": "12345678900987654321",
      "status_code": 200
    }
  ]
}

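Callback

A callback is a POST request we send to your machine, informing you that the data extraction task is completed and providing a URL to download the scraped content. This means you no longer need to check the job status manually. Once the data is ready, we will let you know, and all you need to do is retrieve it.

# Please see the Python and PHP code samples.

Callback output example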

{  
   "created_at":"2019-10-01 00:00:01",
   "updated_at":"2019-10-01 00:00:15",
   "locale":null,
   "client_id":163,
   "user_agent_type":"desktop",
   "source":"google_search",
   "pages":1,
   "subdomain":"www",
   "status":"done",
   "start_page":1,
   "parse":0,
   "render":null,
   "priority":0,
   "ttl":0,
   "origin":"api",
   "persist":true,
   "id":"12345678900987654321",
   "callback_url":"http://your.callback.url/",
   "query":"adidas",
   "domain":"com",
   "limit":10,
   "geo_location":null,
   {...}
   "_links":[
      {  
         "href":"https://data.oxylabs.io/v1/queries/12345678900987654321",
         "method":"GET",
         "rel":"self"
      },
      {  
         "href":"https://data.oxylabs.io/v1/queries/12345678900987654321/results",
         "method":"GET",
         "rel":"results"
      }
   ],
}
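A callback listener can be sketched with Python's standard http.server module. This is a minimal illustration under stated assumptions, not the official callback implementation: the names results_url_from_callback, CallbackHandler, and serve are hypothetical, port 8080 is arbitrary, and the code assumes the callback body is a JSON object shaped like the example above.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional


def results_url_from_callback(body: bytes) -> Optional[str]:
    """Return the rel:results href from a callback body, or None if the job is not done."""
    job = json.loads(body)
    if job.get("status") != "done":
        return None
    for link in job.get("_links", []):
        if link.get("rel") == "results":
            return link.get("href")
    return None


class CallbackHandler(BaseHTTPRequestHandler):
    """Accepts the POST callback and reports where the results can be downloaded."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        url = results_url_from_callback(self.rfile.read(length))
        if url:
            print("Results ready at", url)  # a real service would download them here
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Run the listener until interrupted (port is an arbitrary choice)."""
    HTTPServer(("", port), CallbackHandler).serve_forever()
```

Keeping the parsing in a separate function makes it easy to check the logic without starting a server.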

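Batch Query

Real-Time Crawler also supports submitting multiple keywords, up to 1,000 keywords per batch. The following endpoint submits multiple keywords to the extraction queue.

POST https://data.oxylabs.io/v1/queries/batch

You need to post the query parameters as data in the JSON body.

The system handles every keyword as a separate request. If you provided a callback URL, you will get a separate call for each keyword. Otherwise, our initial response will contain the job ids for all keywords. For example, if you sent 50 keywords, we will return 50 unique job ids.

Important! query is the only parameter that can have multiple values. All other parameters are the same for every query in the batch.

curl --user user:pass1 'https://data.oxylabs.io/v1/queries/batch' -H 'Content-Type: application/json' -d '@keywords.json'

keywords.json content: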

{  
   "query":[  
      "adidas",
      "nike",
      "reebok"
   ],
   "source": "google_search",
   "domain": "com",
   "callback_url": "https://your.callback.url"
}
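When a keyword list is longer than the 1,000-keyword limit, it has to be split across several batch requests. The following Python sketch shows one way to do that; batch_payloads is a hypothetical helper, and the payload fields mirror keywords.json above.

```python
from typing import Iterator, List, Optional

MAX_BATCH_SIZE = 1000  # per-batch keyword limit stated above


def batch_payloads(
    keywords: List[str],
    source: str,
    domain: str,
    callback_url: Optional[str] = None,
    batch_size: int = MAX_BATCH_SIZE,
) -> Iterator[dict]:
    """Yield one batch payload per chunk of at most batch_size keywords."""
    if not 1 <= batch_size <= MAX_BATCH_SIZE:
        raise ValueError("batch_size must be between 1 and 1000")
    for start in range(0, len(keywords), batch_size):
        # Only "query" holds multiple values; the other parameters are shared by the batch.
        payload = {"query": keywords[start:start + batch_size], "source": source, "domain": domain}
        if callback_url:
            payload["callback_url"] = callback_url
        yield payload
```

Each yielded dictionary can be serialized with json.dumps and posted to the batch endpoint as a separate request.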

The API responds with the query information in JSON format, printed in the response body, similar to this:

{
  "queries": [
    {
      "callback_url": "https://your.callback.url",
      {...}
      "created_at": "2019-10-01 00:00:01",
      "domain": "com",
      "id": "12345678900987654321",
      {...}
      "query": "adidas",
      "source": "google_search",
      {...}
          "rel": "results",
          "href": "http://data.oxylabs.io/v1/queries/12345678900987654321/results",
          "method": "GET"
        }
      ]
    },
    {
      "callback_url": "https://your.callback.url",
      {...}
      "created_at": "2019-10-01 00:00:01",
      "domain": "com",
      "id": "12345678901234567890",
      {...}
      "query": "nike",
      "source": "google_search",
      {...}
          "rel": "results",
          "href": "http://data.oxylabs.io/v1/queries/12345678901234567890/results",
          "method": "GET"
        }
      ]
    },
    {
      "callback_url": "https://your.callback.url",
      {...}
      "created_at": "2019-10-01 00:00:01",
      "domain": "com",
      "id": "01234567899876543210",
      {...}
      "query": "reebok",
      "source": "google_search",
      {...}
          "rel": "results",
          "href": "http://data.oxylabs.io/v1/queries/01234567899876543210/results",
          "method": "GET"
        }
      ]
    }
  ]
}

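Get Notifier IP Address List

You may want to whitelist the IPs that send you callback messages, or get the list of these IPs for other purposes. This can be done by sending a GET request to this endpoint: https://data.oxylabs.io/v1/info/callbacker_ips.

curl --user user:pass1 'https://data.oxylabs.io/v1/info/callbacker_ips'

The API will return the list of IPs that make callback requests to your system: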

{
    "ips":[
        "x.x.x.x",
        "y.y.y.y"
    ]
}
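A callback service could use this list to reject requests from unknown senders. The Python sketch below is one possible check, with a hypothetical function name; the addresses in the usage lines are reserved documentation addresses, not real notifier IPs.

```python
import ipaddress


def is_known_callback_ip(remote_addr: str, ips_response: dict) -> bool:
    """Return True if remote_addr appears in the "ips" list returned by the endpoint."""
    allowed = {ipaddress.ip_address(ip) for ip in ips_response.get("ips", [])}
    return ipaddress.ip_address(remote_addr) in allowed


# Placeholder response; real values come from the callbacker_ips endpoint.
sample_response = {"ips": ["192.0.2.10", "198.51.100.7"]}
accepted = is_known_callback_ip("192.0.2.10", sample_response)
rejected = is_known_callback_ip("203.0.113.5", sample_response)
```

Parsing the addresses with ipaddress instead of comparing strings avoids mismatches caused by formatting differences.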

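Upload to Storage

By default, RTC job results are stored in our databases. This means you need to query our results endpoint and retrieve the content yourself. The custom storage feature lets you store results in your own cloud storage. The advantage of this feature is that you don't need to make extra requests to fetch results: everything goes directly to your storage bucket.

We support Amazon S3 and Google Cloud Storage. If you would like to use a different type of storage, please contact your account manager to discuss the feature delivery timeline.

Amazon S3

To get your job results uploaded to your Amazon S3 bucket, set up access permissions for our service. To do that, go to https://s3.console.aws.amazon.com/ > S3 > Storage > Bucket Name (if you don't have one, create a new one) > Permissions > Bucket Policy

You can find the bucket policy JSON here or in the code sample at the end of this section. Don't forget to change the bucket name under YOUR_BUCKET_NAME. This policy allows us to write to your bucket, give you access to the uploaded files, and learn the bucket's location.

Google Cloud Storage

To get your job results uploaded to your Google Cloud Storage bucket, set up special permissions for our service. To do that, use the storage.objects.create permission and assign it to the Oxylabs service account email [email protected].

Usage

To use this feature, specify two additional parameters in your request (storage_type and storage_url). Learn more here.

The upload path looks like this: YOUR_BUCKET_NAME/job_ID.json. You can find the job ID in the response body you receive from us after submitting a request. In this example, the job ID is 12345678900987654321.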

{
    "Version":"2012-10-17",
    "Id":"Policy1577442634787",
    "Statement":[
        {
            "Sid":"Stmt1577442633719",
            "Effect":"Allow",
            "Principal":{
                "AWS":"arn:aws:iam::324311890426:user/oxylabs.s3.uploader"
            },
            "Action":"s3:GetBucketLocation",
            "Resource":"arn:aws:s3:::YOUR_BUCKET_NAME"
        },
        {
            "Sid":"Stmt1577442633719",
            "Effect":"Allow",
            "Principal":{
                "AWS":"arn:aws:iam::324311890426:user/oxylabs.s3.uploader"
            },
            "Action":[
                "s3:PutObject",
                "s3:PutObjectAcl"
            ],
            "Resource":"arn:aws:s3:::YOUR_BUCKET_NAME/*"
        }
    ]
}
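To avoid editing the bucket name by hand, the policy can be generated. The Python sketch below is a hedged illustration: build_bucket_policy and upload_path are hypothetical helpers, the uploader ARN and actions are copied from the policy above, and the optional Id and Sid fields are left out for brevity.

```python
import json

# Principal taken from the bucket policy above.
UPLOADER_ARN = "arn:aws:iam::324311890426:user/oxylabs.s3.uploader"


def build_bucket_policy(bucket_name: str) -> dict:
    """Return the bucket policy as a dict with bucket_name substituted in both resources."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    principal = {"AWS": UPLOADER_ARN}
    return {
        "Version": "2012-10-17",
        "Statement": [
            # Lets the uploader learn the bucket's location.
            {"Effect": "Allow", "Principal": principal,
             "Action": "s3:GetBucketLocation", "Resource": bucket_arn},
            # Lets the uploader write objects and set their ACLs.
            {"Effect": "Allow", "Principal": principal,
             "Action": ["s3:PutObject", "s3:PutObjectAcl"], "Resource": f"{bucket_arn}/*"},
        ],
    }


def upload_path(bucket_name: str, job_id: str) -> str:
    """Path where a job result is uploaded: YOUR_BUCKET_NAME/job_ID.json."""
    return f"{bucket_name}/{job_id}.json"


policy_json = json.dumps(build_bucket_policy("my-results-bucket"), indent=4)
```

The bucket name my-results-bucket is a placeholder; the resulting policy_json string can be pasted into the Bucket Policy editor.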

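Realtime

Data is submitted the same way as with Push-Pull, but with Realtime we return the content over the open connection. You send us a query, the connection remains open, we retrieve the content and send it to you. The endpoint that handles this is:

POST https://realtime.oxylabs.io/v1/queries

The timeout limit for open connections is 150 seconds, so in rare cases of heavy load we may not be able to deliver the data to you.

You need to post the query parameters as data in the JSON body. See the example for details.

curl --user user:pass1 'https://realtime.oxylabs.io/v1/queries' -H "Content-Type: application/json" \
 -d '{"source": "google_search", "domain": "com", "query": "adidas"}'

Example response body that will be returned over the open connection: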

{
  "results": [
    {
      "content": "
      CONTENT
      ",
      "created_at": "2019-10-01 00:00:01",
      "updated_at": "2019-10-01 00:00:15",
      "id": null,
      "page": 1,
      "url": "https://www.google.com/search?q=adidas&hl=en&gl=US",
      "job_id": "12345678900987654321",
      "status_code": 200
    }
  ]
}

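SuperAPI

If you have ever used regular proxies for data scraping, integrating the SuperAPI delivery method will be a breeze. Simply use our entry node as a proxy, authorize with your Real-Time Crawler credentials, and ignore certificates. In cURL, this is -k or --insecure. Your data will reach you over the open connection.

GET realtime.oxylabs.io:60000

SuperAPI supports only a small number of parameters, because it only works with the Direct data source, where the full URL is provided. These parameters should be sent as headers. This is the list of accepted parameters:

X-OxySERPs-User-Agent-Type: There is no way to indicate a specific User-Agent, but you can let us know which browser and platform to use. A list of supported User-Agents can be found here.
X-OxySERPs-Geo-Location: In some cases you may need to indicate the geographical location that the result should be adapted for. This parameter corresponds to geo_location. Read about our suggested geo_location parameter structures here.

If you need any assistance setting up SuperAPI, please contact us at [email protected].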

curl -k -x realtime.oxylabs.io:60000 -U user:pass1 -H "X-OxySERPs-User-Agent-Type: desktop_chrome" -H "X-OxySERPs-Geo-Location: New York,New York,United States" "https://www.google.com/search?q=adidas"
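The pieces of this cURL command can be prepared in Python as follows. This is only a sketch: superapi_headers and superapi_proxy_url are hypothetical helpers, the header names and entry node come from the text above, and the http:// scheme in the proxy URL is an assumption about how an HTTP client would address the entry node.

```python
from typing import Dict, Optional
from urllib.parse import quote

# Entry node from the cURL example above.
SUPERAPI_ENTRY = "realtime.oxylabs.io:60000"


def superapi_headers(user_agent_type: Optional[str] = None,
                     geo_location: Optional[str] = None) -> Dict[str, str]:
    """Build the optional SuperAPI headers; omitted parameters produce no header."""
    headers: Dict[str, str] = {}
    if user_agent_type:
        headers["X-OxySERPs-User-Agent-Type"] = user_agent_type
    if geo_location:
        headers["X-OxySERPs-Geo-Location"] = geo_location
    return headers


def superapi_proxy_url(username: str, password: str) -> str:
    """Proxy URL with percent-encoded credentials (the http:// scheme is an assumption)."""
    return f"http://{quote(username, safe='')}:{quote(password, safe='')}@{SUPERAPI_ENTRY}"


headers = superapi_headers("desktop_chrome", "New York,New York,United States")
proxy = superapi_proxy_url("user", "pass1")
```

An HTTP client would then send a GET request for the target Google URL through this proxy with certificate verification disabled, which is what -k does in cURL.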

Content Type

Real-Time Crawler can return either raw HTML or structured (parsed) JSON. Bear in mind that not all data sources can be returned structured. An icon under each data source in this documentation will indicate whether we are able to parse it, or we can only return raw HTML.

Please see Parsed Data to see which fields we return with each Data Source.


Data Sources

There are multiple ways to retrieve data from Google using Real-Time Crawler. You can give us a full URL via Direct, or specify parameters via purpose-built data sources such as Search, Shopping Product, or Images.

Technically not a content type, but Real-Time Crawler is able to render JavaScript when scraping. This is necessary for some Google pages, such as Flights and Patents. A checkmark under Render JS indicates whether a particular data source can be scraped with JavaScript enabled.

If you are unsure which approach to choose, drop us a line at [email protected] or contact your account manager.


Direct

google source is designed to retrieve content of direct URLs of various Google pages. This means that instead of sending multiple parameters, you can provide us with a direct URL to required Google page. We do not strip any parameters or alter your URLs in any other way.

This data source also supports parsed data (Parsed JSON), as long as the URL submitted is for Google Search (SERP page). If we are unable to confirm this is a SERP page request, a failure message will be returned.

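Query parameters

Parameter | Description | Default Value
source | Data source | google
url | Direct URL (link) to Google page
user_agent_type | Device type and browser. The full list can be found here. | desktop
render | Enables JavaScript rendering. Use it when the target requires JavaScript to load content. Only works via the Push-Pull (a.k.a. Callback) method. There are two available values for this parameter: html (get raw output) and png (get a Base64-encoded screenshot).
callback_url | URL to your callback endpoint
geo_location | The geographical location that the result should be adapted for. Using this parameter correctly is extremely important to get the right data. For more information, read about our suggested geo_location parameter structures here.
parse | true will return parsed data, as long as the URL submitted is for Google Search. See Parsed Data for more information.
storage_type | Storage service provider. We support Amazon S3 and Google Cloud Storage. The storage_type parameter values for these storage providers are, correspondingly, s3 and gcs. The full implementation can be found on the Upload to Storage page. This feature only works via the Push-Pull (Callback) method.
storage_url | Your storage bucket name. Only works via the Push-Pull (Callback) method.
   - required parameter

In this example, the API will retrieve a Google Scholar search for the keyword newton using the Push-Pull method: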

curl --user user:pass1 'https://data.oxylabs.io/v1/queries' -H "Content-Type: application/json" \
 -d '{"source": "google", "url": "https://scholar.google.com/scholar?hl=en&q=newton&btnG=&as_sdt=1%2C5&as_sdtp="}'

Here is the same example in Realtime:

curl --user user:pass1 'https://realtime.oxylabs.io/v1/queries' -H "Content-Type: application/json" \
 -d '{"source": "google", "url": "https://scholar.google.com/scholar?hl=en&q=newton&btnG=&as_sdt=1%2C5&as_sdtp="}'

And via SuperAPI:

curl -k -x realtime.oxylabs.io:60000 -U user:pass1 "https://scholar.google.com/scholar?hl=en&q=newton&btnG=&as_sdt=1%2C5&as_sdtp="

Search

google_search source is designed to retrieve Google Search results (SERP).

Query parameters

Parameter | Description | Default Value
source | Data source | google_search
domain | Domain localization
query | UTF-encoded keyword
start_page | Starting page number | 1
pages | Number of pages to retrieve | 1
limit | Number of results to retrieve in each page | 10
locale | Accept-Language header value. This changes the language of the Google search page web interface (not of the results). For example, if you use domain com and locale de-DE, the results will still be American, but Accept-Language will be set to de-DE,de;q=0.8. This imitates a person in the US searching on the com domain with the browser UI set to German. If you don't use this parameter, we set the Accept-Language header according to the domain (e.g. en-US for com). A list of available Google locales can be found here.
geo_location | The geographical location that the result should be adapted for. Using this parameter correctly is extremely important to get the right data. For more information, read about our suggested geo_location parameter structures here.
user_agent_type | Device type and browser. The full list can be found here. | desktop
render | Enables JavaScript rendering. Use it when the target requires JavaScript to load content. Only works via the Push-Pull (a.k.a. Callback) method. There are two available values for this parameter: html (get raw output) and png (get a Base64-encoded screenshot).
callback_url | URL to your callback endpoint
parse | true will return parsed data. See Parsed Data for more information.
parser_type | Leave blank to get the default layout, or set the value to v2 to make use of the updated Google Search parsed output schema and/or receive the result in CSV format (only works with Google Web Search). See Parsed Data for more information.
context: fpstate | Setting the fpstate value to aig will make Google load more apps. This parameter is only useful if used together with the render parameter.
context: nfpr | true will turn off spelling auto-correction.
context: results_language | Results language. A list of supported Google languages can be found here.
context: tbm | To-be-matched or tbm parameter. Accepted values are: app, blg, bks, dsc, isch, nws, pts, plcs, rcp, lcl
context: tbs | tbs parameter. This parameter is like a container for more obscure Google parameters, such as limiting or sorting results by date, as well as other filters, some of which depend on the tbm parameter (e.g. tbs=app_os:1 is only available with tbm value app). More info here.
storage_type | Storage service provider. We support Amazon S3 and Google Cloud Storage. The storage_type parameter values for these storage providers are, correspondingly, s3 and gcs. The full implementation can be found on the Upload to Storage page. This feature only works via the Push-Pull (Callback) method.
storage_url | Your storage bucket name. Only works via the Push-Pull (Callback) method.
   - required parameter

In this example, the API makes a request to google.nl to retrieve search result pages 11 through 20 for the keyword adidas. The results will be displayed in French, since the results_language parameter is also passed via context. The API will send a POST request to your.callback.url containing the URL for downloading the raw HTML output once the data retrieval job is successfully completed. This is Push-Pull:

curl --user user:pass1 'https://data.oxylabs.io/v1/queries' -H "Content-Type: application/json" \
 -d '{"source": "google_search", "domain": "nl", "query": "adidas", "start_page": 11, "pages": 10, "callback_url": "https://your.callback.url", "context": [{"key": "results_language", "value": "fr"}]}'

Here is the same example in Realtime:

curl --user user:pass1 'https://realtime.oxylabs.io/v1/queries' -H "Content-Type: application/json" \
 -d '{"source": "google_search", "domain": "nl", "query": "adidas", "start_page": 11, "pages": 10, "callback_url": "https://your.callback.url", "context": [{"key": "results_language", "value": "fr"}]}'
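Because context is sent as a list of key/value objects, building the payload in code helps keep the JSON well formed. The Python sketch below is illustrative: google_search_payload and covered_pages are hypothetical helpers, and the field names follow the cURL examples above.

```python
import json
from typing import Any, Dict, Optional


def google_search_payload(query: str, domain: str, start_page: int = 1, pages: int = 1,
                          context: Optional[Dict[str, Any]] = None, **extra: Any) -> dict:
    """Build a google_search payload; context is a plain dict turned into key/value objects."""
    payload: Dict[str, Any] = {
        "source": "google_search", "domain": domain, "query": query,
        "start_page": start_page, "pages": pages,
    }
    if context:
        payload["context"] = [{"key": key, "value": value} for key, value in context.items()]
    payload.update(extra)  # e.g. callback_url, geo_location
    return payload


def covered_pages(start_page: int, pages: int) -> range:
    """Page numbers retrieved when starting at start_page and fetching pages pages."""
    return range(start_page, start_page + pages)


payload = google_search_payload(
    "adidas", "nl", start_page=11, pages=10,
    context={"results_language": "fr"},
    callback_url="https://your.callback.url",
)
body = json.dumps(payload)  # string to send as the request body
```

With start_page 11 and pages 10, covered_pages returns pages 11 through 20, matching the description of the example.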

Ads

google_ads source is optimized to retrieve Google Search results pages (SERP) with paid ads. This source returns only 10 results per page, ensuring the highest chances of paid results showing up. Other than that, it supports the same parameters as regular Search.

Query parameters

Parameter | Description | Default Value
source | Data source | google_ads
domain | Domain localization
query | UTF-encoded keyword
start_page | Starting page number | 1
pages | Number of pages to retrieve | 1
locale | Accept-Language header value. This changes the language of the Google search page web interface (not of the results). For example, if you use domain com and locale de-DE, the results will still be American, but Accept-Language will be set to de-DE,de;q=0.8. This imitates a person in the US searching on the com domain with the browser UI set to German. If you don't use this parameter, we set the Accept-Language header according to the domain (e.g. en-US for com). A list of available Google locales can be found here.
geo_location | The geographical location that the result should be adapted for. Using this parameter correctly is extremely important to get the right data. For more information, read about our suggested geo_location parameter structures here.
user_agent_type | Device type and browser. The full list can be found here. | desktop
callback_url | URL to your callback endpoint
parse | true will return parsed data. See Parsed Data for more information.
context: nfpr | true will turn off spelling auto-correction.
context: results_language | Results language. A list of supported Google languages can be found here.
context: tbm | To-be-matched or tbm parameter. Accepted values are: app, blg, bks, dsc, isch, nws, pts, plcs, rcp, lcl
context: tbs | tbs parameter. This parameter is like a container for more obscure Google parameters, such as limiting or sorting results by date, as well as other filters, some of which depend on the tbm parameter (e.g. tbs=app_os:1 is only available with tbm value app). More info here.
storage_type | Storage service provider. We support Amazon S3 and Google Cloud Storage. The storage_type parameter values for these storage providers are, correspondingly, s3 and gcs. The full implementation can be found on the Upload to Storage page. This feature only works via the Push-Pull (Callback) method.
storage_url | Your storage bucket name. Only works via the Push-Pull (Callback) method.
   - required parameter

In this example, the API makes a request to google.nl to retrieve search results for the keyword adidas. The API will send a POST request to your.callback.url containing the URL for downloading the raw HTML output once the data retrieval job is successfully completed. This is Push-Pull:

curl --user user:pass1 'https://data.oxylabs.io/v1/queries' -H "Content-Type: application/json" \
 -d '{"source": "google_ads", "domain": "nl", "query": "adidas", "callback_url": "https://your.callback.url"}'

Here is the same example in Realtime:

curl --user user:pass1 'https://realtime.oxylabs.io/v1/queries' -H "Content-Type: application/json" \
 -d '{"source": "google_ads", "domain": "nl", "query": "adidas"}'

Hotels

google_hotels data source is designed to retrieve Google Hotel search results.

Query parameters

Parameter | Description | Default Value
source | Data source | google_hotels
domain | Domain localization
query | UTF-encoded keyword
start_page | Starting page number | 1
pages | Number of pages to retrieve | 1
limit | Number of results to retrieve in each page | 10
locale | Accept-Language header value. This changes the language of the Google search page web interface (not of the results). For example, if you use domain com and locale de-DE, the results will still be American, but Accept-Language will be set to de-DE,de;q=0.8. This imitates a person in the US searching on the com domain with the browser UI set to German. If you don't use this parameter, we set the Accept-Language header according to the domain (e.g. en-US for com). A list of available Google locales can be found here.
results_language | Results language. A list of supported Google languages can be found here.
geo_location | The geographical location that the result should be adapted for. Using this parameter correctly is extremely important to get the right data. For more information, read about our suggested geo_location parameter structures here.
user_agent_type | Device type and browser. The full list can be found here. | desktop
render | Enables JavaScript rendering. Use it when the target requires JavaScript to load content. Only works via the Push-Pull (a.k.a. Callback) method. There are two available values for this parameter: html (get raw output) and png (get a Base64-encoded screenshot).
callback_url | URL to your callback endpoint
context: nfpr | true will turn off spelling auto-correction.
context: hotel_occupancy | Number of guests | 2
context: hotel_dates | Dates of the hotel stay, from and to. Example: 2017-07-12,2017-07-13
storage_type | Storage service provider. We support Amazon S3 and Google Cloud Storage. The storage_type parameter values for these storage providers are, correspondingly, s3 and gcs. The full implementation can be found on the Upload to Storage page. This feature only works via the Push-Pull (Callback) method.
storage_url | Your storage bucket name. Only works via the Push-Pull (Callback) method.
   - required parameter

Please note that with Google Hotels you always need to send a keyword containing the word 'hotels', for example 'hotels in Los Angeles', 'hotels in Paris, France', etc. Both 'hotel' and 'hotels' work. Google also supports local languages, so you can send 'Hotelli Helsingissä' for hotels in Helsinki or 'viešbučiai Vilnius' for hotels in Vilnius.

In this example, the API will retrieve the first 3 pages of hotel availability for 1 guest between 2019-10-01 and 2019-10-10 for hotels in Paris from google.com. This is the Push-Pull method.

curl --user user:pass1 'https://data.oxylabs.io/v1/queries' -H "Content-Type: application/json" \
 -d '{"source": "google_hotels", "domain": "com", "pages": 3, "query": "hotels in Paris", "context": [{"key": "hotel_occupancy", "value": 1}, {"key": "hotel_dates", "value": "2019-10-01,2019-10-10"}]}'

This is in Realtime:

curl --user user:pass1 'https://realtime.oxylabs.io/v1/queries' -H "Content-Type: application/json" \
 -d '{"source": "google_hotels", "domain": "com", "pages": 3, "query": "hotels in Paris", "context": [{"key": "hotel_occupancy", "value": 1}, {"key": "hotel_dates", "value": "2019-10-01,2019-10-10"}]}'
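The hotel_dates value is a pair of comma-separated dates, which is easy to get wrong when typed by hand. The Python sketch below builds the context entries from date objects; hotel_context is a hypothetical helper, and the validation rules (check-out after check-in, at least one guest) are assumptions rather than documented API constraints.

```python
from datetime import date
from typing import List


def hotel_context(check_in: date, check_out: date, occupancy: int = 2) -> List[dict]:
    """Build the hotel_occupancy and hotel_dates context entries used in the examples above."""
    if check_out <= check_in:
        raise ValueError("check_out must be after check_in")
    if occupancy < 1:
        raise ValueError("occupancy must be at least 1")
    return [
        {"key": "hotel_occupancy", "value": occupancy},
        # Dates are formatted as YYYY-MM-DD,YYYY-MM-DD.
        {"key": "hotel_dates", "value": f"{check_in.isoformat()},{check_out.isoformat()}"},
    ]


context = hotel_context(date(2019, 10, 1), date(2019, 10, 10), occupancy=1)
payload = {"source": "google_hotels", "domain": "com", "pages": 3,
           "query": "hotels in Paris", "context": context}
```

The resulting payload matches the JSON body of the cURL examples above.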

Travel: Hotels

google_travel_hotels data source is designed to retrieve Google Travel service's hotel search results.

Query parameters

Parameter | Description | Default Value
source | Data source | google_travel_hotels
domain | Domain localization
query | UTF-encoded keyword
start_page | Starting page number | 1
locale | Accept-Language header value. This changes the language of the Google search page web interface (not of the results). For example, if you use domain com and locale de-DE, the results will still be American, but Accept-Language will be set to de-DE,de;q=0.8. This imitates a person in the US searching on the com domain with the browser UI set to German. If you don't use this parameter, we set the Accept-Language header according to the domain (e.g. en-US for com). A list of available Google locales can be found here.
geo_location | The geographical location that the result should be adapted for. Using this parameter correctly is extremely important to get the right data. Please note that this source accepts a limited number of geo_location values. Please check this file to see geo_location values that don't yield accurate results.
user_agent_type | Device type and browser. The full list can be found here. | desktop
render | Enables JavaScript rendering. Use it when the target requires JavaScript to load content. Only works via the Push-Pull (a.k.a. Callback) method. There are two available values for this parameter: html (get raw output) and png (get a Base64-encoded screenshot). Please note that without JavaScript rendering, Google Travel Hotels will not return any useful content.
callback_url | URL to your callback endpoint
context: hotel_occupancy | Number of guests | 2
context: hotel_classes | Filter results by number of hotel stars. You may specify one or more values between 2 and 5. Example: [3,4]
context: hotel_dates | Dates of the hotel stay, from and to. Example: 2017-07-12,2017-07-13
storage_type | Storage service provider. At the moment only Amazon S3 is supported: s3. The full implementation can be found on the Upload to Storage page.
storage_url | Your Amazon S3 bucket name
   - required parameter

Please note that with Google Hotels you always need to send a keyword containing the word 'hotels', for example 'hotels in Los Angeles', 'hotels in Paris, France', etc. Both 'hotel' and 'hotels' work. Google also supports local languages, so you can send 'Hotelli Helsingissä' for hotels in Helsinki or 'viešbučiai Vilnius' for hotels in Vilnius.

In this example, the API will retrieve the 2nd page of hotel availability results for 2 guests between 2020-10-01 and 2020-10-10 for hotels in Paris from google.com. The results will be filtered to show only 2-star and 4-star hotels. This is the Push-Pull method.

curl --user user:pass1 'https://data.oxylabs.io/v1/queries' -H "Content-Type: application/json" \
 -d '{"source": "google_travel_hotels", "domain": "com", "start_page": 2, "query": "hotels in Paris", "callback_url": "https://your.callback.url", "context": [{"key": "hotel_occupancy", "value": 2}, {"key": "hotel_dates", "value": "2020-10-01,2020-10-10"}, {"key": "hotel_classes", "value": [2,4]}]}'

This is in Realtime:

curl --user user:pass1 'https://realtime.oxylabs.io/v1/queries' -H "Content-Type: application/json" \
 -d '{"source": "google_travel_hotels", "domain": "com", "start_page": 2, "query": "hotels in Paris", "context": [{"key": "hotel_occupancy", "value": 2}, {"key": "hotel_dates", "value": "2020-10-01,2020-10-10"}, {"key": "hotel_classes", "value": [2,4]}]}'

Shopping Search

google_shopping_search source is designed to retrieve Google Shopping search results.

POST https://data.oxylabs.io/v1/queries

Query parameters

Parameter | Description | Default Value
source | Data source | google_shopping_search
domain | Domain localization
query | UTF-encoded keyword
start_page | Starting page number | 1
pages | Number of pages to retrieve | 1
locale | Accept-Language header value. This changes the language of the Google search page web interface (not of the results). For example, if you use domain com and locale de-DE, the results will still be American, but Accept-Language will be set to de-DE,de;q=0.8. This imitates a person in the US searching on the com domain with the browser UI set to German. If you don't use this parameter, we set the Accept-Language header according to the domain (e.g. en-US for com). A list of available Google locales can be found here.
results_language | Results language. A list of supported Google languages can be found here.
geo_location | The geographical location that the result should be adapted for. Using this parameter correctly is extremely important to get the right data. For more information, read about our suggested geo_location parameter structures here.
user_agent_type | Device type and browser. The full list can be found here. | desktop
render | Enables JavaScript rendering. Use it when the target requires JavaScript to load content. Only works via the Push-Pull (a.k.a. Callback) method. There are two available values for this parameter: html (get raw output) and png (get a Base64-encoded screenshot).
callback_url | URL to your callback endpoint
parse | true will return parsed data. See Parsed Data for more information.
context: nfpr | true will turn off spelling auto-correction.
context: sort_by | Sort the product list by the given criteria: r applies default Google sorting, rv sorts by review score, p by price ascending, pd by price descending. | r
context: min_price | Minimum price of products to filter
context: max_price | Maximum price of products to filter
storage_type | Storage service provider. We support Amazon S3 and Google Cloud Storage. The storage_type parameter values for these storage providers are, correspondingly, s3 and gcs. The full implementation can be found on the Upload to Storage page. This feature only works via the Push-Pull (Callback) method.
storage_url | Your storage bucket name. Only works via the Push-Pull (Callback) method.
   - required parameter

The API will download the first 4 pages of Google Shopping search results for the keyword adidas, sorted by descending price, with a minimum price of $20. This is how it's done in Push-Pull:

curl --user user:pass1 'https://data.oxylabs.io/v1/queries' -H "Content-Type: application/json" \
 -d '{"source": "google_shopping_search", "domain": "com", "query": "adidas", "pages": 4, "context": [{"key": "sort_by", "value": "pd"}, {"key": "min_price", "value": 20}]}'

Here is the same example in Realtime:

curl --user user:pass1 'https://realtime.oxylabs.io/v1/queries' -H "Content-Type: application/json" \
 -d '{"source": "google_shopping_search", "domain": "com", "query": "adidas", "pages": 4, "context": [{"key": "sort_by", "value": "pd"}, {"key": "min_price", "value": 20}]}'

Shopping Product

google_shopping_product source is designed to retrieve Google Shopping product page for specified product.

Query parameters

Parameter | Description | Default Value
source | Data source | google_shopping_product
domain | Domain localization
query | UTF-encoded product code
start_page | Starting page number | 1
pages | Number of pages to retrieve | 1
locale | Accept-Language header value. This changes the language of the Google search page web interface (not of the results). For example, if you use domain com and locale de-DE, the results will still be American, but Accept-Language will be set to de-DE,de;q=0.8. This imitates a person in the US searching on the com domain with the browser UI set to German. If you don't use this parameter, we set the Accept-Language header according to the domain (e.g. en-US for com). A list of available Google locales can be found here.
results_language | Results language. A list of supported Google languages can be found here.
geo_location | The geographical location that the result should be adapted for. Using this parameter correctly is extremely important to get the right data. For more information, read about our suggested geo_location parameter structures here.
user_agent_type | Device type and browser. The full list can be found here. | desktop
render | Enables JavaScript rendering. Use it when the target requires JavaScript to load content. Only works via the Push-Pull (a.k.a. Callback) method. There are two available values for this parameter: html (get raw output) and png (get a Base64-encoded screenshot).
callback_url | URL to your callback endpoint
parse | true will return parsed data. See Parsed Data for more information.
storage_type | Storage service provider. We support Amazon S3 and Google Cloud Storage. The storage_type parameter values for these storage providers are, correspondingly, s3 and gcs. The full implementation can be found on the Upload to Storage page. This feature only works via the Push-Pull (Callback) method.
storage_url | Your storage bucket name. Only works via the Push-Pull (Callback) method.
   - required parameter

Here the API will download product page for product ID 5007040952399054528 from Google Shopping on google.com. It will also get first 4 pages with pricing information. This is how it looks in Push-Pull:

curl --user user:pass1 'https://data.oxylabs.io/v1/queries' -H "Content-Type: application/json" \
 -d '{"source": "google_shopping_product", "domain": "com", "query": "5007040952399054528"}'

The same in Realtime:

curl --user user:pass1 'https://realtime.oxylabs.io/v1/queries' -H "Content-Type: application/json" \
 -d '{"source": "google_shopping_product", "domain": "com", "query": "5007040952399054528"}'

Shopping Product Pricing

google_shopping_pricing source is designed to retrieve Google Shopping product pricing page for specified product.

Query parameters

Parameter | Description | Default Value
source | Data source | google_shopping_pricing
domain | Domain localization
query | UTF-encoded product code
start_page | Starting page number | 1
pages | Number of pages to retrieve | 1
locale | Accept-Language header value. This changes the language of the Google search page web interface (not of the results). For example, if you use domain com and locale de-DE, the results will still be American, but Accept-Language will be set to de-DE,de;q=0.8. This imitates a person in the US searching on the com domain with the browser UI set to German. If you don't use this parameter, we set the Accept-Language header according to the domain (e.g. en-US for com). A list of available Google locales can be found here.
results_language | Results language. A list of supported Google languages can be found here.
geo_location | The geographical location that the result should be adapted for. Using this parameter correctly is extremely important to get the right data. For more information, read about our suggested geo_location parameter structures here.
user_agent_type | Device type and browser. The full list can be found here. | desktop
render | Enables JavaScript rendering. Use it when the target requires JavaScript to load content. Only works via the Push-Pull (a.k.a. Callback) method. There are two available values for this parameter: html (get raw output) and png (get a Base64-encoded screenshot).
callback_url | URL to your callback endpoint
parse | true will return parsed data. See Parsed Data for more information.
storage_type | Storage service provider. We support Amazon S3 and Google Cloud Storage. The storage_type parameter values for these storage providers are, correspondingly, s3 and gcs. The full implementation can be found on the Upload to Storage page. This feature only works via the Push-Pull (Callback) method.
storage_url | Your storage bucket name. Only works via the Push-Pull (Callback) method.
   - required parameter

Here the API will download the product pricing page for product ID 5007040952399054528 from Google Shopping on google.com. Here is an example in Push-Pull:

curl --user user:pass1 'https://data.oxylabs.io/v1/queries' -H "Content-Type: application/json" \
 -d '{"source": "google_shopping_pricing", "domain": "com", "query": "5007040952399054528"}'

The same in Realtime:

curl --user user:pass1 'https://realtime.oxylabs.io/v1/queries' -H "Content-Type: application/json" \
 -d '{"source": "google_shopping_pricing", "domain": "com", "query": "5007040952399054528"}'
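
A payload builder for this source might look like the sketch below. The function name pricing_payload is hypothetical, and the pages key assumes the usual English name of the "number of pages to retrieve" parameter from the table above; the defaults of 1 come from that table.

```python
import json


def pricing_payload(product_id, domain="com", start_page=1, pages=1):
    """Build a google_shopping_pricing job payload (hypothetical helper)."""
    if start_page < 1 or pages < 1:
        raise ValueError("start_page and pages must be positive integers")
    payload = {
        "source": "google_shopping_pricing",
        "domain": domain,
        "query": str(product_id),
    }
    # Only send paging parameters that differ from the documented default of 1.
    if start_page != 1:
        payload["start_page"] = start_page
    if pages != 1:
        payload["pages"] = pages
    return payload


body = json.dumps(pricing_payload(5007040952399054528, pages=2))
print(body)
```

The resulting string can be passed as the -d argument of either curl call above.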

Images

The google_images source is designed to retrieve the Images search page for images that are similar to the one provided in the query parameter, as well as websites containing those images.

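Query parameters

Parameter | Description | Default value
source | Data source | google_images
domain | Domain localization |
query | URL to image |
start_page | Starting page number | 1
pages | Number of pages to retrieve | 1
locale | Accept-Language header value. This changes the Google search page web interface language (not the results). For example, if you use domain com with the locale parameter de-DE, the results will still be American, but Accept-Language will be set to de-DE,de;q=0.8. This imitates a person in the US searching on the com domain with the browser UI set to German. If you don't use this parameter, we will set the Accept-Language header according to the domain (e.g. en-US for com). The list of available Google locales can be found here. |
geo_location | The geographical location that the result should be adapted for. Using this parameter correctly is extremely important to get the right data. For more information, read about our suggested geo_location parameter structures here. |
user_agent_type | Device type and browser. The full list can be found here. | desktop
render | Enables JavaScript rendering. Use it when the target requires JavaScript to load content. Only works via the Push-Pull (a.k.a. Callback) method. There are two available values for this parameter: html (get raw output) and png (get a Base64-encoded screenshot). |
callback_url | URL to your callback endpoint |
context: nfpr | true will turn off spelling auto-correction. |
context: results_language | Results language. The list of supported Google languages can be found here. |
storage_type | Storage service provider. We support Amazon S3 and Google Cloud Storage. The storage_type parameter values for these storage providers are, correspondingly, s3 and gcs. The full implementation can be found on the Upload to Storage page. This feature only works via the Push-Pull (Callback) method. |
storage_url | Your storage bucket name. Only works via the Push-Pull (Callback) method. |
   - required parameter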

In this example the API will download the image search page of similar images for the image https://newsneakernews-wpengine.netdna-ssl.com/wp-content/uploads/2017/03/adidas-boost-march-25-2017.jpg from google.com. This is the Push-Pull method:

curl --user user:pass1 'https://data.oxylabs.io/v1/queries' -H "Content-Type: application/json" \
 -d '{"source": "google_images", "domain": "com", "query": "https://newsneakernews-wpengine.netdna-ssl.com/wp-content/uploads/2017/03/adidas-boost-march-25-2017.jpg"}'

And this is the same request in Realtime:

curl --user user:pass1 'https://realtime.oxylabs.io/v1/queries' -H "Content-Type: application/json" \
 -d '{"source": "google_images", "domain": "com", "query": "https://www.example.com/img/image.jpg"}'

Suggestions

The google_suggest source is designed to retrieve Google keyword suggestions.

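Query parameters

Parameter | Description | Default value
source | Data source | google_suggest
query | UTF-encoded keyword |
locale | Accept-Language header value. This changes the Google search page web interface language (not the results). For example, if you use domain com with the locale parameter de-DE, the results will still be American, but Accept-Language will be set to de-DE,de;q=0.8. This imitates a person in the US searching on the com domain with the browser UI set to German. If you don't use this parameter, we will set the Accept-Language header according to the domain (e.g. en-US for com). The list of available Google locales can be found here. |
geo_location | The geographical location that the result should be adapted for. Using this parameter correctly is extremely important to get the right data. For more information, read about our suggested geo_location parameter structures here. |
user_agent_type | Device type and browser. The full list can be found here. | desktop
render | Enables JavaScript rendering. Use it when the target requires JavaScript to load content. Only works via the Push-Pull (a.k.a. Callback) method. There are two available values for this parameter: html (get raw output) and png (get a Base64-encoded screenshot). |
callback_url | URL to your callback endpoint |
storage_type | Storage service provider. We support Amazon S3 and Google Cloud Storage. The storage_type parameter values for these storage providers are, correspondingly, s3 and gcs. The full implementation can be found on the Upload to Storage page. This feature only works via the Push-Pull (Callback) method. |
storage_url | Your storage bucket name. Only works via the Push-Pull (Callback) method. |
   - required parameter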

The API makes a request to the Google Suggestions page to retrieve suggestions for the keyword adidas. Once the task is finished, the API will post a JSON payload to your.callback.url containing the URL to download the result. Here is an example with Push-Pull:

curl --user user:pass1 'https://data.oxylabs.io/v1/queries' -H "Content-Type: application/json" \
 -d '{"source": "google_suggest", "query": "adidas", "callback_url": "https://your.callback.url"}'

The same request with Realtime:

curl --user user:pass1 'https://realtime.oxylabs.io/v1/queries' -H "Content-Type: application/json" \
 -d '{"source": "google_suggest", "query": "adidas"}'

Keyword Data

The google_msv data source retrieves Google keyword data for the specified keywords, as well as suggested keywords (unless ideas is set to false in context). Keywords are passed in the query parameter as a single string, separated by commas. Commas within a keyword are not supported, so the keyword "Water Bottle 5,0L" will actually be interpreted as 2 keywords: "Water Bottle 5" and "0L". See the sample output below for more details.
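
The comma rule is easy to trip over, so here is a minimal sketch of how a client could mirror it. Both helper names are hypothetical, and trimming whitespace around each keyword is an assumption, not documented behaviour.

```python
def split_keywords(query):
    """Split a google_msv query string on commas, as the text describes."""
    # Whitespace trimming is an assumption made for this sketch.
    return [part.strip() for part in query.split(",") if part.strip()]


def join_keywords(keywords):
    """Join keywords into one query string, rejecting embedded commas."""
    for keyword in keywords:
        if "," in keyword:
            raise ValueError(f"commas are not supported inside a keyword: {keyword!r}")
    return ",".join(keywords)


print(split_keywords("Water Bottle 5,0L"))
print(join_keywords(["meilleur restaurant", "bistro"]))
```

Validating on the client side avoids silently paying for keywords such as "0L" that were never intended.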

Query parameters

Parameter | Description | Default value
source | Data source | google_msv
query | UTF-encoded keywords, separated by commas |
geo_location | The geographical location that the result should be adapted for. Using this parameter correctly is extremely important to get the right data. For more information, read about our suggested geo_location parameter structures here. |
context: language | Language, for example english or french. No parameter or an empty value will return results for all languages. |
context: currency | 3-symbol currency code | EUR
context: ideas | If true, returns keyword ideas; false returns data only for the requested keywords. | true
context: ideas_limit | When fetching ideas, limits the number of idea keywords retrieved to the provided limit, rounded up to the nearest multiple of 50 (e.g. 20 -> 50, 123 -> 150). 0 means no limit. | 0
context: min_amsv | When fetching ideas, filters out idea keywords that have a lower average monthly search volume than the provided number. 0 means no filter. | 0
context: max_amsv | When fetching ideas, filters out idea keywords that have a higher average monthly search volume than the provided number. 0 means no filter. | 0
context: category | When fetching ideas, filters out idea keywords that do not fall into the provided category. Available categories in . | null
storage_type | Storage service provider. We support Amazon S3 and Google Cloud Storage. The storage_type parameter values for these storage providers are, correspondingly, s3 and gcs. The full implementation can be found on the Upload to Storage page. This feature only works via the Push-Pull (Callback) method. |
storage_url | Your storage bucket name. Only works via the Push-Pull (Callback) method. |
   - required parameter

In this example the API will retrieve keyword data for meilleur restaurant and all suggested keywords. The keyword language is french, the geo location is Paris,Ile-de-France,France and the currency is EUR.

curl --user user:pass1 'https://data.oxylabs.io/v1/queries' -H "Content-Type: application/json" \
 -d '{"source": "google_msv", "query": "meilleur restaurant", "geo_location": "Paris,Ile-de-France,France", "context": [{"key": "language", "value": "french"},{"key": "currency", "value": "EUR"}, {"key": "ideas", "value": true}]}'

# OR if you don't want ideas:

curl --user user:pass1 'https://data.oxylabs.io/v1/queries' -H "Content-Type: application/json" \
 -d '{"source": "google_msv", "query": "meilleur restaurant", "geo_location": "Paris,Ile-de-France,France", "context": [{"key": "language", "value": "french"},{"key": "currency", "value": "EUR"}, {"key": "ideas", "value": false}]}'
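
The context array in these requests is a list of key/value objects. The sketch below shows one way to generate it and to preview the documented ideas_limit rounding (up to the nearest multiple of 50). Both function names are hypothetical, and the rounding is only a local preview of what the text says the service does.

```python
import json


def build_context(**options):
    """Turn keyword arguments into the [{"key": ..., "value": ...}] list."""
    return [{"key": key, "value": value} for key, value in options.items()]


def effective_ideas_limit(limit):
    """Round up to the nearest multiple of 50; 0 stays 0 (no limit)."""
    if limit < 0:
        raise ValueError("ideas_limit cannot be negative")
    return -(-limit // 50) * 50  # integer ceiling division


payload = {
    "source": "google_msv",
    "query": "meilleur restaurant",
    "geo_location": "Paris,Ile-de-France,France",
    "context": build_context(language="french", currency="EUR", ideas=True),
}
print(json.dumps(payload))
print(effective_ideas_limit(20), effective_ideas_limit(123))
```

Because json.dumps writes Python's True as true, the output matches the JSON body of the first curl call.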

Sample output (historicalSearchVolume entries and ideas entries cut):

{
    "results": [
    {
        "content":
        {
            "ideas": [
            {
                "cpc": 4.712038,
                "keyword": "meilleur restaurant a paris",
                "currency": "EUR",
                "competition": 0.3385383889238515,
                "averageSearchVolume": 1900,
                "historicalSearchVolume": [
                {
                    "date": "201803",
                    "searchVolume": 1600
                },
                {
                    "date": "201802",
                    "searchVolume": 1900
                }]
            }],
            "seeds": [
            {
                "cpc": 4.05351,
                "keyword": "meilleur restaurant",
                "currency": "EUR",
                "competition": 0.3385341239238515,
                "averageSearchVolume": 2900,
                "historicalSearchVolume": [
                {
                    "date": "201803",
                    "searchVolume": 3600
                },
                {
                    "date": "201802",
                    "searchVolume": 2900
                }]
            }]
        }
    }]
}
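
To show how this structure can be consumed, the sketch below walks a trimmed copy of the sample output and lists each seed and idea keyword with its average search volume. The function name keyword_rows is hypothetical; the field names are taken from the sample.

```python
import json

# Trimmed copy of the sample output above (historicalSearchVolume omitted).
SAMPLE = json.loads("""
{
  "results": [{
    "content": {
      "ideas": [{"keyword": "meilleur restaurant a paris", "cpc": 4.712038,
                 "currency": "EUR", "averageSearchVolume": 1900}],
      "seeds": [{"keyword": "meilleur restaurant", "cpc": 4.05351,
                 "currency": "EUR", "averageSearchVolume": 2900}]
    }
  }]
}
""")


def keyword_rows(response):
    """Yield (group, keyword, average volume, cpc) for seeds, then ideas."""
    for result in response["results"]:
        content = result["content"]
        for group in ("seeds", "ideas"):
            for entry in content.get(group, []):
                yield group, entry["keyword"], entry["averageSearchVolume"], entry["cpc"]


rows = list(keyword_rows(SAMPLE))
for row in rows:
    print(row)
```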

Parsed data

The Google Web Search (SERP) page is the only one that is extensively supported in parsed data delivery. Below you can find which particular SERP page fields we parse. Structured data is available with Search (all the time) and Direct (as long as a SERP page URL is submitted).

Google Web Search ("source": "google_search") supports CSV output. To access it, include these parameters in your Google Web Search job: {"source": "google_search", "parse": true, "parser_type": "v2"}. The result retrieval URL for a CSV job is structured like this: http://data.oxylabs.io/v1/queries/{job_id}/results/normalized?format=csv.
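
As an illustration of the CSV flow, the sketch below builds the job payload and fills the job ID into the retrieval URL pattern quoted above. The helper names and the job ID value are made up for the example.

```python
import json

# Base of the result retrieval URL quoted in the text above.
RESULTS_BASE = "http://data.oxylabs.io/v1/queries"


def csv_job_payload(query, domain="com"):
    """Payload for a Google Web Search job whose results can be read as CSV."""
    return {
        "source": "google_search",
        "domain": domain,
        "query": query,
        "parse": True,
        "parser_type": "v2",
    }


def csv_results_url(job_id):
    """Fill the job ID into the documented CSV retrieval URL pattern."""
    return f"{RESULTS_BASE}/{job_id}/results/normalized?format=csv"


print(json.dumps(csv_job_payload("adidas")))
print(csv_results_url("1234567890"))  # hypothetical job ID
```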


Search

Organic & Paid

"results": {
  "paid": [
    {
      "pos": 1,
      "url": "https://www.adidas.com/us",
      "desc": "New York · 10 locations nearby",
      "title": "adidas.com | adidas® Official Site | Official adidas® Online Store‎",
      "url_shown": "www.adidas.com/Official/Site",
      "pos_overall": 1
    }
  ],
  "organic": [
    {
      "pos": 1,
      "url": "https://www.adidas.com/us",
      "desc": "Welcome to adidas Shop for adidas shoes, clothing and view new collections for adidas Originals, running, football, training and much more.",
      "title": "adidas Official Website | adidas US",
      "url_shown": "https://www.adidas.com › ...",
      "pos_overall": 2
    },
    {
      "pos": 2,
      "url": "https://www.mena.adidas.com/",
      "desc": "Browse for adidas shoes, clothing and collections, adidas Originals, Running, Football, Training and more on the official adidas website.",
      "title": "adidas Official Website | adidas",
      "url_shown": "https://www.mena.adidas.com",
      "pos_overall": 6
    },
    {
      "pos": 3,
      "url": "https://www.adidas-group.com/",
      "desc": "adidas AG Supervisory Board announces candidates as shareholder ... adidas celebrates its 70th anniversary and the opening of the Arena building. August 9 ...",
      "title": "adidas - Home",
      "url_shown": "https://www.adidas-group.com",
      "pos_overall": 7
    },
    {
      "pos": 4,
      "url": "https://www.nycgo.com/shopping/the-adidas-store",
      "desc": "You don't so much shop in this flagship Adidas store as you experience it. With an interior modeled on a high school stadium, this four-story Midtown outlet—the  ...",
      "title": "The Adidas Store (Midtown) | NYCgo - NYCgo.com",
      "url_shown": "https://www.nycgo.com › shopping › the-adidas-store",
      "pos_overall": 8
    },
    {
      "pos": 5,
      "url": "https://www.yelp.com/search?find_desc=adidas+store&find_loc=Manhattan%2C+NY",
      "desc": "Reviews on Adidas Store in Manhattan, NY - Adidas, Adidas Originals New York SoHo, adidas Sport Performance, Upper 90 Soccer - Manhattan, Nike Soho, ...",
      "title": "Adidas Store Manhattan, NY - Last Updated August 2019 - Yelp",
      "url_shown": "https://www.yelp.com › search › find_desc=adidas+store",
      "pos_overall": 9
    },
    {
      "pos": 6,
      "url": "https://en.wikipedia.org/wiki/Adidas",
      "desc": "Adidas AG is a multinational corporation, founded and headquartered in Herzogenaurach, Germany, that designs and manufactures shoes, clothing and ...",
      "title": "Adidas - Wikipedia",
      "url_shown": "https://en.wikipedia.org › wiki › Adidas",
      "pos_overall": 10
    }
  ]
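
Note that pos counts within a section while pos_overall counts across the whole page. A hedged sketch of using both: the hypothetical helper below finds where a given domain appears in the paid and organic lists, using a trimmed copy of the sample above.

```python
from urllib.parse import urlparse

# Trimmed copy of the "results" sample above.
RESULTS = {
    "paid": [
        {"pos": 1, "url": "https://www.adidas.com/us", "pos_overall": 1},
    ],
    "organic": [
        {"pos": 1, "url": "https://www.adidas.com/us", "pos_overall": 2},
        {"pos": 6, "url": "https://en.wikipedia.org/wiki/Adidas", "pos_overall": 10},
    ],
}


def find_domain(results, domain):
    """Return (section, pos, pos_overall) for every result on the domain."""
    hits = []
    for section in ("paid", "organic"):
        for item in results.get(section, []):
            host = urlparse(item["url"]).netloc
            if host == domain or host.endswith("." + domain):
                hits.append((section, item["pos"], item["pos_overall"]))
    return hits


print(find_domain(RESULTS, "adidas.com"))
print(find_domain(RESULTS, "wikipedia.org"))
```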

Product Listing Ads

"pla": [
  {
    "pos": 1,
    "url": "http://www.adidas.com/us/asweego-shoes/F37038.html?cm_mmc=AdieSEM_Feeds-_-GoogleProductAds-_-NA-_-F37038&cm_mmca1=US&cm_mmca2=NA&kpid=F37038&sourceid=543457011",
    "price": "$40.00",
    "title": "adidas Asweego Shoes Black 10.5 - Mens Running Shoes",
    "seller": "adidas",
    "source": ""
  },
  {
    "pos": 2,
    "url": "http://www.adidas.com/us/baseline-shoes/AW4299.html?cm_mmc=AdieSEM_Feeds-_-GoogleProductAds-_-NA-_-AW4299&cm_mmca1=US&cm_mmca2=NA&kpid=AW4299&sourceid=543457011",
    "price": "$50.00",
    "title": "adidas Baseline Shoes White 13K - Originals Shoes",
    "seller": "adidas",
    "source": ""
  },
  ...
  {
    "pos": 29,
    "url": "https://www.zappos.com/product/8466374/color/21766",
    "price": "$79.95",
    "title": "adidas Superstar W Originals Women's Classic Shoes White/Black/White : 9 B - Medium",
    "seller": "Zappos.com",
    "source": ""
  }
]

Top Stories

"top_stories": [
  {
    "url": "https://www.cnet.com/news/spacex-starhopper-prototype-takes-giant-leap-for-elon-musk/",
    "source": "Cnet",
    "headline": "SpaceX Starhopper rocket prototype takes giant leap for Elon Musk",
    "timeframe": "13 hours ago"
  },
  {
    "url": "https://electrek.co/2019/08/27/elon-musk-tesla-china-made-model-3-rumor/",
    "source": "Electrek",
    "headline": "Elon Musk is rumored to unveil first China-made Tesla Model 3 at event this \nweek",
    "timeframe": "16 hours ago"
  },
  {
    "url": "https://www.bloomberg.com/news/articles/2019-08-28/musk-to-join-china-ai-summit-despite-trump-ordering-firms-out",
    "source": "Bloomberg",
    "headline": "Elon Musk and Jack Ma Will Debate AI at China Summit",
    "timeframe": "4 hours ago"
  }
]

Featured Snippet

"featured_snippet": [
  {
    "url": "https://en.wikipedia.org/wiki/Contract_for_difference",
    "desc": "In finance, a contract for difference (CFD) is a contract between two parties, typically described as \"buyer\" and \"seller\", stipulating that the seller will pay to the buyer the difference between the current value of an asset and its value at contract time (if the difference is negative, then the buyer pays instead to ...",
    "title": "Contract for difference - Wikipedia",
    "url_shown": "https://en.wikipedia.org › wiki › Contract_for_difference",
    "pos_overall": 1
  }
]

Knowledge Base

"knowledge": {
  "title": "Adidas",
  "factoids": [
    {
      "title": "Stock price",
      "content": "ADDDF (OTCMKTS) $291.81 +2.74 (+0.95%)Aug 23, 4:00 PM EDT - Disclaimer"
    },
    {
      "title": "Founder",
      "content": "Adolf Dassler"
    },
    {
      "title": "Founded",
      "content": "August 18, 1949, Herzogenaurach, Germany"
    },
    {
      "title": "Headquarters",
      "content": "Herzogenaurach, Germany"
    },
    {
      "title": "Subsidiaries",
      "content": "Reebok, Five Ten Footwear, Runtastic, Ashworth, MORE"
    },
    {
      "title": "Website",
      "content": "https://www.adidas.com/us"
    }
  ],
  "subtitle": "Design company",
  "description": "DescriptionAdidas AG is a multinational corporation, founded and headquartered in Herzogenaurach, Germany, that designs and manufactures shoes, clothing and accessories. It is the largest sportswear manufacturer in Europe, and the second largest in the world, after Nike. Wikipedia"
}

Local Pack

"local_pack": [
  {
    "links": [
      {
        "href": "https://www.adidas.com/us?utm_source=gmb&utm_medium=organic&utm_campaign=US470198_local",
        "title": "Website"
      },
      {
        "href": "#",
        "title": "Directions"
      }
    ],
    "phone": "",
    "title": "adidas Originals Flagship Store",
    "rating": 0,
    "address": "Open ⋅ Closes 7PM",
    "subtitle": "(212) 966-0954",
    "pos_overall": 3,
    "rating_count": 0
  }
]

Twitter Feed

"twitter": [
  {
    "pos": 1,
    "url": "https://twitter.com/elonmusk",
    "title": "Elon Musk (@elonmusk) · Twitter",
    "tweets": [
      {
        "url": "https://twitter.com/elonmusk/status/1166081488648949760?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Etweet",
        "content": "Starhopper flight currently tracking to 5pm Texas time for 150m / ~500ft hover test",
        "timeframe": "11 hours ago"
      },
      {
        "url": "https://twitter.com/elonmusk/status/1165377786338406400?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Etweet",
        "content": "Looks like @SpaceX Starhopper flight may be as soon as Monday. FAA support is much appreciated!",
        "timeframe": "2 days ago"
      },
      {
        "url": "https://twitter.com/elonmusk/status/1165371975528640512?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Etweet",
        "content": "If you’re a utility or public utilities commission, please consider using the Tesla Megapack. Better for the environment & usually lower cost than fossil fuel peaker plants! www.tesla.com/megapack",
        "timeframe": "2 days ago"
      }
    ],
    "pos_overall": 1
  }
]

Job Listings

"jobs": {
  "listings": [
    {
      "title": "SR SOFTWARE DEVELOPER",
      "source": "via LinkedIn",
      "employer": "Jobs @ TheJobNetwork",
      "location": "Tulsa, OK",
      "extra_details": [
        "1 day ago",
        "Full-time"
      ]
    },
    {
      "title": "Autonomous Vehicle Simulation Software Engineer",
      "source": "via Built In Colorado",
      "employer": "Azevtec",
      "location": "United States",
      "extra_details": [
        "17 hours ago",
        "Full-time"
      ]
    },
    {
      "title": "Senior Software Engineer - Oracle Transportation Management",
      "source": "via LinkedIn",
      "employer": "XPO Logistics, Inc.",
      "location": "United States",
      "extra_details": [
        "21 hours ago",
        "Full-time"
      ]
    }
  ],
  "location_header": "Near United States"
}

Carousel

"item_carousel": {
  "items": [
    {
      "title": "Chris Evans",
      "subtitle": "Captain America"
    },
    {
      "title": "Mark Ruffalo",
      "subtitle": "Hulk"
    },
    {
      "title": "Tom Holland",
      "subtitle": "Spider-Man"
    },
    {
      "title": "Stan Lee",
      "subtitle": "Old Man in TV Report, Bus Driver"
    },
    {
      "title": "Chris Pratt",
      "subtitle": "Star-Lord"
    }
  ],
  "title": "The Avengers/Cast"
}

Images

"images": [
  {
    "alt": "Image result for contemporary wall clock",
    "href": "/search?q=contemporary+wall+clock&safe=off&hl=en&gl=US&tbm=isch&source=iu&ictx=1&fir=Qspcw8WiAmXYzM%253A%252C-m-5575uWYilbM%252C_&vet=1&usg=AI4_-kTGLIU9LAzoCJxO8gp7kK322MV8Yg&sa=X&ved=2ahUKEwjFy8rSy7HkAhWkDrkGHck7A24Q9QEwAXoECAkQBg#imgrc=Qspcw8WiAmXYzM:",
    "source": "https://www.allmodern.com/decor-pillows/sb0/wall-clocks-c429917.html"
  },
  {
    "alt": "Image result for contemporary wall clock",
    "href": "/search?q=contemporary+wall+clock&safe=off&hl=en&gl=US&tbm=isch&source=iu&ictx=1&fir=G0pFK8TQ91ls6M%253A%252Cr5nLxZQfxnA3MM%252C_&vet=1&usg=AI4_-kStPZh1tpSdQ5vTAZUIXwW4zThzQg&sa=X&ved=2ahUKEwjFy8rSy7HkAhWkDrkGHck7A24Q9QEwAnoECAkQCQ#imgrc=G0pFK8TQ91ls6M:",
    "source": "https://www.wayfair.com/decor-pillows/cat/modern-wall-clocks-c1869680.html"
  },
  ...
  {
    "alt": "Image result for contemporary wall clock",
    "href": "/search?q=contemporary+wall+clock&safe=off&hl=en&gl=US&tbm=isch&source=iu&ictx=1&fir=o4ZXIngZyr9HAM%253A%252C-m-5575uWYilbM%252C_&vet=1&usg=AI4_-kTIJMWyTs07HFcVKHTfTd6otLL82w&sa=X&ved=2ahUKEwjFy8rSy7HkAhWkDrkGHck7A24Q9QEwCnoECAkQIQ#imgrc=o4ZXIngZyr9HAM:",
    "source": "https://www.allmodern.com/decor-pillows/sb0/wall-clocks-c429917.html"
  }
]

Related Questions

"related_questions": [
  {
    "pos": 1,
    "question": "What does Adidas stand for?"
  },
  {
    "pos": 2,
    "question": "Is Adidas German?"
  },
  {
    "pos": 3,
    "question": "Are Jordans Adidas?"
  },
  {
    "pos": 4,
    "question": "What shoe brands does adidas own?"
  }
]

Shopping Search

...
"organic": [
            {
              "pos": 1,
              "url": "/aclk?sa=l&ai=DChcSEwju8fmd84jpAhUPTxgKHQshDIcYABAHGgJsZQ&sig=AOD64_1BTHVcnNzI5775j9xNkILrCU2KYA&ctype=5&q=&ved=0ahUKEwjpr_Sd84jpAhVI2aYKHYn1CeMQvxMI4wQ&adurl=",
              "type": "grid",
              "price": 85,
              "title": "Adidas Women's Swift Run Casual Shoes in White ...",
              "merchant": {
                "url": "/aclk?sa=l&ai=DChcSEwju8fmd84jpAhUPTxgKHQshDIcYABAHGgJsZQ&sig=AOD64_1BTHVcnNzI5775j9xNkILrCU2KYA&ctype=5&q=&ved=0ahUKEwjpr_Sd84jpAhVI2aYKHYn1CeMQg-UECOoE&adurl=",
                "name": "Finish Line"
              },
              "price_str": "$85.00.",
              "pos_overall": 1
            },
            {
              "pos": 2,
              "url": "/shopping/product/4092922174439754197?uule=w+CAIQICIXQ29sb3JhZG8sIFVuaXRlZCBTdGF0ZXM&q=adidas&prds=epd:6096059639745774212,paur:ClkAsKraX5cxKGk1E_r15f66xbFqydL47KoF9cO04jau1Hw_EeaJnz0EV5mb_JEjRlE5_m7N_B5Vg-krR5766rvdESfkczSSBqkGVDV7A5Ts8BlTUCNfpUxgtxIZAFPVH73vXbe47J5qGlzkfYH83D9zVPSv8w,prmr:1&sa=X&ved=0ahUKEwjpr_Sd84jpAhVI2aYKHYn1CeMQvxMI7AQ",
              "type": "grid",
              "price": 139.97,
              "title": "adidas Mens Alphaboost Training Shoes White ...",
              "merchant": {
                "url": "/aclk?sa=l&ai=DChcSEwju8fmd84jpAhUPTxgKHQshDIcYABAEGgJsZQ&sig=AOD64_3S0xuLlA1GOzNxCvYQdpeTLZkRyQ&ctype=5&q=&ved=0ahUKEwjpr_Sd84jpAhVI2aYKHYn1CeMQg-UECPQE&adurl=",
                "name": "Baseball Savings.com"
              },
              "price_str": "$139.97.",
              "pos_overall": 2
            },
...

Shopping Product

...
{
  "type": "Bundle",
  "items": [
    {
      "value": "Console Only",
      "selected": true,
      "available": true,
      "product_id": "5007040952399054528"
    },
    {
      "value": "Splatoon 2 Bundle",
      "available": false,
      "product_id": "6767220879106424425"
    },
    {
      "value": "Super Mario Odyssey Edition",
      "available": false,
      "product_id": "11634753303078094444"
    }
  ]
}
...

Shopping Product Pricing

"content": {
  "url": "https://www.google.com/shopping/product/5007040952399054528/online",
  "title": "Nintendo Switch with Joy-Con - 32 GB - Gray/Black",
  "rating": 4.5,
  "pricing": [
    {
      "price": 319.99,
      "seller": "Electronic Express",
      "details": "Free shipping",
      "currency": "$",
      "price_tax": 0,
      "price_total": 319.99,
      "seller_link": "/aclk?sa=l&ai=DChcSEwi9t9HqoJ7mAhVCXw0KHdyPBEYYABABGgJxYg&sig=AOD64_2gaL_J1BQ5J5PR-JazDM86N23Nww&adurl=&ctype=5&q=",
      "price_shipping": 0
    },
    {
      "price": 334.99,
      "seller": "ShopZodys",
      "details": "Arrives Dec 9 – 13",
      "currency": "$",
      "price_tax": 27.69,
      "price_total": 412.67,
      "seller_link": "/aclk?sa=l&ai=DChcSEwi9t9HqoJ7mAhVCXw0KHdyPBEYYABADGgJxYg&sig=AOD64_1Rqy4wxKvZXAaoX9FNDBy379EAAA&adurl=&ctype=5&q=",
      "price_shipping": 49.99
    }
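
In this sample, price_total equals price plus price_tax plus price_shipping. Whether that holds for every offer is an assumption; the sketch below simply checks it against the two sample offers and picks the cheapest by total. The helper name total_matches is hypothetical.

```python
# The two offers from the sample above, reduced to their numeric fields.
PRICING = [
    {"seller": "Electronic Express", "price": 319.99, "price_tax": 0,
     "price_shipping": 0, "price_total": 319.99},
    {"seller": "ShopZodys", "price": 334.99, "price_tax": 27.69,
     "price_shipping": 49.99, "price_total": 412.67},
]


def total_matches(offer, tolerance=0.005):
    """Check price + tax + shipping against price_total, allowing float error."""
    expected = offer["price"] + offer["price_tax"] + offer["price_shipping"]
    return abs(expected - offer["price_total"]) < tolerance


cheapest = min(PRICING, key=lambda offer: offer["price_total"])
print([total_matches(offer) for offer in PRICING])
print(cheapest["seller"])
```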

Parameter values

User agent

Download the full list of user_agent_type values in JSON here.

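[
  {
    "user_agent_type": "desktop",
    "description": "Random desktop browser User-Agent"
  },
  {
    "user_agent_type": "desktop_firefox",
    "description": "Random User-Agent of one of the latest versions of desktop Firefox"
  },
  {
    "user_agent_type": "desktop_chrome",
    "description": "Random User-Agent of one of the latest versions of desktop Chrome"
  },
  {
    "user_agent_type": "desktop_opera",
    "description": "Random User-Agent of one of the latest versions of desktop Opera"
  },
  {
    "user_agent_type": "desktop_edge",
    "description": "Random User-Agent of one of the latest versions of desktop Edge"
  },
  {
    "user_agent_type": "desktop_safari",
    "description": "Random User-Agent of one of the latest versions of desktop Safari"
  },
  {
    "user_agent_type": "mobile",
    "description": "Random mobile browser User-Agent"
  },
  {
    "user_agent_type": "mobile_android",
    "description": "Random User-Agent of one of the latest versions of Android browser"
  },
  {
    "user_agent_type": "mobile_ios",
    "description": "Random User-Agent of one of the latest versions of iPhone browser"
  },
  {
    "user_agent_type": "tablet",
    "description": "Random tablet browser User-Agent"
  },
  {
    "user_agent_type": "tablet_android",
    "description": "Random User-Agent of one of the latest versions of Android tablet"
  },
  {
    "user_agent_type": "tablet_ios",
    "description": "Random User-Agent of one of the latest versions of iPad tablet"
  }
]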

Locale

Download the full list of locale values in JSON here.

[  
   {  
      "locale":{  
         "en-ai":{  
            "description":"Anguilla - English",
            "domain":"com.ai"
         },
         "es-pr":{  
            "description":"Puerto Rico - Spanish",
            "domain":"com.pr"
         },
         ...
         "en-by":{  
            "description":"Belarus - English",
            "domain":"by"
         },
         "en-in":{  
            "description":"India - English",
            "domain":"co.in"
         }
      }
   }
]

Results Language

Download the full list of results_language values in JSON here.

[
 {
   "results_language": "af",
   "language": "Afrikaans"
 },
 {
   "results_language": "ar",
   "language": "Arabic"
 },
 ...
 {
   "results_language": "vi",
   "language": "Vietnamese"
 }
]

Geo_location

There are a few ways you can use the geo_location parameter to get correctly localized Google results.

  • Using Google's Canonical Location Name. This is very straightforward. Just pass us one of the values found in the CSV download here. Example: "geo_location": "New York,New York,United States".
  • Using a state name. Strip the first part of a Google Canonical Location Name and pass a geo_location value in a "State,Country" format. Works with the United States, Australia, India and other countries with federated states. Example: "geo_location": "California,United States".
  • Using a country name. To get results localized for the geographical center point of a country, pass an official country name. Example: "geo_location": "United Kingdom".
  • Using coordinates and radius. To get hyperlocal search results (especially useful for searches such as "restaurants near me"), you can pass latitude, longitude and radius values. The following example passes the coordinates of the Space Needle in Seattle, WA: "geo_location": "lat: 47.6205, lng: -122.3493, rad: 25000".

If you pass a misspelled geo_location parameter, chances are that either we or Google will interpret and correct it for you. Nonetheless, we recommend using the parameter structures outlined above, combined with the locale and domain parameters, to get the most accurate results.
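
The four structures above are plain strings, so they are easy to generate. The helpers below are a hypothetical sketch that reproduces the documented examples; they do not validate names against Google's location list.

```python
def geo_from_parts(*parts):
    """Join location parts, e.g. city, state, country, with commas."""
    return ",".join(parts)


def geo_from_coordinates(lat, lng, radius):
    """Format latitude, longitude and radius in the documented layout."""
    return f"lat: {lat}, lng: {lng}, rad: {radius}"


print(geo_from_parts("New York", "New York", "United States"))
print(geo_from_parts("California", "United States"))
print(geo_from_parts("United Kingdom"))
print(geo_from_coordinates(47.6205, -122.3493, 25000))
```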


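Account status

Usage statistics

You can find your usage statistics by querying the following endpoint:

GET https://data.oxylabs.io/v1/stats

By default, the API returns all-time usage statistics. Adding group_by=month returns monthly statistics, while group_by=day returns daily numbers. The query below returns all-time statistics:

curl --user user:pass1 'https://data.oxylabs.io/v1/stats'

Sample output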

{
    "data": {
        "sources": [
            {
                "realtime_results_count": "90",
                "results_count": "10",
                "title": "google_hotels"
            },
            {
                "realtime_results_count": "19",
                "results_count": "87",
                "title": "google_search"
            }
        ]
    },
    "meta": {
        "group_by": null
    }
}
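
The counters in this output arrive as strings, so a client will usually convert them before doing arithmetic. A minimal sketch, with hypothetical helper names; passing group_by as a URL query parameter is an assumption, since the text only says to add group_by=month or group_by=day.

```python
import json

STATS = json.loads("""
{
  "data": {"sources": [
    {"realtime_results_count": "90", "results_count": "10", "title": "google_hotels"},
    {"realtime_results_count": "19", "results_count": "87", "title": "google_search"}
  ]},
  "meta": {"group_by": null}
}
""")


def counts_by_source(stats):
    """Map each source title to its two counters converted to integers."""
    return {
        source["title"]: {
            "results_count": int(source["results_count"]),
            "realtime_results_count": int(source["realtime_results_count"]),
        }
        for source in stats["data"]["sources"]
    }


def stats_url(group_by=None):
    """Build the stats URL; the query-string form is an assumption."""
    base = "https://data.oxylabs.io/v1/stats"
    return base if group_by is None else f"{base}?group_by={group_by}"


print(counts_by_source(STATS))
print(stats_url("month"))
```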

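Limits

The following endpoint provides your monthly commitment information as well as how much of it has already been used:

GET https://data.oxylabs.io/v1/stats/limits
curl --user user:pass1 'https://data.oxylabs.io/v1/stats/limits'

Sample output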

{
    "monthly_requests_commitment":4500000,
    "used_requests":985000
}
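
From these two numbers a client can derive how much of the commitment is left. A short sketch using the sample values above; the function name usage_summary is hypothetical.

```python
LIMITS = {"monthly_requests_commitment": 4500000, "used_requests": 985000}


def usage_summary(limits):
    """Return remaining requests and the share of the commitment used."""
    commitment = limits["monthly_requests_commitment"]
    used = limits["used_requests"]
    return {
        "remaining": commitment - used,
        "used_percent": round(100 * used / commitment, 1),
    }


summary = usage_summary(LIMITS)
print(summary)
```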

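Response codes

Code | Status | Description
204 | No content | You are trying to retrieve a job that has not been completed yet.
400 | Multiple error messages | Bad request structure, possibly a misspelled parameter or an invalid value. The response body will contain a more specific error message.
401 | "Authorization header not provided" / "Invalid authorization header" / "Client not found" | Missing authorization header or incorrect login credentials.
403 | Forbidden | Your account does not have access to this resource.
404 | Not found | The job ID you are looking for is no longer available.
429 | Too many requests | Exceeded rate limit. Please contact your account manager to increase limits.
500 | Unknown error | Service unavailable.
524 | Timeout | Service unavailable.
612 | Undefined internal error | Something went wrong and we failed the job you submitted. You can try again at no extra cost, as we do not charge you for faulted jobs. If that does not work, please get in touch with us.
613 | Faulted after too many retries | We tried scraping the job you submitted, but gave up after reaching our retry limit. You can try again at no extra cost, as we do not charge you for faulted jobs. If that does not work, please get in touch with us.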
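
One possible way to act on these codes in client code is sketched below. The action labels and the grouping are illustrative choices based on the descriptions in the table, not behaviour prescribed by the API.

```python
def next_action(status_code):
    """Suggest a client-side reaction to a response code (labels are hypothetical)."""
    if status_code == 204:
        return "poll_again"      # job not finished yet
    if status_code == 429:
        return "back_off"        # rate limit exceeded
    if status_code in (612, 613):
        return "resubmit"        # faulted jobs are not charged
    if status_code in (400, 401, 403, 404):
        return "fix_request"     # client-side problem
    if status_code in (500, 524):
        return "retry_later"     # service unavailable
    return "ok" if 200 <= status_code < 300 else "unknown"


for code in (200, 204, 401, 429, 524, 613):
    print(code, next_action(code))
```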

Parsed data response codes:

Code | Status | Description
12000 | Success | The parsed content returned is full and there should be no missing or broken fields.
12002 | Failure | We couldn't parse the page entirely. There may be an issue with the target website changing its HTML structure.
12003 | Not supported | The web page you asked us to parse is not supported.
12004 | Partial Success | We were able to parse the majority of the page, but there are a few missing fields.
12005 | Partial Success | We were able to parse the majority of the page, but there might be some fields with default values because we could not find them in the HTML.
12006 | Failure | Unexpected error. Let us know you got this response and we'll check what went wrong.
12007 | Unknown | Unknown parsed data status. The actual result could range anywhere from a complete failure to a total success.
12008 Failure Parsed content is missing.
12009 Failure Product not found. Check the URL you submitted.
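
A client that stores parsed results may want to filter on these statuses. The sketch below copies the table into a lookup and treats full and partial success as usable; that policy and the helper name are assumptions for the example.

```python
# Status per parsed-data response code, copied from the table above.
PARSE_STATUS = {
    12000: "success",
    12002: "failure",
    12003: "not_supported",
    12004: "partial_success",
    12005: "partial_success",
    12006: "failure",
    12007: "unknown",
    12008: "failure",
    12009: "failure",
}


def is_usable(code):
    """Treat full and partial success as usable (a policy choice, not an API rule)."""
    return PARSE_STATUS.get(code) in ("success", "partial_success")


print([code for code in PARSE_STATUS if is_usable(code)])
```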

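Cloud storage upload response codes:

Code | Status | Description
10001 | Unexpected exception | Something went badly wrong. We probably already know about it and are working on a fix. Either way, please let us know.
13000 | Upload successful | All went well!
13001 | Upload failed | We could not upload your job results.
13102 | No such path | We could not find a bucket with that name. Please double-check it.
13103 | Access denied | The bucket does not have the required permissions. To find out how to grant us the necessary permissions, see here.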

Last updated on May 16, 2022
