The code is as follows (example):
import com.alibaba.fastjson.JSONObject;
import org.apache.http.NameValuePair;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.message.BasicNameValuePair;
import org.apache.http.util.EntityUtils;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import javax.script.Invocable;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Wrapper class so the methods compile together; the class name is arbitrary.
public class BaiduTranslate {
    // Cookie header copied from a logged-in browser session on fanyi.baidu.com.
    // Its values (BAIDUID, BDUSS, ...) are tied to one account and session, so paste your own here.
    private static final String COOKIE = "<your Cookie header>";
    private static final String USER_AGENT = "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36";

    private static String baiduTranslate(String q, String from, String to) throws IOException, ScriptException {
        // Step 1: load the home page; its inline scripts define window.common.token and window.gtk.
        String mainpageUrl = "https://fanyi.baidu.com/";
        // Send the cookies as a raw Cookie header (cookie("Cookie", ...) would create a cookie named "Cookie").
        Document document = Jsoup.connect(mainpageUrl).header("Cookie", COOKIE).timeout(10000).get();

        // Step 2: collect the inline scripts that assign to window and run them against a stub window object.
        StringBuilder jscode = new StringBuilder("var window={};try{");
        for (Element element : document.getElementsByTag("script")) {
            String data = element.data();
            if (data.startsWith("window") || data.startsWith("\nwindow")) {
                jscode.append(data).append(";");
            }
        }
        jscode.append("}catch(e){}");
        // Drop the redirect statement, which cannot run outside a browser.
        String script = jscode.toString().replace("window.top.location.href = 'https://fanyi.baidu.com/';", "");
        ScriptEngine engine = newJsEngine();
        engine.eval(script);
        Map<?, ?> window = (Map<?, ?>) engine.get("window");
        String token = (String) ((Map<?, ?>) window.get("common")).get("token");
        String gtk = (String) window.get("gtk");

        // Step 3: compute the sign and POST the form to the translation endpoint.
        String sign = token(q, gtk);
        List<NameValuePair> params = new ArrayList<>();
        params.add(new BasicNameValuePair("from", from));
        params.add(new BasicNameValuePair("to", to));
        params.add(new BasicNameValuePair("query", q));
        params.add(new BasicNameValuePair("transtype", "translang"));
        params.add(new BasicNameValuePair("simple_means_flag", "3"));
        params.add(new BasicNameValuePair("sign", sign));
        params.add(new BasicNameValuePair("token", token));
        params.add(new BasicNameValuePair("domain", "common"));

        HttpPost request = new HttpPost("https://fanyi.baidu.com/v2transapi");
        request.setEntity(new UrlEncodedFormEntity(params, StandardCharsets.UTF_8));
        request.setHeader("Cookie", COOKIE);
        request.setHeader("User-Agent", USER_AGENT);

        // try-with-resources closes both the response and the client.
        try (CloseableHttpClient httpClient = HttpClients.createDefault();
             CloseableHttpResponse response = httpClient.execute(request)) {
            String result = EntityUtils.toString(response.getEntity(), StandardCharsets.UTF_8);
            // Step 4: the translated text is at trans_result.data[0].dst in the JSON response.
            return JSONObject.parseObject(result)
                    .getJSONObject("trans_result")
                    .getJSONArray("data")
                    .getJSONObject(0)
                    .getString("dst");
        }
    }

    private static ScriptEngine newJsEngine() {
        // "js" resolves to Nashorn on JDK 8 to 14; newer JDKs need a JavaScript engine on the classpath.
        ScriptEngine engine = new ScriptEngineManager().getEngineByName("js");
        if (engine == null) {
            throw new IllegalStateException("No JavaScript ScriptEngine available");
        }
        return engine;
    }

    private static String token(String value, String gtk) {
        // js.js is expected to define a function token(value, gtk) that returns the sign.
        try (Reader reader = new FileReader("C:\\NC\\js.js")) {
            ScriptEngine engine = newJsEngine();
            engine.eval(reader);
            if (engine instanceof Invocable) {
                return String.valueOf(((Invocable) engine).invokeFunction("token", value, gtk));
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return "";
    }

    public static void main(String[] args) throws IOException, ScriptException {
        // Translate "你好" from Chinese (zh) to Japanese (jp).
        String s = baiduTranslate("你好", "zh", "jp");
        System.out.println(s);
    }
}
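The POST request above sends `from`, `to`, `query`, `sign` and `token` as a URL-encoded form. To make the wire format easier to picture, here is a minimal sketch that builds such a body with only the JDK. The class and method names (`FormBodySketch`, `encodeForm`) and the sign and token values are hypothetical placeholders, and the output is only roughly what HttpClient's form entity would send.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class FormBodySketch {
    // Joins key=value pairs with '&', percent-encoding each key and value as UTF-8.
    public static String encodeForm(Map<String, String> params) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> entry : params.entrySet()) {
            body.add(enc(entry.getKey()) + "=" + enc(entry.getValue()));
        }
        return body.toString();
    }

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("from", "zh");
        params.put("to", "jp");
        params.put("query", "\u4f60\u597d");      // "你好"
        params.put("sign", "123456.654321");      // placeholder, not a real sign
        params.put("token", "placeholder-token"); // placeholder, not a real token
        System.out.println(encodeForm(params));
    }
}
```

A `LinkedHashMap` keeps the parameters in insertion order, which makes the printed body easy to compare with what the browser's Network tab shows.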
This is my first time writing an article.
I. Analysis
Opening Baidu Translate in Chrome with the developer tools (F12) and inspecting its URL shows that the response is an HTML page, which contains ...
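The Java code above obtains `window.common.token` and `window.gtk` by evaluating the page's inline scripts in a script engine. As an alternative sketch, not the method used in this post, the two values could be pulled out of the HTML with regular expressions. This assumes the page contains text shaped like `token: '...'` and `window.gtk = '...'`; the class name `PageTokenExtractor` and the sample values are made up for illustration.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PageTokenExtractor {
    // Assumed shapes: token: 'abc' inside an object literal, and window.gtk = '123.456';
    private static final Pattern TOKEN =
            Pattern.compile("token\\s*:\\s*['\"]([^'\"]+)['\"]");
    private static final Pattern GTK =
            Pattern.compile("window\\.gtk\\s*=\\s*['\"]([^'\"]+)['\"]");

    // Returns the first capture group of the first match, if any.
    private static Optional<String> extract(Pattern pattern, String html) {
        Matcher matcher = pattern.matcher(html);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static Optional<String> token(String html) {
        return extract(TOKEN, html);
    }

    public static Optional<String> gtk(String html) {
        return extract(GTK, html);
    }

    public static void main(String[] args) {
        // Made-up sample page fragment, not real Baidu output.
        String sample = "<script>window['common'] = { token: 'abc123', systime: '1' };"
                + "window.gtk = '111111.222222';</script>";
        System.out.println(token(sample).orElse("(not found)"));
        System.out.println(gtk(sample).orElse("(not found)"));
    }
}
```

The regex route avoids the dependency on a JavaScript engine, but it breaks as soon as the page changes how these values are written, so the patterns would need checking against the actual page source.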
<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>...
Step 1: create a client, much like opening a web page in a browser.
HttpClient httpClient = new HttpClient();
Step 2: create a GET method for the URL of the page you want to fetch.
GetMethod getMethod = new GetMethod("http:
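This code fetches the full content of a web page and handles garbled Chinese characters well.
// Fetch the URL and return the body as a string; handles garbled Chinese text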
public static String Gget(String url1) {
CloseableHttpClient httpclient = HttpClients.createDefault();
String context = "";
try {
URL url = new URL(url1
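Contents
How to determine the URL
How to determine the request headers
How to find the values of the parameters
Source code for Baidu Translate
When I was first learning web scraping and tried to scrape Baidu Translate, I was stuck for a while because I could not find the sign value. The material online was a mess: some posts just gave the result, others could not explain it clearly, and none properly analyzed where the sign value comes from. So I wrote this post to share how I found the sign value and save you some detours.
How to determine the URL
Determining the URL is the first step of any crawl. The simplest way is: open the page that holds the resource you want to download -> press F12 -> click Network -> click XHR (xml and ht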
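Introduction to HttpClient
HttpClient is a subproject of Apache Jakarta Commons that provides an efficient, up-to-date, feature-rich client-side toolkit for the HTTP protocol, and it supports the latest versions and recommendations of the protocol.
HTTP is probably the most widely used and most important protocol on the Internet today, and more and more Java applications need to access network resources directly over HTTP. Although the JDK's j...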
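# Assumption: HTTPClient here is Tornado's synchronous HTTP client
from tornado.httpclient import HTTPClient

client = HTTPClient()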
response = client.fetch('https://www.baidu.com/')
print(response.body)
client.close()
b'<html>\r\n<head>\r\n\t<sc...
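import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

public class BaiduSpider {
    public static void main(String[] args) throws Exception {
        // Keyword to search for
        String keyword = "java";
        // Build the Baidu search URL
        String baiduSearchUrl = "https://www.baidu.com/s?wd=" + keyword;
        // Create the URL object
        URL url = new URL(baiduSearchUrl);
        // Open the connection
        URLConnection connection = url.openConnection();
        // Set the User-Agent header so the request looks like it comes from a browser
        connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.67 Safari/537.36");
        // Read the page source with a BufferedReader, decoding it as UTF-8;
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Print the page source
                System.out.println(line);
            }
        }
    }
}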
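This crawler works as follows: it uses Java's `URL` class to open the Baidu search URL and `URLConnection` to send the HTTP request, then reads the page source with a `BufferedReader` and prints it to the console.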
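The example concatenates the keyword directly into the URL, which works for `java` but not for keywords containing spaces or Chinese characters. A minimal sketch of encoding the keyword first is shown below; the class and method names (`SearchUrlBuilder`, `buildSearchUrl`) are hypothetical and not part of the original example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class SearchUrlBuilder {
    // Builds the search URL with the keyword percent-encoded as UTF-8.
    public static String buildSearchUrl(String keyword) {
        try {
            return "https://www.baidu.com/s?wd=" + URLEncoder.encode(keyword, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildSearchUrl("java"));
        System.out.println(buildSearchUrl("java \u6559\u7a0b")); // "java 教程"
    }
}
```

`URLEncoder` produces form-style encoding, so a space becomes `+` and each non-ASCII character becomes its UTF-8 bytes in `%XX` form.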
This is only a simple example. In practice a Baidu crawler can have many other features, for example:
- Automatic pagination, crawling