Site optimization: identifying the real Baiduspider by reverse IP lookup

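In chat groups I often hear people ask how to correctly identify Baiduspider, since many IPs seem to visit their sites disguised as it. The simplest check is to look at the User-Agent (UA), but the UA can be forged, so judging by UA alone is unreliable. Is there a more accurate way to tell whether an IP really is Baiduspider? There is: a reverse DNS lookup on the IP.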

[root@localhost nginx]# tailf logs/wp.log |grep Baiduspider
220.181.108.144 - - [02/Feb/2019:10:37:03 +0800] "GET /547.html HTTP/1.1" 200 36912 "-" "Mozilla/5.0 (Linux;u;Android 4.2.2;zh-cn;) AppleWebKit/534.46 (KHTML,like Gecko) Version/5.1 Mobile Safari/10600.6.3 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)" "-"
111.206.198.32 - - [02/Feb/2019:10:37:19 +0800] "GET /static/api/js/share.js?cdnversion=430297 HTTP/1.1" 200 17068 "https://www.rsyncd.net/547.html" "Mozilla/5.0 (iPhone; CPU iPhone OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13B143 Safari/601.1 (compatible; Baiduspider-render/2.0; +http://www.baidu.com/search/spider.html)" "-"
111.206.221.38 - - [02/Feb/2019:10:37:22 +0800] "GET /wp-content/themes/dux/img/qrcode.png HTTP/1.1" 200 15874 "https://www.rsyncd.net/547.html" "Mozilla/5.0 (iPhone; CPU iPhone OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13B143 Safari/601.1 (compatible; Baiduspider-render/2.0; +http://www.baidu.com/search/spider.html)" "-"
220.181.108.123 - - [02/Feb/2019:10:39:46 +0800] "GET /system/linux/page/2 HTTP/1.1" 200 38786 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)" "-"
220.181.108.185 - - [02/Feb/2019:11:00:37 +0800] "GET /235.html HTTP/1.1" 200 35512 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)" "-"
111.206.198.93 - - [02/Feb/2019:11:00:41 +0800] "GET /static/api/js/share.js?cdnversion=430298 HTTP/1.1" 200 17068 "https://www.rsyncd.net/235.html" "Mozilla/5.0 (compatible; Baiduspider-render/2.0; +http://www.baidu.com/search/spider.html)" "-"
111.206.198.68 - - [02/Feb/2019:11:00:44 +0800] "GET /wp-content/themes/dux/img/qrcode.png HTTP/1.1" 200 15874 "https://www.rsyncd.net/235.html" "Mozilla/5.0 (compatible; Baiduspider-render/2.0; +http://www.baidu.com/search/spider.html)" "-"
123.125.71.109 - - [02/Feb/2019:11:21:28 +0800] "GET /system/linux/page/3 HTTP/1.1" 200 37578 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)" "-"
220.181.108.169 - - [02/Feb/2019:12:03:10 +0800] "GET /system/linux/page/4 HTTP/1.1" 200 31461 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)" "-"
123.125.71.16 - - [02/Feb/2019:12:24:01 +0800] "GET /system/linux/page/2 HTTP/1.1" 200 38786 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)" "-"
123.125.71.31 - - [02/Feb/2019:12:45:36 +0800] "GET /links HTTP/1.1" 200 15708 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)" "-"
27.148.162.10 - - [02/Feb/2019:12:46:19 +0800] "GET / HTTP/1.1" 200 10367 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)" "-"

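The log above covers a period of Baiduspider visits to my site. Judged by UA, every one of these records was produced by Baiduspider, but are there fake spiders among them? Next we use reverse DNS lookups to see which of these IPs are not real Baiduspiders. The command differs between Windows and Linux: on Windows we use nslookup, while on Linux host does the job directly. The format is: command (nslookup/host) + space + IP address. The Baiduspider records in the log above fall into four IP ranges, so we check one IP from each range to see whether any fake Baiduspiders are among them.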
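To see which IP ranges claim to be Baiduspider before looking any of them up, the log can be grouped by address prefix. The following is a minimal sketch, assuming the client IP is the first space-separated field of each line (as in the log above); the function name `claimed_spider_ips` and the choice to group by the first two octets are illustrative assumptions, not part of any tool.

```python
from collections import defaultdict


def claimed_spider_ips(log_lines, marker="Baiduspider"):
    """Group client IPs whose log line contains `marker` by their first two octets.

    Hypothetical helper: it only trusts the UA string, so the result is the
    list of IPs that *claim* to be Baiduspider, not a verified list.
    """
    groups = defaultdict(set)
    for line in log_lines:
        if marker not in line:
            continue
        # Assumption: the client IP is the first space-separated field.
        ip = line.split(" ", 1)[0]
        prefix = ".".join(ip.split(".")[:2])
        groups[prefix].add(ip)
    # Sort each group so the output is stable (plain string order).
    return {prefix: sorted(ips) for prefix, ips in groups.items()}
```

Passing the lines of logs/wp.log to this function would return one entry per two-octet prefix, which makes it easy to pick one address from each range for the lookups.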

  • Windows 7: Start (bottom-left corner) -> Accessories -> Command Prompt
Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation.  All rights reserved.

C:\Users\zhao>nslookup 220.181.108.185
Server:  phicomm.me
Address:  192.168.2.1

Name:    baiduspider-220-181-108-185.crawl.baidu.com
Address:  220.181.108.185


C:\Users\zhao>nslookup 111.206.198.68
Server:  phicomm.me
Address:  192.168.2.1

*** phicomm.me can't find 111.206.198.68: Non-existent domain

C:\Users\zhao>nslookup 123.125.71.16
Server:  phicomm.me
Address:  192.168.2.1

Name:    baiduspider-123-125-71-16.crawl.baidu.com
Address:  123.125.71.16


C:\Users\zhao>nslookup 27.148.162.10
Server:  phicomm.me
Address:  192.168.2.1

*** phicomm.me can't find 27.148.162.10: Non-existent domain

C:\Users\zhao>
  • Linux
[root@localhost ~]# host 220.181.108.185
185.108.181.220.in-addr.arpa domain name pointer baiduspider-220-181-108-185.crawl.baidu.com.
[root@localhost ~]# host 111.206.198.68
Host 68.198.206.111.in-addr.arpa. not found: 3(NXDOMAIN)
[root@localhost ~]# host 123.125.71.16
16.71.125.123.in-addr.arpa domain name pointer baiduspider-123-125-71-16.crawl.baidu.com.
[root@localhost ~]# host 27.148.162.10
Host 10.162.148.27.in-addr.arpa. not found: 3(NXDOMAIN)
[root@localhost ~]# 
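
The same lookup can be scripted when there are many IPs to check. This is a rough sketch using only the Python standard library: `ipaddress.ip_address(...).reverse_pointer` builds the in-addr.arpa name seen in the host output above, and `socket.gethostbyaddr` performs the lookup through the system resolver. The helper names `ptr_query_name` and `reverse_lookup` and the injectable `resolver` parameter are assumptions made for illustration. Results depend on the resolver in use (which may also consult the local hosts file) and may differ from the 2019 output shown here.

```python
import ipaddress
import socket


def ptr_query_name(ip):
    """Return the in-addr.arpa name that a reverse lookup queries for `ip`."""
    return ipaddress.ip_address(ip).reverse_pointer


def reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return the hostname found for `ip`, or None when the lookup fails.

    `resolver` is a hypothetical hook so the function can be exercised
    without network access; by default the system resolver is used.
    """
    try:
        hostname, _aliases, _addresses = resolver(ip)
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses.
        return None
    return hostname
```

Calling `reverse_lookup("220.181.108.185")` on a machine with working DNS performs roughly the same query as `host 220.181.108.185`, and a `None` result corresponds to the NXDOMAIN case.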

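Baidu has officially stated that Baiduspider hostnames follow the format *.baidu.com or *.baidu.jp, and that anything else is an impostor. By that rule, the reverse lookups above show at a glance which spiders are real: 220.181.108.185 and 123.125.71.16 resolve to crawl.baidu.com hostnames, while the other two IPs return no hostname at all. That said, as far as I know, the 111.206.*.* range does appear to be real Baiduspider, used mainly to fetch static files such as CSS, JS, and images; I don't know why the reverse lookup returns no Baidu hostname for it. As for 27.148.162.10, it belongs to 站长工具 (a webmaster tools site): whenever you use that tool to query your site's SEO information, this IP comes to crawl the site's content.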
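
The hostname rule quoted above can be written as a small check. This sketch assumes the hostname comes from a reverse lookup and may carry a trailing dot, as in the host output; the name `classify_hostname` and the three labels are hypothetical. A missing PTR record is reported separately rather than as "fake", since the 111.206.*.* range discussed above returns no hostname and yet may be genuine.

```python
# Suffixes from Baidu's stated rule: *.baidu.com or *.baidu.jp.
BAIDU_SUFFIXES = (".baidu.com", ".baidu.jp")


def classify_hostname(hostname):
    """Classify a reverse-lookup result as 'baidu', 'other', or 'no-ptr'."""
    if not hostname:
        # No PTR record: inconclusive rather than proof of a fake spider.
        return "no-ptr"
    # Drop the trailing dot of a fully qualified name and ignore case.
    host = hostname.rstrip(".").lower()
    if host.endswith(BAIDU_SUFFIXES):
        return "baidu"
    return "other"
```

Matching on the leading dot of the suffix matters: a name such as notbaidu.com or crawl.baidu.com.example.net ends up in 'other' instead of passing as Baidu.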

Original article; please credit the source when reposting: 爱编程 » Site optimization: identifying the real Baiduspider by reverse IP lookup