SEO: The Odd Baidu Spider That Doesn't Put Baiduspider in Its UA

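I previously wrote an article on telling real Baidu spiders from fake ones, "Website Optimization: Identifying the Real Baidu Spider by Reverse IP Lookup". In it I explained in detail that the UA string in the logs is not enough on its own; you also need a reverse lookup on the spider's IP to decide whether it is genuine or fake. At the end of that article I noted that the 111.206.*.* range, which the Baidu spider uses to fetch static files, showed no Baidu (Baiduspider) information in the logged UA, and a reverse IP lookup did not turn up anything pointing to Baidu either.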

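As I mentioned in that article, in my spare time I like to watch the site logs live with the tailf command. While watching today I found another issue: when the 220.181.108.* range fetches static files such as images, JS and CSS, the UA in the log again carries no Baiduspider information, yet a reverse IP lookup confirms that these are Baidu's spiders. After more careful lookups, it is not only this range: 123.125.71.* behaves the same way, with no Baidu information in the UA when it fetches static files. Odd, isn't it? Plenty of people disguise themselves as the Baidu spider when scraping other sites, while the Baidu spider itself apparently doesn't want anyone to know who it is when it crawls.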

  • Baidu spider crawl log
220.181.51.81 - - [12/Feb/2019:09:19:17 +0800] "GET /226.html HTTP/1.1" 200 34700 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13B143 Safari/601.1" "-"
220.181.108.95 - - [12/Feb/2019:09:19:24 +0800] "GET /static/api/js/share.js?cdnversion=430536 HTTP/1.1" 200 17068 "http://www.rsyncd.net/226.html" "Mozilla/5.0 (iPhone; CPU iPhone OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13B143 Safari/601.1" "-"
123.125.71.95 - - [12/Feb/2019:09:19:28 +0800] "GET /wp-content/themes/dux/css/share.css HTTP/1.1" 200 1736 "http://www.rsyncd.net/226.html" "Mozilla/5.0 (iPhone; CPU iPhone OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13B143 Safari/601.1" "-"
220.181.108.110 - - [12/Feb/2019:09:19:29 +0800] "GET /wp-content/themes/dux/img/share.png HTTP/1.1" 200 6932 "http://www.rsyncd.net/wp-content/themes/dux/css/share.css" "Mozilla/5.0 (iPhone; CPU iPhone OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13B143 Safari/601.1" "-"
220.181.51.125 - - [12/Feb/2019:09:19:56 +0800] "GET /332.html HTTP/1.1" 200 36299 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13B143 Safari/601.1" "-"
216.244.66.238 - - [12/Feb/2019:09:20:14 +0800] "GET /robots.txt HTTP/1.1" 200 149 "-" "Mozilla/5.0 (compatible; DotBot/1.1; http://www.opensiteexplorer.org/dotbot, help@moz.com)" "-"
123.125.67.144 - - [12/Feb/2019:09:20:35 +0800] "GET /190.html HTTP/1.1" 200 39199 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13B143 Safari/601.1" "-"
123.125.71.97 - - [12/Feb/2019:09:21:28 +0800] "GET /system/page/3 HTTP/1.1" 200 38589 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)" "-"
220.181.108.185 - - [12/Feb/2019:09:29:23 +0800] "GET /wp-content/themes/dux/img/logo.png HTTP/1.1" 200 3782 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36" "-"
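Rather than spotting these lines by eye in tailf, the same check can be scripted. Below is a minimal sketch, assuming the log format shown above (combined format plus one trailing quoted field). The names LOG_RE, BAIDU_PREFIXES and unlabeled_hits are my own for illustration, and the prefix list contains only the two ranges discussed in this post, not an official list of Baidu ranges.

```python
import re

# Matches the access-log lines shown above; the trailing quoted field is ignored.
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)

# Assumption: only the two ranges named in this post, not a complete list.
BAIDU_PREFIXES = ("220.181.108.", "123.125.71.")


def unlabeled_hits(lines, prefixes=BAIDU_PREFIXES):
    """Return (ip, path) for requests from the given prefixes whose UA lacks 'Baiduspider'."""
    hits = []
    for line in lines:
        m = LOG_RE.match(line)
        if m is None:
            continue  # skip lines that are not in the expected format
        from_range = m["ip"].startswith(prefixes)
        labeled = "baiduspider" in m["ua"].lower()
        if from_range and not labeled:
            hits.append((m["ip"], m["path"]))
    return hits
```

Fed the log lines above, this should flag the share.js, share.css, share.png and logo.png requests and skip the /system/page/3 request, whose UA does say Baiduspider. A prefix match only narrows down candidates; the reverse lookup is still what confirms them.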
  • Reverse lookup of the Baidu IPs with the host command
[root@localhost ~]# host 220.181.51.125
Host 125.51.181.220.in-addr.arpa. not found: 3(NXDOMAIN)
[root@localhost ~]# host 220.181.108.110
110.108.181.220.in-addr.arpa domain name pointer baiduspider-220-181-108-110.crawl.baidu.com.
[root@localhost ~]# host 220.181.51.125 
Host 125.51.181.220.in-addr.arpa. not found: 3(NXDOMAIN)
[root@localhost ~]# host 123.125.67.144
Host 144.67.125.123.in-addr.arpa. not found: 3(NXDOMAIN)
[root@localhost ~]# host 123.125.71.97
97.71.125.123.in-addr.arpa domain name pointer baiduspider-123-125-71-97.crawl.baidu.com.
[root@localhost ~]# host 123.125.71.95
95.71.125.123.in-addr.arpa domain name pointer baiduspider-123-125-71-95.crawl.baidu.com.
[root@localhost ~]# host 220.181.108.185
185.108.181.220.in-addr.arpa domain name pointer baiduspider-220-181-108-185.crawl.baidu.com.
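The host lookups above can also be done from a script. The following is a rough sketch using Python's standard socket module, not a definitive check. The function name is_baidu_spider is hypothetical, and the ".crawl.baidu.com" suffix is taken only from the host output above, so it may need widening. The forward lookup that confirms the hostname resolves back to the same IP is an extra step I have added; it is not something the host commands above do.

```python
import socket

# Assumption: suffix taken from the host output above; widen it if needed.
BAIDU_SUFFIXES = (".crawl.baidu.com",)


def is_baidu_spider(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Reverse-resolve ip, check the hostname suffix, then confirm with a forward lookup.

    reverse and forward are injectable so the logic can be exercised without a network.
    """
    try:
        hostname = reverse(ip)[0]
    except OSError:  # covers socket.herror / socket.gaierror (e.g. NXDOMAIN)
        return False

    hostname = hostname.rstrip(".").lower()
    if not hostname.endswith(BAIDU_SUFFIXES):
        return False

    try:
        addresses = forward(hostname)[2]
    except OSError:
        return False
    # The PTR record alone can be faked by whoever controls the reverse zone,
    # so require the hostname to resolve back to the original IP.
    return ip in addresses
```

On a machine with working DNS, calling is_baidu_spider("220.181.108.110") performs the same PTR query as the host command. An address with no PTR record, like 220.181.51.125 in the output above, simply comes back False.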

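To sum up all of the above: when the Baidu spider fetches static resources, the UA in the site log does not necessarily contain any Baidu (Baiduspider) information.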

Original article; please credit the source when reposting: 爱编程 » SEO: The Odd Baidu Spider That Doesn't Put Baiduspider in Its UA