self_example/Spider/Chapter07_动态渲染页面爬取/seleniumLearning/demo13反屏蔽.py


# -*- encoding:utf-8 -*-
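'''
@Author : dingjiawen
@Date : 2023/12/6 21:20
@Usage : Anti-blocking (evading Selenium detection)
@Desc : Many websites now detect Selenium and block the browser outright once they find it.
The check looks at whether the window.navigator object of the current browser window has a webdriver property.
In a normally operated browser this property should be undefined; as soon as Selenium drives the browser, it sets the webdriver property on window.navigator.
https://antispider1.scrape.center/ relies on exactly this check.
'''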
from selenium import webdriver
from selenium.webdriver import ChromeOptions
option = ChromeOptions()
option.add_experimental_option('excludeSwitches', ['enable-automation'])
option.add_experimental_option('useAutomationExtension', False)
browser = webdriver.Chrome(options=option)
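# Ineffective: execute_script only patches the document that is currently loaded.
# Called before get(), the patch is discarded when the new page loads; called after
# get(), the page has already run its check before rendering.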
browser.execute_script('Object.defineProperty(navigator, "webdriver", {get: () => undefined})')
browser.get('https://antispider1.scrape.center/')
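# Use CDP (Chrome DevTools Protocol) to solve this: run a JavaScript statement at the
# very start of every page load that makes navigator.webdriver return undefined.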
option = ChromeOptions()
option.add_experimental_option('excludeSwitches', ['enable-automation'])
option.add_experimental_option('useAutomationExtension', False)
browser = webdriver.Chrome(options=option)
browser.execute_cdp_cmd('Page.addScriptToEvaluateOnNewDocument', {
    'source': 'Object.defineProperty(navigator, "webdriver", {get: () => undefined})'
})
browser.get('https://antispider1.scrape.center/')
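
# ---------------------------------------------------------------------------
# Optional sketch, not part of the original script: the CDP step above could be
# wrapped in small reusable helpers. STEALTH_SOURCE, hide_webdriver_flag and
# is_webdriver_hidden are hypothetical names introduced here for illustration.
# They assume `driver` is a Chromium-based Selenium WebDriver, since
# execute_cdp_cmd is only available on those drivers.
# ---------------------------------------------------------------------------
STEALTH_SOURCE = 'Object.defineProperty(navigator, "webdriver", {get: () => undefined})'


def hide_webdriver_flag(driver):
    """Register STEALTH_SOURCE so it runs at the start of every new document.

    Must be called before driver.get(); returns whatever the CDP command returns.
    """
    return driver.execute_cdp_cmd(
        'Page.addScriptToEvaluateOnNewDocument', {'source': STEALTH_SOURCE}
    )


def is_webdriver_hidden(driver):
    """Return True if navigator.webdriver is undefined in the current page."""
    return driver.execute_script('return navigator.webdriver === undefined')


# Possible usage with the objects defined above:
#   hide_webdriver_flag(browser)
#   browser.get('https://antispider1.scrape.center/')
#   print(is_webdriver_hidden(browser))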