[Python Scraping Tutorial] 3 Essential Wait Mechanisms for Building Dynamic Web Scrapers in Python


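When developing a dynamic web scraper with Python's Selenium package, one very important concept is "waits." What does that mean? Simply put, the scraper waits for the page elements it needs to finish loading before it operates on them. If waits are not handled well, the scraper will frequently raise exceptions or run inefficiently.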

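You may wonder why a Python scraper needs to wait at all. For example, when you open a browser and go to a website, a spinning icon appears next to the tab title, meaning the page is still loading its elements; until it disappears, the page may not show all of its content. If the scraper locates and operates on elements at this point without waiting, an element it needs may not have loaded yet, and an exception is raised.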

Using waits appropriately therefore makes a Python scraper more stable. Taking the PChome website as an example, this article explains three common wait mechanisms:

sleep (forced wait)
Implicit Waits
Explicit Waits

sleep is provided by Python's built-in time module, while Implicit Waits and Explicit Waits are provided by Selenium.

1. sleep (Forced Wait)

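sleep forces the code to stop for the given amount of time. Whether or not the element already exists, and even if the page finishes loading early, the program waits the full time before continuing. It also has to be called again every time a wait is needed. For example: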

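from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager
import time

browser = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
browser.get("https://shopping.pchome.com.tw/")  # Go to the PChome site

time.sleep(20)  # Forced wait: always pauses for the full 20 seconds

search_input = browser.find_element(By.ID, "keyword")  # Search text box
search_input.send_keys("藍芽耳機")  # Type the keyword ("Bluetooth earphones")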

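When you run the example, you will see that even though the search box element has already loaded, the code still waits the full 20 seconds before continuing. And the next time a wait is needed, you have to set it again.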
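To see that cost without opening a browser, here is a minimal pure-Python simulation. The helpers `make_fake_find_element` and `forced_wait_then_find`, and the `ready_after` parameter, are hypothetical stand-ins for a page that finishes loading after a delay; they are not Selenium APIs. The sketch only illustrates that `time.sleep` blocks for its whole duration even when the "element" is ready much sooner.

```python
import time


def make_fake_find_element(ready_after):
    """Hypothetical stand-in for find_element: the "element" only exists
    once `ready_after` seconds have passed since this helper was created."""
    start = time.monotonic()

    def find_element():
        if time.monotonic() - start >= ready_after:
            return "search box"
        raise LookupError("element not loaded yet")

    return find_element


def forced_wait_then_find(find_element, seconds):
    """Mimics the sleep pattern: pause for the whole duration, then look once."""
    time.sleep(seconds)  # blocks for the full time, ready or not
    return find_element()


find_element = make_fake_find_element(ready_after=0.05)
t0 = time.monotonic()
result = forced_wait_then_find(find_element, seconds=0.5)
elapsed = time.monotonic() - t0
# The element was ready after about 0.05 s, but about 0.5 s still passed
print(result, round(elapsed, 1))
```

If the element is still missing after the sleep, the single lookup at the end fails immediately, which is exactly the situation described next.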

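If the element you are locating still cannot be found after the forced wait, an exception (NoSuchElementException) is raised, as in the following example: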

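from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager
import time

browser = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
browser.get("https://shopping.pchome.com.tw/")  # Go to the PChome site

time.sleep(20)  # Forced wait: always pauses for the full 20 seconds

search_input = browser.find_element(By.ID, "hello")  # No element has this ID
search_input.send_keys("藍芽耳機")  # Never reached: find_element raises first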

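Running it raises NoSuchElementException: after the full 20-second sleep, no element with the ID hello can be found.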

2. Implicit Waits

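Like sleep, an Implicit Wait sets a waiting time, but it only waits as long as necessary: whenever find_element (or find_elements) cannot locate an element right away, Selenium keeps retrying until the element appears or the timeout expires, and continues as soon as the element is found. Note that an implicit wait only covers locating elements; it does not wait for an element to become visible or clickable.

In addition, the implicit wait time is global: once set, it applies to every later element lookup on that driver. This makes it harder to customize when some elements need a longer wait than others. For example: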

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager

browser = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
browser.get("https://shopping.pchome.com.tw/")  # Go to the PChome site

browser.implicitly_wait(20)  # Implicit wait: up to 20 s for every element lookup

search_input = browser.find_element(By.ID, "keyword")  # Search text box
search_input.send_keys("藍芽耳機")  # Type the keyword ("Bluetooth earphones")

search_button = browser.find_element(By.ID, "doSearch")  # "Find products" button
search_button.click()  # Click it

titles = browser.find_elements(By.CLASS_NAME, "prod_name")  # Product titles
for title in titles:
    print(title.text)

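In this version, thanks to Selenium's implicit wait, the code continues as soon as the search box element is found instead of always waiting 20 seconds. After navigating to the product results page, there is no need to set the implicit wait again: every lookup, including find_elements for the product titles, waits up to 20 seconds and continues as soon as matching elements are found.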
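The following offline sketch models that behavior with a hypothetical `FakeDriver` whose elements appear after fixed delays (the class, `load_times`, and `NoSuchElement` are illustrative stand-ins, not Selenium code). One global `implicitly_wait` setting applies to every later `find_element` call, and each call returns as soon as its element exists rather than waiting out the full timeout.

```python
import time


class NoSuchElement(Exception):
    """Hypothetical stand-in for Selenium's NoSuchElementException."""


class FakeDriver:
    """Hypothetical driver whose DOM gains elements over time, used only
    to illustrate implicit-wait behavior; this is not Selenium code."""

    def __init__(self, load_times):
        # load_times maps an element ID to the delay (seconds) before it appears
        self._start = time.monotonic()
        self._load_times = load_times
        self._implicit_timeout = 0.0

    def implicitly_wait(self, seconds):
        # One global setting that applies to every later lookup
        self._implicit_timeout = seconds

    def find_element(self, element_id, poll=0.01):
        deadline = time.monotonic() + self._implicit_timeout
        while True:
            delay = self._load_times.get(element_id)
            if delay is not None and time.monotonic() - self._start >= delay:
                return element_id  # found: return immediately, no extra waiting
            if time.monotonic() >= deadline:
                raise NoSuchElement(element_id)  # timeout expired
            time.sleep(poll)  # not there yet: retry shortly


driver = FakeDriver({"keyword": 0.05, "doSearch": 0.1})
driver.implicitly_wait(2)  # set once, used by both lookups below

t0 = time.monotonic()
driver.find_element("keyword")
driver.find_element("doSearch")
elapsed = time.monotonic() - t0
print(round(elapsed, 2))  # far below 2 s: each lookup returns once its element exists
```

Because the timeout is a single driver-wide value, giving one slow element a longer wait means changing it for every lookup, which is the customization limit mentioned above.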

If the located element cannot be found within the implicit wait time, an exception is likewise raised, as in the following example:

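from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager

browser = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
browser.get("https://shopping.pchome.com.tw/")  # Go to the PChome site

browser.implicitly_wait(20)  # Implicit wait: up to 20 s for every element lookup

search_input = browser.find_element(By.ID, "hello")  # No element has this ID
search_input.send_keys("藍芽耳機")  # Never reached: find_element raises first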

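Running it raises NoSuchElementException once the 20-second implicit wait expires without the element being found.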

3. Explicit Waits

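Explicit Waits provide two methods, until() and until_not(), with the following syntax:

until(): waits until the given condition is met.

WebDriverWait(driver, timeout, poll_frequency=0.5, ignored_exceptions=None).until(method, message="")

until_not(): waits until the given condition is no longer met.

WebDriverWait(driver, timeout, poll_frequency=0.5, ignored_exceptions=None).until_not(method, message="")

Here timeout is the maximum wait time in seconds, poll_frequency is how often the condition is checked, ignored_exceptions lists exception classes to ignore while waiting (NoSuchElementException by default), method is usually an expected_conditions condition, and message is the error message of the TimeoutException raised on timeout.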

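Explicit Waits let you set a maximum wait time and a condition for a specific element. As soon as the condition is met, the code continues; otherwise Selenium keeps checking until the maximum time is reached, and if the condition is still not met, it raises TimeoutException. For example: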

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By


browser = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
browser.get("https://shopping.pchome.com.tw/")  # Go to the PChome site

# Explicit wait
locator = (By.ID, "keyword")  # Locator
search_input = WebDriverWait(browser, 10).until(
    EC.presence_of_element_located(locator),
    "Specified element not found"  # Custom TimeoutException message
)

search_input.send_keys("藍芽耳機")  # Type the keyword ("Bluetooth earphones")

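The example first imports WebDriverWait, expected_conditions, and By from Selenium. The WebDriverWait(...).until(...) call then applies an explicit wait: as soon as until() finds the condition "an element with the ID keyword is present in the HTML source (presence_of_element_located)" satisfied, the code continues and types the keyword "藍芽耳機" (Bluetooth earphones). Otherwise it waits up to 10 seconds, and if the condition is still not met, it raises a TimeoutException carrying the custom error message, as in the following example: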

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By


browser = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
browser.get("https://shopping.pchome.com.tw/")  # Go to the PChome site

# Explicit wait
locator = (By.ID, "hello")  # No element has this ID
search_input = WebDriverWait(browser, 10).until(
    EC.presence_of_element_located(locator),
    "Specified element not found"  # Custom TimeoutException message
)

search_input.send_keys("藍芽耳機")  # Never reached: until() times out first

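Running it raises a TimeoutException after 10 seconds, carrying the custom error message passed to until().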

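Compared with Implicit Waits, Explicit Waits target a specific element with its own timeout and condition, which gives you a great deal of flexibility: you can wait for an element to be visible or clickable, not just present. As soon as the condition is met, the code continues. This makes Explicit Waits the recommended wait mechanism when developing Python dynamic web scrapers.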
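To make the until()/until_not() semantics concrete, here is a simplified, hypothetical `SimpleWait` class that models how a WebDriverWait-style loop polls a condition. It is an illustration, not Selenium's source: `WaitTimeout` stands in for TimeoutException, and `LookupError` stands in for the default ignored NoSuchElementException. A truthy result ends `until()` and is returned; `until_not()` ends on a falsy result or on an ignored exception.

```python
import time


class WaitTimeout(Exception):
    """Stand-in for Selenium's TimeoutException in this sketch."""


class SimpleWait:
    """Hypothetical, simplified model of how WebDriverWait polls a condition.
    Parameters mirror WebDriverWait(driver, timeout, poll_frequency, ignored_exceptions)."""

    def __init__(self, driver, timeout, poll_frequency=0.5, ignored_exceptions=(LookupError,)):
        self.driver = driver
        self.timeout = timeout
        self.poll_frequency = poll_frequency
        self.ignored = tuple(ignored_exceptions)

    def until(self, condition, message=""):
        end = time.monotonic() + self.timeout
        while True:
            try:
                value = condition(self.driver)
                if value:  # any truthy result means "condition met"
                    return value
            except self.ignored:
                pass  # an ignored exception just means "not yet"
            if time.monotonic() > end:
                raise WaitTimeout(message)
            time.sleep(self.poll_frequency)

    def until_not(self, condition, message=""):
        end = time.monotonic() + self.timeout
        while True:
            try:
                value = condition(self.driver)
                if not value:  # a falsy result means "condition no longer met"
                    return value
            except self.ignored:
                return True  # an ignored exception also counts as "no longer met"
            if time.monotonic() > end:
                raise WaitTimeout(message)
            time.sleep(self.poll_frequency)


start = time.monotonic()


def becomes_ready(driver):
    # Hypothetical condition that turns true about 0.1 s after start
    return time.monotonic() - start > 0.1


print(SimpleWait(None, timeout=1, poll_frequency=0.02).until(becomes_ready))  # True

try:
    SimpleWait(None, timeout=0.1, poll_frequency=0.02).until(lambda d: False, "Element not found")
except WaitTimeout as exc:
    print("Timed out:", exc)  # Timed out: Element not found
```

The loop returns on the first successful check, so a condition that becomes true quickly costs only a few polling intervals, while the timeout only matters in the failure case.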

4. Expected Conditions

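Now that you know how to use Selenium's Explicit Waits, you are probably wondering which conditions you can specify. Here are several Expected Conditions commonly used when developing Python dynamic web scrapers with Selenium (the snippets assume the imports from the Explicit Waits examples above):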

title_is(title)

Checks whether the text of the page's <title> tag exactly matches the given title.

title = WebDriverWait(browser, 10).until(
    EC.title_is("PChome 線上購物")
)
print(title)  # True

title_contains(title)

Checks whether the text of the page's <title> tag contains the given string.

title = WebDriverWait(browser, 10).until(
    EC.title_contains("PChome")
)
print(title)  # True
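The difference between the two title checks is exact equality versus substring matching. The sketch below re-creates them as simplified condition factories against a hypothetical driver object that only has a `title` attribute; it models the documented behavior and is not Selenium's code.

```python
from types import SimpleNamespace


def title_is(title):
    # Condition factory: exact match against the page title
    return lambda driver: driver.title == title


def title_contains(title):
    # Condition factory: substring match against the page title
    return lambda driver: title in driver.title


# Hypothetical driver that only exposes a .title attribute
driver = SimpleNamespace(title="PChome 線上購物")

print(title_is("PChome 線上購物")(driver))  # True
print(title_is("PChome")(driver))  # False: not an exact match, so a wait would keep polling
print(title_contains("PChome")(driver))  # True
```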

presence_of_element_located(locator)

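Checks whether a single element matching the locator exists in the HTML source (the DOM); when it does, the element itself is returned.

locator = (By.ID, "keyword")

keyword = WebDriverWait(browser, 10).until(
    EC.presence_of_element_located(locator)
)
keyword.send_keys("藍芽耳機")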
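Conceptually, each expected condition is a factory: you call it with arguments such as a locator, and it returns a function that the wait loop calls repeatedly with the driver. The sketch below imitates presence_of_element_located using a hypothetical dict-backed `FakeDriver` and a `NoSuchElement` stand-in exception; it is a simplified illustration, not Selenium's source.

```python
class NoSuchElement(Exception):
    """Stand-in for Selenium's NoSuchElementException."""


class FakeDriver:
    """Hypothetical driver backed by a dict of element IDs -> element objects."""

    def __init__(self, elements):
        self.elements = elements

    def find_element(self, by, value):
        if value not in self.elements:
            raise NoSuchElement(value)
        return self.elements[value]


def presence_of_element_located(locator):
    """Condition factory: returns a predicate that a wait loop calls with the driver."""
    def _predicate(driver):
        # Raises if absent; a WebDriverWait-style loop ignores that and retries
        return driver.find_element(*locator)
    return _predicate


condition = presence_of_element_located(("id", "keyword"))
driver = FakeDriver({"keyword": "<input id='keyword'>"})
print(condition(driver))  # <input id='keyword'>
```

The returned element is truthy, which is why until() can hand it straight back to your code for send_keys().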

presence_of_all_elements_located(locator)

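Checks whether at least one element matching the locator exists in the HTML source; it returns a list of all matching elements.

locator = (By.LINK_TEXT, "24h購物")

link_text = WebDriverWait(browser, 10).until(
    EC.presence_of_all_elements_located(locator)
)
print(link_text)  # List of all matching elements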
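The "at least one" rule follows from how truthiness works: find_elements returns an empty list instead of raising, and an empty list is falsy, so the wait keeps polling until the list is non-empty. Here is a simplified model with a hypothetical `FakeDriver` (not Selenium's code):

```python
class FakeDriver:
    """Hypothetical driver: maps link text -> list of matching elements."""

    def __init__(self, links):
        self.links = links

    def find_elements(self, by, value):
        # Like find_elements, this never raises; it may return []
        return self.links.get(value, [])


def presence_of_all_elements_located(locator):
    # Condition factory: the raw list is the result, so [] counts as "not yet"
    return lambda driver: driver.find_elements(*locator)


condition = presence_of_all_elements_located(("link text", "24h購物"))
print(condition(FakeDriver({})))  # [] -> falsy, so a wait loop keeps polling
print(condition(FakeDriver({"24h購物": ["<a>24h購物</a>"]})))  # ['<a>24h購物</a>']
```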

invisibility_of_element_located(locator)

Checks whether the located element is either absent from the HTML source or not visible on the page.

locator = (By.ID, "hello")

non_exist = WebDriverWait(browser, 10).until(
    EC.invisibility_of_element_located(locator)
)
print(non_exist)  # True
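This condition is satisfied in two different ways, which is why the return value varies: an absent element yields True, while a present but hidden element is returned itself. The simplified model below uses hypothetical `FakeDriver`/`FakeElement` classes, and "spinner" is a made-up ID for illustration; it is not Selenium's source.

```python
class NoSuchElement(Exception):
    """Stand-in for Selenium's NoSuchElementException."""


class FakeElement:
    def __init__(self, displayed):
        self.displayed = displayed

    def is_displayed(self):
        return self.displayed


class FakeDriver:
    def __init__(self, elements):
        self.elements = elements

    def find_element(self, by, value):
        if value not in self.elements:
            raise NoSuchElement(value)
        return self.elements[value]


def invisibility_of_element_located(locator):
    def _predicate(driver):
        try:
            element = driver.find_element(*locator)
        except NoSuchElement:
            return True  # absent from the HTML counts as invisible
        return element if not element.is_displayed() else False
    return _predicate


hidden = FakeElement(displayed=False)
driver = FakeDriver({"spinner": hidden, "keyword": FakeElement(displayed=True)})

print(invisibility_of_element_located(("id", "hello"))(driver))  # True: does not exist
print(invisibility_of_element_located(("id", "spinner"))(driver) is hidden)  # True: hidden element returned
print(invisibility_of_element_located(("id", "keyword"))(driver))  # False: still visible, keep waiting
```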

visibility_of_element_located(locator)

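Checks whether the located element is present in the HTML source and visible on the page; when it is, the element itself is returned.

locator = (By.ID, "keyword")

keyword = WebDriverWait(browser, 10).until(
    EC.visibility_of_element_located(locator)
)
print(keyword)  # The WebElement itself, not True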
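Presence and visibility can disagree: an element can be in the DOM but hidden (for example by CSS). The simplified model below, using hypothetical `FakeDriver`/`FakeElement` classes rather than Selenium, shows presence succeeding while visibility keeps returning False until the element is displayed.

```python
class NoSuchElement(Exception):
    """Stand-in for Selenium's NoSuchElementException."""


class FakeElement:
    """Hypothetical element with a toggleable display state."""

    def __init__(self, displayed):
        self.displayed = displayed

    def is_displayed(self):
        return self.displayed


class FakeDriver:
    """Hypothetical driver backed by a dict of element IDs."""

    def __init__(self, elements):
        self.elements = elements

    def find_element(self, by, value):
        if value not in self.elements:
            raise NoSuchElement(value)
        return self.elements[value]


def presence_of_element_located(locator):
    # Satisfied as soon as the element exists in the DOM
    return lambda driver: driver.find_element(*locator)


def visibility_of_element_located(locator):
    def _predicate(driver):
        element = driver.find_element(*locator)  # absence raises; wait loops retry
        return element if element.is_displayed() else False
    return _predicate


search_box = FakeElement(displayed=False)  # in the DOM but hidden
driver = FakeDriver({"keyword": search_box})
locator = ("id", "keyword")

print(presence_of_element_located(locator)(driver) is search_box)  # True: presence is enough
print(visibility_of_element_located(locator)(driver))  # False: present but hidden
search_box.displayed = True
print(visibility_of_element_located(locator)(driver) is search_box)  # True: now visible
```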

visibility_of(element)

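Checks whether an element already known to exist in the HTML source is visible on the page. Unlike the previous condition, it takes a WebElement rather than a locator.

element = browser.find_element(By.ID, "keyword")

keyword = WebDriverWait(browser, 10).until(
    EC.visibility_of(element)
)
keyword.send_keys("藍芽耳機")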

text_to_be_present_in_element_value(locator, text)

Checks whether the located element's value attribute contains the given text.

locator = (By.ID, "doSearch")

search = WebDriverWait(browser, 10).until(
    EC.text_to_be_present_in_element_value(locator, "找商品")
)
print(search)  # True
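Note that this is a containment check on the value attribute, not an equality check, so a partial string also satisfies it. A simplified model with hypothetical `FakeDriver`/`FakeElement` classes (not Selenium's source):

```python
class FakeElement:
    """Hypothetical element exposing get_attribute() over a dict."""

    def __init__(self, attributes):
        self.attributes = attributes

    def get_attribute(self, name):
        return self.attributes.get(name)


class FakeDriver:
    def __init__(self, elements):
        self.elements = elements

    def find_element(self, by, value):
        return self.elements[value]


def text_to_be_present_in_element_value(locator, text):
    def _predicate(driver):
        value = driver.find_element(*locator).get_attribute("value")
        return value is not None and text in value  # substring, not equality
    return _predicate


driver = FakeDriver({"doSearch": FakeElement({"value": "找商品"})})
locator = ("id", "doSearch")

print(text_to_be_present_in_element_value(locator, "找商品")(driver))  # True
print(text_to_be_present_in_element_value(locator, "商品")(driver))  # True: partial match
print(text_to_be_present_in_element_value(locator, "搜尋")(driver))  # False
```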

element_to_be_clickable(locator)

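Checks whether the located element is visible on the page and enabled, so that it can be clicked; when it is, the element itself is returned.

locator = (By.ID, "doSearch")

search = WebDriverWait(browser, 10).until(
    EC.element_to_be_clickable(locator)
)
print(search)  # The WebElement itself, not True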
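In this context "clickable" means the two checks visible and enabled combined. The simplified model below uses hypothetical `FakeDriver`/`FakeElement` classes rather than a real browser, and does not model other obstacles such as overlapping elements.

```python
class FakeElement:
    """Hypothetical element with display and enabled flags."""

    def __init__(self, displayed=True, enabled=True):
        self.displayed = displayed
        self.enabled = enabled

    def is_displayed(self):
        return self.displayed

    def is_enabled(self):
        return self.enabled


class FakeDriver:
    def __init__(self, elements):
        self.elements = elements

    def find_element(self, by, value):
        return self.elements[value]


def element_to_be_clickable(locator):
    def _predicate(driver):
        element = driver.find_element(*locator)
        if element.is_displayed() and element.is_enabled():
            return element  # the element itself, ready for .click()
        return False
    return _predicate


button = FakeElement(displayed=True, enabled=False)
driver = FakeDriver({"doSearch": button})
condition = element_to_be_clickable(("id", "doSearch"))

print(condition(driver))  # False: visible but disabled
button.enabled = True
print(condition(driver) is button)  # True: visible and enabled
```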

alert_is_present()

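Checks whether an alert dialog is present on the page; when it is, the Alert object is returned.

alert = WebDriverWait(browser, 10).until(
    EC.alert_is_present(),
    "Alert not present"
)
print(alert.text)  # Text shown in the alert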

5. Summary

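That covers the key concepts behind the three wait mechanisms, illustrated with a real Python dynamic web scraper. Used well, they make developing dynamic scrapers easier and your code more stable. The full implementation is available at the GitHub URL below; I hope it helps everyone who is learning.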


GitHub: https://github.com/mikeku1116/python-pchome-scraper


你的Py教練Mike
