[Python Web Scraping Tutorial] An Easy-to-Learn Python Web Scraper That Reads Its Targets Dynamically from a Database

Photo by CardMapr.nl on Unsplash

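Python web scraping has many everyday applications, and stock analysis is one of them: a scraper can automatically collect the share-price data you need for each company. The stocks you follow tend to change as the economy shifts, however, so it matters that the scraper can read its list of stock codes flexibly instead of having them hard-coded.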

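Databases are the most commonly used data storage tool in practice. This article uses SQLite as an example to show how a Python web scraper can dynamically read the stock codes to analyze from a database and then scrape the daily trading data for individual stocks from the Taiwan Stock Exchange (TWSE). The steps are: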

Analyzing the page structure
Developing the Python web scraper
Building the SQLite database
Reading the database from the Python web scraper

1. Analyzing the Page Structure

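Go to the TWSE daily trading data page for individual stocks and enter a stock code. You will see a page like the one in the screenshot below: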

Screenshot from the TWSE daily stock trading page: https://www.twse.com.tw/zh/page/trading/exchange/STOCK_DAY.html

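Because the results depend on the stock code the user enters, the page fetches the matching data from the web server on demand. To see the request behind the page, press F12 to open the developer tools, switch to the Network tab, and click the query button again.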

The first request in the list returns the query results we need. Switch to the Headers tab to see the full request URL.

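As you may have guessed, replacing the date and stockNo (stock code) parameters in that URL lets us scrape the data for whichever company's stock we want to analyze.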
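As a minimal sketch of that idea, the query URL can be assembled from the two parameters with the standard library. The endpoint is the one used in the examples later in this article; the helper name build_stock_day_url is hypothetical and only here for illustration.

```python
from urllib.parse import urlencode

# Endpoint as used in this article's examples.
BASE_URL = 'https://www.twse.com.tw/exchangeReport/STOCK_DAY'


def build_stock_day_url(date, stock_no):
    """Return the query URL for one stock code and one yyyymmdd date.

    build_stock_day_url is a hypothetical helper, not part of any library.
    """
    query = urlencode({'response': 'json', 'date': date, 'stockNo': stock_no})
    return f'{BASE_URL}?{query}'


print(build_stock_day_url('20210806', '2330'))
```

Swapping either argument produces the URL for a different day or a different company without touching the rest of the string.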

2. Developing the Python Web Scraper

Now that we know the URL, install the requests package, which is used to send HTTP requests, with the following command:

$ pip install requests

Open the Visual Studio Code editor, create a file named app.py, and import the modules needed for development:

from datetime import datetime
import requests
import sqlite3

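Next, use the requests module to send a request to the query URL, convert the response into a dictionary with the json() method, and read its data field (the query results), as in the following example: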

from datetime import datetime
import requests
import sqlite3


response = requests.get(
    'https://www.twse.com.tw/exchangeReport/STOCK_DAY?response=json&date=20210806&stockNo=2330')
response_data = response.json()['data']

print(response_data)

Output:

[['110/08/02', '24,948,096', '14,593,329,483', '583.00', '590.00', '580.00', '590.00', '+10.00', '19,791'], 
['110/08/03', '28,104,984', '16,655,446,605', '594.00', '594.00', '590.00', '594.00', '+4.00', '20,221'], 
['110/08/04', '23,714,971', '14,132,827,829', '598.00', '598.00', '594.00', '596.00', '+2.00', '18,228'], 
['110/08/05', '15,673,765', '9,343,887,536', '598.00', '598.00', '593.00', '596.00', ' 0.00', '15,495'], 
['110/08/06', '13,994,018', '8,275,142,201', '596.00', '596.00', '588.00', '591.00', '-5.00', '13,742']]
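Each row in the output above is a plain list of strings, so it can help to label the columns and convert the numeric text before analysis. The sketch below assumes the columns follow the table headers on the TWSE page (date, trade volume, trade value, open, high, low, close, change, transactions); the English column names and both helper functions are illustrative choices, not part of the TWSE response.

```python
# Assumed column order, based on the table headers of the TWSE page.
COLUMNS = ['date', 'volume', 'turnover', 'open', 'high', 'low',
           'close', 'change', 'transactions']


def label_row(row):
    """Pair each value in a row with its (assumed) column name."""
    return dict(zip(COLUMNS, row))


def to_number(text):
    """Convert strings such as '24,948,096' or '583.00' to a float."""
    return float(text.replace(',', ''))


# Sample row copied from the output above.
sample = ['110/08/06', '13,994,018', '8,275,142,201', '596.00', '596.00',
          '588.00', '591.00', '-5.00', '13,742']

labeled = label_row(sample)
print(labeled['close'])
print(to_number(labeled['volume']))
```

The thousands separators are the only thing that stops float() from parsing these strings directly, which is why to_number removes the commas first.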

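Since we only want the current day's trading data, we can use a Python list comprehension to loop over the response rows above and keep the row that contains the day's date, as in the following example: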

from datetime import datetime
import requests
import sqlite3


response = requests.get(
    'https://www.twse.com.tw/exchangeReport/STOCK_DAY?response=json&date=20210806&stockNo=2330')
response_data = response.json()['data']

result = [data for data in response_data if '110/08/06' in data]

print(result)

Output:

[['110/08/06', '13,994,018', '8,275,142,201', '596.00', '596.00', '588.00', '591.00', '-5.00', '13,742']]

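Of course, the scraper should fill in the current date automatically. TWSE rows use the Minguo (ROC) calendar, whose year is the Gregorian year minus 1911, so we need the date in both forms. The datetime module can get the current date and format it as required, as in lines 6 and 7 of the following example: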

from datetime import datetime
import requests
import sqlite3


today = datetime.now().strftime('%Y%m%d')  # Gregorian date (yyyymmdd)
chinese_today = f"{(datetime.now().year - 1911)}/{datetime.now().strftime('%m/%d')}"  # Minguo (ROC) date (yyy/mm/dd)

response = requests.get(
    'https://www.twse.com.tw/exchangeReport/STOCK_DAY?response=json&date=20210806&stockNo=2330')
response_data = response.json()['data']

result = [data for data in response_data if '110/08/06' in data]

print(result)

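With that in place, replace the date parameter in the request URL on line 10 and the Minguo date on line 13 with the variables, as in the following example: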

from datetime import datetime
import requests
import sqlite3


today = datetime.now().strftime('%Y%m%d')  # Gregorian date (yyyymmdd)
chinese_today = f"{(datetime.now().year - 1911)}/{datetime.now().strftime('%m/%d')}"  # Minguo (ROC) date (yyy/mm/dd)

response = requests.get(
    f'https://www.twse.com.tw/exchangeReport/STOCK_DAY?response=json&date={today}&stockNo=2330')
response_data = response.json()['data']

result = [data for data in response_data if chinese_today in data]

print(result)
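The two date formats used above can also be wrapped in small helpers that take a single date object, which makes them easy to check against a known day and avoids calling datetime.now() more than once. This is only a sketch; to_query_date and to_roc_date are hypothetical names introduced for illustration.

```python
from datetime import date


def to_query_date(day):
    """Format a date as yyyymmdd for the date parameter of the request URL."""
    return f'{day:%Y%m%d}'


def to_roc_date(day):
    """Format a date as yyy/mm/dd in the Minguo (ROC) calendar."""
    # The Minguo year is the Gregorian year minus 1911.
    return f'{day.year - 1911}/{day:%m/%d}'


day = date(2021, 8, 6)  # use date.today() in the real scraper
print(to_query_date(day), to_roc_date(day))
```

Passing date.today() instead of a fixed date gives the same values as the today and chinese_today variables in the example above.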

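In addition, to make the data easier to identify later, if there is trading data for the day, insert the stock code at the first position of the list, as in lines 15 and 16 of the following example: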

from datetime import datetime
import requests
import sqlite3


today = datetime.now().strftime('%Y%m%d')  # Gregorian date (yyyymmdd)
chinese_today = f"{(datetime.now().year - 1911)}/{datetime.now().strftime('%m/%d')}"  # Minguo (ROC) date (yyy/mm/dd)

response = requests.get(
    f'https://www.twse.com.tw/exchangeReport/STOCK_DAY?response=json&date={today}&stockNo=2330')
response_data = response.json()['data']

result = [data for data in response_data if chinese_today in data]

if result:  # today's row was found
    result[0].insert(0, '2330')

print(result)

Output:

[['2330', '110/08/06', '13,994,018', '8,275,142,201', '596.00', '596.00', '588.00', '591.00', '-5.00', '13,742']]
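The example above reads response.json()['data'] directly, which raises a KeyError if the response has no data key. As a hedged sketch, the row lookup can be moved into a function that works on the already-parsed dictionary and returns None when nothing matches. The name pick_row is hypothetical, and the assumption that a response without results simply lacks the data key has not been verified against the TWSE service.

```python
def pick_row(payload, roc_date, stock_no):
    """Return [stock_no, *row] for the row dated roc_date, or None if absent."""
    # .get() tolerates a payload without a 'data' key instead of raising KeyError.
    for row in payload.get('data', []):
        if row and row[0] == roc_date:
            return [stock_no, *row]  # new list, so the original row is untouched
    return None


# Sample payload trimmed from the output shown earlier in this article.
sample_payload = {
    'data': [
        ['110/08/05', '15,673,765', '9,343,887,536', '598.00', '598.00',
         '593.00', '596.00', ' 0.00', '15,495'],
        ['110/08/06', '13,994,018', '8,275,142,201', '596.00', '596.00',
         '588.00', '591.00', '-5.00', '13,742'],
    ]
}

print(pick_row(sample_payload, '110/08/06', '2330'))
print(pick_row({}, '110/08/06', '2330'))
```

In the scraper this would be called as pick_row(response.json(), chinese_today, '2330'). Unlike the comprehension above, it compares only the first column, which is where the date appears in the rows shown in this article.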

3. Building the SQLite Database

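So far we have only scraped the day's trading data for a single company. Next, we will have the scraper dynamically read multiple stock codes of our own choosing from a database.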

Before that, we need to create a database. You can download the DB Browser for SQLite tool and use it to create the SQLite database.

Click "New Database", name it "Stocks.db", and save it in the project folder.

Next, name the table "StockNumbers" and add a "StockNo" field of type "TEXT".

After clicking OK, right-click the "StockNumbers" table and choose "Browse Table".

Now click "+" to add three stock codes (this article uses 2330, 2409, and 2382), then click "Write Changes" to save them to the table.
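If you would rather not use a GUI tool, the same table can be created from Python with the sqlite3 module. This is a minimal sketch under the same naming as above (StockNumbers table, StockNo TEXT column); the helper create_stock_table is hypothetical, and the demo uses an in-memory database so that running it does not create or modify a file.

```python
import sqlite3


def create_stock_table(conn, stock_codes):
    """Create the StockNumbers table if needed and insert the given codes."""
    conn.execute('CREATE TABLE IF NOT EXISTS StockNumbers (StockNo TEXT)')
    conn.executemany(
        'INSERT INTO StockNumbers (StockNo) VALUES (?)',
        [(code,) for code in stock_codes])  # one single-element tuple per row
    conn.commit()


# ':memory:' keeps the demo off the disk; pass 'Stocks.db' to build the real file.
conn = sqlite3.connect(':memory:')
create_stock_table(conn, ['2330', '2409', '2382'])
print(conn.execute('SELECT StockNo FROM StockNumbers').fetchall())
conn.close()
```

Note that calling the function twice on the same file inserts the codes twice, because this sketch does not declare StockNo as unique.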

4. Reading the Database from the Python Web Scraper

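With the SQLite database in place, open app.py and use the sqlite3 module to connect to the database and run the SQL statement that selects the StockNo column from the StockNumbers table, as in lines 9 to 11 of the following example: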

from datetime import datetime
import requests
import sqlite3


today = datetime.now().strftime('%Y%m%d')  # Gregorian date (yyyymmdd)
chinese_today = f"{(datetime.now().year - 1911)}/{datetime.now().strftime('%m/%d')}"  # Minguo (ROC) date (yyy/mm/dd)

conn = sqlite3.connect('Stocks.db')
cursor = conn.cursor()
cursor.execute('SELECT StockNo FROM StockNumbers')

response = requests.get(
    f'https://www.twse.com.tw/exchangeReport/STOCK_DAY?response=json&date={today}&stockNo=2330')
response_data = response.json()['data']

result = [data for data in response_data if chinese_today in data]

if result:  # today's row was found
    result[0].insert(0, '2330')

print(result)

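Next, call the cursor's fetchall() method to retrieve every stock code returned by the query. Each row comes back as a tuple, so stock_no[0] is the code itself. Loop over the rows and substitute each code for the hard-coded one, as in lines 13, 15, and 21 of the following example: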

from datetime import datetime
import requests
import sqlite3


today = datetime.now().strftime('%Y%m%d')  # Gregorian date (yyyymmdd)
chinese_today = f"{(datetime.now().year - 1911)}/{datetime.now().strftime('%m/%d')}"  # Minguo (ROC) date (yyy/mm/dd)

conn = sqlite3.connect('Stocks.db')
cursor = conn.cursor()
cursor.execute('SELECT StockNo FROM StockNumbers')

for stock_no in cursor.fetchall():
    response = requests.get(
        f'https://www.twse.com.tw/exchangeReport/STOCK_DAY?response=json&date={today}&stockNo={stock_no[0]}')
    response_data = response.json()['data']

    result = [data for data in response_data if chinese_today in data]

    if result:  # today's row was found
        result[0].insert(0, stock_no[0])

print(result)
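The stock_no[0] indexing in the loop above is needed because fetchall() returns a list of tuples, one per row, even when only one column is selected. A small sketch of flattening those rows into a plain list of codes follows; load_stock_codes is a hypothetical helper, and the in-memory database only stands in for Stocks.db so the snippet runs on its own.

```python
import sqlite3


def load_stock_codes(conn):
    """Return the StockNo column as a flat list of strings."""
    cursor = conn.execute('SELECT StockNo FROM StockNumbers')
    # Each row is a one-element tuple such as ('2330',); keep only the value.
    return [row[0] for row in cursor.fetchall()]


# In-memory stand-in for Stocks.db, filled with the codes used in this article.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE StockNumbers (StockNo TEXT)')
conn.executemany('INSERT INTO StockNumbers VALUES (?)',
                 [('2330',), ('2409',), ('2382',)])

codes = load_stock_codes(conn)
conn.close()
print(codes)
```

With a flat list, the scraping loop can iterate over the codes directly and no longer needs the [0] index.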

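Finally, because result is overwritten on every pass through the loop, only the last stock's data is left once the loop ends. To keep the data for all three stock codes in one list, define a separate combined list and append each stock's scraped row to it, as in lines 13 and 23 of the following example: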

from datetime import datetime
import requests
import sqlite3


today = datetime.now().strftime('%Y%m%d')  # Gregorian date (yyyymmdd)
chinese_today = f"{(datetime.now().year - 1911)}/{datetime.now().strftime('%m/%d')}"  # Minguo (ROC) date (yyy/mm/dd)

conn = sqlite3.connect('Stocks.db')
cursor = conn.cursor()
cursor.execute('SELECT StockNo FROM StockNumbers')

combined = []  # combined results for all stock codes
for stock_no in cursor.fetchall():
    response = requests.get(
        f'https://www.twse.com.tw/exchangeReport/STOCK_DAY?response=json&date={today}&stockNo={stock_no[0]}')
    response_data = response.json()['data']

    result = [data for data in response_data if chinese_today in data]

    if result:  # today's row was found
        result[0].insert(0, stock_no[0])
        combined.append(result[0])

conn.close()  # release the database connection
print(combined)

Output:

[['2330', '110/08/06', '13,994,018', '8,275,142,201', '596.00', '596.00', '588.00', '591.00', '-5.00', '13,742'], 
['2409', '110/08/06', '164,723,579', '3,591,785,901', '22.10', '22.10', '21.60', '21.60', '-0.70', '42,887'], 
['2382', '110/08/06', '5,748,237', '441,589,670', '76.90', '77.50', '76.20', '77.10', '+0.70', '3,779']]

5. Summary

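Once the scraper reads its stock codes from a database, changing the stocks you follow only requires editing the data in the database, not the scraper's code, which makes the project more flexible. If you later want to store the scraped data in a database, see the articles [Python爬蟲教學]輕鬆學會Python網頁爬蟲與MySQL資料庫的整合方式 (on integrating a Python web scraper with a MySQL database) or [Pandas教學]快速掌握Pandas套件讀寫SQLite資料庫的重要方法 (on reading and writing SQLite databases with Pandas).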
