Python 100 project #51: Web scraping – Sunshine duration across countries

It’s said that London is always covered in cloud. Having moved to London roughly two years ago, I realized that is actually not the case.

I searched the web and found a very useful Wikipedia page listing the (typically average) sunshine duration for each month of the year. This is a very basic web-scraping task (just one page).

 

Output: a CSV file (sunshine_hours.csv) with one row per city: country name and link, city name and link, then the monthly sunshine figures.

Code:

# -*- coding: utf-8 -*-

import csv
import re
from urllib.parse import urljoin

from bs4 import BeautifulSoup
import requests

base = "https://en.wikipedia.org"
target_url = base + "/wiki/List_of_cities_by_sunshine_duration"
req = requests.get(target_url)  # Wikipedia serves valid HTTPS; no need to disable certificate verification
req.raise_for_status()  # fail early if the page could not be fetched

bs = BeautifulSoup(req.text, "html.parser")

tables = bs.find_all("table", {"class": "wikitable"})

cities_list = []

for table in tables:
    cities = table.find_all("tr")
    for city in cities:
        city_row = []
        # for text data collection. country_name, country_url, city_name, city_url
        for text_elem in city.find_all("td", style=re.compile("text-align:left")):
            elem_text = text_elem.get_text(strip=True)  # drop the trailing newline inside table cells
            city_row.append(elem_text)
            if text_elem.find("a"):
                city_row.append(urljoin(base, text_elem.find("a").get("href")))
            else:
                city_row.append("")
        # for sunshine hours data in monthly sequence.
        for data_elem in city.find_all("td", style=re.compile("background.*")):
            elem_text = data_elem.get_text(strip=True)
            city_row.append(elem_text)

        if city_row:  # header rows match no cells and would otherwise produce blank lines
            cities_list.append(city_row)

with open('sunshine_hours.csv', 'w', newline='', encoding='utf-8') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerows(cities_list)
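As a quick sanity check on the scraped file, here is a minimal sketch for reading it back, assuming the column layout written above (country name and URL, city name and URL, then the monthly figures). The helper names load_sunshine and annual_hours are my own, not part of the scraper:

```python
import csv

def load_sunshine(path):
    """Read the scraped CSV back into a list of rows, skipping blank rows."""
    with open(path, newline='', encoding='utf-8') as f:
        return [row for row in csv.reader(f) if row]

def annual_hours(row):
    """Sum the twelve monthly columns (indices 4-15), tolerating
    thousands separators such as '2,769'."""
    return sum(float(v.replace(',', '')) for v in row[4:16])
```

This makes it easy to, for example, sort cities by total annual sunshine after scraping.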