Scraping Weather Data: Proof of Concept

Scraping weather data near Zebulon, NC
Tags: Weather, Python, Web-scraping
Author: Steven Wolf
Published: August 8, 2024

My friend Ben Leese was telling me about his most recent project. He has a passion for going through old naturalist’s notebooks, pulling data out of that analog mess, and bringing it into the digital world. He was telling me how weather could affect different bird behaviors, but he only had binary weather data (yes, it rained, or no, it didn’t). Furthermore, that data came from Raleigh, NC rather than Zebulon, NC. While these places are close on the map, weather is even more local than politics. So I said that I’d try to find some better weather data for him.

I found a helpful Python script for scraping weather data from Weather Underground, which I will adapt to my purpose.

import time
import sys

import numpy as np
import pandas as pd
from bs4 import BeautifulSoup as BS
from selenium import webdriver
from great_tables import GT
from scrape_wunderground import *

I will scrape weather data from the BreakingWind station (station code KNCZEBUL74) for 8/1/2024, as I am pretty sure this is the closest station to the desired location.

station_id = "KNCZEBUL74"
date_id = "2024-08-01"

# Show the first 20 rows of the scraped data as a styled table
(
    GT(scrape_wunderground(station_id, date_id).head(20))
    .tab_options(column_labels_background_color="#3B3A3EFF")
)
| timestamps | Temperature | Dew Point | Humidity | Wind Speed | Wind Gust | Pressure | Precip. Rate | Precip. Accum. |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2024-08-01 12:04 AM | 75.0 | 72.0 | 92.0 | 0.0 | 0.0 | 29.92 | 0.0 | 0.0 |
| 2024-08-01 12:09 AM | 75.0 | 72.0 | 92.0 | 0.0 | 0.0 | 29.92 | 0.0 | 0.0 |
| 2024-08-01 12:14 AM | 75.0 | 72.0 | 92.0 | 0.0 | 0.0 | 29.92 | 0.0 | 0.0 |
| 2024-08-01 12:19 AM | 74.8 | 72.0 | 91.0 | 0.0 | 0.0 | 29.92 | 0.0 | 0.0 |
| 2024-08-01 12:24 AM | 74.8 | 72.0 | 91.0 | 0.0 | 0.0 | 29.93 | 0.0 | 0.0 |
| 2024-08-01 12:29 AM | 74.8 | 72.0 | 91.0 | 0.0 | 0.0 | 29.93 | 0.0 | 0.0 |
| 2024-08-01 12:34 AM | 74.7 | 72.0 | 92.0 | 0.0 | 0.0 | 29.93 | 0.0 | 0.0 |
| 2024-08-01 12:39 AM | 74.7 | 72.0 | 92.0 | 0.0 | 0.0 | 29.93 | 0.0 | 0.0 |
| 2024-08-01 12:44 AM | 74.6 | 72.0 | 92.0 | 0.0 | 0.0 | 29.93 | 0.0 | 0.0 |
| 2024-08-01 12:49 AM | 74.5 | 72.6 | 93.0 | 0.0 | 0.0 | 29.93 | 0.0 | 0.0 |
| 2024-08-01 12:54 AM | 74.5 | 73.0 | 93.0 | 0.0 | 0.0 | 29.93 | 0.0 | 0.0 |
| 2024-08-01 12:59 AM | 74.7 | 73.0 | 93.0 | 0.0 | 0.0 | 29.93 | 0.0 | 0.0 |
| 2024-08-01 1:04 AM | 74.7 | 73.0 | 93.0 | 0.0 | 0.0 | 29.92 | 0.0 | 0.0 |
| 2024-08-01 1:09 AM | 74.7 | 72.3 | 92.0 | 0.0 | 0.0 | 29.92 | 0.0 | 0.0 |
| 2024-08-01 1:14 AM | 74.7 | 72.2 | 92.0 | 0.0 | 0.0 | 29.92 | 0.0 | 0.0 |
| 2024-08-01 1:19 AM | 74.6 | 72.0 | 92.0 | 0.0 | 0.0 | 29.92 | 0.0 | 0.0 |
| 2024-08-01 1:24 AM | 74.5 | 72.0 | 92.0 | 0.0 | 0.0 | 29.92 | 0.0 | 0.0 |
| 2024-08-01 1:29 AM | 74.5 | 72.0 | 92.0 | 0.0 | 0.0 | 29.92 | 0.0 | 0.0 |
| 2024-08-01 1:34 AM | 74.5 | 72.9 | 93.0 | 0.0 | 0.0 | 29.92 | 0.0 | 0.0 |
| 2024-08-01 1:39 AM | 74.5 | 73.0 | 93.0 | 0.0 | 0.0 | 29.92 | 0.0 | 0.0 |

And it works! That said, I’ll have to find a different weather station, because this one seems to be fairly new: on 5/29/2024 it has only partial data, with the first reading at 10:34 PM.

date_id = "2024-05-29"
# Same table for 5/29/2024
(
    GT(scrape_wunderground(station_id, date_id).head(20))
    .tab_options(column_labels_background_color="#3B3A3EFF")
)
| timestamps | Temperature | Dew Point | Humidity | Wind Speed | Wind Gust | Pressure | Precip. Rate | Precip. Accum. |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2024-05-29 10:34 PM | 71.2 | 51.2 | 50.0 | 0.2 | 0.2 | 29.93 | 0.0 | 0.0 |
| 2024-05-29 10:39 PM | 71.7 | 53.9 | 54.0 | 1.4 | 9.0 | 29.94 | 0.1 | 0.1 |
| 2024-05-29 10:44 PM | 73.1 | 55.1 | 53.0 | 0.8 | 10.0 | 29.94 | 0.1 | 0.1 |
| 2024-05-29 10:49 PM | 72.8 | 54.9 | 53.0 | 0.0 | 10.0 | 29.94 | 0.1 | 0.1 |
| 2024-05-29 10:54 PM | 72.4 | 53.5 | 52.0 | 0.0 | 10.0 | 29.94 | 0.1 | 0.1 |
| 2024-05-29 10:59 PM | 72.2 | 53.0 | 52.0 | 0.0 | 10.0 | 29.94 | 0.1 | 0.1 |
| 2024-05-29 11:04 PM | 72.1 | 53.0 | 52.0 | 0.0 | 10.0 | 29.95 | 0.1 | 0.1 |
| 2024-05-29 11:09 PM | 72.0 | 53.0 | 52.0 | 0.0 | 10.0 | 29.95 | 0.1 | 0.1 |
| 2024-05-29 11:14 PM | 71.9 | 52.1 | 51.0 | 0.0 | 10.0 | 29.95 | 0.1 | 0.1 |
| 2024-05-29 11:19 PM | 71.8 | 51.8 | 51.0 | 0.0 | 10.0 | 29.95 | 0.1 | 0.1 |
| 2024-05-29 11:24 PM | 71.6 | 51.6 | 51.0 | 0.0 | 10.0 | 29.95 | 0.1 | 0.1 |
| 2024-05-29 11:29 PM | 71.6 | 51.0 | 50.0 | 0.0 | 10.0 | 29.95 | 0.1 | 0.1 |
| 2024-05-29 11:34 PM | 71.4 | 51.0 | 49.0 | 0.0 | 10.0 | 29.95 | 0.1 | 0.1 |
| 2024-05-29 11:39 PM | 71.3 | 51.0 | 49.0 | 0.0 | 8.0 | 29.95 | 0.1 | 0.1 |
| 2024-05-29 11:44 PM | 71.2 | 50.1 | 48.0 | 0.0 | 2.0 | 29.95 | 0.0 | 0.1 |
| 2024-05-29 11:49 PM | 71.1 | 50.0 | 47.0 | 0.0 | 1.3 | 29.95 | 0.0 | 0.1 |
| 2024-05-29 11:54 PM | 71.1 | 50.0 | 47.0 | 0.0 | 0.0 | 29.94 | 0.0 | 0.1 |
| 2024-05-29 11:59 PM | 70.9 | 50.0 | 47.0 | 0.0 | 0.0 | 29.94 | 0.0 | 0.1 |

And 5/28/2024 has no data at all: the table comes back with column headers but no rows.

date_id = "2024-05-28"
# Same table for 5/28/2024
(
    GT(scrape_wunderground(station_id, date_id).head(20))
    .tab_options(column_labels_background_color="#3B3A3EFF")
)
| timestamps | Temperature | Dew Point | Humidity | Wind Speed | Wind Gust | Pressure | Precip. Rate | Precip. Accum. |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
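
Checking dates one at a time gets tedious, so one option is to loop over a range of dates and count the rows that come back for each. The sketch below is a minimal version of that idea and is not part of the script I found: rows_per_day and fake_fetch are hypothetical names, and fetch stands for any function that is called the way scrape_wunderground(station_id, date_id) is called above. At one reading every 5 minutes, a complete day would have roughly 288 rows.

```python
import time
from datetime import date, timedelta


def rows_per_day(fetch, station_id, start, end, pause_s=0.0):
    """Count the rows that fetch(station_id, "YYYY-MM-DD") returns for each day.

    `fetch` is any callable with the same two-argument call shape used for
    scrape_wunderground in this post; its result only needs to support len().
    """
    counts = {}
    day = start
    while day <= end:
        date_id = day.isoformat()  # e.g. "2024-05-29"
        counts[date_id] = len(fetch(station_id, date_id))
        time.sleep(pause_s)  # space out requests when really scraping
        day += timedelta(days=1)
    return counts


# Made-up stand-in for the scraper so the sketch runs offline. The 0 and 18
# match the row counts of the two tables above; 288 (24 h x 12 readings/h)
# is an assumed "complete day", not a scraped result.
def fake_fetch(station_id, date_id):
    n_rows = {"2024-05-28": 0, "2024-05-29": 18}.get(date_id, 288)
    return [None] * n_rows


counts = rows_per_day(fake_fetch, "KNCZEBUL74", date(2024, 5, 28), date(2024, 5, 30))
for date_id, n in counts.items():
    print(date_id, n, f"{n / 288:.0%} of a full day")
```

In real use, scrape_wunderground would take the place of fake_fetch, with pause_s set to a second or more so the requests are spaced out.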

I will have to find a different nearby weather station to get Ben weather data for the time period he is interested in (the mid-1970s). Once I succeed at that, I will have to aggregate the 5-minute data into daily data. But once that’s done, my friend should have more than enough weather data to help him with his model.
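
The aggregation step could look something like the sketch below. It assumes the scraped DataFrame has the column names shown in the tables above, that the timestamps are strings like "2024-08-01 12:04 AM", and that "Precip. Accum." is a running total that resets each day (so the daily maximum is the daily total). None of that is confirmed by the scraping script, and daily_summary and its output column names are hypothetical.

```python
import pandas as pd


def daily_summary(df):
    """Collapse 5-minute readings into one row per calendar day."""
    out = df.copy()
    # Assumption: timestamps look like "2024-08-01 12:04 AM", as in the tables.
    out["timestamps"] = pd.to_datetime(out["timestamps"], format="%Y-%m-%d %I:%M %p")
    day = out["timestamps"].dt.strftime("%Y-%m-%d").rename("date")
    return out.groupby(day).agg(
        temp_min=("Temperature", "min"),
        temp_mean=("Temperature", "mean"),
        temp_max=("Temperature", "max"),
        # Assumption: the accumulation column is a running daily total,
        # so its maximum is the total for that day.
        precip_total=("Precip. Accum.", "max"),
        # Number of readings, useful for spotting partial days.
        n_obs=("Temperature", "count"),
    )


# A few rows copied from the tables above, just to exercise the function.
sample = pd.DataFrame(
    {
        "timestamps": [
            "2024-05-29 10:34 PM",
            "2024-05-29 10:39 PM",
            "2024-05-29 11:59 PM",
            "2024-08-01 12:04 AM",
            "2024-08-01 12:19 AM",
        ],
        "Temperature": [71.2, 71.7, 70.9, 75.0, 74.8],
        "Precip. Accum.": [0.0, 0.1, 0.1, 0.0, 0.0],
    }
)

daily = daily_summary(sample)
print(daily)
```

Keeping n_obs in the output makes it easy to drop or flag days like 5/29/2024, where the station only reported for part of the day.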