import time
import sys
import numpy as np
import pandas as pd
from bs4 import BeautifulSoup as BS
from selenium import webdriver
from great_tables import GT
from scrape_wunderground import *
My friend, Ben Leese, was telling me about his most recent project. He has a passion for going through old naturalists’ notebooks, pulling data out of that analog mess, and bringing it into the digital world. He wanted to look at how weather could affect different bird behaviors, but he only had binary weather data (yes, it rained / no, it didn’t rain). Furthermore, that data was from Raleigh, NC rather than Zebulon, NC. While these places are close on the map, weather is even more local than politics. So I said that I’d try to find some better weather data for him.
I found this helpful Python script for scraping weather data from Weather Underground, which I will adapt to my purpose.
I will scrape weather data from the BreakingWind station (station ID KNCZEBUL74) for 8/1/2024, as I am pretty sure this is the closest station to the desired location.
station_id = "KNCZEBUL74"
date_id = "2024-08-01"

(
    GT(scrape_wunderground(station_id, date_id).head(20))
    .tab_options(
        column_labels_background_color="#3B3A3EFF",
    )
)
timestamps | Temperature | Dew Point | Humidity | Wind Speed | Wind Gust | Pressure | Precip. Rate | Precip. Accum. |
2024-08-01 12:04 AM | 75.0 | 72.0 | 92.0 | 0.0 | 0.0 | 29.92 | 0.0 | 0.0 |
2024-08-01 12:09 AM | 75.0 | 72.0 | 92.0 | 0.0 | 0.0 | 29.92 | 0.0 | 0.0 |
2024-08-01 12:14 AM | 75.0 | 72.0 | 92.0 | 0.0 | 0.0 | 29.92 | 0.0 | 0.0 |
2024-08-01 12:19 AM | 74.8 | 72.0 | 91.0 | 0.0 | 0.0 | 29.92 | 0.0 | 0.0 |
2024-08-01 12:24 AM | 74.8 | 72.0 | 91.0 | 0.0 | 0.0 | 29.93 | 0.0 | 0.0 |
2024-08-01 12:29 AM | 74.8 | 72.0 | 91.0 | 0.0 | 0.0 | 29.93 | 0.0 | 0.0 |
2024-08-01 12:34 AM | 74.7 | 72.0 | 92.0 | 0.0 | 0.0 | 29.93 | 0.0 | 0.0 |
2024-08-01 12:39 AM | 74.7 | 72.0 | 92.0 | 0.0 | 0.0 | 29.93 | 0.0 | 0.0 |
2024-08-01 12:44 AM | 74.6 | 72.0 | 92.0 | 0.0 | 0.0 | 29.93 | 0.0 | 0.0 |
2024-08-01 12:49 AM | 74.5 | 72.6 | 93.0 | 0.0 | 0.0 | 29.93 | 0.0 | 0.0 |
2024-08-01 12:54 AM | 74.5 | 73.0 | 93.0 | 0.0 | 0.0 | 29.93 | 0.0 | 0.0 |
2024-08-01 12:59 AM | 74.7 | 73.0 | 93.0 | 0.0 | 0.0 | 29.93 | 0.0 | 0.0 |
2024-08-01 1:04 AM | 74.7 | 73.0 | 93.0 | 0.0 | 0.0 | 29.92 | 0.0 | 0.0 |
2024-08-01 1:09 AM | 74.7 | 72.3 | 92.0 | 0.0 | 0.0 | 29.92 | 0.0 | 0.0 |
2024-08-01 1:14 AM | 74.7 | 72.2 | 92.0 | 0.0 | 0.0 | 29.92 | 0.0 | 0.0 |
2024-08-01 1:19 AM | 74.6 | 72.0 | 92.0 | 0.0 | 0.0 | 29.92 | 0.0 | 0.0 |
2024-08-01 1:24 AM | 74.5 | 72.0 | 92.0 | 0.0 | 0.0 | 29.92 | 0.0 | 0.0 |
2024-08-01 1:29 AM | 74.5 | 72.0 | 92.0 | 0.0 | 0.0 | 29.92 | 0.0 | 0.0 |
2024-08-01 1:34 AM | 74.5 | 72.9 | 93.0 | 0.0 | 0.0 | 29.92 | 0.0 | 0.0 |
2024-08-01 1:39 AM | 74.5 | 73.0 | 93.0 | 0.0 | 0.0 | 29.92 | 0.0 | 0.0 |
And it works! That being said, I’ll have to find a different weather station because this one seems somewhat new. There is only partial data for 5/29/2024.
date_id = "2024-05-29"

(
    GT(scrape_wunderground(station_id, date_id).head(20))
    .tab_options(
        column_labels_background_color="#3B3A3EFF",
    )
)
timestamps | Temperature | Dew Point | Humidity | Wind Speed | Wind Gust | Pressure | Precip. Rate | Precip. Accum. |
2024-05-29 10:34 PM | 71.2 | 51.2 | 50.0 | 0.2 | 0.2 | 29.93 | 0.0 | 0.0 |
2024-05-29 10:39 PM | 71.7 | 53.9 | 54.0 | 1.4 | 9.0 | 29.94 | 0.1 | 0.1 |
2024-05-29 10:44 PM | 73.1 | 55.1 | 53.0 | 0.8 | 10.0 | 29.94 | 0.1 | 0.1 |
2024-05-29 10:49 PM | 72.8 | 54.9 | 53.0 | 0.0 | 10.0 | 29.94 | 0.1 | 0.1 |
2024-05-29 10:54 PM | 72.4 | 53.5 | 52.0 | 0.0 | 10.0 | 29.94 | 0.1 | 0.1 |
2024-05-29 10:59 PM | 72.2 | 53.0 | 52.0 | 0.0 | 10.0 | 29.94 | 0.1 | 0.1 |
2024-05-29 11:04 PM | 72.1 | 53.0 | 52.0 | 0.0 | 10.0 | 29.95 | 0.1 | 0.1 |
2024-05-29 11:09 PM | 72.0 | 53.0 | 52.0 | 0.0 | 10.0 | 29.95 | 0.1 | 0.1 |
2024-05-29 11:14 PM | 71.9 | 52.1 | 51.0 | 0.0 | 10.0 | 29.95 | 0.1 | 0.1 |
2024-05-29 11:19 PM | 71.8 | 51.8 | 51.0 | 0.0 | 10.0 | 29.95 | 0.1 | 0.1 |
2024-05-29 11:24 PM | 71.6 | 51.6 | 51.0 | 0.0 | 10.0 | 29.95 | 0.1 | 0.1 |
2024-05-29 11:29 PM | 71.6 | 51.0 | 50.0 | 0.0 | 10.0 | 29.95 | 0.1 | 0.1 |
2024-05-29 11:34 PM | 71.4 | 51.0 | 49.0 | 0.0 | 10.0 | 29.95 | 0.1 | 0.1 |
2024-05-29 11:39 PM | 71.3 | 51.0 | 49.0 | 0.0 | 8.0 | 29.95 | 0.1 | 0.1 |
2024-05-29 11:44 PM | 71.2 | 50.1 | 48.0 | 0.0 | 2.0 | 29.95 | 0.0 | 0.1 |
2024-05-29 11:49 PM | 71.1 | 50.0 | 47.0 | 0.0 | 1.3 | 29.95 | 0.0 | 0.1 |
2024-05-29 11:54 PM | 71.1 | 50.0 | 47.0 | 0.0 | 0.0 | 29.94 | 0.0 | 0.1 |
2024-05-29 11:59 PM | 70.9 | 50.0 | 47.0 | 0.0 | 0.0 | 29.94 | 0.0 | 0.1 |
And 5/28/2024 has no data.
date_id = "2024-05-28"

(
    GT(scrape_wunderground(station_id, date_id).head(20))
    .tab_options(
        column_labels_background_color="#3B3A3EFF",
    )
)
timestamps | Temperature | Dew Point | Humidity | Wind Speed | Wind Gust | Pressure | Precip. Rate | Precip. Accum. |
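The "partial" and "no data" calls above were made by eye. One way to make them repeatable when checking many dates is to compare the number of rows returned against the 288 readings a full day would hold at a 5-minute cadence. This is only a sketch: `classify_day` and the 0.9 threshold are hypothetical choices of mine, not part of the scraping script, and the stand-in tables below only mimic the row counts seen above (18 rows for 5/29, none for 5/28).

```python
import pandas as pd

# 288 readings per day if the station reports every 5 minutes (assumption).
READINGS_PER_DAY = 24 * 60 // 5


def classify_day(df: pd.DataFrame, complete_threshold: float = 0.9) -> str:
    """Label one day's scraped table as 'empty', 'partial', or 'complete'."""
    n = len(df)
    if n == 0:
        return "empty"
    if n / READINGS_PER_DAY < complete_threshold:
        return "partial"
    return "complete"


# Stand-in tables with the same row counts as the results above:
# 18 readings on 2024-05-29 (10:34 PM to 11:59 PM) and none on 2024-05-28.
may_29 = pd.DataFrame(
    {"timestamps": pd.date_range("2024-05-29 22:34", periods=18, freq="5min")}
)
may_28 = pd.DataFrame({"timestamps": pd.Series([], dtype="datetime64[ns]")})

print(classify_day(may_29), classify_day(may_28))
```

In practice the argument would be whatever `scrape_wunderground(station_id, date_id)` returns for the date in question.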
I will have to find a different nearby weather station to get Ben weather data for the time period he is interested in (the mid-1970s). Once I succeed at that, I will have to aggregate the 5-minute data to daily data. But once that’s done, my friend should have more than enough weather data to help him with his model.
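For the aggregation step, a minimal pandas sketch might look like the following. It assumes the scraped DataFrame uses the column names shown in the tables above (`timestamps`, `Temperature`, `Precip. Accum.`), that timestamps arrive as strings like "2024-08-01 12:04 AM", and that `Precip. Accum.` is a running total that resets each day, so its daily maximum is the day's total. None of that is confirmed by the scraping script, and `daily_summary` is a hypothetical helper name. The sample rows are copied from the tables above.

```python
import pandas as pd


def daily_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Collapse 5-minute readings into one row per calendar day."""
    out = df.copy()
    # Assumption: timestamps are strings such as "2024-08-01 12:04 AM".
    out["timestamps"] = pd.to_datetime(
        out["timestamps"], format="%Y-%m-%d %I:%M %p"
    )
    # Group on the calendar date (midnight-normalized timestamp).
    day = out["timestamps"].dt.normalize().rename("date")
    daily = out.groupby(day).agg(
        temp_min=("Temperature", "min"),
        temp_max=("Temperature", "max"),
        temp_mean=("Temperature", "mean"),
        # Assumption: running total that resets daily, so max = daily total.
        precip_total=("Precip. Accum.", "max"),
        n_readings=("timestamps", "count"),
    )
    # Recreates the yes/no rain flag Ben already had, now per station and day.
    daily["rained"] = daily["precip_total"] > 0
    return daily


# A few rows copied from the tables above.
sample = pd.DataFrame(
    {
        "timestamps": [
            "2024-05-29 10:34 PM",
            "2024-05-29 10:39 PM",
            "2024-08-01 12:04 AM",
            "2024-08-01 1:04 AM",
        ],
        "Temperature": [71.2, 71.7, 75.0, 74.7],
        "Precip. Accum.": [0.0, 0.1, 0.0, 0.0],
    }
)
daily = daily_summary(sample)
print(daily)
```

Keeping `n_readings` in the output makes it easy to spot days, like 5/29/2024 here, whose summary rests on only a handful of readings.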