Related Work and What I Could've Done Better

In the article Billboard Hot 100 Analytics, Rosebud Anwuri analyzed Billboard's Hot 100 charts from 1950–2015 (with data from https://github.com/kevinschaich/billboard) using Spotify's API. She noticed a trend toward fewer instrumentals and more speechiness over the decades, especially in the 1990s, and connected it with the rise of hip-hop and a shift in the types of bands that were popular, with pop bands replacing rock bands. Finally, she showed that loudness has risen over the decades.

Her findings line up with my own, especially regarding the speechiness of recent songs. Her data doesn't cover as long a time period as mine (hers ends in 2015, while mine ends in 2018), but the repo she took it from includes some interesting basic analysis. In the future, I would be interested in using Spotify's API and in looking at some of the features Kevin Schaich added to his dataset. When I first encountered these resources, though, I knew far less than I do now, so I wasn't sure how to use what I found.

If I had more time and/or money, I would definitely have included more songs. Because I didn't have enough data points, very repetitive, song-specific phrases (for example, "goodnight Irene" from "Goodnight Irene", "wooly bully" from "Wooly Bully", and "uptown funk you up" from "Uptown Funk") end up showing very prominently in some of the word clouds.
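
One way I might soften this effect without collecting more songs is to count how many songs a word appears in, rather than how many times it is sung. The snippet below is only a rough sketch of that idea, not what I actually did: the function name `song_level_counts` and the toy lyrics are made up for illustration.

```python
from collections import Counter


def song_level_counts(lyrics_by_song):
    """Count the number of distinct songs each word appears in.

    lyrics_by_song: dict mapping a song title to its lyrics as one string.
    (Hypothetical helper, written for illustration only.)
    """
    counts = Counter()
    for lyrics in lyrics_by_song.values():
        # set() collapses repeats inside one song, so a chorus sung
        # twenty times still counts only once for that song.
        counts.update(set(lyrics.lower().split()))
    return counts


# Invented toy lyrics, not taken from my dataset.
toy_lyrics = {
    "Song A": "wooly bully wooly bully wooly bully love",
    "Song B": "love me tonight",
    "Song C": "all my love tonight",
}

counts = song_level_counts(toy_lyrics)
top_words = counts.most_common(2)
```

With counts like these, a phrase repeated many times in a single song carries no more weight than a word that shows up once, so the word cloud would lean toward vocabulary shared across songs.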

Additionally, in my latest iteration of the input data, I took the top 2 songs from each year as the corpus for each decade. Because my data ends in 2018, the 2010s corpus contains fewer songs than the others, so it is more subject to the individual characteristics of the songs in that decade.

Although it's not an issue now, the earlier Billboard top singles lists include only 30 songs, while the later ones include 50 or 100. If I wanted a balanced dataset, the data would therefore be capped at 30 songs per year, so if I wish to include more data, I would need to be mindful of that and perhaps find another source.
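
As a small sketch of what that cap could look like in code, the function below trims every year to the length of the shortest list. The name `cap_per_year` and the toy chart are my own inventions here, and the sketch assumes each year's songs are already stored in rank order.

```python
def cap_per_year(chart, cap=None):
    """Keep only the top `cap` songs for every year.

    chart: dict mapping a year to its list of songs in rank order.
    cap:   number of songs to keep; if omitted, the length of the
           shortest yearly list is used so that all years match.
    (Hypothetical helper, written for illustration only.)
    """
    if cap is None:
        cap = min(len(songs) for songs in chart.values())
    return {year: songs[:cap] for year, songs in chart.items()}


# Invented toy chart: an early year with a short list, a later year with a longer one.
toy_chart = {
    1950: ["a", "b", "c"],
    2000: ["d", "e", "f", "g", "h"],
}

balanced = cap_per_year(toy_chart)
top_two = cap_per_year(toy_chart, cap=2)
```

Letting the cap default to the shortest list means I wouldn't have to hard-code 30, although passing an explicit value (such as the top 2 per year I used) works too.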

Another thing I think I could've done better was include more characteristics. As it was my first time doing web scraping, I only managed to collect the ranking data from Wikipedia and didn't know how to efficiently and accurately append data from other sources. My compromise was to manually add lyrics to the dataset. Although I was able to get a lot of interesting information from just the lyrics, with more information, such as the gender of the artists, the genre, the length of the songs, or the sheet music, I could definitely do a more complex and insightful analysis. Perhaps after gaining more experience, I will return to this project and redo it with more finesse.
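
If I do come back to it, appending data from another source would probably come down to matching rows on a cleaned-up title and artist. Here is a minimal sketch of that idea, assuming both sources are lists of dictionaries with `title` and `artist` fields. The field names, the `attach_features` helper, and the sample rows are all hypothetical.

```python
def normalize(text):
    """Lowercase and collapse whitespace so small formatting differences still match."""
    return " ".join(text.lower().split())


def attach_features(rankings, features):
    """Left-join extra song features onto ranking rows by (title, artist).

    Every ranking row is kept. Rows with no match are flagged with
    matched=False so they can be checked by hand later.
    (Hypothetical helper, written for illustration only.)
    """
    lookup = {
        (normalize(f["title"]), normalize(f["artist"])): f for f in features
    }
    merged = []
    for row in rankings:
        key = (normalize(row["title"]), normalize(row["artist"]))
        extra = lookup.get(key, {})
        # Ranking values win on overlapping fields such as title and artist.
        merged.append({**extra, **row, "matched": key in lookup})
    return merged


# Invented sample rows, not real chart data.
rankings = [
    {"year": 2000, "rank": 1, "title": "Song  One", "artist": "Artist A"},
    {"year": 2000, "rank": 2, "title": "Song Two", "artist": "Artist B"},
]
features = [
    {"title": "song one", "artist": "artist a", "genre": "pop", "length_sec": 210},
]

merged = attach_features(rankings, features)
```

Keeping the unmatched rows and flagging them, instead of dropping them, is a deliberate choice in this sketch: titles from different sources rarely line up perfectly, so I would want to see what failed to match before trusting any analysis built on the merged data.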