
Assumptions & Concerns

I am using an equal number of songs from each decade to avoid imbalance between decades. However, because I have to enter the lyrics manually, the number of songs is very small, so I may end up drawing wrong conclusions from quirks of the specific songs I chose. In effect, I'm making a huge assumption that my data is representative of each whole decade. I'm adding more lyrics to the list and re-running everything, so I hope the assumption becomes less of a stretch as the data grows. The underlying phenomenon is the zeitgeist of each decade, which I cannot observe directly: I treat it as a latent variable and use the lyrics of popular songs as an observable proxy for it.

Subsets:

  1. 7 data points
    • #1 song in 1950, 1960, 1970, 1980, 1990, 2000, 2010
  2. 14 data points
    • #1 song in 1950, 1955, 1960, 1965, 1970, 1975, 1980, 1985, 1990, 1995, 2000, 2005, 2010, 2015
  3. 28 data points
    • #1 & #2 songs in 1950, 1955, 1960, 1965, 1970, 1975, 1980, 1985, 1990, 1995, 2000, 2005, 2010, 2015
  4. 68 data points
    • #1 song from each year from 1950 to 2018
  5. 136 data points
    • #1 & #2 songs from each year from 1950 to 2018
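
The subset sizes above follow from the sampled years multiplied by the number of chart positions taken per year. Here is a minimal sketch of that arithmetic. The helper names (`subset_years`, `subset_size`) and the labels are hypothetical and not part of my analysis code. The first three subsets come out at 7, 14, and 28. For the yearly subsets, an inclusive range from 1950 to 2018 spans 69 years, so the counts of 68 and 136 listed above imply that one year is not in the data. The sketch does not guess which one.

```python
# Hypothetical helpers for illustration only; names are not from the original analysis.

def subset_years(start, stop, step):
    """Return the years from start to stop (inclusive), spaced by step."""
    return list(range(start, stop + 1, step))


def subset_size(years, songs_per_year):
    """Number of data points: one lyric per chart position per sampled year."""
    return len(years) * songs_per_year


# (label, sampled years, number of top chart positions taken per year)
subsets = [
    ("every 10 years, #1", subset_years(1950, 2010, 10), 1),
    ("every 5 years, #1", subset_years(1950, 2015, 5), 1),
    ("every 5 years, #1 & #2", subset_years(1950, 2015, 5), 2),
    # Inclusive 1950-2018 range; the listed counts suggest one year is absent.
    ("every year, #1", subset_years(1950, 2018, 1), 1),
    ("every year, #1 & #2", subset_years(1950, 2018, 1), 2),
]

for label, years, songs_per_year in subsets:
    print(f"{label}: {subset_size(years, songs_per_year)} data points")
```

Generating the year lists this way also makes it easy to check that each new batch of lyrics covers the years a subset expects.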