This repo holds the work behind the dataset and the model notebooks for what I want to put into the analytics webapp. I used the Genius and Spotify web APIs to collect the data.
By the end, this repo will include my model notebooks, the data, and the scripts I used to collect the data. There will be two final models: one that classifies a song from its song attributes, and another that does the same classification from its lyrics. One of the workbooks also contains a topic model that I used to generate a single label for each song, so the classifiers have a label to measure accuracy against.
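To make the labeling and scoring idea concrete, here is a minimal sketch in plain Python. It is not the code from the workbooks: it assumes the topic model outputs one weight per topic for each song and that the single label is simply the highest-weighted topic. The function names `dominant_topic` and `accuracy` are hypothetical.

```python
def dominant_topic(topic_weights, topic_names):
    """Return the name of the highest-weighted topic for one song.

    Assumption: the single label is the argmax of the topic weights.
    The actual workbook may derive the label differently.
    """
    if len(topic_weights) != len(topic_names):
        raise ValueError("need exactly one name per topic weight")
    best = max(range(len(topic_weights)), key=lambda i: topic_weights[i])
    return topic_names[best]


def accuracy(true_labels, predicted_labels):
    """Fraction of songs whose predicted label matches the assigned label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("cannot score an empty label list")
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)
```

With labels assigned this way, either classifier can be scored by passing the assigned labels and its predictions to `accuracy`.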
This is my first draft of the project. The lyrics data was difficult to collect, so I am still building an even dataset of 900-1000 songs per genre to see whether balanced classes improve accuracy. Because of this, the lyric-based predictor currently uses only 5 classes, and I am actively working to correct the song counts per genre so I can use all 8 of the final labels I created. The song attribute model already has enough data for a balanced 8-class approach, so I implemented that in the attribute model workbook. Currently, both models run at similar accuracy (around 68-69%), but the lyric model predicts 5 classes instead of 8, so the two figures are not directly comparable and the lyric model's accuracy is subject to change once the data issue is resolved.
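The balancing step can be sketched roughly as follows. This is a minimal illustration, not the code from the workbooks: it assumes each song is a dict with a `"genre"` key, and the function name `balance_by_genre` and its parameters are hypothetical. Genres that have not yet reached the target count are left out, which mirrors why the lyric model currently covers fewer classes.

```python
import random
from collections import defaultdict


def balance_by_genre(songs, per_genre, seed=0):
    """Downsample so every kept genre has exactly `per_genre` songs.

    Genres with fewer than `per_genre` songs are skipped entirely
    (an assumption made here; upsampling would be another option).
    """
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible

    # Group songs by their genre label.
    by_genre = defaultdict(list)
    for song in songs:
        by_genre[song["genre"]].append(song)

    balanced = []
    for genre in sorted(by_genre):
        group = by_genre[genre]
        if len(group) < per_genre:
            continue  # not enough data for this genre yet
        # Sample without replacement so no song is duplicated.
        balanced.extend(rng.sample(group, per_genre))
    return balanced
```

For example, calling `balance_by_genre(songs, per_genre=900)` would return 900 randomly chosen songs for each genre that has at least 900 available.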