Selecting Our Sources
We chose to work primarily with the TMDB dataset, a widely used collection of metadata on over a million movies. We chose this dataset because it records 24 attributes for each film. For instance, it provides genre, language, budget, country, and release date, all of which make it possible to analyze trends over time. Some attributes also capture public opinion of the films, such as rating and vote count, which helps us understand how audiences perceived genres and trends over time. Because the TMDB dataset is so extensive, we merged it with the Letterboxd dataset, which consists of around 16,000 movies scraped from a popular social platform for film enthusiasts, and used the Letterboxd set as a sampling frame. Merging the two sets let us draw on TMDB's wide range of metadata while focusing our analysis on a manageable collection of films.
Processing Our Data
To process the data, we merged the two datasets on movie title. We first normalized the titles by converting them to lowercase and removing all whitespace, so that minor formatting differences between the sources would not prevent a match. We then merged the two datasets with an inner join on the normalized titles, which keeps only films that appear in both, and added the suffixes _tmdb and _letterboxd to distinguish overlapping column names. Finally, we removed every row containing a missing value. Because the merged pool was still large, this cost us little, and the result became the final dataset for our analysis.
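The normalize, join, and drop steps can be sketched in plain Python as follows. This is a minimal illustration, not our actual processing script: the function names, the "title" column name, and the toy rows are hypothetical, we assume "removing white spaces" means deleting all whitespace, and we treat None or an empty string as a missing value.

```python
from typing import Any

Row = dict[str, Any]


def normalize_title(title: str) -> str:
    """Lowercase a title and delete all whitespace (assumed interpretation)."""
    return "".join(title.lower().split())


def inner_join_on_title(tmdb: list[Row], letterboxd: list[Row],
                        title_col: str = "title") -> list[Row]:
    """Inner-join two row lists on normalized title (hypothetical helper)."""
    # Index Letterboxd rows by normalized title; a title may occur more than once.
    index: dict[str, list[Row]] = {}
    for row in letterboxd:
        index.setdefault(normalize_title(row[title_col]), []).append(row)

    merged: list[Row] = []
    for left in tmdb:
        key = normalize_title(left[title_col])
        # No matching key means no output row: that is what makes it an inner join.
        for right in index.get(key, []):
            shared = left.keys() & right.keys()
            out: Row = {"title_norm": key}
            # Suffix only the column names that appear in both sources.
            for col, val in left.items():
                out[f"{col}_tmdb" if col in shared else col] = val
            for col, val in right.items():
                out[f"{col}_letterboxd" if col in shared else col] = val
            merged.append(out)
    return merged


def drop_incomplete(rows: list[Row]) -> list[Row]:
    """Keep only rows with no missing value (None or "" assumed missing)."""
    return [r for r in rows if all(v is not None and v != "" for v in r.values())]


if __name__ == "__main__":
    # Toy rows with made-up titles and values, for illustration only.
    tmdb = [{"title": "Moon Rise", "budget": 100},
            {"title": "Blue Door", "budget": None},
            {"title": "Lost Map", "budget": 50}]
    letterboxd = [{"title": "moon  rise", "rating": 4.0},
                  {"title": "BLUE DOOR", "rating": 3.5}]
    final = drop_incomplete(inner_join_on_title(tmdb, letterboxd))
    print(final)
```

In this toy run, "Lost Map" has no Letterboxd match and is dropped by the join, and "Blue Door" matches but is then dropped for its missing budget, leaving only "Moon Rise". With pandas DataFrames, the same steps would likely reduce to `pd.merge(..., on="title_norm", how="inner", suffixes=("_tmdb", "_letterboxd"))` followed by `.dropna()`, after adding a normalized-title column to each frame.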
Presenting Our Narrative
The website was built with WordPress and hosted through UCLA's HumSpace portal, a website hosting service that the University of California, Los Angeles, provides for students in the Digital Humanities department. We chose the Film Maker Lite theme, which uses a red and white color palette, for its clean, minimalist look and ease of use. We then populated the website with Background, Narrative, and About pages, along with several subpages, so that visitors can navigate it easily.
