About the App

This project visualizes Chicago ‘L’ ridership data published by the Chicago Transit Authority (CTA) for the years 2001 to 2021.

The original data is available from the Chicago Data Portal at:

https://data.cityofchicago.org/Transportation/CTA-Ridership-L-Station-Entries-Daily-Totals/5neh-572f

The application can be used to explore major ridership trends over the years and to compare those trends across stations in the CTA system. It includes a map of the ‘L’ stations, so users can see the number of rides at each station relative to its location. Users can also select a year and a station to view that station's ridership in detail.

On the left side of the screen, a menu lets the user control the reactive visualizations in the application.

  • The user selects a date, and the bar plot shows the number of rides at every station on that date. The leaflet map on the right side of the screen also updates to show the station locations, with markers sized in proportion to the number of rides at each station on that date. The same data is shown in tabular form in the left-most table. Users can also switch the background of the leaflet map to see features more clearly.

  • The bars in this plot can be sorted alphabetically by station name or in ascending order of rides. The table updates to match whenever the sort option changes.

  • Users can also choose a second date and tick the “view difference between dates” checkbox, which updates the bar plot, map, and table to show the difference in rides between the two dates. On the map, blue markers indicate an increase in the number of rides from the first date to the second date, and red markers indicate a decrease.

  • Additionally, users can select a specific station either by clicking on it on the map or by choosing it from the “select a station” text field in the menu on the left. This updates the four plots in the middle of the screen to show ridership for that station. A dropdown lets users select a year from 2001 to 2021. Three bar plots show ridership for each day, each month, and each day of the week of the selected year, and the plot at the bottom shows the station's total ridership for each year. A further control switches this data between plots and tables.
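The date-difference view described above can be sketched in a few lines. This is only an illustration in Python, not the app's R code: the function name `ride_differences`, the sample numbers, and the gray color for unchanged stations are all assumptions made for the example. The difference is taken as second date minus first date, following the description of the blue and red markers.

```python
def ride_differences(rides_first, rides_second):
    """Return {station: (difference, color)} for stations present on both dates.

    The difference is rides on the second date minus rides on the first date.
    Blue marks an increase and red a decrease; gray for "no change" is an
    assumption of this sketch.
    """
    result = {}
    for station, first in rides_first.items():
        if station not in rides_second:
            continue  # skip stations with no data on the second date
        diff = rides_second[station] - first
        if diff > 0:
            color = "blue"
        elif diff < 0:
            color = "red"
        else:
            color = "gray"
        result[station] = (diff, color)
    return result


# Invented ride counts, for illustration only.
first = {"Clark/Lake": 5200, "O'Hare Airport": 4100, "Lake/State": 6000}
second = {"Clark/Lake": 7400, "O'Hare Airport": 3900, "Lake/State": 6000}

diffs = ride_differences(first, second)

# Sort in ascending order of difference, like the app's sort-by-rides option.
for station, (diff, color) in sorted(diffs.items(), key=lambda item: item[1][0]):
    print(f"{station}: {diff:+d} ({color})")
```

Keeping the sign convention in one function makes it easy to check that the bar plot, map, and table all read the difference the same way.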

The users can use all these different controls to manipulate the visualizations displayed, so they can easily analyze the data in terms of specific factors.

The visualizations were created in R using Shiny, shinydashboard, ggplot2, and leaflet.

The Data

The data for this application was obtained from the Chicago Data Portal. It is published by the Chicago Transit Authority and contains ride information for all the ‘L’ stations in Chicago. The CTA reports ridership both system-wide and at the bus and train level. Ridership is counted at boarding: each time a person boards a transit vehicle, whether a bus or a train, it counts as one ride. On the rail system, a customer is counted as an entry each time they pass through a turnstile to enter a station. Customers are not counted again when they transfer, since they do not pass through a turnstile. Rides are recorded by the bus farebox and the farecard reader. Whenever the farebox or farecard reader has an error, the rides are recorded as 0 in the data.

The data also includes the day type for each date. Weekdays are marked “W”, Saturdays “A”, and Sundays and holidays “U”. New Year's Day, Memorial Day, Independence Day, Labor Day, Thanksgiving, and Christmas Day are treated as "Sundays" for the purposes of ridership reporting.

The dataset has 1.09M rows, covering 2001 through November 2021. It has 5 columns, listed below with their data types:

  • station_id – number

  • stationname – plain text

  • date – date and time

  • daytype – plain text (“W”, “A” or “U”)

  • rides – number

The first step was to separate the data by station so that the files could be uploaded to the shinyApps platform. This was done in Python by filtering on station_id. The .ipynb notebook and the separated data files are available in the GitHub repository for this project. We filtered on station_id rather than station name to avoid errors caused by punctuation in the names.
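A minimal sketch of this splitting step is shown here, using only the Python standard library. It is not the notebook from the repository: the function names and the sample rows (including the station IDs and names) are invented for illustration. The column names come from the dataset description, and the output files are named by station ID.

```python
import csv
import io
from collections import defaultdict
from pathlib import Path

# Invented sample rows; the real file has about 1.09M rows.
SAMPLE = """station_id,stationname,date,daytype,rides
1001,Station A,01/01/2001,U,"1,234"
1002,Station B,01/01/2001,U,"2,345"
1001,Station A,01/02/2001,W,"5,678"
"""


def split_by_station(csv_text):
    """Group the rows of the ridership CSV by station_id, keeping row order."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row["station_id"]].append(row)
    return dict(groups)


def write_station_files(groups, out_dir):
    """Write one CSV per station, named <station_id>.csv, and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for station_id, rows in groups.items():
        path = out_dir / f"{station_id}.csv"
        with path.open("w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)
        paths.append(path)
    return paths


groups = split_by_station(SAMPLE)
print({station_id: len(rows) for station_id, rows in groups.items()})
```

Grouping on the ID column means a station name containing a slash or apostrophe never has to be matched or used as a file name.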

 

After separating the files and importing the data into R as a data frame, the next step was to convert the dates into a more usable format. The R library lubridate provides functions for parsing dates and extracting the month, year, and weekday from them. The final data frame for each station contains the original columns plus the parsed date, month, year, and weekday. Alternatively, these functions can be called while generating the graphs. We also removed the commas from the ride totals and converted them from strings to integers, so that aggregate functions could be applied to them.
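The same cleaning logic can be sketched in Python for illustration. The app itself does this in R with lubridate, so the code here is not the project's implementation. The `clean_row` helper and the sample row are hypothetical, and the MM/DD/YYYY date format is an assumption about the raw file.

```python
from datetime import datetime

# Weekday names indexed by date.weekday(), where Monday is 0.
WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")


def clean_row(row, date_format="%m/%d/%Y"):
    """Return a copy of a raw row with a parsed date and integer rides.

    date_format is an assumption about how dates appear in the raw file.
    """
    parsed = datetime.strptime(row["date"], date_format).date()
    return {
        **row,
        "date": parsed,
        "year": parsed.year,
        "month": parsed.month,
        "weekday": WEEKDAYS[parsed.weekday()],
        # Strip thousands separators before converting, e.g. "1,234" -> 1234.
        "rides": int(row["rides"].replace(",", "")),
    }


# Invented sample row, for illustration only.
raw = {"station_id": "1001", "stationname": "Station A",
       "date": "08/23/2021", "daytype": "W", "rides": "1,234"}

cleaned = clean_row(raw)
print(cleaned["date"], cleaned["weekday"], cleaned["rides"])
```

With rides stored as integers and the month, year, and weekday available as separate fields, the per-month and per-weekday totals become simple group-and-sum operations.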

 

The second data file for this project contains stop information for all of the stations in the first file. It includes the station type, the directions, and the latitude and longitude of each station. For this project we use the latitude and longitude to plot the stations on a leaflet map. This file also covers many more stations than the first one, so we matched the stations from file 1 by their station_ids, which are labeled MAP_ID in file 2. The latitude and longitude are stored as strings, which we converted to separate numeric values to pass to the leaflet map function in R. The code for this can be found in the app.R file in the repository. The only columns of interest from this second file were MAP_ID and location. We did not use the other columns, but they could be used to refine the project, for example by the rail line a station lies on.
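As a rough sketch of the matching and conversion described above, written in Python rather than the R used in app.R: the `"(latitude, longitude)"` string format, the `Location` column name, and both helper names are assumptions made for this example, and the stop rows are invented.

```python
def parse_location(location):
    """Split a "(lat, lon)" string into two floats (assumed string format)."""
    lat_text, lon_text = location.strip().strip("()").split(",")
    return float(lat_text), float(lon_text)


def station_coordinates(stops, station_ids):
    """Map each wanted MAP_ID to (lat, lon), ignoring stops not in station_ids.

    If a MAP_ID appears on several rows, the first row seen is kept; that
    choice is an assumption of this sketch.
    """
    coords = {}
    for stop in stops:
        map_id = stop["MAP_ID"]
        if map_id in station_ids and map_id not in coords:
            coords[map_id] = parse_location(stop["Location"])
    return coords


# Invented stop rows, for illustration only.
stops = [
    {"MAP_ID": "1001", "Location": "(41.857908, -87.669147)"},
    {"MAP_ID": "1001", "Location": "(41.857908, -87.669147)"},
    {"MAP_ID": "9999", "Location": "(41.900000, -87.600000)"},
]

coords = station_coordinates(stops, station_ids={"1001", "1002"})
print(coords)
```

Latitude and longitude need to stay fractional here, since rounding them to whole numbers would place every station at the same point on the map.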

Interesting Observations

Looking at the data in the application and comparing stations with each other on different dates, we found some interesting trends whose causes could be studied in greater detail.

Ridership at Lake/State is consistently much higher than at the other stations. Even in 2020 it dipped in April but rose again almost immediately. This may be due to the station's location in the Loop downtown, close to many offices and key spots around Chicago, including the Riverwalk and Millennium Park.

Clark/Lake shows a similar pattern, with a dip followed by an immediate increase in 2020. Its numbers fell relative to other years but remained higher than those of other stations. This is likely also due to the station’s location in the Loop in central downtown.

O'Hare Airport showed consistently high ridership on most days in August 2021. This is probably because the station is the primary transit link for an international airport.

In many weeks, O'Hare has more rides on Thursdays, Fridays, and Sundays than on any other day.

Selecting the first day of school in 2021 (8-23-2021) as the first date and the first day of school in 2020 (8-24-2020) as the second date, the difference filter shows blue circles for almost all stations on the map. This tells us that ridership at most stations increased in 2021 compared to 2020, illustrating the pandemic’s effect on almost all stations across the system.

 


GitHub Repository

The application code files can be accessed on GitHub at https://github.com/rafiyaawan/Project2-CTAMap.

This public repo contains multiple CSV files with CTA ridership data, one per station, each named by the station’s ID number. In addition, the StopList.csv file contains each station’s location data, including its latitude and longitude. This information was used to plot the stations on the map in our application.

To run this application on their own machine, users first need to install R and RStudio, followed by the required libraries.

R can be installed using this link: https://repo.miserver.it.umich.edu/cran/. The page has instructions for downloading and installing the appropriate version of R for the user's machine. The newest version of R at the time of writing is 4.1.3; RStudio requires R 3.3.0 or later. More information on R can be found at https://www.r-project.org/.

Users can download a free version of RStudio using this link: https://www.rstudio.com/products/rstudio/download/. The download link is at the bottom of that page, along with installation instructions. The most recent version of RStudio at the time of writing is 2022.02.0-443, although versions 2021.09 and above also work.

Once these are installed, the libraries used for this project, listed below, can be installed using install.packages("[library name]"), e.g. install.packages("shiny"):

  • library(shiny)

  • library(ggplot2)

  • library(shinydashboard)

  • library(lubridate)

  • library(jpeg)

  • library(grid)

  • library(leaflet)

  • library(scales)

  • library(dplyr)

  • library(plyr)

  • library(readr)

  • library(leaflet.providers)

  • library(viridis)

After the libraries are installed, the user can clone the project repository from GitHub by running the command “git clone https://github.com/rafiyaawan/Project2-CTAMap.git” in a terminal. This downloads all project files, including the data CSV files. The files file_divide.ipynb and scrap.r can be ignored, as they were used during development to break up the data files and to test code. The only files the app needs are app.R and the CSV files, which should all be stored in the same directory.

From here, the user can simply open the app.R file in RStudio and click the "Run App" button with the green arrow in the top right to run the application.

ShinyApps

The application can also be run on the Shiny server using the link:

Video Demonstration

To watch a video demonstration of the application and its features, use the link below!

Shiny
Video