
About the app

This project visualizes Chicago Transit Authority (CTA) ridership data covering the years 2001 to 2021.

The original data is available from the Chicago Data Portal at:

https://data.cityofchicago.org/Transportation/CTA-Ridership-L-Station-Entries-Daily-Totals/5neh-572f

This application can be used to understand the major ridership trends over these years and to compare those trends across stations in the CTA system. It includes a map of the stations in the 'L' system, so users can see the number of rides at each station relative to its location. Users can also select a year and a station to view detailed information about that station.

The menu on the left side of the screen contains the controls for the reactive visualizations in the application.

 

  • The user selects a date, and the bar plot shows the number of rides at every station on that date. The Leaflet map on the right side of the screen also updates to show the station locations, with markers sized in proportion to the number of rides at each station on that date. The same data is displayed in tabular form in the left-most table. Users can also use the change background option to switch the map background and see features more clearly.

  • For the bar plot described above, users can sort the bars alphabetically by station name or in ascending order of rides. The table updates to match whenever the sort option changes.

  • Users can choose a second date and select the view difference between dates checkbox, which updates the bar plot, map, and table to show the change in rides from the first date to the second. On the map, blue markers indicate an increase in the number of rides and red markers indicate a decrease.

  • Users can select a specific station either by clicking on it in the map or by choosing it from the “select a station” text field in the menu on the left. This updates the four plots in the middle of the screen to display ridership information for that station. A dropdown lets users select a year from 2001 to 2021. Three bar plots show ridership for each day, each month, and each day of the week of the selected year, and the plot at the bottom shows the station's total ridership for each year. A separate control switches this section between plots and tables.

 

Together, these controls let users manipulate the visualizations and analyze the data in terms of specific factors.
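The sorting and date-difference behavior described above can be sketched in a few lines. This is only an illustration in Python, not the app's R code: the ride counts are made up, the function names `sort_stations` and `compare_dates` are hypothetical, and the gray color for "no change" is my assumption, since the app description only covers blue and red.

```python
# Hypothetical ride counts for two selected dates (not real CTA figures).
rides_first = {"UIC-Halsted": 5200, "O'Hare": 9800, "Washington": 7100}
rides_second = {"UIC-Halsted": 6100, "O'Hare": 4300, "Washington": 7100}


def sort_stations(rides, by="name"):
    """Return (station, rides) pairs sorted alphabetically or by ascending rides."""
    if by == "name":
        return sorted(rides.items(), key=lambda item: item[0])
    return sorted(rides.items(), key=lambda item: item[1])


def compare_dates(first, second):
    """Compute the change in rides from the first date to the second."""
    result = {}
    # Only stations present on both dates can be compared.
    for station in sorted(first.keys() & second.keys()):
        change = second[station] - first[station]
        if change > 0:
            color = "blue"   # increase
        elif change < 0:
            color = "red"    # decrease
        else:
            color = "gray"   # assumption: no change gets a neutral color
        result[station] = {"change": change, "color": color}
    return result


diff = compare_dates(rides_first, rides_second)
for station, info in diff.items():
    print(station, info["change"], info["color"])
```

Keeping the comparison in one function that returns both the number and the color means the bar plot, map, and table can all read from the same result.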

The visualizations are built in R using Shiny, Shiny Dashboard, and ggplot2, with Leaflet for the map.


The Data

The data for this application was obtained from the Chicago Data Portal. It is published by the Chicago Transit Authority and contains ride information for all the ‘L’ stations in Chicago. The CTA reports ridership statistics on a system-wide basis and at the bus and train level. Ridership is counted at boarding: each time a person boards a transit vehicle, whether a bus or a train, it counts as a ride. On the rail system, a customer is counted as an entry each time he or she passes through a turnstile to enter a station. Customers are not counted again when they make a transfer, since they don't pass through a turnstile. Rides are recorded using the bus farebox and farecard reader. Whenever there is an error in the farebox or farecard reader, the rides are recorded as 0 in the data.

The data also contains the day type for each date of the year. Weekdays are marked as “W”, Saturdays as “A”, and Sundays and holidays as “U”. New Year's Day, Memorial Day, Independence Day, Labor Day, Thanksgiving, and Christmas Day are treated as "Sundays" for the purposes of ridership reporting.
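To make the day type rule concrete, here is a small Python sketch that assigns “W”, “A”, or “U” to a date. It is my own approximation of the rule described above, not the CTA's procedure: the helper names are hypothetical, the holiday dates use the usual US definitions (for example, Thanksgiving as the fourth Thursday of November), and it ignores observed-holiday shifts.

```python
from datetime import date, timedelta


def nth_weekday(year, month, weekday, n):
    """Date of the n-th given weekday (Monday=0) in a month."""
    first = date(year, month, 1)
    offset = (weekday - first.weekday()) % 7
    return first + timedelta(days=offset + 7 * (n - 1))


def last_weekday(year, month, weekday):
    """Date of the last given weekday (Monday=0) in a month."""
    next_month = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    last = next_month - timedelta(days=1)
    offset = (last.weekday() - weekday) % 7
    return last - timedelta(days=offset)


def holidays(year):
    """The six holidays listed above (assumed dates, no observed shifts)."""
    return {
        date(year, 1, 1),               # New Year's Day
        last_weekday(year, 5, 0),       # Memorial Day: last Monday of May
        date(year, 7, 4),               # Independence Day
        nth_weekday(year, 9, 0, 1),     # Labor Day: first Monday of September
        nth_weekday(year, 11, 3, 4),    # Thanksgiving: fourth Thursday of November
        date(year, 12, 25),             # Christmas Day
    }


def day_type(d):
    """Return "U" for Sundays and holidays, "A" for Saturdays, "W" otherwise."""
    if d.weekday() == 6 or d in holidays(d.year):
        return "U"
    if d.weekday() == 5:
        return "A"
    return "W"


print(day_type(date(2021, 11, 25)))  # Thanksgiving 2021
```

The dataset already ships with the daytype column, so this is only useful for checking the coding or for labeling dates that are not in the data.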

 

There are 1.09M rows in this dataset, with information from 2001 up to November 2021. There are 5 columns, listed below with their datatype:

station_id – number

stationname – plain text

date – date and time

daytype – plain text (“W”, “A”, or “U”)

rides – number

​​


We are interested in the ridership of individual stations, so the natural next step was to extract the rows for the stations that appear in the application. For this project, I chose to look at three stations: UIC Halsted, O’Hare, and Washington. I did this using Python, but any tool that can write CSV or TSV files will work. The .ipynb notebook with this code is available in the GitHub repository for this project. I filtered on the station_id column rather than the station name to avoid errors caused by punctuation marks.
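A minimal sketch of this filtering step, using only Python's standard csv module, might look like the following. It is not the notebook from the repository: the sample rows, the station IDs in `WANTED_IDS`, and the function names are placeholders I made up for illustration, and the real file would be opened from disk instead of a string.

```python
import csv
import io

# Made-up sample in the dataset's column layout; IDs and counts are placeholders.
SAMPLE = """station_id,stationname,date,daytype,rides
10001,UIC-Halsted,01/02/2001,W,4500
10002,O'Hare Airport,01/02/2001,W,8200
10003,Washington,01/02/2001,W,6100
10004,Some Other Station,01/02/2001,W,1200
10001,UIC-Halsted,01/03/2001,W,4700
"""

WANTED_IDS = {"10001", "10002", "10003"}  # hypothetical station_id values


def split_by_station(csv_file, wanted_ids):
    """Group rows by station_id, keeping only the requested stations."""
    reader = csv.DictReader(csv_file)
    groups = {station_id: [] for station_id in wanted_ids}
    for row in reader:
        # Matching on the ID avoids punctuation issues in station names.
        if row["station_id"] in groups:
            groups[row["station_id"]].append(row)
    return reader.fieldnames, groups


def to_csv_text(fieldnames, rows):
    """Serialize one station's rows back to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


fieldnames, groups = split_by_station(io.StringIO(SAMPLE), WANTED_IDS)
for station_id, rows in sorted(groups.items()):
    print(station_id, len(rows))
```

To produce one file per station, the text returned by `to_csv_text` can be written out with a file name based on the station ID.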

 

After separating the files and importing the data into R as a dataframe, the next step was to convert the dates into a more usable format. The R package lubridate provides several useful functions for parsing dates and extracting the day, month, year, and weekday from them. The final dataframe for each station contains the original columns plus the parsed date, month, year, and weekday. Alternatively, these functions can be applied on the fly while generating the graphs.
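The app does this step in R with lubridate, but the same idea can be sketched in Python for anyone working from the notebook side. The example below assumes the date column is text in MM/DD/YYYY form, which may not match every export of the dataset; the function names, the new column names, and the ride counts are hypothetical.

```python
from datetime import datetime

WEEKDAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]


def add_date_parts(row, date_format="%m/%d/%Y"):
    """Return a copy of the row with parsed date, year, month, and weekday."""
    parsed = datetime.strptime(row["date"], date_format).date()
    enriched = dict(row)
    enriched["fixed_date"] = parsed
    enriched["year"] = parsed.year
    enriched["month"] = parsed.month
    enriched["weekday"] = WEEKDAY_NAMES[parsed.weekday()]
    return enriched


def monthly_totals(rows, year):
    """Sum rides per month for one year, as used by a per-month bar plot."""
    totals = {}
    for row in rows:
        if row["year"] == year:
            totals[row["month"]] = totals.get(row["month"], 0) + int(row["rides"])
    return totals


# Made-up rows in the dataset's column layout.
raw_rows = [
    {"date": "11/24/2021", "rides": "3000"},
    {"date": "11/25/2021", "rides": "1200"},
    {"date": "12/01/2021", "rides": "2800"},
]
rows = [add_date_parts(row) for row in raw_rows]
print(rows[1]["weekday"], monthly_totals(rows, 2021))
```

Adding the parts once up front keeps the plotting code simple, at the cost of a few extra columns per station.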


Interesting Observations

Looking at the data in the application and comparing the three stations with each other, I found some interesting trends whose causes could be studied in greater detail.

  • The O'Hare station experienced a sharp drop in rides in September 2001.

  • Rides at the O'Hare station decreased overall from 2007 to 2009, with a sharp drop in July 2008.

  • Rides at the O'Hare station started increasing in 2009 and continued on an upward trend until 2010.

  • A major drop in rides is seen in April 2020 across all three stations.

  • O'Hare and UIC Halsted see low rides throughout 2020; however, rides at the Washington station start increasing almost immediately after the drop.

  • UIC Halsted has a decrease in the number of rides in the summer months and a sharp increase in late August or early September in every year.

  • In 2006, UIC Halsted has more rides in the winter months (January, February, March, and April) than in both 2005 and 2007.

  • UIC Halsted experienced a sharp drop in the number of rides in April 2020 and had fewer than 25,000 rides for the rest of the year. The numbers started increasing again in September 2021.

  • There is a sharp drop in the number of rides at Washington in July 2008.

Sharp decrease in the number of rides in July 2008

Graphs for O'Hare, 2008


Graphs for O'Hare and UIC Halsted, 2020


GitHub Repository

The code for this application, along with the necessary data files, can be found at: https://github.com/Ameesha23/CS424Project1

To run app.R, you can use any IDE that runs R. I used RStudio.

To make new data files, the repository contains a Python notebook that can be modified to export files for any of the stations available in the dataset.


ShinyApps

The application is also hosted on shinyapps.io and can be run from this link: https://ameesha.shinyapps.io/424Project1/

Video Demonstration

To watch a video demonstration of the application and its features, use the link below!
