SCHOOL BUS BREAKDOWNS IN NEW YORK CITY

In this paper, we examine a dataset of bus breakdowns and delays in the New York City public school system. We analyze the companies involved in delays, the season and date of the delays, the causes of the delays, and other measures, and we offer several conclusions and recommendations.


Introduction
We analyze the data set Bus Breakdowns and Delays, obtained from the open data site Kaggle. The data are a collection of information from school bus vendors in the New York City school system. The bus companies record delays in real time on their bus routes, and a centralized customer service system then uses these records to notify parents about delays. The data were accessed in November 2019.
This data set was chosen primarily to identify ways to improve the New York City Office of Pupil Transportation (OPT). The OPT website is lacking, and OPT's Google rating is 1.3 out of five stars. Insights from these data could lead to changes that would improve OPT's reputation and service.
Through analysis, the following questions were addressed:
• Which bus companies are responsible for the highest percentage of breakdowns and delays?
• What type of bus company-caused issues commonly come from the same contractor?
• Which companies are better at notifying parents, and which are worse?
• Does notifying correlate with bus breakdown/delay time?
• Does the type of breakdown/delay correlate with breakdown/delay time?
• Is there a seasonality effect for weather-caused breakdowns or delays?
Http://www.granthaalayah.com ©International Journal of Research -GRANTHAALAYAH [337]
From this analysis and the literature review, recommendations will be made for further analysis and to the bus companies, schools, and OPT as to how to improve operations of their school bus system.
Inadequate school bus service is not a new problem in the boroughs of New York. Between 2015 and 2019, the number of breakdowns and delays increased by 73%. In the 2018-2019 school year alone, there were 9,488 breakdowns in New York City (Sanders, 2019). Even though the problem is this large, and the raw data from the New York City Department of Public Transportation are publicly available, little research into solving the crisis has been published.
The earliest article on improving the school bus system in New York City is from 1994, when researchers Braca, Bramel, Posner, and Simchi-Levi created a computerized algorithm to project improvements to school bus routes in the city (Braca et al., 1995). While many bus routes use algorithms and trackers in 2019, this early test shows that even 25 years ago there was significant room for improvement in the New York City school bus system. Findings similar to those of the Boston experiment were reported in studies in Rome, Italy in 2017 (Brinchi et al., 2017) and New York City in 2016. Although both studies examined public transportation buses, not school buses, the general conclusion was that with sufficient data and analysis it is possible to improve bus service planning, especially route planning (Hanft et al., 2016).
The data discussed here focus specifically on school bus delays in New York City and boroughs of the tri-state area and on their root causes. Although the literature reviewed does not touch on this topic specifically, the studies show different perspectives on similar topics and potential ways to mitigate the problems the New York school buses are experiencing. Ultimately, the goal of this data analysis is to identify the causes of the bus delays and offer potential solutions. The literature above was reviewed because it shares this common goal. Some of these works, including Sanders' spotlight on New York City school bus delays, especially for special-needs students, and Austin's research on reducing school bus emissions to improve students' academic performance and health (Austin et al., 2019), point out the positive implications of improving the system. Others, such as MIT's Quantum mapping algorithm for Boston school buses and the 1994 New York City algorithm for mapping public transportation, offer solutions. This paper examines New York's data in more depth, taking this prior research into consideration.

Materials and Methods
The data set represents all reported breakdowns and delays for buses that transport public school students to and from a New York City-based school, beginning with the 2015-2016 school year through March 2019 (not the complete 2018-2019 school year).
These data are broken up into several different variables, including:
• The date and time of when the breakdown or delay occurred
• The date and time when the breakdown or delay was reported to OPT, the school bus contractor
• The date and time when the breakdown or delay was reported to parents
• The borough where the breakdown or delay occurred (or if it occurred outside New York City, a geographic indicator of the location of the breakdown or delay)
• The name of the bus company that was responsible for that bus
• The length of time it took to resolve the breakdown or delay
• Number of children on board the bus when it broke down or was delayed
• And, the reason given for the breakdown or delay.
An initial look at the data set yields several clear observations. First, the number of breakdowns and delays has risen dramatically since 2015-16, as seen in the corresponding figure. Second, of the ten regions in this data set, Manhattan, the Bronx, Brooklyn, and Queens have significantly more breakdowns and delays, as seen in Figure 2 below. Westchester, Staten Island, Rockland County, New Jersey, Nassau County, and Connecticut combined do not have as many breakdowns and delays as Manhattan alone.
To prepare the data for analysis, the data needed to be cleaned. Because the entries were made by different people from different bus companies, all of whom used different methods, many of the data points were formatted differently.
For example, a bus company could be entered under different names, such as "Montauk Student Trans" and "Montauk Student Trans LLC." Through cleaning, each company name was standardized to avoid confusion and produce clear results.
A similar problem occurred with the "Length of delay" column, in which the same length of time was entered in different formats (e.g., "1 hour," "1 hr," "60," "60 mins"). All entries in this column were standardized to a number of minutes.
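The two cleaning steps described above could be sketched as follows. This is only an illustration, not the procedure used in the paper (which relied on Tableau and Excel): the alias table, the function names, and the choice to treat a bare number such as "60" as minutes are all assumptions made for the example.

```python
import re

# Hypothetical alias table: maps known spelling variants to one canonical name.
COMPANY_ALIASES = {
    "montauk student trans": "Montauk Student Trans",
    "montauk student trans llc": "Montauk Student Trans",
}

def normalize_company(name):
    """Drop periods/commas, collapse whitespace and case, then look up an alias."""
    key = re.sub(r"[.,]", "", name).strip().lower()
    key = re.sub(r"\s+", " ", key)
    # Unknown names are returned unchanged (apart from outer whitespace).
    return COMPANY_ALIASES.get(key, name.strip())

def delay_to_minutes(raw):
    """Convert entries such as '1 hour', '1 hr', '60', or '60 mins' to minutes.

    Returns None when the entry cannot be parsed, so it can be reviewed by hand.
    Assumption: a number with no unit is already in minutes.
    """
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([a-z]*)\.?\s*", raw.lower())
    if not match:
        return None
    value, unit = float(match.group(1)), match.group(2)
    if unit in ("", "m", "min", "mins", "minute", "minutes"):
        return round(value)
    if unit in ("h", "hr", "hrs", "hour", "hours"):
        return round(value * 60)
    return None
```

Returning None for unparseable entries, rather than guessing, keeps ambiguous free-text values visible for manual review.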
To analyze the data, a mixture of Tableau and Excel was used. In Tableau, analysis focused on seeking trends and similarities among bus companies, seasons, and reporting errors. Smart columns and maps were created to detect patterns and "top offenders": companies with the most or longest breakdowns and delays. Tableau was necessary here to account for the highly qualitative nature of the data, and it was particularly helpful for detecting and displaying patterns that are relatively difficult to handle in Excel or SPSS. For example, with Tableau, seasonality and geographical effects were readily identified.
Excel was then used to analyze the data quantitatively and test effects on the length of the delay/breakdown. Correlations, simple regressions, and multiple regressions were performed to test the effects of the number of students on the bus, the type of delay/breakdown, notifying schools of the delay/breakdown, and notifying OPT of the delay/breakdown on the length (in minutes) of the delay/breakdown. The goal was to test whether any of these factors correlated with shorter or longer breakdowns. Building on the Tableau analysis, breakdowns/delays caused by weather were examined further.

Results and Discussion
To best answer the goal questions outlined in the introduction, Tableau was used for a holistic view of the breakdowns and delays. First, the frequency and locations of incidents were mapped (Exhibit 1). While not representative of the total geographical area of each New York City borough, using zip codes in place of "Boro" allowed for a general indication of incident locations. Manhattan, Brooklyn, and the Bronx, in that order, had the most breakdowns and delays.
The companies with the highest number of breakdowns and delays were then compiled into a top-ten offenders list (Exhibit 2). Leesel Transp Corp (48,701), GVC LTD. (24,618), and Pioneer Transportation Co. (24,208) had the most. These data were then compared with the number of schools serviced (Exhibit 3), and, interestingly, Leesel Transp Corp and GVC LTD. were not in the top three by schools serviced (although they are in the top ten).
Next, a new top ten was created, emphasizing factors where individual bus companies may be at fault for breakdowns and delays. Exhibit 4 displays accident and mechanical issue data. While it is nearly impossible to identify the root cause of an accident or delay from these data, mechanical issues can plausibly be attributed to the bus itself. These issues can be dangerous, with implications for both safety and brand reputation.
As shown in Exhibit 4, the ten companies with the highest accident ratios have ratios between 2.2% and 4.1%. This suggests that for every 100 delay records, an accident is the reason for the delay about two to four times. Mechanical problems are much more prevalent: the top ten companies reported mechanical problems as the reason for between 13% and 28.1% of their delays. These ratios also span a much wider range than the accident ratios.
Most interestingly, five bus companies, different from the overall breakdown and delay offenders, appear in both of the top-ten lists for accident and mechanical issue ratios. These companies are Boro Transit, Quality Transportation, Consolidated Bus Transportation, Jofaz Transportation, and SNT Bus Inc, highlighted in orange in Exhibit 4.
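The ratio and overlap calculations behind Exhibit 4 can be illustrated with a short sketch. It assumes each record is a dictionary with hypothetical "company" and "reason" keys and that the reason labels are spelled as shown; the actual column names and labels in the data set may differ.

```python
from collections import defaultdict

def reason_ratios(records, reason):
    """Share of each company's records whose stated reason equals `reason`."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for rec in records:
        totals[rec["company"]] += 1
        if rec["reason"] == reason:
            hits[rec["company"]] += 1
    return {company: hits[company] / totals[company] for company in totals}

def top_n(ratios, n=10):
    """Company names with the n highest ratios, highest first."""
    return sorted(ratios, key=ratios.get, reverse=True)[:n]

def in_both_lists(records, n=10,
                  first="accident", second="mechanical problem"):
    """Companies appearing in the top n for both reasons (labels are assumed)."""
    return (set(top_n(reason_ratios(records, first), n))
            & set(top_n(reason_ratios(records, second), n)))
```

Because each ratio is computed against a company's own record count, a small company with few records can rank highly; a minimum-record threshold may be worth considering before drawing conclusions.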
SNT Bus Inc. stands out in particular: it holds the second-highest accident ratio and the highest mechanical issue ratio. These numbers are distressing and should be addressed. This analysis suggests that something intrinsic to these five companies is driving these statistics.
Based on this analysis, we looked further into other potentially harmful practices. This led to a new top ten: the bus companies that least frequently reported incidents to their clients (i.e., parents and schools). While it is impossible for a company never to experience an incident, protocol and ethics both require that companies report the event to the schools and parents. Reporting gives clients time to respond and adjust for the disruption. Failure to report leads to anger and resentment among parents (which can be seen on OPT's Google business page) as well as schools, and therefore has significant implications for brand reputation. Exhibit 5 shows the ten worst-reporting companies, by notification of parents and by notification of schools, when an incident occurs. For example, the first column, for Pioneer Transportation Company, shows that of all incident records for that company, parents were not notified 99.76% of the time. This is a striking finding.
What is just as shocking is that, again, five specific bus companies appear in the top 10 for both not reporting incidents to parents and schools. These companies are Lorissa Bus Service Inc., Quality Transportation Co., All American School Bus Corp., Safe Coach Inc., and Mar-Can Transport Co. Inc., highlighted in orange in Exhibit 5.
The seasons, and more specifically the weather changes that come with the seasons in New York City, affect the number and length of delays and breakdowns. The original hypothesis was that there would be a rise in the number of breakdowns and delays, as well as length of breakdowns and delays, in the winter months (December-March). This could likely be attributed to snowstorms and snow traffic, as well as cautious driving when it is darker earlier in the afternoon.
First, the number of breakdowns and delays was analyzed by month; the three months with the most delays and breakdowns were, in order, January, October, and March (Exhibit 6). Studying length of delays by month (Exhibit 7) indicated that delays are longer in the winter months. This partially confirms the seasonality hypothesis, given that snow is common in January and March in New York. February and December are not among the top three months, most likely because of the school calendar: students have time off in December and February. When these data are explored on a per-school-day basis, December and February show a high number of delays, as well as lengthier delays.
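The per-school-day adjustment mentioned above amounts to dividing each month's incident count by the number of school days in that month. A minimal sketch follows; the school-day counts are placeholders chosen for illustration, not values from the paper or from the actual school calendar.

```python
from collections import Counter
from datetime import date

# Placeholder school-day counts per month (month number -> days in session).
# Real values would come from the school calendar for each year.
SCHOOL_DAYS = {1: 20, 2: 15, 3: 21, 10: 21, 12: 15}

def incidents_per_school_day(incident_dates, school_days=SCHOOL_DAYS):
    """Average number of incidents per school day, keyed by month number."""
    counts = Counter(d.month for d in incident_dates)
    return {month: counts[month] / days
            for month, days in school_days.items() if days}

# Toy data: January has more incidents in total, February more per school day.
dates = [date(2019, 1, 7)] * 40 + [date(2019, 2, 4)] * 36
rates = incidents_per_school_day(dates)
```

With these toy numbers January has 40 incidents over 20 school days and February has 36 over 15, so February ranks lower on raw counts but higher per school day, which is the pattern the text describes.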
Looking into length of delay further, Excel was used to determine correlations between the time in minutes of delays/breakdowns and potential contributing factors. First, number of students on the bus, notifying schools, and notifying OPT were reviewed. There were no strong correlations among these factors (Exhibit 9), so simple and multiple regressions were run to create a model predicting length of delay in minutes. Although the P-value for each independent variable was low enough to be considered significant, the R-squared of 0.00023 was very low, meaning the model was not useful in predicting length of breakdown/delay (Exhibit 10). The results of the linear regression with number of students on the bus as the independent variable and length of delay as the dependent variable also show that this was not a good predictor (Exhibit 11).
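A low P-value alongside a very low R-squared is common with large data sets: the coefficient can be distinguished from zero, yet the variable explains almost none of the variation in delay length. The sketch below shows how R-squared is computed for a simple regression. It is a generic ordinary least squares calculation written for illustration, not a reproduction of the Excel analysis, and the function name is an assumption.

```python
def simple_ols(x, y):
    """Fit y = intercept + slope * x by least squares; return slope, intercept, R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot
```

An R-squared of 0.00023 means the fitted line accounts for about 0.023% of the variance in delay length, which is why the model has no practical predictive value.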
The next hypothesis to test was whether the reason for the delay affects the length of the delay. This was a reasonable variable to test considering that bus drivers had ten options to choose from ("won't start" and "no answer" were not tested because of the lack of records in these categories). With accident serving as the baseline (held-constant) category, delayed by school, flat tire, heavy traffic, late return from field trip, mechanical problem, other, and weather conditions were tested as independent variables against the dependent variable, length of delay/breakdown. Somewhat surprisingly, the R-squared again came out very low, indicating that this model was not a good predictor of length of delay/breakdown (Exhibit 12).
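If "holding accident constant" is read as using accident as the reference category, each record's reason would be encoded as a set of 0/1 indicator variables, one per remaining category. The sketch below shows that encoding under this assumption; the category spellings follow the text above and may not match the labels in the raw data.

```python
# Reason categories as listed in the text (spellings are assumptions).
REASONS = [
    "accident",
    "delayed by school",
    "flat tire",
    "heavy traffic",
    "late return from field trip",
    "mechanical problem",
    "other",
    "weather conditions",
]

def dummy_encode(reason, categories=REASONS, baseline="accident"):
    """Return one 0/1 indicator per non-baseline category, in list order."""
    if reason not in categories:
        raise ValueError(f"unknown reason: {reason!r}")
    return [1 if reason == cat else 0 for cat in categories if cat != baseline]
```

Under this encoding a record with reason "accident" becomes all zeros, so each regression coefficient is interpreted as the difference in delay length relative to accidents.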
Circling back to the Tableau testing on the seasonality effect, one last test was done on the effect of weather conditions on length of delay. Because the Tableau analysis indicated that winter months had a higher number of delays/breakdowns, it was assumed that weather would increase the length of the delay/breakdown. This was not the case: the test again produced an R-squared value too low for the model to be a useful predictor (Exhibit 13).

Conclusions and Recommendations
It was unexpected not to find correlations for any factors affecting the length of the breakdown/delay, but overall the need for improvement in the New York school bus system was very clear.
One recommendation is to further address the companies that are under-reporting to OPT's clients, schools and parents. These companies are Lorissa Bus Service Inc., Quality Transportation Co., All American School Bus Corp., Safe Coach Inc., and Mar-Can Transport Co. Inc. These companies fail to report between 65% and 95% of incidents. There is a concerning gap between these five and the other companies in the data set, which report almost every incident.
Further research is recommended into why companies and bus drivers are not reporting. Though reporting does not directly correlate with length of delay, the lack of reporting is still alarming to parents who do not know where their children are, and it harms the education of students who are consistently late. This is clearly a problem, as is evident in the articles reviewed from Brookings and The Daily News.
The next recommendation is for the system to address the five companies that appear in both top-ten lists of worst offenders, for traffic accidents and for mechanical issues. These companies are Boro Transit, Quality Transportation, Consolidated Bus Transportation, Jofaz Transportation, and SNT Bus Inc. As some of these incidents are likely at least partially due to the negligence of the bus company, it is plausible that factors internal to these companies contribute to a lack of safety awareness and to incomplete implementation of necessary safety protocols.
In terms of seasonality, there is a spike in incidents during January, February, March, and October, as seen in Exhibit 6. It is probable that this spike is due to dangerous winter weather conditions and increased darkness during driving hours, and it is recommended that this issue be analyzed further with the proper data. The time of incident and the time of report are recorded in the data analyzed here, but actual weather conditions, possibly from a different database, could provide more insight.
Overall, the data analyzed did not paint a perfectly clear picture into how to improve the system, but did give significant insight into how to fix certain problems. A solution like MIT's Quantum