• Posted by : Ruilin 1970/01/01

    MTA ridership

    Part 1: Designing a survey

    Goal

    As the largest public transportation agency in North America, the Metropolitan Transportation Authority (MTA) in New York City (Ley, A. (2023, April 28). M.T.A. averts fiscal crisis as New York Strikes Budget deal. The New York Times.) needs to study how often passengers use its services in order to better schedule trains and buses. This survey targets passengers who used MTA service during September 2023, and the project studies the number of times each passenger uses the MTA per month, to help the agency better plan capacity.

    Procedure

    Since this survey is aimed at passengers who used MTA service in September 2023, the target population is every individual who used MTA service in September 2023. After obtaining the MTA's agreement to assist, the frame population will be every individual on the follower list of @MTA on Twitter/X or on the MTA’s mailing list. We will collect data by having @MTA post the questionnaire on Twitter/X and send it by email to everyone on the mailing list. Thus, the sampled population will be everyone who completes the survey, either on Twitter/X or through the link in the email. As an incentive, everyone who completes the questionnaire will be entered into a draw for a chance to win a new MacBook.

    As for the sampling method, we will use simple random sampling: we will randomly select 1000 individuals for the study.
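
    Simple random sampling can be sketched in a few lines. The analysis in this post was done in R; the Python below is only an illustrative stand-in. The helper name `draw_simple_random_sample`, the seed, and the frame of 10000 ids are assumptions made for the example, not details of the actual survey.

```python
import random


def draw_simple_random_sample(frame_ids, sample_size, seed=None):
    """Draw `sample_size` distinct units so that every subset is equally likely."""
    rng = random.Random(seed)
    return rng.sample(list(frame_ids), sample_size)


# Hypothetical sampling frame: ids 1..10000 (the frame size is an assumption).
frame_ids = range(1, 10001)
sample_ids = draw_simple_random_sample(frame_ids, 1000, seed=2023)

# Sampling is without replacement, so no id appears twice.
print(len(sample_ids), len(set(sample_ids)))
```

    Because the draw is without replacement, each individual in the frame can be selected at most once, which is what distinguishes a simple random sample from independent draws with replacement.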

    Advantages of our procedure include its low cost and ease of administration, since distributing a survey through social media is common and costs nothing. However, there are limitations as well. To start with, a single MacBook as the prize might not be motivating enough, since the chance of winning is obviously low. Furthermore, as inhabitants of a megacity, New Yorkers use public transportation frequently, which means it might be hard for them to recall exactly how many times they used the service in a given month.

    Showcasing the survey.

    https://docs.google.com/forms/d/e/1FAIpQLSf9K8vRfRMPrAjOOHAxCzOv2UpJHQf0kI99zwKs_UJkbw4WUQ/viewform?usp=sf_link

    Question 1 To the best of your knowledge, how many times did you use the MTA for transit in September 2023?
    _____(integer answer only)
    <The question is straightforward, but the number is hard to recall precisely, so some recall error is inevitable.>

    Question 2 Rate your general experience with MTA service from 1 to 5. (Multiple choice)
    -1
    -2
    -3
    -4
    -5
    <Taking passengers' ratings into account, we can study the relationship between level of satisfaction and frequency of usage.>

    Question 3 Which borough do you live in? (Multiple choice)
    -Manhattan
    -Brooklyn
    -Queens
    -The Bronx
    -Staten Island
    <Taking borough information into account, we can study the relationship between borough of residence and frequency of usage. However, we will have 5 indicator variables to work with.>

    Part 2: Data Analysis

    Data Simulation.

    To simulate a sample obtained through simple random sampling in R, we begin by assigning a random id between 1 and 10000 to each of the 1000 sampled individuals. We then generate 1000 numbers representing the number of rides each individual took during September 2023, drawn from a normal distribution with an assumed mean of 90 rides and a standard deviation of 20. The values are rounded to 0 decimal places, since the number of rides should be an integer. Next, we draw 1000 values from 1 to 5 inclusive to represent each individual's rating of their experience.

    For income, we generate 1000 numbers representing each individual’s monthly income, drawn from a normal distribution with an assumed mean of 51000 USD and a standard deviation of 2000 USD. Here the values are rounded to 2 decimal places, since the smallest unit is 0.01 USD. For ride time, we generate 1000 numbers representing the usual time each individual spends per ride, drawn from a normal distribution with an assumed mean of 30 minutes and a standard deviation of 20 minutes, rounded to 0 decimal places.

    Assuming each individual lives in exactly one borough, we first draw 1000 values from (“Manhattan”, “Brooklyn”, “Queens”, “Bronx”, “StatenIsland”) representing where each individual lives. We then initialize 5 indicator variables, each of length 1000 and filled with “N”, which record whether each individual lives in the given borough. Next, we iterate through the borough values generated previously and set the corresponding indicator variable to “Y” where the value equals the name of that indicator variable. Finally, we draw 1000 values from (“Y”, “N”) to simulate whether each individual transfers.
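
    The simulation steps described above can be sketched as follows. The original was written in R; this Python version is an illustrative stand-in, and the function name `simulate_survey`, the seed, and the choice to draw distinct ids are assumptions. The distribution parameters are the ones stated in the text.

```python
import random

BOROUGHS = ["Manhattan", "Brooklyn", "Queens", "Bronx", "StatenIsland"]


def simulate_survey(n=1000, max_id=10000, seed=None):
    """Simulate n survey responses as a list of dicts, one per respondent."""
    rng = random.Random(seed)
    # Assumption: ids are distinct draws from 1..max_id.
    ids = rng.sample(range(1, max_id + 1), n)
    rows = []
    for respondent_id in ids:
        home = rng.choice(BOROUGHS)  # each person lives in exactly one borough
        row = {
            "id": respondent_id,
            "rides": round(rng.gauss(90, 20)),           # rides per month, integer
            "rating": rng.randint(1, 5),                 # 1 to 5 inclusive
            "time": round(rng.gauss(30, 20)),            # minutes per ride, integer
            "income": round(rng.gauss(51000, 2000), 2),  # USD, 2 decimal places
            "transfer": rng.choice(["Y", "N"]),
        }
        # One "Y"/"N" indicator column per borough.
        for borough in BOROUGHS:
            row[borough] = "Y" if borough == home else "N"
        rows.append(row)
    return rows


simulated = simulate_survey(seed=2023)
print(len(simulated), sorted(simulated[0].keys()))
```

    Note that, as in the description, the normal draws are not truncated, so a ride time with mean 30 and standard deviation 20 can occasionally come out negative; a real analysis might want to clip such values.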

    Data showcase

    The data set we simulated in R consists of 1000 observations of MTA passengers, with their number of rides per month, their rating of experience, their time per ride, their income, whether they live in Manhattan, whether they live in Brooklyn, whether they live in Queens, whether they live in the Bronx, whether they live in Staten Island, and whether they transfer.

    Below is a histogram of the variable we are interested in: the number of rides per month.

    Figure 1 is a histogram of the number of MTA rides per month. The distribution is close to symmetric, with the mode close to 90.

    Table 1: Summary of the sample (rides per month)
    min   max   mean     median   sd
    14    145   89.379   90       19.5343

    From the summary table, we can see that the mean and median are close, which is consistent with a symmetric distribution of rides per month. The standard deviation is relatively large (19.5343), which on its own widens the confidence interval, although the large sample size (n = 1000) offsets this.
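
    As a small illustration of how the five summary statistics are computed, the sketch below applies them to the first ten `rides` values visible in the Appendix. This is not the full sample of 1000, so the numbers will not match Table 1; the helper name `summarize` is hypothetical, and the post's own computation was done in R.

```python
import statistics


def summarize(values):
    """Return the five summary statistics reported in Table 1."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),  # sample sd, n - 1 in the denominator
    }


# First ten `rides` values shown in the Appendix listing (not the full sample).
example_rides = [99, 69, 80, 107, 125, 107, 129, 80, 94, 65]
summary = summarize(example_rides)
print(summary)
```

    The `sd` entry uses the sample standard deviation, which is the quantity \(s\) needed for the t confidence interval and the t test.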

    Methods

    The simulated data set yields a sample mean of 89.379, which is only the arithmetic average of the number of MTA rides per month in the sample. To gain a better understanding, we will construct a 95% t confidence interval, which gives a reasonable range of values for the population average number of MTA rides per month and helps the MTA predict passenger numbers and better allocate resources. To compute the interval, we assume that the observations are independent. The confidence interval is then \(\bar x \pm t_{\alpha/2,n-1}\frac{s}{\sqrt{n}}\), where \(\bar x\) is the sample mean, \(n\) is the sample size, \(s\) is the sample standard deviation, and \(t_{\alpha/2,n-1}\) is the upper \(\alpha/2\) critical value of the t-distribution with \(n-1\) degrees of freedom, with \(\alpha = 0.05\).

    The coronavirus outbreak and the consequent “New York State on PAUSE” executive order to close all non-essential businesses sent both subway and bus ridership to an unprecedented low in April 2020, when subway ridership was at 8% of its 2019 level and bus ridership at 23% (Subway and bus ridership for 2020. MTA. (n.d.). https://new.mta.info/agency/new-york-city-transit/subway-bus-ridership-2020 ). Now that it is 2023, we perform a hypothesis test of whether the average ridership per month is greater than 60, that is, whether on average each individual uses the MTA more than twice a day in a 30-day month. Since the sample size is large, we assume by the Central Limit Theorem that the sample mean is approximately normally distributed. The null hypothesis is that the average monthly ridership per person is exactly 60, and the alternative hypothesis is that it is more than 60: \(H_0: \mu = 60, H_a: \mu > 60\), where \(\mu\) is the population average monthly MTA ridership per person. The test statistic is \(t = \frac{\bar x - \mu_0}{s/\sqrt{n}}\), where \(\bar x\) is the sample mean, \(n\) is the sample size, \(s\) is the sample standard deviation, and \(\mu_0 = 60\).

    Result

    The resulting 95% t confidence interval is (88.16681, 90.59119), which means the MTA can expect the population mean, the average monthly ridership per passenger in New York, to be around 90.
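
    As a check, the interval can be reproduced by hand from the Table 1 values. The critical value \(t_{0.025,999} \approx 1.962\) is taken from a standard t table and is not stated in the original analysis:

    \[\frac{s}{\sqrt{n}} = \frac{19.5343}{\sqrt{1000}} \approx 0.6177, \qquad \bar x \pm t_{0.025,999}\frac{s}{\sqrt{n}} \approx 89.379 \pm 1.962 \times 0.6177 \approx 89.379 \pm 1.212\]

    This gives approximately (88.167, 90.591), in agreement with the reported interval up to rounding.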

    Table 2: Test statistic and p-value
    item             value
    Test statistic   47.5597
    p-value          < 0.0001

    The hypothesis test of \(H_0: \mu = 60, H_a: \mu > 60\) gives a t statistic of 47.5597 and a p-value below 0.0001 (reported as 0 after rounding). This is strong evidence against the null hypothesis in favor of the alternative: in 2023, average monthly MTA ridership per passenger is greater than 60, meaning that on average each passenger in the population used MTA service more than twice a day. This makes sense, since most Covid restrictions have been lifted.
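
    The test statistic can be checked directly from the Table 1 summary values. The sketch below is in Python rather than the R used for the original analysis, and the helper name `one_sample_t_statistic` is hypothetical. For the p-value it uses a normal approximation to the t distribution with 999 degrees of freedom, which is an assumption made to keep the example within the standard library.

```python
import math
from statistics import NormalDist


def one_sample_t_statistic(sample_mean, sample_sd, n, mu_0):
    """Compute (x_bar - mu_0) / (s / sqrt(n))."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - mu_0) / standard_error


# Summary values from Table 1; mu_0 = 60 from the null hypothesis.
t_stat = one_sample_t_statistic(89.379, 19.5343, 1000, 60)

# One-sided (upper-tail) p-value under a normal approximation (assumption).
p_value = 1 - NormalDist().cdf(t_stat)

print(round(t_stat, 2), p_value)
```

    The statistic comes out at about 47.56, in line with Table 2, and the upper-tail probability is far too small to distinguish from zero at four decimal places.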

    Part 3: References

    Generative AI

    No generative AI was used.

    Bibliography

    • Ley, A. (2023, April 28). M.T.A. averts fiscal crisis as New York Strikes Budget deal. The New York Times.
    • Subway and bus ridership for 2020. MTA. (n.d.). https://new.mta.info/agency/new-york-city-transit/subway-bus-ridership-2020
    • Wickham H, Averick M, Bryan J, Chang W, McGowan LD, François R, Grolemund G, Hayes A, Henry L, Hester J, Kuhn M, Pedersen TL, Miller E, Bache SM, Müller K, Ooms J, Robinson D, Seidel DP, Spinu V, Takahashi K, Vaughan D, Wilke C, Woo K, Yutani H (2019). “Welcome to the tidyverse.” Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686
    • Yihui Xie (2022). knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.41.

    Appendix

    ## Rows: 1,000
    ## Columns: 11
    ## $ id           <int> 202, 5821, 3022, 706, 3196, 6575, 5634, 4305, 6995, 8190,…
    ## $ rides        <dbl> 99, 69, 80, 107, 125, 107, 129, 80, 94, 65, 66, 61, 78, 9…
    ## $ rating       <int> 4, 1, 4, 3, 4, 1, 3, 1, 1, 4, 2, 3, 1, 5, 4, 4, 4, 2, 5, …
    ## $ time         <dbl> 38, 25, 36, 45, 28, 37, 13, 43, 36, 33, 8, 41, 26, 27, 36…
    ## $ income       <dbl> 50591.68, 52182.07, 51448.58, 52295.76, 49086.32, 47017.1…
    ## $ Manhattan    <chr> "Y", "N", "N", "Y", "N", "N", "N", "N", "N", "N", "N", "N…
    ## $ Brooklyn     <chr> "N", "N", "Y", "N", "N", "N", "N", "Y", "N", "Y", "Y", "N…
    ## $ Queens       <chr> "N", "N", "N", "N", "Y", "Y", "N", "N", "N", "N", "N", "N…
    ## $ Bronx        <chr> "N", "N", "N", "N", "N", "N", "Y", "N", "Y", "N", "N", "Y…
    ## $ StatenIsland <chr> "N", "Y", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N…
    ## $ transfer     <chr> "N", "N", "Y", "N", "N", "Y", "N", "N", "Y", "Y", "Y", "Y…
