STA305H1F Sec L0101, W2024
Statistical Science Major,
Computer Science Minor, Mathematics Minor
Student name: RUILIN PENG
Student ID: 1005762765
2024-03-30
In terms of software engineering, one of the fields I have the most experience in is home server deployment. An important indicator of server performance is the response time needed to load a web app from a home server, which is affected by many factors. Studying how these factors affect response time therefore gives better insight into how to improve server performance. In this study, I investigate three of the most testable factors in a home server deployment and estimate the magnitude of their individual and combined effects.
Experimental design
I experimented with one of my own full-stack apps to study these three factors: the RAM size of the server (the first factor), the access method (the second factor), and the login status of the app at the time of the request (the third factor). The web app has been hosted on two of my devices: a single-board computer with 4GB RAM that is dedicated as a home server, and my working laptop with 8GB RAM that is temporarily used as a server. Both devices can serve the website over the Internet with NGINX, DNS, and HTTPS, as well as over a private network by local IP address. A larger RAM size should theoretically let the backend processes run faster. Furthermore, compared to access from a private network by IP (192.168.x.x), access from the Internet requires routing over the Internet, DNS resolution, an SSL handshake, etc., which should theoretically take more time. Lastly, if a request’s payload indicates that the user is logged in, the backend also needs to fetch the user’s data from my PostgreSQL database, which should theoretically add to the processing time of the request. The response times from the server are therefore studied with a \(2^3\) factorial design with the following factors and levels:
\(\begin{tabular}{|c|c|c|} \hline Factors & Level 1 & Level 2 \\ \hline RAM size & 4 GB RAM(-1) & 8 GB RAM(+1) \\ \hline Access method & From Internet(-1) & Private Network(+1) \\ \hline Login status & Logged-in(-1) & Not logged in(+1) \\ \hline \end{tabular}\)
This is a \(2^3\) factorial design: 3 factors, each with 2 levels. I ran a replicated \(2^3\) factorial design with 16 observations. First, using Postman, an API testing tool, I measured response times in ms under the 8 different configurations. Then, to estimate variability, I repeated the same 8 configurations on another day.
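The run layout described above can be sketched in a few lines. This is only an illustration in Python (the report does not show its own code), and the names `factors`, `one_replicate`, and `design` are assumptions made for the example; the coded levels follow the factor table.

```python
from itertools import product

# Factor names and coded levels, following the factor table above.
factors = ("ram", "access", "login")
levels = (-1, +1)

# One replicate: every combination of three two-level factors (2^3 = 8 runs).
one_replicate = [dict(zip(factors, combo)) for combo in product(levels, repeat=3)]

# Two replicates (two days) give the 16 runs of the replicated design.
design = [dict(run, day=day) for day in (1, 2) for run in one_replicate]

for run in design:
    print(run)
```

Each factor appears at its low and high level equally often, which is what makes the effect estimates in a factorial design independent of one another.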
Data collected:
Exploratory Data Analysis
A cube plot lets us easily visualize the interplay between the three factors we are interested in: ram, access, and login. Here, we can observe that moving each of the three factors to its high level individually lowers the response time.
The interaction plot of ram and access suggests that an interaction effect exists between these two variables.
The interaction plot of ram and login suggests that an interaction effect exists between these two variables.
The interaction plot of access and login suggests that an interaction effect exists between these two variables.
Linear model assumptions check
In the QQ plots, all the points fall approximately along a straight line, so normality of the residuals can be assumed. However, a fanning pattern can be observed, which suggests heteroscedasticity.
Linear model fitted
To minimize heteroscedasticity, let’s apply log transformation on response time and fit a new linear regression model.
Assuming that all responses are independent, all variances of errors are constant, and lastly all residuals are normally distributed, I can fit a linear regression model with all the main effects and interaction effects as independent variables.
\(\log(time) = \beta_0 + \beta_1 x_{ram} + \beta_2 x_{access} + \beta_3 x_{login} + \beta_4 x_{ram:access} + \beta_5 x_{ram:login} + \beta_6 x_{access:login} + \beta_7 x_{ram:access:login}\)
The fitted linear regression model allows us to study the effects individually and combined. The significance of each effect can also be assessed at this step, since the fitted model provides p-values.
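To illustrate how the coefficients of such a model relate to factorial effects, here is a minimal Python sketch. The response values are HYPOTHETICAL (generated from made-up coefficients, not the measured data), and the helper names `coefficient` and `effect_by_means` are assumptions for the example. It relies on the fact that in a balanced design with -1/+1 coding the model columns are orthogonal, so each least-squares coefficient reduces to sum(x * y) / n.

```python
import random
from itertools import product

# Coded design: 8 factor combinations, each run twice (16 runs).
runs = list(product((-1, 1), repeat=3)) * 2

# HYPOTHETICAL log response times, made up for illustration only.
rng = random.Random(0)
log_time = [
    5.3 - 0.15 * ram - 0.30 * access - 0.08 * login + rng.gauss(0, 0.05)
    for ram, access, login in runs
]

# Model columns for main effects and interactions (products of coded levels).
columns = {
    "ram": [r[0] for r in runs],
    "access": [r[1] for r in runs],
    "login": [r[2] for r in runs],
    "ram:access": [r[0] * r[1] for r in runs],
    "ram:login": [r[0] * r[2] for r in runs],
    "access:login": [r[1] * r[2] for r in runs],
    "ram:access:login": [r[0] * r[1] * r[2] for r in runs],
}

def coefficient(column):
    """Least-squares coefficient of a -1/+1 column in a balanced orthogonal design."""
    return sum(x * y for x, y in zip(column, log_time)) / len(log_time)

def effect_by_means(column):
    """Mean response at the high level minus mean response at the low level."""
    high = [y for x, y in zip(column, log_time) if x == 1]
    low = [y for x, y in zip(column, log_time) if x == -1]
    return sum(high) / len(high) - sum(low) / len(low)

for name, col in columns.items():
    print(f"{name:17s} 2*coef={2 * coefficient(col):+.3f} effect={effect_by_means(col):+.3f}")
```

The two printed columns agree for every term, which is why a factorial effect can be read off as twice the corresponding regression coefficient.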
Main effects and interactions:
By multiplying regression coefficients by 2, we got the main and interaction effects as follows:
\(\begin{tabular}{|c|c|} \hline factor & effect \\ \hline ram & -0.332 \\ \hline access & -0.652 \\ \hline login & -0.172 \\ \hline ram:access & 0.041 \\ \hline ram:login & 0.114 \\ \hline access:login & -0.172 \\ \hline ram:access:login & 0.050 \\ \hline \end{tabular}\)
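Because the model is fitted to log response time, each effect translates into a multiplicative factor on the response time itself. The following Python sketch shows that conversion using the effect values from the table above; the function name `percent_change` is an assumption made for this example.

```python
import math

# Effects on the log(response time) scale, copied from the table above.
effects = {
    "ram": -0.332,
    "access": -0.652,
    "login": -0.172,
    "ram:access": 0.041,
    "ram:login": 0.114,
    "access:login": -0.172,
    "ram:access:login": 0.050,
}

def percent_change(log_effect):
    """Percent change in response time implied by an effect on the log scale."""
    return (math.exp(log_effect) - 1) * 100

for name, eff in effects.items():
    print(f"{name:17s} factor={math.exp(eff):.3f} change={percent_change(eff):+.1f}%")
```

A factor below 1 means a shorter response time at the high level of the term, and a factor above 1 means a longer one.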
Interpretations of main effects and interaction effects:
- The estimated log response time decreases by 0.332 if we use the 8GB RAM server. This corresponds to multiplying the expected response time by \(e^{-0.332} = 0.717\): moving ram from 4GB (-1) to 8GB (+1) makes the expected response time 28.3% shorter, holding all other predictors constant. The p-value for the estimated coefficient of ram is < 0.05, so we reject \(H_0\) and conclude that a server with a larger RAM size has a shorter response time.
- The estimated log response time decreases by 0.652 when we access from the private network. This corresponds to multiplying the expected response time by \(e^{-0.652} = 0.521\): moving access from the Internet (-1) to the private network (+1) makes the expected response time 47.9% shorter, holding all other predictors constant. The p-value for the estimated coefficient of access is < 0.05, so we reject \(H_0\) and conclude that accessing from a private network leads to a shorter response time.
- The estimated log response time decreases by 0.172 when we access from a not logged-in state. This corresponds to multiplying the expected response time by \(e^{-0.172} = 0.842\): moving login from logged in (-1) to not logged in (+1) makes the expected response time 15.8% shorter, holding all other predictors constant. The p-value for the estimated coefficient of login is < 0.05, so we reject \(H_0\) and conclude that accessing from a not logged-in state leads to a shorter response time.
- The estimated interaction effect of ram and access on log response time is 0.041, which corresponds to a factor of \(e^{0.041} = 1.042\) (4.2%) on the response time scale. The p-value for the estimated coefficient of ram:access is < 0.05, so we reject \(H_0\). Because the interaction is positive while the main effect of ram is negative, the effect of the larger RAM size is \(-0.332 + 0.041 = -0.291\) on the private network (+1) and \(-0.332 - 0.041 = -0.373\) from the Internet (-1), so a greater RAM size reduces response time more when the app is accessed from the Internet than from a private network.
- The estimated interaction effect of ram and login on log response time is 0.114, which corresponds to a factor of \(e^{0.114} = 1.121\) (12.1%) on the response time scale. The p-value for the estimated coefficient of ram:login is < 0.05, so we reject \(H_0\). Because the interaction is positive while the main effect of ram is negative, the effect of the larger RAM size is \(-0.332 + 0.114 = -0.218\) when the user is not logged in (+1) and \(-0.332 - 0.114 = -0.446\) when the user is logged in (-1), so a greater RAM size reduces response time more when the payload indicates that the user is logged in.
- The estimated interaction effect of access and login on log response time is -0.172, which corresponds to a factor of \(e^{-0.172} = 0.842\) (15.8%) on the response time scale. The p-value for the estimated coefficient of access:login is < 0.05, so we reject \(H_0\). Because the interaction and the main effect of access are both negative, the effect of private-network access is \(-0.652 - 0.172 = -0.824\) when the user is not logged in (+1) and \(-0.652 + 0.172 = -0.480\) when the user is logged in (-1), so accessing from a private network reduces response time more when the payload indicates that the user is not logged in.
- The three-way interaction ram:access:login has a p-value > 0.05, so we fail to reject \(H_0\); there is no evidence of an interaction among all three factors.
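One way to read a two-factor interaction is to compute the effect of one factor separately at each level of the other. With -1/+1 coding, the effect of factor A at a given level of factor B is the main effect of A plus the level of B times the A:B interaction effect (averaged over the third factor). The Python sketch below applies this to the effect values from the table above; the function name `conditional_effect` is an assumption made for this example.

```python
# Effects on the log scale, copied from the effects table above.
main = {"ram": -0.332, "access": -0.652, "login": -0.172}
interaction = {
    ("ram", "access"): 0.041,
    ("ram", "login"): 0.114,
    ("access", "login"): -0.172,
}

def conditional_effect(a, b, b_level):
    """Effect of factor a on log response time when factor b is at b_level (-1 or +1)."""
    key = (a, b) if (a, b) in interaction else (b, a)
    return main[a] + b_level * interaction[key]

for a, b in interaction:
    for level in (-1, +1):
        print(f"effect of {a} at {b}={level:+d}: {conditional_effect(a, b, level):+.3f}")
```

A more negative conditional effect means a larger reduction in log response time at that level of the other factor.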
\(\begin{tabular}{|c|c|c|} \hline Coefficients & 2.5\% & 97.5\% \\ \hline (Intercept) & 5.17274779 & 5.35678344\\ \hline ram & -0.42443710 & -0.24040144\\ \hline access & -0.74452354 & -0.56048789\\ \hline login & -0.26430279 & -0.08026713\\ \hline ram:access & 0.05038626 & 0.13364939\\ \hline ram:login & 0.02238918 & 0.20642484\\ \hline access:login & -0.26320083 & -0.07916517\\ \hline ram:access:login & -0.04129435 & 0.14274130\\ \hline \end{tabular}\)
Interpretations of Confidence Interval:
- The confidence interval for the interaction term “ram:access:login” ranges from -0.04129435 to 0.14274130 which includes 0. Thus there is insufficient evidence to conclude that there is a significant interaction effect between RAM size, access method, and login status on response time.
- All other main effects and interaction effects (RAM size, access method, login status, ram:access, ram:login, and access:login) have confidence intervals that do not contain 0. Thus, there is evidence to suggest that the size of RAM, access method, login status, as well as other interaction terms, have a statistically significant influence on response time.
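The rule used in these interpretations (an effect is judged significant at the 5% level when its 95% confidence interval excludes 0) can be written as a short check. This Python sketch uses the interval limits exactly as printed in the table above; the names `conf_int` and `excludes_zero` are assumptions made for this example.

```python
# 95% confidence limits copied from the table above (log response time scale).
conf_int = {
    "ram": (-0.42443710, -0.24040144),
    "access": (-0.74452354, -0.56048789),
    "login": (-0.26430279, -0.08026713),
    "ram:access": (0.05038626, 0.13364939),
    "ram:login": (0.02238918, 0.20642484),
    "access:login": (-0.26320083, -0.07916517),
    "ram:access:login": (-0.04129435, 0.14274130),
}

def excludes_zero(lower, upper):
    """True when the interval lies entirely on one side of 0."""
    return lower > 0 or upper < 0

for name, (lower, upper) in conf_int.items():
    verdict = "excludes 0" if excludes_zero(lower, upper) else "contains 0"
    print(f"{name:17s} [{lower:+.3f}, {upper:+.3f}] {verdict}")
```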
After conducting an experiment on the factors affecting the response time of a web server serving a web app, using a replicated \(2^3\) factorial design and statistical analysis, we found that the factors that significantly influence the response time of the web app include the RAM size of the hosting machine, the access method (from the Internet or from a private network), and the login status (logged in or not logged in).
Specifically, increasing the RAM size from 4GB to 8GB resulted in a significant decrease in response time, with an estimated reduction of 28.3%, averaged over the levels of access method and login status. Additionally, accessing the web app from a private network was associated with a considerable decrease in response time compared to accessing it from the Internet, with an estimated reduction of 47.9%. Furthermore, the login status of the user also played a significant role: requests from users who were not logged in were faster than requests from logged-in users, with an estimated decrease of 15.8% in response time.
Regarding interaction effects, the analysis found significant interaction effects between RAM size and access method, between RAM size and login status, and between access method and login status. Since private network and not logged in are both coded +1, the positive interaction effects between RAM size and access method and between RAM size and login status mean that the reduction in response time from the larger RAM size is weaker when the app is accessed from a private network or the user is not logged in, and stronger when it is accessed from the Internet or the user is logged in. On the other hand, the negative interaction effect between access method and login status means that the reduction from private-network access is amplified when the user is not logged in.
However, there is a lack of evidence for a significant three-way interaction among all three factors.
Grammarly: free AI writing assistance. (n.d.). https://app.grammarly.com/
Groemping U (2023). FrF2: Fractional Factorial Designs with 2-Level Factors. R package version 2.3-3, https://CRAN.R-project.org/package=FrF2.
Peng, R. (2024, March 17). Guide for setting up a home server with ubuntu server edition. Ruilin’s Blog. https://blog.ruilinp.com/posts/ubuntu/
 
