Load the Dataset

library(xlsx)
# Select the worksheet by name; sheetIndex is meant for a numeric position
data <- read.xlsx("Regression Sample One.xlsx", sheetName = "Sheet1")
str(data)
## 'data.frame':    184 obs. of  16 variables:
##  $ NA.                   : Factor w/ 184 levels "1","10","100",..: 1 97 108 119 130 141 152 163 174 2 ...
##  $ Mode.of.Payment       : Factor w/ 3 levels "Cash","Govt Funded",..: 1 1 3 3 3 3 3 3 3 3 ...
##  $ Smoker                : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ HPN                   : Factor w/ 2 levels "No","Yes": 1 1 1 1 2 1 1 1 1 1 ...
##  $ DM                    : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 2 ...
##  $ Surgery               : Factor w/ 9 levels "Abdominal","Anorectal",..: 2 6 5 5 5 5 5 5 5 5 ...
##  $ AnesthesiaTime        : num  68 85 30 35 40 35 38 42 44 51 ...
##  $ Sedative              : Factor w/ 2 levels "No","Yes": 1 2 2 2 1 2 2 1 2 2 ...
##  $ Antiemetic            : Factor w/ 2 levels "No","Yes": 1 1 1 2 1 1 2 2 2 1 ...
##  $ Cardiac               : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ IntraopHypotension    : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Non.Ambulatory        : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 2 1 1 ...
##  $ PainPersists          : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ DoNotVoid             : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Delay.in.Documentation: Factor w/ 2 levels "Delayed","NotDelayed": 1 2 2 2 1 2 2 1 2 2 ...
##  $ HospStayMinutes       : num  290 100 120 115 210 120 120 230 115 110 ...
# Drop the unnamed row-index column so it is not used as a predictor
data$NA. <- NULL
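The `str()` output above shows the categorical columns as factors. Depending on the R and package versions, the same import may return them as character vectors instead. The sketch below shows one way to convert them; `toy` is a made-up stand-in for the spreadsheet, not the real data.

```r
# Toy data frame standing in for the spreadsheet (values are invented).
toy <- data.frame(
  Smoker = c("No", "Yes", "No"),
  HospStayMinutes = c(290, 100, 120),
  stringsAsFactors = FALSE
)

# Find the character columns and convert only those to factors;
# numeric columns are left untouched.
is_chr <- vapply(toy, is.character, logical(1))
toy[is_chr] <- lapply(toy[is_chr], factor)

str(toy)
```

Applying the same two lines to `data` would be harmless if the columns are already factors, since `is_chr` would then be `FALSE` everywhere.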

Create Classification and Regression Tree

library(rattle)
library(rpart)       # provides rpart(), which fits the tree
library(rpart.plot)  # provides rpart.plot(), which draws it
set.seed(1)
fit <- rpart(HospStayMinutes ~ ., data = data)
rpart.plot(fit)

# Patients whose documentation was delayed (right branch of the tree, where the test
# "Delay.in.Documentation = NotDelayed" is answered "no") tend to stay longer, at about 247 minutes,
# whereas patients without a delay (left branch) stay about 128 minutes. Among patients with a
# documentation delay, those who had abdominal, anorectal, breast, gynecologic, ORL, or orthopedic
# surgery tend to have a shorter stay than those who had other types of surgery. Within that group
# of surgeries, patients whose medical insurance handled the payment had a shorter hospital stay.
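The branch means quoted above are read off the plot. They can also be read from the fitted object itself. The sketch below illustrates this on simulated data, since the spreadsheet is not available here; the data frame `sim`, its columns, and the effect sizes are all invented for illustration and are not the study data.

```r
library(rpart)

set.seed(1)
n <- 200

# Simulated stand-in: a strong documentation-delay effect and a rarer pain effect.
sim <- data.frame(
  Delay = factor(sample(c("Delayed", "NotDelayed"), n, replace = TRUE)),
  Pain  = factor(sample(c("No", "Yes"), n, replace = TRUE, prob = c(0.9, 0.1)))
)
sim$Stay <- 230 -
  100 * (sim$Delay == "NotDelayed") +
  80 * (sim$Pain == "Yes") +
  rnorm(n, sd = 20)

toy_fit <- rpart(Stay ~ ., data = sim)

# Text version of the tree: each line gives the split, the node size,
# the deviance, and the mean response (yval) in that node.
print(toy_fit)

# Terminal nodes only: number of observations and predicted mean stay.
leaves <- toy_fit$frame[toy_fit$frame$var == "<leaf>", c("n", "yval")]
leaves
```

Running `print(fit)` on the real tree should give the same kind of listing, with the node means that the plot displays.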

Stepwise Regression and Choosing the Best Model with Akaike Information Criterion (AIC)

fit2 <- lm(HospStayMinutes~., data=data)
best <- step(fit2, direction = "both")
## Start:  AIC=1379.61
## HospStayMinutes ~ Mode.of.Payment + Smoker + HPN + DM + Surgery + 
##     AnesthesiaTime + Sedative + Antiemetic + Cardiac + IntraopHypotension + 
##     Non.Ambulatory + PainPersists + DoNotVoid + Delay.in.Documentation
## 
## 
## Step:  AIC=1379.61
## HospStayMinutes ~ Mode.of.Payment + Smoker + HPN + DM + Surgery + 
##     AnesthesiaTime + Sedative + Antiemetic + Cardiac + Non.Ambulatory + 
##     PainPersists + DoNotVoid + Delay.in.Documentation
## 
##                          Df Sum of Sq    RSS    AIC
## - Surgery                 8     20113 281480 1377.2
## - AnesthesiaTime          1        18 261385 1377.6
## - Mode.of.Payment         2      4023 265390 1378.4
## - DM                      1      1733 263100 1378.8
## - HPN                     1      1967 263334 1379.0
## - Sedative                1      2600 263967 1379.4
## <none>                                261367 1379.6
## - Smoker                  1      2923 264289 1379.7
## - Cardiac                 1      5993 267360 1381.8
## - Antiemetic              1      8443 269810 1383.5
## - Non.Ambulatory          1     10073 271439 1384.6
## - DoNotVoid               1     13080 274446 1386.6
## - PainPersists            1     21546 282913 1392.2
## - Delay.in.Documentation  1    268182 529548 1507.5
## 
## Step:  AIC=1377.25
## HospStayMinutes ~ Mode.of.Payment + Smoker + HPN + DM + AnesthesiaTime + 
##     Sedative + Antiemetic + Cardiac + Non.Ambulatory + PainPersists + 
##     DoNotVoid + Delay.in.Documentation
## 
##                          Df Sum of Sq    RSS    AIC
## - AnesthesiaTime          1        33 281513 1375.3
## - Sedative                1       193 281673 1375.4
## - Mode.of.Payment         2      3466 284946 1375.5
## - HPN                     1       813 282294 1375.8
## - Smoker                  1      1634 283114 1376.3
## - DM                      1      1704 283184 1376.4
## <none>                                281480 1377.2
## - Antiemetic              1      5962 287443 1379.1
## + Surgery                 8     20113 261367 1379.6
## - Cardiac                 1      8332 289812 1380.6
## - Non.Ambulatory          1      9340 290820 1381.3
## - DoNotVoid               1     27839 309320 1392.6
## - PainPersists            1     31410 312890 1394.7
## - Delay.in.Documentation  1    300557 582037 1508.9
## 
## Step:  AIC=1375.27
## HospStayMinutes ~ Mode.of.Payment + Smoker + HPN + DM + Sedative + 
##     Antiemetic + Cardiac + Non.Ambulatory + PainPersists + DoNotVoid + 
##     Delay.in.Documentation
## 
##                          Df Sum of Sq    RSS    AIC
## - Sedative                1       193 281706 1373.4
## - Mode.of.Payment         2      3508 285021 1373.5
## - HPN                     1       844 282357 1373.8
## - DM                      1      1683 283195 1374.4
## - Smoker                  1      1716 283229 1374.4
## <none>                                281513 1375.3
## - Antiemetic              1      5933 287446 1377.1
## + AnesthesiaTime          1        33 281480 1377.2
## + Surgery                 8     20128 261385 1377.6
## - Cardiac                 1      8889 290402 1379.0
## - Non.Ambulatory          1      9393 290906 1379.3
## - DoNotVoid               1     28206 309719 1390.8
## - PainPersists            1     31454 312967 1392.8
## - Delay.in.Documentation  1    300651 582164 1507.0
## 
## Step:  AIC=1373.4
## HospStayMinutes ~ Mode.of.Payment + Smoker + HPN + DM + Antiemetic + 
##     Cardiac + Non.Ambulatory + PainPersists + DoNotVoid + Delay.in.Documentation
## 
##                          Df Sum of Sq    RSS    AIC
## - Mode.of.Payment         2      3510 285215 1371.7
## - HPN                     1       921 282627 1372.0
## - DM                      1      1621 283327 1372.5
## - Smoker                  1      1769 283475 1372.5
## <none>                                281706 1373.4
## - Antiemetic              1      5827 287533 1375.2
## + Sedative                1       193 281513 1375.3
## + AnesthesiaTime          1        32 281673 1375.4
## - Cardiac                 1      8853 290559 1377.1
## - Non.Ambulatory          1      9297 291002 1377.4
## + Surgery                 8     17716 263990 1377.5
## - DoNotVoid               1     28622 310328 1389.2
## - PainPersists            1     31508 313214 1390.9
## - Delay.in.Documentation  1    345082 626788 1518.5
## 
## Step:  AIC=1371.68
## HospStayMinutes ~ Smoker + HPN + DM + Antiemetic + Cardiac + 
##     Non.Ambulatory + PainPersists + DoNotVoid + Delay.in.Documentation
## 
##                          Df Sum of Sq    RSS    AIC
## - HPN                     1       816 286032 1370.2
## - DM                      1      1687 286902 1370.8
## - Smoker                  1      2355 287571 1371.2
## <none>                                285215 1371.7
## - Antiemetic              1      5193 290409 1373.0
## + Mode.of.Payment         2      3510 281706 1373.4
## + Sedative                1       195 285021 1373.5
## + AnesthesiaTime          1        75 285140 1373.6
## - Cardiac                 1      8313 293529 1375.0
## - Non.Ambulatory          1      8568 293783 1375.1
## + Surgery                 8     17089 268127 1376.3
## - DoNotVoid               1     27521 312736 1386.6
## - PainPersists            1     30833 316049 1388.6
## - Delay.in.Documentation  1    345343 630558 1515.7
## 
## Step:  AIC=1370.2
## HospStayMinutes ~ Smoker + DM + Antiemetic + Cardiac + Non.Ambulatory + 
##     PainPersists + DoNotVoid + Delay.in.Documentation
## 
##                          Df Sum of Sq    RSS    AIC
## - DM                      1      1448 287480 1369.1
## - Smoker                  1      2052 288084 1369.5
## <none>                                286032 1370.2
## - Antiemetic              1      5314 291345 1371.6
## + HPN                     1       816 285215 1371.7
## + Mode.of.Payment         2      3405 282627 1372.0
## + Sedative                1       256 285776 1372.0
## + AnesthesiaTime          1       115 285916 1372.1
## - Cardiac                 1      7777 293809 1373.1
## - Non.Ambulatory          1      8212 294243 1373.4
## + Surgery                 8     16171 269861 1375.5
## - DoNotVoid               1     27921 313953 1385.3
## - PainPersists            1     31452 317484 1387.4
## - Delay.in.Documentation  1    352236 638268 1515.9
## 
## Step:  AIC=1369.13
## HospStayMinutes ~ Smoker + Antiemetic + Cardiac + Non.Ambulatory + 
##     PainPersists + DoNotVoid + Delay.in.Documentation
## 
##                          Df Sum of Sq    RSS    AIC
## - Smoker                  1      2185 289665 1368.5
## <none>                                287480 1369.1
## + DM                      1      1448 286032 1370.2
## - Antiemetic              1      5081 292561 1370.3
## + HPN                     1       578 286902 1370.8
## + Mode.of.Payment         2      3479 284001 1370.9
## + Sedative                1       179 287301 1371.0
## + AnesthesiaTime          1        67 287412 1371.1
## - Cardiac                 1      8017 295497 1372.2
## - Non.Ambulatory          1      8261 295740 1372.3
## + Surgery                 8     16283 271197 1374.4
## - DoNotVoid               1     27477 314957 1383.9
## - PainPersists            1     34466 321946 1388.0
## - Delay.in.Documentation  1    351435 638915 1514.1
## 
## Step:  AIC=1368.52
## HospStayMinutes ~ Antiemetic + Cardiac + Non.Ambulatory + PainPersists + 
##     DoNotVoid + Delay.in.Documentation
## 
##                          Df Sum of Sq    RSS    AIC
## <none>                                289665 1368.5
## + Smoker                  1      2185 287480 1369.1
## + DM                      1      1581 288084 1369.5
## + Mode.of.Payment         2      4058 285607 1369.9
## - Antiemetic              1      5683 295348 1370.1
## + HPN                     1       317 289348 1370.3
## + Sedative                1       225 289440 1370.4
## + AnesthesiaTime          1       186 289479 1370.4
## - Non.Ambulatory          1      7148 296813 1371.0
## - Cardiac                 1      7677 297343 1371.3
## + Surgery                 8     15549 274116 1374.4
## - DoNotVoid               1     28675 318340 1383.9
## - PainPersists            1     36265 325930 1388.2
## - Delay.in.Documentation  1    349267 638932 1512.1
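The AIC values in the trace can be reproduced from the RSS column. For a linear model, `step()` relies on `extractAIC()`, which computes n * log(RSS / n) + 2 * (number of estimated coefficients). The sketch below applies that formula to two rows of the last table; the helper name `step_aic` is made up for this illustration.

```r
# AIC on the scale used by step() for lm fits:
# n * log(RSS / n) + 2 * (number of estimated coefficients)
step_aic <- function(rss, n, n_coef) {
  n * log(rss / n) + 2 * n_coef
}

n <- 184  # observations, from the str() output

# Final model: intercept + 6 two-level predictors = 7 coefficients.
aic_final <- step_aic(rss = 289665, n = n, n_coef = 7)

# Same model with Delay.in.Documentation dropped: 6 coefficients.
aic_no_delay <- step_aic(rss = 638932, n = n, n_coef = 6)

round(c(final = aic_final, without_delay = aic_no_delay), 1)
# final: 1368.5, without_delay: 1512.1 (matching the trace)
```

Because the penalty is only 2 per coefficient, a term stays in the model only if dropping it raises n * log(RSS / n) by more than twice its degrees of freedom. That is why `Surgery`, with 8 degrees of freedom, is removed early despite its sizeable sum of squares.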

Best Model via lowest AIC (1368.52)

summary(best)
## 
## Call:
## lm(formula = HospStayMinutes ~ Antiemetic + Cardiac + Non.Ambulatory + 
##     PainPersists + DoNotVoid + Delay.in.Documentation, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -114.49  -22.38  -10.13   23.17  114.87 
## 
## Coefficients:
##                                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                       222.147      6.704  33.138  < 2e-16 ***
## AntiemeticYes                      12.748      6.841   1.863   0.0641 .  
## CardiacYes                        -25.177     11.624  -2.166   0.0317 *  
## Non.AmbulatoryYes                  17.643      8.442   2.090   0.0381 *  
## PainPersistsYes                    89.011     18.909   4.707 5.05e-06 ***
## DoNotVoidYes                       39.703      9.485   4.186 4.47e-05 ***
## Delay.in.DocumentationNotDelayed -104.770      7.172 -14.609  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 40.45 on 177 degrees of freedom
## Multiple R-squared:  0.7004, Adjusted R-squared:  0.6903 
## F-statistic: 68.98 on 6 and 177 DF,  p-value: < 2.2e-16
# The best model (lowest AIC) retains six predictors of hospital stay in minutes: whether an antiemetic was given, the Cardiac indicator, whether the patient was non-ambulatory, whether pain persisted, whether the patient was unable to void, and whether documentation was delayed.

Among these variables, the predictors that are significant at the 0.05 level and meaningful include:

Whether patient was ambulatory: Non-ambulatory patients were likely to have a longer hospital stay, by about 18 minutes

Presence of persistent pain: Patients with persistent pain were likely to have a longer hospital stay, by about 89 minutes

Voiding: Those who were not able to void had a longer hospital stay, by about 40 minutes

Delay in Documentation Processing: Patients without documentation delay had a shorter hospital stay, by about 105 minutes
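Each of these effects is an offset from the intercept, which represents the reference patient: documentation delayed and every other indicator at "No". The sketch below adds up the coefficients copied from the `summary(best)` table for a few profiles; the helper `predict_stay` is a made-up name for this illustration, and the profiles were chosen only as examples.

```r
# Coefficient estimates copied from the summary(best) table.
coefs <- c(
  "(Intercept)"                    = 222.147,
  AntiemeticYes                    = 12.748,
  CardiacYes                       = -25.177,
  Non.AmbulatoryYes                = 17.643,
  PainPersistsYes                  = 89.011,
  DoNotVoidYes                     = 39.703,
  Delay.in.DocumentationNotDelayed = -104.770
)

# Predicted stay = intercept + sum of the coefficients that apply.
predict_stay <- function(active) {
  unname(coefs["(Intercept)"] + sum(coefs[active]))
}

predict_stay(character(0))                          # reference patient
predict_stay("Delay.in.DocumentationNotDelayed")    # no documentation delay
predict_stay(c("PainPersistsYes", "DoNotVoidYes"))  # delayed, pain, unable to void
```

With the fitted object available, `predict(best, newdata = ...)` would give the same numbers without copying coefficients by hand.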

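Given the findings, documentation delay appears to be the strongest predictor of a prolonged hospital stay: it has the largest coefficient in the final model, and dropping it would raise the AIC far more than dropping any other term (from 1368.5 to 1512.1).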