# Logistic Regression with Minitab

### What is Logistic Regression?

Logistic regression is a statistical method that predicts the probability of an event occurring by fitting the data to a logistic curve (the logistic function). It is used to predict the outcome of a categorical dependent variable from one or more predictor variables: the logistic function models the probability of each possible outcome of a single trial as a function of the explanatory variables. The dependent variable in a logistic regression can be binary (e.g., 1/0, yes/no, pass/fail), nominal (e.g., blue/yellow/green), or ordinal (e.g., satisfied/neutral/dissatisfied). The independent variables can be either continuous or discrete.
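For reference, the binary case can be written out as follows. The notation is introduced here for illustration: $p$ is the probability of the event, $x_1, \dots, x_k$ are the predictors, and $\beta_0, \dots, \beta_k$ are the coefficients the software estimates.

$$
p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}}
\quad\Longleftrightarrow\quad
\ln\frac{p}{1 - p} = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k
$$

The right-hand form shows that the model is linear on the log-odds (logit) scale, while the left-hand form guarantees that the predicted probability always stays between 0 and 1.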

### Three Types of Logistic Regression

- Binary Logistic Regression
  - Binary response variable
  - Example: yes/no, pass/fail, female/male

- Nominal Logistic Regression
  - Nominal response variable
  - Example: set of colors, set of countries

- Ordinal Logistic Regression
  - Ordinal response variable
  - Example: satisfied/neutral/dissatisfied

All three logistic regression models can use multiple continuous or discrete independent variables and can be developed in Minitab using the same steps.

### How to Run a Logistic Regression in Minitab

Case Study: We want to build a logistic regression model using the potential factors to predict the probability that the person measured is female or male.

Data File: “Logistic Regression” tab in “Sample Data.xlsx”

Response and potential factors

- Response (Y): Female/Male
- Potential Factors (Xs):
  - Age
  - Weight
  - Oxy
  - Runtime
  - RunPulse
  - RstPulse
  - MaxPulse

Step 1:

- Click Stat → Regression → Binary Logistic Regression → Fit Binary Logistic Model.
- A new window named “Binary Logistic Regression” appears.
- Click in the blank box next to “Response”; all the variables appear in the list box on the left.
- Select “Sex” as the “Response.”
- Select “Age”, “Weight”, “Oxy”, “Runtime”, “RunPulse”, “RstPulse”, “MaxPulse” as “Continuous predictors.”
- Click “OK.”
- The results of the logistic regression model appear in the session window.
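Minitab estimates the coefficients by maximum likelihood, which requires an iterative algorithm. As a rough illustration only (not Minitab's internal code), here is a single-predictor Newton-Raphson fit in plain Python. It is simplified to one predictor so the 2x2 linear system can be solved by hand; the function name `fit_logistic_1d` and the toy data are hypothetical.

```python
import math
from typing import List, Tuple


def fit_logistic_1d(x: List[float], y: List[int],
                    max_iter: int = 50, tol: float = 1e-10) -> Tuple[float, float]:
    """Maximum-likelihood fit of p = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson.

    y holds 1 for the event and 0 otherwise. Illustrative sketch only: it does
    not handle perfectly separated data, where the estimates diverge.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        # Score vector (gradient of the log-likelihood) and information matrix X'WX.
        g0 = g1 = 0.0
        h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Solve [[h00, h01], [h01, h11]] * step = [g0, g1] for the Newton step.
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if max(abs(step0), abs(step1)) < tol:
            break
    return b0, b1


if __name__ == "__main__":
    # Toy data (made up for illustration): the event becomes more likely as x grows.
    xs = [1, 2, 3, 4, 5, 6]
    ys = [0, 0, 1, 0, 1, 1]
    print(fit_logistic_1d(xs, ys))
```

At the solution the score equations are (numerically) zero, which is the defining property of the maximum-likelihood estimates. Minitab solves the same kind of system with all selected predictors at once.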

Step 2:

- Check the p-values of all the independent variables in the model.
- Remove the least significant independent variable (the one with the highest p-value above the alpha level) from the model and rerun the model.
- Repeat these steps until all the independent variables in the model are statistically significant.

Since the p-values of all the independent variables are higher than the alpha level (0.05), we need to remove the insignificant independent variables one at a time, starting with the variable that has the highest p-value. Runtime has the highest p-value (0.990), so it will be removed from the model first.

After removing Runtime from the model, the p-values of all the independent variables are still higher than the alpha level (0.05). We need to continue removing the insignificant independent variables one at a time, again starting with the highest p-value. Age has the highest p-value (0.977), so it will be removed from the model next.

After removing both Age and Runtime from the model, the p-values of the remaining independent variables are still higher than the alpha level (0.05), so we continue removing the insignificant independent variables one at a time. RstPulse has the highest p-value (0.803) of the remaining variables, so it will be removed next.

After removing RstPulse from the model, the p-values of all the remaining independent variables are still higher than the alpha level (0.05), so we continue removing insignificant variables. Weight has the highest p-value (0.218) of the remaining variables, so it will be removed next.

After removing Weight from the model, the p-values of the remaining three independent variables are still higher than the alpha level (0.05). Once again, we remove the variable with the highest p-value: RunPulse (0.140).

After removing RunPulse from the model, the p-values of the last two independent variables are still higher than the alpha level (0.05). We need to remove one more insignificant variable: MaxPulse, with a p-value of 0.0755.

After removing MaxPulse from the model, the p-value of the only independent variable “Oxy” is lower than the alpha level (0.05). There is no need to remove “Oxy” from the model.
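The elimination procedure followed above (refit, drop the single predictor with the highest p-value, repeat until every remaining p-value is at or below alpha) can be sketched as a loop. This is a minimal sketch: `fit_pvalues` is a hypothetical callback standing in for "rerun Fit Binary Logistic Model and read each predictor's p-value", and it is not a Minitab API.

```python
from typing import Callable, Dict, List


def backward_eliminate(predictors: List[str],
                       fit_pvalues: Callable[[List[str]], Dict[str, float]],
                       alpha: float = 0.05) -> List[str]:
    """Drop the predictor with the highest p-value, refit, and repeat.

    fit_pvalues is a hypothetical stand-in for refitting the model on the
    current predictor list and returning each predictor's p-value.
    """
    remaining = list(predictors)
    while remaining:
        pvalues = fit_pvalues(remaining)              # refit with the current set
        worst = max(remaining, key=lambda name: pvalues[name])
        if pvalues[worst] <= alpha:                   # every predictor is significant
            break
        remaining.remove(worst)                       # remove one variable at a time
    return remaining
```

The model is refitted after every removal because the p-values of the remaining variables change once a variable leaves the model. That is why the walkthrough above reports new p-values at each step instead of removing all insignificant variables at once.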

Step 3:

Analyze the binary logistic report in the session window and check the performance of the logistic regression model. The p-value of the overall model test is 0.031, which is smaller than the alpha level (0.05), so we conclude that at least one of the slope coefficients is not equal to zero (with Oxy as the only predictor, this means the Oxy slope is nonzero). The p-values of the goodness-of-fit tests are all higher than the alpha level (0.05), so there is no significant evidence of lack of fit, and we conclude that the model fits the data adequately.
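To see where an overall p-value like 0.031 comes from, here is a minimal sketch. It assumes the overall test is the likelihood-ratio chi-square test that all slopes are zero, with G = 2(LL_full − LL_null), which equals the drop in deviance from the constant-only model. The log-likelihood inputs are values you would read from the output, and the function name `lr_test_one_df` is hypothetical. With a single predictor (Oxy) the test has 1 degree of freedom, and that chi-square tail can be computed from `math.erfc` without extra libraries.

```python
import math
from typing import Tuple


def lr_test_one_df(loglik_full: float, loglik_null: float) -> Tuple[float, float]:
    """Likelihood-ratio test that all slopes are zero, for a model with ONE predictor.

    G = 2 * (LL_full - LL_null) is compared with a chi-square distribution with
    1 degree of freedom, whose upper-tail probability equals erfc(sqrt(G / 2)).
    """
    g = max(0.0, 2.0 * (loglik_full - loglik_null))
    p_value = math.erfc(math.sqrt(g / 2.0))
    return g, p_value
```

For models with more predictors the degrees of freedom equal the number of slopes tested, and a general chi-square tail function (for example from a statistics library) would replace the `erfc` shortcut.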

Step 4: Get the predicted probabilities of the event (i.e., Sex = M) occurring using the logistic regression model.

- Click the “Storage” button in the window named “Binary Logistic Regression” and a new window named “Binary Logistic Regression – Storage” pops up.
- Check the box “Fits (event probabilities).”

- Click “OK” in the window of “Binary Logistic Regression – Storage.”
- Click “OK” in the window of “Binary Logistic Regression.”
- A column of the predicted event probability is added to the data table with the heading “FITS”.

Model summary: Column C10 contains the predicted probability that the person is male, based on “Oxy”, the only statistically significant independent variable.
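Each stored fit can be reproduced from the fitted coefficients. The sketch below assumes Minitab's default logit link. `b0` (the constant) and `b1` (the Oxy coefficient) are placeholders for the values in the session window, not numbers taken from this case study, and the function name `event_probability` is hypothetical.

```python
import math


def event_probability(oxy: float, b0: float, b1: float) -> float:
    """P(Sex = M) for a given Oxy value under a logit-link model.

    b0 (constant) and b1 (Oxy coefficient) are placeholders for the values
    reported in the session window; no specific values are assumed here.
    """
    eta = b0 + b1 * oxy                  # linear predictor on the log-odds scale
    return 1.0 / (1.0 + math.exp(-eta))  # logistic function maps it into (0, 1)
```

Calling `event_probability(oxy, b0, b1)` with the reported coefficients for each row's Oxy value should match the stored event probabilities up to rounding.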