Logistic regression with XLSTAT

Dataset for Logistic regression for binary response data and polytomous variables (Logit, Probit) XLS69.0 KB

Tutorial video
Logistic regression for binary response data and polytomous variables (Logit, Probit) is part of: Download Trial version More details See users' feedback
  • Pro Core statistical software

  • System configuration

    • Windows:
      • Versions: 9x/Me/NT/2000/XP/Vista/Win 7
      • Excel: 97 and later
      • Processor: 32 or 64 bits
      • Hard disk: 150 Mb
    • Mac OS X:
      • OS: OS X
      • Excel: X, 2004 and 2011
      • Hard disk: 150Mb.

Benefits

  • Easy and user-friendly
    Easy and user-friendly XLSTAT is flawlessly integrated with Microsoft Excel which is the most popular spreadsheet worldwide. This integration makes it one of the simplest available tools to work with as it utilizes the same philosophy as Microsoft Excel. The program is accessible in a dedicated XLSTAT tab. The analyses are grouped into functional menus. The dialog boxes are user-friendly and setting up an analysis is straightforward.
  • Data and results shared seamlessly
    Data and results shared seamlessly One of the greatest advantages of XLSTAT is the way you can share data and results seamlessly. As the results are stored in Microsoft Excel, anyone can access them. There is no need for the receiver to have an XLSTAT license or any additional viewer which makes your team-work easier and more affordable. In addition, results are easily integrable into other Microsoft Office software such as PowerPoint, so that you can create striking presentation in minutes.
  • Modular
    Modular XLSTAT is a modular product. XLSTAT-Pro is a core statistical module of XLSTAT which includes all the mainstream functionalities in statistics and multivariate analysis. More advanced features contained in add-on modules can be added for specific applications. This way you can adapt the software to your needs making the software more cost-efficient.
  • Didactic
    Didactic The results of XLSTAT are organized by analysis and are easy to navigate. Moreover useful information is provided along with the results to assist you in your interpretation.
  • Affordable
    Affordable XLSTAT is a complete and modular analytical solution that can suit any analytical business needs. It is very reasonably priced so that the return of your investment is almost immediate. Any XLSTAT license comes with top level support and assistance.
  • Accessible - Available in many languages
    Accessible - Available in many languages We have ensured XLSTAT is accessible to everyone by making the program available in many languages, including Chinese, English, French, German, Italian, Japanese, Polish, Portuguese and Spanish.
  • Automatable and customizable
    Automatable and customizable Most of the statistical functions available in XLSTAT can be called directly from the Visual Basic window of Microsoft Excel. They can be modified and integrated to more code to fit to the specificity of your domain. Adding tables and plots as well as modifying existing outputs becomes easy. Furthermore, XLSTAT includes some special tools on the dialog boxes to generate automatically the VBA code in order to reproduce your analysis using the VBA editor or to simply load pre-set settings. This effortless automation of routine analysis will be a huge time saver on your part.

Logistic regression

Logistic regression, and related methods such as Probit analysis, are very useful techniques when one wants to understand or to predict the effect of a series of variables on a binary response variable (a variable which can take only two values, 0/1 or Yes/no, for example). Logistic regression can be helpful to model the effect of doses in medicine or agriculture, or to anticipate the likelihood of customers responding to a direct mail, or to evaluate the risk for a bank that a client will not pay back a loan.

With XLSTAT you can either run the logistic regression on raw data (the response is given as 0s and 1s) or on aggregated data (the response is a sum of "successes" or ones, and the number of repetitions must also be available).

log1.gif Example of raw data - (effect of temperature on the resistance of a chip)

log2.gif Example of aggregated data - (effect of an insecticide on a specific species of insects)

Note that Addinsoft has developed a specific module for dose analysis. This module is called XLSTAT-Dose and can be ordered separately.

The methodology of logistic regression aims at modeling the probability of success depending on the values of the explanatory variables, which can be categorical or numerical variables.

Dataset for logistic regression

The example treated here is a marketing case where we want to detect if customers are likely to renew their subscription for an online sports information service. An Excel sheet with both the data and the results can be downloaded by clicking here.

The data show a sample of 60 users, with their age category, the average number of pages viewed per week, and the number of pages viewed during the previous week. These users have been offered the possibility of renewing their subscription, which is expiring in two weeks. Our goal is to understand why some renewed their subscription and others did not.

Goal of the logistic regression

Using a Logistic regression model, we want to explain the results we obtained, and then use the model on the whole population in order to identify the users who might not renew the subscription. These users could be targeted and offered an incentive (an added service, for example) to renew.

Setting up a logistic regression

To activate the Logistic regression dialog box, start XLSTAT, then select the XLSTAT / Modeling data / Logistic regression data command, or click on the corresponding button of the Modeling Data toolbar (see below).

barlog1.gif

When you click on the button, the Logistic regression dialog box appears. Select the data on the Excel sheet.

The Response corresponds to the column where the binary variable or the counts of positive cases are stored (NB: when using aggregated data the Weights must be selected). In this particular case we have three explanatory variables, one categorical - the age group - and two numerical: the pages counts.

As we selected the column titles of all variables, we have selected the option Column labels included.

log3.gif

The computations begin once you have clicked on the OK button.

Interpretation of the logistic regression

The following table gives details on the model. This table is helpful in understanding the effect of the various variables and the relative effects of the age categories.

log4.gif

On this table we can see from looking at the probability of the Chi-squares that the variable most influencing renewal is the number of pages viewed the previous month. The intercept is significant, and the fact that the customer's age is between 40-49 also has a strong negative influence on subscription renewal. This last point needs to be interpreted by marketing people so that the right action can be taken towards this specific population.

The next table gives several indicators of the quality of the model (or goodness of fit). These results are equivalent to the R2 and to the analysis of variance table in linear regression and ANOVA. The most important value to look at is the probability of Chi-square test on the log ratio. This is equivalent to the Fisher's F test: we try to evaluate if the variables bring significant information by comparing the model as it is defined with a simpler model with only one constant. In this case, as the probability is lower than 0.0001, we can conclude that significant information is brought by the variables.

log5.gif

The last step is the application of the model on the whole population. In this case the model writes: Y = Exp( L(x) ) / [ 1 + Exp( L(x) ], where L(x) = -2.3567 + 0.0235.AvPage/Week + 0.0893.Page/Week + Factor and Factor takes the value of the parameter corresponding to the age group to which a customer belongs.

When we applied the model to the 600 customers who needed to renew their subscription the following month, we found that only 40% would renew. By taking the right marketing actions, we were able to boost the result to 85%!