Investigation of Diagnostic Checks and Finding Outliers in Fitting Regression Models

Fitting a regression model raises problems of non-linearity, multicollinearity, serial correlation and heteroscedasticity, and diagnosing them involves long and complex calculations and analysis. This study focuses on improving the model fit as measured by the R² value. An attempt is made to identify the outliers in a data set and to increase the R² value by removing them. A hypothetical data set is considered, with consumption as the dependent variable and Income and Food size as the independent variables. The regression model fitted to the actual data gives an R² value of 0.455. After the outliers are removed using Cook's distance, the revised R² value is 0.578. This indicates that outliers in the data set strongly affect the model fit. It is therefore necessary to remove any outliers from the data before proceeding to further analysis.
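The procedure described above (fit the model, compute Cook's distance, drop the flagged observations, refit, and compare R²) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic and are not the study's data set, the variable names `income` and `size` are placeholders for the two predictors, the two contaminated observations are planted by hand, and the cutoff of 4/n is a common rule of thumb rather than a threshold stated in this study.

```python
import random

def solve(A, b):
    """Solve the linear system A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [M[i][n] for i in range(n)]

def fit_ols(X, y):
    """Ordinary least squares: return coefficients, residuals, leverages and R^2."""
    n, p = len(X), len(X[0])
    XtX = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)]
           for i in range(p)]
    Xty = [sum(X[k][i] * y[k] for k in range(n)) for i in range(p)]
    beta = solve(XtX, Xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    # Leverage h_ii = x_i^T (X^T X)^{-1} x_i, computed one row at a time.
    lev = [sum(a * b for a, b in zip(row, solve(XtX, row))) for row in X]
    ybar = sum(y) / n
    sse = sum(e * e for e in resid)
    sst = sum((yi - ybar) ** 2 for yi in y)
    return beta, resid, lev, 1 - sse / sst

def cooks_distance(resid, lev, p):
    """Cook's distance D_i = e_i^2 / (p * MSE) * h_ii / (1 - h_ii)^2."""
    n = len(resid)
    mse = sum(e * e for e in resid) / (n - p)
    return [e * e / (p * mse) * h / (1 - h) ** 2 for e, h in zip(resid, lev)]

# Synthetic data (assumption: NOT the data set used in the study).
rng = random.Random(0)
n = 40
income = [20 + 2 * i for i in range(n)]
size = [(7 * i) % 5 + 1 for i in range(n)]
y = [5 + 0.6 * inc + 2 * s + rng.gauss(0, 5) for inc, s in zip(income, size)]
y[0] += 80    # planted outlier at the lowest income
y[39] -= 80   # planted outlier at the highest income
X = [[1.0, inc, s] for inc, s in zip(income, size)]  # intercept + 2 predictors

# Fit on all observations, then flag points by Cook's distance.
_, resid, lev, r2_before = fit_ols(X, y)
cooks = cooks_distance(resid, lev, p=3)
cutoff = 4 / n  # rule-of-thumb cutoff (assumption)
flagged = [i for i, d in enumerate(cooks) if d > cutoff]

# Refit without the flagged observations and compare R^2.
keep = [i for i in range(n) if i not in flagged]
_, _, _, r2_after = fit_ols([X[i] for i in keep], [y[i] for i in keep])

print("flagged observations:", flagged)
print("R^2 before removal:", round(r2_before, 3))
print("R^2 after removal: ", round(r2_after, 3))
```

The sketch uses only the standard library so that each step of the calculation is visible. In practice a statistics package would normally supply the regression fit and the influence measures.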