

CURRICULUM IN CARDIOLOGY - STATISTICS

Year: 2018 | Volume: 4 | Issue: 1 | Page: 33-36

Linear regression analysis study
Khushbu Kumari, Suniti Yadav
Department of Anthropology, University of Delhi, New Delhi, India
Date of Web Publication: 4-May-2018
Correspondence Address: Khushbu Kumari, Department of Anthropology, University of Delhi, New Delhi, India
Source of Support: None, Conflict of Interest: None
DOI: 10.4103/jpcs.jpcs_8_18
Linear regression is a statistical procedure for calculating the value of a dependent variable from an independent variable. It measures the association between two variables and is a modeling technique in which a dependent variable is predicted from one or more independent variables. Linear regression analysis is among the most widely used of all statistical techniques. This article explains the basic concepts and shows how linear regression calculations can be done in SPSS and Excel.
Keywords: Continuous variable test, Excel and SPSS analysis, linear regression
How to cite this article: Kumari K, Yadav S. Linear regression analysis study. J Pract Cardiovasc Sci 2018;4:33-6
Introduction   
The concept of linear regression was first proposed by Sir Francis Galton in 1894. Linear regression is a statistical technique applied to a data set to define and quantify the relation between the considered variables. Univariate statistical tests such as the Chi-square test, Fisher's exact test, t-test, and analysis of variance (ANOVA) do not allow the effect of other covariates/confounders to be taken into account during analysis (Chan 2004). Partial correlation and regression, however, allow the researcher to control for the effect of confounders when studying the relation between two variables (Chan 2003).
In biomedical or clinical research, the researcher often tries to relate two or more independent (predictor) variables to an outcome or dependent variable in order to predict it. This may be understood as how the risk factors (the predictor or independent variables) account for the chance of a disease occurring (the dependent variable). Risk factors (or independent variables) may be biological (such as age and gender), physical (such as body mass index and blood pressure [BP]), or lifestyle (such as smoking and alcohol consumption) variables associated with the disease. Both correlation and regression provide an opportunity to understand the “risk factors-disease” relationship (Gaddis and Gaddis 1990). While correlation provides a quantitative measure of the degree or strength of the relation between two variables, regression analysis describes this relationship mathematically. Regression analysis allows the value of a dependent variable to be predicted from the value of at least one independent variable.
In correlation analysis, the correlation coefficient “r” is a dimensionless number whose value ranges from −1 to +1. A value toward −1 indicates an inverse or negative relationship, whereas a value toward +1 indicates a positive relationship. When the data are normally distributed, Pearson's correlation is used, whereas for non-normally distributed data, Spearman's rank correlation is used.
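The two coefficients can be illustrated with a minimal Python sketch that uses only the standard library. The function names (`pearson_r`, `average_ranks`, `spearman_rho`) and the sample values are hypothetical and are not taken from this article. Spearman's coefficient is computed here as Pearson's coefficient applied to the ranks, with tied values given their average rank.

```python
def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical data: y rises with x, but not along a straight line
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(pearson_r(x, y), spearman_rho(x, y))
```

In this example the relation is monotonic but curved, so the rank-based coefficient equals 1 while Pearson's r is slightly below 1.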
Linear regression analysis uses the mathematical equation y = mx + c, which describes the line of best fit for the relationship between y (dependent variable) and x (independent variable); m is the slope (the regression coefficient) and c is the intercept. The coefficient of determination, r^{2}, indicates the proportion of the variability of y that is explained by x.^{[1],[2],[3],[4],[5],[6],[7],[8]}
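As a minimal sketch of how the line of best fit y = mx + c can be obtained by ordinary least squares, the following Python snippet computes the slope and intercept from the sums of squares. The function name `fit_line` and the age/SBP values are hypothetical and serve only as an illustration; they are not the data analyzed in this article.

```python
def fit_line(x, y):
    """Return slope m and intercept c of the least-squares line y = m*x + c."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squares of x and sum of cross-products of x and y
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    m = sxy / sxx
    c = mean_y - m * mean_x  # the fitted line passes through (mean_x, mean_y)
    return m, c


# Hypothetical illustrative data (age in years, SBP in mmHg)
ages = [30, 40, 50, 60, 70]
sbp = [118, 126, 135, 141, 150]

m, c = fit_line(ages, sbp)
print(f"SBP = {m:.3f} * age + {c:.3f}")
```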
Significance of Linear Regression   
The use of linear regression model is important for the following reasons:
 Descriptive – It helps in analyzing the strength of the association between the outcome (dependent variable) and predictor variables
 Adjustment – It adjusts for the effect of covariates or the confounders
 Predictors – It helps in estimating the important risk factors that affect the dependent variable
 Extent of prediction – It helps in analyzing how much a change of one “unit” in the independent variable would affect the dependent variable
 Prediction – It helps in predicting the outcome for new cases.
Assumptions for Linear Regression   
The underlying assumptions for linear regression are:
 The values of the independent variable “x” are set by the researcher
 The independent variable “x” should be measured without any experimental error
 For each value of “x,” there is a subpopulation of “y” variables that are normally distributed up and down the Y-axis [Figure 1]
 The variances of the subpopulations of “y” are homogeneous
 The mean values of the subpopulations of “y” lie on a straight line, thus implying the assumption that there exists a linear relation between the dependent and the independent variables
 All the values of “y” are independent from each other, though dependent on “x.”
Coefficient of Determination, R^{2}   
The coefficient of determination is the portion of the total variation in the dependent variable that can be explained by variation in the independent variable(s). When R^{2} is +1, there exists a perfect linear relationship between x and y, i.e., 100% of the variation in y is explained by variation in x. When 0 < R^{2} < 1, there is a weaker linear relationship between x and y, i.e., some, but not all, of the variation in y is explained by variation in x.
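The definition above can be written as R^{2} = 1 − SS_res/SS_tot, where SS_res is the sum of squared residuals around the fitted line and SS_tot is the total sum of squares around the mean of y. A minimal Python sketch for simple linear regression follows; the function name `r_squared` and the data are hypothetical illustrations.

```python
def r_squared(x, y):
    """Coefficient of determination for the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    m = sxy / sxx
    c = mean_y - m * mean_x
    # Unexplained variation: squared distances of y from the fitted line
    ss_res = sum((yi - (m * xi + c)) ** 2 for xi, yi in zip(x, y))
    # Total variation: squared distances of y from its mean
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical data: an exact straight line, then a noisier relation
print(r_squared([1, 2, 3, 4], [3, 5, 7, 9]))
print(r_squared([30, 40, 50, 60, 70], [118, 126, 135, 141, 150]))
```

The first call uses points that lie exactly on a line, so all of the variation in y is explained; the second leaves a small unexplained remainder.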
Linear Regression in Biological Data Analysis   
In biological or medical data, linear regression is often used to describe relationships between two variables or among several variables through statistical estimation. For example, to know whether the likelihood of having high systolic BP (SBP) is influenced by factors such as age and weight, linear regression would be used. The variable to be explained, i.e., SBP, is called the dependent variable or, alternatively, the response variable; the variables that explain it (age, weight, and sex) are called independent variables.
How to Calculate Linear Regression?   
Linear regression can be performed in the SPSS statistical software (IBM Corp. Released 2011. IBM SPSS Statistics for Windows, Version 20.0. Armonk, NY: IBM Corp.) in five steps. The procedure is as follows [Table 1], [Table 2], [Table 3], [Table 4]:
Click Analyze > Regression > Linear > then select Dependent and Independent variable > OK (enter).
Example 1 – Data (n = 55) on age and SBP were collected, and a linear regression model was tested to predict SBP from age. After checking the normality assumption for both variables, the bivariate correlation is tested (Pearson's correlation = 0.696, P < 0.001); a graphical scatter plot is helpful in that case [Figure 2].
Figure 2: Starting Data Analysis ToolPak. Click the OFFICE button and choose Excel options.
Now, to run the linear regression, enter SBP as the dependent variable and age as the independent variable.
This indicates the dependent and independent variables included in the test.
Pearson's correlation between SBP and age is given (r = 0.696). R^{2} = 0.485, which implies that only 48.5% of the variation in SBP is explained by the age of a person.
The ANOVA table shows the “usefulness” of the linear regression model with P < 0.05.
This provides the quantification of the relationship between age and SBP. With every increase of 1 year in age, the SBP (on the average) increases by 1.051 (95% confidence interval 0.752–1.350) units, P < 0.001. The constant here has no “practical” meaning as it gives the value of the SBP when age = 0.
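The reported figures can be checked and interpreted with simple arithmetic, sketched here in Python. The numbers are those quoted in Example 1; the function names are hypothetical. Because the model is linear, the slope and both confidence limits scale by the same factor when a larger age difference is considered.

```python
# Values reported in Example 1 of this article
r = 0.696                       # Pearson's correlation between age and SBP
slope = 1.051                   # change in SBP per 1-year increase in age
ci_low, ci_high = 0.752, 1.350  # 95% confidence interval of the slope


def r_squared_from_r(r_value):
    """In simple linear regression, R squared is the square of Pearson's r."""
    return r_value ** 2


def expected_sbp_difference(age_difference_years):
    """Average SBP difference (estimate, lower limit, upper limit) for an age gap."""
    return tuple(round(b * age_difference_years, 2)
                 for b in (slope, ci_low, ci_high))


print(round(r_squared_from_r(r), 3))
print(expected_sbp_difference(10))
```

Squaring the rounded r gives about 0.484, close to the reported R^{2} of 0.485; the small gap presumably reflects rounding of r. For two persons 10 years apart in age, the model implies an average SBP difference of about 10.5 units (95% confidence interval about 7.5 to 13.5).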
Further, if more than one independent variable is added, the linear regression model adjusts for the effect of the other independent variables when testing the effect of one variable.
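The idea of adjustment can be sketched in Python with a small multiple regression solved through the normal equations. Everything here is a hypothetical illustration: the helper names `solve` and `ols`, the age and weight values, and the outcome, which is constructed without noise as 100 + 0.5 × age + 0.3 × weight so that the true coefficients are known.

```python
def solve(a, b):
    """Solve the linear system a * coef = b by Gaussian elimination with pivoting."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    coef = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(aug[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (aug[i][n] - s) / aug[i][i]
    return coef


def ols(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] for predictor rows."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 gives the intercept
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical data: weight is correlated with age in this sample
ages = [30, 40, 50, 60, 70, 45]
weights = [60, 80, 65, 90, 70, 85]
sbp = [100 + 0.5 * a + 0.3 * w for a, w in zip(ages, weights)]

crude = ols([[a] for a in ages], sbp)                      # age only
adjusted = ols([[a, w] for a, w in zip(ages, weights)], sbp)  # age and weight
print("crude age slope:", round(crude[1], 3))
print("adjusted age slope:", round(adjusted[1], 3))
```

With age as the only predictor, the age slope absorbs part of the weight effect; once weight enters the model, the age slope returns to the value used to construct the data.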
Example 2 – If we want to see the genetic effect of variables, i.e., the effect of each additional allele of a genetic variant (mutation) on the disease or phenotype, linear regression is used in the same way as described above. The three genotypes, i.e., normal homozygote AA, heterozygote AB, and homozygote mutant BB, may be coded as 1, 2, and 3, respectively. The test then proceeds in a similar way, and the unstandardized coefficient (β) gives the effect on the dependent variable of each additional allele.
Example 3 – Using Excel to see the relationship between the sales of a medicine and its price and TV advertisements.
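The coding described in Example 2 can be sketched in Python as follows. The genotype coding (AA = 1, AB = 2, BB = 3) follows the text; the function name `per_allele_effect` and the phenotype values are hypothetical and are used only to show that the fitted slope is the per-allele effect.

```python
# Additive coding of the three genotypes, as described in Example 2
GENOTYPE_CODE = {"AA": 1, "AB": 2, "BB": 3}


def per_allele_effect(genotypes, phenotype):
    """Regress the phenotype on the coded genotype; return (beta, intercept)."""
    x = [GENOTYPE_CODE[g] for g in genotypes]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(phenotype) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, phenotype))
    beta = sxy / sxx  # average change in phenotype per additional B allele
    intercept = mean_y - beta * mean_x
    return beta, intercept


# Hypothetical sample: two individuals per genotype
genotypes = ["AA", "AA", "AB", "AB", "BB", "BB"]
phenotype = [120, 124, 127, 129, 133, 135]

beta, intercept = per_allele_effect(genotypes, phenotype)
print(f"per-allele effect = {beta:.2f}, intercept = {intercept:.2f}")
```

This coding assumes an additive model, in which the step from AA to AB has the same average effect as the step from AB to BB.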
[Table 5] contains data that can be entered into an Excel sheet. Follow the instructions shown in [Figure 2], [Figure 3], [Figure 4].
Figure 3: The ToolPak. Choose Add-Ins > choose Analysis ToolPak and select Go.
Figure 4: The regression screen. Choose Data > Data Analysis > Regression. Input Y Range: A1:A8. Input X Range: B1:C8. Check Labels and Residuals, and set Output Range to A50.
As shown in [Table 6], Multiple R is the correlation coefficient, where 1 means a perfect correlation and zero means none. R Square is the coefficient of determination, which here means that 92% of the variation in the dependent variable can be explained by the independent variables. Adjusted R Square adjusts for the number of independent variables and should be used here. [Table 7] shows how to create a linear regression equation from the data.
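The adjustment is commonly computed as adjusted R^{2} = 1 − (1 − R^{2})(n − 1)/(n − p − 1), where n is the number of observations and p the number of independent variables. The Python sketch below applies it under stated assumptions: R^{2} is taken as the rounded 0.92 quoted above, p = 2 (price and TV advertisements), and n = 7 is inferred from the input range A1:A8 with a label row. The exact figure in [Table 6] may differ slightly because of rounding. The function name is hypothetical.

```python
def adjusted_r_squared(r2, n_obs, n_predictors):
    """Adjusted R squared, penalizing R squared for the number of predictors."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Assumed inputs: rounded R squared from the text, n inferred from range A1:A8
r2 = 0.92
n_obs = 7          # assumption: 8 rows minus 1 label row
n_predictors = 2   # price and TV advertisements

print(round(adjusted_r_squared(r2, n_obs, n_predictors), 2))
print(round(r2 ** 0.5, 3))  # Multiple R is the square root of R squared
```

Under these assumptions, the adjusted value is about 0.88, lower than the unadjusted 0.92 because two predictors are fitted to only seven observations.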
Conclusion   
The techniques for testing the relationship between two variables are correlation and linear regression. Correlation quantifies the strength of the linear relationship between a pair of variables, whereas regression expresses the relationship in the form of an equation. In this article, we have used simple examples in SPSS and Excel to illustrate linear regression analysis and to encourage readers to analyze their data with these techniques.
Financial support and sponsorship
Nil.
Conflicts of interest
There are no conflicts of interest.
References   
1.  Schneider A, Hommel G, Blettner M. Linear regression analysis: Part 14 of a series on evaluation of scientific publications. Dtsch Arztebl Int 2010;107:776-82.
2.  Freedman DA. Statistical Models: Theory and Practice. Cambridge, USA: Cambridge University Press; 2009.
3.  Chan YH. Biostatistics 201: Linear regression analysis. Singapore Med J 2004;45:55-61.
4.  Chan YH. Biostatistics 103: Qualitative data – Tests of independence. Singapore Med J 2003;44:498-503.
5.  Gaddis ML, Gaddis GM. Introduction to biostatistics: Part 6, correlation and regression. Ann Emerg Med 1990;19:1462-8.
6.  Mendenhall W, Sincich T. Statistics for Engineering and the Sciences. 3rd ed. New York: Dellen Publishing Co.; 1992.
7.  Panchenko D. 18.443 Statistics for Applications, Section 14, Simple Linear Regression. Massachusetts Institute of Technology: MIT OpenCourseWare; 2006.
8.  Pedhazur EJ. Multiple Regression in Behavioral Research: Explanation and Prediction. 2nd ed. New York: Holt, Rinehart and Winston; 1982.
[Figure 1], [Figure 2], [Figure 3], [Figure 4]
[Table 1], [Table 2], [Table 3], [Table 4], [Table 5], [Table 6], [Table 7]



