A COMPARATIVE STUDY OF THE MULTIPLE LOGISTIC REGRESSION, LINEAR DISCRIMINANT ANALYSIS AND QUADRATIC DISCRIMINANT FOR ESTIMATING THE MISCLASSIFICATION ERROR RATE OF INFANT BIRTH OUTCOME

© 2020 The Author(s). This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.


INTRODUCTION
Infant mortality is a fundamental measure of a country's level of socio-economic development and the quality of life, especially of mothers. Infant mortality has remained a national and global concern and its importance in the socio-economic assessment of the country's development cannot be overstated. Sub-Saharan Africa and South Asia face the greatest challenges in child survival and currently account for more than 80% of deaths of children under the age of five worldwide. Several factors have been identified as contributing to the increase in the infant mortality rate in most developing countries. Studies have shown that there is a close relationship between educational attainment and lower mortality rates.
Although statistics and estimates of infant mortality for individual countries and for the world vary between sources, the patterns and trends they show are broadly similar. Among general trends, the global under-five mortality rate fell by almost 47% between 1990 and 2012 (from 90 deaths per 1,000 live births in 1990 to 48 in 2012), while the trend in sub-Saharan Africa is likely to increase. Globally, several causes of death in children under five have been noted, including pneumonia, which accounts for up to 17% of all deaths; complications of premature birth, which cause about 15% of child deaths; childbirth complications (10%); diarrhea (9%); and malaria (up to 7%) (United Nations Interagency Group for Estimating Infant Mortality, 2013). In addition, a survey in Bangladesh showed that the infant mortality rate was highest (1.64%) for children of illiterate mothers and lowest (0.54%) for children whose mothers had secondary or higher education (Uddin et al., 2009). Educated mothers are more likely than illiterate mothers to provide a healthy environment and nutritious food, and to have a better understanding of reproductive health when planning and providing health care for their children. Literate mothers are expected to give birth to healthier babies because they themselves tend to be healthier, and their children are likely to experience lower mortality at all ages.
Researchers have continually raised the alarm about the high rate of infant mortality, which remains a major concern in Nigeria. The quality of care given during and after birth is rather rudimentary, as there is generally insufficient government investment in public healthcare. In the worst case, the majority of births occur in unorthodox facilities, and babies only get to hospital after irreparable damage may have occurred. There is a need for analytical studies at both the sub-national and household levels to produce more reliable empirical evidence to inform the design of policies and programmes that could improve child health outcomes in various countries. Some literature also notes that it would be methodologically wrong to fit a single-level standard regression model in the analysis of child survival, because of the hierarchical nature of mortality data. Studies that consider the hierarchical structure of mortality data to establish the contextual factors influencing infant and child mortality are, however, limited in Nigeria. Improving the counting of stillbirths and neonatal deaths is important for tracking Sustainable Development Goal 3.2 and for improving vital statistics in low-income and middle-income countries (LMICs). However, the validity of self-reported stillbirths and neonatal deaths in surveys is often threatened by misclassification errors between the two birth outcomes. Studies such as Liu et al. (2016) recommend examining the extent to which stillbirths are misclassified as neonatal deaths using larger samples in Malawi or other developing countries. There is therefore a need to determine the misclassification error rate associated with reporting stillbirths and neonatal deaths in Anambra State, Nigeria, which is the essence of this study. Richardson et al. (2014) stated that a reduction in child mortality is necessary in order to attain the sustainable development goals.
They identified a major challenge in the procurement of healthcare services by individuals, which is determined to a large extent by their level of income. In their study, infant mortality rate, under-five mortality rate and neonatal mortality rate were modeled against household income, controlling for access to antenatal care, access to safe water and sanitation, neonatal mortality rate, maternal education and household size in Nigeria. Their findings revealed that household income has a significant effect on the neonatal mortality rate in Nigeria but an insignificant effect on the infant and under-five mortality rates. Household size was found to have a significant effect on the infant and neonatal mortality rates, and access to antenatal care was found to have a significant effect on the under-five mortality rate.

REVIEW OF RELATED LITERATURE
Amzat and Adeosun (2014) examined the nature of the relationship between infant mortality and some socioeconomic and demographic variables, and assessed the proximate covariates that influence the survival of an infant, using the 2003 Nigeria Demographic and Health Survey (NDHS) data. They used a sequential probit model to examine the relationship between the dependent variables (infant's death and age at death) and the predictor variables for both correlated and uncorrelated error terms. Their findings showed that, with both correlated and uncorrelated error terms, an infant's survival status (alive or dead) is positively affected by education, birth order number and duration of breastfeeding, and negatively affected by both total children born and place of delivery. There were significant differences among the predictor variables in the probability of an infant's death in the neonatal and post-neonatal periods. The correlation between the error terms was also found to be significant.
Adetoro and Amoo (2014) stated that despite the global decline in the infant mortality rate from 90 deaths per 1,000 live births in 1990 to 48 in 2012, Nigeria is yet to record any substantial improvement. Infant mortality in Nigeria increased from 138 per 1,000 live births in 2007 to 158 per 1,000 live births in 2011, against the Millennium Development Goal target of 71 per 1,000 live births. They used data from the 2008 Nigeria Demographic and Health Survey (NDHS) to investigate the predictors of child (aged 0-4 years) mortality in Nigeria. The statistical tools employed were cross-tabulation and binary logistic regression. Their findings showed that the mortality rate was highest (49.14%) for children of illiterate mothers and lowest (13.29%) among mothers with higher education. The logistic regression analysis also revealed that the education of both parents and the occupation of mothers were statistically significant for the reduction in the child mortality rate. It was equally found that mothers' wealth index, age at first birth and usual place of residence have a substantial impact on child mortality in Nigeria. They concluded that an increase in women's education could raise the age at first birth and mitigate the risk of poor child health outcomes.
Adepoju (2015) examined the differentials in child mortality rate across socioeconomic, demographic and selected health characteristics in rural Nigeria, employing the 2008 National Demographic and Health Survey data. His findings on the health attributes and morbidity pattern of mother and child revealed that most of the respondents did not have access to good health facilities and antenatal care. As a result, more than three-quarters of the respondents delivered their babies at home and had a birth interval of less than 24 months between pregnancies. Results showed that the child mortality rate was highest among illiterate mothers, mothers without a source of income, underage women (less than 20 years) and fathers whose primary livelihood lies in agriculture. Regional analysis showed that the North-Western zone had the highest child mortality rate, followed by the North-Eastern zone, while the South-South zone had the lowest. With respect to health attributes, children who were delivered at home, who were never breastfed, or who were of multiple births had high mortality rates. Gender differentials showed that the mortality rate was higher for male than for female children, and that it was lowest for children who had been fully immunized and whose mothers were aged between 21 and 30 years. Jacdonmi et al. (2016) reviewed the trends and patterns of breastfeeding and the causes of infant mortality. They considered breastfeeding of infants from birth to six months, followed by appropriate and adequate complementary feeding for two years and above, as a strategic intervention against infant mortality, and stressed the need to create awareness about the benefits of breastfeeding. Their review showed that breastfeeding protects infants from several conditions, such as diarrhoea, pneumonia, gastrointestinal infections, urinary tract infections, sudden infant death syndrome and others, which are probable causes of infant deaths.
They noted that since breastfeeding provides adequate nutrition to infants and protects them from diseases and infections, it is a cost-effective intervention for reducing infant mortality. Liu et al. (2016) assessed the extent and correlates of stillbirths being misclassified as neonatal deaths by comparing two recent and linked population surveys conducted in Malawi, one being a full birth history (FBH) survey and the other a follow-up verbal/social autopsy (VASA) survey. Their study found that one-fifth of 365 neonatal deaths identified in the FBH survey were classified as stillbirths in the VASA survey. Neonatal deaths for which signs of movement were reported in the last few days before delivery were less likely to be misclassified stillbirths (OR = 0.08, p<0.05). Signs of birth injury were associated with higher odds of misclassification (OR = 6.17, p<0.05).

METHOD OF DATA COLLECTION
The secondary data used in this study were obtained from the records department of General Hospital Onitsha and cover the period 2007-2016. The data comprise status of infant birth, mother's parity, age of mother, weight of baby, mother's education status, number of bookings before gestation, and gestational age.

METHOD OF DATA ANALYSIS
The statistical tools used in this study are multiple logistic regression, linear discriminant analysis and quadratic discriminant analysis.

MULTIPLE LOGISTIC REGRESSION ANALYSIS
Multiple logistic regression analysis is a statistical tool used to predict categorical placement in, or the probability of category membership on, a dependent variable based on multiple independent variables. The independent variables can be either dichotomous (i.e., binary) or continuous (i.e., interval or ratio in scale). Multiple logistic regression allows for two or more categories of the dependent or outcome variable (categories such as 0, 1, 2 or -1, 0, 1). Logistic regression uses maximum likelihood estimation to evaluate the probability of categorical membership. It requires careful consideration of the sample size and examination for outlying cases. As with other data analysis procedures, initial data analysis should be thorough and include careful univariate, bivariate, and multivariate assessment. Specifically, multicollinearity should be evaluated with simple correlations among the independent variables. Multivariate diagnostics (i.e., standard multiple regression) can also be used to assess for multivariate outliers and to decide on the exclusion of outliers or influential cases. Sample size guidelines for multinomial logistic regression indicate a minimum of 10 cases per independent variable (Schwab, 2002). Logistic regression is often considered an attractive analysis because it does not assume normality, linearity, or homoscedasticity. It does, however, have assumptions, such as independence among the dependent variable choices. This assumption states that the choice of or membership in one category of the dependent variable is not related to the choice of or membership in another category. The assumption of independence can be tested with the Hausman-McFadden test (Hoffmann, 2003). Furthermore, multinomial logistic regression assumes non-perfect separation. If the groups of the outcome variable are perfectly separated by the predictor(s), then unrealistic coefficients will be estimated and effect sizes will be greatly exaggerated.
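Suppose we have n independent observations with p explanatory variables, and the qualitative response variable has k categories. To construct the logits in the multinomial case, one of the categories is taken as the base level and all the logits are constructed relative to it. Any category can serve as the base level; we take category k as the base level in our description of the method. Since there is no ordering, any category may be labeled k. Let $\pi_j$ denote the multinomial probability of an observation falling in the j-th category. The relationship between this probability and the p explanatory variables $X_1, X_2, \ldots, X_p$ is expressed by the general logistic regression model

$$\log\left(\frac{\pi_j}{\pi_k}\right) = \beta_{0j} + \beta_{1j}X_1 + \beta_{2j}X_2 + \cdots + \beta_{pj}X_p \qquad (1)$$

for j = 1, 2, ..., (k - 1), where the $\beta$'s are the regression coefficients of the j-th logit. Since all the $\pi$'s add to unity, (1) takes the form

$$\pi_j = \frac{\exp\left(\beta_{0j} + \beta_{1j}X_1 + \cdots + \beta_{pj}X_p\right)}{1 + \sum_{l=1}^{k-1}\exp\left(\beta_{0l} + \beta_{1l}X_1 + \cdots + \beta_{pl}X_p\right)}$$

for j = 1, 2, ..., (k - 1), with the base-level probability $\pi_k$ equal to one over the same denominator. The model parameters are estimated by the method of maximum likelihood.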
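As an illustration of the base-category logit model described above, the following minimal Python sketch converts a set of logit coefficients into category probabilities. The function name `multinomial_probs` and the coefficient values are hypothetical and chosen only for illustration; they are not estimates from this study.

```python
import math

def multinomial_probs(x, betas):
    """Return [pi_1, ..., pi_k] for one observation.

    x     : list of p explanatory values
    betas : list of (k - 1) coefficient lists [b0, b1, ..., bp];
            category k is the base level and has no coefficients.
    """
    # Linear predictor (logit relative to the base level) for j = 1..k-1
    etas = [b[0] + sum(bi * xi for bi, xi in zip(b[1:], x)) for b in betas]
    denom = 1.0 + sum(math.exp(e) for e in etas)
    probs = [math.exp(e) / denom for e in etas]
    probs.append(1.0 / denom)  # probability of the base category k
    return probs

# Hypothetical coefficients: k = 3 categories, p = 2 predictors
example_betas = [[0.5, -1.0, 0.2], [-0.3, 0.4, 0.1]]
example_probs = multinomial_probs([1.0, 2.0], example_betas)
```

By construction the returned probabilities add to one, and the log of the ratio of any non-base probability to the base probability recovers the corresponding linear predictor.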

LINEAR DISCRIMINANT ANALYSIS
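Discriminant analysis is used in situations where the groups are known a priori. Its aim is to classify an observation, or several observations, into these known groups; that is, to distinguish between populations $\Pi_j$ and to determine how to allocate new observations to them. Following Fisher's approach, consider the linear combination $y = Xa$ of the observations. The within-group sum of squares is given by

$$\sum_{j} y_j^{\top} H_j y_j = \sum_{j} a^{\top} X_j^{\top} H_j X_j a = a^{\top} W a,$$

where $y_j$ denotes the j-th sub-matrix of y corresponding to observations of group j, $X_j$ denotes the corresponding rows of X, and $H_j$ denotes the $(n_j \times n_j)$ centering matrix (Wolfgang and Léopold, 2007). The within-group sum of squares measures the sum of variations within each group.
The between-group sum of squares is given by

$$\sum_{j} n_j(\bar{y}_j - \bar{y})^2 = \sum_{j} n_j\left\{a^{\top}(\bar{x}_j - \bar{x})\right\}^2 = a^{\top} B a,$$

where $\bar{y}_j$ and $\bar{x}_j$ denote the means of $y_j$ and $X_j$, and $\bar{y}$ and $\bar{x}$ denote the overall sample means of y and X. The projection vector a is chosen to maximize the ratio of the between-group to the within-group sum of squares,

$$\frac{a^{\top} B a}{a^{\top} W a}. \qquad (7)$$

The vector a that maximizes (7) is the eigenvector of $W^{-1}B$ that corresponds to the largest eigenvalue. Hence, the corresponding discriminant rule is to allocate x to the group whose projected mean is closest to $a^{\top}x$:

$$x \to \Pi_j \quad \text{where} \quad j = \arg\min_{i} \left|a^{\top}(x - \bar{x}_i)\right|.$$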
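The eigenvector computation described above can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions rather than the procedure used in the study: the function names `fisher_direction` and `classify` are hypothetical, the data are simulated, and the rule assigns an observation to the group whose projected mean is nearest.

```python
import numpy as np

def fisher_direction(groups):
    """groups: list of (n_j x p) arrays, one per group. Returns a unit vector a."""
    all_x = np.vstack(groups)
    grand_mean = all_x.mean(axis=0)
    p = all_x.shape[1]
    W = np.zeros((p, p))  # within-group sum of squares matrix
    B = np.zeros((p, p))  # between-group sum of squares matrix
    for X in groups:
        mean_j = X.mean(axis=0)
        centered = X - mean_j
        W += centered.T @ centered  # equals X_j^T H_j X_j
        d = (mean_j - grand_mean).reshape(-1, 1)
        B += X.shape[0] * (d @ d.T)
    # Eigen-decomposition of W^{-1} B; keep the leading eigenvector
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(W, B))
    a = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
    return a / np.linalg.norm(a)

def classify(x, a, groups):
    """Index of the group whose projected mean is closest to a^T x."""
    means = [X.mean(axis=0) for X in groups]
    return int(np.argmin([abs(a @ (x - m)) for m in means]))

# Simulated data: three groups in two dimensions (not the study's data)
rng = np.random.default_rng(0)
centers = [np.array([0.0, 0.0]), np.array([4.0, 1.0]), np.array([8.0, 2.0])]
example_groups = [c + rng.normal(size=(30, 2)) for c in centers]
a_hat = fisher_direction(example_groups)
```

Using `np.linalg.solve(W, B)` avoids forming the inverse of W explicitly, which is the usual numerical choice when W is well conditioned.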

QUADRATIC DISCRIMINANT ANALYSIS
Quadratic Discriminant Analysis (QDA) is a variant of LDA in which an individual covariance matrix is estimated for every class of observations. QDA is particularly useful if there is prior knowledge that individual classes exhibit distinct covariances. One weakness of QDA is that it cannot be used as a dimensionality reduction technique.
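In QDA, a separate covariance matrix $\Sigma_k$ is required for each class k, rather than assuming $\Sigma_k = \Sigma$ for all classes as is done in LDA. The quadratic discriminant function is given as

$$\delta_k(x) = -\frac{1}{2}\log\left|\Sigma_k\right| - \frac{1}{2}(x - \mu_k)^{\top}\Sigma_k^{-1}(x - \mu_k) + \log \pi_k,$$

where $\mu_k$ is the mean vector of class k and $\pi_k$ here denotes the prior probability of class k. Since QDA estimates a covariance matrix for each class, it has a greater number of effective parameters than LDA. The discriminant function is quadratic in x because it contains second-order terms.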
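A minimal Python sketch of the quadratic discriminant score is given here, assuming the standard Gaussian form with a class mean, a class covariance matrix and a prior probability for each class. The function names `qda_score` and `qda_classify` and all parameter values are hypothetical; an observation is assigned to the class with the largest score.

```python
import numpy as np

def qda_score(x, mean, cov, prior):
    """Quadratic discriminant score of x for one class."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)        # log|Sigma_k|
    maha = diff @ np.linalg.solve(cov, diff)  # (x - mu)^T Sigma^{-1} (x - mu)
    return -0.5 * logdet - 0.5 * maha + np.log(prior)

def qda_classify(x, means, covs, priors):
    """Index of the class with the largest quadratic discriminant score."""
    scores = [qda_score(x, m, S, p) for m, S, p in zip(means, covs, priors)]
    return int(np.argmax(scores))

# Hypothetical two-class parameters with different covariance matrices
example_means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
example_covs = [np.eye(2), 2.0 * np.eye(2)]
example_priors = [0.5, 0.5]
label_origin = qda_classify(np.array([0.0, 0.0]), example_means,
                            example_covs, example_priors)
label_far = qda_classify(np.array([3.0, 3.0]), example_means,
                         example_covs, example_priors)
```

Because each class keeps its own covariance matrix, the log-determinant term does not cancel between classes, which is what makes the decision boundary quadratic rather than linear.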
The classification rule for the quadratic discriminant function is similar to that of LDA: an observation is assigned to the class k that maximizes the quadratic discriminant function.

The misclassification error rate is obtained as E1 + E2 + E3, where E1 is the error rate for group 1, E2 is the error rate for group 2, and E3 is the error rate for group 3.

For the multiple logistic regression, misclassification error rate = E1 + E2 + E3 = 0.5992. The result presented in table 3 shows that the misclassification error rate for birth outcome is 0.5992 (59.92%). This implies that 59.9% of the original cases were misclassified.

The result obtained in table 4 showed that the prior probabilities of the groups were 0.228503, 0.40168 and 0.36981 for Alive, Neonatal death and Still birth respectively. The result obtained in table 5 shows the group means for the variables by the three birth outcomes Alive, Neonatal death and Still birth. The result obtained in table 6 showed that the standardized coefficient for Maternal Education Status in the first function (0.95693) is greater in magnitude than the coefficients for the other variables, while the standardized coefficient for Booking Status in the second function (1.06807) is greater in magnitude than the coefficients for the other variables. Hence, Maternal Education Status and Booking Status have the greatest impact on the first and second functions respectively. The result presented in table 7 showed that the misclassification error rate for birth outcome using linear discriminant analysis is 0.5931 (59.31%). This implies that 59.3% of the original cases were misclassified.

The result obtained in table 9 shows the group means for the variables by the three birth outcomes Alive, Neonatal death and Still birth. The result presented in table 10 revealed that the misclassification error rate for birth outcome using quadratic discriminant analysis is 0.5956 (59.56%). This implies that 59.6% of the original cases were misclassified.
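The error-rate calculation can be illustrated with a short Python sketch. It assumes, since the text does not define it explicitly, that each group error rate is the number of misclassified cases from that group divided by the total number of cases, so that E1 + E2 + E3 equals the overall proportion misclassified. The function name `misclassification_rate` and the counts are hypothetical and are not the study's data.

```python
def misclassification_rate(confusion):
    """confusion[i][j] = number of cases from actual group i predicted as group j.

    Returns the list of group error rates (each relative to the total
    number of cases, an assumption of this sketch) and their sum.
    """
    total = sum(sum(row) for row in confusion)
    group_errors = [
        (sum(row) - row[i]) / total  # off-diagonal count of row i over total
        for i, row in enumerate(confusion)
    ]
    return group_errors, sum(group_errors)

# Hypothetical 3 x 3 confusion matrix for three birth outcomes
example_confusion = [
    [50, 30, 20],
    [10, 60, 30],
    [15, 25, 60],
]
example_errors, example_rate = misclassification_rate(example_confusion)
```

In this hypothetical table, 130 of the 300 cases lie off the diagonal, so the overall misclassification error rate is 130/300.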

CONCLUSION
This study employed multiple logistic regression, linear discriminant analysis and quadratic discriminant analysis to classify infant birth outcomes and to estimate the misclassification error rate of the birth outcomes in Anambra State. The birth outcomes of interest were neonatal death, still birth and alive. The multiple logistic regression analysis showed that Mothers Education Status (MES) and Booking contributed significantly to the logistic model, while Parity, Sex, Age of Mother (AOM), GA, Year, and Birth Weight (BW) were found to have an insignificant effect on birth outcome in Anambra State.
The misclassification error rate for birth outcome in Anambra State using the multiple logistic regression approach is 0.5992 (59.92%). This indicates that 59.9% of the original cases were misclassified. The findings equally showed that the prior probabilities of the groups for the linear and quadratic discriminant analyses were 0.228503, 0.40168 and 0.36981 for Alive, Neonatal death and Still birth respectively. Further findings revealed that the standardized coefficient for Maternal Education Status in the first linear function is greater in magnitude than the coefficients for the other variables, while the standardized coefficient for Booking Status in the second linear function is greater in magnitude than the coefficients for the other variables. This result indicates that Maternal Education Status and Booking Status have the greatest impact on the first and second linear functions respectively. In addition, the misclassification error rate for birth outcome using linear discriminant analysis is 0.5931 (59.31%), and the misclassification error rate using quadratic discriminant analysis is 0.5956 (59.56%). Based on the findings of this study, the linear discriminant approach is the best alternative for estimating the misclassification error rate of infant birth outcome, followed by quadratic discriminant analysis, with multiple logistic regression last. The findings indicate that linear discriminant analysis performs best, with a misclassification error rate of 59.31%, which is in line with the findings of Liu et al. (2016), who found that one-fifth of 365 neonatal deaths identified in the full birth history (FBH) survey were classified as stillbirths in the verbal/social autopsy (VASA) survey.
The study recommends the linear discriminant method for estimating the misclassification error rate of infant birth outcome in Anambra State. It also points to an urgent need for the design and implementation of policies, projects, and programmes that give priority to essential child care, thereby improving the quality of life of the Nigerian child.