IJETMR
AN EFFICIENT COMPROMISED IMPUTATION METHOD FOR ESTIMATING POPULATION MEAN

An Efficient Compromised Imputation Method for Estimating Population Mean

 

Sandeep Mishra 1

 

1 Association of Indian Universities, New Delhi, India

 


ABSTRACT

This paper suggests a modified new ratio-product-exponential imputation procedure to deal with missing data in order to estimate a finite population mean in a simple random sample without replacement. The bias and mean squared error of our proposed estimator are obtained to the first degree of approximation. We derive conditions for the parameters under which the proposed estimator has smaller mean squared error than the sample mean, ratio, and product estimators. We carry out an empirical study which shows that the proposed estimator outperforms the traditional estimators using real data.

 

Received 16 July 2022

Accepted 19 August 2022

Published 05 September 2022

Corresponding Author

Sandeep Mishra, smisra1983@yahoo.co.in

DOI: 10.29121/ijetmr.v9.i9.2022.1216

Funding: This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

Copyright: © 2022 The Author(s). This work is licensed under a Creative Commons Attribution 4.0 International License.

With the license CC-BY, authors retain the copyright, allowing anyone to download, reuse, re-print, modify, distribute, and/or copy their contribution. The work must be properly attributed to its author.

 

Keywords: Missing Data, Mean Square Error, Imputation, Bias, Ratio Estimator

 

 

 


1. INTRODUCTION

Imputation means replacing a missing value with another value based on a reasonable estimate. Information on a related auxiliary variable is generally used to recreate the missing values and complete the dataset. Incomplete data are usually categorized by three response mechanisms, see Little and Rubin (2002): missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR or NMAR). Under MCAR, missing data are randomly distributed across the variable and unrelated to other variables. Under MAR, missing data are not randomly distributed, but the missingness is accounted for by other observed variables. Under MNAR, missing data differ systematically from the observed values. In the present study, we assume MCAR.
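The three response mechanisms can be illustrated with a small simulation. This is a minimal sketch, not part of the study: the synthetic population, the function names `response_masks` and `respondent_means`, and the logistic response probabilities are all assumptions chosen for illustration.

```python
import numpy as np


def response_masks(x, y, rng):
    """Return boolean response indicators (True = observed) under three mechanisms.

    The response probabilities below are illustrative assumptions.
    """
    zx = (x - x.mean()) / x.std()
    zy = (y - y.mean()) / y.std()
    u = rng.random(y.size)
    return {
        "MCAR": u < 0.7,                        # same probability for every unit
        "MAR": u < 1.0 / (1.0 + np.exp(zx)),    # depends only on the observed x
        "MNAR": u < 1.0 / (1.0 + np.exp(zy)),   # depends on the value of y itself
    }


def respondent_means(n=100_000, seed=1):
    """Mean of y among respondents under each mechanism, plus the full mean."""
    rng = np.random.default_rng(seed)
    x = rng.normal(20.0, 5.0, size=n)            # auxiliary variable (synthetic)
    y = 2.0 * x + rng.normal(0.0, 5.0, size=n)   # study variable, correlated with x
    masks = response_masks(x, y, rng)
    means = {name: float(y[mask].mean()) for name, mask in masks.items()}
    means["full"] = float(y.mean())
    return means


if __name__ == "__main__":
    for name, value in respondent_means().items():
        print(f"{name}: {value:.3f}")
```

In this sketch the respondent mean stays close to the full mean only under MCAR; under the assumed MAR and MNAR mechanisms it is shifted downwards because units with large values respond less often.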

Auxiliary information is important for survey practitioners, as it can improve the performance of estimation methods. It may be used at the design stage or at the estimation stage of a survey to obtain a more efficient estimator. At the estimation stage, ratio, product, and regression methods are traditionally used. Bahl and Tuteja (1991) introduced exponential ratio and product estimators for the population mean, and many modifications of these methods have since been proposed. Several extensions have been developed in the literature for handling missing data on the study variable. Singh (2003) suggested product estimation for imputation. Shakti Prasad (2018) adapted the exponential product-type estimator of Bahl and Tuteja (1991) and proposed exponential estimators for imputation. Kadilar and Cingi (2008) investigated some ratio-type imputation methods and proposed three new estimators to overcome the problem of missing data. Diana and Perri (2010) proposed three regression-type estimators that were more efficient than those of Kadilar and Cingi (2008). The present article suggests a general ratio-product-exponential method of imputation and accordingly proposes three estimators that use different amounts of auxiliary information, as in Ahmad et al. (2006), Kadilar and Cingi (2008), and Diana and Perri (2010). The proposed methods are then compared with traditional imputation procedures. The proposed estimators turn out to be more efficient than the usual ratio, product, regression, and exponential methods for handling missing observations when estimating the population mean.

Given a finite population $U = \{1, 2, \ldots, N\}$, the objective is to estimate the population mean $\bar{Y}$ of the study variable $y$. A simple random sample without replacement (SRSWOR), $s$, of size $n$ is drawn from the population $U$. Let $r$ be the number of responding units among the $n$ sampled units. Let $R$ denote the set of responding units and $R^c$ the set of non-responding units, i.e., $y_i$ is observed for $i \in R$, but for units $i \in R^c$ the values are not available and imputed values are derived by some method. In this paper we shall use the following notation:

$N$: population size; $n$: sample size; $r$: number of responding units; $\bar{Y}, \bar{X}$: population means of the study variate $y$ and the auxiliary variate $x$, respectively; $S_y, S_x$: standard deviations of $y$ and $x$, respectively; $C_y, C_x$: coefficients of variation of $y$ and $x$, respectively; $\rho$: correlation coefficient between $y$ and $x$.

 

2. SOME EXISTING METHODS OF IMPUTATION

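1)     The mean method of imputation replaces each missing observation with the mean of the observations available on the responding units, i.e.

$$y_{.i} = \begin{cases} y_i & \text{if } i \in R \\ \bar{y}_r & \text{if } i \in R^c \end{cases}$$

where $\bar{y}_r = \frac{1}{r}\sum_{i \in R} y_i$ is the mean of the $r$ responding units. The estimator of the population mean $\bar{Y}$ is then given by

$$\bar{y}_m = \frac{1}{n}\sum_{i \in s} y_{.i} = \bar{y}_r$$

and its MSE is given by

$$MSE(\bar{y}_m) = \left(\frac{1}{r} - \frac{1}{N}\right) S_y^2 \qquad (1.1)$$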

 

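2)     The ratio method of imputation uses information on one auxiliary variable $x$ and calculates the missing values by

$$y_{.i} = \begin{cases} y_i & \text{if } i \in R \\ \hat{b}\, x_i & \text{if } i \in R^c \end{cases}$$

where

$$\hat{b} = \frac{\sum_{i \in R} y_i}{\sum_{i \in R} x_i}$$

This gives the resulting estimator

$$\bar{y}_{RAT} = \bar{y}_r \, \frac{\bar{x}_n}{\bar{x}_r}$$

where $\bar{x}_n$ and $\bar{x}_r$ are the means of $x$ over the $n$ sampled units and the $r$ responding units, respectively. The MSE of $\bar{y}_{RAT}$, to the first degree of approximation, is given by

$$MSE(\bar{y}_{RAT}) = \bar{Y}^2\left[\left(\frac{1}{r} - \frac{1}{N}\right) C_y^2 + \left(\frac{1}{r} - \frac{1}{n}\right)\left(C_x^2 - 2\rho C_y C_x\right)\right] \qquad (1.2)$$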

 

Note that, when data are missing, information on the auxiliary variable $x$ in the data set makes it possible to construct more efficient estimators.

 

3)     Diana and Perri (2010) proposed three estimators based on different regression-type methods of imputation, in which the imputed data are given by

 

 

 

For these methods the resulting estimators are

 

 

 

                                                                (1.2)

 

                                                                                   (1.3)

 

                                                           (1.4)

 

They proved that the suggested estimators are more efficient than the Kadilar and Cingi (2008) estimators.  is always more efficient than both  and  , whereas  perform better than  if the condition

 

 
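The mean and ratio methods of imputation described in items 1) and 2) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the empirical study of this paper: the synthetic population, the function names, and the sample sizes are hypothetical, nonresponse is generated as MCAR, and the first-order MSE expressions used for comparison are the standard textbook forms with $r$ treated as fixed.

```python
import numpy as np


def mean_imputation_estimate(y_resp):
    """Completed-sample mean under mean imputation (equals the respondent mean)."""
    return float(np.mean(y_resp))


def ratio_imputation_estimate(y_resp, x_resp, x_sample):
    """Completed-sample mean under ratio imputation.

    Missing y_i are replaced by b_hat * x_i with b_hat = sum(y_resp) / sum(x_resp),
    so the completed-sample mean simplifies to y_bar_r * x_bar_n / x_bar_r.
    """
    return float(np.mean(y_resp) * np.mean(x_sample) / np.mean(x_resp))


def compare_mse(N=2000, n=200, r=150, reps=4000, seed=0):
    """Monte Carlo MSE of both estimators on a synthetic population (assumed data)."""
    rng = np.random.default_rng(seed)
    x = rng.gamma(shape=4.0, scale=5.0, size=N)    # auxiliary variable
    y = 2.0 * x + rng.normal(0.0, 4.0, size=N)     # study variable, correlated with x
    y_bar = y.mean()

    err_mean = np.empty(reps)
    err_ratio = np.empty(reps)
    for k in range(reps):
        order = rng.permutation(N)
        sample = order[:n]    # SRSWOR of size n
        resp = order[:r]      # MCAR: respondents are a random subset of the sample
        err_mean[k] = mean_imputation_estimate(y[resp]) - y_bar
        err_ratio[k] = ratio_imputation_estimate(y[resp], x[resp], x[sample]) - y_bar

    # Standard first-order approximations, with r treated as fixed.
    s2y, s2x = y.var(ddof=1), x.var(ddof=1)
    sxy = np.cov(x, y, ddof=1)[0, 1]
    ratio = y_bar / x.mean()
    th_r, th_n = 1 / r - 1 / N, 1 / n - 1 / N
    approx_ratio = th_r * s2y + (th_r - th_n) * (ratio**2 * s2x - 2 * ratio * sxy)
    return {
        "mc_mean": float(np.mean(err_mean**2)),
        "mc_ratio": float(np.mean(err_ratio**2)),
        "approx_mean": float(th_r * s2y),
        "approx_ratio": float(approx_ratio),
    }


if __name__ == "__main__":
    for name, value in compare_mse().items():
        print(f"{name}: {value:.4f}")
```

With a strongly correlated auxiliary variable, as assumed here, the ratio imputation estimator should show a smaller simulated MSE than the mean imputation estimator, and both simulated values should lie close to their first-order approximations.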

3. THE PROPOSED ESTIMATOR

The estimator suggested here is inspired by the Sahai (1985) estimator of population mean in case of simple random sampling, and is defined as

 

 

With the above imputation method, the resulting estimator of the population mean $\bar{Y}$ is obtained as

 

                                                                                                          (2.1)

 

 and  are constant chosen suitably so that their choice minimizes the mean square error of the resultant estimator and   is a real constant. Our goal in this paper is to discuss the suggested estimators for different values of  and have a comparative study of the suggested estimator for these values of  in order to get the minimum MSE.

 

4. FIRST DEGREE APPROXIMATION TO THE BIAS

To derive the bias and MSE expressions of the proposed estimator to the first order of approximation, we define

 

 

Thus, we have  

 

The expectation of these  are   

 

And under simple random sampling without replacement,

 

 

 

where ,   .
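As a point of reference, the moments of the relative errors that are standard under simple random sampling without replacement with MCAR nonresponse can be sketched as follows. The symbols $\epsilon$, $\delta$, $\eta$, $\theta_{r,N}$ and $\theta_{n,N}$ are assumptions introduced here for illustration and need not match the definitions used in this paper; $r$ is treated as fixed, and $\bar{y}_r$, $\bar{x}_r$, $\bar{x}_n$ denote the respondent mean of $y$, the respondent mean of $x$, and the sample mean of $x$.

```latex
% Assumed relative errors (illustrative notation, not necessarily the paper's own):
%   \epsilon = \bar{y}_r/\bar{Y} - 1, \quad \delta = \bar{x}_r/\bar{X} - 1, \quad \eta = \bar{x}_n/\bar{X} - 1
\begin{align*}
E(\epsilon) &= E(\delta) = E(\eta) = 0, \\
E(\epsilon^2) &= \theta_{r,N}\, C_y^2, \qquad
E(\delta^2) = \theta_{r,N}\, C_x^2, \qquad
E(\eta^2) = \theta_{n,N}\, C_x^2, \\
E(\epsilon\delta) &= \theta_{r,N}\, \rho\, C_y C_x, \qquad
E(\epsilon\eta) = \theta_{n,N}\, \rho\, C_y C_x, \qquad
E(\delta\eta) = \theta_{n,N}\, C_x^2, \\
\theta_{r,N} &= \frac{1}{r} - \frac{1}{N}, \qquad
\theta_{n,N} = \frac{1}{n} - \frac{1}{N}.
\end{align*}
```

The cross moments involving $\eta$ carry $\theta_{n,N}$ rather than $\theta_{r,N}$ because, under MCAR, the respondents form a simple random subsample of the sample, so the conditional expectation of a respondent mean given the sample is the corresponding sample mean.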

 

Now representing (2.1) in terms of ,  we have 

 

 

 

 

We assume that the sample is large enough to make  and  so small that contributions from powers of degree higher than two are negligible. By retaining powers up to  and , we get
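The effect of dropping terms of degree higher than two can be checked numerically. The sketch below uses a simple ratio-type function of three relative errors, chosen only for illustration; it is not the proposed estimator (2.1), and the function names and error values are hypothetical.

```python
def ratio_type_exact(e0, e1, e2):
    """Exact value of (1 + e0)(1 + e2) / (1 + e1), an illustrative ratio-type form."""
    return (1.0 + e0) * (1.0 + e2) / (1.0 + e1)


def ratio_type_second_order(e0, e1, e2):
    """Expansion of the same function keeping terms up to degree two.

    Uses (1 + e1)^(-1) = 1 - e1 + e1^2 - ... and drops products of degree three
    and higher.
    """
    return 1.0 + e0 + e2 - e1 + e1**2 + e0 * e2 - e0 * e1 - e1 * e2


def truncation_error(scale):
    """Absolute truncation error when all relative errors are proportional to scale."""
    e0, e1, e2 = scale, -scale, 0.5 * scale
    return abs(ratio_type_exact(e0, e1, e2) - ratio_type_second_order(e0, e1, e2))


if __name__ == "__main__":
    for scale in (0.1, 0.05, 0.025):
        print(f"scale={scale}: truncation error={truncation_error(scale):.3e}")
```

Halving the size of the relative errors reduces the truncation error by roughly a factor of eight, which is the cubic behaviour expected when only terms of degree three and higher are neglected.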

 

 

 

 

 

                                                                                                                           (2.2)

 

Theorem 2.1. The conditional bias of the proposed estimator, up to the first order of approximation, is given by

 

 

Where  and

 

Proof: From (2.2) we have

 

                                                                                                               (2.3)

 

Taking expectations on both sides, we obtain the bias of the proposed estimator, to the first order of approximation, as