
The consumption and expenditure data obtained from the suppliers did not list annual amounts. Instead, the suppliers provided monthly billing records, generally for a 15-month period. Some periods began as early as October 1996 and others ended as late as June 1998. These records listed the amount purchased, the cost of the purchase, and the date of purchase. For natural gas and electricity, the amount purchased was usually equivalent to the amount consumed. The major exception occurred when the supplier had estimated the bill for the billing period. For fuel oil, kerosene, and LPG, fuel purchased in 1997 may have been consumed in 1998 instead of 1997. Conversely, fuel consumed in 1997 may have been purchased in 1996. The procedures used to calculate the annual consumption and expenditure amounts for electricity and natural gas were designed to avoid estimated bills when possible. The annual consumption and expenditure amounts for fuel oil, kerosene, and LPG reflected the amounts purchased; no attempt was made to distinguish between the amount purchased and the amount consumed for these fuels.

Nonresponse Statistics

The proportion of households that did not sign authorization forms for suppliers to release billing data was in the range of 3 to 9 percent for the five fuels. Overall the proportion was 8 percent. Most households that signed authorization forms did so at the time of the personal interview or at the time of completing the mailed questionnaire. To maximize the number of households with records, however, a follow-up request was mailed to those who did not sign a form at the time of the personal interview. About 17 percent of this group returned signed forms in response to the mail request and, therefore, were included in the energy supplier survey.

Factors affecting nonresponse are somewhat different for fuel oil, kerosene, and LPG than they are for electricity and natural gas (Table B4). The most frequent reasons for nonresponse for households using fuel oil, kerosene, or LPG were that the company was unknown or not contacted and that the dealer could not identify the customer. A number of factors contribute to this nonresponse. First, many customers purchase fuel from a number of dealers on a cash-and-carry basis. Second, some customers use several different energy suppliers and pay cash for deliveries. In both cases, few records are kept and efforts to get consumption records for households rarely are successful.

Refusal of companies to participate in the survey was not a significant factor. Some additional factors related to the quality of fuel records are discussed in the following section on data processing and imputations.

Usable Records

Of a total of 5,900 households that participated in the 1997 RECS, 5,898 used electricity (Table B5). For 81 percent of these cases, the electric utilities provided usable billing records. On the other hand, 229 sample households used kerosene, but the kerosene suppliers provided usable kerosene billing data for only 15 percent of these.

Households lacking consumption records because they do not pay fuel bills directly to fuel suppliers occur most frequently among users of natural gas and fuel oil (see Table B5). These households represent 12 percent of the users of natural gas and 23 percent of the users of fuel oil.

Not all the fuel records that were collected in the energy supplier survey could be used. For example, some records covered too few months and other records were incomplete (Table B5). The problem of nonusable records was small for the metered fuels (electricity and natural gas) because partial-year records of electricity and natural gas were considered usable. For fuel oil and LPG, the problem of nonusable records was more serious: 6 percent of fuel oil records and 4 percent of LPG records were nonusable. Partial-year records for these fuels were not acceptable.³

3. The number of households with partial-year records, as a proportion of total households using the fuel, is 7 percent for electricity and 7 percent for natural gas.

Energy Information Administration

A variety of information from household respondents as well as from suppliers was reviewed and used as a basis for declaring a fuel oil, kerosene, or LPG record complete or incomplete. Questionnaire information from respondents included the number of suppliers and an estimate of the annual number of deliveries. Suppliers provided dates of onset and termination of service to the household.

Table B4. Energy Consumption Records for Survey Households Using Electricity, Natural Gas, Fuel Oil, Kerosene, or LPG, 1997

(Percentage of Households Using the Energy Source)

[blocks in formation]

"Data were unusable for electricity and natural gas if the records covered less than 5 months and included seasonal use (heating or cooling) or if the records covered less than 2 months. Data were unusable for fuel oil, kerosene, and LPG if the record covered less than 1 year.

"Households in this group are those that purchased kerosene primarily on a cash-and-carry basis. These households supplied estimated purchases of kerosene during the household interview. In addition, if a household indicated that it had the ability to use LPG, fuel oil, or kerosene but planned no purchases during 1997, the household was assigned zero consumption.

'These data exclude households that paid for some, but not all, uses of fuel.

'Represents or rounds to zero.

Source: Energy Information Administration, Office of Energy Markets and End Use, Forms EIA-457A-G of the 1997 Residential Energy Consumption Survey (RECS). RECS Public-Use Data Files.
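The usability criteria stated in the table note on unusable data can be written as a small predicate. This is a minimal sketch and not the survey's processing code: the function name, the argument names, and the reading of "less than 1 year" as fewer than 12 months are assumptions made for illustration.

```python
# Fuel groupings taken from the text: metered fuels versus delivered fuels.
METERED_FUELS = {"electricity", "natural gas"}
DELIVERED_FUELS = {"fuel oil", "kerosene", "LPG"}


def record_is_usable(fuel, months_covered, has_seasonal_use=False):
    """Return True if a billing record meets the stated usability criteria.

    months_covered and has_seasonal_use are hypothetical inputs; the source
    does not say how these quantities were stored.
    """
    if fuel in METERED_FUELS:
        if months_covered < 2:
            return False  # too short regardless of end use
        if months_covered < 5 and has_seasonal_use:
            return False  # short record that includes heating or cooling
        return True
    if fuel in DELIVERED_FUELS:
        # Assumption: "less than 1 year" is read as fewer than 12 months.
        return months_covered >= 12
    raise ValueError(f"unknown fuel: {fuel!r}")
```

Under this reading, a 4-month electricity record without seasonal use is usable, while an 11-month fuel oil record is not.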

Imputations

Households with nonusable records, as described earlier, and households with no records had their annual energy consumption imputed using nonlinear regression techniques. The equations were developed by using RECS sample households that had approximately a full year of acceptable data. Separate regression equations were developed for the five fuels: electricity, natural gas, fuel oil, kerosene, and LPG. These equations are described in Appendix C, "End-Use Estimation Methodology." Regression equations were used to estimate 15 percent of the electricity consumption, 19 percent of the natural gas consumption, 37 percent of the fuel oil consumption, 26 percent of the kerosene consumption, and 24 percent of the LPG consumption (Table B5).

The strategy for imputing consumption varied across fuels for two reasons. First, fuels differ in the number of ways they can be used. Electricity, for example, is used for a large number of appliances, water heating, space heating, and space cooling. Kerosene, on the other hand, is used almost exclusively for space heating. As a result, the equation for electricity includes a larger number of terms to represent all of the possible end uses. Second, the number of sample cases also influenced the analysis strategy. For the electric and natural gas equations, there was a large number of sample cases, allowing for the inclusion of a greater number of factors. For example, the electricity equations included a variable for the price of electricity.

A final adjustment was made to all imputed fuel quantities. To maintain the variance structure of the unimputed fuel-consumption data, an error term was added to the predicted fuel consumption rather than imputing a single value for all households with equivalent values for all independent variables in the regression equation. This allowed estimates for sampling error to be calculated without separating imputed from unimputed data.
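The effect of adding an error term can be illustrated with a short sketch. The source does not say how the error term was generated, so drawing it at random from a pool of observed regression residuals is an assumption here, as are the function name and all numeric values.

```python
import random


def impute_with_error(predictions, residual_pool, rng):
    """Add a randomly drawn residual to each regression prediction.

    Assumption: residuals are resampled from the fitted (unimputed) cases.
    The survey's actual error-term procedure is not described in the text.
    """
    return [pred + rng.choice(residual_pool) for pred in predictions]


# Hypothetical data: three households with identical independent variables,
# and therefore identical predicted consumption.
rng = random.Random(1997)
predictions = [10000, 10000, 10000]
residual_pool = [-1200, -300, 0, 450, 900]
imputed = impute_with_error(predictions, residual_pool, rng)
```

Without the error term all three households would receive the same value, which would understate the variability of the imputed data relative to the unimputed data.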


Missing energy expenditure data were imputed by applying a cost factor to the imputed consumption. The cost factor for electricity and natural gas was derived from the energy consumption records of households in the same neighborhood or geographic area as the household that had missing data. The cost factor for fuel oil, kerosene, and LPG was based on regression fits of cost versus quantity for all fuel users.

[blocks in formation]

Source: Energy Information Administration, Office of Energy Markets and End Use, Forms EIA-457A-G of the 1997 Residential Energy Consumption Survey.

One group of households that was particularly likely to have consumption imputed by the regression procedures was apartments in buildings of 5 or more units. For this group, 30 percent of electricity consumption was imputed, higher than the 15 percent average for all households; in addition, 66 percent of their natural gas consumption and all of their fuel oil consumption were imputed.

Estimation of Sampling Error

Sampling error is the random difference between a survey estimate and a population value. It occurs because the survey estimate is calculated from a randomly chosen subset of the entire population. The sampling error averaged over all possible samples would be zero, but there is only one sample for the 1997 RECS. Therefore, the sampling error is nonzero and unknown for the particular sample chosen. However, the sample design permits sampling errors to be estimated. This section describes how the sampling error is estimated and how it is made available to readers of this report who are interested in the precision of the estimates in this report.

Throughout this report, standard errors are given as percents of their estimated values; that is, as relative standard errors (RSE). The RSE is also known as the coefficient of variation. Computations of standard errors are more conveniently described, however, in terms of the estimation variance, which is the square of the standard error.

For a given population parameter Y that is estimated by the survey statistic Y', the relative standard error of Y' is given by:

$$\mathrm{RSE}_{Y'} = \frac{S_{Y'}}{Y'} \times 100 \qquad (1)$$

where $S_{Y'}$ is the standard error of Y'. Thus the standard error of Y' is given by:

$$S_{Y'} = \frac{\mathrm{RSE}_{Y'}}{100} \times Y'$$
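The relationship between the estimation variance, the standard error, and the RSE described above can be checked numerically. The sketch below assumes only the definitions in the text (the RSE is the standard error as a percent of the estimate, and the variance is the square of the standard error); the function names and the example values are hypothetical.

```python
import math


def rse_percent(estimate, variance):
    """Relative standard error, in percent, from an estimate and its variance."""
    return 100.0 * math.sqrt(variance) / estimate


def standard_error_from_rse(estimate, rse):
    """Recover the standard error from an estimate and its RSE (in percent)."""
    return rse * estimate / 100.0


# Hypothetical estimate of 200 with an estimation variance of 16.
rse = rse_percent(200.0, 16.0)
se = standard_error_from_rse(200.0, rse)
```

Here the standard error is the square root of 16, or 4, which is 2 percent of the estimate of 200.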

For some surveys, a convenient algebraic formula for computing variances can be obtained. However, the RECS used a multistage area sample design of such complexity (see Appendix A, "How the Survey Was Conducted") that it is virtually impossible to construct an exact algebraic expression for estimating variances. In particular, convenient formulas based on an assumption of simple random sampling, typical of most standard statistical packages, are entirely inappropriate for the RECS estimates. Such formulas tend to give severely understated standard errors, making the estimates appear much more accurate than is the case. Instead, the method used to estimate sampling variances for this survey was balanced half-sample replication. The balanced half-sample replication method involves calculating the value for a statistic by using the full sample and calculating the value for each of a systematic set of half samples. (Each half sample contains approximately one-half of the observations contained in the full sample.) The variance is estimated by using the differences between the value of the statistic calculated by use of the full sample and the values of the statistic calculated by use of each of the half samples.
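The idea of balanced half-sample replication can be sketched for a simplified design. The code below is a textbook illustration, not the RECS procedure: it assumes exactly two primary sampling units (PSUs) per stratum, forms each half sample by keeping one PSU per stratum with doubled weights, balances the half samples with a Sylvester-type Hadamard matrix, and averages the squared differences from the full-sample value. All names and data are hypothetical.

```python
def hadamard(order):
    """Sylvester construction of a Hadamard matrix; order must be a power of two."""
    h = [[1]]
    while len(h) < order:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def weighted_total(units):
    """Weighted total of (weight, value) pairs."""
    return sum(w * y for w, y in units)


def brr_variance(strata, statistic=weighted_total):
    """Return (full-sample statistic, half-sample variance estimate).

    strata: list of (psu_a, psu_b); each PSU is a list of (weight, value).
    """
    order = 1
    while order <= len(strata):  # need more columns than strata
        order *= 2
    h = hadamard(order)
    full = statistic([u for psu_a, psu_b in strata for u in psu_a + psu_b])
    squared_diffs = []
    for row in h:
        half = []
        for s, (psu_a, psu_b) in enumerate(strata):
            # Column 0 is all +1, so strata use columns 1, 2, ...
            chosen = psu_a if row[s + 1] == 1 else psu_b
            half.extend((2 * w, y) for w, y in chosen)  # double the weights
        squared_diffs.append((statistic(half) - full) ** 2)
    return full, sum(squared_diffs) / len(squared_diffs)


# Hypothetical data: two strata, two PSUs each, one household per PSU.
strata = [([(1, 10)], [(1, 14)]), ([(1, 20)], [(1, 26)])]
full_total, variance = brr_variance(strata)
```

For a weighted total under this simplified design, the balanced half-sample estimate equals the sum over strata of the squared difference between the two PSU totals, which is a convenient check on the sketch.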

As mentioned above and in Appendix A, "How the Survey Was Conducted," the national total number of households is not estimated from the survey results. The household weights are ratio-adjusted so that the total weighted number of households equals the number obtained from the CPS. The same is true for the total number of households in the 15 cells mentioned above (nine Census divisions plus six States). The balanced half-sample replicate procedure used for RECS assumes that the CPS numbers are exact and are not subject to error. Any error in the CPS results can be considered as a bias in the RECS results and not as part of the sampling error for RECS. The weights for each half sample are also constructed such that the national total and the total for the 15 cells match the CPS numbers. As a result, the half-sample estimate for the RSE of the national total number of households and the RSE's for the totals in the 15 cells will always be zero. Also, the half-sample estimate of the RSE will be close to zero whenever the statistic involved is a household count that is close to a control total. Examples of this are the national total for the number of households that use electricity and the number of households that have a refrigerator.
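A small numeric sketch shows why ratio adjustment forces the RSE of a control total to zero. The weights and the control total are hypothetical, and the function name is an assumption; only the idea of scaling weights to match an external total comes from the text.

```python
def ratio_adjust(weights, control_total):
    """Scale weights so that they sum to the control total."""
    factor = control_total / sum(weights)
    return [w * factor for w in weights]


# Hypothetical cell with a control total of 120 households.
control_total = 120
full_weights = ratio_adjust([10, 20, 30], control_total)
half_weights = ratio_adjust([20, 60], control_total)  # one half sample, weights doubled

full_count = sum(full_weights)
half_count = sum(half_weights)
difference = half_count - full_count
```

Because every half sample is adjusted to the same control total, each half-sample household count equals the full-sample count, so every difference entering the variance calculation is zero.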

Generalized Variances

For every estimate in this report, the RSE was computed by the balanced half-sample replication methods described above. This RSE was used for any statistical tests or confidence intervals given in the text, or to determine if the estimate was too inaccurate to publish (RSE greater than 50 percent).

Space limitations prevent publishing the complete set of RSE's with this document. Instead, a generalized variance technique is provided, by which the reader can compute an approximate RSE for each of the estimates in the detailed tables. For the statistic in the ith row and jth column of a particular table, the approximate RSE is given by:

$$\mathrm{RSE}_{ij} = R_i \times C_j$$

where $R_i$ is the RSE row factor given in the last column of row i, and $C_j$ is the RSE column factor given at the top of column j. This value for the relative standard error can be used to construct confidence intervals and to perform hypothesis tests by standard statistical methods. However, because the generalized variance procedure gives only approximate RSE's, such confidence intervals and statistical tests must also be regarded as only approximate.
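As an illustration of how a reader might apply the row and column factors, the sketch below computes an approximate RSE as the product of the two factors and then an approximate 95-percent confidence interval. The factor values and the estimate are hypothetical, and the use of a normal-approximation multiplier of 1.96 is an assumption about what "standard statistical methods" would mean in practice.

```python
def approximate_rse(row_factor, column_factor):
    """Approximate RSE (percent) as the product of row and column factors."""
    return row_factor * column_factor


def approximate_confidence_interval(estimate, rse, z=1.96):
    """Approximate confidence interval from an estimate and its RSE (percent).

    Assumption: a normal approximation with multiplier z (1.96 for 95 percent).
    """
    standard_error = rse * estimate / 100.0
    return estimate - z * standard_error, estimate + z * standard_error


# Hypothetical table cell: row factor 4.0, column factor 1.5, estimate 100.0.
rse = approximate_rse(4.0, 1.5)
lower, upper = approximate_confidence_interval(100.0, rse)
```

With these hypothetical values the approximate RSE is 6 percent, the standard error is 6, and the interval extends 1.96 standard errors on either side of the estimate.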

For a few table cells, there were no sample cases, hence no estimate and no RSE. As a result, some of the arrays of directly estimated RSE's had a few missing values. In such cases, the formulas given above for row and column factors still apply, but only after appropriate estimates have been substituted for the missing values.

The estimation procedure used to obtain the row and column factors does not use RSE's that are less than 1.0 percent or greater than 50.0 percent. In addition, if the statistic for a cell is not listed for any reason (high RSE, small cell sample size, or missing data), the RSE for that cell is not used in the procedure and is treated as a missing value. This convention is used because the product of the row and column factors frequently is an inaccurate estimate for these RSE's, and using these cells in the calculation of the row and column factors may result in factors that give inaccurate RSE estimates for other cells.


Whenever a household count is a control total, its RSE is zero. Hence, RSE's of control totals are not used in the row and column factor calculations. Rows that contain only control totals have a row factor equal to zero. Rows that contain only household counts that are close to control totals do not have a listed row factor. Instead, a footnote tells the reader that the RSE's for all statistics in these rows are less than 1.0 percent, because the half-sample estimates of those RSE's are all less than 1.0 percent. The row factors for these rows would be positive but small.
