
unit would be submitted for re-punching. A card would be punched for each rejected work unit. These cards would be sent to the computer and processed against the work-unit tape in a special run; the rejected work units would be dropped, and the remaining ones would be processed through the computer edit.

The normal process would be to re-punch the rejected work units, destroy the old cards, and repeat the card-to-tape operation for the rejected work units. Providencia chose this alternative in order to permit the accepted work units to move immediately to the machine edit.
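The run that drops rejected work units can be sketched as a simple filter over the work-unit tape. This is a minimal illustration, assuming each rejection card carries only the identifier of the rejected work unit; the `WorkUnit` structure and the identifiers are hypothetical, not part of the Providencia plan.

```python
from dataclasses import dataclass, field


@dataclass
class WorkUnit:
    """Hypothetical tape record group: one card-to-tape work unit."""
    unit_id: str
    records: list = field(default_factory=list)


def drop_rejected(tape, rejection_cards):
    """Copy the work-unit tape, leaving out every unit named on a rejection card."""
    rejected = set(rejection_cards)
    return [unit for unit in tape if unit.unit_id not in rejected]


tape = [WorkUnit("WU-01"), WorkUnit("WU-02"), WorkUnit("WU-03")]
accepted = drop_rejected(tape, ["WU-02"])
print([unit.unit_id for unit in accepted])  # ['WU-01', 'WU-03']
```

The accepted units go straight on to the machine edit, while the re-punched units can follow later as a separate work unit.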

4. PLANNING FOR MACHINE EDITS

After the data have been transferred to magnetic tape, the next step will be to subject all of the entries on the tape to a thorough machine edit by the computer to prepare the data for further processing. A full edit procedure (not usually practical) would incorporate all the individual routines developed by subject-matter specialists in the NSO.

4.1 Control units for machine editing

The unit of control for the machine edit will be the unsorted tape records for a province. A control unit may consist of one or more of the card-to-tape work units established earlier. As mentioned earlier, ED's within a province can be put onto the tape in any sequence as long as all ED's within an edit batch are from the same county. An edit batch can be as small as a single card-to-tape work unit or as large as all the work units for an entire province. As with the card-to-tape work units, the most important consideration in planning the size of an edit work unit is the availability of the computer.

The initial work units for editing should be small. It may also be useful to print those records that require more than a specified number of corrections or imputations, so that the subject-matter analysts can review this detail and make sure the edit program is functioning properly. Once the edit program is judged satisfactory, the work units can be made larger; this reduces total set-up time, since fewer runs are needed.

4.2 Methods of error correction

As discussed in chapter V-3, there are three types of error situations which must be corrected. These are: (a) omissions, (b) impossible codes, and (c) inconsistent entries. They result from enumeration error, coding error, or key punch error. No matter what the source or what type of error, these situations are similar in that a valid code must be inserted in place of a blank or erroneous entry.

When possible, a correct code is determined on the basis of other entries for the same type of establishment in the same ED. That is, an entry is imputed or assigned for most items; for a few items, no entry is assigned.

One method of assigning an entry is to determine the estimated distribution of possible answers based on data from previous censuses or surveys, then fill blank entries in accordance with the general distribution.
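Filling blanks "in accordance with the general distribution" amounts to drawing each missing code at random with probabilities equal to the estimated shares. The sketch below assumes blanks are represented as `None` and uses hypothetical shares for a three-way code; it illustrates the idea rather than any specific NSO routine.

```python
import random


def impute_from_distribution(values, distribution, rng):
    """Fill blanks (None) by drawing codes in proportion to an estimated distribution.

    distribution maps each valid code to its estimated share of responses,
    for example as observed in a previous census or survey.
    """
    codes = list(distribution)
    weights = [distribution[code] for code in codes]
    return [
        value if value is not None else rng.choices(codes, weights=weights, k=1)[0]
        for value in values
    ]


# Hypothetical shares for a three-way legal-form code.
prior = {"1": 0.7, "2": 0.2, "3": 0.1}
filled = impute_from_distribution(["1", None, "3", None], prior, random.Random(1975))
print(filled)
```

A seeded generator is passed in so that a rerun of the same edit work unit reproduces the same imputations.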

Another method of imputing entries is to use "live" data from the census itself. This procedure would assign the code of the most recent preceding establishment that shares certain characteristics, rather than using a weighted average of possible responses. In this way, local variations in the distribution of characteristics are automatically taken into account. This method is sometimes referred to as the "hot deck" procedure.

Under certain circumstances it is inappropriate to fill a blank entry with a valid code, namely when there is insufficient information on which to base a decision. In that case, it is often better to insert a code for "not reported" than to assign a code which might distort the results. Even if sufficient data are available, the computer edits needed to determine accurately which code should be assigned could be extensive. If the item is of secondary importance, it would be unwise to spend computer time and storage on extensive editing which may not significantly improve the data. Thus, the category "not reported" could be published for several of the items on the Providencia questionnaire.

4.21 Definition of "hot deck."--Many of the imputations in Providencia will be made by the "hot deck" procedure. This method is so named because the data are constantly changing. Storage locations in the magnetic core of the computer are assigned for those items which will use this method of allocation.

This location then becomes the "deck" or "hot deck" cell where the data are stored. As each new establishment is processed, its answer to a specific question is inserted into the cell, replacing the response of the previous establishment. Because the data change constantly, they are known as "hot" data. Whenever a blank occurs in an item that uses a hot deck, the value is taken from the most recent entry in that item's hot deck cell. The amount of computer storage needed will depend on the number of hot decks the edit routine requires.
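The hot deck cell described above can be modelled as one stored value per item: a reported answer overwrites the cell, and a blank copies the cell. This is a minimal sketch assuming records are dictionaries with `None` for blanks; the item names are hypothetical. With empty cells, a blank in the very first establishment would receive `None`, which is the gap the text goes on to fill with cold deck values.

```python
class HotDeck:
    """One storage cell per hot-deck item, holding the latest reported response."""

    def __init__(self, items):
        self.cells = {item: None for item in items}

    def process(self, record):
        """Impute blanks from the cells, then store this record's reported answers."""
        edited = dict(record)
        for item in self.cells:
            value = record.get(item)
            if value is None:
                edited[item] = self.cells[item]  # take the most recent entry in the cell
            else:
                self.cells[item] = value  # replace the previous establishment's response
        return edited


deck = HotDeck(["legal_form", "ownership"])
records = [
    {"legal_form": "2", "ownership": "P"},
    {"legal_form": None, "ownership": "G"},
    {"legal_form": "1", "ownership": None},
]
edited = [deck.process(r) for r in records]
print(edited[1]["legal_form"], edited[2]["ownership"])  # 2 G
```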

4.22 Definition of "cold deck."--In the event that the first establishment to be edited has a blank item, there must be values available for imputation. These initial or starting values, as developed by the subject-matter technicians in the NSO, are referred to as "cold deck" entries; they represent the most likely answers. Also, when the computer finishes processing the data for one county and begins a new editing unit from another county, it may be inappropriate to use the information in the hot decks, since it was derived from the previous county. Therefore, for each hot deck storage cell, the computer has a "cold deck" entry stored in its memory. Before an edit work unit is examined, the computer transfers these "cold deck" values into the hot deck storage cells. The "cold deck" response is kept in the hot deck storage only until it can be replaced by a valid entry for that cell from an establishment in the new county. Very soon, all of the "cold deck" entries will have been replaced, and the "hot deck" storage will be filled entirely with locally reported responses.
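The cold deck reset can be sketched by reloading the hot deck cells from the stored cold deck values at the start of each county, so that no county inherits responses from the one before. The record layout, county names, and cold deck value below are hypothetical.

```python
def edit_counties(counties, cold_deck):
    """Hot-deck imputation that reloads the cold-deck values before each county.

    counties: list of (county_name, records); records are dicts with None for blanks.
    cold_deck: starting value for every hot-deck cell, set by subject-matter staff.
    """
    results = {}
    for county, records in counties:
        hot = dict(cold_deck)  # transfer the cold-deck entries into the hot-deck cells
        edited_records = []
        for record in records:
            edited = dict(record)
            for item in hot:
                if record.get(item) is None:
                    edited[item] = hot[item]  # cold value until a local answer arrives
                else:
                    hot[item] = record[item]  # locally reported response replaces it
            edited_records.append(edited)
        results[county] = edited_records
    return results


cold = {"legal_form": "1"}
counties = [
    ("North", [{"legal_form": None}, {"legal_form": "3"}, {"legal_form": None}]),
    ("South", [{"legal_form": None}]),
]
result = edit_counties(counties, cold)
print([r["legal_form"] for r in result["North"]], result["South"][0]["legal_form"])
# ['1', '3', '3'] 1
```

The first South record receives the cold deck value "1" rather than North's last reported "3", which is the behaviour the reset is meant to guarantee.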

4.3 Edit diary

At the time of the edit run, a diary will be prepared. The diary will provide information that enables the analyst to determine whether the edit procedure has an adverse effect on the data and whether the edit rules are reasonable. It is also a means of evaluating the quality of the census.

4.31 Description of diary.--The edit diary will be prepared by county within the province (an ED is considered too small for this purpose). If individual records are printed because the imputation rate for the county exceeds the specified maximum, each record would be printed in two ways: (a) as it appeared on the punch card and (b) as it was changed by the edit. A summary would give, in addition to basic control counts, the total number of imputations for each item and the imputation rate for each. Such information can be extremely valuable in evaluating current censuses and planning future statistical programs.
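A county diary of this kind can be sketched by comparing each record as punched with the same record after the edit. The sketch assumes the county imputation rate is the share of records with at least one imputed item and that only changed records are printed; both are assumptions, and the threshold value is hypothetical.

```python
from collections import Counter


def edit_diary(county, original, edited, max_rate):
    """Summarize imputations for one county and select records for print-out.

    original/edited: parallel lists of dicts (as punched, as changed by the edit).
    max_rate: assumed maximum share of records that may contain an imputation.
    """
    per_item = Counter()
    changed_pairs = []
    for before, after in zip(original, edited):
        changed = [item for item in after if before.get(item) != after[item]]
        if changed:
            per_item.update(changed)
            changed_pairs.append((before, after))  # printed as punched and as edited
    n = len(original)
    summary = {
        "county": county,
        "records": n,  # basic control count
        "imputations": dict(per_item),
        "rates": {item: count / n for item, count in per_item.items()},
    }
    county_rate = len(changed_pairs) / n if n else 0.0
    to_print = changed_pairs if county_rate > max_rate else []
    return summary, to_print


original = [{"a": 1, "b": None}, {"a": 2, "b": 3}]
edited = [{"a": 1, "b": 3}, {"a": 2, "b": 3}]
summary, to_print = edit_diary("North", original, edited, max_rate=0.2)
print(summary["imputations"], summary["rates"])  # {'b': 1} {'b': 0.5}
```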

4.32 Review of diary.--The reviewer will examine the diary to determine the acceptability of the data and to take corrective action where indicated. For purposes of control, the first items that should be checked are the tabulated counts of the number of establishments in the county. The county counts tabulated after the machine edit should agree with the accepted county counts produced after the review of the card-to-tape diary.

The next item to inspect is the number of imputations and the imputation rate. It is possible that imputations for a single count could exceed the acceptable rate yet have little or no effect on the data for the county, province, or other publication area. A pre-determined error percentage should be agreed upon, and counties showing an error rate higher than this figure will have to be fully reviewed and corrected manually. The Providencia NSO plans to include in its published reports some statistics on the extent of imputation for the principal items, and the reason for the imputation procedure.
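The two review checks can be expressed as a short decision routine: first the control-count agreement, then the imputation rate against the agreed percentage. The function name, arguments, and the 5 percent limit in the usage are illustrative assumptions only.

```python
def review_county(county, edit_count, accepted_count, error_rate, max_error_rate):
    """Return the corrective actions a reviewer should take for one county."""
    actions = []
    # Control check: establishments counted after the machine edit must
    # agree with the accepted count from the card-to-tape diary.
    if edit_count != accepted_count:
        actions.append(
            f"{county}: resolve count discrepancy ({edit_count} vs {accepted_count})"
        )
    # Rate check against the pre-determined error percentage.
    if error_rate > max_error_rate:
        actions.append(f"{county}: full review and manual correction")
    return actions


print(review_county("North", 120, 120, 0.03, max_error_rate=0.05))  # []
print(review_county("South", 118, 120, 0.08, max_error_rate=0.05))
```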

4.33 Corrective action.--When counts tabulated during the edit do not agree with the accepted card-to-tape counts, the discrepancy must be resolved. When the imputation rate exceeds the desired limit, there is a possibility that the instructions for editing-coding or the procedure for the machine edit should be revised. This type of corrective action normally would be handled in the pre-tests or in connection with the review of the first few work units that are edited.

If the computer programs have been accurately written, they should be more reliable and more consistent than a manual editing procedure. Therefore, when a county is rejected, it is likely to be an unusual one; for example, it may contain blanks or inconsistencies because the editor-coder omitted many codes or the enumerator did not get the information. If the rejected county will affect the data for the publication area, the reviewer may want to re-process all or part of it or, as a last resort, attempt to have an enumerator get the desired information.

5. MACHINE EDIT SPECIFICATIONS

The computer editing consists of tests for scope and reasonableness, activity coding, consistency checks, corrections, and imputations. The speed and accuracy of the computer will be used for additional, more sophisticated edits than those performed manually. The machine edits also cross-check the totals that were coded manually, and for some items they check the key punching as well. Each record is processed through all of the edit steps before being rejected or accepted, so that all errors are detected and individually noted on an error print-out sheet. The specifics of the machine edit for scope and reasonableness are outlined in Sequence A.
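The rule that every record passes through all edit steps before being accepted or rejected can be sketched as a loop that collects messages instead of stopping at the first failure. Here each step is assumed to be a function that returns an error message or `None`; the step functions in the usage are placeholders, not the actual Sequence A logic.

```python
def run_edit(record, steps):
    """Apply every edit step and collect all messages before accepting or rejecting.

    steps: list of (name, check) pairs; each check returns an error message or None.
    """
    errors = []
    for name, check in steps:
        message = check(record)
        if message is not None:
            errors.append(f"{name}: {message}")  # noted for the error print-out sheet
    status = "rejected" if errors else "accepted"
    return status, errors


steps = [
    ("A-1", lambda r: None),
    ("A-4", lambda r: "A-4 Reject"),
    ("A-5", lambda r: "detail sum smaller than total"),
]
print(run_edit({}, steps))
# ('rejected', ['A-4: A-4 Reject', 'A-5: detail sum smaller than total'])
```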

5.1 Tests for scope and reasonableness (Sequence A)

The three scope tests (A-1 to A-3) are designed to determine whether the primary activity of the establishment comes within the scope of the industrial census.

The first test of reasonableness (A-4) seeks to find out whether the report makes "economic sense." This is done by comparing the reported current income with the reported current outlays. The relationship is not considered "reasonable" if the report shows that current outlays ("out-of-pocket costs") amount to less than 40 percent of current revenue. At the other extreme, out-of-pocket loss situations (that is, where payroll, materials, and similar costs equal or exceed total receipts) are not considered reasonable either.

The final test in this sequence, A-5, asks only that the total value of shipments and receipts be largely accounted for by the detail entries appearing in item 18. (See detail flow chart for Sequence A--exhibit V-6-4.) The following specifications for the five "Scope and Reasonableness Tests" apply to the records for the manufactures questionnaires (long and short). Modifications of these specifications are required for the mining and the electricity and gas forms, especially in item number references.

5.11 Step A-1: Value of resales (VR).--Does the value of resales (VR) exceed the sum of the product entries and contract work? If it does, the entire questionnaire is printed along with the message "Out-of-Scope Business," plus any other errors detected later in this process. If there is no entry for VR, the edit checks whether there is an entry for products bought and resold (cost of resales). If there is, VR is imputed as 100/85 of the cost of resales.
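Step A-1 can be sketched as follows, with the record held as a dictionary whose field names (`VR`, `cost_of_resales`, `products`, `contract_work`) are hypothetical. The text does not say whether an imputed VR is itself put through the scope test, so this sketch follows the text literally and only tests a reported VR.

```python
def step_a1(record):
    """Step A-1, value of resales. Returns (record, message or None)."""
    vr = record.get("VR")
    if vr is not None:
        # Scope test: resales larger than products plus contract work.
        if vr > record["products"] + record["contract_work"]:
            return record, "Out-of-Scope Business"
    elif record.get("cost_of_resales") is not None:
        # Impute VR as 100/85 of the cost of resales (implied 15 percent margin).
        record = dict(record, VR=record["cost_of_resales"] * 100 / 85)
    return record, None


print(step_a1({"VR": 500, "products": 300, "contract_work": 100})[1])  # Out-of-Scope Business
rec, msg = step_a1({"VR": None, "cost_of_resales": 85, "products": 300, "contract_work": 0})
print(rec["VR"], msg)  # 100.0 None
```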

5.12 Step A-2: Value of receipts for repair and installation (VIR).--If there is an entry for VIR, does this entry exceed the sum of products and contract work receipts? If it does, the records are flagged and the message "Out-of-Scope Construction" is printed along with all items on the questionnaire.

5.13 Step A-3: Value of other miscellaneous receipts (VO).--If there is no entry for VO, the entries have passed the scope tests and are ready for step A-4. Otherwise, do the entries for VO exceed the sum of products and contract work receipts? If they do, the message "Out-of-Scope" is printed along with all records for the questionnaire.
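Steps A-2 and A-3 share the same comparison base, the sum of products and contract work receipts, so they can be sketched together. Field names are hypothetical, and both messages are collected rather than stopping at the first, in keeping with the rule that all errors are noted.

```python
def scope_steps_a2_a3(record):
    """Steps A-2 and A-3. Returns the out-of-scope messages raised (possibly none)."""
    base = record["products"] + record["contract_work"]
    messages = []
    vir = record.get("VIR")  # repair and installation receipts
    if vir is not None and vir > base:
        messages.append("Out-of-Scope Construction")
    vo = record.get("VO")  # other miscellaneous receipts
    if vo is not None and vo > base:
        messages.append("Out-of-Scope")
    return messages


print(scope_steps_a2_a3({"products": 100, "contract_work": 50, "VIR": 200}))
# ['Out-of-Scope Construction']
```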

5.14 Step A-4: Ratio of input to output.--Input is the cost of materials and payroll; output is the value of shipments and receipts. The relationship between input and output should be reasonable.

The ratio is computed by adding total cost of materials and total payroll and then dividing this sum by the item 18 total. If the resulting ratio falls in the range between 0.4 and 1.0, the edit proceeds to step A-5. If it does not, the edit substitutes the sum of the detail for each of these critical items in place of the entry shown on the total line. If such substitution for the totals in items 7, 13, and 18, tested successively, does not bring the ratio within the acceptable range of 0.4 to 1.0, the report fails the A-4 test. Such failures are flagged by the printed message "A-4 Reject."

If substitution of the sum of the detail for one or more of the entered totals yields a ratio within the acceptable range of 0.4 to 1.0, the edit corrects the total and proceeds to step A-5.
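One reading of step A-4 is sketched below. It assumes that items 7 and 13 hold the two input totals (payroll and cost of materials), that "tested successively" means substitutions accumulate in the order 7, 13, 18, and that a ratio of exactly 1.0 fails because costs equal to receipts count as a loss situation. These are interpretations, not confirmed details of the Providencia program; the record layout is hypothetical.

```python
LOW, HIGH = 0.4, 1.0


def input_output_ratio(items):
    """(item 7 total + item 13 total) / item 18 total, or None if output is not positive."""
    output = items["18"]["total"]
    if output <= 0:
        return None
    return (items["7"]["total"] + items["13"]["total"]) / output


def acceptable(ratio):
    # Costs equal to or above receipts are a loss situation, so 1.0 itself fails.
    return ratio is not None and LOW <= ratio < HIGH


def step_a4(items):
    """Step A-4. Returns (items, message); message is None when the test is passed."""
    if acceptable(input_output_ratio(items)):
        return items, None
    trial = {key: dict(entry) for key, entry in items.items()}
    for item in ("7", "13", "18"):  # substitutions accumulate, tested successively
        trial[item]["total"] = sum(trial[item]["detail"])
        if acceptable(input_output_ratio(trial)):
            return trial, None  # the edit keeps the corrected totals
    return items, "A-4 Reject"


items = {
    "7": {"total": 900, "detail": [300, 200]},
    "13": {"total": 100, "detail": [100]},
    "18": {"total": 1000, "detail": [600, 400]},
}
corrected, message = step_a4(items)
print(corrected["7"]["total"], message)  # 500 None
```

Here the reported item 7 total (900) gives a ratio of 1.0; replacing it with the sum of its detail (500) brings the ratio to 0.6, so the corrected total is kept.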

5.15 Step A-5: Total value of products, contract work, and other receipts.--Does the sum of the entries for value of products, contract work, and other receipts in item 18 equal the item 18 total within plus or minus $25,000? If it does, and the records have not been flagged, the scope and reasonableness tests have been passed. If it does not, a message is printed indicating this fact and whether the sum falls below or above the total by more than the tolerance.
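The A-5 comparison reduces to a tolerance check on the difference between the detail sum and the reported item 18 total. The function signature and message wording are illustrative assumptions; the $25,000 tolerance is the figure given in the text.

```python
TOLERANCE = 25_000


def step_a5(products, contract_work, other_receipts, item18_total, tolerance=TOLERANCE):
    """Step A-5: item 18 detail must match the item 18 total within the tolerance."""
    detail_sum = products + contract_work + other_receipts
    difference = detail_sum - item18_total
    if abs(difference) <= tolerance:
        return None  # test passed
    side = "smaller" if difference < 0 else "larger"
    return f"A-5: detail sum is {side} than the item 18 total by more than {tolerance:,}"


print(step_a5(500_000, 100_000, 0, 610_000))  # None
print(step_a5(500_000, 0, 0, 600_000))
```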

5.2 Activity coding (Sequence B)

This sequence establishes the proper classification of manufacturing establishments according to primary economic activity.

5.3 Consistency checks, corrections and imputations (Sequence C)

This sequence tests all records against standards which were determined from the actual reports submitted by industry in the reference year, 1975. Editing by ratios then applies these separate standards in passing judgement on entries for the 12 principal items of inquiry. The same method also provides a valid basis for imputing missing item totals.
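A ratio edit of the general kind described can be sketched as a check of one item against a related base item (for example, payroll per employee) using lower and upper limits derived from reference-year reports. The bounds, the item pairing, and the choice to impute at the midpoint of the standard range are all hypothetical; the text does not specify how the imputed value is derived from the standard.

```python
def ratio_edit(value, base, bounds):
    """Check value/base against an industry standard, or impute a missing value.

    bounds: (low, high) ratio limits assumed to come from reference-year reports.
    base is assumed to be a positive reported figure (e.g. number of employees).
    Returns (value, status).
    """
    low, high = bounds
    if value is None:
        # Illustrative choice: impute at the midpoint of the standard range.
        return base * (low + high) / 2, "imputed"
    ratio = value / base
    return value, ("accepted" if low <= ratio <= high else "failed")


payroll_per_employee = (2.0, 4.0)  # hypothetical standard for one industry
print(ratio_edit(25, 10, payroll_per_employee))    # (25, 'accepted')
print(ratio_edit(50, 10, payroll_per_employee))    # (50, 'failed')
print(ratio_edit(None, 10, payroll_per_employee))  # (30.0, 'imputed')
```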

6. TABULATION OF DATA

A great deal of time, money, and effort will have been expended on the census up to this point. Unless the statistics are tabulated and printed in understandable form with minimum delay, they will lose some of their effectiveness. The data produced by an industrial census are presented in tabular form. Tabular presentation can be defined as an orderly arrangement of numerical information in columns and rows. A well-designed table presents statistical information in a concise and orderly fashion. Most of the tables that will be published in Providencia will be computer print-outs.

6.1 Review of table formats

As described in chapter III-6, the heading consists of the table number, title, and headnote when necessary. The column head is that portion of the table which identifies the individual column and describes the data in each column. The stub or caption describes the specific data in a row and makes clear the relationship among items.

The body consists of the statistical and other information appearing in the tally cells. The cell is the basic unit of presentation. The description of a cell entry consists of the combination of the row stub and the column head. These cells are tabulated from the collected data.
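Since each cell is identified by a row stub and a column head, tabulation can be sketched as summing a value into cells keyed by that pair and then printing the body in rows and columns. The field names and sample records are hypothetical, and a real print-out would also carry the table number, title, and headnote.

```python
from collections import defaultdict


def tabulate(records, row_key, col_key, value_key):
    """Sum value_key into cells identified by (row stub, column head)."""
    cells = defaultdict(float)
    for rec in records:
        cells[(rec[row_key], rec[col_key])] += rec[value_key]
    rows = sorted({row for row, _ in cells})
    cols = sorted({col for _, col in cells})
    return rows, cols, cells


def print_table(title, rows, cols, cells):
    """Print the heading, column heads, and one line per stub entry."""
    print(title)
    print("".ljust(12) + "".join(str(c).rjust(12) for c in cols))
    for row in rows:
        print(str(row).ljust(12) + "".join(f"{cells.get((row, c), 0):12,.0f}" for c in cols))


records = [
    {"industry": "Food", "province": "East", "shipments": 100},
    {"industry": "Food", "province": "West", "shipments": 50},
    {"industry": "Food", "province": "East", "shipments": 25},
    {"industry": "Textiles", "province": "West", "shipments": 70},
]
rows, cols, cells = tabulate(records, "industry", "province", "shipments")
print_table("Value of shipments by industry and province", rows, cols, cells)
```

Empty cells, such as Textiles in East, are printed as 0 here; a published table might instead use a dash.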

Table formats should be developed early in the census planning to ensure, first of all, that all items and categories are provided for in the collection of data. The table outlines are usually set up by the subject-matter specialists in close collaboration with the data-processing technicians. The latter will need to analyze the table outlines in order to develop the most efficient procedure for writing the computer programs--editing, sorting, interpreting concepts, performing computations, balancing, etc.

6.2 Appraisal of resources

The cost, time, and method of tabulation are mainly dependent on the personnel and machine resources that are available. The number and type of tables and the amount of data will also have a definite effect, but not as great. Well-trained and highly capable personnel can do a great deal to compensate for lack of machine resources; however, a large, high-speed computer can do little to compensate for the lack of human resources.

6.21 Machine capabilities.--All digital computers in use today have the ability to process a census, although the speed and ease of processing may vary greatly. The computer with the fastest internal speeds (calculations and data movement) is of little benefit if the amount of memory is too small and/or the input/output media are slow. Small memory may require that the more complex tables be prepared in two or more runs (programs). Input media such as cards or very slow tape units will slow the computer to their speed.

An initial input of cards, as has been suggested for Providencia, combined with high-speed tapes is quite satisfactory. A computer with high-speed input/output units and adequate
