
that which Professor Karl Pearson has given, in pari materiâ, for the statistics of house valuations ("Contributions," No. II, Transactions of the Royal Society, p. 398). But the identity is only apparent, since N denotes in Professor Pareto's formula all the incomes at or above a certain x, while the corresponding symbol in Professor Pearson's formula denotes only the valuations at a certain x. It is true no doubt that the difference between the two curves is rather in form than in fact.12 Still I am not satisfied that the form preferred by Professor Pearson is decidedly preferable. I submit that it might be possible to find other forms equally appropriate.13

I do not forget that there are certain theoretical arguments in favour of the Pearsonian formula, and I have allowed a certain weight to them in the critical paper above referred to. I have allowed great weight to the authority of Professor Karl Pearson. Where opinions on a matter of this sort differ, the presumption is certainly in favour of the author who has made the greatest advance in the science of Probabilities which has been made since the era of Poisson.

III. Referring to Professor Pearson's splendid series of "Mathematical Contributions to the Theory of Evolution," in the Transactions of the Royal Society, I may advert here to one of his latest improvements in the method of statistics. I allude to his having been the first to point out the most probable determination of the constant which characterises the law of frequency for two correlated phenomena14 ("Contributions," No. III, p. 265). As he intimates, this most probable value has the same sort of advantage over others as the most probable value of the constant in the simple case of the normal law of error has over other determinations of that constant.15 The formula had been obtained before by a method16 which may be described in Professor Pearson's phrase as "of somewhat arbitrary character."


IV. I take occasion to adduce a property of the theory of correlation which has some bearing on an important practical problem, namely, how to identify individuals by anthropometric records.

The method of identification devised by Dr. Bertillon and

12 Owing to the smallness of the constant β (see Appendix). This was pointed out to me, along with some other remarks of which I have here availed myself, by Professor Pearson himself.

13 See Appendix, Note II.

14 The constant r in the formula given in Appendix, Note III; to which the three constants employed in some statements of the formula (e.g., Journal of the Royal Statistical Society, 1893, p. 671) may, by a well known theory of Mr. Galton, be reduced.

15 See "Methods of Statistics," by the present writer, Journal of the Royal Statistical Society, Jubilee volume, 1885, p. 188, where observe that the term modulus is used in the same sense as Professor Pearson's standard deviation.

16 I allude to the method which I have described as the most accurate in the paper on "Exercises on the Calculation of Error" in the Philosophical Magazine for July, 1893, pp. 100 and 101.

now generally practised, consists in classifying persons according to the degree in which they possess each of several attributes, e.g., stature, head length, &c. Three degrees of each attribute are commonly distinguished: large, medium and small. If the attributes are independent of each other, it may be expected that the whole set of persons who have been observed will be divided into 3^n equal groups, n being the number of attributes.

For example, when there are two attributes, let x and y denote the deviations of each from its mean. On the axis x, a horizontal line through O, take the points P and P' such that the ordinates through those points divide the whole number of observations into three equal groups. This will be the case if OP = OP' = 0.305 × modulus of the first attribute.
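The position of P and P' can be checked numerically. The sketch below assumes the normal law of error and takes the modulus to be √2 times the standard deviation; the function name is illustrative only and is not drawn from the paper.

```python
from math import sqrt
from statistics import NormalDist


def tertile_in_moduli() -> float:
    """Distance from the mean, in units of the modulus, that cuts off the upper third."""
    # Tertile of the standard normal curve, measured in standard deviations.
    z = NormalDist().inv_cdf(2 / 3)
    # Assumption: modulus = sqrt(2) * standard deviation.
    return z / sqrt(2)


print(round(tertile_in_moduli(), 3))  # 0.305
```

On that assumption the result agrees with the multiplier of the modulus stated in the text.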

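[Figure: the plane of x and y divided into nine compartments by the ordinates through P and P' and the lines through Q and Q'.]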

On the axis y, a vertical through O, take points Q and Q' such that the co-ordinates through those points also effect a tripartite division. If the attributes are independent of each other, it may be expected that the whole set will be divided into nine groups, such as the following:

SL, small degree of attribute I, large of II
ML, medium degree of attribute I, large of II
SS, small degree of attribute I, small of II

and so on.

(The compartment MM has not been labelled in the figure, to avoid confusion).

But, as a matter of fact, attributes are frequently correlated; so that a large value of No. I attribute generally goes with a large value of No. II. In this case the whole set is not divided into nine equal sub-groups; the anthropometer finds himself with respect to several of his compartments in the position of old Mother Hubbard.
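The emptying of certain compartments under correlation may be illustrated by a small simulation. This is a sketch only: it assumes two attributes following the normal law in standard units, and the coefficient passed in, the sample size and the function name are all hypothetical choices, not figures from the paper.

```python
import random
from statistics import NormalDist


def cell_shares(r: float, n: int = 100_000, seed: int = 0) -> dict:
    """Share of simulated persons in each of the nine compartments (e.g. 'SL')."""
    rng = random.Random(seed)
    cut = NormalDist().inv_cdf(2 / 3)  # tertile limit in standard units

    def degree(v: float) -> str:
        return "S" if v < -cut else ("L" if v > cut else "M")

    residual = (1 - r * r) ** 0.5
    counts = {}
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = r * x + residual * rng.gauss(0, 1)  # correlated with x by coefficient r
        key = degree(x) + degree(y)             # first letter: attribute I, second: II
        counts[key] = counts.get(key, 0) + 1
    return {k: c / n for k, c in counts.items()}


for r in (0.0, 0.9):
    shares = cell_shares(r)
    print(r, {k: round(v, 3) for k, v in sorted(shares.items())})
```

With r = 0 each compartment holds roughly one ninth of the set; with a high r the compartments SL and LS are nearly empty, which is the bare cupboard referred to above.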

The remedy which has hitherto been employed for this evil is to break up each of the three divisions formed by the three degrees small, medium and large of a first attribute into three sub-divisions formed by such values of a second attribute as are tentatively found to break up each division into three equal sub-divisions. Geometrically the two horizontal lines (not represented in our figure) which break up each division into three will not now, as in the case of independent attributes (represented in our figure) be the same for different divisions. The principle is extended to the case of any number of attributes. This method, as practised by the distinguished anthropometer who occupies the Bureau of Identification in Scotland Yard, seems to leave nothing to desire on the score of practical efficiency.

A certain advantage in respect of theoretical elegance may, however, attach to another remedy for the defect due to correlation, which is afforded by employing attributes which are not correlated. Such a set of attributes can always be found, formed by combining the original attributes in definite proportions, as follows from the mathematical theory of the subject.17
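One way of forming such uncorrelated combinations for two attributes is a rotation of the axes. The sketch below is not necessarily the construction of Appendix, Note III, which is not reproduced in this portion of the paper; the standard deviations and coefficient in the usage lines are hypothetical, and uncorrelated combinations are in any case not unique.

```python
from math import atan2, cos, sin


def uncorrelated_combinations(s1: float, s2: float, r: float):
    """Coefficients (a, b) of two combinations a*x + b*y having zero covariance.

    s1, s2 are the standard deviations of x and y; r is their correlation.
    """
    cov = r * s1 * s2
    theta = 0.5 * atan2(2 * cov, s1 ** 2 - s2 ** 2)  # angle of rotation of the axes
    return (cos(theta), sin(theta)), (-sin(theta), cos(theta))


def covariance_of(u, v, s1: float, s2: float, r: float) -> float:
    """Covariance between the combinations u and v of the original attributes."""
    cov = r * s1 * s2
    return (u[0] * v[0] * s1 ** 2
            + (u[0] * v[1] + u[1] * v[0]) * cov
            + u[1] * v[1] * s2 ** 2)


# Hypothetical figures for illustration only.
u, v = uncorrelated_combinations(2.6, 0.8, 0.9)
print(u, v, covariance_of(u, v, 2.6, 0.8, 0.9))
```

The covariance printed last should vanish up to rounding, whatever the values supplied.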

The theory may be illustrated by a concrete instance, that of stature and cubit, which are closely correlated, the coefficient r being approximately 0.9.

Dealing with observations on some 300 individuals submitted to me by Mr. Galton, I find for the independent characteristics:

First characteristic = 0.2 stature (in inches) + 0.66 cubit (in inches)
Second characteristic = [formula illegible in source]

For example, John Doe is 60 inches in height, and has a cubit length 18 inches.

First characteristic = 0.2 × 60 + 0.66 × 18 = 23.88
Second characteristic = [value illegible in source]

These characteristics each afford a tripartite division of the whole set of observations by two limits demarcating small (below lower limit), medium (between both limits), and large (above higher limit). The position of the limits for each characteristic is found to be as below.

For the first characteristic

(Small) ............ 25.1 ............ (medium) ............ 25.92 ............ (large).

For the second characteristic

(Small) ............ 1.5 ............ (medium) ............ 1.7 ............ (large).

John Doe, for example, is small with respect to the first characteristic, and also small with respect to the second.

If Peter Doe has the same cubit length as John, but stature = 70, then Peter is medium with respect to the first characteristic, and large with respect to the second.
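The classification of John and Peter Doe under the first characteristic can be retraced as follows. The sketch takes the first characteristic to be 0.2 × stature + 0.66 × cubit and reads its limits as 25.1 and 25.92; the second characteristic is omitted because its formula is not legible in this copy. The function names are illustrative.

```python
def first_characteristic(stature_in: float, cubit_in: float) -> float:
    """First independent characteristic, both measurements in inches."""
    return 0.2 * stature_in + 0.66 * cubit_in


def degree(value: float, lower: float, upper: float) -> str:
    """Small below the lower limit, large above the higher, medium between."""
    if value < lower:
        return "small"
    if value > upper:
        return "large"
    return "medium"


FIRST_LIMITS = (25.1, 25.92)  # limits as read from the text

john = first_characteristic(60, 18)
peter = first_characteristic(70, 18)
print(round(john, 2), degree(john, *FIRST_LIMITS))    # 23.88 small
print(round(peter, 2), degree(peter, *FIRST_LIMITS))  # 25.88 medium
```

Both results agree with the degrees assigned to John and Peter in the text for the first characteristic.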

17 See Appendix, Note III.

This principle may be extended to any number of attributes. There are always as many independent characteristics as attributes. I have roughly worked out the characteristics for three attributes in a concrete case; that of stature, cubit, and knee height. The characteristics are approximately

[Formulae for characteristics I, II and III illegible in source.]

And for the limits (each effecting a tripartite division of the observations, and all together a ninefold division)

Limits of characteristic I ................ 27.6 and 28.6 (inches)
Limits of characteristic II ............... 0.66 and 0.94
Limits of characteristic III .............. −4.12 and −3.90

For instance, let John Doe have stature and cubit as before, and let his knee height be 21 inches (a common observation, 20.18 being the mean).

[Values of characteristics I, II and III for John Doe illegible in source.]

Comparing these measurements with the limits before given we see that John Doe is

[Degrees of John Doe under characteristics I, II and III illegible in source.]

Perhaps the closeness of the limits for characteristics other than the first may excite suspicion. For the purpose of criminal identification, the compartments, it is said, must be wide. The natural change of the body, and perhaps deliberate attempts at disguise, cause considerable "probable error" in the measurement of each individual, so that the boundary between the compartments becomes indefinite. I find, however, both à priori from the mathematics of the subject, and à posteriori in one concrete case (that of cubit and stature), that the proposed method is not more precarious in this respect than the accredited one.

The only disadvantage of the method is that it is more troublesome. I do not conceal the magnitude of this disadvantage, but I think it worth while putting the theory on record, on the chance that others may be able to make some use of it.

APPENDIX.

I. Professor Pareto's second approximation for the number of incomes at or above a certain amount is

$$N = \frac{A}{(x + a)^{\alpha}}\, 10^{-\beta x},$$

where a and β are additional constants. The differential (with respect to x) of this function is to be compared with functions representing the number at a certain amount. When β is small the differential does not differ much in form from the original, which is identical in form with Professor Pearson's curve for representing the number (of house-valuations) at a certain amount.
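The relation between the number at or above an amount and the number at that amount may be sketched numerically. The formula is read here as N = A (x + a)^(−α) 10^(−βx); the constants in the usage lines are hypothetical and chosen only to exercise the functions.

```python
from math import log


def incomes_at_or_above(x: float, A: float, a: float, alpha: float, beta: float) -> float:
    """Second approximation, read as N = A / (x + a)**alpha * 10**(-beta * x)."""
    return A / (x + a) ** alpha * 10 ** (-beta * x)


def incomes_at(x: float, A: float, a: float, alpha: float, beta: float) -> float:
    """Number at a certain amount: the differential -dN/dx, taken analytically."""
    n = incomes_at_or_above(x, A, a, alpha, beta)
    return n * (alpha / (x + a) + beta * log(10))


# Hypothetical constants for illustration only.
params = dict(A=1000.0, a=5.0, alpha=1.5, beta=0.001)
h = 1e-5
numeric = -(incomes_at_or_above(20 + h, **params)
            - incomes_at_or_above(20 - h, **params)) / (2 * h)
print(incomes_at(20, **params), numeric)
```

The factor α/(x + a) + β ln 10 shows why, when β is small, the differential keeps nearly the same form as the original: it is then close to the original multiplied by α/(x + a).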

II. A curve suited to represent the number of houses of a certain value, or incomes of a certain amount, and similar statistics, may be obtained on the lines indicated in the Appendix to my paper in the Journal of the Royal Statistical Society for September, 1885. This curve represents in a particular case the frequency of magnitudes each of which is a function of one of a set of magnitudes which obey the normal law of error. In general, as pointed out elsewhere (Philosophical Magazine, November, 1882), a curve thus derived from a normal curve would be itself normal. For example, if human statures conform to the normal law, the squares of human statures will also conform. This is due to the fact that the range of fluctuation, as measured by the modulus, which is about 3.7 inches, is small compared with the average, which is, say, 67 inches. It is otherwise where this relation does not hold. In the case of temperatures, for example, assuming that this phenomenon obeys the normal law, the squares (and other functions) of temperatures forming a normal group will themselves form such a group if the temperatures are measured from absolute zero, the average being such a one as is ordinarily experienced, say Fahrenheit zero. But if the same temperatures are measured from that zero their squares will no longer form a normal curve, but one of which the equation is

$$y = \frac{1}{c\sqrt{\pi x}}\, e^{-x/c^{2}},$$

a curve which continually trends downwards to the right of the origin, at which the ordinate is infinite. There is not the same objection to an infinite ordinate that there is to the infinite integral involved in Professor Pareto's first approximation (above, p. 533); and the curve is I think in this respect not ill adapted to the statistics under consideration. A rough representation of the distribution of house values given by Professor Pearson in his Contributions No. II (Transactions of the Royal Society, 1895, p. 398) has been obtained by me by putting c = 1.75 in the above written curve (10l. being taken as the unit of the abscissa). It is indeed very rough, with an error, as measured by a method which Professor Pearson has employed, of some 20 per cent. on the area given by observation, which ought to be conterminous with the curve.
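The curve of squares may be checked against a simulation. The sketch assumes that the original magnitudes follow the normal law centred at zero with modulus c (standard deviation c/√2), so that the curve of their squares is y = e^(−x/c²) / (c√(πx)), whose area up to x is erf(√x / c). The value of c in the usage lines and the function names are illustrative assumptions.

```python
import random
from math import erf, exp, pi, sqrt


def square_density(x: float, c: float) -> float:
    """Ordinate at x of the curve of squares of normal magnitudes with modulus c."""
    return exp(-x / c ** 2) / (c * sqrt(pi * x))


def square_cdf(x: float, c: float) -> float:
    """Area of the curve between the origin and x."""
    return erf(sqrt(x) / c)


def simulated_share_below(x: float, c: float, n: int = 100_000, seed: int = 0) -> float:
    """Share of simulated squares falling below x."""
    rng = random.Random(seed)
    sigma = c / sqrt(2)  # assumption: modulus = sqrt(2) * standard deviation
    return sum(rng.gauss(0, sigma) ** 2 < x for _ in range(n)) / n


c = 1.75  # illustrative value
print(square_cdf(1.0, c), simulated_share_below(1.0, c))
print([round(square_density(x, c), 4) for x in (0.25, 0.5, 1.0, 2.0, 4.0)])
```

The ordinates printed in the last line fall steadily as x increases, and the area remains finite although the ordinate at the origin is infinite.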

But then it is only a first approximation. To obtain a second approximation let us suppose our curve to be generated by taking, not the square, but some other simple function of magnitudes obeying the normal law,

$$y = \frac{1}{\sqrt{\pi}\, c}\, e^{-x^{2}/c^{2}}.$$

Whereas in the first approximation we transformed this curve by putting $\xi = x^{2}$, let us now put $\xi = (x + a)^{2}$, where a is a new constant. According to the rule for the transformation of frequency-curves, we shall obtain a fairly simple curve, which continually trends downwards to the right of the origin (at which the ordinate is infinite) if $a < \sqrt{2} \times c$. The equation of the curve is

$$y = \frac{1}{2c\sqrt{\pi x}} \left[ e^{-(\sqrt{x} - a)^{2}/c^{2}} + e^{-(\sqrt{x} + a)^{2}/c^{2}} \right],$$

x being substituted for ξ after the transformation has been effected.
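The behaviour of the transformed curve may be examined numerically. The sketch assumes the normal law centred at zero with modulus c and the transformation ξ = (x + a)², which gives, by the usual rule for transforming frequency-curves, the ordinate [e^(−(√ξ − a)²/c²) + e^(−(√ξ + a)²/c²)] / (2c√(πξ)). The grid, its range and the constants tried are illustrative assumptions, and a check on a grid is not a proof.

```python
from math import exp, pi, sqrt


def shifted_square_density(x: float, a: float, c: float) -> float:
    """Ordinate of the curve obtained by putting xi = (x + a)**2 in the normal curve."""
    root = sqrt(x)
    return (exp(-((root - a) / c) ** 2)
            + exp(-((root + a) / c) ** 2)) / (2 * c * sqrt(pi * x))


def trends_downwards(a: float, c: float, points: int = 2000) -> bool:
    """True if the ordinate falls at every step of a grid to the right of the origin."""
    x_max = (a + 4 * c) ** 2  # far enough out to cover the bulk of the curve
    xs = [x_max * (i + 1) / points for i in range(points)]
    ys = [shifted_square_density(x, a, c) for x in xs]
    return all(later < earlier for earlier, later in zip(ys, ys[1:]))


print(trends_downwards(1.0, 1.0))  # a below sqrt(2) * c
print(trends_downwards(3.0, 1.0))  # a well above sqrt(2) * c
```

When a = 0 the function reduces to the curve of the first approximation, which offers a check on the form assumed.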
