
refer to a case of real discontinuity, we may simplify the work by putting c = 1 immediately, and only reckoning the values of the three remaining constants. The third row of figures headed "c known" refers to this method.

The series given are, I think, sufficient to show that even with a small number of observations (600-700) moderately close results may be obtained. It must be noted that we need not necessarily expect the experimental figures to agree closely with the simple à priori theory here applied. If a die be badly cut, the chance of any one face coming uppermost is not 1/6; if a handful of dice be all badly cut the chance may in fact be different for each one, and the results obtained by tossing them will not be represented by a point-binomial at all. I think this is the reason for the erratic character of the figures in the last two sets, for certainly the dice used for them were very far from true cubes.
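The effect of unequal dice can be illustrated numerically. The sketch below is only an illustration under assumed figures: the six unequal chances are hypothetical values chosen to average 1/6, and the function names are not from the text. It computes the exact distribution of the number of sixes in a handful of six dice, once for true dice and once for dice whose chances differ, and shows that the second distribution has a smaller variance than the point-binomial with the same mean.

```python
def count_distribution(chances):
    """Exact distribution of the number of sixes in one throw of a handful,
    where die i shows a six with probability chances[i]."""
    dist = [1.0]  # with no dice thrown, zero sixes is certain
    for p in chances:
        new = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            new[k] += prob * (1 - p)      # this die is not a six
            new[k + 1] += prob * p        # this die is a six
        dist = new
    return dist


def variance(dist):
    """Variance of a distribution given as probabilities of 0, 1, 2, ..."""
    mean = sum(k * q for k, q in enumerate(dist))
    return sum((k - mean) ** 2 * q for k, q in enumerate(dist))


# True dice: every chance is 1/6, so the result is a point-binomial.
true_dice = count_distribution([1 / 6] * 6)

# Hypothetical badly cut dice: chances differ but still average 1/6.
bad_dice = count_distribution([0.10, 0.12, 0.15, 0.18, 0.20, 0.25])

print("variance, true dice:", round(variance(true_dice), 4))
print("variance, bad dice: ", round(variance(bad_dice), 4))
```

With equal means, the unequal handful gives the smaller variance, so no single point-binomial with that mean and that number of dice reproduces it.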

It should be remarked that to compare several sets of observations, the same method of reduction should be used for all: different methods lead to slightly divergent results.

Sec. 5. Continuous and Discontinuous Variation.
Frequency-Curves.

All these cases of artificial chance have one limitation in common: they can only give rise to discontinuous frequency polygons. In tossing a handful of dice, for instance, we can only throw a whole number of sixes, say four or five, but not any intermediate number. Step-by-step variation like this occurs of course in several cases of "natural" statistics, e.g., in the number of petals on flowers, in the number of children in families, and so on, though there is no à priori reason for supposing that the frequency will in this case be represented by a binomial series.

On the other hand, however, we have numerous cases in which there is no obvious unit of variation; the variation is apparently continuous, e.g., in the case of measurements on men or animals. We now want a curve such that its area between any two verticals will represent the frequency of values of the variable between corresponding limits for the theoretical representation of our statistics; a discontinuous polygon no longer suffices. In practice of course we have still to go by arbitrary units; we do not count, for example, how many men there are of one exact height, but how many there are with heights between 5 ft. 1 in. and 5 ft. 2 in., how many between 5 ft. 2 in. and 5 ft. 3 in., and so on.

Normal Curve.

Experience shows that such frequency-curves take a very great variety of forms, ranging from approximate symmetry through all degrees of skewness, to distributions with a maximum at one end of the range, or maxima at both ends and a minimum near the centre. Until the publication of Professor Pearson's memoir the only curve which had been shown to be at all generally applicable was Laplace's or the normal curve

$$y = y_0 \, e^{-x^2 / 2\sigma^2} \qquad (1)$$

y0 being the ordinate at the origin and σ a constant. This curve has been considerably employed, notably by Quetelet and by Mr. Francis Galton, for representing various statistics, and its figure is now fairly well known. The curve is symmetrical and tails off more or less rapidly in both directions, according to the value of σ, asymptoting to the base. Only one way of deducing it immediately concerns us: it may be considered as a limit to the form of the symmetrical binomial

$$\left(\tfrac{1}{2} + \tfrac{1}{2}\right)^n$$

for all values of n. That is to say, we may consider a continuous variation curve to be generated by rendering possible some form of interpolation between the absolute units of the point-binomial. The question thus suggested itself to Professor Pearson: would a curve having the same relation to the skew binomial as the normal curve has to the symmetrical binomial prove equally useful in statistics? Was it not probable, in fact, that being more general it would be more useful?
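The approach of the symmetrical binomial to the normal curve can be checked numerically. The following is a minimal sketch, not part of the original argument: it assumes the normal curve is given the same mean (n/2) and standard deviation (half the square root of n) as the binomial, and the function names are illustrative only.

```python
import math


def binomial_ordinate(n, k, p=0.5):
    """Ordinate of the point-binomial: chance of k successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def normal_ordinate(x, mean, sd):
    """Ordinate of the normal curve with unit area."""
    return math.exp(-((x - mean) ** 2) / (2 * sd**2)) / (sd * math.sqrt(2 * math.pi))


def max_gap(n):
    """Largest absolute difference between the symmetrical binomial
    ordinates and the normal curve with the same mean and spread."""
    mean, sd = n / 2, math.sqrt(n) / 2
    return max(
        abs(binomial_ordinate(n, k) - normal_ordinate(k, mean, sd))
        for k in range(n + 1)
    )


for n in (4, 16, 64, 256):
    print(n, max_gap(n))
```

The printed gap shrinks as n grows, which is the sense in which the normal curve is a limit to the polygon.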

Generalised Binomial Curve.

The curve thus obtained (Professor Pearson's Type III) is of the form²

$$y = y_0 \left(1 + \frac{x}{a}\right)^{\gamma a} e^{-\gamma x} \qquad (2)$$

It is a curve limited in range at one end (at a distance (-a) from the maximum ordinate) and tailing off to infinity at the other. At the limited end the curve may either be tangential to the base, or may rise sharply at any angle with it. Both of these types occur in actual statistics, the first for example in curves showing the frequency of various heights of the barometer, the second in frequency diagrams of divorce with duration of marriage, of mortality with age, at certain periods, and so on. For special values of the constants the maximum ordinate of the curve may be at the limit of the range. Evidently this is a great advance in generality on the symmetrical normal curve.


2 This curve was obtained earlier by De Forrest, vide "Nature," vol. lii, p. 317 (1895). ("The Analyst," vols. vi, ix, and x.)

3 Various examples will be found in Professor Pearson's paper loc. cit.

Hypergeometrical Series and Derived Curves.

But we need not stop at this point in the analogies with artificial chance. Let us return to one of our examples of the case of a point-binomial, Quetelet's drawings of balls from a bag. To get this point-binomial each single ball had to be returned when drawn and recorded, then another ball drawn and so on; the records only being grouped into clusters of six afterwards. But suppose all six balls had been drawn at once, what then? Evidently a point-binomial would no longer serve to describe the results. There are now four constants instead of three: the chance of drawing the first single ball p, the number N in the bag, the number n drawn out at once, and the distance c between the ordinates. These data lead to a more general form of polygon, that may be called, after the series to which it corresponds, a "point-hypergeometrical." Using the same geometrical relation that holds between the normal curve and the symmetrical binomial, Professor Pearson derived from this two new curves, the first (his Type IV):
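The difference between the two ways of drawing can be made concrete. The sketch below is illustrative only: the bag of 20 balls, half of them white, is a hypothetical example and not Quetelet's actual bag, and the function names are assumptions. It compares the chance of each number of white balls in a cluster of six when balls are returned one at a time (point-binomial) and when all six are drawn at once (point-hypergeometrical).

```python
import math


def binomial_ordinate(n, k, p):
    """Chance of k white balls in n draws when each ball is returned."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def hypergeometric_ordinate(N, white, n, k):
    """Chance of k white balls when n are drawn at once from a bag of N
    balls, of which `white` are white."""
    return math.comb(white, k) * math.comb(N - white, n - k) / math.comb(N, n)


# Hypothetical bag: N = 20 balls, 10 white, clusters of n = 6.
N, white, n = 20, 10, 6
p = white / N  # chance of drawing a white ball first

with_return = [binomial_ordinate(n, k, p) for k in range(n + 1)]
all_at_once = [hypergeometric_ordinate(N, white, n, k) for k in range(n + 1)]

for k in range(n + 1):
    print(k, round(with_return[k], 4), round(all_at_once[k], 4))
```

For a small bag the two polygons differ visibly, the hypergeometrical being more concentrated about its centre; as N grows with p fixed, the hypergeometrical ordinates approach the binomial ones.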

$$y = y_0 \left(1 + \frac{x^2}{a^2}\right)^{-m} e^{-\nu \tan^{-1}(x/a)} \qquad (3)$$

and the second (his Type I):

$$y = y_0 \left(1 + \frac{x}{a_1}\right)^{m_1} \left(1 - \frac{x}{a_2}\right)^{m_2} \qquad (4)$$

The first or second form results from a given hypergeometrical according as the number of balls withdrawn lies within certain limits or no; both are of great generality, and include Form (2) and the normal curve as special cases. Form (3) is of unlimited range in both directions, Form (4) limited in both directions; both curves are, in general, skew. The justification of all these curves, derived on the hypothesis of possible fractionising of the discontinuous steps of the polygon, lies in experience; all are found to give very close representations of frequency polygons in many classes of statistics.

Curve (4) is the form which we shall find occurs uniformly in the distributions dealt with in the sequel, so it is as well to note the chief relations of its shape to the values of its constants.

The value x = 0 corresponds to the maximum ordinate of the curve. The curve extends to a distance a1 to the left, and a2 to the right of the maximum ordinate.

If the curve is to be tangential to the base at both extremities, m1 and m2 must both be positive and greater than unity.

If a1 = a2 or m1 = m2 the curve becomes symmetrical (but not normal).

The ordinates will generally be vanishingly small for some distance before the curve comes to the limit of its range in either direction. The smaller the ratio m1/a1, the further do the ordinates remain sensible towards either limit.
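These relations can be explored with a short numerical sketch. It rests on an assumption that should be stated plainly: the limited-range curve is taken in the usual Pearson Type I form, y = y0 (1 + x/a1)^m1 (1 - x/a2)^m2 with m1/a1 = m2/a2, which agrees with the properties listed above but is not printed legibly in this copy. The constants used are hypothetical.

```python
def limited_range_curve(x, a1, a2, m1, m2, y0=1.0):
    """Ordinate of the assumed Type I form, measured from the maximum
    ordinate at x = 0; zero outside the range (-a1, a2)."""
    if x <= -a1 or x >= a2:
        return 0.0
    return y0 * (1 + x / a1) ** m1 * (1 - x / a2) ** m2


# Hypothetical skew curve: m1/a1 = m2/a2 = 1.5, both m's greater than unity.
a1, a2, m1, m2 = 2.0, 6.0, 3.0, 9.0

# Evenly spaced abscissae covering the whole range from -a1 to a2.
xs = [-a1 + (a1 + a2) * i / 800 for i in range(801)]
ys = [limited_range_curve(x, a1, a2, m1, m2) for x in xs]

peak_x = xs[ys.index(max(ys))]
print("maximum ordinate at x =", peak_x)
print("ordinates at the two limits:", ys[0], ys[-1])

# Symmetrical case: a1 = a2 and m1 = m2.
left = limited_range_curve(-1.0, 4.0, 4.0, 5.0, 5.0)
right = limited_range_curve(1.0, 4.0, 4.0, 5.0, 5.0)
print("symmetrical case:", left, right)
```

Under the assumed form the maximum falls at x = 0 because m1/a1 = m2/a2, and the ordinates vanish at both limits.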

It may be said that too much weight cannot be attached to the value of the range, for all these curves, normal included, are only limits to polygons which themselves never extend to infinity, although the curves do. We can, however, often check the range in practice. There may be some obvious physical limit to our frequency-curve in one direction, and we can see how close the range gives this. The range may thus justify itself: if we find that it gives one known limit reliably, why not accept the other limit as equally valid? There may be really a limited range in types (1) and (3), but there is no means of getting it; this does not seem to be any reason for refusing to accept the limits where they are determinate.

Sec. 6. Criterion.

As I stated in briefly describing the method of fitting a point-binomial to a series of observations (sec. 4), the same method remains available for fitting any type of frequency-curve. But before we can proceed to actual fitting we require some criterion to tell us which of the forms (1)-(4) to use. This criterion is afforded very simply by the moment-coefficients, μ2, μ3, &c. Let

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3}, \qquad \beta_2 = \frac{\mu_4}{\mu_2^2}, \qquad \kappa = 6 + 3\beta_1 - 2\beta_2.$$

Then if

(1) κ = 0, β1 = 0

(approximately) for the observations, a normal curve may be used. If

(2) κ = 0, β1 not zero

the distribution is of the form given in equation (2) (generalised binomial type).

If

(3) κ is negative, whatever the value of β1,

the distribution is of the trigonometrical type, equation (3); and finally, if

(4) κ is positive, whatever the value of β1,

the distribution is of the limited range type, equation (4).

The number κ we will call "the criterion," as it tells us to which type of distribution any frequency polygon belongs. In the first and second cases the relation κ = 0 will never, of course, be exactly satisfied; it may be read "κ must be small compared with its own probable error."

It should further be noticed that for the normal curve β2 = 3; for the symmetrical form of the trigonometrical curve β2 is greater than 3; while for the symmetrical form of the limited range curve β2 is less than 3. These relations come at once by putting β1 = 0 in the criterion.

In fitting any statistical frequency polygon to a theoretical curve, the moment-coefficients must be reckoned, the β's and the criterion determined, before any further steps can be taken. One of the most curious points that has presented itself in handling statistics by these generalised curves is the apparent constancy of type, i.e., of the sign of the criterion, in certain classes of material.
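The reckoning of the criterion may be sketched as follows. Two assumptions are made and should be read as such: the definitions β1 = μ3²/μ2³, β2 = μ4/μ2² and κ = 6 + 3β1 - 2β2 are taken because they agree with the sign rules and with the relation β2 = 3 for the normal curve stated above; and the tolerance `tol`, which stands in for "small compared with its own probable error," is a hypothetical fixed value, since the probable error is not given here. The moments are raw moments about the mean, with no grouping corrections.

```python
def central_moments(values, frequencies):
    """Second, third and fourth moment-coefficients about the mean."""
    total = sum(frequencies)
    mean = sum(x * f for x, f in zip(values, frequencies)) / total

    def mu(r):
        return sum(f * (x - mean) ** r for x, f in zip(values, frequencies)) / total

    return mu(2), mu(3), mu(4)


def criterion(values, frequencies):
    """Return (beta1, beta2, kappa) under the assumed definitions."""
    mu2, mu3, mu4 = central_moments(values, frequencies)
    beta1 = mu3**2 / mu2**3
    beta2 = mu4 / mu2**2
    kappa = 6 + 3 * beta1 - 2 * beta2
    return beta1, beta2, kappa


def curve_type(beta1, kappa, tol=0.05):
    """Pick one of the forms (1)-(4); `tol` is a hypothetical stand-in
    for the probable error of the criterion."""
    if abs(kappa) <= tol:
        return "normal (1)" if beta1 <= tol else "generalised binomial (2)"
    return "trigonometrical (3)" if kappa < 0 else "limited range (4)"


# Symmetrical point-binomial with n = 4: frequencies 1, 4, 6, 4, 1.
b1, b2, k = criterion([0, 1, 2, 3, 4], [1, 4, 6, 4, 1])
print(b1, b2, k, curve_type(b1, k))
```

For this small symmetrical binomial β2 is 2.5, so κ is positive and the polygon falls under the limited range type, as the relation β2 less than 3 would lead one to expect.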

Sec. 7. Mode, Median, and Mean.

Let us now pass on and consider certain terms that can be defined generally without reference to any special type of distribution.

Let C (Fig. 2) be the centroid or centre of gravity of any frequency-curve. The vertical through C meets the axis of measurement in a point termed the mean or average value of the variable. We are using this term in its ordinary sense; it scarcely requires further explanation.

[Fig. 2]

Let dd be a vertical dividing the area between the curve and axis into two equal portions. The point in which dd meets the axis is termed the median. Values of the variable greater and less than the median occur with equal frequency.
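For grouped observations the mean and median may be reckoned as in the sketch below. It is an illustration under stated assumptions: the heights and frequencies are hypothetical, each class is represented by its mid-value in the mean, and the median is found by the common device of linear interpolation within the class that contains the half-way point, which is not a method prescribed in the text.

```python
def grouped_mean(lower_edges, width, frequencies):
    """Mean of grouped data, each class represented by its mid-value."""
    total = sum(frequencies)
    return sum((lo + width / 2) * f for lo, f in zip(lower_edges, frequencies)) / total


def grouped_median(lower_edges, width, frequencies):
    """Value dividing the total frequency into two equal portions, by
    linear interpolation inside the class containing the half-way point."""
    half = sum(frequencies) / 2
    running = 0
    for lo, f in zip(lower_edges, frequencies):
        if f > 0 and running + f >= half:
            return lo + width * (half - running) / f
        running += f
    raise ValueError("no observations")


# Hypothetical heights in inches, classes one inch wide starting at 61 in.
edges = [61, 62, 63, 64, 65]
counts = [2, 5, 8, 4, 1]

mean = grouped_mean(edges, 1, counts)
median = grouped_median(edges, 1, counts)
print("mean:", mean, "median:", median)
```

In this hypothetical series the median lies slightly above the mean, as may happen whenever the distribution is not symmetrical.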

Finally let the point in which the maximum ordinate cuts the
