### Nearest Neighbour Analysis

I have in mind here the comparison of the dispersal of two or more names in a modern context. For example:

- comparing the spread of Chinese and Indian names in a city suburb;
- comparing the distribution of ‘Catholic’ and ‘Protestant’ surnames in Belfast between the 1970s and now.

NNA is a measure of distribution, not of ‘pattern’, and it does have limitations. The more points there are, the more reliable the result; 30 surname plots would be considered the minimum.

###### Procedure

- Measure the distance from each surname point to its nearest neighbouring point, and average the results ($D_{obs}$).
- Derive the density of the points ($d$) by dividing the number of surname points by the area under consideration (e.g. ward, parish, registration district, county).
- Calculate the expected mean nearest-neighbour distance for a random distribution of points ($r_{e}$) over this area:

  $$r_{e} = \frac{1}{2\sqrt{d}}$$

- The nearest neighbour statistic can then be obtained by dividing the observed mean distance by the expected mean distance of a random distribution:

  $$R_{n} = \frac{D_{obs}}{r_{e}}$$

The value of the nearest neighbour statistic ($R_{n}$) can range from 0 (extremely clustered) to 2.15 (an ordered and uniform distribution). A value of 1 would suggest a random distribution.

It is now up to the surname analyst to explain the resulting distribution.
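As a rough illustration of the procedure, here is a minimal Python sketch. The function name `nearest_neighbour_statistic`, the coordinates and the study area are all hypothetical, and the code assumes the surname points have already been turned into planar (x, y) coordinates in the same units as the area.

```python
import math


def nearest_neighbour_statistic(points, area):
    """Return (d_obs, r_e, r_n) for (x, y) points inside a study area.

    Coordinates and area must share units (e.g. km and km^2).
    """
    n = len(points)
    if n < 2:
        raise ValueError("at least two points are needed")
    if area <= 0:
        raise ValueError("area must be positive")

    # Step 1: distance from each point to its nearest neighbour, then the mean.
    nearest = []
    for i, (x1, y1) in enumerate(points):
        nearest.append(
            min(
                math.hypot(x1 - x2, y1 - y2)
                for j, (x2, y2) in enumerate(points)
                if j != i
            )
        )
    d_obs = sum(nearest) / n

    # Step 2: density of points over the study area.
    density = n / area

    # Step 3: expected mean distance under a random distribution.
    r_e = 1 / (2 * math.sqrt(density))

    # Step 4: the nearest neighbour statistic.
    return d_obs, r_e, d_obs / r_e


# Hypothetical data: four households on the corners of a 1 km square,
# in a study area of 4 km^2. This is far fewer than the 30 points
# suggested as a minimum, so it only illustrates the arithmetic.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
d_obs, r_e, r_n = nearest_neighbour_statistic(points, area=4.0)
print(f"D_obs={d_obs:.2f}  r_e={r_e:.2f}  R_n={r_n:.2f}")
```

With these made-up points the observed mean distance is 1 km, the expected random mean is 0.5 km, and the statistic is 2, towards the ordered end of the scale.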

###### Be aware that the above equations are based on two assumptions:

- The points are located within an infinite area
- The points are free to locate anywhere within that area

These are severe restrictions in the case of surname study, as quite a few factors constrain where people actually live: propinquity of kin, economic conditions, lines of transport, geomorphology. And if the area under study changes, the density of the points changes with it, and so does the resulting statistic.

“Because of this problem of study area delimitation, one should be very wary indeed of comparisons made between nearest neighbour analysis results from different areas.”

But

“The technique provides a very useful descriptive measure of point patterns, particularly for quantifying the increase or decrease in dispersion or clustering of a pattern through time, providing the definition of the study area remains the same.”

David Ebdon, *Statistics in Geography*, 2nd ed., Blackwell, 1985, pp. 148-9.
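The sensitivity to study area delimitation can be seen with a small Python sketch. The helper `r_n` and the coordinates are hypothetical; the same four points are simply assessed against two different boundaries.

```python
import math


def r_n(points, area):
    """Nearest neighbour statistic for (x, y) points in a study area."""
    n = len(points)
    # Mean distance from each point to its nearest neighbour.
    d_obs = sum(
        min(
            math.hypot(x1 - x2, y1 - y2)
            for j, (x2, y2) in enumerate(points)
            if j != i
        )
        for i, (x1, y1) in enumerate(points)
    ) / n
    # Expected mean distance for a random pattern at this density.
    r_e = 1 / (2 * math.sqrt(n / area))
    return d_obs / r_e


# Hypothetical points (km); only the study area differs between the two runs.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
small_area = r_n(points, area=4.0)   # boundary drawn tightly
large_area = r_n(points, area=16.0)  # boundary drawn generously
print(f"R_n in 4 km^2: {small_area:.2f}, R_n in 16 km^2: {large_area:.2f}")
```

The points have not moved, yet the statistic falls from 2 (apparently ordered) to 1 (apparently random) purely because the boundary was drawn more widely.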

So this technique may perhaps be useful for comparing the temporal change in a surname distribution within a specified area, such as a parish or registration district, provided its boundaries have not changed in the interim. It may also serve for comparing the distribution of two surnames in the same area, or just perhaps for describing the distribution of a widely dispersed surname, using the area of England, Scotland, or Wales as a baseline. But two surnames with the same index number do not necessarily have the same distribution, as it is “possible for arrangement of points which are very dissimilar to have identical mean nearest-neighbour distances.”
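The point made in that closing quotation can be checked with a short Python sketch. The helper `mean_nn_distance` and both sets of coordinates are hypothetical: one arrangement is a compact square, the other a straight line, yet every point in each has its nearest neighbour exactly one unit away.

```python
import math


def mean_nn_distance(points):
    """Mean distance from each (x, y) point to its nearest neighbour."""
    return sum(
        min(
            math.hypot(x1 - x2, y1 - y2)
            for j, (x2, y2) in enumerate(points)
            if j != i
        )
        for i, (x1, y1) in enumerate(points)
    ) / len(points)


# Two visibly different hypothetical arrangements of four points.
square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
line = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]

square_mean = mean_nn_distance(square)
line_mean = mean_nn_distance(line)
print(square_mean, line_mean)
```

Both means come out equal, so within the same study area the two arrangements would receive the same index number despite looking quite different on a map.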