
Guild of One-Name Studies

One-name studies, Genealogy


Macro scale

Investigating surname distribution and frequency – the macro scale

The distribution of the leading British surnames in 1881:

[Nine maps, one per surname: SMITH, JONES, WILLIAMS, TAYLOR, BROWN, DAVIES, EVANS, THOMAS, WILSON]

(Maps created using Surname Atlas © Archer Software.)

These maps illustrate the fact that even leading names are not evenly distributed. Each has its own signature. These individual distribution patterns are still detectable in the 21st century.

  • Place these names into categories, i.e. Patronymics, Occupational, Locatives, Topographicals, Nicknames. The main category – in this instance Patronymic – can be subdivided into Genitival (Jones, Williams, Evans, Davies) or straight (Thomas).
  • How many of these names are of Welsh origin?
  • Why do Welsh names predominate? Consider also the relative population sizes of England and Wales. Compare with leading Scottish and Irish names.
  • What percentage of the top 9 names are Welsh? (For the late 20th century, about 50% by number of bearers) – a living example of the contribution of Wales to the socio-cultural complex that is Britain.

  • The overall frequency curve for all names begins to flatten out into a very long tail. The UK as a whole has very, very many names with only a few bearers. Typically, these rare names are locative.

    The graph starts near the 0; 0 point, rapidly rises to about 90; 10 and then slowly rises to 100; 100. It is difficult to understand what is going on in this graphic presentation, as all the activity seems to take place for low values of “Percentage of surnames”.

    Re-displaying the data on a semi-logarithmic scale is more revealing.


    (Source: © Ken Tucker, Carleton University)

    Now one can see that the most popular 1% of all names accommodate over 70% of the population, and that 90% of the surnames (those from 10% to 100%, the rare surname types) accommodate a mere 9% of the population. The distribution of surnames is thus highly skewed.

    (Actually, the above 2 graphs are for contemporary US names – thanks, Ken – but the slopes would be very similar for the UK. Canadian surnames are similar, suggesting that the shape of the curve is not peculiar to the USA but is intrinsic at least to English-language surname distributions. Source: Ken Tucker)
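As a rough sketch of how a cumulative curve of this kind is built: sort the surname counts from most to least common, then accumulate the share of bearers covered as each name is added. The counts below are invented toy data, not Ken Tucker's figures, and `coverage_curve` is a hypothetical helper name.

```python
from itertools import accumulate

def coverage_curve(counts):
    """Return (pct_of_surnames, pct_of_population) points, leading names first."""
    ordered = sorted(counts, reverse=True)
    total = sum(ordered)
    n = len(ordered)
    return [
        (100 * (i + 1) / n, 100 * running / total)
        for i, running in enumerate(accumulate(ordered))
    ]

# Toy data (assumption): one very common name, a few mid-range names, many rare ones.
toy_counts = [5000, 1200, 800] + [50] * 17 + [5] * 80

curve = coverage_curve(toy_counts)
for pct_names, pct_pop in curve:
    if pct_names in (1, 10, 50, 100):
        print(f"top {pct_names:.0f}% of surnames cover {pct_pop:.1f}% of bearers")
```

Plotting the first value on a logarithmic x axis against the second on a linear y axis gives the semi-logarithmic view described above.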

  • For England and Wales, the top 300 surnames encompass 36% of the population (and if England and Wales have 0.5 million surnames(?), then the top 5,000 surnames, as 1% of names, should cover about 70% of the population).
  • An analysis of the NHS Central Register (England and Wales) found that 965 surnames covered about 50% of the population of England/Wales/IOM, with the following frequency distribution:
    Population (cumulative)   Surnames
    10%                       24
    20%                       84
    30%                       213
    40%                       460
    50%                       954
    60%                       1,908
    70%                       3,912
    80%                       10,214
    90%                       100,000
    100%                      1,071,603
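Differencing the cumulative column shows how many further surnames each successive 10% of the population brings in. The sketch below simply copies the figures from the table; `surnames_added_per_band` is a hypothetical helper name, not part of any published analysis.

```python
# Cumulative figures copied from the table above: (share of population %, surnames).
cumulative = [
    (10, 24), (20, 84), (30, 213), (40, 460), (50, 954),
    (60, 1_908), (70, 3_912), (80, 10_214), (90, 100_000), (100, 1_071_603),
]

def surnames_added_per_band(rows):
    """Difference the cumulative counts to get the surnames added by each 10% band."""
    bands = []
    previous = 0
    for share, total in rows:
        bands.append((share - 10, share, total - previous))
        previous = total
    return bands

bands = surnames_added_per_band(cumulative)
for low, high, added in bands:
    print(f"{low:>3}% to {high:>3}% of population: {added:>9,} further surnames")
```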

  • Notice the long tail that forms after the 1,000th surname. Whereas the first 10% of the population covers 24 names, the band from 80% to 90% adds about 90,000 and the last 10% adds over 970,000.
  • This is not to say that there are 1 million names in England and Wales. The NHS Central Register was not built for this purpose, and is subject to list inflation. Besides, the national population is in constant flux: new names arrive (or are created through hyphenation), and rare names disappear through emigration or death. One can never give a definitive figure, merely an indication. To be very cautious, I would say that the size of the [UK] surname pool is probably in the range of 0.75-1.25 million names, although a recent unpublished study would suggest a lower range.
  • Cumulative frequencies for Scotland, from 3 historic surveys:


    (Surnames prefixed with “Mac” or “Mc” were counted as one.)

  • For the Victorian sample, the top 50 names accounted for 29.65% of the sample; for 1935, 26% of the sample; for 1958, 25.53% of the sample. Point to note: the Victorian sample size was less than a quarter of the later surveys.
  • For the UK, the top 100 surnames cover 20% of the population.
  • This is an exercise that can be repeated using the ranking of top names on this site and the names on the GRO(S) site.
  • For any large database of surnames, the frequency/ranking complies with Zipf’s law, i.e. a name’s frequency is roughly inversely proportional to a power of its rank.
  • If the data is plotted on a log-log scale, the result approximates a straight line, which is the signature of a power law.
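One simple way to check for a power law is to fit a straight line to log(frequency) against log(rank). This is a minimal sketch using ordinary least squares; the input is synthetic (an idealised curve, not real surname counts) and `loglog_slope` is a hypothetical helper name. Real surname data will only approximate a straight line, especially in the tail.

```python
import math

def loglog_slope(frequencies):
    """Least-squares slope and intercept of log(frequency) against log(rank)."""
    ordered = sorted(frequencies, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ordered) + 1)]
    ys = [math.log(freq) for freq in ordered]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Synthetic counts (assumption): frequency = 400000 / rank, an idealised Zipf curve.
synthetic = [400_000 / rank for rank in range(1, 1001)]
slope, intercept = loglog_slope(synthetic)
print(f"slope = {slope:.3f}")  # close to -1 for this idealised input
```

A fitted slope near -1 corresponds to the classic form of Zipf’s law; other slopes still indicate a power law, with a different exponent.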

(For more see Statistics section)

National surname signatures

The data can be expressed in other ways. For example, the next table is an extract from the 1881 UK census data. (I took the data from the Surname Atlas CD.)

A           B                       C              D
Frequency   Names                   No. of Names   Population of all names at this frequency
422,733     Smith                   1              422,733
339,185     Jones                   1              339,185
900         Bloomer, Emslie etc     7              6,300
180         Applebee, Barkham etc   48             9,600
100         Acker, Airy etc         130            13,000
50          Agar, Akinson etc       345            17,250
25          A’Beckett etc           957            23,925
1                                   lots !
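A table of this shape can be derived from any list of surnames with their bearer counts by grouping the names that share a count. The sketch below assumes a simple name-to-count mapping; `frequency_table` is a hypothetical helper, and apart from the Smith and Jones figures the data is invented.

```python
def frequency_table(bearers_by_name):
    """Group surnames by bearer count; return rows (A, B, C, D), highest frequency first."""
    names_at = {}
    for name, bearers in bearers_by_name.items():
        names_at.setdefault(bearers, []).append(name)
    rows = []
    for frequency in sorted(names_at, reverse=True):
        names = sorted(names_at[frequency])
        # D = A x C: everyone bearing any name that sits at this frequency.
        rows.append((frequency, names, len(names), frequency * len(names)))
    return rows

# Smith and Jones use the 1881 figures quoted above; the rest are invented placeholders.
toy_census = {
    "Smith": 422_733, "Jones": 339_185,
    "RareA": 25, "RareB": 25,
    "UniqueA": 1, "UniqueB": 1, "UniqueC": 1,
}

table = frequency_table(toy_census)
for frequency, names, count, population in table:
    print(f"{frequency:>8,}  {', '.join(names):<28} {count:>3}  {population:>8,}")
```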

The Viking long-boat

You will notice that early on, some names (e.g. Smith, Brown, Williams) are the sole occupants of a frequency. If you then plot column C (the number of names) against column A (the frequency), the result is a graph whose shape is reminiscent of the prow of a Viking longboat.

Where do you think your name would fall on this graph?

Occupied Frequencies

There are problems with the above.

  • The mistranscriptions are plotted and, as many are unique, will provide significant initial ‘noise.’
  • Most frequencies are unoccupied by names; probably only about 1 to 4 per cent of the possible range is actually occupied. For example, look at the large number of unoccupied frequencies between Smith and Jones.

The following method overcomes these limitations:
The occupied frequencies are ranked rather than the names themselves. Rank 1 of the occupied frequencies is taken just by the surname ‘Smith’ with a population of 422,733.

Frequency   Rank of Occupied Frequency   Name
422,733     1                            Smith
339,185     2                            Jones

The rank is then plotted against the ‘population of names at the occupied frequency’ (column D above).

At a certain rank, the frequency will be occupied by 2 surnames: the initial point of the 2nd stratum is then plotted. The process continues until all the ranks of occupied frequencies are exhausted.
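The ranking of occupied frequencies can be sketched as follows. This is an illustration under stated assumptions, not Ken Tucker's own code: `occupied_frequency_points` is a hypothetical helper, and all counts other than Smith and Jones are invented.

```python
from collections import Counter

def occupied_frequency_points(bearers_by_name):
    """Rank the occupied frequencies (1 = highest) and return
    (rank, population at that frequency, occupancy) for each one."""
    occupancy = Counter(bearers_by_name.values())
    return [
        (rank, frequency * occupancy[frequency], occupancy[frequency])
        for rank, frequency in enumerate(sorted(occupancy, reverse=True), start=1)
    ]

# Smith and Jones use the 1881 figures; the other names and counts are invented.
toy_census = {
    "Smith": 422_733, "Jones": 339_185,
    "PairA": 900, "PairB": 900,
    "TrioA": 25, "TrioB": 25, "TrioC": 25,
    "U1": 1, "U2": 1, "U3": 1, "U4": 1, "U5": 1,
}

points = occupied_frequency_points(toy_census)
for rank, population, occupancy in points:
    print(f"rank {rank}: population {population:,} (occupancy {occupancy})")
```

Plotting rank on the x axis against population on the y axis, with points grouped by occupancy, gives one stratum per occupancy level: occupancy 1 is the single-name stratum, occupancy 2 the next, and so on.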

In the graph below, the bottom stratum represents all those frequencies that are occupied by just a single name.

[Two graphs: 1881 Census (left) and 1998 Electoral Roll (right). In each, y axis = frequency population; x axis = ‘Rank’ of the Occupied Frequency, running from leading names to rare names.]

Notes

  • The advantage of this method is that all surname positions can be plotted.
  • The two shapes are reassuringly similar.
  • The shape exhibits strata which represent single occupancy, double occupancy, triple, etc.
  • The data is not quite like for like, as the UK electoral roll excludes those aged 1-16, and some sections of the population are under-registered.

Features

  • There are two maxima, one at each end of the x-range. The left-hand maximum, in the single-occupancy stratum, represents leading names (Smith, Brown, etc). The right-hand maximum, at the end of the tail, represents all the low-occurring rare surnames.
  • There is a minimum which is the lowest single-occupancy frequency.
  • The overall shape is bounded.
  • With an increase of size of the distribution, the number of occupied frequencies increases and the minimum value drifts up, or as Ken has succinctly said “the bigger the boat, the higher it floats.”

Comparisons

  • The 1998 graph has ‘floated up’ as expected because of population increase.
  • The difference between the two maxima has lessened in a hundred-plus years. The opposite might have been expected: the 1881 census data contains mis-transcriptions and the spelling of surnames has become less idiosyncratic in the interim, so one might have expected the tail to have shrunk. Can you suggest reasons why it might not have? The number of single international students in universities? Single migrant workers (whether Polish bus drivers, Icelandic bankers or Russian football club owners)? :-)

International comparisons

This graph acts as a fingerprint for comparing the surname profiles of different nations. For example, the fingerprint of contemporary Canadian surnames is the reverse of the UK and USA fingerprints, in that the maximum of the ‘tail’ is higher than the maximum at the beginning. In this case, it can be said that the Canadian bearers of the surname Smith are fewer than all the holders of a unique surname put together.

Acknowledgement: This section is based solely on the work of Ken Tucker, Research Fellow, Carleton University, whose words I have used above.

  • Ken Tucker “An analysis of the forenames and surnames of England and Wales listed in the UK 1881 census data”, Onoma 38 (2003) 181-216.
  • Ken Tucker “Fingerprints & entropy: comparing national distributions of forenames and surnames – a presentation to the ANS annual conference”, Jan 2006.


© 2013–2021 Guild of One-Name Studies. Registered Charity in England and Wales, No. 802048.