Guild of One-Name Studies

Approaches for the more frequent names

Posted 3 March 2016 by Debbie Kennett

Revision for “Approaches for the more frequent names” created on 20 March 2018 @ 22:58:41

<p>For some people, the very thought of launching a study into one of the more frequent names leaves them shaking their heads. It is not only the thought of gathering and managing seemingly infinite amounts of data, but also the thought of reconstructing the total lineage. This page is aimed at supporting those people – helping them to decide whether or not to move forward and, if they do, suggesting approaches.</p> <p>The quantity of data collected to support a one-name study increases with the frequency of the name. While some infrequent names can be studied without the support of a computer, such a tool quickly becomes mandatory as the frequency of a name, and so the volume of data, increases. That said, the tasks one undertakes in studying more frequent names are the same as for the less frequent – computers just make the task feasible.</p> <p>Given the sheer size of the undertaking, the MOST important task is deciding the aim and approach for the study. The rest of the page is split between the main aspects of a study as described in <a href="/wiki/guild-wiki/introduction-to-one-name-studies/seven-pillars-of-wisdom-the-art-of-one-name-studies/" target="_blank">The Seven Pillars of Wisdom: The Art of One-Name Studies</a>.</p> <ul> <li>Data collection</li> <li>Analysis</li> <li>Synthesis</li> <li>Publicising the study</li> <li>Communicating with other researchers and responding to enquiries</li> <li>Publication of results</li> <li>Preserving the study</li> </ul> <h3>Aim and approach</h3> <h4>Vision</h4> <p>The first step is to have a clear <em>vision</em> as to what the outcome of the study is expected to be. A vision is deliberately "fuzzy" and has no timetable. It is the thing you are aiming for in the dim and distant future, and it can be very useful when you feel swamped by the study and can see no way of progressing. 
It is also useful to help bind disparate groups together.</p> <p><strong>Gray/Grey Study approach: </strong>The <em>vision</em> for the GRAY/GREY one-name study is that there is a body of knowledge defining the origins of the GRAY/GREY name, how the name spread across the globe, the family trees, where the name occurs now and how it differs from other names.</p> <h4>Focus</h4> <p>With any study you need to decide what your legacy to the one-name study will be. One member felt that, with her ability to read the old records, she should place a high priority on their transcription and publication (cursive writing skills may be lost with this generation of children who start very early with computers). Particularly with a large study, this idea could help you focus when/if you feel that the study is becoming unmanageable.</p> <h4>Objectives</h4> <p>Once the study has a vision, one needs to look at what it is intended to do in the near future – the objectives. These should be time-limited, and one approach is to set objectives for the coming year. As with the "vision", these often prove very useful at times when the very scope of what is being undertaken is overwhelming. The term of a year can be decreased if appropriate (for example to the next three or six months), but it is suggested that longer periods are not useful.</p> <p><strong>Gray/Grey Study approach:</strong></p> <ul> <li>Bring together appropriate toolsets to support the study.</li> <li>Collect data: continue the UK data collection and start that for the USA.</li> <li>Synthesise families: continue the work on the families in Yorkshire (England).</li> <li>Integrate this study with DNA studies.</li> <li>Report on progress.</li> </ul> <p><em>Note</em>: these Gray/Grey objectives are not precise enough to drive the study and would need re-writing if, for example, they were used to co-ordinate the efforts of various parties. 
A better example for the "collect data" objective would be:</p> <p><em>Collect data: By Jan 2014 to have completed the acquisition of UK Census data (1841-1901) from Ancestry.com and have defined the acquisition plan for the USA.</em></p> <h4>Teams / Generations</h4> <p>A study into the more frequently-found names is, by its nature, large. There are many ways it can be resourced. For example, one can have an individual plodding away in their spare time, a large society of dozens (or hundreds) of people, or one or more (family) groups. It is likely that the study will take more than one generation of people. Having a clear set of standards will enable such groups and individuals to communicate effectively.</p> <p><strong>Gray/Grey Study approach</strong>: The GRAY/GREY study is a one-man band, but with informal links to a number of narrow family groups as well as a person looking in a geographically restricted area and time-span. There is no clear succession plan and it is the intention that all data will be held by the Guild should the registered member cease to be able to carry on (see <strong>Preserving the study</strong> below).</p> <p><strong>Chandler Study approach</strong>: The Chandler Family Association (CFA) has one Guild member, 630 dues-paying members in 11 countries, an international Executive Committee, a Genealogy Panel to deal with enquiries, a Lineages Database, a Y-DNA project and a number of Chapters. The DNA Project (400+ participants) identifies genetically-distinct Chandler families (82 so far; we expect about 150 eventually). We put genetic family members in touch with one another, internationally, and some become working groups to research their families further. We encourage them to form a Chapter of the CFA (at least one must be a CFA member) with a leader we call a Research Director whose job is to co-ordinate the group, keep the CFA informed, and comply with certain standards, mainly for data entry. 
This is our way of dealing with a large project - centralise what needs to be centralised, and de-centralise (but co-ordinate) everything else.</p> <h4>Standards</h4> <p>One way to tackle a large study is to develop standards for the whole study – for example, data recording standards covering things like personal names, place names, punctuation, date formats, etc. – and then allow each genetic family to set up its own working group, adhering to those standards. Question: how do you eat an elephant? Answer: cut it up into pieces.</p> <h4>Personal names</h4> <p>As with all things, one records what is found. However, tasks such as data analysis and synthesis often need a "translation". For example, one needs to show that "Jno" is "John". Then there are occasions where a person regularly uses a middle name, or even something completely different. In order to reconstruct the lineage, one needs to know that, for example, Jno Fdrck Smith was "officially" Frederick John Smith. How one records this translation needs to be clearly stated in the study standards so that the various teams, or generations, know how it was done and is to be done.</p> <h4>Date formats</h4> <p>Date formats often cause confusion. Not only is there the question of mm/dd versus dd/mm (is 9/11 the 9th of November or the 11th of September?), but there is also the question of which century is meant (is 9/11/11 in 1411, 1811 or 2011?) and of when the year starts (January or March or ...). As a study extends to various parts of the world, one will find records using different calendars - e.g. Chinese, Hebrew, Hindu and Islamic. Over the ages, you will find others - e.g. the Julian calendar. You should:</p> <ul> <li>Record the date using the calendar used at that time and place, and clearly state which calendar in your records and reporting.</li> <li>Translate the date into the calendar being used for the study (e.g. Gregorian) - still keeping the original (e.g. 
in a "note")</li> <li>Use the same date format throughout all records and clearly state it (some people choose to spell out the month - e.g. Sep 11, 2011 or 11Sep2011)</li> <li>Where there is the risk of ambiguity - be pedantic and spell it out</li> </ul> <p>Some studies mandate the use of the international standard – ISO 8601: "<em>The purpose of this standard is to provide an unambiguous and well-defined method of representing dates and times, so as to avoid misinterpretation of numeric representations of dates and times, particularly when data is transferred between countries with different conventions for writing numeric dates and times</em>". There are many options within this standard and the study may choose to mandate one or two of them before using it. (It is worth a read anyway – even if long-winded in parts.)</p> <h4>Place names</h4> <p>As with dates, place names are of particular concern - no matter how large the study. For example, one should be very clear as to when the name was valid. An English county in 1880 is not necessarily the same as a county with the same name in 1980, and may no longer even exist. From the original 13 British colonies, the USA has evolved into 50 states and their names and extents have changed - so do you record the place name as it was back in 1873, or as it is today? Hawaii did not become a US state until 1959. One approach is to record both. You need to record the original place name because that places associated events in context and facilitates further research. However, the current name assists in many aspects of reporting, as well as indicating where similar documents can be found today. 
Where the family history program being used doesn't support two place names, one approach is to record today's name in the formal "place name" field, while recording the name used in the source in a "note" field.</p> <h3>Data collection</h3> <h4>Scope</h4> <p>The significant volumes of data around the world can make <strong>Data collection</strong> the first daunting challenge. However, as in the light-hearted question – "How do you eat an elephant?" – the answer is one bite at a time.</p> <p>The objectives of the study will be the guide. For example, the objective could be "By Jan 2014 to have recreated the lineages in Tasmania". In order to achieve this, one will need to acquire the necessary data. But one needs to be strict in only acquiring the data necessary to construct the lineage - the rest of the data pertaining to the name in Tasmania can wait. If, on the other hand, the objective is "By January 2014 to have completed the acquisition of Tasmanian data up to December 1939", then one does that. However, words such as "completed" are difficult when used in the context of a one-name study since there is always more data to be found if one has the time, money and resources. 
One approach is to qualify the objective – for example: "By January 2014 to have completed the acquisition of Tasmanian data from FamilySearch and Ancestry.com up to December 1939".</p> <h4>Tools</h4> <p>Two notes:</p> <ul> <li>"A computerised tool is only as good as the user." One needs to understand just how the task is to be done, one record at a time, before looking at the thousands of records to be collected.</li> <li>"There is no one tool that does everything." As the study progresses, you will need a number of tools. Trying to make a tool do something it was not designed to do may work, but is prone to mistakes and errors.</li> </ul> <p>This section looks at tools for <strong>Data extraction</strong> and at the back-end <strong>Database</strong>.</p> <h4>Data extraction</h4> <p>In many ways, <a href="/wiki/guild-wiki/collect/">Data collection</a> for a more frequent name is no different from that for a rarer name. People still go to records offices and delve into archives (e.g. hard-copy or microfilm), they still visit cemeteries and memorials and they still receive newspaper clippings from people. The methods used for this are no different. The main differences arise when one is accessing the main on-line sources.</p> <p>Online sources have far more data. For example there are 3,036 entries for GRAY/GREY on the index of "Victoria Assisted &amp; Unassisted Passenger Lists 1839-1923" held on Ancestry.com. Yes, one can take each record in turn and transcribe it into a family history application, but this is slow and tedious. A faster method is to download the data items into an application like a spreadsheet or document processor where the data can be formatted for input into your database (see below). 
A significant number of studies download into a spreadsheet – for example the ones found in <a href="http://office.microsoft.com/en-gb/excel" target="_blank">Microsoft Office Excel</a>, or the free <a href="http://www.openoffice.org/product/calc.html" target="_blank">Apache OpenOffice Calc</a>. There is a lot of information on the <a href="/guild-mailing-list/" target="_blank">Guild Mailing List</a> and <a href="/forums/forum/webforum/" target="_blank">WebForum</a> on how to extract data into spreadsheets.</p> <h4>Database</h4> <p>This section starts by looking at the collection of all your information and then concentrates on alpha-numeric data.</p> <p>In the context of a one-name study, a database is the collection of all your data - no matter what format. So it includes the paper, the digital images, audio &amp; video recordings (SD cards, tapes) and the digital data. This data can be in a multitude of places, including cardboard boxes, lever-arch files, photograph albums, computers, media such as CD-ROMs and "thumb drives", and the "cloud". Management of such a database is non-trivial and becomes even more problematic when one considers that parts of it could be with different people around the world. Most, if not all, studies rely on the individual members of the study knowing what they have and where. The use of tools in managing these all-encompassing databases will be the subject of a future page on this Wiki.</p> <p>Databases holding alpha-numeric data are what are usually referred to when one talks about databases. For most studies, the database is hidden from view – it is managed by the family history application being used and the user need know nothing about it. For example The Master Genealogist (TMG) uses a Visual FoxPro database, while The Next Generation of Genealogy Sitebuilding© (TNG) uses MySQL tables. However, as the volumes of data increase, one does need to be aware of the constraints. 
Maybe the application itself doesn't have a constraint, but the computer you are running it on may start to feel sluggish as the volumes increase. The best source of information on these constraints is the support for the application being used.</p> <p>Where one is using a spreadsheet to hold the data, there are also constraints that need to be taken into account. For example, Excel 2003 was limited to 65,536 rows in each worksheet while the 2010 version is limited to 1,048,576. That does sound a lot – but considering that the GRAY/GREY name has 300,000 entries in the UK census (1841-1901) alone – and this is not the largest study – one does need to take the limit into consideration. Yes – you can put each census on a separate sheet, but then analysing the data as a whole becomes more difficult. The biggest constraint is then that of the computer. Analysing 300,000 records (e.g. checking place names against a gazetteer, or checking for valid data items) needs a "powerful" computer with "lots" of memory and "fast" disks. (The terms are vague, but a "core i3" laptop with 4GB of memory struggles with the more complex queries on 300,000 records in Excel 2010.)</p> <p>When a spreadsheet starts to be a hindrance (e.g. too slow) then one can consider a database such as <a href="http://office.microsoft.com/en-gb/access/" target="_blank">Microsoft Access</a> or the free <a href="http://www.openoffice.org/product/base.html" target="_blank">Apache OpenOffice Base</a>. These systems are designed to be far more efficient (e.g. faster) with the larger volumes of data but still have limits. 
While the limit of 2GB for an Access file sounds a lot, one can soon approach it and so the study would need to look at using multiple files or, as the GRAY/GREY study is doing, look at using the free edition of SQL Server as a back-end for Access.</p> <p>Another approach is to use <a href="http://www.custodian3.co.uk" target="_blank">Custodian</a> – an application developed specifically for one-name studies. The current version has Access as the underlying database but, unless one wishes to use the SQL queries, one needs no knowledge of databases. (I've not looked at the feasibility of using the Custodian data for data analysis or synthesis.)</p> <h3>Analysis</h3> <p>To be defined.</p> <h3>Synthesis</h3> <p>The area of synthesis is where the volumes of data associated with a more frequent name can be of real benefit as well as a challenge.</p> <p>For example, statistical techniques for looking at topics such as emigration, longevity, infant mortality, number of pregnancies, illegitimacy rates and distance between places of birth and marriage could deliver more statistically significant results with greater volumes of data.</p> <p>(For a discussion on statistical significance - look at the Wikipedia article on <a href="http://en.wikipedia.org/wiki/Statistical_significance" target="_blank">statistical significance</a>.)</p> <h4>Lineage Reconstruction</h4> <p>The challenge comes when one looks at lineage reconstruction. With the less frequent names, it is not uncommon to find the study has linked every individual into a tree. However, when one has hundreds of thousands of individuals, this becomes problematic. This is not a <strong>no go</strong> issue.</p> <p>When publicising a one-name study for a more frequent name, one needs to make it very clear that placing everyone with the name into a "tree" is not an objective. 
That is not to say that the vision could not say it – but then a vision has no time limit and so could take many decades to achieve.</p> <p>Some types of surname, for example occupational names, have multiple points of origin, so bearers of that surname are not all related. DNA studies of such names will put participants into matching groups, which might be called genetically-distinct lines or genetic families: nuclear families are mother, father and their children; extended families are all the grandparents, aunts and uncles, cousins etc, most of whom probably know each other; genetic families consist of many extended families, who probably DON'T know each other, but who have a common ancestor further back than most of them will have traced. Lineage Reconstruction would not, of course, be attempted ACROSS genetic families.</p> <p><strong>Gray/Grey Study approach</strong>: The GRAY/GREY study reconstructs lineages of individual families as part of research resulting from enquiries and, while reconstruction is an aim – it is unlikely to be reached for a long, long time.</p> <p>Work is being undertaken to research the feasibility of <a href="/wiki/guild-wiki/analyse/automated-lineage-reconstruction/" target="_blank">Automated lineage reconstruction</a>, but it is at a very early stage.</p> <h3>Publicising the study</h3> <p>There is no difference in publicising a study whether one is carrying out one on a less frequent name or one with a high frequency. However, one needs to be very clear as to what it is you are currently doing. For example, if you are starting your study with data from Tasmania, then you need to ensure you don't imply that your data is world-wide. 
Giving the wrong impression could leave you with dissatisfied people and a poor reputation.</p> <p>An example from an advertisement in a magazine, or website, could be "While this study aims to gather data from around the world in due course, it currently concentrates on Tasmania and aims to move to the rest of Australasia in 2013 before looking at North America ..."</p> <h3>Responding to enquiries</h3> <p>While a higher frequency of the name does mean that more people are interested in it, the volume of enquiries is not necessarily large. For example, the GRAY/GREY study receives in the order of one query a month. The CHANDLER study, on the other hand, receives two per week.</p> <p>As with all studies, one should respond to all enquiries in a timely manner. However, as the study starts, most enquiries will probably require access to data which has yet to be addressed. In such cases, it is very important that one responds in a positive manner which makes it clear that the aim of the study is to look at the name world-wide and that you will address their query as soon as you can.</p> <p><strong>Gray/Grey study approach</strong>: Personally, on receiving the request I go online to see what I can find and report those findings as part of my response. The benefits are that you have hopefully created a good impression of the study and may set up communication with a helpful source. In this way I have gathered little islands of data around the world – each able to act as a seed for when I'm able to address that geography.</p> <h3>Publication of results</h3> <p>Each study differs in what the aims are, and so how much work there is to report on. This does not necessarily depend on the frequency of the name, but more on the amount of time spent on the study. What would change with the size of the study is whether one chose to publish the raw data collected, or any data arising from the analysis. 
While data from the smaller studies could be published in book form - for example as an appendix to a report – when one is talking about many megabytes of data, or hundreds of images, then one needs a different approach.</p> <p>Where the volumes of data allow, one could include the report and the data on a DVD. This would allow publication by sending out the DVD. However, the larger studies can easily produce data too large to fit on a DVD and one needs to look at alternatives such as on-line publication.</p> <p>Online publication – text / audio / video / images / data – to be defined.</p> <h3>Preserving the study</h3> <p>The main issue in preserving the larger study is the sheer volume of information – be it audio or video recordings, photographs, data, reports, etc. That said, the techniques used are the same as for any other size of study. (See <a href="/wiki/guild-wiki/preserve/how-to-safeguard-and-preserve-your-study/" target="_blank">How to safeguard and preserve your study</a>)</p> <h3>Further reading</h3> <p><a href="/members/comps/2010/1298_FISHER_Analysis.pdf" target="_blank">The Fisher Surname Study: a model for the family reconstruction of high-frequency surnames?</a> by former Guild member John Fisher</p> <h3>High-frequency names</h3> <h4>List of high-frequency Guild-registered one-name studies</h4> <p>The following are examples of the names being studied by Guild members. Each entry gives the count and rank in the 1990 census of the USA and, in brackets, the rank in the UK 2002 census:</p> <ul> <li>Martin (678,951 – 16th (28th))</li> <li>Hall (497,400 – 26th (20th))</li> <li>Allen (494,913 – 27th (40th))</li> <li>Nelson (402,894 – 39th (241st))</li> <li><a href="/profiles/phillips.html" target="_blank">Phillips</a> (370,563 – 45th (42nd))</li> <li>Bell (290,979 – 58th (60th))</li> <li><a href="/profiles/gray.html" target="_blank">Gray</a> (263,622 – 69th (82nd))</li> <li>Watson (256,161 – 72nd (44th))</li> 
<li>Brooks (256,161 –  73rd (118th))</li> <li><a href="http://chandlerfamilyassociation.org/" target="_blank">Chandler</a> (84,558 –  322nd (&gt;250th))</li> </ul> <p>USA Source: <a href="http://names.mongabay.com/most_common_surnames.htm" target="_blank">Mongabay</a> <br /> UK Source: <a href="http://www.taliesin-arlein.net/names/search.php" target="_blank">Taliesin-Arlein</a> </p> <h4>High-frequency names in Europe</h4> <ul> <li><a href="https://en.wikipedia.org/wiki/List_of_the_most_common_surnames_in_Europe" target="_blank">Wikipedia list of the most common surnames in Europe</a></li> <li><a href="http://i.imgur.com/Gtc4EKo.png" target="_blank">Map created by Reddit user Teepr</a></li> </ul> <h4>High-frequency names in USA</h4> <ul> <li><a href="https://www.census.gov/topics/population/genealogy/data/2000_surnames.html" target="_blank">Data from the US Census Bureau</a></li> </ul>
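<p>Returning to the guidance under <strong>Date formats</strong> above: the advice to keep the original date, state the calendar, and hold a single unambiguous study-wide format can be sketched in a few lines of Python. This is only an illustrative sketch, not a tool used by any of the studies mentioned. The names <code>RecordedDate</code> and <code>normalise</code>, and the parameters <code>order</code> and <code>century_start</code>, are hypothetical. It assumes numeric day/month/year dates that are already in the study's calendar, and it does no conversion between calendars (e.g. Julian to Gregorian).</p>

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class RecordedDate:
    original: str  # the date exactly as written in the source
    calendar: str  # the calendar the transcriber states for the source
    iso: str       # ISO 8601 calendar date (YYYY-MM-DD) used study-wide


def normalise(original: str, order: str,
              century_start: Optional[int] = None,
              calendar: str = "Gregorian") -> RecordedDate:
    """Turn a numeric d/m/y or m/d/y date into ISO 8601, keeping the original.

    The transcriber must state the field order ("dmy" or "mdy") and, for a
    two-digit year, the first year of the century (e.g. 1800). Nothing is
    guessed: ambiguity raises an error instead.
    """
    parts = [int(p) for p in original.split("/")]
    if len(parts) != 3:
        raise ValueError("expected three numeric fields separated by '/'")
    if order == "dmy":
        day, month, year = parts
    elif order == "mdy":
        month, day, year = parts
    else:
        raise ValueError("order must be 'dmy' or 'mdy'")
    if year < 100:
        if century_start is None:
            raise ValueError("two-digit year: state the century explicitly")
        year = century_start + year
    # date() rejects impossible dates such as month 13 or 31 February.
    iso = date(year, month, day).isoformat()
    return RecordedDate(original=original, calendar=calendar, iso=iso)


# The ambiguous example from the text, resolved two different ways.
print(normalise("9/11/11", "dmy", century_start=1800).iso)
print(normalise("9/11/11", "mdy", century_start=2000).iso)
```

<p>Refusing to guess the field order or the century is deliberate: it follows the "be pedantic and spell it out" advice, and keeping <code>original</code> alongside <code>iso</code> means the source wording is never lost.</p>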

Revision history (date created, author):
20 March 2018 @ 22:58:41 Karen Burnell
20 March 2018 @ 22:58:39 [Autosave] Karen Burnell
17 March 2018 @ 21:58:50 Kim Baldacchino
13 March 2016 @ 01:30:46 [Autosave] Debbie Kennett

Disclaimer

Please note that material in this Wiki is provided by Guild members and is not necessarily endorsed by the Guild or its Committee. The content is regularly updated, but the Guild makes no guarantees that the information provided is up to date or accurate.

© 2013–2021 Guild of One-Name Studies. Registered Charity in England and Wales, No. 802048.