Wednesday, September 22, 2021

DNA Research Framework 2 - Organising your DNA data and determining match groups

There are five parts to the DNA Research Framework:

  1. Understand DNA Basics
  2. Know what you are working with
  3. Combine genetic and genealogical research
  4. Use genetic research to prove and expand your pedigree
  5. Continuous review

Within the framework we apply a DNA research methodology so that we review our results systematically and methodically, improving our productivity and success rates.

The following blog posts provide more detailed information about the DNA Research Framework, applying the DNA Research Methodology and building your DNA analysis skills:

This post contains reference material relevant to Module 2.
The ISOGG site also has a lot of useful material; refer to the ISOGG Beginners' guides to genetic genealogy.  You can find earlier material relevant to Module 1 here.


Know what you are working with - Total cMs, the broad approach
This module is designed to help you understand how to identify/manage/organise your DNA data so you know what you are working with. 

Recap on the key concepts from Module 1:
Before venturing into the labyrinth of chromosome analysis, make sure you have maximised your findings from your AncestryDNA results (and results from other sites) and have attempted to identify all your closest matches up to 3rd cousins.  You should also have applied the grouping process (Leeds Method or similar) at AncestryDNA (and other sites) to identify likely match groups for each of your 16 2nd great-grandparents.
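
The grouping step can be sketched in code. This is only a minimal, simplified sketch of a Leeds-style grouping, not a substitute for working through your match list by hand. The data layout, the match names and the 90-400 cM window are all assumptions made for illustration; adjust them to suit your own results.

```python
def leeds_groups(matches, shared, low_cm=90, high_cm=400):
    """Assign group numbers to matches, Leeds-style (simplified sketch).

    matches: dict of match name -> total cM shared with the tester
    shared:  dict of match name -> set of names on that match's shared-match list
    The cM window is an assumed default, not a fixed rule.
    """
    # Keep only matches inside the chosen cM window, largest first.
    in_range = sorted(
        (name for name, cm in matches.items() if low_cm <= cm <= high_cm),
        key=lambda name: -matches[name],
    )
    groups = {name: set() for name in in_range}
    next_group = 1
    for name in in_range:
        if groups[name]:
            continue  # already placed in a group by an earlier match
        # Start a new group with this match and everyone on its shared-match list.
        groups[name].add(next_group)
        for other in shared.get(name, set()):
            if other in groups:
                groups[other].add(next_group)
        next_group += 1
    return groups


# Hypothetical example data.
matches = {"Ann": 350, "Bob": 210, "Cal": 150, "Dee": 120, "Eve": 40}
shared = {"Ann": {"Cal"}, "Bob": {"Dee"}, "Cal": {"Ann"}, "Dee": {"Bob"}}

groups = leeds_groups(matches, shared)
for name, numbers in groups.items():
    print(name, sorted(numbers))
```

In this made-up example Ann and Cal fall into one group and Bob and Dee into another, while Eve is left out because she shares too little DNA to fall inside the window. A match that lands in more than one group deserves a closer look.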

Ideally, you want to be fairly certain that your closer matches appear to support your pedigree out to your 2nd great-grandparents (as much as possible).  When working with your DNA matches you need to ensure you are working from a solid base and that the pedigree you have researched is in fact your true genetic ancestral line too.  If you have no matches on some of these lines, and/or a group of unknown matches sharing large amounts of DNA with you, you may need to question whether your documented pedigree is accurate.

Know what you are working with - Chromosome analysis
After working with your DNA results broadly for a time, you will probably want to delve into analysing your results at the chromosome level.  To confirm your pedigree beyond 3rd cousins, or where there is no documented paper trail (with the exception of parent/child and sibling relationships), you need to undertake detailed chromosome analysis, using segment data.

Download your segment data.

Decide how you are going to organise your data and keep track of research undertaken.  These are the 3 main methods:
  1. DNA Painter.  The visual method suits many, but DNA Painter lacks the ability to manipulate data, draw upon historical notes, sort like groups in different ways, etc.  Even so, the site has lots of fabulous features, so get your account now!  This video outlines its features: Introduction to DNA Painter.
  2. The spreadsheet method.  Jim Bartlett's blog of 2014, which he updated in 2021, can give you some ideas about how to design your spreadsheet.  If you are serious about working with results in the longer term, you will want to retain any work you do and not fall into the trap of constantly duplicating your research.  This necessitates capturing a lot of data as you go.  Beware: over time this spreadsheet will become extremely large and you need good technical skills to be able to manage it.  Jim has made further enhancements to his spreadsheet since the introduction of Pro Tools in 2024.  In January 2022, Danielle Lautrec also published a useful blog post about how she uses Excel in her DNA analysis.
  3. Genealogical DNA Analysis Tool.  This is a 'built for purpose' database, specifically for analysing your autosomal DNA results.  It keeps everything organised and all in one place.  Before starting to work with this tool you need to understand the underlying theory of chromosome analysis.
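
Whichever method you choose, the underlying idea is the same: keep one row per shared segment and sort the rows so that segments on the same part of a chromosome sit next to each other. The following is a minimal sketch of that idea only. The column names, the sample rows and the 7 cM cut-off are assumptions for illustration, and real downloads from each testing site use their own column headings. For simplicity it also assumes numbered chromosomes only.

```python
import csv
import io

# Hypothetical segment data; real site downloads use different headings.
SAMPLE = """match,chromosome,start,end,cm
Ann,7,5000000,30000000,28.1
Bob,2,10000000,22000000,12.4
Cal,7,12000000,41000000,31.0
Dee,2,150000000,160000000,6.2
"""


def load_segments(text, min_cm=7.0):
    """Read segment rows, drop small segments, sort by chromosome and start."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        seg = {
            "match": row["match"],
            "chromosome": int(row["chromosome"]),  # assumes chromosomes 1-22 only
            "start": int(row["start"]),
            "end": int(row["end"]),
            "cm": float(row["cm"]),
        }
        if seg["cm"] >= min_cm:  # min_cm is an assumed threshold
            rows.append(seg)
    rows.sort(key=lambda s: (s["chromosome"], s["start"]))
    return rows


segments = load_segments(SAMPLE)
for seg in segments:
    print(
        f"chr{seg['chromosome']:>2} {seg['start']:>11,} {seg['end']:>11,} "
        f"{seg['cm']:>5.1f} {seg['match']}"
    )
```

Once sorted this way, Ann and Cal appear together on chromosome 7, which makes it easy to see that their segments overlap and are worth comparing further.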

Apply the research methodology to your analysis process:

Understand the different uses of the term 'in common with'.  Distinguish between shared matches and shared segments.
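
To make the distinction concrete: two people can both appear on your shared-match list without sharing DNA with you on the same part of any chromosome. The following is a small sketch of an overlap check between two segments. The segment layout and the example positions are assumptions for illustration only.

```python
def overlap(seg_a, seg_b):
    """Return (chromosome, start, end) where two segments overlap, or None."""
    if seg_a["chromosome"] != seg_b["chromosome"]:
        return None
    start = max(seg_a["start"], seg_b["start"])
    end = min(seg_a["end"], seg_b["end"])
    return (seg_a["chromosome"], start, end) if start < end else None


# Hypothetical segments that three matches each share with the tester.
ann = {"chromosome": 7, "start": 5_000_000, "end": 30_000_000}
bob = {"chromosome": 2, "start": 10_000_000, "end": 22_000_000}
cal = {"chromosome": 7, "start": 12_000_000, "end": 41_000_000}

ann_cal = overlap(ann, cal)  # same chromosome, positions overlap
ann_bob = overlap(ann, bob)  # different chromosomes, so no overlap
print(ann_cal, ann_bob)
```

Even where two segments overlap, that alone does not show the two matches share the segment with each other, because one may match you on your maternal copy of the chromosome and the other on your paternal copy.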

Focus on triangulation to identify match groups who share a common ancestor and don't waste your time chasing likely false positive matches:
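
The triangulation test itself can be sketched as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the names, the segment positions and the way pairwise comparisons are stored are all hypothetical, and no minimum segment size is applied. The point it illustrates is that all three pairwise comparisons (you with A, you with B, and A with B) must share a common stretch of the same chromosome.

```python
def common_range(*segments):
    """Return the (chromosome, start, end) common to all segments, or None."""
    chromosomes = {seg[0] for seg in segments}
    if len(chromosomes) != 1:
        return None
    start = max(seg[1] for seg in segments)
    end = min(seg[2] for seg in segments)
    return (chromosomes.pop(), start, end) if start < end else None


def triangulated(pairwise, tester, a, b):
    """Return the range shared by tester-a, tester-b and a-b, or None."""
    for s1 in pairwise.get(frozenset((tester, a)), []):
        for s2 in pairwise.get(frozenset((tester, b)), []):
            for s3 in pairwise.get(frozenset((a, b)), []):
                shared = common_range(s1, s2, s3)
                if shared:
                    return shared
    return None


# Hypothetical pairwise comparisons: (chromosome, start, end) per segment.
pairwise = {
    frozenset(("me", "Ann")): [(7, 5_000_000, 30_000_000)],
    frozenset(("me", "Cal")): [(7, 12_000_000, 41_000_000)],
    frozenset(("Ann", "Cal")): [(7, 12_000_000, 29_000_000)],
    frozenset(("me", "Fay")): [(7, 10_000_000, 25_000_000)],
}

ann_cal = triangulated(pairwise, "me", "Ann", "Cal")
ann_fay = triangulated(pairwise, "me", "Ann", "Fay")
print(ann_cal, ann_fay)
```

In this made-up data Ann and Cal triangulate with the tester on chromosome 7. Fay overlaps the same region but has no matching segment with Ann, so she does not triangulate; she may match on the other parent's side or be a false positive.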

Once you have mastered the underlying theory of 'triangulated segments' and 'triangulated groups' you may wish to experiment with other tools that provide quick ways to identify matches who might be related to you in the same ancestor group.  Consider using clustering tools to help you quickly identify groups of interest to explore, but keep in mind whether the clusters are based on 'shared matches' or 'shared segments'.

2024 Note:  Some of these systems are currently unavailable due to the restrictions on downloads; check each site for the latest information.


Suggested activities

A number of activities have been developed to help you apply 'Module 2' in practice.   

In the 2021 and 2023 series we had this exercise using the My Heritage site; however, due to the temporary disablement of downloads at My Heritage, it cannot be completed at this time.  Once downloads return (or if you still have old reports), it would be advisable to revisit this exercise.

For the 2024 program the activity has been modified.  A new exercise has been developed using GEDmatch data to practise identifying triangulated groups and false positive matches.  The My Heritage exercise has also been adapted and is now limited to identifying triangulated groups; however, it introduces a different tool to help extract data, called Pedigree Thief.

GEDmatch - Exercise to identify triangulated groups and false positive matches
My Heritage - Modified exercise to identify triangulated groups using Pedigree Thief

If you are not very familiar with the My Heritage site, this video may be a good introduction.  Alternatively, contact SAG Education for recent webinars presented at the Society of Australian Genealogists for both My Heritage and GEDmatch.


Advanced reading:
Jim Bartlett has written a four-part series on the distribution of Triangulated Groups, which you might find of interest.  This follows his earlier discussion, in December 2020, on Triangulating your genome using My Heritage.


2024 NOTE:  Unless you have old data available to you, the suspension of data downloads at My Heritage since late 2023 will affect your ability to complete Jim's activity at this time.


Veronica Williams
Originally posted: 22 September 2021
Last updated: 2 August 2024