Tuesday, October 19, 2021

All DNA Sites - Working from chromosome data to find your MRCA

In Module 3 of 'Combining genetic and genealogical research' we discussed ways to work from your chromosome data to maximise your chance of finding your MRCA.  The purpose of undertaking this exercise is to consolidate your understanding of the principles and processes we discussed.  By examining results at all DNA sites and working between them, you will increase your knowledge of specific segment areas which may lead to more successful outcomes.

In Module 2 we discussed the DNA companies that provide chromosome data analysis tools (GEDmatch, FTDNA, MyHeritage and 23andMe), how to download match and segment data, and triangulation techniques.  As the suggested activity for Module 2 was focussed on the MyHeritage site, make sure you have also practised applying the same techniques at the other chromosome data sites where you have either tested or uploaded your DNA data.

Use your identified triangulated groups across all sites to solve the puzzle of how matches might fit into your pedigree. I recommend analysing everything you know about a match and trying to identify the likely ancestral line/group they belong to, before making contact.

Before moving to Module 4, make sure you understand the theory behind two sides to every chromosome, plus how to identify triangulated segments and groups. You should also have decided upon your method for retaining details of the DNA matches you have researched, the segments they match on and the results of your analysis, to avoid rework and improve your productivity.


Using triangulated groups to help solve the puzzle


Which match to work on for this exercise?

Perhaps there is a perplexing match on a DNA site that provides chromosome data and you don't know how they fit. Alternatively, choose a known relative.  We will call this match 'Match A'.

  • It's best to choose a large match, ideally one who matches on a number of segments;  
  • Apply what you have already learnt in Module 2;
  • Examine their match at the testing site where you found them, try to identify a side, and check if they are in any broader triangulated groups with others on EACH of the different chromosomes/segments;
  • Run any cluster reports you can depending upon where you found the match - remember to check if the cluster report is showing 'shared matches' or 'shared segments';
  • Find any pedigree information (provided by the match or found by searching one of their ancestors) and use it to try and identify your MRCA;
  • Build research trees to expand the known genealogy of both your match and your own tree;
  • Update your preferred recording method with the results of your analysis;
  • It doesn't matter if you find the MRCA or not - it is always useful to go to the next step by looking at the implications of this result on all the other sites. 

NOTE:  Choosing a match you already know can help you focus on names and locations that you are familiar with, and you may find it easier to identify how others in each of the TGs might connect.  If working with a known match, particularly if they are a close relative, remember others in the TG may be connecting further back on only ONE of your shared ancestors' ancestral lines.


Combining your match data

* Download the matches and segment data for your kit from each of the chromosome data sites where you have your DNA uploaded - GEDmatch, MyHeritage, FamilyTreeDNA and 23andMe (Module 2).

* Combine all reports into one new 'master' list.  How you do it depends on your chosen method of storing and maintaining your data (Module 2).

* If you are using the spreadsheet method, you might like to first add a column to the start of every report from each of the 4 companies and insert the testing company name for every match in the spreadsheet - before combining them into one list. 

 * Alternatively, you could make each company's report a separate colour, so when you combine them you can easily recognise the different sites.

* If preferred, it is possible to work with the separate spreadsheets, but it may become more time consuming to have to go back and forth between each one in the analysis process.

*  Sort the report by chromosome number, then segment start position (smallest to largest). It might look something like this:


You can also limit the size of your spreadsheet by setting a default cut-off, e.g. 15cMs, but be mindful that some lower matches may have trees that could provide the answer you are looking for!

Sorting by name will help you see if people are duplicated across the sites.
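
For those who prefer a script to manual spreadsheet work, the combining, cut-off and sorting steps above could be sketched in Python along these lines. This is only an illustration: the column names (name, chromosome, start, end, cm) and the sample rows are assumptions, and each company's real download uses its own headers, which would need to be mapped to a common layout first.

```python
# Hypothetical, already-normalised rows; real exports use different headers per company.
REPORTS = {
    "FTDNA": [
        {"name": "J Burke", "chromosome": "2", "start": 5000000, "end": 21000000, "cm": 22.4},
        {"name": "A Smith", "chromosome": "X", "start": 1000000, "end": 9000000, "cm": 12.0},
    ],
    "MyHeritage": [
        {"name": "J Burke", "chromosome": "2", "start": 5200000, "end": 20500000, "cm": 21.1},
        {"name": "M Jones", "chromosome": "1", "start": 3000000, "end": 30000000, "cm": 35.0},
    ],
}


def chromosome_key(chromosome):
    """Sort 1-22 numerically and place X (the 23rd chromosome) last."""
    return 23 if chromosome.upper() == "X" else int(chromosome)


def build_master_list(reports, min_cm=0.0):
    """Tag each row with its testing company, apply an optional cM cut-off, and sort."""
    master = []
    for company, rows in reports.items():
        for row in rows:
            if row["cm"] >= min_cm:
                # The company name becomes the first column of the master list.
                master.append({"company": company, **row})
    # Sort by chromosome number, then segment start position (smallest to largest).
    master.sort(key=lambda r: (chromosome_key(r["chromosome"]), r["start"]))
    return master


master_list = build_master_list(REPORTS, min_cm=15)
for row in master_list:
    print(row["company"], row["name"], row["chromosome"], row["start"], row["end"], row["cm"])
```

Leaving min_cm at its default of 0 keeps the smaller matches in the list, which may be worthwhile if their trees are well developed.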


NOTES:  

* If you have any AncestryDNA match lists, it can be useful to add these into your master list, to help identify common names, across platforms.

* If you are finding manipulating spreadsheets troublesome, uploading your matches into DNA Painter using their Bulk Import Tool may be a good alternative (under the settings icon in the chromosome map).  Whilst the matches can't be sorted into a chronological name list, the visual presentation suits many researchers and you can just work on one chromosome at a time.

* Once you have come to grips with the underlying theory of chromosome analysis I would recommend using the GDAT database (Genealogical DNA Analysis Tool).


Going to the next level with chromosome data

You will have already applied the Module 2 research techniques at the testing site where you found 'Match A'.

* Identify each of the segments you share with Match A - it should look something like this:


* Then for each chromosome go to your 'new master list' and find other matches who overlap in the same segment area.  NOTE:  It won't be exact, but find the overlapping segment area that includes the segment area where you match 'Match A', even if the area does not include the entire segment - just make sure the 'overlapping' segment area is at least 7cMs.

* Examine all the names and segment matches to see if there is any duplication of matches across sites - these matches are key to working across platforms where you cannot directly compare matches.  NOTE: I call these 'Bridge matches' as they are matches that will allow you to connect your groups across sites.  J Burke in the list below appears to be a 'Bridge match', appearing on both FTDNA and MyHeritage.


* Apply the same process we applied in Module 2 for each of the testing sites, identifying potential triangulated groups AT EACH testing site for the identical segment area.

* Can you determine the 'side' or MRCA for any matches that are on multiple sites?  NOTE: Whilst matches on different sites cannot be considered 'triangulated' DNA segments in the strict sense of the word because they can't be compared to each other, they can be used as 'clues' to help you work with the specific segment area.  This applies only if they are actually also 'triangulating' with other matches on the different DNA sites.

* If you can identify sides at one site and you can find a 'bridge match' you can use that match to start to hypothesise about the TG segment area.  NOTE:  Over time, as you work through your match list you should start to see patterns forming for start and end locations along each chromosome.

* Develop hypotheses for sides, or ancestral groups based on the combined knowledge you have gained from examining the segment area at all sites. 
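
If your master list is in a form a script can read, the search for overlapping matches and possible 'Bridge matches' described above might be sketched as follows. Everything here is hypothetical: the sample rows, the Match A segment and the function names are made up for illustration. Note also that the overlapping cM cannot be calculated exactly from start and end positions alone, so this sketch assumes cM is spread evenly along a segment and scales the reported cM accordingly. That is a rough screening figure only, and each candidate still needs checking at the testing site.

```python
from collections import defaultdict

# Hypothetical master-list rows: (company, name, chromosome, start, end, cM).
MASTER = [
    ("FTDNA", "J Burke", "2", 5_000_000, 21_000_000, 22.4),
    ("MyHeritage", "J. Burke", "2", 5_200_000, 20_500_000, 21.1),
    ("GEDmatch", "M Jones", "2", 18_000_000, 40_000_000, 25.0),
    ("23andMe", "A Smith", "2", 30_000_000, 45_000_000, 16.0),
    ("FTDNA", "P Brown", "3", 6_000_000, 20_000_000, 18.0),
]

# The segment shared with Match A (also hypothetical).
MATCH_A_SEGMENT = ("2", 6_000_000, 22_000_000)


def estimated_overlap_cm(row, segment):
    """Estimate the overlapping cM by scaling the row's cM by its overlapping share.

    Assumption: cM is spread evenly along the row's segment, which is not true
    in practice, so treat the result as a screening figure only.
    """
    _, _, chromosome, start, end, cm = row
    seg_chromosome, seg_start, seg_end = segment
    if chromosome != seg_chromosome:
        return 0.0
    overlap_bp = min(end, seg_end) - max(start, seg_start)
    if overlap_bp <= 0:
        return 0.0
    return cm * overlap_bp / (end - start)


def overlapping_matches(master, segment, min_cm=7.0):
    """Return rows whose estimated overlap with the segment is at least min_cm."""
    return [row for row in master if estimated_overlap_cm(row, segment) >= min_cm]


def normalise(name):
    """Crude name normalisation so 'J. Burke' and 'J Burke' compare equal."""
    return " ".join(name.replace(".", " ").lower().split())


def bridge_matches(rows):
    """Group rows by normalised name and keep names seen at more than one company."""
    companies = defaultdict(set)
    for company, name, *_ in rows:
        companies[normalise(name)].add(company)
    return {name: sorted(sites) for name, sites in companies.items() if len(sites) > 1}


candidates = overlapping_matches(MASTER, MATCH_A_SEGMENT)
bridges = bridge_matches(candidates)
print([row[1] for row in candidates])
print(bridges)
```

A shared name is only a clue that the same person has tested or uploaded at two sites. Two different people can share a name, so confirm the identity before relying on a match as a bridge.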


Can AncestryDNA help?

* Whilst AncestryDNA doesn't provide segment data, it has the best collection of pedigree information available.  It is often disappointing that so many of their matches don't upload to other DNA sites where we can access the chromosome data.

* You should have already applied the broad approach to classifying your matches at AncestryDNA using the Leeds or Groupings processes (refer - AncestryDNA course).

* You should also have already approached your largest matches and any matches of specific interest to encourage them to upload to a chromosome site.  Not only will the knowledge of the specific chromosomes help your research, it provides additional information to potentially connect your Ancestry groups with your triangulated segment groups.

* When analysing your chromosome data be on the lookout for common names that might suggest your AncestryDNA match might also be on a chromosome site.

* At GEDmatch, look for kits starting with 'A'.  These are old AncestryDNA kits, and the people behind them are likely to also be on your match list at AncestryDNA.  Sometimes you won't be able to find them; this could be due to the Timber algorithm at AncestryDNA eliminating some segments so they don't appear as a valid match.  If you match only on the X or 23rd chromosome, they will not be reported as matches at AncestryDNA, but they will show up as matches at GEDmatch and 23andMe.  NOTE:  FTDNA does also provide X data, but will only report it as a match if you also match on chromosomes 1-22.

* If you have found a match at a chromosome site who is also in your AncestryDNA match list, go back and try to find them at AncestryDNA.  What 'group' have they been allocated to?  Consider how this might apply to your Triangulated Group if you have identified one.  NOTE:  Remember that the grouping technique at AncestryDNA is working on 'shared matches' - they are only 'clues' not 'evidence' and may lead you down the wrong path, particularly for lower cM matches. Be careful about what conclusions you draw for matches under 30cMs.


Run clusters to identify groups

* In Module 2 we discussed a number of sites that provide cluster information which can help to identify groups of matches who are likely to be related to each other.  Run as many different types of clusters as you can to help identify matches you may have missed.  Make sure you distinguish between 'shared match' and 'shared segment' clusters when drawing conclusions.

* Are there more people in the 'shared segment' clusters than you have identified so far in your Triangulated Groups?  Examine them and see if they can provide more information.

* Sometimes clusters based on 'shared matches' may reveal additional triangulated groups.  Explore everyone in the cluster using the process we discussed in Module 2.

Check out my 2021 blogpost Clustering for Chromosome Analysis for more tips and information.


Do the genealogy....

* Examine all available pedigree information for your shared matches in each triangulated group at all the sites, looking for common names and locations;

*  Explore pedigree tools to help you automate the searching process within pedigrees;

*  Develop research trees and expand your matches' trees to find the common ancestor;

* Whilst we started looking for the relationship between you and Match A, if none can be identified, find other relationships between matches in each of the triangulated and cluster groups.  If you can identify pairs within triangulated and cluster groups, it is likely your shared ancestor is somewhere back from the ancestral couple those matches share.


Rinse and Repeat....

* Hopefully by now you will have explored multiple TGs in the segment areas on chromosomes where you matched 'Match A'.

* If 'Match A' is a known relative, let's say on the 'maternal' side, you now know that, because chromosomes have two sides, any matches in the same segment area who don't match 'Match A' must be on the opposing side, so they are 'paternal' matches (or false positives).

* Go back and repeat the steps to investigate the matches and TGs on the opposing side that you have not yet explored, once again comparing matches in the same segment area at all the sites.

* Don't forget to record your findings in a way that will be preserved.  This can help speed up the analysis process when you get additional matches in these segment areas down the track.

* Continue to look at both sides of the chromosome concurrently, this helps you to be more productive in working through your genome and will save time in future analyses.
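
The 'two sides' reasoning above can be written down as a simple labelling rule. The sketch below is only an illustration with made-up names: it assumes Match A is a known maternal relative and that you have already confirmed, at the testing site, which matches triangulate with Match A on the segment. The script cannot establish triangulation itself, it only records the working hypothesis for each match.

```python
# Hypothetical inputs: Match A is a known maternal relative, and the names below
# were confirmed at the testing site to triangulate with Match A on this segment.
MATCH_A_SIDE = "maternal"
TRIANGULATES_WITH_MATCH_A = {"J Burke", "K Lee"}

# Everyone in the master list who overlaps Match A's segment area (hypothetical).
OVERLAPPING_NAMES = ["J Burke", "K Lee", "M Jones", "A Smith"]


def opposite(side):
    """Return the other side of the chromosome pair."""
    return "paternal" if side == "maternal" else "maternal"


def label_sides(names, triangulated, known_side):
    """Label each overlapping match with a working hypothesis for its side."""
    labels = {}
    for name in names:
        if name in triangulated:
            labels[name] = known_side
        else:
            # Not in Match A's group: the opposing side, unless it is a false positive.
            labels[name] = opposite(known_side) + " (or false positive)"
    return labels


hypotheses = label_sides(OVERLAPPING_NAMES, TRIANGULATES_WITH_MATCH_A, MATCH_A_SIDE)
for name, side in hypotheses.items():
    print(name, "->", side)
```

Storing these labels alongside each segment in your preferred recording method means new matches in the same segment area can be placed on a side much more quickly.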


Remember:

* Every unknown match is an opportunity to find out more about your family;

* Don't ignore known matches - every known match is an opportunity to expand your pedigree; and

* Continually look for the patterns that help put the puzzle pieces together!


Refer to the Blogpost for Module 3 for links to relevant material associated with this exercise.


Veronica Williams

Originally published: 19 October 2021

Last updated:  12 July 2023