Friday, September 6, 2024

DNA Painter - Creating your chromosome map

In Module 3 of 'Combining genetic and genealogical research' we discussed ways to work from your chromosome data to maximise your chance of finding your MRCA.  The purpose of undertaking this exercise is to consolidate your understanding of the principles and processes we discussed and to prepare for our next workshop.  By examining results at all DNA sites and working between them, you will increase your knowledge of specific segment areas which may lead to more successful outcomes.  

So far in the course, we have completed a number of exercises where we identified confirmed and triangulated segments in our genome. We may have identified the shared common ancestor for some of these, or at the very least identified a side or likely ancestral couple based on others sharing in each triangulated group.

This activity aims to help you visualise the segment analysis you have undertaken so far and map your known 'confirmed' and 'triangulated group' segments.  There are many approaches to creating chromosome maps; this is just one way to help you get started.

By now you should understand the theory behind two sides to every chromosome, including how to identify triangulated segments and groups. You should also have decided upon your method for retaining details of the DNA matches you have researched, the segments they match on and the results of your analysis, to avoid rework and improve your productivity.  Practice reviewing clusters of shared matches to identify triangulated segments and complete this exercise to start to build a chromosome map.


EXERCISE 
STEP 1:
  • Create an account at DNA Painter.  A free account allows you to develop one chromosome map.
  • Choose the best candidate to map for your current research.  Ideally, map yourself, or one of your parents;
  • Familiarise yourself with DNA Painter and concepts of chromosome mapping.  Suggested resources include the DNA Painter Blog and this 'free' introductory Webinar.






EXERCISE STEP 2:
  • Decide how you are going to map your segments.  I favour mapping confirmed segments to the ancestor from whom I inherited the segment.  Others may choose to map to the parents - ie the ancestral couple.  It is a personal choice.  The instructions in this blogpost map to a specific ancestor.
  • First we are going to identify all our close cousins, up to 3rd cousins, where we have a confirmed paper trail and their total shared segments are within the predicted ranges - AND their match is on a segment data site.
  • If you have not already done so, do a 'one on one' with each of them and add the results to your match and segment data lists.
  • I would recommend starting at second cousins as they only share one of your grandparent lines and are the most useful.  
  • Personally, I do not map first cousins because I already have my mother tested, but again this is personal choice.  Any segments I share with my first cousins came from our maternal grandparents and from a mapping perspective would just be mapped to my mother, providing limited value.  I also find they complicate the map.
  • However, if you have no parents tested, first cousins can be particularly useful to help you determine the sides for your matches and should be mapped to your respective parent.
  • First cousins once removed from the older generation may be different.  I do use my mother's first cousin on my map, because their segments when mapped are like a second cousin's - only inherited from one of my grandparent lines.  
  • First cousins once removed from the younger generation provide the same information as first cousins and can only be mapped to my mother, so I don't include them for the same reason;
  • First cousins, 1C1R and children can be particularly useful for inferred mapping which is a complex technique that we will briefly touch on in Module 4.  I prefer to map these segments on a unique map for that purpose.
  • Follow the instructions on DNA Painter to map the segments of all your known cousins who meet the criteria in dot point 2;
  • You should end up with a map that looks something like this one.  It may not look as populated; that will depend on how long you have been researching and the number of matches you have identified at chromosome sites.


Example segment map



EXERCISE STEP 3:
  • Make sure you have included all the segments you identified in the Module 1 exercise - Identifying the segments of a known ancestor - GEDmatch;
  • As these are all known matches, they all should be mapped to either the maternal or paternal sides and there should be limited ancestor overlap;
  • Review the map to ensure that any overlapping segments are coming from the same ancestor, or another further back on the line of the more distant ancestral couple.  
  • All overlapping matches should triangulate if you have mapped them correctly as they are inherited from the same distant ancestor.
  • Depending on how you labelled your matches, when you expand one of the chromosomes it might look like the image below.   I label my matches to correspond with the MRCA couple and the ahnentafel number of the ancestor who gave me the segment; the match name remains constant.  In this case 007 is my maternal grandmother Mona Veronica Murphy, and segments inherited from her are labelled in dark green.  Her parents are John Murphy and Rebecca Cassidy.  I share the MRCA of Murphy-Cassidy with my 1C1R MLAS; however, because he triangulates with all my Cassidy-Sweeney cousins in the second segment area, that segment has been mapped as a Cassidy-Sweeney segment and coloured light green to reflect being inherited from my great grandmother Rebecca Cassidy.

Expanded maternal segment
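
If you keep your mapped segments in a spreadsheet as well as in DNA Painter, the overlap review in Step 3 can also be sketched in code.  The following is only a minimal Python sketch: the field names, match names and locations are hypothetical sample data, not a DNA Painter export format.

```python
from itertools import combinations
from typing import NamedTuple


class Segment(NamedTuple):
    match: str      # match name (hypothetical sample data)
    chrom: int      # chromosome number
    side: str       # "maternal" or "paternal"
    start: float    # start location
    end: float      # end location
    ancestor: str   # label of the ancestor or MRCA couple the segment is mapped to


def overlapping_pairs(segments):
    """Return pairs of segments on the same chromosome and side that overlap."""
    pairs = []
    for a, b in combinations(segments, 2):
        same_copy = a.chrom == b.chrom and a.side == b.side
        if same_copy and a.start < b.end and b.start < a.end:
            pairs.append((a, b))
    return pairs


# Hypothetical segments, for illustration only.
segments = [
    Segment("Cousin A", 7, "maternal", 0.0, 9.1, "Murphy-Cassidy"),
    Segment("Cousin B", 7, "maternal", 5.5, 12.0, "Cassidy-Sweeney"),
    Segment("Cousin C", 7, "paternal", 2.0, 8.0, "Paternal line"),
]

for a, b in overlapping_pairs(segments):
    flag = "same label" if a.ancestor == b.ancestor else "check: different labels"
    print(f"chr{a.chrom} {a.side}: {a.match} / {b.match} -> {flag}")
```

A 'different labels' flag is not necessarily an error.  It is simply a prompt to confirm that one label sits further back on the same line as the other, as in the Murphy-Cassidy and Cassidy-Sweeney example above.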



EXERCISE STEP 4:
  • Next we are going to map the 'Triangulated groups' we have already identified in each of the previous exercises.
  • You will need to have come up with a unique name for each of your groups.  In DNA Painter we will map each triangulated group as a 'New Match';
  • If we are clear where the segment is coming from, we could map it to the appropriate ancestor - eg. you share the group with a paternal second cousin, so you know the segment was inherited from your paternal grandfather - so we could map the segment to the paternal grandfather.  However this may not always be possible and you may need to allocate some groups to 'Both sides' until the MRCA is identified.
  • Review the output of the GEDmatch exercise from Module 1, 'Identifying the triangulated segments of an unknown ancestor - GEDmatch', and 'Paint all New Matches'.
  • Review the output of the My Heritage exercises from Module 2, 'Identifying triangulated segments and groups - My Heritage (downloads)', 'Identifying triangulated segments and groups - Pedigree Thief' and the My Heritage Cluster report, and 'Paint all New Matches'.
  • The example below shows how the segment map might look if you can't allocate sides.  Once you determine an MRCA you would then map the segment to the ancestor you inherited it from, ie paternal or maternal. 


Segments not yet allocated to a side


As you continue to identify triangulated segment groups add them to this map to keep track of where you are up to.  If you use GDAT the segment map will auto-populate, however DNA Painter maps are easier to share with others and there are many other ways you can utilise them.

Over time, as your map populates, it will help you identify possible ancestral lines for new matches and improve your analysis processes.   Refer to the Module 3 blogpost for other relevant material.

If you have a paid DNA Painter account you can create more than one map.  Now that downloads are available again from Family Tree DNA, why not download your matches and use the Bulk Import Feature at DNA Painter to look at all your matches on one map.


Veronica Williams

Originally published: 6th September 2024








Saturday, August 3, 2024

My Heritage - Modified exercise to identify triangulated groups using Pedigree Thief

In Module 2 of  'Organising your DNA data and determining match groups' we discussed ways to organise data and a process for working with My Heritage to identify match groups.  Before moving to Module 3, make sure you understand the theory of segment triangulation and have established a method for retaining records of your DNA research, to avoid rework and improve your productivity.

My Heritage is a DNA testing site that allows uploads from other testing companies. It's free to upload, but if you are not a My Heritage subscriber you will be required to pay a small 'one off' unlock fee to access DNA tools.  My Heritage usually has two downloads that are useful for chromosome analysis, however these are currently DISABLED.

In the 2021 and 2023 programs we completed an exercise at My Heritage to consolidate your understanding of how to determine triangulated segments, identify potential match groups and whether a match is likely to be a false positive. It is not possible to do that for the 2024 program given the reports are unavailable.  

For the 2024 program, this modified 'Module 2: Exercise 2' seeks to use My Heritage data to continue practising manual analysis of your matches, to ensure you appreciate the underlying theory of shared matches vs triangulated segment matches when identifying potential match groups.  Remember, whilst shared matches are 'CLUES' of a shared ancestor, only shared and triangulated segments are 'EVIDENCE' of a common ancestor.  Unfortunately, due to the downloads being disabled, we cannot extend the exercise to assess whether other matches are false positives, however 'Module 2: Exercise 1' using GEDmatch does attempt to cover this.

Because download data is not available from My Heritage, we are going to use 'Pedigree Thief', a tool installed as a Chrome extension, to help speed up some of the manual data collection.




My Heritage also has a clustering tool which can assist you to determine where to start your research, but this is best utilised once you understand more about triangulation.  The clusters are an indication of matches in common, however understanding segment triangulation and how triangulated groups are formed will assist you to analyse the clusters more fully, helping to identify the key matches in the cluster more likely to share a common ancestor.

The following activities are suggested to help you apply 'Module 2' in practice.  If you have access to old downloaded data from My Heritage I recommend you complete the original exercise after completing the activities outlined in this blogpost.



Setting Up Pedigree Thief

First, you will need to download Pedigree Thief from the Chrome Web Store and update your browser.  Once loaded you should see the icon symbol above on your browser adjacent to the search bar. Before using it for the first time, right-click the icon symbol and select 'Help' from the menu. This will give guidance to setting up and using the extension. For further help join the Pedigree Thief Facebook Group.

Pedigree Thief also extracts pedigrees (as the name implies) and is an extremely useful product for that purpose. We will discuss that more in Module 3.



Part A:  Using the Broad Approach by Total cMs - My Heritage www.myheritage.com

The 'DNA matches list' which can normally be downloaded from My Heritage for each DNA kit is currently disabled.  This makes it difficult to analyse your match list by chromosome.  As a result, we are going to use Pedigree Thief to download the data that we need; it will also assist in reducing the amount of typing you have to do.  We are not going to be able to extract everything required, nor is it in the perfect format.  Hopefully, downloads will return soon and this is just a short term 'semi' solution.
  1. Login to My Heritage site and navigate to your DNA Matches list.
  2. First, review at least your top 5 matches at My Heritage that are not people you have tested or close family.  We want to ensure we know where our highest matches fit in, so make sure to examine all those >100cMs.
  3. Next we are going to extract data using Pedigree Thief to start creating our master match list.  Unfortunately, until the download reports return, much of our data will need to be downloaded one match at a time and slowly added to our list.
  4. Go to the My Heritage DNA Matches Page for your kit.  If the top 5 matches you want to examine are not all in the first 10, go to the bottom of the page and expand the view to 25.
  5. The first time you access Pedigree Thief by clicking on the icon in your browser the screen shown below will appear.  Don't worry if you don't see all these fields, we just need the first section at the moment.  
  6. Leave the number of pages to read at 1, then click on 'Read Matches'.  Pedigree Thief will read the matches on the page - either 10 or 25 depending on your selected view. 
  7. Once Pedigree Thief finishes its review (screen stops changing), click again on the Pedigree Thief icon and the screen will re-appear, usually with additional fields.
  8. Click the options to save, we want 'All Matches' and 'All Segments'.  The two files should be produced and saved to your download folder.  We are only working with the 'MyHeritage Match ....'  file for this part of the exercise.
  9. Retain the 'Shared DNA file'  for later use.



Updating your sheet
  1. Go to your Downloads folder and find the 'MyHeritage Match ....' file.
  2. Duplicate the 'MyHeritage Match ....' file and rename it 'My Heritage Match List, Total Shared cMs - Kit_IDXXXX'
  3. Add 3 new columns to your spreadsheet - Side, MRCA and Notes.
  4. Your 'My Heritage Match List, Total Shared cMs - Kit_IDXXXX' spreadsheet should have populated with a number of matches.
  5. Review the matches, do you know if they are maternal or paternal (or both - ie share both sides, close relations)?  Don't worry if you are not able to allocate sides at this stage.
  6. Think about the likely relationship: how many generations back in your tree might you expect to find the MRCA? (In Module 1 we discussed the Shared cMs tool to predict relationships.  You might like to compare those estimates to those in the My Heritage prediction column.)
  7. Notate your spreadsheet with the known or likely MRCA couple if you can, this is not essential.
  8. Over time, the 'My Heritage Match List, Total Shared cMs - Kit_IDXXXX' spreadsheet will become your master list as you add data to the spreadsheet. Add any other matches you examine in this exercise to your list, to avoid future re-work.
You should end up with a sheet that looks something like the one below.




Part B:  Using Chromosome Analysis (Targeted Approach) - My Heritage www.myheritage.com

In Part B we want to examine a 'match of interest' who only shares on one segment (to make it easier to interrogate) so Joan's match #6 with 'Ian' would be suitable.  If your first 'match of interest' shares more than 1 segment, you will need to drill down further in the list to find another potentially suitable match to examine for this exercise.  

Matches sharing more than one segment will have a mix of ICW (in common with) matches who may all share on different segments.  By choosing a match who only shares one segment, all 'triangulated' ICW matches should share on the same segment.  This will make the exercise more manageable.

For this exercise we are going to utilise the 'Shared DNA file .csv' you have already downloaded, to create a master 'Shared DNA' spreadsheet.


Step 1.  Prepare the spreadsheet:
  1. Open the first 'Shared DNA file .....csv' you downloaded via Pedigree Thief and duplicate it.
  2. Next, save it as a separate sheet and give it a working title, ie 'My Heritage Shared DNA file - Kit_IDXXXX'.
  3. Add 4 new columns to your spreadsheet - Side, TG#, MRCA and Notes.
  4. You may wish to hide the 2 Match ID columns and the 2 RSID columns to make the sheet more manageable.
  5. Sort the spreadsheet by chromosome, then start and end locations.
You should end up with a sheet that looks something like the one below.
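
If you are comfortable with a little scripting, the same preparation (adding the four columns and sorting by chromosome, then start and end locations) can be sketched in Python.  This is only an illustration under assumptions: the column headings and sample rows below are hypothetical, and the headings in your own Pedigree Thief file may differ.

```python
import csv
import io

# Hypothetical sample rows; real Pedigree Thief column headings may differ.
SAMPLE = """Name,Chromosome,Start Location,End Location,Centimorgans
Match C,10,4500000,21000000,14.2
Match A,7,0,9100000,17.4
Match B,7,5500000,9000000,7.3
"""

NEW_COLUMNS = ["Side", "TG#", "MRCA", "Notes"]


def chromosome_key(value):
    """Sort numbered chromosomes numerically; put X (or other labels) last."""
    return int(value) if value.isdigit() else 99


def prepare_rows(csv_text):
    """Read rows, add empty analysis columns, sort by chromosome, start, end."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        for col in NEW_COLUMNS:
            row.setdefault(col, "")
    rows.sort(key=lambda r: (chromosome_key(r["Chromosome"]),
                             int(r["Start Location"]),
                             int(r["End Location"])))
    return rows


rows = prepare_rows(SAMPLE)
for row in rows:
    print(row["Chromosome"], row["Start Location"], row["End Location"], row["Name"])
```

Doing the same thing with the sort and insert-column functions of your spreadsheet program is just as valid; use whichever you find easier to repeat.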



NOTE:   You may prefer to use the My Heritage system for recording your overall notes and not wish to retain a separate master 'match' list, but the shared DNA file is essential for chromosome analysis.   It comes down to personal preference, over time you will work out what works best for you.


Step 2.  Extract data about our 'match of interest' from Pedigree Thief

  1. Login to My Heritage site and navigate to your DNA Matches list.  Search for the DNA match of interest and click on the purple 'Review Match List' button to open up the relatives page.
  2. Click on the Pedigree Thief icon in your browser toolbar as we did in Part A and read the match.  In this case there are 123 shared matches and in the first read, Pedigree Thief limited the search to 100 matches. 
  3. Next, you will need to click on the 'Read Match' radio button again for Pedigree Thief to analyse all the shared matches.  If there are many more shared matches you need to repeat this step until all matches are read.
  4. When the read is complete and all shared matches analysed, click the 4 radio buttons to produce files that will save to your downloads folder as we did in Part A.  Note in the completed read image below, of the 123 shared matches only 18 triangulations were identified.
  5. Add the information included in the 'MyHeritage Match ....' file to your 'My Heritage Match List, Total Shared cMs - Kit_IDXXXX' file, if it is not already there.
  6. Add the information included in the 'MyHeritage Shared DNA ....' file to your 'My Heritage Shared DNA file - Kit_IDXXXX' file, if it is not already there.
  7. Retain the 'ICW' and 'Triangulations' files for later use.

Initial Read

Completed Read



Notes about the Pedigree Thief files
  • The 'MyHeritage ICW ....'  file produces a list of all shared matches with your match but only includes the total cMs shared.  You may wish to consolidate these reports into another master file similar to the ones we created earlier, but this is optional.  Not all of these matches will share a common ancestor with you, but they can be 'clues'.  For our purposes we are more interested in 'shared segments' and 'triangulation' data as 'evidence' of a shared common ancestor. 
  • The 'MyHeritage Triangulation ....'  file produces a list of all triangulations it encountered in the read of the match and details the match IDs and segments.  It is not easily imported into our 'My Heritage Shared DNA file - Kit_IDXXXX' but is a useful reference tool. Again, you may wish to consolidate these reports into another master file similar to the ones we created earlier, but this is optional. 

Both these files are designed to be imported into GDAT (Genealogical DNA Analysis Tool).  However, match and shared DNA data for GDAT relies on the My Heritage download files that are currently disabled.  We are in a very difficult position at present.  My suggestion is to upload the individual segment data to your 'My Heritage Shared DNA file - Kit_IDXXXX' using the process outlined in Part A.


Step 3.  Review triangulated segments for our match of interest on the My Heritage site

  1. Login to My Heritage site and navigate to your DNA Matches list.  Search for the DNA match of interest and click on the purple 'Review Match List' button to open up the relatives page.
  2. On your match's DNA matches page, the number of shared matches should be the same as the numbers shown in the Pedigree Thief 'Read'.  In my case, the match of interest 'Ian' has 123 shared matches listed on the My Heritage page.
  3. We are now going to examine the 'Triangulated segments' at My Heritage for our match of interest.  Work through the match list looking for the Triangulated Segment Symbol, shown in the orange box below.  Identify each match with the 'Triangulated symbol' - because our match of interest 'Ian' only shares on one segment, all the triangulated matches should also share on the same segment.
  4. When you get to the bottom of the page, click on the purple 'Show More Matches' box to examine more matches, work in this way until the last page.

Concurrently work with your Pedigree Thief data....
  1. As you work through the match list at My Heritage, refer to the 'MyHeritage Triangulation ....' file we produced earlier that you put aside.  You might like to ensure that every match with the symbol on My Heritage is also on this list; this is optional.  
  2. You should be able to see that all matches are on the same chromosome, in Ian's case - Chromosome 7.  You may wish to add a column for the name of each match which is not included in this report for reasons outlined earlier.  The Relative ID in column 2  of the 'MyHeritage Triangulation ....' file can be searched on  (Control/Command F - using the Relative ID) in the 'MyHeritage ICW ....'  file - it appears in Column 3 and the name of the match is in Column 4.  You can also view the Relative ID when viewing the match at My Heritage in the URL. The Relative ID for the compared match is where the second 'D-' (plus the string of numbers and letters) appears in the address.
  3. Pedigree Thief identified 19 triangulated matches for Mum and Ian.  By doing a manual count at My Heritage I found that there were 18 triangulated matches with Mum and Ian, 19 in total including Ian.  As they all triangulate with Mum and Ian on the 'same segment location', they should form a 'triangulated group' and all share a 'common ancestor'.  
  4. By examining the 'MyHeritage Triangulation ....' file I can see that all matches match in the same segment area ranging from shared cMs of 17.4 down to 7.3.  You may wish to add your initial match's segment data to this file to be a complete record of matches in this potentially triangulated group.  
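
The Relative ID lookup described above (column 2 of the Triangulation file, found in column 3 of the ICW file with the name in column 4) can be sketched in Python rather than searched one ID at a time.  This is a minimal sketch only: the header names, IDs and match names below are invented sample data, and only the column positions come from the description above.

```python
import csv
import io

# Hypothetical sample data. Column positions follow the description above:
# Relative ID is column 2 of the Triangulation file and column 3 of the ICW
# file, where the match name is column 4. IDs and names are invented.
TRIANGULATION_CSV = """kit,relative_id,chromosome,start,end,cm
D-AAA,D-111,7,0.0,5.3,17.4
D-AAA,D-222,7,5.5,9.1,7.3
"""

ICW_CSV = """kit,match_id,relative_id,name,total_cm
D-AAA,D-IAN,D-111,Example Match One,45.0
D-AAA,D-IAN,D-222,Example Match Two,22.5
"""


def read_rows(text):
    """Return data rows (header skipped) as lists of strings."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    return list(reader)


def add_names(triangulation_rows, icw_rows):
    """Append the match name to each triangulation row using the Relative ID."""
    names = {row[2]: row[3] for row in icw_rows}  # ICW: column 3 -> column 4
    return [row + [names.get(row[1], "")] for row in triangulation_rows]


named = add_names(read_rows(TRIANGULATION_CSV), read_rows(ICW_CSV))
for row in named:
    print(row[1], row[-1])
```

Any Relative ID that is not found in the ICW file is left with a blank name, so gaps are easy to spot and check manually at My Heritage.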

Keeping records...
  1. My Heritage - If you know which ancestor is likely to have handed down the segment on this chromosome, add each of the triangulated matches to your ancestor group 'Label', including Ian (your match of interest).  In my case, I can tell the segment belongs to the Murphy-Bateman group; after adding the Murphy-Bateman label, each triangulated segment match shows a purple dot.
  2. Triangulated Group Data - Decide on a name for the triangulated group you hope to identify for the 'match of interest' from this exercise - it could be TG001-Side A (or whatever name you choose - it can reflect maternal/paternal if you know that information - eg M_001 or something similar).  Rename your 'MyHeritage Triangulation ....' file with your TG # - an example could be 'My Heritage Triangulation  TG1_C07_Maternal' file.
  3. You may wish to update both the notes section at My Heritage and your 'My Heritage Triangulation  TG1_C07_Maternal' file with the details of the probable MRCA group/TG name.

Reviewing the Triangulation...
  1. Return to your 'MyHeritage Triangulation ....'  renamed  'My Heritage Triangulation  TG1_C07_Maternal' file.  To determine the shared triangulated segment area, we look at the highest start location, then the lowest end location.   
  2. Sort the file by start and end locations.  You can see in this example the highest start location is 5.5 but the lowest end location is 5.3.  This indicates that there are likely to be two separate triangulated groups over the length of the match with Ian - he shares between 0 - 9.1.  The full length of the segment area could be coming from just one ancestor, but we must keep in mind for larger segments that the segment could be the mix of an ancestral couple, eg a 2MGGF and 2MGGM.  In this case I would monitor the full segment area between 0-9 as one group until further evidence emerges.
  3. Looking at your 'MyHeritage Triangulation ....'  renamed 'My Heritage Triangulation  TG1_C07_Maternal' file, if there are a very large number of triangulated matches all appearing to match on very similar start and end locations, double check that these are not in known false positive regions, or pile up areas.  The chromosome map at DNA Painter is probably the easiest place to check this.  If it is in a known false positive region, use a different match for the exercise, otherwise it may be too time consuming for what we are trying to demonstrate.  Make a note in the notes column.
  4. Your sheet should now look something like the one below.  
  5. For completeness, run reports with Pedigree Thief for each of the matches appearing in the triangulated group and save the first two reports 'save match data' and 'save chromosome data'.  Add the data contained in these files to your two master sheets we created previously if it is not already there - 'My Heritage Match List, Total Shared cMs - Kit_IDXXXX'  spreadsheet (Part A) and 'My Heritage Shared DNA file - Kit_IDXXXX' spreadsheet (Part B, Step 1).  Add the new details of the TG# number and possible MRCA (if known) to the master sheets.
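
The 'highest start, lowest end' check used in Reviewing the Triangulation can be sketched in a few lines of Python.  This is an illustration only: the start and end values are hypothetical, chosen so that the highest start (5.5) and lowest end (5.3) reproduce the example above.

```python
def common_overlap(segments):
    """Return (highest start, lowest end, True if a shared region exists)."""
    highest_start = max(start for start, _ in segments)
    lowest_end = min(end for _, end in segments)
    return highest_start, lowest_end, highest_start < lowest_end


# Hypothetical (start, end) locations for matches in the segment area.
segments = [(0.0, 5.3), (0.0, 9.1), (2.1, 8.7), (5.5, 9.1)]

start, end, shared = common_overlap(segments)
if shared:
    print(f"All matches overlap between {start} and {end}")
else:
    print(f"No single shared region (highest start {start}, lowest end {end}); "
          "there may be more than one group in this segment area")
```

When the highest start is greater than the lowest end, no single region is shared by every match, which is the pattern that suggests two separate groups along the length of the segment.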

Key icons for working at My Heritage




Notes on Labelling at My Heritage

My preference for working at My Heritage is to only add Labels on 'triangulated segments'.  It can be dangerous to mark matches as belonging to groups on the basis of shared matches alone.  My Heritage reports segments down to 6cMs, and many of these can be false or belong to a different ancestor than other larger segments, ie you may share both a recent and a distant ancestor.  In addition, My Heritage uses imputation, which can add segments to your match compared to other companies.  Both these situations can skew results if incorrectly used as the basis for labelling.

To be sure, only label triangulated segments.  Eventually other 'genetic cousins' for that line will all be labelled when analysed via their own DNA match page for the triangulated segments they share.


Part 4.  Review the other side of the chromosome 

Normally, we would examine matches in the same segment area who don't triangulate with our match of interest, to determine if they belong to a 'Triangulated group (TG)' on the other side of the chromosome.  If your match of interest is maternal, those in TGs on the other side of the chromosome, in the same segment location, who don't match the maternal group, must be paternal.  Those in the same segment location who do not match either the maternal or paternal TGs are likely false positive matches and should be discarded. 

As full segment match list downloads have been TEMPORARILY DISABLED by My Heritage (since September 2023!) we cannot complete this step at this stage.  We need to rely on amassing segment data in our master list to eventually be able to achieve this.
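
Once enough segment data has accumulated in your master list, the sorting logic described in this part could be sketched as follows.  This is a hypothetical Python illustration only: the match names and group memberships are invented, and it assumes you have already established which matches triangulate in each group.

```python
def classify_matches(overlapping, maternal_tg, paternal_tg):
    """Label each match in the segment area by the triangulated group it is in."""
    labels = {}
    for name in overlapping:
        if name in maternal_tg:
            labels[name] = "maternal"
        elif name in paternal_tg:
            labels[name] = "paternal"
        else:
            # In neither group: a candidate false positive to review.
            labels[name] = "possible false positive"
    return labels


# Hypothetical matches that all overlap the same segment location.
overlapping = ["Match A", "Match B", "Match C", "Match D"]
maternal_tg = {"Match A", "Match B"}
paternal_tg = {"Match C"}

for name, label in classify_matches(overlapping, maternal_tg, paternal_tg).items():
    print(f"{name}: {label}")
```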

Part 5. Is the whole group triangulated?

You can now go back to the 'triangulated match tool' on the My Heritage site and check to see if all matches in the group you have identified triangulate with each other.  To do this:

  • Navigate to the DNA match page with your shared 'match of interest'.
  • Click on the first TG icon symbol - this will open up the Chromosome Browser Triangulation Tool;
  • 'Add and Remove' matches using the radio button on the top right of the page.
  • Only a maximum of 7 matches can be viewed at one time.

If the first segment was quite long, you may find the triangulations are in subgroups, like in the diagram below.  You will need to play around with your match comparisons to identify the subgroups, but eventually you might see this sort of pattern.  (NOTE:  This example is on Chromosome 10, a different group to the one shared with Ian).



This is demonstrating that whilst everyone is triangulating with 'Match A' (red), they are coming into the group at different levels.  You and 'Match A' share the longest segment, so 'Match A' is probably a closer relation to you and your shared MRCA couple.  The other matches may reflect matches to the same MRCA couple or perhaps a segment belonging to an older ancestor from one of the ancestors in that MRCA ancestral couple.  

If you can identify the MRCA couple for Match A, others in each of the subgroups could be segments coming from any of the 4 parents of that couple, depending upon whether there was a recombination event in the segment area for Match A.  

For example if Match A (red) was a 2nd cousin, they share your great grandparents.  The subgroups could belong to either of those, or potentially different 2nd great grandparents depending on where recombination events occurred.  Whilst I would initially call this one triangulated group for the entire length of Match A, if you identify the subgroups as belonging to more distant ancestors, you may wish to rename your subgroups as separate TG's.

Read more about reviewing your DNA matches at My Heritage.


Next Steps - the quick way!

After you have mastered identifying your triangulated groups, access the 'My Heritage Auto Clusters' tool under the DNA tools menu.  Click 'Explore', then 'Generate' after selecting the kit of interest. Your report will be emailed to you.  When reviewing the report remember that these are shared match clusters and will need to be examined carefully to identify triangulated groups.  
  • Check to see if your 'match of interest' or anyone else who appears in your triangulated group appears in the cluster report;
  • Who else is in the cluster?
  • Which matches in the cluster are triangulated and who is only a shared match?
  • Explore some new groups (optional);
  • Don't forget to add any additional analysis to your two master spreadsheet lists 'My Heritage Match List, Total Shared cMs - Kit_IDXXXX'  spreadsheet (Part A) and 'My Heritage Shared DNA file - Kit_IDXXXX' spreadsheet (Part B, Step 1) for future reference.

Unfortunately, the defaults for cluster reports are predetermined by My Heritage and cannot be manipulated.  If no one in your identified TG appears in the cluster report, pick an alternative cluster of interest and apply what you have learnt from the exercise by examining matches in that cluster.


Need more challenges?

Have you identified matches at My Heritage with shared common ancestors on your mystery line?  See if you can identify any triangulated groups on the segments shared with them using the techniques in this exercise.


Veronica Williams
First Published: 3 August 2024

GEDmatch - Exercise to identify triangulated groups and false positive matches.

In Module 2 of  'Organising your DNA data and determining match groups' we discussed ways to organise data and a process for working with My Heritage to identify match groups.  Before moving to Module 3, make sure you understand the theory of segment triangulation and have established a method for retaining records of your DNA research, to avoid rework and improve your productivity.

The purpose of undertaking these exercises is to consolidate your understanding of how to determine triangulated segments, identify potential match groups and whether a match is likely to be a false positive.

In the 2021 and 2023 programs we completed this exercise at My Heritage.  Due to the temporary suspension of downloads at many of the DNA sites, the data cannot be sourced for the exercise for the 2024 group.

This modified exercise seeks to use GEDmatch data to practice analysing your matches manually to ensure you appreciate the underlying theory.  Remember, whilst shared matches are 'CLUES' of a shared ancestor, only shared and triangulated segments are 'EVIDENCE' of a common ancestor.



1.  Identifying segments of interest

For this exercise, we are continuing to use the GEDmatch data we collected on our known ancestor match in Module 1, see the previous blogpost.

In my example I identified 3 'segments of interest' with my 'known match', who shared the MRCA of my problematic 2GGPs James Murphy and Elizabeth Bateman.   In the previous exercise, if you had other matches who shared segments going back to the target 'ancestral couple', you were advised to repeat the process and also add their shared segments to your spreadsheet.  In my case there were 5 identified matches, but only 4 were on GEDmatch.  After doing 'one on ones' at GEDmatch between the additional three matches and my mother I identified a total of 14 additional segments of interest.


The remaining 'known' match was on My Heritage and shared 3 different segments, however her data has been excluded for the moment.


2.  Identifying matches in the same specific segment locations

Step 1:  First we need to access Tier 1 Tools at GEDmatch, which requires a subscription.   We are going to use the 'Segment Search' report.  This report provides a list of all matches by 'chromosome' and shared 'segment location'.  Run the report using the existing defaults and download the .csv file to your computer.

  • Open the 'Segment Search report'.csv file you downloaded from GEDmatch.
  • Save it as a separate sheet and give it a working title, i.e. 'Chromosome Analysis - Kit ID GEDmatch'. We will be amending this version, but we also want to retain the original 'Segment Search report'.
  • Now we are going to find the areas we identified as 'segments of interest' in Part 1.
  • Choose one of your largest segments from the list in Part 1 and find the matching segment location area in the newly created 'Chromosome Analysis - Kit ID GEDmatch' spreadsheet.
  • Identify all matches in your list, that match anywhere within the shared segment area of the selected segment of interest.
  • Highlight all the matches within the identified segment area.


In the following example I am using the segment on Chromosome 4 that is around 50cMs.  This segment starts at 162.9 and ends at 191.1. 
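If you prefer to check your highlighting with a script, the 'does this match fall anywhere within the segment area' test can be sketched in a few lines of Python. This is only an illustration: the rows are made up, the function names are my own, and the column headings are assumed to follow the ones used in this exercise (your download may label them differently). Two segments overlap when each one starts before the other one ends.

```python
def overlaps(start_a, end_a, start_b, end_b):
    """True when two segments on the same chromosome share any position."""
    return start_a <= end_b and start_b <= end_a


def matches_in_segment(rows, chromosome, start, end):
    """Return the rows whose segment overlaps the segment of interest."""
    return [
        row for row in rows
        if row["Chromosome"] == chromosome
        and overlaps(row["Start Location"], row["End Location"], start, end)
    ]


# Made-up rows for illustration only; positions use the same units as the report.
rows = [
    {"Match Name 2": "Match A", "Chromosome": 4, "Start Location": 160.0, "End Location": 175.5},
    {"Match Name 2": "Match B", "Chromosome": 4, "Start Location": 120.0, "End Location": 150.0},
    {"Match Name 2": "Match C", "Chromosome": 16, "Start Location": 165.0, "End Location": 180.0},
    {"Match Name 2": "Match D", "Chromosome": 4, "Start Location": 185.0, "End Location": 191.0},
]

# Segment of interest from the example: Chromosome 4, 162.9 to 191.1.
highlighted = matches_in_segment(rows, chromosome=4, start=162.9, end=191.1)
names = [row["Match Name 2"] for row in highlighted]
print(names)
```

Match B is left out because it ends before the segment of interest starts, and Match C is on a different chromosome.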

Step  2:  We will now update the spreadsheet to be able to add additional information as we examine each of the matches throughout the analysis process.

  • In your newly created 'Chromosome Analysis - Kit ID GEDmatch' sheet, the columns we will be using are Match Name 1, Match Name 2, Chromosome, Start Location, End Location and Centimorgans.  
  • Delete or hide any unnecessary columns for manageability (optional).  
  • You may also wish to format the Centimorgans column to display numbers to one decimal place so that it is easier to read (e.g. if 168 is the default number, re-format the column so it appears as 168.0). 
  • Freeze the header row so that you can easily see and sort each column heading later.
  • In your newly created 'Chromosome Analysis - Kit ID GEDmatch' sheet, add columns for side (eg P, M, Both, I/F), TG (Triangulated Group), MRCA (Most Recent Common Ancestor) and notes
  • This spreadsheet will now become your 'master sheet' for GEDmatch data for the relevant Kit_ID.
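The same set-up can be done with a short script instead of by hand. The sketch below is one possible approach, not a required part of the exercise: it reads a small made-up extract (assumed to use the column headings listed above), shows centimorgans to one decimal place, and adds empty Side, TG, MRCA and Notes columns ready to fill in.

```python
import csv
import io

# Hypothetical extract of a downloaded report; real headings and values may differ.
raw = io.StringIO(
    "Match Name 1,Match Name 2,Chromosome,Start Location,End Location,Centimorgans\n"
    "My kit,Match A,4,160.0,175.5,18\n"
    "My kit,Match D,4,185.0,191.0,9.3\n"
)

# Columns added for the analysis: side, triangulated group, MRCA and notes.
EXTRA_COLUMNS = ["Side", "TG", "MRCA", "Notes"]

reader = csv.DictReader(raw)
master_rows = []
for row in reader:
    # Show centimorgans to one decimal place, so 18 becomes 18.0.
    row["Centimorgans"] = f"{float(row['Centimorgans']):.1f}"
    for column in EXTRA_COLUMNS:
        row[column] = ""  # left blank until each match has been examined
    master_rows.append(row)

# Write the 'master sheet' with the original columns followed by the new ones.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=reader.fieldnames + EXTRA_COLUMNS)
writer.writeheader()
writer.writerows(master_rows)
print(out.getvalue())
```

To work with a real file, replace the two `io.StringIO` objects with files opened using `open(..., newline="")`, and keep the original download untouched.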


3.  Identifying triangulated groups within the identified segment location

We are now going to review each match in the identified segment area to determine sides and to identify any triangulated segments, on either side of the chromosome.  

  • Firstly, review the data collected from Exercise 3 in Module 1 and transfer the information to the newly created 'Chromosome Analysis - Kit ID GEDmatch' sheet.
  • For matches who 'triangulate' with your 'known' match - allocate them to the same side of the family.
  • Groups of matches who all triangulated with each other should be allocated a TG number - develop and maintain a system that works for you.  As an example, you could number them as M###, P###, and Side 1_###, Side 2_### if you have not yet identified a side.  In this exercise we know the side so it will be either M### or P###.
  • If you know the MRCA (Most Recent Common Ancestor), note it in the MRCA column.
  • Add any notes you think may be useful in the future as you explore matches.
  • You may find other close matches from your family in this list, as I did.  In my case, surprisingly few matches appeared for such a large segment area.  Each segment area will be different, for every kit.  You may have a few or many matches within the segment area.  If you have a large number, restrict the list to matches above, say, 15cMs to make it more manageable for the exercise.
  • Your sheet may now look something like this.




4.  Identifying triangulated groups on the opposing side and false segments

The last part of the exercise is to use the information we have found about our triangulated group to inform us about the 'same segment area' on the 'other side of the chromosome'.  We will next review all the remaining matches who have not been marked to see if we can find matches on the other side of the chromosome and identify potentially false segment matches.

  • Firstly check that the match DOES NOT share the segment area with matches identified as belonging to the same side of the 'known' match.  Be sure to check the segment locations carefully, particularly if you are examining a long segment like this one as there can potentially be multiple triangulated segment areas along the length of the chromosome.
  • Then run 'people who match 2 kits' and see if there are any shared and triangulated matches on the other side of the chromosome.  If so, in this case they would be marked as 'paternal' and allocated a PXXX TG number.
  • If there are still matches appearing in the list that do not belong to either the 'maternal' or 'paternal' triangulated groups, then they must be 'false positive' segments, also known as 'IBS (identical by state)'.  Update your sheet accordingly.
  • Remember we can only mark a segment as false when there are triangulated groups in the specific opposing segment locations - on both sides of the chromosome.
  • If there is no TG identified on the opposing side, the default position would be to allocate all the remaining matches to the opposite side from your known match.  Once a TG is identified in this area, the matches should be re-examined and allocated to Maternal, Paternal or False.  Marking segments on both sides as you go assists with productivity and reduces duplicated effort down the track.  
  • Being systematic and methodical will help you in the analysis process. It may seem slow and tedious to start but you will benefit over time.
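The decision rules above can be written out as a small function, which may help you check your own reasoning. This is a sketch based on my reading of the steps, with made-up match names and a function name of my own choosing. It assumes the known match is maternal, so the opposing side is paternal, and that you have already worked out by hand which matches triangulate with which group.

```python
def allocate_side(in_known_tg, in_opposing_tg, opposing_tg_exists,
                  known_side="M", opposing_side="P"):
    """Suggest a value for the Side column for one match in the segment area."""
    if in_known_tg:
        # Triangulates with the known match: same side of the family.
        return known_side
    if in_opposing_tg:
        # Triangulates with the group found on the other side.
        return opposing_side
    if opposing_tg_exists:
        # TGs exist on both sides and the match fits neither: false positive (IBS).
        return "False"
    # No opposing TG yet: default to the other side and re-examine later.
    return opposing_side + " (provisional)"


# Made-up matches that did not triangulate with the known (maternal) match.
remaining = [
    {"name": "Match E", "in_known_tg": False, "in_opposing_tg": True},
    {"name": "Match F", "in_known_tg": False, "in_opposing_tg": False},
]

# An opposing TG exists as soon as any remaining match belongs to one.
opposing_tg_exists = any(match["in_opposing_tg"] for match in remaining)

for match in remaining:
    match["Side"] = allocate_side(
        match["in_known_tg"], match["in_opposing_tg"], opposing_tg_exists
    )
    print(match["name"], match["Side"])
```

Here Match E is marked paternal, and Match F is marked false because triangulated groups now exist on both sides and it belongs to neither.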



Next steps:

  • The next step in the process is to interrogate each of the matches in the 'triangulated group' (TG) and to aim to push back the segment a further generation by finding a more distant MRCA.  Everyone in the segment 'triangulated group' will share a common ancestor.  The segment could go back many generations, or the match could fit in even closer to you than the identified MRCA, but still shares the more distant MRCA with others.  Our goal is to push back one generation at a time and 'walk back the segment'.
  • Even if you cannot identify more MRCAs at this stage, add notes about names and locations that might be useful for further analysis down the track.
  • To fully explore all the clues for your ancestor of interest, you should repeat this exercise for all the identified 'segments of interest' identified in Section 1.  Hopefully, more clues will emerge with common surnames or locations between groups.  
  • It is not necessary to do all these segments for the purposes of Module 2, just do enough so you are confident with determining TG's on both sides and knowing when it is appropriate to mark some segments as 'false positives'.
  • If you are feeling particularly keen to do more and you have other close family tested, repeat all the steps for the 'known' match with each of them, starting with Exercise 3 in Module 1.  Make sure you consider where the tester sits in the tree and the implications for 'DNA inheritance' when deciding who to examine.  Consider such things as parent/child relationships: if the match is on the maternal side and the mother has tested, there would be no need to examine the child or other descendants, as the relevant DNA would already have been identified from their mother.  After doing this for my kits for the descendants of the Murphy-Bateman 'ancestral couple', my list of 'segment areas of interest' increased from 14 to 32!

Veronica Williams

First published: 3 August 2024





Wednesday, July 10, 2024

GEDmatch - Exercise to identify segments of a known ancestor and push back further generations

In Module 1 of  'Understanding DNA Basics' we discussed the key concepts and theories relating to DNA analysis for genealogy.  Before moving to Module 2, make sure you understand the theory of segment triangulation.  By now you should have completed the exercises to understand the difference between 'shared matches', 'shared segments' and 'triangulated segments'.



This post aims to prepare you for examining the 'shared segments' of a known match, and to drill down to identify more distant connections and push the segment further back in your pedigree.

The goal for Module 1 is to examine all the 'shared matches' between you and your known match and to identify those who also share 'triangulated segments' with you and your known match.  That is, all three of you match each other on the same chromosome, in the same segment location.


EXERCISE STEP 1:

* First, identify a close match on the line that relates to the one you need to investigate to achieve your identified DNA goal.  We will call this match - Target 1;

* If you have a number of relevant kits, use the oldest person who is likely to have the most DNA inherited from the target ancestor;

* Do a 'one to one' (position only) between yourself (or the oldest ancestor on your identified line to the target ancestor) and Target 1.  The result should look something like this:




* In my case this match is a known 2C1R to my mother.  Their shared ancestors are James Murphy and Elizabeth Bateman, who are great grandparents to my Mum and 2nd great grandparents to me.  The paper trail is strong, but we have not yet identified any shared ancestors back another generation, so this couple can only be considered DNA confirmed to the 'ancestral couple'.

* By interrogating these segment matches we hope to be able to push back the segment to a more distant ancestor - ie. 'walk back the chromosome';

* Update all the data you collect from this exercise to the main spreadsheet.  If you have other matches who share segments going back to the target ancestral couple, repeat this process and add their segments to your spreadsheet.

EXERCISE STEP 2:

* Do a 'People who match both kits, or 1 of 2 kits' between yourself (or the oldest ancestor on your identified line to the target ancestor) and Target 1. 

* Work through the list as we did in the previous post, to identify matches who share on the same segment location for each chromosome.  In my example this would be Chromosomes 4, 16 and 19.


EXERCISE STEP 3:

* For those matches who triangulate with yourself (or the oldest ancestor on your identified line to the target ancestor) and Target 1 on each chromosome, they each form specific 'triangulated segments'.

* You may have a number of matches who appear to match yourself (or the oldest ancestor on your identified line to the target ancestor) and Target 1.  For these matches we now need to do 'one on ones' to ensure that every match matches everyone else who matches in the same 'segment area'.

* If everyone matches each other, they form what is called a 'triangulated group'.

* Identify the number of matches for each potential 'triangulated group'.  For me, I should have at least three groups, one on each of chromosomes 4, 16 and 18.  If your segments are long, there may be more than one triangulated group along the length of the segment location on the chromosome where you and Target 1 match.
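The 'everyone matches everyone else' test can be illustrated with a short Python sketch. It is only a way of picturing the logic, not a replacement for doing the 'one on ones' yourself: the names and the function are made up, and the set of confirmed pairs is assumed to hold every pair you have already checked and found to match in the same segment area.

```python
from itertools import combinations


def is_triangulated_group(members, confirmed_pairs):
    """True when every pair of members has a confirmed match in the segment area."""
    return all(
        frozenset(pair) in confirmed_pairs
        for pair in combinations(members, 2)
    )


# Made-up 'one on one' results: each entry is a pair found to share the segment.
confirmed_pairs = {
    frozenset(("Me", "Target 1")),
    frozenset(("Me", "Match A")),
    frozenset(("Target 1", "Match A")),
    frozenset(("Me", "Match B")),
}

print(is_triangulated_group(["Me", "Target 1", "Match A"], confirmed_pairs))
print(is_triangulated_group(["Me", "Target 1", "Match A", "Match B"], confirmed_pairs))
```

The first group triangulates because all three pairs are confirmed. Adding Match B breaks the group, because Match B has only been shown to match me and not Target 1 or Match A.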

REMEMBER:  We are doing these exercises manually, which is the long way, so that you understand the analysis process.  In later modules we will discuss quicker ways to get the same information.


EXERCISE STEP 4:

* Generally, the next step in the process is to interrogate each of the matches in the 'triangulated group' (TG) to find an MRCA and to push back the segment a further generation.  Everyone in the segment 'triangulated group' will share a common ancestor.  The segment could go back many generations, or the match could fit in even closer to you than the identified MRCA, but still shares the more distant MRCA with others. Our goal is to push back one generation at a time and 'walk back the segment'.

* For Module 1, there is no need to undertake more steps, or interrogate the group.   However, for your own research you may wish to pursue your TG to try and find others in the group who share a known MRCA.

* We will be discussing triangulated groups in Modules 2 and 3.

* We will return to these matches later in the course.


Veronica Williams
First published 13 Jul 2024
Last updated 18 Jul 2024