Saturday, August 3, 2024

My Heritage - Modified exercise to identify triangulated groups using Pedigree Thief

In Module 2 of  'Organising your DNA data and determining match groups' we discussed ways to organise data and a process for working with My Heritage to identify match groups.  Before moving to Module 3, make sure you understand the theory of segment triangulation and have established a method for retaining records of your DNA research, to avoid rework and improve your productivity.

My Heritage is a DNA testing site that allows uploads from other testing companies. It's free to upload, but if you are not a My Heritage subscriber you will be required to pay a small 'one off' unlock fee to access DNA tools.  My Heritage usually has two downloads that are useful for chromosome analysis; however, these are currently DISABLED.

In the 2021 and 2023 programs we completed an exercise at My Heritage to consolidate your understanding of how to determine triangulated segments, identify potential match groups and whether a match is likely to be a false positive. It is not possible to do that for the 2024 program given the reports are unavailable.  

For the 2024 program, this modified 'Module 2: Exercise 2' uses My Heritage data so that you can continue practising analysing your matches manually, to ensure you appreciate the underlying theory of shared matches vs triangulated segment matches when identifying potential match groups.  Remember, whilst shared matches are 'CLUES' of a shared ancestor, only shared and triangulated segments are 'EVIDENCE' of a common ancestor.  Unfortunately, because the downloads are disabled, we cannot extend the exercise to assess whether other matches are false positives; however, 'Module 2: Exercise 1' using GEDmatch does attempt to cover this.

Because download data is not available from My Heritage, we are going to use 'Pedigree Thief', a tool installed as a Chrome extension, to help speed up some of the manual data collection.




My Heritage also has a clustering tool which can assist you to determine where to start your research, but this is best utilised once you understand more about triangulation.  The clusters are an indication of matches in common, however understanding segment triangulation and how triangulated groups are formed will assist you to analyse the clusters more fully, helping to identify the key matches in the cluster more likely to share a common ancestor.

The following activities are suggested to help you apply 'Module 2' in practice.  If you have access to old downloaded data from My Heritage I recommend you complete the original exercise after completing the activities outlined in this blogpost.



Setting Up Pedigree Thief

First, you will need to download Pedigree Thief from the Chrome Web Store and update your browser.  Once loaded you should see the icon symbol above on your browser adjacent to the search bar. Before using it for the first time, right-click the icon symbol and select 'Help' from the menu. This will give guidance to setting up and using the extension. For further help join the Pedigree Thief Facebook Group.

Pedigree Thief also extracts pedigrees (as the name implies) and is an extremely useful product for that purpose. We will discuss that more in Module 3.



Part A:  Using the Broad Approach by Total cMs - My Heritage www.myheritage.com

The 'DNA matches list' which can normally be downloaded from My Heritage for each DNA kit is currently disabled.  This makes it difficult to analyse your match list by chromosome.  As a result, we are going to use Pedigree Thief to download the data that we need; it will also assist in reducing the amount of typing you have to do.  We are not going to be able to extract everything required, nor is it in the perfect format.  Hopefully, downloads will return soon and this is just a short term 'semi' solution.
  1. Login to My Heritage site and navigate to your DNA Matches list.
  2. First, review at least your top 5 matches at My Heritage that are not people you have tested or close family.  We want to ensure we know where our highest matches fit in, so make sure to examine all those >100cMs.
  3. Next we are going to extract data using Pedigree Thief to start creating our master match list.  Unfortunately, until the download reports return, much of our data will need to be downloaded one match at a time and slowly added to our list.
  4. Go to the My Heritage DNA Matches Page for your kit.  If the top 5 matches you want to examine are not all in the first 10, go to the bottom of the page and expand the view to 25.
  5. The first time you access Pedigree Thief by clicking on the icon in your browser the screen shown below will appear.  Don't worry if you don't see all these fields, we just need the first section at the moment.  
  6. Leave the number of pages to read at 1, then click on 'Read Matches'.  Pedigree Thief will read the matches on the page - either 10 or 25 depending on your selected view. 
  7. Once Pedigree Thief finishes its review (screen stops changing), click again on the Pedigree Thief icon and the screen will re-appear, usually with additional fields.
  8. Click the options to save, we want 'All Matches' and 'All Segments'.  The two files should be produced and saved to your download folder.  We are only working with the 'MyHeritage Match ....'  file for this part of the exercise.
  9. Retain the 'Shared DNA file'  for later use.
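Once the match file is saved, the '>100cMs' check in step 2 can also be done with a few lines of code rather than by eye. The following is only a minimal Python sketch: the column names `Name` and `Total cM` are assumptions, not the headers Pedigree Thief actually writes, and the sample rows are invented, so adjust both to your own file.

```python
import csv
import io

# Hypothetical sample standing in for the 'MyHeritage Match ....' file.
# The column names "Name" and "Total cM" are assumptions, not the real headers.
SAMPLE = """Name,Total cM
Match One,412.6
Match Two,98.2
Match Three,135.0
Match Four,45.9
"""

def matches_over(rows, threshold_cm=100.0, name_col="Name", cm_col="Total cM"):
    """Return (name, cM) pairs above the threshold, largest first."""
    picked = [
        (row[name_col], float(row[cm_col]))
        for row in rows
        if float(row[cm_col]) > threshold_cm
    ]
    return sorted(picked, key=lambda pair: pair[1], reverse=True)

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for name, cm in matches_over(rows):
    print(f"{name}: {cm} cM")
```

To run this against a real download, replace `io.StringIO(SAMPLE)` with an opened file, for example `open(path, newline="")`, where `path` points at your saved copy.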



Updating your sheet
  1. Go to your Downloads folder and find the 'MyHeritage Match ....' file.
  2. Duplicate the 'MyHeritage Match ....' file and rename it 'My Heritage Match List, Total Shared cMs - Kit_IDXXXX'
  3. Add 3 new columns to your spreadsheet - Side, MRCA and Notes.
  4. Your 'My Heritage Match List, Total Shared cMs - Kit_IDXXXX' spreadsheet should have populated with a number of matches.
  5. Review the matches, do you know if they are maternal or paternal (or both - ie share both sides, close relations)?  Don't worry if you are not able to allocate sides at this stage.
  6. Think about the likely relationship: how many generations back in your tree might you expect to find the MRCA? (In Module 1 we discussed the Shared cMs tool to predict relationships.  You might like to compare those estimates to those in the My Heritage prediction column.)
  7. Notate your spreadsheet with the known or likely MRCA couple if you can, this is not essential.
  8. Over time, the 'My Heritage Match List, Total Shared cMs - Kit_IDXXXX' spreadsheet will become your master list as you add data to the spreadsheet. Add any other matches you examine in this exercise to your list, to avoid future re-work.
You should end up with a sheet that looks something like the one below.
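If you would rather script step 3 than add the columns by hand, a minimal Python sketch follows. It assumes the match file is a plain CSV; the two sample columns are invented for illustration and the real file will contain more.

```python
import csv
import io

# Hypothetical two-row sample; the real file will have more columns.
SAMPLE = "Name,Total cM\nMatch One,412.6\nMatch Two,98.2\n"

# The three working columns named in step 3.
EXTRA_COLUMNS = ["Side", "MRCA", "Notes"]

def add_blank_columns(rows, fieldnames, extra=EXTRA_COLUMNS):
    """Append any missing extra columns, leaving existing data untouched."""
    new_fields = list(fieldnames) + [c for c in extra if c not in fieldnames]
    new_rows = [{**dict.fromkeys(new_fields, ""), **row} for row in rows]
    return new_rows, new_fields

reader = csv.DictReader(io.StringIO(SAMPLE))
rows, fields = add_blank_columns(list(reader), reader.fieldnames)

# Write the result back out as CSV text (here to memory, for demonstration).
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=fields)
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```

Running it a second time on the output adds nothing further, because columns that already exist are skipped.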




Part B:  Using Chromosome Analysis (Targeted Approach) - My Heritage www.myheritage.com

In Part B we want to examine a 'match of interest' who only shares on one segment (to make it easier to interrogate) so Joan's match #6 with 'Ian' would be suitable.  If your first 'match of interest' shares more than 1 segment, you will need to drill down further in the list to find another potentially suitable match to examine for this exercise.  

Matches sharing more than one segment will have a mix of ICW (in common with) matches who may all share on different segments.  By choosing a match who only shares one segment, all 'triangulated' ICW matches should share on the same segment.  This will make the exercise more manageable.

For this exercise we are going to utilise the 'Shared DNA file .csv' you have already downloaded, to create a master 'Shared DNA' spreadsheet.


Step 1.  Prepare the spreadsheet:
  1. Open the first 'Shared DNA file .....csv' you downloaded via Pedigree Thief and duplicate it.
  2. Next, save it as a separate sheet and give it a working title, ie 'My Heritage Shared DNA file - Kit_IDXXXX'.
  3. Add 4 new columns to your spreadsheet - Side, TG#, MRCA and Notes.
  4. You may wish to hide the 2 Match ID columns and the 2 RSID columns to make the sheet more manageable.
  5. Sort the spreadsheet by chromosome, then start and end locations.
You should end up with a sheet that looks something like the one below.
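One thing to watch in step 5: if the chromosome column is treated as text, a spreadsheet may sort '10' ahead of '2'. The Python sketch below shows one way to sort numerically with X placed last. The column names `Chromosome`, `Start` and `End` are assumptions about the file layout, and the rows are invented.

```python
import csv
import io

# Hypothetical sample; "Chromosome", "Start" and "End" are assumed column names.
SAMPLE = """Name,Chromosome,Start,End
Match A,10,5000000,9000000
Match B,2,1000000,4000000
Match C,7,3000000,8000000
Match D,7,100000,9100000
Match E,X,200000,700000
"""

def chromosome_key(label):
    """Order numbered chromosomes numerically and place X (or other text) after them."""
    text = label.strip().upper()
    return (0, int(text)) if text.isdigit() else (1, text)

def sort_segments(rows):
    """Sort by chromosome, then start location, then end location."""
    return sorted(
        rows,
        key=lambda r: (chromosome_key(r["Chromosome"]), int(r["Start"]), int(r["End"])),
    )

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for row in sort_segments(rows):
    print(row["Chromosome"], row["Start"], row["End"], row["Name"])
```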



NOTE:   You may prefer to use the My Heritage system for recording your overall notes and not wish to retain a separate master 'match' list, but the shared DNA file is essential for chromosome analysis.   It comes down to personal preference, over time you will work out what works best for you.


Step 2.  Extract data about our 'match of interest' from Pedigree Thief

  1. Login to My Heritage site and navigate to your DNA Matches list.  Search for the DNA match of interest and click on the purple 'Review Match List' button to open up the relatives page.
  2. Click on the Pedigree Thief icon in your browser toolbar as we did in Part A and read the match.  In this case there are 123 shared matches and in the first read, Pedigree Thief limited the search to 100 matches. 
  3. Next, you will need to click on the 'Read Match' radio button again for Pedigree Thief to analyse all the shared matches.  If there are many more shared matches you need to repeat this step until all matches are read.
  4. When the read is complete and all shared matches analysed, click the 4 radio buttons to produce files that will save to your downloads folder as we did in Part A.  Note in the completed read image below, of the 123 shared matches only 18 triangulations were identified.
  5. Add the information included in the 'MyHeritage Match ....' file to your 'My Heritage Match List, Total Shared cMs - Kit_IDXXXX' file, if it is not already there.
  6. Add the information included in the 'MyHeritage Shared DNA ....' file to your 'My Heritage Shared DNA file - Kit_IDXXXX' file, if it is not already there.
  7. Retain the 'ICW' and 'Triangulations' files for later use.
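Steps 5 and 6 both come down to 'add the new rows unless they are already in the master sheet'. A minimal Python sketch of that idea is below. The `Match ID` column used as the key is an assumption, and apart from Ian the names are invented; for the shared DNA file you would probably need a combined key such as match ID plus chromosome, start and end.

```python
def merge_new_rows(master, new_rows, key_cols):
    """Return master plus any new rows whose key is not already present."""
    seen = {tuple(row[c] for c in key_cols) for row in master}
    merged = list(master)
    for row in new_rows:
        key = tuple(row[c] for c in key_cols)
        if key not in seen:
            merged.append(row)
            seen.add(key)
    return merged

# Hypothetical rows; "Match ID" is an assumed column name.
MASTER = [
    {"Match ID": "M1", "Name": "Ian", "Total cM": "17.4"},
    {"Match ID": "M2", "Name": "Mary", "Total cM": "12.0"},
]
NEW = [
    {"Match ID": "M2", "Name": "Mary", "Total cM": "12.0"},  # already in master
    {"Match ID": "M3", "Name": "Tom", "Total cM": "9.8"},    # genuinely new
]

merged = merge_new_rows(MASTER, NEW, key_cols=["Match ID"])
print([row["Name"] for row in merged])
```

The original master list is left unchanged, so you can inspect the merged result before saving over anything.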

Initial Read

Completed Read



Notes about the Pedigree Thief files
  • The 'MyHeritage ICW ....'  file produces a list of all shared matches with your match but only includes the total cMs shared.  You may wish to consolidate these reports into another master file similar to the ones we created earlier, but this is optional.  Not all of these matches will share a common ancestor with you, but they can be 'clues'.  For our purposes we are more interested in 'shared segments' and 'triangulation' data as 'evidence' of a shared common ancestor. 
  • The 'MyHeritage Triangulation ....'  file produces a list of all triangulations it encountered in the read of the match and details the match IDs and segments.  It is not easily imported into our 'My Heritage Shared DNA file - Kit_IDXXXX' but is a useful reference tool. Again, you may wish to consolidate these reports into another master file similar to the ones we created earlier, but this is optional. 

Both these files are designed to be imported into GDAT (Genealogical DNA Analysis Tool).  However, match and shared DNA data for GDAT relies on the My Heritage download files that are currently disabled.  We are in a very difficult position at present.  My suggestion is to add the individual segment data to your 'My Heritage Shared DNA file - Kit_IDXXXX' using the process outlined in Part A.


Step 3.  Review triangulated segments for our match of interest on the My Heritage site

  1. Login to My Heritage site and navigate to your DNA Matches list.  Search for the DNA match of interest and click on the purple 'Review Match List' button to open up the relatives page.
  2. On your match's DNA matches page, the number of shared matches should be the same as the numbers shown in the Pedigree Thief 'Read'.  In my case, the match of interest 'Ian' has 123 shared matches listed on the My Heritage page.
  3. We are now going to examine the 'Triangulated segments' at My Heritage for our match of interest.  Work through the match list looking for the Triangulated Segment Symbol, shown in the orange box below.  Identify each match with the 'Triangulated symbol' - because our match of interest 'Ian' only shares on one segment, all the triangulated matches should also share on the same segment.
  4. When you get to the bottom of the page, click on the purple 'Show More Matches' box to examine more matches, work in this way until the last page.

Concurrently work with your Pedigree Thief data....
  1. As you work through the match list at My Heritage, refer to the 'MyHeritage Triangulation ....'  file we produced earlier that you put aside.  You might like to ensure that every match with the symbol on My Heritage is also on this list; this is optional.  
  2. You should be able to see that all matches are on the same chromosome, in Ian's case - Chromosome 7.  You may wish to add a column for the name of each match which is not included in this report for reasons outlined earlier.  The Relative ID in column 2  of the 'MyHeritage Triangulation ....' file can be searched on  (Control/Command F - using the Relative ID) in the 'MyHeritage ICW ....'  file - it appears in Column 3 and the name of the match is in Column 4.  You can also view the Relative ID when viewing the match at My Heritage in the URL. The Relative ID for the compared match is where the second 'D-' (plus the string of numbers and letters) appears in the address.
  3. Pedigree Thief identified 19 triangulated matches for Mum and Ian.  By doing a manual count at My Heritage I found that there were 18 triangulated matches with Mum and Ian, 19 in total including Ian.  As they all triangulate with Mum and Ian on the 'same segment location', they should form a 'triangulated group' and all share a 'common ancestor'.  
  4. By examining the 'MyHeritage Triangulation ....' file I can see that all matches match in the same segment area ranging from shared cMs of 17.4 down to 7.3.  You may wish to add your initial match's segment data to this file to be a complete record of matches in this potentially triangulated group.  
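The Relative ID lookup described in step 2 can be automated instead of using Control/Command F for each match. The Python sketch below follows the column positions given above (Relative ID in column 2 of the Triangulation file; Relative ID in column 3 and name in column 4 of the ICW file). It assumes both files have a header row, and every header name, ID and value in the samples is invented for illustration.

```python
import csv
import io

# Hypothetical stand-ins for the two Pedigree Thief files.
# Only the column POSITIONS follow the text; the headers and IDs are invented.
ICW_SAMPLE = """Kit,Match ID,Relative ID,Name,Total cM
K1,M1,D-AAA111,Ian,17.4
K1,M1,D-BBB222,Mary,12.0
K1,M1,D-CCC333,Tom,9.8
"""
TRI_SAMPLE = """Match ID,Relative ID,Chromosome,Start,End
M1,D-BBB222,7,100,500
M1,D-CCC333,7,150,530
M1,D-ZZZ999,7,120,400
"""

def read_body(text):
    """Read CSV text and drop the first row, assuming it is a header."""
    return list(csv.reader(io.StringIO(text)))[1:]

def name_lookup(icw_rows):
    """Map Relative ID (column 3) to match name (column 4) from the ICW file."""
    return {row[2]: row[3] for row in icw_rows}

def names_for_triangulations(tri_rows, lookup):
    """Pair each Relative ID (column 2) with its name, or '' if not found."""
    return [(row[1], lookup.get(row[1], "")) for row in tri_rows]

lookup = name_lookup(read_body(ICW_SAMPLE))
for relative_id, name in names_for_triangulations(read_body(TRI_SAMPLE), lookup):
    print(relative_id, name or "(name not found in ICW file)")
```

Any Relative ID that comes back without a name is worth checking by hand at My Heritage.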

Keeping records...
  1. My Heritage - If you know which ancestor is likely to have handed down the segment on this chromosome, add each of the triangulated matches to your ancestor group 'Label', including Ian (your match of interest).  In my case, I can tell the segment belongs to the Murphy-Bateman group; once the Murphy-Bateman label is added, each triangulated segment match shows a purple dot.
  2. Triangulated Group Data - Decide on a name for the triangulated group you hope to identify for the 'match of interest' from this exercise - it could be TG001-Side A (or whatever name you choose - it can reflect maternal/paternal if you know that information - eg M_001 or something similar).  Rename your 'MyHeritage Triangulation ....' file with your TG # - an example could be 'My Heritage Triangulation  TG1_C07_Maternal' file.
  3. You may wish to update both the notes section at My Heritage and your 'My Heritage Triangulation  TG1_C07_Maternal' file with the details of the probable MRCA group/TG name.

Reviewing the Triangulation...
  1. Return to your 'MyHeritage Triangulation ....'  renamed  'My Heritage Triangulation  TG1_C07_Maternal' file.  To determine the shared triangulated segment area, we look at the highest start location, then the lowest end location.   
  2. Sort the file by start and end locations.  You can see in this example the highest start location is 5.5 but the lowest end location is 5.3.  This indicates that there are likely to be two separate triangulated groups over the length of the match with Ian - he shares between 0 - 9.1.  The full length of the segment area could be coming from just one ancestor, but we must keep in mind for larger segments that the segment could be the mix of an ancestral couple, eg a 2MGGF and 2MGGM.  In this case I would monitor the full segment area between 0-9 as one group until further evidence emerges.
  3. Looking at your 'MyHeritage Triangulation ....'  renamed 'My Heritage Triangulation  TG1_C07_Maternal' file, if there are a very large number of triangulated matches all appearing to match on very similar start and end locations, double check that these are not in known false positive regions, or pile up areas.  The chromosome map at DNA Painter is probably the easiest place to check this.  If it is in a known false positive region, use a different match for the exercise, otherwise it may be too time consuming for what we are trying to demonstrate.  Make a note in the notes column.
  4. Your sheet should now look something like the one below.  
  5. For completeness, run reports with Pedigree Thief for each of the matches appearing in the triangulated group and save the first two reports 'save match data' and 'save chromosome data'.  Add the data contained in these files to your two master sheets we created previously if it is not already there - 'My Heritage Match List, Total Shared cMs - Kit_IDXXXX'  spreadsheet (Part A) and 'My Heritage Shared DNA file - Kit_IDXXXX' spreadsheet (Part B, Step 1).  Add the new details of the TG# number and possible MRCA (if known) to the master sheets.
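The 'highest start, lowest end' test from steps 1 and 2 is simple arithmetic and can be sketched in Python. The segment list below is invented, but it is chosen so that it reproduces the figures in the example above (highest start 5.5, lowest end 5.3, full range 0 to 9.1), in whatever unit your file uses.

```python
def common_overlap(segments):
    """Return (highest_start, lowest_end, has_overlap) for (start, end) pairs."""
    highest_start = max(start for start, _ in segments)
    lowest_end = min(end for _, end in segments)
    # All segments share a common area only if it starts before it ends.
    return highest_start, lowest_end, highest_start < lowest_end

# Hypothetical segments, chosen to mirror the numbers quoted in the text.
SEGMENTS = [
    (0.0, 9.1),  # the match of interest
    (0.0, 5.3),
    (0.4, 6.0),
    (5.5, 9.0),
    (4.8, 9.1),
]

highest_start, lowest_end, has_overlap = common_overlap(SEGMENTS)
print("Highest start:", highest_start)
print("Lowest end:", lowest_end)
print("Single shared area:", has_overlap)
```

Because the highest start here is greater than the lowest end, there is no single area shared by every match, which is the signal that the segment may contain more than one triangulated group.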

Key icons for working at My Heritage




Notes on Labelling at My Heritage

My preference for working at My Heritage is to only add Labels on 'triangulated segments'.  It can be dangerous to mark matches as belonging to groups on the basis of shared matches alone.  My Heritage reports segments down to 6cMs; many of these can be false or belong to a different ancestor than other larger segments, ie you may share both a recent and a distant ancestor.  In addition, My Heritage uses imputation, which can add segments to your match compared to other companies.  Both these situations can skew results if incorrectly used as the basis for labelling.

To be sure, only label triangulated segments.  Eventually other 'genetic cousins' for that line will all be labelled when analysed via their own DNA match page for the triangulated segments they share.


Part 4.  Review the other side of the chromosome 

Normally, we would examine matches in the same segment area who don't triangulate with our match of interest, to determine if they belong to a 'Triangulated group (TG)' on the other side of the chromosome.  If your match of interest is maternal, those in TGs on the other side of the chromosome, in the same segment location, who don't match the maternal group, must be paternal.  Those in the same segment location who do not match either the maternal or paternal TGs are likely false positive matches and should be discarded.
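The sorting logic in that paragraph can be written down as a small Python sketch, for use once enough segment data has been amassed. It assumes you already have the list of matches falling in the segment area and the membership of each TG; the names used are purely hypothetical.

```python
def classify_by_side(matches_in_area, maternal_tg, paternal_tg):
    """Label each match in the segment area according to the TG it belongs to."""
    labels = {}
    for match in matches_in_area:
        if match in maternal_tg:
            labels[match] = "maternal"
        elif match in paternal_tg:
            labels[match] = "paternal"
        else:
            # Overlaps the area but triangulates with neither group.
            labels[match] = "likely false positive"
    return labels

# Hypothetical example data.
MATCHES_IN_AREA = ["Ian", "Mary", "Tom", "Alex"]
MATERNAL_TG = {"Ian", "Mary"}
PATERNAL_TG = {"Tom"}

for match, label in classify_by_side(MATCHES_IN_AREA, MATERNAL_TG, PATERNAL_TG).items():
    print(f"{match}: {label}")
```

A 'likely false positive' label is only as good as your TG lists; a match could also belong to a group you have not yet identified.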

As full segment match list downloads have been TEMPORARILY DISABLED by My Heritage (since September 2023!) we cannot complete this step at this stage.  We need to rely on amassing segment data in our master list to eventually be able to achieve this.

Part 5. Is the whole group triangulated?

You can now go back to the 'triangulated match tool' on the My Heritage site and check to see if all matches in the group you have identified triangulate with each other.  To do this:

  • Navigate to the DNA match page with your shared 'match of interest'.
  • Click on the first TG icon symbol - this will open up the Chromosome Browser Triangulation Tool;
  • 'Add and Remove' matches using the radio button on the top right of the page.
  • Only a maximum of 7 matches can be viewed at one time.

If the first segment was quite long, you may find the triangulations are in sub groups, like in the diagram below.  You will need to play around with your match comparisons to identify the subgroups, but eventually you might see this sort of pattern.  (NOTE:  This example is on Chromosome 10, a different  group to the one shared with Ian).



This is demonstrating that whilst everyone is triangulating with 'Match A' (red), they are coming into the group at different levels.  You and 'Match A' share the longest segment, so 'Match A' is probably a closer relation to you and your shared MRCA couple.  The other matches may reflect matches to the same MRCA couple or perhaps a segment belonging to an older ancestor from one of the ancestors in that MRCA ancestral couple.  

If you can identify the MRCA couple for Match A, others in each of sub groups could be segments coming from any of the 4 parents of that couple depending upon whether there was a recombination event in the segment area for Match A.  

For example if Match A (red) was a 2nd cousin, they share your great grandparents.  The subgroups could belong to either of those, or potentially different 2nd great grandparents depending on where recombination events occurred.  Whilst I would initially call this one triangulated group for the entire length of Match A, if you identify the subgroups as belonging to more distant ancestors, you may wish to rename your subgroups as separate TG's.
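As a starting point for spotting possible subgroups before testing them in the triangulation tool, the segments can be grouped by position. The Python sketch below is one simple approach and not a My Heritage or Pedigree Thief feature: it sorts segments by start and keeps adding a match to the current subgroup while it still overlaps the area common to that subgroup. Leave out the long 'Match A' segment, since everyone overlaps it. The data is invented, and positional overlap alone is only a clue; each subgroup still needs to be confirmed as triangulated.

```python
def split_into_subgroups(segments):
    """Greedily group (name, start, end) segments that share a common area."""
    groups = []
    for name, start, end in sorted(segments, key=lambda s: (s[1], s[2])):
        if groups and start < groups[-1]["end"]:
            group = groups[-1]
            group["members"].append(name)
            # Shrink the common area to what every member still shares.
            group["start"] = max(group["start"], start)
            group["end"] = min(group["end"], end)
        else:
            groups.append({"members": [name], "start": start, "end": end})
    return groups

# Hypothetical segments for matches other than 'Match A'.
SEGMENTS = [
    ("B", 0.0, 5.3),
    ("C", 0.4, 6.0),
    ("D", 5.5, 9.0),
    ("E", 4.8, 9.1),
]

groups = split_into_subgroups(SEGMENTS)
for number, group in enumerate(groups, start=1):
    print(f"Subgroup {number}: {group['members']} share {group['start']} to {group['end']}")
```

Note that a match sitting across a boundary (like 'E' here) is placed in the first subgroup it fits, so treat borderline matches as candidates for either group when you compare them in the tool.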

Read more about reviewing your DNA matches at My Heritage.


Next Steps - the quick way!

After you have mastered identifying your triangulated groups, access the 'My Heritage Auto Clusters' tool under the DNA tools menu.  Click 'Explore', then 'Generate' after selecting the kit of interest. Your report will be emailed to you.  When reviewing the report remember that these are shared match clusters and will need to be examined carefully to identify triangulated groups.  
  • Check to see if your 'match of interest' or anyone else who appears in your triangulated group appears in the cluster report;
  • Who else is in the cluster?
  • Which matches in the cluster are triangulated and who is only a shared match?
  • Explore some new groups (optional);
  • Don't forget to add any additional analysis to your two master spreadsheet lists 'My Heritage Match List, Total Shared cMs - Kit_IDXXXX'  spreadsheet (Part A) and 'My Heritage Shared DNA file - Kit_IDXXXX' spreadsheet (Part B, Step 1) for future reference.
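The first three questions in the list above are a comparison of two name lists, which can be sketched in Python as follows. It assumes you have typed or copied the names from one cluster and from your triangulated group into two lists; all names other than Ian are hypothetical.

```python
def compare_cluster(cluster_members, tg_members):
    """Split a cluster into triangulated members and shared-match-only members."""
    cluster, tg = set(cluster_members), set(tg_members)
    return {
        "triangulated_in_cluster": sorted(cluster & tg),
        "shared_match_only": sorted(cluster - tg),
        "tg_not_in_cluster": sorted(tg - cluster),
    }

# Hypothetical example data.
CLUSTER = ["Ian", "Mary", "Alex", "Sam"]
TRIANGULATED_GROUP = ["Ian", "Mary", "Tom"]

for heading, names in compare_cluster(CLUSTER, TRIANGULATED_GROUP).items():
    print(f"{heading}: {', '.join(names) if names else '(none)'}")
```

Names must be spelled identically in both lists for the comparison to work, so matching on an ID is safer where you have one.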

Unfortunately, the defaults for cluster reports are pre determined by My Heritage and cannot be manipulated.  If no one in your identified TG appears in the cluster report, pick an alternative cluster of interest and apply what you have learnt from the exercise by examining matches in that cluster.


Need more challenges?

Have you identified matches at My Heritage with shared common ancestors on your mystery line?  See if you can identify any triangulated groups on the segments shared with them using the techniques in this exercise.


Veronica Williams
First Published: 3 August 2024