In Module 2 of 'Organising your DNA data and determining match groups' we discussed ways to organise data and a process for working with My Heritage to identify match groups. Before moving to Module 3, make sure you understand the theory of segment triangulation and have established a method for retaining records of your DNA research, to avoid rework and improve your productivity.
My Heritage is a DNA testing site that allows uploads from other testing companies. It's free to upload, but if you are not a My Heritage subscriber you will be required to pay a small 'one off' unlock fee to access DNA tools. My Heritage usually has two downloads that are useful for chromosome analysis; however, these are currently DISABLED.
In the 2021 and 2023 programs we completed an exercise at My Heritage to consolidate your understanding of how to determine triangulated segments, identify potential match groups and whether a match is likely to be a false positive. It is not possible to do that for the 2024 program given the reports are unavailable.
For the 2024 program, this modified 'Module 2: Exercise 2' uses My Heritage data to continue practising manual analysis of your matches, so that you appreciate the underlying theory of shared matches vs triangulated segment matches when identifying potential match groups. Remember, whilst shared matches are 'CLUES' of a shared ancestor, only shared and triangulated segments are 'EVIDENCE' of a common ancestor. Unfortunately, because the downloads are disabled, we cannot extend the exercise to assess whether other matches are false positives; however, 'Module 2: Exercise 1' using GEDmatch does attempt to cover this.
Because download data is not available from My Heritage, we are going to use 'Pedigree Thief', a tool installed as a Chrome extension, to help speed up some of the manual data collection.
My Heritage also has a clustering tool which can assist you to determine where to start your research, but this is best utilised once you understand more about triangulation. The clusters are an indication of matches in common, however understanding segment triangulation and how triangulated groups are formed will assist you to analyse the clusters more fully, helping to identify the key matches in the cluster more likely to share a common ancestor.
The following activities are suggested to help you apply 'Module 2' in practice. If you have access to old downloaded data from My Heritage I recommend you complete the original exercise after completing the activities outlined in this blogpost.
Setting Up Pedigree Thief
First, you will need to download Pedigree Thief from the Chrome Web Store and update your browser. Once loaded you should see the icon symbol above on your browser adjacent to the search bar. Before using it for the first time, right-click the icon symbol and select 'Help' from the menu. This will give guidance on setting up and using the extension. For further help, join the Pedigree Thief Facebook Group.
Pedigree Thief also extracts pedigrees (as the name implies) and is an extremely useful product for that purpose. We will discuss that more in Module 3.
- Login to My Heritage site and navigate to your DNA Matches list.
- First, review at least your top 5 matches at My Heritage that are not people you have tested or close family. We want to ensure we know where our highest matches fit in, so make sure to examine all those >100cMs.
- Next we are going to extract data using Pedigree Thief to start creating our master match list. Unfortunately, until the download reports return, much of our data will need to be downloaded one match at a time and slowly added to our list.
- Go to the My Heritage DNA Matches Page for your kit. If the top 5 matches you want to examine are not all in the first 10, go to the bottom of the page and expand the view to 25.
- The first time you access Pedigree Thief by clicking on the icon in your browser the screen shown below will appear. Don't worry if you don't see all these fields, we just need the first section at the moment.
- Leave the number of pages to read at 1, then click on 'Read Matches'. Pedigree Thief will read the matches on the page - either 10 or 25 depending on your selected view.
- Once Pedigree Thief finishes its review (screen stops changing), click again on the Pedigree Thief icon and the screen will re-appear, usually with additional fields.
- Click the options to save, we want 'All Matches' and 'All Segments'. The two files should be produced and saved to your download folder. We are only working with the 'MyHeritage Match ....' file for this part of the exercise.
- Retain the 'Shared DNA file' for later use.
- Go to your Downloads folder and find the 'MyHeritage Match ....' file.
- Duplicate the 'MyHeritage Match ....' file and rename it 'My Heritage Match List, Total Shared cMs - Kit_IDXXXX'
- Add 3 new columns to your spreadsheet - Side, MRCA and Notes.
- Your 'My Heritage Match List, Total Shared cMs - Kit_IDXXXX' spreadsheet should have populated with a number of matches.
- Review the matches, do you know if they are maternal or paternal (or both - ie share both sides, close relations)? Don't worry if you are not able to allocate sides at this stage.
- Think about the likely relationship: how many generations back in your tree might you expect to find the MRCA? (In Module 1 we discussed the Shared cMs tool to predict relationships. You might like to compare those estimates to those in the My Heritage prediction column.)
- Notate your spreadsheet with the known or likely MRCA couple if you can, this is not essential.
- Over time, the 'My Heritage Match List, Total Shared cMs - Kit_IDXXXX' spreadsheet will become your master list as you add data to the spreadsheet. Add any other matches you examine in this exercise to your list, to avoid future re-work.
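If you are comfortable with a little scripting, the 'add it if it is not already there' bookkeeping for your master list can be sketched in Python. This is only a rough illustration and is not part of Pedigree Thief or My Heritage: the column name 'Match ID' and the sample rows are made up, so check the header row of your own 'MyHeritage Match ....' file and adjust before trying it.

```python
import csv
import io

# Hypothetical column name: check the header row of your own export.
ID_COLUMN = "Match ID"
# The three manual columns added to the master list in this exercise.
EXTRA_COLUMNS = ["Side", "MRCA", "Notes"]


def read_rows(csv_text):
    """Parse CSV text into a list of dictionaries keyed by header name."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def merge_matches(master_rows, new_rows, id_column=ID_COLUMN):
    """Return master_rows plus any new_rows whose ID is not already present."""
    merged = [dict(row) for row in master_rows]
    seen = {row[id_column] for row in merged}
    for row in new_rows:
        if row[id_column] not in seen:
            merged.append(dict(row))
            seen.add(row[id_column])
    # Make sure every row carries the manual annotation columns.
    for row in merged:
        for column in EXTRA_COLUMNS:
            row.setdefault(column, "")
    return merged


# Made-up sample data standing in for the master list and a new export.
master = read_rows(
    "Match ID,Name,Total cM,Side,MRCA,Notes\n"
    "D-1,Ian,17.4,Maternal,Murphy-Bateman,\n"
)
new = read_rows(
    "Match ID,Name,Total cM\n"
    "D-1,Ian,17.4\n"
    "D-2,Ann,12.0\n"
)

merged = merge_matches(master, new)
for row in merged:
    print(row["Match ID"], row["Name"], row["Side"] or "(side unknown)")
```

Existing rows are kept as they are, so any Side, MRCA or Notes you have already filled in are not overwritten when the same match turns up in a later read.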
- Open the first 'Shared DNA file .....csv' you downloaded via Pedigree Thief and duplicate it.
- Next, save it as a separate sheet and give it a working title, ie 'My Heritage Shared DNA file - Kit_IDXXXX'.
- Add 4 new columns to your spreadsheet - Side, TG#, MRCA and Notes.
- You may wish to hide the 2 Match ID columns and the 2 RSID columns to make the sheet more manageable.
- Sort the spreadsheet by chromosome, then start and end locations.
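The sort in the last step can also be sketched in Python for anyone keeping the sheet as a CSV file. One thing to watch is that a plain text sort puts chromosome 10 ahead of chromosome 7, so the sketch below sorts the numbers as numbers. The column names 'Chromosome', 'Start' and 'End' and the sample rows are assumptions for illustration only; use the headers from your own 'Shared DNA file'.

```python
def chromosome_key(label):
    """Sort numbered chromosomes numerically and put labels such as X last."""
    text = str(label).strip().upper()
    if text.isdigit():
        return (0, int(text), "")
    return (1, 0, text)


def sort_segments(rows):
    """Order rows by chromosome, then start location, then end location."""
    return sorted(
        rows,
        key=lambda row: (
            chromosome_key(row["Chromosome"]),
            float(row["Start"]),
            float(row["End"]),
        ),
    )


# Made-up sample rows with hypothetical column names.
rows = [
    {"Chromosome": "10", "Start": "3.0", "End": "8.0"},
    {"Chromosome": "7", "Start": "5.5", "End": "9.1"},
    {"Chromosome": "X", "Start": "1.0", "End": "2.0"},
    {"Chromosome": "7", "Start": "0", "End": "5.3"},
]

ordered = sort_segments(rows)
for row in ordered:
    print(row["Chromosome"], row["Start"], row["End"])
```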
- Login to My Heritage site and navigate to your DNA Matches list. Search for the DNA match of interest and click on the purple 'Review Match List' button to open up the relatives page.
- Click on the Pedigree Thief icon in your browser toolbar as we did in Part A and read the match. In this case there are 123 shared matches and in the first read, Pedigree Thief limited the search to 100 matches.
- Next, you will need to click on the 'Read Match' radio button again for Pedigree Thief to analyse all the shared matches. If there are many more shared matches you need to repeat this step until all matches are read.
- When the read is complete and all shared matches analysed, click the 4 radio buttons to produce files that will save to your downloads folder as we did in Part A. Note in the completed read image below, of the 123 shared matches only 18 triangulations were identified.
- Add the information included in the 'MyHeritage Match ....' file to your 'My Heritage Match List, Total Shared cMs - Kit_IDXXXX' file, if it is not already there.
- Add the information included in the 'MyHeritage Shared DNA ....' file to your 'My Heritage Shared DNA file - Kit_IDXXXX', if it is not already there.
- Retain the 'ICW' and 'Triangulations' files for later use.
Initial Read
Completed Read
- The 'MyHeritage ICW ....' file produces a list of all shared matches with your match but only includes the total cMs shared. You may wish to consolidate these reports into another master file similar to the ones we created earlier, but this is optional. Not all of these matches will share a common ancestor with you, but they can be 'clues'. For our purposes we are more interested in 'shared segments' and 'triangulation' data as 'evidence' of a shared common ancestor.
- The 'MyHeritage Triangulation ....' file produces a list of all triangulations it encountered in the read of the match and details the match IDs and segments. It is not easily imported into our 'My Heritage Shared DNA file - Kit_IDXXXX' but is a useful reference tool. Again, you may wish to consolidate these reports into another master file similar to the ones we created earlier, but this is optional.
- Login to My Heritage site and navigate to your DNA Matches list. Search for the DNA match of interest and click on the purple 'Review Match List' button to open up the relatives page.
- On your match's DNA matches page, the number of shared matches should be the same as the numbers shown in the Pedigree Thief 'Read'. In my case, the match of interest 'Ian' has 123 shared matches listed on the My Heritage page.
- We are now going to examine the 'Triangulated segments' at My Heritage for our match of interest. Work through the match list looking for the Triangulated Segment Symbol, shown in the orange box below. Identify each match with the 'Triangulated symbol' - because our match of interest 'Ian' only shares on one segment, all the triangulated matches should also share on the same segment.
- When you get to the bottom of the page, click on the purple 'Show More Matches' box to examine more matches, work in this way until the last page.
- As you work through the match list at My Heritage, refer to the 'MyHeritage Triangulation ....' file we produced earlier that you put aside. You might like to ensure that every match with the symbol on My Heritage is also on this list; this is optional.
- You should be able to see that all matches are on the same chromosome, in Ian's case - Chromosome 7. You may wish to add a column for the name of each match which is not included in this report for reasons outlined earlier. The Relative ID in column 2 of the 'MyHeritage Triangulation ....' file can be searched on (Control/Command F - using the Relative ID) in the 'MyHeritage ICW ....' file - it appears in Column 3 and the name of the match is in Column 4. You can also view the Relative ID when viewing the match at My Heritage in the URL. The Relative ID for the compared match is where the second 'D-' (plus the string of numbers and letters) appears in the address.
- Pedigree Thief identified 19 triangulated matches for Mum and Ian. By doing a manual count at My Heritage I found that there were 18 triangulated matches with Mum and Ian, 19 in total including Ian. As they all triangulate with Mum and Ian on the 'same segment location', they should form a 'triangulated group' and all share a 'common ancestor'.
- By examining the 'MyHeritage Triangulation ....' file I can see that all matches match in the same segment area ranging from shared cMs of 17.4 down to 7.3. You may wish to add your initial match's segment data to this file to be a complete record of matches in this potentially triangulated group.
- My Heritage - If you know which ancestor is likely to have handed down the segment on this chromosome, add each of the triangulated matches to your ancestor group 'Label', including Ian (your match of interest). In my case, I can tell the segment belongs to the Murphy-Bateman group; once the Murphy-Bateman label is added, each triangulated segment match shows a purple dot.
- Triangulated Group Data - Decide on a name for the triangulated group you hope to identify for the 'match of interest' from this exercise - it could be TG001-Side A (or whatever name you choose - it can reflect maternal/paternal if you know that information - eg M_001 or something similar). Rename your 'MyHeritage Triangulation ....' file with your TG # - an example could be 'My Heritage Triangulation TG1_C07_Maternal' file.
- You may wish to update both the notes section at My Heritage and your 'My Heritage Triangulation TG1_C07_Maternal' file with the details of the probable MRCA group/TG name.
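The name lookup described earlier (searching for the Relative ID from column 2 of the 'MyHeritage Triangulation ....' file in column 3 of the 'MyHeritage ICW ....' file, then reading the name from column 4) can be done in one pass instead of one Control/Command F search per match. The following is a minimal Python sketch under the assumption that both files have a single header row. Only those three column positions come from the steps above; every other column and all the sample values are invented for the example.

```python
import csv
import io

TRI_ID_INDEX = 1    # column 2 of the Triangulation file (Relative ID)
ICW_ID_INDEX = 2    # column 3 of the ICW file (Relative ID)
ICW_NAME_INDEX = 3  # column 4 of the ICW file (match name)


def data_rows(csv_text):
    """Return the rows of a CSV as lists, skipping the assumed header row."""
    return list(csv.reader(io.StringIO(csv_text)))[1:]


def add_names(tri_rows, icw_rows):
    """Append the match name to each triangulation row, or '' if not found."""
    names = {row[ICW_ID_INDEX]: row[ICW_NAME_INDEX] for row in icw_rows}
    return [row + [names.get(row[TRI_ID_INDEX], "")] for row in tri_rows]


# Made-up sample data; only the positions of the ID and name columns
# follow the description in this post.
tri_rows = data_rows(
    "Match ID,Relative ID,Chromosome,Start,End\n"
    "D-AAA,D-111,7,0,5.3\n"
    "D-AAA,D-222,7,5.5,9.1\n"
)
icw_rows = data_rows(
    "Kit,Match ID,Relative ID,Name\n"
    "K1,D-AAA,D-111,Ann\n"
    "K1,D-AAA,D-333,Bob\n"
)

named = add_names(tri_rows, icw_rows)
for row in named:
    print(row)
```

A blank name in the last column simply means that Relative ID was not found in the ICW file you supplied, which is worth a note in your Notes column.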
- Return to your 'MyHeritage Triangulation ....' renamed 'My Heritage Triangulation TG1_C07_Maternal' file. To determine the shared triangulated segment area, we look at the highest start location, then the lowest end location.
- Sort the file by start and end locations. You can see in this example the highest start location is 5.5 but the lowest end location is 5.3. This indicates that there are likely to be two separate triangulated groups over the length of the match with Ian - he shares between 0 - 9.1. The full length of the segment area could be coming from just one ancestor, but we must keep in mind for larger segments that the segment could be the mix of an ancestral couple, eg a 2MGGF and 2MGGM. In this case I would monitor the full segment area between 0-9 as one group until further evidence emerges.
- Looking at your 'MyHeritage Triangulation ....' renamed 'My Heritage Triangulation TG1_C07_Maternal' file, if there are a very large number of triangulated matches all appearing to match on very similar start and end locations, double check that these are not in known false positive regions, or pile up areas. The chromosome map at DNA Painter is probably the easiest place to check this. If it is in a known false positive region, use a different match for the exercise, otherwise it may be too time consuming for what we are trying to demonstrate. Make a note in the notes column.
- Your sheet should now look something like the one below.
- For completeness, run reports with Pedigree Thief for each of the matches appearing in the triangulated group and save the first two reports 'save match data' and 'save chromosome data'. Add the data contained in these files to your two master sheets we created previously if it is not already there - 'My Heritage Match List, Total Shared cMs - Kit_IDXXXX' spreadsheet (Part A) and 'My Heritage Shared DNA file - Kit_IDXXXX' spreadsheet (Part B, Step 1). Add the new details of the TG# number and possible MRCA (if known) to the master sheets.
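The 'highest start location, lowest end location' check used above can be written as a small Python function. This is a sketch only: the segment values are illustrative numbers chosen to mirror the Chromosome 7 example (highest start 5.5, lowest end 5.3, full match 0 to 9.1), and start and end are assumed to be in whatever units your triangulation file uses.

```python
def common_overlap(segments):
    """Return the (start, end) shared by every segment, or None if none exists.

    The shared area runs from the highest start to the lowest end. If the
    highest start is not below the lowest end, the segments do not all
    overlap in one place.
    """
    highest_start = max(start for start, _ in segments)
    lowest_end = min(end for _, end in segments)
    if highest_start >= lowest_end:
        return None
    return (highest_start, lowest_end)


# Illustrative values only: one match ends at 5.3 and another starts at 5.5,
# so there is no single area shared by all three segments.
split_example = [(0.0, 9.1), (0.0, 5.3), (5.5, 9.1)]
print(common_overlap(split_example))    # None

# Here every segment covers the stretch from 1.2 to 4.8.
single_example = [(0.0, 9.1), (0.0, 5.3), (1.2, 4.8)]
print(common_overlap(single_example))   # (1.2, 4.8)
```

A result of None does not prove there are two ancestors involved. It only flags that the matches cannot all share one stretch, which is the prompt to consider more than one triangulated group or to keep monitoring the full area as I have done.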
- Navigate to the DNA match page with your shared 'match of interest'.
- Click on the first TG icon symbol - this will open up the Chromosome Browser Triangulation Tool;
- 'Add and Remove' matches using the radio button on the top right of the page.
- Only a maximum of 7 matches can be viewed at one time.
- Check to see if your 'match of interest' or anyone else who appears in your triangulated group appears in the cluster report:
- Who else is in the cluster?
- Which matches in the cluster are triangulated and who is only a shared match?
- Explore some new groups (optional);
- Don't forget to add any additional analysis to your two master spreadsheet lists 'My Heritage Match List, Total Shared cMs - Kit_IDXXXX' spreadsheet (Part A) and 'My Heritage Shared DNA file - Kit_IDXXXX' spreadsheet (Part B, Step 1) for future reference.
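If you have typed the names or Relative IDs from a cluster into a list, the question of who is triangulated and who is only a shared match comes down to comparing that list with the IDs in your triangulation file. A minimal Python sketch follows; the IDs are invented and the function name is my own, not something provided by My Heritage or Pedigree Thief.

```python
def classify_cluster(cluster_members, triangulated_ids):
    """Split cluster members into triangulated matches and shared-only matches."""
    triangulated_ids = set(triangulated_ids)
    triangulated = sorted(m for m in cluster_members if m in triangulated_ids)
    shared_only = sorted(m for m in cluster_members if m not in triangulated_ids)
    return triangulated, shared_only


# Made-up IDs: the cluster list is typed in from the cluster report and the
# triangulated list comes from the Relative ID column of a triangulation file.
cluster = ["D-111", "D-222", "D-444", "D-555"]
from_triangulation_file = ["D-111", "D-222", "D-333"]

evidence, clues = classify_cluster(cluster, from_triangulation_file)
print("Triangulated (evidence):", evidence)
print("Shared match only (clues):", clues)
```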