Genemonkey explains....: August 2024

Saturday, August 3, 2024

My Heritage - Modified exercise to identify triangulated groups using Pedigree Thief

In Module 2 of 'Organising your DNA data and determining match groups' we discussed ways to organise data and a process for working with My Heritage to identify match groups. Before moving to Module 3, make sure you understand the theory of segment triangulation and have established a method for retaining records of your DNA research, to avoid rework and improve your productivity.

My Heritage is a DNA testing site that allows uploads from other testing companies. It's free to upload, but if you are not a My Heritage subscriber you will be required to pay a small 'one off' unlock fee to access DNA tools. My Heritage usually has two downloads that are useful for chromosome analysis; however, these are currently DISABLED.

In the 2021 and 2023 programs we completed an exercise at My Heritage to consolidate your understanding of how to determine triangulated segments, identify potential match groups and whether a match is likely to be a false positive. It is not possible to do that for the 2024 program given the reports are unavailable.

For the 2024 program, this modified 'Module 2: Exercise 2' uses My Heritage data to continue practising manual analysis of your matches, to ensure you appreciate the underlying theory of shared matches vs triangulated segment matches when identifying potential match groups. Remember, whilst shared matches are 'CLUES' of a shared ancestor, only shared and triangulated segments are 'EVIDENCE' of a common ancestor. Unfortunately, due to the downloads being disabled, we cannot extend the exercise to assess whether other matches are false positives; however, 'Module 2: Exercise 1' using GEDmatch does attempt to cover this.

Because download data is not available from My Heritage, we are going to use 'Pedigree Thief', a tool installed as a Chrome extension, to help speed up some of the manual data collection.

My Heritage also has a clustering tool which can assist you to determine where to start your research, but this is best utilised once you understand more about triangulation. The clusters are an indication of matches in common, however understanding segment triangulation and how triangulated groups are formed will assist you to analyse the clusters more fully, helping to identify the key matches in the cluster more likely to share a common ancestor.

The following activities are suggested to help you apply 'Module 2' in practice. If you have access to old downloaded data from My Heritage I recommend you complete the original exercise after completing the activities outlined in this blogpost.

Setting Up Pedigree Thief

First, you will need to download Pedigree Thief from the Chrome Web Store and update your browser. Once loaded you should see the icon symbol above on your browser adjacent to the search bar. Before using it for the first time, right-click the icon symbol and select 'Help' from the menu. This will give guidance on setting up and using the extension. For further help join the Pedigree Thief Facebook Group.

Pedigree Thief also extracts pedigrees (as the name implies) and is an extremely useful product for that purpose. We will discuss that more in Module 3.

Part A: Using the Broad Approach by Total cMs - My Heritage www.myheritage.com

The 'DNA matches list' which can normally be downloaded from My Heritage for each DNA kit is currently disabled. This makes it difficult to analyse your match list by chromosome. As a result, we are going to use Pedigree Thief to download the data that we need; it will also assist in reducing the amount of typing you have to do. We are not going to be able to extract everything required, nor is it in the perfect format. Hopefully, downloads will return soon and this is just a short term 'semi' solution.

Login to My Heritage site and navigate to your DNA Matches list.
First, review at least your top 5 matches at My Heritage that are not people you have tested or close family. We want to ensure we know where our highest matches fit in, so make sure to examine all those >100cMs.
Next we are going to extract data using Pedigree Thief to start creating our master match list. Unfortunately, until the download reports return, much of our data will need to be downloaded one match at a time and slowly added to our list.
Go to the My Heritage DNA Matches Page for your kit. If the top 5 matches you want to examine are not all in the first 10, go to the bottom of the page and expand the view to 25.
The first time you access Pedigree Thief by clicking on the icon in your browser the screen shown below will appear. Don't worry if you don't see all these fields, we just need the first section at the moment.
Leave the number of pages to read at 1, then click on 'Read Matches'. Pedigree Thief will read the matches on the page - either 10 or 25 depending on your selected view.
Once Pedigree Thief finishes its review (screen stops changing), click again on the Pedigree Thief icon and the screen will re-appear, usually with additional fields.
Click the options to save, we want 'All Matches' and 'All Segments'. The two files should be produced and saved to your download folder. We are only working with the 'MyHeritage Match ....' file for this part of the exercise.
Retain the 'Shared DNA file' for later use.

Updating your sheet

Go to your Downloads folder and find the 'MyHeritage Match ....' file.
Duplicate the 'MyHeritage Match ....' file and rename it 'My Heritage Match List, Total Shared cMs - Kit_IDXXXX'
Add 3 new columns to your spreadsheet - Side, MRCA and Notes.
Your 'My Heritage Match List, Total Shared cMs - Kit_IDXXXX' spreadsheet should now be populated with a number of matches.
Review the matches, do you know if they are maternal or paternal (or both - ie share both sides, close relations)? Don't worry if you are not able to allocate sides at this stage.
Think about the likely relationship: how many generations back in your tree might you expect to find the MRCA? (In Module 1 we discussed the Shared cMs tool to predict relationships. You might like to compare those estimates to those in the My Heritage prediction column.)
Notate your spreadsheet with the known or likely MRCA couple if you can, this is not essential.
Over time, the 'My Heritage Match List, Total Shared cMs - Kit_IDXXXX' spreadsheet will become your master list as you add data to the spreadsheet. Add any other matches you examine in this exercise to your list, to avoid future re-work.
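If you prefer to script the column step rather than do it by hand, the idea can be sketched in Python. This is only a minimal sketch: it assumes the Pedigree Thief match file is an ordinary comma-separated file with a header row, and the headers in the sample ('Name', 'Total cM') are placeholders, not the real export headers.

```python
import csv
import io

# Working columns from the "Updating your sheet" steps.
NEW_COLUMNS = ["Side", "MRCA", "Notes"]


def add_working_columns(csv_text, new_columns=NEW_COLUMNS):
    """Return CSV text with empty working columns appended to every row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames or [])
    # Only add columns that are missing, so running this twice is harmless.
    fieldnames += [c for c in new_columns if c not in fieldnames]

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        for column in new_columns:
            row.setdefault(column, "")  # keep anything already typed in
        writer.writerow(row)
    return out.getvalue()


# Placeholder data: the real file will have different headers and more rows.
sample = "Name,Total cM\nIan,17.4\n"
result = add_working_columns(sample)
print(result)
```

In practice you would read the text from your renamed copy of the file and write the result back out, leaving the original download untouched.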

You should end up with a sheet that looks something like the one below.

Part B: Using Chromosome Analysis (Targeted Approach) - My Heritage www.myheritage.com

In Part B we want to examine a 'match of interest' who only shares on one segment (to make it easier to interrogate) so Joan's match #6 with 'Ian' would be suitable. If your first 'match of interest' shares more than 1 segment, you will need to drill down further in the list to find another potentially suitable match to examine for this exercise.

Matches sharing more than one segment will have a mix of ICW (in common with) matches who may all share on different segments. By choosing a match who only shares one segment, all 'triangulated' ICW matches should share on the same segment. This will make the exercise more manageable.

For this exercise we are going to utilise the 'Shared DNA file .csv' you have already downloaded, to create a master 'Shared DNA' spreadsheet.

Step 1. Prepare the spreadsheet:

Open the first 'Shared DNA file .....csv' you downloaded via Pedigree Thief and duplicate it.
Next, save it as a separate sheet and give it a working title, ie 'My Heritage Shared DNA file - Kit_IDXXXX'.
Add 4 new columns to your spreadsheet - Side, TG#, MRCA and Notes.
You may wish to hide the 2 Match ID columns and the 2 RSID columns to make the sheet more manageable.
Sort the spreadsheet by chromosome, then start and end locations.
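The sort in the last step can be sketched in Python as follows. This is an illustration under assumptions: the keys 'chromosome', 'start' and 'end' are placeholder names rather than the real Pedigree Thief headers, and the rows are made-up values. One thing the sketch guards against is chromosome numbers stored as text, where a plain text sort would put 10 ahead of 7.

```python
def chromosome_order(chrom):
    """Sort key: numbered chromosomes in numeric order, then anything else (e.g. X)."""
    text = str(chrom).strip().upper()
    return (0, int(text)) if text.isdigit() else (1, 0)


def sort_segments(rows):
    """Sort segment rows by chromosome, then start location, then end location."""
    return sorted(
        rows,
        key=lambda r: (
            chromosome_order(r["chromosome"]),
            float(r["start"]),
            float(r["end"]),
        ),
    )


# Made-up rows for illustration only.
rows = [
    {"name": "A", "chromosome": "10", "start": 5.0, "end": 20.0},
    {"name": "B", "chromosome": "7", "start": 3.1, "end": 9.0},
    {"name": "C", "chromosome": "7", "start": 0.0, "end": 9.1},
    {"name": "D", "chromosome": "X", "start": 1.0, "end": 4.0},
]
ordered = [r["name"] for r in sort_segments(rows)]
print(ordered)
```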

You should end up with a sheet that looks something like the one below.

NOTE: You may prefer to use the My Heritage system for recording your overall notes and not wish to retain a separate master 'match' list, but the shared DNA file is essential for chromosome analysis. It comes down to personal preference, over time you will work out what works best for you.

Step 2. Extract data about our 'match of interest' from Pedigree Thief

Login to My Heritage site and navigate to your DNA Matches list. Search for the DNA match of interest and click on the purple 'Review Match List' button to open up the relatives page.
Click on the Pedigree Thief icon in your browser toolbar as we did in Part A and read the match. In this case there are 123 shared matches and in the first read, Pedigree Thief limited the search to 100 matches.
Next, you will need to click on the 'Read Match' radio button again for Pedigree Thief to analyse all the shared matches. If there are many more shared matches you need to repeat this step until all matches are read.
When the read is complete and all shared matches analysed, click the 4 radio buttons to produce files that will save to your downloads folder as we did in Part A. Note in the completed read image below, of the 123 shared matches only 18 triangulations were identified.
Add the information included in the 'MyHeritage Match ....' file to your 'My Heritage Match List, Total Shared cMs - Kit_IDXXXX' file, if it is not already there.
Add the information included in the 'MyHeritage Shared DNA ....' file to your 'My Heritage Shared DNA file - Kit_IDXXXX', if it is not already there.
Retain the 'ICW' and 'Triangulations' files for later use.

Initial Read

Completed Read

Notes about the Pedigree Thief files

The 'MyHeritage ICW ....' file produces a list of all shared matches with your match but only includes the total cMs shared. You may wish to consolidate these reports into another master file similar to the ones we created earlier, but this is optional. Not all of these matches will share a common ancestor with you, but they can be 'clues'. For our purposes we are more interested in 'shared segments' and 'triangulation' data as 'evidence' of a shared common ancestor.
The 'MyHeritage Triangulation ....' file produces a list of all triangulations it encountered in the read of the match and details the match IDs and segments. It is not easily imported into our 'My Heritage Shared DNA file - Kit_IDXXXX' but is a useful reference tool. Again, you may wish to consolidate these reports into another master file similar to the ones we created earlier, but this is optional.

Both these files are designed to be imported into GDAT (Genealogical DNA Analysis Tool). However, match and shared DNA data for GDAT relies on the My Heritage download files that are currently disabled. We are in a very difficult position at present. My suggestion is to upload the individual segment data to your 'My Heritage Shared DNA file - Kit_IDXXXX' using the process outlined in Part A.

Step 3. Review triangulated segments for our match of interest on the My Heritage site

Login to My Heritage site and navigate to your DNA Matches list. Search for the DNA match of interest and click on the purple 'Review Match List' button to open up the relatives page.
On your match's DNA matches page, the number of shared matches should be the same as the numbers shown in the Pedigree Thief 'Read'. In my case, the match of interest 'Ian' has 123 shared matches listed on the My Heritage page.
We are now going to examine the 'Triangulated segments' at My Heritage for our match of interest. Work through the match list looking for the Triangulated Segment Symbol, shown in the orange box below. Identify each match with the 'Triangulated symbol' - because our match of interest 'Ian' only shares on one segment, all the triangulated matches should also share on the same segment.
When you get to the bottom of the page, click on the purple 'Show More Matches' box to examine more matches, work in this way until the last page.

Concurrently work with your Pedigree Thief data....

As you work through the match list at My Heritage, refer to the 'MyHeritage Triangulation ....' file we produced earlier that you put aside. You might like to ensure that every match with the symbol on My Heritage is also on this list; this is optional.
You should be able to see that all matches are on the same chromosome, in Ian's case - Chromosome 7. You may wish to add a column for the name of each match which is not included in this report for reasons outlined earlier. The Relative ID in column 2 of the 'MyHeritage Triangulation ....' file can be searched on (Control/Command F - using the Relative ID) in the 'MyHeritage ICW ....' file - it appears in Column 3 and the name of the match is in Column 4. You can also view the Relative ID when viewing the match at My Heritage in the URL. The Relative ID for the compared match is where the second 'D-' (plus the string of numbers and letters) appears in the address.
Pedigree Thief identified 19 triangulated matches for Mum and Ian. By doing a manual count at My Heritage I found that there were 18 triangulated matches with Mum and Ian, 19 in total including Ian. As they all triangulate with Mum and Ian on the 'same segment location', they should form a 'triangulated group' and all share a 'common ancestor'.
By examining the 'MyHeritage Triangulation ....' file I can see that all matches match in the same segment area ranging from shared cMs of 17.4 down to 7.3. You may wish to add your initial match's segment data to this file to be a complete record of matches in this potentially triangulated group.
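The Relative ID lookup described above (column 2 of the Triangulation file against column 3 of the ICW file, with the name in column 4) can be sketched in Python instead of repeated Control/Command F searches. Only those column positions come from this post; the header names, IDs and values in the sample are invented placeholders, and the sketch assumes both files are plain CSV with a header row.

```python
import csv
import io


def names_by_relative_id(icw_csv_text):
    """Map Relative ID (3rd column of the ICW file) to match name (4th column)."""
    reader = csv.reader(io.StringIO(icw_csv_text))
    next(reader, None)  # skip the header row (assumed to be present)
    return {row[2]: row[3] for row in reader if len(row) > 3}


def add_names(triangulation_csv_text, icw_csv_text):
    """Return the triangulation rows with a 'Match Name' column appended."""
    names = names_by_relative_id(icw_csv_text)
    reader = csv.reader(io.StringIO(triangulation_csv_text))
    header = next(reader)
    rows = [header + ["Match Name"]]
    for row in reader:
        if len(row) < 2:
            continue  # skip blank lines
        relative_id = row[1]  # 2nd column of the Triangulation file
        rows.append(row + [names.get(relative_id, "")])
    return rows


# Invented sample data: real headers and IDs will differ.
icw = (
    "Kit,Match ID,Relative ID,Name,Total cM\n"
    "k1,m1,D-AAA111,Example Match One,45.2\n"
    "k1,m2,D-BBB222,Example Match Two,30.0\n"
)
tri = (
    "Match ID,Relative ID,Chromosome,Start,End,cM\n"
    "m1,D-AAA111,7,0.5,5.3,17.4\n"
    "m9,D-ZZZ999,7,5.5,9.1,7.3\n"
)
named_rows = add_names(tri, icw)
for row in named_rows:
    print(row)
```

A blank name in the last column flags a Relative ID that was not found in the ICW file, which is worth checking by hand.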

Keeping records...

My Heritage - If you know which ancestor is likely to have handed down the segment on this chromosome, add each of the triangulated matches to your ancestor group 'Label', including Ian (your match of interest). In my case, I can tell the segment belongs to the Murphy-Bateman group; by adding the Murphy-Bateman label, each triangulated segment match shows a purple dot.
Triangulated Group Data - Decide on a name for the triangulated group you hope to identify for the 'match of interest' from this exercise - it could be TG001-Side A (or whatever name you choose - it can reflect maternal/paternal if you know that information - eg M_001 or something similar). Rename your 'MyHeritage Triangulation ....' file with your TG # - an example could be 'My Heritage Triangulation TG1_C07_Maternal' file.
You may wish to update both the notes section at My Heritage and your 'My Heritage Triangulation TG1_C07_Maternal' file with the details of the probable MRCA group/TG name.

Reviewing the Triangulation...

Return to your 'MyHeritage Triangulation ....' renamed 'My Heritage Triangulation TG1_C07_Maternal' file. To determine the shared triangulated segment area, we look at the highest start location, then the lowest end location.
Sort the file by start and end locations. You can see in this example the highest start location is 5.5 but the lowest end location is 5.3. This indicates that there are likely to be two separate triangulated groups over the length of the match with Ian - he shares between 0 - 9.1. The full length of the segment area could be coming from just one ancestor, but we must keep in mind for larger segments that the segment could be the mix of an ancestral couple, eg a 2MGGF and 2MGGM. In this case I would monitor the full segment area between 0-9 as one group until further evidence emerges.
Looking at your 'MyHeritage Triangulation ....' renamed 'My Heritage Triangulation TG1_C07_Maternal' file, if there are a very large number of triangulated matches all appearing to match on very similar start and end locations, double check that these are not in known false positive regions, or pile up areas. The chromosome map at DNA Painter is probably the easiest place to check this. If it is in a known false positive region, use a different match for the exercise, otherwise it may be too time consuming for what we are trying to demonstrate. Make a note in the notes column.
Your sheet should now look something like the one below.
For completeness, run reports with Pedigree Thief for each of the matches appearing in the triangulated group and save the first two reports 'save match data' and 'save chromosome data'. Add the data contained in these files to your two master sheets we created previously if it is not already there - 'My Heritage Match List, Total Shared cMs - Kit_IDXXXX' spreadsheet (Part A) and 'My Heritage Shared DNA file - Kit_IDXXXX' spreadsheet (Part B, Step 1). Add the new details of the TG# number and possible MRCA (if known) to the master sheets.
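The 'highest start, lowest end' check from the review above is easy to sketch in Python. The figures 5.5, 5.3 and 0 - 9.1 are the ones quoted in this exercise; the individual segments in the list are made up to reproduce them, and the function name is just a label for this sketch.

```python
def common_overlap(segments):
    """Given (start, end) pairs, return (highest_start, lowest_end, overlaps).

    overlaps is True only when every segment shares a common stretch,
    i.e. the highest start falls before the lowest end.
    """
    highest_start = max(start for start, _ in segments)
    lowest_end = min(end for _, end in segments)
    return highest_start, lowest_end, highest_start < lowest_end


# Illustrative segments within the 0 - 9.1 area shared with the match of interest.
example = [
    (0.0, 9.1),  # the match of interest
    (0.0, 5.3),
    (5.5, 9.1),
    (1.2, 8.0),
]
overlap_result = common_overlap(example)
print(overlap_result)
```

When the last value is False, as here, there is no single stretch shared by every match, which is the signal that more than one triangulated group may sit along the length of the segment.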

Key icons for working at My Heritage

Notes on Labelling at My Heritage

My preference for working at My Heritage is to only add Labels on 'triangulated segments'. It can be dangerous to mark matches as belonging to groups on the basis of shared matches alone. My Heritage reports segments down to 6cMs, and many of these can be false or belong to a different ancestor than other larger segments, ie you may share both a recent and a distant ancestor. In addition, My Heritage uses imputation, which can add segments to your match compared to other companies. Both these situations can skew results if incorrectly used as the basis for labelling.

To be sure, only label triangulated segments. Eventually other 'genetic cousins' for that line will all be labelled when analysed via their own DNA match page for the triangulated segments they share.

Step 4. Review the other side of the chromosome

Normally, we would examine matches in the same segment area who don't triangulate with our match of interest, to determine if they belong to a 'Triangulated group (TG)' on the other side of the chromosome. If your match of interest is maternal, those in TG's the other side of the chromosome, in the same segment location, who don't match the maternal group, must be paternal. Those in the same segment location who do not match either the maternal or paternal TG's, are likely false positive matches and should be discarded.

As full segment match list downloads have been TEMPORARILY DISABLED by My Heritage (since September 2023!) we cannot complete this step at this stage. We need to rely on amassing segment data in our master list to eventually be able to achieve this.

Step 5. Is the whole group triangulated?

You can now go back to the 'triangulated match tool' on the My Heritage site and check to see if all matches in the group you have identified triangulate with each other. To do this:

Navigate to the DNA match page with your shared 'match of interest'.
Click on the first TG icon symbol - this will open up the Chromosome Browser Triangulation Tool;
'Add and Remove' matches using the radio button on the top right of the page.
Only a maximum of 7 matches can be viewed at one time.

If the first segment was quite long, you may find the triangulations are in sub groups, like in the diagram below. You will need to play around with your match comparisons to identify the subgroups, but eventually you might see this sort of pattern. (NOTE: This example is on Chromosome 10, a different group to the one shared with Ian).

This is demonstrating that whilst everyone is triangulating with 'Match A' (red), they are coming into the group at different levels. You and 'Match A' share the longest segment, so 'Match A' is probably a closer relation to you and your shared MRCA couple. The other matches may reflect matches to the same MRCA couple or perhaps a segment belonging to an older ancestor from one of the ancestors in that MRCA ancestral couple.

If you can identify the MRCA couple for Match A, others in each of sub groups could be segments coming from any of the 4 parents of that couple depending upon whether there was a recombination event in the segment area for Match A.

For example if Match A (red) was a 2nd cousin, they share your great grandparents. The subgroups could belong to either of those, or potentially different 2nd great grandparents depending on where recombination events occurred. Whilst I would initially call this one triangulated group for the entire length of Match A, if you identify the subgroups as belonging to more distant ancestors, you may wish to rename your subgroups as separate TG's.

Read more about reviewing your DNA matches at My Heritage.

Next Steps - the quick way!

After you have mastered identifying your triangulated groups, access the 'My Heritage Auto Clusters' tool under the DNA tools menu. Click 'Explore' then 'Generate' after selecting the kit of interest. Your report will be emailed to you. When reviewing the report remember that these are shared match clusters and will need to be examined carefully to identify triangulated groups.

Check to see if your 'match of interest' or anyone else who appears in your triangulated group appears in the cluster report;
Who else is in the cluster?
Which matches in the cluster are triangulated and who is only a shared match?
Explore some new groups (optional);
Don't forget to add any additional analysis to your two master spreadsheet lists 'My Heritage Match List, Total Shared cMs - Kit_IDXXXX' spreadsheet (Part A) and 'My Heritage Shared DNA file - Kit_IDXXXX' spreadsheet (Part B, Step 1) for future reference.

Unfortunately, the defaults for cluster reports are predetermined by My Heritage and cannot be changed. If no one in your identified TG appears in the cluster report, pick an alternative cluster of interest and apply what you have learnt from the exercise by examining matches in that cluster.

Need more challenges?

Have you identified matches at My Heritage with shared common ancestors on your mystery line? See if you can identify any triangulated groups on the segments shared with them using the techniques in this exercise.

Veronica Williams

First Published: 3 August 2024

GEDmatch - Exercise to identify triangulated groups and false positive matches.

The purpose of undertaking these exercises is to consolidate your understanding of how to determine triangulated segments, identify potential match groups and whether a match is likely to be a false positive.

In the 2021 and 2023 programs we completed this exercise at My Heritage. Due to the temporary suspension of downloads at many of the DNA sites, the data cannot be sourced for the exercise for the 2024 group.

This modified exercise seeks to use GEDmatch data to practice analysing your matches manually to ensure you appreciate the underlying theory. Remember, whilst shared matches are 'CLUES' of a shared ancestor, only shared and triangulated segments are 'EVIDENCE' of a common ancestor.

1. Identifying segments of interest

For this exercise, we are continuing to use the GEDmatch data we collected on our known ancestor match in Module 1, see the previous blogpost.

In my example I identified 3 'segments of interest' with my 'known match', who shared the MRCA of my problematic 2GGP's James Murphy and Elizabeth Bateman. In the previous exercise, if you had other matches who shared segments going back to the target 'ancestral couple', you were advised to repeat the process and also add their shared segments to your spreadsheet. In my case there were 5 identified matches, but only 4 were on GEDmatch. After doing 'one on ones' at GEDmatch between the additional three matches and my mother I identified a total of 14 additional segments of interest.

The remaining 'known' match was on My Heritage and shared 3 different segments, however her data has been excluded for the moment.

2. Identifying matches in the same specific segment locations

Step 1: First we need to access Tier 1 Tools at GEDmatch, which requires a subscription. We are going to use the 'Segment Search' report. This report provides a list of all matches by 'chromosome' and shared 'segment location'. Run the report using the existing defaults and download the .csv file to your computer.

Open the 'Segment Search report'.csv file you downloaded from GEDmatch.

Save it as a separate sheet and give it a working title, i.e. 'Chromosome Analysis - Kit ID GEDmatch'. We will be amending this version, but we also want to retain the original 'Segment Search report'.

Now we are going to find the areas we identified as 'segments of interest' in Part 1.

Choose one of your largest segments from the list in Step 1 and find the matching segment location area in the newly created 'Chromosome Analysis - Kit ID GEDmatch' spreadsheet;

Identify all matches in your list, that match anywhere within the shared segment area of the selected segment of interest.

Highlight all the matches within the identified segment area.

In the following example I am using the segment on Chromosome 4 that is around 50cMs. This segment starts at 162.9 and ends at 191.1.
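Finding every match that falls anywhere within the segment of interest comes down to a simple overlap test, which can be sketched in Python. The Chromosome 4 boundaries (162.9 to 191.1) are the ones from my example; the rows, the key names and the optional min_cm filter are assumptions for illustration, not the actual 'Segment Search' column headers.

```python
def overlaps(start_a, end_a, start_b, end_b):
    """True when two segments share any stretch of the chromosome."""
    return start_a < end_b and start_b < end_a


def matches_in_segment(rows, chromosome, start, end, min_cm=0.0):
    """Rows on the given chromosome that overlap start-end and meet the cM floor."""
    return [
        r
        for r in rows
        if str(r["chromosome"]) == str(chromosome)
        and overlaps(r["start"], r["end"], start, end)
        and r["cm"] >= min_cm
    ]


# Made-up rows for illustration only.
rows = [
    {"name": "Match 1", "chromosome": "4", "start": 160.0, "end": 175.0, "cm": 22.0},
    {"name": "Match 2", "chromosome": "4", "start": 100.0, "end": 150.0, "cm": 40.0},
    {"name": "Match 3", "chromosome": "4", "start": 185.0, "end": 191.0, "cm": 9.5},
    {"name": "Match 4", "chromosome": "5", "start": 165.0, "end": 180.0, "cm": 18.0},
]
hits = [r["name"] for r in matches_in_segment(rows, "4", 162.9, 191.1)]
larger_hits = [r["name"] for r in matches_in_segment(rows, "4", 162.9, 191.1, min_cm=15)]
print(hits, larger_hits)
```

Note that a match only needs to overlap part of the segment area to be included, so Match 1 counts even though it starts before 162.9.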

Step 2: We will now update the spreadsheet to be able to add additional information as we examine each of the matches throughout the analysis process.

In your newly created 'Chromosome Analysis - Kit ID GEDmatch' sheet, the columns we will be using will be Match Name 1, Match Name 2, Chromosome, Start Location, End Location, Centimorgans.

Delete or hide any unnecessary columns for manageability (optional).

You may also wish to format the Centimorgans column to display the numbers showing one decimal point so that it is easier to read (ie 168 is the default number, re-format the column so it would appear as 168.0).

Freeze the header row so that you can easily see and sort each column heading later.

In your newly created 'Chromosome Analysis - Kit ID GEDmatch' sheet, add columns for side (eg P, M, Both, I/F), TG (Triangulated Group), MRCA (Most Recent Common Ancestor) and notes.

This spreadsheet will now become your 'master sheet' for GEDmatch data for the relevant Kit_ID.

3. Identifying triangulated groups within the identified segment location

We are now going to review each match in the identified segment area to determine sides and to identify any triangulated segments, on either side of the chromosome.

Firstly, review the data collected from Exercise 3 in Module 1 and transfer the information to the newly created 'Chromosome Analysis - Kit ID GEDmatch' sheet.

For matches who 'triangulate' with your 'known' match - allocate them to the same side of the family.

Groups of matches who all triangulated with each other should be allocated a TG number - develop and maintain a system that works for you. As an example, you could number them as M###, P###, and Side 1_###, Side 2_### if you have not yet identified a side. In this exercise we know the side so it will be either M### or P###.

If you know the MRCA (Most Recent Common Ancestor) notate them in the MRCA column;

Add any notes you think may be useful in the future as you explore matches.

You may find other close matches from your family in this list, as I did. In my case, for such a large segment area, there are only limited matches appearing which was surprising. Each segment area will be different, for every kit. You may have a few or many matches within the segment area. If you have a large number, reduce the cMs limit to say >15cMs to make the list more manageable for the exercise.

Your sheet may now look something like this.

4. Identifying triangulated groups on the opposing side and false segments

The last part of the exercise is to use the information we have found about our triangulated group to inform us about the 'same segment area' on the 'other side of the chromosome'. We will next review all the remaining matches who have not been marked to see if we can find matches on the other side of the chromosome and identify potentially false segment matches.

Firstly check that the match DOES NOT share the segment area with matches identified as belonging to the same side of the 'known' match. Be sure to check the segment locations carefully, particularly if you are examining a long segment like this one as there can potentially be multiple triangulated segment areas along the length of the chromosome.

Then run 'people who match 2 kits' and see if there are any shared and triangulated matches on the other side of the chromosome. If so, in this case they would be marked as 'paternal' and allocated a PXXX TG number.

If there are still matches appearing in the list that do not belong to either the 'maternal' or 'paternal' triangulated groups, then they must be 'false positive' segments or 'IBS (identical by state)'. Update your sheet accordingly.

Remember we can only mark a segment as false when there are triangulated groups in the specific opposing segment locations - on both sides of the chromosome.

If there is no TG identified on the opposing side, the default position would be to allocate all the remaining matches to the other side for your known match. Once a TG is identified in this area, the matches should be re-examined and allocated to Maternal, Paternal or False. Marking segments on both sides as you go, assists with productivity and reduces duplicated effort down the track.
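The decision rules in this section can be written down as a small Python sketch. It assumes you have already worked out, from 'one on ones' and 'people who match 2 kits', which matches belong to the TG on each side; the function name, the 'I/F' code for false/IBS segments and the trailing '?' for a provisional allocation are conventions made up for this sketch.

```python
def classify(match, known_side_tg, opposing_tg, known_side="M"):
    """Allocate a match in the segment area to a side, or flag it as false.

    known_side_tg: matches that triangulate with the 'known' match.
    opposing_tg:   matches in the TG on the other side (empty if none found yet).
    """
    other_side = "P" if known_side == "M" else "M"
    if match in known_side_tg:
        return known_side
    if not opposing_tg:
        # No TG on the opposing side yet: default to the other side,
        # marked provisional so it is re-examined once a TG is identified.
        return other_side + "?"
    if match in opposing_tg:
        return other_side
    # TGs exist on both sides and the match belongs to neither.
    return "I/F"


maternal_tg = {"Match A", "Match B"}
paternal_tg = {"Match C"}

print(classify("Match A", maternal_tg, paternal_tg))
print(classify("Match C", maternal_tg, paternal_tg))
print(classify("Match D", maternal_tg, paternal_tg))
print(classify("Match D", maternal_tg, set()))
```

The ordering of the checks mirrors the rule above: a segment is only marked false once triangulated groups have been identified on both sides of the chromosome.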

Being systematic and methodical will help you in the analysis process. It may seem slow and tedious to start but you will benefit over time.

Next steps:

The next step in the process is to interrogate each of the matches in the 'triangulated group' (TG) and to aim to push back the segment a further generation by finding a more distant MRCA. Everyone in the segment 'triangulated group' will share a common ancestor. The segment could go back many generations, or the match could fit in even closer to you than the identified MRCA, but still shares the more distant MRCA with others. Our goal is to push back one generation at a time and 'walk back the segment'.

Even if you cannot identify more MRCAs at this stage, add notes about names and locations that might be useful for further analysis down the track.

To fully explore all the clues for your ancestor of interest, you should repeat this exercise for all the identified 'segments of interest' identified in Section 1. Hopefully, more clues will emerge with common surnames or locations between groups.

It is not necessary to do all these segments for the purposes of Module 2, just do enough so you are confident with determining TG's on both sides and knowing when it is appropriate to mark some segments as 'false positives'.

If you are feeling particularly keen to do more and you have other close family tested, repeat all the steps for the 'known' match with each of them, starting with Exercise 3 in Module 1. Make sure you consider where the tester sits in the tree and the implications on 'DNA inheritance' when deciding who to examine. Consider such things as parent/child relationships: if the match is on the maternal side and the mother has tested, there would be no need to examine the child or other descendants, as relevant DNA would have already been identified from their mother. After doing this for my kits for the descendants of the Murphy-Bateman 'ancestral couple', my list of 'segment areas of interest' increased from 14 to 32!

Veronica Williams

First published: 3 August 2024