Genemonkey explains....: GEDmatch - Exercise to identify triangulated groups and false positive matches.

In Module 2 of 'Organising your DNA data and determining match groups' we discussed ways to organise data and a process for working with My Heritage to identify match groups. Before moving to Module 3, make sure you understand the theory of segment triangulation and have established a method for retaining records of your DNA research, to avoid rework and improve your productivity.

The purpose of undertaking these exercises is to consolidate your understanding of how to determine triangulated segments, identify potential match groups and judge whether a match is likely to be a false positive.

In the 2021 and 2023 programs we completed this exercise with My Heritage. Due to the temporary suspension of downloads at many of the DNA sites, the data cannot be sourced for the exercise for the 2024 group.

This modified exercise seeks to use GEDmatch data to practice analysing your matches manually to ensure you appreciate the underlying theory. Remember, whilst shared matches are 'CLUES' of a shared ancestor, only shared and triangulated segments are 'EVIDENCE' of a common ancestor.

1. Identifying segments of interest

For this exercise, we are continuing to use the GEDmatch data we collected on our known ancestor match in Module 1 (see the previous blog post).

In my example I identified 3 'segments of interest' with my 'known match', who shared the MRCA of my problematic 2GGPs, James Murphy and Elizabeth Bateman. In the previous exercise, if you had other matches who shared segments going back to the target 'ancestral couple', you were advised to repeat the process and also add their shared segments to your spreadsheet. In my case there were 5 identified matches, but only 4 were on GEDmatch. After doing 'one on ones' at GEDmatch between the additional three matches and my mother, I identified a total of 14 additional segments of interest.

The remaining 'known' match was on My Heritage and shared 3 different segments, however her data has been excluded for the moment.

2. Identifying matches in the same specific segment locations

Step 1: First we need to access Tier 1 Tools at GEDmatch, which requires a subscription. We are going to use the 'Segment Search' report. This report provides a list of all matches by 'chromosome' and shared 'segment location'. Run the report using the existing defaults and download the .csv file to your computer.

Open the 'Segment Search report'.csv file you downloaded from GEDmatch.

Save it as a separate sheet and give it a working title, e.g. 'Chromosome Analysis - Kit ID GEDmatch'. We will be amending this version, but we also want to retain the original 'Segment Search report'.

Now we are going to find the areas we identified as 'segments of interest' in Part 1.

Choose one of your largest segments from the list in Part 1 and find the matching segment location area in the newly created 'Chromosome Analysis - Kit ID GEDmatch' spreadsheet.

Identify all matches in your list that match anywhere within the shared segment area of the selected segment of interest.

Highlight all the matches within the identified segment area.

In the following example I am using the segment on Chromosome 4 that is around 50cMs. This segment starts at 162.9 and ends at 191.1 (positions in millions of base pairs).
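If you would rather check the highlighting with a script, the overlap test can be sketched as follows. This is only an illustration: the sample rows are made up, the column names are assumed from the ones listed in this post (the real GEDmatch export may label them differently), and positions are assumed to be in the same units as the 162.9 to 191.1 example above.

```python
import csv
import io

# Hypothetical sample rows; names and values are invented for illustration.
SAMPLE_CSV = """Match Name,Chromosome,Start Location,End Location,Centimorgans
Match A,4,160.2,175.4,21.3
Match B,4,120.0,150.5,18.0
Match C,4,180.7,191.0,16.9
Match D,7,165.0,170.0,9.1
"""

def overlaps(start_a, end_a, start_b, end_b):
    """Return True when two ranges on the same chromosome share any position."""
    return start_a < end_b and start_b < end_a

def matches_in_segment(rows, chromosome, seg_start, seg_end):
    """Keep rows on the given chromosome that overlap the segment of interest."""
    selected = []
    for row in rows:
        if row["Chromosome"].strip() != str(chromosome):
            continue
        start = float(row["Start Location"])
        end = float(row["End Location"])
        if overlaps(start, end, seg_start, seg_end):
            selected.append(row)
    return selected

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
hits = matches_in_segment(rows, 4, 162.9, 191.1)
print([row["Match Name"] for row in hits])  # ['Match A', 'Match C']
```

Note that a match only needs to overlap part of the segment area to be highlighted; it does not need to cover the whole segment.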

Step 2: We will now update the spreadsheet to be able to add additional information as we examine each of the matches throughout the analysis process.

In your newly created 'Chromosome Analysis - Kit ID GEDmatch' sheet, the columns we will be using will be Match Name 1, Match Name 2, Chromosome, Start Location, End Location, Centimorgans.

Delete or hide any unnecessary columns for manageability (optional).

You may also wish to format the Centimorgans column to display the numbers to one decimal place so that it is easier to read (i.e. 168 is the default number; re-format the column so it appears as 168.0).

Freeze the header row so that you can easily see and sort each column heading later.

In your newly created 'Chromosome Analysis - Kit ID GEDmatch' sheet, add columns for Side (e.g. P, M, Both, I/F), TG (Triangulated Group), MRCA (Most Recent Common Ancestor) and Notes.

This spreadsheet will now become your 'master sheet' for GEDmatch data for the relevant Kit_ID.
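The same preparation can be done with a short script instead of by hand. The sketch below is not part of the GEDmatch tooling; it assumes the export uses exactly the column names listed in Step 2 (check your own file), and the sample row, including the 'Unused' column, is invented.

```python
import csv
import io

# Column names assumed from Step 2 of this post.
KEEP = ["Match Name 1", "Match Name 2", "Chromosome",
        "Start Location", "End Location", "Centimorgans"]
# Working columns we add for the analysis.
EXTRA = ["Side", "TG", "MRCA", "Notes"]

def build_master_rows(rows):
    """Keep the working columns, show cM to one decimal place, add blank columns."""
    master = []
    for row in rows:
        new_row = {name: row.get(name, "") for name in KEEP}
        new_row["Centimorgans"] = f"{float(row['Centimorgans']):.1f}"
        for name in EXTRA:
            new_row[name] = ""
        master.append(new_row)
    return master

# Hypothetical one-row export with an extra column we do not need.
SAMPLE = """Match Name 1,Match Name 2,Chromosome,Start Location,End Location,Centimorgans,Unused
Kit owner,Match A,4,160.2,175.4,168,x
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
master = build_master_rows(rows)

# Write the master sheet to an in-memory CSV; swap in a real file to save it.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=KEEP + EXTRA)
writer.writeheader()
writer.writerows(master)
print(out.getvalue())
```

Because the script builds new rows rather than editing the download, the original 'Segment Search report' file stays untouched.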

3. Identifying triangulated groups within the identified segment location

We are now going to review each match in the identified segment area to determine sides and to identify any triangulated segments, on either side of the chromosome.

Firstly, review the data collected from Exercise 3 in Module 1 and transfer the information to the newly created 'Chromosome Analysis - Kit ID GEDmatch' sheet.

For matches who 'triangulate' with your 'known' match - allocate them to the same side of the family.

Groups of matches who all triangulated with each other should be allocated a TG number - develop and maintain a system that works for you. As an example, you could number them as M###, P###, and Side 1_###, Side 2_### if you have not yet identified a side. In this exercise we know the side so it will be either M### or P###.

If you know the MRCA (Most Recent Common Ancestor), note it in the MRCA column.

Add any notes you think may be useful in the future as you explore matches.
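As a reminder of the theory behind these steps, three people triangulate on a segment only when all three pairwise comparisons share DNA at the same location. The sketch below is a simplified illustration with made-up positions: it assumes all three segments are on the same chromosome and that you have already looked up each pairwise segment (for example from 'one on ones'). The three-digit TG label format is just one possible reading of the M### / P### suggestion.

```python
def common_overlap(*segments):
    """Return the (start, end) range shared by every segment, or None.

    Each segment is a (start, end) tuple on the same chromosome; pass None
    for a pair that shares no segment in the area.
    """
    if any(seg is None for seg in segments):
        return None
    start = max(seg[0] for seg in segments)
    end = min(seg[1] for seg in segments)
    return (start, end) if start < end else None

def tg_label(side, number):
    """Format a TG identifier such as M001 (hypothetical numbering scheme)."""
    return f"{side}{number:03d}"

# Hypothetical pairwise segments on Chromosome 4.
me_vs_known = (162.9, 191.1)
me_vs_match = (160.2, 175.4)
known_vs_match = (163.5, 174.8)

shared = common_overlap(me_vs_known, me_vs_match, known_vs_match)
print(shared)                # (163.5, 174.8)
print(tg_label("M", 1))      # M001

# If the known match and the other match share nothing here, no triangulation.
print(common_overlap(me_vs_known, me_vs_match, None))  # None
```

A match that overlaps your segment but does not also match the 'known' match at that location is only a shared segment, not a triangulated one.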

You may find other close matches from your family in this list, as I did. In my case, for such a large segment area, only a limited number of matches appeared, which was surprising. Each segment area will be different, for every kit. You may have a few or many matches within the segment area. If you have a large number, restrict the list to segments above, say, 15cMs to make it more manageable for the exercise.

Your sheet may now look something like this.

4. Identifying triangulated groups on the opposing side and false segments

The last part of the exercise is to use the information we have found about our triangulated group to inform us about the 'same segment area' on the 'other side of the chromosome'. We will next review all the remaining matches who have not been marked to see if we can find matches on the other side of the chromosome and identify potentially false segment matches.

Firstly check that the match DOES NOT share the segment area with matches identified as belonging to the same side of the 'known' match. Be sure to check the segment locations carefully, particularly if you are examining a long segment like this one as there can potentially be multiple triangulated segment areas along the length of the chromosome.

Then run 'people who match 2 kits' and see if there are any shared and triangulated matches on the other side of the chromosome. If so, in this case they would be marked as 'paternal' and allocated a PXXX TG number.

If there are still matches appearing in the list that do not belong to either the 'maternal' or 'paternal' triangulated groups, then they must be 'false positive' segments or 'IBS' (identical by state). Update your sheet accordingly.

Remember we can only mark a segment as false when there are triangulated groups in the specific opposing segment locations - on both sides of the chromosome.

If there is no TG identified on the opposing side, the default position would be to allocate all the remaining matches to the side opposite your known match. Once a TG is identified in this area, the matches should be re-examined and allocated to Maternal, Paternal or False. Marking segments on both sides as you go assists with productivity and reduces duplicated effort down the track.
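The allocation rules in this section can be summarised as a small decision function. This is a simplified sketch, not a replacement for checking segment locations by eye: it assumes the match's segment overlaps the specific location of each TG being compared, that a TG on the known side has already been established, and the names and the 'I/F' label are illustrative.

```python
def allocate_side(match_name, known_tg, opposing_tg, known_side="M"):
    """Return (side, note) for one match inside the segment area.

    known_tg: names that triangulate with the known match.
    opposing_tg: names in a TG on the other side (empty if none found yet).
    """
    opposing_side = "P" if known_side == "M" else "M"
    if match_name in known_tg:
        return known_side, "triangulates with known match"
    if match_name in opposing_tg:
        return opposing_side, "triangulates on opposing side"
    if opposing_tg:
        # TGs exist on both sides and this match fits neither of them.
        return "I/F", "likely false positive or IBS"
    # No opposing TG yet: provisional default, to be re-examined later.
    return opposing_side, "provisional, no opposing TG identified yet"

# Hypothetical groups for a maternal known match.
known_tg = {"Match A", "Match C"}
opposing_tg = {"Match E"}

print(allocate_side("Match A", known_tg, opposing_tg))  # maternal
print(allocate_side("Match E", known_tg, opposing_tg))  # paternal
print(allocate_side("Match F", known_tg, opposing_tg))  # false positive
print(allocate_side("Match F", known_tg, set()))        # provisional paternal
```

Keeping the note alongside the side makes it easy to find the provisional allocations again once a TG is identified on the opposing side.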

Being systematic and methodical will help you in the analysis process. It may seem slow and tedious to start but you will benefit over time.

Next steps:

The next step in the process is to interrogate each of the matches in the 'triangulated group' (TG) and to aim to push back the segment a further generation by finding a more distant MRCA. Everyone in the segment 'triangulated group' will share a common ancestor. The segment could go back many generations, or the match could fit in even closer to you than the identified MRCA, but still shares the more distant MRCA with others. Our goal is to push back one generation at a time and 'walk back the segment'.

Even if you cannot identify more MRCAs at this stage, add notes about names and locations that might be useful for further analysis down the track.

To fully explore all the clues for your ancestor of interest, you should repeat this exercise for all the 'segments of interest' identified in Section 1. Hopefully, more clues will emerge with common surnames or locations between groups.

It is not necessary to do all these segments for the purposes of Module 2; just do enough so you are confident with determining TGs on both sides and knowing when it is appropriate to mark some segments as 'false positives'.

If you are feeling particularly keen to do more and you have other close family tested, repeat all the steps for the 'known' match with each of them, starting with Exercise 3 in Module 1. Make sure you consider where the tester sits in the tree and the implications on 'DNA inheritance' when deciding who to examine. Consider such things as parent/child relationships: if the match is on the maternal side and the mother has tested, there would be no need to examine the child or other descendants, as the relevant DNA would have already been identified from their mother. After doing this for my kits for the descendants of the Murphy-Bateman 'ancestral couple', my list of 'segment areas of interest' increased from 14 to 32!

Veronica Williams

First published: 3 August 2024
