Genemonkey explains....: Digging into segments - in the absence of reports from My Heritage

I have long been a fan of segment analysis as a way to find common ancestors. The last couple of years have been particularly frustrating, since DNA companies have turned off their segment data downloads.

Having regularly downloaded most of my matches to the Genealogical DNA Database Tool (GDAT), I found this issue manageable in what I thought would be the 'short term', as I had plenty of matches to explore and could easily look at isolated segment data on some sites. AncestryDNA's release of Enhanced Shared Matching late last year also kept my interest up for a time and is proving invaluable, but I find myself exploring and building family groups for matches that are not my priority. Was this the best use of my time in finding my nemesis ancestor, my 2nd great-grandfather George Courtney?

Using Visual Phasing techniques on two sets of siblings, my mother and three of her siblings (the RM siblings) plus four children of Mum's paternal first cousin (the RR siblings), I have identified all the likely segment areas that came from the 'Courtney line'. How can I interrogate new matches in these segment areas without downloadable lists?

Figure: 4 RM Siblings: Segments by Grandparent on Chromosome 4

There are bound to be new matches in these locations, but how do I find them? Continually visiting known matches in each segment location for the 10 kits relevant to the search was a long-winded and laborious exercise, always in the hope of a miracle. The relevant matches are not likely to be 'big' unknown matches; the key ones are more likely to be buried among the smaller matches, which are much harder to find.

Whilst GEDmatch and FTDNA have segment reports, they have few new matches and few trees. The My Heritage site was my best bet, providing segments, triangulation tools and many trees. Tools like Pedigree Thief can obtain match lists with Total cMs, which has helped me find new larger matches, but it is difficult to identify the specific ones that might be paternal matches of interest. I had to work through them one by one, extract the segment data and then compare it to my Visual Phasing output. Most of the time the match turned out not to be in the segment area of interest. I tried to work out how this could be a more productive process.

I have never been a big user of the label (dots) system at My Heritage, given that so many false positives appear in the shared match list. This is mainly due to the low 6 cM match threshold and, at times, imputation. At AncestryDNA, many of these old population segments are eliminated by the Timber algorithm. At My Heritage, I always prefer to add matches to an 'ancestral couple' label ONLY when there is a triangulated segment. This way, poor label allocation doesn't lead to new matches being incorrectly labelled in the future. It was time to revisit my process.

I started to consider how Pedigree Thief, the fabulous new Genealogy Assistant (Chrome extension) and Labels at My Heritage could help me better identify and prioritise key matches for segment analysis.

Starting with Labels

My current labels were already organised by ancestral couples. For my Courtney project, I have 10 primary kits (plus 4 descendants) at My Heritage that need analysing in priority order: first the segments shared between kits, then the segments each kit inherited on its own. The Courtney line is on Mum's paternal side, so I started with her matches and the 4 sibling kits of the children of her paternal first cousin (the RR siblings, or RRS), who share the Roberts-Courtney line. I decided to create a label for each of the RR siblings and then another label for matches who triangulate with one or more of them, as these matches would be my priority. As we are talking about large numbers of shared matches, I recommend loading the Genealogy Assistant: these cousin siblings each had between 551 and 772 shared matches with my Mum. As both sets of Roberts siblings have mostly Irish maternal sides, many of the shared matches are old population segments and false positives that need to be 'weeded out', leaving only valid Roberts-Courtney matches.

Navigate to the shared match page for the tester (in my case Mum) and the key match (RRS, sibling 1);
Go to the bottom of the shared match page, find the auto-load pages field and enter an estimate of the number of pages needed (the Genealogy Assistant must be loaded).
Navigate back up to the top of the list and slowly scroll through to the end of the match list. You have reached the end when the 'Show more DNA Matches' button disappears.
Once again, navigate up to the top of the list. Click on the 'select all matches' button (Genealogy Assistant). Before choosing a label, watch the 'Manage Labels' count and let it climb to the full number of matches selected.
Select an existing label or 'Create a new label', then click 'Apply'. If you have large numbers of matches, you may get a spinning wheel for a while and the page may appear to have hung. Wait a decent length of time for it to finish. I have found that if I open the shared match page in another window and the last person has been labelled, the process has usually completed.
While all the matches are still selected, the next step is to identify all the triangulated matches to add to a combined 'Matches TG - RRS' label. If your results are like mine, there are more triangulated matches than un-triangulated matches. If so, remove the tick from everyone who does not have the triangulated group symbol. Otherwise, do the reverse and tick only those who do have the symbol. If you leave this part until later, when you 'select all matches' again make sure to wait and watch the 'Manage Labels' count before selecting the new label.
Add the reduced number of matches to the 'Matches TG - RRS' label.
Repeat the process for other key matches (in this example the next 3 RR siblings), adding triangulated matches to the same 'Matches TG - RRS' label.

My mother had 16,566 matches at My Heritage when I undertook this exercise. After this process, the total number of shared matches for the RR siblings was 1463, and of those, 1043 were triangulated.
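Because a label behaves like a set, a match who appears in more than one RR sibling's shared list is counted only once, which is presumably why the combined total of 1463 is well below the sum of four lists of 551 to 772 matches each. The 420 untriangulated matches discussed later are simply 1463 - 1043. A minimal Python sketch of that bookkeeping, with made-up match IDs standing in for real My Heritage matches (the function name `summarise_labels` is my own, not part of any tool):

```python
def summarise_labels(shared_by_kit, triangulated_by_kit):
    """Count unique shared matches, unique triangulated matches and the rest.

    Both arguments map a kit name to a set of match IDs. The IDs used
    below are illustrative placeholders, not real My Heritage IDs.
    """
    all_shared = set().union(*shared_by_kit.values())
    all_triangulated = set().union(*triangulated_by_kit.values())
    # A match counts as triangulated if it triangulates with ANY of the kits.
    not_triangulated = all_shared - all_triangulated
    return len(all_shared), len(all_triangulated), len(not_triangulated)


shared = {
    "RR sibling 1": {"m1", "m2", "m3"},
    "RR sibling 2": {"m2", "m3", "m4"},
}
triangulated = {
    "RR sibling 1": {"m1", "m2"},
    "RR sibling 2": {"m2"},
}
print(summarise_labels(shared, triangulated))  # (4, 2, 2)
```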

As all 1043 matches triangulate with Mum and at least one RR sibling, we know each of them must have inherited the segment from the same common ancestor. As a result, we can be confident that the segment shared between the two sets of siblings (RM and RR) was inherited from their shared MRCA, the 'Roberts-Courtney' couple: Edward Roberts (Baker/Dye) and Abigail Courtney. How each of the 1043 matches is related is yet to be determined, but it will be through either the Roberts OR the Courtney line. These 1043 can now also be labelled to the 'Roberts-Courtney' ancestral group (NOTE: The Roberts-Courtney group is an existing label for the 'ancestral couple'. It will contain other matches already identified through DNA analysis and may be larger than the newly created 'Matches TG - RRS' label).

Navigate back to the main match page and select 'Filters'. Go to 'Labels' and select the consolidated TG group - 'Matches TG - RRS'.
With the Genealogy Assistant loaded, 50 matches appear on each page in this view, so with 1043 matches we have 21 pages to process. Starting on page 1, navigate to the top of the match list and 'select all matches'. Then choose the appropriate label (in this case 'Roberts-Courtney') and select 'Apply'.
Repeat the process with the remaining pages. Firstly, navigate to the bottom of the list to select the next page number, then navigate to the top to 'select all matches', 'select label', 'apply'.
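The page arithmetic is simple ceiling division: 1043 matches at 50 per page gives 20 full pages plus a final page of 43. A small Python sketch (the helper names are hypothetical, just for illustration) that can help confirm you have reached the last page before applying the label:

```python
import math


def pages_needed(match_count, per_page=50):
    """Pages to step through when each page shows `per_page` matches."""
    return math.ceil(match_count / per_page)


def last_page_size(match_count, per_page=50):
    """Number of matches expected on the final page."""
    if match_count <= 0:
        return 0
    remainder = match_count % per_page
    # An exact multiple of per_page means the last page is full.
    return remainder if remainder else per_page


print(pages_needed(1043), last_page_size(1043))  # 21 43
```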

If you would like a spreadsheet listing these matches, enter the total number of pages and click the 'Download CSV' button. You must have the Genealogy Assistant loaded for this option to appear. You will need to wait while each page populates and all the pages are processed before the CSV is ready to download to your computer.

The spreadsheet contains the match name, estimated relationship, shared Total cMs, number of shared segments, largest cMs segment, tree size and the location of the match. Matches are listed in Total cMs order, but can also be sorted by largest cMs segment to identify priority matches for review. When reviewing the filtered Label of 'Matches TG - RRS', you can select the 'Sort By - largest segment' to achieve the same result.
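If you prefer to work from the downloaded CSV rather than the on-screen sort, a short Python script can do the same ranking. This is a sketch only: I have assumed the column headers 'Name', 'Total cM' and 'Largest segment cM', which may not match the headers the Genealogy Assistant actually writes, so check your own file and adjust the constants.

```python
import csv
import io

# ASSUMED column headers: check the header row of your own CSV
# and change these names to match it.
NAME_COL = "Name"
TOTAL_COL = "Total cM"
LARGEST_COL = "Largest segment cM"


def rank_by_largest_segment(lines):
    """Sort CSV rows by largest segment, then total shared cM (both descending)."""
    rows = list(csv.DictReader(lines))
    return sorted(
        rows,
        key=lambda row: (float(row[LARGEST_COL]), float(row[TOTAL_COL])),
        reverse=True,
    )


sample = io.StringIO(
    "Name,Total cM,Segments,Largest segment cM\n"
    "Match A,35,1,35\n"
    "Match B,35,4,12\n"
    "Match C,60,3,20\n"
)
for row in rank_by_largest_segment(sample):
    print(row[NAME_COL], row[LARGEST_COL])
```

For a real export, pass an open file instead, for example `with open('matches_tg_rrs.csv', newline='', encoding='utf-8') as f:` followed by `rank_by_largest_segment(f)` (the file name is a placeholder). Using total cM as a tie-breaker is just my choice; the largest segment is the key that matters here.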

Whilst at this point we have identified 1043 priority matches, we have no way of telling which of them have shared segments in the Courtney grandparent location (Figure: Chromosome 4).
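Once segment data is available for a match, deciding whether it falls in the Courtney area is an interval-overlap test on the same chromosome. A minimal Python sketch; the chromosome 4 boundaries below are invented placeholders, not my actual Visual Phasing results, and positions are assumed to be base pairs on the same genome build as your match data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    chromosome: str
    start: int  # start position in base pairs
    end: int    # end position in base pairs


def overlap_bp(a, b):
    """Base pairs two segments have in common (0 if they do not overlap)."""
    if a.chromosome != b.chromosome:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


# HYPOTHETICAL Courtney area from Visual Phasing; substitute your own boundaries.
courtney_area = Segment("4", 60_000_000, 95_000_000)

match_segment = Segment("4", 80_000_000, 110_000_000)
print(overlap_bp(courtney_area, match_segment))  # 15000000
```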

My aim is to isolate all the George Courtney segments. The next challenge will be to analyse all matches in the 'Matches TG - RRS' group and split them into one of the 4 paternal great grandparent lines: Roberts PGGF (Dye), Laundon PGGM (wife of Edward Roberts), Paice PGGM (wife of George) and finally our mystery PGGF, Abigail's father 'George Courtney'!

The remaining 420 shared matches who do not triangulate with our RR siblings will need additional analysis further down the track. They may share additional segments not inherited by the RM siblings or they may be false positives or old population segments. At this point we are only prioritising the 1043 triangulated matches as we know they are valid segments and should be reviewed first.

Getting segment data via Pedigree Thief

In the absence of downloads from My Heritage, Pedigree Thief is the only tool I know of that can currently help extract specific segment data. It can be difficult at times, as the site imposes limits and sometimes warns that we are doing too much of what it calls 'scraping'. This can lead to being temporarily barred from the site, so we need to keep each query as small as possible.

We can obtain segment data from My Heritage using Pedigree Thief in two ways:

From the main match page, where the options are 'with DNA' or 'without DNA'. I would recommend extracting data here first using 'no DNA' to maximise the number of matches you add to the Pedigree Thief database.
From the shared match list, where you can extract shared matches, ICWs and triangulations. This will not work if the selected match is not already in the database; instead you will be directed back to the main match page.

My suggested process is aimed at gathering as much segment data as possible for the priority matches identified as triangulated with the RR Siblings, accessed via the filtered Label of 'Matches TG - RRS'.

Firstly, navigate back to the main match page and select 'Filters'. Go to 'Labels' and select the consolidated TG group - 'Matches TG - RRS'.
Navigate up to the ribbon toolbar (next to the search bar) and select the 'Pedigree Thief' icon (you must have already downloaded Pedigree Thief from the Chrome store for this to appear).
You can only read 5 pages of matches at a time, so select 5 pages and click 'Read Matches (no DNA)'. When the first 5 pages are finished, select the next 5 and continue until all are done.
Click the 4 'save' boxes; these should download the following files to your computer:

MyHeritage Matches for XXX;
MyHeritage Shared DNA for XXX;
MyHeritage ICWs for XXX; and
MyHeritage Triangulations for XXX.

These files can be imported into GDAT using the approved templates. Don't be fooled by the name: 'Read Matches (no DNA)' means no segment DNA, but it still includes Total cMs for all matches.
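Because Pedigree Thief reads at most 5 pages per pass, it helps to plan the passes up front. Assuming the filtered 'Matches TG - RRS' view runs to the same 21 pages I had with the Genealogy Assistant (Pedigree Thief may count pages differently), the batches work out as below; `page_batches` is a hypothetical helper, not part of either extension.

```python
def page_batches(total_pages, batch_size=5):
    """Split pages 1..total_pages into inclusive (first, last) batches."""
    return [
        (first, min(first + batch_size - 1, total_pages))
        for first in range(1, total_pages + 1, batch_size)
    ]


for first, last in page_batches(21):
    print(f"Read pages {first}-{last}")
```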

Once this is loaded we can go back to the shared match page:

Navigate to the shared matches of your tester and the key match (for my example, Mum and an RR sibling);
Use the Auto-load page feature again (Genealogy Assistant) and keep scrolling down the list until all the shared matches have loaded and you have one page of matches;
Scroll up to the toolbar and click on the Pedigree Thief icon. As I have 695 matches in this list, I start by selecting 'Read Matches (no DNA)' to extract more data into the Pedigree Thief database. The tool collected 695 ICW and Triangulations and reported 530 queued for triangulation. The output is a downloadable ICW file, which can be loaded into GDAT.
Next I want to 'Read Matches (and DNA)', but this is where we need to be conscious of potential system overuse. The system will only review a certain number of these matches; mine stopped at about 100. The tool now says it has collected 695 ICW and 38 Triangulations, and that 432 are queued for triangulation. The output now includes a downloadable chromosome data file, which can be loaded into GDAT. The chromosome data is the DNA shared between your tester and the key match (for my example, Mum and RR sibling 1). Using this process on more distant known cousins would give a much more manageable number, but these matches are critical for my research project.
The ICW file is the same as the earlier one, but this time the triangulation file can also be downloaded and added to GDAT. It contains chromosome data for the 38 matches collected so far. We can now target any of these matches that have segment data in the target segment areas (ie. the Courtney segments).
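Picking out the matches in the target segment areas from the downloaded chromosome data or triangulation file can be scripted rather than done by eye. This Python sketch assumes columns named 'Relative ID', 'Chromosome', 'Start' and 'End' with plain integer base-pair positions; these headers and the chromosome 4 boundaries are placeholders, so rename them to suit your files and your own Visual Phasing results. The IDs it returns can then be searched in GDAT with 'Search Match Key'.

```python
import csv
import io

# ASSUMED headers for the chromosome data file; rename to match yours.
ID_COL, CHR_COL, START_COL, END_COL = "Relative ID", "Chromosome", "Start", "End"

# HYPOTHETICAL target areas as (chromosome, start, end) in base pairs.
TARGET_AREAS = [("4", 60_000_000, 95_000_000)]


def relative_ids_in_targets(lines, targets=TARGET_AREAS):
    """Return sorted, de-duplicated IDs with a segment touching a target area."""
    hits = set()
    for row in csv.DictReader(lines):
        chrom = row[CHR_COL].strip()
        start, end = int(row[START_COL]), int(row[END_COL])
        for t_chrom, t_start, t_end in targets:
            # Two intervals overlap when each starts before the other ends.
            if chrom == t_chrom and start < t_end and end > t_start:
                hits.add(row[ID_COL].strip())
    return sorted(hits)


sample = io.StringIO(
    "Relative ID,Chromosome,Start,End\n"
    "REL-1,4,80000000,110000000\n"
    "REL-2,4,10000000,20000000\n"
    "REL-3,7,70000000,90000000\n"
    "REL-4,4,50000000,61000000\n"
    "REL-1,4,90000000,94000000\n"
)
print(relative_ids_in_targets(sample))  # ['REL-1', 'REL-4']
```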

Next we have two options: continue with the remaining 432 matches queued for triangulation, or work with the chromosome data collected so far (eg. the 38 matches). My preference is to load all the data so I can identify every match in the likely segment locations, but MH system constraints may require you to do just one page at a time.

Return to Pedigree Thief to continue gathering the triangulation data: go to the queued triangulations and click 'read'. You may get a message that you have reached your limit for the day. If so, stop gathering and review the matches you have so far. Come back later; you don't want your subscription suspended. If you are using GDAT, you should be able to review the triangulated matches and see who else is ICW on the segment, because the ICW file has already been loaded, even though you don't have all their chromosome data yet. Make sure to hit the down arrow, which gives you the option to click 'ALL' (matches, segments, in common with and triangulation files).
Pedigree Thief also allows you to gather data for just one match. The chromosome data spreadsheet for the 38 matches included 7 who were in the 'Courtney' segment location on chromosome 4. Unfortunately, the chromosome data spreadsheet downloaded from Pedigree Thief does not contain match names, just a My Heritage Relative ID. You can search for the match in GDAT using the 'Search Match Key' option and entering the My Heritage Relative ID.
Once you have found the match in GDAT, use the hotkey to navigate back to the match at My Heritage. Select the Pedigree Thief icon and extract the data using 'Read Match (and DNA)'. This will produce 4 files to download: match, chromosome data, ICW and triangulation. We already have the match and shared DNA in the GDAT database, so we don't need the first two; just upload the ICW and triangulation files to GDAT.
As you investigate your matches in the priority segment areas in GDAT, make sure to also add a note on the match's page in your consolidated TG label at My Heritage (eg. 'Matches TG - RRS') so that you can monitor your progress on all the triangulated matches.
Continue in this way until every match in the TG label has been reviewed and all those with segment data in the priority areas have been uploaded to GDAT.
Where Visual Phasing or other segment analysis has identified the likely ancestral couple for a segment location, review each match in the consolidated TG label at My Heritage (eg. 'Matches TG - RRS'), make a note to that effect, then add a label for the known ancestral couple and remove the match from the Roberts-Courtney label. For my tree these would be Dye-Laundon (Roberts side) or Paice-Parker (wife of George Courtney). Even if we don't know the name of our MRCA with a match, the aim is to move all 1043 matches over time from the grandparent label (Roberts-Courtney) into the correct ancestral couple label (one of the 4 great grandparent lines).
As I review each TG (ideally removing it from the Roberts-Courtney group), I update the label for the match and move all the matches in that TG to their new ancestral group at the same time. This reduces rework when reviewing new triangulated matches down the track.
Matches who can't be allocated to a new 'ancestral couple' label should remain in the 'Roberts-Courtney' group. You may wish to create an additional label for those that have been reviewed, but were unable to be allocated to a new label.
If you are reviewing multiple kits, it is a good idea to get all kits to the 'Matches TG - RRS' stage, then use the Genealogy Assistant to download a CSV of each kit's matches. Combining them into one alphabetical CSV will give you an idea of the total number of matches needing review, and lets you keep track of those already reviewed as you progress through all the kits. Whilst I make notes in My Heritage and GDAT as I go, a paper list can sometimes be easier to manage when checking back while working on subsequent kits.
Continue to review the Roberts-Courtney label for all matches who also have the label 'Matches TG - RRS' (sadly, labels at MH cannot show only the matches that carry both of these labels) until all those matches have been moved to an ancestral group label or, in the best case scenario, there are no matches left in the group. When reviewing the matches with this label, sort them by largest segment so that priority segments are reviewed first, ie. a person with a single 35 cM segment is likely to be a better match to review than someone sharing 35 cM over 4 segments.
Repeat the process for the other key testers (ie. the remaining 3 RM siblings) and other associated kits.
Always remember when reviewing shared matches to check whether your key match cousins or siblings have larger segments than your main tester (in my case Mum). If they do, make them the key kit for the analysis of that Triangulated Group, as their shared matches will sit in a larger segment location, ie. there will be a broader pool of triangulated matches to help identify the MRCA for that group.
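For the multi-kit checklist, a short Python script can merge each kit's Genealogy Assistant CSV into one alphabetical list showing which kits share each match, with an empty 'Reviewed' column to tick off. It assumes a 'Name' column (check your headers) and merges on name alone, so two different people with the same display name would be combined; if your export includes a unique match ID, keying on that would be safer. The function names are my own, for illustration only.

```python
import csv
import io
from collections import defaultdict

NAME_COL = "Name"  # ASSUMED header; change to match your CSV exports


def combine_kit_exports(exports):
    """Merge several kits' CSV exports into one alphabetical list.

    `exports` maps a kit label to an iterable of CSV lines (e.g. an open file).
    Returns (match name, sorted list of kits sharing that match) pairs.
    """
    kits_by_match = defaultdict(set)
    for kit, lines in exports.items():
        for row in csv.DictReader(lines):
            kits_by_match[row[NAME_COL].strip()].add(kit)
    return [
        (name, sorted(kits))
        for name, kits in sorted(kits_by_match.items(), key=lambda kv: kv[0].casefold())
    ]


def write_combined(combined, out):
    """Write the merged list as CSV with a blank 'Reviewed' column to tick off."""
    writer = csv.writer(out)
    writer.writerow(["Match", "Kits", "Reviewed"])
    for name, kits in combined:
        writer.writerow([name, "; ".join(kits), ""])


exports = {
    "Mum": io.StringIO("Name,Total cM\nZoe Smith,40\nAdam Jones,25\n"),
    "RM sibling 2": io.StringIO("Name,Total cM\nAdam Jones,30\nBeth Lee,12\n"),
}
for name, kits in combine_kit_exports(exports):
    print(name, "-", ", ".join(kits))
```

To produce a file, call `write_combined(combined, f)` with a file opened via `open('all_kits.csv', 'w', newline='', encoding='utf-8')` (placeholder name).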

The missing link

This process does not reach the 'likely Courtney segment areas' that the RR siblings do not share with the RM siblings or other associated kits. The time-consuming method of looking at known matches and their ICWs in existing segment locations seems the only way to get there at present. Some may be buried in the 420, potentially matching on different segments. If anyone can suggest other methods, please let me know.

Hopefully by the time I've finished reviewing all the matches in my TG Group Label we will have downloads back!

Veronica Williams

First published 7 April 2025

Last updated 25 April 2025

ADDENDUM (16 April 2025): Following discussion with the developer of Genealogy Assistant, a new button has been introduced to 'Load all' pages, which helps enormously. The blog will be updated soon to streamline the affected steps. We are also discussing ways to add another sort that reduces the list to just TG matches. More soon, fingers crossed.

Tuesday, April 8, 2025
