Thursday, April 24, 2025

Genealogy Assistant, Pedigree Thief and GDAT - Another My Heritage experiment!


THIS POST IS CURRENTLY 'DRAFT AND UNDER REVIEW'

FEEDBACK IS WELCOMED


The new Genealogy Assistant chrome extension is proving to be a game changer, increasing our productivity in so many ways.

In the absence of downloads from My Heritage I am constantly looking for ways to get at the elusive segment data of my priority matches and load it into the Genealogical DNA Analysis Tool (GDAT), which is where I like to do my DNA analysis - all in one place.  It makes it much easier to link matches with segment data from sites such as 23andMe, FTDNA, Living DNA, My Heritage and GEDmatch to their AncestryDNA results, and to maximise your knowledge of your matches using all the available information.  Not to mention, GDAT is a great repository for storing your historical research analysis.  It enables you to take a break, come back and not forget where you were up to!

This post is the documentation of my experimentation in using the Genealogy Assistant and Pedigree Thief to get my data out of My Heritage and into GDAT.  The post assumes you are starting from scratch.





1. GATHERING MATCHES

To start I'm going to use my aunt's kit for Genealogy Assistant and my uncle's for Pedigree Thief to make the best use of my time and system resources.  Later I will use both applications on both kits.


The Genealogy Assistant Method

My aunt's kit has 16876 matches.  The Genealogy Assistant allows you to download a csv for all matches, based on the number of pages you select.

  • There are 50 matches to a page, so I selected 338 pages.  Unfortunately this was too much and I found the My Heritage application hung.
  • I noticed the counter had changed to a maximum of 1200 so perhaps there is a limit of 24 pages.
  • I next selected 24 pages and clicked download csv.  It took 17 minutes to gather the 24 pages and download the file, with no problems.  Later downloads were much quicker, some taking only 7 minutes, so perhaps the difference in the time taken was associated with network connectivity?
  • As the file is sorted by total cMs, the first 24 pages gave me all the highest matches, down to 23.1cMs.  I want to go down to at least 15cMs so I will run more pages; it is a personal choice.
  • Ideally I want all the matches, but that means running this about 15 times, continuing to move on through the pages of matches.  Depending on how many matches you have, this might take a few days to work through.
  • If I've collected all the matches however, future downloads can be sorted by 'most recent' to capture just new matches as they come in and keep the database up to date.
  • Through trial and error it appears the maximum number of pages is 25 (at the 50 matches per page the Genealogy Assistant provides).  This is equivalent to 1250 matches.
  • Initially, downloading match lists with the Genealogy Assistant gave me no warning messages from My Heritage, which was very comforting.  However, by the time I got to page 167 (8,350 matches) I did get a warning message telling me I had viewed too many and to come back the next day.  Next day, I was only able to download 7,900 matches, just a little short of my target.  It took 3 days to achieve the full list, but I was able to do other work on the site during that time.
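
As a rough sketch of the arithmetic above, the short Python snippet below splits a match list into download batches.  It assumes 50 matches per page and the 25-page ceiling found by trial and error; plan_batches is a made-up helper name, not part of the Genealogy Assistant, and how the extension lets you choose the starting page is not covered here.

```python
import math

def plan_batches(total_matches, per_page=50, max_pages=25):
    """Return (first_page, last_page) tuples that cover every page of matches."""
    total_pages = math.ceil(total_matches / per_page)
    batches = []
    first = 1
    while first <= total_pages:
        last = min(first + max_pages - 1, total_pages)
        batches.append((first, last))
        first = last + 1
    return batches

# 16876 matches is the size of the aunt's kit quoted above.
batches = plan_batches(16876)
print(len(batches), batches[0], batches[-1])  # 14 (1, 25) (326, 338)
```

With a 24-page cap (max_pages=24) the same kit needs 15 batches, which matches the 'about 15 times' estimate.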




The csv contains the following information:
  • Match name
  • Estimated relationship
  • Shared cM
  • Shared segments
  • Largest segment
  • Tree size
  • Location (Country of match)
  • Relative/Match ID (GUID)
  • Link to match

By downloading all the matches, we can examine the highest matches, or the largest segments, to ensure that we focus on finding those common ancestors first.  However, as many of us are looking for more distant ancestors, the segment data becomes more important.  Unfortunately we don't have it at this point.
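
If you prefer to work outside the spreadsheet, something like the following Python sketch could pick out priority matches from the csv.  The header names ('Match name', 'Shared cM', 'Largest segment') and the sample rows are assumptions for illustration only; check them against your own download before relying on it.

```python
import csv
import io

# Hypothetical sample rows; the real header names in the export may differ.
SAMPLE = """Match name,Shared cM,Shared segments,Largest segment
Match A,23.1,2,15.0
Match B,14.2,1,14.2
Match C,48.7,3,31.5
"""

def priority_matches(csv_text, min_cm=15.0):
    """Keep matches at or above min_cm, ordered by largest segment (descending)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    kept = [r for r in rows if float(r["Shared cM"]) >= min_cm]
    return sorted(kept, key=lambda r: float(r["Largest segment"]), reverse=True)

for row in priority_matches(SAMPLE):
    print(row["Match name"], row["Shared cM"], row["Largest segment"])
```

The 15cM default simply mirrors my own cut-off mentioned earlier; change min_cm to suit your research goals.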

In my opinion, it is best to upload all your matches into the GDAT database as a starting point, but you can just use the spreadsheet.  For GDAT, it is important for the match to exist in the database when later using other tools like Pedigree Thief to gather segment data.  It is particularly important when loading the ICW (in common with) file, as many of these will be more distant matches and if they are not in the database they will be skipped.  They could be the key match to solve your mystery!

If you have already uploaded an old My Heritage download, an alternative could be to sort your match list by 'Most Recent' and process the pages of matches until you get back to your last upload.
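
The 'Most Recent' idea can be sketched in Python as below.  It assumes you have a list of Relative IDs in newest-first order and a note of the last ID you imported; the function name and the IDs are invented for the example.

```python
def new_matches_since(most_recent_first, last_imported_id):
    """Return the IDs that appear before last_imported_id in a newest-first list."""
    new_ids = []
    for match_id in most_recent_first:
        if match_id == last_imported_id:
            break  # everything from here on was captured by the earlier import
        new_ids.append(match_id)
    return new_ids

# Made-up IDs, newest first.
print(new_matches_since(["D-4", "D-3", "D-2", "D-1"], "D-2"))  # ['D-4', 'D-3']
```

If the noted ID is never found, the whole list is returned, which is a hint that more pages need to be gathered.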

The original .csv download from the Genealogy Assistant did not contain the Relative ID or a link back to the DNA match, which was not ideal for productivity when working in GDAT after my initial imports.  Special thanks to the developer for having now implemented this suggested improvement.  I've now successfully created an import template for GDAT which imports the data and enables us to toggle back and forth with My Heritage.


The Pedigree Thief Method

My uncle's kit had 16332 matches, very close in size to my aunt's.  Pedigree Thief allows you to download a csv for all matches, with or without DNA (this means segments).  You can select the number of pages, but the maximum is 5.



  • Because I am using the Genealogy Assistant chrome extension I have 50 matches to a page, but the default for My Heritage is 10.  
  • To get all my uncle's matches (no DNA = by Total cMs) in this way would require 327 downloads using the default at My Heritage but only 66 with the Genealogy Assistant loaded.  A big difference from the 14 downloads needed for my aunt's matches.
  • It only took a minute or so to download the 5 pages.
  • The benefit of downloading from Pedigree Thief is that the GDAT template is already part of GDAT 2025r03.
  • Again working from 'Most Recent' could be an alternative.
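
The download counts for the two tools can be checked with a few lines of Python, rounding up for a part-filled final download.  The per-page and per-download limits are the ones quoted in this post; downloads_needed is just an illustrative name.

```python
import math

def downloads_needed(total_matches, per_page, pages_per_download):
    """Number of separate downloads needed to cover every match."""
    return math.ceil(total_matches / (per_page * pages_per_download))

print(downloads_needed(16332, 10, 5))   # Pedigree Thief, My Heritage default page size: 327
print(downloads_needed(16332, 50, 5))   # Pedigree Thief with Genealogy Assistant loaded: 66
print(downloads_needed(16876, 50, 25))  # Genealogy Assistant, aunt's kit: 14
```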

The csv contains the following information:
  • Test ID (My Heritage key)
  • Test name
  • Relative/Match ID 
  • Match name
  • Sex
  • Admin (contact name)
  • Shared segments
  • Shared cM
  • Largest segment
  • Link to match
  • Link to tree
The .csv file does contain some different information.  However, the only additional item of interest missing from the Genealogy Assistant download is the link direct to the tree.  


Summary

Given the slow process to download all matches using Pedigree Thief, I now prefer the Genealogy Assistant for downloading matches, particularly when doing the first download for a kit.

The only problem with both methods is that we still don't have segment data to be able to drill into segment locations of interest for more advanced DNA research.

However, once you have all the matches in GDAT you can use the Total cMs method to review big matches and toggle back and forth between My Heritage and GDAT.  You can often work out how to allocate a match to the 'ancestral couple' by reviewing shared matches and pedigrees without the segment data, although the 'one by one' approach is very labour intensive.

Pedigree Thief does have the advantage of ensuring the match is in the Pedigree Thief database, when you start downloading segment information from the shared match page.  However, Genealogy Assistant allows for all matches to be loaded much more quickly.  We can later use filters by 'Label' to add specific segment data using Pedigree Thief, minimising the strain on the My Heritage system.



2. IDENTIFYING INDIVIDUAL SEGMENTS

Unfortunately there is no current feature in the Genealogy Assistant to provide specific segment location data.

My Heritage does provide segment information if you open the DNA match and scroll down to the Chromosome Browser - Shared DNA segments.  You will be presented with an image showing the shared segments.  If you click on 'Advanced Options' at the top right, you can download the shared segments between the tester and the match.  This can be useful, but no bulk analysis is possible and doing matches one at a time would be very time consuming.

The only exception I have found to this is with a group of 'triangulated matches'.  When comparing the 3 individuals in the My Heritage Triangulation Tool, the same type of file is generated.  After selecting 'Advanced Options' at the top right, you can download the relevant segment data.  This version identifies the segment that all matches share plus the individual segments the two matches share with the tester.





These csv files contain the following information:
  • Tester name
  • Match name
  • Chromosome
  • Start Location
  • End Location
  • Start RSID
  • End RSID
  • cMs
  • SNPs
There is no current template to incorporate this file into GDAT.  Previously segment data was imported via the My Heritage match and segment data reports, which have now been suspended for some time.  Having no relative or match IDs in the csv is problematic and would require manual updates to import them 'one by one'.


Pedigree Thief - Download via Match Page
  • You can download via the match page view using Pedigree Thief by selecting Read Matches (and DNA).  With only one page selected it took just a few minutes.  As I had the Genealogy Assistant loaded this meant 50 matches.
  • This creates 2 files, a match file and a segment file.
  • The match file is identical to the earlier Pedigree Thief matches file.
  • The segment file contains much the same data as the My Heritage file, but Pedigree Thief provides the segment data for all matches on the page (i.e. 50), not just one.
  • The segment data download from Pedigree Thief can then be imported into GDAT using the existing template in GDAT 2025r03.
  • I then selected 5 pages, which is the maximum number of pages allowed, to see how it performed.  I soon reached the warning that I'd reached my maximum 'daily usage'.  It did however still generate an additional 183 segments that I could load into GDAT.
  • This method is useful but it will take many days to download all segment data, as you are constantly pushing the boundaries at My Heritage.  Selecting all matches in the list in no particular order does not help to identify priority matches until after the segment data is loaded and analysed, so depending on your research goals, most of these will not be ones you want to pursue.





Summary

Whilst it is a slow process to download segment data from the match page, Pedigree Thief is my preferred option at this time.

The only problem for importing into GDAT for this method is that it is difficult to isolate 'like' matches using match page filters to focus on a particular ancestral line.  Refer to my previous blogpost Digging into Segments for how I am using the Genealogy Assistant and 'Labels' for this purpose.

Don't push the system when you get these warning messages; you don't want to be suspended from using the site.  Stop what you are doing and work with the data you have generated so far.



2. IDENTIFYING ICW MATCHES 

Both the Genealogy Assistant and Pedigree Thief have options to gather ICW ('in common with', or shared) matches for specific matches of interest.  Downloading the shared match spreadsheet using either method is great for working on shared matches via the spreadsheet method.

From the GDAT point of view both Genealogy Assistant and Pedigree Thief assume the data for the primary match is already loaded and the primary match is not included in the csv.  The other thing to remember is that both lists include all shared matches, not just those with shared segments, so it can be full of false positive matches, not just the priority ones we need for our research.  Refer to my previous blogpost Digging into Segments for how I am using the Genealogy Assistant and 'Labels' for this purpose.


The csv for the Genealogy Assistant contains the following information:
  • Match (ICW - secondary matches shared by the tester and the primary match)
  • Estimated relationship (Tester and secondary match)
  • cMs (Tester and secondary match)
  • ICW (also matches = Primary match)
  • Estimated relationship (Primary and secondary match)
  • cMs (Primary and secondary match)
  • Relative ID (Secondary match)
  • Link to DNA match page
The csv for Pedigree Thief contains the following information:
  • Match ID (Primary)
  • Match Name (Primary)
  • Relative ID (Secondary)
  • Relative Name (Secondary)
  • cMs shared (Primary and secondary match)
Importing these into GDAT database requires the match and each of the shared matches to be already in the GDAT database.  If the match cannot be found during the import, then data will not be imported and shared segment matches will not appear in the F4 screen as ICW matches.
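
To see in advance which ICW rows are likely to be skipped, a simple check along these lines could be run against your own lists.  This is only a model of the rule described above, not GDAT's actual import code; the function name and the IDs are made up.

```python
def split_importable(icw_rows, ids_in_database):
    """Split (primary_id, secondary_id) pairs into importable and skipped lists."""
    importable, skipped = [], []
    for primary_id, secondary_id in icw_rows:
        # Both sides of the pair need to exist already, per the rule above.
        if primary_id in ids_in_database and secondary_id in ids_in_database:
            importable.append((primary_id, secondary_id))
        else:
            skipped.append((primary_id, secondary_id))
    return importable, skipped

# Made-up IDs: D-3 has not been loaded yet, so its row would be skipped.
known = {"D-1", "D-2"}
rows = [("D-1", "D-2"), ("D-1", "D-3")]
print(split_importable(rows, known))
```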



The Genealogy Assistant Method

  • Downloading the shared match spreadsheet from the Genealogy Assistant is great for working on shared matches via the spreadsheet method, however it is not suitable for immediate import into GDAT.  The most critical match that needs to be in the database is of course the primary match;
  • If the primary match is not already in the GDAT database one option could be to download the match and shared DNA via Pedigree Thief first.  You will need to search for them on the main match page, then use Pedigree Thief to download the match with 'Read Match (and DNA)'.  Then return to the shared match page by clicking on 'review DNA match' and download the match file;
  • However, a better option might be to add a row into the downloaded csv from Genealogy Assistant for the primary match.  Remember, if using a Mac, to export your .numbers file back to a .csv after editing and before importing to GDAT;
  • The .csv has the Relative ID for the secondary matches, but does not include a Relative ID for the primary match.  Again, amending the .csv to include a column for this appears to be the best solution.  Add the Relative ID of the key match as Column 5 and we can then import the csv file into GDAT.  I created a template called 'Genealogy Assistant: My Heritage Shared Match List - ICW template'.  Again, remember, if using a Mac, to export your .numbers file back to a .csv after editing and before importing to GDAT;
  • The ICW matches will only appear in GDAT if shared segment data at the chromosome level already exists in the database.  If it does not, it is a laborious process to download segment data for each match 'one by one'.
  • Given the Genealogy Assistant does not gather segment data at the chromosome level it is not ideal for import to GDAT as we really want to identify those matches who are ICW on a segment, not just a shared match.
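
Adding the primary match's Relative ID as Column 5 can also be done without a spreadsheet.  The Python sketch below is one way to do it; the header text 'Primary Relative ID' and the sample values are my own placeholders, so adjust them to whatever your GDAT template expects.

```python
import csv
import io

def add_primary_id(csv_text, primary_id, header="Primary Relative ID", position=5):
    """Insert a new column (1-based position) holding primary_id on every data row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for i, row in enumerate(rows):
        value = header if i == 0 else primary_id  # first row is the header row
        writer.writerow(row[:position - 1] + [value] + row[position - 1:])
    return out.getvalue()

# Placeholder data with five columns.
sample = "A,B,C,D,E\n1,2,3,4,5\n"
print(add_primary_id(sample, "D-XYZ"))
```

The result is plain .csv text, so there is no .numbers export step to remember.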


The Pedigree Thief Method

When we are on the Shared Match page and want to import these matches Pedigree Thief assumes you have already downloaded the primary match and their shared DNA into GDAT.
  • Navigate up to the ribbon toolbar (next to the search bar) and select the 'Pedigree Thief' icon (you must have already downloaded Pedigree Thief from the Chrome store for this to appear).
  • If you uploaded your matches in Section 1 via the Genealogy Assistant, most of your matches won't be in the Pedigree Thief database unless you have extracted other matches. When you try to engage Pedigree Thief you will be directed back to the main match page to gather the data for the 'key shared match'.
  • Return to the main match page and search for your key match, read the match with 'no DNA' and then import it into GDAT using the approved template.
  • Return to the shared match page for the Primary match - 'Read match (with DNA)'.  You can only read 5 pages of matches at a time, so select 5 and Read Matches (no DNA).  When the first 5 pages are finished, continue to select the next 5 pages until all are done.  Beware of system issues and do not continue if you get warning messages.
  • Click each of the following 3 'save boxes'; these should download the following files to your computer.  The first file will not be needed as you have already added the primary match to GDAT.
    • MyHeritage Matches for XXX;
    • MyHeritage Chromosome Data for XXX; and
    • MyHeritage ICWs for XXX
The ICW file is a full list of all shared matches.  These matches may or may not share on the same segment.  When loaded into GDAT they should be viewable on the F4 page, under 'Display' - 'Show all ICW matches'.

To gather data for specific groups of matches refer to my previous blogpost Digging into Segments for how I am using the Genealogy Assistant and 'Labels' for this purpose.

Ideally, if wanting to fully explore a shared match, it is best to search for them on the main match page and 'Read match (with DNA)' for just that match.  This way you will get all the segment data for all shared matches.


3. IDENTIFYING TRIANGULATED SEGMENTS 

Our priority matches for review will be those that triangulate, as these are most likely to share a Most Recent Common Ancestor.  When running the 'Read match (with DNA)' in the previous step, a fourth file will be generated if there are also triangulated matches.  This can be loaded to GDAT using the existing approved templates.

  • MyHeritage Triangulations for XXX.

When loaded into GDAT these should be viewable on the F4 page under 'Display' - 'Show Triangulations' or 'Show ICW Relatives on the same segment'.

Remember to view all your GDAT segment data in this step (not just My Heritage matches) for more clues from other testing sites.


4. MODIFICATIONS TO IMPROVE THE PROCESS

  • We want all matches in the GDAT database, so that any ICW or shared matches including triangulated segment data can be imported more easily.  The newly updated shared match .csv provided by the Genealogy Assistant to import all matches at the outset helps achieve this.
  • If wanting to just examine a match and their shared matches and you have not imported all matches into the GDAT database a hybrid method can be used by importing 'relatives' identified in the Genealogy Assistant shared match data.  I have created a modified GDAT template to do this.   It will only work on the shared matches with the primary match, not the shared matches of matches.
  • Because GDAT works on segment data the most productive way to select matches for gathering is to identify the subset of triangulated matches in a shared match list.  At present, this requires the manual process of selecting matches with the triangulated symbol and adding them to a 'Label'.  I have suggested an improvement to the Genealogy Assistant to reduce the main list to just those with the TG symbol.  This is currently under consideration - watch this space.
  • A link to the pedigree chart in the .csv generated by the Genealogy Assistant from the main match page would also be useful. This will be submitted as a suggested improvement.


5. MY CURRENT PREFERRED PROCESS

  1. Download all matches using Genealogy Assistant and upload to GDAT.  Once complete, sort the list into 'Most Recent' and make a note of the last match imported;
  2. Update match lists using 'Most Recent' filter over time.  Refer to your note from the last import and make sure you regularly gather all new matches back to that person to keep your GDAT database up to date; 
  3. Review larger matches in GDAT using the 'Total cMs' approach.  Toggle back and forth to My Heritage viewing shared matches and trees to identify likely shared ancestral couples, update likely side and family group in GDAT and Labels in My Heritage;
  4. Identify priority matches of interest by using the Genealogy Assistant and the Label method - refer to Digging into Segments;
  5. Review your priority match list at My Heritage utilising shared matches and pedigree information to further refine your label group;
  6. Download your priority matches from your priority 'Label Group' using Pedigree Thief - 'Read matches (no DNA)', accessed from the main match page using filters.  Load into GDAT - most should already be there.  This step is purely to get all your key matches into the Pedigree Thief database before the next step of gathering segment data.  If your Label Group is large this is particularly important as you won't be able to get all your data at once due to My Heritage system limits;
  7. Repeat the previous step for your priority matches from a 'Label Group' using Pedigree Thief, but this time select 'Read matches (with DNA)' - a maximum of 5 pages can be selected.  The smaller your label subset of matches, the better this will work.
  8. Download the 4 files generated by Pedigree Thief and upload them to GDAT.
  9. Analyse via chromosome segment data in GDAT.  Compare segments with other matches in the segment location at other DNA sites, including any clusters already identified at AncestryDNA with known matches for more clues.
  10. Double check that all triangulated segment data has been imported - Toggle back to My Heritage and run more Pedigree Thief segment downloads if not all triangulations were completed and upload to GDAT;
  11. Update 'ancestral couple' labels at My Heritage where appropriate.


Veronica Williams

First published 24 April 2025

All posts relating to Pedigree Thief

All posts relating to Genealogy Assistant

GDAT templates to be incorporated after further testing




Tuesday, April 8, 2025

Digging into segments - in the absence of reports from My Heritage

I have long been a fan of segment analysis to find your common ancestor.  The last couple of years have been particularly frustrating since DNA companies have turned off their downloads for segment data.  

Having downloaded most of my matches regularly to the Genealogical DNA Analysis Tool (GDAT) this issue was manageable in what I thought would be the 'short term', as I had plenty of matches to explore and could easily look at isolated segment data on some sites.  AncestryDNA's release of Enhanced Shared Matching late last year also kept my interest up for a time and is proving to be invaluable, but I find myself exploring and building family groups for matches that are not my priority.  Was this the best use of my time in finding my nemesis ancestor, my 2nd great grandfather George Courtney?

Using Visual Phasing techniques I have identified all the likely segment areas that have come from the 'Courtney line'.  Two sets of siblings are involved: my mother and three of her siblings (RM siblings), plus another four siblings, children of Mum's paternal first cousin (RR siblings).  How can I interrogate new matches in these segment areas without downloadable lists?

4 RM Siblings: Segments by Grandparent on Chromosome 4  

There are bound to be new matches in these locations, but how do I find them?  Continually visiting known matches in each segment location for the 10 kits relevant to the search was a long winded and laborious exercise, always hoping for a miracle.  The relevant matches are not likely to be 'big' unknown matches, the key ones are more likely to be buried in the smaller matches, much harder to find.

Whilst GEDmatch and FTDNA have segment reports, there are few new matches and few trees.  The My Heritage site was my best bet, providing segments, triangulation tools and many trees.  Using tools like Pedigree Thief it is possible to obtain match lists providing Total cMs.  This has helped find new larger matches, but it is difficult to identify specific ones that might be paternal matches of interest for me.  I had to work through them one by one, extract the segment data and then compare it to my Visual Phasing output.  On most occasions it turned out the match was not in the segment area of interest.  I tried to consider how this could be a more productive process.

I have never been a big user of the label (dots) system at My Heritage given there are so many false positives appearing in the shared match list, mainly due to the small match threshold of 6cMs and at times imputation.  At AncestryDNA many of these old population segments have been eliminated by their Timber algorithm.  At My Heritage, I always prefer to add matches to an 'ancestral couple' label ONLY when there is a triangulated segment.  This way, poor label allocation doesn't lead to new matches being incorrectly labelled in the future.  It was time to revisit my process.

I started to consider how Pedigree Thief, the new fabulous Genealogy Assistant (chrome extension) and Labels at My Heritage could help me to better identify and prioritise key matches for segment analysis.


Starting with Labels

My current labels were already organised by ancestral couples.  For my Courtney project, I have 10 primary kits (plus 4 descendants) at My Heritage that need analysing in a priority order, shared segments between kits, then each kit needs examining for segments that were only inherited by them.  The Courtney line is on Mum's paternal side, so I started with her matches and the 4 sibling kits of the children of her paternal first cousin (RRS) who share the Roberts-Courtney line.  I decided to create labels for each of the RR siblings and then another label for matches who triangulate with one or all of them, as these matches would be my priority.  As we are talking large numbers of shared matches, I recommend loading the Genealogy Assistant.  These cousin siblings each had between 551-772 shared matches with my Mum.  As both sets of Roberts siblings have mostly Irish maternal sides, there are a lot of shared matches that are old population segments and false positives that need to be 'weeded out' to be left with only valid Roberts-Courtney matches.

  1. Navigate to the shared match page for the tester (in my case Mum) and the key match (RRS, sibling 1);
  2. Go to the bottom of the shared match page and use the field shown for auto-load pages and put in an estimate of the number of pages needed (Genealogy Assistant must be loaded).
  3. Navigate back up to the top of the list and slowly scroll through to the end of the match list. You will know when you are at the end when the 'Show more DNA Matches' disappears.
  4. Once again, navigate up to the top of the list.  Click on the 'select all matches' button (Genealogy Assistant).  Before choosing a label watch the 'Manage Labels' count and let it get to the maximum number of matches selected.
  5. Select an existing label or 'Create a new label'.  Click 'Apply'.  If you have large numbers of matches, then you may get a spinning wheel for a while and the page may appear to have hung.  Wait a decent length of time for it to finish.  I have found that if you open the shared match page in another window and check whether the last person has been labelled, it has usually completed the process.
  6. While all the matches are selected we are next going to identify all the triangulated matches to add to our combined 'Matches TG - RRS' label.  If your results are like mine there are more triangulated matches than un-triangulated matches.  If so, remove the tick from everyone who does not have the triangulated group symbol.  Otherwise, do the reverse.  If you wait until later to do this part, when you 'select all matches' again, make sure to wait and watch the 'Manage Labels' count before selecting the new label.
  7. Add the reduced number of matches to the 'Matches TG - RRS' label.
  8. Repeat the process for other key matches (in this example the next 3 RR siblings), adding triangulated matches to the same 'Matches TG - RRS' label.
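
The labelling logic in these steps can be pictured with a small Python sketch.  It is only a model of the manual process: the names, the True/False flag standing in for the triangulated group symbol, and the function name are all invented for illustration.

```python
def build_tg_label(shared_lists):
    """shared_lists maps a key match to {match_name: has_tg_symbol}.

    Returns (all_shared, triangulated) as sets of match names.
    """
    all_shared, triangulated = set(), set()
    for matches in shared_lists.values():
        for name, has_tg in matches.items():
            all_shared.add(name)
            if has_tg:
                triangulated.add(name)  # triangulating with any one sibling is enough
    return all_shared, triangulated

# Made-up shared match lists for two of the siblings.
lists = {
    "RR sibling 1": {"Match A": True, "Match B": False},
    "RR sibling 2": {"Match B": True, "Match C": False},
}
all_shared, tg = build_tg_label(lists)
print(sorted(tg), sorted(all_shared - tg))  # ['Match A', 'Match B'] ['Match C']
```

Whatever is left after removing the triangulated set is the group that needs further analysis later.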


My mother had 16,566 matches at My Heritage when undertaking this exercise.  After this process, the total number of shared matches for the RR siblings was 1463 and of those, 1043 were triangulated.  

As the 1043 matches were all triangulating with Mum and at least one RR sibling we know each of them must have inherited the segment from the same common ancestor.  As a result, we can be confident that the segment shared between the two sets of siblings (RM and RR) was inherited from their shared MRCA, being the 'Roberts-Courtney' couple - Edward Roberts (Baker/Dye) and Abigail Courtney.  How each of the 1043 matches is related is yet to be determined, but it will be back on one of either the Roberts OR Courtney lines.  These 1043 can now also be labelled to the 'Roberts-Courtney' ancestral group (NOTE:  The Roberts-Courtney group is an existing label for the 'ancestral couple'.  It will contain other matches already identified through DNA analysis and may be larger than the newly created 'Matches TG - RRS' label).

  1. Navigate back to the main match page and select 'Filters'.  Go to 'Labels' and select the consolidated TG group - 'Matches TG - RRS'. 
  2. With the Genealogy Assistant 50 matches will appear on each page in this view.  With 1043 matches we have 21 pages to process.  Starting on page 1 navigate to the top of the match list to 'select all matches'. Then choose the appropriate label (in this case  'Roberts-Courtney'), select 'Apply'.
  3. Repeat the process with the remaining pages.  Firstly, navigate to the bottom of the list to select the next page number, then navigate to the top to 'select all matches', 'select label', 'apply'.  
If you would like a spreadsheet generated as a list of these matches, enter the total number of pages and click the 'Download CSV' button.  You must have the Genealogy Assistant loaded for this option to appear.  You will need to wait for each page to populate and process all the pages before the CSV will be ready to download to your computer.

The spreadsheet contains the match name, estimated relationship, shared Total cMs, number of shared segments, largest cMs segment, tree size and the location of the match.  Matches are listed in Total cMs order, but can also be sorted by largest cMs segment to identify priority matches for review.  When reviewing the filtered Label of 'Matches TG - RRS', you can select the 'Sort By - largest segment' to achieve the same result.  

Whilst at this point we have identified 1043 priority matches, we have no way of telling which of them have shared segments in the Courtney grandparent location (Figure: Chromosome 4).

My aim is to isolate all the George Courtney segments.  The next challenge will be to analyse all matches in the 'Matches TG - RRS' group and split them into one of the 4 paternal great grandparent lines: Roberts PGGF (Dye), Laundon PGGM (wife of Edward Roberts), Paice PGGM (wife of George) and finally our mystery PGGF - Abigail's father 'George Courtney'!

The remaining 420 shared matches who do not triangulate with our RR siblings will need additional analysis further down the track.  They may share additional segments not inherited by the RM siblings or they may be false positives or old population segments. At this point we are only prioritising the 1043 triangulated matches as we know they are valid segments and should be reviewed first.


Getting segment data via Pedigree Thief

In the absence of downloads from My Heritage, Pedigree Thief is the only tool I know that can help extract specific segment data at the moment.  It can prove to be difficult at times, as the site imposes limits and sometimes suggests we are doing too much, what it calls 'scraping'.  This can sometimes lead to being temporarily barred from using the site, so we need to minimise each query as much as possible.

We can obtain segment data from My Heritage using Pedigree Thief in two ways:

  1. From the main match page, where options are 'with DNA' or 'without DNA'.  I would recommend extracting data here first using 'no DNA' to maximise the number of matches you are adding to the Pedigree Thief database.
  2. From the shared match list, you can extract shared matches, ICW and triangulations.  It will not work if the selected match is not already in the database and you will be directed back to the main match page.
My suggested process is aimed at gathering as much segment data as possible for the priority matches identified as triangulated with the RR Siblings, accessed via the filtered Label of 'Matches TG - RRS'.
  1. Firstly, navigate back to the main match page and select 'Filters'.  Go to 'Labels' and select the consolidated TG group - 'Matches TG - RRS'. 
  2. Navigate up to the ribbon toolbar (next to the search bar) and select the 'Pedigree Thief' icon (you must have already downloaded Pedigree Thief from the Chrome store for this to appear).
  3. You can only read 5 pages of matches at a time, so select 5 and click 'Read Matches (no DNA)'.  When the first 5 pages are finished, continue to select the next 5 pages until all are done.
  4. Click the 4 'save boxes'.  These should download the following files to your computer:
    • MyHeritage Matches for XXX;
    • MyHeritage Shared DNA for XXX;
    • MyHeritage ICWs for XXX; and 
    • MyHeritage Triangulations for XXX.
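
To get a feel for how many 5-page reads a label will need, here is a rough Python sketch.  It assumes the 50 matches per page noted earlier also applies to the filtered label view, and the function name page_batches is just my own illustration, not anything built into Pedigree Thief.

```python
import math

def page_batches(total_matches, per_page=50, pages_per_read=5):
    """Return (first_page, last_page) ranges, one per 'Read Matches' pass.

    per_page=50 and pages_per_read=5 are assumptions taken from what
    I observed on the site; adjust them if your view differs.
    """
    total_pages = math.ceil(total_matches / per_page)
    return [
        (start, min(start + pages_per_read - 1, total_pages))
        for start in range(1, total_pages + 1, pages_per_read)
    ]

# 1043 matches in the 'Matches TG - RRS' label
batches = page_batches(1043)
for first, last in batches:
    print(f"Read pages {first} to {last}")
```

For the 1043 matches in my label this works out to 21 pages, so five passes in total, with the last pass covering only page 21.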



These files can be imported into GDAT using the approved templates.  Don't be fooled by the names: 'Read Matches (no DNA)' means no 'segment' DNA, but it still includes Total cMs for all matches.

Once this is loaded we can go back to the shared match page:

  1. Navigate to the shared matches of your tester and the key match (for my example, Mum and an RR sibling);
  2. Use the Auto-load page feature again (Genealogy Assistant) and keep scrolling down the list until all the shared matches have loaded and you have one page of matches;
  3. Scroll up to the toolbar and click on the Pedigree Thief icon.  As I have 695 in this list I am going to start by selecting 'Read matches (no DNA)' to extract more data into the Pedigree Thief database.  The tool collected 695 ICW and Triangulations and says 530 are queued for triangulation.  The output is a downloadable ICW file, which can be loaded into GDAT.
  4. Next I want to Read Matches (and DNA) but this is where we need to be conscious of potential system overuse.  The system will only review a certain number of these matches, mine stopped at about 100.  The tool now says it has collected 695 ICW, 38 Triangulations and that 432 are queued for triangulation.  The output now includes a downloadable chromosome data file, which can be loaded into GDAT.  The chromosome data is the DNA shared between your tester and the key match (for my example, Mum and RR sibling 1).  Using this process on more distant known cousins would provide a much more manageable number, but these matches are critical for my research project.
  5. The ICW file is the same as the one earlier, but the triangulation file is also able to be downloaded and added to GDAT.  It contains chromosome data from the 38 matches it has collected already.  We can now target any of these matches that have segment data in the target segment areas (ie. Courtney segments).
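
If you like to check the chromosome data outside GDAT first, the idea of 'who has a segment in my target area' can be sketched in a few lines of Python.  This is only an illustration: the column names (relative_id, chromosome, start, end, cm), the Relative IDs and the positions are all made up for the example, so check the headings in your own Pedigree Thief download before trying anything like it.

```python
import csv
import io

def overlaps(seg_start, seg_end, region_start, region_end):
    """True when a segment shares at least one position with the region."""
    return seg_start <= region_end and seg_end >= region_start

def matches_in_region(rows, chromosome, region_start, region_end):
    """Return the sorted Relative IDs with a segment inside the target area."""
    hits = set()
    for row in rows:
        if row["chromosome"] != str(chromosome):
            continue
        if overlaps(int(row["start"]), int(row["end"]), region_start, region_end):
            hits.add(row["relative_id"])
    return sorted(hits)

# Hypothetical sample data standing in for the chromosome data file
sample = """relative_id,chromosome,start,end,cm
D-AAA,4,10000000,30000000,22.5
D-BBB,4,60000000,75000000,14.1
D-CCC,7,12000000,28000000,18.0
D-AAA,9,1000000,9000000,9.3
"""
rows = list(csv.DictReader(io.StringIO(sample)))

# A made-up target area on chromosome 4
print(matches_in_region(rows, 4, 20000000, 40000000))
```

Only the first sample match overlaps the made-up target area, so it is the only Relative ID returned.  The IDs that come back are the ones worth looking up in GDAT first.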
Next we have two options, continue to complete the remaining 432 matches queued for triangulation or to work with the chromosome data we have collected so far (eg the 38 matches).  My preference is to load all the data so I can identify all the matches in the likely segment locations, but MH system constraints may require you to do just one page at a time. 
  1. Return to Pedigree Thief to continue gathering the triangulation data - go to the triangulations queued and click 'read'.  You may get a message that you have reached your limit for the day.  If this is the case stop gathering and review the matches you have so far.  Come back later, you don't want to have your subscription suspended.  If you are using GDAT you should be able to review the triangulated matches and see who else is ICW on the segment given that file has already been loaded, even though you don't have all their chromosome data yet.  Make sure to hit the down arrow and you have the option to click 'ALL' - matches, segments, in common with and triangulation files.
  2. Pedigree Thief also allows you to gather data for just one match.  The chromosome data spreadsheet for the 38 matches included 7 who were in the 'Courtney' segment location on chromosome 4.  Unfortunately the chromosome data spreadsheet downloaded from Pedigree Thief does not contain match names, just a My Heritage Relative ID.  You can search for the match in GDAT using the 'Search Match Key' option and entering the My Heritage Relative ID.  
  3. Once you have found the match in GDAT use the hotkey to navigate back to the match at My Heritage.  Select the Pedigree Thief icon and extract the data using 'Read Match (and DNA)'.  This will produce 4 files to be downloaded - match, chromosome data, ICW and triangulation.  We already have the match and shared DNA in the GDAT database so we don't need the first two, but do upload the ICW and triangulation files to GDAT.
  4. As you investigate your matches in the priority segment areas in GDAT, make sure to also add a note on the match's page in your consolidated TG label at My Heritage (eg. 'Matches TG - RRS') so that you can monitor your progress on all the triangulated matches.  
  5. Continue in this way until all those in the TG label have been reviewed and those with segment data in the priority groups have been uploaded to GDAT.  
  6. Where Visual Phasing or other segment analysis processes have identified the likely ancestral couple for each segment location, review each match in the consolidated TG label at My Heritage (eg. 'Matches TG - RRS') and make a note to that effect, then add another label for the known ancestral couple and remove the match from the Roberts-Courtney label.  For my tree these would be Dye-Laundon (Roberts side) or Paice-Parker (wife of George Courtney).  Even if we don't know the name of our MRCA with a match, the aim is to move all 1043 matches over time from the grandparent label (Roberts-Courtney) into the correct ancestral couple label (one of the 4 great grandparent lines).
  7. As I review each TG (ideally removing it from the Roberts-Courtney group), I update the label for the match and mark all the TG matches to their new ancestral group at the same time.  This reduces rework when reviewing new triangulated matches down the track.  
  8. Matches who can't be allocated to a new 'ancestral couple' label should remain in the 'Roberts-Courtney' group.  You may wish to create an additional label for those that have been reviewed, but were unable to be allocated to a new label.
  9. If you are reviewing multiple kits, it is a good idea to get all kits to the 'Matches TG - RRS' stage, then use the Genealogy Assistant to download a csv of each kit's matches.  Combining them into an alphabetical .csv will give you an idea of the total number of matches needing review, plus you can keep track of those already reviewed as you progress further through all kits. Whilst I make notes in My Heritage and GDAT as I go, a paper list can sometimes be easier to manage in terms of checking back when working on subsequent kits.
  10. Continue to review the Roberts-Courtney label for all matches who also have the label 'Matches TG - RRS' (sadly labels at MH do not cater for a sort of matches with both these labels) until all those matches have been moved to an ancestral group label or, in the best case scenario, the number of matches in the group is exhausted. When reviewing the matches with this label, sort them by largest segment so that priority segments are reviewed first, ie. a person with a single 35cMs segment is likely to be a better match to review than someone sharing 35cMs over 4 segments.
  11. Repeat the process for the other key testers (ie. the remaining 3 RM siblings) and other associated kits.
  12. Always remember when reviewing shared matches to check if your key match cousins or siblings have larger segments than your main tester (in my case Mum).  If they have larger segments make them the key kit for the analysis of that Triangulated Group as their shared matches will be in a larger segment location, ie. there will be a broader pool of triangulated matches to assist in identifying the MRCA for that group.
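
For anyone comfortable with a little scripting, combining the csv files from several kits into one alphabetical list, and then ordering that list by largest segment for review, could look something like the Python sketch below.  It is a sketch only: the column headings (name, largest_segment_cm), the kit names and the sample matches are all invented for the example, and the real Genealogy Assistant csv headings will need to be substituted.

```python
import csv
import io

def combine_kits(kit_files):
    """Merge per-kit match lists into one alphabetical list.

    kit_files maps a kit name to an open csv file (or any iterable of
    lines).  Matches are paired up by name, which is an assumption and
    could wrongly merge two different people with the same name.
    """
    combined = {}
    for kit, handle in kit_files.items():
        for row in csv.DictReader(handle):
            name = row["name"].strip()
            entry = combined.setdefault(
                name, {"name": name, "kits": [], "largest_segment_cm": 0.0}
            )
            entry["kits"].append(kit)
            entry["largest_segment_cm"] = max(
                entry["largest_segment_cm"], float(row["largest_segment_cm"])
            )
    return sorted(combined.values(), key=lambda e: e["name"].lower())

def review_order(entries):
    """Largest segment first, so priority segments are reviewed first."""
    return sorted(entries, key=lambda e: e["largest_segment_cm"], reverse=True)

# Hypothetical sample downloads for two kits
mum = "name,largest_segment_cm\nBrown Alice,35.0\nSmith John,12.4\n"
aunt = "name,largest_segment_cm\nSmith John,18.9\nAdams Carol,9.1\n"

combined = combine_kits({"Mum": io.StringIO(mum), "Aunt": io.StringIO(aunt)})
for entry in review_order(combined):
    print(entry["name"], entry["kits"], entry["largest_segment_cm"])
```

The 'kits' column shows at a glance which kits each match appears in, and keeping the largest segment seen across kits ties in with the point above about using whichever kit shares the larger segment as the key kit.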


The missing link

This process does not help get to those 'likely Courtney segment areas' that the RR siblings do not share with the RM siblings or other associated kits.  The time-consuming method of looking at known matches and their ICWs in existing segment locations seems the only way to get there at this time.  Some may be buried in the 420, potentially matching on different segments.  If anyone can suggest other methods, please let me know.

Hopefully by the time I've finished reviewing all the matches in my TG Group Label we will have downloads back!


Veronica Williams

First published 7 April 2025
Last updated 25 April 2025



ADDENDUM: 16 April 2025  Following discussion with the developer of Genealogy Assistant, a new button has been introduced to 'Load all' pages, which helps enormously. The blog will be updated soon to streamline the affected steps.  We are also discussing ways to get another sort option to reduce the list to just TG matches. More soon, fingers crossed.