THIS POST IS CURRENTLY 'DRAFT AND UNDER REVIEW'
FEEDBACK IS WELCOMED
The new Genealogy Assistant Chrome extension is proving to be a game changer, increasing our productivity in so many ways.
In the absence of downloads from My Heritage, I am constantly looking for ways to get the elusive segment data of my priority matches out and into the Genealogical DNA Analysis Tool (GDAT), which is where I like to do my DNA analysis, all in one place. GDAT makes it much easier to link matches with segment data from sites such as 23andMe, FTDNA, Living DNA, My Heritage and GEDmatch to their AncestryDNA results, and to maximise your knowledge of your matches using all the available information. Not to mention, GDAT is a great repository for storing your historical research analysis. It enables you to take a break, come back and not forget where you were up to!
This post is the documentation of my experimentation in using the Genealogy Assistant and Pedigree Thief to get my data out of My Heritage and into GDAT. The post assumes you are starting from scratch.
1. GATHERING MATCHES
To start, I'm going to use my aunt's kit for Genealogy Assistant and my uncle's for Pedigree Thief to make the best use of my time and system resources. Later I will use both applications on both kits.
The Genealogy Assistant Method
My aunt's kit has 16876 matches. The Genealogy Assistant allows you to download a csv of all matches, for the number of pages you select.
- There are 50 matches to a page, so I selected 338 pages. Unfortunately this was too much and I found the My Heritage application hung.
- I noticed the counter had changed to a maximum of 1200 so perhaps there is a limit of 24 pages.
- I next selected 24 pages and clicked download csv. It took 17 minutes to gather the 24 pages and download the file, with no problems. Later downloads were much quicker, some taking only 7 minutes, so perhaps the difference in the time taken was associated with network connectivity?
- As the file is sorted by total cMs, the first 24 pages gave me all the highest matches, down to 23.1cMs. I want to go down to at least 15cMs so I will run more pages, but it is a personal choice.
- Ideally I want all the matches, but that means running this about 15 times, continuing to move on through the pages of matches. Depending on how many matches you have, this might take a few days to work through.
- If I've collected all the matches however, future downloads can be sorted by 'most recent' to capture just new matches as they come in and keep the database up to date.
- Through trial and error it appears the maximum number of pages is 25 (at the 50 matches per page the Genealogy Assistant provides). This is equivalent to 1250 matches.
- Initially, downloading match lists with the Genealogy Assistant gave me no warning messages from My Heritage which was very comforting. However by the time I got to page 167 (8,350 matches) I did get a warning message telling me I had viewed too many and to come back the next day. Next day, I was only able to download 7,900 matches, just a little short of my target. It took 3 days to achieve the full list, but I was able to do other work on the site during that time.
The csv downloaded from the Genealogy Assistant contains the following fields:
- Match name
- Estimated relationship
- Shared cM
- Shared segments
- Largest segment
- Tree size
- Location (Country of match)
- Relative/Match ID (GUID)
- Link to match
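The page arithmetic in the bullets above can be sketched in a few lines of Python. This is only a rough planning aid: the function name `download_plan` is my own, and the limits (50 matches per page, 25 pages per download) are simply the figures observed above, not documented limits of either tool.

```python
import math


def download_plan(total_matches: int, per_page: int, pages_per_download: int) -> tuple[int, int]:
    """Return (pages, downloads) needed to cover every match in a list."""
    pages = math.ceil(total_matches / per_page)
    downloads = math.ceil(pages / pages_per_download)
    return pages, downloads


# Figures from this post: 16876 matches, 50 per page, 25 pages per download.
pages, downloads = download_plan(16876, per_page=50, pages_per_download=25)
print(pages, downloads)  # 338 14
```

Changing `pages_per_download` to 24 gives the 15 runs mentioned earlier, before the 25 page maximum was found.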
By downloading all the matches, we can examine the highest matches, or the largest segments, to ensure that we focus on finding those common ancestors first. However, as many of us are looking for more distant ancestors, the segment data becomes more important. Unfortunately we don't have it at this point.
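For anyone working from the spreadsheet rather than GDAT, a minimal sketch of that first pass is shown here: keep matches at or above a chosen total (15cMs in my case) and list them by largest segment. The column headings and the four sample rows are invented for illustration, so adjust the names to whatever your own csv actually uses.

```python
import csv
import io

# Invented sample rows; the real file has more columns and real names.
SAMPLE_CSV = """Match name,Shared cM,Largest segment
Match A,23.1,9.8
Match B,14.2,14.2
Match C,41.0,18.5
Match D,16.4,16.4
"""


def priority_matches(csv_text, min_cm=15.0):
    """Keep matches sharing at least min_cm, largest segment first."""
    rows = csv.DictReader(io.StringIO(csv_text))
    kept = [row for row in rows if float(row["Shared cM"]) >= min_cm]
    return sorted(kept, key=lambda row: float(row["Largest segment"]), reverse=True)


for row in priority_matches(SAMPLE_CSV):
    print(row["Match name"], row["Shared cM"], row["Largest segment"])
```

With the sample rows this lists Match C, Match D and then Match A. Match B drops out because it falls below the 15cM threshold.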
In my opinion, it is best to upload all your matches into the GDAT database as a starting point, but you can just use the spreadsheet. For GDAT, it is important for the match to exist in the database when later using other tools like Pedigree Thief to gather segment data. It is particularly important when loading the ICW file as many of these will be more distant matches and if not in the database will be skipped. They could be the key match to solve your mystery!
If you have already uploaded an old My Heritage download, an alternative could be to sort your match list by 'Most Recent' and process the pages of matches until you get back to your last upload.
The original .csv download from the Genealogy Assistant did not contain the Relative ID or Link back to the DNA match, which was not ideal for productivity when working in GDAT after my initial imports. Special thanks to the developer for having now implemented this suggested improvement. I've now successfully created an import template for GDAT which imports the data and enables us to toggle back and forth with My Heritage.
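Now that the Relative ID is in the file, one possible way to pick out just the new matches is to compare a fresh 'Most Recent' download against the previous one by that ID. The sketch below assumes a column headed 'Relative ID' and uses made-up IDs; it is not something either extension or GDAT does for you.

```python
import csv
import io

# Made-up IDs for illustration only.
PREVIOUS_CSV = """Match name,Relative ID
Match A,D-AAA
Match B,D-BBB
"""

LATEST_CSV = """Match name,Relative ID
Match C,D-CCC
Match A,D-AAA
Match B,D-BBB
"""


def new_matches(previous_csv, latest_csv, id_column="Relative ID"):
    """Return rows of latest_csv whose ID is not in previous_csv."""
    known = {row[id_column] for row in csv.DictReader(io.StringIO(previous_csv))}
    latest = csv.DictReader(io.StringIO(latest_csv))
    return [row for row in latest if row[id_column] not in known]


for row in new_matches(PREVIOUS_CSV, LATEST_CSV):
    print(row["Match name"], row["Relative ID"])
```

Comparing on the ID rather than the name avoids trouble when two matches share a name or a match changes their display name.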
The Pedigree Thief Method
For my uncle's kit he had 16332 matches, very close in size to my aunt's. Pedigree Thief allows you to download a csv for all matches, with or without DNA (this means segments). You can select the number of pages, but the maximum is 5.
- Because I am using the Genealogy Assistant Chrome extension I have 50 matches to a page, but the default for My Heritage is 10.
- To get all my uncle's matches (no DNA = by Total cMs) in this way would require 327 downloads using the default at My Heritage but only 66 with the Genealogy Assistant loaded. A big difference to the 14 downloads for my aunt's matches.
- It only took a minute or so to download the 5 pages.
- The benefit of downloading from Pedigree Thief is that the GDAT template is already part of GDAT 2025r03.
- Again working from 'Most Recent' could be an alternative.
The match file downloaded from Pedigree Thief contains the following fields:
- Test ID (My Heritage key)
- Test name
- Relative/Match ID
- Match name
- Sex
- Admin (contact name)
- Shared segments
- Shared cM
- Largest segment
- Link to match
- Link to tree
- Tester name
- Match name
- Chromosome
- Start Location
- End Location
- Start RSID
- End RSID
- cMs
- SNPs
- You can download via the match page view using Pedigree Thief by selecting Read Matches (and DNA). With only one page selected, it took just a few minutes. As I had the Genealogy Assistant loaded, this meant 50 matches.
- This creates 2 files, a match file and a segment file.
- The match file is identical to the earlier Pedigree Thief matches file.
- The segment file contains much the same data as the My Heritage file, but Pedigree Thief provides the segment data for all matches on the page (i.e. 50), not just one.
- The segment data download from Pedigree Thief can then be imported into GDAT using the existing template in GDAT 2025r03.
- I then selected 5 pages, which is the maximum number of pages allowed, to see how it performed. I soon reached the warning that I'd reached my maximum 'daily usage'. It did however still generate an additional 183 segments that I could load into GDAT.
- This method is useful but it will take many days to download all segment data, as you are constantly pushing the boundaries at My Heritage. Selecting all matches in the list in no particular order does not help to identify priority matches until after the segment data is loaded and analysed, so depending on your research goals, most of these will not be ones you want to pursue.
- Match (ICW - secondary matches shared by the tester and the primary match)
- Estimated relationship (Tester and secondary match)
- cMs (Tester and secondary match)
- ICW (also matches = Primary match)
- Estimated relationship (Primary and secondary match)
- cMs (Primary and secondary match)
- Relative ID (Secondary match)
- Link to DNA match page
- Match ID (Primary)
- Match Name (Primary)
- Relative ID (Secondary)
- Relative Name (Secondary)
- cMs shared (Primary and secondary match)
- Downloading the shared match spreadsheet from the Genealogy Assistant is great for working on shared matches via the spreadsheet method, however it is not suitable for immediate import into GDAT. The most critical match that needs to be in the database is of course the primary match;
- If the primary match is not already in the GDAT database, one option could be to download the match and shared DNA via Pedigree Thief first. You will need to search for them on the main match page, then use Pedigree Thief to download the match with 'Read Match (and DNA)'. Then return to the shared match page by clicking on 'review DNA match' and download the match file;
- However, a better option might be to add a row into the downloaded csv from Genealogy Assistant for the primary match. Remember, if using a Mac, to export your .numbers file back to a .csv after editing and before importing to GDAT;
- The .csv has the Relative ID for the secondary matches, but does not include a Relative ID for the primary match. Again, amending the .csv to include a column for this appears to be the best solution. Add the Relative ID of the key match as Column 5 and we can then import the csv file into GDAT. I created a template called 'Genealogy Assistant: My Heritage Shared Match List - ICW template'. Again, remember, if using a Mac, to export your .numbers file back to a .csv after editing and before importing to GDAT;
- The ICW matches will only appear in GDAT if shared segment data at the chromosome level already exists in the database. If it does not, it is a laborious process to download the data 'one by one' for each match without segment data.
- Given the Genealogy Assistant does not gather segment data at the chromosome level it is not ideal for import to GDAT as we really want to identify those matches who are ICW on a segment, not just a shared match.
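The manual edit described above, adding the primary match's Relative ID as Column 5 of the shared match csv, could also be scripted. The following is a minimal sketch only: the heading 'Primary Relative ID', the sample column layout and the IDs are all invented, so check them against your own file and GDAT template before relying on it.

```python
import csv
import io

# Invented layout and IDs; the real shared match csv will differ.
SAMPLE_CSV = """Match,Estimated relationship,cMs,ICW,Relative ID
Match A,3rd cousin,41.0,Primary P,D-AAA
Match B,4th cousin,22.5,Primary P,D-BBB
"""


def add_primary_id(csv_text, primary_id, header="Primary Relative ID", position=5):
    """Insert a column (1-based position) holding the primary match's ID on every row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line_no, row in enumerate(csv.reader(io.StringIO(csv_text))):
        if not row:
            continue  # skip blank lines
        # The first line is assumed to be the heading row.
        row.insert(position - 1, header if line_no == 0 else primary_id)
        writer.writerow(row)
    return out.getvalue()


print(add_primary_id(SAMPLE_CSV, "D-PPP"))
```

Because this reads and writes plain csv text, it also sidesteps the .numbers export step on a Mac.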
The Pedigree Thief Method
- Navigate up to the ribbon toolbar (next to the search bar) and select the 'Pedigree Thief' icon (you must have already downloaded Pedigree Thief from the Chrome store for this to appear).
- If you uploaded your matches in Section 1 via the Genealogy Assistant, most of your matches won't be in the Pedigree Thief database unless you have extracted other matches. When you try to engage Pedigree Thief you will be directed back to the main match page to gather the data for the 'key shared match'.
- Return to the main match page and search for your key match, read the match with 'no DNA' and then import it into GDAT using the approved template.
- Return to the shared match page for the Primary match - 'Read match (with DNA)'. You can only read 5 pages of matches at a time, so select 5 and Read Matches (no DNA). When the first 5 pages are finished, continue to select the next 5 pages until all are done. Beware of system issues and do not continue if you get warning messages.
- Click each of the following 3 'save boxes'. These should download the following files to your computer. The first file will not be needed as you have already added the primary match to GDAT.
- MyHeritage Matches for XXX;
- MyHeritage Chromosome Data for XXX; and
- MyHeritage ICWs for XXX
3. IDENTIFYING TRIANGULATED SEGMENTS
Our priority matches for review will be those that triangulate, as these are most likely to share a Most Recent Common Ancestor. When running the 'Read match (with DNA)' in the previous step, a fourth file will be generated if there are also triangulated matches. This can be loaded to GDAT using the existing approved templates.
- MyHeritage Triangulations for XXX.
When loaded into GDAT these should be viewable on the F4 page under 'Display' - 'Show Triangulations' or 'Show ICW Relatives on the same segment'.
Remember to view all your GDAT segment data in this step (not just My Heritage matches) for more clues from other testing sites.
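To illustrate what 'on the same segment' means when comparing across sites, here is a small sketch that finds pairs of matches whose segments with the tester overlap on the same chromosome. It is my own simplified illustration with invented positions, not how GDAT or My Heritage works internally. Note that an overlap on its own only suggests a candidate: true triangulation also requires the two matches to match each other on that region, which is what the triangulation file reports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    match: str       # name of the match sharing this segment with the tester
    chromosome: str
    start: int       # start location (base pairs)
    end: int         # end location (base pairs)


def overlap_bp(a: Segment, b: Segment) -> int:
    """Length of the region shared by two segments, 0 if none."""
    if a.chromosome != b.chromosome:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def overlapping_pairs(segments, min_overlap_bp=1):
    """Return (match, match, chromosome, overlap) for each overlapping pair."""
    pairs = []
    for i, a in enumerate(segments):
        for b in segments[i + 1:]:
            shared = overlap_bp(a, b)
            if a.match != b.match and shared >= min_overlap_bp:
                pairs.append((a.match, b.match, a.chromosome, shared))
    return pairs


# Invented positions for illustration only.
segments = [
    Segment("Match A", "7", 10_000_000, 25_000_000),
    Segment("Match B", "7", 18_000_000, 30_000_000),
    Segment("Match C", "9", 18_000_000, 30_000_000),
]
print(overlapping_pairs(segments))
```

With these sample values only Match A and Match B overlap, by 7,000,000 base pairs on chromosome 7. Match C sits on a different chromosome and is ignored.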
4. MODIFICATIONS TO IMPROVE THE PROCESS
- We want all matches in the GDAT database, so that any ICW or shared matches including triangulated segment data can be imported more easily. The newly updated shared match .csv provided by the Genealogy Assistant to import all matches at the outset helps achieve this.
- If you just want to examine a match and their shared matches, and you have not imported all matches into the GDAT database, a hybrid method can be used by importing 'relatives' identified in the Genealogy Assistant shared match data. I have created a modified GDAT template to do this. It will only work on the shared matches with the primary match, not the shared matches of matches.
- Because GDAT works on segment data the most productive way to select matches for gathering is to identify the subset of triangulated matches in a shared match list. At present, this requires the manual process of selecting matches with the triangulated symbol and adding them to a 'Label'. I have suggested an improvement to the Genealogy Assistant to reduce the main list to just those with the TG symbol. This is currently under consideration - watch this space.
- A link to the pedigree chart in the .csv generated by the Genealogy Assistant from the main match page would also be useful. This will be submitted as a suggested improvement.
5. MY CURRENT PREFERRED PROCESS
- Download all matches using Genealogy Assistant and upload to GDAT. Once complete, sort the list into 'Most Recent' and make a note of the last match imported;
- Update match lists using 'Most Recent' filter over time. Refer to your note from the last import and make sure you regularly gather all new matches back to that person to keep your GDAT database up to date;
- Review larger matches in GDAT using the 'Total cMs' approach. Toggle back and forth to My Heritage viewing shared matches and trees to identify likely shared ancestral couples, update likely side and family group in GDAT and Labels in My Heritage;
- Identify priority matches of interest by using the Genealogy Assistant and the Label method - refer to Digging into Segments;
- Review your priority match list at My Heritage utilising shared matches and pedigree information to further refine your label group;
- Download your priority matches from your priority 'Label Group' using Pedigree Thief - 'Read matches (no DNA)', accessed from the main match page using filters. Load into GDAT - most should already be there. This step is purely to get all your key matches into the Pedigree Thief database before the next step of gathering segment data. If your Label Group is large this is particularly important as you won't be able to get all your data at once due to My Heritage system limits;
- Repeat the previous step for your priority matches from a 'Label Group' using Pedigree Thief, but this time select 'Read matches (with DNA)' - a maximum of 5 pages can be selected. The smaller your label subset of matches, the better this will work.
- Download the 4 files generated by Pedigree Thief and upload them to GDAT.
- Analyse via chromosome segment data in GDAT. Compare segments with other matches in the segment location at other DNA sites, including any clusters already identified at AncestryDNA with known matches for more clues.
- Double check that all triangulated segment data has been imported - Toggle back to My Heritage and run more Pedigree Thief segment downloads if not all triangulations were completed and upload to GDAT;
- Update 'ancestral couple' labels at My Heritage where appropriate.
Veronica Williams
First published 24 April 2025