Sunday, October 24, 2021

Tools: Clustering for Chromosome Analysis

Clustering tools are a great way to quickly identify groups of matches with a relationship to each other.  

As we know, shared matches are 'clues', whilst shared segments are 'evidence' of a shared ancestor.  All cluster tools identify matches that are likely to be clues of common relationships.  The best types of clusters, in my opinion, are shared segment clusters; however, every cluster analysis you do is likely to give you new ideas about how your matches might relate to you.

This post is aimed at identifying where clustering can be undertaken and whether each tool provides shared match or shared segment clusters.  Click on the heading hyperlinks for more information, and refer to the blog posts at the end of this page for more detail on using each of the different tools.

  

DNAGedcom Client 

Using the cluster tool at DNAGedcom requires a subscription, which can be taken out for a minimum of one month.  The DNAGedcom client provides clusters based on the Collins Leeds Method for AncestryDNA and FamilyTreeDNA.

As AncestryDNA does not provide segment data, the clusters generated are 'shared match' clusters.  Generating clusters with matches >30cMs will generally give an indication of shared ancestry; however, clusters generated with matches below that amount may give false leads, as those matches potentially share different, more distant ancestors.
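To make the idea of a 'shared match' cluster a little more concrete, here is a minimal Python sketch of Leeds-style grouping.  It is an illustration only, not how DNAGedcom actually builds its clusters.  The function name `leeds_style_clusters`, the sample names and the cM values are all made up for the example, and the 30cM cut-off simply mirrors the threshold mentioned above.

```python
from typing import Dict, List, Set


def leeds_style_clusters(cm: Dict[str, float],
                         shared: Dict[str, Set[str]],
                         min_cm: float = 30.0) -> List[Set[str]]:
    """Greedy shared-match grouping: the strongest unassigned match seeds a cluster."""
    # Ignore matches at or below the threshold; they are more likely to give false leads.
    kept = {name for name, value in cm.items() if value > min_cm}
    clusters: List[Set[str]] = []
    assigned: Set[str] = set()
    # Work down the match list from the largest cM value to the smallest.
    for name in sorted(kept, key=lambda n: cm[n], reverse=True):
        if name in assigned:
            continue  # already placed in a cluster seeded by a stronger match
        # The seed match plus every kept match it shares with you forms one cluster.
        cluster = {name} | (shared.get(name, set()) & kept)
        clusters.append(cluster)
        assigned |= cluster
    return clusters


# Hypothetical match list: total cM shared with you, and each match's shared matches.
cm = {"Ann": 120, "Bob": 95, "Cat": 60, "Dan": 45, "Eve": 22}
shared = {
    "Ann": {"Cat"},
    "Bob": {"Dan", "Eve"},
    "Cat": {"Ann"},
    "Dan": {"Bob"},
    "Eve": {"Bob"},
}
clusters = leeds_style_clusters(cm, shared)
# Two clusters result: Ann with Cat, and Bob with Dan.  Eve (22cM) is filtered out.
```

Remember that a cluster produced this way is still only a set of clues, because nothing here checks whether the matches share the same segment.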


DNAGedcom has another advantage when working with AncestryDNA data.  It allows you to identify other shared matches that are below the 20cMs threshold, provided they share >20cMs with the match you have in common.  This can be confusing; hopefully this cheat sheet helps.

The FamilyTreeDNA clusters generated by DNAGedcom are based on shared segments and most matches in the cluster are usually triangulated groups.  Due to this, smaller cMs defaults can be used and each cluster will provide clues of a shared common ancestor.  Additionally, shared matches are also shown in grey.
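The difference between a shared match and a shared segment can be sketched in a few lines of Python.  This is a rough illustration under stated assumptions, not the method used by DNAGedcom or FamilyTreeDNA: the `Segment` class, the helper names and the sample data are hypothetical, and the 5 Mbp minimum overlap is an arbitrary figure chosen for the example (real tools usually work in cM).  Note that two matches overlapping on your chromosome are only candidates for a triangulated group; you still need to confirm that they match each other on that segment.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Tuple


@dataclass(frozen=True)
class Segment:
    match: str       # name of the match who shares this segment with you
    chromosome: str  # e.g. "7" or "X"
    start: int       # start position in base pairs
    end: int         # end position in base pairs


def overlap_bp(a: Segment, b: Segment) -> int:
    """Length in base pairs of the stretch two segments have in common (0 if none)."""
    if a.chromosome != b.chromosome:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def overlapping_pairs(segments: List[Segment],
                      min_overlap_bp: int = 5_000_000) -> List[Tuple[str, str]]:
    """Pairs of different matches whose segments with you overlap enough to investigate."""
    pairs = []
    for a, b in combinations(segments, 2):
        if a.match != b.match and overlap_bp(a, b) >= min_overlap_bp:
            pairs.append((a.match, b.match))
    return pairs


# Hypothetical segment data for four matches.
segments = [
    Segment("Ann", "7", 10_000_000, 40_000_000),
    Segment("Cat", "7", 25_000_000, 60_000_000),
    Segment("Dan", "7", 70_000_000, 90_000_000),
    Segment("Bob", "12", 10_000_000, 40_000_000),
]
candidates = overlapping_pairs(segments)
# Only Ann and Cat overlap (15 Mbp on chromosome 7), so they are the one candidate pair.
```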


  

GEDmatch 

There are now two cluster tools at GEDmatch.  Both require a Tier 1 subscription, which can be taken out for a minimum of one month.

The first tool is the AutoTree Clustering tool which is based on shared segments and each cluster of the same colour is usually a triangulated group.  Similar to the DNAGedcom tool for FTDNA, smaller cMs defaults can be used and each cluster will provide matches of interest sharing common ancestors.



In Oct 2021 a second tool was released called AutoSegment.  It provides triangulated data.  Refer to the Roberta Estes blogpost for more information on this great new tool.





My Heritage

The free tool at My Heritage is an example of a 'shared match' cluster, so whilst the matches in each cluster have something in common, they may not all descend from the same common ancestor.  The advantage of this cluster tool is that it also provides segment detail.  In the example below, interrogation of Cluster 3 (yellow) revealed four distinct triangulated groups, each providing separate clues regarding the identity of the common ancestor.  The main limitation of this tool is that you are unable to change or refine the search parameters.

Genetic Affairs

Genetic Affairs is a third-party tool created by Evert-Jan Blom that provides great tools for use with 23andMe, FTDNA, GEDmatch and My Heritage DNA data.  Evert also developed the free My Heritage cluster tool and the new GEDmatch AutoSegment tool.  

The Genetic Affairs site is another subscription site that provides a range of tools to assist with your analysis.  Running the tools at Genetic Affairs provides more flexibility.  In particular, the My Heritage cluster run there also shows interrelationships, something not provided by the free tool.

Genetic Affairs provides many other useful tools on its site, including AutoKinship, AutoTree and AutoPedigree.


Shelley Crawford Clusters 

Connected DNA used to provide fabulous network maps of shared match clusters for AncestryDNA, FamilyTreeDNA, and 23andMe.  However, as at October 2021 Shelley is taking a break and no orders are being taken at the moment.  We hope she will be back soon.


DNA Painter Cluster Auto Painter 

You can upload clusters created from DNAGedcom (FTDNA only), My Heritage, GEDmatch and Genetic Affairs to the DNA Painter Cluster Auto Painter to visualise the segments and make notes about the results of your analysis.  This allows for a more visual approach to analysing your segment data.  The image below has been generated from the output of the AutoSegment tool at GEDmatch.



Shared Clustering Tool

The Shared Clustering tool by Jonathan Brecher works from AncestryDNA shared match lists.  Rather than providing a single list of matches ordered only by the strength of the match, Shared Clustering divides that list into smaller clusters of matches that are likely related to each other.  Downloading directly from AncestryDNA has been disabled, but files extracted using DNAGedcom can be uploaded.  It runs only on Windows.



RootsFinder 

RootsFinder is a family tree building and DNA analysis website. The premium level has DNA features for a subscription fee.  The triangulation (cluster) view allows you to view your matches in clusters – otherwise known as a network graph.



Ideas for exploring your cluster

* Explore the matches in the cluster and check if the shared matches are also sharing the same segments;

* Are there any triangulated groups within the cluster?  Explore these matches first.  Remember you can expand each triangulated group by checking your segment data for others not appearing in the cluster report, who may not have met the cluster criteria.

* Are there any 'bridge' matches in the cluster?  Use these to help you to find others who match in the same segment area at other sites.

* If the cluster includes matches not triangulating with the core group, explore those segment areas as these may provide additional clues to the possible relationship of those in the cluster. 

* Do the genealogy!  Build research trees for your matches and revisit your own pedigree to search for what the cluster group has in common - surnames, locations, ethnicities?

* Once you have identified a 'side' or MRCA, make notes on your master list and at the DNA site for all the matches in the triangulated group.

* You may also wish to allocate a reference number to your Cluster for future reference.  Remember all the tools have their own numbering systems which constantly change with each report.

* Make sure you are systematic with your notes so that next time you generate a cluster report you can easily see matches that have been worked on before.

* Don't forget to consider the other side of the chromosome - use what you now know to mark the segments on the opposing side.   Explore the other side of the 'specific segment area' for more triangulated groups.  By looking at both sides concurrently you can then mark others not triangulating as potentially false positive matches, saving analysis time in the future.

My group numbering system has changed many times over the years; it has been adapted from the Jim Bartlett method.  It doesn't need to be as complex as this, but it needs to be meaningful for you.
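If you would like a starting point for a label that stays the same from one cluster report to the next, something like the following Python sketch may help.  It is a hypothetical scheme invented for this example, not my own system and not Jim Bartlett's: the function name `group_label`, the 10 Mbp blocks and the side codes (`P`, `M`, `U` for paternal, maternal, unknown) are all assumptions.  The label is built from the chromosome and the start position of the segment, so it does not depend on the numbering any tool happens to assign.

```python
import string


def group_label(chromosome: int, start_bp: int, side: str = "U") -> str:
    """Build a label such as '07C-P' from chromosome, start position and side."""
    if side not in {"P", "M", "U"}:
        raise ValueError("side must be 'P', 'M' or 'U'")
    # Each letter covers a 10 Mbp block: A = 0-10 Mbp, B = 10-20 Mbp, and so on.
    block = start_bp // 10_000_000
    if not 0 <= block < len(string.ascii_uppercase):
        raise ValueError("start position is outside the range this scheme covers")
    letter = string.ascii_uppercase[block]
    # Zero-pad the chromosome number so labels sort in chromosome order.
    return f"{chromosome:02d}{letter}-{side}"


# A paternal-side group on chromosome 7 starting at 25 Mbp.
label = group_label(7, 25_000_000, "P")  # "07C-P"
```

Because the label comes from the segment position rather than the report, the same triangulated group gets the same reference each time you rerun a cluster.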



More information:

Check out these links for useful blogs that may help you interpret how the different cluster sites work:

* The Leeds Method, Dana Leeds, 2018 

* DNA Sleuth, 2019

* Walking the Clusters Back, Jim Bartlett 2019

* Cluster Auto Painter, Jonny Perl 2019

* Shared Clustering - A great tool! Jim Bartlett, 2019

* Connecting the Dots, My Heritage 2020

* Fast Ways to Cluster your DNA Matches, Family Locket 2020

* Genetic Affairs Tools and how to use them, Roberta Estes 2020

* Walking Back the Clusters, Veronica Williams 2020.  Demonstration of applying Jim's method.

* Using DNA tools to solve a family mystery, Vicki Hails 2020

* Annotating a Cluster Auto Painter Map, Jonny Perl 2020

* Auto Segment Triangulation Tool at GEDmatch, Roberta Estes 18 Oct 2021

* RootsFinder Network Graphs, Family Locket 2021



AncestryDNA

Whilst AncestryDNA does not provide chromosome information, it is always best to start your analysis there, due to the large number of testers in the database and its many pedigrees.  AncestryDNA clustering can be done using the DNAGedcom CLM tool, the Shared Clustering Tool and the DNA2 Tree app.  You will often find Ancestry testers at other sites, particularly GEDmatch and My Heritage (a growing database with lots of family trees).  Always look for 'bridge matches' between the sites, as they can help to expand your research pool and help you tie your triangulated groups together with broader shared match clusters.


Veronica Williams

First published: 24 Oct 2021

Last updated: December 2023