I’m seeing more and more DNA network graphing activity going on. I’m so pleased to see that there are tools being developed to make this type of approach widely available.
One concern I have with these new developments is the exclusive use of “triangulated” segments to link between two DNA matches. By triangulated segments I mean segments of DNA that you and two of your DNA matches all have in common.
Don't get me wrong - triangulation is a very good thing. If you have a triangulated DNA segment, there’s a very good chance that all three of you inherited it from the same ancestor (whoever that may be). Sticking to triangulated segments only is appealing and seems an intuitively sensible choice – they provide a degree of confidence because you know that the relationships you see are relevant to your ancestry.
My contention is that adding DNA relationships that don’t have triangulated segments is essential for finding groups of mid-range cousins (say 2nd to 4th) descended from a common ancestor among a set of matches.
The triangulated view
Below is a layout of triangulated segments extracted from Gedmatch using the Tier 1 triangulation report (chart produced with Gephi). Many of the groups here – particularly the large groups – are very distant relatives.
Notice the four pink dots? They are known cousins who all share a common ancestor. They match me and each other in the 1st cousin once removed to fourth cousin range. Only one of the six possible pairings of the four shows a triangulated segment! And that line is between the two more distant (to me) matches. If I didn’t know that all four of them had a common ancestor there would not be much in the chart that compelled me to pursue how those four people match.
The untriangulated view
Below is a different view of the data, taking a different approach.
Here I added in shared match information from Gedmatch’s “People who match one or both of two kits” report for all my matches over 20 centiMorgans (cM). This includes pairs of matches without any segments triangulated with me. In the chart I have limited the matches shown to those who share 20 cM with me AND with each other. This is similar to, but slightly more inclusive than, Ancestry’s thresholds (there’s another post on what Ancestry does that I may write one day).
- The blue lines indicate the match pair has at least one shared segment in common with me.
- Grey lines indicate that the people at each end of the line match each other, but there is no overlap of segments between the pair of matches and me.
I needed to limit the connections between people based on the amount of DNA they shared with each other, to stop the number of links in the chart from becoming ridiculous – and I have no known endogamy.
I should also mention that in both these charts, thicker lines indicate larger shared cM amounts between the pairs of matches. The thickest lines are parent/child or sibling relationships. The size of the dot reflects the relationship with me. Larger dots are closer relatives.
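The filtering and line colouring described above can be sketched in a few lines of Python. This is only a minimal illustration, not the process used to make the chart: the `PairMatch` record, its field names and the sample figures are all hypothetical, and it assumes the shared cM totals and a "triangulated with me" flag have already been extracted from the Gedmatch reports.

```python
from dataclasses import dataclass

MIN_CM = 20.0  # threshold from the post: 20 cM with me AND with each other


@dataclass(frozen=True)
class PairMatch:
    """One pair of my matches (hypothetical record layout)."""
    a: str
    b: str
    cm_a_with_me: float   # total cM that match a shares with me
    cm_b_with_me: float   # total cM that match b shares with me
    cm_between: float     # total cM that a and b share with each other
    triangulated: bool    # True if the pair has a shared segment in common with me


def chart_edges(pairs, min_cm=MIN_CM):
    """Return (a, b, colour, weight) for each pair that passes all thresholds."""
    edges = []
    for p in pairs:
        # Drop the pair if either person, or the pair itself, is under the limit.
        if min(p.cm_a_with_me, p.cm_b_with_me, p.cm_between) < min_cm:
            continue
        colour = "blue" if p.triangulated else "grey"
        # The weight drives line thickness: more shared cM, thicker line.
        edges.append((p.a, p.b, colour, p.cm_between))
    return edges


# Made-up sample data, for illustration only.
sample = [
    PairMatch("cousin1", "cousin2", 450.0, 95.0, 60.0, False),
    PairMatch("cousin3", "cousin4", 40.0, 35.0, 28.0, True),
    PairMatch("distant1", "distant2", 22.0, 21.0, 12.0, True),
]

for edge in chart_edges(sample):
    print(edge)
```

In this made-up sample the third pair is dropped because the two matches share less than 20 cM with each other, even though they triangulate with me.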
Quite a different picture. While I’ve lost a lot of distant matches, there is now the suggestion of a grouping with my known cousins. The chart is more interlinked – some of these links may be coincidental relationships that have nothing to do with my tree. I would look upon single links between clusters with suspicion but not dismiss them entirely.
There are some clusters made up entirely of “untriangulated” match pairs, including relatives who share more than 20 cM with me but do NOT show up in the triangulated-only version above. These are clusters that are close enough that I might be able to determine the common ancestor with a little digging.
Is what I am seeing with my four cousins a one-in-a-million random chance occurrence?
I don’t think so. I suspect that there’s a higher chance of relatives in a researchable timeframe not sharing a triangulated segment than one may imagine.
Here’s another example – a Family Tree DNA chromosome browser view of two people who share one great-great grandparent with me. They are more closely related to each other. There is ample paper and other DNA evidence to say that the relationship is correct.
No stacked blue and orange lines = no triangulated segments.
Once again, if I didn’t already know about it, a connection between these two people is exactly what I would want to find in my data.
I would be interested to know if readers can find further examples of close matches that don’t triangulate in their data.
So if triangulated matches between closer relatives are so hard to come by, why those big clusters of distant triangulated relationships?
As each generation passes, you are less likely to inherit DNA from a particular ancestor. For a very distant ancestor you may have only one segment, if any. Each ancestor, however, has on average an increasing number of descendants with each generation. The chances of another descendant having the same inherited segment as you are slim… but there are a lot of other descendants. A small fraction of them do inherit that same segment. If they DNA test, they all match in common with each other on that one segment and become a cluster in the chart. You can see it when you look at the chromosome data for the matches in a big cluster – they all match in a big stack at one location.
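The "slim chance, but a lot of descendants" reasoning above can be put into rough arithmetic. The sketch below is a toy model only: the figures are invented for illustration rather than measured, and it assumes each tested descendant independently has the same small chance of carrying the segment I inherited, which real inheritance only approximates.

```python
def expected_sharers(tested_descendants: int, p_same_segment: float) -> float:
    """Expected number of tested descendants carrying the same segment as me."""
    return tested_descendants * p_same_segment


def prob_at_least_two(tested_descendants: int, p_same_segment: float) -> float:
    """Chance that two or more tested descendants carry my segment.

    Two others plus me is the minimum needed for a triangulated group.
    Assumes each descendant is an independent trial (a simplification).
    """
    n, p = tested_descendants, p_same_segment
    p_none = (1 - p) ** n
    p_exactly_one = n * p * (1 - p) ** (n - 1)
    return 1 - p_none - p_exactly_one


# Hypothetical inputs: 2000 tested descendants of one distant ancestor,
# each with a 0.5% chance of carrying the same segment as me.
n_tested = 2000
p_share = 0.005

print(expected_sharers(n_tested, p_share))
print(prob_at_least_two(n_tested, p_share))
```

Under these made-up inputs the expected count is around ten people, all matching each other at one location, which is the kind of stack that shows up as a big cluster.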
Keeping only triangulated segments is cleaner and increases the chance that the relationship you see is due to a shared ancestor – but that doesn’t necessarily make them more helpful for research. There is a risk of losing close match information that could be researched, for the sake of distant match information beyond paper trail timeframes.
Finding the balance
A compromise position that trimmed off untriangulated relationships for distant relatives, but kept them where there was a close relationship, might be the answer.
The version of the graph below uses the same thresholds as the untriangulated chart (20 cM shared with me, 20 cM shared between match pairs), but then adds in all triangulated segments between pairs of people who each share 20 cM or more with me. This adds in a few more matches, and the addition of the less close triangulated lines supports some of the untriangulated clusters. I now have a good picture of that group of four known matches in pink. There is a winding path of untriangulated matches connecting several of the triangulated (and untriangulated) groups. While they complicate the picture, they do alert me to the possibility that my tree may have intermarriage that I’m not aware of. It’s messy, but not necessarily a bad thing.
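As a rough sketch, the combined rule can be written as a single test per pair of matches. The function and parameter names here are hypothetical, and it assumes you already have the cM totals and know whether the pair has a segment triangulated with me. Both thresholds default to the 20 cM used for this chart, but they are separate parameters so each can be tuned on its own.

```python
def keep_edge(cm_a_with_me: float, cm_b_with_me: float,
              cm_between: float, triangulated: bool,
              min_with_me: float = 20.0, min_between: float = 20.0) -> bool:
    """Decide whether to draw a line between two of my matches.

    Both people must share at least min_with_me cM with me. Beyond that,
    a triangulated pair is always kept, while an untriangulated pair must
    also share at least min_between cM with each other.
    """
    if cm_a_with_me < min_with_me or cm_b_with_me < min_with_me:
        return False
    return triangulated or cm_between >= min_between


# Made-up pairs, for illustration only.
print(keep_edge(450.0, 95.0, 60.0, triangulated=False))  # close, untriangulated
print(keep_edge(30.0, 25.0, 9.0, triangulated=True))     # small, but triangulated
print(keep_edge(30.0, 25.0, 9.0, triangulated=False))    # small, untriangulated
print(keep_edge(30.0, 15.0, 40.0, triangulated=True))    # one person under 20 cM with me
```

The first two pairs are kept and the last two are dropped.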
DNA products and datasets
I would like to see DNA matching datasets (or products made from them) with as many as possible of the following attributes:
- Inclusion of close in-common-with relationships that don’t have triangulated segments.
- Data on the strength of the total connection between pairs of matches (or edge filters that use this information).
- Ability to distinguish between match pairs with and without triangulated segments.
- Ability to set different thresholds for triangulated and non-triangulated edges.
- Inclusion of total match size for each match.
Triangulated segments are the icing, not the cake.
I hope that as more products and data extraction capabilities are developed some of these ideas will be incorporated. You can help by giving developers a push along these lines when you provide feedback about their products.