I’m seeing more and more DNA network graphing activity going on. I’m so pleased to see that there are tools being developed to make this type of approach widely available.
One concern I have with these new developments is the exclusive use of “triangulated” segments to link between two DNA matches. By triangulated segments I mean segments of DNA that you and two of your DNA matches all have in common.
Don't get me wrong - triangulation is a very good thing. If you have a triangulated DNA segment, there’s a very good chance that all three of you inherited it from the same ancestor (whoever that may be). Sticking to triangulated segments only is appealing and seems an intuitively sensible choice – they provide a degree of confidence because you know that the relationships you see are relevant to your ancestry.
My contention is that adding DNA relationships that don’t have triangulated segments is essential for finding groups of mid-range cousins (say 2nd to 4th) descended from a common ancestor among a set of matches.
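To make the definition of a triangulated segment concrete, here is a minimal Python sketch. It assumes you already have three lists of matching segments (me with match A, me with match B, and A with B) and simply looks for regions where all three overlap. The `Segment` type, the function names and the positions are all hypothetical and are not taken from any Gedmatch report format. The sketch applies no minimum segment size and does not check which parental chromosome a segment sits on, so treat its output as candidates only.

```python
from typing import List, NamedTuple, Optional


class Segment(NamedTuple):
    chromosome: str
    start: int  # start position in base pairs
    end: int    # end position in base pairs


def overlap(a: Segment, b: Segment) -> Optional[Segment]:
    """Return the region shared by two segments, or None if they don't overlap."""
    if a.chromosome != b.chromosome:
        return None
    start = max(a.start, b.start)
    end = min(a.end, b.end)
    return Segment(a.chromosome, start, end) if start < end else None


def triangulated_segments(
    me_a: List[Segment], me_b: List[Segment], a_b: List[Segment]
) -> List[Segment]:
    """Regions where the me-A, me-B and A-B matching segments all overlap."""
    found = []
    for seg_me_a in me_a:
        for seg_me_b in me_b:
            common = overlap(seg_me_a, seg_me_b)
            if common is None:
                continue
            for seg_a_b in a_b:
                tri = overlap(common, seg_a_b)
                if tri is not None:
                    found.append(tri)
    return found


# Hypothetical example data (positions invented for illustration).
me_a = [Segment("7", 10_000_000, 40_000_000)]
me_b = [Segment("7", 25_000_000, 60_000_000), Segment("12", 5_000_000, 20_000_000)]
a_b = [Segment("7", 20_000_000, 45_000_000)]

result = triangulated_segments(me_a, me_b, a_b)
no_ab_match = triangulated_segments(me_a, me_b, [])
```

In this made-up case `result` holds one region on chromosome 7. If A and B did not match each other there (`no_ab_match`), nothing would triangulate even though both of them overlap with me.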
The triangulated view
Below is a layout of triangulated segments extracted from Gedmatch using the Tier 1 triangulation report (chart produced with Gephi). Many of the groups here – particularly the large groups – are very distant relatives.
Notice the four pink dots? They are known cousins who all share a common ancestor. They match me and each other in the 1st cousin once removed to fourth cousin range. Only one of the six possible pairings of the four shows a triangulated segment! And that line is between the two more distant (to me) matches. If I didn’t know that all four of them had a common ancestor there would not be much in the chart that compelled me to pursue how those four people match.
The untriangulated view
Below is a different view of the data, taking a different approach.
Here I added in shared match information from Gedmatch’s “People who match one or both of two kits” report for all my matches over 20 centiMorgans (cM). This includes pairs of matches without any triangulated (with me) segments. In the chart I have limited the matches shown to those who share 20cM with me AND with each other. This is similar to but slightly more inclusive than Ancestry’s thresholds (there’s another post on what Ancestry does that I may write one day).
- The blue lines indicate the match pair has at least one shared segment in common with me.
- Grey lines indicate that the people at each end of the line match each other, but there is no overlap of segments between the pair of matches and me.
I needed to limit the connections between people based on the amount of DNA they shared with each other in order to stop the number of links in the chart from becoming ridiculous – and I have no known endogamy.
I should also mention that in both these charts, thicker lines indicate larger shared cM amounts between the pairs of matches. The thickest lines are parent/child or sibling relationships. The size of the dot reflects the relationship with me. Larger dots are closer relatives.
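The filtering and colouring rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the match names, cM values and dictionary layout are invented, and the 20cM cut-off is applied as "20 or more". It is not the actual Gedmatch or Gephi workflow, just the logic for deciding which links to draw and how.

```python
MIN_CM = 20.0  # threshold from the post, applied here as "20 cM or more"


def build_edges(cm_with_me, pairs, min_cm=MIN_CM):
    """Keep pairs where both people share >= min_cm with me AND with each other.

    Blue = the pair has at least one shared segment in common with me.
    Grey = the pair match each other, but with no segment overlapping me.
    The edge weight (total shared cM) can drive line thickness in the chart.
    """
    edges = []
    for (a, b), info in pairs.items():
        if cm_with_me.get(a, 0.0) < min_cm or cm_with_me.get(b, 0.0) < min_cm:
            continue  # one of the pair is too distant from me
        if info["cm"] < min_cm:
            continue  # the pair are too distant from each other
        colour = "blue" if info["shares_segment_with_me"] else "grey"
        edges.append({"source": a, "target": b, "weight": info["cm"], "colour": colour})
    return edges


# Hypothetical data: total cM each match shares with me (can drive dot size).
cm_with_me = {"Ann": 450.0, "Bob": 62.0, "Cat": 25.0, "Dan": 12.0}

# Hypothetical data: total cM each pair shares with each other.
pairs = {
    ("Ann", "Bob"): {"cm": 210.0, "shares_segment_with_me": False},
    ("Bob", "Cat"): {"cm": 31.0, "shares_segment_with_me": True},
    ("Ann", "Cat"): {"cm": 9.0, "shares_segment_with_me": True},
    ("Cat", "Dan"): {"cm": 40.0, "shares_segment_with_me": False},
}

edges = build_edges(cm_with_me, pairs)
```

With this invented data the Ann and Bob link survives as a thick grey line and the Bob and Cat link as a thinner blue one. Ann and Cat are dropped because they share too little with each other, and Cat and Dan because Dan shares too little with me.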
Quite a different picture. While I’ve lost a lot of distant matches, there is now the suggestion of a grouping with my known cousins. The chart is more interlinked – some of these links may be coincidental relationships that have nothing to do with my tree. I would look upon single links between clusters with suspicion but not dismiss them entirely.
There are some clusters entirely made up of “untriangulated” match pairs, including relatives who share more than 20cM with me but who do NOT show up in the triangulated-only version above. These are clusters that are close enough that I might be able to determine the common ancestor with a little digging.
Is what I am seeing with my four cousins a one-in-a-million random chance occurrence?
I don’t think so. I suspect that there’s a higher chance of relatives in a researchable timeframe not sharing a triangulated segment than one may imagine.
Here’s another example – a Family Tree DNA chromosome browser view of two people who share one great-great grandparent with me. They are more closely related to each other. There is ample paper and other DNA evidence to say that the relationship is correct.
No stacked blue and orange lines = no triangulated segments.
Once again, if I didn’t already know about it, a connection between these two people is exactly what I would want to find in my data.
I would be interested to know if readers can find further examples of close matches that don’t triangulate in their data.
So if triangulated matches between closer relatives are so hard to come by, why those big clusters of distant triangulated relationships?
As each generation passes, you are less likely to inherit DNA from a particular ancestor. For a very distant ancestor you may have only one segment, if any. Each ancestor, however, has on average an increasing number of descendants with each generation. The chances of another descendant having the same inherited segment as you are slim… but there are a lot of other descendants. A small fraction of them do inherit that same segment. If they DNA test, they all match in common with each other on that one segment and become a cluster in the chart. You can see it when you look at the chromosome data for the matches in a big cluster – they all match in a big stack at one location.
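A rough back-of-envelope sketch of the two opposing trends in that paragraph is below. The halving of the average share per generation is standard; the number of children per descendant is purely a hypothetical figure chosen for illustration, and real families vary enormously. The function names are my own.

```python
def average_share(generations_back: int) -> float:
    """Average fraction of your autosomal DNA from one ancestor this many
    generations back. It halves each generation; the real amount varies."""
    return 0.5 ** generations_back


def descendants_in_generation(generations_down: int, children_each: int) -> int:
    """Descendants in one generation, assuming (hypothetically) that every
    descendant has the same number of children."""
    return children_each ** generations_down


for g in (2, 4, 6, 8):
    share = average_share(g)
    cousins = descendants_in_generation(g, children_each=3)
    print(f"{g} generations: average share {share:.4%}, descendants {cousins}")
```

The share shrinks while the pool of possible testers grows, which fits the pattern of big clusters stacked on a single segment from a distant ancestor.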
Keeping only triangulated segments is cleaner and increases the chance that the relationship you see is due to a shared ancestor – but that doesn’t necessarily make them more helpful for research. There is a risk of losing close match information that could be researched, for the sake of distant match information beyond paper trail timeframes.
Finding the balance
A compromise position that trimmed off untriangulated relationships for distant relatives, but kept them where there was a close relationship, might be the answer.
The version of the graph below uses the same thresholds as the untriangulated chart (20cM shared with me, 20cM shared between match pairs), but then adds in all triangulated segments between pairs of people who each share 20 cM or more with me. This adds in a few more matches, and the addition of the less close triangulated lines supports some of the untriangulated clusters. I now have a good picture of that group of four known matches in pink. There is a winding path of untriangulated matches connecting several of the triangulated (and untriangulated) groups. While they complicate the picture they do alert me to the possibility that my tree may have intermarriage that I’m not aware of. It’s messy, but not necessarily a bad thing.
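Expressed as a rule, the compromise might look like the Python sketch below. The function name, parameter names and example numbers are hypothetical, and the thresholds are applied as "20 cM or more". It only captures the keep-or-drop decision for a single link, not how the data is extracted.

```python
def keep_edge(cm_a_with_me, cm_b_with_me, cm_a_with_b, triangulated,
              min_with_me=20.0, min_untriangulated=20.0):
    """Decide whether to draw a link between matches A and B.

    Both people must share at least min_with_me cM with me. After that,
    any triangulated pair is kept, while an untriangulated pair must also
    share at least min_untriangulated cM with each other.
    """
    if cm_a_with_me < min_with_me or cm_b_with_me < min_with_me:
        return False
    return triangulated or cm_a_with_b >= min_untriangulated


# Hypothetical examples.
close_untriangulated = keep_edge(450.0, 62.0, 210.0, triangulated=False)
distant_triangulated = keep_edge(450.0, 25.0, 9.0, triangulated=True)
distant_untriangulated = keep_edge(450.0, 25.0, 9.0, triangulated=False)
```

Keeping the two thresholds as separate parameters makes it easy to experiment with how much untriangulated detail the chart can carry before it becomes unreadable.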
DNA products and datasets
I would like to see DNA matching datasets (or products made from them) with as many as possible of the following attributes:
- Inclusion of close in-common-with relationships that don’t have triangulated segments.
- Data on the strength of the total connection between pairs of matches (or edge filters using this information).
- Ability to distinguish between match pairs with and without triangulated segments.
- Ability to set different thresholds for triangulated and non-triangulated edges.
- Inclusion of total match size for each match.
Triangulated segments are the icing, not the cake.
I hope that as more products and data extraction capabilities are developed some of these ideas will be incorporated. You can help by giving developers a push along these lines when you provide feedback about their products.
This is a very useful post. As an African American who does not have robust family trees on all my lines I have felt the need to push the envelope in terms of analyzing DNA results that fall short of multiple triangulated groups. This is the first post I have seen that provides some direction in that regard.
Thank you, and best of luck applying this to those difficult lines. It still depends on the right people having tested, of course.
Excellent new ideas here, Shelley. I think you may be correct that using In Common With data may be better than using triangulated segments for finding genealogically-relevant connections, as the latter are likely more distant. And although I personally find the exploration of 3D graphs tedious since everything looks like a pattern, I love how you used them to demonstrate this. Great work!
I should however correct one statement of yours: "If you have a triangulated DNA segment, you know that all three of you inherited it from the same ancestor". Be careful with this. Any segments inherited from the same ancestor will triangulate, but segments that triangulate are not necessarily from the same ancestor, because they may match on different parental chromosomes and, if small (under 7 cM), one or more may match the others by chance.
Thank you Louis. I appreciate the feedback, both the positive and the correction. Particularly as you are not a fan of the charts!
You are quite right in your correction - I will edit to soften those words.
This is an excellent post, Shelley. Nice balance between triangulation and the emerging ability to create networks. I hope it gets easier to do. I think it would be great if you could do a webinar on how to do this.
Thanks Patti, I wrote a series of posts that you may find helpful about how to do this sort of thing with Ancestry DNA data using NodeXL here: http://twigsofyore.blogspot.com.au/2017/07/visualising-ancestry-dna-matchesindex.html
Gephi isn't that difficult, but there is a bit more of a learning curve. I may put up some posts - and maybe even try making a video - on how to use it when I get time.
I connected with a distant cousin on Lost Cousins. Our common ancestor is my 4th GG. I was very excited for her to upload her DNA to Gedmatch so I could compare. She and I share a nice chunk of DNA and the only person it triangulates with is my great-aunt. She and my great-aunt share 5 good sized segments on 5 different chromosomes and...nothing. Now, this could be a question of population that has tested, but as I read what you’re saying about triangulation groups and looking at this experience and all my other experience with triangulation (ending up with triangulation groups where one of the people is 3-5 generations back but the remainder is 4th to remote), I’m thinking I need to widen my approach.
This and other reactions to the post on Facebook suggest it's not just me!
Looking forward to learning more about this at Congress. I need time to digest all this.
My Congress presentation is much more like the NodeXL series, but I'm sure we'll get a chance to chat. See you on Friday :-)
Thank you Shelley for a very interesting blog post. Pictures say more than a thousand words. I've long argued about the limitations of triangulation for the very reasons you describe.
Great article and analysis. I agree and look forward to what the future holds with those smart developers out there.
Thanks Paul. There's so much promise for what we could do.
I am curious what you think of the new tool RootsFinder?
I think that there are a lot of good things about it. I really like how they've implemented the colour coding, sizing dots by shared cM, filtering on shared cM, the information that pops up on the side. I think it also has some trouble spots, such as the layout on the network graph view. Groups overlap so it's hard to tell what's going on. I think the view where they have a circle with bundled edges works better. The main trouble spot is the data itself. More processing to eliminate duplicates of the same person would be a big help. Oh yes, and it misses big non-triangulated shared match connections!! I'm looking forward to seeing how it continues to develop.
Would that kind of visualization be possible using the FTDNA data taken from the Chromosome browser view, or would it be missing data?
Yes, you can use FTDNA data but it's more difficult to manage. DNAGedcom will download in common with lists for FTDNA. The main difficulty is that FTDNA don't have a threshold or a way to filter on the shared match size (ie how closely the two shared matches are related to each other). As a result there's a very large number of very small shared relationships between matches included, which makes for a dense graph unless you heavily filter the matches themselves.
You mentioned above that you may write about what Ancestry does with shared match thresholds. I was wondering if you did write about that? I thought about doing so myself but a) nobody wanted to hear me go on about it and b) I'm not a blogger. Though I have to admit, my conclusion has morphed lately into they just don't have the brains or bandwidth in Utah to handle their growth. An example being the surname search function on the match list being flaky at best.