Blog post

Wednesday, March 14, 2018

Congress 2018 wrap-up

Four days - a busy blur of conference sessions and group gatherings for meals or photos. Now Congress 2018 has ended, and hundreds of delegates have returned home. I expect that like me they were sad to see it end, but ready for a break and a chance to put all they’d learnt into action.

Conference tag with ribbons attached, string of beads.

There was a good selection of both local and international speakers, but the speakers are only part of the experience. Jill Ball of GeniAus did an exceptional job of extending the community spirit and camaraderie that exists among genealogy bloggers to the non-blogging conference goers. Or at least that’s how it appeared to me, and I hope that’s how they felt about it!

I caught up with friends I had met online or at the Canberra conference in 2015, with my cousin who was also attending, and also made/met some new friends. I don’t want to name names or I will be sure to leave someone out.

I delivered my presentation on Visualising DNA Matches with Network Graphs on Sunday evening. The conference started on Friday so there were three days for my nerves to build, but also three days to settle in and feel like part of the genealogy community. Several people told me afterwards that they were keen to try graphing their DNA matches, or spoke to me about the insights they had already gained through doing so.

I’ve run through my notes and made a list of things to try, or thoughts to hang on to. Some of my top items:

  • Need to investigate the journals section of Trove.
  • Possible purchase: Farewell my Children by Richard E Reid (after hearing Pauleen Cass talk)
  • Why don’t I have a copy of Phillimore’s Atlas?! Must fix that (several talks prompted this thought).  
  • Need to take a proper look at DustyDocs.
  • Judy Russell (The Legal Genealogist) provided links to sites with public domain photos – bookmark them.
  • Freemason records! Now that I’ve learnt more about these I definitely want to follow up on the Freemasons in my family. 
  • Lewis’ gazetteer – get hold of that too.
  • Lisa Louise Cooke spoke about using Google Earth Pro. I realised I already have it on my computer and promptly lost several hours playing with it. She said that would happen…
  • A couple of blog tweaks I should probably make after hearing Jill Ball talk about Beaut Blogs.

One of the highlights was meeting international speaker, Judy Russell (The Legal Genealogist).

Shelley Crawford and Judy Russell

This is one of the few photos I have of people – I really should have taken more. Between lunches, dinners, group photos and other get togethers it felt like I had taken a million, but apparently not.

It was very disappointing to hear that none of the Societies have put their hand up to host the next Congress. I hope that we will hear good news on that front soon. I will be more than ready to go to another conference three years from now.

Saturday, March 3, 2018

Triangulation is the icing, not the cake

I’m seeing more and more DNA network graphing activity going on. I’m so pleased to see that there are tools being developed to make this type of approach widely available.

One concern I have with these new developments is the exclusive use of “triangulated” segments to link between two DNA matches. By triangulated segments I mean segments of DNA that you and two of your DNA matches all have in common.
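For readers who like to see a definition written down as logic, here is a minimal Python sketch of what "triangulated" means in practice. It is only an illustration: the `Segment` class, the function names and the positions in the example are all made up for this post, and a real tool would also apply a minimum segment size before counting an overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    chromosome: str
    start: int  # base-pair position where the shared segment begins
    end: int    # base-pair position where the shared segment ends


def overlap(a, b):
    """Return the part of the genome covered by both segments, or None."""
    if a.chromosome != b.chromosome:
        return None
    start, end = max(a.start, b.start), min(a.end, b.end)
    return Segment(a.chromosome, start, end) if start < end else None


def triangulated(me_a, me_b, a_b):
    """Regions where the me-A, me-B and A-B segments all overlap."""
    found = []
    for s1 in me_a:
        for s2 in me_b:
            common = overlap(s1, s2)
            if common is None:
                continue
            for s3 in a_b:
                shared = overlap(common, s3)
                if shared is not None:
                    found.append(shared)
    return found


# Hypothetical example: three pairwise matches on chromosome 7.
me_a = [Segment("7", 10_000_000, 30_000_000)]  # what I share with match A
me_b = [Segment("7", 20_000_000, 45_000_000)]  # what I share with match B
a_b = [Segment("7", 15_000_000, 28_000_000)]   # what A and B share with each other
print(triangulated(me_a, me_b, a_b))
```

If A and B matched me on different parts of the genome, the function would return an empty list even though all three of us match each other. That is exactly the situation the rest of this post is about.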

Don't get me wrong - triangulation is a very good thing. If you have a triangulated DNA segment, there’s a very good chance that all three of you inherited it from the same ancestor (whoever that may be). Sticking to triangulated segments only is appealing and seems an intuitively sensible choice – they provide a degree of confidence because you know that the relationships you see are relevant to your ancestry.

My contention is that the addition of DNA relationships that don’t have triangulated segments is essential to find groups of mid-range (say 2nd to 4th) cousins descended from a common ancestor among a set of matches.

The triangulated view

Below is a layout of triangulated segments extracted from Gedmatch using the Tier 1 triangulation report (chart produced with Gephi). Many of the groups here – particularly the large groups – are very distant relatives.

Notice the four pink dots? They are known cousins who all share a common ancestor. They match me and each other in the 1st cousin once removed to fourth cousin range. Only one of the six possible pairings of the four shows a triangulated segment! And that line is between the two more distant (to me) matches. If I didn’t know that all four of them had a common ancestor there would not be much in the chart that compelled me to pursue how those four people match.

Chart showing distinct separated clusters of dots and lines

The untriangulated view

Below is a different view of the data, taking a different approach.

Here I added in shared match information from Gedmatch’s “People who match one or both of two kits” report for all my matches over 20 centiMorgans (cM). This includes pairs of matches without any triangulated (with me) segments. In the chart I have limited the matches shown to those who share 20cM with me AND with each other. This is similar to but slightly more inclusive than Ancestry’s thresholds (there’s another post in what Ancestry does that I may write one day).

  • The blue lines indicate the match pair has at least one shared segment in common with me.
  • Grey lines indicate that the people at each end of the line match each other, but there is no overlap of segments between the pair of matches and me.

I needed to limit the connections between people on the amount of DNA they shared with each other in order to stop the number of links in the chart from becoming ridiculous – and I have no known endogamy.

I should also mention that in both these charts, thicker lines indicate larger shared cM amounts between the pairs of matches. The thickest lines are parent/child or sibling relationships. The size of the dot reflects the relationship with me. Larger dots are closer relatives.
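The rules behind this chart can be sketched in a few lines of Python. This is an illustration of the logic rather than the process I actually followed: the names, the cM figures and the dictionary layout are invented for the example, and I have assumed "20cM" means 20 or more.

```python
MIN_CM = 20.0  # threshold applied to "shares with me" and to "share with each other"


def build_edges(cm_with_me, cm_between, overlaps_me, min_cm=MIN_CM):
    """Return the lines to draw between pairs of matches.

    cm_with_me  : {match name: total cM shared with me}
    cm_between  : {(match a, match b): total cM the two share with each other}
    overlaps_me : set of frozenset pairs that have a segment in common with me
    """
    edges = []
    for (a, b), pair_cm in cm_between.items():
        # Both people must be matches of mine at or above the threshold.
        if cm_with_me.get(a, 0) < min_cm or cm_with_me.get(b, 0) < min_cm:
            continue
        # The pair must also share enough DNA with each other.
        if pair_cm < min_cm:
            continue
        colour = "blue" if frozenset((a, b)) in overlaps_me else "grey"
        # Weight drives line thickness: more shared cM, thicker line.
        edges.append({"source": a, "target": b, "weight": pair_cm, "colour": colour})
    return edges


# Hypothetical data for four matches.
cm_with_me = {"Ann": 450.0, "Bob": 95.0, "Cy": 60.0, "Dee": 12.0}
cm_between = {
    ("Ann", "Bob"): 300.0,
    ("Ann", "Cy"): 25.0,
    ("Bob", "Cy"): 15.0,   # dropped: the pair share under 20cM with each other
    ("Ann", "Dee"): 40.0,  # dropped: Dee shares under 20cM with me
}
overlaps_me = {frozenset(("Ann", "Bob"))}

edges = build_edges(cm_with_me, cm_between, overlaps_me)
for edge in edges:
    print(edge)
```

The dot size would come straight from `cm_with_me`, so it does not need its own step here.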

Chart with sparse but interconnected dots and lines, with a few distinct clusters

Quite a different picture. While I’ve lost a lot of distant matches, there is now the suggestion of a grouping with my known cousins. The chart is more interlinked – some of these links may be coincidental relationships that have nothing to do with my tree. I would look upon single links between clusters with suspicion but not dismiss them entirely.

There are some clusters entirely made up of “untriangulated” match pairs, including relatives who share more than 20cM with me and who do NOT show up in the triangulated-only version above. These are clusters that are close enough that I might be able to determine the common ancestor with a little digging.

Is what I am seeing with my four cousins a one-in-a-million random chance occurrence?

Chromosome browser view with blue and orange segment markers that don't overlap

I don’t think so. I suspect that there’s a higher chance of relatives in a researchable timeframe not sharing a triangulated segment than one may imagine.

Here’s another example – a Family Tree DNA chromosome browser view of two people who share one great-great grandparent with me. They are more closely related to each other. There is ample paper and other DNA evidence to say that the relationship is correct.

No stacked blue and orange lines = no triangulated segments.

Once again, if I didn’t already know about it, a connection between these two people is exactly what I would want to find in my data.

I would be interested to know if readers can find further examples of close matches that don’t triangulate in their data.

So if triangulated matches between closer relatives are so hard to come by, why those big clusters of distant triangulated relationships?

As each generation passes, you are less likely to inherit DNA from a particular ancestor. For a very distant ancestor you may have only one segment, if any. Each ancestor, however, has on average an increasing number of descendants with each generation. The chances of another descendant having the same inherited segment as you are slim… but there are a lot of other descendants. A small fraction of them do inherit that same segment. If they DNA test, they all match in common with each other on that one segment and become a cluster in the chart. You can see it when you look at the chromosome data for the matches in a big cluster – they all match in a big stack at one location.
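The arithmetic behind that paragraph can be shown with a deliberately crude toy model. The assumptions are mine and are not realistic genetics: one segment is passed to each child intact with a 50% chance, recombination is ignored, every person has the same number of children, and nobody marries a cousin. The function names are made up for the illustration.

```python
def carrier_probability(generations):
    """Chance that one particular descendant still carries the segment.

    Toy assumption: the segment is handed on whole with probability 1/2
    at each generation, and never broken up by recombination.
    """
    return 0.5 ** generations


def expected_carriers(children_per_person, generations):
    """Average number of descendants in that generation who carry the segment."""
    descendants = children_per_person ** generations
    return descendants * carrier_probability(generations)


# Hypothetical family size of three children per person.
for g in (2, 4, 6, 8):
    print(g, carrier_probability(g), expected_carriers(3, g))
```

Under these assumptions the chance for any one descendant halves every generation, but with three children per person the average number of carriers grows by half each generation. With exactly two children per person it stays flat at one. Real inheritance is messier, but the direction of the effect is the point.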

Keeping only triangulated segments is cleaner and increases the chance that the relationship you see is due to a shared ancestor – but that doesn’t necessarily make them more helpful for research. There is a risk of losing close match information that could be researched, for the sake of distant match information beyond paper trail timeframes.

Finding the balance

A compromise position that trimmed off untriangulated relationships for distant relatives, but kept them where there was a close relationship, might be the answer.

The version of the graph below uses the same thresholds as the untriangulated chart (20cM shared with me, 20cM shared between match pairs), but then adds in all triangulated segments between pairs of people who each share 20 cM or more with me. This adds in a few more matches, and the addition of the less close triangulated lines supports some of the untriangulated clusters. I now have a good picture of that group of four known matches in pink. There is a winding path of untriangulated matches connecting several of the triangulated (and untriangulated) groups. While they complicate the picture they do alert me to the possibility that my tree may have intermarriage that I’m not aware of. It’s messy, but not necessarily a bad thing.
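Written as a rule, the compromise looks something like the sketch below. The data layout and names are hypothetical, and the `clusters` function is only a crude stand-in for what a graph layout shows visually: it groups people who are connected by any chain of kept lines.

```python
from collections import defaultdict


def keep_edge(edge, cm_with_me, min_cm=20.0):
    """Compromise rule: triangulated pairs always stay, others need 20cM between them."""
    a, b = edge["source"], edge["target"]
    # Both people must share at least min_cm with me.
    if cm_with_me.get(a, 0) < min_cm or cm_with_me.get(b, 0) < min_cm:
        return False
    return edge["triangulated"] or edge["pair_cm"] >= min_cm


def clusters(edges):
    """Group people connected by any chain of edges (connected components)."""
    neighbours = defaultdict(set)
    for e in edges:
        neighbours[e["source"]].add(e["target"])
        neighbours[e["target"]].add(e["source"])
    seen, groups = set(), []
    for start in neighbours:
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbours[node] - group)
        seen |= group
        groups.append(group)
    return groups


# Hypothetical matches and pairs.
cm_with_me = {"A": 100.0, "B": 80.0, "C": 30.0, "D": 25.0, "E": 10.0}
all_edges = [
    {"source": "A", "target": "B", "pair_cm": 200.0, "triangulated": False},
    {"source": "B", "target": "C", "pair_cm": 12.0, "triangulated": True},
    {"source": "C", "target": "D", "pair_cm": 8.0, "triangulated": False},
    {"source": "D", "target": "E", "pair_cm": 50.0, "triangulated": True},
]
kept = [e for e in all_edges if keep_edge(e, cm_with_me)]
print(clusters(kept))
```

In this made-up data the small triangulated link between B and C is what ties C to the A and B pair, which is the kind of support I describe above.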

Network chart showing interconnected lines, with a moderate number of distinct clusters

DNA products and datasets

I would like to see DNA matching datasets (or products made from them) with as many as possible of the following attributes:

  • Inclusion of close in-common-with relationships that don’t have triangulated segments.
  • Data on the strength of the total connection between pairs of matches (or edge filters that use this information).
  • Ability to distinguish between match pairs with and without triangulated segments.
  • Ability to set different thresholds for triangulated and non-triangulated edges.
  • Inclusion of total match size for each match.
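To make the wish list a little more concrete, here is one way a record carrying those attributes might look, with separate thresholds for the two kinds of edge. Everything here is a hypothetical sketch: the field names, the threshold values and the CSV column names are my own choices, not the format of any existing product. The Source, Target and Weight headings follow the edge list convention I understand Gephi's spreadsheet import to use, so check that against your own version.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class MatchPair:
    source: str
    target: str
    pair_cm: float            # total cM the two matches share with each other
    triangulated: bool        # True if the pair also share a segment with me
    source_cm_with_me: float  # total match size for each person
    target_cm_with_me: float


def filter_pairs(pairs, min_cm_triangulated, min_cm_untriangulated):
    """Apply a different minimum pair size to each kind of edge."""
    kept = []
    for p in pairs:
        limit = min_cm_triangulated if p.triangulated else min_cm_untriangulated
        if p.pair_cm >= limit:
            kept.append(p)
    return kept


def write_edge_csv(pairs, handle):
    """Write an edge list, keeping the triangulated flag as its own column."""
    writer = csv.writer(handle)
    writer.writerow(["Source", "Target", "Weight", "Triangulated"])
    for p in pairs:
        writer.writerow([p.source, p.target, p.pair_cm, p.triangulated])


# Hypothetical pairs and thresholds.
pairs = [
    MatchPair("A", "B", 7.5, True, 100.0, 80.0),
    MatchPair("A", "C", 15.0, False, 100.0, 30.0),
    MatchPair("B", "C", 25.0, False, 80.0, 30.0),
]
kept_pairs = filter_pairs(pairs, min_cm_triangulated=7.0, min_cm_untriangulated=20.0)
buffer = io.StringIO()
write_edge_csv(kept_pairs, buffer)
print(buffer.getvalue())
```

Keeping the triangulated flag in the data, rather than throwing away the untriangulated rows, leaves the choice of threshold with the person doing the research.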

Triangulated segments are the icing, not the cake.

I hope that as more products and data extraction capabilities are developed some of these ideas will be incorporated. You can help by giving developers a push along these lines when you provide feedback about their products.