Blog post

Tuesday, June 14, 2016

DIY Ancestry DNA circles

Ancestry didn’t give me any DNA circles, so I made my own. If you want to join me in the DNA circle loop, then you will need AncestryDNA results and:

Use the DNAGedcom client to download your Ancestry matches and in-common-with (ICW) results as spreadsheets. You will need to click “Gather Matches” and “Gather ICW”. It’s the most convenient way to get the shared match information from Ancestry.

NodeXL is where the magic happens. It’s an Excel tool for social network analysis. I used NodeXL because it’s in Excel which I’m familiar with and it has all the facilities I need in the free version. I don’t know anything about social network analysis, and I didn’t need to in order to get the result I wanted. Follow the instructions on the website linked above to get started. It takes a little fiddling to get used to it, but in the familiar Excel interface it’s not as intimidating as it might at first seem.

Now the fun begins!

When you create a file using the template, you will see an extra ribbon, and an area for your charts to display. Those extra features won’t be there when you open Excel as normal, only when you open a spreadsheet from the template.

You will see several tabs. The most important for our purposes are “Vertices” and “Edges”. Think of “Vertices” as people, and “Edges” as relationships between people. The list of Match IDs goes into “Vertices”, and the paired Match IDs in the ICW file go into “Edges”. As it’s Excel, you can cut and paste data into the sheets. I pasted twice on each sheet – the first time with just the match ID numbers in the first column (or two columns for Edges), then the rest of the columns into the “add your own columns here” section.
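If you think better in code than in spreadsheets, here is a minimal Python sketch of the same people-and-relationships idea. It is not how NodeXL works internally, and the column names (match_id, icw_id, shared_cm) and the sample rows are made up for illustration; the real DNAGedcom files have their own headers.

```python
import csv
import io

# Made-up sample data. The real DNAGedcom files use different
# headers and have many more columns.
MATCHES_CSV = """match_id,name,shared_cm
A1,Alice,120.5
B2,Bob,45.0
C3,Carol,33.2
"""

ICW_CSV = """match_id,icw_id
A1,B2
B2,C3
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


# Vertices: one entry per person, keyed by match ID.
vertices = {row["match_id"]: row for row in read_rows(MATCHES_CSV)}

# Edges: one pair of match IDs per shared-match relationship.
edges = [(row["match_id"], row["icw_id"]) for row in read_rows(ICW_CSV)]

print(sorted(vertices))
print(edges)
```

The match ID plays the same role here as the first column on each sheet: it is the only thing the graph needs, and everything else is extra information about the person.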

Click “Refresh Graph” to see a graph of your information. When you first drop match information in you will probably get a big mess of dots and crossing lines. There are options to fix that.

With a bit of fiddling, I came up with this:

image 

Look! I’ve got circles!

Each dot represents a person, each line a DNA relationship between two people. When trying to interpret the information, remember that Ancestry has a cut-off – it won’t show shared matches unless at least one of the people is a fourth cousin or closer to you. At least, that’s how I think it works. I’m not sure if they also have to be fourth cousins or closer to each other to show up. If you can enlighten me on exactly how it works, I’d be grateful.

The point is to remember that because of the cut-off there are likely to be other relationships between the dots that you can’t see. I assume that’s what’s happening with the fan shaped ‘circles’. I had 35 fourth cousins or closer at the time of making this chart and no circles or “New Ancestor Discoveries”.

To get distinct clusters I first used the “Group by cluster…” option on the toolbar.

image

The groups might still be mixed up at this stage. To separate the groups from each other, I clicked the little arrow dropdown to the right of “Circle” (above) and under “Layout options” I chose “Lay out each of the graph’s groups in its own box”.

image
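To give a feel for what grouping does, here is a rough Python sketch that splits people into groups where everyone is linked, directly or through others. This is a much simpler stand-in than NodeXL's own clustering options, which can also split up a single tangled blob; the names and links below are invented.

```python
from collections import defaultdict


def connected_groups(vertices, edges):
    """Group people who are linked to each other, directly or indirectly."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = set()
    groups = []
    for start in vertices:
        if start in seen:
            continue
        # Walk outwards from this person to collect the whole group.
        group = []
        stack = [start]
        seen.add(start)
        while stack:
            person = stack.pop()
            group.append(person)
            for other in neighbours[person]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        groups.append(sorted(group))
    return groups


# Invented example: two linked groups and one person with no shared matches.
people = ["A", "B", "C", "D", "E", "F"]
links = [("A", "B"), ("B", "C"), ("D", "E")]
groups = connected_groups(people, links)
print(groups)
```

Each list in the result corresponds loosely to one box in the graph area once the groups have been laid out separately.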

For the layout I chose “Circle”. Because I wanted DNA circles. You could make a DNA spiral or a sine wave or a grid or a random layout or … but circles work nicely and they help with the circle-envy. This option is available both on the main NodeXL ribbon, and in the settings at the top of the graph area.
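A circle layout is simple enough to sketch in a few lines of Python: spread the dots evenly around the rim. This is my own illustration of the idea, not NodeXL's code, and the radius and centre values are arbitrary.

```python
import math


def circle_layout(names, radius=1.0, centre=(0.0, 0.0)):
    """Place each name at an evenly spaced point around a circle."""
    n = len(names)
    positions = {}
    for i, name in enumerate(names):
        angle = 2 * math.pi * i / n
        x = centre[0] + radius * math.cos(angle)
        y = centre[1] + radius * math.sin(angle)
        positions[name] = (x, y)
    return positions


positions = circle_layout(["A", "B", "C", "D"])
for name, (x, y) in positions.items():
    print(name, round(x, 3), round(y, 3))
```

Swapping the angle formula for something else is all it would take to get a spiral or a sine wave instead.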

“Autofill columns” on the main ribbon lets you easily move information from your own columns into the columns that control the graph’s appearance. There are a lot of options to play with – size and colour of dots, thickness of lines all have potential. I set the size of each dot to the number of Shared cM with me. You can also label the dots using information on the sheet. The obvious label to use is the person’s name.
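Turning a column of numbers into dot sizes amounts to stretching one range onto another. Here is a minimal Python sketch, assuming a straight linear scaling; whether NodeXL scales exactly this way is an assumption on my part, and the size limits and shared cM values are made up.

```python
def scale_sizes(shared_cm, min_size=2.0, max_size=20.0):
    """Map shared cM values linearly onto a range of dot sizes."""
    low = min(shared_cm.values())
    high = max(shared_cm.values())
    if high == low:
        # Everyone shares the same amount, so use one middling size.
        middle = (min_size + max_size) / 2
        return {name: middle for name in shared_cm}
    span = high - low
    return {
        name: min_size + (cm - low) / span * (max_size - min_size)
        for name, cm in shared_cm.items()
    }


# Invented shared cM values for three matches.
sizes = scale_sizes({"Alice": 120.0, "Bob": 45.0, "Carol": 20.0})
print(sizes)
```

The closest match gets the biggest dot and the most distant the smallest, with everyone else in between.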

You need to refresh the graph by clicking “Show graph” when data changes on a worksheet. If you’re only changing display options, you can save the recalculation time by clicking “Lay Out Again”.

There’s a lot of fun to be had just playing with the options. I’ve also tried this with my FTDNA results. For those, I had a much busier chart. Different clustering algorithms had different effects, and the dynamic filter came in useful to clear away matches who sat in distracting “pile up regions” which could be seen as a dense collection of interlinked spots.
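One way to picture clearing away a pile-up is to count how many lines each dot has and hide the ones with too many. The Python sketch below does that; filtering on the number of connections is my assumption for the example, as is the threshold, and the dynamic filter in NodeXL can work on other columns too.

```python
from collections import Counter


def drop_busy_matches(edges, max_links):
    """Remove people with more than max_links connections, and their edges."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1

    busy = {person for person, count in degree.items() if count > max_links}
    kept = [(a, b) for a, b in edges if a not in busy and b not in busy]
    return kept, busy


# Invented example where "A" is linked to nearly everyone.
links = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]
kept, busy = drop_busy_matches(links, max_links=2)
print(busy)
print(kept)
```

With the busiest dot out of the way, the remaining links are much easier to read.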

In my next post I’ll show you how I used my DIY Ancestry DNA circles to identify a new research lead.

11 comments:

  1. Interesting blog. Maybe we should follow each other's blogs.
    thestephensherwoodletters.blogspot.com

  2. Shelley, you've excelled (oops bad pun!) yourself. I'm not unfamiliar with Excel but this sounds a tad overwhelming. However I'll save the post and reflect on it further.

  3. Shelley I just wanted to thank you so much for this. I have created my own circles now using this method and it is so interesting!

    Replies
    1. Thanks Aillin, I'm so pleased you found it useful and could build your own circles! I'm still having a lot of fun playing with NodeXL.

  4. Hi Shelley, I am still using NodeXL to create circles of my DNA matches and want to thank you again for this post as it has been so useful for my DNA research. Have you shared it on the DNAGedcom group on Facebook or the DNA for Genealogy Aus & NZ Facebook group? I think they would find it very useful too :) Thanks again.

    Replies
    1. That's fantastic! And so good of you to write back again, I really appreciate it. Also, what great timing... I'm planning a series of posts that take the method further, so your message is great encouragement :-) I was going to start writing them after this weekend, which is a bit busy. I'd love to compare notes or have a guinea pig to try some things out, if you're interested? I can't see a way to contact you privately - could you maybe message me through my Facebook page? https://www.facebook.com/TwigsOfYore/

  5. Does the DNAGedcom client download into Chromebook Google spreadsheets? Can hardly wait to try this method.

    Replies
    1. Hi Magda, I'm sorry to say that NodeXL only works in Excel in Windows.

  6. Putting two technical suggestions up front for people who made the same mistakes I did:
    1. Installed Windows version of NodeXL, loaded Excel, couldn't find template anywhere. Win 10 64 bit and others reported issues. Turns out you have to find the template by name in the Start menu and run it there. Comes up just fine.
    2. Built first spreadsheet to use. Pasted selected DNAGEDCOM match into Vertices tab, ICW data for about 5 people into the Edges tab. Worked great. Copied more data into place in the same spreadsheet. Didn't work at all! Totally counter-intuitive. Maybe there's a way to get this iterative copying to work, but I gave up after about 45 minutes. Realized Excel might support a much better approach anyway, and it does. Put your entire match data into the vertices and 100% of your ICW rows into edges. Don't try to pre-select ICW rows to copy. Put them all in there. Then you can Filter on one of the ICW columns, graph, select more rows, graph again, select fewer rows, graph again, etc. This way, you're never trying to paste new data after your initial population of the tabs.
    3. For convenience of selecting ICW rows, I put the ICW name into the Label column and the ICW admin name into the first "add your columns here" column on the right. I filter on the ICW admin column most of the time to pick up all the kits that one person administers. But you can filter on any column you want.

    Replies
    1. Have you seen my more recent series with detailed instructions about how to import and work with the data? I think you'll find it easier than copying and pasting the data in!
      https://twigsofyore.blogspot.com.au/2017/07/visualising-ancestry-dna-matchesindex.html

  7. Thanks for the link, I'll work my way through the posts. I'm getting hundreds of Ancestry matches added each week so I opted for the simplest exploration so far (having only spent a couple of evenings on this). I'm needing to add various DNA-based cousins to my genealogy database to help interpret clustering (for example, one cluster all comes through the same pioneers in Perry Co, Pa while another has the right surname for the expected DNA match but through a different PA county - do they intersect in any known way?). Once done, groups closer to me may point to where this DNA entered my own cluster of cousins. The next thing I'll read in the blogs relates to removing clutter from the graphs to see useful structure. I've already realized I should start small and then add in people as seems appropriate. / Tom
