Blog post

Saturday, July 1, 2017

Visualising Ancestry DNA matches–Part 1–Getting ready

This is the first post of a series that will demonstrate, step-by-step, how to use a free Excel-based network analysis tool to explore and get more from your Ancestry DNA matches.

You can find an index to this series here.

I first posted on this subject a year ago (under the title DIY Ancestry DNA Circles). Since then, I have experimented and refined the methods I’m using. I think my process is ready for prime time, and that a lot of genetic genealogists would get a lot of value from it. The tool is NodeXL Basic – a free template created by the ‘Social Media Research Foundation’ – which gives Excel the ability to create network charts. It has made a huge difference to how I look at my Ancestry DNA matches.

What you will learn

  • how to visualise your Ancestry DNA shared matches, 
  • find DNA circles for yourself and among you matches
  • keep track of little clues that you discover along the way
  • update with new information easily

The people I have in mind while writing these posts have a reasonable basic knowledge of how to get around on their computer (open a file, find and click on a menu item), but no special skills in Excel. You don’t need to be able to write formulas (but if you can you’ll be able to supercharge your analysis).

Example

An example of the type of visualisation it’s possible to create from match lists is below. These are very quick, just to provide a general idea of where this exercise is headed. There’s a lot more you can do.

Each dot represents a person on my match list – I could have included name labels but have left them them off for online display. I’ve also added some additional information to some of my matches, and I have retained those labels. Two of my matches in the green cluster descend from John Allsop and Ellen Fearn. Having looked at some of the trees of people in my red cluster, I suspect they may be connected to my ‘Mack’ family ancestors who lived in Belfast.

Charts showing dots connected by a network of lines

If I decide to focus on the Mack family, for example, I can see right away a group of close matches whose trees I should examine. I will know which ‘private’ or ‘no tree’ people I could contact in relation to that branch, or who I could ask to upload results to GEDmatch to compare DNA directly.

The dots are sized according to the amount of DNA I share with them. Larger dots are closer relatives. Which brings me to another point. I can see relationships here that I can’t see by navigating through Ancestry shared match pages! On Ancestry, the shared match page for each match only shows people who match and who are (approximately) fourth cousins or closer to you. That means that on the match page for one of your thousands or distant relatives you may see one of your fourth cousins. You won’t see that distant relative on the shared match list for the fourth cousin.

Take a look at the green cluster above. There is a smaller, ‘distant relative’ dot at the bottom of the main grouping who is connected to three of the larger, fourth or closer cousin dots. I don’t see that distantly related person on the my fourth cousin shared match pages! But maybe that person has something in their tree that is the key to working out exactly where those others fit in. I also wouldn’t easily see from navigating ancestry pages, or even with any ease the in common with lists, the big clue that they may be related to my ancestors John Allsop and Ellen Fearn.

Once I get through loading data and navigating the interface, I’ll provide some more detailed examples of use.

What you will need

  • Excel 2007 (or more recent) installed on a Windows machine.
    Knowledge of Excel is not required. If you are able to follow the “what you need” steps, you should be able to follow the rest. I will make suggestions for more advanced users of Excel who want to get even more from their. Sorry Mac users, but the tool I’m going to suggest is Windows only.
  • AncestryDNA match and in-common-with lists
    Obviously you will need to have done an AncestryDNA test.
    To obtain these lists, use the DNAGedcom client [PDF link with instructions]. There is a very modest cost involved – $5 US for one month of access. It’s the most convenient way to get the shared match information from Ancestry and well worth the money. Follow the directions, and download your match and in-common-with (ICW) lists.
  • NodeXL Basic Excel Template
    Follow the directions on the website to install the Basic (no cost) template.
    Take care, as there are links to both a Pro and a Basic version. Its the free Basic version you need – make sure the download link you click has the word “Basic” in it. You should not be prompted for a licence number or payment when you install.

In my next post, I’ll show you how to load your information into the template and generate your first graph.

65 comments:

  1. Still confused!!!

    ReplyDelete
  2. Ok, I've got it installed and working. Awaiting your next step/s...

    ReplyDelete
    Replies
    1. I've since uninstalled the template as it was causing issues. On startup it keeps asking for the Pro licence. I ignored that and then an error message popped up that Excel had stopped working and shut the program down. Using Excel 2016 (via Office 365 sub).

      Delete
    2. I have the same version of Excel as you and haven't had any issue, I'm sorry you haven't had the same experience.

      Did you download the Basic or the Pro version? It's the basic version you need. The file you downloaded should have been called NodeXLBasicExcelTemplate2014Setup.exe. A screen pops up for 20 seconds when you first open a template in the basic version but nothing that prompts for a licence.

      Delete
    3. I rechecked and found I had downloaded the Pro version in error. Now working as expected, thanks.

      Delete
  3. can't.. apparently it's not a trusted something or rather ... :(

    ReplyDelete
    Replies
    1. The installation page ( http://www.smrfoundation.org/nodexl/installation/ ) gives some instructions about adding an address to trusted sites if you get an error. If you are prepared to add it to the trusted sites list that should probably resolve the problem.

      Delete
    2. That is only for the pro version, which you do not need. The basic version works without any special setup or permissions. Here is the direct link: http://nodexl.codeplex.com/downloads/get/806203

      Delete
  4. Thanks Shelley - coming late to the party. Have set the afternoon to play (if my payment to DNA Gedcom gets processed by then).

    ReplyDelete
  5. Downloading the three files from Ancestry via DNAGedcom client took 7 1/2 hours. Guess I'll do the next step tomorrow.

    ReplyDelete
    Replies
    1. That first download can be slow.

      Delete
    2. subsequent downloads are relatively quick except for the icw files.

      Delete
  6. Replies
    1. You can find the mac version here:
      http://www.dnagedcom.com/app/DNAGedcom.app.zip. It supports Ancestry, but not yet 23andMe or FTDNA.

      Delete
    2. Rob great! Thanks for that tip for MAC users! I have Excel on my Mac

      Delete
    3. This just took me to the usual DNAGedcom connection client page. I'm wondering if there is a mac version of the Excel Template, ie to use on Numbers?

      Delete
    4. I'm having this same problem trying to use the Basic Excel template but owning a Mac. Did you ever find a solution?

      Delete
    5. NodeXL is Windows only, unfortunately. I've heard that it works on a Mac under Parallels. Alternatively there are a number free programs specifically for network graphs that do run on a Mac. Gephi seems to be popular. It can do most of the things I cover in these posts. Some it does better, some with more data wrangling before loading, but some not at all. I hope you find a solution!

      Delete
    6. I'd love to be able to try this, another Mac user though :(. I will say the Matches and ICWs downloaded from Ancestry lickety-split with DNAGedcom. Now I'm off in search of a tool, will take a look at Gephi.

      Delete
  7. I'm struggling to understand all of this but I have a basic question, because I don't know enough to ask anything else!

    What is the difference between the output from this product and AncestryDNA Helper
    offered by Jeff Snavely?

    What fields appear in the resultant spreadsheet?

    ReplyDelete
    Replies
    1. I haven't used the AncestryDNA Helper myself. My understanding is that only DNAGedcom extracts an in-common-with file, which is the file that shows you which of your matches are related to each other.

      Delete
  8. does it work with Home and Student 2007 Excel

    ReplyDelete
    Replies
    1. I don't know. I've seen someone post elsewhere that they had difficulties with it using Excel 2007.

      Delete
  9. Here's a question: I have an older dual platform macbook pro with both Ios and Windows installed on it. I rarely use its Windows platform but want to try to make this work. Can I do the in-common lists on my newer macbook using the DNAGedcom client [PDF link with instructions, and then transfer the results to the windows platform on the older computer once they're finished?

    ReplyDelete
    Replies
    1. It shouldn't make any difference which platform you use to obtain the csv files. So if NodeXL itself works for you - yes!

      Delete
    2. Thanks, Shelley. I downloaded two sets of information yesterday to .csv files, one to gather matches and one to gather trees. Am I correct that now I gather ICW info?

      Delete
    3. Yes, you'll need the in-common-with info.

      Delete
  10. Is it possible to create an in-common-with file manually? I'd rather not go into details, but for me, the DNAGedcom subscription is an impossibility at this time.

    ReplyDelete
    Replies
    1. It doesn't matter how the file was created - if it has pairs of IDs it will work. It would be an extremely time-consuming process though. You can get the match ID numbers from the URL of the match page on Ancestry.

      Delete
  11. Shelley, I dont have a windows computer, so am thinking about buying one. Would 6gb memory be enough to do this work? Thanks!

    ReplyDelete
    Replies
    1. Hi Mary, I really couldn't say. It will depend on how big your file is.

      Delete
  12. I did this on another computer and it worked as for getting the graph. I had to do 4th cousins. I can't get them to go in groups and I can not click on dots to show anything. I can only click on the name and it will show up in the graph. When it shows connections in red I can not click on those dots. Everything in the graph is froze. I have done this over and over....

    ReplyDelete
    Replies
    1. Do you have a lot of matches? It might be that your file is too large for your PC to manage. I will be describing a clean up step that might help in my next post.

      Delete
    2. I don't have that many 4th. I did mine yesterday and I could click on dots and the names would appear. I did my aunts today with only 4th and my graph keeps trying to move horizontal on top of sheet and when I click on a dot the title bar "document actions" turns orange and nothing happens. And I still can't get it to group in separate boxes.

      Delete
    3. I got the clusters separated a bit with diff colored dots but still can not click on dots to give me the names.

      Delete
  13. Does it have to be Excel? Why wouldn't Open Office software work just as well?

    ReplyDelete
    Replies
    1. Because this process depends on using an add-on program (NodeXL) that was designed for Microsoft Excel.

      Delete
  14. Hi Shelly and thank you very much for this. It's working well for my Ancestry results and I'm already building my own circles and seeing patterns I haven't seen before.

    My question is around FTDNA matches and where I can get my icw file from. I can download my complete list of matches but the icw one eludes me. I see from an earlier blog post of yours that you've been able to do it.

    Cheers
    Laurie

    ReplyDelete
    Replies
    1. Hi Loz, the DNAGedcom client will also download FTDNA information.

      Delete
    2. Hi Shelley
      I've got the files, are there any instructions on which columns to set as Edge and Vertices please?

      Delete
  15. Thanks Shelly I'm trying that now

    ReplyDelete
  16. Do you know if there's a work-around for someone using Excel 2016? I get an error when trying to install NodeXL.

    ReplyDelete
    Replies
    1. It's supposed to work with Excel 2016. Try running it as admin? And check the information on the NodeXL home page about adding it to trusted sites, or any other help that page can offer. I hope it will work for you. Good luck!

      Delete
    2. I'm using it in Excel 2016 without issue.

      Delete
  17. Is there any way to get this to work using Excel on an iMac?

    ReplyDelete
    Replies
    1. Hi Sarah, sorry I missed this comment earlier. It won't work on the Mac version of Excel as I understand, but some people have said that they have been able to use it with Excel running under Parallels.

      Delete
  18. I administer my sister and 2 of my children's DNA tests at Ancestry. Do I download the DNAGedcom files for all of them or just mine to XLNode? Thank you sooo much for sharing this system! I am really looking forward to exploring it. :)

    ReplyDelete
  19. Shelley, should this nodexl work on excell 2013? I downloaded the basic program, but can't find the template anywhere. Any suggestions?

    Gene

    ReplyDelete
    Replies
    1. Hi Gene, it should work. See my reply to another comment:

      http://twigsofyore.blogspot.com.au/2017/07/visualising-ancestry-dna-matchesindex.html?showComment=1501805817748#c561069917028592215

      Delete
    2. Thanks, yes I found it no problem now, but a couple of questions.
      1. When I downloaded the ancestry data I downloaded everything. Took days. Can I just delete lines from the csv below 10 cm ( or higher) to cut down the number of points. First time through it was just a blob.

      2. I am not an excel illiterate, but is there a clean way to clear the spreadsheet out and start over?

      Delete
    3. Deleting the small match records won't help, as they'll still be in the ICW file and show up as edges. You can use the dynamic filter option to temporarily hide them, or 'skip' them to hide them completely. Also try using different clustering algorithms. The Wakita-Tsurumi clustering algorithm usually break it up the most. Have a look at the part 5 and 6 posts for more options that may help.

      Delete
  20. Is there any reason to also process the tree file? --Ken Waters

    ReplyDelete
  21. I have just accessed this page for the first time and wanted to say you are amazing. I have a very good working knowledge of Excel and to be able to combine my love for genealogy, DNA, computers AND Excel is a dream come true. Thank you so much for making this available.

    ReplyDelete
    Replies
    1. You're very welcome :-) I'll just mention that the tool isn't mine, only these instructions. I'm very pleased that so many people have been able to take the step into something they didn't realise they could do.

      Delete
  22. I have just completed the construction of the Excel Spreadsheets and have imported my ancestry.com DNA data - my advice is to do it very slowly and follow the instructions Step by Step - one step at a time - if you rush it you will not succeed. When you succeed you will be rewarded with an outstanding representation of your DNA data. Thank you so much Shelly - very impressive. Well done.

    ReplyDelete
  23. Thank you for this information! I am stuck gathering the matches on dnagedcom. I downloaded the client and I begin the process of gathering the matches and after about 2% have loaded, I receive the following message: "object reference not set to an instance of an object". I have basic computer skills and I have no idea why I am getting this message. My match gathering then ends after I click OK.

    ReplyDelete
    Replies
    1. Hi Anonymous, I'd suggest contacting DNAGedcom about that. One way to do so is through the user group on Facebook: https://www.facebook.com/groups/DNAGedcomUserGroup

      Delete
    2. I have this problem too. I don't have a Facebook account. How can I resolve the problem

      Delete
    3. This comment has been removed by a blog administrator.

      Delete
  24. Once I have downloaded everything per above, do I need to keep my monthly subscription to DNAGedcom?

    ReplyDelete
  25. Once I have downloaded everything per above, do I need to continue my monthly subscription to DNAGedcom?

    ReplyDelete