Blog post

Sunday, July 2, 2017

Visualising Ancestry DNA matches-Part 2-Loading files the first time

This post is part two of a series.

In the first post I showed you the files and software you can use to visualise Ancestry DNA matches. Today we’re going to load  the match and in-common-with files you downloaded using the DNAGedcom client, and have our first look at a graph.

Getting set up and loading the files is not difficult, but there are a lot of steps to follow and details to note. I’ve suggested some check points at which you should save your progress.  If you miss a detail you won’t have to start from the beginning. Just reopen the file and resume from the last save point.

The first time you try this, give yourself at least forty five minutes at a time when you feel ready to concentrate.

It’s much quicker when you get used to it. The entire process described below takes me less than five minutes.

Thank you to my husband and to Aillin O’Brien who tested these instructions and provided invaluable feedback.

An index to this series of posts is available here.

Preparation: Set up a Spreadsheet for Additional Input

There’s one final step of preparation before we load the data.

If we load the information as it is, the chart will show connections between the test taker (I’ll call that person “you”) and every one of their matches. All you will see is a mass of dots. It’s also likely to tie up your computer while it thinks about all those lines it has to draw. I’ve made this mistake more than once... The graph appears eventually, but it isn’t very useful.

This will also occur when a direct line relative who can be expected to share a substantial number of matches with you from across your tree has also tested - a sibling, parent, child or grandchild.

The most efficient way I have found to get around this is to load in a small additional spreadsheet. We can also use the new spreadsheet to add other information, but we’ll get to that later.

  • Open Excel and create a new workbook:  File – New – Blank Workbook.

    image

In the first row of your new workbook, type in the following column headings:

  • matchID
  • Match name
  • Match admin
  • Vertex 2
  • Name
  • Vertex Type
  • Edge Type
  • Visibility
  • Comment

image

We need to enter one line of information in this table for each person with a large number of matches.

Under matchid you will enter the test ID number that was assigned to that person’s test by Ancestry. For your own test you can the URL when you go to your DNA page on Ancestry. It will look something like this – you need the part marked red:

https://www.ancestry.com.au/dna/insights/AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEEEE

  • Copy and paste your test ID number under both matchid and Vertex 2. 
  • Type the word “Skip” under Visibility.             <<< Don’t miss this step!

That’s all that’s strictly necessary for it to work, but a little extra information will remind you what this line is for later:

  • Put your name in three columns: Match name, Match admin and Name.
  • comment is for reminders to yourself. Put whatever you like there. I added a short explanatory note about what this line does.

image

Repeat the process for each close relative (sibling, parent, child or grandchild) who has DNA tested.

This time, use your own id as the matchid, and your close relative’s id as Vertex 2.

If you’re not the administrator for the test, you can find their match ID on your DNA match page. The red part is your ID, and the blue part is your relative.

https://www.ancestry.com.au/dna/tests/AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEEEE
/match/VVVVVVVV-WWWW-XXXX-YYYY-ZZZZZZZZZZZZ

Alternatively you can look up the match ID numbers in the matches files.

Save the file somewhere you will find it again. I’ll call this file the Additional Input file from now on.

OK, we’re all set. Now we create a NodeXL file and load the information in.

Create a NodeXL Workbook

The method required to use the template may vary with your version of Excel.
I have an Office 365 subscription. I select File New PERSONALNodeXLGraph

If this method doesn’t work for you, try searching for “NodeXL” in the Windows “Search programs and files” field or equivalent on your system, and double click the “NodeXL Template” file returned.

image

A new spreadsheet will open. It may check for template updates as it opens, and you will need to wait for 20 seconds for the splash screen to close. Once it does, your screen should look something like this:

image

A new ribbon called NodeXL Basic has appeared. It won’t be there when you open a normal file, it will only appear when you are using the special files created with the template. Click on the new ribbon and take a look. This is where most of the action will take place.

image

Load your files

Open your match list (m_yourname.csv), in-common-with list (icw_yourname.csv), and additional input file in the normal way then return to the new NodeXL sheet. With each load we have to tell the template which fields in the file hold information about people (‘vertices’) and relationships (‘edges’). I’ll tell you what to put in at each stage.

Important step before loading the first time:

On the NodeXL ribbon, click the Import button. It’s at the far left hand side.

image

  • Choose Import Options… (the bottom item on the menu).
  • Clear the box next to “Clear the NodeXL workbook before data is imported”. There should be no tick in the box.

Why: Otherwise, no matter how many files you load only the last one loaded will be in the spreadsheet. We need all three files – matches, in-common-with, and additional input – to go in.

Save the file now that you’ve adjusted the setting.

In-common-with file

  • Select Import from the NodeXL ribbon and choose From Open Workbook…image
  • Select the in-common-with file (icw_yourname.csv) in the top box of the dialog that appears
  • Leave “Columns have headers” checked
  • Tick the boxes for “match id” and “icwid” under Is Edge Column. No other boxes should be ticked
  • Confirm that “matchid” is selected in the  “Which edge column is Vertex 1?” dropdown
  • Under “Which edge column is Vertex 2?” choose “icwid”
  • Click import (say OK to the message about text wrapping if you get it.)

image

Check the import

Navigate to the Edges worksheet using the tabs at the bottom left of the screen.

image

You should see lots of ID numbers in the “Vertex 1” and “Vertex 2” columns.

The ID numbers will overlap each other and the other columns. That doesn’t matter. You should not see any other data entered in the sheet at this stage.

If the import looks correct, save your progress and carry on.

Matches file

  • Open the import dialog again:
    image
  • Click on the matches file in the top box (m_yourname.csv)
  • Set “testid” and “matchid” as edge columns (tick boxes). No other boxes in that column should be ticked.
  • Under “Is Vertex 2 Property Column” check the boxes for:
    “name”,
    “admin”
    “SharedCM”,
    “note” and
    “matchurl”   (you’ll need to scroll all the way to the bottom to find this).
  • Choose “testid” in the dropdown box under “Which edge column is Vertex 1”
  • Choose “matchid” for “Which edge column is Vertex 2”.
  • Click import (say OK to the message about text wrapping if you get it.)

image

Check the import

Navigate to the Vertices worksheet.

image

Check that the first column “Vertex 1” contains ID numbers. You should not see any names or other information in that column.

Scroll right and check that the “names”, “admin”, “shared CM”, “note” and “matchurl” columns have appeared and have information in them. You may need to scroll right to see them.

image

If this looks right, save you progress and continue.

Additional Input file

Important: Don’t forget to load this file!

  • Open the import dialog again
    image
  • Click on the Additional Input file in the top box
  • Set “matchid” and “Vertex 2” as edge columns (no other boxes should be ticked)
  • Scroll down and tick “Name” and “Visibility” under Is Vertex 2 Property Column    <<< Don’t miss this step
  • Choose “matchid” in the dropdown box under “Which edge column is Vertex 1”
  • Choose “Vertex 2” in the dropdown box under “Which edge column is Vertex 2”
  • Click import (and clear the text wrapping message if it appears)

image

Check the import

Move to the Vertices worksheet again and find the row with your own name (Control-F will bring up a search box).

Confirm that the word “Skip” is in the Visibility column.

image

If this looks right, save again and move to the next step.

Now make a chart!

Find the toolbar in the chart area, and click Show Graph.

Troubleshooting: If it takes more than a few seconds, there was probably a problem with the additional input file. When Excel has finished drawing thousands of dots, go back and check the Additional Input file instructions again and make sure you’ve loaded it. If it was missing and you’ve fixed it, click Refresh Graph which will have appeared where you found Show Graph before.

image

Don’t be disappointed if your chart looks like the image below (and it probably will). It will get better with a few tweaks.

image

Identify groups

On the NodeXL ribbon, find the Groups button. Click it and select “Group by connected component” from the option list.

This option works well for me, but if you have a lot of very interconnected matches you might find that one of the choices under “Group by Cluster” works better.

image

Refresh Graph will add the newly created grouping information to the chart. Your chart will become more colourful but no more tidy. Just one more step, and you’ll have something more interesting to look at.

Separate groups in the chart

You can access layout options from both the NodeXL Ribbon, and the chart area. Click on the dropdown in either location and select Layout Options from the dropdown menu.

image

Change the “Layout Style” option to “Lay out each of the graph’s groups in its own box” and click OK.

image

Click Lay Out Again to apply that change to the chart. You didn’t need to refresh the graph a second time because the data itself didn’t change, only the layout instructions.

image

This is how mine looks now. Each dot represents a person I have a DNA match with. Each line represents a relationship between two of my matches.

Move back to the Vertices worksheet and see what happens when you click on the chart dots.

image

That’s plenty for today. Don’t forget to save your file!

Go and get yourself a nice cup of tea (or whatever beverage you prefer) knowing that if you’ve made it this far you can definitely manage the next steps I have in mind.

49 comments:

  1. Phew.Thanks for your explicit instructions.I now have 23 boxes with lines and coloured dots. Can't make heads or tails of them - presume the next step is to do something with the gathered trees. Bring it on.

    ReplyDelete
    Replies
    1. Well done! The next post will be shorter but should help you make more sense of what you're seeing. :-)

      Delete
  2. Shelley... I have gone back three times now and started over because when I type "skip" in the visibility box, it doesn't show later after I imported the three worksheets. Should I be putting the word in quotation marks or otherwise delineating the word? Meg S.

    ReplyDelete
    Replies
    1. Hi Meg,
      Just the word Skip should be enough with no quotation or other marks. Start with a capital letter, just to be safe.
      Can you find the line that relates to you on the Vertices worksheet? Does your name appear there? I'm trying to work out if the line went in at all, or if it's just that one field with a problem. If you can find the row with your name then you can type skip in manually until you work out where the problem is.

      Delete
  3. Yes I did find the line and I did try entering 'Skip" manually. I still ended up with a huge grey blob with a black blob on top of it. . I'm going to try it now one more time before I head to bed. Meg

    ReplyDelete
  4. Okay, I just tried importing everything again, after I double checked that I had entered skip under visability. When I looked next to my name and under visability on the Vertices page, the word did not appear. I am typing it in manually and going to try to proceed to see if sorting into groups makes any difference. Incidentally, I am using Windows 10 with Microsoft Office 2013, but haven't see any other problems. Meg

    ReplyDelete
    Replies
    1. Hi Meg, possible mea culpa here. I didn't think of everything! Do you have close family members - parents, children or siblings - who have tested? I've updated the instructions about the Additional Input file so that we will 'Skip' them as well. Hopefully it will resolve your problem. Fingers crossed!

      Delete
    2. Um...yeah...LOL.. I am an adoptee who was looking for her birth father's family (I found them in May). I am the administrator for both of my oldest children, my 1st cousin, my first cousin once removed and my 2nd cousin. So, I should "Skip" both of my children, but should leave my cousins?
      Meg

      Delete
    3. Definitely 'Skip' both your children. For the rest click on their 'dot' to see how much of the grey they account for. I expect you'll need to skip your 1C1R and probably your first cousin as well. It would be better not to skip your 2nd cousin if you can avoid it. I'm intending to offer further suggestions in my next post!

      Delete
    4. Hi Shelley,

      Thinking that perhaps my Microsoft Office accounted for some problems, I upgraded to Office 2016. Didn't help. I still cannot get the "Skip" to work, even if I manually enter it. So then, I tried removed all of my son's, daughter's, first cousin, first cousin once removed and second cousin's matches to see if that would clear anything out. It did clear a little but not much. I am wondering if the fact that my maternal matches far outweigh my paternal matches by about 80% - 20%, and the maternal side is heavily endogamous. If you have any other ideas I'll give it a try.

      Delete
  5. What can I say... wow! I discovered some additional links with this step :-) Looking forward to checking out the next step.

    ReplyDelete
  6. So I got through the icw import no apparent problems. Did the matches import as directed, but after loading, this error message pops up:
    ---------------------------
    NodeXL
    ---------------------------
    An unexpected problem occurred. If it occurs again, please copy the details to the clipboard by typing Ctrl-C, then post the details to http://www.codeplex.com/NodeXL/Thread/List.aspx.



    Details:



    [COMException]: Exception from HRESULT: 0x800A03EC



    at System.RuntimeType.ForwardCallToInvokeMember(String memberName, BindingFlags flags, Object target, Int32[] aWrapperTypes, MessageData& msgData)

    at Microsoft.Office.Interop.Excel.Range.get_Offset(Object RowOffset, Object ColumnOffset)

    at Smrf.AppLib.ExcelUtil.OffsetRange(Range& range, Int32 rowOffset, Int32 columnOffset)

    at Smrf.NodeXL.ExcelTemplate.GraphImporter.ImportEdges(IGraph oSourceGraph, String[] asEdgeAttributes, ListObject oEdgeTable, Range oVertex1NameColumnData, Range oVertex2NameColumnData, Boolean bAppendToTable)

    at Smrf.NodeXL.ExcelTemplate.GraphImporter.ImportGraph(IGraph sourceGraph, String[] edgeAttributes, String[] vertexAttributes, Boolean clearTablesFirst, Workbook destinationNodeXLWorkbook)

    at Smrf.NodeXL.ExcelTemplate.ThisWorkbook.ImportGraph(IGraph oGraph, String[] oEdgeAttributes, String[] oVertexAttributes, String sGraphSource, String sGraphTerm, String sImportDescription, String sSuggestedTitle, String sSuggestedFileNameNoExtension)
    ---------------------------
    OK
    ---------------------------
    HELP!

    ReplyDelete
    Replies
    1. I can only suggest closing the file, reopening, and trying again. Follow the directions at the top of the error to get help direct from NodeXL if it doesn't resolve.

      Delete
  7. Thank you for the step by step instructions. It seems I still goofed up. My graph shows up with the many dots of varied sizes but there is a gray background behind them. When I do the step for making the boxes, nothing happens. Any thoughts where I went wrong?. Denise

    ReplyDelete
    Replies
    1. Did you follow the step to skip yourself and close relatives, and check that the 'skip' instruction went in?

      Delete
    2. Shelley, the only people tested so far are my mom and myself. I put us both in the "additional" excel sheet and skip in visibility.

      Delete
    3. Have you checked that the word 'Skip' appears in the NodeXL worksheet (sorry to ask the obvious, but that's where I usually go wrong!). If you match with anyone else who has a lot of links to your matches you will see something like that. How close are your next closest matches? Is there endogamy in your tree?

      Delete
    4. I have checked "skip" over and over. I will start over tomorrow as I have realized a 2nd cousin tested so I will add her into the Additional Info Sheet. There is very little endogamy on my mother's side that I am aware of. Thanks for your help.

      Delete
  8. Shelly, Does the Chart have to share workspace with the spreadsheet or how do I "pop it out" to make it easier to see?

    ReplyDelete
    Replies
    1. You can pull it out by grabbing and dragging the "Document Actions" title bar. It's a bit tricky to place as it will 'snap' itself to the edges. I've found it best to let go when you're over the worksheet itself, then resize into place (rather than moving the box).

      Delete
  9. I am loading my matches and ICW files but only did 4th cousins...should I have done them all?

    ReplyDelete
    Replies
    1. It's up to you. I loaded distant relatives in mine, but I don't have that many close matches. If you have a large volume of shared matches you might decide to keep it manageable by limiting to fourth cousins.

      Delete
  10. Awesome - I love it however I can't see all my vertices. Some known ancestors that I should be able to place in a group are not showing up at all. Shouldn't all my vertices show up in the graph. Also I seem to have overwhelmed it with my husband's mother's Mennonite ancestry. Too many lines!

    ReplyDelete
    Replies
    1. Only matches with shared match links will show. If all your vertices were in the chart you would have the dreaded blob! Your matches with known ancestors should show if 1. you haven't 'skipped' them and 2. they also match other people in your file.

      Delete
    2. Thanks Shelley - That makes sense now. I haven't skipped them so they just must not have matches to anyone else, yet...

      Delete
  11. I get kind of one big cluster of matches but not the boxes. I see boxes over to the right side of the graph but can't enlarge. I have re-tried a couple of times but get the same result. Not sure if I am missing something or misunderstanding the output.

    ReplyDelete
    Replies
    1. Hi Steve, have you 'skipped' close relatives? If so, for now I'd suggesting trying different clustering algorithms (NodeXL Ribbon, Groups button, "Group by cluster"). I'm currently preparing a post with more ways to work with a moderately complex graph.

      Delete
  12. I did try Group by Cluster but it didn't change anything. This test is using my wife's results. I did set the file up to SKIP her but the closest relatives she has are a couple of 2nd cousins. Should I skip them too?

    ReplyDelete
    Replies
    1. Hi Steve, I'll be posting in the near future about things you can try.

      Delete
  13. OK, I thought the skip instruction only applied to immediate family. Should I have included first and second cousins in the "Additional Input" sheet? I got much denser clusters than yours.

    ReplyDelete
    Replies
    1. Hi Ted, I started using this tool to try to get the most from my rather sparse matches! I'll be posting in the near future about working with more complicated data - but skipping the closer cousins is an option. Try with just the first cousins at first.

      Delete
  14. Thank you for sharing this very useful tool. The insights the graphic charts added to my family DNA "brick walls" were amazing.

    ReplyDelete
    Replies
    1. You're very welcome, and I love hearing that these posts have helped!

      Delete
  15. Shelley, I am determined to get this right. I feel like it is the additional info spreadsheet that is the trouble. Please tell me if I am correct on these points. I am only concerned about my mother's family; so under match ID (col. A) I insert her ancestry number. Col. B. is her name, Col. C is my name since in am the administrator. Col. D. is her ancestry ID, Col E. is her name and Col. H is Skip. As I read the instructions, her ID goes in Col. A all the way down for anyone I am skipping, is that correct? Then I add myself (daughter) in B,C,D,E,H. Is it correct to have over 17,000 lines of info for one graph. Thank you so much. Denise Stanton

    ReplyDelete
  16. Shelley,
    I have written out the instructions in a step-by-step method, just to make certain I am doing this correctly. I grabbed my 4th cousin's file, since she doesn't administer anyone and doesn't have any super close cousins. I even limited her DNAGedcom download from Ancestry to 4th cousins or closer because we do have endogamy in our family tree. The "Skip" command still will not take and doesn't even seem to be working when I enter it manually. I consistently get a splotchy gray background with black dots in the foreground. I am now using Office 2016 on Windows 10 with the free NodeXL download. Do you have a contact perhaps at the Social Media Research Foundation? I really, REALLY want this to work because I am a visual learner. And I don't mind spending $$ for good software, but at this point I am getting very frustrated. Thank you. Meg Staton

    ReplyDelete
    Replies
    1. Hi Meg, are you quite sure the 'skip' line isn't taking? If you have endogamy in your tree you WILL get a grey background from all the interlinked people. Splotchy grey actually sounds promising - there might be some room to clean thing up. Can you hang in there until my next post? It might help.

      Delete
  17. Well, since nothing else seems to be working ... and I definately am not going forward ... I guess I'll wait on your next post ;-)

    Meg

    ReplyDelete
  18. OK. I am having trouble getting the NodeXL template to open. I have confirmed it is downloaded, and when I click on NodeBasicXLTemplateSetup2014 it says I have already set it up. However, my view of Excel has not changed. I am in Microsoft Excel Home and Student 2010 version of Excel. And when I ask how to import files it just says open the file with Excel. My other problem is that I do not have any close matches to open a icw file with.

    ReplyDelete
    Replies
    1. Hi Anonymous,

      You won't see any difference when you open Excel normally, only when you create a new worksheet using the template. If you open Excel and click File - New then hopefully you will see it as an option in the list of templates.

      If not, do you know how to search your PC? The file you need to find is NodeXLGraph.xltx. On my system it's sitting in Documents/Custom Office Templates. If you find it, double click to create a new worksheet based on the template (don't open it using File Open from Excel or you will be editing the template itself).

      I hope it works for you. If it does, could you please comment here again so that other people will know what steps worked?

      If you can't find it, please message me using the "CONTACT ME" link at the top of my blog as there's one more thing I can think of but I'll need to email you.

      Delete
  19. I'm not sure I have done this first phase correctly. My graph looks nothing like yours. It looks like a giant pincushion purple dots with solid grey in the background. I tried group by component but only got 1 group in visual properties. When I used group by cluster 23. After refreshing the graph I now have the pincusion, but with lots of multi coloured pins in a straight line down the right hand side of the screen. All my imports look OK. I am skipping me and my mothers test. Skip is there for both of us. Does the names have to be exactly the same as ancestry? I put my ancestry admin name in, not the same name as I have listed for myself. Cauld that be causing it?

    ReplyDelete
    Replies
    1. Hi Genemonkey, everything you're doing sounds OK. My clusters separate out that way because I have a relatively small number of matches, and geographically dispersed ancestors. Try the Wakita-Tsurumi clustering algorithm if you haven't already, it breaks charts more than the first one. Assuming that moves things in the right direction, some of my later posts talk about making sense of a busy chart. You're not the only one with this experience!

      Delete
  20. I completed the tasks in part 2, and generated a graph, but my graph has so many people in it that I can only see a great blob of ink with some connecting lines out at the perimeter. I tried grouping, separating groups, but I cannot see the results. How can I see the graph on its own, not in the right hand panel next to the spreadsheet? And how can I see the separate groups that you show in your second image? Thank you.

    ReplyDelete
    Replies
    1. Check that you 'skipped' yourself and any very close relatives.
      Try a different clustering algorithm - Groups > Group by cluster... > Wakita-Tsurumi breaks charts up the most.
      Check that you've followed the other steps suggested.
      It's quite possible your chart will still be complicated afterwards (my graph is very sparse!). If you see different colours for different groups you're on the right track. The two posts about busy charts should help.
      Good luck!

      Delete
  21. I just wanted to say, I'm with you so far and thank you very much indeed for providing such detailed step by step instructions.

    ReplyDelete
  22. I have run into a glitch, and I am hoping you can give me some ideas about what might be wrong. This is the fourth one of these spreadsheets I have done and the first time I have seen anything like it. It appears that NodeXL is not recognizing groups. Although this man has 12,000 matches on the Group Vertices page, NodeXL has found only 4 groups, with all but 10 of the matches in Group 1. Of course the single group remains a blob in the middle of the graph, regardless of layout settings. I have restarted the sheet from scratch 3 times, and found a minor error in my first Additional Input sheet, but the end result remains one huge and three tiny groups. Any ideas?

    ReplyDelete
    Replies
    1. What type of grouping are you using? If you're using "Group by connected component" try "Group by Cluster". The "Wakita-Tsurumi" option usually breaks clusters up the most.

      Delete
  23. Hi- thanks for the quick answer. On the problem spreadsheet, I tried both component and cluster. My cheat sheet says use component, but I am not sure I have done that consistently on all three of the successful ones. I have not tried "Wakita-Tsurumi" at all, so I will give that a try.

    I posted my question on this blog installment because the problem pretty much has to be either in the input files or in my setup of the various options. I have tried to find differences in all of those factors compared to successful spreadsheets, but no luck so far. Nevertheless, it has to be here somewhere!

    Thanks again

    ReplyDelete
  24. YES! Your suggestion worked! It looks like we are back in business.

    Thanks!

    ReplyDelete