Twigs of Yore: Visualising Ancestry DNA matches-Part 2-Loading files the first time

Sunday, July 2, 2017

This post is part two of a series.

In the first post I showed you the files and software you can use to visualise Ancestry DNA matches. Today we’re going to load the match and in-common-with files you downloaded using the DNAGedcom client, and have our first look at a graph.

Getting set up and loading the files is not difficult, but there are a lot of steps to follow and details to note. I’ve suggested some checkpoints at which you should save your progress. If you miss a detail, you won’t have to start from the beginning: just reopen the file and resume from the last save point.

The first time you try this, give yourself at least forty-five minutes at a time when you feel ready to concentrate.

It’s much quicker when you get used to it. The entire process described below takes me less than five minutes.

Thank you to my husband and to Aillin O’Brien who tested these instructions and provided invaluable feedback.

An index to this series of posts is available here.

Preparation: Set up a Spreadsheet for Additional Input

There’s one final step of preparation before we load the data.

If we load the information as it is, the chart will show connections between the test taker (I’ll call that person “you”) and every one of their matches. All you will see is a mass of dots. It’s also likely to tie up your computer while it thinks about all those lines it has to draw. I’ve made this mistake more than once... The graph appears eventually, but it isn’t very useful.

The same thing happens when a direct-line relative (a sibling, parent, child or grandchild) has also tested, because that person can be expected to share a substantial number of matches with you from across your tree.

The most efficient way I have found to get around this is to load in a small additional spreadsheet. We can also use the new spreadsheet to add other information, but we’ll get to that later.

Open Excel and create a new workbook: File – New – Blank Workbook.

In the first row of your new workbook, type in the following column headings:

matchID
Match name
Match admin
Vertex 2
Name
Vertex Type
Edge Type
Visibility
Comment

We need to enter one line of information in this table for each person with a large number of matches.

Under matchid you will enter the test ID number that was assigned to that person’s test by Ancestry. For your own test you can find it in the URL when you go to your DNA page on Ancestry. It will look something like this – you need the part marked red, which is the long code at the end:

https://www.ancestry.com.au/dna/insights/AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEEEE

Copy and paste your test ID number under both matchid and Vertex 2.
Type the word “Skip” under Visibility. <<< Don’t miss this step!

That’s all that’s strictly necessary for it to work, but a little extra information will remind you what this line is for later:

Put your name in three columns: Match name, Match admin and Name.
Comment is for reminders to yourself. Put whatever you like there. I added a short explanatory note about what this line does.

Repeat the process for each close relative (sibling, parent, child or grandchild) who has DNA tested.

This time, use your own ID as the matchid, and your close relative’s ID as Vertex 2.

If you’re not the administrator for the test, you can find their match ID in the address of your DNA match page for that relative. The first code, after “tests/” (red), is your ID, and the second code, after “match/” (blue), is your relative’s ID.

https://www.ancestry.com.au/dna/tests/AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEEEE/match/VVVVVVVV-WWWW-XXXX-YYYY-ZZZZZZZZZZZZ

Alternatively you can look up the match ID numbers in the matches files.
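If you would rather not pick the IDs out of the address by eye, the step can be sketched in a few lines of Python. This is only an illustration: it assumes the match page address has the "/tests/…/match/…" shape shown in the example above, and the function name is made up for this sketch.

```python
import re

# Placeholder IDs in the same shape as the example address in the post.
YOUR_ID = "AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEEEE"
RELATIVE_ID = "VVVVVVVV-WWWW-XXXX-YYYY-ZZZZZZZZZZZZ"


def ids_from_match_url(url):
    """Return (your_id, relative_id) from a match page address, or None.

    Assumes the address contains /tests/<your id>/match/<relative id>.
    """
    found = re.search(r"/tests/([^/]+)/match/([^/?#]+)", url)
    return found.groups() if found else None


url = f"https://www.ancestry.com.au/dna/tests/{YOUR_ID}/match/{RELATIVE_ID}"
print(ids_from_match_url(url))
```

The first value is what goes under matchid and the second is what goes under Vertex 2 for that relative.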

Save the file somewhere you will find it again. I’ll call this file the Additional Input file from now on.
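For anyone who prefers to generate the Additional Input table rather than type it, here is a minimal sketch in Python. It writes the nine column headings listed above and one "Skip" row for the test taker plus one per close relative. The function names, the sample IDs and the choice of CSV output are assumptions made for this illustration; the post itself builds the table by hand in an Excel workbook.

```python
import csv
import io

# Column headings exactly as listed in the post.
HEADINGS = ["matchID", "Match name", "Match admin", "Vertex 2", "Name",
            "Vertex Type", "Edge Type", "Visibility", "Comment"]


def additional_input_rows(own_id, own_name, relatives):
    """Build the 'Skip' rows; relatives is a list of (test_id, name) pairs."""
    rows = [{"matchID": own_id, "Match name": own_name,
             "Match admin": own_name, "Vertex 2": own_id, "Name": own_name,
             "Visibility": "Skip", "Comment": "Hides the test taker"}]
    for relative_id, relative_name in relatives:
        # Own ID as matchID, the relative's ID as Vertex 2.
        rows.append({"matchID": own_id, "Match name": relative_name,
                     "Vertex 2": relative_id, "Name": relative_name,
                     "Visibility": "Skip", "Comment": "Hides a close relative"})
    return rows


def write_additional_input(handle, own_id, own_name, relatives):
    """Write the table to an open text file; unfilled columns stay blank."""
    writer = csv.DictWriter(handle, fieldnames=HEADINGS, restval="")
    writer.writeheader()
    writer.writerows(additional_input_rows(own_id, own_name, relatives))


buffer = io.StringIO()
write_additional_input(buffer, "ME-ID", "Me", [("MUM-ID", "Mum")])
print(buffer.getvalue())
```

To write a real file, pass a handle opened with `open("additional_input.csv", "w", newline="")` instead of the in-memory buffer, then open the result in Excel.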

OK, we’re all set. Now we create a NodeXL file and load the information in.

Create a NodeXL Workbook

The method required to use the template may vary with your version of Excel.
I have an Office 365 subscription. I select File – New – PERSONAL – NodeXLGraph

If this method doesn’t work for you, try searching for “NodeXL” in the Windows “Search programs and files” field or equivalent on your system, and double click the “NodeXL Template” file returned.

A new spreadsheet will open. It may check for template updates as it opens, and you will need to wait for 20 seconds for the splash screen to close. Once it does, your screen should look something like this:

A new ribbon called NodeXL Basic has appeared. It won’t be there when you open a normal file; it only appears when you are using the special files created with the template. Click on the new ribbon and take a look. This is where most of the action will take place.

Load your files

Open your match list (m_yourname.csv), in-common-with list (icw_yourname.csv), and additional input file in the normal way then return to the new NodeXL sheet. With each load we have to tell the template which fields in the file hold information about people (‘vertices’) and relationships (‘edges’). I’ll tell you what to put in at each stage.

Important step before loading the first time:

On the NodeXL ribbon, click the Import button. It’s at the far left hand side.

Choose Import Options… (the bottom item on the menu).
Clear the box next to “Clear the NodeXL workbook before data is imported”. There should be no tick in the box.

Why: Otherwise, no matter how many files you load, only the last one loaded will be in the spreadsheet. We need all three files – matches, in-common-with, and additional input – to go in.

Save the file now that you’ve adjusted the setting.

In-common-with file

Select Import from the NodeXL ribbon and choose From Open Workbook…

Select the in-common-with file (icw_yourname.csv) in the top box of the dialog that appears
Leave “Columns have headers” checked
Tick the boxes for “matchid” and “icwid” under Is Edge Column. No other boxes should be ticked
Confirm that “matchid” is selected in the “Which edge column is Vertex 1?” dropdown
Under “Which edge column is Vertex 2?” choose “icwid”
Click import (say OK to the message about text wrapping if you get it.)

Check the import

Navigate to the Edges worksheet using the tabs at the bottom left of the screen.

You should see lots of ID numbers in the “Vertex 1” and “Vertex 2” columns.

The ID numbers will overlap each other and the other columns. That doesn’t matter. You should not see any other data entered in the sheet at this stage.

If the import looks correct, save your progress and carry on.
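To make it clearer what this import is doing, here is a rough Python sketch of the same idea: each row of the in-common-with file becomes one edge, with “matchid” as Vertex 1 and “icwid” as Vertex 2. The sample rows and the extra column are invented for illustration, and this is not how NodeXL itself is implemented.

```python
import csv
import io

# Invented sample in the shape of an in-common-with file; a real file
# has more columns, which are simply ignored here.
SAMPLE_ICW = """matchid,other,icwid
m1,x,m2
m1,x,m3
m4,x,m5
"""


def read_edges(handle, vertex1_column, vertex2_column):
    """Turn each row into a (Vertex 1, Vertex 2) pair."""
    return [(row[vertex1_column], row[vertex2_column])
            for row in csv.DictReader(handle)]


edges = read_edges(io.StringIO(SAMPLE_ICW), "matchid", "icwid")
print(edges)
```

Every pair printed corresponds to one row you would expect to see on the Edges worksheet.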

Matches file

Open the import dialog again:

Click on the matches file in the top box (m_yourname.csv)
Set “testid” and “matchid” as edge columns (tick boxes). No other boxes in that column should be ticked.
Under “Is Vertex 2 Property Column” check the boxes for:
“name”,
“admin”,
“SharedCM”,
“note” and
“matchurl” (you’ll need to scroll all the way to the bottom to find this).
Choose “testid” in the dropdown box under “Which edge column is Vertex 1”
Choose “matchid” for “Which edge column is Vertex 2”.
Click import (say OK to the message about text wrapping if you get it.)

Check the import

Navigate to the Vertices worksheet.

Check that the first column “Vertex 1” contains ID numbers. You should not see any names or other information in that column.

Scroll right and check that the “name”, “admin”, “SharedCM”, “note” and “matchurl” columns have appeared and have information in them.

If this looks right, save your progress and continue.
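The matches import can be pictured the same way. In this sketch each row gives one edge from “testid” to “matchid”, and the ticked property columns are stored against the Vertex 2 ID. The sample row is invented, and the function is a simplified illustration rather than NodeXL's own logic.

```python
import csv
import io

# The columns ticked under "Is Vertex 2 Property Column" in the post.
PROPERTY_COLUMNS = ["name", "admin", "SharedCM", "note", "matchurl"]

# Invented sample in the shape of a matches file.
SAMPLE_MATCHES = """testid,matchid,name,admin,SharedCM,note,matchurl
me,m1,Alice,Alice,52.3,,https://example.com/m1
me,m2,Bob,Carol,18.0,maternal?,https://example.com/m2
"""


def read_matches(handle):
    """Return (edges, properties keyed by the Vertex 2 ID)."""
    edges, properties = [], {}
    for row in csv.DictReader(handle):
        edges.append((row["testid"], row["matchid"]))
        properties[row["matchid"]] = {c: row[c] for c in PROPERTY_COLUMNS}
    return edges, properties


edges, properties = read_matches(io.StringIO(SAMPLE_MATCHES))
print(properties["m2"]["admin"])
print("me" in properties)
```

In this sketch the test taker only ever appears as Vertex 1, so no properties are stored for that ID from the matches file.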

Additional Input file

Important: Don’t forget to load this file!

Open the import dialog again

Click on the Additional Input file in the top box
Set “matchid” and “Vertex 2” as edge columns (no other boxes should be ticked)
Scroll down and tick “Name” and “Visibility” under Is Vertex 2 Property Column <<< Don’t miss this step
Choose “matchid” in the dropdown box under “Which edge column is Vertex 1”
Choose “Vertex 2” in the dropdown box under “Which edge column is Vertex 2”
Click import (and clear the text wrapping message if it appears)

Check the import

Move to the Vertices worksheet again and find the row with your own name (Control-F will bring up a search box).

Confirm that the word “Skip” is in the Visibility column.

If this looks right, save again and move to the next step.
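If the word “Skip” is missing at this point, it can help to check the Additional Input table itself before re-importing. The sketch below assumes you have saved a CSV copy of that table with the headings from this post; the function name and sample rows are invented, and it checks for the exact word “Skip” as typed in the instructions.

```python
import csv
import io

# Invented sample: the second row is missing the word Skip.
SAMPLE = """matchID,Match name,Match admin,Vertex 2,Name,Vertex Type,Edge Type,Visibility,Comment
ME-ID,Me,Me,ME-ID,Me,,,Skip,Hide the test taker
ME-ID,Mum,Me,MUM-ID,Mum,,,,Forgot to type Skip here
"""


def rows_not_skipped(handle):
    """Return the 'Vertex 2' IDs of rows whose Visibility is not 'Skip'."""
    problems = []
    for row in csv.DictReader(handle):
        visibility = (row.get("Visibility") or "").strip()
        if visibility != "Skip":
            problems.append(row.get("Vertex 2") or "")
    return problems


print(rows_not_skipped(io.StringIO(SAMPLE)))
```

An empty list means every row in the table carries the word “Skip”.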

Now make a chart!

Find the toolbar in the chart area, and click Show Graph.

Troubleshooting: If it takes more than a few seconds, there was probably a problem with the additional input file. When Excel has finished drawing thousands of dots, go back and check the Additional Input file instructions again and make sure you’ve loaded it. If it was missing and you’ve fixed it, click Refresh Graph, which will have appeared where you found Show Graph before.

Don’t be disappointed if your chart looks like the image below (and it probably will). It will get better with a few tweaks.

Identify groups

On the NodeXL ribbon, find the Groups button. Click it and select “Group by connected component” from the option list.

This option works well for me, but if you have a lot of very interconnected matches you might find that one of the choices under “Group by Cluster” works better.

Refresh Graph will add the newly created grouping information to the chart. Your chart will become more colourful but no more tidy. Just one more step, and you’ll have something more interesting to look at.
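The idea behind grouping by connected component, and the reason the “Skip” line matters so much, can be shown with a small Python sketch. Two people are in the same group if you can get from one to the other by following lines. The edge list is invented and the function is a plain textbook search, not NodeXL's own code; treating skipped people as if their lines did not exist is an assumption about how “Skip” behaves.

```python
from collections import defaultdict


def connected_components(edges, skip=()):
    """Group vertices that are linked by edges, ignoring skipped vertices."""
    skipped = set(skip)
    neighbours = defaultdict(set)
    for a, b in edges:
        if a in skipped or b in skipped:
            continue  # drop every line that touches a skipped person
        neighbours[a].add(b)
        neighbours[b].add(a)

    groups, seen = [], set()
    for start in neighbours:
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], set()
        while stack:  # walk outwards from the starting person
            vertex = stack.pop()
            group.add(vertex)
            for other in neighbours[vertex]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        groups.append(group)
    return sorted(groups, key=len, reverse=True)


# Invented example: "me" matches everyone; m1-m2 and m3-m4 match each other.
edges = [("me", "m1"), ("me", "m2"), ("me", "m3"), ("me", "m4"),
         ("m1", "m2"), ("m3", "m4")]
print(len(connected_components(edges)))
print(len(connected_components(edges, skip=["me"])))
```

With the test taker left in, everyone joins up into a single group. With the test taker skipped, the two family clusters separate.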

Separate groups in the chart

You can access layout options from both the NodeXL ribbon and the chart area. Click on the dropdown in either location and select Layout Options from the menu.

Change the “Layout Style” option to “Lay out each of the graph’s groups in its own box” and click OK.

Click Lay Out Again to apply that change to the chart. You didn’t need to refresh the graph a second time because the data itself didn’t change, only the layout instructions.
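As a rough picture of what “each group in its own box” means, the sketch below divides a chart area into a simple grid with one cell per group. This is only an illustration of the idea under simplified assumptions (equal-sized boxes, a made-up function name); NodeXL decides its own box sizes and positions. It also shows why no refresh was needed: only the number of groups is used, not the match data.

```python
import math


def group_boxes(number_of_groups, width=1.0, height=1.0):
    """Return one (left, top, box_width, box_height) tuple per group."""
    if number_of_groups == 0:
        return []
    columns = math.ceil(math.sqrt(number_of_groups))
    rows = math.ceil(number_of_groups / columns)
    box_width, box_height = width / columns, height / rows
    boxes = []
    for index in range(number_of_groups):
        row, column = divmod(index, columns)
        boxes.append((column * box_width, row * box_height,
                      box_width, box_height))
    return boxes


for box in group_boxes(4):
    print(box)
```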

This is how mine looks now. Each dot represents a person I have a DNA match with. Each line represents a relationship between two of my matches.

Move back to the Vertices worksheet and see what happens when you click on the chart dots.

That’s plenty for today. Don’t forget to save your file!

Go and get yourself a nice cup of tea (or whatever beverage you prefer) knowing that if you’ve made it this far you can definitely manage the next steps I have in mind.

82 comments:

GeniAus, July 6, 2017 at 11:47 PM
Phew. Thanks for your explicit instructions. I now have 23 boxes with lines and coloured dots. Can't make heads or tails of them - presume the next step is to do something with the gathered trees. Bring it on.

Anonymous, July 8, 2017 at 9:59 AM
Shelley... I have gone back three times now and started over because when I type "skip" in the visibility box, it doesn't show later after I imported the three worksheets. Should I be putting the word in quotation marks or otherwise delineating the word? Meg S.

Anonymous, July 8, 2017 at 11:48 AM
Yes I did find the line and I did try entering "Skip" manually. I still ended up with a huge grey blob with a black blob on top of it. I'm going to try it now one more time before I head to bed. Meg

Anonymous, July 8, 2017 at 11:59 AM
Okay, I just tried importing everything again, after I double checked that I had entered skip under visibility. When I looked next to my name and under visibility on the Vertices page, the word did not appear. I am typing it in manually and going to try to proceed to see if sorting into groups makes any difference. Incidentally, I am using Windows 10 with Microsoft Office 2013, but haven't seen any other problems. Meg

Rene, July 8, 2017 at 5:25 PM
What can I say... wow! I discovered some additional links with this step :-) Looking forward to checking out the next step.

Unknown, July 9, 2017 at 4:50 AM
So I got through the icw import no apparent problems. Did the matches import as directed, but after loading, this error message pops up:
---------------------------
NodeXL
---------------------------
An unexpected problem occurred. If it occurs again, please copy the details to the clipboard by typing Ctrl-C, then post the details to http://www.codeplex.com/NodeXL/Thread/List.aspx.

Details:

[COMException]: Exception from HRESULT: 0x800A03EC

at System.RuntimeType.ForwardCallToInvokeMember(String memberName, BindingFlags flags, Object target, Int32[] aWrapperTypes, MessageData& msgData)

at Microsoft.Office.Interop.Excel.Range.get_Offset(Object RowOffset, Object ColumnOffset)

at Smrf.AppLib.ExcelUtil.OffsetRange(Range& range, Int32 rowOffset, Int32 columnOffset)

at Smrf.NodeXL.ExcelTemplate.GraphImporter.ImportEdges(IGraph oSourceGraph, String[] asEdgeAttributes, ListObject oEdgeTable, Range oVertex1NameColumnData, Range oVertex2NameColumnData, Boolean bAppendToTable)

at Smrf.NodeXL.ExcelTemplate.GraphImporter.ImportGraph(IGraph sourceGraph, String[] edgeAttributes, String[] vertexAttributes, Boolean clearTablesFirst, Workbook destinationNodeXLWorkbook)

at Smrf.NodeXL.ExcelTemplate.ThisWorkbook.ImportGraph(IGraph oGraph, String[] oEdgeAttributes, String[] oVertexAttributes, String sGraphSource, String sGraphTerm, String sImportDescription, String sSuggestedTitle, String sSuggestedFileNameNoExtension)
---------------------------
OK
---------------------------
HELP!

Denise, July 9, 2017 at 5:07 AM
Thank you for the step by step instructions. It seems I still goofed up. My graph shows up with the many dots of varied sizes but there is a gray background behind them. When I do the step for making the boxes, nothing happens. Any thoughts where I went wrong? Denise

Unknown, July 9, 2017 at 11:08 PM
Shelly, Does the Chart have to share workspace with the spreadsheet or how do I "pop it out" to make it easier to see?

Unknown, July 10, 2017 at 4:38 AM
I am loading my matches and ICW files but only did 4th cousins...should I have done them all?

SharonD214, July 10, 2017 at 10:56 AM
Awesome - I love it however I can't see all my vertices. Some known ancestors that I should be able to place in a group are not showing up at all. Shouldn't all my vertices show up in the graph. Also I seem to have overwhelmed it with my husband's mother's Mennonite ancestry. Too many lines!

Stonewall, July 11, 2017 at 9:38 AM
I get kind of one big cluster of matches but not the boxes. I see boxes over to the right side of the graph but can't enlarge. I have re-tried a couple of times but get the same result. Not sure if I am missing something or misunderstanding the output.

Stonewall, July 12, 2017 at 10:22 AM
I did try Group by Cluster but it didn't change anything. This test is using my wife's results. I did set the file up to SKIP her but the closest relatives she has are a couple of 2nd cousins. Should I skip them too?

Unknown, July 14, 2017 at 1:24 PM
OK, I thought the skip instruction only applied to immediate family. Should I have included first and second cousins in the "Additional Input" sheet? I got much denser clusters than yours.

Anonymous, July 15, 2017 at 4:08 AM
Thank you for sharing this very useful tool. The insights the graphic charts added to my family DNA "brick walls" were amazing.

Denise, July 18, 2017 at 9:23 AM
Shelley, I am determined to get this right. I feel like it is the additional info spreadsheet that is the trouble. Please tell me if I am correct on these points. I am only concerned about my mother's family; so under match ID (col. A) I insert her ancestry number. Col. B. is her name, Col. C is my name since I am the administrator. Col. D. is her ancestry ID, Col E. is her name and Col. H is Skip. As I read the instructions, her ID goes in Col. A all the way down for anyone I am skipping, is that correct? Then I add myself (daughter) in B,C,D,E,H. Is it correct to have over 17,000 lines of info for one graph? Thank you so much. Denise Stanton

Anonymous, July 19, 2017 at 11:13 AM
Shelley,
I have written out the instructions in a step-by-step method, just to make certain I am doing this correctly. I grabbed my 4th cousin's file, since she doesn't administer anyone and doesn't have any super close cousins. I even limited her DNAGedcom download from Ancestry to 4th cousins or closer because we do have endogamy in our family tree. The "Skip" command still will not take and doesn't even seem to be working when I enter it manually. I consistently get a splotchy gray background with black dots in the foreground. I am now using Office 2016 on Windows 10 with the free NodeXL download. Do you have a contact perhaps at the Social Media Research Foundation? I really, REALLY want this to work because I am a visual learner. And I don't mind spending $$ for good software, but at this point I am getting very frustrated. Thank you. Meg Staton

Anonymous, July 20, 2017 at 6:46 AM
Well, since nothing else seems to be working ... and I definitely am not going forward ... I guess I'll wait on your next post ;-)

Meg

Anonymous, August 2, 2017 at 3:33 AM
OK. I am having trouble getting the NodeXL template to open. I have confirmed it is downloaded, and when I click on NodeBasicXLTemplateSetup2014 it says I have already set it up. However, my view of Excel has not changed. I am in Microsoft Excel Home and Student 2010 version of Excel. And when I ask how to import files it just says open the file with Excel. My other problem is that I do not have any close matches to open a icw file with.

Genemonkey, August 13, 2017 at 3:35 PM
I'm not sure I have done this first phase correctly. My graph looks nothing like yours. It looks like a giant pincushion of purple dots with solid grey in the background. I tried group by component but only got 1 group in visual properties. When I used group by cluster I got 23. After refreshing the graph I now have the pincushion, but with lots of multi coloured pins in a straight line down the right hand side of the screen. All my imports look OK. I am skipping me and my mother's test. Skip is there for both of us. Do the names have to be exactly the same as ancestry? I put my ancestry admin name in, not the same name as I have listed for myself. Could that be causing it?

triovlaif, August 15, 2017 at 1:36 AM
I completed the tasks in part 2, and generated a graph, but my graph has so many people in it that I can only see a great blob of ink with some connecting lines out at the perimeter. I tried grouping, separating groups, but I cannot see the results. How can I see the graph on its own, not in the right hand panel next to the spreadsheet? And how can I see the separate groups that you show in your second image? Thank you.

Elaine, August 17, 2017 at 6:41 PM
I just wanted to say, I'm with you so far and thank you very much indeed for providing such detailed step by step instructions.

Unknown, August 20, 2017 at 1:12 PM
I have run into a glitch, and I am hoping you can give me some ideas about what might be wrong. This is the fourth one of these spreadsheets I have done and the first time I have seen anything like it. It appears that NodeXL is not recognizing groups. Although this man has 12,000 matches on the Group Vertices page, NodeXL has found only 4 groups, with all but 10 of the matches in Group 1. Of course the single group remains a blob in the middle of the graph, regardless of layout settings. I have restarted the sheet from scratch 3 times, and found a minor error in my first Additional Input sheet, but the end result remains one huge and three tiny groups. Any ideas?

Unknown, August 21, 2017 at 3:14 AM
Hi- thanks for the quick answer. On the problem spreadsheet, I tried both component and cluster. My cheat sheet says use component, but I am not sure I have done that consistently on all three of the successful ones. I have not tried "Wakita-Tsurumi" at all, so I will give that a try.

I posted my question on this blog installment because the problem pretty much has to be either in the input files or in my setup of the various options. I have tried to find differences in all of those factors compared to successful spreadsheets, but no luck so far. Nevertheless, it has to be here somewhere!

Thanks again

Unknown, August 21, 2017 at 11:15 AM
YES! Your suggestion worked! It looks like we are back in business.

Thanks!

Unknown, September 1, 2017 at 12:11 AM
I hope someone can help me. I am still trying to get my hands around the input data. As we get the data from the DNAGedcom client in a csv, it contains all the icws and matches. How do I filter out just the 4th cousin and closer data, or is it even necessary? I know I can delete the distant cousins in the m_ file, but is it even necessary, because I can't really delete anything so easily in the icw_ file? When I make a graph with only the 4th or closer, the program still references the names in the icw_ file and shows them in the graph, making the graph more complicated. Any hints are greatly appreciated.

Anonymous, October 22, 2017 at 12:46 PM
Hi Shelley, my graph is taking forever to draw. I've waited more than 10 minutes and it's still processing. I went back (twice) to your instruction about importing the "Additional File" and the result is still the same. Everything else looked just fine. Any ideas on what the problem may be? Thanks. --Ken Waters

Anonymous, October 23, 2017 at 4:59 AM
Thanks, Shelley. I finally was able to get a graph and finish unit #2. The test owner was marked as Skip. The other close relatives (two sons, two grandchildren) were not marked as Skip even though they were in the "Additional File" spreadsheet. So, I did go into the big spreadsheet and manually marked them as "Skip" as well. Perhaps it's because there were so many matches, more than 32,000.

Anonymous, October 26, 2017 at 7:57 AM
Hi Shelly, I am at the import stage for the icw file. When I click on import I get a message that says "If the columns in the other workbook have headers, then there must be at least two rows". I have clicked around and I can't find any help. Help! Thanks.

Anonymous, October 26, 2017 at 2:32 PM
I've closed out for the night, and will work with it again in the morning. My screen looks exactly like the sample when I am at the import step. When I click on import, there is a pause and then that message pops up. Open to playing with any ideas....just an aside, I LOVE this tool and appreciate all of the work you have put in to writing up such great directions with illustrations. Thank YOU. (Now I just need to get it to work!)

Unknown, November 12, 2017 at 9:58 PM
Hi Shelley, I wonder if you might have an idea on what I've done wrong. I've been through the process twice, reading everything carefully (or so I thought), and yet I get nothing at the end. When I hit the button to make the graph, it thinks for about one second and acts like it's done, but no graph shows up. When I hit "refresh" --- same thing. I can move freely around the spread, so it's obviously not hung in the middle of thought. I use Excel every day at work (although nothing this complicated), so I feel that I am fairly well versed in its basic use. Any help would be greatly appreciated! Thank you in advance!

Anonymous, December 2, 2017 at 4:03 AM
Hello, Thank you very much for your wonderful program. I am at step 2 and trying to load my Additional Worksheet. I receive this error: [OutOfMemoryException]: Exception of type 'System.OutOfMemoryException' was thrown.
Is there a workaround for this? My icw and match files loaded fine. I enjoy your step by step, detailed explanations! Thank you Lynn

SmithHunter1783, January 1, 2018 at 2:02 PM
Loading "Additional Input File.xlsx" and it erases everything from Edges and Vertices tables. What am I doing wrong?

Unknown, January 2, 2018 at 8:49 PM
I imported the three worksheets. Should I be putting the word in quotation marks or otherwise delineating the word?

Jane Bonny, February 23, 2018 at 9:23 AM
Yes, indeed, thank you for the explicit instructions. I managed to get to step 3 of Import. That is where I have a problem. I don't have an "additional input file." I do, however, have a file that begins with "a_..." I tried that but did not get the subsequent questions/boxes/steps. How do I get the "additional input>?" I tried Googling it and was referred back here.

dB, March 9, 2018 at 11:19 AM
Is the "Additional Input" file to be saved as .xls or .csv? Thanks!

Alexandra Dixon, July 9, 2018 at 2:25 AM
I just created a new nodexl graph for a dna test with which I am intimately familiar, the first one I ever "solved" (for an adopted friend). I have skipped him but nobody else. He has two maternal uncles ~1800 cm, I have not skipped them. When I create the graph and tell it to put each group into a separate box, I get ONE group (all dark blue). I have tried tweaking the upper level of the cm to show such that the uncles are excluded, and that doesn't appear to break the data up into more than one group (would it?). This is not endogamous data, quite the opposite. There's his father's side which has colonial roots mostly from England, his mother's side - maternal is 100% norwegian immigrants, paternal is 100% scottish immigrants, the only overlap between the Scottish and the Norwegian is his mother (who hasn't tested) and her two brothers (who have).

Yet somehow nodexl thinks this is one big group?

John MOTZI, August 16, 2018 at 1:22 AM
Instead of using the “Additional Input” file to skip the close matches I do the following:

When importing the matches file, I select “range” item in addition to the other recommended items (“name”, “admin”, etc.) for “Is Vertex 2 Property Column”. Range is the relationship bucket displayed in Ancestry.

Then I go to the Vertices tab of the workbook and set the Range column to display only the close matches (Parent/Child, Close relations, etc.) and also blanks. You can then change the display option for those lines that you wish to hide or skip. The line with the blank relationship is the test taker. While I am there I usually add SELF to the name field of the test taker to keep track of it.

Now reset the range column to display all and that’s it. No need for that third spreadsheet.

Kent J, September 21, 2018 at 11:27 AM
If I am only interested in the grouping data and not the plot, does it matter if I enter skip in my closest matches? I am thinking they could help me identify which part of my tree these groups are.

Yvette Hoitink, October 2, 2018 at 7:05 AM
This is seriously awesome. I had a large Excel-sheet of known associates of a person of interest, and their associates. The NodeXL template together with your instructions allowed me to visualize the whole network. Thank you so much!