Blog post

Sunday, July 2, 2017

Visualising Ancestry DNA matches-Part 2-Loading files the first time

This post is part two of a series.

In the first post I showed you the files and software you can use to visualise Ancestry DNA matches. Today we’re going to load  the match and in-common-with files you downloaded using the DNAGedcom client, and have our first look at a graph.

Getting set up and loading the files is not difficult, but there are a lot of steps to follow and details to note. I’ve suggested some check points at which you should save your progress.  If you miss a detail you won’t have to start from the beginning. Just reopen the file and resume from the last save point.

The first time you try this, give yourself at least forty five minutes at a time when you feel ready to concentrate.

It’s much quicker when you get used to it. The entire process described below takes me less than five minutes.

Thank you to my husband and to Aillin O’Brien who tested these instructions and provided invaluable feedback.

An index to this series of posts is available here.

Preparation: Set up a Spreadsheet for Additional Input

There’s one final step of preparation before we load the data.

If we load the information as it is, the chart will show connections between the test taker (I’ll call that person “you”) and every one of their matches. All you will see is a mass of dots. It’s also likely to tie up your computer while it thinks about all those lines it has to draw. I’ve made this mistake more than once... The graph appears eventually, but it isn’t very useful.

This will also occur when a direct line relative who can be expected to share a substantial number of matches with you from across your tree has also tested - a sibling, parent, child or grandchild.

The most efficient way I have found to get around this is to load in a small additional spreadsheet. We can also use the new spreadsheet to add other information, but we’ll get to that later.

  • Open Excel and create a new workbook:  File – New – Blank Workbook.

    image

In the first row of your new workbook, type in the following column headings:

  • matchID
  • Match name
  • Match admin
  • Vertex 2
  • Name
  • Vertex Type
  • Edge Type
  • Visibility
  • Comment

image

We need to enter one line of information in this table for each person with a large number of matches.

Under matchid you will enter the test ID number that was assigned to that person’s test by Ancestry. For your own test you can the URL when you go to your DNA page on Ancestry. It will look something like this – you need the part marked red:

https://www.ancestry.com.au/dna/insights/AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEEEE

  • Copy and paste your test ID number under both matchid and Vertex 2. 
  • Type the word “Skip” under Visibility.             <<< Don’t miss this step!

That’s all that’s strictly necessary for it to work, but a little extra information will remind you what this line is for later:

  • Put your name in three columns: Match name, Match admin and Name.
  • comment is for reminders to yourself. Put whatever you like there. I added a short explanatory note about what this line does.

image

Repeat the process for each close relative (sibling, parent, child or grandchild) who has DNA tested.

This time, use your own id as the matchid, and your close relative’s id as Vertex 2.

If you’re not the administrator for the test, you can find their match ID on your DNA match page. The red part is your ID, and the blue part is your relative.

https://www.ancestry.com.au/dna/tests/AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEEEE
/match/VVVVVVVV-WWWW-XXXX-YYYY-ZZZZZZZZZZZZ

Alternatively you can look up the match ID numbers in the matches files.

Save the file somewhere you will find it again. I’ll call this file the Additional Input file from now on.

OK, we’re all set. Now we create a NodeXL file and load the information in.

Create a NodeXL Workbook

The method required to use the template may vary with your version of Excel.
I have an Office 365 subscription. I select File New PERSONALNodeXLGraph

If this method doesn’t work for you, try searching for “NodeXL” in the Windows “Search programs and files” field or equivalent on your system, and double click the “NodeXL Template” file returned.

image

A new spreadsheet will open. It may check for template updates as it opens, and you will need to wait for 20 seconds for the splash screen to close. Once it does, your screen should look something like this:

image

A new ribbon called NodeXL Basic has appeared. It won’t be there when you open a normal file, it will only appear when you are using the special files created with the template. Click on the new ribbon and take a look. This is where most of the action will take place.

image

Load your files

Open your match list (m_yourname.csv), in-common-with list (icw_yourname.csv), and additional input file in the normal way then return to the new NodeXL sheet. With each load we have to tell the template which fields in the file hold information about people (‘vertices’) and relationships (‘edges’). I’ll tell you what to put in at each stage.

Important step before loading the first time:

On the NodeXL ribbon, click the Import button. It’s at the far left hand side.

image

  • Choose Import Options… (the bottom item on the menu).
  • Clear the box next to “Clear the NodeXL workbook before data is imported”. There should be no tick in the box.

Why: Otherwise, no matter how many files you load only the last one loaded will be in the spreadsheet. We need all three files – matches, in-common-with, and additional input – to go in.

Save the file now that you’ve adjusted the setting.

In-common-with file

  • Select Import from the NodeXL ribbon and choose From Open Workbook…image
  • Select the in-common-with file (icw_yourname.csv) in the top box of the dialog that appears
  • Leave “Columns have headers” checked
  • Tick the boxes for “match id” and “icwid” under Is Edge Column. No other boxes should be ticked
  • Confirm that “matchid” is selected in the  “Which edge column is Vertex 1?” dropdown
  • Under “Which edge column is Vertex 2?” choose “icwid”
  • Click import (say OK to the message about text wrapping if you get it.)

image

Check the import

Navigate to the Edges worksheet using the tabs at the bottom left of the screen.

image

You should see lots of ID numbers in the “Vertex 1” and “Vertex 2” columns.

The ID numbers will overlap each other and the other columns. That doesn’t matter. You should not see any other data entered in the sheet at this stage.

If the import looks correct, save your progress and carry on.

Matches file

  • Open the import dialog again:
    image
  • Click on the matches file in the top box (m_yourname.csv)
  • Set “testid” and “matchid” as edge columns (tick boxes). No other boxes in that column should be ticked.
  • Under “Is Vertex 2 Property Column” check the boxes for:
    “name”,
    “admin”
    “SharedCM”,
    “note” and
    “matchurl”   (you’ll need to scroll all the way to the bottom to find this).
  • Choose “testid” in the dropdown box under “Which edge column is Vertex 1”
  • Choose “matchid” for “Which edge column is Vertex 2”.
  • Click import (say OK to the message about text wrapping if you get it.)

image

Check the import

Navigate to the Vertices worksheet.

image

Check that the first column “Vertex 1” contains ID numbers. You should not see any names or other information in that column.

Scroll right and check that the “names”, “admin”, “shared CM”, “note” and “matchurl” columns have appeared and have information in them. You may need to scroll right to see them.

image

If this looks right, save you progress and continue.

Additional Input file

Important: Don’t forget to load this file!

  • Open the import dialog again
    image
  • Click on the Additional Input file in the top box
  • Set “matchid” and “Vertex 2” as edge columns (no other boxes should be ticked)
  • Scroll down and tick “Name” and “Visibility” under Is Vertex 2 Property Column    <<< Don’t miss this step
  • Choose “matchid” in the dropdown box under “Which edge column is Vertex 1”
  • Choose “Vertex 2” in the dropdown box under “Which edge column is Vertex 2”
  • Click import (and clear the text wrapping message if it appears)

image

Check the import

Move to the Vertices worksheet again and find the row with your own name (Control-F will bring up a search box).

Confirm that the word “Skip” is in the Visibility column.

image

If this looks right, save again and move to the next step.

Now make a chart!

Find the toolbar in the chart area, and click Show Graph.

Troubleshooting: If it takes more than a few seconds, there was probably a problem with the additional input file. When Excel has finished drawing thousands of dots, go back and check the Additional Input file instructions again and make sure you’ve loaded it. If it was missing and you’ve fixed it, click Refresh Graph which will have appeared where you found Show Graph before.

image

Don’t be disappointed if your chart looks like the image below (and it probably will). It will get better with a few tweaks.

image

Identify groups

On the NodeXL ribbon, find the Groups button. Click it and select “Group by connected component” from the option list.

This option works well for me, but if you have a lot of very interconnected matches you might find that one of the choices under “Group by Cluster” works better.

image

Refresh Graph will add the newly created grouping information to the chart. Your chart will become more colourful but no more tidy. Just one more step, and you’ll have something more interesting to look at.

Separate groups in the chart

You can access layout options from both the NodeXL Ribbon, and the chart area. Click on the dropdown in either location and select Layout Options from the dropdown menu.

image

Change the “Layout Style” option to “Lay out each of the graph’s groups in its own box” and click OK.

image

Click Lay Out Again to apply that change to the chart. You didn’t need to refresh the graph a second time because the data itself didn’t change, only the layout instructions.

image

This is how mine looks now. Each dot represents a person I have a DNA match with. Each line represents a relationship between two of my matches.

Move back to the Vertices worksheet and see what happens when you click on the chart dots.

image

That’s plenty for today. Don’t forget to save your file!

Go and get yourself a nice cup of tea (or whatever beverage you prefer) knowing that if you’ve made it this far you can definitely manage the next steps I have in mind.

82 comments:

  1. Phew.Thanks for your explicit instructions.I now have 23 boxes with lines and coloured dots. Can't make heads or tails of them - presume the next step is to do something with the gathered trees. Bring it on.

    ReplyDelete
    Replies
    1. Well done! The next post will be shorter but should help you make more sense of what you're seeing. :-)

      Delete
  2. Shelley... I have gone back three times now and started over because when I type "skip" in the visibility box, it doesn't show later after I imported the three worksheets. Should I be putting the word in quotation marks or otherwise delineating the word? Meg S.

    ReplyDelete
    Replies
    1. Hi Meg,
      Just the word Skip should be enough with no quotation or other marks. Start with a capital letter, just to be safe.
      Can you find the line that relates to you on the Vertices worksheet? Does your name appear there? I'm trying to work out if the line went in at all, or if it's just that one field with a problem. If you can find the row with your name then you can type skip in manually until you work out where the problem is.

      Delete
  3. Yes I did find the line and I did try entering 'Skip" manually. I still ended up with a huge grey blob with a black blob on top of it. . I'm going to try it now one more time before I head to bed. Meg

    ReplyDelete
  4. Okay, I just tried importing everything again, after I double checked that I had entered skip under visability. When I looked next to my name and under visability on the Vertices page, the word did not appear. I am typing it in manually and going to try to proceed to see if sorting into groups makes any difference. Incidentally, I am using Windows 10 with Microsoft Office 2013, but haven't see any other problems. Meg

    ReplyDelete
    Replies
    1. Hi Meg, possible mea culpa here. I didn't think of everything! Do you have close family members - parents, children or siblings - who have tested? I've updated the instructions about the Additional Input file so that we will 'Skip' them as well. Hopefully it will resolve your problem. Fingers crossed!

      Delete
    2. Um...yeah...LOL.. I am an adoptee who was looking for her birth father's family (I found them in May). I am the administrator for both of my oldest children, my 1st cousin, my first cousin once removed and my 2nd cousin. So, I should "Skip" both of my children, but should leave my cousins?
      Meg

      Delete
    3. Definitely 'Skip' both your children. For the rest click on their 'dot' to see how much of the grey they account for. I expect you'll need to skip your 1C1R and probably your first cousin as well. It would be better not to skip your 2nd cousin if you can avoid it. I'm intending to offer further suggestions in my next post!

      Delete
    4. Hi Shelley,

      Thinking that perhaps my Microsoft Office accounted for some problems, I upgraded to Office 2016. Didn't help. I still cannot get the "Skip" to work, even if I manually enter it. So then, I tried removed all of my son's, daughter's, first cousin, first cousin once removed and second cousin's matches to see if that would clear anything out. It did clear a little but not much. I am wondering if the fact that my maternal matches far outweigh my paternal matches by about 80% - 20%, and the maternal side is heavily endogamous. If you have any other ideas I'll give it a try.

      Delete
  5. What can I say... wow! I discovered some additional links with this step :-) Looking forward to checking out the next step.

    ReplyDelete
  6. So I got through the icw import no apparent problems. Did the matches import as directed, but after loading, this error message pops up:
    ---------------------------
    NodeXL
    ---------------------------
    An unexpected problem occurred. If it occurs again, please copy the details to the clipboard by typing Ctrl-C, then post the details to http://www.codeplex.com/NodeXL/Thread/List.aspx.



    Details:



    [COMException]: Exception from HRESULT: 0x800A03EC



    at System.RuntimeType.ForwardCallToInvokeMember(String memberName, BindingFlags flags, Object target, Int32[] aWrapperTypes, MessageData& msgData)

    at Microsoft.Office.Interop.Excel.Range.get_Offset(Object RowOffset, Object ColumnOffset)

    at Smrf.AppLib.ExcelUtil.OffsetRange(Range& range, Int32 rowOffset, Int32 columnOffset)

    at Smrf.NodeXL.ExcelTemplate.GraphImporter.ImportEdges(IGraph oSourceGraph, String[] asEdgeAttributes, ListObject oEdgeTable, Range oVertex1NameColumnData, Range oVertex2NameColumnData, Boolean bAppendToTable)

    at Smrf.NodeXL.ExcelTemplate.GraphImporter.ImportGraph(IGraph sourceGraph, String[] edgeAttributes, String[] vertexAttributes, Boolean clearTablesFirst, Workbook destinationNodeXLWorkbook)

    at Smrf.NodeXL.ExcelTemplate.ThisWorkbook.ImportGraph(IGraph oGraph, String[] oEdgeAttributes, String[] oVertexAttributes, String sGraphSource, String sGraphTerm, String sImportDescription, String sSuggestedTitle, String sSuggestedFileNameNoExtension)
    ---------------------------
    OK
    ---------------------------
    HELP!

    ReplyDelete
    Replies
    1. I can only suggest closing the file, reopening, and trying again. Follow the directions at the top of the error to get help direct from NodeXL if it doesn't resolve.

      Delete
  7. Thank you for the step by step instructions. It seems I still goofed up. My graph shows up with the many dots of varied sizes but there is a gray background behind them. When I do the step for making the boxes, nothing happens. Any thoughts where I went wrong?. Denise

    ReplyDelete
    Replies
    1. Did you follow the step to skip yourself and close relatives, and check that the 'skip' instruction went in?

      Delete
    2. Shelley, the only people tested so far are my mom and myself. I put us both in the "additional" excel sheet and skip in visibility.

      Delete
    3. Have you checked that the word 'Skip' appears in the NodeXL worksheet (sorry to ask the obvious, but that's where I usually go wrong!). If you match with anyone else who has a lot of links to your matches you will see something like that. How close are your next closest matches? Is there endogamy in your tree?

      Delete
    4. I have checked "skip" over and over. I will start over tomorrow as I have realized a 2nd cousin tested so I will add her into the Additional Info Sheet. There is very little endogamy on my mother's side that I am aware of. Thanks for your help.

      Delete
  8. Shelly, Does the Chart have to share workspace with the spreadsheet or how do I "pop it out" to make it easier to see?

    ReplyDelete
    Replies
    1. You can pull it out by grabbing and dragging the "Document Actions" title bar. It's a bit tricky to place as it will 'snap' itself to the edges. I've found it best to let go when you're over the worksheet itself, then resize into place (rather than moving the box).

      Delete
  9. I am loading my matches and ICW files but only did 4th cousins...should I have done them all?

    ReplyDelete
    Replies
    1. It's up to you. I loaded distant relatives in mine, but I don't have that many close matches. If you have a large volume of shared matches you might decide to keep it manageable by limiting to fourth cousins.

      Delete
  10. Awesome - I love it however I can't see all my vertices. Some known ancestors that I should be able to place in a group are not showing up at all. Shouldn't all my vertices show up in the graph. Also I seem to have overwhelmed it with my husband's mother's Mennonite ancestry. Too many lines!

    ReplyDelete
    Replies
    1. Only matches with shared match links will show. If all your vertices were in the chart you would have the dreaded blob! Your matches with known ancestors should show if 1. you haven't 'skipped' them and 2. they also match other people in your file.

      Delete
    2. Thanks Shelley - That makes sense now. I haven't skipped them so they just must not have matches to anyone else, yet...

      Delete
  11. I get kind of one big cluster of matches but not the boxes. I see boxes over to the right side of the graph but can't enlarge. I have re-tried a couple of times but get the same result. Not sure if I am missing something or misunderstanding the output.

    ReplyDelete
    Replies
    1. Hi Steve, have you 'skipped' close relatives? If so, for now I'd suggesting trying different clustering algorithms (NodeXL Ribbon, Groups button, "Group by cluster"). I'm currently preparing a post with more ways to work with a moderately complex graph.

      Delete
  12. I did try Group by Cluster but it didn't change anything. This test is using my wife's results. I did set the file up to SKIP her but the closest relatives she has are a couple of 2nd cousins. Should I skip them too?

    ReplyDelete
    Replies
    1. Hi Steve, I'll be posting in the near future about things you can try.

      Delete
  13. OK, I thought the skip instruction only applied to immediate family. Should I have included first and second cousins in the "Additional Input" sheet? I got much denser clusters than yours.

    ReplyDelete
    Replies
    1. Hi Ted, I started using this tool to try to get the most from my rather sparse matches! I'll be posting in the near future about working with more complicated data - but skipping the closer cousins is an option. Try with just the first cousins at first.

      Delete
    2. Shelley, your last statement in here pertains to my issue. I downloaded the files from ancestry using the dnagedcom client. the icw files had 56156 lines, the m file had 27188. I ran through your instruction line by line (repeated a couple times to make sure) and ended up with a hug blob, heavy on the outside. Tried some of your suggestions and nothing made any difference.

      I decided to generate an an m file that eliminated all the distant cousins, so I ended up with a file of about 1200 lines - this I did in the csv file. I cleaned out the workbook and start over and re-imported the 3 files. When complete, the vertices sheet only contained about 14000 entries, but only the 1200 or so showed the matching data, so that left less than 13000 with no information or only a value in the vertex column. My question is, have I done some wrong here? Is 1200 lines too much to do. Of the 1200 only 11 are 2nd cousins, 10 3rd cousins, 0 1st cousins.

      The plotted graph is a lot less complicated to be sure, but still will not separate into groups. Any comments or suggestions would be appreciated.

      Did i properly filter out the distant cousins or is there a method within in the nodexl template to filter them out - I did load the "range" when I loaded the matches.

      Thanks,

      Gene

      Delete
    3. Hi Gene, see my reply to your first post: https://twigsofyore.blogspot.com.au/2017/07/visualising-ancestry-dna-matchespart.html?showComment=1504231747059#c8623981315986175049
      Are you seeing different coloured dots, all black, or all dark blue? All black would mean nothing is allocated to a group. All dark blue suggests they're all in one group - try a different clustering algorithm. Different colours, they're in groups - try a different layout.

      Delete
  14. Thank you for sharing this very useful tool. The insights the graphic charts added to my family DNA "brick walls" were amazing.

    ReplyDelete
    Replies
    1. You're very welcome, and I love hearing that these posts have helped!

      Delete
  15. Shelley, I am determined to get this right. I feel like it is the additional info spreadsheet that is the trouble. Please tell me if I am correct on these points. I am only concerned about my mother's family; so under match ID (col. A) I insert her ancestry number. Col. B. is her name, Col. C is my name since in am the administrator. Col. D. is her ancestry ID, Col E. is her name and Col. H is Skip. As I read the instructions, her ID goes in Col. A all the way down for anyone I am skipping, is that correct? Then I add myself (daughter) in B,C,D,E,H. Is it correct to have over 17,000 lines of info for one graph. Thank you so much. Denise Stanton

    ReplyDelete
  16. Shelley,
    I have written out the instructions in a step-by-step method, just to make certain I am doing this correctly. I grabbed my 4th cousin's file, since she doesn't administer anyone and doesn't have any super close cousins. I even limited her DNAGedcom download from Ancestry to 4th cousins or closer because we do have endogamy in our family tree. The "Skip" command still will not take and doesn't even seem to be working when I enter it manually. I consistently get a splotchy gray background with black dots in the foreground. I am now using Office 2016 on Windows 10 with the free NodeXL download. Do you have a contact perhaps at the Social Media Research Foundation? I really, REALLY want this to work because I am a visual learner. And I don't mind spending $$ for good software, but at this point I am getting very frustrated. Thank you. Meg Staton

    ReplyDelete
    Replies
    1. Hi Meg, are you quite sure the 'skip' line isn't taking? If you have endogamy in your tree you WILL get a grey background from all the interlinked people. Splotchy grey actually sounds promising - there might be some room to clean thing up. Can you hang in there until my next post? It might help.

      Delete
  17. Well, since nothing else seems to be working ... and I definately am not going forward ... I guess I'll wait on your next post ;-)

    Meg

    ReplyDelete
  18. OK. I am having trouble getting the NodeXL template to open. I have confirmed it is downloaded, and when I click on NodeBasicXLTemplateSetup2014 it says I have already set it up. However, my view of Excel has not changed. I am in Microsoft Excel Home and Student 2010 version of Excel. And when I ask how to import files it just says open the file with Excel. My other problem is that I do not have any close matches to open a icw file with.

    ReplyDelete
    Replies
    1. Hi Anonymous,

      You won't see any difference when you open Excel normally, only when you create a new worksheet using the template. If you open Excel and click File - New then hopefully you will see it as an option in the list of templates.

      If not, do you know how to search your PC? The file you need to find is NodeXLGraph.xltx. On my system it's sitting in Documents/Custom Office Templates. If you find it, double click to create a new worksheet based on the template (don't open it using File Open from Excel or you will be editing the template itself).

      I hope it works for you. If it does, could you please comment here again so that other people will know what steps worked?

      If you can't find it, please message me using the "CONTACT ME" link at the top of my blog as there's one more thing I can think of but I'll need to email you.

      Delete
  19. I'm not sure I have done this first phase correctly. My graph looks nothing like yours. It looks like a giant pincushion purple dots with solid grey in the background. I tried group by component but only got 1 group in visual properties. When I used group by cluster 23. After refreshing the graph I now have the pincusion, but with lots of multi coloured pins in a straight line down the right hand side of the screen. All my imports look OK. I am skipping me and my mothers test. Skip is there for both of us. Does the names have to be exactly the same as ancestry? I put my ancestry admin name in, not the same name as I have listed for myself. Cauld that be causing it?

    ReplyDelete
    Replies
    1. Hi Genemonkey, everything you're doing sounds OK. My clusters separate out that way because I have a relatively small number of matches, and geographically dispersed ancestors. Try the Wakita-Tsurumi clustering algorithm if you haven't already, it breaks charts more than the first one. Assuming that moves things in the right direction, some of my later posts talk about making sense of a busy chart. You're not the only one with this experience!

      Delete
  20. I completed the tasks in part 2, and generated a graph, but my graph has so many people in it that I can only see a great blob of ink with some connecting lines out at the perimeter. I tried grouping, separating groups, but I cannot see the results. How can I see the graph on its own, not in the right hand panel next to the spreadsheet? And how can I see the separate groups that you show in your second image? Thank you.

    ReplyDelete
    Replies
    1. Check that you 'skipped' yourself and any very close relatives.
      Try a different clustering algorithm - Groups > Group by cluster... > Wakita-Tsurumi breaks charts up the most.
      Check that you've followed the other steps suggested.
      It's quite possible your chart will still be complicated afterwards (my graph is very sparse!). If you see different colours for different groups you're on the right track. The two posts about busy charts should help.
      Good luck!

      Delete
  21. I just wanted to say, I'm with you so far and thank you very much indeed for providing such detailed step by step instructions.

    ReplyDelete
  22. I have run into a glitch, and I am hoping you can give me some ideas about what might be wrong. This is the fourth one of these spreadsheets I have done and the first time I have seen anything like it. It appears that NodeXL is not recognizing groups. Although this man has 12,000 matches on the Group Vertices page, NodeXL has found only 4 groups, with all but 10 of the matches in Group 1. Of course the single group remains a blob in the middle of the graph, regardless of layout settings. I have restarted the sheet from scratch 3 times, and found a minor error in my first Additional Input sheet, but the end result remains one huge and three tiny groups. Any ideas?

    ReplyDelete
    Replies
    1. What type of grouping are you using? If you're using "Group by connected component" try "Group by Cluster". The "Wakita-Tsurumi" option usually breaks clusters up the most.

      Delete
  23. Hi- thanks for the quick answer. On the problem spreadsheet, I tried both component and cluster. My cheat sheet says use component, but I am not sure I have done that consistently on all three of the successful ones. I have not tried "Wakita-Tsurumi" at all, so I will give that a try.

    I posted my question on this blog installment because the problem pretty much has to be either in the input files or in my setup of the various options. I have tried to find differences in all of those factors compared to successful spreadsheets, but no luck so far. Nevertheless, it has to be here somewhere!

    Thanks again

    ReplyDelete
  24. YES! Your suggestion worked! It looks like we are back in business.

    Thanks!

    ReplyDelete
  25. I hope someone can help me. I am still trying to get my hands around the input data. As we get the data from dnagedcom client in a csv, it contains all the icws and matches. How to filter out just the 4th cousin and closer data or is it even necessary. I know i can delete the distant cousins in m_ file, but is it even necessary because I can't really delete anything so easily in the icw_ file? When I make a graph with only the 4th or closer, the program still reference the names in the icw_ and shows them in the graph, making the graph more complicated. Any hints are greatly appreciated.

    ReplyDelete
    Replies
    1. Hi Gene, you're quite right about what happens when you delete from the matches file - the lines relating to those people are still there. See my reply to your first comment https://twigsofyore.blogspot.com.au/2017/07/visualising-ancestry-dna-matchespart.html?showComment=1504231747059#c8623981315986175049

      Delete
  26. Hi Shelley, my graph is taking forever to draw. I've waited more than 10 minutes and it's still processing. I went back (twice) to your instruction about importing the "Additional File" and the result is still the same. Everything else looked just fine. Any ideas on what the problem may be? Thanks. --Ken Waters

    ReplyDelete
    Replies
    1. Hi Ken, if it takes a very long time the problem is usually that the word "Skip" hasn't been entered for the test owner. It can also happen if you have filtered the columns. All the filters need to be cleared before you try to draw the graph.

      Delete
  27. Thanks, Shelley. I finally was able to get a graph and finish unit #2. The test owner was marked as Skip. The other close relatives (two sons, two grandchildren) were not marked as Skip even though they were in the "Additional File" spreadsheet. So, I did go into the big spreadsheet and manually marked them as "Skip" as well. Perhaps it's because there were so many matches, more than 32,000.

    ReplyDelete
  28. Hi Shelly, I am at the import stage for the icw file. When I click on import I get a message that says "If the columns in the other workbook have headers, then there must be at least two rows". I have clicked around and I can't find any help. Help! Thanks.

    ReplyDelete
    Replies
    1. Hi, I've never come across that message. I assume that your ICW file has more than two rows. Do you see the same "Import from Open Workbook" dialog as shown in the post? Do you get the message before or after that?

      Delete
  29. I've closed out for the night, and will work with it again in the morning. My screen looks exactly like the sample when I am at the import step. When I click on import, there is a pause and then that message pops up. Open to playing with any ideas....just an aside, I LOVE this tool and appreciate all of the work you have put in to writing up such great directions with illustrations. Thank YOU. (Now I just need to get it to work!)

    ReplyDelete
  30. Hi Shelley, I wonder if you might have an idea on what I've done wrong. I've been through the process twice, reading everything carefully (or so I thought), and yet I get nothing at the end. When I hit the button to make the graph, it thinks for about one second and acts like it's done, but no graph shows up. When I hit "refresh" --- same thing. I can move freely around the spread, so it's obviously not hung in the middle of thought. I use Excel every day at work (although nothing this complicated), so I feel that I am fairly well versed in its basic use. Any help would be greatly appreciated! Thank you in advance!

    ReplyDelete
    Replies
    1. Hi Daniela, if you can see data in the first two columns on the Edges sheet you should get a graph even if something else has gone wrong.
      To confirm that NodeXL works on your system, try creating a new empty NodeXL sheet and hand enter some information in the first few rows of the Edges worksheet. eg
      Vertex 1, Vertex 2
      A, B
      B, C
      C, A
      This should draw three dots connected by lines when you click refresh. Did that work?

      Delete
    2. Hi Shelley, I have the same problem as Daniela above. I have tried the test data as suggested and I get the three connected dots. Still no graph for my main data though. I am the only user named in the additional info file and the Skip has copied over into the Vertices tab. Wonder what is going wrong.
      TIA

      Delete
    3. Hi g, but you can see data in the first two columns of the Edges sheet?

      Delete
    4. Same thing just happened with me. I've imported everything correctly 3 times now, and I still don't get a graph showing up. I did the ABC test as above and that worked perfectly.
      I do have over 220,000 lines of data in the Edge section. Could it be that it is just too big to load?

      Delete
    5. Hi Caitlin, how much RAM does your machine have? The NodeXL FAQ page https://www.smrfoundation.org/nodexl/faq/ says that a network of 200,000+ edges would require 32GB of RAM.
      Assuming that's it, there are still some things to try:
      - Double and triple check that you and any very close relatives have "Skip" in the visibility column.
      - Follow the instructions about clearing duplicates from this page http://twigsofyore.blogspot.com.au/2017/07/visualising-ancestry-dna-matchespart-4.html
      - If you have downloaded ALL matches, try entering "Skip" in the Visibility column for the more distant ones. Either find the level you want to start skipping from and copy the work down, or use the Autofill Columns button > Vertex Visibility > arrow on the side to set which to skip. (double check you haven't accidentally unskipped yourself by doing this).
      If that doesn't work come on in to the Network Graphs for Genetic Genealogy Facebook group. There's an excel worksheet in the files section that takes care of the above for you, or we can help you get started with Gephi (another network graph tool that will handle larger networks).

      Delete
  31. Hello, Thank you very much for your wonderful program. I am at step 2 and trying to load my Additional Worksheet. I receive this error: [OutOfMemoryException]: Exception of type 'System.OutOfMemoryException' was thrown.
    Is there a workaround for this? My icw and match files loaded fine. I enjoy your step by step, detailed explanations! Thank you Lynn

    ReplyDelete
    Replies
    1. Hi Lynn, first I want to clarify that it's not my program. I just applied it to DNA matches. :-)
      Hopefully you might have had better luck when you closed out and came in again. The program is quite memory intensive, especially if you have a lot of matches, and it could be that your PC doesn't quite manage it.
      Some things you can try:
      Minimise the work your PC is doing while working on the spreadsheet. Don't have other processes running.
      The Additional worksheet is primarily a way to easily apply the word "Skip" to your own record on the Vertices sheet. You can enter the word manually if necessary. Your name won't be in the file, you will need to search on your ID number. (The general find function will work better than the column filters for that).
      If you downloaded all matches, you might be better off to only download 4th and closer cousins.
      Assuming you are able to move on... before you try to plot the graph, do make sure that you and any very closer relatives have "Skip" marked. Also, move ahead to part 4 and follow the instruction about removing duplicates before you try and plot. http://twigsofyore.blogspot.com.au/2017/07/visualising-ancestry-dna-matchespart-4.html
      I hope some of these suggestions help!

      Delete
  32. Loading "Additional Input File.xlsx" and it erases everything from Edges and Vertices tables. What am I doing wrong?

    ReplyDelete
    Replies
    1. Hi Chris, you probably missed fixing the import options to retain data already imported:
      On the NodeXL ribbon, click the Import button.
      Choose Import Options… (the bottom item on the menu).
      Clear the box next to “Clear the NodeXL workbook before data is imported”.
      I think that should fix your problem!

      Delete
  33. I imported the three worksheets. Should I be putting the word in quotation marks or otherwise delineating the word?

    ReplyDelete
    Replies
    1. Hi Alisha, sorry I didn't reply sooner. You just need to type in the word without delineating it.

      Delete
  34. Yes, indeed, thank you for the explicit instructions. I managed to get to step 3 of Import. That is where I have a problem. I don't have an "additional input file." I do, however, have a file that begins with "a_..." I tried that but did not get the subsequent questions/boxes/steps. How do I get the "additional input>?" I tried Googling it and was referred back here.

    ReplyDelete
    Replies
    1. Hi Jane, you will need to create that file yourself (see the first part). The main purpose of it is to make sure that the dot that represents you isn't shown. If you can find the correct row on the vertices sheet you can simply enter "Skip" in the visibility column if that's easier for you. Don't forget to do it one way or another or you'll end up waiting for ages only to see a blob!

      Delete
  35. Is the "Additional Input" file to be saved as .xls or .csv? Thanks!

    ReplyDelete
  36. I just created a new nodexl graph for a dna test with which I am intimately familiar, the first one I ever "solved" (for an adopted friend). I have skipped him but nobody else. He has two maternal uncles ~1800 cm, I have not skipped them. When I create the graph and tell it to put each group into a separate box, I get ONE group (all dark blue). I have tried tweaking the upper level of the cm to show such that the uncles are excluded, and that doesn't appear to break the data up into more than one group (would it?). This is not endogamous data, quite the opposite. There's his father's side which has colonial roots mostly from England, his mother's side - maternal is 100% norwegian immigrants, paternal is 100% scottish immigrants, the only overlap between the Scottish and the Norwegian is his mother (who hasn't tested) and her two brothers (who have).

    Yet somehow nodexl thinks this is one big group?

    ReplyDelete
  37. Instead of using the “Additional Input” file to skip the close matches I do the following:

    When importing the matches file, I select “range” item in addition to the other recommended items (“name”, “admin”, etc.) for “Is Vertex 2 Property Column”. Range is the relationship bucket displayed in Ancestry.

    Then I go to the Vertices tab of the workbook and set the Range column to display only the close matches (Parent/Child, Close relations, etc.) and also blanks. You can then change the display option for those lines that you wish to hide or skip. The line with the blank relationship is the test taker. While I am there I usually add SELF to the name field of the test taker to keep track of it.

    Now reset the range column to display all and that’s it. No need for that third spreadsheet.

    ReplyDelete
  38. If I am only interested in the grouping data and not the plot, does it matter if I enter skip in my closest matches? I am thinking they could help me identify which part of my tree these groups are.

    ReplyDelete
    Replies
    1. It will make a difference to the grouping. Try it both ways and see which makes more sense!

      Delete
  39. This is seriously awesome. I had a large Excel-sheet of known associates of a person of interest, and their associates. The NodeXL template together with your instructions allowed me to visualize the whole network. Thank you so much!

    ReplyDelete