In the last post we cast an appraising eye over the graphs we made using NodeXL Basic (a product of the ‘Social Media Research Foundation’). In this post, you’ll see some of the features of that may help calm a busy graph. Pick and choose from them as appropriate to you tree, research aims and aesthetic preferences.
If you haven’t made a graph yet, see the index to this series for earlier posts.
Display settings
Take it one group at a time
Once you’ve made groups, you can move to the Groups worksheet and enter ‘skip’ in the Visibility column for each group except the one(s) that you’re interested in. Click Refresh, and only the unskipped groups will be shown. You can also view a few groups at a time as I did in the previous post.
Reduce edge opacity
If there are a lot of crossing lines it might be easier to work with the graph if you reduce the line opacity. You can change the defaults used for the graph, including the edge opacity via the Graph Options button.
- Click the Graph Options button
- Lower the default Edges Opacity – the lower the opacity, the more transparent the line.
This may not remove as much visual clutter as you want, but if the dots appear to be sitting on a blanket of grey it may help you see some structure.
Swap labels for tooltips
In Part 3 we used the Autofill columns button on the NodeXL ribbon to add labels to the graph. For a busy graph you may prefer to use same button to clear the labels column and set the tooltip to ‘name’. That way you’ll see the match’s name by hovering over their dot.
Grouping
If your groups don’t break up nicely, try a different clustering algorithm.
- On the NodeXL Ribbon select Groups, Group by Cluster…
- Select an option from those presented and click OK. The calculations may take some time for a complex graph.
- Refresh Graph to apply the new groupings.
Clauset-Newman-Moore clustering algorithm
Same graph with Wakita-Tsurumi clustering algorithm
Remember that these algorithms were not created with your DNA results in mind! Hopefully one of them will work well with your data – but don’t assume that because it sounds scientific it must be right.
Try a different group box layout – or none at all
- Different group box layout options are available under ‘Layout Options’ on the graph area or main NodeXL ribbon, bottom item Layout Options…
‘Force-directed’ box layout algorithm used, box edge width 0 (I.e. no line)
Hide intergroup connections
It’s possible to hide all the the lines that run between different groups. This instantly cleans up a graph and makes connections within a group easier to see, but does so at the expense of between-group information.
- Click the Layout options dropdown on the NodeXL Ribbon or the graph area toolbar.
- Change Intergroup edges to ‘Hide’.
Intragroup edges hidden
The graph with between group edges hidden is clean and pretty. It’s easier to see relationships within groups – but relationships between groups are not visible. Again, that reminder that the grouping algorithms were not designed for your DNA data. Those between group connections may be the clue that points you in the right direction.
Alternative: ‘Combine’ is an interesting option to try. It will draw a single, thick line between groups that interlink with each other.
Removing relatives
Skipping close relatives
When we created the Additional Input file we added the word ‘Skip’ to the Visibility column for you and your very close family. The ‘Skip’ direction tells NodeXL not to include that person in the graph, or in the clustering calculations.
It may be helpful to ‘Skip’ some more of your close relatives, especially if PC performance is an issue. Take care though – skipping a relative means the graph loses information.
- Your closest relatives are probably at the top of the the Vertices worksheet. If not, sort the sharedCM column from largest to smallest using the dropdown. Your closest relatives will move to the top of the list.
- When you click on the row for a match, the dot that represents that person, and all the lines representing their relationships, will be highlighted in red. This will give you a sense of how widely spread their linkages are, and how much clutter will be cleared (or information lost) by skipping them.
There’s no magic number for the relationship distance or number of links that should be the threshold for skipping people. If I had an aunt and a second cousin who had the same number of links, I might skip the aunt, since theoretically her links are spread over half my tree. I would be much more likely to leave in the second cousin whose matches theoretically sit in a quarter of my tree.
As many readers have realised, you can ‘Skip’ people manually by entering ‘Skip’ in the Visibility column. However, I suggest that you also add the new ‘skip’ line to the Additional Input file as explained in Part 2 (note that the directions on this point have been revised since first posting). If something goes wrong, troubleshooting a large file with complex relationships can be difficult. Keeping the information in a smaller external file makes it easier remember what you’ve done, and allows you to reload or start again if necessary.
Note: After skipping people you might want to rerun your preferred grouping algorithm and refresh the graph.
Skipping children of known matches
It’s not a quick fix, but another category of person that you may want to ‘Skip’ is anyone who is known to be the child of another match. If they are only connected to you on the matching parent’s side you can safely ‘skip’ the child as they, at best, duplicate the parent’s relationship information. Take care that the relationship really is parent-child and not niece or nephew – the information visible to you may look the same in those cases.
Again, I suggest that you at least keep a record of these ‘skips’ outside your main file - the Additional Input file is made for this! If something goes wrong with the graph file, will you really want to track down those relationships again?
Filtering
Dynamic Filters allow you to hide your most distant and/or closest relatives.
- Select Dynamic Filters
- Scroll down or expand the window to find the sharedCM slider
- As you adjust the slider’s lower value, your most distant cousins will disappear.
- Adjusting the slider’s upper value will hide your closest cousins (you may need to slide it down a long way).
- If you want to still see the filtered information, but make it less prominent, adjust the filter opacity to your liking.
Wakita-Tsurumi grouping with matches below 15CM filtered out
Excluding matches
If you have a very large number of matches you may decide not to work with distant cousins at all. In this case you could enter ‘Skip’ next to each one, or you could save some time when downloading by using the Filter: 4th Cousin option in the DNAGedcom client.
Deleting smaller matches from the match list, whether before or after importing to NodeXL, won’t help. Matches listed in the in-common-with file will still be included in the graph, you just won’t know who they are!
If you want to go a bit past fourth cousins, but not all the way to those speculative distant matches, filtering or skipping may be a better option than excluding entirely.
Excel tips:
1) If you copy a cell then select multiple cells and paste, the paste value (e.g. ‘Skip’) will be entered into all of the selected cells.
2) Double click on the square at the bottom right corner of a cell to copy it down the page automatically to the next filled box, or the end of the table whichever comes first. It can be a bit fiddly to get the right spot – the curser should change into a black plus sign + without any arrows.
DNAGedcom Note:
There are two versions of the DNAGedcom client being used at present. Version 2 is necessary if you have FTDNA matches, but it doesn’t have the filter option for Ancestry DNA matches (I’m told the option will be reinstated in future). The version linked to in the first post of this series does have the option.
Coming up
In the next post, we’re going to extract more information from the files we already have.