Blog post

Showing posts with label search strategy. Show all posts

Sunday, December 18, 2016

A gift to you from Twigs of Yore (and son)

If you are the type to rip the paper off with abandon, go ahead and click here. If you always read the card first, carry on.

My 10-year-old son (who was last mentioned on this blog snapping shots with the BillionGraves app) wants to be a coder when he grows up. I keep telling him that, once he has the skills, he can build my perfect genealogy software. He seems to have accepted this fate. Either that, or he thinks I’m joking*.

So one day, when Mr 10 was looking for ideas to code, I asked him to work out how to build a web form with a button that would return different search strings depending on what was entered. I wanted such a thing because late last year I analysed historical birth notices in Trove and came up with conclusions about an effective search approach to use. In short, the best results were obtained by running a series of searches with the surname and one other relevant search term in close proximity.

Mr 10 quickly worked it out and obliged with the coding. We are excited to present to you the ….

Trove Helper

Merry Christmas!

 

* I am joking. Mostly. Partly. A little bit.

Thursday, September 24, 2015

Perfecting Newspaper Searches: Birth Notices–Part 2

In this post I will turn the series of charts from my post Perfecting Newspaper Searches: Birth Notices - Part 1 into a search strategy. I will tell you why I am suggesting the searches, and I will give some tips on how to create an appropriate search string in Trove.

In each case, it’s a good idea to narrow the date range to a sensible window. You can use the filters on the side after you search, but “Refine search” or the advanced search form lets you choose any range you want, not just a single decade, year, month or day. “Refine search” becomes available once you’ve run the search:

[Screenshot: the “Refine search” option in Trove]

I like to hold off on narrowing down my searches any further than that for as long as I can. Notices can sometimes turn up in unexpected places, and they are the ones you most want to find!

The series of searches I have come up with based on my previous post, and taking into account Trove search capabilities, is as follows:

  1. Surname only
  2. Surname and place
  3. Surname and father’s name
  4. Surname and mother’s name (after about 1910)
  5. Surname and child’s name (after about 1910)
  6. Surname and parents’ first names (after about 1940)
  7. Address only... or anything else you’ve got! (if all else fails)

Read on for more detail.

1. Surname only

Reason: Every birth notice included the surname at least once.

Search tips:
Depending on the surname, you may wish to expand or restrict the search.

Expand the search by searching for known variations, or by using a wildcard.

couper OR coupar OR cooper OR cowper

coup*

Trove adds some fuzziness to your search terms by default. You can restrict the search to exactly the term you want by specifying that only the exact term you entered should be returned.

fulltext:couper

Adding fulltext gets me from 603,972 results down to 34,878. I can see that it gets rid of news about coups, and advertisements for coupes – but I don’t know what else I might have lost. Still, that’s too many to read through. I have to hope the birth notice I want is on the first page or two, or start using the state and notice type filters to narrow it down!
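If you have a long list of variations, typing the OR string by hand gets tedious. Below is a minimal Python sketch (my own illustration, not the Trove Helper’s code) that assembles the two kinds of search string shown above. The function names are hypothetical; all it does is build text for you to paste into the Trove search box.

```python
def surname_or_query(variants):
    """Join surname variations with OR, e.g. 'couper OR cowper'."""
    # Lower-cased only to match the examples in this post.
    return " OR ".join(v.lower() for v in variants)


def exact_term(term):
    """Prefix one term with fulltext: so Trove matches only that exact word."""
    return "fulltext:" + term.lower()


variants = ["Couper", "Coupar", "Cooper", "Cowper"]
print(surname_or_query(variants))  # couper OR coupar OR cooper OR cowper
print(exact_term("Couper"))        # fulltext:couper
```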

Assuming you have more results than you can reasonably review, the next searches to try are:

2. Surname and place

Reason: Until the 1950s, the majority of birth notices included place names that a family researcher might know to look for.

Search tips:
The birth notices I reviewed included street and/or suburb names. Look at your information and identify all the places and addresses where the family was known to be during the date range of interest as well as immediately before and after. You may have to do a few different searches if there are a lot! 

Birth notices are rarely as long as 30 words. I found that the surname would usually appear at the beginning of the notice, and often in the middle as well (as part of the father’s full name). This means that the surname and place name you are looking for are likely to be no more than 10 words apart. You can safely restrict your results by specifying that the words you are interested in must be near each other. If Trove is in a good mood, you do that by specifying the amount of “phrase slop” to allow (I didn’t make up that expression, it’s what the Trove help page calls it!).

My Couper family lived in Rugby Road, Oakleigh. I might search for: “couper oakleigh”~10

When I started writing this post, Trove handled searches like the one above with no problem. Over the previous few days it struggled, but seemed happier if I snuck up on it by trying smaller numbers first. Today it’s running complicated searches quite happily.

You can use “fulltext” with phrases – put it outside the quote marks:
fulltext:“couper oakleigh”~10

Depending on how many surname variations you have, and how many place name parts you need to manage, you might have to mix and match surnames and place names. You can search on each combination one at a time, but I like an all-in-one search if I can manage it. For example:

“couper oakleigh”~10 OR “couper rugby”~10 OR “cowper oakleigh”~10 OR “cowper rugby”~10

I prefer that because separate search strings often bring up duplicated results. By running them all at once I don’t have to look through pages of the same articles to find the ones I want.

If you try a search like this and Trove isn’t co-operative, or it just seems too complicated to set up, here is another approach:

(couper OR cowper) AND (oakleigh OR rugby)

This search tells Trove to find articles that have any of the surname variations AND any of the place names. Note the use of brackets, to help Trove’s search engine make sense of the query.

This will bring up all the same results as the search above, but will also bring up more results that are not relevant because it doesn’t limit the distance between the search terms. Theoretically, articles where the words are closer together should appear closer to the top of the search results.  
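Mixing and matching by hand is where mistakes creep in. Here is a minimal Python sketch that builds both forms above from a list of surname variations and a list of place names. The function names are hypothetical, and the default slop of 10 is simply the value used in the examples in this post; the output uses plain straight quote marks.

```python
from itertools import product


def proximity_query(surnames, places, slop=10):
    """Every surname/place pair as a phrase with 'phrase slop', joined by OR."""
    return " OR ".join(f'"{s} {p}"~{slop}' for s, p in product(surnames, places))


def grouped_query(surnames, places):
    """Any surname AND any place, with no limit on the distance between them."""
    return f"({' OR '.join(surnames)}) AND ({' OR '.join(places)})"


surnames = ["couper", "cowper"]
places = ["oakleigh", "rugby"]
print(proximity_query(surnames, places))
# "couper oakleigh"~10 OR "couper rugby"~10 OR "cowper oakleigh"~10 OR "cowper rugby"~10
print(grouped_query(surnames, places))
# (couper OR cowper) AND (oakleigh OR rugby)
```

The number of phrases in the first form is the number of surnames times the number of places, so it grows quickly: two of each gives four phrases, five of each would give twenty-five.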

3. Surname and father’s name

Reason: Over 85% of notices included the father’s name in some form. I suggested searching for places first, even though “searchable places” don’t appear quite so often, because places tend to have fewer name variations to work around.

Search tips:
The father’s name was sometimes shown as the given name, sometimes as initials, sometimes an abbreviation of a name (Chas, for Charles) and sometimes a mixture of these. This means that if I was searching for children of James William French, I would need to try:

  • “J French”      
  • “J W French”
  • “James French”
  • “James W French”
  • “Jas French”
  • “Jas W French”
  • “J William French”
  • “J Will French”

… you get the idea.

A reasonable starting point would be:

“J French” OR “James French” OR “Jas French”

I have deliberately ignored the W in the middle in this search as the default phrase search is equivalent to a search with ~1. Depending on how common the name you are searching for is, you might need to try more variations.

“J W French”~0 OR “James W French”~0 OR  [continue adding name variations]

That ~0 means that there can be no “slop”, the name must be exactly as written. Of course, the name in the newspaper may be written just like that but you still might not find it due to character recognition difficulties.
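The name variations multiply quickly, so here is a minimal Python sketch that writes out every first-name and middle-name combination for you. It is only an illustration: the lists of name forms are yours to supply (the ones below are the James William French examples from this post), and the function name is hypothetical.

```python
from itertools import product


def father_phrases(first_forms, middle_forms, surname, exact=True):
    """Return an OR string of quoted phrases, one per first/middle combination.

    An empty string in middle_forms means "no middle name or initial".
    With exact=True each phrase gets ~0 (no slop), as in the example above.
    """
    suffix = "~0" if exact else ""
    phrases = []
    for first, middle in product(first_forms, middle_forms):
        words = [first, middle, surname] if middle else [first, surname]
        phrases.append('"' + " ".join(words) + '"' + suffix)
    return " OR ".join(phrases)


# The starting point: ignore the middle name and allow the default slop.
print(father_phrases(["J", "James", "Jas"], [""], "French", exact=False))
# "J French" OR "James French" OR "Jas French"

# The more exact version, with middle-name forms and ~0.
print(father_phrases(["J", "James", "Jas"], ["W", "William", "Will"], "French"))
```

Three first-name forms and three middle-name forms already give nine phrases, which is one reason to start with the shorter search.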

If after about 1910:

4. Surname and mother’s name

Reason: Increasingly from about the 1910s, birth notices started to mention the mother’s name.

Search tips: Sometimes the article included the maiden name, sometimes the given name(s) and sometimes both.

In Part 1 we saw that the maiden name, if included, was almost always within a few words of the surname. A search that would find “Couper (nee Mary Allsop)” is:

“Couper Allsop”~2

The mother’s given name was sometimes included with the surname, as above, and sometimes in the middle of the text.

“Mary Allsop” is worth a shot. So is “Mary Couper”~10.

The name in a birth notice is often the name the mother went by, rather than her full name, so remember to search for Kate as well as for Catherine.
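To keep track of the mother’s-name searches, a small Python sketch can list them out. It assumes you know her married surname, her maiden surname and the given names she might have gone by; the function name is hypothetical, and the slop values (~2 and ~10) are the ones used in the examples above.

```python
def mother_queries(married_surname, maiden_surname, given_names):
    """List candidate searches for a mother named in a birth notice.

    given_names can include the names she went by, e.g. Kate and Catherine.
    """
    queries = [f'"{married_surname} {maiden_surname}"~2']
    for given in given_names:
        queries.append(f'"{given} {maiden_surname}"')
        queries.append(f'"{given} {married_surname}"~10')
    return queries


for q in mother_queries("Couper", "Allsop", ["Mary"]):
    print(q)
# "Couper Allsop"~2
# "Mary Allsop"
# "Mary Couper"~10
```

I have kept these as separate searches rather than one long OR string, but you could join them with OR if you prefer.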

If after about 1910:  

5. Surname and child’s name

Reason: Increasingly from about the 1910s, birth notices started to mention the child’s name.

Search tips:
When included, the child’s name was written out with both the first and middle name, not nicknames, and was usually at the end of the notice. A search that insists on the first names and surname being close together won’t work.

French AND “James Henry”

If after about 1940:

6. Surname and parents’ first names

Reason: From about the 1940s birth notices became less formal in tone and often mentioned both parents by their first name, mother first.  

Search tips: Try casual and nickname forms of the names of interest first.
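Because these notices named the parents with the mother first, as in “Dot and Wal”, the pairings can be generated. This minimal Python sketch assumes you supply the surname and the informal forms of each parent’s name; the function name and the example names are hypothetical. I am also assuming Trove treats a quoted phrase containing “and” like any other phrase, so check the results.

```python
from itertools import product


def parents_query(surname, mother_names, father_names):
    """Surname AND any "mother and father" pairing, mother's name first."""
    pairs = " OR ".join(
        f'"{m} and {f}"' for m, f in product(mother_names, father_names)
    )
    return f"{surname} AND ({pairs})"


print(parents_query("Doe", ["Dot", "Dorothy"], ["Wal"]))
# Doe AND ("Dot and Wal" OR "Dorothy and Wal")
```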

If you still have no luck, leave out the surname:

7. Address – or any other information you have to use!

Reason: Sometimes, the surname simply isn’t picked up accurately by the character recognition process.

Search tips:
Leave off the surname, and use whatever you’ve got! The address alone is a good option, as many birth notices included an address and it is quite specific:

“12 rugby road” OR “12 rugby rd”

In this case I left in the word road (and included both “road” and “rd”), to avoid articles about rugby scores. If the street name was not such a common word I would have left “road” and “rd” off.
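If you want to try several addresses, or a street type with several spellings, a tiny Python sketch can write the OR string. The function name is hypothetical, and the default spellings (“road” and “rd”) are just the ones from the example above; swap in “street” and “st”, or whatever suits.

```python
def address_query(number, street, suffixes=("road", "rd")):
    """Quote each spelling of the street type as its own phrase, joined by OR."""
    return " OR ".join(f'"{number} {street} {s}"' for s in suffixes)


print(address_query(12, "rugby"))
# "12 rugby road" OR "12 rugby rd"
```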

You could also use anything you know about the family that is a bit unusual. Very few birth notices mention anything other than the information I’ve discussed, but there were exceptions.

If after all that you still can’t find a birth notice… perhaps there wasn’t one, or perhaps the right newspaper just isn’t online yet. It cost money to place a notice, families were large, and for many, times were tough.

 

Did these strategies work for you? Is there a strategy that I’ve missed? Do you have clever ideas about how to put together a search string using what we know about birth notices? I’d love to hear about it!

Wednesday, September 2, 2015

Perfecting newspaper searches: Birth notices–Part 1

In order to search effectively, you really do have to know what you are looking for! You might know that you are looking for the birth notice for John Doe, and the search seems pretty straightforward: enter the words “John Doe” and narrow down the year range. Unfortunately, that may not be enough to find John Doe’s birth notice (assuming he has one), even if the entry is transcribed correctly.

What you need to know as you enter your search terms is not the name “John Doe”, but the unique combination of words that are used in John Doe’s birth notice. If you are looking for a historical notice it’s entirely possible, even likely, that the notice will not contain his name at all.

Over the past few days I’ve been compiling information about what was included in birth notices in The Argus (Victoria, Australia, on the Trove website) from 1850 to 1955. I selected Family Notices articles spread across the years. I aimed to choose articles with multiple notices in order to process them in batches, and did not reject any batch once I had clicked on it. I stopped searching for additional birth notices within each decade when I had reviewed at least 30.

In total, I reviewed 447 birth notice items from The Argus. I also spot-checked other newspapers and states, and looked more carefully at an extra 162 birth notices for other states in order to test if the results I found are generally applicable for Australian newspapers.

In this post I will describe what I found. In my next, I hope you will join me in a discussion of what the results mean for constructing birth notice searches in Trove.

The birth notices had three common features:

  • They were quite short, most were 30 words or less (newspapers would charge extra to insert a longer than standard notice).
  • They all included the surname of the person.
  • They all included the words “son” or “daughter”.

You will notice that I have not listed “they included the child’s name” or “they included the mother’s name” as common features. Before the 1950s, these were quite uncommon features!

Person’s surname

Every birth notice included the surname. The surname was usually given in capital letters at the start of each notice, and would also often appear in the middle of the notice when the parents’ names were mentioned. However, early birth notices did not start with the surname. The position of the surname relative to other search terms we might want to use becomes relevant when we consider how we might search Trove.

Child’s name

No birth notices prior to 1910 (in my Argus sample) included the child’s name. Inclusion of the child’s name was above 60% in the 1920s and 1950s. Still, even in those years more than 30% of birth notices did not name the person who had been born!

When the name of the child was given, it was almost always in brackets at the end of the notice.

Chart 1:  Proportion of birth notices that included the child’s name

Mother’s name

In the earlier papers, the mother was almost always referred to as “wife of …” or “Mrs husband’s name”. Almost always. Occasionally she wasn’t referred to at all.

The first instance of including the mother’s maiden name in my sample occurred in the 1900s. The practice had become more popular by the 1920s, and by the ’40s I found that more than 60% of sampled birth notices included the mother’s maiden name.

Where the maiden name was included, it was almost always placed in brackets immediately following the child’s surname at the start of the notice. There were some variations in the detail – use of the word “nee”, or inclusion of the given name.

    • SURNAME (mother’s surname)
    • SURNAME (nee mother’s surname)
    • SURNAME (nee mother’s full name)

Inclusion of the mother’s given name, either with the surname as above or in the text of the notice, also started taking off in the 1920s. In the 1950s over 80% of birth notices would include the mother’s given name.

Although I did not make a tally, it was my impression that in most cases the name the mother went by was included, rather than her full name. That is, “Dot”, rather than “Dorothy Jane”. Her name was often paired with her husband’s name in the text, e.g. “Dot and Wal”.

Chart 2:  Proportion of birth notices that included the mother’s name

Father’s name

The father’s name was almost always included in some form. This would either be his full name or his initials. I did not tally whether “Mr. J. W. Doe” or “Mr. and Mrs. J.W. Doe” were more common - both were frequently used.

In the earlier time periods, the father’s name was more often spelled out in full. In later time periods, when including the father’s given name again became the norm, it was more often paired with his wife’s name, e.g. “Dot and Wal”.

Chart 3:  Proportion of birth notices that included the father’s name

Other information

While there was very occasional reference to occupations (usually in the earlier notices) or sibling names (only in the later notices) these were quite rare.

I saw quite a lot of notices that included the word “twins” and sadly even more that included the word “stillborn”. In the later years I also saw the word “caesarean” in a few notices. These words might be useful if you already knew a bit about the birth.

The only other information frequently included was place names. Almost every birth notice included a place name, either the residence or place of birth. Not all of these would be useful when constructing a search. In tallying inclusion of place names, I made a completely subjective judgement in each case as to whether the place name was one that a researcher would be likely to know was connected to the family, and sufficiently unusual that it wouldn’t bring up too many false positive results. For example, if the place was a hospital (without mention of a suburb), I did not suppose the researcher would have information that would lead them to search on that term.

While most birth notices through most of the time period included potentially “searchable” places, this dropped off in the 1950s. Two things seemed to be happening:

  • A shift to including information about the immediate family instead of place of residence.
  • Possibly, more births were occurring in hospital. I generally did not include a hospital name without a suburb as a “searchable place”.

Chart 4:  Proportion of birth notices that included a searchable place

Other States

A spot-check of other newspapers and other States suggested that the patterns I saw in the Argus were generally relevant. I was not keen on replicating the whole exercise across every State… but I did want a bit more information to reassure myself on this point. The decade starting 1910 seemed to be a turning point for inclusion of mother and child names in the Argus and that is the decade I chose for comparison.

For each State, I chose items from the newspaper with the most “Family Notice” articles in that State. Apologies to Tasmania and the Northern Territory. You were not forgotten. It was just that the small number of birth notices per article made the data extraction task more onerous. I’m doing this in my spare time, remember!

Results were reasonably consistent. Victoria was perhaps a little ahead in including details of the mother and child.

[Chart: birth notice features by State, 1910s]

Of course, I only looked at a few hundred out of potentially millions of birth notices in total (as at the time of writing there are 1,543,548 “Family Notices” articles in Trove, many of which would contain multiple birth notices). Local newspapers especially may have entirely different patterns. It would always be worthwhile to look at some birth notices for the paper and era you are searching, to make sure that the terms you are searching for were used at that time and place.

 

 

Copyright 2015 Shelley Crawford

Thursday, August 14, 2014

Filter Ancestry hints by collection

Did you know it is possible to filter Ancestry’s shaky leaf hints by any collection you want? No, there is no link on the website, at least not in the Australian (.com.au) version of Ancestry, but it’s not hard to do.

Here’s the recipe for a link to shaky leaf hints from the collection of your choice:

http://trees.ancestry.com/tree/TREE/hints?src=hw&hf=record&hs=last&hdbid=DATABASE

Replace TREE with the number of your Ancestry tree. You’ll find it in the URL when you look at your tree on Ancestry.

Replace DATABASE with the ID number for the collection you want to filter on. You’ll find it in the URL when you navigate to the search page for that collection. For example, to search the 1851 England census you would head to http://search.ancestry.com.au/search/db.aspx?dbid=8860&enc=1 and find the number 8860.
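If you use this often, or want links for several collections at once, here is a minimal Python sketch that fills in the recipe. The URL pattern is copied from above and the tree number is made up; Ancestry could change the pattern at any time, so treat this as an illustration rather than a supported feature.

```python
from urllib.parse import urlencode

# Pattern copied from the recipe above. Ancestry may change it without notice.
HINTS_BASE = "http://trees.ancestry.com/tree/{tree}/hints"


def hints_url(tree_id, database_id):
    """Build a shaky-leaf hints link filtered to one collection."""
    params = {"src": "hw", "hf": "record", "hs": "last", "hdbid": database_id}
    return HINTS_BASE.format(tree=tree_id) + "?" + urlencode(params)


# 12345678 is a made-up tree number; 8860 is the 1851 England census.
print(hints_url(12345678, 8860))
# http://trees.ancestry.com/tree/12345678/hints?src=hw&hf=record&hs=last&hdbid=8860
```

To get a link for each of the census years listed below, call it once per database ID.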

That’s all there is to it! Soon you will have your fill of the low hanging fruit hidden behind those shaky leaves, and even better you will decide for yourself if today you feel like apples or bananas!

Some of my favourite database ID numbers are:

1635 Victoria, Australia, Assisted and Unassisted Passenger Lists, 1839–1923
1904 England & Wales, National Probate Calendar (Index of Wills and Administrations), 1858-1966
2972 England & Wales, Non-Conformist and Non-Parochial Registers, 1567-1970

8978 England census 1841
8860 England census 1851
8767 England census 1861
7619 England census 1871
7572 England census 1881
6598 England census 1891

If I keep going I’ll soon list their whole catalogue…

But I won’t stop before I say… Pssst! If you want to check BillionGraves this way while the link still works (it did for me today 14/8/2014) the ID number to use is 70734.

Friday, July 4, 2014

Ever feel like you’re going in circles?

This evening I was searching for a great-great uncle. He has a common name so I didn’t know if the records I found related to him or some other person.

I decided to do a quick search of Ancestry member trees to see if there were any clues. I found only one other tree that included this man. Although it appeared to record the same minimal information that I had, I clicked in to take a closer look.

I noticed a source link on the side – Ancestry Family Trees. Interesting, since there were just the two of us. I’ve never bothered going further with a “Member Trees” source but this evening I was curious. I clicked the link to find this not-particularly-informative page:

[Screenshot: the Ancestry Family Trees source page]

There was another link – to view the individual member trees. While I was clicking links I may as well go there too!

My final destination was a side by side comparison of the tree I was looking at and the source tree for that information – my own tree! I had come full circle.

I think there are two things to learn from this:

  1. For genealogy newbies – or not so newbies – this is an example of why you shouldn’t blindly take other trees’ agreement with your information as any sort of verification!
  2. With a bit of patience, it might be possible to make your way through those links and work out who the first person was to enter some nugget of information since copied around all the Ancestry trees. THAT’s the person you need to talk to about the source. You want to talk to them about the source a) to save time and money and b) because it could turn out to be a privately held document that you would never find online or in an archive.

Tuesday, December 3, 2013

Check your Google alerts!

What’s a Google alert? It’s an automatic notification that Google has found something new matching search terms of your choice. It’s very handy.

I have just discovered that for some time now many of my carefully crafted Google alerts have not been working. The reason is that I had used the + search operator. Until a year or two ago, + could be used to force Google to include a particular +word in the search result. No longer. Now you must do the same thing by putting quote marks around the “word”.

I read about it when the change happened, and have used it when searching, but it never occurred to me to check that my alert search terms still worked.

So, go and check your Google alerts!

Tuesday, July 5, 2011

How to make fabulous cousin connections on Ancestry

So you want to make fabulous cousin connections on Ancestry?

I’ve had great success in making fabulous cousin connections on Ancestry. My distant cousins have told me old family stories, sent me documents and photos, suggested places I might look and people I might ask for more information and they have remembered me and contacted me again later. I am holding off on contacting any more of my wonderful distant cousins in order to avoid information overload.

Since this happy situation doesn’t seem to be the case for everyone, I thought I’d share some of the things that have worked for me. Some of what I say may seem counterintuitive. Like… source citations don’t matter (actually they do, but not in the way you think). Bear with me!

Disclosure: I have no connection with Ancestry other than as a paying customer. I am not suggesting that you should, or should not, join Ancestry. While some of my suggestions are specific to Ancestry.com, others are applicable to any genealogy site where you can search family trees and contact the owner.

If you build it…

…they probably won’t come. Build it anyway. Your family tree, that is. Put it on Ancestry. This can be accomplished by exporting from your desktop software to GEDCOM. You will need a skeleton family tree (birth, death and marriage details) for your direct ancestors and at least a few generations of their descendants. Take the usual precautions about removing living people.  Source citations don’t matter, so don’t bother exporting them.

[Gasp! What did she just say?!]

The thing is, you’re putting your tree on Ancestry as a tool, not a publication. I found that my attempts to upload source citations to Ancestry mangled them unacceptably. They cluttered the place up, distracting from what I actually wanted and needed to see. They were hard going to maintain with Ancestry’s horrible source management interface. It wasn’t worth the effort.

If stepping out sans citations embarrasses you, set the tree to Private. You will still be able to do most of what I suggest, although it may assist in establishing a rapport with other members if they can see some of your information. You can change your mind later. I did.

Here are a few reasons for members to put their tree on Ancestry:

  • Other members may contact you. This hasn’t happened to me very often but it is possible.
  • It’s easier to fill out the search forms. As you start typing, Ancestry will offer you a drop down list of ancestors from your tree to choose from. Just click a name, and all the search fields are filled for you.
  • You can avail yourself of the member connect features (I’ll talk more about this).

I have quite a bit more to say, so I think I will break here and make this post a multi-part one.

Friday, June 25, 2010

Drawing on other disciplines

Think of your family as a disease. They infect web pages and database records.

Think of individually opening and inspecting all those web pages and database records as a gold standard diagnostic test. The test is great, but you can't check every single page and record, can you?

What you need is a screening test to pick out the most likely candidates. You can administer the test via a search engine or the database search tools. Your job is to come up with the most appropriate screening test. One that identifies as many cases of the disease as possible, without overloading you with pages that have similar but unrelated symptoms (or family names!).

In medicine, the ability of a screening test to pick up all true cases is called its sensitivity, and its ability to correctly rule out everyone who doesn't have the disease is its specificity.
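Written out as formulas, with a search in mind: a true positive (TP) is a relevant page your search returns, a false negative (FN) is a relevant page it misses, a false positive (FP) is an irrelevant page it returns, and a true negative (TN) is an irrelevant page it correctly leaves out. These are the standard definitions; only the mapping to web pages and records is my own.

```latex
\text{sensitivity} = \frac{TP}{TP + FN}
\qquad
\text{specificity} = \frac{TN}{TN + FP}
```

A search that finds every relevant page has a sensitivity of 1; a search that returns no irrelevant pages has a specificity of 1.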

So how can this help us when we are looking for records relating to our family?

Ideally, a medical screening test should be both sensitive and specific; that is, it should pick up all true cases, and not too much else. You don't want anyone with the disease to go undiagnosed, but you don't want to subject healthy people to diagnostic tests that may be uncomfortable, embarrassing, or even dangerous. Unfortunately, in a screening test there is always a trade-off between sensitivity and specificity. Otherwise it wouldn't be a screening test, it would be the diagnosis!

The same ideas apply when coming up with search terms to find your family. It's useful to remember that trade-off, and to think about the result you want to achieve.


I started thinking along these lines as I was testing out search terms for use in google alerts last week. Having blogged about a find I made while doing so, I was asked, in the chat session for a course I'm doing, how I was setting up my searches. As I blathered on (I fear I did blather) part of my mind was thinking how much easier it would be to properly describe what I was doing, if only I could talk about what I was hoping to achieve in terms of the sensitivity and specificity of my search results.

Now I can.

So... I was trying out google searches for suitability as google alerts. If you are going to create a screening test, you need to know WHY you are screening. What do you hope to find? What are the repercussions if you don't identify every single case?

My reason for creating the alerts is to see new material about my family, even though I may not be actively researching that part of the tree. The impact of missing relevant pages through the alerts is low. I'm likely to come across them when I get around to researching that part of the family. I don't want to wade through irrelevant results each day, and I don't mind creating lots and lots and lots of alerts.

In other words, I'm more concerned about the specificity of my search results than the sensitivity. It would be different if I was looking for my family in a census. I would be much more concerned about finding every relevant record, and might have to sacrifice specificity in order to pick up spelling and transcription variations. Fortunately, in genealogy, unlike medicine, you can usually run lots of population screening tests!


How does this help?

Knowing why you are doing a search makes all the difference to knowing what search terms will get you a good result. I knew that from my starting point of a search on "couper", I wanted a big increase in specificity but didn't mind if I lost sensitivity.

Google these days seems to search on word variations for you, even without using "~" in front of a word. If I had wanted to increase the sensitivity of my test I could ask google to search on multiple names using | between words, which acts as "or". That is, "Couper|Cooper|Coupar|Cowper" will give me 100 million plus results, compared to 5.8 million with "Couper" alone.

There are plenty of ways I can increase specificity of the search. Any additional information on the family might do the trick. The more unique to the family, and the more likely to be mentioned when discussing the family, the better.

Some ideas are place names, street addresses, family members' first names, occupations, employers and year ranges (use ".." for the date range that should appear on the page, eg "1850..1935"). All these things can be used to increase the specificity of the search. "Genealogy" or "~genealogy" will also help narrow the results down.

When I searched on Couper and the place name Oakleigh, about 1 million results were returned, with a relevant result that I would want to appear in alerts on the front page. I also see that google has added in variations for "couper", giving me an unwanted increase in sensitivity. Suddenly, Oakleigh looks like the place to buy and sell "Mini Cooper S coupe"s, which isn't my interest at all!

"+Couper Oakleigh" looks better, and "+Couper Oakleigh ~genealogy" even better again. Just a pity that so many of the results are me, one way or another!

So far I have only set up half a dozen alerts, but google will allow me 1000.

Do you draw on ideas from other fields? 


Note: In case you are wondering, I don't have any medical qualifications, but I do have a post-graduate qualification in Public Health.

Friday, February 19, 2010

Searching Government Gazettes

I've recently been playing with Government Gazettes on the State Library of Victoria (SLV) website, here. The gazettes included are the New South Wales Government Gazette (1836-1851), Port Phillip Government Gazette (1843-1851) and the Victoria Government Gazette (1851-1997).

The State Library website provides a search facility for an index to the gazettes, and the relevant gazette pages can be viewed and downloaded in pdf format. It's all free of charge. However, people's names are not necessarily indexed. The help pages say:
You can find many details about individuals in the Gazette. Sometimes people’s names are listed in the index, but very often they are not. So if you are looking for information about a person it is useful to know something about them first. For example, that they won a government tender, or were appointed to a government position. With this information you can search using keywords related to the tender or position.
It is possible to search these gazettes by name (or any other term you want). To do so, just add the search term site:gazette.slv.vic.gov.au to a Google search. It appears that Google have not only picked up the pdf files, they've also run OCR over them which seems to have worked very well, so they are searchable.

For example:
  • A search on the name "Couper" through the site index came up with eleven results. None of these looked like my family.
  • A Google search over the gazettes on the name "Couper" came up with 277 results. Some may have been my family, but I wasn't in the mood for looking through that many results. A further search on the street name my Couper ancestor lived in netted ten results. Two of these related to my great-great-aunt, who was listed as a registered midwife. I had believed her to be a midwife, anecdotally, but had only ever seen her described as a nurse in other records. This was a nice find.
I'll definitely be playing with this some more...
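If you'd rather not retype the site restriction each time, a small script can build the search link for you. This is a minimal Python sketch under my own assumptions: site_search_url is a made-up helper name, and it relies only on Google's standard search address with the query passed in the q parameter.

```python
from urllib.parse import urlencode


def site_search_url(terms, site):
    """Build a Google search link limited to one website (hypothetical helper).

    Adds the site: restriction to the search terms, then URL-encodes
    the whole query into the 'q' parameter of the search address.
    """
    query = f"{terms} site:{site}"
    return "https://www.google.com/search?" + urlencode({"q": query})


print(site_search_url("Couper", "gazette.slv.vic.gov.au"))
# https://www.google.com/search?q=Couper+site%3Agazette.slv.vic.gov.au
```

The same helper would cover the other sites mentioned on this blog, for example site_search_url("Avoca history", "catalogue.nla.gov.au").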

If you give it a try and find something useful, please come back and comment!  

My intention when I started this blog was to write and post up little pieces of the family story. While that's still my intention, the act of trying to write some of the stories up has shown me how much more work there is to be done! As a result this blog contains bits and pieces of whatever I happen upon that I find useful or interesting. I have previously written about Google searches I didn't expect to be able to do here and here.

Thursday, January 14, 2010

Searching the library catalogue via google

This may be old news, but it's new to me. I was doing a google search today on some work-related subject, when one of the search results caught my inner genealogist's eye. Or not so much the result itself, but the site it came from.

http://catalogue.nla.gov.au/   -   The catalogue of the National Library of Australia (NLA).

I tried searching google for the names of a few reasonably obscure Australian titles, and sure enough an NLA result came up on the first page. Interesting! When I clicked into a result it looked like the standard NLA result page, except that it also contained a big blue box explaining what the catalogue was, services offered by the library, and how to get back to google.

I've mentioned before that the NLA newspaper site is searchable via google, but I didn't realise that the entire catalogue was.

Time to experiment
I tried searching on the words "Avoca history" in the NLA catalogue and came up with 38 results. The google search "Avoca history site:catalogue.nla.gov.au" came up with 478 results. Looking down the list of results, I soon saw a likely reason for the difference in numbers. The google search was reading the entire catalogue page, including headings such as "search history", not just the record results.

I tried the NLA catalogue again without the word "history". This time I had 236 results. Not enough to account for the difference. The gap widened when I tried google on just "Avoca". 2,810 results. Hmmm.

Another scan of the google results, and I could see that they not only included catalogue record pages, but also catalogue search pages, record comment pages, and possibly others attached to the record itself. Difference explained, I think.
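If you were to save a list of result links, sorting them by page type would show how much of the google count is really catalogue records. The Python sketch below is hypothetical throughout: the function names are mine, and the URL path patterns ("/Record/", "/Comments", "/Search") are guesses for illustration, not documented NLA rules, so check them against real links before relying on the tally.

```python
from collections import Counter
from urllib.parse import urlparse


def classify_catalogue_url(url):
    """Sort a result link into a rough page type.

    ASSUMPTION: these path patterns are illustrative guesses,
    not the catalogue's documented URL structure.
    """
    path = urlparse(url).path.lower()
    if "/record/" in path and path.rstrip("/").endswith("/comments"):
        return "comments"
    if "/record/" in path:
        return "record"
    if "/search" in path:
        return "search"
    return "other"


def count_page_types(urls):
    """Tally how many result links fall into each page type."""
    return Counter(classify_catalogue_url(u) for u in urls)


results = [  # hypothetical result links, for illustration only
    "http://catalogue.nla.gov.au/Record/1234567",
    "http://catalogue.nla.gov.au/Record/1234567/Comments",
    "http://catalogue.nla.gov.au/Search/Home?lookfor=Avoca",
]
print(count_page_types(results))
```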

I find the NLA's own search results more useful than the google results. They offer all sorts of relevant filtering and sorting options, and they don't repeat the real content (the item record) across several pages the way google's results do. The one time I think I would want to use the google results is to get at the cached pages if the NLA site were unavailable for some reason!

But still, I thought it an interesting discovery. It's nice to think that a relevant NLA catalogue entry could appear in the results for someone who would never have thought to look there otherwise, when doing a google search.

Saturday, October 17, 2009

Getting more from my newspaper archive searches

I've been experimenting with Google's News Archive Search, which we were recently reminded of by Randy Seaver. I had this post in mind before I read his article, I swear!

Although I'll be talking about the application of the Google News Archive Search to the National Library of Australia's (NLA's) newspaper archive site, I imagine my comments would be applicable to many of the newspaper archives indexed by Google.

I noticed that the NLA newspaper archive is picked up by the Google search. I was interested to see how the site's own search and the Google search would compare. It's possible to make a comparison by adding an appropriate site restriction to the Google search term, e.g. site:.nla.gov.au

I compared the search results I got from:

  1. the NLA site's own search (http://newspapers.nla.gov.au/) , and
  2. the Google news archive search (http://news.google.com/archivesearch), limited to the NLA archive.

The first search I tried was the surname STANNUS. It's the surname I usually use for experimenting with new databases. It's common enough that I get hits, but rare enough that the number of hits doesn't overwhelm me. Also I have some idea of how most of the people returned (outside of the USA) connect to my tree, which is nice.

Running the search on the NLA site, I got 259 hits. The Google News Archive Search, limited to site:.nla.gov.au, gave me 61 hits.

This was about what I expected. The NLA seems to have added a lot of newspapers lately and it looked as though the Google indexing had not yet picked up the additional newspapers or changes to the archives (the NLA OCR results are user editable). I could see that Google had picked up older edits to the NLA archives, because a few I had made several months ago came up in the Google search.

Then I noticed something interesting in the Google results. The OCR of the newspaper text had split the word STANNUS into STAN and NUS. Google picked it up as a hit; the NLA site didn't. (The Stannus referred to turns out to be my GG Uncle.)
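Since the site's own search missed the split word, one option is to search for the likely splits deliberately. Here is a minimal Python sketch (the function name ocr_split_variants and the two-letter minimum are my own choices) that lists every way a surname could be broken in two, each wrapped in quotation marks as a phrase search. I'm assuming quoted phrases are treated as phrase searches on whichever site you use them, which is worth checking.

```python
def ocr_split_variants(surname, min_len=2):
    """List every two-piece split of a surname as a quoted phrase.

    OCR sometimes reads one word as two (STANNUS came out as STAN NUS),
    so each split is a candidate phrase search. Pieces shorter than
    min_len letters are skipped (an arbitrary cut-off of my own).
    """
    word = surname.upper()
    variants = []
    # i is where the space falls; both halves keep at least min_len letters
    for i in range(min_len, len(word) - min_len + 1):
        variants.append(f'"{word[:i]} {word[i:]}"')
    return variants


print(ocr_split_variants("Stannus"))
# ['"ST ANNUS"', '"STA NNUS"', '"STAN NUS"', '"STANN US"']
```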

Further experimentation with a search on "Couper, Oakleigh, butcher" gave 151 results on the NLA site, the first of those being the story about the death of Leslie Couper Miller. There were more hits - 453 - on Google. That was a surprise. I could see that Google had also included hits for "Coupe" and "Coupar". It's a pity that there's no easy way (correct me if I'm wrong!) to find out what set of words Google searched on. I didn't see any Coopers or Cowpers in the Google results. When I tried a search on Couper|Coupar|Coupe I still got 453 results (the "|" works as OR in the search term). I don't think those other common name variations were included.

If I forced Google to look for exactly the search terms given (adding a + in front of each or putting quotation marks around each word works here) I found only 31 results. They did not include the article about young Leslie's death.
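The two styles of query I compared above are easy to generate from a list of words. This Python sketch is just string assembly under my own naming (or_query and exact_query are hypothetical helpers); it can't tell you which variants Google actually added, which is the part I couldn't see.

```python
def or_query(variants):
    """Join spelling variants with '|' so any one of them can match."""
    return "|".join(variants)


def exact_query(words):
    """Wrap each word in quotation marks so it must match exactly."""
    return " ".join(f'"{w}"' for w in words)


print(or_query(["Couper", "Coupar", "Coupe"]))
# Couper|Coupar|Coupe
print(exact_query(["Couper", "Oakleigh", "butcher"]))
# "Couper" "Oakleigh" "butcher"
```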

All this will change my NLA newspapers search strategy, if only slightly. I think that I will definitely still use the NLA (or other archive site) first, thanks to the better coverage and finer options available. I will then follow up with a search via Google, as it might pick up some name variations, or OCR errors, that I hadn't thought of.
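To make the "site first, then Google" follow-up less tedious, you could compare the two sets of results and only look at what's new. This is a hypothetical Python sketch: it assumes you've already noted an identifier for each article (say, the article number from its link) from both searches, and the identifiers below are made up.

```python
def compare_hits(site_hits, google_hits):
    """Split two lists of article identifiers into overlap and differences.

    The identifiers are whatever you recorded for each article
    (hypothetical: e.g. the article number in its link). Duplicates are ignored.
    """
    site, google = set(site_hits), set(google_hits)
    return {
        "both": sorted(site & google),         # found by both searches
        "site_only": sorted(site - google),    # only the archive's own search
        "google_only": sorted(google - site),  # candidates to follow up
    }


# Made-up identifiers, for illustration only
result = compare_hits(["101", "102", "103"], ["102", "103", "104"])
print(result["google_only"])  # ['104']
```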

If you find this interesting or (especially) if it helps you with your searches, please leave a comment!