My own mini-scanfests

When you come back home after a productive research trip to an archive or library, do you often end up with a stack of photocopies?

Yes, me too.

I use my digital camera whenever I can but sometimes it just isn’t possible to take photos. Sometimes the repository doesn’t allow it, and other times the documents are folded up so well that it is just easier to get the experts to photocopy them. When I get home I tend to leave them for a while in the ‘filing’ pile, and the longer they stay there the harder it is to get around to dealing with them.

For me a major part of the post-research trip process is scanning the photocopies. A piece of paper is no good to me if it fades or gets tea spilled on it, or the laser toner sticks to something other than the paper, or it goes up in a bushfire.

To address the post-research filing issue I bought one of those multi-function printers. It prints in colour and black-and-white, it scans, it photocopies, and it faxes. It’s a marvel of modern technology. When I chose it I made sure of two things -

  1. it prints and scans both sides of the paper (duplex)
  2. it has a document feeder

The duplex requirement is fairly self-explanatory. The document feeder means I can put a stack of pages in the top, press some buttons to tell it to scan to my laptop, and away it goes. All I have to do is press the OK button on the laptop, and then I can get on with something else. If both sides of the page need to be scanned I can select that option and the pages are scanned in the correct order.

Of course, at some stage I have to rename the files to something more meaningful than SCAN0001.jpg or whatever I’ve chosen as the default, but I can do that later, and sitting down.
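
If you are comfortable with a little Python, the renaming can be done in bulk rather than one file at a time. What follows is only a rough sketch, not something my scanner software provides: the function name `rename_scans`, the `SCAN*.jpg` pattern and the `<label>_p01.jpg` naming scheme are all my own assumptions, so adjust them to whatever your scanner produces and however you like to name your files.

```python
from pathlib import Path


def rename_scans(folder, label, pattern="SCAN*.jpg", dry_run=True):
    """Rename scanner output such as SCAN0001.jpg to '<label>_p01.jpg'.

    Returns a list of (old_name, new_name) pairs. With dry_run=True
    nothing on disk is changed, so the plan can be checked first.
    """
    folder = Path(folder)
    # Sort by name so the pages keep the order the feeder scanned them in.
    scans = sorted(folder.glob(pattern))
    plan = []
    for page, old_path in enumerate(scans, start=1):
        new_path = old_path.with_name(f"{label}_p{page:02d}{old_path.suffix}")
        # Refuse to overwrite a file that is already there.
        if new_path.exists():
            raise FileExistsError(f"{new_path.name} already exists")
        plan.append((old_path, new_path))
    if not dry_run:
        for old_path, new_path in plan:
            old_path.rename(new_path)
    return [(old.name, new.name) for old, new in plan]


# Hypothetical usage: the folder and label below are made up.
# print(rename_scans("scans", "surname_recordtype"))    # preview only
# rename_scans("scans", "surname_recordtype", dry_run=False)
```

Running it first with the default `dry_run=True` only lists what would be renamed, which seems the safer habit when the scans are the only copies.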

My scanner is not much bigger than A4, so A3 photocopies are a problem. There are a couple of solutions – perhaps you have others?

  1. scan each half at a time, making two images that can then be joined together (or not!) in your photo software
  2. photocopy the A3 at a library or somewhere with a big photocopier, reducing it to A4, and then scan the A4 photocopy. Yes, some quality is lost, but it takes much less time and is more likely to result in a useable scan than option 1, which I rarely get around to doing.

Another important part of the process is to write the citation on the photocopy before scanning it, if I haven’t already done it at the time of the photocopying. If I’ve requested copies at State Records NSW I pay for them before I leave and so this labelling must be done at home, preferably the same day while the file is still fresh in my mind.

Then there’s the analysing, data entry, filing into my family binders, and all of the other tasks that give meaning to whatever I’ve found, but that’s another story.

What do you do with your photocopies when you get them home?

Sharing documents on the web

I’ve been playing with a couple of sites that allow you to share documents. Initially I had to find a way to share Powerpoint slides on a blog, and my solution was to use Slideshare, a free website that allows you to share Powerpoint slides.

Slideshare is simple to use and works well. You can upload presentations quickly and easily, and make them either public or restricted. For restricted access you are given a URL that you then share with those you wish to see the presentation. Viewers can leave comments, although if your presentation is public some of these may be spam, a common hazard.

The winner, though, is Scribd.

My Scribd profile

With Scribd I can share other kinds of documents, not just Powerpoint, so I can keep the slides and the handouts together. PDFs, Word, Excel: so far I haven’t found a format I can’t upload, although I admit I haven’t tried very hard. It does what I need.

Scribd upload

As you can see, you can import Google Docs and even create one from scratch by typing or cut-and-pasting into the text box. I haven’t tried either of these yet. I can see why sharing a Google Doc here would be easier for the people I know who inexplicably have trouble with Google Docs, particularly if you just want them to see it and not update it.

Others share documents, academic papers, even whole books on Scribd, and you can download the documents and follow the uploaders to see what else they come up with. You can also add documents of interest to collections so you can more easily find them again later, without having to download them.

You can also upload documents that you want to sell. I may do this in the future.

Have a look at Scribd and let me know what you think.

Picasa face-recognition scan conclusions

Picasa face recognition

I have posted previously about letting Picasa 3 scan for faces so I can identify them. I had hoped to publish the results at the time but I was caught up with other things and didn’t get a chance.

Unfortunately I don’t have an accurate record of how long it took. I started it on about the 1st of October with 14,000 photos to process. On the 4th it was 50% completed, by which time I had added another 5000 photos by including some of the folders under Documents. On the 5th it was saying all day that it had 51% to go. Then that evening it changed to 52%. I thought it was going to take another week, but the next day it was finished.

That’s 5-6 days. For 19,000 photos.

It ran for 24 hours a day, and I only closed it down occasionally when it was slowing down what I was doing. It used an average of 45% of my CPU, so sometimes this was a problem. I don’t remember the processor that my laptop has, but it’s a bit over 2 years old.
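
To put those figures in perspective, here is the rough arithmetic as a few lines of Python. It is only a back-of-the-envelope sketch: the function name `scan_rate` is my own, and it assumes Picasa really did run around the clock, which it didn't quite, so the true hourly rate while it was running would have been a little higher.

```python
def scan_rate(photos, days):
    """Return (photos per day, photos per hour) for a scan of the given length."""
    per_day = photos / days
    per_hour = per_day / 24  # assumes it ran 24 hours a day without a break
    return per_day, per_hour


# 19,000 photos over somewhere between 5 and 6 days.
for days in (5, 6):
    per_day, per_hour = scan_rate(19_000, days)
    print(f"{days} days: about {per_day:,.0f} photos a day, {per_hour:.0f} an hour")
```

So somewhere between about 3,200 and 3,800 photos a day, or a bit over two a minute.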

Of course, not all of these photos have people in them – there are landscapes, wildlife, and images of documents.

Some things I have noticed:

  • if I sign in to Google it can get the names from my contacts list
  • it runs very slowly at some times and quite quickly at others
  • it picks up faces from the covers of books and photos on the wall behind the real people
  • it can find faces in very fuzzy pictures
  • it is not bothered by hats and sunglasses
  • it quite often suggests the wrong person but that person is closely related, such as a sister, aunt or grandmother
  • it identifies people more accurately the more photos you have identified
  • it can identify people at all ages in their lives
  • it is better at identifying babies than I am
  • it doesn’t recognise cats, dogs or gorillas, although it did identify one front-on picture of a dog
  • I have a lot of duplicate photos, and when I identify one it suggests the same name for the others very quickly
  • I am terrible at remembering names
  • I nearly have more photos of my nieces than I have of my husband or myself

By the time it finished it said it still had about 6500 faces to identify. I am slowly whittling those down. I now have just over 5000. There are also the faces it can’t identify as faces, which I have to do manually if I want it done at all.

It seems to have trouble with faces if they are:

  • at an angle
  • partly covered by hair on one side
  • side-on unless they are completely from the side
  • really, really fuzzy

And yet sometimes it sees a face where there isn’t one. I thought this one must be in the background somewhere.

Panda face

He looks like he has a little beard and a receding hairline.

This is the photo it came from:

Picasa panda

Can you see the face, in the top right corner? Not a face at all!

It also picks up the hundreds of faces in the backgrounds of photos and wants to know who they are. You can mark each one as ignored, and you can see these later if you want to. When the Sydney Harbour Bridge was 75 years old they opened it to the public to walk across, and the photos from that day have many people in the background. Fortunately they are mostly wearing lime green hats so I could quickly exclude them when I saw them.

All the people in a wedding photo can be identified if you have already identified them elsewhere. Even if you don’t know their names you can give them a number, like Wedding 12, and group photos of the same person together. You can then more easily identify the person, or a relative can, when you can see a number of photos of the same person together.

I have had a wonderful time with Picasa, and I still am. I am finally learning, through having to identify photos, which of my grandmother’s three sisters is which, and what my mother’s older brothers looked like when they were young.

I have also very much enjoyed seeing pictures of the same person throughout their lives all in the one place. Here are some of my grandmother Amy Eason nee Stewart:

Amy Millicent Eason nee Stewart

You can see her from the earliest photo of her that I have, when she was a baby; as a teenager, a young mother, and so on all through her life. The photos are of varying quality but the only one I had to manually identify was the blurry side-on one in the 3rd row.

A valuable lesson I learned was in trying to identify what it is that makes this person look like that person. What is it in my face that Picasa mistakes for my grandmother’s? Or two of three nieces but not the third?

To be fair, sometimes Picasa is totally wrong. It tried to tell me that this same grandmother was in a shot of my husband posing with the Wests Tigers rugby league team. She wasn’t. When it ‘groups’ unnamed faces it tends to put faces together that are shot at the same angle. Sometimes I think it is suggesting names based on the frequency with which that name appears, or on the previously identified name, but that might just be my cynicism.

All in all I am so glad I went through this exercise. Identifying faces has become my procrastination-of-choice, and it has made me much more likely to name the faces in photos I have just taken rather than leave it for years until I can no longer remember the names. I am also determined to research the names I should know but can’t remember – school classmates, fellow safari tourists, even Wests Tigers. All those unnamed faces bother me!