discussion / AI for Conservation  / 1 August 2024

Beginner Bird Classification Project Question

I'm currently working on a bird image classification project for two birds, the warbling vireo and the house wren. I have pulled four months of entries for both of them through pyinaturalist, a client for the iNaturalist API. The accumulated data - including a photo URL - is saved in a combined CSV file. However, I am hoping to use the photos themselves for my model, not the URL. After searching for some guidance, I feel a bit overwhelmed and unsure of where to go from here. Should I be downloading all of the photos? Any tips would be appreciated. Glad to have this platform to ask questions! 




Hi Ashley,

What is your ultimate goal in this project? Do you just want to gather a labeled training dataset, or do you want to use existing labeled data from iNaturalist to train your own ML model? Do you have unlabeled data you're hoping to run the model on? There are a number of existing camera trap software and stats packages, so you might not have to reinvent the wheel. But this depends on what the end goal of your project is.

I always recommend folks take a look at Dan Morris' ( @dmorris ) ML & camera-trap resource page here - tons of amazing info!

-Carly

I second Carly's question about having a really concrete idea of what your end goal is, i.e. what you want your model to do that the iNaturalist and/or Merlin apps don't do. If the answer is "I just want to learn about training image classifiers and I like birds", that's also a great answer. The data you want to download will also depend a lot on how your images will be collected: a human taking pictures of birds, for example, suggests a very different training dataset than a video camera running 24/7 pointed at a bird feeder.

But... to answer your question, yes, I would definitely download a whole bunch of images even while you're figuring out exactly what you want to do. Downloading can take some calendar time, and it's a huge pain when you really get some flow going and then realize you have to pause for a day to download stuff. I would basically download all the images of your target species that you can conveniently fit on your hard drive; there's no reason to limit yourself to the last few months.
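For what it's worth, here is a rough sketch of the download step using only the Python standard library. It assumes your CSV has columns named `id`, `species`, and `photo_url`; those names are guesses on my part, so rename them to match your file. The `.jpg` extension and the `fetch` parameter (there so you can swap in your own downloader) are also just choices for this sketch, not anything pyinaturalist requires.

```python
import csv
import urllib.request
from pathlib import Path


def fetch_bytes(url, timeout=30):
    """Download one URL and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def download_photos(csv_path, out_dir, url_col="photo_url",
                    label_col="species", id_col="id", fetch=fetch_bytes):
    """Save each photo as out_dir/<label>/<id>.jpg and return the saved paths.

    The column names and the .jpg extension are assumptions for this sketch.
    """
    out_dir = Path(out_dir)
    saved = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            url = (row.get(url_col) or "").strip()
            if not url:
                continue  # observation has no photo URL
            # One folder per species, e.g. images/house_wren/
            label = row[label_col].strip().lower().replace(" ", "_")
            target = out_dir / label / f"{row[id_col]}.jpg"
            if target.exists():
                saved.append(target)
                continue  # already downloaded, so reruns can resume
            target.parent.mkdir(parents=True, exist_ok=True)
            try:
                data = fetch(url)
            except OSError as err:  # urllib's network errors subclass OSError
                print(f"skipping {url}: {err}")
                continue
            target.write_bytes(data)
            saved.append(target)
    return saved
```

You would call it with something like `download_photos("observations.csv", "images")`, where both names are placeholders. Sorting files into one folder per species is handy because most image classification tutorials expect that layout. If you download a lot of images, it is polite to add a short pause between requests.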

That said, assuming that those aren't the only two species you will ever see in your camera, the background categories are just as important.  You may be able to lump everything else into an "other" category, but if you have five or six non-target species that are really common, I would recommend creating categories for those, plus categories for the birds that are likely to appear that are visually similar to your target species, even if they're a little less common.  So for example, if you have a moderate number of warbling vireos and house wrens, but your camera will see tons of robins, tons of crows, and the occasional marsh wren (or anything that looks like a house wren), all of those categories are equally important.  So you might want to download a data set that's equal parts warbling vireo, house wren, robin, crow, marsh wren, and [everything else].  If you're only ever going to take pictures of your two target species and you just want to tell them apart, you don't need all those other classes.
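To make the "equal parts" idea concrete, here is a minimal sketch of one way to build that kind of balanced set once you have labels for your downloaded images. The function name, the `(label, item)` input format, and the fixed per-class count are all assumptions for illustration; anything not in your list of named classes gets lumped into "other".

```python
import random
from collections import defaultdict


def balanced_sample(records, named_classes, per_class, seed=0):
    """Draw up to per_class items from each named class plus an "other" bucket.

    records is an iterable of (label, item) pairs, e.g. (species, image path).
    """
    groups = defaultdict(list)
    for label, item in records:
        # Labels outside the named classes are pooled into "other"
        key = label if label in named_classes else "other"
        groups[key].append(item)

    rng = random.Random(seed)  # fixed seed so the sample is repeatable
    sample = {}
    for key, items in groups.items():
        # Take everything if a class has fewer than per_class items
        sample[key] = rng.sample(items, min(per_class, len(items)))
    return sample
```

For example, `balanced_sample(records, {"warbling vireo", "house wren", "american robin"}, per_class=500)` would return a dict with up to 500 items for each of those three species and up to 500 for "other". The species names and the count here are placeholders, not recommendations.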

Hope that's some useful food for thought!