Group

Data management and processing tools / Feed

Conservation tech work doesn't stop after data is collected in the field. Equally important to success is navigating data management and processing tools. For the many community members who deal with enormous datasets, this group will be an invaluable resource to trade advice, discuss workflows and tools, and share what works for you.

discussion

Camera Trap Data Visualization Open Question

Hi there, I would like to get some feedback and insight into how practitioners manage and visualize their camera trap data. We realized that there already exist many web-based...


Hey Ed! 

Great to see you here and thanks a lot for your thorough answer.
We will be checking out Trapper for sure - cc @Jeremy_! A standardized data exchange format like Camtrap DP makes a lot of sense, and we have it in mind as we build the first prototypes.
 

Our main requirements are the following:

  • Integrate with the camtrap ecosystem (via standardized data formats; see the sketch after this list)
  • Make it easy to run for non-technical users (most likely an Electron application that works across OSes)
  • Make it useful for exploring camtrap data and generating reports
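
For illustration, here is a minimal sketch of reading such a standardized package with pandas, assuming a Camtrap DP style directory containing deployments.csv and observations.csv; the column names follow the Camtrap DP spec but should be verified against the actual package version.

```python
# Minimal sketch: summarise a Camtrap DP style package for reporting.
# Assumes a directory containing deployments.csv and observations.csv as in the
# Camtrap DP spec; verify column names against your package version.
from pathlib import Path

import pandas as pd

def summarise_camtrap_dp(package_dir: str) -> pd.DataFrame:
    """Return observation counts per deployment and species."""
    package = Path(package_dir)
    deployments = pd.read_csv(package / "deployments.csv")
    observations = pd.read_csv(package / "observations.csv")

    # Keep only animal observations (Camtrap DP also records blanks, humans, etc.).
    animals = observations[observations["observationType"] == "animal"]

    return (
        animals.groupby(["deploymentID", "scientificName"])
        .size()
        .reset_index(name="n_observations")
        .merge(deployments[["deploymentID", "latitude", "longitude"]], on="deploymentID")
    )

# Example: summarise_camtrap_dp("my-camtrap-dp-package/")
```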

 

In the first prototyping stage, it is useful for us to keep it lean while keeping the interface (the data exchange format) in mind, so that we can move fast.


Regards,
Arthur

Quick question on this topic, to take advantage of those who already know a lot about it. Once you have extracted all your camera data and are going through the AI object detection phase that identifies the animal types, which file format, containing all of the time + location + label data for the photos, do most people consider the most useful? I imagine it's whatever format is used by the most expressive visualization software around. Is this correct?

A quick look at the Trapper format suggested to me that it's metadata from the camera traps, used to then perform the AI matching phase. But it was a quick look, so maybe it's something else? Is the Trapper format also for holding the labelled results? (I might actually be asking the same question as the person who started this thread, just in different words.)

Another question. Right now pretty much all camera traps trigger on either PIR sensors or small AI models. Small AI models tend to have the limitation that they only accurately detect and recognise animal types at close distances, where the animal appears very large, and I have question marks over whether small models, even in these circumstances, avoid making a lot of classification errors (I expect that they do make them, and that they are simply sorted out back at the office, so to speak). PIR sensors would typically only see animals within, say, 6-10 m. Maybe an elephant could be detected a bit further, and small animals only even closer.

But what about when camera traps can reliably see and recognise objects across a whole field, perhaps hundreds of meters?

Then, in principle, you don't have to deploy as many traps for a start. But I would expect you would need a different approach to how you report and visualize this, as the coordinates of the trap itself are not going to give you much information. We would potentially be in a situation of having much more accurate and rich biodiversity information.

Maybe it's even possible to determine, to a greater degree of accuracy, where several different animals in the same camera trap image are spatially located, by knowing the 3D layout of what the camera can see and the location and size of each animal.

I expect that current camera trap data formats may fall short of being able to express that information in a sufficiently useful way, considering the extra information available in principle; it could be multiple coordinates per species for each image that need to be registered.
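
As a purely illustrative sketch of what such a richer record might look like (the field names and the flat-earth projection below are hypothetical assumptions, not part of Camtrap DP, TRAPPER, or any existing standard):

```python
# Hypothetical per-animal detection record for long-range camera traps.
# Field names and the flat-earth projection are illustrative assumptions only;
# real deployments would need proper calibration and geodesy.
import math
from dataclasses import dataclass

@dataclass
class AnimalDetection:
    image_id: str
    species: str
    confidence: float
    bbox: tuple          # (x_min, y_min, x_max, y_max) in pixels
    est_range_m: float   # estimated distance from the camera
    est_lat: float       # estimated ground position of this individual
    est_lon: float

def project_detection(cam_lat, cam_lon, cam_heading_deg, fov_deg,
                      image_width_px, bbox, est_range_m):
    """Rough ground position from camera pose, pixel position and a range estimate."""
    x_center = (bbox[0] + bbox[2]) / 2
    # Bearing offset of the detection relative to the camera's optical axis.
    offset_deg = (x_center / image_width_px - 0.5) * fov_deg
    bearing = math.radians(cam_heading_deg + offset_deg)

    # Flat-earth approximation, adequate over a few hundred metres.
    d_north = est_range_m * math.cos(bearing)
    d_east = est_range_m * math.sin(bearing)
    lat = cam_lat + d_north / 111_320.0
    lon = cam_lon + d_east / (111_320.0 * math.cos(math.radians(cam_lat)))
    return lat, lon
```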

I'm likely going to be confronted with this soon, as the systems I build use state-of-the-art models with large parameter counts that can recognise species over much greater distances. In a recent discussion here, I showed detection of a polar bear at a distance of 130-150 m.

Right now I would say it's unknown how much more information about species we will be able to gather with this approach, as images were not being triggered in this manner until now. Maybe it's far more than we would expect? We have no idea right now.

discussion

Machine learning for bird pollination syndromes

I am a PhD student working on bird pollination syndromes in South Africa, looking specifically at urbanization's effect on sunbirds and sugarbirds. By drawing from a large...


Hi @craigg, my background is machine learning and deep neural networks, and I'm also actively involved with developing global geospatial ecological models, which I believe could be very useful for your PhD studies.  

First of all, to your direct challenges: I think there are many different approaches, which could serve your interests to varying degrees.

As one idea that came up, I think it will be possible in the coming months, through a collaboration, to "fine-tune" a general-purpose "foundation model" for ecology that I'm developing with University of Florida and Stanford University researchers. More here.

You may also find the 1+ million plant trait inferences, searchable by native plant habitats at Ecodash.ai, to be useful. A collaborator at Stanford is actually from South Africa, and I was just about to send him this, e.g. https://ecodash.ai/geo/za/06/johannesburg

I'm happy to chat about this, just reach out!  I think there could also be a big publication in Nature (or something nice) by mid-2025, with dozens of researchers demonstrating a large number of applications of the general AI techniques I linked to above.

discussion

Free/open-source app for field data collection

Hi all, I know something similar was asked a year ago but I'd like some advice on free applications or software for collecting data in the field on an Android device (for eventual...


Thanks! Essentially, field technicians, students, researchers etc. go out into the field, find one of our study groups, and from early in the morning until evening record the behaviour of individual animals at short intervals (e.g., individual traits like age-sex class and ID, what the animal is doing, how many conspecifics it has within a certain radius, and what kind of food the animal is eating if it happens to be foraging). Right now things work well in our system, but we are using an app that is somewhat expensive, so we want to move towards open source.
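
For readers thinking about how such interval records map onto a data structure, here is a purely illustrative sketch in Python; the field names are hypothetical and would need to match the actual protocol (an ODK/KoboToolbox form would define the same fields as survey questions).

```python
# Illustrative sketch of one interval record from a focal/scan sampling protocol.
# Field names are hypothetical and would be defined by the study protocol.
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class FocalRecord:
    timestamp: datetime
    observer: str
    group_id: str
    animal_id: str
    age_sex_class: str               # e.g. "adult female"
    behaviour: str                   # e.g. "foraging", "resting"
    neighbours_within_radius: int    # conspecifics within the chosen radius
    food_item: Optional[str] = None  # only recorded when foraging
```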

Link

Accessing the Global Register of Introduced and Invasive Species (GRIIS)

Signposts to different ways to access the new/updated GRIIS (through GBIF or open-access country lists) - as part of the IAS Toolkit for GBF target 6

discussion

Which LLMs are most valuable for coding/debugging?

Hello! I'm curious to get folks' impressions on the most useful LLMs for helping with data analysis/debugging code? I've been using chatgpt, which is at times helpful and at times...


When it comes to coding and debugging, several large language models (LLMs) stand out. Here are a few of the most useful for these tasks:

1. OpenAI's Codex: This model is specifically trained for programming tasks, making it excellent for generating code snippets, suggesting improvements, and even debugging existing code. It powers tools like GitHub Copilot, which developers find immensely helpful.

2. Google's PaLM: Known for its versatility, PaLM excels in understanding complex queries, making it suitable for coding-related tasks as well. Its ability to generate and refine code snippets is particularly useful for developers.

3. Meta's LLaMA: This model is designed to be adaptable and can be fine-tuned for specific coding tasks. Its open-source nature allows developers to customize it according to their needs, making it a flexible option for coding and debugging.

4. Mistral: Another emerging model that shows promise in various tasks, including programming. It’s being recognized for its capabilities in generating and understanding code.

These LLMs are gaining traction not just for their coding capabilities but also for their potential to streamline the debugging process, saving developers time and effort. If you want to dive deeper into the features and strengths of these models, you can check out the full article here: Best Open Source Large Language Models (LLMs)

discussion

Video evidence for the evaluation of behavioral state predictions

Hi all, glad to share two of our contributions to the current e-obs newsletter in the context of the evaluation of behavioral state predictions and the mapping of...


Currently, the main focus is visual footage as we don't render audio data in the same way as we do for acceleration (also: the highly different frequencies can be hard to show sensibly side by side).


But in this sense, yes, the new module features 'quick adjust knobs' for time shifts: you can roll over a timestamp and use a combination of shift/control and the mouse wheel to adjust the offset of the video by 1/10/60 seconds, or simply enter the target timestamp manually down to the millisecond level. This work can then also be saved in a custom mapping file to continue synchronisation work later on.

 

No, not yet. The player we attached does support slower/faster replay up to a certain precision, but I'm not sure that this will be sufficiently precise for the kind of offsets we are talking about. Adding an option on the frontend to adjust this is quite easy, but understanding the impact of this on internal timestamp handling will add a level of complexity that we need to experiment with first. 

As you said, for a reliable estimate on this kind of drift we need at least 2 distinct synchronized markers with sufficient distance to each other, e.g. a precise start timestamp and some recognizable point event later on.

I fully agree that providing an easy-to-use solution makes perfect sense. We'll definitely look into this.
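
As a rough illustration of the drift estimate described above (two synchronized markers yielding an offset plus a drift rate), and not a description of Firetail's internal handling:

```python
# Sketch: estimate a video clock offset and linear drift from two synchronized
# markers, then map video timestamps onto the reference clock. Illustrative
# only; not how Firetail handles this internally.
def fit_offset_and_drift(video_t0, ref_t0, video_t1, ref_t1):
    """Each marker is a (video time, reference time) pair in seconds."""
    drift = (ref_t1 - ref_t0) / (video_t1 - video_t0)  # reference seconds per video second
    offset = ref_t0 - drift * video_t0
    return offset, drift

def to_reference_time(video_t, offset, drift):
    return offset + drift * video_t

# Example: markers at video 0 s / ref 12.0 s and video 3600 s / ref 3612.9 s
offset, drift = fit_offset_and_drift(0.0, 12.0, 3600.0, 3612.9)
# to_reference_time(1800.0, offset, drift) -> ~1812.45 s
```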

discussion

Firetail 13 - now available

Thanks to our wonderful user community and a lot of feedback, shared sample data and fruitful discussions, I am glad to announce that Firetail 13 is now available, featuring a...

discussion

Detecting Thrips and Larger Insects Together

Hello everyone,I’m reaching out to discuss a challenge we’re tackling here in Brazil related to pest monitoring in agriculture. Thrips (Frankliniella spp., Thrips spp...

Hi Kim, the yellow sticky paper I have today is around 10 cm by 30 cm. I took a picture of the whole paper with a really good cellphone and the resolution was not good enough (3072 × 4080 in Ultra HDR). This gave me about 10 pixels/mm, but I could not get a precise enough YOLO model at this resolution for the thrips... I will play around with the cellphone a bit more and see if 2 or 4 pictures are enough. We even made a mount for the cellphone so it is always at the same distance (but if I could avoid this in the future for practical reasons, that would be fine too). With the digital microscope we used, we got over 50 pixels per mm and so got quite a good model for identifying them (but it is time consuming); sometimes dust also shows up, and with the phone camera you can't differentiate thrips and dust ;)))) Let's see if I can edit my post to include some cellphone images.

Yeah, I would expect that you might need higher resolution if the critters are very small. It still might just be a lens choice, but I'm not up on this amount of lens difference, so I don't know how hard it would be.

So, I updated the text a bit with images cropped at 100% zoom :) We are already happy with the time reductions we got, but... we would like to get at least a 90% time reduction instead of 70% :))) We know that with a very expensive, high-powered camera we could probably do it, so one approach we are thinking of is just taking a closer macro picture with a cellphone of, let's say, 1/3 or 1/4 of the sticky paper and using that data instead of everything... or taking 2-3 pictures (but we don't want to waste time stitching the images together).
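
For planning, the resolution arithmetic discussed in this thread can be sketched as follows; the numbers come from the posts above, and the assumption that a single frame covers the full 300 mm length of the trap is illustrative.

```python
# Sketch of the resolution arithmetic: how many tiles of the 10 cm x 30 cm
# sticky trap a 3072x4080 phone image must cover to approach the ~50 px/mm
# that worked with the microscope. Coverage assumptions are illustrative.
import math

def pixels_per_mm(image_px_long, covered_mm_long):
    return image_px_long / covered_mm_long

def tiles_needed(trap_long_mm, target_px_per_mm, image_px_long):
    """Tiles along the long axis if each shot covers only part of the trap."""
    mm_per_tile = image_px_long / target_px_per_mm
    return math.ceil(trap_long_mm / mm_per_tile)

print(pixels_per_mm(4080, 300))     # whole 300 mm trap in one frame: ~13.6 px/mm
print(tiles_needed(300, 50, 4080))  # ~4 tiles along the length for ~50 px/mm
```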

discussion

Mirror images - to annotate or not?

We are pushing hard to start training our custom model for the PolarBearWatchdog! soon. This includes lots of dataset curation and annotation work. A question that has come up is...


I made a few rotation experiments with MD5b.

Here is the original image (1152x2048):

When saving this as a copy in Photoshop, the confidence on the mirror image changes slightly:

and when just cropping to a 1152x1152 square it changes quite a bit:

The mirror image confidence drops below my chosen threshold of 0.2 but the non-mirrored image now gets a confidence boost.

Something must be going on with overall scaling under the hood in MD as the targets here have the exact same number of pixels. 

I tried resizing to 640x640:


This bumped the mirror image confidence back over 0.2... but lowered the non-mirrored confidence a bit... huh!?

My original hypothesis was that the confidence could be somewhat swapped just by turning the image upside down (180 degree rotation):

Here is the 1152x1152 crop rotated 180 degrees:

The mirror part now got a higher confidence, but it is interpreted as a sub-part of a larger organism. The non-mirrored polar bear had a drop in confidence.

So my hypothesis was somewhat confirmed...

This leads me to believe that MD is not trained on many upside-down animals...

 

Seems like we should include some rotations in our image augmentations as the real world can be seen a bit tilted - as this cropped corner view from our fisheye at the zoo shows.
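
A minimal sketch of what such rotation augmentations could look like, using torchvision as one example library; the parameter values are illustrative, and for detection training the bounding boxes would need to be transformed along with the image (e.g. with an augmentation library that supports box targets).

```python
# Minimal sketch: add small random rotations (and flips) to the training
# augmentations so the model also sees slightly tilted animals, as in the
# fisheye corner crops. Library choice and parameter values are illustrative.
from torchvision import transforms

train_augmentations = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(degrees=15),   # small tilts, not full upside-down
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
])
```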

discussion

Conservation Data Strategist?

Hello everyone – long time lurker, first time poster...I’m the founder of a recently funded tech startup working on an exciting venture that bridges consumer technology and...


Great resources being shared! Darwin Core is a commonly used bio-data standard as well.  

For bioacoustic data, there are some metadata standards (GUANO is used by pretty much all the terrestrial ARU manufacturers). Some use Tethys as well.

Recordings are typically captured as .WAV files, but many people store them as .flac (a type of lossless compression) to save on space.

For ethics, acoustic data platforms with a public-facing component (e.g., Arbimon, WildTrax, etc.) will usually mask presence/absence geographical data for species listed on the IUCN Red List, CITES, etc., so that you're not giving away the location of a species to someone who would use it, for example, to go hunt them.
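
To make the standards mentioned above a little more concrete, here is a minimal Darwin Core style occurrence record as a plain dictionary; the term names are standard Darwin Core, while the values and the coordinate-generalisation step are illustrative.

```python
# Minimal Darwin Core style occurrence record; the term names are standard
# Darwin Core, the values and the coordinate generalisation are illustrative.
occurrence = {
    "occurrenceID": "urn:example:aru-042:2024-05-01T03:12:00Z",
    "basisOfRecord": "MachineObservation",
    "scientificName": "Strix varia",
    "eventDate": "2024-05-01T03:12:00Z",
    "decimalLatitude": 45.4215,
    "decimalLongitude": -75.6972,
    "coordinateUncertaintyInMeters": 100,
}

def generalise_coordinates(record, decimals=1):
    """Coarsen coordinates for sensitive species before public release."""
    masked = dict(record)
    masked["decimalLatitude"] = round(masked["decimalLatitude"], decimals)
    masked["decimalLongitude"] = round(masked["decimalLongitude"], decimals)
    masked["coordinateUncertaintyInMeters"] = 11_000  # roughly 0.1 degree
    return masked
```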

 

Hello, I am experienced in conservation data strategy. If you want to have a conversation you can reach me at SustainNorth@gmail.com.

discussion

unsupervised machine learning to infer syntax and temporal organisations of animal vocalizations

I wish to share an update on my MSc thesis project, which contributes to the field of decoding animal communication. In my work I conducted a factorial experiment to...


Super interesting work! Maybe one day you will also be able to have a career as a science fiction writer. Lots of interesting outcomes can come of this. ❤️

discussion

Need advice on data for an app that recommends plants.

This is going to show how little I know, but I'm doing my best! Basically I could use help understanding what I should do from a data standpoint in order to enable a...


It's a really good question, Colleen!

Ideally, I would work with two developers, or at least one developer and one other party who knows what developing an app like this would mean. This will lower the risk that the developer answers your question too much to their own advantage. So this is one reason why it is a good question, and maybe this is why you ask it. However, if you really trust your developer not to take advantage, then go with just them.

It's also a good question because there is no single easiest way to go about this if you're on a budget. You mention that you do not have a tech background, so here comes some explaining. If I misunderstood, then please skip ahead and continue at "Back to 'the easiest way to go about this'".

There are at least two things you should be really aware of when it comes to software development.

The first is that once the basic data structure is defined and the software is built around it, it is extremely costly to change the data structure. It's like deciding, after the car has been built, that the engine should go in the back of the car instead of the front.

The second is that the basic data structure depends on (among other things, but I'd say these are the two most important factors) the complexity of what needs to be achieved and the speed at which it needs to be done.

The difficulty is that the required complexity and speed may change over time, which brings one back to the car-and-engine situation. Changes in complexity and speed requirements may be the result of many things, one of which is success. You get far more clients than anticipated, so the system needs to be scaled up. In addition, with more users come more feature requests (this can work both ways: new features result in more users, and more users may result in more feature requests).

There is no real solution to this problem (well, except not growing beyond the point that the first design can handle). When it comes to scaling up, one vendor may claim that their database backbone easily scales up. Maybe so - but it may come at a price, and they may also underestimate your and their own future needs. When it comes to changes in complexity, additional features can probably be added at the beginning without changing the basic data structure. Maybe an additional row of seats at the back of the car, a trailer hook, bigger lamps, a roof rack, a trailer, suitcases on the rack. At some point the car will need a new and bigger engine to carry all those add-ons and keep up the same speed.

Here is a prediction: the more you stress cheap and efficient at the beginning, the bigger these problems will be later on. But when the business is successful, there will be more money to invest in scaling up and redesigning. Obviously yes, but in terms of the car metaphor, you may find that you want the car to keep running with all its added-on features while the engine is replaced and moved to the back. It may be possible, but perhaps out of reach of your patience and the increased income.

 

Back to 'the easiest way to go about this':

Invest a little effort to find out not only what the minimum requirements for the MVP are, but also what else you or your clients may want in the future. The developer should then keep these future requirements (and future upscaling) in mind when they start developing for the minimum ones. This means developing a somewhat more generic data structure than what is strictly necessary for the MVP. This will cost some more at the beginning but should save a lot later on. I'm writing 'should', not 'will', on purpose. It's a balancing act, because taking too much into account risks over-engineering for a future that may not happen, or that may develop differently than expected. Like I said, there is no easiest way out.
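
As a purely illustrative example of what 'a somewhat more generic data structure' could mean in practice (the entities and field names below are hypothetical, not a recommendation for this specific app):

```python
# Hypothetical sketch: a slightly more generic model than "a plant table with a
# fixed set of columns", so new attributes and data sources can be added later
# without restructuring. Entities and field names are illustrative only.
from dataclasses import dataclass, field

@dataclass
class Source:
    source_id: str
    name: str
    url: str                 # API endpoint or scraped page

@dataclass
class Plant:
    plant_id: str
    scientific_name: str
    common_names: list = field(default_factory=list)
    # Open-ended attributes (sun, soil, hardiness zone, bloom time, ...),
    # keyed by attribute name, each with a value and the source it came from.
    attributes: dict = field(default_factory=dict)

def add_attribute(plant: Plant, name: str, value, source_id: str) -> None:
    plant.attributes[name] = {"value": value, "source": source_id}
```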

 

A few more detailed comments

If there is no API for a source, try to work around that source at the beginning if possible. APIs are made with some long-term stability in mind; websites and web pages less so, or not at all, and they will require more monitoring and maintenance of the web scraping routines.

The developer may also not be aware of the pre-work that needs to be done when that pre-work depends on biological knowledge: transforming the data from your data sources into data that allows easy (and fast) calculation of results for the users.
 
