discussion / Acoustics / 21 July 2025

A technical and conceptual curiosity... Could generative AI help us simulate or decode animal communication?

Hi everyone,

I recently watched a talk by Aza Raskin where he discusses the idea of using generative models to explore communication with whales. While the conversation wasn’t deeply technical, it sparked a line of thought I’ve been quietly exploring: can we use generative AI to better understand, replicate, and eventually "translate" the communicative structures in animal vocalisations?
I work primarily in conservation and ecology, and I've applied several predictive models (CNNs, LSTMs, and some transformer-based architectures with transfer learning), but I've never trained a generative model myself. Even so, I've been sketching a conceptual pipeline that could link bioacoustics and behavioural ecology.
The idea is roughly this: start with a large, unlabelled dataset of animal vocalisations (e.g. bird songs, whale calls, primate vocalisations), then use unsupervised learning (e.g. embeddings + UMAP + HDBSCAN, or even plain k-means) to group vocalisations by structure or spectral similarity.
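To make that clustering step concrete, here is a minimal sketch of what I'm imagining (the file paths are placeholders, and the log-mel mean/std embedding is just the simplest thing that could work; a learned embedding such as BirdNET or VGGish features would likely separate call types better):

```python
# Minimal sketch: cluster vocalisation clips by spectral similarity.
# Assumes a directory of short WAV clips; the feature choice is
# illustrative, not prescriptive.
import glob
import numpy as np
import librosa   # pip install librosa
import umap      # pip install umap-learn
import hdbscan   # pip install hdbscan

def embed(path, sr=22050, n_mels=64):
    """Crude fixed-length embedding: mean + std of each log-mel band."""
    y, _ = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    logmel = librosa.power_to_db(mel)
    return np.concatenate([logmel.mean(axis=1), logmel.std(axis=1)])

files = sorted(glob.glob("clips/*.wav"))  # hypothetical dataset
X = np.stack([embed(f) for f in files])

# Reduce to a low-dimensional manifold, then density-cluster.
Z = umap.UMAP(n_neighbors=15, min_dist=0.1).fit_transform(X)
labels = hdbscan.HDBSCAN(min_cluster_size=10).fit_predict(Z)
print(f"{labels.max() + 1} clusters, {np.sum(labels == -1)} noise points")
```

HDBSCAN marks outliers as -1 rather than forcing them into a cluster, which seems useful here since a lot of field recordings will be noise or overlapping calls.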
Next, based on field knowledge or ethological observations, manually label some clusters with plausible communicative functions (e.g. alarm, contact, courtship), but only where the assignment genuinely makes sense. Then use these labelled clusters to fine-tune a generative model (SpecGAN, AudioLDM, or even an autoregressive model like WaveNet or VALL-E) to create synthetic sounds conditioned on function.
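I haven't built this part, but conceptually the conditioning interface could look something like the toy sketch below. The model is a stand-in, not SpecGAN or AudioLDM, and the reconstruction loss is only there to make the example self-contained; a real pipeline would use an adversarial or diffusion objective:

```python
# Toy sketch of class-conditional generation: a function label
# (alarm / contact / courtship) is embedded and concatenated with
# a latent noise vector to condition the generated spectrogram.
import torch
import torch.nn as nn

N_FUNCTIONS = 3          # assumed labels: alarm, contact, courtship
N_MELS, N_FRAMES = 64, 128

class ToyConditionalGenerator(nn.Module):
    def __init__(self, latent_dim=32):
        super().__init__()
        self.label_emb = nn.Embedding(N_FUNCTIONS, latent_dim)
        self.net = nn.Sequential(
            nn.Linear(latent_dim * 2, 512), nn.ReLU(),
            nn.Linear(512, N_MELS * N_FRAMES),
        )

    def forward(self, z, labels):
        h = torch.cat([z, self.label_emb(labels)], dim=-1)
        return self.net(h).view(-1, N_MELS, N_FRAMES)

gen = ToyConditionalGenerator()
opt = torch.optim.Adam(gen.parameters(), lr=1e-4)

# One reconstruction-style update on a (spectrogram, label) batch.
spec = torch.randn(8, N_MELS, N_FRAMES)     # placeholder cluster data
labels = torch.randint(0, N_FUNCTIONS, (8,))
z = torch.randn(8, 32)
loss = nn.functional.mse_loss(gen(z, labels), spec)
opt.zero_grad(); loss.backward(); opt.step()
```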
Finally, explore whether this can help us simulate, or even engage in, meaningful communicative loops, perhaps as a tool for playback experiments or for probing animal perception in a controlled field or ex situ setting.
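For the playback side, the experimental loop itself could be as simple as this sketch (the stimulus file is hypothetical, and field hardware, calibration, and permits are a separate conversation entirely):

```python
# Hedged sketch of a playback trial loop: play a synthesised call,
# record a fixed response window, and save it for later scoring.
import time
import sounddevice as sd   # pip install sounddevice
import soundfile as sf     # pip install soundfile

stimulus, sr = sf.read("synthetic_contact_call.wav")  # hypothetical file
RESPONSE_SECONDS = 10
N_TRIALS = 5

for trial in range(N_TRIALS):
    sd.play(stimulus, sr)
    sd.wait()  # block until playback finishes
    response = sd.rec(int(RESPONSE_SECONDS * sr), samplerate=sr, channels=1)
    sd.wait()  # block until recording finishes
    sf.write(f"response_trial_{trial}.wav", response, sr)
    time.sleep(60)  # inter-trial interval to limit habituation
```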
This is still a very early-stage idea, more of a sketch than a plan, but I'm curious whether anyone in this community has tried something similar. Not necessarily for whales; any species or sound system would be relevant.




Hi, Jorge,

There was a milestone study a while ago that used AI to show that elephants use individually specific calls, effectively names, to address one another. That could give you a starting point and some researchers to reach out to. Animal translation is a great idea worth pursuing.

I think you will also appreciate what is going on at the Interspecies Internet.

Good luck!

 

Hi Jorge, 

 

I think you'll find this research interesting: https://blog.google/technology/ai/dolphingemma/

Google's researchers did essentially that. They trained an LLM-style model on dolphin vocalizations to predict continuations of a call sequence, much like the autoregressive models you mentioned, WaveNet and VALL-E.

I think they plan to test it in the field this summer to see whether it produces any interesting interactions.

Looking forward to seeing what they'll find :)

Also, here are two more cool organizations working on decoding animal communication with AI:

https://www.projectceti.org/

https://www.earthspecies.org/

This is a really fascinating concept. I’ve been thinking about similar overlaps between AI and animal communication, especially for conservation applications. Definitely interested in seeing where this kind of work goes.

This is such a compelling direction, especially the idea of linking unsupervised vocalisation clustering to generative models for controlled playback. I haven’t seen much done with SpecGAN or AudioLDM in this space yet, but the potential is huge. Definitely curious how the field might adopt this for species beyond whales. Following this thread closely!