Hi everyone,
I recently watched a talk by Aza Raskin where he discusses the idea of using generative models to explore communication with whales. While the conversation wasn’t deeply technical, it sparked a line of thought I’ve been quietly exploring: can we use generative AI to better understand, replicate, and eventually "translate" the communicative structures in animal vocalisations?
I work primarily in conservation and ecology, and I’ve applied several predictive models (CNNs, LSTMs, and some transformer-based architectures with transfer learning). However, I’ve never trained a generative model myself. Still, I’ve been sketching a conceptual pipeline that could link bioacoustics and behavioural ecology.
The idea, roughly, is: start with a large, unlabelled dataset of animal vocalisations (e.g. bird songs, whale calls, primate vocalisations), then use unsupervised learning (e.g. embeddings + UMAP + HDBSCAN, or even k-means, I'm not sure yet) to group vocalisations by structure or spectral similarity.
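The clustering step above can be sketched in a few lines. This is a minimal, illustrative version on synthetic data, with scikit-learn stand-ins: PCA in place of a learned audio embedding, k-means in place of UMAP + HDBSCAN, and fake "calls" instead of real spectral features.

```python
# Minimal sketch of the unsupervised clustering step, assuming
# pre-computed per-call feature vectors (e.g. averaged log-mel frames).
# Stand-ins: PCA for a learned embedding, k-means for UMAP + HDBSCAN.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Synthetic "calls": three spectral prototypes plus noise (stand-in for real data).
prototypes = rng.normal(size=(3, 64))
calls = np.vstack([p + 0.1 * rng.normal(size=(50, 64)) for p in prototypes])

# 1. Normalise and embed into a low-dimensional space (UMAP in practice).
emb = PCA(n_components=8).fit_transform(StandardScaler().fit_transform(calls))

# 2. Cluster by structural similarity (HDBSCAN in practice).
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(emb)

# Each cluster is a candidate "call type" to inspect against field notes.
print(len(set(labels)))  # 3 candidate call types
```

In practice the interesting choices are all in step 1: a self-supervised audio embedding will separate call types far better than raw spectral averages.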
Then, based on field knowledge or ethological observations, manually label some clusters with possible communicative functions (e.g. alarm, contact, courtship), but only where they make sense. Use these labelled clusters to fine-tune a generative model (e.g. SpecGAN, AudioLDM, or an autoregressive model like WaveNet or VALL-E) to synthesise sounds conditioned on function.
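The labelling step amounts to building a conditioning dataset: keep only the calls whose cluster has a plausible function, paired with that function as the condition. A tiny sketch, where the cluster ids, file names, and label assignments are all invented for illustration:

```python
# Turn ethologist-labelled clusters into (audio, condition) pairs for
# fine-tuning. All ids, file names, and labels here are illustrative.
cluster_ids = [0, 1, 2, 1, 0, 2, 3]           # one cluster id per call, from the clustering step
function_labels = {0: "alarm", 1: "contact"}  # ethologist-assigned; clusters 2 and 3 stay unlabelled

dataset = [
    (f"call_{i}.wav", function_labels[c])
    for i, c in enumerate(cluster_ids)
    if c in function_labels  # keep only clusters with a plausible function
]
print(dataset)  # four (file, function) pairs
```

Clusters without a defensible ethological interpretation are simply dropped rather than force-labelled, which keeps the conditioning signal honest.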
Finally, explore whether this could help us simulate, or even engage in, meaningful communicative loops, perhaps as a tool for playback experiments or for probing animal perception in controlled field or ex situ settings.
This is still a very early-stage idea, more a sketch than a plan, but I'm curious whether anyone in this community has tried something similar. Not necessarily with whales; any species or sound system would be relevant.
25 July 2025 1:32pm
Hi, Jorge,
There was a milestone study a while ago that used AI to show that elephants have specific sounds that function as individual names, which they use to call each other. I think that could give you a starting point and some researchers to reach out to. Animal translation is a great idea worth pursuing.
I think you will also appreciate what is going on at the Interspecies Internet.
Good luck!
Interspecies Internet
African elephants address one another with individually specific name-like calls | Nature Ecology & Evolution
Personal names are a universal feature of human language, yet few analogues exist in other species. While dolphins and parrots address conspecifics by imitating the calls of the addressee, human names are not imitations of the sounds typically made by the named individual. Labelling objects or individuals without relying on imitation of the sounds made by the referent radically expands the expressive power of language. Thus, if non-imitative name analogues were found in other species, this could have important implications for our understanding of language evolution. Here we present evidence that wild African elephants address one another with individually specific calls, probably without relying on imitation of the receiver. We used machine learning to demonstrate that the receiver of a call could be predicted from the call’s acoustic structure, regardless of how similar the call was to the receiver’s vocalizations. Moreover, elephants differentially responded to playbacks of calls originally addressed to them relative to calls addressed to a different individual. Our findings offer evidence for individual addressing of conspecifics in elephants. They further suggest that, unlike other non-human animals, elephants probably do not rely on imitation of the receiver’s calls to address one another. Machine learning analyses and playback experiments in wild African elephants suggest that individuals address conspecifics with name-like calls that do not rely on imitation of the receiver.
25 July 2025 2:30pm
Hi Jorge,
I think you'll find this research interesting: https://blog.google/technology/ai/dolphingemma/
Google's researchers did exactly that: they trained an LLM on dolphin vocalisations to produce continuations, much like the autoregressive models you mentioned (WaveNet, VALL-E).
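The core mechanic of that kind of model, continuing a vocalisation stream token by token, can be shown with a toy stand-in. Everything here is invented for illustration: the tokenised "calls" are single letters and the model is a greedy bigram table, not anything resembling DolphinGemma's actual architecture.

```python
# Toy sketch of autoregressive continuation over discretised call tokens.
# The token stream and bigram model are illustrative stand-ins only.
from collections import Counter, defaultdict

sequence = list("ABABABC")  # tokenised vocalisation stream (invented)

# Count which token tends to follow which.
bigrams = defaultdict(Counter)
for a, b in zip(sequence, sequence[1:]):
    bigrams[a][b] += 1

def continue_from(token, steps):
    """Greedily extend a sequence by picking the most frequent next token."""
    out = [token]
    for _ in range(steps):
        nxt = bigrams[out[-1]].most_common(1)[0][0]
        out.append(nxt)
    return "".join(out)

print(continue_from("A", 4))  # "ABABA"
```

A real system would sample from a learned distribution over a large token vocabulary rather than take the greedy argmax, but the interaction loop (context in, continuation out) is the same.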
I think they plan to test it in the field this summer and see if it will produce any interesting interaction.
Looking forward to seeing what they'll find :)
Also, two more cool organizations working on AI-based understanding of animal communication:
28 July 2025 3:51pm
This is a really fascinating concept. I’ve been thinking about similar overlaps between AI and animal communication, especially for conservation applications. Definitely interested in seeing where this kind of work goes.
30 July 2025 9:46am
This is such a compelling direction, especially the idea of linking unsupervised vocalisation clustering to generative models for controlled playback. I haven’t seen much done with SpecGAN or AudioLDM in this space yet, but the potential is huge. Definitely curious how the field might adopt this for species beyond whales. Following this thread closely!
Maristela Martins de Camargo