It started in 2017 when an international group of scientists spent a year together at Harvard University in Cambridge, Massachusetts, at the Radcliffe Fellowship, a program that promises “an opportunity to step away from usual routines.” One day, Shafi Goldwasser, a computer scientist and cryptography expert also from Israel, came by the office of David Gruber, a marine biologist at City University of New York. Goldwasser, who had just been named the new director of the Simons Institute for the Theory of Computing at the University of California, Berkeley, had heard a series of clicking sounds that reminded her of the noise a faulty electronic circuit makes—or of Morse code. That’s how sperm whales talk to each other, Gruber told her. “I said, ‘Maybe we should do a project where we are translating the whale sounds into something that we as humans can understand,’” Goldwasser recounts. “I really said it as an afterthought. I never thought he was going to take me seriously.”
But the fellowship was an opportunity to take far-out ideas seriously. At a dinner party, they presented the idea to Bronstein, who was following recent advancements in natural language processing (NLP), a branch of AI that deals with the automated analysis of written and spoken language, so far only the human kind. Bronstein was convinced that the codas, as the brief sperm whale utterances are called, have a structure that lends itself to this kind of analysis. Fortunately, Gruber knew a biologist named Shane Gero who had been recording a lot of sperm whale codas in the waters around the Caribbean island of Dominica since 2005. Bronstein applied some machine-learning algorithms to the data. “They seemed to be working very well, at least with some relatively simple tasks,” he says. But this was no more than a proof of concept. For a deeper analysis, the algorithms needed more context and more data: millions of whale codas.