Holly Herndon on the power of machine learning and developing her “digital twin” Holly+
In the latest episode of The FADER Interview podcast, Jordan Darville speaks with Holly Herndon about her wild new project.

The FADER Interview is a brand new podcast series in which the world’s most exciting musicians talk with the staff of The FADER about their latest projects. We’ll hear from emerging pop artists on the verge of mainstream breakthroughs, underground rappers pushing boundaries, and icons from across the world who laid the foundations for today’s thriving scenes. Listen to this week’s episode of the podcast below, read a full transcript of this week’s episode after the jump, and subscribe to The FADER Interview wherever you listen to podcasts.

Holly Herndon is an experimental composer based in Berlin, and her latest project isn't an album but something with much deeper implications for music as a whole. It's called Holly+, and it's Herndon's deepfake AI "digital twin." The tool transforms any audio a user uploads into a new sound using Herndon's voice. It's free, and anyone can visit the website to experiment with it. If you don't know what a deepfake is, you've probably encountered one on the internet without realizing it. A deepfake is an artificial replication of a person. To create a deepfake, programmers feed hours of material of the person they want to copy into sophisticated computer programs. In Holly Herndon's case, she worked with her partner, Matt Dryhurst, as well as Chris Deaner and Yotam Mann of Never Before Heard Sounds, feeding her isolated vocals into the program.

Holly+ sets itself apart from other AI models in the complexity of the sounds it's able to work with and produce, as well as on a conceptual level. Holly+ will be overseen by a decentralized autonomous organization, otherwise known as a DAO: a group of people selected by Herndon and Dryhurst that will officially license out Holly+ to approved artists, giving Herndon more control over her deepfake likeness and what it's used to create. Perhaps some of this sounds strange, even scary, but as Herndon writes in her press release for Holly+, "vocal deepfakes are here to stay." You may have caught the controversy last week over the upcoming Anthony Bourdain documentary, Roadrunner, in which director Morgan Neville used a deepfake to recreate Bourdain's voice for at least one portion of the film.

Herndon believes technology like Holly+, if managed properly, can empower artists and give them control over their likeness in an era when frauds are only going to get harder to spot. One day after Holly+ went live, The FADER's Jordan Darville spoke with Herndon about her intentions in creating the tool, the controversy surrounding NFTs, and who she hopes will give Holly+ a spin.
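For readers curious about the plumbing, here is a rough idea of what a voice-transfer pipeline like the one Herndon describes can look like in code. This is a minimal, hypothetical PyTorch sketch, not Holly+'s actual implementation (Never Before Heard Sounds has not published their system); the model is an untrained stand-in, and the file names are invented for illustration.

```python
# Hypothetical sketch of a timbre/voice-transfer pipeline: encode uploaded
# audio into learned features, then decode those features through a network
# that (in a real system) would have been trained on one singer's voice.
import torch
import torch.nn as nn
import torchaudio

class ToyVoiceTransfer(nn.Module):
    """Untrained stand-in: real systems condition a vocal synthesizer on
    features (pitch, loudness, timbre) extracted from the source audio."""
    def __init__(self, hidden: int = 64):
        super().__init__()
        # audio -> feature frames
        self.encoder = nn.Conv1d(1, hidden, kernel_size=512, stride=256)
        # feature frames -> resynthesized "voice" audio
        self.decoder = nn.ConvTranspose1d(hidden, 1, kernel_size=512, stride=256)

    def forward(self, wav: torch.Tensor) -> torch.Tensor:  # (batch, 1, samples)
        return self.decoder(torch.tanh(self.encoder(wav)))

def transfer(in_path: str, out_path: str, sr: int = 16_000) -> None:
    wav, orig_sr = torchaudio.load(in_path)            # (channels, samples)
    wav = torchaudio.functional.resample(wav, orig_sr, sr)
    wav = wav.mean(dim=0, keepdim=True).unsqueeze(0)   # mono mixdown, add batch dim
    model = ToyVoiceTransfer()                         # a trained checkpoint would load here
    with torch.no_grad():
        out = model(wav)
    torchaudio.save(out_path, out.squeeze(0), sr)

# transfer("input_song.wav", "holly_plus_style.wav")   # hypothetical file names
```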

The FADER: Holly Herndon, thank you for joining us today for The FADER Interview.

Holly Herndon: Thanks for having me.

So Holly+ has been live for about 24 hours. How have you felt about its reception so far?

Honestly, I've been really super pleased with it. I think at one point there were 10 hits to the server a second. So that means people were kind of going insane uploading stuff and that's basically what I wanted to happen. So I've been really, really happy with it. I also am happy with people kind of understanding that it's like, this is still kind of a nascent tech, so it's not a perfect rendition of my voice, but it's still, I think, a really interesting and powerful tool. And I think most people really got that. So I've been really pleased.

One of the things that drew me to Holly+ when I first read about it in a press release was that the technology seems to be developed specifically for this time. It's nascent and still growing, but it feels like an attempt to get in on the ground floor of something that's already happening in a lot of different sectors of technology.

I mean, I've been working with... I like to say machine learning rather than artificial intelligence, because I feel like artificial intelligence is just such a loaded term. People imagine something like Skynet, something sentient. So I'm going to use machine learning for this conversation. But I've been working with machine learning for several years now. I mean, on the last album that I made, PROTO, I was creating early models of my voice and also the voices of my ensemble members, trying to create a weird hybrid ensemble where people are singing with models of themselves. So it's been going on for a while, and of course machine learning has been around for decades, but there have been some really interesting breakthroughs in the last several years, which I think is why you see so much activity in this space now.

It's just so much more powerful now, I think, than it was a couple decades back. We had some really interesting style transfer white papers that were released. And so I think it's an exciting time to be involved with it. I was really wanting to release a public version of a similar technique to the one I was using on my album, something people could just play with and have fun with. And I was actually just reaching out to people on social media, and Yotam sent me back a version of one of my videos, but he had translated it into a kind of orchestral passage. And he was like, "I'm working on this exact thing right now." So it was perfect timing, and we linked up and started working on this specific Holly+ model.

Talk to me a little bit about some of those really powerful developments in machine learning that have informed Holly+.

Oh gosh. I mean, there's a whole history to go into. But a lot of the research that was happening previously used MIDI: basically trying to analyze MIDI scores to create automatic compositions in the style of an existing composer, or a combination of existing composers. And I found that to be not so interesting. I'm really interested in the sensual quality of audio itself, and I feel like so much is lost in a MIDI file. So much is lost in a score, even. It's the actual audio as material that I find really interesting. And so when some of the style transfers started to happen and we could start to deal with audio material, rather than a representation of audio through MIDI or through score material, that's when things got to be really interesting, I think.

So you could imagine if you could do a style transfer of any instrument onto any other instrument. The really unique musical decisions that one might make as a vocalist or as a trombonist or as a guitarist are very different depending on the kind of physical instrument that you're playing. And if you can translate that to another instrument that would never make those same kinds of decisions (and I'm giving the instrument sentience here, but you know what I mean), if you can translate the kinds of decisions a musician would make playing a specific instrument onto others, I find that a really interesting new way to make music and to find expression through sound generation. And I do think it is actually new for that reason.

I also wanted to talk a little bit about some of the ethical discussions around machine learning and some of the developments that have happened over the past year. Of course, the last time we spoke, it was about an AI-generated Travis Scott song, created from his music without his consent. And over the past year as well, a Korean company managed to develop an AI based on a deceased popular singer, and an entire reality show was created around that, called AI vs. Human. So I was wondering if these sorts of developments in this sphere informed how you approached Holly+ and the more managerial aspects of how you wanted to present it to the world.

This is something that I think about quite a lot. I think that voice models, or even physical likeness models or style emulation, open up a whole new set of questions for how we deal with IP. I mean, we've been able to reanimate our dead through moving pictures or through samples, but this is a brand new field in that you can have the person do something that they never did. It's not just replaying something that they've done in the past. You can reanimate them and give them entirely new phrases that they may not have approved of in their lifetime, or, for living artists, that they might not approve of. So I think it opens up a kind of Pandora's box.

And I think we're kind of already there. I mean, if you saw the Jukebox project, which was super impressive: they could start with a known song and then just continue it with new phrases and new passages, in a way that fit the original style. It's really powerful. And we see some of the really convincing Tom Cruise deepfakes and things. These are part of, I think, our new reality. So I wanted to jump in front of that a little bit. There are different ways that you could take it. You could try to be really protective over yourself and your likeness, and we could get into this kind of IP war where you're just doing takedowns all the time and trying to hyper-control what happens with your voice or with your likeness.

And I think that that is going to be a really difficult thing for most people to do, unless you have a team of lawyers, which I'm sure is probably already happening with people who do have teams of lawyers. But I think the more interesting way to do it is to open it up and let people play with it and have fun with it and experiment. But then, if people want an officially approved version of something, that would go through myself and my collective, which is represented through a DAO, and we can vote together on the stewardship of the voice and of the likeness. And I think it really goes back to really fundamental questions like: Who owns a voice? What does vocal sovereignty mean?

These are huge questions, because in a way a voice is inherently communal. I learned how to use my voice by mimicking the people around me, through language, through centuries of evolution on that, or even vocal styles. With a pop music vocal, you're often emulating something that came before and then performing your individuality through that communal voice. So I wanted to find a way to reflect that communal ownership, and that's why we decided to set up the DAO: to steward it as a community, essentially.

I saw on Twitter that Professor Aaron Wright described DAOs as "subreddits with bank accounts and governance that can encourage coordination rather than shitposting and mobs." So how did you choose the different stewards that make up the DAO?

That's a really good question, and it's an ongoing thing that's evolving. It's easy to say, "We figured out the DAO and it's all set up and ready to go." It's actually this thing that's in process, and we're working through the mechanics of it as we go. It's also something that's unfolding in real time in terms of the legal structures around it. I mean, Aaron, who you mentioned, was part of the OpenLaw team that passed legislation in Wyoming recently to allow DAOs to be legally recognized entities, kind of like an LLC, because there are all kinds of complications, really boring to most people probably, around if a group of people ingests funds, who is liable for tax, for X, Y, Z? So there are all kinds of regulatory frameworks that have to come together in order to make this a viable thing.

And Aaron's done a lot of the really heavy lifting on making some of that stuff come about. In terms of our specific DAO, we're starting it out with me and Matt. We started the project together, and we've also invited in our management team from RVNG, and also Chris and Yotam from Never Before Heard Sounds, who created the voice model with us. As well, we plan on having a gallery that we're working on with Zora. The idea is that people can make works with Holly+ and submit those works to the gallery. For the works that are approved, or selected, there's a split between the artist and the gallery, the gallery being actually the DAO. And then any artist who presents in that way will also be invited into the DAO. So it's ongoing. There will probably be other ways to onboard onto the DAO as we go, but we want to keep it really simple as we start and not put the cart before the horse.

Now, of course, Holly+ is free to use right now for anyone who wants to visit the website. I was hoping you could explain to me how the average listener or consumer of art can discern the difference between an official artwork that's been certified by the DAO versus something that was just uploaded to the website and then put into a track or a piece of art.

This is something we had to think about for a long time. It was like, "Do we want to ask people to ask for permission to use it in their tracks, to release on Spotify, or to upload?" And we came to the conclusion that we just wanted people to use it. It's not about trying to collect any kind of royalties in that way. I just want people to have fun with it and use it. So in terms of creating works and publishing them, it's completely free and open for anyone to use. We're treating it almost like a VST, a free VST, at this point. So you can use it on anything, and what you make with it is yours, and you can publish it. And that is 100% yours.

We do have this gallery that we're launching on Zora. That space is a little bit different in that you can propose a work to the DAO, and then the DAO votes on which works we want to include in the gallery. For those works, there would be a profit split between the DAO and the artists. And the funds that are ingested from that, if those works do sell, basically go back to producing more tools for Holly+. It's not about trying to make any kind of financial gain, really. It's about trying to continue the development of this project.

Do you have any idea of what those future tools could look like right now?

Well, I don't want to give too much away, but there will be future iterations. So there might be some real-time situations. There might be some plugin situations. There are all kinds of things that we're working on. I mean, right now, with this first version, Chris and Yotam have been able to figure out how to transfer polyphonic audio into a model, which is... I'm a major machine learning nerd, so for me, I'm like, "Oh my God, I can't believe you all figured that out." That's been such a difficult thing for people to figure out. Usually people are doing monophonic, just simple one-instrument monophonic lines. But you can just put in a full track and it will translate it back. And what you get back still does have that kind of scratchy machine learning, neural net sound to it.

I think because it has that quality, it's easier for me to just open it up and allow anyone to use it freely. I think as the tools evolve, and speech and a maybe more naturalistic likeness to my voice become possible, that opens up a whole new set of questions around how that IP should be treated. And I certainly don't have all of the answers. It's definitely something that I'm learning in public, doing and figuring out along the way. But I see this coming along the horizon, and I wanted to try to find, I don't know, cool and interesting and somehow fair ways to work this out along the way.

The fairness aspect of this model is something that I think is arguably one of its most important facets. And I think that given your background, you're uniquely placed to transmit this to the general public and to help them understand its implications and help them get over any sort of initial trepidation or feelings of ickiness that are associated with machine learning/AI and the culture of deepfakes as well. So while you were developing Holly+, was this something that was on your mind?

I think since I released PROTO, I've been kind of in the public square trying to talk about the possibilities and the cool aspects and the lame aspects and the problems that arise from machine learning. So it's not necessarily new territory for me because it's something I've been kind of dealing with for a while. But it is of course always this kind of dance that's happening where I think the stories that can capture a lot of media attention and a lot of the public imagination are ones of replacing musicians altogether, or this kind of Skynet vision, whatever.

And I think that those kinds of stories obfuscate what's actually really interesting here, and some new interesting possibilities. So I've been saying that for a while. Instead of just saying there are all these interesting new things we could do, I'm like, I'm just going to try to do one of those things and stumble my way through it, and see, also, what questions arise from just launching this project, and what I didn't think of. What kinds of problems arise that I didn't even consider? Because I'm sure there will be many along the way.

And speaking of headline-grabbing things that tend to skew the public's perception: Holly+'s implementation of NFTs, I think, will generate some discussion, because NFTs in and of themselves are quite controversial, with certain artists who see them as a playground for the wealthy and environmentalists who are concerned about the ecological impacts. While that discussion of NFTs and their ramifications really reached a fever pitch last year, what were your reactions to that criticism, and did they affect how you were approaching Holly+?

The whole rise of NFTs was really surreal to watch, because a lot of the community that I'm involved with and a lot of my friends have been minting NFTs for years in an experimental way. So it was really wild to see it hit the mainstream, and then what the reception of that was. I feel like what people think of as an NFT is a very small sliver of what an NFT can be and potentially will be. There's a lot to unpack there. So, the playground of the wealthy: I think the art world is very much the playground of the wealthy, so I'm not really sure about that criticism around collecting art. The environmental issue, I think, is a little bit more complicated than it's been portrayed online. I've been very vocal about some of what I think are false representations of the figures, specifically by some of the artists who have been posting about that.

I recorded an episode on my own podcast, Interdependence. It's at interdependence.fm. We invited Dr. Koomey on to talk about what he thinks are the ecological ramifications of blockchain. And he's not a blockchain stan or anything. The reason we invited him on is because at the advent of the internet, you can go back and find really similar moral panics happening, where you have Citibank releasing research documents saying the internet will boil the ocean if we don't stop the internet. And he was the main voice that jumped into that conversation at the time to try to figure out what the actual numbers are: how does technology actually scale, how does networked technology specifically scale? So I wanted to bring the temperature down, to use a really inappropriate pun there, on the conversation, and try to actually get down to what we know now, and how much of this is people really just being outraged about a new technology.

I think a lot of it is, how do I want to put this? Basically, COVID decimated most musicians' livelihoods for at least a year. I mean, it just wiped everything out. And then all of a sudden you have this new, seemingly very wealth-driven system come on the scene, and you see celebrities able to magic large sums of money out of nowhere. And that's super alienating. First of all, the technology is really complicated, so it's hard to get your head around. And then you see people just able to magic huge sums of money out of nowhere. And I think that there was just a huge disconnect between these star examples and how most artists could be experimenting with it, or how it could impact their art practice.

So I think a lot of that was just a failure of messaging on the Ethereum community's side, because I do think that there are a lot of really interesting potentials there. The album that I made before PROTO was called Platform, and one of the reasons I called it Platform is because I was really critical of the dynamics of platform capitalism that we were all creating work through. We're all essentially working for Facebook through Instagram and Twitter and not really reaping any of the benefits from that. And there is a really interesting counter-conversation to that design with some of these decentralized systems. A lot of that was really lost in the glitz and glamor of these really high-sale NFTs.

So I just want to take it back to the actual practical applications of Holly+, where it might go from here, and, I suppose, the exciting possibilities of that. You've spoken before about the composer John Chowning. You said that Chowning "invented and composed with FM synthesis in the late '60s. Nobody understood him or knew what to do with it for years. It defined the '80s once others heard what he heard." Did you prepare Holly+ thinking that maybe it would have a similar timeline, where it takes perhaps decades for the technology and what it's doing to be heard by the general public?

Well, those are really big boots to fill. John is a total hero to me, and an incredible composer and inventor and human being who has remained insanely curious and prolific throughout his long career. I don't know what kind of impact Holly+ might have, but I definitely see this on the horizon. I think we'll listen back to Holly+ and it'll be like, "Oh wow, that sounds like a 2021 neural net." I think it will be very timestamped in that way, and maybe even in a kind of quaint way. But I do think that these kinds of models will definitely be a part of a musician's workflow. And not even just a voice model; it could be any kind of instrument model or performance model.

I think that machine learning will be integrated into the electronic musician's studio in a very real way soon, and I think it's happening really fast. That's what's so exciting about it: a lot of this research happens in universities and in corporate settings, but there's also just DIY research happening, and it's happening at a really fast speed. I mean, who knows, maybe in a year there'll be a much more high-fidelity version of Holly+, and then that'll open up an entirely new set of questions around how I deal with that likeness, when it could potentially sound really, really like me. So I think that it's coming, and I think that it's not anything to be afraid of.

So obviously Holly+ is very accessible. I'm wondering if, in your opinion, the sphere of machine learning and the tools created around it are generally accessible to the public, or if more could be done to make them more accessible.

That's a good question, and it's somewhat complicated, because in some ways even private research institutions, corporate research institutions, are publishing technical papers, making the research public so that people can play with the code, and making all of that publicly available. But whether or not anyone can do anything with that information is another question. There certainly is a high barrier to entry if you want to create your own models, or if you want to open up the so-called black box and tinker.

I do think that there are some off-the-shelf tools that people are able to play with, and that maybe is a good place to get started. That was one reason why we wanted to make Holly+ so simple: anyone can just drag audio in, and it doesn't really require a huge explanation. You can immediately see the results. But I think anytime you have an emergent technology like this, it's always a question of who has access to the technical skills to be able to do something with even a publicly available technical paper. So I would love to see more people get involved. Absolutely.

Do you have any hopes or aspirations for a specific artist to use Holly+ in some way? Are there musicians you admire where you think it would be great to hear what they would do with it, or a song that would sound interesting with Holly+ integrated in some way?

That's a really good question. I mean, of course I've done a ton of my own transfers with classic Enya: what would that sound like? For someone specifically? I mean, this is kind of insane, it would never happen, but I would love for Dolly Parton to use it. I would love to make a Dolly model and be able to play with the Dolly model. She's from a town right next to where I grew up, so she always loomed large in my childhood. I don't know, that would be a wild kind of future-country aesthetic that I would be down for.

All right. Thanks so much for joining us, Holly.

Thanks for having me. Check out Holly+.
