You might say that the voice is the most human of all the elements in music. It comes from deep within our bodies, organically, ever since we came to be human, and indeed long before. It is probably the first musical instrument that we learn to recognise, and, since it is built into most of us, the first that we learn to use. Yet the voice is much more than just a sound, however primordial. It represents a person: their activity, agency, opinions, experience, their self-projection. In doing so, it differentiates itself from the environment around it, musical or cultural, either drawing power from this context or somehow standing against it. Or, most intriguingly, doing both at the same time.
In most 20th century popular music, the role of the voice in the musical landscape was fairly clear. There were the vocals and there was the backing, divided into guitars, percussion, bass, riffs, solos, breaks, and even supporting vocals. The voice would stand out clearly and tell a story and convey a message. But then things got more complicated. The voice started to disappear into the guitars, started to drift away from clearly making sense, and started to merge with new technologies—the gramophone, the microphone, distortion, the spinning Leslie speaker, the talk box, the sampler and auto-tune. What’s more, human-like voices started coming from non-human sources, from the vocoder and speech synthesis and, more recently, vocaloids (which I’ll look at later). Today, a major characteristic of 21st century pop music, especially in the underground, is the erosion of that former distinction between the human voice and the musical landscape, now largely digital, in which it stands and of which it now forms a part. No longer is the voice merely a figure in the landscape; it fuses with the landscape itself.
Not only is this sonically intriguing (appealing or disturbing, take your pick), but it has consequences for the way we consider the voice as a representation of a person in their place in the emerging digital world. There’s a tendency to regard the digital world as non-human, as encroaching on humanity, but I think that’s quite wrong. The internet is not just a landscape made up of technology, it is one made up of humanity. Like it or not, Facebook and Twitter are made of people, and of people’s voices in particular. People’s voices and the lives behind them reduced to assertions, opinions, arguments, anxieties, reduced to pieces of data, to samples—but sounding out nonetheless. And this landscape is reflected and explored in the music it produces, where the lines between the human and its technological environment are no longer clearly drawn.
There are still plenty of artists in whose music the voice is clear and relatively unmanipulated, but the tumultuousness of the surrounding accompaniment nevertheless suggests the voice hemmed in by busy, exotic, unfamiliar environments. More clearly perhaps than on any of her previous albums, Maria Minerva’s Histrionic puts her voice and her persona at the centre of an idealised, impressionistic club experience teeming with delirious colour and sensation, a position somewhere between confusion and empowerment from which she addresses those around her. Autre Ne Veut rails grandiosely against his fears in last year’s Anxiety, his voice squeezed and scrunched tightly inside its twisted pop-synth cradle. FKA Twigs’ largely conventional and polished voice is suspended in a mysterious chamber crawling with synthesized entities on tracks like “Two Weeks,” as if she were the Borg Queen.
But plenty of other artists and genres have seen the membrane between the voice and its technological environment begin to rupture. A continuum has been established which runs from the traditional scenario of vocals plus instrumental technologies, through sampling, chopping, screwing, the voice as an instrument, sample-based synthesis, and synthesised speech. Perhaps the most famous artist to have taken the voice to new places is Burial, now a huge influence on underground music, who uses speaking voices to populate his imaginary, dilapidated environments, as well as chopping and stitching singing samples together to create entirely new singing voices. Footwork has taken the sampling of the voice to new extremes, using such small and repetitious slivers of it that the sense of a human performer collapses almost entirely into a cascade of vowels and consonants, taking on the same character as the accompanying drum machine. Following DJ Screw, Clams Casino has used voices, often slowed-down or deepened (‘screwed’ or ‘dragged’), as a building material in his hip-hop instrumentals, often entirely removing their capacity to communicate lexical meaning in the process. James Ferraro has explored a quasi-naive use of auto-tune in his recent, beleaguered songs, his prone, all-too-human voice trickling over angular sculptures of digital crystal. Oneohtrix Point Never regularly uses choir samples mapped to his keyboards, on one occasion creating an a cappella work for which, strangely, no new voices were recorded. Holly Herndon has used every facet of the voice in her electronic constructions—her song “Chorus” is like a voice that has exploded and is being stitched together into a new form.
Yet go deeper into the digital world and the voice finds endless new forms, further integrated into its landscape. I had been listening to the music of Oakland-based Nima for quite some time before I realised that her tracks, though they regularly draw on recorded vocals, are not songs per se. On Spirit Sign and her upcoming tape for Harsh Riddims, SEE FEEL REEL, the voice has partly receded into the musical environment, becoming part of the furniture of her airy and mysterious rooms. Even with words attached, her voice does not compete with percussion and keyboard riffs. The final tracks of both albums (called “Landscapes” on Spirit Sign) feature synthesised speech, as if completing a transition from human to machine, yet it’s set against some of Nima’s most elegant instrumental textures. At the end of SEE FEEL REEL, a voice almost obsessively repeats crypto-romantic refrains such as I’m… in love… with… the… digital age… over strings, before switching to breathy voice-like tones as if it were dissolving into air. Nima’s music certainly lives in the digital age, and all the elements within it expand to fill the enormous space that results.
Another producer who has been using synthesised speech is Chaz Allen as Metallic Ghosts, also well known as one of the folks behind live-streaming platform SPF420. Echoing Ferraro’s celebrated album Far Side Virtual, where a synthesised voice appears as a touchscreen waiter and a virtual chef, Metallic Ghosts employs synthesised voices on his albums The Pleasure Centre and Sky Tower 2032 as narrators and characters in a drama. Yet my favourite Metallic Ghosts release is the multidimensionally weird The City of Ableton, which is ostensibly just an in-joke about the Ableton software used by so many producers, depicting it—squarely according to neoliberal capitalist rhetoric—as a city of endless possibilities. To do this, Allen adopts a musical style that idiosyncratically evokes the urban-planner-simulation game series Sim City, a screenshot from which provides the album’s cover. Both musically and conceptually, the album suggests a dream cityscape as it might have appeared at the turn of the 1990s, a weird, multicoloured, postmodern union of the past and the future, where citizens of all professions glide beatifically down immaculate beige sidewalks past bright red fire stations, neo-1930s banking skyscrapers, parping bandstands, faux-eighteenth-century colleges, and green, green lawns, all presided over by a moustachioed mayor who warmly greets his public at the exponential tree-planting and ribbon-cutting ceremonies. One of the crucial components of this digital landscape—now looking rather misplaced, both poignant and arrogant, in the post-recession era—is the voice. Fittingly, the vocals, and the human beings hinted at behind them, are just further objects swirling in the cityscape, forming its melodies, scatting ooh and aah or urging Work it! and Get down like feckless, automated cheerleaders. It’s a landscape we might recognize and enjoy with a little disquiet mixed in.
The technique Allen uses to deploy voices in the track “Mass Transit” is called sample-based synthesis, and involves inputting samples into a synthesiser so that every key plays a sample whose pitch corresponds to that key, allowing you to play a sampled vocal as if it were a piano. It was also used by Allen-associate Saint Pepsi on his recent EP Gin City (which is akin to a mini tour of different uses of the voice in the online underground) on the track “Mr Wonderful.” When the track’s lead tune moves onto a more conventional synthesiser, the effect is not of a singer falling silent, but of a singer showing a different side to themselves.
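The principle is simple enough to sketch in a few lines of code. Below is a minimal, illustrative Python sketch (the names, numbers and stand-in sample are my own, not the workings of any particular synthesiser) of how one recorded sample can be re-pitched for every key simply by changing its playback rate:

```python
import numpy as np

def repitch(sample, semitones):
    """Shift a mono sample by `semitones` using naive linear-interpolation
    resampling; as on early samplers, pitch and duration change together."""
    ratio = 2 ** (semitones / 12)         # frequency ratio per equal-tempered semitone
    n_out = int(len(sample) / ratio)      # higher pitch -> faster, shorter playback
    positions = np.arange(n_out) * ratio  # fractional read positions in the source
    return np.interp(positions, np.arange(len(sample)), sample)

# A stand-in for a recorded vocal: a 440 Hz tone, one second at 8 kHz.
sr = 8000
t = np.arange(sr) / sr
vocal = np.sin(2 * np.pi * 440 * t)

# "Keys" are just offsets in semitones from the sample's root note.
octave_up = repitch(vocal, +12)  # twice the speed, an octave higher
fifth_up = repitch(vocal, +7)
```

Played far from its root key, a sample repitched this way audibly speeds up or slows down as well, which is part of why heavily transposed sampled vocals take on their characteristic chipmunk or dragged quality.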
Sampling processes of all kinds appear in Blank Banshee’s album Blank Banshee 1, a masterwork of the new digital psychedelia, and it’s quite easy not to really notice that the album is filled with voices at every turn, much like its videos are filled with virtual beings, objects and environments. Despite the fact that these voices are pitched up, down and all around, they’re never more than virtual avatars of their owners: social media masks that are both freeing and constraining. Freeing in that they allow the voice to move to new places and be new people, constraining in that these surrogate people are not yet as infinitely flexible and free as they would like to think they are, and might still have an air of the uncanny about them. This double-edged nature of the human user in the modern digital playground, its mixture of strange new opportunities and dangers, might be why one of the tracks goes by the name “Anxiety Online.”
But not all digital voices have flesh and blood humans directly behind them. With synthesised speech, the voice is the digital landscape. Wholly synthesised speech has been around for decades, appearing on Kraftwerk records of the 1970s in roles such as the voice of energy… a giant electricity generator. But in the past decade, an entire subculture (mostly confined to Japan but with strong showings in Europe and North America) has grown up around a software series that synthesises song: vocaloids. Developed by Yamaha and produced mainly by the brilliantly named Japanese company Crypton Future Media, vocaloids offer the user the chance to create a voice that can sing both melodies and lyrics, based on ‘sound banks’ recorded by human singers (so really, they’re a kind of sample-based synthesis). They appear in a wide range of different voices, languages and genders, and one of the most interesting things about them is that they are personified with given names, images (which typically appear on album covers), even ages and weights. As such, they are treated much like the J-pop idols they emulate. Disconcertingly, there are far more female vocaloids than male ones, and they are invariably more popular—probably because women are more likely to be treated as objects, even technological ones (for example, in films from Metropolis through to Spike Jonze’s recent Her, about a man who falls in love with a speaking operating system).
The most popular vocaloid is Hatsune Miku, whose name means something like ‘first sound of the future.’ First released in 2007, she greatly expanded the profile of vocaloids, eventually propelling compilation albums featuring her, such as Exit Tunes Presents: Vocalogenesis, to the top of the Japanese charts. And long before Tupac appeared as a hologram, Miku (who wasn’t even biologically born, let alone killed) used the same technology to appear in front of a live band and an audience of fans in 2010. She has since performed a duet with another popular vocaloid, Megurine Luka. Miku has even opened for Lady Gaga, and was immortalised on two metal plates attached to the Akatsuki space probe bound for Venus (where else?). Vocaloids don’t quite sound realistic, but that isn’t entirely the point. For me, part of their appeal is in their human yet beyond-human qualities. One of my favourite vocaloids is Sonika, primarily due to the bizarre, clucking chorus of her song “Sonika Says.”
Vocaloid producers around the world have observed a convention of putting a ‘P’ after their moniker, which stands for ‘producer.’ One of the most popular vocaloid producers on Bandcamp is Circus-P, who uses them for the vocal lines of rave pop. Others include the hypersentimental Empath-P and the trancey Daria-P. ‘Vocaloid’ is a widespread tag on Bandcamp and Soundcloud, and there are underground record labels devoted to vocaloid producers, such as Vocallective. While the dedicated subculture tends to use them for pop and hardcore dance, vocaloids have appeared in dozens of different genres, including disco / funk, metal, jazzy hip-hop beats, chilled-out house, indie rock, traditional, trappy witch house, mashup humor, pop punk, and even opera.
In many of these cases, the vocaloid basically operates as a replacement for a vocalist. Perhaps more interesting are the cases where the vocaloid forms part of a more experimental project overall, where their unique sonic qualities can offer something more unusual. You might have expected vocaloids to crop up more often in global experimental electronic musics as sonic tools themselves rather than simulations and surrogates (in just the way drum machines came to be used in the 1980s), but currently it’s mostly in Japan that they’re used that way. Mariko’s Room is a prolific and varied rock project in which vocaloids appear as unearthly voices draped in effects, or as an ironic counterpoint to ultra-lo-fi on the album 堀田 (‘Hotta’). Vocaloids blend seamlessly with fractious techno on Cholesterol Records, often ground up and mixed in with the stuttering beats, and they also make sense in the sweetly glitchy electronica of 全自動ムー大陸 (Fully Automatic Mu).
Voltex makes gorgeous footwork-like tracks from joyfully leaping vocaloids and stereophonically ping-ponging synths. Then there’s tac_, for whom vocaloids are a perfect element gently woven into dainty, elfin compositions. Another sound tac_ frequently uses is that of the mellotron, which is in many ways the forerunner of the vocaloid, being an analogue sample-based instrument in which each key was attached to a strip of tape carrying a recording of an instrument such as strings or flutes (you might know it from the opening to “Strawberry Fields Forever”). Both are dolls designed to emulate more organic musics, and the vocaloid and the mellotron complement one another not as fake, insufficient, robotic entities, but as toys that have run away to a miniature fantasy kingdom where they can now be loved only by history and nature.
It won’t be long, then, before the vocaloids earn the same retro appeal as the vocoder, and they may even come to be praised, like older technologies before them, as ‘warm’ and, yes, ‘human.’ Even when it comes to the voice, the technologies we use to represent ourselves are all relative—all tools, all within the bounds of human agency, even when that voice is almost entirely constructed. The voice was no less a tool, a technology, when it let out its first cry. It may seem unnatural, even frightening, that the human voice melts into the digital landscape—where we all become samples—and even more so when the digital landscape seems to greet you itself in a nearly human form. But we are the vocaloids. And like it or not, everything in the musical-digital landscape has a human voice. A more fluid interrelation of the subjects and objects within it better reflects the richness, self-perception and experience of the modern human.