Introduction
Words and music are two distinct symbolic modes. Yet, as human ways of communication or forms of expression, they have much in common. They have important similarities as signifying systems, their mutual penetration is suggested by metaphors like "the language of music" or "the music of language," and they are multi-modally united in all song genres. Popular song in particular is generally "text-intensive" (Booth 188). Attempts to distinguish univocally between them turn out to be more difficult than might at first be expected. Most cultural practices have no problems separating or combining them, but their defining differences tend to be explained in highly divergent and even contradictory ways. I will here exemplify some such fascinating paradoxes, problematizing what might seem to be self-evident, rather than offering any simple answers. By pointing at certain interesting complexities, at least some main dimensions of the relations between words and music will be discerned. (1)
How is the difference between music and words constructed in cultural practices and theories? The first section mentions some obscure aspects of the words/music distinction in listening practice, with an emphasis on popular song. The general juxtaposition of the verbal and the musical symbolic modes, as two institutionalized fields of cultural practice and theorization, is particularly dramatized in the phenomenon of song, where their coexistence in one single performance act puts their distinction to a difficult test. Digitally processed techno and rap music has sometimes deliberately blurred the boundaries between speech and song, between words and sounds, or between lyrics and music. The second section presents a few such confusing musical examples that experiment with those borders, by transforming words and noises into music, and making traditional generic classifications useless. The third section then discusses some general problems in conventional definitions of the basic terms, while the final section sums up a possible way to regard the bifurcation of the verbal and musical modes.
Words with music
Song is a multimodal supergenre that mediates between words and music. (2) In most rock and pop songs, they are treated as neatly distinct. On record sleeves, the formula "Words & Music" points towards the creators of a song, indicating that they may be two different persons or at least distinct functions: one author of the lyrics, and one composer of the melody and its accompaniment. In rock, these functions are often amalgamated, when a single singer-songwriter, a pair of artists, or a whole band is presented as the undifferentiated origin of both words and tunes. But copyright laws and music industry practices still tend to stick to this dichotomy, and even singer-songwriters or tight rock groups sometimes adhere to it by writing "Words & Music: ..." rather than just "Songs made by: ...".
However, the "writing" of any song-hook, like, for instance, "Be bop-a-lula," often means inventing words and music simultaneously, in one single move. Such a text-line when spoken is as much a rudimentary melodic motive as are the appropriate five notes on a musical staff. It creates a rhythmic organization that makes only a certain range of musical realization possible. And in the actual performance of such a hook (or indeed any song), the singer again performs the words and the music as one unified whole, in one single act that is indivisible in time and physical space. Instruments play the music, but the singer performs lyrics and tunes absolutely simultaneously. So, the distinction between writing (or performing) music and writing (or performing) words is not always quite clear.
At the opposite end of the communicative process, listeners also often tend to differentiate between words and music, even though they reach the ear at the same moment. Being socialized into modern listening practices, most people can instantaneously disentangle the lyrics from the melodic and rhythmic lines, interpret the words, and compare that meaning with the musical sounds by which it is supported (or counteracted). One may disregard or actively listen for the words, feel how they combine with the music, but it is hard not to hear them as something other and more than pure musical sound. Discussions of popular songs generally make those distinctions: "Good tune, but lousy lyrics...."
But, again, things are not actually quite so clearly divided. Listening to songs and singing means entering a mode of perception where words and music continually interfere. To understand a verbal text one has to perceive how its units are articulated and grouped, which is affected by its rhythmic and melodic performance. And the sound, rhythmic, and melodic parameters of the music in turn depend on which words are sung. The material qualities, the form-relations, and the semantics of words and music often tend to merge. The verbal content and the musical organization are thus linked by an intense cross-traffic, rather than being the two completely separate systems they often appear to be in the end--and in much theoretical analysis.
The different ways to translate aural into visual forms further emphasize their bifurcation. Mostly, it is possible to transcribe and analyze the main musical structure of a pop tune approximately as some kind of a score, with the words in alphabetic writing under the melody. But even though such a notation system creates a neat division, it is never the same as the actual song. Every such visual and spatializing translation of an aural and temporal performance is a kind of analysis that separates what was united by using fundamentally different kinds of transcription systems for the words and the music. In modern Western societies, both the alphabet and conventional musical notation utilize reductively discrete means to summarize complex and continuous processes. They both translate time flows into vertically stacked (read downwards) horizontal lines (read from left to right), written or printed on pages (read in a routine order from front to back), but they also still obviously differ in the precise way this is done, and, when they are combined, it is generally easy to see which visual signs are to be read as verbal and which belong to the musical level.
Other forms of fixation--as, for example, the graphs produced by an oscillograph or the engraved tracks for the needle or the laser beam in a vinyl or a CD record--do not clearly separate words from music within the continuous sound flow of a song. Several elements of a song seem simultaneously to be words and music; in fact, this is true of most of them! Even simple, common words have duration and pitch whether sung or spoken, and within the frame of a song these can be interpreted as musical parameters. Conversely, the most nonsensical utterance can hypothetically be understood and transcribed as part of its lyrics. The separation between words and music is thus only the result of a complicated but quasi-automatic analytic process effected by the transcriber already in the first listening.
Song thus contains both words and music, but speech performance is also more than just a neutral delivery of verbal semantics. In a rich key work on music and performance, Simon Frith (Performing 159) argues that three things are heard at once when listening to the lyrics of pop songs: words (as a "source of semantic meaning"), rhetoric ("words being used in a special, musical way"), and voices ("human tones" as "signs of persons and personality"). Performance traits like vocal gestures, timbre, and rhythmical inflections are always and inevitably present in spoken words, but they tend to be perceived as some kind of addition to the words as such, and to disappear when they are transcribed in writing. Just as song adds a melody to song lyrics, speech seems to superimpose certain essentially nonverbal traits onto the words spoken. Speech and song are two modes of vocal performance and, while singing written words transforms texts from writing to song, reading them aloud reconstructs them as speech. Visual markers and signs are in both cases translated into audible sound structures. Combinations of letters are read as combinations of phonemes, while intonation and phrasing help convey the formal organization represented by dots and commas in the written version. Reading written words aloud is, thus, just like singing, a performance mediating between the aural and the visual symbolic modes. "The voice records a text," and "a song text is a script for a public event" (Booth 187, 34), but this is just as true when a text is performed in speech form, in lectures, radio shows, live poetry, or theatre drama.