Lyrics! Trends, Analysis, and Graphics.

(Note: always click on the plots to see them properly)

People are good at understanding the meaning of complex (or simple) texts. They are also good at programming computers to do something else: to count and calculate. That’s what I did with artist lyrics.

I looked at 53 artists (mostly some of my favourites, some requests), 202 albums, 2428 songs, totalling 10386 minutes of music, and extracted their lyrics to try to develop ways of visualizing the data, and to find broad word usage patterns that would otherwise be hard to detect. It’s not a systematic study, it’s an exploration, and it’s based on examples and artists I like. Enjoy.

 

Word Density:

Let’s start by looking at which artists are the most verbose. To measure that, we can count the number of words and divide by the minutes of music. (Click on the plot to view it properly, please.)

wordsDensity

First: there’s a huge variety in word density. There is about a 10-fold difference between Lil Wayne or Miley Cyrus (more than 100 words per minute) and Bonobo or Opeth (only about 10 words per minute). The general trend makes sense: hip hop and pop on the verbose side, and electronica and prog-rock/metal on the other.
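For the curious, here is a minimal sketch of how words per minute could be computed. The function name, the tokenizing rule, and the toy data are my own assumptions for illustration, not the exact script behind the plot.

```python
import re

def word_density(lyrics_by_song, minutes_by_song):
    """Words per minute over all of an artist's songs (hypothetical helper)."""
    # Assumed tokenizer: runs of letters and apostrophes count as one word.
    total_words = sum(
        len(re.findall(r"[a-z']+", text.lower())) for text in lyrics_by_song
    )
    total_minutes = sum(minutes_by_song)
    return total_words / total_minutes

# Toy data, made up for the example: two short "songs" of 30 seconds each.
songs = ["here comes the sun", "la la la la"]
minutes = [0.5, 0.5]
density = word_density(songs, minutes)
print(density)
```

Summing words and minutes over the whole discography before dividing, rather than averaging per-song densities, keeps very short tracks from dominating the number.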

But, what words are they using? This is the main focus of all that follows.

Word frequency:

What’s interesting now is to start comparing content. To do that, the main concept is going to be the frequency of a word relative to the total number of words used. A frequency of 0.05 for word X would mean that the word represents 5% of the lyrics analyzed. For example, here is the frequency of the word “sun”.

sun

The Swedish duo The Knife leads in the use of “sun”, and The Beatles are also frequent mentioners of the glorious star (and it’s not just in their song “Here comes the sun” – although that is the most sun-dense song).
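A word frequency in the sense used here can be sketched in a few lines. The helper names and the sample text are assumptions made up for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    # Assumed tokenizer: lowercase, keep runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())

def word_frequency(word, lyrics):
    """Share of all tokens in `lyrics` that are exactly `word`."""
    tokens = tokenize(lyrics)
    if not tokens:
        return 0.0
    return Counter(tokens)[word] / len(tokens)

# Made-up sample: 14 words, two of which are "sun".
sample = "here comes the sun here comes the sun and i say it's all right"
freq = word_frequency("sun", sample)
print(round(freq, 3))
```

In practice `lyrics` would be all of an artist's songs joined together, so the frequency is comparable between artists with very different amounts of text.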

But one word at a time is not going to get us very far, is it?

Word Groups:

Perhaps more interesting than individual words are word groups. Groups give a better view of general themes and carry more information about each artist. Here’s an interesting question: who mentions more body parts?

body parts

Here the total height of the bar is the total frequency of the whole group, and the different colors in each bar are the proportions of each body part. Mastodon, Florence and the Machine, Moby, Bonobo, Opeth, and The Knife are the 6 artists who most frequently mention body parts (words include both the singular and plural forms). But they mention them with different balances. Moby for example repeats “body” very frequently. Florence on the other hand (…) mentions a wide diversity of body parts! Some artists mention specific body parts much more than any other: Jose Gonzalez for example uses “hands” very frequently (interestingly mostly when covering The Knife’s “Heartbeats”).
Jeff Buckley connoisseurs will not be surprised to see “lip/s” more strongly represented in his lyrics than in those of any other artist – except for Nirvana, who repeatedly scream “kiss, kiss molly’s lips” in a Vaselines cover.
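The stacked bars can be thought of as one frequency per body part, with singular and plural counted together, summed to get the bar height. Here is a rough sketch; the group definition, the function name, and the sample line are assumptions for illustration only.

```python
import re
from collections import Counter

# Hypothetical, shortened word group: label -> forms counted together.
BODY_PARTS = {
    "hand": ["hand", "hands"],
    "lip": ["lip", "lips"],
    "body": ["body", "bodies"],
}

def group_frequencies(lyrics, group):
    """Frequency of each label in `group`, merging its listed word forms."""
    tokens = re.findall(r"[a-z']+", lyrics.lower())
    counts = Counter(tokens)
    total = len(tokens)
    if total == 0:
        return {label: 0.0 for label in group}
    return {
        label: sum(counts[form] for form in forms) / total
        for label, forms in group.items()
    }

sample = "kiss kiss molly's lips my hands my body"  # made-up 8-word sample
per_part = group_frequencies(sample, BODY_PARTS)    # the colored segments
bar_height = sum(per_part.values())                 # total height of the bar
print(per_part, bar_height)
```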

Another interesting group of words to look at is colors!

Colors

 

Jeff Buckley shows up as the most colorful artist, mostly because of how often he says “black, black, beauty” in “Mojo Pin”. Amy Winehouse is, perhaps unsurprisingly, mostly black and blue (as is Moby). Take a look at The Beatles, and you’ll find their yellow submarine; Martina Topley Bird’s “Baby blue” is also present. Portishead, in 3 albums, only mention the color white, and Cat Power almost only mentions black (with a little spot of red).

Any George Carlin fans? In the 1970s he had a famous bit about seven dirty words you can’t say on TV. I decided to look them up in our artists. I had to remove some of them because they appeared nowhere. But you get the feeling.
Carlin

Is it any surprise that rap (Lil Wayne, Das Racist) and heavy music (Rage Against the Machine, Pantera) appear at the top? But do not generalize! Opeth are a metal band and they don’t swear.

I’ll end with Water Themes:

Water Themes

Puscifer and Morphine take the lead here. Two of my favourites.

I could go on with plots like these (and in fact I have many in reserve and will do some for you if you ask), but for now, something different.

 

Word Relations

What might also be interesting to look at are word relations. Are there words that tend to be mentioned together, or to exclude each other? Given the variability of vocabulary and the fluidity of language, it’s going to be very hard to find nice straight linear relations like the ones you get in basic physical laws, but we still get significant associations! Here are some examples I’ve chosen to show.

Apparently the word “heart” is inversely correlated with the word “head”.

headVSheart_spearman

It seems like the more you mention your head, the less likely you are to mention your heart (and vice versa). Pain of Salvation and Das Racist mention neither very much.
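The plot names suggest these relations were measured with a Spearman rank correlation, which compares the ordering of artists rather than the raw frequencies. Below is a small self-contained sketch of that idea; the function names are mine and the numbers are invented, not taken from the plot.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Invented frequencies for four imaginary artists (one entry per artist).
head = [0.001, 0.002, 0.003, 0.004]
heart = [0.008, 0.006, 0.005, 0.001]
rho = spearman(head, heart)
print(rho)
```

A value near -1 means the artists who rank highest on one word rank lowest on the other; a value near +1 means the two words rise together. Using ranks makes the measure less sensitive to one artist with an extreme frequency.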

There are several other relations between words. But there is a particular trio that I like a lot. First: skin and flesh.

fleshVSskin_spearman

Not many artists mention flesh (that’s why you have a pile on the left side of the graph). But the general trend is clear: the more skin, the more flesh. And interestingly, it’s the metal bands – Opeth, Pain of Salvation, Mastodon, Nine Inch Nails, Pantera – who take the lead. But also Jeff Buckley – not surprisingly for anyone knowing his corporeal themes.

What’s interesting also is that flesh, for those who mention it, seems to incite cries:

fleshVScries_spearman

It’s Pain of Salvation, and Jeff Buckley there at the top right with a lot of dramatic language. The flesh is a powerful thing.

 

 “Me” & “You”

One way to combine the last two analyses is to look at the relationships not just between words, but between word groups. I picked out my favourite example: the relation between the group {me, mine, myself, my, i} and the group {you, your, yours, yourself}. This way we’ll find not only the artists who are most self-centered or other-centered, but we’ll also learn about the relation between the two tendencies. Here’s the result:

me_mine_myself_my_iVSyou_your_yours_yourself_spearman

So there is a general trend whereby talking more about oneself, takes away from talking about another: the more me, the less you. Jose Gonzalez for example almost doesn’t mention himself, but is quite high on the “you” scale. An opposite example might be Moby whose lyrics apparently include a lot of “me” and very little of “you”.

Looking at one axis at a time is also instructive. For example, we can see that Fiona Apple, Nirvana and Amy Winehouse seem to take the lead in mentioning themselves. But when it comes to mentioning another, the champions are Martina Topley Bird, Dillinger Escape Plan, the Deftones and Cat Power. The two women are in the good company of some heavy bands.

Thinking about the bottom left and the top right corners is also interesting. On the top right, one-on-one personal relations abound, and on the bottom left are artists who use very little personal vocabulary. No surprise to find Rage Against the Machine there – it’s about the people as a collective comrade!

 

 Comparing artist groups

Instead of doing something obvious (like comparing pop vs rock artist lyrics) I decided to ask a simple and more personal question: are there words that occur more frequently in the artists I most like? To answer it, I simply divided the 53 artists into categories (“Favourite” and “Less Favourite”), and then calculated statistically significant differences in word frequencies between the two groups (yes, I did a t-test). Here are 2 words that artists I like tend to use more:

boxplot3truth boxplot3embrace

And two words that my favourite artists use less:

boxplot3wife boxplot3tv

 

It appears my artists (and me?) are not much into married life and sitting around watching tv, but instead care about truth and embraces. I agree – and it reminds me of this!
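For reference, the comparison between the two groups boils down to a t statistic on one word's frequency at a time. The sketch below uses Welch's version (unequal variances), which is an assumption on my part since several variants of the t-test exist, and the frequencies are invented. It only computes the statistic; turning it into a p-value would need the t distribution from a statistics library.

```python
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Welch's t statistic for the difference in means of two samples."""
    # Sample variance of each group, scaled by its group size.
    va = variance(group_a) / len(group_a)
    vb = variance(group_b) / len(group_b)
    return (mean(group_a) - mean(group_b)) / (va + vb) ** 0.5

# Invented frequencies of one word, one value per artist in each group.
favourite = [0.004, 0.005, 0.006]
less_favourite = [0.001, 0.002, 0.003]
t = welch_t(favourite, less_favourite)
print(round(t, 3))
```

A large positive value means the word is more frequent among the favourites, a large negative value means the opposite. With thousands of words tested this way, a few will look significant by chance alone, which is worth keeping in mind when reading the boxplots.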

 Artist focus

Finally I wanted to know what makes each artist special and different from the others. We could do the traditional word clouds. Here are two examples.

Tool

tool

and Radiohead

radiohead

This is not exactly what I want. Word clouds give you a feeling for which words are more common in each of the artist’s albums. What they do not do, however, is compare the artists with each other. To understand what makes an artist different, we need to compare the frequency of word X in that artist with the frequency of the same word in all the other artists. That’s what I did. (For the analytical minds out there: I computed the z-score distribution for each word frequency and extracted anything greater than 5.) So here are words that Tool and Radiohead use more frequently than the others:

Zscore_tool Zscore_radiohead

 

Let’s also see what words make Pink Floyd and Cat Power special:

Zscore_pinkFloyd Zscore_catPower

 

General patterns are hard to spot, but it’s interesting to see all the words with mathematical connotations in Tool: calculated, forty, third, divide, spiral, union. There’s also a prevalence of vulnerable or darker language: insecure, satan, withering, drags, worthless, widow, crawled.

Cat Power seems geared towards affective language that others dare not mention: marry, romance, kissing. But also travels and sights: manhattan, rome, daydream, mexico, wilderness, waterfall.

Other than broader patterns you can clearly identify specific songs or themes: for example there’s Pink Floyd’s crazy diamond, their suicidal “Waiting for the Worms”, and the Wall (as well as bricks) is clearly uniquely represented. I’ll leave Radiohead up to you.
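For the analytical minds again, here is a rough sketch of the z-score idea behind these artist plots. The data layout, the function name, and the numbers are assumptions for illustration. I compute the mean and spread over all artists including the one being examined, which is one of several reasonable choices.

```python
from statistics import mean, pstdev

def distinctive_words(freqs, artist, threshold=5.0):
    """Words whose frequency for `artist` lies more than `threshold`
    standard deviations above the mean across all artists.

    `freqs` maps word -> {artist name: frequency}; every artist is
    assumed to have an entry for every word (0.0 if never used).
    """
    result = {}
    for word, by_artist in freqs.items():
        values = list(by_artist.values())
        spread = pstdev(values)
        if spread == 0:
            continue  # every artist uses the word equally often
        z = (by_artist[artist] - mean(values)) / spread
        if z > threshold:
            result[word] = z
    return result

# Invented frequencies for five artists; real data would have all 53.
freqs = {
    "spiral": {"Tool": 0.01, "Radiohead": 0.0, "Pink Floyd": 0.0,
               "Cat Power": 0.0, "Moby": 0.0},
    "love": {"Tool": 0.01, "Radiohead": 0.01, "Pink Floyd": 0.01,
             "Cat Power": 0.01, "Moby": 0.01},
}
# With only five artists a z-score cannot exceed 2, so the demo uses a
# lower threshold than the 5 mentioned in the text.
special = distinctive_words(freqs, "Tool", threshold=1.5)
print(special)
```

Rare words that only one artist uses get very high z-scores, which is why the plots pick up song-specific vocabulary so readily.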


Thank you to Luisa for some suggestions. If you want to see specific plots let me know, I can easily do them. If you want more artists included, let’s find the lyrics and they’ll be here in version 2.

Peace.

 


 

 

A request from my friend Ana (quoted here): who says more “la” (as in “la la la” or other similar expressions – lah, laaa, laaah, etc – all condensed into one)? Admittedly this really depends on the transcribing of the lyrics. But we’ll still learn from the plot:

la

Oh brit-pop …. 😉

 

 

 

 

 

 
