Introduction


Ever since I started playing guitar some 10 years ago, I have dreamed of writing a song of my own that I can be proud of. Sometimes a nice melody or chord progression pops into my head, but somehow the lyrics that I write always feel a bit lame.

Then I joined a team for the AI Song Contest in 2021, and we wrote a complete song with the help of AI. We had a process for turning sci-fi stories into lyrics using GPT-3. This was very cool, but it didn't really allow the user to guide a song in a desired direction. That is why I started this new pet project: using GPT-3 to write lyrics that the user can steer with a textual description.

This blogpost explains the following parts of the process:

  • The plan: What am I trying to achieve?
  • Creating the dataset:  Using GPT-3 to refine and generate data.
  • Finetuning the model and generating lyrics.
  • Conclusion.

The plan


You can't really create a lyrics generator that writes "good" songs. Songwriting is a very personal thing. There is probably a style of writing that you prefer, a topic that you have in mind when you start writing a song, and a mood that should go with that topic. It would be boring to have a lyrics generator that just writes generic stuff. You want to be able to steer the results in a direction that speaks to you personally.

Our goal for this blogpost is to write a song about an old cat who came home after being outside for a few days in a storm.

I think Jack Johnson would be suited to write such a song, so I gave the largest model of GPT-3, called Davinci, a whirl. The black on white part is input, and the green part is output.

The result looks like lyrics for a song, but when you read it you realize it is not very interesting. The topic of the song is spelled out almost literally. There is no real wordplay.

How can we make GPT-3 a better songwriter? You could add examples to the prompt. But because there is a limit of 4096 tokens, shared between the prompt and the generated text, we would be limited to just one or a few examples before text generation. To tune the model on more lyrics, the finetuning feature can be used.

To finetune a model you have to create a dataset of hundreds of examples in the following format, where prompt contains the user input and completion contains what the generated text should look like in response to the given prompt.

{
  "prompt": "...",
  "completion": "..."
}

Because I want the user to have fine-grained control over the lyrics generation I am going to add some information about the song in the prompt and the lyrics in the completion.

The prompt is going to have the following properties:

  • Artist: What is the artist called that wrote the lyrics.
  • Bio: A short story describing the artist. For example what genre their music is, their personality, etc.
  • Description: A short description of the meaning of the lyrics. What the song is about.
  • Mood: The mood of the lyrics. Are they happy, sad, confident, etc.

The completion is going to have only two properties:

  • Title: The title of the song.
  • Lyrics: The lyrics themselves.

This should allow the user to pick an artist they like and generate lyrics in the style of this artist. These fields could also be filled with non-existing artists, to create completely new ideas.
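As a rough sketch, one training example with these fields could be assembled in Python like this. The post does not pin down the exact template, so the "Label: value" layout, the `###` separator, the `<END>` marker and the function names are all assumptions made for illustration. The leading space in the completion follows common advice for this kind of prompt/completion finetuning.

```python
import json

# Assumed layout: one "Label: value" line per field (illustrative only).
PROMPT_FIELDS = ("Artist", "Bio", "Description", "Mood")
SEPARATOR = "\n\n###\n\n"  # hypothetical marker that ends every prompt
END_TOKEN = "\n<END>"      # hypothetical marker that ends every completion


def build_prompt(artist, bio, description, mood):
    """Join the four prompt fields into a single block of text."""
    values = (artist, bio, description, mood)
    lines = [f"{label}: {value}" for label, value in zip(PROMPT_FIELDS, values)]
    return "\n".join(lines) + SEPARATOR


def build_completion(title, lyrics):
    """Put the title first, then the lyrics, then the end marker."""
    return f" Title: {title}\nLyrics:\n{lyrics}{END_TOKEN}"


def build_example(artist, bio, description, mood, title, lyrics):
    """Return one training example as a prompt/completion dictionary."""
    return {
        "prompt": build_prompt(artist, bio, description, mood),
        "completion": build_completion(title, lyrics),
    }


example = build_example(
    artist="Jack Johnson",
    bio="Jack Johnson is a professional surfer turned musician.",
    description="A couple wants to stay in bed and make banana pancakes.",
    mood="The mood of the lyrics is relaxed and carefree.",
    title="Banana Pancakes",
    lyrics="(lyrics go here)",
)
print(json.dumps(example, indent=2))
```

Using a fixed separator and end marker means the same markers can be reused at generation time: the prompt ends with the separator, and the end marker can serve as the stop sequence.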

According to the OpenAI documentation around 500 examples are needed for finetuning a model. Let's create a dataset.

Creating the dataset


Writing artist bios, descriptions and moods for hundreds of songs would be quite a lot of work. Since I would rather be lazy than tired (a Dutch saying), I decided to automate this part of the process.

I created a Python script where I can enter the information that is easy to copy and paste: the name of the artist, the title of the song, and the lyrics of the song. The script then calls the OpenAI API to write the artist bio, describe the song and fill in the mood, and adds the combined information to the dataset. This allows me to add new songs to the dataset in a matter of seconds.

Mood

To determine the mood of a song, I enter the lyrics of the song as a prompt, followed by:

what is the mood of the above lyrics?

The mood of the lyrics is

The stop sequence is a period, so the answer is always a single sentence.
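A minimal sketch of how this mood request could be put together, assuming a text-completion endpoint that accepts `prompt`, `stop` and `max_tokens` parameters. The function names and the token cap are hypothetical. Because the API leaves the stop sequence out of the returned text, the period is added back afterwards.

```python
MOOD_QUESTION = "what is the mood of the above lyrics?"
MOOD_ANSWER_START = "The mood of the lyrics is"


def build_mood_request(lyrics):
    """Return the keyword arguments for a text-completion call."""
    prompt = f"{lyrics.strip()}\n\n{MOOD_QUESTION}\n\n{MOOD_ANSWER_START}"
    return {
        "prompt": prompt,
        "stop": ["."],     # generation halts at the first period
        "max_tokens": 32,  # hypothetical cap; one sentence is short
    }


def finish_mood_sentence(generated):
    """Re-attach the sentence start and the period removed by the stop sequence."""
    return f"{MOOD_ANSWER_START} {generated.strip()}."


request = build_mood_request("(lyrics go here)")
print(request["prompt"])
print(finish_mood_sentence(" relaxed and carefree"))
```

Ending the prompt with the start of the answer nudges the model to complete that one sentence instead of writing a longer analysis.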

Bio

To write an interesting bio for artists I use the following prompt:

Write a short bio for {artist}. What is the spirit of their music, their genre, and how would you describe their personality?

Description

To describe the lyrics I use the following prompt:

How would you describe the meaning of the above lyrics?

Here I added a small presence penalty. This penalizes tokens that have already appeared in the text so far, encouraging the model to describe the song using new words.
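The bio and description steps could be sketched roughly as follows. The two prompt texts are the ones quoted above; everything else is an assumption for illustration: the function names, the `max_tokens` values, the penalty value of 0.5, and the `complete` callable, which stands in for whatever function actually sends the request to the API. A fake version is used here so the sketch runs offline.

```python
from typing import Callable, Dict

BIO_PROMPT = (
    "Write a short bio for {artist}. What is the spirit of their music, "
    "their genre, and how would you describe their personality?"
)
DESCRIPTION_QUESTION = "How would you describe the meaning of the above lyrics?"


def build_bio_request(artist: str) -> Dict:
    """Request parameters for the artist bio (no penalty applied)."""
    return {"prompt": BIO_PROMPT.format(artist=artist), "max_tokens": 200}


def build_description_request(lyrics: str) -> Dict:
    """Request parameters for the song description, with a presence penalty."""
    return {
        "prompt": f"{lyrics.strip()}\n\n{DESCRIPTION_QUESTION}",
        "max_tokens": 200,
        "presence_penalty": 0.5,  # hypothetical "small" value
    }


def augment_song(artist: str, lyrics: str, complete: Callable[..., str]) -> Dict:
    """Generate a bio and a description using the injected completion function."""
    return {
        "artist": artist,
        "bio": complete(**build_bio_request(artist)).strip(),
        "description": complete(**build_description_request(lyrics)).strip(),
    }


def fake_complete(prompt, **params):
    """Stand-in for a real API call; it only reports the prompt length."""
    return f" [generated text for a {len(prompt)}-character prompt] "


song = augment_song("Jack Johnson", "(lyrics go here)", fake_complete)
print(song)
```

Passing the completion function in as an argument keeps the prompt logic testable without network access or an API key.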

Examples


It turns out that GPT-3 is pretty good at augmenting my data. I expected describing the lyrics to be really difficult, since the meaning of lyrics is often abstract and hidden behind metaphors. But more often than not the descriptions were on point and captured the overall meaning of the song correctly. It also turned out that the model knows a lot about music. Given just an artist's name, GPT-3 can write a matching bio.


Artist: Red Hot Chili Peppers
Song: Otherside

[ generated ]

Bio: Formed in 1983, the Red Hot Chili Peppers are a funk rock band from Los Angeles, California. The band has achieved commercial success with their albums "Mother's Milk" (1989), "Blood Sugar Sex Magik" (1991), "One Hot Minute" (1995), "Californication" (1999), "By the Way" (2002), "Stadium Arcadium" (2006), and "I'm with You" (2011). The band's music is marked by its funk-inspired style, as well as by Anthony Kiedis' melodic vocals, John Frusciante's guitar solos, and Flea's basslines. The band is known for their energetic live performances and for drawing inspiration from a variety of musical styles.
Description: The lyrics describe the feeling of hopelessness and despair that comes with addiction. The speaker is asking how long they will continue to slide down the spiral of addiction, and whether there is any way to separate themselves from it. They express their disbelief that things could be any different or any better, and resign themselves to the fact that they will always be struggling with this problem.
Mood: The mood of the lyrics is dark and depressing.


Artist: Jack Johnson
Song: Banana Pancakes

[ generated ]

Bio: Jack Johnson is a professional surfer turned musician. He is best known for his mellow acoustic music, which has been dubbed "surf-rock." Johnson is a laid-back guy who likes to keep things simple. He is known for his eco-friendly lifestyle and his support of various charities.
Description: The lyrics of this song are about a couple who wants to spend more time together. They are trying to find ways to avoid having to go outside and face the world. They want to stay in bed and sleep in, make banana pancakes, and pretend that it is the weekend.
Mood: The mood of the lyrics is relaxed and carefree.


Artist: Eddie Vedder
Song: Guaranteed

[ generated ]

Bio: Born December 23, 1964, in Evanston, Illinois, Eddie Vedder is a singer, songwriter, and musician who is best known as the lead vocalist and one of three guitarists for the alternative rock band Pearl Jam. He is also a member of the band Temple of the Dog, and has released three solo albums. Vedder's music is often characterized as introspective and emotional, and his lyrics often deal with topics such as love, loss, and social and political injustice. He is known for his powerful vocal range and his passionate delivery. He is also known for his intense stage presence and for his intense dedication to his fans.
Description: The lyrics of this song are about a person who feels out of place in the world. They feel like they don't fit in with the society that they live in and that they don't really have a place to call home. The lyrics also express the idea of being free. The person singing the song wants to be free to explore the world and find their place in it. The lyrics also contain metaphors about life and the world. For example, the lyrics say that "everyone I come across in cages they bought." This is a metaphor for the way that people are restricted by the rules and limitations of society.
Mood: The mood of the lyrics is hopeful and defiant.

Finetuning the model and generating lyrics


To finetune the model we need to provide the training data as a JSON Lines (JSONL) file. This is just a text file with one separate JSON object per line. Finetuning Davinci, the largest GPT-3 model, on 500 songs for 2 epochs takes just a few minutes and costs a bit over 20 dollars.
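Writing such a file takes only a few lines of Python. This is a sketch using the standard library only; the file name and the placeholder examples are made up for illustration.

```python
import json
import os
import tempfile


def write_jsonl(examples, path):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as handle:
        for example in examples:
            # json.dumps escapes newlines inside the lyrics as \n,
            # so every object stays on a single line.
            handle.write(json.dumps(example, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read the file back into a list of dictionaries."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


examples = [
    {"prompt": "Artist: ...\nMood: ...", "completion": " Title: ...\nLyrics: ..."},
    {"prompt": "Artist: ...\nMood: ...", "completion": " Title: ...\nLyrics: ..."},
]
path = os.path.join(tempfile.mkdtemp(), "lyrics_dataset.jsonl")
write_jsonl(examples, path)
print(read_jsonl(path) == examples)
```

Reading the file back right after writing it is a cheap check that multi-line lyrics did not break the one-object-per-line structure.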

After finetuning GPT-3 to the prepared dataset, let's see how much better it is at writing lyrics than the standard model. I use the same prompt as before, but now in the format the finetuned model was trained on.

After a few tries this is what the model came up with. The song is less direct and to the point than before. There is a lot more wordplay going on, but the lyrics still match the description. I can actually imagine Jack Johnson singing this song.

The next time I left a bit more to the imagination of the model. I wanted the song to be more metaphorical and less straightforward. I added "the cat is a metaphor for..." to the prompt, and left the rest to the model. It came up with the song "Tangled up in blue". Somehow it wrote down the mood twice.

Then I wondered how these lyrics would adjust to a different artist. I picked Queens of the Stone Age, since they have a completely different style from Jack Johnson. The same prompt produced this song about our cat, including some swearing to make the song feel more like it could be sung by Josh Homme.

Conclusion


Although it still takes a lot of tweaking to get the results you want, this finetuned version of GPT-3 is very useful for providing inspiration for lyrics. It often comes up with ideas that I would never have thought of on my own, and it's a great way to get the creative juices flowing. And the process of generating new lyrics and trying different styles is a lot of fun.

I was also surprised at how useful GPT-3 is out of the box for preparing data. You can ask it to extract useful features from text, and use these features to finetune a new model.

I can definitely see tools like this being used by writers or musicians. Maybe not to write entire songs, but sometimes you just need a spark of inspiration to get started.