A Tutorial Series for Software Developers, Data Scientists, and Data Center Managers
This is the 19th article in the AI Developer Journey Tutorial Series. The previous articles discussed AI fundamentals and deep learning for images. Now that a set of emotion-eliciting images has been selected, it is time to take the analogous step on the music generation side of the project. The search for a music dataset is the topic of this article.
The goal of this project is to create an app that takes in a set of images, determines the emotional valence of these images, and generates a piece of music that fits this emotion. Given the selected approach to emotion-modulated music generation (see the Project Planning article), two sets of music data need to be found:
- A training data set for the long short-term memory (LSTM) neural network used for melody completion and harmonization.
- A set of base melodies, which will be modulated using the emotion-based modulation algorithm (to be further explored in a future article).
Training Dataset for LSTM Neural Network
BachBot* was the model used for melody completion and harmonization [1].
To define a set of criteria for the dataset, this article first illustrates some of the potential challenges faced by algorithmically generated music.
A connectionist (neural network) paradigm for generating music uses regularity learning and generalization. However, music as an art form is diverse and anything but regular. Conventions in one genre of music may break the rules of another genre. For example, extended and altered chords are commonplace in jazz, but almost never found in Baroque and Renaissance music. Furthermore, many genres of music have very few or ill-defined regularities. These factors may cause trouble for a model that is trying to learn regularities. Therefore, the selected music should all be from one style that is somewhat regular.
Another important aspect to consider is the complexity of music (number of instruments, harmonic vocabulary, and so on). Clearly, completing only the melody would not be sufficient (or very interesting!) for the purposes of this project. However, generating overly complex music from a single melody line would not yield good results. Thus, a balance of complexity and feasibility is required.
Furthermore, training an effective and robust LSTM neural network requires a large amount of data. Along with being large, the dataset should be in a format that is easily adapted to code representations; converting from image or PDF files to a usable format is a large project in and of itself!
Lastly, the dataset should be out of copyright (in the public domain) and/or licensed for non-commercial use. It is important to note that although the music itself may be in the public domain, a particular arrangement, sheet-music edition, or encoding of that music may still be under copyright protection [5].
Therefore, the criteria are as follows. The training dataset:
- Must contain samples of music that have shared regularities.
- Must be sufficiently large.
- Should be in a format that is easily adapted to code representations.
- Should be out of copyright (in the public domain) and/or licensed for non-commercial use.
- Must have an apt balance of complexity and feasibility.
The dataset used by BachBot is a collection of chorales written by Johann Sebastian Bach, found in the music21* toolkit. Given the criteria above, it is easy to see why this dataset was ideal:
Most music in the Baroque period followed specific guidelines and practices (rules of counterpoint) [6]. Furthermore, chorales share the same structure and arrangement (four voices: soprano (melody), alto, tenor, and bass, grouped in a series of phrases). These standard practices result in a dataset in which samples share many regularities. Additionally, the output of the music-generation algorithm can be tested against these rules to qualitatively evaluate success.
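To make this concrete, below is a rough sketch of how one such counterpoint rule (the prohibition on parallel perfect fifths) could be checked between the outer voices of a chorale using music21's voiceLeading module. Aligning the two voices note-by-index ignores rhythmic offsets, so treat this as an illustration rather than a rigorous analysis.

# A rough sketch: count parallel perfect fifths between the outer voices
# of one chorale. Index-aligning the voices ignores rhythmic offsets and
# is a simplification for illustration only.
from music21 import corpus, voiceLeading

score = corpus.parse('bach/bwv66.6')            # one chorale from the corpus
soprano = list(score.parts[0].flatten().notes)  # outer voices
bass = list(score.parts[-1].flatten().notes)

violations = 0
for i in range(min(len(soprano), len(bass)) - 1):
    quartet = voiceLeading.VoiceLeadingQuartet(
        soprano[i], soprano[i + 1], bass[i], bass[i + 1])
    if quartet.parallelFifth():
        violations += 1

print(f'Parallel fifths between soprano and bass: {violations}')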
The Bach chorales have a good level of complexity without being infeasible for a music-generation model. Chorales use diatonic harmony (based on a single scale), and the four voices give just enough room to make music that is sufficiently interesting.
Bach wrote more than 300 chorales in his lifetime. If this number does not seem very large at first, remember that chorales are just one type of composition; such a number of compositions of a single type by one composer is rarely found in the music domain.
A collection of Bach chorales in MusicXML* format was compiled by Margaret Greentree and is available as part of the music21 corpus [3]. Music21 is a Python*-based toolkit for computer-aided musicology that is freely available on the web [2]. The MusicXML format of the Bach chorales is already a code representation of musical notation! An example of the MusicXML format is shown in Figure 1.
Furthermore, music21 provides a set of tools that allows for easy manipulation of these files. Another advantage of the music21 corpus is that the included music is either out of copyright in the United States or licensed for non-commercial use [4].
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE score-partwise PUBLIC "-//Recordare//DTD MusicXML 3.0 Partwise//EN" "http://www.musicxml.org/dtds/partwise.dtd">
<score-partwise version="3.0">
  <part-list>
    <score-part id="P1">
      <part-name>Music</part-name>
    </score-part>
  </part-list>
  <part id="P1">
    <measure number="1">
      <attributes>
        <divisions>1</divisions>
        <key><fifths>0</fifths></key>
        <time><beats>4</beats><beat-type>4</beat-type></time>
        <clef><sign>G</sign><line>2</line></clef>
      </attributes>
      <note>
        <pitch><step>C</step><octave>4</octave></pitch>
        <duration>4</duration>
        <type>whole</type>
      </note>
    </measure>
  </part>
</score-partwise>
Figure 1: An example MusicXML* file that represents ‘middle C’ on the treble clef [7].
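To show how easily this format is handled in code, the following sketch parses the Figure 1 document with music21 and reads back its single note. It assumes the XML has been saved locally as hello_world.musicxml (a hypothetical file name).

# A minimal sketch: parse the Figure 1 MusicXML and inspect the note it
# contains. 'hello_world.musicxml' is a hypothetical local copy of Figure 1.
from music21 import converter

score = converter.parse('hello_world.musicxml')
note = score.flatten().notes[0]
print(note.nameWithOctave)   # C4, i.e., middle C
print(note.duration.type)    # whole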
As you can see, the corpus of Bach chorales found in the music21 toolkit soundly satisfies all the requirements for a training dataset for the LSTM neural network. Hence, the project can proceed using BachBot as the music completion algorithm.
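For readers who want to explore the dataset themselves, here is a minimal sketch of accessing the chorales through music21's corpus module; it is illustrative only and not the project's actual preprocessing pipeline.

# A minimal sketch of accessing the Bach chorales in the music21 corpus.
from music21 import corpus

# Paths of all Bach scores bundled with music21 (mostly chorales).
bach_paths = corpus.getComposer('bach')
print(f'{len(bach_paths)} Bach scores in the corpus')

# Parse one chorale and inspect its four voices.
chorale = corpus.parse('bach/bwv66.6')
for part in chorale.parts:
    print(part.id, '-', len(part.flatten().notes), 'notes')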
Base Melodies
Five base melodies were selected after considering several parameters. First, as with the training dataset, each melody must be out of copyright (in the public domain) and/or licensed for non-commercial use.
The entertaining, interactive nature of the project imposes another condition: the input melodies must be popular and recognizable. Among music in the public domain, American, English, and French folk and children's songs satisfy this condition best. We also chose two melodies with known authors because of their high popularity.
Later processing (the rearranging algorithm and BachBot) requires the melodies to be in a suitable source format, such as a Musical Instrument Digital Interface (MIDI) file or a music score. After a melody has been rearranged to match a particular mood, its MIDI file is converted into the MusicXML format for BachBot, as sketched below.
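Below is a minimal sketch of that conversion step using music21; aura_lee.mid and aura_lee.musicxml are hypothetical file names.

# A minimal sketch: convert a rearranged MIDI melody to MusicXML with
# music21. The file names are hypothetical.
from music21 import converter

melody = converter.parse('aura_lee.mid')          # read the MIDI file
melody.write('musicxml', fp='aura_lee.musicxml')  # write MusicXML for BachBot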
Our algorithm for rearranging a melody according to mood is experimental, so to reduce processing complexity and obtain a reasonably predictable artistic result, we chose simple, monophonic (single-line, unharmonized) melodies. Additionally, because BachBot was trained on diatonic, tonal music, the selected melodies should follow the same musical conventions for the best result.
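One way such properties might be sanity-checked with music21 is sketched below: it verifies that a melody contains no chords and that its pitches fit the estimated key. This is an assumed helper for illustration, not part of the published pipeline.

# A sketch of sanity-checking a candidate melody: monophonic (no chords)
# and diatonic (every pitch belongs to the estimated key's scale).
# 'aura_lee.mid' is a hypothetical file name; enharmonic spelling
# differences may cause false negatives in the diatonic check.
from music21 import converter

melody = converter.parse('aura_lee.mid')
notes = list(melody.flatten().notes)

# Monophonic: no Chord objects among the notes.
is_monophonic = not any(n.isChord for n in notes)

# Diatonic: compare pitch names against the estimated key's scale.
estimated_key = melody.analyze('key')
scale_names = {p.name for p in estimated_key.pitches}
is_diatonic = all(p.name in scale_names for n in notes for p in n.pitches)

print(f'monophonic: {is_monophonic}, diatonic: {is_diatonic}')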
It should be noted that a particular arrangement of a public-domain melody may itself no longer be in the public domain. The melodies, and the websites where we found their MIDI files, are listed in the references [8, 9, 10].
List of base melodies:
- Aura Lee (George R. Poulton/W. W. Fosdick, 1861)
- Happy Birthday to You (Patty and Mildred J. Hill, 1893)
- Brother John (unknown/traditional, 1780)
- Old McDonald (unknown, 1917)
- Twinkle, Twinkle, Little Star (traditional, 1761)
Conclusion
All in all, the search for the music data itself was not a very time-consuming task. However, careful consideration was required to define criteria that would make each dataset fit the purposes of its part of the project. Regardless of the project, datasets should be unrestricted by copyright.
Now that the dataset has been found, the project can proceed to collecting, storing, and processing this data.
References and Links
1. Liang, F. (2016). BachBot: Automatic composition in the style of Bach chorales. Developing, analyzing, and evaluating a deep LSTM model for musical style (Unpublished master's thesis). University of Cambridge.
2. Cuthbert, M., & Ariza, C. (2008). Music21 Documentation. Retrieved May 24, 2017, from http://web.mit.edu/music21/doc/index.html
3. Cuthbert, M., & Ariza, C. (2008). List of Works Found in the music21 Corpus. Retrieved May 25, 2017, from http://web.mit.edu/music21/doc/about/referenceCorpus.html
4. Cuthbert, M., & Ariza, C. (2008). Music21 Authors, Acknowledgments, Contributing, and Licensing. Retrieved May 24, 2017, from http://web.mit.edu/music21/doc/about/about.html
5. Copyright and the Public Domain. (n.d.). Retrieved May 24, 2017, from http://www.pdinfo.com/copyright-law/copyright-and-public-domain.php
6. Zbikowski, L. (2009). Guidelines for Species Counterpoint. Retrieved May 24, 2017, from http://hum.uchicago.edu/classes/zbikowski/species.html
7. Hello World. (n.d.). Retrieved May 25, 2017, from http://www.musicxml.com/tutorial/hello-world/
8. Best Known Popular Public Domain Songs. (n.d.). Retrieved from http://www.pdinfo.com/pd-music-genres/pd-popular-songs.php
9. Folk Songs. (n.d.). Retrieved from http://www.pdmusic.org/folk.html
10. Folklore. (n.d.). Retrieved from http://www.csufresno.edu/folklore/
Find more helpful resources at the Intel® Nervana™ AI Academy.