Why study music performance scientifically?
In the introduction, it was
mentioned that researchers may have many different reasons for wanting to study
music performance. To the psychologist, musical performances reveal how individuals create, process information, and communicate through a shared, non-verbal “language”. To the historian, performances offer a way to examine the nature of musical interpretation and changes in style over time. Finally, to the data scientist, studying
performance allows for the development of classification and recommendation
algorithms that enhance music discovery, user engagement, and personalised
content delivery. However, regardless of their field, there are broad
similarities in how researchers extract and analyse data from musical
performances.
The scientific study of
musical performance dates back to the late 1800s; however, progress has consistently been constrained by the pace of technological development. In the earliest studies, measurements had to be taken manually from the instruments themselves: one experiment attached magnets to the keys of a piano keyboard and positioned them underneath a spinning metallic drum. As the performer pressed the keys, the magnets “drew” traces onto the drum, and the length of each trace could then be measured to estimate how long the key was held down. Nowadays, with the advent of music streaming and digital downloads,
we can collect data automatically from thousands of hours of audio recordings,
without even having to conduct an experiment in person.
Collecting data from musical performances
We have already touched on two
methods for collecting quantitative data from musical performances: 1) by
conducting experiments, or 2) by analysing recordings. Experimental studies
involve participants performing music in a controlled environment. We manipulate some aspect of the performance situation, and this manipulation becomes our “independent variable”. We then study the effect that manipulating our independent variable has on one or more other variables, called our “dependent variables”. We can measure these by collecting audio and video recordings of the performers, as well as “self-report” data like questionnaires and interviews. The manipulation doesn’t have to be extreme. In one experiment, the members of a
string quartet were instructed to perform the same piece several times, either
“expressively” or “unexpressively”. The researchers then studied how their body
sway changed in response to the instructions they were given.
The second method of collecting quantitative
data involves working with commercial audio or video recordings, such as those
we’d find on CDs or streaming services. In this case, we no longer have an
explicit “independent” variable: we’re not manipulating anything to do with the
performance context. But we can still study how musical factors differ between performers or historical eras, for instance, and these categories can act as the equivalent of an independent variable. The main disadvantage of studying recordings,
as opposed to running experiments, is that we can’t control the quality of the
performances. For instance, when working with historical recordings, we often
have to consider the age of the recorded medium: recordings made on tape or
vinyl might be noisy or play back slower or faster than the actual performance,
and this can affect the quality of the data we extract. However, working with recordings is the only way to study important performers and performances of the past, as well as how trends in musical performance have evolved over time.
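To make the playback-speed problem concrete, here is a toy calculation (all values invented for illustration) showing how a turntable running slightly fast would inflate every tempo we measure from a vinyl transfer:

```python
# Toy illustration (invented values): a record mastered at 33 1/3 rpm,
# digitised on a hypothetical turntable running slightly fast.
NOMINAL_RPM = 100 / 3   # 33 1/3 rpm
actual_rpm = 34.0       # assumed (slightly fast) playback speed
true_tempo_bpm = 120.0  # tempo of the actual performance

# Any tempo we measure scales linearly with the playback speed.
measured_tempo = true_tempo_bpm * (actual_rpm / NOMINAL_RPM)
print(f"Measured tempo: {measured_tempo:.1f} BPM")  # 122.4 BPM, not 120
```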
Regardless of how we have
collected our quantitative data, we also need to think about how we can analyse
it. Here, we are generally concerned with “features” (sometimes also known as
attributes). Each feature represents a measurable piece of information that can
be used in analysis. If we consider the “features” of an animal, this might
include its species, height, weight, and age; for a musical performance, this
could include its tempo, loudness, and duration. When we represent this data in a table, we would typically see our “observations” (individual performances) as rows and our features as columns. We can then use our features to build models that help answer our research questions: going back to our earlier string quartet example, we could fit a regression model that predicts how much a performer will move based on whether or not we told them to play expressively.
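As a minimal sketch of this layout, the snippet below builds such a table with invented numbers and fits a simple regression using pandas and scikit-learn. The feature names and values are hypothetical, not taken from the actual study:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Each row is one performance (an observation); each column is a feature.
# All values are invented purely for illustration.
performances = pd.DataFrame({
    "tempo_bpm":    [112.0, 108.5, 110.2, 109.1],
    "loudness_db":  [-12.3, -15.1, -11.8, -14.9],
    "expressive":   [1, 0, 1, 0],          # 1 = "expressive" instruction
    "body_sway_cm": [4.2, 1.9, 3.8, 2.1],  # dependent variable
})

# Regress body sway on the expressive/unexpressive condition.
X = performances[["expressive"]]
y = performances["body_sway_cm"]
model = LinearRegression().fit(X, y)

# Predict sway under each experimental condition.
for condition in (1, 0):
    pred = model.predict(pd.DataFrame({"expressive": [condition]}))[0]
    label = "expressive" if condition else "unexpressive"
    print(f"Predicted sway ({label}): {pred:.2f} cm")
```

With a single binary predictor like this, the model’s two predictions are simply the mean sway within each condition; the value of the tabular layout is that richer models can draw on many feature columns at once.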
Real-world applications
There are many possible
applications for this work, some of which you may already be familiar with. One example is the development of music recommendation systems. When a
streaming platform like Spotify or YouTube recommends content for you to listen
to next, it needs to be able to extract a variety of different audio features
from the tracks you’ve already heard. Some of these features are relatively
straightforward, like the mode (major or minor key) and tempo. Others are more complex: for instance, a track’s “danceability” describes how suitable it is for dancing, based on a combination of its tempo, rhythmic stability, beat strength, and overall regularity. Similarly, “energy” combines dynamic range, loudness, and musical density into a score for the perceived intensity and activity of a track. Once these features have been extracted, they can then be
cross-referenced with your listening habits (e.g., the genres and artists you
typically prefer listening to, as well as related genres and artists) and other
metadata in order to compile personalised recommendations for you.
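As a rough sketch of this final step, the toy example below ranks candidate tracks by the cosine similarity between their feature vectors and a listener profile. The track names and feature values are invented, and real systems like Spotify’s are far more elaborate (and proprietary); this only illustrates the general content-based idea:

```python
import numpy as np

# Hypothetical feature vectors: [normalised tempo, danceability, energy].
track_features = {
    "track_a": np.array([0.55, 0.82, 0.70]),
    "track_b": np.array([0.52, 0.79, 0.66]),
    "track_c": np.array([0.90, 0.20, 0.95]),
}

# A listener profile, e.g. the average features of tracks already played.
listener_profile = np.array([0.54, 0.80, 0.68])

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity of two feature vectors, ignoring their magnitudes."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Rank candidate tracks by how closely they match the listener profile.
ranked = sorted(track_features.items(),
                key=lambda item: cosine_similarity(listener_profile, item[1]),
                reverse=True)
for name, features in ranked:
    print(f"{name}: similarity = {cosine_similarity(listener_profile, features):.3f}")
```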