An introduction to my PhD project
My PhD examines how computers can better understand music by drawing on more than just audio. I start from how we humans make sense of music: not only by listening, but also by moving to the beat, watching visuals, reading text and lyrics, picking up cultural cues, and observing the physical behavior of performers. This multidimensional integration is called multimodal perception. When we combine recordings of multiple data sources in a computational system, we call it multimodal fusion, and it is a way to model music in a deeper and more human-like manner.
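To make that concrete, here is a minimal sketch of what feature-level fusion can look like in code. The feature vectors and their values are invented for illustration; in practice they would come from real audio, motion-capture, and text analysis.

import numpy as np

# Hypothetical per-modality feature vectors for one musical excerpt
# (the values are placeholders, not real measurements).
audio_features = np.array([0.42, 0.11, 0.87])   # e.g. timbre/rhythm descriptors
motion_features = np.array([0.05, 0.93])        # e.g. body-sway statistics
text_features = np.array([0.30, 0.61, 0.25])    # e.g. lyric embedding

# Early (feature-level) fusion: concatenate everything into one representation
# that a single model can learn from.
fused = np.concatenate([audio_features, motion_features, text_features])
print(fused.shape)  # (8,)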
This page explains the main research questions in my project, and gives you a visual “map” of how the ideas connect.
How can multimodality be defined and used in computational music analysis?
Different fields talk about multimodality in different ways. In my PhD, I propose a simple idea: use multiple data sources only when each one adds something new and meaningful to the task. This helps researchers choose the right combination of audio, video, motion, text, and metadata, depending on what they want to understand.
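One way to turn this principle into a procedure is an ablation-style check: keep a modality only if it measurably improves the task. The sketch below is purely illustrative; evaluate_task stands in for whatever model and metric the task actually uses, and the scores are made up.

# Hypothetical ablation-style check: keep a modality only if adding it
# actually improves the task score. `evaluate_task` is a placeholder for
# training and evaluating a model on the chosen modalities.
def select_modalities(candidates, evaluate_task, min_gain=0.01):
    chosen = []
    best_score = 0.0
    for modality in candidates:
        score = evaluate_task(chosen + [modality])
        if score - best_score >= min_gain:  # the modality adds something new
            chosen.append(modality)
            best_score = score
    return chosen

# Example with a toy scoring function (made-up numbers):
toy_scores = {("audio",): 0.70, ("audio", "video"): 0.78,
              ("audio", "video", "metadata"): 0.78}
evaluate = lambda mods: toy_scores.get(tuple(mods), 0.0)
print(select_modalities(["audio", "video", "metadata"], evaluate))
# ['audio', 'video'] -- metadata is dropped because it adds no measurable gain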
How much can technology enhance the contextual understanding of music?
I worked with several multimodal datasets, spanning folk music, classical music, and ensemble performance and including audio, motion capture, lyrics, and metadata, to see whether machines can answer questions like:
What is Music Performance?
Let's just say that music performance is music in action. It’s when performers turn compositions, improvisations, or musical ideas into sound, movement, and expression for an audience. Think of it as storytelling with sound, gestures, and emotion.
What is Music Performance Context?
Context is everything around the music that shapes its meaning: the venue, the audience, the gestures of performers, cultural rules, and even the placement of instruments. It’s what makes the same piece of music feel different in a concert hall, a jazz club, or a folk festival.
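For a computer, context like this has to become machine-readable. One hypothetical way to encode it is as structured fields stored alongside the recording, for example:

# Hypothetical, simplified encoding of performance context as structured data.
performance_context = {
    "venue": "concert hall",          # vs. jazz club, folk festival, ...
    "audience": "seated, quiet",
    "performer_gestures": ["bow lift", "head nod"],
    "cultural_setting": "Western classical recital",
    "instrument_placement": {"violin": "front left", "piano": "centre"},
}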
What is Music Understanding?
Music understanding begins when we stop simply hearing sound around us and start listening with attention. Hearing is passive and happens automatically. Listening is active and happens when we focus, follow, feel, and make sense of what unfolds in the music. Understanding music isn’t about knowing fancy terminology or analyzing scores, but about noticing what the music is doing, how it moves, how it feels, and how our own experiences help us interpret it.
People often imagine that only trained musicians or musicologists can “understand” music, but everyone can! A person may not know what a motif or cadence is, but they can still feel repetition, arrival, contrast, or surprise. They can follow a musical idea as it evolves, or sense the energy in a performer’s gesture. Understanding music is less about naming things and more about experiencing how music flows and how it speaks.
What is Music Information Processing?
Computational music information processing is about turning the physical experience of music into digital data, and then into forms of machine cognition, so that computers can help us explore, analyze, and understand music in new ways.
metadata = {
    "tempo": 95,
    "key": "A minor",
    "instrumentation": ["violin", "laouto", "voice"],
}

question = "What is the key of the piece?"

# Look for the word "key" in the question and answer it from the metadata.
if "key" in question.lower():
    answer = metadata["key"]
else:
    answer = "Not sure yet!"

print(answer)  # A minor
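The snippet above answers a question from hand-written metadata. The step before that, turning the sound itself into data, can be sketched with an audio analysis library such as librosa; the file name below is a hypothetical placeholder.

import librosa

# Hypothetical audio file; replace with a real recording to run this.
y, sr = librosa.load("folk_tune.wav")

# Estimate tempo and beat positions directly from the signal.
tempo, beats = librosa.beat.beat_track(y=y, sr=sr)

# Summarise timbre with MFCCs, a common audio descriptor.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

print("Estimated tempo:", tempo)
print("MFCC matrix shape:", mfcc.shape)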
Is my dataset multimodal?
This interactive tool helps you explore which music processing tasks you can perform with your dataset, or which data you need for a target task.
Step 1 — What task are you performing?
Step 2 — What data does your dataset contain?
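As a rough sketch of the idea behind such a tool (the task-to-data mapping below is simplified and made up, not the rules this page actually uses):

# Simplified, hypothetical mapping from tasks to the data they typically need.
TASK_REQUIREMENTS = {
    "beat tracking": {"audio"},
    "gesture analysis": {"video", "motion capture"},
    "lyric transcription": {"audio", "text"},
}

def check_dataset(task, available_data):
    needed = TASK_REQUIREMENTS[task]
    missing = needed - set(available_data)
    if missing:
        return f"Missing for '{task}': {', '.join(sorted(missing))}"
    return f"Your data covers '{task}'."

# Step 1: pick a task; Step 2: list what your dataset contains.
print(check_dataset("gesture analysis", ["audio", "video"]))
# Missing for 'gesture analysis': motion capture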
A second interactive test helps you assess whether your dataset management aligns with ethical and legal requirements.