Research Questions

An introduction to my PhD project

What My PhD Is About

My PhD examines how computers can better understand music by looking at more than just audio. My starting point is how we humans understand music: we not only listen to it but also move to the beat, watch visuals, read text and lyrics, pick up on cultural cues, and observe the physical behavior of performers. This integration of several channels is called multimodal perception. When recordings from multiple data sources are combined in a computational system, we call it multimodal fusion, and it is a way to model music in a deeper and more human-like manner.

This page explains the main research questions in my project, and gives you a visual “map” of how the ideas connect.

Multimodal Music Research Map
Music is Multimodal
We Use Data to Study Music
Technology Supports Understanding
⬇️
RQ1: What is multimodality in music processing?
RQ2: To what extent can technology enhance the contextual understanding of music?
⬇️
Definitions & Frameworks
Audio-Video-Motion Datasets
Music Question Answering
Ethics & Data Management
⬇️
Better Understanding of Music

Research Question 1

How can multimodality be defined and used in computational music analysis?

Different fields talk about multimodality in different ways. In my PhD, I propose a simple idea: use multiple data sources only when each one adds something new and meaningful to the task. This helps researchers choose the right combination of audio, video, motion, text, and metadata, depending on what they want to understand.
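
One way to make this rule concrete is greedy forward selection: start with no data sources and add one only if it improves a task score by a meaningful margin. The sketch below is purely illustrative and is not the method used in the thesis. The names `select_modalities` and `toy_score`, the `min_gain` threshold, and all the numbers are hypothetical.

```python
def select_modalities(candidates, score, min_gain=0.01):
    """Greedy forward selection: keep a modality only if it adds enough."""
    selected = []
    best = score(frozenset())
    remaining = list(candidates)
    while remaining:
        # Score each remaining modality when added to the current selection.
        trial = {m: score(frozenset(selected + [m])) for m in remaining}
        top = max(remaining, key=lambda m: trial[m])
        if trial[top] - best < min_gain:
            break  # nothing left adds something new and meaningful
        selected.append(top)
        best = trial[top]
        remaining.remove(top)
    return selected


def toy_score(modalities):
    """Hypothetical task score; in practice this would be a validation metric."""
    total = 0.0
    if "audio" in modalities:
        total += 0.60
    if "video" in modalities:
        total += 0.15
    # Assumption: motion is redundant once video is already present.
    if "motion" in modalities and "video" not in modalities:
        total += 0.10
    return total  # metadata contributes nothing to this toy task


chosen = select_modalities(["audio", "video", "motion", "metadata"], toy_score)
print(chosen)  # ['audio', 'video']
```

In this toy setup, motion is dropped because it carries nothing that video does not already provide, and metadata is dropped because it adds nothing to this particular task. A different task would need a different score and could yield a different combination.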

Research Question 2

How much can technology enhance the contextual understanding of music?

I worked with several multimodal datasets covering folk music, classical music, ensemble performance, motion capture, lyrics, and metadata, to see whether machines can answer questions about the music they contain, for example what key a piece is in.

What is Music Performance?

Let's just say that music performance is music in action. It’s when performers turn compositions, improvisations, or musical ideas into sound, movement, and expression for an audience. Think of it as storytelling with sound, gestures, and emotion.

What is Music Performance Context?

Context is everything around the music that shapes its meaning: the venue, the audience, the gestures of performers, cultural rules, and even the placement of instruments. It’s what makes the same piece of music feel different in a concert hall, a jazz club, or a folk festival.

What is Music Understanding?

Music understanding begins when we stop simply hearing sound around us and start listening with attention. Hearing is passive and happens automatically. Listening is active and happens when we focus, follow, feel, and make sense of what unfolds in the music. Understanding music isn’t about knowing fancy terminology or analyzing scores, but about noticing what the music is doing, how it moves, how it feels, and how our own experiences help us interpret it.

People often imagine that only trained musicians or musicologists can “understand” music, but everyone can! A person may not know what a motif or cadence is, but they can still feel repetition, arrival, contrast, or surprise. They can follow a musical idea as it evolves, or sense the energy in a performer’s gesture. Understanding music is less about naming things and more about experiencing how music flows and how it speaks.

What is Music Information Processing?

Computational music information processing is about turning the physical experience of music into digital data, and then into forms of machine cognition, so that computers can help us explore, analyze, and understand music in new ways.
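
As a minimal sketch of that chain, the example below goes from sound to digital samples to a musical label. It synthesizes a pure tone instead of loading a real recording, estimates its frequency by counting zero crossings, and names the nearest note. The sample rate, the helper names, and the zero-crossing method are assumptions chosen for brevity; real systems use far more robust pitch estimators.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed for this sketch)
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def synthesize_tone(freq_hz, seconds=1.0):
    """Stand-in for a recording: a sine wave as a list of samples."""
    n = int(SAMPLE_RATE * seconds)
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]


def estimate_frequency(samples):
    """Count upward zero crossings; a pure tone has one per cycle."""
    crossings = sum(1 for prev, cur in zip(samples, samples[1:]) if prev < 0 <= cur)
    return crossings * SAMPLE_RATE / len(samples)


def note_name(freq_hz):
    """Map a frequency to the nearest equal-tempered note (A4 = 440 Hz)."""
    midi = round(69 + 12 * math.log2(freq_hz / 440.0))
    return NOTE_NAMES[midi % 12] + str(midi // 12 - 1)


samples = synthesize_tone(440.0)      # physical sound -> digital data
freq = estimate_frequency(samples)    # digital data -> measurement
label = note_name(freq)               # measurement -> musical description
print(label)  # A4
```

The estimate is only approximate, but it is close enough to land on the right note name, which is the kind of description a person can then explore or question.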

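# Toy example: answer a question by looking it up in a piece's metadata.
metadata = {
    "tempo": 95,
    "key": "A minor",
    "instrumentation": ["violin", "laouto", "voice"],
}

question = "What is the key of the piece?"

# Match on a keyword in the question; fall back when nothing matches.
if "key" in question.lower():
    answer = metadata["key"]
else:
    answer = "Not sure yet!"

print(answer)  # A minor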

Is my dataset multimodal?

Step 1 — What task are you performing?

Step 2 — What data does your dataset contain?

Multimodality Guide

This interactive tool helps you explore which music processing tasks you can perform with your dataset, or which data you need for a target task.

Step 1 — I have this data:

Step 2 — I want to perform this task:
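
The two directions of this guide, from data to tasks and from a task to data, can be sketched as a simple lookup. The task names and the data each one requires in `TASK_REQUIREMENTS` are hypothetical examples, not the mapping used by the actual tool, and `feasible_tasks` and `missing_data` are illustrative helper names.

```python
# Hypothetical task-to-data requirements, for illustration only.
TASK_REQUIREMENTS = {
    "beat tracking": {"audio"},
    "lyrics alignment": {"audio", "text"},
    "gesture analysis": {"video", "motion"},
}


def feasible_tasks(available):
    """Tasks whose required data types are all present in the dataset."""
    return sorted(
        task for task, needed in TASK_REQUIREMENTS.items() if needed <= set(available)
    )


def missing_data(task, available):
    """Data types still needed before the target task can be performed."""
    return sorted(TASK_REQUIREMENTS[task] - set(available))


print(feasible_tasks({"audio", "text"}))
print(missing_data("gesture analysis", {"audio", "video"}))
```

With audio and text available, the first call lists beat tracking and lyrics alignment. The second call reports that motion data is still missing for gesture analysis.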

Data Management Ethics & Compliance Test

This interactive test helps you assess whether your dataset management aligns with ethical and legal requirements.