Algomus datasets
This repository contains, or links to, public data released by the Algomus team. Open data and corpora are essential for digital humanities studies. Systematic musicology approaches use corpora to infer new music knowledge or to confirm or challenge existing theories. Studies in computational music analysis (CMA), or more generally in music information retrieval (MIR), design algorithms that need to be evaluated.
Building corpora with reference analyses is not easy: there are many different analyses of the same piece that are musically relevant. However, some analyses are definitely more correct than others, and some reference annotations can be used to evaluate parts of analysis algorithms. As computer or data scientists, we would like to have computer-readable reference datasets that may be used as a ground truth to evaluate MIR/CMA algorithms or to infer new music knowledge. But as music theorists, we know that there is not only one correct analysis of a given piece: listeners, players, or theorists often disagree, or at least propose several points of view. Nevertheless, many music theorists, players, and listeners agree on some analytical elements. The fact that reaching consensus may be difficult on some points should not prevent us from trying to formalize some elements.
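As an illustration of how reference annotations can serve as a ground truth, the sketch below scores predicted label positions (for example, cadence onsets) against reference positions with a tolerance window, returning precision, recall, and F1. It is a generic, hypothetical example, not the evaluation protocol of any specific Algomus paper: positions are assumed to be numbers in one consistent unit (e.g. beats), and matching is greedy (each prediction takes the closest still-unmatched reference) rather than an optimal bipartite matching.

```python
from typing import Optional, Sequence


def match_with_tolerance(
    predicted: Sequence[float],
    reference: Sequence[float],
    tolerance: float = 0.0,
) -> tuple[float, float, float]:
    """Return (precision, recall, F1) of predicted positions against reference positions.

    Each reference position can be matched by at most one prediction; a prediction
    counts as correct if it lies within `tolerance` of a still-unmatched reference.
    """
    unmatched = sorted(reference)
    true_positives = 0
    for p in sorted(predicted):
        # Greedily pick the closest unmatched reference inside the tolerance window.
        best: Optional[int] = None
        for i, r in enumerate(unmatched):
            if abs(p - r) <= tolerance and (best is None or abs(p - r) < abs(p - unmatched[best])):
                best = i
        if best is not None:
            unmatched.pop(best)
            true_positives += 1
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For instance, `match_with_tolerance([16, 33, 60], [16, 32, 48.5], tolerance=1.0)` matches two of three predictions, giving a precision, recall, and F1 of 2/3 each. A larger tolerance is one simple way to accommodate the legitimate disagreement between analysts discussed above.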
Available datasets
Fugues
Data: https://gitlab.com/algomus.fr/algomus-data/tree/master/fugues
- 24 Bach fugues (WTC I, BWV 846-869) + 12 Shostakovich fugues (op.87, 1952)
- S/CS/CS2 patterns, cadences, pedals (1000+ labels)
- .ref files (2013–2016) + converted .dez files (2019)
Reference: M. Giraud et al., Computational Fugue Analysis, Computer Music Journal, 39(2), 77-96, doi:10.1162/COMJ_a_00300, 2015, https://hal.archives-ouvertes.fr/hal-01113520
See also http://www.algomus.fr/fugues
Texture in string quartets
Data: https://gitlab.com/algomus.fr/algomus-data/tree/master/quartets/texture
- 11 movements (Haydn, Mozart, Schubert)
- 700+ texture labels
- .ref files (2014 and 2021), also translated from .dez files (2021)
See also http://www.algomus.fr/texture
References:
- M. Giraud et al., Towards modeling texture in symbolic data, ISMIR 2014
- L. Soum-Fontez et al., Symbolic Textural Features and Melody/Accompaniment Detection in String Quartets, CMMR 2021
Texture in piano music
Data: https://gitlab.com/algomus.fr/symbolic-texture-dataset
- 9 movements from Mozart piano sonatas (K279, K280 and K283)
- 1164 texture labels
- .dez, .txt (human-readable format), and .tsv (machine-readable format) files
References:
- L. Couturier et al., A Dataset of Symbolic Texture Annotations in Mozart Piano Sonatas, ISMIR 2022
- L. Couturier et al., Annotating Symbolic Texture in Piano Music: a Formal Syntax, SMC 2022
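The machine-readable .tsv export can be processed with Python's standard `csv` module. The sketch below counts how often each value of one column occurs; it assumes the file has a header row, and the column name you pass (for example a hypothetical `"label"` column) must be checked against the actual header of the dataset files.

```python
import csv
from collections import Counter
from typing import Iterable


def count_labels(tsv_lines: Iterable[str], column: str) -> Counter:
    """Count how often each value appears in `column` of a tab-separated file with a header row."""
    reader = csv.DictReader(tsv_lines, delimiter="\t")
    # Skip rows where the column is missing or empty.
    return Counter(row[column] for row in reader if row.get(column))
```

Typical usage would open the file with `newline=""` and `encoding="utf-8"` and pass the file object directly, since `csv.DictReader` accepts any iterable of text lines.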
Mozart String Quartets
Data: https://gitlab.com/algomus.fr/algomus-data/tree/master/quartets/mozart
- 32 sonata-form and sonata-form-like movements
- sonata form structure (following Hepokoski & Darcy, 2006) and cadences (2000+ labels)
- .dez files, encoded within Dezrann by L. Feisthauer (2017–2019)
Additional data: feature occurrences, models with learned probabilities (2019)
Reference: P. Allegraud et al., Learning sonata form structure on Mozart’s string quartets, in revision
See also http://www.algomus.fr/sonata
Orchestration
Data: https://gitlab.com/algomus.fr/orchestration
- First movements of 24 Haydn, Mozart, and Beethoven symphonies
- 7941 texture labels describing 8528 bars
Reference: D. V. T. Le et al., A Corpus Describing Orchestral Texture in First Movements of Classical and Early-Romantic Symphonies, DLfM 2022
Guitar positions
Data: https://gitlab.com/algomus.fr/algomus-data/tree/master/guitar/
- 2248 tracks in 1022 songs/pieces
- 1000 most frequent position vectors for chords (with 2-3 notes, with 4-6 notes)
Reference: J. Cournut et al., What are the most used guitar positions?, DLfM 2021
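To make the idea of "most frequent position vectors" concrete, here is a hypothetical sketch, not the encoding or code used in the paper: a position is assumed to be a 6-tuple of fret numbers (low E to high E), with `None` for an unplayed string and `0` for an open string. Positions are then filtered by their number of played notes (e.g. 2–3 or 4–6) and ranked with `collections.Counter.most_common`.

```python
from collections import Counter
from typing import Iterable, Optional

# Hypothetical encoding: one fret number per string (low E to high E),
# None for a string that is not played, 0 for an open string.
Position = tuple[Optional[int], ...]


def played_notes(position: Position) -> int:
    """Number of strings actually played in a position."""
    return sum(fret is not None for fret in position)


def most_frequent_positions(
    positions: Iterable[Position], min_notes: int, max_notes: int, top: int = 1000
) -> list[tuple[Position, int]]:
    """Return the `top` most frequent positions with between min_notes and max_notes played notes."""
    counts = Counter(p for p in positions if min_notes <= played_notes(p) <= max_notes)
    return counts.most_common(top)
```

Calling it once with `(2, 3)` and once with `(4, 6)` yields two separate rankings, mirroring the split between small and large chords described above.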
Rhythm guitar detection
Data: https://gitlab.com/lbigo/rhythm-guitar-detection
For 102 guitar tablatures, at the bar level:
- Manual annotations (rhythm guitar or not)
- 31 computed features (note/chord values, variety, playing techniques, etc.)
Reference: D. Regnier et al., Identification of rhythm guitar sections in symbolic tablatures, ISMIR 2021
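The sketch below shows what bar-level features of this kind might look like. It is illustrative only: the `Event` representation (one duration and a count of simultaneous notes per onset) and the three features are hypothetical and are not the 31 features computed in the dataset.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """One onset in a tablature bar (hypothetical representation)."""
    duration: float  # note/chord value, in quarter notes
    notes: int       # number of simultaneously played strings


def bar_features(events: list[Event]) -> dict[str, float]:
    """Compute a few illustrative bar-level features (not the features of the paper)."""
    if not events:
        return {"chord_ratio": 0.0, "duration_variety": 0.0, "mean_notes": 0.0}
    chords = sum(1 for e in events if e.notes >= 2)
    return {
        # Share of onsets that are chords rather than single notes.
        "chord_ratio": chords / len(events),
        # Number of distinct note/chord values in the bar.
        "duration_variety": float(len({e.duration for e in events})),
        # Average number of notes per onset.
        "mean_notes": sum(e.notes for e in events) / len(events),
    }
```

A strummed bar of four quarter-note chords would score high on `chord_ratio` and low on `duration_variety`, whereas a melodic lead bar would tend to show the opposite pattern; a classifier trained on the manual annotations could exploit such contrasts.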
Data formats
Data are presented in several formats, including:
- .dez JSON files, which can be read and written
  - by Dezrann
  - by our music21.schema module (see Bagan 2015, upcoming update in 2019)
- .ref annotation files, described in syntax.ref
We publish all new data in the .dez format and will progressively convert the old files.
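Since .dez files are JSON, they can also be inspected with Python's standard `json` module, without Dezrann or music21. The sketch below groups labels by type and sorts each group by onset. It assumes a top-level `"labels"` list whose items carry `"type"` and `"start"` keys (and possibly `"duration"` and `"tag"`); these key names are an assumption to check against actual files, not a specification of the format.

```python
import json
from collections import defaultdict


def labels_by_type(dez_text: str) -> dict[str, list[dict]]:
    """Group the labels of a .dez document by their "type" field.

    Assumes a top-level "labels" list whose items carry "type" and "start"
    keys; check these key names against real .dez files.
    """
    document = json.loads(dez_text)
    groups: dict[str, list[dict]] = defaultdict(list)
    for label in document.get("labels", []):
        groups[label.get("type", "unknown")].append(label)
    # Sort each group by onset so labels read in score order.
    for items in groups.values():
        items.sort(key=lambda lab: lab.get("start", 0))
    return dict(groups)
```

A file would typically be read with `open(path, encoding="utf-8")` and its contents passed as a string; keeping the parser on strings makes it easy to test without touching the file system.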
License
These data are made available under the Open Database License. Any rights in individual contents of the database are licensed under the Database Contents License.
Other data
AI Song Contest
https://gitlab.com/algomus.fr/algomus-data/tree/master/ai-song/2020-i-keep-counting
Reference: G. Micchi et al., I Keep Counting: An Experiment in Human/AI Co-creative Songwriting, TISMIR, in press