Algomus datasets

https://www.algomus.fr/data

This repository contains and links to public data released by the Algomus team. Open data and corpora are essential for digital humanities studies. Systematic musicology approaches use corpora to infer new music knowledge or to confirm or challenge existing theories. Studies in computational music analysis (CMA) or, more generally, music information retrieval (MIR) design algorithms that need to be evaluated.

Building corpora with reference analyses is not easy: a single piece admits many different analyses that are musically relevant. However, some analyses are definitely more correct than others, and reference annotations can be used to evaluate parts of analysis algorithms. As computer or data scientists, we would like computer-readable reference datasets that can serve as ground truth to evaluate MIR/CMA algorithms or to infer new music knowledge. But as music theorists, we know that a given piece has no single correct analysis: listeners, players, and theorists often disagree, or at least propose several points of view. Still, many music theorists, players, and listeners agree on some analytical elements. The difficulty of reaching consensus on some points should not prevent us from trying to formalize some elements.

Available datasets


Corelli Trio Sonata

Data: https://gitlab.com/algomus.fr/corelli-trio-sonatas

  • 38 slow movements from Corelli’s church sonatas (op. 1 and 3)
  • Texture, imitation patterns, melodic sequences
  • Harmonic data from (Hentschel et al., 2021)
  • Corpus on Dezrann

Reference: J. Roux et al., Annotating Texture and Imitation Patterns in a Corpus of Slow Movements in Corelli’s Trio Sonatas, submitted (2024)

Fugues

Data: https://gitlab.com/algomus.fr/algomus-data/tree/master/fugues

  • 24 Bach fugues (WTC I, BWV 846-869) + 12 Shostakovich fugues (op. 87, 1952)
  • S/CS/CS2 patterns, cadences, pedals (1000+ labels)
  • .ref (2013 – 2016) + converted .dez (2019)
  • Corpus on Dezrann

Reference: M. Giraud et al., Computational Fugue Analysis, Computer Music Journal, 39(2), 77-96, doi:10.1162/COMJ_a_00300, 2015, https://hal.archives-ouvertes.fr/hal-01113520

See also https://www.algomus.fr/fugues


Texture in string quartets

Data: https://gitlab.com/algomus.fr/algomus-data/tree/master/quartets/texture

  • 11 movements (Haydn, Mozart, Schubert)
  • 700+ texture labels
  • .ref (2014 and 2021), also translated from .dez files (2021)

See also https://www.algomus.fr/texture

References:


Texture in piano music

Data: https://gitlab.com/algomus.fr/symbolic-texture-dataset

  • 9 movements from Mozart piano sonatas (K279, K280 and K283)
  • 1164 texture labels
  • .dez, .txt (human-readable format), .tsv (machine-readable format)
  • Corpus on Dezrann

References:


Mozart String Quartets

Data: https://gitlab.com/algomus.fr/algomus-data/tree/master/quartets/mozart

  • 32 sonata-form and sonata-form-like movements
  • sonata form structure (following Hepokoski & Darcy, 2006) and cadences (2000+ labels)
  • .dez encoded within Dezrann by L. Feisthauer (2017 – 2019)
  • Corpus on Dezrann

Additional data: feature occurrences, models with learned probabilities (2019)

Reference: P. Allegraud et al., Learning sonata form structure on Mozart’s string quartets, TISMIR, 2019

See also https://www.algomus.fr/sonata


Orchestration

Data: https://gitlab.com/algomus.fr/orchestration

  • 24 first movements of Haydn, Mozart, and Beethoven symphonies
  • 7941 texture labels describing 8528 bars

Reference: D. V. T. Le et al., A Corpus Describing Orchestral Texture in First Movements of Classical and Early-Romantic Symphonies, DLfM 2022


Guitar positions

Data: https://gitlab.com/algomus.fr/algomus-data/tree/master/guitar/

  • 2248 tracks in 1022 songs/pieces
  • 1000 most frequent position vectors for chords (with 2-3 notes, with 4-6 notes)

Reference: J. Cournut et al., What are the most used guitar positions?, DLfM 2021

Data: https://gitlab.com/lbigo/rhythm-guitar-detection

For 102 guitar tablatures, at the bar level:

  • Manual annotations (rhythm guitar or not)
  • 31 computed features (note/chord values, variety, playing techniques, etc.)

Reference: D. Regnier et al., Identification of rhythm guitar sections in symbolic tablatures, ISMIR 2021

Slovenian Folk Song Ballads

Data and code: https://gitlab.com/algomus.fr/slovenian-folksongs

References:

Impact of ISMIR in musicology

Data: https://gitlab.com/algomus.fr/algomus-data/tree/master/ismir-impact-musicology

  • list of 699+ citations to 114 ISMIR papers, which claim to have some musicological utility
  • further analysis of 143 citations, published in musicological venues, to 28 ISMIR papers, including the 51 citations which “somewhat used” the ISMIR contribution

Reference: V. N. Borsan et al., The Games We Play: Exploring The Impact of ISMIR on Musicology, ISMIR 2023


Co-creative orchestration of Angeles

Data: https://gitlab.com/algomus.fr/angeles

  • code to generate orchestration plans with a Markov model
  • orchestral scores for Inexorable and El jardin Etereo

Reference: Co-creative Orchestration of Angeles with Layer Scores and Orchestration Plans, evoMUSART 2024


Guitar Chord Diagrams

Coming soon…


Data formats

Analysis data are presented in several formats, including:

  • .dez JSON files, which can be read and written with Dezrann
  • .ref annotation files, described in syntax.ref

We publish all new data in .dez format, and will progressively convert the old files.
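
Since .dez files are JSON, they can be inspected with any JSON library. The following is a minimal Python sketch that counts annotation labels by type. It assumes that a .dez file holds a top-level "labels" list whose entries carry a "type" field; that layout, the function name count_labels_by_type, and the sample content are illustrative assumptions, so check them against an actual file from the corpus before relying on them.

```python
import json
from collections import Counter


def count_labels_by_type(dez_text: str) -> Counter:
    """Count the labels of a .dez JSON document by their "type" field.

    Assumption: the document has a top-level "labels" list and each
    label is an object with a "type" key. Adjust if the files differ.
    """
    data = json.loads(dez_text)
    labels = data.get("labels", [])
    # Labels without a "type" key are grouped under "unknown".
    return Counter(label.get("type", "unknown") for label in labels)


# Hypothetical sample, not taken from any corpus file.
SAMPLE_DEZ = """
{
  "labels": [
    {"type": "S", "start": 0, "duration": 8},
    {"type": "CS", "start": 8, "duration": 8},
    {"type": "S", "start": 8, "duration": 8}
  ]
}
"""

if __name__ == "__main__":
    counts = count_labels_by_type(SAMPLE_DEZ)
    for label_type, count in sorted(counts.items()):
        print(label_type, count)
```

To run it on a real file, read the file contents as UTF-8 text and pass them to count_labels_by_type in place of SAMPLE_DEZ.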

License

These data are made available under the Open Database License. Any rights in individual contents of the database are licensed under the Database Contents License.

Other data

AI Song Contest

https://gitlab.com/algomus.fr/algomus-data/tree/master/ai-song/2020-i-keep-counting

Reference: G. Micchi et al., I Keep Counting: An Experiment in Human/AI Co-creative Songwriting, TISMIR, in press

See also https://www.algomus.fr/i-keep-counting

Analysis and attempted reconstruction of tree structures in jazz chord grids

Code: https://gitlab.com/algomus.fr/algomus-data/-/tree/master/jazz-arbres

  • analyse the properties of JHT trees
  • rebuild the trees using our own algorithm and evaluate the result

Reference: Patrice Thibaud, Mathieu Giraud. Analyse et essai de reconstruction de structures arborescentes des grilles de jazz. Journées d’Informatique Musicale (JIM 2023), May 2023, Paris, France. pp.62-67.