Algomus datasets

This repository contains public data released by the Algomus team. Open data and corpora are essential for digital humanities studies. Systematic musicology approaches use corpora to infer new music knowledge or to confirm or challenge existing theories. Studies in computational music analysis (CMA) and, more generally, in music information retrieval (MIR) design algorithms that need to be evaluated.

Building corpora with reference analyses is not easy: a single piece often admits several musically relevant analyses. Still, some analyses are clearly more correct than others, and reference annotations can be used to evaluate parts of analysis algorithms. As computer or data scientists, we would like computer-readable reference datasets that can serve as ground truth to evaluate MIR/CMA algorithms or to infer new music knowledge. As music theorists, we know that a given piece has no single correct analysis: listeners, players, and theorists often disagree, or at least propose several points of view. Nevertheless, many music theorists, players, and listeners agree on some analytical elements. The difficulty of reaching consensus on some points should not prevent us from trying to formalize some elements.

Available datasets


Fugues

Data: https://gitlab.com/algomus.fr/algomus-data/tree/master/fugues

  • 24 Bach fugues (WTC I, BWV 846-869) + 12 Shostakovich fugues (op. 87, 1952)
  • S/CS/CS2 patterns, cadences, pedals (1000+ labels)
  • .ref (2013 -- 2016) + converted .dez (2019)

Reference: M. Giraud et al., Computational Fugue Analysis, Computer Music Journal, 39(2), 77-96, doi:10.1162/COMJ_a_00300, 2015, https://hal.archives-ouvertes.fr/hal-01113520

See also http://www.algomus.fr/fugues


Texture in string quartets

Data: https://gitlab.com/algomus.fr/algomus-data/tree/master/quartets/texture

  • 11 movements (Haydn, Mozart, Schubert)
  • 700+ texture labels
  • .ref (2014)

See also http://www.algomus.fr/texture

Reference: M. Giraud et al., Towards modeling texture in symbolic data, ISMIR 2014, https://hal.archives-ouvertes.fr/hal-01057017


Mozart String Quartets

Data: https://gitlab.com/algomus.fr/algomus-data/tree/master/quartets/mozart

  • 32 sonata-form and sonata-form-like movements
  • sonata form structure (following Hepokoski & Darcy, 2006) and cadences (2000+ labels)
  • .dez encoded within Dezrann by L. Feisthauer (2017 -- 2019)

Reference: P. Allegraud et al., Learning sonata form structure on Mozart's string quartets, in revision

See also http://www.algomus.fr/sonata


Data formats

Data are presented in two formats:

  • .dez JSON files, which can be read and written with Dezrann
  • .ref annotation files, described in syntax.ref

We publish all new data in .dez format, and will progressively convert the old files.
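
Since .dez files are JSON, they can be inspected with any JSON library. The following Python sketch counts annotations per label type. It is only an illustration under stated assumptions: the key names "labels" and "type" are guesses about the file layout, not a documented schema, and the sample content is made up. Check the key names against an actual file before relying on them.

```python
import json
from collections import Counter


def count_label_types(dez_text, labels_key="labels", type_key="type"):
    """Count annotations per label type in the text of a .dez JSON file.

    ASSUMPTION: the file is a JSON object holding a list of label objects
    under `labels_key`, each with a `type_key` field. Both key names are
    hypothetical; verify them against a real file.
    """
    data = json.loads(dez_text)
    # Fall back to an empty list if the assumed layout is not present.
    labels = data.get(labels_key, []) if isinstance(data, dict) else []
    return Counter(
        label.get(type_key, "(untyped)")
        for label in labels
        if isinstance(label, dict)
    )


# Hypothetical content, invented for illustration only.
sample = (
    '{"labels": ['
    '{"type": "S", "start": 0}, '
    '{"type": "CS", "start": 8}, '
    '{"type": "S", "start": 16}]}'
)
counts = count_label_types(sample)
print(counts.most_common())
```

To apply this to a file from the repository, read its text first, for example with open(path, encoding="utf-8").read(), and pass the result to count_label_types.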

License

These data are made available under the Open Database License. Any rights in individual contents of the database are licensed under the Database Contents License.