Analyzing the structure of Sonata Forms
Many classical works from the 18th and 19th centuries are in sonata form: they follow a piece-level tonal path through an exposition, a development, and a recapitulation, and involve two thematic zones as well as other elements. The computational analysis of scores with such a large-scale structure is a challenge for the MIR community and calls for combining several analysis techniques.
We propose in our papers (ISMIR 2017, TISMIR 2019) first steps in that direction, combining analysis features on patterns, harmony, and other elements into a structure estimated by a Viterbi algorithm on a Hidden Markov Model. We test this strategy on a set of first movements of Haydn and Mozart string quartets. The proposed computational analysis strategy finds some pertinent features and detects a sketch of the exposition/recapitulation structure in most of the pieces that have a simple sonata form.
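To make the decoding step concrete, here is a minimal sketch of Viterbi decoding on a toy two-state model. The states, symbols, and probabilities below are illustrative assumptions only; they are not the model or the values used in our papers, and the function name `viterbi` is ours.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely state sequence for a list of observed symbols."""
    if not observations:
        return []

    def log(p):
        # Work in log space to avoid numerical underflow on long sequences.
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: log-probability of the best path ending in state s at time t
    best = [{s: log(start_p.get(s, 0)) + log(emit_p[s].get(observations[0], 0))
             for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((r, best[t - 1][r] + log(trans_p[r].get(s, 0))) for r in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + log(emit_p[s].get(observations[t], 0))
            back[t][s] = prev

    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Toy model (hypothetical values): a primary zone P emitting mostly the
# tonality symbol "I", and a secondary zone S emitting mostly "V".
states = ["P", "S"]
start_p = {"P": 1.0}
trans_p = {"P": {"P": 0.8, "S": 0.2}, "S": {"S": 1.0}}
emit_p = {"P": {"I": 0.9, "V": 0.1}, "S": {"I": 0.1, "V": 0.9}}

path = viterbi(["I", "I", "V", "V"], states, start_p, trans_p, emit_p)
print(path)  # ['P', 'P', 'S', 'S']
```

In this sketch, missing dictionary entries are treated as probability zero, which is how forbidden transitions (here, going back from S to P) are excluded.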
Hidden Markov Model sketching a regular sonata form structure from analysis features
The initial state is P; the final states are S' and C'. The square states (MC “medial caesura”, d “transition to development”, p “retransition to primary theme”) are transient states intended to last one or a few quarter notes, and are characterized by break features (#, 7, AC, r, uni).
Transition and emission probabilities of the selected symbols were first chosen by a trial-and-error process [Bigo 2017]. Each state has a loop transition over itself (not shown) with a high probability. The horizontal straight transitions have the second highest probabilities, and the curved dashed transitions make it possible to skip some states with a low probability. Only the main emissions are shown here: the states may also emit other symbols with a low probability. Tonalities, patterns, and other features are described in the main text. For clarity, auxiliary transitions and emissions are not shown in the recapitulation. They are the same as in the exposition, except that the tonality emissions focus on the main tonality I.
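The transition pattern described in this caption (high self-loop, lower step to the next state, lowest skip over one state) can be sketched as follows. The helper `left_to_right_transitions`, the simplified state chain, and the three probability values are hypothetical and chosen for illustration; they are not the values of [Bigo 2017].

```python
def left_to_right_transitions(states, p_loop=0.9, p_next=0.08, p_skip=0.02):
    """Build a left-to-right transition table over an ordered list of states.

    Each state loops on itself (highest weight), moves to the next state
    (second highest), or skips one state (lowest). Rows are renormalized to
    sum to 1, because the last states have fewer successors.
    """
    table = {}
    n = len(states)
    for i, state in enumerate(states):
        row = {state: p_loop}
        if i + 1 < n:
            row[states[i + 1]] = p_next
        if i + 2 < n:
            row[states[i + 2]] = p_skip
        total = sum(row.values())
        table[state] = {target: p / total for target, p in row.items()}
    return table


# Simplified, illustrative chain for an exposition (not the full model).
table = left_to_right_transitions(["P", "MC", "S", "C"])
for state, row in table.items():
    print(state, row)
```

Targets absent from a row have probability zero, so the chain can never move backwards.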
Transition and emission probabilities from [Bigo 2017]
Transitions and emission probabilities were then learned on actual data [Allegraud 2019], improving analysis results. Full data: http://algomus.fr/data/
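As an illustration of what learning transition probabilities from annotated data can look like, the sketch below counts state-to-state transitions in reference state sequences and normalizes them with additive smoothing. This is a generic counting estimator given as an assumption; it is not necessarily the procedure used in [Allegraud 2019], and the function name and toy sequences are hypothetical.

```python
from collections import Counter, defaultdict


def estimate_transitions(state_sequences, smoothing=1.0):
    """Estimate transition probabilities by counting, with additive smoothing.

    state_sequences: list of annotated state sequences, one per piece.
    smoothing: pseudo-count added to every transition (keep it > 0 so that
    states without observed successors still get a valid row).
    """
    states = sorted({s for seq in state_sequences for s in seq})
    counts = defaultdict(Counter)
    for seq in state_sequences:
        for current, following in zip(seq, seq[1:]):
            counts[current][following] += 1

    table = {}
    for a in states:
        total = sum(counts[a].values()) + smoothing * len(states)
        table[a] = {b: (counts[a][b] + smoothing) / total for b in states}
    return table


# Two toy annotated sequences (hypothetical data).
learned = estimate_transitions([["P", "P", "S"], ["P", "S", "S"]])
print(learned)
```

Note that smoothing gives a small probability to every transition, including backward ones; a left-to-right model would need those entries forced back to zero and the rows renormalized.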
Structure detection on ten first movements of Haydn and Mozart string quartets [2017]
The top lines are the reference analyses and the bottom line the structure found by the HMM.
The four columns MC+S, D, P' and MC'+S' evaluate the prediction of the start of these events or sections: + (perfect or almost, that is, shifted by at most 1 measure from the reference), = (approximate match, shifted by 2 to 3 measures), - (not found, or too far from the reference, that is, shifted by at least 4 measures).
We do not evaluate S positions (.) for pieces marked with ★, as they do not follow a “regular” bithematic sonata form structure with a clear secondary theme.
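The rating rule of this table can be sketched as a small function. The function name is hypothetical, positions are assumed to be integer measure numbers, and the label "-" for the "not found or too far" case is a notation chosen for this sketch.

```python
def rate_boundary(predicted, reference):
    """Rate a predicted section start against the reference (measure numbers).

    predicted may be None when the section was not found at all.
    """
    if predicted is None:
        return "-"
    shift = abs(predicted - reference)
    if shift <= 1:
        return "+"  # perfect or almost: at most 1 measure away
    if shift <= 3:
        return "="  # approximate match: 2 to 3 measures away
    return "-"      # too far: at least 4 measures away


print(rate_boundary(40, 41))    # +
print(rate_boundary(40, 43))    # =
print(rate_boundary(40, 44))    # -
print(rate_boundary(None, 44))  # -
```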
References

L. Bigo, M. Giraud, R. Groult, N. Guiomard-Kagan, F. Levé, Sketching sonata form structure in selected classical string quartets, ISMIR 2017

P. Allegraud et al., Learning Sonata Form Structure on Mozart’s String Quartets, TISMIR, 2(1), pp. 82–96, 2019