Evaluation of musical phrases

In this project we attempted to evaluate musical phrases automatically. Estimating aesthetic value with strictly mathematical formulas is not easy (if it is possible at all), so the poor results of this experiment should not be surprising. For the same reason, because the objective function is not strictly defined, we felt entitled to use solutions (in some cases outright "tricks") whose validity is hard to prove but which probably lead to an approximate estimate of a phrase that takes particular properties into account.

Four melody estimation functions were used:

  • Melo – estimates the pitch content of the whole phrase using the discrete Fourier transform (DFT)
  • Velo – estimates the amplitude of the sound volume over the whole phrase using the DFT
  • Inter – put simply, the average size of the intervals between successive sounds
  • Rthm – estimates the "richness" of the rhythm

The last stage is the computation of the user's preference function (eval) from the estimate of a given feature (eval_index).

Phrase estimation functions

"Richness" of melody (Melo)

The function computes the discrete Fourier transform (DFT). The input samples are a vector of note pitches, transformed so that the length of each run of successive identical values is proportional to the duration of the note it represents. An additional stage fills gaps with neighboring values. The value eval_index_Melo is approximately the sum of the amplitudes of the successive positive DFT components, each divided by its component number. Melo is also the only estimation function that takes the tempo of the melody into account.

By design, this function was meant to increase in value when larger intervals and/or faster fragments appear in the melody.
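The sample preparation and the index described above can be sketched roughly as follows. This is only an illustration under assumptions, not the project's code: the (pitch, duration) note representation, the `ticks_per_unit` resolution, the helper names, the division of amplitudes by the sample count, and the cap on `f_max` are all hypothetical. The gap-filling stage and the tempo dependence are left out, and the 1/k weighting follows one reading of the description.

```python
import cmath

def expand_notes(notes, ticks_per_unit=4):
    """Repeat each pitch so that its run length is proportional to its duration."""
    samples = []
    for pitch, duration in notes:
        samples.extend([pitch] * max(1, round(duration * ticks_per_unit)))
    return samples

def dft_amplitudes(samples, f_max):
    """Amplitudes |F|_k for k = 1..f_max, using a naive DFT (assumed 1/n scaling)."""
    n = len(samples)
    amps = []
    for k in range(1, f_max + 1):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(samples))
        amps.append(abs(coeff) / n)
    return amps

def melo_index(notes, f_max=8):
    """Sum of |F|_k / k over the positive components (one reading of the text)."""
    samples = expand_notes(notes)
    f_max = min(f_max, len(samples) // 2)  # stay below the Nyquist component
    amps = dft_amplitudes(samples, f_max)
    return sum(a / k for k, a in enumerate(amps, start=1))
```

In this sketch a phrase that stays on one pitch scores approximately zero, and widening the interval between two notes raises the score, which matches the intended behavior of Melo.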

Differentiation of dynamics (Velo)

The function computes the discrete Fourier transform (DFT). The input samples are a vector of note volume values prepared using the same method as above.

$$\mathrm{eval\_index\_Velo} = \frac{1}{f_{max}} \left( \sum_{k=1}^{f_{max}} \frac{|F|_k}{k} + 0.25 \cdot \max_{k=1,\dots,f_{max}} |F|_k \right),$$
where $|F|_k$ is the amplitude of the k-th DFT component.
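As a small worked illustration, the formula above can be evaluated directly from a list of amplitudes. The function name `velo_index` and the list layout (entry k-1 holds the amplitude of component k) are assumptions made for this sketch.

```python
def velo_index(amplitudes):
    """Evaluate the Velo formula; amplitudes[k - 1] holds |F|_k for k = 1..f_max."""
    f_max = len(amplitudes)
    weighted_sum = sum(a / k for k, a in enumerate(amplitudes, start=1))
    return (weighted_sum + 0.25 * max(amplitudes)) / f_max

# Two components with amplitudes 4 and 2: (4/1 + 2/2 + 0.25 * 4) / 2 = 3.0
print(velo_index([4.0, 2.0]))
```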

Mean interval (Inter)

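$$\mathrm{eval\_index\_Inter} = \frac{\mathrm{sumeval}}{\mathrm{sumdiv}}$$

$$\mathrm{sumeval} = \sum_k \mathrm{intereval}(n_k - n_{k+1}) \cdot \mathrm{div}_{k,k+1}$$
$$\mathrm{sumdiv} = \sum_k \mathrm{div}_{k,k+1}$$
$$\mathrm{div}_{k,k+1} = \min\left\{ \frac{d_k + d_{k+1}}{d_k + d_{k+1} + p_{k,k+1}},\ \frac{d_k + p_{k,k+1}}{d_{k+1} + p_{k,k+1}} \right\}$$
where

$n_k$ is the pitch of the k-th note,
$d_k$ is the length of the k-th note,
$p_{k,k+1}$ is the gap (pause) between the k-th note and the next one.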

Introducing the weight $\mathrm{div}_{k,k+1}$ is justified because a large interval between successive notes is less perceptible when the two sounds are separated by a long pause, or when the first of them is a short grace note (ornament) leading into the second.
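The effect of the weight is easiest to see numerically. The sketch below assumes notes are given as parallel lists of pitches, durations and pauses. The names `div_weight` and `inter_index` are hypothetical, and the interval scoring function is passed in as a parameter (here `abs` stands in for it).

```python
def div_weight(d_k, d_next, pause):
    """Weight of the interval between note k and the next note."""
    long_pause_term = (d_k + d_next) / (d_k + d_next + pause)
    ornament_term = (d_k + pause) / (d_next + pause)
    return min(long_pause_term, ornament_term)

def inter_index(pitches, durations, pauses, score):
    """Weighted mean of score(interval); pauses[k] separates notes k and k + 1."""
    sumeval = sumdiv = 0.0
    for k in range(len(pitches) - 1):
        w = div_weight(durations[k], durations[k + 1], pauses[k])
        sumeval += score(pitches[k] - pitches[k + 1]) * w
        sumdiv += w
    return sumeval / sumdiv

print(div_weight(1.0, 1.0, 0.0))    # adjacent notes of equal length: 1.0
print(div_weight(1.0, 1.0, 2.0))    # separated by a long pause: 0.5
print(div_weight(0.125, 1.0, 0.0))  # short ornament before a long note: 0.125
```

Both situations named in the text lower the weight, so the corresponding interval contributes less to the mean.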

$$\mathrm{intereval}(k) = 0.8 \cdot \log(|k| + 3) + 1.0 \cdot \log\bigl(8 - \bigl|\,|k| \bmod 12 - 6\,\bigr|\bigr) + 1.0 \cdot \mathrm{tabeval}_{|k| \bmod 12} + 1.0 \cdot \left[ \frac{|k| - 1}{12} \right]$$

The value of the above function is a sum of four terms. Two of them depend logarithmically on the size of the interval and on its distance from 6 semitones. The tabeval vector contains arbitrarily chosen values that depend, approximately, on the degree of dissonance of the interval. The last term, the integer part of (|k| - 1) / 12, is nonzero only when the interval is greater than an octave.
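A direct transcription of the interval function might look like the sketch below. Several details are assumptions: the logarithm is taken as the natural logarithm, the square brackets are read as the integer part (truncation toward zero), and `TABEVAL` is filled with placeholder zeros because the actual table values are not given in the text.

```python
import math

# Placeholder dissonance scores indexed by |k| mod 12. These are hypothetical:
# the text does not list the actual tabeval entries.
TABEVAL = [0.0] * 12

def intereval(k, tabeval=TABEVAL):
    """Score of an interval of k semitones (the sign of k is ignored)."""
    size = abs(k)
    reduced = size % 12  # interval folded into one octave
    return (0.8 * math.log(size + 3)
            + 1.0 * math.log(8 - abs(reduced - 6))
            + 1.0 * tabeval[reduced]
            + 1.0 * int((size - 1) / 12))  # integer part; 0 up to one octave
```

With the placeholder table, the second term is largest for intervals of 6 semitones modulo an octave, and the last term adds 1 for each full octave the interval exceeds.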

"Richness" of rhythm (Rthm)

The input samples are a vector of values that increases by 1 with each note.

$$\mathrm{eval\_index\_Rthm} = \frac{1}{f_{max}} \sum_{k=1}^{f_{max}} \left( \mathrm{avg}_k + \mathrm{std\_dev}_k \right),$$
where
$\mathrm{avg}_k$ is the average of $|F_{k,j}|$ taken over j (bars),
$\mathrm{std\_dev}_k$ is the standard deviation of the values $F_{k,j}$ taken over j (bars),
$F_{k,j}$ is the value of the k-th cosine component computed for the j-th bar.

Both replacing the Fourier transform with the cosine transform and computing it separately for each bar (while also taking the variability of these values into account) are motivated by the wish to capture not only the richness of the rhythmic division across the whole melody but also the differences in local division.
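The per-bar computation can be sketched as below. This is a loose illustration under several assumptions: each bar is given as a list of samples of the note counter on a regular time grid, the cosine transform is taken to be an unnormalized DCT-II, and the standard deviation is the population one. The names `dct_component` and `rthm_index` are hypothetical.

```python
import math
from statistics import mean, pstdev

def dct_component(samples, k):
    """k-th DCT-II component of the samples of one bar (unnormalized)."""
    n = len(samples)
    return sum(x * math.cos(math.pi * k * (2 * t + 1) / (2 * n))
               for t, x in enumerate(samples))

def rthm_index(bars, f_max):
    """Mean over k = 1..f_max of avg_k + std_dev_k, both taken across bars."""
    total = 0.0
    for k in range(1, f_max + 1):
        values = [dct_component(bar, k) for bar in bars]
        avg_k = mean(abs(v) for v in values)
        std_dev_k = pstdev(values)  # population standard deviation (assumed)
        total += avg_k + std_dev_k
    return total / f_max
```

Because components with k >= 1 ignore a constant offset, it does not matter in this sketch whether the note counter restarts at each bar. Bars with identical rhythm contribute only through the average term, while bars that differ from each other also raise the deviation term.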

Phrase estimation based on feature estimates

For each inspected feature, the value of the preference function equals:

$$\mathrm{eval} = \min\left\{ \frac{a}{b},\ \frac{b}{a} \right\},$$
where

$a = \mathrm{eval\_index} + \mathrm{add}$,
$b = \mathrm{best} + \mathrm{add}$.

This is a quotient scaled to the range [0,1]. Besides affecting the scaling of the quotient, the constant add prevents invalid arithmetic operations (division by zero) if best or eval_index equals 0. The values of the constants best and add were set arbitrarily.
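A minimal sketch of the preference function follows. The function name `preference` and the numeric values of `best` and `add` in the examples are hypothetical, and the sketch assumes non-negative inputs with `add` greater than zero.

```python
def preference(eval_index, best, add):
    """Closeness of eval_index to best, as a ratio of at most 1."""
    a = eval_index + add
    b = best + add
    return min(a / b, b / a)

print(preference(3.0, 3.0, 1.0))  # estimate equals the preferred value: 1.0
print(preference(0.0, 3.0, 1.0))  # zero estimate, still well defined: 0.25
print(preference(7.0, 3.0, 1.0))  # estimate above the preferred value: 0.5
```

The result is 1 when the estimate matches the preferred value and falls toward 0 as the estimate moves away from it in either direction.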