This project attempts the automatic evaluation of musical phrases. Estimating aesthetic value with strictly mathematical formulas is not easy (if it is possible at all), so the poor results of this experiment should not be surprising. For the same reason, because the objective function is not strictly defined, we felt entitled to use solutions (in some cases even "tricks") whose validity is hard to prove but which probably led to an approximate way of estimating a phrase that takes particular properties into account.

Four melody estimation functions were used:

- *Melo*: estimates the **pitch** content of the whole phrase using the discrete Fourier transform (DFT)
- *Velo*: estimates the amplitude of the sound **volume** over the whole phrase using the DFT
- *Inter*: put simply, the average size of the **intervals** between successive notes
- *Rthm*: estimates the "richness" of the **rhythm**

The last stage computes the user's **preference** function (*eval*) from the estimate of each **feature** (*eval_index*).

### Phrase estimation functions

#### "Richness" of melody (*Melo*)

The function computes the discrete Fourier transform (DFT). The input samples are a vector of note pitches, expanded so that each note is represented by a run of identical values whose length is proportional to that note's duration. An additional step fills gaps with neighboring values. The value *eval_index_Melo* is approximately the sum of the amplitudes of the successive positive DFT components, each divided by its component number. *Melo* is also the only estimation function that takes the tempo of the melody into account.

By design, the value of this function was meant to increase when larger intervals and/or faster passages appear in the melody.
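The sample preparation and DFT summary described above can be sketched as follows. This is only an illustration under several assumptions that are not in the original: notes are given as hypothetical `(pitch, length_in_ticks)` pairs, a pitch of `None` marks a gap, gaps are filled with the preceding value, and "divided by component number" is read as a sum of |F|_k / k. How tempo enters the real *Melo* function is not described, so it is left out.

```python
import cmath


def expand_notes(notes):
    """Turn (pitch, length_in_ticks) pairs into a sample vector.

    Each note becomes a run of identical values as long as its duration.
    A pitch of None marks a gap, which is filled with a neighboring value.
    """
    samples = []
    for pitch, length in notes:
        samples.extend([pitch] * length)
    # Fill each gap with the previous real value (assumption).
    last = None
    for i, s in enumerate(samples):
        if s is None:
            samples[i] = last
        else:
            last = s
    # Leading gaps have no previous value; use the first real value instead.
    first = next((s for s in samples if s is not None), 0)
    return [first if s is None else s for s in samples]


def dft_amplitudes(samples):
    """Amplitudes |F|_k of the DFT components k = 1 .. N // 2."""
    n = len(samples)
    amps = []
    for k in range(1, n // 2 + 1):
        f_k = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                  for t, x in enumerate(samples))
        amps.append(abs(f_k))
    return amps


def melo_index(notes):
    """Sum of |F|_k / k over the positive components (tempo not modeled)."""
    amps = dft_amplitudes(expand_notes(notes))
    return sum(a / k for k, a in enumerate(amps, start=1))
```

With this sketch a phrase that stays on one pitch scores (numerically) zero, and a leap of an octave scores higher than a leap of a whole tone, which matches the stated intent of the function.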

#### Differentiation of dynamics (*Velo*)

The function computes the discrete Fourier transform (DFT); the input samples are a vector of note volume values prepared by the same method as above.

$$\mathit{eval\_index\_Velo} = \frac{\sum_{k=1}^{f_{max}} \frac{|F|_k}{k} + 0.25 \cdot \max_{k=1,\dots,f_{max}} |F|_k}{f_{max}}$$

where $|F|_k$ is the amplitude of the k-th DFT component.
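As a small illustration, the formula above can be evaluated directly from a list of amplitudes. The function name `velo_index` and the convention that `amplitudes[k - 1]` holds |F|_k are assumptions made for this sketch only.

```python
def velo_index(amplitudes):
    """Evaluate eval_index_Velo, with amplitudes[k - 1] = |F|_k for k = 1..f_max."""
    f_max = len(amplitudes)
    if f_max == 0:
        return 0.0  # no components to evaluate (assumed convention)
    # Sum of |F|_k / k: higher components contribute less.
    weighted_sum = sum(a / k for k, a in enumerate(amplitudes, start=1))
    # Add a quarter of the strongest component, then normalize by f_max.
    return (weighted_sum + 0.25 * max(amplitudes)) / f_max


print(velo_index([4.0, 2.0]))  # (4/1 + 2/2 + 0.25 * 4) / 2 = 3.0
```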

#### Mean interval (*Inter*)

$$\mathit{eval\_index\_Inter} = \frac{\mathit{sumeval}}{\mathit{sumdiv}}$$

$$\mathit{sumeval} = \sum_k \mathit{intereval}(n_k - n_{k+1}) \cdot \mathit{div}_{k,k+1}$$

$$\mathit{sumdiv} = \sum_k \mathit{div}_{k,k+1}$$

$$\mathit{div}_{k,k+1} = \min\left\{ \frac{d_k + d_{k+1}}{d_k + d_{k+1} + p_{k,k+1}},\ \frac{d_k + p_{k,k+1}}{d_{k+1} + p_{k,k+1}} \right\}$$

where

$n_k$ is the pitch of the k-th note,

$d_k$ is the length of the k-th note,

$p_{k,k+1}$ is the gap (pause) between the k-th note and the next one.

Introducing the weight *div*_{k,k+1} is justified because a large interval between successive notes is less perceptible when the two sounds are separated by a long pause or when the first of them is a short grace note (ornament) leading into the second.
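The effect of the weight can be seen in a short sketch of the weighted mean. The names `pair_weight` and `inter_index` are hypothetical, note lengths are assumed to be positive, and the interval scoring is passed in as a function so that the example does not depend on any particular definition of it.

```python
def pair_weight(d_k, d_next, pause):
    """Weight div_{k,k+1} for two successive notes (lengths assumed > 0)."""
    return min((d_k + d_next) / (d_k + d_next + pause),
               (d_k + pause) / (d_next + pause))


def inter_index(pitches, lengths, pauses, interval_score):
    """Weighted mean of interval scores: sumeval / sumdiv.

    pauses[k] is the gap between note k and note k + 1.
    """
    sumeval = 0.0
    sumdiv = 0.0
    for k in range(len(pitches) - 1):
        w = pair_weight(lengths[k], lengths[k + 1], pauses[k])
        sumeval += interval_score(pitches[k] - pitches[k + 1]) * w
        sumdiv += w
    return sumeval / sumdiv if sumdiv else 0.0
```

For two equal notes with no pause the weight is 1. A pause as long as both notes together halves it, and a first note one seventh as long as the second (an ornament) reduces it to 1/7.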

$$\mathit{intereval}(k) = 0.8 \cdot \log(|k| + 3) + 1.0 \cdot \log\bigl(8 - \bigl|\,|k| \bmod 12 - 6\,\bigr|\bigr) + 1.0 \cdot \mathit{tabeval}_{|k| \bmod 12} + 1.0 \cdot \left[\frac{|k| - 1}{12}\right]$$

The value of this function is the sum of four terms. Two of them depend on the logarithm of the interval size and on the distance of the interval from 6 semitones. The *tabeval* vector contains (arbitrarily chosen) values that depend, approximately, on the interval's degree of dissonance. The last term is nonzero only when the interval is larger than an octave.
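A minimal sketch of the four-term sum follows. Several details are assumptions, because the text does not fix them: the logarithm is taken as the natural logarithm, the square brackets are read as truncation to an integer, and the `TABEVAL` table is filled with placeholder zeros since the actual dissonance values are not given.

```python
import math

# Placeholder dissonance table indexed by |k| mod 12. The original values
# are not given in the text, so these zeros are purely illustrative.
TABEVAL = [0.0] * 12


def intereval(k, tabeval=TABEVAL):
    """Score an interval of k semitones (the sign of k is ignored)."""
    size = abs(k)
    folded = size % 12  # interval reduced to within one octave
    return (0.8 * math.log(size + 3)               # grows with interval size
            + 1.0 * math.log(8 - abs(folded - 6))  # largest at 6 semitones
            + 1.0 * tabeval[folded]                # dissonance lookup
            + 1.0 * int((size - 1) / 12))          # octave count, truncated
```

Under these assumptions the last term is 0 for every interval up to and including an octave (12 semitones) and becomes 1 at 13 semitones.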

#### "Richness" of rhythm (*Rthm*)

The input samples are a vector of values that increase by 1 with each note.

$$\mathit{eval\_index\_Rthm} = \frac{1}{f_{max}} \sum_{k=1}^{f_{max}} \left(\mathit{avg}_k + \mathit{std\_dev}_k\right)$$

where

$\mathit{avg}_k$ is the mean of $|F_{k,j}|$ taken over $j$ (bars),

$\mathit{std\_dev}_k$ is the standard deviation of the values $F_{k,j}$ taken over $j$ (bars),

$F_{k,j}$ is the value of the k-th cosine component computed for the j-th bar.

Replacing the Fourier transform with the cosine transform and computing it separately for each bar (while also taking the variation of these values into account) is motivated by the wish to capture not only the richness of the rhythmic division across the whole melody but also local differences in that division.
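The per-bar computation can be sketched as below. The text does not say which cosine transform or which standard deviation was used, so this example assumes an unnormalized DCT-II and the population standard deviation. The names `dct_components` and `rthm_index` are hypothetical, and each bar is assumed to be already sampled into a list of equal length.

```python
import math
import statistics


def dct_components(samples, f_max):
    """Unnormalized DCT-II components k = 1 .. f_max of one bar."""
    n = len(samples)
    return [sum(x * math.cos(math.pi * k * (2 * t + 1) / (2 * n))
                for t, x in enumerate(samples))
            for k in range(1, f_max + 1)]


def rthm_index(bars, f_max):
    """Mean over k of (avg_k + std_dev_k), both taken across bars j."""
    per_bar = [dct_components(bar, f_max) for bar in bars]
    total = 0.0
    for k in range(f_max):
        values = [comps[k] for comps in per_bar]          # F_{k,j} for all j
        avg_k = statistics.fmean([abs(v) for v in values])
        std_dev_k = statistics.pstdev(values)             # spread between bars
        total += avg_k + std_dev_k
    return total / f_max
```

In this sketch, repeating the same bar leaves the result unchanged because the standard deviation term stays at zero; only bars that differ from one another add to it.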

### Phrase estimation based on feature estimation

For each inspected feature, the value of the **preference function** is:

$$\mathit{eval} = \min\left\{\frac{a}{b},\ \frac{b}{a}\right\}$$

where

$a = \mathit{eval\_index} + \mathit{add}$,

$b = \mathit{best} + \mathit{add}$.

This scales the quotient to the range [0, 1]. Besides affecting this scaling, the constant *add* prevents invalid arithmetic operations (such as division by zero) when *best* or *eval_index* equals 0. The values of the constants *best* and *add* were set arbitrarily.
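A minimal sketch of the preference function, assuming a positive *add* and non-negative *eval_index* and *best*. The function name `preference` and the sample values are illustrative only; the constants actually used are not given in the text.

```python
def preference(eval_index, best, add):
    """Return min(a / b, b / a) with a = eval_index + add, b = best + add."""
    a = eval_index + add
    b = best + add
    # The smaller of the two quotients is 1 when eval_index == best and
    # falls toward 0 as the two values diverge in either direction.
    return min(a / b, b / a)


print(preference(3.0, 3.0, 1.0))  # 1.0: the feature matches the preferred value
print(preference(0.0, 4.0, 1.0))  # 0.2: add keeps a = 1 instead of 0
```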