Emotion models

Updated results table:

Previous XLSR (pre-trained, not fine-tuned) model, eval2 confusion matrix:

eval_conf_matrix_2_acc=0.958.png

All of the predicted-happiness, true-sadness errors come from a single audio file:

194733f4-59f2-43ba-9f57-a140f9e847df_……
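A quick way to check whether one confusion-matrix cell is dominated by a single recording is to group the misclassified samples by source file. The snippet below is only a sketch: the record layout `(file_id, true_label, pred_label)`, the helper name `files_for_cell`, and the file IDs in the usage example are assumptions for illustration, not the evaluation code actually used.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical record layout: (file_id, true_label, pred_label)
Record = Tuple[str, str, str]


def files_for_cell(records: Iterable[Record], true_label: str, pred_label: str) -> Counter:
    """Count, per source file, the samples that fall into one confusion-matrix cell."""
    return Counter(
        file_id
        for file_id, true, pred in records
        if true == true_label and pred == pred_label
    )


# Toy data with made-up file IDs
records = [
    ("clip_a", "sadness", "happiness"),
    ("clip_a", "sadness", "happiness"),
    ("clip_b", "sadness", "sadness"),
    ("clip_c", "happiness", "happiness"),
]
cell = files_for_cell(records, true_label="sadness", pred_label="happiness")
print(cell.most_common())
```

If the counter holds a single key, the whole cell traces back to one file, which is the situation described above.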

Laughter

1.) Stuttering dataset: 30,013 samples of 3 s each, labelled as stuttering events.

2.) Updated validation:

1.) Robust FP test: all validation samples from the other classes are checked with a sliding step of 0.1 s

2.) Closer to the API setup: all samples are denoised and diarized, step = 1 s
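The two validation modes differ mainly in how densely the window slides over each recording. A minimal sketch of that idea follows, assuming a fixed 3 s window (borrowed from the 3 s sample length above, not confirmed for validation) and a hypothetical `is_positive` callback standing in for the model; none of these names come from the actual pipeline.

```python
from typing import Callable, List, Sequence, Tuple


def sliding_windows(num_samples: int, sample_rate: int,
                    window_s: float = 3.0, step_s: float = 0.1) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of every full window."""
    win = int(round(window_s * sample_rate))   # window length in samples
    hop = int(round(step_s * sample_rate))     # step length in samples
    return [(start, start + win) for start in range(0, num_samples - win + 1, hop)]


def count_false_positives(signal: Sequence[float], sample_rate: int,
                          is_positive: Callable[[Sequence[float]], bool],
                          window_s: float = 3.0, step_s: float = 0.1) -> int:
    """Count windows flagged positive on a recording known to contain no target event."""
    return sum(
        1
        for start, end in sliding_windows(len(signal), sample_rate, window_s, step_s)
        if is_positive(signal[start:end])
    )


# Toy example: 5 s of "audio" at 10 Hz, with a dummy detector that always fires
toy_signal = [0.0] * 50
dense = sliding_windows(len(toy_signal), 10, step_s=0.1)   # robust FP test setting
coarse = sliding_windows(len(toy_signal), 10, step_s=1.0)  # API-like setting
fp = count_false_positives(toy_signal, 10, lambda w: True)
```

With the 0.1 s step each recording is scored at ten times as many offsets as with the 1 s step, so a model that fires only at particular alignments is much more likely to be caught.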

Additional architecture tests are still running with wav2vec2 small (96M parameters, about the same as the ResNeXt currently in use).

The previously used 487th model shows robustness problems:

Untitled

Best models without FP in robustness test:

Untitled