VoxCeleb-PT

From HLT@INESC-ID

Latest revision as of 18:39, 28 June 2022

VoxCeleb-PT is a small dataset of voices of Portuguese celebrities that can be used as a language-specific extension of the widely used VoxCeleb corpus.

Statistics

The dataset contains 51 celebrities, of which 23 are female. Altogether, VoxCeleb-PT contains 26,663 automatically transcribed utterances (.wav, 16 kHz, pcm_s16le). The total duration is 17:55:14 (hh:mm:ss), with an average of about 20 minutes per speaker and utterance durations of 2 to 5 s.
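As a quick sanity check, the per-speaker and per-utterance averages can be recomputed from the totals quoted above. This is only a sketch: the three constants are copied from the paragraph, and the variable and function names are illustrative rather than part of the dataset.

```python
# Totals quoted in the statistics paragraph.
TOTAL_DURATION = "17:55:14"  # hh:mm:ss
NUM_SPEAKERS = 51
NUM_UTTERANCES = 26663


def hms_to_seconds(hms: str) -> int:
    """Convert an 'hh:mm:ss' string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


total_seconds = hms_to_seconds(TOTAL_DURATION)

# Mean amount of speech per speaker, in minutes.
minutes_per_speaker = total_seconds / NUM_SPEAKERS / 60

# Mean utterance length, in seconds.
seconds_per_utterance = total_seconds / NUM_UTTERANCES

print(f"{minutes_per_speaker:.1f} min per speaker")
print(f"{seconds_per_utterance:.2f} s per utterance")
```

The means come out at roughly 21 minutes per speaker and 2.4 s per utterance, in line with the rounded figures given in the text.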


The dataset can be obtained in its original form or split into development (train/val) and test sets, with every set containing the full speaker cohort. All sets include speaker id, age, gender and manually corrected transcriptions. As such, VoxCeleb-PT should prove useful for tasks such as automatic speech recognition (ASR), speaker verification and age/gender recognition.

Download

Please fill out this form (https://forms.gle/EuLPwgVLWQdBBPqYA) to access the data.

  • Original. The dataset follows the Kaldi file layout: each folder contains the speech files of a given speaker together with the files text, utt2spk and wav.scp. The spk_info.csv file maps each speaker id to the celebrity's age and gender.
  • With splits. Contains train, dev and hidden test splits.
  • Raw. Raw .mp4 and .wav files together with subtitle files.
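The per-speaker folders of the original release could be read along the following lines. This is a minimal sketch, not a loader shipped with the dataset: it assumes the standard Kaldi conventions (each line of text, utt2spk and wav.scp is a key followed by whitespace and a value) and that wav.scp is keyed by utterance id. The helper names read_kaldi_table and load_speaker_dir are hypothetical.

```python
from pathlib import Path


def read_kaldi_table(path):
    """Parse a Kaldi-style '<key> <value...>' file into a dict."""
    table = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            parts = line.strip().split(maxsplit=1)
            if not parts:
                continue  # skip blank lines
            key = parts[0]
            value = parts[1] if len(parts) > 1 else ""
            table[key] = value
    return table


def load_speaker_dir(speaker_dir):
    """Join text, utt2spk and wav.scp of one speaker folder by utterance id.

    Assumption: wav.scp uses the utterance id as its key.
    """
    speaker_dir = Path(speaker_dir)
    text = read_kaldi_table(speaker_dir / "text")
    utt2spk = read_kaldi_table(speaker_dir / "utt2spk")
    wav_scp = read_kaldi_table(speaker_dir / "wav.scp")

    utterances = []
    for utt_id, speaker_id in utt2spk.items():
        utterances.append(
            {
                "utt_id": utt_id,
                "speaker_id": speaker_id,
                "transcript": text.get(utt_id, ""),
                "wav": wav_scp.get(utt_id),  # None if the entry is missing
            }
        )
    return utterances
```

Calling load_speaker_dir on one speaker folder returns a list of dictionaries, one per utterance, which can then be joined with the age and gender columns of spk_info.csv once its header has been inspected.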

License

This dataset is available to download for research purposes under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/). The copyright remains with the original owners of the videos.

The views and opinions expressed by speakers in the dataset are those of the individual speakers and do not necessarily reflect the positions of the University of Lisbon, INESC-ID, or the authors.