MLFDB - Multi-frame Labeled Faces Database

About Database

The dataset was created from approximately 300 YouTube videos and contains approximately 6,000 - 7,000 different persons (estimated with the face-recognition framework [2], which is based on FaceNet [3]). Sequences from the same video source were filtered using the SSIM metric with a threshold of 0.7 (only labels were compared) to ensure differences in scale, lighting, and angle even when the same person appears more than once (more details in the "How it was created" part). The dataset contains almost 17,500 samples (sequences with a label) and is divided into several sets with an approximate ratio of 70/15/15%:

Database division into individual sets: table data
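
The approximate 70/15/15% ratio can be cross-checked against counts stated elsewhere on this page: 12,200 TRAIN and 2,600 TEST sequences, and 17,426 sequences in total after the last filtering step. The short Python sketch below only does that arithmetic. Treating everything outside TRAIN and TEST as a single third set is an assumption made for illustration, not something the page states.

```python
# Counts taken from this page.
TOTAL = 17_426   # sequences left after the final filtering step
TRAIN = 12_200   # public TRAIN set
TEST = 2_600     # public TEST set

# Assumption: all remaining sequences form one third, non-public set.
REMAINDER = TOTAL - TRAIN - TEST


def share(count, total=TOTAL):
    """Return the percentage of `total` that `count` represents, to one decimal."""
    return round(100 * count / total, 1)


if __name__ == "__main__":
    for name, count in (("TRAIN", TRAIN), ("TEST", TEST), ("remainder", REMAINDER)):
        print(f"{name:>9}: {count:6d} sequences ({share(count)} %)")
```

The three shares come out close to 70, 15, and 15 percent, which is consistent with the ratio quoted above.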

Different people computation using Face-recognition framework: thresholds FR

_______________________________

The publicly available sets are TRAIN and TEST, which together contain 14,800 sequences. Each sample is stored in its own folder, and within every set the folders are numbered from 1 to n (TRAIN: 1 - 12200, TEST: 1 - 2600, …). Each sample folder contains:

7 images in low resolution (32x32) named 'img_1.JPG', …, 'img_7.JPG'.
A label taken from the middle of the sequence (from img_4.JPG) with a base resolution of 64x64, named label.JPG.
Some sample folders also contain label_128.JPG or even label_256.JPG; every sample contains the 64x64 label. The file info_dataset.json in the root directory lists which folders contain 128x128 and 256x256 labels.
info.txt, containing the source video link, the label (its bounding box and frame number in the source video), and the interpolation method and JPEG compression that were used to resize the sequence images to the low resolution of 32x32. The label was resized to 64x64 with the same interpolation method, but its JPEG quality remains the best possible: 100.
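
The folder contents listed above can be turned into a small consistency check. The following Python sketch uses only the standard library and does not decode any images. It assumes that each sample folder is named by its number and sits directly inside the set directory (for example `TRAIN/1`); that layout and the helper names `sample_files` and `missing_files` are hypothetical, chosen for illustration.

```python
from pathlib import Path

SEQUENCE_LENGTH = 7  # img_1.JPG ... img_7.JPG, as listed above
OPTIONAL_LABELS = ("label_128.JPG", "label_256.JPG")  # present only in some folders


def sample_files(set_dir, index):
    """Return the expected file paths for sample `index` inside `set_dir`.

    Assumption: the sample folder is `<set_dir>/<index>`.
    """
    folder = Path(set_dir) / str(index)
    return {
        "frames": [folder / f"img_{i}.JPG" for i in range(1, SEQUENCE_LENGTH + 1)],
        "label": folder / "label.JPG",
        "info": folder / "info.txt",
        # Higher-resolution labels are optional, so only existing ones are listed.
        "optional_labels": [folder / name for name in OPTIONAL_LABELS
                            if (folder / name).is_file()],
    }


def missing_files(set_dir, index):
    """Return the names of required files that are absent from the sample folder."""
    files = sample_files(set_dir, index)
    required = files["frames"] + [files["label"], files["info"]]
    return [path.name for path in required if not path.is_file()]
```

Calling `missing_files("TRAIN", 1)` on an intact download would be expected to return an empty list; any names it does return point to an incomplete sample folder.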

Example of the sample data:

How it was created

One of the state-of-the-art face detectors, YOLO v3 [4], and its implementation [5] were used for face detection. Detections were also filtered in this step so that only face images with a resolution of at least 64x64 were kept. The label, its bounding box, and its frame number were recorded. Detections were taken from randomly chosen frames in the video. Raw sequences were created using the OpenCV [6] framework: it located the frame corresponding to the label, cropped the image according to the bounding box, and then took the 3 frames before and the 3 frames after and cropped them using the same bounding box. Reusing the bounding box was an intentional step to make the database more realistic, as there are many cases in which a face detector does not crop the face correctly (inaccurate detection, crop scaling factor, …). The number of raw sequences is 36,123. Several filtering steps have to be performed before the final database is ready:

Each sequence is checked to see whether its label can be encoded by the face-recognition framework, because the objective evaluation metric is the face-recognition rate. YOLO is a face detector, not a face-recognition system, so it also detects faces that are hard to detect and for which face encodings may not be computable. Reduction from 36,123 to 25,174.
One sequence may contain multiple faces. It is generally fine if several faces or their parts appear in one image, but a problem arises when different people appear at the beginning and at the end of the sequence (this can happen at a cut in the video). Therefore, the label is compared using face recognition (threshold 0.7) with all other images in the sequence. Multiple faces in an image were taken into account. Reduction from 25,174 to 24,930.
Because multiple faces are extracted from one video, sequences from the same video source probably contain the same persons. This reduction step compares only labels. The biggest problem is how to distinguish between very similar images and images with different lighting, angle, etc. Face recognition is not an appropriate solution for this task because of its face alignment: it can recognize the same person under different conditions. We decided to use structural similarity (SSIM; MSE, PSNR, etc. are not effective metrics for this case) with a threshold of 0.7, computed using OpenCV. Images are first resized to 64x64 resolution. Reduction from 24,930 to 22,705.
All sequence images are checked for the presence of a face using the face-recognition framework and its face-detection (not recognition) method. Face detection is sufficient here because the label is the only image used for the face-recognition rate metric; the other images should contain a face but do not have to be recognizable. Reduction from 22,705 to 18,021.
The sequences are normalized to the same size: sequence images to 32x32 and the label to 64x64 resolution (128x128 and 256x256 labels were also created where possible). Each sequence used one randomly chosen method from well-known pixel-based interpolations: nearest neighbor, bilinear, bicubic, and Lanczos (over an 8x8 neighborhood). The label was saved with the best possible JPEG quality (100), and the sequence images were saved with a randomly chosen JPEG quality from the range 30 - 90. No reduction in this step.
After the final normalization to the corresponding resolutions, a few labels could no longer be used for the face-recognition metric (their face encodings could not be computed). Sequences containing these labels were removed. Reduction from 18,021 to 17,426.
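
The SSIM-based step that compares labels from one video source can be illustrated with a minimal Python sketch. It is not the authors' implementation. It uses a simplified single-window (global) SSIM written in pure Python instead of the windowed SSIM an image library would compute, and it assumes that a label is dropped when its SSIM to an already kept label reaches the 0.7 threshold. The function names and the greedy keep-first strategy are hypothetical.

```python
def global_ssim(a, b, dynamic_range=255.0):
    """Single-window SSIM of two equally sized grayscale images.

    `a` and `b` are flat sequences of pixel values. This is a simplification:
    the usual SSIM averages the same formula over local windows.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("images must be non-empty and of equal size")
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    var_a = sum((x - mean_a) ** 2 for x in a) / n
    var_b = sum((y - mean_b) ** 2 for y in b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n
    # Standard stabilizing constants from the SSIM definition.
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    numerator = (2 * mean_a * mean_b + c1) * (2 * cov + c2)
    denominator = (mean_a ** 2 + mean_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator


def filter_similar_labels(labels, threshold=0.7):
    """Return indices of labels to keep, in their original order.

    Assumption: a label is dropped if its SSIM to any already kept label
    is at or above `threshold`; the first label is always kept.
    """
    kept = []
    for index, label in enumerate(labels):
        if all(global_ssim(label, labels[k]) < threshold for k in kept):
            kept.append(index)
    return kept
```

In a real pipeline the labels would first be resized to 64x64, as described above, and `global_ssim` would be replaced by a windowed SSIM from an image-processing library.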

Terms of use

Disclaimer
Inspired by Labeled Faces in the Wild (LFW) [7] and because of the specific nature of the database, we decided to write this disclaimer. The main purpose of the database is to train multi-frame systems that best reconstruct faces at a higher resolution. MLFDB is intended as a public benchmark with strictly defined rules and a leaderboard (see the Results part). We do not store, and do not have, any information about the people in the videos that could identify them. The database was created automatically by a program. According to YouTube, the user who uploaded a video is fully responsible for its content. Even if the performance of an algorithm is promising, commercial use is prohibited by the licence. Any face-recognition-related system should always be considered a recommendation system (consider the objective metrics and their values in real scenarios). The primary purpose of MLFDB is to advance the face-recognition-related research area, especially using low-resolution face sequences with a provided high-resolution label.

Youtube
The licence and conditions for youtube.com can be seen here: conditions and terms. We selected the most important points:
"Respect copyright. Only upload videos that you made or that you're authorized to use."
"You are legally responsible for the Content you submit to the Service"
"You also grant each other user of the Service a worldwide, non-exclusive, royalty-free licence to access your Content through the Service, and to use that Content (including to reproduce, distribute, modify, display, and perform it) only as enabled by a feature of the Service."

Licence
RESPECT the Terms of Use and Licence policy of YouTube!
The database is provided on an "AS IS" basis without ANY WARRANTIES. You are solely responsible for using the database and assume any risks associated with your exercise of permissions under this License. The licence is a worldwide, non-exclusive, no-charge copyright license to use the database and prepare derivative works. In NO EVENT and under NO LEGAL THEORY, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, SHALL AUTHORS BE LIABLE to you for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of the use of the database. The database must not be used commercially. It is intended only for academic and scientific purposes (i.e. training of AI models). Each use of the database, including derivative works, has to be properly credited by citation. You are not permitted to copy or distribute the database or to publicly present any part of it. You are required to immediately remove all database files if requested by the authors.

Download

The TRAINING DATA are publicly available and contain the TRAIN and TEST sets. The database is provided as a .ZIP file.
You need to fill in the request form and agree to the terms of use to download the MLFDB database.
The training data contain info files, image sequences, and labels.

The TESTING DATA are also available as a .ZIP file and comprise two sets: the Test set for overall evaluation and computation of objective metrics, and the Questionnaire data for optional subjective human evaluation.
Labels are not available! If you want to evaluate your model, follow the instructions in the Results section. We provide automatic evaluation.
You need to enter the email address that was used for downloading the training data. The terms of use have to be accepted again.


A small experimental test set with images for presentation (YouTube CC licence) is available here. The source code of the models is available here as a .zip file. Note that all models were trained using the first 2000 samples of the training set.

Results

Releasing MLFDB publicly is an important step towards drawing more attention to this research area. The database itself, unfortunately, does not provide a baseline for teams from around the world to compete against. Therefore, we decided to create an automatic benchmark for comparing results. The benchmark was created for scientific purposes when the database was published, and the initial results were produced by our team. These results are considered a baseline for this research area and are shown in the leaderboard table. Detailed results and information on how to take part in the competition (benchmark) are in the Results section.

Leaderboard

#	Team	Members	Affiliation	Method	Score	Attempts
1	BUT_AI	M. Rajnoha, A. Mezina, R. Burget	Brno University of Technology, Czech Republic	U-Net+GEU3	1.297	1

Citation

RAJNOHA, M.; MEZINA, A.; BURGET, R. Multi-Frame Labeled Faces Database: Towards Face Super-Resolution from Realistic Video Sequences. Applied Sciences - Basel, 2020, vol. 10, no. 20, p. 1-27. ISSN: 2076-3417.

Bibtex:

                
@article{rajnoha2020mlfdb,
    title={Multi-Frame Labeled Faces Database: Towards Face Super-Resolution from Realistic Video Sequences},
    author={Rajnoha, Martin and Mezina, Anzhelika and Burget, Radim},
    journal={Applied Sciences},
    volume={10},
    number={20},
    pages={7213},
    year={2020},
    publisher={Multidisciplinary Digital Publishing Institute}
}
