ANIMAL-10N Dataset Description
Data Collection: To introduce human error into the image labeling process, we first defined five pairs of "confusing" animals:
{(cat, lynx), (jaguar, cheetah), (wolf, coyote), (chimpanzee, orangutan), (hamster, guinea pig)}, where the two animals in each pair look very similar. We then crawled 6,000 images for each of the ten animals from Google and Bing, using the animal name as the search keyword. In total, 60,000 images were collected.
Data Labeling: For human labeling, we recruited 15 participants (ten undergraduate and five graduate students) through the KAIST online community. They were trained for one hour on the characteristics of each animal before the labeling process, and each of them was asked to annotate 4,000 images with the animal names within a week, where an equal number of images (i.e., 400) was drawn from each animal. More specifically, we combined the images for each pair of animals into a single set and provided each participant with five sets; hence, a participant categorized 800 images into one of two animals, five times. After the labeling process was complete, we paid each participant about US $150. Finally, excluding irrelevant images, the participants generated labels for 55,000 images. Note that these labels may contain human mistakes because we intentionally mixed confusing animals.
Data Organization: We randomly selected 5,000 images for the test set and used the remaining 50,000 images for the training set. Because the test set should be free from noisy labels, only the images whose label matches the search keyword were considered for the test set (a minimal sketch of this split is given after the table). In addition, the images are almost evenly distributed across the ten classes (or animals) in both the training and test sets, as shown in the table below.
Label | Number of Samples in Training | Number of Samples in Testing
0: Cat | 5,466 | 557
1: Lynx | 4,608 | 485
2: Wolf | 5,091 | 423
3: Coyote | 4,841 | 410
4: Cheetah | 4,981 | 509
5: Jaguar | 4,913 | 524
6: Chimpanzee | 5,322 | 620
7: Orangutan | 4,999 | 557
8: Hamster | 4,970 | 440
9: Guinea pig | 4,809 | 475
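For reference, the split can be reproduced with a procedure like the one below. This is a minimal sketch only: the record fields human_label and search_keyword are illustrative names, not part of the released files.
# Minimal sketch of the train/test split described above; the fields human_label and
# search_keyword are illustrative, not part of the released files.
import random

def split_animal10n(records, test_size=5000, seed=0):
    rng = random.Random(seed)
    # Only images whose human label matches the crawl keyword may enter the test set,
    # which keeps the test set free from noisy labels.
    clean_candidates = [r for r in records if r['human_label'] == r['search_keyword']]
    test_set = rng.sample(clean_candidates, test_size)
    test_ids = {id(r) for r in test_set}
    train_set = [r for r in records if id(r) not in test_ids]
    return train_set, test_set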
Noise Rate Estimation by Accuracy: Because the ground-truth labels are unknown, we estimated the noise rate τ by cross-validation with a grid search. We trained DenseNet (L=25, k=12) using SELFIE on the 50,000 training images and evaluated the performance on the 5,000 test images. Searching the grid τ ∈ {0.06, 0.07, ..., 0.13} in increments of 0.01, we found that τ = 0.08 achieved the best performance. Therefore, we set the noise rate to τ = 0.08 for ANIMAL-10N.
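A minimal sketch of this grid search is given below; train_and_evaluate is a hypothetical helper (not part of the released code) that trains DenseNet (L=25, k=12) with SELFIE at a given τ and returns the accuracy on the 5,000 test images.
# Hedged sketch of the noise-rate grid search; train_and_evaluate is a hypothetical
# helper that trains DenseNet (L=25, k=12) with SELFIE at the given noise rate and
# returns the accuracy on the 5,000 test images.
def estimate_noise_rate(train_and_evaluate):
    candidates = [round(0.06 + 0.01 * i, 2) for i in range(8)]  # 0.06, 0.07, ..., 0.13
    accuracies = {tau: train_and_evaluate(tau) for tau in candidates}
    return max(accuracies, key=accuracies.get)  # the search described above selected tau = 0.08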
Noise Rate Estimation by Human Inspection: We also estimated the noise rate τ by human inspection to verify the result of the grid search. To this end, we randomly sampled 6,000 images and acquired two more labels for each of them in the same way, so that every sampled image had three human labels. Meanwhile, human experts different from the 15 participants carefully examined the 6,000 images to obtain the ground-truth labels. In the figure below, which compares the human labels with the ground-truth labels, the first number in each legend entry is the number of votes for the true label, and the second is the number of votes for the other label. Because three votes were available for each image, for a conservative estimate, the final human label was decided by majority: the two cases of 3:0 and 2:1 were regarded as correct labeling, and the other two cases of 1:2 and 0:3 were regarded as incorrect labeling.
Overall, the proportion of incorrect human labels in the sample was 4.08% + 2.36% = 6.44%, which is fairly close to the τ = 0.08 obtained by the grid search.
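The corresponding computation can be sketched as follows; the input format (one count per inspected image of how many of its three human labels match the expert label) is illustrative.
# Sketch of the vote-based estimate: an image counts as correctly labeled when a
# majority (3:0 or 2:1) of its three human labels agrees with the expert ground truth.
def estimate_noise_from_votes(votes_for_true_label):
    # votes_for_true_label: one integer in {0, 1, 2, 3} per inspected image.
    incorrect = sum(1 for v in votes_for_true_label if v <= 1)  # the 1:2 and 0:3 cases
    return incorrect / len(votes_for_true_label)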
Result with Realistic Noise: The table below summarizes the best test errors (%) of the four training methods using the two architectures on ANIMAL-10N. With both architectures, SELFIE achieved the lowest test error. Specifically, SELFIE reduced the absolute test error by up to 0.9pp with DenseNet (L=25, k=12) and 2.4pp with VGG-19. Thus, SELFIE maintained its dominance over the other methods under realistic noise, though the performance gain was smaller because of the light noise rate (i.e., 8%).
Method | DenseNet (L=25, k=12) | VGG-19
Default | 17.9±0.02 | 20.6±0.14
ActiveBias | 17.6±0.17 | 19.5±0.26
Coteaching (τ = 0.08) | 17.5±0.17 | 19.8±0.13
SELFIE (τ = 0.08) | 17.0±0.10 | 18.2±0.09
Data Format:
The binary version contains the files data_batch_1.bin, data_batch_2.bin, ..., as well as test_batch.bin.
Each of these files is formatted as follows:
<id (4 bytes)><label (4 bytes)><image (depth x height x width bytes)>
...
<id (4 bytes)><label (4 bytes)><image (depth x height x width bytes)>
The reading procedure is similar to that of a popular CIFAR-10 tutorial.
# You can read our binary files as below (TensorFlow 1.x queue-based input pipeline):
import tensorflow as tf

HEIGHT, WIDTH, DEPTH = 64, 64, 3   # each image is a 64x64 RGB image
ID_BYTES = 4                       # 4-byte image id
LABEL_BYTES = 4                    # 4-byte label (0-9)
RECORD_BYTES = ID_BYTES + LABEL_BYTES + HEIGHT * WIDTH * DEPTH

filename_queue = tf.train.string_input_producer(['data_batch_1.bin'])
reader = tf.FixedLengthRecordReader(record_bytes=RECORD_BYTES)
file_name, value = reader.read(filename_queue)

# Decode the fixed-length record into raw bytes and slice out each field.
byte_record = tf.decode_raw(value, tf.uint8)
image_id = tf.strided_slice(byte_record, [0], [ID_BYTES])
image_label = tf.strided_slice(byte_record, [ID_BYTES], [ID_BYTES + LABEL_BYTES])
array_image = tf.strided_slice(byte_record, [ID_BYTES + LABEL_BYTES], [RECORD_BYTES])

# The image bytes are stored depth-major; reshape and transpose to [height, width, depth].
depth_major_image = tf.reshape(array_image, [DEPTH, HEIGHT, WIDTH])
image = tf.transpose(depth_major_image, [1, 2, 0])
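If you prefer not to use the deprecated TF 1.x queue API, the same record layout can be parsed directly with NumPy. The sketch below assumes 64x64x3 images and little-endian 4-byte id/label fields; please verify these assumptions against the official reader.
# NumPy sketch for the same <id><label><image> record layout; assumes 64x64x3 images
# and little-endian 4-byte id/label fields (verify against the official reader).
import numpy as np

HEIGHT, WIDTH, DEPTH = 64, 64, 3
ID_BYTES, LABEL_BYTES = 4, 4
RECORD_BYTES = ID_BYTES + LABEL_BYTES + HEIGHT * WIDTH * DEPTH

def read_batch(path):
    raw = np.fromfile(path, dtype=np.uint8).reshape(-1, RECORD_BYTES)
    ids = raw[:, :ID_BYTES].copy().view('<u4').ravel()
    labels = raw[:, ID_BYTES:ID_BYTES + LABEL_BYTES].copy().view('<u4').ravel()
    # Images are stored depth-major; reorder each record to [height, width, depth].
    images = raw[:, ID_BYTES + LABEL_BYTES:].reshape(-1, DEPTH, HEIGHT, WIDTH)
    return ids, labels, images.transpose(0, 2, 3, 1)

ids, labels, images = read_batch('data_batch_1.bin')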
For more information, please refer to our official GitHub page.