This project investigates the hypothesis that part of the consistent underperformance of vision models on object classification in pre-deployment testing is due either to mislabelled data, or to examples that, while technically labelled correctly, are so anomalous that they could reasonably be labelled differently. The hypothesis arises because image datasets are too large for every image to be validated manually. It is proposed that, if various unsupervised anomaly detection algorithms were used to rank images by how anomalous each one is as an example of its label, researchers could review the highest-ranked images and decide how to improve the quality of the dataset, for example by relabelling, or how to revise the method by which the model's label predictions are assessed.
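As a rough illustration of the proposed ranking step, the sketch below assumes each image has already been reduced to a fixed-length feature vector (for instance, an embedding from a pretrained network) and uses scikit-learn's IsolationForest as a stand-in for whichever unsupervised anomaly detector is being evaluated; the inputs `features` and `filenames`, the choice of detector, and the `rank_most_anomalous` helper are all assumptions for illustration, not part of the original project.

```python
# Minimal sketch: rank images by anomaly score so the most anomalous
# examples of a given label can be reviewed first.
# Assumes `features` is an (n_images, n_dims) array of per-image feature
# vectors and `filenames` lists the corresponding image files; both are
# hypothetical inputs, and IsolationForest stands in for the detector.
import numpy as np
from sklearn.ensemble import IsolationForest

def rank_most_anomalous(features: np.ndarray, filenames: list[str], top_k: int = 50):
    detector = IsolationForest(random_state=0)
    detector.fit(features)
    # score_samples returns higher values for more "normal" points,
    # so negate it to obtain an anomalousness score.
    scores = -detector.score_samples(features)
    order = np.argsort(scores)[::-1]  # most anomalous first
    return [(filenames[i], float(scores[i])) for i in order[:top_k]]
```

The returned list would then be handed to researchers for manual review, and the same ranking could be repeated with different detectors or image encodings for comparison.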
The data collection originally proposed, namely a comparison of the images ranked most anomalous by various algorithms when the images were encoded as RGD versus RGB, was not conducted. Preliminary testing with the padded VKITTI2 car images revealed, unexpectedly, that not all models were suitable for producing such a ranking.
In future work, it may be beneficial to precede such a ranking with a validation of the models' anomaly score outputs, similar to the one conducted here.