IEEE Trans Pattern Anal Mach Intell
October 2024
The design of neural networks typically involves trial-and-error, a time-consuming process for obtaining an optimal architecture, even for experienced researchers. Additionally, it is widely accepted that loss functions of deep neural networks are generally non-convex with respect to the parameters to be optimised. We propose the Layer-wise Convex Theorem to ensure that the loss is convex with respect to the parameters of a given layer, achieved by constraining each layer to be an overdetermined system of non-linear equations.
View Article and Find Full Text PDFMutual knowledge distillation (MKD) is a technique used to transfer knowledge between multiple models in a collaborative manner. However, it is important to note that not all knowledge is accurate or reliable, particularly under challenging conditions such as label noise, which can lead to models that memorize undesired information. This problem can be addressed by improving the reliability of the knowledge source, as well as selectively selecting reliable knowledge for distillation.
View Article and Find Full Text PDFThe objective of deep metric learning (DML) is to learn embeddings that can capture semantic similarity and dissimilarity information among data points. Existing pairwise or tripletwise loss functions used in DML are known to suffer from slow convergence due to a large proportion of trivial pairs or triplets as the model improves. To improve this, ranking-motivated structured losses are proposed recently to incorporate multiple examples and exploit the structured information among them.
View Article and Find Full Text PDF