  • Essay / NON-NEGATIVE LOOSELY JOINT MATRIX FACTORIZATION FOR...

3. SHARED SPACE LEARNING VIA LOOSELY JOINT NMF (LJNMF)

Our joint space learning method is formulated within the framework of NMF. This section presents our adaptation of NMF for shared latent space mining, which we call loosely joint non-negative matrix factorization, or LJNMF. The purpose of using the NMF framework for image annotation is to explain the underlying latent factors in a collection of images, the factors that give rise to the different objects appearing in the images, by representing the occurrence of these factors for each image. In multimodal problems, the different modalities come from the same collection, so we expect the latent factors underlying these modalities to be nearly the same. However, each modality has its own characteristics, so these factors manifest differently in each modality. Similarity in the representation of the factors translates into similar coefficient matrices, while the difference in how the factors generate the data implies different basis matrices. Forcing the cost function to find exactly the same coefficient matrix for the two modalities is therefore not reasonable: such a restrictive coupling prevents each modality from finding its best factors and increases the approximation error. We therefore factorize the two modalities so that each has its own factor matrices; instead of being tied together, they are loosely joint, and similarity between the coefficient matrices is encouraged by reducing the distance between them. The objective function for loosely joint non-negative matrix factorization (LJNMF) can be written in general form as

\min_{W_1, W_2, H_1, H_2 \ge 0} \; \|X_1 - W_1 H_1\|_F^2 + \|X_2 - W_2 H_2\|_F^2 + \lambda \, \mathrm{dist}(H_1, H_2)    (9)

where dist(H1, H2) is a metric that measures the distance between the two coefficient matrices.

3.1. Notation

In our problem, the data come from two sources, treated as two modalities. One is the visual information embedded in the images and...

...... middle of paper ......

...ases on the Corel 5K dataset.

6. CONCLUSION

The problem addressed in this paper is building a multimodal automatic image annotation system that combines two data modalities: visual features extracted from images and textual terms collected from the attached tags. This is done by extracting the latent factors that explain the patterns making up the content of the images into a unified, loosely shared space. The two modalities are factorized simultaneously while taking into account the relationship that exists between them. We relaxed the constraint that the coefficient matrices of the two modalities be exactly the same and allowed the latent factor representations to differ to some extent. This was implemented by minimizing a nonlinear distance between the two coefficient matrices. The proposed LJNMF algorithm achieves performance comparable to state-of-the-art work while using a lower-dimensional feature vector.
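To make the loosely joint factorization concrete, the sketch below optimizes an objective of the form of Eq. (9) with alternating multiplicative updates, assuming squared Frobenius reconstruction errors and the common choice dist(H1, H2) = ||H1 - H2||_F^2 with a coupling weight lam. The function name ljnmf, the update scheme, and these concrete choices are illustrative assumptions for this sketch, not details taken from the excerpt above.

```python
import numpy as np

def ljnmf(X1, X2, k, lam=0.1, n_iter=200, eps=1e-9, seed=0):
    """Loosely joint NMF sketch: X1 ~ W1 @ H1 and X2 ~ W2 @ H2, with H1 and H2
    pulled toward each other by the penalty lam * ||H1 - H2||_F^2."""
    rng = np.random.default_rng(seed)
    n = X1.shape[1]
    assert X2.shape[1] == n, "both modalities must describe the same set of images"

    # Non-negative random initialization of basis (W) and coefficient (H) matrices.
    W1, H1 = rng.random((X1.shape[0], k)), rng.random((k, n))
    W2, H2 = rng.random((X2.shape[0], k)), rng.random((k, n))

    for _ in range(n_iter):
        # Standard multiplicative updates for the per-modality basis matrices.
        W1 *= (X1 @ H1.T) / (W1 @ H1 @ H1.T + eps)
        W2 *= (X2 @ H2.T) / (W2 @ H2 @ H2.T + eps)
        # Coefficient updates: the lam terms pull H1 and H2 toward each other
        # without forcing them to be identical.
        H1 *= (W1.T @ X1 + lam * H2) / (W1.T @ W1 @ H1 + lam * H1 + eps)
        H2 *= (W2.T @ X2 + lam * H1) / (W2.T @ W2 @ H2 + lam * H2 + eps)

    return W1, H1, W2, H2

# Toy usage: random non-negative "visual" and "textual" matrices over 50 images.
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X_visual, X_textual = rng.random((100, 50)), rng.random((30, 50))
    W1, H1, W2, H2 = ljnmf(X_visual, X_textual, k=10, lam=0.5)
    print(np.linalg.norm(X_visual - W1 @ H1), np.linalg.norm(H1 - H2))
```

With this form of coupling, lam = 0 recovers two independent NMF problems, while a large lam pushes H1 and H2 toward a single shared coefficient matrix; the loose coupling described in the text corresponds to an intermediate value.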