A Cross-modal Multi-task Learning Framework for Image Annotation
Authors
Liang Xie, Peng Pan, Yansheng Lu, Shixun Wang
Journal
Journal name: ACM New York, NY, USA
Publication date: 2014
Pages: 431-440
Abstract
With the advance of the Internet, multi-modal data can be easily collected from many social websites such as Wikipedia, Flickr, and YouTube. Images shared on the web are usually associated with social tags or other textual information. Although existing multi-modal methods can exploit the associated text to improve image annotation, their disadvantage is that associated text is also required for any new image to be predicted. In this paper, we propose the cross-modal multi-task learning (CMMTL) framework for image annotation. CMMTL leverages both labeled and unlabeled multi-modal data for training, and it ultimately obtains visual classifiers that can predict concepts for a single image without any associated information. CMMTL integrates graph learning, multi-task learning, and cross-modal learning into a joint framework, in which a shared subspace is learned to preserve both cross-modal correlation and concept correlation. The optimal solution of the proposed framework can be obtained by solving a generalized eigenvalue problem. We conduct comprehensive experiments on two real-world image datasets, MIR Flickr and NUS-WIDE, to evaluate the performance of the proposed framework. Experimental results demonstrate that CMMTL achieves a significant improvement over several representative methods for cross-modal image annotation.
Keywords
Cross-modal learning; Image annotation; Multi-task learning; Semi-supervised learning |
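
Note: the abstract states that the optimal solution of CMMTL is obtained by solving a generalized eigenvalue problem that yields a shared subspace preserving cross-modal correlation. The paper's full objective (with its graph-learning and multi-task terms) is not reproduced here; the snippet below is only a minimal sketch of that general recipe, assuming a simplified CCA-style cross-modal objective, synthetic data, and illustrative variable names (X for visual features, T for textual features), not the authors' actual formulation.

# Illustrative sketch only: a CCA-like shared subspace obtained from a
# generalized eigenvalue problem; variable names and objective are assumptions.
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
n, d_img, d_txt, k = 200, 50, 30, 10   # samples, visual dim, text dim, subspace dim

X = rng.standard_normal((n, d_img))    # visual features (e.g., image descriptors)
T = rng.standard_normal((n, d_txt))    # textual features (e.g., tag vectors)

# Cross-covariance and regularized auto-covariance blocks.
Cxt = X.T @ T / n
Cxx = X.T @ X / n + 1e-3 * np.eye(d_img)
Ctt = T.T @ T / n + 1e-3 * np.eye(d_txt)

# Symmetric generalized eigenvalue problem  A w = lambda B w; its leading
# eigenvectors give per-modality projections into a shared subspace.
A = np.block([[np.zeros((d_img, d_img)), Cxt],
              [Cxt.T, np.zeros((d_txt, d_txt))]])
B = np.block([[Cxx, np.zeros((d_img, d_txt))],
              [np.zeros((d_txt, d_img)), Ctt]])

eigvals, eigvecs = eigh(A, B)          # eigenvalues returned in ascending order
W = eigvecs[:, -k:]                    # keep the k leading directions
Wx, Wt = W[:d_img], W[d_img:]          # projection matrices for each modality

Z_img = X @ Wx                         # images mapped into the shared subspace
Z_txt = T @ Wt                         # text mapped into the same subspace
print(Z_img.shape, Z_txt.shape)        # (200, 10) (200, 10)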