Multimodal Learning

Noisy Correspondence

Noisy correspondence (NC) refers to inherently irrelevant samples that are wrongly regarded as associated (a.k.a. false positives) or inherently relevant samples that are wrongly regarded as unassociated (a.k.a. false negatives). The problem was first formally revealed and studied by XLearning Group (CVPR'21, NeurIPS'21). Unlike traditional noisy label learning, which primarily addresses incorrect annotations, NC shifts attention to incorrect correspondences between paired samples.
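As a minimal sketch of the definition, the snippet below compares a ground-truth relevance matrix with the pairing a dataset assumes and lists the resulting false positives and false negatives. The matrices, the image/caption setting, and the function name `noisy_correspondence` are hypothetical and chosen only for illustration; they do not come from any of the cited papers.

```python
def noisy_correspondence(true_match, assumed_match):
    """Return (false_positives, false_negatives) as lists of (i, j) index pairs.

    true_match[i][j]    is 1 if sample i and sample j are truly relevant.
    assumed_match[i][j] is 1 if the dataset treats them as a matched pair.
    """
    false_pos, false_neg = [], []
    for i, (true_row, assumed_row) in enumerate(zip(true_match, assumed_match)):
        for j, (is_true, is_assumed) in enumerate(zip(true_row, assumed_row)):
            if is_assumed and not is_true:
                false_pos.append((i, j))   # irrelevant, but treated as associated
            elif is_true and not is_assumed:
                false_neg.append((i, j))   # relevant, but treated as unassociated
    return false_pos, false_neg


# Hypothetical toy data: three images (rows) and three captions (columns).
true_match = [
    [1, 0, 0],
    [0, 1, 1],  # caption 2 also describes image 1
    [0, 0, 0],  # image 2 matches none of the captions
]
assumed_match = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],  # the dataset pairs image 2 with caption 2 anyway
]

fp, fn = noisy_correspondence(true_match, assumed_match)
print(fp)  # [(2, 2)]
print(fn)  # [(1, 2)]
```

In practice the ground-truth matrix is unknown, which is why the problem calls for dedicated learning methods rather than a simple comparison like this one.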

Because many tasks take paired data as input, task-specific methods against noisy correspondence have emerged as a promising direction across numerous applications, including vision-language pre-training (TPAMI'24), retrieval (TIP'24), ReID (CVPR'22, IJCV'24), dialogue systems (AAAI'23), graph matching (ICCV'23), multimodal knowledge graphs (arxiv'26), video reasoning (ICLR'24), and multi-view clustering (NeurIPS'24). For more details, please refer to our repository Noisy Correspondence Summary.

Virtual Cell

AI4LifeScience

XLearning Group is dedicated to addressing core interdisciplinary challenges and developing tailored solutions. Our current research efforts focus on AI4LifeScience (Nat. Comm.'23, Nat. Comm.'25) and the exploration of Virtual Cell models to advance the understanding of biological systems.

We warmly welcome collaborations and discussions with teams across diverse disciplines.

Deep Clustering

Clustering

Clustering is a classic and fundamental problem in machine learning that aims to partition instances into distinct clusters based on their inherent semantics in an unsupervised manner. It is closely intertwined with unsupervised representation learning, as both seek to uncover latent structures in data. Clustering serves as a cornerstone for various real-world applications, including anomaly detection, community discovery, and bioinformatics.

XLearning Group proposed one of the first deep clustering methods (arxiv'15, IJCAI'16), which equips clustering algorithms with powerful discrimination ability to handle complex real-world data. In 2021, we proposed the contrastive clustering framework (AAAI'21, IJCV'22), which elegantly unifies representation learning and clustering by performing instance-level and cluster-level contrastive learning in the row and column spaces of the feature matrix, respectively. Motivated by the classic k-means clustering algorithm, we designed the first interpretable unsupervised neural network, which is intrinsically explainable and transparent (JMLR'22). Recently, looking back on the development of the clustering community, we wrote a survey that summarizes deep clustering methods from the perspective of prior knowledge (Vicinagearth'24). Correspondingly, we proposed a new externally guided clustering paradigm (ICML'24), which draws on abundant yet regrettably overlooked external knowledge as priors to facilitate clustering.
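The row/column idea behind contrastive clustering can be sketched as follows. This is a simplified illustration under stated assumptions, not the published implementation: it applies a one-directional InfoNCE-style loss to a single soft-assignment matrix per augmented view, treating rows as instance representations and columns as cluster representations. The function names, the temperature value, and the toy assignments are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def info_nce(views_a, views_b, temperature=0.5):
    """Simplified one-directional InfoNCE loss.

    views_a[i] and views_b[i] form a positive pair; every other
    views_b[j] acts as a negative for views_a[i].
    """
    loss = 0.0
    for i, anchor in enumerate(views_a):
        logits = [cosine(anchor, other) / temperature for other in views_b]
        log_denominator = math.log(sum(math.exp(x) for x in logits))
        loss += log_denominator - logits[i]
    return loss / len(views_a)


def transpose(matrix):
    """Turn an N x K matrix (list of rows) into a K x N matrix."""
    return [list(column) for column in zip(*matrix)]


def row_column_contrastive_loss(p_a, p_b, temperature=0.5):
    """Contrast rows (instances) and columns (clusters) of two N x K matrices."""
    instance_loss = info_nce(p_a, p_b, temperature)
    cluster_loss = info_nce(transpose(p_a), transpose(p_b), temperature)
    return instance_loss + cluster_loss


# Hypothetical soft assignments of 4 samples to 2 clusters under two augmentations.
p_a = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
p_b = [[0.85, 0.15], [0.9, 0.1], [0.1, 0.9], [0.2, 0.8]]

loss = row_column_contrastive_loss(p_a, p_b)
print(loss)
```

When the two views agree on both instance and cluster assignments, the loss is lower than when the cluster columns of one view are swapped, which is the behaviour the row/column formulation relies on.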

Computer Vision

Blind All-in-one Restoration

Blind All-in-one Restoration (BAR) is an emerging research direction in image and video restoration. BAR aims to address multiple unknown types of degradation within a unified framework, rather than handling each known degradation separately as traditional approaches do.

XLearning Group is devoted to pushing image and video restoration towards more general application scenarios. Specifically, we have introduced several pioneering solutions: one of the first blind all-in-one image restoration networks (AirNet, CVPR'22), the first open-set image restoration method (TAO, ICML'24), and the first blind all-in-one video restoration method for time-varying degradations (AverNet, NeurIPS'24).
