Since deep learning models are usually deployed in non-stationary environments, it is imperative to improve their robustness to out-of-distribution (OOD) data. A common approach to mitigate distribution shift is to regularize internal representations or predictors learned from in-distribution (ID) data to be domain invariant. Past studies have primarily learned pairwise invariances, ignoring the intrinsic structure and high-order dependencies of the data. Unlike machines, humans recognize objects by first dividing them into major components and then identifying the topological relation of these components. Motivated by this, we propose Reconstruct and Match (REMA), a general learning framework for object recognition tasks to endow deep models with the capability of capturing the topological homogeneity of objects without human prior knowledge or fine-grained annotations. To identify major components from objects, REMA introduces a selective slot-based reconstruction module to dynamically map dense pixels into a sparse and discrete set of slot vectors in an unsupervised manner. Then, to model high-order dependencies among these components, we propose a hypergraph-based relational reasoning module that models the intricate relations of nodes (slots) with structural constraints. Experiments on standard benchmarks show that REMA outperforms state-of-the-art methods in OOD generalization and test-time adaptation settings.
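As a rough illustration of the slot-based grouping step, the sketch below softly assigns dense pixel features to a small set of slot vectors and refines each slot as a weighted mean of its assigned pixels. This assumes REMA's selective slot-based module resembles standard slot-attention-style competition among slots; all dimensions, names, and the iteration count are hypothetical, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def slot_update(pixels, slots, iters=3):
    """Hypothetical slot-style grouping: pixels (N, D) are softly
    assigned to slots (K, D); each slot is refined as the weighted
    mean of the pixels assigned to it."""
    for _ in range(iters):
        # each pixel distributes its attention over the K slots
        attn = softmax(pixels @ slots.T, axis=1)            # (N, K)
        # normalize per slot, then aggregate pixel features
        weights = attn / (attn.sum(axis=0, keepdims=True) + 1e-8)
        slots = weights.T @ pixels                          # (K, D)
    return slots, attn

rng = np.random.default_rng(0)
pixels = rng.normal(size=(64, 8))   # dense pixel features
slots = rng.normal(size=(4, 8))     # K=4 slot initializations
slots, attn = slot_update(pixels, slots)
```

Because the attention rows are a softmax over slots, every pixel's assignment is a proper distribution, giving the sparse, discrete decomposition the abstract describes.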
NeurIPS'23
CODA: Generalizing to Open and Unseen Domains with Compaction and Disambiguation
The generalization capability of machine learning systems degenerates notably
when the test distribution drifts from the training distribution. Recently, Domain
Generalization (DG) has been gaining momentum in enabling machine learning
models to generalize to unseen domains. However, most DG methods assume that
training and test data share an identical label space, ignoring the potential unseen
categories in many real-world applications. In this paper, we delve into a more general but difficult problem termed Open Test-Time DG (OTDG), where both domain
shift and open class may occur on the unseen test data. We propose Compaction
and Disambiguation (CODA), a novel two-stage framework for learning compact
representations and adapting to open classes in the wild. To meaningfully regularize the model’s decision boundary, CODA introduces virtual unknown classes
and optimizes a new training objective to insert unknowns into the latent space by
compacting the embedding space of source known classes. To adapt target samples
to the source model, we then disambiguate the decision boundaries between known
and unknown classes with a test-time training objective, mitigating the adaptivity
gap and catastrophic forgetting challenges. Experiments reveal that CODA can
significantly outperform the previous best method on standard DG datasets and
harmonize the classification accuracy between known and unknown classes.
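One way to picture the compaction idea is a distance-based objective that pulls a feature toward its class centroid while enforcing a margin against other centroids, shrinking each known class's region and freeing latent space for virtual unknowns. This is a hypothetical simplification, not CODA's actual training objective; the function, margin value, and toy data below are illustrative only.

```python
import numpy as np

def compaction_loss(feat, centroids, label, margin=0.5):
    """Hypothetical compaction objective: pull a feature toward its
    class centroid (compaction) and require it to be at least
    `margin` closer to that centroid than to any other centroid."""
    d = np.linalg.norm(centroids - feat, axis=1)  # distance to each centroid
    pull = d[label]                               # compact the true class
    push = np.maximum(0.0, margin + pull - np.delete(d, label)).sum()
    return pull + push

rng = np.random.default_rng(1)
centroids = rng.normal(size=(3, 5))              # 3 known classes
feat = centroids[0] + 0.1 * rng.normal(size=5)   # sample near class 0
loss = compaction_loss(feat, centroids, label=0)
```

A feature near its own centroid incurs a small loss, while one far from every known class incurs a large one, which is the geometry that lets virtual unknowns be inserted between the compacted known regions.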
ICCV'23
Activate and Reject: Towards Safe Domain Generalization under Category Shift
Despite notable performance on in-domain test points, it is non-trivial for deep neural networks to attain satisfactory accuracy when deployed in the open world, where novel domains and object classes often occur. In this paper, we study a practical problem of Domain Generalization under Category Shift (DGCS), which aims to simultaneously detect unknown-class samples and classify known-class samples in the target domains. Compared to prior DG works, we face two new challenges: 1) how to learn the concept of “unknown” during training with only source known-class samples, and 2) how to adapt the source-trained model to unseen environments for safe model deployment. To this end, we propose a novel Activate and Reject (ART) framework to reshape the model’s decision boundary to accommodate unknown classes and conduct post hoc modification to further discriminate known and unknown classes using unlabeled test data. Specifically, during training, we promote the response to the unknown by optimizing the unknown probability and then smoothing the overall output to mitigate the overconfidence issue. At test time, we introduce a step-wise online adaptation method that predicts the label by virtue of cross-domain nearest-neighbor and class-prototype information without updating the network’s parameters or using threshold-based mechanisms. Experiments reveal that ART consistently improves the generalization capability of deep networks on different vision tasks. For image classification, ART improves the H-score by 6.1% on average compared to the previous best method. For object detection and semantic segmentation, we establish new benchmarks and achieve competitive performance.
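The threshold-free, training-free test-time prediction can be sketched as a vote between the nearest class prototype and the nearest stored neighbor: agreement yields a known-class label, disagreement flags the sample as unknown. This is a hypothetical reading of the abstract's description, not ART's actual rule; all names and the toy data are illustrative.

```python
import numpy as np

def l2norm(x):
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-8)

def predict(feat, prototypes, memory_feats, memory_labels):
    """Hypothetical sketch of training-free test-time prediction:
    accept the nearest-neighbor label only when it agrees with the
    nearest class prototype; otherwise flag the sample as unknown
    (-1). No parameters are updated and no score threshold is tuned."""
    feat = l2norm(feat)
    proto_vote = int(np.argmax(l2norm(prototypes) @ feat))
    nn_vote = int(memory_labels[np.argmax(l2norm(memory_feats) @ feat)])
    return proto_vote if proto_vote == nn_vote else -1
```

Because rejection comes from disagreement between two independent cues rather than a tuned score cutoff, no threshold-based mechanism is needed.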
NeurIPS'22
Mix and Reason: Reasoning over Semantic Topology with Data Mixing for Domain Generalization
Domain generalization (DG) enables generalizing a learning machine from multiple
seen source domains to an unseen target one. The general objective of DG methods
is to learn semantic representations that are independent of domain labels, which
is theoretically sound but empirically challenged due to the complex mixture of
common and domain-specific factors. Although disentangling the representations
into two disjoint parts has been gaining momentum in DG, the strong presumption over the data limits its efficacy in many real-world scenarios. In this paper,
we propose Mix and Reason (MiRe), a new DG framework that learns semantic
representations via enforcing the structural invariance of semantic topology. MiRe
consists of two key components, namely, Category-aware Data Mixing (CDM) and
Adaptive Semantic Topology Refinement (ASTR). CDM mixes two images from different domains by virtue of activation maps generated by two complementary classification losses, making the classifier focus on the representations of semantic
objects. ASTR introduces relation graphs to represent semantic topology, which
is progressively refined via the interactions between local feature aggregation and
global cross-domain relational reasoning. Experiments on multiple DG benchmarks
validate the effectiveness and robustness of the proposed MiRe.
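The category-aware mixing step can be pictured as blending two images with a soft mask derived from a classifier activation map: pixels where the classifier fires strongly keep image A's content, the rest are filled from image B of another domain. This is a minimal sketch under that assumption; the actual CDM mixing rule and the way its two losses produce the map are not reproduced here.

```python
import numpy as np

def cam_mix(img_a, img_b, act_a):
    """Hypothetical category-aware mixing: normalize image A's
    activation map to [0, 1] and use it as a soft per-pixel mask, so
    the mix preserves A's semantic object on B's domain statistics."""
    m = (act_a - act_a.min()) / (act_a.max() - act_a.min() + 1e-8)
    m = m[..., None]                      # broadcast mask over channels
    return m * img_a + (1.0 - m) * img_b

a = np.ones((4, 4, 3))                    # toy "semantic object" image
b = np.zeros((4, 4, 3))                   # toy other-domain image
act = np.zeros((4, 4)); act[2, 2] = 1.0   # classifier fires at (2, 2)
mixed = cam_mix(a, b, act)
```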
Domain Adaptive Object Detection (DAOD) focuses on improving the generalization ability of
object detectors via knowledge transfer. Recent advances in DAOD strive to shift the emphasis of the adaptation process from global to local by virtue of fine-grained feature alignment methods. However, both the global and local alignment approaches fail to capture the topological relations among different foreground objects, as the explicit dependencies and interactions between and within domains are neglected. In this case, merely seeking one-vs-one alignment does not necessarily ensure precise knowledge transfer. Moreover, conventional alignment-based approaches may be
vulnerable to catastrophic overfitting regarding those less transferable regions (e.g.,
backgrounds) due to the accumulation of inaccurate localization results in the target domain. To
remedy these issues, we first formulate DAOD as an open-set domain adaptation problem, in which
the foregrounds and backgrounds are seen as the “known classes” and “unknown class” respectively.
Accordingly, we propose a new and general framework for DAOD, named Foreground-aware Graph-based
Relational Reasoning (FGRR), which incorporates graph structures into the detection pipeline to
explicitly model the intra- and inter-domain foreground object relations on both pixel and
semantic spaces, thereby endowing the DAOD model with the capability of relational reasoning
beyond the popular alignment-based paradigm. FGRR first identifies the foreground pixels and
regions by searching reliable correspondence and cross-domain similarity regularization
respectively. The inter-domain visual and semantic correlations are hierarchically modeled via
bipartite graph structures, and the intra-domain relations are encoded via graph attention
mechanisms. Through message-passing, each node aggregates semantic and contextual information from
the same and opposite domain to substantially enhance its expressive power. Empirical results
demonstrate that the proposed FGRR exceeds the state-of-the-art performance on four DAOD
benchmarks.
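The cross-domain message passing over a bipartite graph can be sketched as one round of affinity-weighted aggregation: every source node mixes in target-node features weighted by pairwise similarity, and vice versa. This is a hedged simplification; FGRR's hierarchical pixel/semantic graphs and attention mechanisms are not reproduced, and the dimensions below are arbitrary.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def bipartite_message_passing(src, tgt):
    """Hypothetical one-round cross-domain message passing: each node
    aggregates features from the opposite domain, weighted by the
    bipartite affinity matrix, and adds them residually."""
    affinity = src @ tgt.T                          # (Ns, Nt) edge weights
    src_out = src + softmax(affinity, axis=1) @ tgt
    tgt_out = tgt + softmax(affinity.T, axis=1) @ src
    return src_out, tgt_out

rng = np.random.default_rng(2)
src_out, tgt_out = bipartite_message_passing(
    rng.normal(size=(5, 4)),   # 5 source foreground nodes
    rng.normal(size=(7, 4)))   # 7 target foreground nodes
```

After the pass, every node carries contextual information from the opposite domain, which is the "expressive power" the abstract attributes to message passing.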
CVPR'22
Compound Domain Generalization via Meta-Knowledge Encoding
Domain generalization (DG) aims to improve the generalization performance for an unseen target
domain by using the knowledge of multiple seen source domains. Mainstream DG methods typically
assume that the domain label of each source sample is known a priori, an assumption that is hard to satisfy in many real-world applications. In this paper, we study a practical problem of compound
DG, which relaxes the discrete domain assumption to the mixed source domains setting. Meanwhile, current DG algorithms focus primarily on semantic invariance across domains (one-vs-one), while paying less attention to the holistic semantic structure (many-vs-many). Such
holistic semantic structure, referred to as meta-knowledge here, is crucial for learning
generalizable representations. To this end, we present Compound Domain Generalization via
Meta-Knowledge Encoding (COMEN), a general approach to automatically discover and model latent
domains in two steps. Firstly, we introduce Style-induced Domain-specific Normalization (SDNorm)
to re-normalize the multi-modal underlying distributions, thereby dividing the mixture of source
domains into latent clusters. Secondly, we harness the prototype representations, the centroids of
classes, to perform relational modeling in the embedding space with two parallel and complementary
modules, which explicitly encode the semantic structure for the out-of-distribution
generalization. Experiments on four standard DG benchmarks reveal that COMEN exceeds the
state-of-the-art performance without the need of domain supervision.
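The prototype representations that COMEN's relational modules operate on are simply class centroids in the embedding space. A minimal version of that computation is below; the two relational-modeling modules built on top of these prototypes are not reproduced.

```python
import numpy as np

def class_prototypes(feats, labels, num_classes):
    """Prototype representations: the centroid (mean embedding) of
    each class's features."""
    protos = np.zeros((num_classes, feats.shape[1]))
    for c in range(num_classes):
        protos[c] = feats[labels == c].mean(axis=0)
    return protos

feats = np.array([[0.0, 0.0], [2.0, 2.0], [10.0, 10.0], [12.0, 12.0]])
labels = np.array([0, 0, 1, 1])
protos = class_prototypes(feats, labels, num_classes=2)
```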
ICCV'21
Dual Bipartite Graph Learning: A General Approach for Domain Adaptive Object Detection
Domain Adaptive Object Detection (DAOD) relieves the reliance on large-scale annotated data by transferring the knowledge learned from a labeled source domain to a new unlabeled target domain. Recent DAOD approaches resort to local feature alignment by virtue of domain adversarial training in conjunction with ad-hoc detection pipelines to achieve feature adaptation. However, these methods are limited to adapting specific types of object detectors and do not explore cross-domain topological relations. In this paper, we first formulate DAOD as an open-set domain adaptation problem in which foregrounds (pixel or region) can be seen as the “known class”, while backgrounds (pixel or region) are referred to as the “unknown class”. To this end, we present a new and general perspective for DAOD named Dual Bipartite Graph Learning (DBGL), which captures the cross-domain interactions on both the pixel level and the semantic level by increasing the distinction between foregrounds and backgrounds and modeling the cross-domain dependencies among different semantic categories. Experiments reveal that the proposed DBGL in conjunction with one-stage and two-stage detectors exceeds the state-of-the-art performance on standard DAOD benchmarks.
CVPR'21
I3Net: Implicit Instance-Invariant Network for Adapting One-Stage Object Detectors
Recent works on two-stage cross-domain detection have widely explored the local feature patterns to achieve more accurate adaptation results. These methods heavily rely on the region proposal mechanisms and ROI-based instance-level features to design fine-grained feature alignment modules with respect to the foreground objects. However, for one-stage detectors, it is hard or even impossible to obtain explicit instance-level features in the detection pipelines. Motivated by this, we propose an Implicit Instance-Invariant Network (I3Net), which is tailored for adapting one-stage detectors and implicitly learns instance-invariant features via exploiting the natural characteristics of deep features in different layers. Specifically, we facilitate the adaptation from three aspects: (1) a Dynamic and Class-Balanced Reweighting (DCBR) strategy, which considers the coexistence of intra-domain and intra-class variations to assign larger weights to those sample-scarce categories and easy-to-adapt samples; (2) a Category-aware Object Pattern Matching (COPM) module, which boosts the cross-domain foreground object matching guided by the categorical information and suppresses the uninformative background features; (3) a Regularized Joint Category Alignment (RJCA) module, which jointly enforces the category alignment at different domain-specific layers with a consistency regularization. Experiments reveal that I3Net exceeds the state-of-the-art performance on benchmark datasets.
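One established way to give sample-scarce categories larger weights, as DCBR does, is the "effective number of samples" formulation of Cui et al. (CVPR 2019). The sketch below uses that formulation as a stand-in; I3Net's exact reweighting scheme, which also accounts for intra-domain variation and sample difficulty, may differ.

```python
import numpy as np

def class_balanced_weights(class_counts, beta=0.999):
    """Class-balanced reweighting sketch: weight each class by the
    inverse of its effective sample count (1 - beta^n) / (1 - beta),
    normalized so the weights sum to the number of classes. Scarce
    classes receive larger weights."""
    counts = np.asarray(class_counts, dtype=float)
    eff = (1.0 - np.power(beta, counts)) / (1.0 - beta)
    w = 1.0 / eff
    return w * len(counts) / w.sum()

w = class_balanced_weights([500, 50, 5])   # abundant -> scarce
```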
CVPR'20
Harmonizing Transferability and Discriminability for Adapting Object Detectors
Recent advances in adaptive object detection have achieved compelling results by virtue of adversarial feature adaptation to mitigate the distributional shifts along the detection pipeline. Whilst adversarial adaptation significantly enhances the transferability of feature representations, the feature discriminability of object detectors remains less investigated. Moreover, transferability and discriminability may come into contradiction in adversarial adaptation given the complex combinations of objects and the differentiated scene layouts between domains. In this paper, we propose a Hierarchical Transferability Calibration Network (HTCN) that hierarchically (local-region/image/instance) calibrates the transferability of feature representations for harmonizing transferability and discriminability. The proposed model consists of three components: (1) Importance Weighted Adversarial Training with input Interpolation (IWAT-I), which strengthens the global discriminability by re-weighting the interpolated image-level features; (2) a Context-aware Instance-Level Alignment (CILA) module, which enhances the local discriminability by capturing the underlying complementary effect between the instance-level feature and the global context information for instance-level feature alignment; (3) local feature masks that calibrate the local transferability to provide semantic guidance for the following discriminative pattern alignment. Experimental results show that HTCN significantly outperforms the state-of-the-art methods on benchmark datasets.
CVPR'19
Progressive Feature Alignment for Unsupervised Domain Adaptation
Unsupervised domain adaptation (UDA) transfers knowledge from a label-rich source domain to a fully unlabeled target domain. To tackle this task, recent approaches resort to discriminative domain transfer by virtue of pseudo-labels to enforce the class-level distribution alignment across the source and target domains. These methods, however, are vulnerable to error accumulation and thus incapable of preserving cross-domain category consistency, as the pseudo-labeling accuracy is not explicitly guaranteed. In this paper, we propose the Progressive Feature Alignment Network (PFAN) to align the discriminative features across domains progressively and effectively, via exploiting the intra-class variation in the target domain. To be specific, we first develop an Easy-to-Hard Transfer Strategy (EHTS) and an Adaptive Prototype Alignment (APA) step to train our model iteratively and alternately. Moreover, upon observing that good domain adaptation usually requires a non-saturated source classifier, we consider a simple yet effective way to retard the convergence speed of the source classification loss by further involving a temperature variate into the softmax function. Extensive experimental results reveal that the proposed PFAN exceeds the state-of-the-art performance on three UDA datasets.
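The temperature mechanism described above is standard: dividing the logits by T > 1 flattens the softmax output, which keeps the source classifier from saturating too quickly. A minimal sketch (the logit values and T are illustrative):

```python
import numpy as np

def temperature_softmax(logits, T=1.0):
    """Softmax with a temperature variate: T > 1 flattens the output
    distribution, yielding smaller gradients on confident predictions
    and thus slowing the convergence of the classification loss."""
    z = np.asarray(logits, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

p_sharp = temperature_softmax([2.0, 0.5, -1.0], T=1.0)
p_soft = temperature_softmax([2.0, 0.5, -1.0], T=4.0)
```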