
1. Abstract

Cross-Domain Recommendation (CDR) is an effective approach for mitigating the cold-start problem. However, previous research largely overlooks the issue of fairness, i.e., undesired bias associated with sensitive attributes in CDR scenarios. Specifically, through data-driven experiments, we find that existing CDR methods are particularly vulnerable to the distribution bias in sparse cross-domain data; such bias readily leads to unfair treatment of cold-start users in the target domain and ultimately exacerbates the recommendation feedback loop. Moreover, existing fairness-aware recommendation algorithms mainly target scenarios where unfairness arises during the “warm-start” phase, while research dedicated to user unfairness during the “cold-start” phase is lacking. To tackle these problems, in this paper we propose a novel fairness-aware algorithm for CDR, called FairCDR. Specifically, FairCDR models the influence of each individual sample on both fairness and accuracy at a fine granularity to guide data reweighting, effectively striking a balance between fairness and recommendation utility. To further mitigate data sparsity and bias, we design an interaction-augmented mapping function. In particular, FairCDR generates proxy users for those warm-start users in the target domain who do not overlap with the source domain, enabling the recommender model to exploit the rich knowledge of well-represented users to learn a high-quality mapping function for cold-start users. Extensive experiments demonstrate the effectiveness of our proposed method.
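
To make the mapping-function idea concrete, the sketch below shows the general shape of an embedding-mapping CDR approach (in the style of EMCDR): an MLP is fitted on overlapping users to translate source-domain user embeddings into the target domain, and is then applied to cold-start users. This is a minimal illustration under those assumptions, not FairCDR's actual implementation; the names `MappingMLP`, `src_emb`, `tgt_emb`, and `overlap_ids` are hypothetical.

```python
import torch
import torch.nn as nn

class MappingMLP(nn.Module):
    """Maps a user's source-domain embedding into the target domain."""
    def __init__(self, dim=128, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x):
        return self.net(x)

def train_mapping(src_emb, tgt_emb, overlap_ids, epochs=100, lr=1e-2):
    """Fit the mapping on users that overlap between the two domains."""
    model = MappingMLP(dim=src_emb.size(1))
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        pred = model(src_emb[overlap_ids])                  # mapped source embeddings
        loss = ((pred - tgt_emb[overlap_ids]) ** 2).mean()  # regress onto target embeddings
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model

# A cold-start user's target-domain embedding is then approximated
# as model(src_emb[uid]) and scored against target-domain items.
```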

2. Contributions

3. Dataset Information

Table 1: Statistics of the datasets used in our experiments.

We process the raw data of the Tenrec-QB dataset using the following strategy:

We process the raw data of the Tenrec-QK dataset using the following strategy:

4. Download

Datasets and Code

5. Main Results

Table 2: Overall comparison between the baselines and our models.

6. Code Description

7. Usage

  1. Download the code and datasets.
  2. Process the raw data. You can change `beta` and other parameters in “data-code/QB_process_data.py” or “data-code/QK_process_data.py”. Note that a distinct ID is generated for every model so that trained models can be reused later.
      python data-code/QB_process_data.py
      python data-code/QK_process_data.py
  3. Pretrain the user and item embeddings for the source domain and the target domain, respectively (a minimal MF sketch is given after this list).
      python model-code/pretrain/quick_start.py --model MF --learning_rate 0.01 --embedding_size 128 --weight_decay 0.000001 --optimizer_type adam --init normal --data_type 'QB' --data_domain 'source' --beta 0.3 --use_tensorboard --use_gpu --gpu_id 0 --log --saved

      python model-code/pretrain/quick_start.py --model MF --learning_rate 0.01 --embedding_size 128 --weight_decay 0.000001 --optimizer_type adam --init normal --data_type 'QB' --data_domain 'target' --beta 0.3 --use_tensorboard --use_gpu --gpu_id 0 --log --saved
  4. Train FairCDR without sample weights to initialize the model.
      python model-code/faircdr/quick_start.py --learning_rate 0.01 -bs 128 --simK 3 --gamma 1.0 --tau 1.0 --lamb 0.5 --alpha 1.0 --weight_decay 0.000001 --optimizer_type adam --init normal --data_type 'QB' --beta 0.3 --use_tensorboard --use_gpu --gpu_id 0 --log --saved --source_id your_source_id --target_id your_target_id --pretrain_model MF 
  5. Calculate the influence scores for the CDR data with the model trained in step 4 (an influence-estimation sketch is given after this list).
      python model-code/calculate-weight/quick_start.py --data_type 'QB' --beta 0.3 --damp 0.01 --scale 25 --batch_size 128 --recursion_depth 1000 --faircdr_id your_faircdr_id --use_gpu --gpu_id 0 --pretrain_model MF
  6. Retrain FairCDR with the influence-function (IF) based weights (a reweighting sketch is given after this list).
      python model-code/faircdr/quick_start.py --learning_rate 0.01 -bs 128 --simK 3 --gamma 1.0 --tau 1.0 --lamb 0.5 --alpha 1.0 --weight_decay 0.000001 --optimizer_type adam --init normal --data_type 'QB' --beta 0.3 --use_tensorboard --use_gpu --gpu_id 0 --log --saved --source_id your_source_id --target_id your_target_id --pretrain_model MF --weight_method IF --damp 0.01 --scale 25 --batch_size 128 --recursion_depth 1000
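
The MF pretraining in step 3 is standard matrix factorization. The following is a minimal PyTorch sketch assuming implicit feedback and a pointwise binary cross-entropy loss; the loss choice and all tensor names are assumptions, not the repository's exact implementation.

```python
import torch
import torch.nn as nn

class MF(nn.Module):
    def __init__(self, n_users, n_items, dim=128):
        super().__init__()
        self.user = nn.Embedding(n_users, dim)
        self.item = nn.Embedding(n_items, dim)
        nn.init.normal_(self.user.weight, std=0.01)  # matches --init normal
        nn.init.normal_(self.item.weight, std=0.01)

    def forward(self, u, i):
        # Preference score: dot product of user and item embeddings.
        return (self.user(u) * self.item(i)).sum(-1)

def mf_step(model, opt, users, items, labels):
    """One pointwise training step on a batch of (user, item, label) triples."""
    loss = nn.functional.binary_cross_entropy_with_logits(
        model(users, items), labels.float())
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```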
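
The `--damp`, `--scale`, and `--recursion_depth` flags in step 5 suggest a LiSSA-style stochastic estimator (Koh & Liang, 2017) for the inverse-Hessian-vector products that influence functions require. A hedged sketch of that recursion follows; `loss_fn` and `train_loader` are placeholders, and this is not the repository's actual code. The influence of a training sample is then approximated as the negative dot product between its gradient and the returned inverse-HVP.

```python
from itertools import cycle
import torch

def inverse_hvp(params, loss_fn, train_loader, v,
                damp=0.01, scale=25.0, depth=1000):
    """LiSSA estimate of H^{-1} v, where H is the training-loss Hessian.

    Recurrence: h <- v + (1 - damp) * h - (H h) / scale, evaluated on
    mini-batches; the final estimate is divided by `scale`.
    """
    h_est = [vi.clone() for vi in v]
    batches = cycle(train_loader)
    for _ in range(depth):
        loss = loss_fn(next(batches))
        grads = torch.autograd.grad(loss, params, create_graph=True)
        # Hessian-vector product via double backpropagation.
        hv = torch.autograd.grad(grads, params, grad_outputs=h_est)
        h_est = [(vi + (1.0 - damp) * hi - hvi / scale).detach()
                 for vi, hi, hvi in zip(v, h_est, hv)]
    return [hi / scale for hi in h_est]
```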
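
Step 6 retrains FairCDR with per-sample weights derived from the influence scores. The score-to-weight normalization below (a temperature-scaled sigmoid) is purely an assumption chosen to show the mechanics of a weighted loss, not the paper's formula.

```python
import torch

def influence_to_weights(scores, tau=1.0):
    # Hypothetical normalization: samples whose influence score indicates
    # harm are down-weighted, helpful samples are up-weighted; tau controls
    # the sharpness. Weights lie in (0, 2) with a neutral value of 1.
    return 2.0 * torch.sigmoid(-scores / tau)

def weighted_loss(per_sample_loss, weights):
    # Weighted mean of the per-sample losses used during retraining.
    return (weights * per_sample_loss).sum() / weights.sum()
```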

8. Detailed parameter search ranges

We fix the embedding size to 128 and set the batch size to 128. The search ranges of the hyper-parameters are listed below.

| Hyper-parameter | Tuning range |
| --- | --- |
| weight_decay | {0, 1e-6, 1e-5, 1e-4} |
| learning_rate | {0.00001, 0.0001, 0.001, 0.01, 0.1} |
| lamb | {0.1, 0.3, 0.5, 0.7, 0.9} |
| optimizer_type | {‘adam’, ‘sgd’} |
| init | {‘normal’, ‘uniform’} |
| simK | {1, 3, 5, 10, 15, 20} |
| gamma | {25, 10, 5, 1, 0.5, 0.1, 0.01} |
| tau | {25, 10, 5, 1, 0.5, 0.1, 0.01} |
| activation | {‘relu’, ‘tanh’} |
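
One simple way to sweep these ranges is an exhaustive grid search, sketched below over a subset of the table; `train_and_eval` is a hypothetical stand-in for invoking `model-code/faircdr/quick_start.py` and reading back a validation metric.

```python
import random
from itertools import product

def train_and_eval(**hparams):
    """Hypothetical stand-in: run model-code/faircdr/quick_start.py with
    `hparams` and return the validation metric (randomized here)."""
    return random.random()

grid = {
    "weight_decay": [0, 1e-6, 1e-5, 1e-4],
    "learning_rate": [1e-5, 1e-4, 1e-3, 1e-2, 1e-1],
    "lamb": [0.1, 0.3, 0.5, 0.7, 0.9],
}
best_score, best_cfg = float("-inf"), None
for values in product(*grid.values()):
    cfg = dict(zip(grid.keys(), values))
    score = train_and_eval(**cfg)
    if score > best_score:
        best_score, best_cfg = score, cfg
print(best_cfg, best_score)
```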

9. Runtime Environment