Recommender Systems: DeepFM Model Theory and Implementation

Posted by mudux on February 15, 2020

1. Introduction

In CTR prediction, feature interactions have a large impact on model performance. Before DeepFM, existing models were typically biased toward either low-order or high-order interactions, or relied on manual feature engineering. The DeepFM paper proposes an end-to-end model, DeepFM, that captures low-order and high-order feature interactions at the same time, combining the strength of FM at learning second-order interactions with the representation power of a DNN. The paper compares it against FNN, PNN, and Wide & Deep, and the experiments demonstrate both the effectiveness and the efficiency of DeepFM.

2. The DeepFM Model

2.1 Dataset

Suppose the training data consists of $n$ instances $(x, y)$, where each $x$ is an $m$-fields record and $y\in\{0,1\}$ is the label: 1 if the user clicked, 0 otherwise. $x$ may contain categorical fields (e.g. gender, location) and continuous fields (e.g. age). Each categorical field is represented as a one-hot vector, and each continuous field is represented either by its raw value or by a one-hot vector after discretization. Therefore $x=\left[x_{field_1},x_{field_2},\cdots,x_{field_j},\cdots, x_{field_m}\right]$ is a $d$-dimensional vector, where $x_{field_j}$ is the vector representation of the $j$-th field.
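As a small illustrative example (field names and sizes here are made up, not from the paper): with two categorical fields, gender (2 values) and location (3 values), plus one continuous field, age, we get $m=3$ fields and $d=2+3+1=6$.

import numpy as np

# Hypothetical sample: gender = female, location = the 3rd city, age = 0.45 (normalized)
x_gender = np.array([0., 1.])          # one-hot, categorical field with 2 values
x_location = np.array([0., 0., 1.])    # one-hot, categorical field with 3 values
x_age = np.array([0.45])               # continuous field represented by its value

x = np.concatenate([x_gender, x_location, x_age])  # d = 6, m = 3 fields
y = 1                                              # label: the user clicked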

2.2 DeepFM

To learn low-order and high-order feature interactions at the same time, the paper proposes the DeepFM model, whose architecture is shown below:

[Figure: DeepFM architecture]

DeepFM consists of two components, an FM component and a deep component, which share the same input.
For feature $i$, a scalar $w_i$ weighs its first-order importance, and a latent vector $V_i$ measures its interactions with other features. $V_i$ is fed into the FM component to capture second-order interactions, and into the deep component to capture high-order interactions. All parameters, including $w_i$, $V_i$, and the network parameters ($W^{(l)}, b^{(l)}$), are trained jointly under the combined prediction model

$$\hat{y} = \operatorname{sigmoid}\left(y_{FM} + y_{DNN}\right)$$

where $\hat{y}\in(0,1)$ is the predicted CTR, $y_{FM}$ is the output of the FM component, and $y_{DNN}$ is the output of the deep component.

2.3 FM Component

[Figure: FM component]
As shown in the figure above, the output of the FM component is the sum of an Addition unit and a set of Inner Product units:

$$y_{FM} = \left\langle w, x\right\rangle + \sum_{j_1=1}^{d}\sum_{j_2=j_1+1}^{d}\left\langle V_{j_1}, V_{j_2}\right\rangle x_{j_1}\, x_{j_2}$$

where $w\in R^{d}$, $V_i\in R^{k}$, and $k$ is the dimension of the latent vectors. The Addition unit ($\left\langle w,x\right\rangle$) captures the first-order importance of features, while the Inner Product units capture second-order feature interactions.
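The double sum above is usually evaluated with the standard FM reformulation (not spelled out in the original post), which reduces the cost from $O(kd^2)$ to $O(kd)$; the sum_square / square_sum steps in the TensorFlow code of Section 3 implement exactly this form:

$$\sum_{j_1=1}^{d}\sum_{j_2=j_1+1}^{d}\left\langle V_{j_1}, V_{j_2}\right\rangle x_{j_1} x_{j_2} = \frac{1}{2}\sum_{f=1}^{k}\left[\left(\sum_{j=1}^{d} v_{j,f}\, x_j\right)^{2} - \sum_{j=1}^{d} v_{j,f}^{2}\, x_j^{2}\right]$$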

2.4 Deep Component

[Figure: deep component]
The deep component is a feed-forward neural network that learns high-order feature interactions. Since CTR data is typically highly sparse, an embedding layer is needed to compress the input into a low-dimensional, dense, real-valued vector before it reaches the hidden layers; otherwise the model would be prone to overfitting.

[Figure: structure of the embedding layer]

The figure above shows the structure of the embedding layer. Two points are worth noting:

  • although the input field vectors can have different lengths, their embeddings all have the same length $k$;
  • the FM latent vectors $V$ serve as the weights of this layer and compress each input field vector into its embedding vector (see the sketch below).
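A minimal numpy sketch of the second point (array names and sizes here are illustrative, not from the original code): for a one-hot field vector, multiplying by the latent-vector matrix $V$ is the same as picking out the row of $V$ at the active index, which is what tf.nn.embedding_lookup does in the code of Section 3.

import numpy as np

V = np.random.randn(5, 4)                  # latent vectors of a field with 5 values, k = 4
x_field = np.array([0., 0., 1., 0., 0.])   # one-hot input of this field

e_dense = x_field @ V                      # view V as the weights of a dense layer
e_lookup = V[np.argmax(x_field)]           # equivalent: look up the row at the active index

assert np.allclose(e_dense, e_lookup)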

Denote the output of the embedding layer as

$$a^{(0)} = \left[e_1, e_2, \cdots, e_m\right]$$

where $e_i$ is the embedding of the $i$-th field and $m$ is the number of fields. $a^{(0)}$ is then fed into the DNN, whose forward process is

$$a^{(l+1)} = \sigma\left(W^{(l)} a^{(l)} + b^{(l)}\right)$$

and the final output is

$$y_{DNN} = \sigma\left(W^{|H|+1} \cdot a^{|H|} + b^{|H|+1}\right)$$

where $|H|$ is the number of hidden layers.

Note that in DeepFM the FM component and the deep component share the same feature embeddings, which brings two benefits:

  • low-order and high-order feature interactions are both learned from the raw features;
  • no manual feature engineering is needed.

2.5 Relationship to Other NN Models

[Figure: comparison of DeepFM with FNN, PNN, and Wide & Deep]

FNN
FNN (left in the figure above) is a DNN initialized with a pre-trained FM. This design has several limitations:

  • the embedding parameters may be over-influenced by the FM pre-training;
  • the pre-training stage hurts efficiency;
  • FNN captures only high-order feature interactions.

In contrast, DeepFM needs no pre-training and learns low-order and high-order feature interactions simultaneously.

PNN
To capture high-order feature interactions, PNN (middle in the figure above) inserts a product layer between the embedding layer and the first hidden layer. Depending on the product operation, there are three variants: IPNN (inner product), OPNN (outer product), and PNN* (both inner and outer products). Like FNN, PNN only captures high-order feature interactions.

Wide & Deep
Wide & Deep (right in the figure above) models low-order and high-order feature interactions at the same time, but as the figure shows, the input to its wide part requires feature engineering. In contrast, DeepFM needs no feature engineering and learns directly from the raw features. (One could replace the LR part of Wide & Deep with an FM, but in DeepFM the FM component and the deep component additionally share the same feature embeddings.)

The models are compared in the table below:

[Table: model comparison]

In summary, DeepFM requires no pre-training and no feature engineering, and learns low-order and high-order feature interactions at the same time.

3. DeepFM TensorFlow Implementation

The implementation below follows Chenglong Chen's tensorflow-DeepFM code.
The following function initializes the network parameters; feature_embeddings holds feature_size latent vectors of length embedding_size.

    def _initialize_weights(self):
        weights = dict()

        # embedding
        weights['feature_embeddings'] = tf.Variable(
            tf.random_normal([self.feature_size, self.embedding_size], 0.0, 0.01),
            name='feature_embeddings')  # feature_size * K
        weights['feature_bias'] = tf.Variable(
            tf.random_uniform([self.feature_size, 1], 0.0, 1.0),
            name='feature_bias')  # feature_size * 1, first-order weights w_i

        # deep layers
        num_layer = len(self.deep_layers)
        input_size = self.field_size * self.embedding_size
        glorot = np.sqrt(2.0 / (input_size + self.deep_layers[0]))
        weights['layer_0'] = tf.Variable(
            np.random.normal(loc=0, scale=glorot, size=(input_size, self.deep_layers[0])), dtype=np.float32)
        weights['bias_0'] = tf.Variable(np.random.normal(loc=0, scale=glorot, size=(1, self.deep_layers[0])),
                                        dtype=np.float32)
        for i in range(1, num_layer):
            glorot = np.sqrt(2.0 / (self.deep_layers[i-1] + self.deep_layers[i]))
            weights['layer_%d' % i] = tf.Variable(
                np.random.normal(loc=0, scale=glorot, size=(self.deep_layers[i-1], self.deep_layers[i])),
                dtype=np.float32)   # layers[i-1] * layers[i]
            weights['bias_%d' % i] = tf.Variable(
                np.random.normal(loc=0, scale=glorot, size=(1, self.deep_layers[i])),
                dtype=np.float32)   # 1 * layers[i]

        # final concat projection layer
        if self.use_fm and self.use_deep:
            input_size = self.field_size + self.embedding_size + self.deep_layers[-1]  # first-order (F) + second-order (K) + deep output
        elif self.use_fm:
            input_size = self.field_size + self.embedding_size  # first-order (F) + second-order (K)
        elif self.use_deep:
            input_size = self.deep_layers[-1]

        glorot = np.sqrt(2.0 / (input_size + 1))
        weights['concat_projection'] = tf.Variable(
            np.random.normal(loc=0, scale=glorot, size=(input_size, 1)),
            dtype=np.float32)
        weights['concat_bias'] = tf.Variable(tf.constant(0.01), dtype=np.float32)

        return weights

This block builds the network graph. Feature embeddings are looked up by feat_index, which is essentially the index of each feature in the global one-hot feature space.
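Before the graph code, a small hypothetical example of the expected input format (the original repository builds these arrays with its own data-reader helpers, so names and values here are only illustrative): each sample carries one global feature index and one value per field.

# Hypothetical feature space: gender (2 values -> global indices 0-1),
# location (3 values -> indices 2-4), age (numeric -> index 5),
# so feature_size = 6 and field_size = 3.

Xi = [[1, 4, 5]]          # feat_index: one global index per field (batch of one sample)
Xv = [[1.0, 1.0, 0.45]]   # feat_value: 1.0 for categorical fields, raw value for numeric ones
y  = [[1]]                # label: the user clicked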

def _init_graph(self):
  self.graph = tf.Graph()
  with self.graph.as_default():
      tf.set_random_seed(self.random_seed)
      self.feat_index = tf.placeholder(tf.int32, shape=[None, None],
                                        name='feat_index')
      self.feat_value = tf.placeholder(tf.float32, shape=[None, None],
                                        name='feat_value')
      self.label = tf.placeholder(tf.float32, shape=[None, 1], name='label')
      self.dropout_keep_fm = tf.placeholder(tf.float32, shape=[None], name='dropout_keep_fm')
      self.dropout_keep_deep = tf.placeholder(tf.float32, shape=[None], name='dropout_keep_deep')
      self.train_phase = tf.placeholder(tf.bool, name='train_phase')
      
      self.weights = self._initialize_weights()

      # model
      self.embeddings = tf.nn.embedding_lookup(self.weights['feature_embeddings'],
                                                self.feat_index)   # None * F * K
      feat_value = tf.reshape(self.feat_value, shape=[-1, self.field_size, 1])
      self.embeddings = tf.multiply(self.embeddings, feat_value)

      # ------------ first order term -----------
      self.y_first_order = tf.nn.embedding_lookup(self.weights['feature_bias'],
                                                  self.feat_index)    # None * F * 1
      self.y_first_order = tf.reduce_sum(tf.multiply(self.y_first_order, feat_value), 2) # None * F
      self.y_first_order = tf.nn.dropout(self.y_first_order, keep_prob=self.dropout_keep_fm[0]) # None * F

      # ------------ second order term ------------
      # sum_square part
      self.summed_features_emb = tf.reduce_sum(self.embeddings, 1)    # None * K
      self.summed_features_emb_square = tf.square(self.summed_features_emb)   # None * K

      # square_sum part
      self.squared_features_emb = tf.square(self.embeddings)
      self.squared_sum_features_emb = tf.reduce_sum(self.squared_features_emb, 1)  # None * K

      # second order
      self.y_second_order = 0.5 * tf.subtract(self.summed_features_emb_square, self.squared_sum_features_emb)  # None * K
      self.y_second_order = tf.nn.dropout(self.y_second_order, keep_prob=self.dropout_keep_fm[1]) # None * K

      # --------------- Deep component --------------
      self.y_deep = tf.reshape(self.embeddings, shape=[-1, self.field_size * self.embedding_size])    # None * (F * K)
      self.y_deep = tf.nn.dropout(self.y_deep, keep_prob=self.dropout_keep_deep[0])
      for i in range(0, len(self.deep_layers)):
          self.y_deep = tf.add(tf.matmul(self.y_deep, self.weights['layer_%d' % i]), self.weights['bias_%d'%i])
          self.y_deep = self.deep_layers_activation(self.y_deep)
          self.y_deep = tf.nn.dropout(self.y_deep, keep_prob=self.dropout_keep_deep[1+i])

      # --------------- DeepFM ---------------
      if self.use_fm and self.use_deep:
          concat_input = tf.concat([self.y_first_order, self.y_second_order, self.y_deep], axis=1)
      elif self.use_fm:
          concat_input = tf.concat([self.y_first_order, self.y_second_order], axis=1)
      elif self.use_deep:
          concat_input = self.y_deep
      self.out = tf.add(tf.matmul(concat_input, self.weights['concat_projection']), self.weights['concat_bias'])

      # loss
      if self.loss_type == 'logloss':
          self.out = tf.nn.sigmoid(self.out)
          self.loss = tf.losses.log_loss(self.label, self.out)
      elif self.loss_type == 'mse':
          self.loss = tf.nn.l2_loss(tf.subtract(self.label, self.out))
      # l2 regularization on weights
      if self.l2_reg > 0:
          self.loss += tf.contrib.layers.l2_regularizer(self.l2_reg)(
              self.weights['concat_projection'])
          if self.use_deep:
              for i in range(len(self.deep_layers)):
                  self.loss += tf.contrib.layers.l2_regularizer(self.l2_reg)(
                      self.weights['layer_%d'%i])
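
A minimal inference sketch, assuming the graph above has been built and a tf.Session created on it (the original repository wraps this in fit()/predict() methods, so the exact usage may differ):

# `model` is an instance whose _init_graph() has run; `sess` is a tf.Session on model.graph.
feed_dict = {
    model.feat_index: Xi,                                        # None * F feature indices
    model.feat_value: Xv,                                        # None * F feature values
    model.label: y,                                              # None * 1 labels
    model.dropout_keep_fm: [1.0, 1.0],                           # disable dropout at inference
    model.dropout_keep_deep: [1.0] * (len(model.deep_layers) + 1),
    model.train_phase: False,
}
loss, pred = sess.run([model.loss, model.out], feed_dict=feed_dict)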

4. References

1. DeepFM: A Factorization-Machine based Neural Network for CTR Prediction (paper)
2. DeepFM算法解析及Python实现 (DeepFM algorithm analysis and Python implementation, Chinese blog post)
3. tensorflow-DeepFM code (Chenglong Chen, GitHub)