Recommender Systems: FFM Model Theory and Implementation

Posted by mudux on February 11, 2020

1. Background

In CTR prediction we frequently encounter one-hot encoded variables, which lead to severely sparse feature data. To address this, the previous post introduced the FM algorithm. This post covers an algorithm that builds on FM: FFM (Field-aware Factorization Machines).

2. Introduction

Logistic regression is widely used for click-through-rate prediction. Suppose the dataset is $D=\{(x^{(1)},y^{(1)}),(x^{(2)},y^{(2)}),\dots,(x^{(m)},y^{(m)})\}$ with labels $y^{(i)} \in \{-1, +1\}$. The model parameters $w$ are obtained by solving

$$\min_{w} \ \frac{\lambda}{2}\lVert w \rVert_2^2 + \sum_{i=1}^{m}\log\Big(1+\exp\big(-y^{(i)}\,\phi_{LM}(w, x^{(i)})\big)\Big)$$

where $\lambda$ is the regularization parameter and, for the linear model,

$$\phi_{LM}(w, x) = w \cdot x$$
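
As a concrete reading of the objective above, here is a minimal NumPy sketch (not from the original post; `X`, `y`, `w`, and `lam` are hypothetical names, with labels assumed to be in $\{-1, +1\}$):

    import numpy as np

    def lr_objective(w, X, y, lam):
        # phi_LM(w, x) = w . x for every sample, stacked as X @ w
        margins = y * (X @ w)
        # lambda/2 * ||w||^2 + sum_i log(1 + exp(-y_i * phi_LM(w, x_i)))
        return lam / 2.0 * np.dot(w, w) + np.sum(np.log1p(np.exp(-margins)))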

Here we use an artificial dataset:

| Clicks (+) | Non-clicks (−) | Publisher | Advertiser |
|---|---|---|---|
| 80 | 20 | ESPN | Nike |
| 10 | 90 | ESPN | Gucci |
| 0 | 1 | ESPN | Adidas |
| 15 | 85 | Vogue | Nike |
| 90 | 10 | Vogue | Gucci |
| 10 | 90 | Vogue | Adidas |
| 85 | 15 | NBC | Nike |
| 0 | 0 | NBC | Gucci |
| 90 | 10 | NBC | Adidas |

Note: + (−) denotes clicked (not clicked).

If a model can learn feature interactions, its performance should improve. In the dataset above, for example, ads from Gucci should have a high CTR on Vogue. But the LR model only uses individual features. Common remedies are the degree-2 polynomial model (Poly2) and the FM model, both introduced earlier.

In the dataset above, the pair (ESPN, Adidas) has only negative training examples. Under Poly2, $w_{ESPN, Adidas}$ may therefore be driven to a large negative value. Under FM, the prediction for (ESPN, Adidas) is determined by $w_{ESPN} \cdot w_{Adidas}$, and because $w_{ESPN}$ and $w_{Adidas}$ are also learned from other pairs such as (ESPN, Nike) and (NBC, Adidas), FM tends to estimate it more accurately.
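
To make the contrast concrete, here is a tiny illustrative NumPy sketch (hypothetical variable names, random weights standing in for learned ones): Poly2 keeps an independent weight per feature pair, while FM composes a pair's weight from per-feature latent vectors, so a pair that never co-occurs in the training data can still inherit information learned through other pairs.

    import numpy as np

    n, k = 6, 4                        # 6 one-hot features, latent dimension 4
    rng = np.random.default_rng(0)

    W_poly2 = rng.normal(size=(n, n))  # Poly2: one independent weight per (i, j) pair
    V_fm = rng.normal(size=(n, k))     # FM: one latent vector per feature

    def pair_weight_poly2(i, j):
        return W_poly2[i, j]           # learned only from samples containing both i and j

    def pair_weight_fm(i, j):
        return V_fm[i] @ V_fm[j]       # V_fm[i] is shaped by every pair involving feature i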

3. The FFM Model

In CTR data, features can be grouped into fields. In the dataset above, ESPN, Vogue and NBC belong to the Publisher field, while Nike, Gucci and Adidas belong to the Advertiser field. FFM exploits exactly this information on top of FM, hence the name Field-aware Factorization Machines.
FFM is related to the PITF model, which has only three fields (User, Item and Tag); we do not cover it here.
Take the following sample as an example:

| Clicked | Publisher (P) | Advertiser (A) | Gender (G) |
|---|---|---|---|
| Yes | ESPN | Nike | Male |

In FM, $\phi_{FM}(w, x)$ for this sample is

$$\phi_{FM}(w, x) = w_{ESPN} \cdot w_{Nike} + w_{ESPN} \cdot w_{Male} + w_{Nike} \cdot w_{Male}$$

In FM, each feature has exactly one latent vector. Take ESPN as an example: $w_{ESPN}$ carries the latent effect for both Nike ($w_{ESPN} \cdot w_{Nike}$) and Male ($w_{ESPN} \cdot w_{Male}$). But Nike and Male belong to different fields, so the latent effects of (ESPN, Nike) and (ESPN, Male) may well differ.
In FFM, therefore, each feature has several latent vectors, and the one used for a given interaction is chosen according to the field of the other feature. For this sample, $\phi_{FFM}(w, x)$ is

$$\phi_{FFM}(w, x) = w_{ESPN,A} \cdot w_{Nike,P} + w_{ESPN,G} \cdot w_{Male,P} + w_{Nike,G} \cdot w_{Male,A}$$

To learn the latent effect of (ESPN, Nike), $w_{ESPN,A}$ is used because Nike belongs to the Advertiser field, and $w_{Nike,P}$ is used because ESPN belongs to the Publisher field. Likewise, to learn the latent effect of (ESPN, Male), $w_{ESPN,G}$ is used because Male belongs to the Gender field, and $w_{Male,P}$ is used because ESPN belongs to the Publisher field. In general,

$$\phi_{FFM}(w, x) = \sum_{j_1=1}^{n}\sum_{j_2=j_1+1}^{n} \left(w_{j_1,f_2} \cdot w_{j_2,f_1}\right) x_{j_1} x_{j_2}$$

where $f_1$ and $f_2$ are the fields of $j_1$ and $j_2$. With $f$ fields in total, FFM has $nfk$ parameters, and the computational complexity is $O(n^2 k)$. A comparison:

| Model | #variables | Complexity |
|---|---|---|
| LM | $n$ | $O(n)$ |
| Poly2 | $B$ | $O(n^2)$ |
| FM | $nk$ | $O(nk)$ |
| FFM | $nfk$ | $O(n^2 k)$ |
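
To make the formula concrete, here is a straightforward (non-vectorized) NumPy sketch of the FFM interaction term; it is illustrative only, and `V`, `x`, and `field` are hypothetical names:

    import numpy as np

    def ffm_interaction(V, x, field):
        # V: latent vectors, shape (n, f, k): one k-dim vector per (feature, field) pair
        # x: dense feature vector, shape (n,)
        # field: field index of each feature, length n
        n = len(x)
        total = 0.0
        for j1 in range(n):
            for j2 in range(j1 + 1, n):
                w1 = V[j1, field[j2]]   # latent vector of j1 w.r.t. the field of j2
                w2 = V[j2, field[j1]]   # latent vector of j2 w.r.t. the field of j1
                total += np.dot(w1, w2) * x[j1] * x[j2]
        return total

The double loop over feature pairs makes the $O(n^2 k)$ cost in the table explicit.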

4. Solving FFM

The paper optimizes FFM with AdaGrad, a per-coordinate variant of SGD.

For a training point and a nonzero pair $(j_1, j_2)$, the gradients are

$$g_{j_1,f_2} \equiv \nabla_{w_{j_1,f_2}} f = \lambda\, w_{j_1,f_2} + \kappa \cdot w_{j_2,f_1}\, x_{j_1} x_{j_2}$$

$$g_{j_2,f_1} \equiv \nabla_{w_{j_2,f_1}} f = \lambda\, w_{j_2,f_1} + \kappa \cdot w_{j_1,f_2}\, x_{j_1} x_{j_2}$$

where

$$\kappa = \frac{\partial \log\big(1+\exp(-y\,\phi_{FFM}(w,x))\big)}{\partial \phi_{FFM}(w,x)} = \frac{-y}{1+\exp\big(y\,\phi_{FFM}(w,x)\big)}$$

Next, for each $d = 1, 2, \dots, k$, the accumulated squared gradients are updated:

$$(G_{j_1,f_2})_d \leftarrow (G_{j_1,f_2})_d + (g_{j_1,f_2})_d^{\,2}$$

$$(G_{j_2,f_1})_d \leftarrow (G_{j_2,f_1})_d + (g_{j_2,f_1})_d^{\,2}$$

Finally, $(w_{j_1,f_2})_d$ and $(w_{j_2,f_1})_d$ are updated:

$$(w_{j_1,f_2})_d \leftarrow (w_{j_1,f_2})_d - \frac{\eta}{\sqrt{(G_{j_1,f_2})_d}}\,(g_{j_1,f_2})_d$$

$$(w_{j_2,f_1})_d \leftarrow (w_{j_2,f_1})_d - \frac{\eta}{\sqrt{(G_{j_2,f_1})_d}}\,(g_{j_2,f_1})_d$$

where $\eta$ is the learning rate.
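
A hedged NumPy sketch of one such update for a single pair $(j_1, j_2)$, following the equations above (the names `V`, `G`, `kappa` are assumptions, and a small `eps` is added for numerical safety):

    import numpy as np

    def adagrad_pair_update(V, G, j1, f1, j2, f2, x1, x2, kappa, lam, eta=0.1, eps=1e-8):
        # V, G: latent vectors and accumulated squared gradients, both of shape (n, f, k)
        # kappa: derivative of the log loss w.r.t. phi_FFM, i.e. -y / (1 + exp(y * phi))
        g1 = lam * V[j1, f2] + kappa * V[j2, f1] * x1 * x2   # gradient w.r.t. w_{j1,f2}
        g2 = lam * V[j2, f1] + kappa * V[j1, f2] * x1 * x2   # gradient w.r.t. w_{j2,f1}
        G[j1, f2] += g1 ** 2                                 # per-coordinate accumulators
        G[j2, f1] += g2 ** 2
        V[j1, f2] -= eta / np.sqrt(G[j1, f2] + eps) * g1     # AdaGrad-scaled step
        V[j2, f1] -= eta / np.sqrt(G[j2, f1] + eps) * g2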

5. Adding Field Information

LIBSVM stores data in the format

    label feat1:val1 feat2:val2 ...

Each (feat, val) pair gives a feature index and its value. For FFM, this format is extended to

    label field1:feat1:val1 field2:feat2:val2 ...

  • Categorical Features
    For the row

        Yes P:ESPN A:Nike G:Male

    the corresponding format becomes

        Yes P:P-ESPN:1 A:A-Nike:1 G:G-Male:1

    (fields and features are written by name here for readability; in practice they are integer indices). Note that a categorical feature with $t$ distinct values is expanded into $t$ binary features, exactly one of which is 1. As in LIBSVM, only the non-zero values are stored, and the paper follows the same convention. A small conversion sketch is given after this list.

  • Numerical Features
    Consider a row with numeric columns, e.g. the paper-review example from the FFM paper (AR = acceptance rate, Hidx = h-index, Cite = citation count):

        Accepted  AR     Hidx  Cite
        Yes       45.73  2     3

    The first option is to treat every numerical feature as a dummy field, which gives

        Yes AR:AR:45.73 Hidx:Hidx:2 Cite:Cite:3

    This is not very informative, since the dummy fields merely repeat the feature names.
    The second option is to discretize each numerical feature into a categorical one, e.g.

        Yes AR:45:1 Hidx:2:1 Cite:3:1

    The difficulty is that the best discretization is hard to determine.

  • Single-field Features
    Common in NLP tasks, where all features (e.g. the words of a document) belong to one single field, so there is no meaningful field information to add.
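
The conversion sketch referred to above: a small, hypothetical Python helper (not from the paper or the original post) that emits one line in the `label field:feature:value` format, one-hot-encoding categorical columns and using the dummy-field treatment for numeric ones:

    def to_ffm_line(label, row, categorical, numeric, feat_index):
        # row: dict mapping column name -> value
        # feat_index: dict mapping feature name -> integer id, filled on the fly
        #             so indices stay consistent across rows
        parts = [str(label)]
        for field_id, col in enumerate(categorical + numeric):
            if col in categorical:
                name = f"{col}={row[col]}"   # one binary feature per (column, value)
                value = 1
            else:
                name = col                   # dummy-field treatment for a numeric column
                value = row[col]
            idx = feat_index.setdefault(name, len(feat_index))
            parts.append(f"{field_id}:{idx}:{value}")
        return " ".join(parts)

For example, `to_ffm_line(1, {"Publisher": "ESPN", "Advertiser": "Nike"}, ["Publisher", "Advertiser"], [], {})` returns `1 0:0:1 1:1:1`.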

6. TensorFlow Implementation

The code below mainly follows the implementation from 小小挖掘机.

util.py: generates synthetic data and reads data from a CSV file.

import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import LabelEncoder


def gen_data(data_size, input_x_size):
    # generate random binary features with labels in {-1, +1};
    # every 10 consecutive features are assigned to the same field
    labels = [-1, 1]
    y = [np.random.choice(labels, 1)[0] for _ in range(data_size)]
    x_field = [i // 10 for i in range(input_x_size)]
    x = np.random.randint(0, 2, size=(data_size, input_x_size))
    return x, y, x_field


def read_data(path, cols: list, continuous_feature: list, discrete_feature: list, target):
    # read a CSV file and build (x, y, field, number of fields):
    # discrete columns are one-hot encoded (one field per column, one feature per value),
    # continuous columns are min-max scaled (one field and one feature per column)
    data = pd.read_csv(path, names=cols, header=None)
    x = None
    y = np.array([])
    field = []
    cnt = 0
    for col in data.columns:
        if col == target:
            encoder = LabelEncoder()
            # map labels to {-1, +1} so they match the logistic loss used in FFM.py
            y = encoder.fit_transform(data[col].values) * 2 - 1
        elif col in discrete_feature:
            singleFeature = data[col].values.reshape(-1, 1)
            # on scikit-learn >= 1.2, use OneHotEncoder(sparse_output=False) instead
            fitter = OneHotEncoder(sparse=False)
            t = fitter.fit_transform(singleFeature)
            x = t if x is None else np.concatenate([x, t], axis=1)
            # all one-hot columns produced from this feature share the same field
            field.extend([cnt] * t.shape[1])
            cnt += 1
        elif col in continuous_feature:
            singleFeature = data[col].values.reshape(-1, 1)
            scaler = MinMaxScaler()
            t = scaler.fit_transform(singleFeature)
            x = t if x is None else np.concatenate([x, t], axis=1)
            field.append(cnt)
            cnt += 1
    return x, y, field, cnt
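
A quick, hypothetical sanity check of the helpers above (values are illustrative only):

    # 100 random samples with 30 binary features grouped into 3 fields of 10
    x, y, x_field = gen_data(data_size=100, input_x_size=30)
    print(x.shape, len(y), x_field[:12])   # -> (100, 30) 100 [0, ..., 0, 1, 1]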

FFM.py: the model itself.

import tensorflow as tf
import pandas as pd
import numpy as np
import os

class FFM():
    def __init__(self, input_x_size, field_size, field, k=5, epoch=10, batch_size=48, learning_rate=0.01):
        self.input_x_size = input_x_size
        self.field_size = field_size
        self.input_x_field = field
        self.k = k
        self.epoch = epoch
        self.batch_size = batch_size
        self.lr = learning_rate
        self.model_save_path = "TFModel"
        self.model_name = "FFM"


    def createInteractionWeight(self):
        self.interaction_weights = tf.get_variable(name='interaction_weights', shape=[self.input_x_size, self.field_size, self.k],
                                    dtype=tf.float32, initializer=tf.truncated_normal_initializer(stddev=0.01))

    def createOneDimensionWeight(self):
        self.one_dimension_weights = tf.get_variable(name='one_dimension_weights', shape=[self.input_x_size],
                                dtype=tf.float32, initializer=tf.truncated_normal_initializer(stddev=0.01))

    def createBiasWeight(self):
        self.bias = tf.get_variable(name='bias', shape=[1],
                                dtype=tf.float32, initializer=tf.zeros_initializer())                                

        
    def inference(self, inputX):
        # bias + linear term: b + sum_i w_i * x_i
        secondValue = tf.reduce_sum(tf.multiply(self.one_dimension_weights, inputX))
        firstTwoValue = tf.add(self.bias, secondValue)

        # pairwise term: sum_{j1 < j2} (w_{j1,f2} . w_{j2,f1}) * x_{j1} * x_{j2}
        thirdValue = tf.constant(0.0, dtype=tf.float32)

        for i in range(self.input_x_size):
            featureIndex1 = i
            fieldIndex1 = int(self.input_x_field[i])
            for j in range(i + 1, self.input_x_size):
                featureIndex2 = j
                fieldIndex2 = int(self.input_x_field[j])

                # latent vector of feature j1 with respect to the field of feature j2
                vectorLeft = tf.convert_to_tensor([[featureIndex1, fieldIndex2, d] for d in range(self.k)])
                weightLeftAfterCut = tf.squeeze(tf.gather_nd(self.interaction_weights, vectorLeft))

                # latent vector of feature j2 with respect to the field of feature j1
                vectorRight = tf.convert_to_tensor([[featureIndex2, fieldIndex1, d] for d in range(self.k)])
                weightRightAfterCut = tf.squeeze(tf.gather_nd(self.interaction_weights, vectorRight))

                tempValue = tf.reduce_sum(tf.multiply(weightLeftAfterCut, weightRightAfterCut))

                xi = tf.squeeze(tf.gather_nd(inputX, [i]))
                xj = tf.squeeze(tf.gather_nd(inputX, [j]))
                product = tf.multiply(xi, xj)

                # accumulate directly: the unfetched tf.assign op in the original was never run,
                # so the interaction term silently stayed at zero
                thirdValue = tf.add(thirdValue, tf.multiply(tempValue, product))

        return tf.add(firstTwoValue, thirdValue)


    def build_model(self):
        self.global_step = tf.Variable(0, trainable=False)
        self.input_x = tf.placeholder(tf.float32, [self.input_x_size])
        self.input_y = tf.placeholder(tf.float32)

        self.lambda_w = tf.constant(0.001, name='lambda_w')
        self.lambda_v = tf.constant(0.001, name='lambda_v')

        self.createBiasWeight()
        self.createOneDimensionWeight()
        self.createInteractionWeight()

        self.y_ = self.inference(self.input_x)

        # L2 regularization over both the linear weights and the latent vectors
        self.l2_norm = tf.reduce_sum(tf.add(tf.multiply(self.lambda_w, tf.pow(self.one_dimension_weights, 2)),
                    tf.reduce_sum(tf.multiply(self.lambda_v, tf.pow(self.interaction_weights, 2)), axis=[1, 2])))

        # logistic loss; labels are expected to be in {-1, +1}
        self.loss = tf.reduce_sum(tf.log(1 + tf.exp(-self.input_y * self.y_))) + self.l2_norm
        # self.optimizer = tf.train.GradientDescentOptimizer(self.lr).minimize(self.loss)
        # let the optimizer advance global_step so training steps are counted inside the graph
        self.optimizer = tf.train.AdagradOptimizer(self.lr).minimize(self.loss, global_step=self.global_step)


    def train(self, input_x, input_y):
        self.trainX = input_x
        self.trainY = input_y
        self.saver = tf.train.Saver()
        self.sess = tf.Session()
        self.sess.run(tf.global_variables_initializer())
        os.makedirs(self.model_save_path, exist_ok=True)   # make sure the checkpoint directory exists
        for ite in range(self.epoch):
            for i in range(len(self.trainX)):
                # one sample per step; global_step is advanced by the optimizer itself
                batchX = self.trainX[i]
                batchY = self.trainY[i]
                loss_, steps, _ = self.sess.run([self.loss, self.global_step, self.optimizer],
                                    feed_dict={self.input_x: batchX, self.input_y: batchY})
                if i % 20 == 0:
                    print("training step: %d , loss: %.3f" % (steps, loss_))
                    self.saver.save(self.sess, os.path.join(self.model_save_path, self.model_name), global_step=steps)
        # write the graph once for TensorBoard instead of re-creating a writer every few steps
        writer = tf.summary.FileWriter(os.path.join(self.model_save_path, self.model_name), tf.get_default_graph())
        writer.close()


    def predict(self, test_data):
        num = len(test_data)
        pred = np.zeros((num, 1))
        for i in range(num):
            y_ = self.sess.run(self.y_, feed_dict={self.input_x: test_data[i]})
            if y_ > 0:
                pred[i] = 1
            else:
                pred[i] = -1
        return pred

    def evaluate(self, testX, testY):
        pred = self.predict(testX)
        # flatten both arrays so the comparison is element-wise rather than broadcast
        accu = np.mean(pred.flatten() == np.array(testY).flatten())
        return accu
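
A minimal, hypothetical way to wire the two files together (assuming util.py and FFM.py sit in the same directory and a TF 1.x environment; the concrete sizes are illustrative only and match gen_data's 10-features-per-field grouping):

    from util import gen_data
    from FFM import FFM

    input_x_size = 30
    x, y, x_field = gen_data(data_size=200, input_x_size=input_x_size)   # 3 fields of 10 features

    model = FFM(input_x_size=input_x_size, field_size=3, field=x_field, k=5, epoch=2)
    model.build_model()
    model.train(x, y)
    print("train accuracy:", model.evaluate(x, y))

Since the labels here are random, the accuracy should hover around 0.5; the point is only to check that the graph builds and the training loop runs.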

7. References

1. ffm-paper: Field-aware Factorization Machines for CTR Prediction
2. 深入FFM原理与实践 - 美团技术团队
3. http://www.cs.cmu.edu/~wcohen/10-605/2015-guest-lecture/FM.pdf
4. https://www.csie.ntu.edu.tw/~r01922136/slides/ffm.pdf
5. libffm-github
6. ffmcode-github