This post is about matrix factorization, the technique underlying recommender systems.
In essence, it is the process of compressing an N-dimensional representation down to d dimensions and then reconstructing the N dimensions from it.
The catch is that the d-dimensional compression has to represent the original N dimensions well.
It would be worth knowing how this compares to PCA (dimensionality reduction) and autoencoders, but I'll get to that gradually.
For the theory, I referred to the blog below.
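To pin down what "representing the N dimensions well" means, here is the usual objective written in my own notation (a sketch, not taken from the referenced blog): with adjacency matrix $A \in \mathbb{R}^{N \times N}$ and embedding matrices $P, Q \in \mathbb{R}^{d \times N}$ with $d \ll N$, we minimize the squared reconstruction error

$$\mathcal{L}(P, Q) = \lVert P^{\top}Q - A \rVert_F^2 = \sum_{i,j}\big((P^{\top}Q)_{ij} - A_{ij}\big)^2,$$

where each column of $P$ (or $Q$) is one node's $d$-dimensional embedding. This is exactly the loss the code below implements with $N = 34$ and $d = 4$.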
1. Matrix Factorization for Network Embedding
In [1]:
import pandas as pd
import numpy as np
import random
import networkx as nx
from matplotlib import pyplot as plt
np.random.seed(15)
In [2]:
#Load data
adjlist = nx.read_adjlist("karate_club.adjlist", nodetype=int)
karate_label = np.loadtxt("karate_label.txt")
In [6]:
adj = nx.to_numpy_array(adjlist)
label = karate_label[:,-1]
print(adj.shape)
print(label.shape)
(34, 34)
(34,)
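A quick sanity check I added (not in the original post): the karate club graph is undirected and unweighted, so the adjacency matrix should be symmetric with binary entries.

# sanity check: undirected, unweighted graph
print(np.allclose(adj, adj.T))  # expect True (symmetric)
print(np.unique(adj))           # expect [0. 1.] (binary entries)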
In [7]:
#defining P, Q for matrix factorization
d = 4
P = np.random.random((d, 34))
Q = np.random.random((d, 34))
In [8]:
zuzv = np.dot(P.T,Q)
zuzv.shape
Out[8]:
(34, 34)
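Another small check (my addition): since P and Q each have d = 4 rows, the product P.T @ Q can have rank at most 4, so the whole exercise is approximating the 34×34 adjacency matrix with a low-rank reconstruction.

# the reconstruction is rank-d by construction
print(np.linalg.matrix_rank(zuzv))  # expect 4, far below the full dimension 34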
In [9]:
# loss function: squared reconstruction error
def loss(a, b):
    return np.sum((a - b) ** 2)
In [10]:
loss(zuzv,adj)
Out[10]:
1057.929366546701
In [11]:
epoch = 500
lr = 0.001
In [12]:
#Updating params with gradient descent
# gradients of the squared error: dL/dP = 2 * Q @ (P.T@Q - A).T and
# dL/dQ = 2 * P @ (P.T@Q - A); the constant factor 2 is absorbed into lr
loss_list = [0 for _ in range(epoch)]
for i in range(epoch):
    P -= lr * np.dot(Q, (zuzv - adj).T)
    Q -= lr * np.dot(P, zuzv - adj)
    zuzv = np.dot(P.T, Q)           # refresh the reconstruction
    loss_list[i] = loss(zuzv, adj)  # record the loss after this update
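To double-check the analytic gradient used above, here is a finite-difference comparison on a single entry of P (my own sketch, not part of the original notebook):

# numeric vs. analytic gradient of the loss w.r.t. one entry of P
eps = 1e-6
grad_P = 2 * np.dot(Q, (np.dot(P.T, Q) - adj).T)  # analytic dL/dP
P_pert = P.copy()
P_pert[0, 0] += eps                               # nudge a single parameter
numeric = (loss(np.dot(P_pert.T, Q), adj) - loss(np.dot(P.T, Q), adj)) / eps
print(numeric, grad_P[0, 0])                      # the two values should nearly agree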
In [13]:
#plotting the loss
plt.plot(loss_list)
Out[13]:
[Figure: training loss over epochs]
T-SNE
- nodes that share many relationships end up located near each other
- the result differs quite a lot when the perplexity changes (a comparison sketch follows the embedding plot below)
- unlike what the figure may suggest, the labels (expressed by the colors) don't mean anything here; they were never used to learn the embedding
In [14]:
# row u is the sum of the embedding columns of u's neighbors
ans = np.dot(adj, P.T)
In [15]:
from sklearn.manifold import TSNE

model = TSNE(learning_rate=100, perplexity=3)
transformed = model.fit_transform(ans)
xs = transformed[:, 0]
ys = transformed[:, 1]
plt.scatter(xs, ys, c=label)        # color each node by its club label
for i in range(len(xs)):
    plt.text(xs[i], ys[i], str(i))  # annotate each point with its node index
plt.show()
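Since the bullet above claims the layout changes a lot with perplexity, here is a small comparison sketch (my addition, not in the original notebook; the perplexity values are arbitrary choices):

# the same embedding under a few different perplexity values
fig, axes = plt.subplots(1, 3, figsize=(12, 4))
for ax, perp in zip(axes, [3, 5, 10]):
    emb = TSNE(learning_rate=100, perplexity=perp).fit_transform(ans)
    ax.scatter(emb[:, 0], emb[:, 1], c=label)
    ax.set_title(f"perplexity={perp}")
plt.show()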