Collaborative Filtering: Foreign Literature Translation

Original text:

Introduction to Recommender Systems

Approaches of Collaborative Filtering: Nearest Neighborhood and Matrix Factorization

"We are leaving the age of information and entering the age of recommendation."

Like many machine learning techniques, a recommender system makes predictions based on users' historical behaviors. Specifically, it predicts a user's preference for a set of items from past experience. The two most popular approaches to building a recommender system are Content-based and Collaborative Filtering.

The Content-based approach requires a good amount of information about the items' own features, rather than relying on users' interactions and feedback. For example, it can use movie attributes such as genre, year, director, and actors, or the textual content of articles extracted by applying Natural Language Processing. Collaborative Filtering, on the other hand, needs nothing except users' historical preferences for a set of items. Because it is based on historical data, the core assumption is that users who have agreed in the past tend to agree again in the future. User preference is usually expressed in two categories.

Explicit Rating: a rating a user gives an item on a sliding scale, like 5 stars for Titanic. This is the most direct feedback from users about how much they like an item.

Implicit Rating: signals that suggest a user's preference indirectly, such as page views, clicks, purchase records, whether or not a music track was listened to, and so on.

In this article, I will take a close look at collaborative filtering, a traditional and powerful tool for recommender systems.

Nearest Neighborhood

The standard method of Collaborative Filtering is known as the Nearest Neighborhood algorithm. There are user-based CF and item-based CF. Let's first look at user-based CF.
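Both kinds of feedback are typically collected as (user, item, rating) events and arranged into the user-item rating matrix that the methods below operate on. A minimal sketch with hypothetical ids and ratings:

```python
import numpy as np

# Hypothetical explicit-feedback log: (user_id, item_id, rating) triples.
events = [(0, 0, 5.0), (0, 1, 4.0), (1, 0, 4.0), (1, 2, 1.0), (2, 2, 5.0)]

n_users, n_items = 3, 3
R = np.zeros((n_users, n_items))  # rows are users, columns are items
for u, i, r in events:
    R[u, i] = r  # 0.0 stays behind as the "not rated" marker
```

In practice the matrix is stored sparsely, since most entries remain empty.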

We have an n × m matrix of ratings, with users u_i, i = 1, …, n and items p_j, j = 1, …, m. We want to predict the rating r_ij when target user i has not watched/rated item j. The process is to calculate the similarities between target user i and all other users, select the top X most similar users, and take the weighted average of the ratings from these X users, with the similarities as weights.

Different people may have different baselines when giving ratings: some tend to give high scores in general, while others are quite strict even when satisfied with an item. To avoid this bias, we can subtract each user's average rating over all items when computing the weighted average, and add the target user's average back in. Two common ways to calculate similarity are Pearson Correlation and Cosine Similarity. Basically, the idea is to find the users most similar to your target user (the nearest neighbors) and weight their ratings of an item to predict that item's rating for the target user.
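The steps above can be sketched as follows; the toy ratings and the Pearson-style similarity over co-rated items are illustrative assumptions, and 0 marks an unrated item:

```python
import numpy as np

# Toy rating matrix (rows = users, columns = items, 0 = not rated); values are made up.
R = np.array([
    [5, 4, 0, 1],
    [4, 0, 1, 1],
    [1, 2, 5, 0],
    [0, 2, 4, 5],
], dtype=float)

def predict_user_based(R, u, i, top_x=2):
    """Predict user u's rating of item i from the top_x most similar users."""
    mask = R > 0
    # Each user's mean over the items they actually rated.
    means = np.array([R[v, mask[v]].mean() for v in range(R.shape[0])])
    sims = []
    for v in range(R.shape[0]):
        if v == u or not mask[v, i]:
            sims.append((0.0, v))  # unusable neighbor: self, or never rated item i
            continue
        common = mask[u] & mask[v]
        if common.sum() < 2:
            sims.append((0.0, v))
            continue
        # Pearson-style similarity: cosine of mean-centered co-rated vectors.
        a = R[u, common] - means[u]
        b = R[v, common] - means[v]
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        sims.append((float(a @ b / denom) if denom else 0.0, v))
    sims.sort(reverse=True)
    top = [(s, v) for s, v in sims[:top_x] if s > 0]
    if not top:
        return float(means[u])  # fall back to the user's own average
    # Weighted average of mean-centered neighbor ratings, shifted to u's baseline.
    num = sum(s * (R[v, i] - means[v]) for s, v in top)
    den = sum(abs(s) for s, v in top)
    return float(means[u] + num / den)
```

The mean-centering makes the prediction a deviation from each neighbor's personal baseline, which is then shifted onto the target user's baseline.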

Without knowing anything about the items or users themselves, we consider two users similar when they give the same items similar ratings. Analogously, for item-based CF, we say two items are similar when they receive similar ratings from the same user. We then make a prediction for a target user on an item by computing the weighted average of that user's ratings on the X most similar items. One key advantage of item-based CF is stability: the ratings of a given item will not change significantly over time, unlike the tastes of human beings.
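An item-item similarity in this spirit can be sketched as follows (cosine over co-raters; the toy matrix is an illustrative assumption, with 0 marking an unrated item):

```python
import numpy as np

# Toy rating matrix (rows = users, columns = items, 0 = not rated); values are made up.
R = np.array([
    [5, 4, 0, 1],
    [4, 0, 1, 1],
    [1, 2, 5, 0],
    [0, 2, 4, 5],
], dtype=float)

def item_similarity(R, i, j):
    """Cosine similarity between items i and j over users who rated both."""
    both = (R[:, i] > 0) & (R[:, j] > 0)
    a, b = R[both, i], R[both, j]
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0
```

Item neighborhoods can be precomputed offline, which is part of why item-based CF scales better in practice.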

There are quite a few limitations to this method. It does not handle sparsity well: no one in the neighborhood may have rated the very item you are trying to predict for the target user. It is also not computationally efficient as the number of users and products grows.

Matrix Factorization

Since sparsity and scalability are the two biggest challenges for the standard CF method, a more advanced method decomposes the original sparse matrix into low-dimensional matrices with latent factors/features and less sparsity: Matrix Factorization.

Besides solving the issues of sparsity and scalability, there is an intuitive explanation of why we need low-dimensional matrices to represent users' preferences. Suppose a user gave good ratings to the movies Avatar, Gravity, and Inception. These are not necessarily three separate opinions; they may show that the user favors Sci-Fi movies, and there may be many more Sci-Fi movies this user would like. Unlike specific movies, latent features are expressed by higher-level attributes, and the Sci-Fi category is one such latent feature in this case. What matrix factorization eventually gives us is how much a user is aligned with a set of latent features, and how well a movie fits into that same set. Its advantage over the standard nearest neighborhood approach is that even if two users have not rated any of the same movies, it is still possible to find the similarity between them if they share similar underlying tastes, i.e., latent features.

To see how a matrix is factorized, the first thing to understand is Singular Value Decomposition (SVD). From linear algebra, any real matrix R can be decomposed into three matrices U, Σ, and V. Continuing the movie example, U is an n × r user-latent-feature matrix and V is an m × r movie-latent-feature matrix. Σ is an r × r diagonal matrix containing the singular values of the original matrix, each simply representing how important a specific feature is for predicting user preference.

If we sort the values of Σ by decreasing absolute value and truncate Σ to its first k dimensions (k singular values), we can reconstruct the matrix as a matrix A. The selection of k should ensure that A captures most of the variance within the original matrix R, so that A is a good approximation of R, A ≈ R. The difference between A and R is the error that is expected to be minimized. This is exactly the idea behind Principal Component Analysis.

When the matrix R is dense, U and V can easily be factorized analytically. However, a matrix of movie ratings is extremely sparse. Although there are imputation methods to fill in missing values, we will turn to a programming approach that simply lives with the missing values and finds the factor matrices U and V.
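The truncation just described can be seen directly with NumPy's SVD on a small dense matrix (the ratings are made-up values; real rating matrices are sparse, which is exactly the problem noted above):

```python
import numpy as np

# Dense made-up rating matrix, just to illustrate the truncation.
R = np.array([
    [5.0, 4.0, 1.0, 1.0],
    [4.0, 5.0, 1.0, 2.0],
    [1.0, 1.0, 5.0, 4.0],
    [2.0, 1.0, 4.0, 5.0],
])

U, s, Vt = np.linalg.svd(R, full_matrices=False)  # R = U @ np.diag(s) @ Vt
k = 2  # keep the k largest singular values (svd returns them sorted descending)
A = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # rank-k reconstruction, A ≈ R

# The Frobenius error is exactly the energy of the discarded singular values.
err = np.linalg.norm(R - A)
```

This is the Eckart-Young result: no other rank-k matrix gets closer to R in Frobenius norm than the truncated SVD.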

Instead of factorizing R via SVD, we try to find U and V directly, with the goal that multiplying the two factors back together yields an output matrix R′ that is the closest approximation of R and is no longer sparse. For recommender systems this numerical approximation is usually achieved with Non-Negative Matrix Factorization, since there are no negative values in ratings.

Looking at the predicted rating for a specific user and item: item i is denoted by a vector q_i and user u by a vector p_u, such that the dot product of these two vectors is the predicted rating for user u on item i. This value sits in the matrix R′ at row u and column i.

How do we find the optimal q and p? As in most machine learning tasks, a loss function is defined and minimized. r_ui is the true rating from the original user-item matrix. The optimization process finds the optimal matrix P, composed of the vectors p, and matrix Q, composed of the vectors q, minimizing the sum of squared errors between the predicted ratings r̂_ui and the true ratings r_ui. L2 regularization is added to prevent overfitting of the user and item vectors. It is also quite common to add bias terms, which usually have three major components: the average rating of all items μ, the average rating of item i minus μ (denoted b_i), and the average rating given by user u minus μ (denoted b_u).
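The formulas the original article displays did not survive here; written out, the prediction and the regularized objective described above are presumably the standard ones:

```latex
\hat{r}_{ui} = q_i^{\top} p_u,
\qquad
\min_{P,\, Q} \sum_{(u,i) \in K} \bigl( r_{ui} - q_i^{\top} p_u \bigr)^2
  + \lambda \bigl( \lVert q_i \rVert^2 + \lVert p_u \rVert^2 \bigr),
```

where K is the set of observed (user, item) pairs and λ is the regularization strength. With the bias terms, the prediction becomes \hat{r}_{ui} = \mu + b_i + b_u + q_i^{\top} p_u.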

Optimization

A few optimization algorithms are popular for solving Non-Negative Matrix Factorization. Alternating Least Squares (ALS) is one of them. Since the loss function is non-convex in this case, there is no way to guarantee reaching the global minimum, but a good approximation can still be reached by finding local minima. ALS holds the user factor matrix constant and adjusts the item factor matrix by taking the derivative of the loss function and setting it equal to zero, then holds the item factor matrix constant while adjusting the user factor matrix. The process repeats, switching and adjusting the matrices back and forth, until convergence.
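The alternating scheme can be sketched in NumPy as a pair of ridge regressions over the observed entries. This is plain ALS without the non-negativity constraint that true NMF solvers enforce; the toy matrix, rank, and hyperparameters are illustrative assumptions:

```python
import numpy as np

def als(R, mask, k=2, lam=0.1, iters=20, seed=0):
    """Alternating Least Squares on the observed entries of R (mask == True)."""
    rng = np.random.default_rng(seed)
    n, m = R.shape
    P = rng.random((n, k))  # user factors
    Q = rng.random((m, k))  # item factors
    for _ in range(iters):
        for u in range(n):  # hold Q fixed, ridge-solve each user's factors
            idx = mask[u]
            if idx.any():
                A = Q[idx].T @ Q[idx] + lam * np.eye(k)
                P[u] = np.linalg.solve(A, Q[idx].T @ R[u, idx])
        for i in range(m):  # hold P fixed, ridge-solve each item's factors
            idx = mask[:, i]
            if idx.any():
                A = P[idx].T @ P[idx] + lam * np.eye(k)
                Q[i] = np.linalg.solve(A, P[idx].T @ R[idx, i])
    return P, Q

# Illustrative ratings; 0 marks a missing entry and is excluded via the mask.
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)
mask = R > 0
P, Q = als(R, mask)
```

Each inner solve is a small k × k linear system, which is what makes ALS easy to parallelize across users and items.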

If you use scikit-learn's NMF model, the default solver is Coordinate Descent, which likewise updates one factor at a time. PySpark also offers a neat implementation of ALS itself with more tuning flexibility.

Some Thoughts

Collaborative Filtering provides strong predictive power for recommender systems while requiring the least information. However, it has a few limitations in particular situations.

First, the underlying tastes expressed by latent features are not actually interpretable, because there are no content-related properties or metadata. In the movie example, a latent feature does not have to be a genre like Sci-Fi: it could be how motivational the soundtrack is, how good the plot is, and so on. Collaborative Filtering lacks transparency and explainability at this level of information.

On the other hand, Collaborative Filtering faces the cold-start problem. Until a new item has been rated by a substantial number of users, the model cannot make any personalized recommendations for it. Similarly, for items from the long tail that have little data, the model tends to give them less weight and shows a popularity bias by recommending more popular items.

It is usually a good idea to use ensemble algorithms to build a more comprehensive machine learning model, for example combining in content-based filtering by adding some explainable keyword dimensions, but we should always consider the tradeoff between model/computational complexity and the effectiveness of the performance improvement.
