Recommendation System (project)


This project aims to build a recommendation system based on collaborative filtering.

Collaborative Filtering Recommendation System

Automatic recommendation algorithms are important in electronic commerce for matching users with products that may be of interest to them. A simple but effective way of achieving this match is called collaborative filtering. There are several versions of collaborative filtering recommendation systems, depending on the exact nature of the recommendation goal. In general, these algorithms start from a matrix of users by products and produce recommendations or rating predictions, by matching users (or products), using similarity measures between users (or products).

In our scenario, the users-products matrix encodes the associations between users and the products they acquire. Furthermore, user-product pairs are unique: a user may acquire a product more than once, but only one record connects that user with that product. Although this binary representation has some limitations (it does not reflect product preferences, for instance), it makes processing simpler.
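
As an illustration of this binary representation, the matrix can be stored sparsely as one set of products per user. This is only a minimal sketch: the class name AcquisitionMatrix and its methods are hypothetical and are not part of the project's solution.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper (not from the project): a sparse view of the
// binary users-products matrix.
class AcquisitionMatrix {
    // user -> products acquired; a product missing from the set is an implicit 0.
    private final Map<String, Set<String>> productsByUser = new HashMap<>();

    // Records an acquisition. Repeated acquisitions of the same product
    // leave the set unchanged, so each user-product pair is stored once.
    void record(String user, String product) {
        productsByUser.computeIfAbsent(user, u -> new TreeSet<>()).add(product);
    }

    // Returns the matrix cell: 1 if the user acquired the product, 0 otherwise.
    int cell(String user, String product) {
        return productsByUser.getOrDefault(user, Collections.emptySet()).contains(product) ? 1 : 0;
    }

    // Returns a read-only view of the user's products (empty for unknown users).
    Set<String> productsOf(String user) {
        return Collections.unmodifiableSet(
                productsByUser.getOrDefault(user, Collections.emptySet()));
    }
}
```

Storing sets instead of a dense 0/1 matrix keeps memory proportional to the number of recorded pairs, which matters when most users have acquired only a small fraction of all products.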

When computing suggestions for a given user, the algorithm first finds the users that are most similar to the one under consideration and then selects the products to recommend from the most popular products acquired by those users. User similarity is based on the products that users acquire and can be computed in various ways. A popular choice is the cosine similarity between the vectors formed by the products acquired by each user: each vector position corresponds to a product (considering all known products acquired by all users) and is, in our scenario, either 0 (the user has not acquired the product yet) or 1 (the user has already acquired the product).

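$$\mathrm{sim}(u_1, u_2) = \cos(\mathrm{vec}_{u_1}, \mathrm{vec}_{u_2}) = \frac{\mathrm{vec}_{u_1} \cdot \mathrm{vec}_{u_2}}{\|\mathrm{vec}_{u_1}\| \, \|\mathrm{vec}_{u_2}\|} = \frac{|P_{u_1} \cap P_{u_2}|}{\sqrt{|P_{u_1}| \, |P_{u_2}|}}$$

In this formula, $u_1$ and $u_2$ are two users, $\mathrm{vec}_{u_1}$ and $\mathrm{vec}_{u_2}$ are the corresponding product vectors, and $P_{u_1}$ and $P_{u_2}$ are the sets of products each user has acquired. $\|\cdot\|$ is the vector norm operator and $|\cdot|$ is the set cardinality operator.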
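
Because the vectors are binary, the dot product equals the number of products the two users share, and each norm is the square root of the size of that user's product set. The sketch below computes the measure directly on sets. The class and method names are hypothetical, and returning 0 for an empty set is an assumed convention, since the measure is undefined in that case.

```java
import java.util.Set;

// Hypothetical helper (not from the project): cosine similarity
// between two binary product vectors represented as sets.
class CosineSimilarity {
    // Computes |P1 intersect P2| / sqrt(|P1| * |P2|).
    // Assumption: the similarity is 0 when either set is empty.
    static double between(Set<String> p1, Set<String> p2) {
        if (p1.isEmpty() || p2.isEmpty()) {
            return 0.0;
        }
        // Iterate over the smaller set to minimise membership tests.
        Set<String> smaller = p1.size() <= p2.size() ? p1 : p2;
        Set<String> larger = (smaller == p1) ? p2 : p1;
        int common = 0;
        for (String product : smaller) {
            if (larger.contains(product)) {
                common++;
            }
        }
        return common / Math.sqrt((double) p1.size() * p2.size());
    }
}
```

For example, users with product sets {a, b} and {b, c} share one product, so their similarity is 1 / sqrt(2 * 2) = 0.5.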

Running the Program

The program takes the system property db, which specifies the file containing the users-products database, and a command-line argument that specifies the user to provide recommendations for.

The output consists of a list of 10 products (one per line), ordered by their fit to the user or alphabetically if tied.
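
The ordering rule above can be sketched as a two-level comparison. The example assumes that each candidate product already has a numeric fit score, that a higher score means a better fit, and that ties are broken by ascending product name. The class Ranking and the score map are hypothetical, and how the scores are obtained is left open.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper (not from the project): picks the best-scoring products.
class Ranking {
    // Returns up to 'limit' product names, best fit first;
    // products with equal scores are listed alphabetically.
    static List<String> top(Map<String, Double> scoreByProduct, int limit) {
        // Primary key: score, highest first (assumed meaning of "fit").
        Comparator<Map.Entry<String, Double>> byFit =
                Map.Entry.<String, Double>comparingByValue().reversed();
        // Secondary key: product name, ascending.
        Comparator<Map.Entry<String, Double>> order =
                byFit.thenComparing(Map.Entry.comparingByKey());
        return scoreByProduct.entrySet().stream()
                .sorted(order)
                .limit(limit)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

Calling Ranking.top(scores, 10) and printing each element on its own line would produce output in the shape described above.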

You may assume that the database does not contain errors and that there are no mistakes when invoking the program. As an example, if the database file is last.fm.100k.txt and the user to get recommendations for is user_000577, the command to invoke the recommender is (App is the class containing the main method):

 java -Ddb=last.fm.100k.txt App user_000577
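
For reference, a value passed with -D is available through System.getProperty, while the user name arrives in the args array of main. The following is a minimal sketch of reading both. The class name Launcher and the method readInvocation are hypothetical and stand in for whatever App actually does.

```java
// Hypothetical launcher (not the project's App class): shows how the
// "db" system property and the user argument reach the program.
class Launcher {
    // Returns { database file, user name } as given on the command line.
    static String[] readInvocation(String[] args) {
        String db = System.getProperty("db"); // set with -Ddb=<file>
        String user = args[0];                // first command-line argument
        return new String[] { db, user };
    }

    public static void main(String[] args) {
        String[] invocation = readInvocation(args);
        System.out.println("database: " + invocation[0] + ", user: " + invocation[1]);
    }
}
```

No validation is performed here, in line with the statement that the program is always invoked correctly.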

Solution

Class Store implements the store and the recommendation algorithm.

Store.java

Class Distance is used to encode distances between users and products.

Distance.java

Class App runs the program: it takes the name of a user and provides recommendations based on the history of the store.

App.java