Sistema de Recomendação (projecto)

From Wiki**3

Revision as of 17:16, 2 November 2016 by Root (talk | contribs) (Created page with "Este projecto destina-.se a realizar um sistema de recomendação baseado em filtros colaborativos. == Collaborative Filtering Recommendation System == Automatic recommendat...")
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

Este projecto destina-.se a realizar um sistema de recomendação baseado em filtros colaborativos.

Collaborative Filtering Recommendation System

Automatic recommendation algorithms are important in electronic commerce for matching users with products that may be of interest to them. A simple but effective way of achieving this match is called collaborative filtering. There are several versions of collaborative filtering recommendation systems, depending on the exact nature of the recommendation goal. In general, these algorithms start from a matrix of users by products and produce recommendations or rating predictions, by matching users (or products), using similarity measures between users (or products).

In our scenario, the users-products matrix encodes the associations between users and products they acquire. Furthermore, users-products pairs are unique, i.e., a user may acquire a product more than once, but there will be only one record connecting that user with that product. Although this binary representation has some limitations (it does not reflect product preferences, for instance), it makes processing simpler.

When computing suggestions for a given user, the algorithm first finds users that are similar to the one under consideration and, then, from the most popular products acquired by the most similar users, selects the products to recommend. User similarity is computed in terms of the products that they acquire and can be computed in various ways. A popular way is to consider the cosine distance, measured between the vectors formed by the products acquired by each user: each vector position corresponds to a product (considering all known products acquired by all users) and is, in our scenario, either 0 (the user has not acquired the product yet), or 1 (the user has already acquired the product).

Sim-cosine.png

In this formula u1 and u2 are two users, vecu1 and vecu2 are the corresponding product vectors, Pu1 and Pu2 are the sets of products each user has acquired. ||·|| is the vector norm operator and |·| is the set cardinality operator.

Running the Program

The program takes property db to specify the file containing the users-products database and a command line argument for specifying the user to provide recommendations for.

The output consists of a list of 10 products (one per line), ordered by their fit to the user or alphabetically if tied.

You may assume that the database does not contain errors and that there are no mistakes when invoking the program. As an example, if the database file is last.fm.100k.txt and the user to get recommendations for is user_000577, the command to invoke the recommender is (App is the class containing the main method):

 java -Ddb=last.fm.100k.txt App user_000577

Solution

<java5> package scollafil;

import java.io.BufferedReader; import java.io.FileReader; import java.io.IOException; import java.util.ArrayList; import java.util.Collections; import java.util.HashMap; import java.util.HashSet; import java.util.List; import java.util.Map; import java.util.Set;

/**

* This class represents a store (with products and clients).
* 
* The store keeps track of what was bought by whom and vice-versa.
* 
* @author David Martins de Matos
*/

@SuppressWarnings("nls") public class Store {

 /**
  * This constant controls the size of the top-N list for candidate selection.
  * It is used both to select the closest clients and the most interesting
  * products.
  */
 private final int TOP_N = 10;
 /** Each entry lists buyers of a given product. */
 private Map<String, HashSet<String>> _clientsByProduct = new HashMap<>();
 
 /** Each entry lists products bought by a given client. */
 private Map<String, HashSet<String>> _productsByClient = new HashMap<>();
 /**
  * @param db
  */
 public Store(String db) {
   try {
     BufferedReader reader = new BufferedReader(new FileReader(db));
     String line = "";
     while ((line = reader.readLine()) != null) {
       String[] fields = line.split("\t");
       String client = fields[0], product = fields[1];
       HashSet<String> cbps = _clientsByProduct.get(product);
       if (cbps == null) {
         cbps = new HashSet<>();
         _clientsByProduct.put(product, cbps);
       }
       cbps.add(client);
       HashSet<String> pbcs = _productsByClient.get(client);
       if (pbcs == null) {
         pbcs = new HashSet<>();
         _productsByClient.put(client, pbcs);
       }
       pbcs.add(product); 
     }
     reader.close();
   } catch (IOException e) {
     System.err.println("Error occurred while opening " + db + ": " + e);
     System.exit(-1);
   }
 }
 /**
  * @param client
  * @return recommendation list (a list of product names)
  */
 public List<String> getRecommendationsFor(String client) {
   ArrayList<Distance> distances = new ArrayList<>();
   for (String other : _productsByClient.keySet()) {
     if (other.equals(client))
       continue;
     distances.add(cosine(client, other));
   }
   Collections.sort(distances);
   System.out.println("Closest " + TOP_N + " to " + client + ":");
   HashMap<String, Integer> counts = new HashMap<>();
   for (int i = 0; i < distances.size() && i < TOP_N; i++) {
     Distance d = distances.get(i);
     if (d == null)
       break;
     String target = d.getTarget(); // similar client
     System.out.printf("[%d]@%.2g%% = %s\n", i, d.getDistance() * 100, target);
     for (String p : _productsByClient.get(target)) {
       Integer cc = counts.get(p);
       counts.put(p, cc == null ? 0 : cc + 1);
     }
   }
   Set<String> ks = counts.keySet();
   ks.removeAll(_productsByClient.get(client));
   ArrayList<Distance> bestProducts = new ArrayList<>();
   for (String k : ks)
     bestProducts.add(new Distance(client, k, counts.get(k)));
   Collections.sort(bestProducts);
   ArrayList<String> recomm = new ArrayList<>();
   for (int i = 0; i < bestProducts.size() && i < TOP_N; i++) {
     Distance d = bestProducts.get(i);
     if (d == null)
       break;
     recomm.add(d.getTarget());
     System.out.printf("[%d]@%g = %s\n", i, d.getDistance(), d.getTarget());
   }
   return recomm;
 }
 /**
  * @param c1
  * @param c2
  * @return "cosine" of two sets
  */
 public Distance cosine(String c1, String c2) {
   Set<String> v1 = _productsByClient.get(c1), v2 = _productsByClient.get(c2);
   Set<String> intersection = new HashSet<>(v1);
   intersection.retainAll(v2);
   double d = intersection.size() / Math.sqrt(v1.size()) / Math.sqrt(v2.size());
   return new Distance(c1, c2, d);
 }

} </java5>

<java5> package scollafil;

/**

* Distance between two points: source and target.
* 
* @author David Martins de Matos
*/

class Distance implements Comparable<Distance> {

 /** Source. */
 private String _source;
 
 /** Target. */
 private String _target;
 
 /** Distance. */
 private double _distance;
 /**
  * @param source
  * @param target
  * @param distance
  */
 public Distance(String source, String target, double distance) {
   _source = source;
   _target = target;
   _distance = distance;
 }
 /**
  * Natural order is according to distance, except when distances are same. In
  * this case, the order is alphabetical (target).
  * 
  * @see java.lang.Comparable#compareTo(java.lang.Object)
  */
 @Override
 public int compareTo(Distance d) {
   return _distance > d._distance ? -1
       : (_distance < d._distance ? 1 : _target.compareTo(d._target));
 }
 /** @return source */
 public String getSource() {
   return _source;
 }
 /** @return target */
 public String getTarget() {
   return _target;
 }
 /** @return distance */
 public double getDistance() {
   return _distance;
 }

} </java5>

<java5> package scollafil;

import java.io.BufferedReader; import java.io.IOException; import java.io.InputStreamReader;

/**

* This class implements a simple collaborative filtering recommender.
* 
* The program uses property "db" to read a shopping database. After
* initialization, client names are requested and a product list is produced for
* each one. Only the top-3 items are presented.
* 
* @author David Martins de Matos
*/

public class App {

 /**
  * @param args
  *          command line arguments
  */
 @SuppressWarnings("nls")
 public static void main(String[] args) {
   String db = System.getProperty("db");
   if (db == null) {
     System.err.println("Please, provide a shopping database!");
     return;
   }
   Store store = new Store(db);
   BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
   String client = "";
   try {
     do {
       System.out.print("Client: ");
       if ((client = in.readLine()) == null || client.equals(""))
         break;
       System.out.println(store.getRecommendationsFor(client));
     } while (true);
   } catch (IOException e) {
     System.err.println("Error while reading standard input!");
     e.printStackTrace();
     System.exit(1);
   }
 }

} </java5>