HYBRID MOVIE RECOMMENDERS BASED ON NEURAL NETWORKS AND DECISION TREES

Rashidifar, Mohammad Amin

Summary

The internet provides a lot of information to users. To help users find the items of their interest in this information overload, recommender systems have been developed. In this book we explore movie recommender systems based on three recommendation methods: content-based recommendation, collaborative filtering, and a hybrid method that combines the two. The algorithms that we use are decision tree learning and neural networks. The algorithms were implemented using the data mining software Weka. To test these recommender systems, we combined the movie data from the Internet Movie Database with the rating data provided by Netflix. The results show that the proposed hybrid recommender systems perform neither better nor worse than the content-based and collaborative filtering recommender systems.

Excerpt

CHAPTER 1

INTRODUCTION

1.1 Background

Nowadays the World Wide Web provides a new way of communication and has

a great impact on both academic research and daily life. A lot of information

can be found on the Internet and is easily accessible. In order to help users to

deal with the information overload and find the information or items of their

interest, so-called recommender systems have been developed. These

recommender systems are used for several purposes, like proposing web pages,

movies, restaurants, interesting articles and so on. There are various

recommendation methods that can be used to find the preferences of a user and

each recommendation method has its strengths and weaknesses. To reduce these

weaknesses and take advantage of the strengths of different recommendation

methods, these methods are combined in hybrid recommender systems. In this

book, recommender systems for movies will be examined. The properties,

advantages and disadvantages of the movie recommenders and their

recommendation methods will be explored. We will also consider some

recommendation methods that have not been used (yet) for a movie

recommender.

In addition, this book will propose hybrid recommender systems for movies that

use both a content-based (CB) recommendation method and a collaborative

filtering (CF) method. By combining these two recommendation methods, we

hope to build systems with a higher accuracy of predictions. These methods

need to be based on data mining algorithms like neural networks or decision

trees. We also hope to improve the prediction quality of recommenders based

on these prediction algorithms compared to other systems proposed in the

literature. The predictive accuracy of all these recommender systems will be

tested on real life movie data: content information and rating information of

movies. The content information will be extracted from a movie and TV site,

the Internet Movie Database (IMDb) [1], and the rating information is from

Netflix [2], which is an online movie-renting site. Each of these recommenders

will predict the number of stars that a user gives to a movie, so the prediction

indicates to what extent the user will like or dislike the movie.

1.2 Motivation

In our previous work [3] we explored the hybridization method combining the

content-based method and the collaborative filtering, both based on the naïve

Bayesian classifier. The recommendation methods proposed in that work used

two classes: they predicted whether a user would like or dislike a movie. In this book we

want to explore hybridization methods that also combine the content-based

method and collaborative filtering, but these methods will be based on

neural networks or decision trees. We want to explore these combinations

because they have not been researched in the literature before and

it will be interesting to see how these hybridization methods perform

compared to other recommendation methods. In addition, in this book we want

to examine the preference of a user more accurately. In other words, we will also

look at the extent to which a user will like or dislike a movie. So instead of only

predicting whether a user will like or dislike a movie, as we did in our previous work,

the prediction will be divided into five different ratings, from one star to

five stars. Here a rating of one star means that the user did not like the movie at

all and a rating of five stars means that the user liked the movie very much.

1.3 Goal

In this book we will examine the performance of the proposed hybrid

recommender system. The following research question will be answered:

How does a hybrid recommender for movies based on neural network or

decision tree perform, that combines a content-based recommender for movies,

which uses text mining, with a collaborative filtering recommender for movies,

which uses user ratings?

To answer the research question, the following sub questions need to be

answered first:

How can these two algorithms be used individually for a content-based

recommender or a collaborative filtering recommender for movies?

How can one devise a hybrid recommender based on each of these

algorithms that combines a content-based recommender with a collaborative

filtering recommender, both based on that same algorithm?

For the first sub question, we will work with content-based and collaborative

filtering systems separately, so we will not work with hybrid systems. For both

of these recommendation methods we will create a recommender based on

neural network and decision tree, so we will have four different recommender

systems:

1) A content-based recommender system based on neural network (CB-NN).

2) A content-based recommender system based on decision tree (CB-DT).

3) A collaborative filtering system based on neural network (CF-NN).

4) A collaborative filtering system based on decision tree (CF-DT).

The second sub question means that we will work with hybrid recommender

systems based on the aforementioned algorithms separately, for both the

content-based part and the collaborative filtering part of the system. In addition,

we have the following two hybrid recommender systems:

5) A hybrid recommender system based on neural network, combining

content-based and collaborative filtering (H-NN).

6) A hybrid recommender system based on decision tree, combining

content-based and collaborative filtering (H-DT).

The performance of the hybrid recommender H-NN will be compared with the

recommenders CB-NN and CF-NN and the hybrid recommender H-DT will be

compared with the recommenders CB-DT and CF-DT. These recommenders will

also be compared with the content-based, collaborative filtering and the hybrid

recommender systems based on naïve Bayesian classifier.

1.4 Methodology

In order to answer the research question and the sub questions, we have taken

the following steps:

1. Study literature

2. Collect datasets

3. Implement algorithms

4. Experiment

1. Study literature

To answer the research questions, some literature about recommender systems

and data mining algorithms has to be studied first. Recommender systems

for movies in particular will be examined. In the literature, various recommendation

methods and algorithms have been discussed. The recommendation methods

used in this book are the content-based method and collaborative filtering. The

algorithms that are used are neural networks and decision trees. Besides these

recommendation methods and algorithms, some literature about hybrid

recommenders will be studied to find a combination of recommendation

methods that improves on the prediction accuracy of the individual methods.

2. Collect datasets

The performance of the prediction of these recommender systems will be tested

on movie data and user ratings from the Internet. The dataset consists of user

rating-data from Netflix, which is an online movie-renting site, and the movie

data from IMDb, which is a movie and TV site. The Netflix dataset contains

movie titles with their ratings given by users and the movie data from IMDb

contains information about movies, such as the genre of a specific movie, the actors

and the directors. Both datasets were collected and combined in [3] to obtain data that

contains both rating information and content information of movies. The user

rating-data from Netflix was made available to support participants in the

Netflix Prize, where users can compete to improve the current recommender

system of Netflix: Cinematch. The movie data from IMDb were extracted

from their site.

3. Implement algorithms

The recommender systems for this research will be built in Java, which is an

object-oriented programming language developed by Sun Microsystems. For

the implementation of the algorithms we will use Weka [4], which is a data

mining software package written in Java. It is a collection of machine learning algorithms for

data mining tasks, and some of these algorithms will be used to

predict the ratings.

4. Experiment

After collecting the datasets and building the systems, we test each system on

the collected datasets. The performance of the systems will be evaluated by

computing the Mean Absolute Error (MAE) and the accuracy of the predictions.

The MAE is the average of the absolute differences between each prediction and the

actual rating. Finally, the results of the evaluation will be compared with

each other to answer the research questions.
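
The two evaluation measures can be illustrated with a minimal Python sketch. It is not the book's implementation (which relies on Weka); the function names and the sample ratings are hypothetical, and accuracy is assumed here to mean the fraction of predictions that match the actual star rating exactly.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and actual star ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def accuracy(predicted, actual):
    """Fraction of predictions equal to the actual rating (assumed definition)."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(predicted)


# Hypothetical predictions for five movies (illustrative values only).
predicted = [4, 3, 5, 2, 1]
actual = [4, 4, 3, 2, 1]

mae = mean_absolute_error(predicted, actual)  # (0 + 1 + 2 + 0 + 0) / 5
acc = accuracy(predicted, actual)             # 3 exact matches out of 5
```

A lower MAE means predictions are closer to the actual ratings on average, while accuracy only rewards exact matches, so the two measures can rank systems differently.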

1.5 Structure

This book contains the following chapters:

Chapter 2 Related work. In this chapter we briefly discuss the

difference between various recommendation methods and their

properties. We will also discuss the hybrid recommendation methods

distinguished in the literature. Further, some movie recommender systems

used by other researchers are presented and the algorithms neural

networks and decision trees are briefly introduced.

Chapter 3 Neural Networks - Backpropagation describes the

backpropagation algorithm and presents the implementation of this

algorithm for the content-based method, the collaborative filtering, and

the hybrid method. For each of these methods an example of the

implementation will be given.

Chapter 4 Decision tree learning C4.5 describes the C4.5 algorithm

and will also present the implementation of the decision tree for the

content-based method, the collaborative filtering, and the hybrid

recommendation. This chapter also gives an example of the

implementation for these three methods.

Chapter 5 Implementing algorithm - Weka. This chapter describes

the data mining software Weka and the algorithms used for the

recommendations: Multilayer Perceptron for the backpropagation

algorithm and J48 for the C4.5 algorithm.

Chapter 6 Experiments & Results discusses the datasets used for this

book and provides the experiments and results of the proposed

recommendation methods. Further, a comparison with the

recommendation methods proposed in [3] will be shown.

Chapter 7 Conclusion. In the final chapter we summarize the book and

answer the sub questions and the research question. This chapter ends

with future work that can be further explored.

CHAPTER 2

RELATED WORK

This chapter gives a short explanation of the recommender system and describes

the various recommendation methods. We will discuss the hybrid recommender

system and explain some combination methods identified in the literature.

Some examples of other movie recommender systems will be given by

providing their recommendation methods, the algorithms used to make

predictions, and which data were used to evaluate the recommender systems.

Finally, this chapter introduces the algorithms that are used in this book, namely

neural networks and decision tree learning. These algorithms will be further

discussed in detail in the following two chapters.

2.1 Recommender systems

Recommender systems are employed to help users find items that match

their preferences. They produce individualized recommendations as output or

have the effect of guiding the user in a personalized way to find interesting or

useful items in a large amount of other items [5]. To produce recommendations,

these systems need background data, input data and an algorithm. Background

data is the information that the system has before it produces any

recommendation. Input data is the information that is communicated to the

system by the user in order to produce recommendations. An algorithm in the

system is needed to combine the input data and the background data to produce

a recommendation. Based on these three points, Burke [5] distinguished five

different recommendation methods:

1) A collaborative recommender system collects ratings of items, recognizes

similarities between users based on their ratings, and produces new

recommendations based on inter-user comparisons.

2) Content-based recommender systems produce recommendations based on the

associated features of an item: they learn a profile of the user's interests based on the

features present in items that the user has rated before.

3) A demographic recommender system categorizes users based on

personal attributes and finds interesting items based on demographic classes.

4) Utility-based systems evaluate the match between a user's need and the set of

options available: they recommend items based on a computation of the utility of

each item for the user.

5) Knowledge-based recommenders also make such

evaluations, but they have knowledge about how a particular item meets a

particular user's need.

2.2 Hybrid recommender systems

Hybrid recommender systems are recommender systems that combine two or

more recommendation methods into one recommender system for a better

performance. The following combination methods are identified by Burke [5]:

1) A weighted hybrid recommender system calculates the score of a

recommended item from the results of the recommendation methods that the

system uses.

2) Switching hybrid recommender systems use some criterion to switch

between the recommendation methods used in the system to produce the

recommendation.

3) In a mixed hybrid recommender, recommendations from the different

recommendation methods are presented together.

4) Hybrid recommender systems based on feature combination combine the

features of the different recommendation methods in the system and use these

features in a single recommendation algorithm to produce recommendations.

5) In a cascade hybrid recommender system, one recommendation method is

used first to produce a ranking of recommended items and a second

recommendation method refines this ranking of items.

6) A hybrid recommender based on feature augmentation uses the

output of one recommendation method as input for another recommendation

method used in the recommender system.

7) Meta-level hybrid recommenders use the model learned by one

recommendation method as input to another recommendation method.
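
To make the first two combination methods concrete, the following Python sketch shows a weighted and a switching hybrid. It is only an illustration of Burke's categories, not the hybrid method proposed in this book; the function names, the weight `cb_weight` and the threshold `min_ratings` are hypothetical.

```python
def weighted_hybrid(cb_score, cf_score, cb_weight=0.5):
    """Weighted hybrid: blend the two predicted ratings with a fixed weight.

    cb_weight is a hypothetical tuning parameter between 0 and 1.
    """
    return cb_weight * cb_score + (1.0 - cb_weight) * cf_score


def switching_hybrid(cb_score, cf_score, n_user_ratings, min_ratings=20):
    """Switching hybrid: pick one method based on a criterion.

    The criterion assumed here: trust collaborative filtering only when the
    user has rated at least min_ratings movies, else fall back to content.
    """
    return cf_score if n_user_ratings >= min_ratings else cb_score


# Content-based predicts 4 stars, collaborative filtering predicts 2 stars.
blended = weighted_hybrid(4.0, 2.0)                      # equal weights
new_user = switching_hybrid(4.0, 2.0, n_user_ratings=5)  # too few ratings
```

The weighted variant always uses both predictions, whereas the switching variant uses exactly one of them per recommendation.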

2.3 Movie recommender systems

There are various movie recommender systems proposed and discussed in the

literature. This section shows some examples of movie recommender systems

with their recommendation methods, the algorithms used, and the data.

Christakou and Stafylopatis [6] proposed a hybrid movie recommender system

based on neural networks. They combined content-based recommendation and collaborative

filtering to provide more precise recommendations. The content-based

part of the system was based on a neural network, and for the collaborative

filtering part they used the Pearson formula to find the correlation between a

user and other users. To test their proposed hybrid recommender they used the

MovieLens data set.
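
As an illustration of the Pearson formula mentioned above, the sketch below computes the correlation between two users over the movies that both have rated. It is a generic textbook formulation, not necessarily the exact variant used in [6]; the function name, the dictionary layout and the sample ratings are hypothetical.

```python
from math import sqrt


def pearson(ratings_a, ratings_b):
    """Pearson correlation between two users over their co-rated movies.

    Each argument maps a movie id to a star rating. Returns 0.0 when the
    correlation is undefined (fewer than two common movies or no variance).
    """
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return 0.0
    a = [ratings_a[m] for m in common]
    b = [ratings_b[m] for m in common]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


# Hypothetical users: user2 agrees with user1, user3 has the opposite taste.
user1 = {"m1": 5, "m2": 3, "m3": 1}
user2 = {"m1": 4, "m2": 3, "m3": 2, "m4": 5}
user3 = {"m1": 1, "m2": 3, "m3": 5}

similar = pearson(user1, user2)
opposite = pearson(user1, user3)
```

Values close to 1 indicate users with similar taste, values close to -1 indicate opposite taste, and movies rated by only one of the two users (such as `m4`) are ignored.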

Our proposed hybrid movie recommenders [3] also combined the content-based

method with collaborative filtering to obtain a higher prediction accuracy.

Both methods were based on a naïve Bayesian classifier. For the evaluation of

the recommenders, we combined the movie data from IMDb and the rating data

from Netflix.

Symeonidis et al. [7] constructed a feature-weighted user profile to disclose the

duality between users and features. The outline of their approach consisted of

four steps: 1) constructing a content-based user profile from both collaborative

and content features; 2) quantifying the effect of each feature inside the user's

profile and among the users; 3) creating the user's neighbourhood by

calculating the similarity between users to provide recommendations; 4)

providing a Top-N recommendation list for each test user based on the most

frequent feature in his neighbourhood. The experiments were performed

with the IMDb and MovieLens data sets.

Golbeck and Hendler [8] proposed FilmTrust, a website that integrates

Semantic Web-based social networks, augmented with trust, to create predictive

movie recommendations. For their work, they applied collaborative filtering

where the recommendations were generated to suggest how much a given user

may be interested in a movie that the user already found.

2.4 Neural networks

One of the algorithms we have used for this research is neural networks. Neural

networks, or artificial neural networks, consist of layers of connected nodes,

where each node produces a non-linear function of its input. The input to a node

may come from other nodes or directly from the input data. Some nodes are

also identified with the output of the network. The complete network therefore

represents a very complex set of interdependencies which may incorporate any

degree of nonlinearity, allowing very general functions to be modelled [9].

Artificial neural networks are designed to solve a variety of problems in pattern

recognition, clustering/categorization, function approximation,

prediction/forecasting, optimization, associative memory, and control [10]. The

goal of pattern recognition is to classify an input pattern represented by a

feature vector in one of the specified classes. The task of

clustering/categorization is to explore the similarity between patterns and to put

similar patterns in a cluster. Function approximation finds an estimate of an

unknown function. Prediction/forecasting algorithms predict a sample at some

future time. The task of an optimization algorithm is to find a solution that

satisfies a set of constraints such that an objective function is maximized

or minimized. In associative memory, the goal is to access the memory by its

content, where the content in the memory can be recalled even from a partial input

or distorted content. In model-reference adaptive control, the task is to generate

a control input such that the system follows a desired trajectory that is

determined by the reference model. In this book, the neural networks are

designed to solve a pattern recognition problem. The neural networks

used for this research are described further in Chapter 3.

2.5 Decision tree learning

The other algorithm we have used for this research is decision tree learning.

Decision tree learning is among the most widely used and practical algorithms

for inductive inference [11]. It is an algorithm for approximating discrete-valued

target functions, where a decision tree represents the learned function. The

decision trees can also be represented as a set of if-then rules for better human

readability. Decision trees sort instances down the tree from the root to a leaf

node that classifies the instances. Each node of the tree specifies a test of an

attribute of the instance, and each branch descending from that node

represents one of the possible values for this attribute. To classify an instance,

one starts at the root of the tree and tests the attribute specified by this node and

then moves down the branch of the tree that represents the value of the attribute

applicable for this instance. In general, decision trees can be seen as a

disjunction of conjunctions of attribute values of instances, where each path from

the root to a leaf is a conjunction of attribute tests and the tree itself is a disjunction of

these conjunctions. Many decision tree learning algorithms have been developed, and they are all

best suited to problems with these characteristics:

- instances are represented by attribute-value pairs

- the target function has discrete output values

- disjunctive descriptions may be required

- the training data may contain errors

- the training data may contain missing attribute values

Chapter 4 will further discuss the decision tree learning algorithm used for this

book.
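
The view of a decision tree as a set of if-then rules can be sketched in a few lines of Python. The tree below is hand-written and purely hypothetical (it is not learned from data and is not a tree from this book); the attribute names, the dictionary layout and the helper functions are assumptions made for illustration.

```python
# A hypothetical tree: internal nodes test one binary attribute,
# leaves hold the predicted star rating.
tree = {
    "attribute": "genre1",
    "branches": {
        1: {"attribute": "actor2", "branches": {1: 5, 0: 4}},
        0: 2,
    },
}


def classify(node, instance):
    """Walk from the root to a leaf, following the branch for each tested value."""
    while isinstance(node, dict):
        node = node["branches"][instance[node["attribute"]]]
    return node


def paths(node, conditions=()):
    """Yield one (conjunction of attribute tests, label) rule per root-to-leaf path."""
    if not isinstance(node, dict):
        yield conditions, node
        return
    for value, child in node["branches"].items():
        yield from paths(child, conditions + ((node["attribute"], value),))


rating = classify(tree, {"genre1": 1, "actor2": 0})
rules = list(paths(tree))  # the tree as a disjunction of conjunctions
```

Each element of `rules` is one conjunction of attribute tests together with its leaf label, so the whole list reads as "if genre1 = 1 and actor2 = 1 then 5 stars, or if ...".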

CHAPTER 3

NEURAL NETWORKS - BACKPROPAGATION

One of the algorithms we used in this book is the neural network. This chapter

gives a more detailed description of the algorithm and presents the

implementation of this algorithm in the recommender system. We go

into the details of the neural network algorithm we have used for both the

content-based part of the recommender system and the collaborative filtering

part: the backpropagation algorithm. Further, this algorithm is explained by

examples for both of the recommendation methods. Finally, the hybrid

recommendation method based on these two recommendation methods using the

backpropagation algorithm will be presented and explained by an example.

3.1 Backpropagation algorithm

The neural networks we have used are acyclic directed graphs of sigmoid

units trained with the backpropagation algorithm. Table 3.1 shows the backpropagation

algorithm [6] we will use for the networks. The sigmoid units are like

perceptrons, but they are based on a smoothed, differentiable threshold function.

A sigmoid unit first computes a linear combination of its inputs, and then applies

a threshold to the result, where the threshold is a continuous function of its input.

The sigmoid unit computes its output o as follows:

$$ o = \sigma(\vec{w} \cdot \vec{x}) \tag{3.1} $$

where

$$ \sigma(y) = \frac{1}{1 + e^{-y}} $$

Here $\sigma$ is called the sigmoid function. Its output ranges between 0 and 1,

increasing monotonically with its input.
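
The computation of a single sigmoid unit can be sketched in Python as follows. This is an illustrative snippet rather than code from the book; the function names and the sample weights are hypothetical, and no bias weight is modelled (a bias can be added by giving the unit an extra input that is always 1).

```python
from math import exp


def sigmoid(y):
    """Smooth, differentiable threshold function with outputs between 0 and 1."""
    return 1.0 / (1.0 + exp(-y))


def sigmoid_unit(weights, inputs):
    """Output of one unit: the sigmoid of the weighted sum of its inputs."""
    net = sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(net)


# Hypothetical unit with two inputs whose weighted sum is exactly zero.
o = sigmoid_unit([0.5, -0.5], [1, 1])
```

Because the weighted sum is zero in this example, the unit sits exactly at the midpoint of the sigmoid; larger sums push the output towards 1 and smaller sums towards 0.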

Table 3.1: Backpropagation algorithm for feedforward networks containing two layers of sigmoid units.

BACKPROPAGATION(training_examples, $\eta$, $n_{in}$, $n_{hidden}$, $n_{out}$)

Each training example is a pair of the form $\langle \vec{x}, \vec{t} \rangle$, where $\vec{x}$ is the vector of network input values, and $\vec{t}$ is the vector of target network output values.

$\eta$ is the learning rate, $n_{in}$ is the number of network inputs, $n_{hidden}$ the number of units in the hidden layer, and $n_{out}$ the number of output units.

The input from unit i into unit j is denoted $x_{ji}$, and the weight from unit i to unit j is denoted $w_{ji}$.

· Create a feed-forward network with $n_{in}$ inputs, $n_{hidden}$ hidden units, and $n_{out}$ output units.

· Initialize all network weights to small random numbers.

· Until the termination condition is met, Do

A. For each $\langle \vec{x}, \vec{t} \rangle$ in training_examples, Do

Propagate the input forward through the network:

1. Input the instance $\vec{x}$ to the network and compute the output $o_u$ of every unit u in the network.

Propagate the errors backward through the network:

2. For each network output unit k, calculate its error term $\delta_k$:

$$ \delta_k \leftarrow o_k (1 - o_k)(t_k - o_k) \tag{3.2} $$

3. For each hidden unit h, calculate its error term $\delta_h$:

$$ \delta_h \leftarrow o_h (1 - o_h) \sum_{k \in outputs} w_{kh} \delta_k \tag{3.3} $$

4. Update each network weight $w_{ji}$:

$$ w_{ji} \leftarrow w_{ji} + \Delta w_{ji} \tag{3.4} $$

where

$$ \Delta w_{ji} = \eta \, \delta_j \, x_{ji} $$
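
The steps of Table 3.1 can be sketched in plain Python as follows. This is a minimal illustration, not the Weka Multilayer Perceptron used in this book: it has no momentum term, it adds a bias weight to every unit (an assumption not stated in the table), and the toy training data (logical OR), the learning rate and the number of epochs are hypothetical.

```python
import random
from math import exp


def sigmoid(y):
    return 1.0 / (1.0 + exp(-y))


def init_network(n_in, n_hidden, n_out, rng):
    """Small random weights; index 0 of each weight row is a bias weight."""
    def layer(n_units, n_inputs):
        return [[rng.uniform(-0.05, 0.05) for _ in range(n_inputs + 1)]
                for _ in range(n_units)]
    return layer(n_hidden, n_in), layer(n_out, n_hidden)


def forward(network, x):
    """Propagate x forward; return (hidden outputs, output-layer outputs)."""
    w_hidden, w_out = network
    xb = [1.0] + list(x)  # constant 1 feeds the bias weight
    o_hidden = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_hidden]
    hb = [1.0] + o_hidden
    o_out = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in w_out]
    return o_hidden, o_out


def train_step(network, x, t, eta):
    """One weight update for a single training example (x, t)."""
    w_hidden, w_out = network
    o_hidden, o_out = forward(network, x)
    # Step 2: error terms of the output units, o_k (1 - o_k) (t_k - o_k).
    d_out = [o * (1 - o) * (tk - o) for o, tk in zip(o_out, t)]
    # Step 3: error terms of the hidden units, using the weights before update.
    d_hidden = [
        o * (1 - o) * sum(w_out[k][h + 1] * d_out[k] for k in range(len(w_out)))
        for h, o in enumerate(o_hidden)
    ]
    # Step 4: w_ji <- w_ji + eta * delta_j * x_ji for every weight.
    xb = [1.0] + list(x)
    hb = [1.0] + o_hidden
    for k, row in enumerate(w_out):
        for i in range(len(row)):
            row[i] += eta * d_out[k] * hb[i]
    for h, row in enumerate(w_hidden):
        for i in range(len(row)):
            row[i] += eta * d_hidden[h] * xb[i]


def squared_error(network, examples):
    """Sum of squared differences between targets and network outputs."""
    return sum((tk - ok) ** 2
               for x, t in examples
               for tk, ok in zip(t, forward(network, x)[1]))


# Toy data (logical OR), chosen only to show that training reduces the error.
rng = random.Random(0)
examples = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [1])]
network = init_network(n_in=2, n_hidden=2, n_out=1, rng=rng)
error_before = squared_error(network, examples)
for _ in range(2000):  # termination condition: a fixed number of epochs
    for x, t in examples:
        train_step(network, x, t, eta=0.5)
error_after = squared_error(network, examples)
```

The sketch updates the weights after every single example, which corresponds to the inner loop A of Table 3.1; the fixed epoch count stands in for the unspecified termination condition.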

The network structure we will use is a layered network of two layers (one

hidden layer and one output layer) with feedforward connections from every

unit in one layer to every unit in the next. Each network will have 5 outputs

which will be the five rating categories. The output with the highest value,

which we will denote as h, will be taken as the network prediction, which is

often called a 1-of-n output encoding. The number of hidden nodes will depend

on the accuracy and the training time of each network and will be further

examined in chapter 6. The number of inputs depends on the available types of

characteristic features in the dataset. For example, if the whole dataset contains

only 5 different types of a particular characteristic, then the network will

contain only 5 inputs. The learning rate, the momentum and other parameters of

the algorithm will also be discussed in chapter 5.
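
The 1-of-n output encoding can be sketched as follows. The snippet is illustrative only; the function names are hypothetical, and the use of target values 0 and 1 is an assumption (other target values are possible).

```python
RATINGS = [1, 2, 3, 4, 5]  # the five rating categories, one output unit each


def encode_rating(stars):
    """Target vector with a 1 for the given rating and 0 for all others."""
    return [1.0 if r == stars else 0.0 for r in RATINGS]


def decode_outputs(outputs):
    """Network prediction: the rating whose output unit has the highest value."""
    best = max(range(len(outputs)), key=lambda i: outputs[i])
    return RATINGS[best]


target = encode_rating(4)
# Hypothetical output values of the five output units for one movie.
prediction = decode_outputs([0.10, 0.20, 0.05, 0.70, 0.30])
```

Decoding only looks at which output is largest, so the predicted rating does not depend on the absolute output values.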

3.2 Neural Networks for content-based recommendation

A neural network for the content-based recommendation will be constructed for

each user. The content-based method will use the genre, the contributors and the

plot of a movie combined, which we will call the movie-description. To train

the network for a user and to classify movies, we use a matrix that contains a vector for each of the

movies that the user has rated, holding the characteristic features that are available

in the dataset together with the rating. First we create the set of distinct words, the

vocabulary, found in the movie-descriptions of the movies that the user has

rated. Then, for each rated movie, we check whether each word in the vocabulary appears

in the movie-description of the movie. The presence of each word in each

movie represents the characteristic features for content-based recommendation.

The matrix for the content-based network for a particular user will look like:

Movies rated by user | Distinct words in the movie-descriptions of the movies rated by user | Ratings given by user

... | ... | ...

The neural network will have one input for each of the distinct words in the

dataset, and the value of an input indicates the presence of the word in the movie-

description, where present is marked as 1 and not present is marked as 0. The

next section shows how this matrix is filled with 0's and 1's.

3.2.1 Example content-based recommendation

Consider a user, user1, who has rated the following three simplified movies

from a training set with the movie-descriptions and ratings:

Movies rated by user1 | Movie-descriptions of the movies rated by user1 | Ratings given by user1

Movie1 | actor1, actor2, director1, genre1, plot1, plot2, plot3 | 4 stars

Movie2 | actor1, actor3, director2, genre1, plot1, plot3, plot4 | 4 stars

Movie3 | actor1, actor3, director2, genre2, plot2, plot3, plot4 | 2 stars

And a test set with the movie-description and ratings:

Movies rated by user1 | Movie-descriptions of the movies rated by user1 | Ratings given by user1

Movie4 | actor3, actor4, director3, genre3, plot1, plot3, plot5 | 5 stars

Movie5 | actor2, actor3, director1, genre2, plot2, plot4, plot5 | 3 stars

The vocabulary with the distinct words found in the training set will be:

actor1, actor2, actor3, director1, director2, genre1, genre2, plot1, plot2, plot3, plot4

Notice that the words actor4, director3, genre3 and plot5 are not included in the

vocabulary, since the neural network is trained only on

the words that are encountered in the training set.

When we look at the presence of the words found in the movie-description, the

matrix with vectors of the movies that user1 has rated in the training set will

become:

Movies rated by user1 | Presence of the words found in the movie-description (a1 a2 a3 d1 d2 g1 g2 p1 p2 p3 p4) | Ratings given by user1

Movie1 | 1 1 0 1 0 1 0 1 1 1 0 | 4

Movie2 | 1 0 1 0 1 1 0 1 0 1 1 | 4

Movie3 | 1 0 1 0 1 0 1 0 1 1 1 | 2

And the matrix with vectors of the movies in the test set will become:

Movies rated by user1 | Presence of the words found in the movie-description (a1 a2 a3 d1 d2 g1 g2 p1 p2 p3 p4) | Ratings given by user1

Movie4 | 0 0 1 0 0 0 0 1 0 1 0 | 5

Movie5 | 0 1 1 1 0 0 1 0 1 0 1 | 3
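
The construction of the vocabulary and of the presence vectors for the training set can be sketched in Python as follows. The snippet reuses the training movies of user1 from the example; the function names and the extra description with unseen words are hypothetical, and alphabetical sorting is used only because it happens to reproduce the column order a1 ... p4.

```python
# Training set of user1 from the example: movie -> (movie-description, rating).
training = {
    "Movie1": (["actor1", "actor2", "director1", "genre1", "plot1", "plot2", "plot3"], 4),
    "Movie2": (["actor1", "actor3", "director2", "genre1", "plot1", "plot3", "plot4"], 4),
    "Movie3": (["actor1", "actor3", "director2", "genre2", "plot2", "plot3", "plot4"], 2),
}


def build_vocabulary(descriptions):
    """Sorted list of the distinct words found in the given descriptions."""
    return sorted({word for words in descriptions for word in words})


def encode(words, vocabulary):
    """Presence vector: 1 if the vocabulary word occurs in the description, else 0."""
    present = set(words)
    return [1 if word in present else 0 for word in vocabulary]


vocabulary = build_vocabulary(words for words, _ in training.values())
matrix = {movie: (encode(words, vocabulary), rating)
          for movie, (words, rating) in training.items()}

# Words that never occurred in the training set have no input and are ignored.
unseen = encode(["actor4", "plot5", "plot1"], vocabulary)
```

Each entry of `matrix` pairs a network input vector with the rating that serves as the training target; in `unseen`, only `plot1` sets an input to 1 because `actor4` and `plot5` are not part of the vocabulary.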

Details

Type of Edition: Original edition
Publication Year: 2015
ISBN (PDF): 9783954899371
File size: 830 KB
Language: English
Publication date: 2015 (June)
Keywords: Computer Science hybrid recommendation Weka
Product Safety: Anchor Academic Publishing

Author

Mohammad Amin Rashidifar (Author)
