YouTube Video Recommendation System using Machine Learning


In this Machine Learning project, we will build a YouTube recommendation system. The project is similar to a movie recommendation system, and we will use Collaborative Filtering to build it.

So, let’s build this system.

YouTube Recommendation System

A recommendation system, sometimes known as a recommendation engine, is a model for information filtering that aims to anticipate user preferences and make recommendations in accordance with those predictions. These technologies are currently widely used in various industries, including movies, music, books, videos, apparel, restaurants, food, locations, and other utilities. These systems gather data on a user’s preferences and behavior, which they then employ to enhance their future suggestions.

The ability to filter through enormous information spaces and choose the items that are most likely to be interesting and appealing to a user is what distinguishes recommender systems (RS) from other types of software. The three main categories of recommendation approaches are collaborative filtering, content-based, and hybrid methods. The popular content-based techniques suggest products with content traits resembling those of products a user has previously enjoyed.

For instance, news recommenders look for similarities between words or keywords in articles. Content-based filtering requires access to data about the content attributes of the items. In most current systems, such attributes are linked to the items as either structured or unstructured meta-information. For instance, several RS in the movie domain take into account movie genre, director, and cast (structured information), or plot, tags, and textual reviews (unstructured information). Unlike other approaches, our method relies on “implicit” content characteristics of objects, so the traits of the items must be “extracted” computationally from them.

Content-based recommendation is a crucial strategy in recommender systems. The core premise is to recommend products comparable to those a user previously enjoyed, so the primary goal of a content-based recommender system is to determine how similar two items are. There are numerous approaches for modeling items, with the Vector Space Model being the most well-known. This model extracts each item’s keywords and determines their weights using TF-IDF.
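To make the Vector Space Model idea concrete, here is a minimal sketch that weights keywords with TF-IDF and compares items with cosine similarity. It is not part of the project code: the video titles are made up for illustration, the whitespace tokenizer is a simplifying assumption, and the helper names tfidf_vectors and cosine are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document using plain TF-IDF."""
    tokenized = [doc.lower().split() for doc in docs]  # naive tokenizer (assumption)
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical video titles, made up for illustration.
titles = [
    "funny cat video compilation",
    "funny dog video compilation",
    "python machine learning tutorial",
]
vecs = tfidf_vectors(titles)
sim_cat_dog = cosine(vecs[0], vecs[1])
sim_cat_python = cosine(vecs[0], vecs[2])
print(sim_cat_dog, sim_cat_python)
```

The two titles that share keywords get a positive similarity, while the unrelated title scores zero because it has no terms in common with the first one.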

Many recommender systems use keywords to model items, which makes keyword extraction a crucial step. However, collecting keywords from an item can be challenging, particularly in the media industry, where it is hard to extract text keywords from videos. There are two major approaches to solving this kind of issue. One involves letting users tag the items, while the other relies on specialists. Examples of expert tagging systems are Pandora for music and Jinni for movies. Take Jinni as an example: its researchers designated more than 900 tags as “movie genes” and had movie industry professionals assign these tags to movies. The tags represent various categories, such as movie genre, narrative, time, location, and cast. For the movie Kung Fu Panda, Jinni’s tags are divided into a total of ten categories: mood, plot, genres, time, place, audience, praise, style, attitudes, and look. All relevant movie information is included in these tags, making it possible to describe a film accurately.

Approach

The hybrid approach combines a weighted similarity measure based on evolutionary algorithms with fuzzy K-means clustering to present an integrative way of building a movie recommendation system. The proposed movie recommendation system provides more accurate similarity measurements and higher-quality recommendations than the current movie recommendation system, although it requires more computing time. This issue can be addressed by using the clustered data points as the input dataset. The suggested solution aims to enhance the quality and scalability of the movie recommendation system. By combining Content-Based Filtering and Collaborative Filtering, we use a hybrid technique that allows the two approaches to complement one another. We employed the cosine similarity measure to compute the degree of similarity between the various movies in the dataset quickly and effectively, and to shorten the computation time of the movie recommender engine.
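The text above does not spell out how the content-based and collaborative scores are combined. One simple option, shown here purely as an assumption, is a weighted blend of the two scores. The function name hybrid_scores, the weight alpha, and all scores below are hypothetical.

```python
def hybrid_scores(content_scores, collab_scores, alpha=0.5):
    """Blend two {item: score} dicts; alpha is the weight on the collaborative score."""
    items = set(content_scores) | set(collab_scores)
    return {
        item: alpha * collab_scores.get(item, 0.0)
        + (1 - alpha) * content_scores.get(item, 0.0)
        for item in items
    }

# Hypothetical scores in [0, 1] for three videos.
content = {"video_a": 0.9, "video_b": 0.2, "video_c": 0.5}
collab = {"video_a": 0.3, "video_b": 0.8}  # video_c has no collaborative score

blended = hybrid_scores(content, collab, alpha=0.5)
ranking = sorted(blended, key=blended.get, reverse=True)
print(ranking)
```

An item that only one of the two methods can score, such as video_c here, still receives a blended score, which is one way the two approaches can complement each other.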

Collaborative Filtering

Collaborative filtering (CF) is a well-known recommendation algorithm that bases its forecasts and suggestions on the ratings or actions of other system users. The premise of this approach is that the opinions of other users can be selected and combined in a way that yields a reliable forecast of the active user’s choice. It relies on the intuitive assumption that if users have the same opinions on some items, they will likely have the same opinions on other items as well. There are other ways to provide recommendations, such as using metadata to locate items that are textually similar to those that a user has previously enjoyed (content-based filtering, or CBF). Although content-based filtering will occasionally appear in our discussion of a particular recommender system challenge, this article focuses on how to use collaborative filtering for video recommendation.

Collaborative filtering algorithms assess users’ behavior and preferences, and use commonalities with other users to anticipate what a user would like. Collaborative filtering systems come in two varieties: user-based recommenders and item-based recommenders. User-based filtering is used when we are creating personalized systems, where user preferences are taken into account. Users first rate some movies (1–5) before the procedure begins. Ratings can be explicit or implicit. When a person expressly rates an item on a scale or gives it a thumbs-up or thumbs-down, the rating is known as an explicit rating. Explicit ratings are often difficult to acquire because not all users are keen on leaving feedback. In such cases, we collect implicit ratings based on the users’ actions.

For instance, a user’s repeated purchases of a product show a favorable preference. For movie systems, we can infer that a user has some liking for a film if they watch the entire thing. Keep in mind that there are no precise guidelines for establishing implicit ratings. Next, we identify a specific number of nearest neighbors for each user. Using the Pearson correlation, we determine how strongly two users’ ratings are correlated. When we recommend items to users, we assume that if two users’ ratings are strongly correlated, they must have similar tastes.
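The nearest-neighbor step described above can be sketched with a plain Pearson correlation over the items two users have both rated. This is an illustrative example only: the users, the 1–5 ratings, and the helper names pearson and nearest_neighbors are made up and are not taken from the project dataset.

```python
import math

def pearson(ratings_a, ratings_b):
    """Pearson correlation over the items both users have rated."""
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return 0.0  # not enough overlap to measure a correlation
    xs = [ratings_a[item] for item in common]
    ys = [ratings_b[item] for item in common]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # one user gave every common item the same rating
    return cov / math.sqrt(var_x * var_y)

# Hypothetical explicit ratings on a 1-5 scale.
ratings = {
    "alice": {"v1": 5, "v2": 4, "v3": 1},
    "bob": {"v1": 4, "v2": 5, "v3": 2},
    "carol": {"v1": 1, "v2": 2, "v3": 5},
}

def nearest_neighbors(user, k=1):
    """Return the k users whose ratings correlate most strongly with `user`."""
    scored = [(pearson(ratings[user], ratings[other]), other)
              for other in ratings if other != user]
    return [name for score, name in sorted(scored, reverse=True)[:k]]

print(nearest_neighbors("alice"))
```

Here alice and bob rate the same videos highly, so bob is alice's nearest neighbor, while carol's opposite tastes give a negative correlation.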

Item-based filtering: unlike user-based filtering, item-based filtering concentrates on the similarities between the items that users prefer rather than on the users themselves.
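As a rough illustration of item-based filtering, the sketch below represents each video by the set of users who liked it and ranks the other videos by cosine similarity. The like data and the helper names item_vectors, cosine, and similar_items are hypothetical, and this is a simplified stand-in rather than the library implementation used later in the project.

```python
import math

# Hypothetical like data: likes[user][video] = 1 if the user liked the video.
likes = {
    "u1": {"cats": 1, "dogs": 1},
    "u2": {"cats": 1, "dogs": 1, "python": 1},
    "u3": {"python": 1, "pandas": 1},
}

def item_vectors(likes):
    """Invert user -> videos into video -> {user: value}."""
    items = {}
    for user, liked in likes.items():
        for item, value in liked.items():
            items.setdefault(item, {})[user] = value
    return items

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(user, 0) for user, value in a.items())
    sq_a = sum(v * v for v in a.values())
    sq_b = sum(v * v for v in b.values())
    if sq_a == 0 or sq_b == 0:
        return 0.0
    return dot / math.sqrt(sq_a * sq_b)

def similar_items(target, likes, k=2):
    """Return the k videos most similar to `target`, best first."""
    vectors = item_vectors(likes)
    scored = [(cosine(vectors[target], vec), item)
              for item, vec in vectors.items() if item != target]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # by score, then by name
    return [(item, score) for score, item in scored[:k]]

print(similar_items("cats", likes))
```

Videos liked by the same users end up close together, so a user who liked one of them can be recommended the other without comparing users directly.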

Project Prerequisites

This project requires Python 3.6 installed on your computer. I used Jupyter Notebook for this project, but you can use any environment you prefer.
The required modules for this project are:

Graphlab(2.1.0) – pip install graphlab
Pandas(1.5.0) – pip install pandas
NLTK – pip install nltk
scikit-learn – pip install scikit-learn

That’s all we need for our project.

YouTube Video Recommendation Project

We provide the dataset and source code for the YouTube video recommendation project. There are two CSV files that contain the names of the videos along with parameters such as the number of likes, views, and dislikes on each video. Please download the YouTube video recommendation project from the following link: YouTube Video Recommendation Project

Steps to Implement

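1. Import the modules. As a quick check, we also print the WordNet lemma names for the word 'funny'.

from nltk.corpus import wordnet as wn  # needs the WordNet corpus: nltk.download('wordnet')
import pandas as pd
import graphlab

print("start")
# Collect the lemma names of every WordNet synset for the word 'funny'
listnames = []
for synset in wn.synsets('funny'):
    listnames.append(synset.lemma_names())
print(listnames)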

2. Here, we are reading our dataset and dropping the rows with duplicate video titles.

dataframe = pd.read_csv('dataset.csv', encoding='utf-8')
dataframe = dataframe.drop_duplicates(['v_title'])
dataframe.head()

3. Here, we are reading the second dataset file. Then we use a shuffle function to put its rows in random order.

import pandas as pd
import graphlab  # Need to register for this library, but it is free for university students
from sklearn.utils import shuffle

newdf = pd.read_csv('users5times100Videos.csv', encoding='utf-8')
newdf = shuffle(newdf)  # Shuffling the dataset in order to randomize the data
newdf

4. We shuffled the data so that we get a random training and testing dataset on every run. Now we divide the shuffled dataset into training and testing datasets and convert both into GraphLab SFrames, the format the recommender models expect.

train_set = newdf.iloc[:250, :]  # first 250 shuffled rows form the training dataset
test_set = newdf.iloc[250:, :]  # remaining rows form the testing dataset
data_train = graphlab.SFrame(train_set)  # training dataset as an SFrame
data_test = graphlab.SFrame(test_set)  # testing dataset as an SFrame

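5. Here we create a popularity recommender model and pass it the training data and column names. Then we create a list of usernames and request ten recommendations for each of them.

# Train the popularity model (assumes the same 'users', 'v_title' and 'Liked' columns used in step 8)
popularity_model = graphlab.popularity_recommender.create(data_train, user_id='users', item_id='v_title', target='Liked')
user_names = ['Kathir','MSD','MarkZ','Chris','Sundhar','Patrick']
popularity_recomm = popularity_model.recommend(users=user_names, k=10)
popularity_recomm.print_rows(num_rows=25)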

6. Here we use the popularity model to recommend videos to our users, and we print the recommendations.

famous_recom = popularity_model.recommend(users=user_names, k=10)
famous_recom.print_rows(num_rows=25)

7. Here we group the training data by video title and display up to the first 20 rows of each group.

train_set.groupby(by='v_title').head(20)  # up to the first 20 rows for each title

8. Here we create the YouTube video recommendation model and train it. We pass the training dataset to the function and use cosine similarity for collaborative filtering. Pearson similarity and Jaccard similarity are also available, so you can try the other similarity types and see how the model performs with each of them.

# Create an item-similarity recommender on the training SFrame
recom_model = graphlab.item_similarity_recommender.create(data_train, user_id='users', item_id='v_title', target='Liked', similarity_type='cosine')

recom = recom_model.recommend(users=user_names, k=10)  # making the recommendations
recom.print_rows(num_rows=25)  # printing the rows

# Compare the popularity model and the item-similarity model on the testing data
model = graphlab.compare(data_test, [popularity_model, recom_model])
graphlab.show_comparison(model, [popularity_model, recom_model])
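Step 8 mentions Jaccard similarity as an alternative to cosine similarity. As a small sketch of what that measure computes, the example below compares videos by the sets of users who liked them. The data and the function name jaccard are made up for illustration and say nothing about how the library computes it internally.

```python
def jaccard(users_a, users_b):
    """Jaccard similarity: users who liked both items divided by users who liked either."""
    a, b = set(users_a), set(users_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)

# Hypothetical sets of users who liked each video.
liked_by = {
    "cats": {"u1", "u2", "u3"},
    "dogs": {"u2", "u3", "u4"},
    "python": {"u5"},
}

cats_dogs = jaccard(liked_by["cats"], liked_by["dogs"])
cats_python = jaccard(liked_by["cats"], liked_by["python"])
print(cats_dogs, cats_python)
```

Jaccard similarity ignores rating values and only looks at who interacted with an item, which can suit like/dislike data better than a 1–5 rating scale.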

Summary

In this Machine Learning project, we built a YouTube recommendation system. It is similar to a movie recommendation system, and we used Collaborative Filtering to build it.