The Good Tech Companies - Stop Scrolling, Start Building: Create Your Own AI Movie Recommender
Episode Date: March 6, 2025. This story was originally published on HackerNoon at: https://hackernoon.com/stop-scrolling-start-building-create-your-own-ai-movie-recommender. In this tutorial, we'll guide you through the process of creating a movie recommendation system using vector databases. Check more stories related to machine-learning at: https://hackernoon.com/c/machine-learning. You can also check exclusive content about #vector-search, #movies, #semantic-search, #recommendation-systems, #ai, #ai-and-ml, #movie-recommneder-tutorial, #good-company, and more. This story was written by: @superlinked. Learn more about this writer by checking @superlinked's about page, and for more stories, please visit hackernoon.com. Learn how to build a custom, AI-driven recommendation system that predicts your next favorite movie with precision. In this tutorial, we'll guide you through the process of creating a movie recommendation system using vector databases. You'll learn how modern AI recommendation engines work and get hands-on experience building your own system with Superlinked.
Transcript
This audio is presented by Hacker Noon, where anyone can learn anything about any technology.
Stop scrolling, start building, create your own AI movie recommender.
By Superlinked. "I said I wanted a B-movie, damn it." An end to endless scrolling, and arguments over what to watch.
Tired of endlessly scrolling through Netflix, unsure of what to watch next?
What if you could build your own custom, AI-driven recommendation system
that predicts your next favorite movie with precision?
In this tutorial, we'll guide you through the process of creating a movie recommendation system using vector databases (vector DBs). You'll learn how modern AI recommendation engines work and get hands-on experience building your own system with Superlinked. Want to jump straight to the code? Check out our repo on GitHub here.
Ready to try recommender systems for your own use case?
Get a demo here. Let's get recommending. We'll be following this notebook throughout the article.
You can also run the code straight from your browser using Colab.
Netflix's recommendation algorithm does a pretty good job of suggesting relevant content, given the sheer volume of options, approximately 16k movies and TV programs in 2023, and how
quickly it has to propose shows to users.
How does Netflix do it?
In a word, semantic search.
Semantic search comprehends the meaning and context, both attributes and consumption patterns, behind user queries and movie and TV show descriptions, and can therefore provide better personalization in its queries and recommendations than traditional keyword-based approaches.
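To make the contrast concrete, here is a minimal, illustrative sketch of semantic matching using the open-source sentence-transformers package. The model name and example texts are our own choices for demonstration, not necessarily what Netflix or Superlinked use.

```python
from sentence_transformers import SentenceTransformer, util

# Illustrative model choice; any sentence-embedding model works for this demo.
model = SentenceTransformer("all-MiniLM-L6-v2")

descriptions = [
    "A cynical ad executive rediscovers love during a snowed-in holiday at a country inn.",
    "A team of astronauts battles a hostile alien organism on a deep-space mining vessel.",
]
query = "feel-good holiday romance"

# Embed the query and the descriptions, then rank by cosine similarity.
doc_embeddings = model.encode(descriptions, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)
scores = util.cos_sim(query_embedding, doc_embeddings)[0]

# The first description scores higher even though it shares no keywords with the query.
for description, score in zip(descriptions, scores):
    print(f"{float(score):.3f}  {description}")
```

That is the core idea semantic search builds on: closeness in meaning, not overlap in keywords.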
But semantic search poses certain challenges, foremost among them: 1. ensuring accurate search results, 2. interpretability, and 3. scalability. These are challenges any successful content recommendation strategy will have to address.
Using Superlinked's library, you can overcome these difficulties.
In this article, we'll show you how to use the Superlinked library to set up your own semantic search and generate a list of relevant movies based on your preferences.
Semantic search challenges
Semantic Search conveys a lot of value in vector search but poses three significant
vector embedding challenges for developers.
Quality and relevance.
Ensuring that your embeddings accurately capture the semantic meaning of your data requires
careful selection of embedding techniques, training data, and hyperparameters.
Poor quality embeddings can lead to inaccurate search results and irrelevant recommendations.
Interpretability. High-dimensional vector spaces are too complicated to be easily understood.
To gain insights into the relationships and similarities encoded within them,
data scientists have to develop methods to visualize and analyze them.
Scalability. Managing and processing high-dimensional embeddings, especially in large datasets, strains computational resources and increases latency.
Efficient methods for indexing, retrieval, and similarity computation are essential to ensure
scalability and real-time performance in production environments.
The Superlinked library enables you to address these challenges.
Below, we'll build a content recommender, specifically for movies: starting with the information we have about a given movie, we embed this information as a multimodal vector, build out a searchable vector index for all our movies, and then use query weights to tweak our results and arrive at good movie recommendations.
Let's get into it.
Creating a fast and reliable search experience with Superlinked
Below, you'll perform a semantic search on the Netflix movie
dataset using the following elements of the Superlinked library. Recency space, to understand
the freshness, currency and relevancy of your data, identifying newer movies. Text similarity
space, to interpret the various pieces of metadata you have about the movie, such as description,
title, and genre. Query time weights, letting you choose what's most important in your data when you run the
query, thereby optimizing without needing to re-embed the whole dataset, do post-processing,
or employ a custom re-ranking model, i.e. reducing latency, the Netflix dataset, and
what we'll do with IT success fully recommending movies is difficult mostly because there are so many options, greater than 9000 titles in 2023, and users want recommendations on demand,
immediately.
Let's take a data-driven approach to find something we want to watch.
In our dataset of movies, we know the description, genre, title, and release_year. We can embed these inputs and put together a vector index on top of our embeddings, creating a space we can search semantically.
Once we have our indexed vector space, we will, first, browse the movies, filtered by an idea: heartfelt romantic comedy. Next, tweak the results, giving more importance to matches in certain input fields, i.e. weighting. Then, search in description, genre, and title with different search terms for each. And, after finding a movie that's a close but not exact match, also search around using that movie as a reference.
Installation and dataset preparation
Your first step is to install the library and import the requisite classes.
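As a rough sketch, the install and imports might look like the following; the exact import path should match whatever the notebook uses, so treat it as an assumption here.

```python
# Install the library (run once, e.g. in a notebook cell).
# %pip install superlinked pandas

import pandas as pd

# Import path as used in Superlinked's public examples; verify against the notebook.
from superlinked import framework as sl
```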
Note. Below, change the setting to Colab if you're running this in Google Colab; keep MIME type if you're executing in GitHub. We also need to prep the dataset: define time constants, set the URL location of the data, create a data store dictionary, read the CSV into a pandas DataFrame, clean the DataFrame and data so it can be searched properly, and do a quick verification and overview. See cells 3 and 4 for details.
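Cells 3 and 4 do this in the notebook; a simplified pandas sketch of the same steps might look like this. The URL, column names, and cleaning rules here are placeholders, not the notebook's exact values.

```python
import pandas as pd

# Placeholder URL for the Netflix titles CSV; substitute the location used in the notebook.
DATA_URL = "https://example.com/netflix_titles.csv"

# Read the CSV and keep only the fields we plan to embed (column names are assumptions).
df = pd.read_csv(DATA_URL)
df = df[["id", "title", "description", "genres", "release_year"]].dropna()

# Clean up types, and derive a Unix timestamp from the release year so a recency
# space can use it later.
df["release_year"] = df["release_year"].astype(int)
df["release_timestamp"] = (
    pd.to_datetime(df["release_year"].astype(str), format="%Y").astype("int64") // 10**9
)

# Quick verification and overview.
print(df.shape)
print(df.head())
```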
Now that the dataset is prepared, you can optimize your retrieval using the Superlinked library.
Building out the index for vector search
Superlinked's library contains a set of core building blocks
that we use to construct an index and manage retrieval.
You can read about these building blocks in more detail here.
First, you need to define your schema to tell the system about your data.
Next, you use spaces to say how you want to treat each part of the data when embedding.
Which spaces are used depends on your data type.
Each space is optimized to embed the data so as to return the highest possible quality of retrieval results.
In space definitions, we describe how the inputs should be embedded in order to reflect the semantic relationships in our data. Once you've set up your spaces
and created your index, you use the source and executor parts of the library to set up
your queries. See cells 10 to 13 in the notebook.
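Cells 10 to 13 cover this in the notebook; below is a condensed sketch of those building blocks. Class names, field names, model choices, and method signatures follow Superlinked's published examples, so treat anything that differs from the notebook as an assumption.

```python
from superlinked import framework as sl

# Schema: tell the system what a movie record looks like
# (field names mirror the prepared DataFrame columns above).
class MovieSchema(sl.Schema):
    id: sl.IdField
    title: sl.String
    description: sl.String
    genres: sl.String
    release_timestamp: sl.Timestamp

movie = MovieSchema()

# Spaces: describe how each text field should be embedded.
# The model name is an illustrative choice, not necessarily the notebook's.
description_space = sl.TextSimilaritySpace(
    text=movie.description, model="sentence-transformers/all-mpnet-base-v2"
)
title_space = sl.TextSimilaritySpace(
    text=movie.title, model="sentence-transformers/all-mpnet-base-v2"
)
genre_space = sl.TextSimilaritySpace(
    text=movie.genres, model="sentence-transformers/all-mpnet-base-v2"
)

# Index: the searchable structure built on top of the spaces
# (the recency space from the next section gets added to it as well).
movie_index = sl.Index([description_space, title_space, genre_space])

# Source and executor: load the prepared data and run everything in memory.
source = sl.InMemorySource(movie)
executor = sl.InMemoryExecutor(sources=[source], indices=[movie_index])
app = executor.run()
source.put(df.to_dict("records"))  # df is the cleaned DataFrame from the preparation sketch
```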
Now that the queries are prepared, let's move on to running queries and optimizing retrieval by adjusting weights.
Understanding recency, and how to use it in Superlinked
Recency space lets you alter the results of your query by preferentially pulling in older or newer releases from your dataset. We use 4, 10, and 40 years as our period times so that we can give years with more titles more focus.
See cell 5.
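A sketch of the recency space from cell 5 might look roughly like this; the PeriodTime-style configuration follows Superlinked's examples and is an assumption here.

```python
from datetime import timedelta
from superlinked import framework as sl

# Recency scores step down at each period boundary: 4, 10, and 40 years.
recency_space = sl.RecencySpace(
    timestamp=movie.release_timestamp,
    period_time_list=[
        sl.PeriodTime(timedelta(days=4 * 365)),
        sl.PeriodTime(timedelta(days=10 * 365)),
        sl.PeriodTime(timedelta(days=40 * 365)),
    ],
)

# Rebuild the index so recency is searchable alongside the text spaces.
# Depending on the library version, the executor may also need a "now" value
# in its context for recency scoring.
movie_index = sl.Index([description_space, title_space, genre_space, recency_space])
```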
Notice the breaks in the score at 4, 10, and 40 years. Titles older than 40 years get a score of 0.
Reviewing and optimizing search results using different query time weights
Let's define a quick util function to present our results in the notebook.
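The notebook defines its own helper; a hypothetical stand-in that simply tidies a results DataFrame for display could look like this.

```python
import pandas as pd

def present_results(results_df: pd.DataFrame, max_chars: int = 80) -> pd.DataFrame:
    """Show the columns we care about and truncate long descriptions for readability."""
    display_df = results_df[["title", "genres", "release_year", "description"]].copy()
    display_df["description"] = display_df["description"].str.slice(0, max_chars) + "..."
    return display_df
```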
Simple and advanced queries
The Superlinked library lets you perform various kinds of queries; here we define two. Both of our query types, simple and advanced, let me weight individual spaces, description, title, genre, and of course recency, according to my preferences.
The difference between them is that with a simple query, I set one query
text and then surface similar results in the description, title, and genre spaces.
With an advanced query, I have more fine-grained control.
If I want, I can enter different query texts in each of the description, title, and genre
spaces.
Here's the query code.
Simple query
In simple queries, I set my query text and apply different weights depending on their importance to me.
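A sketch of what the simple query and a first run might look like; the weight and parameter names are our own illustrative choices rather than the notebook's exact ones.

```python
# One shared query text is matched against description, title, and genre,
# with per-space weights supplied at query time.
simple_query = (
    sl.Query(
        movie_index,
        weights={
            description_space: sl.Param("description_weight"),
            title_space: sl.Param("title_weight"),
            genre_space: sl.Param("genre_weight"),
            recency_space: sl.Param("recency_weight"),
        },
    )
    .find(movie)
    .similar(description_space, sl.Param("query_text"))
    .similar(title_space, sl.Param("query_text"))
    .similar(genre_space, sl.Param("query_text"))
    .limit(10)
)

result = app.query(
    simple_query,
    query_text="heartfelt romantic comedy",
    description_weight=1,
    title_weight=1,
    genre_weight=1,
    recency_weight=1,
)
```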
Our results contain some titles I've already seen.
I can deal with this by weighting recency to bias my results towards recent titles. Weights are normalized to have a unit sum, i.e. all weights
are adjusted so they always sum up to a total of 1, so you don't have to worry about how you set them.
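For illustration, biasing towards recent titles is then just a matter of passing a larger recency weight at query time (again, parameter names are assumptions).

```python
# Same query, but recency now counts more relative to the text spaces;
# the weights are normalized to sum to 1 behind the scenes.
recent_result = app.query(
    simple_query,
    query_text="heartfelt romantic comedy",
    description_weight=1,
    title_weight=1,
    genre_weight=1,
    recency_weight=3,
)
```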
My results are now all post-2021. Using the simple query, I can weight any specific space, description, title, genre, or recency, to make it count more when returning results. Let's experiment with this. Below, we'll give more weight to the genre and down-weight the title. My query text is basically just a genre with some
additional context. I keep my recency as is because I'd still like my results to be biased
towards recent movies. This query pushes the release year back a little to give me more genre-weighted results,
below.
Advanced query
The advanced query gives me even more fine-grained control. I retain control over recency, but can also specify search text for description, title, and genre, and assign each a specific weight according to my preferences. See below, and cells 19 to 21.
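A sketch of an advanced query with a separate search text per space; the parameter names mirror the description, title, and genre query texts mentioned here, but remain assumptions about the notebook.

```python
advanced_query = (
    sl.Query(
        movie_index,
        weights={
            description_space: sl.Param("description_weight"),
            title_space: sl.Param("title_weight"),
            genre_space: sl.Param("genre_weight"),
            recency_space: sl.Param("recency_weight"),
        },
    )
    .find(movie)
    .similar(description_space, sl.Param("description_query_text"))
    .similar(title_space, sl.Param("title_query_text"))
    .similar(genre_space, sl.Param("genre_query_text"))
    .limit(10)
)

result = app.query(
    advanced_query,
    description_query_text="singers put on a show at a cozy inn",
    title_query_text="",
    genre_query_text="romance, comedy",
    description_weight=1,
    title_weight=0,
    genre_weight=1,
    recency_weight=1,
)
```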
Search using a specific movie
Say in my last movie results, I found a movie I've already seen and would like to see something similar. Let's assume I like White Christmas, a 1954 romantic comedy (id tm16479), about singer-dancers coming together for a stage show to draw guests to a struggling Vermont inn.
By adding an extra clause, with an additional parameter, to advanced_query, with_movie_query lets me search using this movie, or any movie I like, and gives me all the fine-grained control of separate subsearch query text and weighting. First, we add our movie_id parameter, and then I can set my other subsearch queries either to empty or whatever's most relevant, along with any weightings that make sense.
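A hedged sketch of that movie-based query: the with_vector-style clause and the movie_id parameter follow Superlinked's public examples for querying by an existing item, so verify the exact call against the notebook.

```python
# Extends the advanced query so results are also pulled towards an existing item's vector.
with_movie_query = advanced_query.with_vector(movie, sl.Param("movie_id"))

result = app.query(
    with_movie_query,
    movie_id="tm16479",          # the reference movie's id in the dataset used here
    description_query_text="",   # subsearch texts can stay empty...
    title_query_text="",
    genre_query_text="",         # ...or carry whatever is most relevant
    description_weight=1,
    title_weight=0,
    genre_weight=1,
    recency_weight=1,
)
```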
Let's say my first query returns results that reflect the stage-performance and band aspect of White Christmas, see cell 24, but I want to watch a movie that's more family-oriented.
I can enter a description_query_text to skew my results in the desired direction.
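For example, the family-oriented nudge could be as small as changing that one parameter (still assuming the parameter names sketched above).

```python
family_result = app.query(
    with_movie_query,
    movie_id="tm16479",
    description_query_text="heartwarming, family friendly",
    title_query_text="",
    genre_query_text="",
    description_weight=2,  # let the description text pull harder than the reference movie alone
    title_weight=0,
    genre_weight=1,
    recency_weight=1,
)
```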
But now that I see my results, I realize I'm actually more in the mood for something lighthearted
and funny.
Let's adjust my query accordingly. Okay, those results are better.
I'll pick one of these.
Put the popcorn on.
Conclusion
Superlinked makes it easy to test, iterate, and improve your retrieval quality.
Above, we've walked you through how to use the Superlinked library to do a semantic search
on a vector space, the way Netflix does, and return accurate, relevant movie results.
We've also seen how to fine-tune our results, tweaking weights and search terms until we
get to just the right outcome.
Now, try out the notebook yourself, and see what you can achieve.
Try it yourself: get the code and demo
Grab the code: check out the full implementation in our GitHub repo here. Fork it, tweak it, and make it your own.
See it in action: want to see this working in a real-world setup? Book a quick demo, and explore how Superlinked can supercharge your recommendations.
Get a demo now! Recommendation engines are shaping the way we discover content.
Whether it is movies, music, or products, Vector Search is the future, and now you have the tools
to build your own. Author: Mór Kapronczay. Thank you for listening to this Hacker Noon story,
read by Artificial Intelligence. Visit HackerNoon.com to read, write, learn and publish.