How were the qualifying answer sets formed?
The number of ratings (from 1 to 5) given in 2006 per movie and per user was pulled from the Netflix ratings database, restricted to ratings given by people in the Prize dataset to movies in the Prize dataset. The set of movies was split randomly into two sets, one per task, resulting in 6822 movies for the "Who Rated What in 2006" task and 8863 movies for the "How Many Ratings in 2006" task. For the "Who Rated What in 2006" task, a set of 100,000 (user_id, movie_id) pairs was generated by picking movie and user ids at random, restricted to the 6822 movie_ids in that task's set but drawn from all the users in the Netflix Prize dataset. The probability of picking any given movie was proportional to the number of ratings that movie received in 2006; the probability of picking any given user was proportional to the number of ratings that user gave in 2006. Pairs that corresponded to ratings in the existing Netflix Prize dataset were discarded. Each selected (user_id, movie_id) pair was then looked up in the Netflix ratings database to see whether the user rated that movie at any time during 2006.
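The sampling procedure above can be sketched as follows. This is a minimal illustration, not the actual Netflix code: the ids, rating counts, and the small `existing_pairs` set are made-up placeholder data, and the real process ran against the full ratings database.

```python
import random

# Hypothetical inputs (illustrative data, not the real Netflix database):
# counts of ratings given in 2006, per movie and per user.
movie_counts = {"m1": 50, "m2": 30, "m3": 20}   # movies in this task's split
user_counts = {"u1": 40, "u2": 35, "u3": 25}    # all users in the Prize dataset
existing_pairs = {("u1", "m1")}                  # pairs already in the Prize dataset

def sample_pairs(n, movie_counts, user_counts, existing_pairs, seed=0):
    """Draw n distinct (user_id, movie_id) pairs: each movie is picked with
    probability proportional to the ratings it received in 2006, each user
    with probability proportional to the ratings they gave in 2006, and any
    pair already rated in the Prize dataset is discarded."""
    rng = random.Random(seed)
    movies, movie_weights = zip(*movie_counts.items())
    users, user_weights = zip(*user_counts.items())
    chosen = set()
    while len(chosen) < n:
        pair = (rng.choices(users, user_weights)[0],
                rng.choices(movies, movie_weights)[0])
        if pair not in existing_pairs:
            chosen.add(pair)
    return chosen

pairs = sample_pairs(5, movie_counts, user_counts, existing_pairs)
```

In the actual task, n was 100,000 and each sampled pair was then looked up in the ratings database to produce the yes/no answer key.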
Are we allowed to use external sources of information about the movies?
Do I have to submit an algorithm description?
Only some top-ranked teams will be invited to submit workshop papers describing their algorithms.