Scalable Query N-Gram Embedding for Improving Matching and Relevance in Sponsored Search
Xiao Bai (Yahoo Research); Erik Ordentlich (Yahoo Research); Yuanyuan Zhang (Yahoo Research); Andy Feng (NVIDIA); Adwait Ratnaparkhi (Yahoo Research); Reena Somvanshi (Oath); Aldi Tjahjadi (Oath)
Sponsored search has been the major source of revenue for commercial web search engines. It is crucial for a sponsored search engine to retrieve ads that are relevant to user queries in order to attract clicks, as advertisers only pay when their ads get clicked. Retrieving relevant ads for a query typically involves first matching related ads to the query and then filtering out irrelevant ones. Both steps require understanding the semantic relationship between a query and an ad. In this work, we propose a novel embedding of queries and ads in sponsored search. The query embeddings are generated from constituent word n-gram embeddings that are trained to optimize an event-level word2vec objective over a large volume of search data. We show through a query rewriting task that the proposed query n-gram embedding model outperforms state-of-the-art word embedding models at capturing query semantics. This allows us to apply the proposed query n-gram embedding model to improve query-ad matching and relevance in sponsored search. First, we use the similarity between a query and an ad derived from the query n-gram embeddings as an additional feature in the query-ad relevance model used in Yahoo Search. We show through an online A/B test that using the new relevance model to filter irrelevant ads offline leads to a 0.47% increase in CTR and a 0.32% increase in revenue. Second, we propose a novel online query-to-ads matching system, built on an open-source big-data serving engine, using the learned query n-gram embeddings. An online A/B test shows that the new matching technique increases search revenue by 2.32%, as it significantly increases the ad coverage for tail queries.
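To make the idea of composing a query embedding from its word n-gram embeddings more concrete, the following is a minimal Python sketch, not the implementation used in this work. It assumes that the n-grams are unigrams and bigrams, that the query vector is the plain average of the n-gram vectors found in a lookup table, and that query-ad similarity is measured by cosine similarity; the abstract does not specify any of these choices. The function names (`word_ngrams`, `embed_query`, `cosine`) and the toy vectors are hypothetical.

```python
import math
from typing import Dict, List, Optional


def word_ngrams(tokens: List[str], max_n: int = 2) -> List[str]:
    """Return all word n-grams of length 1..max_n, shortest first."""
    grams = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def embed_query(query: str,
                table: Dict[str, List[float]],
                max_n: int = 2) -> Optional[List[float]]:
    """Average the vectors of the query's n-grams that exist in the table.

    Averaging is an assumption made for illustration only.
    Returns None when no n-gram of the query has a vector.
    """
    tokens = query.lower().split()
    vectors = [table[g] for g in word_ngrams(tokens, max_n) if g in table]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; defined as 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy, hand-written vectors (hypothetical values, not trained embeddings).
toy_table = {
    "cheap": [1.0, 0.0],
    "flights": [0.0, 1.0],
    "cheap flights": [1.0, 1.0],
}
toy_ad_vector = [1.0, 1.0]  # stand-in for an ad embedding

query_vector = embed_query("cheap flights", toy_table)
if query_vector is not None:
    similarity = cosine(query_vector, toy_ad_vector)
```

In a setting like the one described above, a score such as `similarity` could serve as one input feature to a relevance model, or as the ranking signal when retrieving candidate ads for a query. Queries whose n-grams are all missing from the table yield `None` here, so a caller would need a fallback for that case.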