Facebook AI Effort Looks to Extend Search Engines
Facebook is rolling out a "deep-learning based text understanding engine" that leverages neural network architectures and promises to challenge the Google search engine by bringing "near-human accuracy" to the process of sorting through the daily flood of social media posts on its web site.
Facebook's (NASDAQ: FB) AI initiative, called Deep Text and announced in a blog post last week, could also advance the state of the art in organizing the unstructured text data generated by the leading social media site. The approach also seeks to expand on current natural language processing (NLP) techniques, which are frequently tripped up by slang.
Among the early goals of the search effort is ferreting out relevant information from social media posts that would link sellers with buyers.
The company said Deep Text uses convolutional and recurrent neural network approaches to perform word- and character-level learning. The scheme also employs a mathematical concept that preserves the semantic relationship among words. Facebook uses Torch to train the neural networks, along with its "AI backbone," called FBLearner Flow. Trained models are served via its FBLearner Predictor platform.
The social media giant noted last month, in rolling out FBLearner, that its machine learning models are currently used for ranking and personalizing news feed stories, filtering out offensive content, highlighting trending topics and ranking search results. The company said FBLearner Flow is used by more than 25 percent of its engineering team, and more than 1 million models have been trained so far. The prediction service is now capable of more than 6 million predictions per second, Facebook claims.
"Text understanding on Facebook requires solving tricky scaling and language challenges where traditional NLP techniques are not effective," Facebook researchers noted. "Using deep learning, we are able to understand text better across multiple languages and use labeled data much more efficiently than traditional NLP techniques."
Alan Packer, director of engineering for Facebook's Language Technology team, heads its AI efforts that include machine translation, speech recognition and natural language understanding. Packer told the Structured Data conference in March that Facebook's language capability covering more than 40 languages was among its early AI-based services.
Part of the Deep Text effort involves building its capability to understand more languages by moving beyond traditional natural language processing. "Using deep learning, we can reduce the reliance on language-dependent knowledge, as the system can learn from text with no or little preprocessing," the company said. "This helps us span multiple languages quickly, with minimal engineering effort."
Traditional NLP converts words into formats that computer algorithms can learn from. As a result, each word must appear with its exact spelling in the training data in order to be understood. That makes handling slang and "word-sense disambiguation" problematic.
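The exact-spelling limitation can be illustrated with a minimal sketch. It assumes a toy vocabulary that maps each known word to an integer ID; the words, the IDs and the `encode` helper are invented for illustration and are not taken from Facebook's system.

```python
# Toy illustration: a vocabulary that maps exact spellings to integer IDs.
# The words and IDs below are made up for this example.
VOCAB = {"brother": 0, "sister": 1, "selling": 2, "bike": 3}
UNKNOWN_ID = -1  # stand-in for any word that was not seen in training


def encode(text):
    """Map each whitespace-separated token to its ID, or UNKNOWN_ID."""
    return [VOCAB.get(token, UNKNOWN_ID) for token in text.lower().split()]


formal = encode("my brother is selling a bike")
slang = encode("my bro is sellin a bike")

print(formal)
print(slang)
```

In this sketch "bro" and "sellin" fall back to the unknown ID even though a human reader sees them as near-synonyms of "brother" and "selling," so the slang sentence loses most of its signal.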
The Deep Text approach instead uses what Facebook calls "word embeddings," a mathematical concept that seeks to preserve the semantic relationship among words. "When calculated properly, we can see that the word embeddings of 'brother' and 'bro' are close in space. This type of representation allows us to capture the deeper semantic meaning of words," Facebook engineers noted.
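What "close in space" means can be sketched with cosine similarity, a common way to compare embedding vectors. The three-dimensional vectors below are hand-picked assumptions for illustration only; real embeddings are learned from data and have far more dimensions, and the article does not say which distance measure Facebook uses.

```python
import math

# Hypothetical 3-dimensional embeddings, hand-picked for illustration.
EMBEDDINGS = {
    "brother": [0.9, 0.1, 0.3],
    "bro": [0.85, 0.15, 0.35],
    "bicycle": [0.1, 0.9, 0.2],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


close = cosine_similarity(EMBEDDINGS["brother"], EMBEDDINGS["bro"])
far = cosine_similarity(EMBEDDINGS["brother"], EMBEDDINGS["bicycle"])

print(f"brother vs bro:     {close:.3f}")
print(f"brother vs bicycle: {far:.3f}")
```

With these toy vectors, "brother" and "bro" score close to 1, while "brother" and "bicycle" score much lower, which is the kind of relationship the engineers describe.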
Along with language translation, Deep Text is deployed in Facebook features such as Messenger. In terms of AI-based search functions, the company wants to leverage Deep Text for applications that would extract a Facebook user's intentions, including, for example, the desire to sell something.
One technical hurdle to developing AI-based search functions is the requirement for massive amounts of labeled data. "While such data sets are hard to produce manually, we are testing the ability to generate large data sets with semi-supervised labels using public Facebook pages," the company noted.
It also is exploring new deep neural network architectures. One promising approach, bidirectional recurrent neural networks, is designed to capture contextual dependencies between words. Error rates have in some cases been reduced to as low as 20 percent, outperforming current convolutional or recurrent neural nets, the company said.
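The general idea behind a bidirectional recurrent network can be sketched as follows. This is a deliberately tiny illustration, not Facebook's architecture: it assumes a single tanh unit per direction, scalar inputs standing in for word vectors, and fixed, untrained weights chosen arbitrarily.

```python
import math


def rnn_pass(inputs, w_in, w_rec):
    """Run a one-unit tanh RNN over a sequence; return the state at each step."""
    state = 0.0
    states = []
    for x in inputs:
        state = math.tanh(w_in * x + w_rec * state)
        states.append(state)
    return states


def bidirectional_rnn(inputs, fwd=(0.5, 0.8), bwd=(0.5, 0.8)):
    """Pair a left-to-right state with a right-to-left state at every position.

    fwd and bwd are (input weight, recurrent weight) tuples; the values are
    arbitrary placeholders rather than trained parameters.
    """
    forward = rnn_pass(inputs, *fwd)
    # Process the reversed sequence, then flip the states back into order.
    backward = rnn_pass(inputs[::-1], *bwd)[::-1]
    return list(zip(forward, backward))


sequence = [1.0, 0.0, 0.0, -1.0]  # stand-in for a four-word sentence
states = bidirectional_rnn(sequence)

for position, (f, b) in enumerate(states):
    print(position, round(f, 3), round(b, 3))
```

Each position ends up with one state that summarizes the words before it and one that summarizes the words after it, so a classifier reading both sees context from either side of a word.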