Advanced Computing in the Age of AI | Thursday, June 8, 2023

Glean Taps into GPT-4 and Other LLMs for AI Enterprise Search Product 

OpenAI’s ChatGPT chatbot thrilled users when it was introduced last November, but the mood turned fast: top tech leaders are now calling for a pause in the development of powerful AI systems.

That hasn’t stopped CIOs from plotting the use of generative AI to bring smarts and automation to enterprise environments, especially with IT budgets tightening up in an inflationary environment.

Strong commercial interest is driving responsible AI development at corporations, which must meet compliance and regulatory requirements. There is no room for error, and no place for the moody behavior seen in ChatGPT and Microsoft's AI-powered Bing.

A new AI-based enterprise search product from startup Glean, launched on Tuesday, reflects the serious side of AI development and the trustworthy use of large language models such as GPT-4, which OpenAI introduced last month.

Glean’s search engine indexes documents across an enterprise and offers an AI-based interface where employees can ask questions and get responses in the form of answers or links to relevant corporate documents.
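To make "index documents, then answer with links" concrete, here is a minimal inverted-index sketch. It is not Glean's implementation: the document IDs, intranet URLs, and the simple count-the-matching-terms scoring are illustrative assumptions.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Toy inverted index: maps each term to the set of document IDs containing it."""

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)
        self.links: dict[str, str] = {}

    def add(self, doc_id: str, link: str, text: str) -> None:
        self.links[doc_id] = link
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query: str, k: int = 3) -> list[str]:
        # Score each document by how many distinct query terms it contains.
        scores: dict[str, int] = defaultdict(int)
        for term in set(tokenize(query)):
            for doc_id in self.postings.get(term, ()):
                scores[doc_id] += 1
        # Highest score first; ties broken by document ID for stable output.
        ranked = sorted(scores, key=lambda d: (-scores[d], d))
        return [self.links[d] for d in ranked[:k]]


# Hypothetical corporate documents.
index = InvertedIndex()
index.add("hr-1", "https://intranet.example.com/hr/pto-policy",
          "PTO policy: how to request vacation days")
index.add("eng-7", "https://intranet.example.com/eng/project-status",
          "Project Atlas status update and milestones")
print(index.search("how do I request vacation"))
# → ['https://intranet.example.com/hr/pto-policy']
```

A production engine would add relevance weighting, freshness, and the generative answer layer the article describes; this only shows the lookup step.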

It will do "the hard work of figuring out... where information lives in your company, what information is still meaningful, and things like that, and really bring information back to you in a safe manner," said Arvind Jain, CEO of Glean.

The interface looks much like Google's search page: people type HR or project-status questions into a search box. For employees needing help on specific topics, the tool can also point them to internal experts.
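The article doesn't say how Glean picks internal experts. One simple heuristic, sketched here purely as an assumption, is to rank people by how many of the documents matching a query they contributed to. The `contributors` field, the names, and `suggest_experts` itself are hypothetical.

```python
from collections import Counter


def suggest_experts(matching_docs: list[dict], k: int = 2) -> list[str]:
    # Hypothetical heuristic: people who contributed to many documents
    # matching the query are likely to know the topic.
    counts: Counter[str] = Counter()
    for doc in matching_docs:
        # Deduplicate per document while preserving order, so one person
        # editing the same document twice is counted once.
        counts.update(list(dict.fromkeys(doc["contributors"])))
    # Most contributions first; ties broken alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [person for person, _ in ranked[:k]]


docs = [
    {"title": "Atlas design doc", "contributors": ["priya", "sam", "priya"]},
    {"title": "Atlas launch checklist", "contributors": ["priya"]},
    {"title": "Atlas metrics", "contributors": ["lee", "sam"]},
]
print(suggest_experts(docs))  # → ['priya', 'sam']
```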


The generative part of Glean's search engine produces synthesized answers with context rather than simply returning passages pulled from documents. The large language models are being "tasked with actually synthesizing and summarizing a coherent answer," said Eddie Zhou, Glean's founding engineer.

Automation bots have been around for a while. For example, Zendesk uses bots that analyze search logs and website visitors' locations to email leads to salespeople. Similar functionality is possible in Glean's interface, with results surfacing as suggestions, and the technology can be integrated easily.

Glean is available as a web app or a Chrome extension, Zhou said.

But Why?

Enterprises need bespoke search that respects their privacy, said Naveen Rao, CEO of MosaicML, a company that provides cost metrics on deploying AI models.

"All of these enterprise search companies – Glean, Neeva – are doing that. The challenge with using GPT-4 is maintaining this privacy. I do not think enterprises will be so ok with this," Rao said.

Employees juggle multiple content and knowledge stores that lack good native search tools, so AI-based search would boost productivity, said Yousuf Khan, partner at Ridge Ventures, a venture capital firm.

"The problem that needs to be solved is being able to help organize and present information to employees in a company using a generative AI search solution. The bottom line is that 'true' enterprise search does not exist," said Khan, who previously was CIO at Automation Anywhere.

CIOs have tried various approaches to enterprise search in the past, with limited success.

"This will become a priority for CIOs, but only after solutions can provide a coherent message about why now is different – powerful computing, robust APIs, a growing set of data stores and most importantly, much-improved search algorithms available," Khan said.

How Glean Uses GPT-4 and LLMs

Glean's interface and technologies were inspired by Google, where both Jain and Zhou worked on core web search and AI technologies. Glean uses multiple language models in its search product, including OpenAI's recent GPT-4. For years, the company has used transformer models, including Google's BERT.
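The article doesn't detail how Glean applies BERT-style models. One common way transformer encoders are used in search is dense retrieval: queries and documents are mapped to embedding vectors and ranked by cosine similarity. The sketch below assumes the embeddings already exist; the toy 3-dimensional vectors and document names are made up for illustration (real BERT embeddings have hundreds of dimensions).

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot product divided by the product of the vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_embedding(query_vec: list[float],
                      doc_vecs: dict[str, list[float]]) -> list[tuple[str, float]]:
    # Score every document against the query and sort best-first.
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for transformer (e.g., BERT) embeddings.
docs = {
    "vacation-policy": [0.9, 0.1, 0.0],
    "expense-report-guide": [0.2, 0.8, 0.1],
    "oncall-runbook": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # e.g., an embedding of "how much time off do I get?"
for doc_id, score in rank_by_embedding(query, docs):
    print(f"{doc_id}: {score:.3f}")
```

Unlike keyword matching, this can rank "vacation-policy" highly for a query that never uses the word "vacation", because similarity is measured in embedding space.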

The company’s deep-learning models train on documents from multiple sources, such as Slack, Google Drive, Confluence, or data stored in AWS, Microsoft Azure, or Google Cloud. Glean has connectors to many popular SaaS applications.

The search product uses APIs to perform a deep crawl of all the content and activity in connected data sources, and centralizes that data in the customer’s cloud environment. Glean then runs its search models on top of it.
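The crawl-and-centralize step can be sketched as a set of connectors that each know how to pull documents from one source, feeding a single store. The `Connector` protocol, `Document` fields, and in-memory connectors below are hypothetical stand-ins, not Glean's connector API; note that each document keeps its access list so permissions can be enforced at query time.

```python
from dataclasses import dataclass, field
from typing import Iterable, Protocol


@dataclass
class Document:
    doc_id: str
    source: str
    text: str
    # Users allowed to see this document, copied from the source system.
    allowed_users: set[str] = field(default_factory=set)


class Connector(Protocol):
    # Hypothetical interface: each connector crawls one data source.
    name: str

    def crawl(self) -> Iterable[Document]: ...


class InMemoryConnector:
    """Stand-in for a real Slack, Drive, or Confluence connector (hypothetical)."""

    def __init__(self, name: str, docs: list[Document]) -> None:
        self.name = name
        self._docs = docs

    def crawl(self) -> Iterable[Document]:
        yield from self._docs


def centralize(connectors: Iterable[Connector]) -> dict[str, Document]:
    # Pull every document from every source into one store, keyed by a
    # source-qualified ID so IDs from different sources cannot collide.
    store: dict[str, Document] = {}
    for connector in connectors:
        for doc in connector.crawl():
            store[f"{connector.name}:{doc.doc_id}"] = doc
    return store


slack = InMemoryConnector("slack", [Document("msg-1", "slack", "Launch moved to Friday", {"alice"})])
drive = InMemoryConnector("drive", [Document("doc-1", "drive", "Q3 roadmap", {"alice", "bob"})])
store = centralize([slack, drive])
print(sorted(store))  # → ['drive:doc-1', 'slack:msg-1']
```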

"The search results or the recommendations – the results I get are only documents I actually have access to," Zhou said.

Customers can bring proprietary third-party data sets into the system to strengthen the learning model, while keeping those data sets walled off.

"They could simply index it as a custom data source with zero permissions to anyone so no one can ever find them, but it helps the model learn and that might be a use case," Zhou said.

This is relevant for banks, which are strengthening AI models with proprietary data sets on credit card usage trends or purchasing patterns from retailers. Such data sets can be introduced as metadata to strengthen the learning model.
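Zhou's two points (results include only documents the searcher can access, yet a zero-permission source can still help the model learn) can be illustrated together. A minimal sketch, assuming a simple term-statistics model: every document counts toward corpus-wide statistics, but retrieval enforces the permission list, so a source with an empty list is never returned. The `IndexedDoc` type, the user names, and the use of document frequencies as the "learning" signal are illustrative assumptions, not Glean's method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedDoc:
    text: str
    # Users allowed to retrieve this document; an empty set means nobody can.
    allowed_users: frozenset[str]


def document_frequencies(docs: list[IndexedDoc]) -> dict[str, int]:
    # Corpus statistics are computed over *all* indexed documents, including
    # zero-permission ones, so hidden data still shapes term weighting.
    df: dict[str, int] = {}
    for doc in docs:
        for term in set(doc.text.lower().split()):
            df[term] = df.get(term, 0) + 1
    return df


def retrievable(docs: list[IndexedDoc], user: str) -> list[IndexedDoc]:
    # Retrieval enforces permissions: only documents the user may access come back.
    return [d for d in docs if user in d.allowed_users]


corpus = [
    IndexedDoc("card spend rose in travel", frozenset({"analyst"})),
    IndexedDoc("card spend fell in retail", frozenset()),   # custom source, zero permissions
    IndexedDoc("card spend flat in grocery", frozenset()),  # custom source, zero permissions
]
df = document_frequencies(corpus)
print(df["card"])                           # → 3: hidden documents are counted
print(len(retrievable(corpus, "analyst")))  # → 1: but never returned
```

Filtering after retrieval is the simplest way to show the idea; a real system would more likely apply the permission check inside the index lookup so restricted documents never leave the retrieval layer.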

Glean can take the retrieved documents or passages and make a live API call to a large language model of the customer's choice, such as GPT-4 or Google's PaLM, asking it to summarize them or synthesize an answer. PaLM has up to 540 billion parameters.
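Glean's actual prompts aren't public. The retrieve-then-summarize pattern described here can be sketched as follows, assuming the model is reached through a plain prompt-in, text-out function. The `LLM` type alias, the prompt wording, and the stub model are hypothetical; a real deployment would wrap a vendor SDK call inside that function.

```python
from typing import Callable

# Hypothetical signature: any function that takes a prompt and returns text.
# In practice this would wrap a call to GPT-4, PaLM, or another provider.
LLM = Callable[[str], str]


def build_prompt(question: str, passages: list[str]) -> str:
    # Number the passages so the model can cite which ones it used.
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer the question using only the passages below. "
        "Cite passage numbers, and say so if the answer is not in them.\n\n"
        f"Passages:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


def answer(question: str, passages: list[str], llm: LLM) -> str:
    # The model is injected, so switching providers does not change this code.
    return llm(build_prompt(question, passages))


def stub_llm(prompt: str) -> str:
    # Stand-in for a real model call, for offline testing only.
    return "See passage [1]."


passages = ["PTO accrues monthly.", "Unused PTO can roll over."]
print(build_prompt("How does PTO accrue?", passages))
print(answer("How does PTO accrue?", passages, stub_llm))
```

Passing the model in as a function keeps the customer's choice of GPT-4 or PaLM a configuration detail rather than a code change, and instructing the model to stay within the supplied passages is one common way to reduce unsupported answers.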

"That's where we're actually using the publicly available large language models. That is powering ... this experience on the top here," Jain said.

GPT-4, which OpenAI announced last month, accepts text and images as input and responds with text. Glean is just getting started with GPT-4; Zhou said that while its multimodal capabilities are exciting, the company also wants to understand how customers get their work done.

"Right now, [customers] ask questions in a search bar, and how they interact with that might slowly change our source of knowledge," Zhou said, adding that the sources could be text, images, or video within enterprises.

"The source of knowledge that we connect to will always be multimodal. The kind of interface to the user might also vary, but we are just going to deliver an experience that kind of combines those two," Zhou said.

Quick Setup

Companies can set up Glean environments on internal systems or in hybrid cloud environments, and select what data goes into the system. They can also customize the look and feel of the environment.

Jain characterized Glean as a "category creation play," but other AI-based search products are already available. One such product is Amazon Kendra, which can understand questions and generate related answers. Kendra takes an API-based approach, connects to data sources, and is used for applications such as customer service.

Compared with custom enterprise search engines or Amazon Kendra, Glean aims to get customers up and running right off the bat.

"Yes, as a large enterprise, you will do some customization, but many large companies are able to actually use Glean without writing a single line of code," Jain said.

Large companies must also contend with privacy regulations such as the EU's GDPR, which can require data to reside in specific regions. The finance and healthcare sectors face additional regulatory and compliance requirements.

“We've been expanding to more customers, meeting more requirements, and regulations. It is a challenge we're definitely willing to meet,” Zhou said.