Brave Announces AI Search Engine – Shares Insights For SEO via @sejournal, @martinibuster

Brave announced a new privacy-focused AI search engine called Answer with AI, which works with Brave’s own search index of billions of websites. Brave’s current search engine already serves 10 billion search queries per year, which makes its AI-powered search engine one of the largest AI search engines online.

Many in the search marketing and ecommerce communities have expressed anxiety about the future of the web because of AI search engines. Brave’s AI search engine still shows links and, most importantly, does not by default answer commercial or transactional queries with AI, which should be good news for SEOs and online businesses. Brave says it values the web ecosystem and will be monitoring website visit patterns.

Search Engine Journal spoke with Josep M. Pujol, Chief of Search at Brave, who answered questions about the search index and how it works with AI. Most importantly, he shared what SEOs and business owners need to know in order to improve rankings.

Answer With AI Is Powered By Brave

Unlike other AI search solutions, Brave’s AI search engine is powered completely by its own search index of crawled and ranked websites. The entire underlying stack, from the search index to the Large Language Models (LLMs) and the Retrieval Augmented Generation (RAG) technology, is run in-house by Brave; the main LLMs are open models from Mistral AI that Brave deploys itself. This is especially good from a privacy standpoint, and it also makes the Brave search results unique, further distinguishing Brave from other me-too search engine alternatives.

Search Technology

The search engine itself is all done in-house. According to Josep M. Pujol, Chief of Search at Brave:

“We have query-time access to all our indexes, more than 20 billion pages, which means we are extracting arbitrary information in real-time (schemas, tables, snippets, descriptions, etc.). Also, we go very granular on what data to use, from whole paragraphs or texts on a page to single sentences or rows in a table.

Given that we have an entire search engine at our disposal, the focus is not on retrieval, but selection and ranking. Additionally, to pages in our index, we do have access to the same information used to rank, such as scores, popularity, etc. This is vital to help select which sources are more relevant.”

Retrieval Augmented Generation (RAG)

The search engine combines a search index with large language models, and Retrieval Augmented Generation (RAG) technology sits in between to keep the answers fresh and fact-based. I asked about RAG and Josep confirmed that’s how it works.

He answered:

“You are correct that our new feature is using RAG. As a matter of fact, we’ve already been using this technique on our previous Summarizer feature released in March 2023. However, in this new feature, we are expanding both the quantity and quality of the data used in the content of the prompt.”
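To make the idea concrete, here is a minimal sketch of the RAG pattern in Python. It is not Brave’s implementation: the `Passage` type, the word-overlap scoring, and the prompt wording are all assumptions chosen for illustration. It only shows the general shape, where passages are retrieved for a query and then placed in the prompt so the model answers from them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Passage:
    # Hypothetical stand-in for one piece of indexed content.
    url: str
    text: str


def overlap_score(query: str, passage: Passage) -> int:
    """Toy relevance: count distinct query words that also appear in the passage."""
    query_terms = set(query.lower().split())
    passage_terms = set(passage.text.lower().split())
    return len(query_terms & passage_terms)


def retrieve(query: str, index: List[Passage], k: int = 2) -> List[Passage]:
    """Return up to k passages with the highest non-zero overlap score."""
    scored = [(overlap_score(query, p), p) for p in index]
    scored = [(s, p) for s, p in scored if s > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:k]]


def build_prompt(query: str, passages: List[Passage]) -> str:
    """Place the retrieved passages in the prompt ahead of the question."""
    context = "\n".join(
        f"[{i + 1}] {p.text} (source: {p.url})" for i, p in enumerate(passages)
    )
    return (
        "Answer the question using only the numbered sources below.\n"
        f"{context}\n"
        f"Question: {query}\n"
    )


# Tiny made-up index; the URLs are placeholders.
index = [
    Passage("https://example.com/a", "Mixtral 8x7B is a sparse mixture of experts model"),
    Passage("https://example.com/b", "Burgers in San Francisco are popular"),
    Passage("https://example.com/c", "Mistral 7B is a small language model"),
]
query = "what is the mistral 7b model"
prompt = build_prompt(query, retrieve(query, index))
```

A production system would replace the word-overlap score with real ranking signals, but the flow (retrieve, select, then generate from the selected text) stays the same.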

Large Language Models Used

I asked about the language models in use in the new AI search engine and how they’re deployed.

“Models are deployed on AWS p4 instances with VLLM.

We use a combination of Mixtral 8x7B and Mistral 7B as the main LLM model.

However, we also run multiple custom trained transformer models for auxiliary tasks such as semantic matching and question answering. Those models are much smaller due to strict latency requirements (10-20 ms).

Those auxiliary tasks are crucial for our feature, since those are the ones that do the selection of data that will end up being on the final LLM prompt; this data can be query-depending snippets of text, schemas, tabular data, or internal structured data coming from our rich snippets. It is not a matter of being able to retrieve a lot of data, but to select the candidates to be added to the prompt context.

For instance, the query “presidents of france by party” processes 220KB of raw data, including 462 rows selected from 47 tables, 7 schemas. The prompt size is around 6500 tokens, and the final response is a mere 876 bytes.

In short, one could say that with “Answer with AI” we go from 20 billion pages to a few thousand tokens.”
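The selection step Josep describes, choosing which candidates make it into a limited prompt, can be sketched as a greedy pick under a token budget. This is an illustrative assumption, not Brave’s method: the `Candidate` type, the relevance scores, and the four-characters-per-token estimate are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    text: str     # a snippet, table row, or schema fragment
    score: float  # hypothetical relevance score from a ranking model


def estimate_tokens(text: str) -> int:
    """Crude estimate, assuming about 4 characters per token."""
    return max(1, len(text) // 4)


def select_for_prompt(candidates: List[Candidate], token_budget: int) -> List[Candidate]:
    """Greedily keep the highest-scoring candidates that still fit the budget."""
    chosen = []
    used = 0
    for cand in sorted(candidates, key=lambda c: c.score, reverse=True):
        cost = estimate_tokens(cand.text)
        if used + cost <= token_budget:
            chosen.append(cand)
            used += cost
    return chosen


# Made-up candidates: 10, 20 and 5 estimated tokens respectively.
candidates = [
    Candidate("a" * 40, 0.9),
    Candidate("b" * 80, 0.8),
    Candidate("c" * 20, 0.5),
]
selected = select_for_prompt(candidates, token_budget=16)
```

With a budget of 16 tokens, the sketch keeps the first and third candidates and skips the second, which scores well but does not fit. For a sense of scale in Josep’s example, taking 220KB as 220,000 bytes, the 876-byte answer is roughly 250 times smaller than the raw data it was drawn from.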

How AI Works With Local Search Results

I next asked how the new search engine will surface local search, and whether Josep could share some scenarios and example queries where the AI answer engine will surface local businesses. For example, if I query for “best burgers in San Francisco,” will the AI answer engine provide an answer for that and links to it? Will this be useful for people making business or vacation travel plans?

Josep answered:

“The Brave Search index has more than 1 billion location-based schemas, from which we can extract more than 100 million businesses and other points of interest.

Answer with AI is an umbrella term for Search + LLMs + multiple specialized machine learning models and services to retrieve, rank, clean, combine and represent information. We mention this because LLMs do not make all the decisions. As of now, we use them predominantly to synthesize unstructured and structured information, which happens in offline operations as well as in query-time ones.

Sometimes the end result feels very LLM-influenced (this is the case when we believe the answer to the user question is a single Point of Interest, e.g. “checkin faro cuisine”), and other times their work is more subtle (e.g. “best burgers sf”), generating a business description across different web references or consolidating a category for the business in a consistent taxonomy.”

Tips For Ranking Well

I next asked if using Schema.org structured data was useful for helping a site rank better in Brave and if he had any other tips for SEO and online businesses.

He answered:

“Definitely, we pay special attention to schema.org structured data when building the context of the LLM prompt. The best is to have structured data about their business (standard schemas from schema.org). The more comprehensive those schemas are, the more accurate the answer will be.

That said, our Answer with AI will be able to surface data about the business not in those schemas too, but it is always advisable to repeat information in different formats.

Some businesses only rely on aggregators (Yelp, Tripadvisor, Yellow Pages) for their business information. There are advantages to adding schemas to the business web site even if only for crawling bots.”
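For site owners who want to act on this advice, the sketch below builds a schema.org LocalBusiness description as JSON-LD using Python’s standard library. The business details are placeholders, and the helper names `local_business_jsonld` and `script_tag` are made up for this example. The property names (`name`, `url`, `telephone`, `address`, and the `PostalAddress` fields) are standard schema.org vocabulary, but which properties matter most to Brave is not something the interview specifies.

```python
import json


def local_business_jsonld(name, street, city, region, postal_code, telephone, url):
    """Build a schema.org LocalBusiness description as a JSON-LD string."""
    data = {
        "@context": "https://schema.org",
        "@type": "LocalBusiness",
        "name": name,
        "url": url,
        "telephone": telephone,
        "address": {
            "@type": "PostalAddress",
            "streetAddress": street,
            "addressLocality": city,
            "addressRegion": region,
            "postalCode": postal_code,
        },
    }
    return json.dumps(data, indent=2)


def script_tag(jsonld: str) -> str:
    """Wrap the JSON-LD in the script element used to embed it in a page."""
    return f'<script type="application/ld+json">\n{jsonld}\n</script>'


# Placeholder business details, for illustration only.
markup = script_tag(
    local_business_jsonld(
        name="Example Burger Bar",
        street="123 Example St",
        city="San Francisco",
        region="CA",
        postal_code="94100",
        telephone="+1-555-0100",
        url="https://www.example.com/",
    )
)
```

The resulting `markup` string can be placed in the HTML of the business’s own site, so crawlers find the information there rather than only on aggregators.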

Plans For AI Search In The Brave Browser

Brave shared that at some point in the near future they will integrate the new AI search functionality directly in the Brave Browser.

Josep explained:

“We plan to integrate the AI answer engine with Brave Leo (the AI assistant embedded in the Brave browser) very soon. Users will have the option to send the answer to Leo and continue the session there.”

Other Facts

Brave’s announcement also shared these facts about the new search engine:

“Brave Search’s generative answers are not just text. The deep integration between the index and model makes it possible for us to combine online, contextual, named entities enrichments (a process that adds more context to a person, place, or thing) as the answer is generated. This means that answers combine generative text with other media types, including informational cards and images.

The Brave Search answer engine can even combine data from the index and geo local results to provide rich information on points of interest. To date, the Brave Search index has more than 1 billion location-based schemas, from which we can extract more than 100 million businesses and other points of interest. These listings—larger than any public dataset—mean the answer engine can provide rich, instant results for points of interest all over the world.”

Read the official announcement:

Brave Unveils New Privacy-Focused AI Answer Engine, Set to Handle Nearly 10 Billion Annual Queries

Try out the new AI search at http://search.brave.com/
