SEO: Entity Search, Brand Mentions, Semantics & the Panda Patent

“Brand Authority” has been a hot topic in SEO for a long time. But how can we measure it in a meaningful way?

Every good search engine marketer already uses targeted, valuable and engaging content as a means to generate the signals that Google want to see. In other words, they are arguably thinking about growing the authority of the brand as a whole, with SEO as a by-product.

The idea of writing web content for the reader, not Google; the process of generating online buzz from social shares and high-quality, relevant back-links; the knowledge that editorially selective web directories are more valuable in SEO than inactive ‘link farms’… all of these philosophies and approaches have been born out of the understanding that good brand marketing is the best foundation for good search engine marketing.

But the precise mechanics within Google’s complex algorithm that can measure something as woolly and intangible as ‘brand presence’ have never fully revealed themselves to those in the SEO industry. We’ve had to roll with hunches and educated guesses. However, we have just been given a clear (albeit indirect) glimpse through a recently filed Google patent.

The Panda Patent?

Many people are calling it the Panda Patent because its focus appears to be the same as the infamous 2011 Panda update: to improve search engine results by better determining the quality (or conversely the ‘spamminess’) of websites. Google haven’t outright stated that the patent has anything to do with Panda. Instead, the patent itself is humbly titled ‘Ranking Search Results’.

Of course you have to understand that the best weapon Google has against search result spammers (i.e. those employing dodgy, black hat techniques to manipulate search results) is total secrecy. They can’t just give away the keys to their precious algorithm. Google have filed many patents and no doubt the Panda algorithm technology is split between several of them. As I’ve said though, this most recent patent really gets to the core of Panda’s end-goals. In Google’s own words, Panda was designed to:

“Reduce rankings for low-quality sites—sites which are low-value add [sic] for users, copy content from other websites or sites that are just not very useful. At the same time, it will provide better rankings for high-quality sites—sites with original content and information such as research, in-depth reports, thoughtful analysis and so on.”

— Google Blog, February 2011

The recent patent is wordy and complicated, written in much more confusing language. However there are several sections which talk directly about:

  • “a respective count of reference queries”
  • “a respective count of independent incoming links”
  • “a group-specific modification factor … based on the count of independent links and the count of reference queries”

In other words, this is a patent for scoring websites based on how often people search for them, the volume of inbound links from unique and unrelated sources, and the ratio between those two. Logically, a high volume of links to a site that hardly anyone searches for would suggest paid link-building. Conversely, lots of search interest with hardly any links would be an imbalance of the opposite kind. So then… this is basically describing the bare bones of Panda with a little bit of Penguin mixed in for good measure.
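To make the idea of comparing the two counts concrete, here is a minimal Python sketch. It is not the patent's formula: the plain ratio, the thresholds and every name in it (ResourceGroup, links_per_query, describe_balance) are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class ResourceGroup:
    """A group of related pages with the two counts the patent names."""
    name: str
    independent_links: int   # links from sources unrelated to the group
    reference_queries: int   # searches judged to refer to the group


def links_per_query(group: ResourceGroup) -> float:
    """Independent links per reference query (an assumed, simplified ratio)."""
    if group.reference_queries == 0:
        # Nothing to divide by: treat any links at all as an unbounded ratio.
        return float("inf") if group.independent_links else 0.0
    return group.independent_links / group.reference_queries


# Hypothetical thresholds; nothing public says what values Google uses.
LOW, HIGH = 0.1, 10.0


def describe_balance(group: ResourceGroup) -> str:
    """Label a group by how its links compare with its reference queries."""
    if group.independent_links == 0 and group.reference_queries == 0:
        return "no signals"
    ratio = links_per_query(group)
    if ratio > HIGH:
        return "links far outweigh reference queries"
    if ratio < LOW:
        return "reference queries far outweigh links"
    return "balanced"


if __name__ == "__main__":
    for group in (
        ResourceGroup("well-known brand", 400, 500),
        ResourceGroup("link-heavy site", 400, 5),
    ):
        print(group.name, "->", describe_balance(group))
```

The point of the sketch is only that neither count is judged in isolation; it is the relationship between them that stands out.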

What does this mean for you? The only way forward is to earn links by creating valuable content and marketing it in a relevant context. Nothing new in that regard. But that’s not all the patent hints at. Let’s take a closer look at two particular phrases they use: “independent incoming links” and “reference queries”…

Entity Search

The most eye-opening thing in the patent is that Google frames the whole thing around “a plurality of groups of resources”. In plain English, this means Google’s focus is on ranking sets of related web content rather than single websites or pages. At first glance, that seems quite innocuous. However, reading between the lines, they are telling us that different pieces of content from different websites are being grouped together in the algorithm according to the ‘entities’ that own and manage them.

It is important to understand that an entity can be a commercial brand, a person, an anchor for content release or a focal point for market attention… a ‘node’ if you will. Think of your business website for example… how many social media profiles, directory listings, map listings and other ‘web estate’ properties link to it? All of those share a correlation with the ‘entity’ of your main website — your brand. However, many are representative of entities in their own right too. You are an entity. Your co-workers are entities. Each social network or directory is an entity.

“Independent links” and Entities

An ‘independent’ incoming link is defined as coming from a separate entity, not necessarily just a separate website. Google might identify a swathe of content as being from one entity if the pages in question all have the same or similar CSS/formatting, repetition of a brand name, shared hosting, shared content or other obvious parallels. User activity will also no doubt throw up some connections between different pieces of content (i.e. certain search queries resulting in certain click-throughs and behaviour).
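The difference between counting linking websites and counting linking entities can be shown with a small Python sketch. The host-to-entity table is entirely hypothetical; in reality Google would have to infer those groupings from signals like the ones listed above.

```python
from urllib.parse import urlparse

# Hypothetical lookup from host name to the entity that owns it.
ENTITY_BY_HOST = {
    "blog.example-brand.com": "Example Brand",
    "shop.example-brand.com": "Example Brand",
    "news.example.org": "Example News",
    "reviews.example.net": "Example Reviews",
}


def owning_entity(url: str) -> str:
    """Return the entity behind a URL, falling back to the host itself."""
    host = urlparse(url).hostname or ""
    return ENTITY_BY_HOST.get(host, host)


def count_independent_links(target_entity: str, linking_urls: list[str]) -> int:
    """Count distinct linking entities, ignoring the target's own properties."""
    entities = {owning_entity(url) for url in linking_urls}
    entities.discard(target_entity)
    return len(entities)


if __name__ == "__main__":
    links = [
        "https://blog.example-brand.com/post",   # same entity as the target
        "https://shop.example-brand.com/",       # same entity as the target
        "https://news.example.org/story",
        "https://reviews.example.net/a",
        "https://reviews.example.net/b",         # second link, same entity
    ]
    print(count_independent_links("Example Brand", links))
```

Five links from four hosts collapse to two independent links here, because two hosts belong to the brand itself and two links come from the same reviewer.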

“Reference queries” and Entities

People often search for terms that never appear in a web page’s content. To use an example, people don’t always look for our client Moddershall Oaks Country Spa Retreat by searching that full name; they search instead for truncated phrases like “moddershall oaks” or even just “moddershall spa”. Moddershall is a place in its own right, not just a brand name. But because people who search those truncated terms frequently end up engaged on the Moddershall Oaks website, Google learns that “moddershall spa” and similar phrases are ‘reference queries’ for the Moddershall Oaks brand entity.
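One way to picture how a query could come to be treated as a reference query is to look at where the clicks for it end up. The Python sketch below is an assumption-laden illustration: the click log, the site names and both thresholds (min_share, min_clicks) are made up, and nothing here is taken from the patent itself.

```python
from collections import Counter, defaultdict


def reference_queries(click_log, site, min_share=0.5, min_clicks=3):
    """Return queries whose clicks mostly land on the given site.

    click_log is an iterable of (query, clicked_site) pairs.
    min_share and min_clicks are hypothetical cut-offs.
    """
    clicks_by_query = defaultdict(Counter)
    for query, clicked_site in click_log:
        clicks_by_query[query.strip().lower()][clicked_site] += 1

    found = set()
    for query, counts in clicks_by_query.items():
        total = sum(counts.values())
        site_clicks = counts[site]
        if site_clicks >= min_clicks and site_clicks / total >= min_share:
            found.add(query)
    return found


if __name__ == "__main__":
    # Entirely made-up click data for two placeholder sites.
    log = (
        [("moddershall spa", "spa.example")] * 4
        + [("moddershall spa", "village.example")]
        + [("moddershall", "village.example")] * 3
        + [("moddershall", "spa.example")]
    )
    print(reference_queries(log, "spa.example"))
```

In this toy data, “moddershall spa” qualifies because most of its clicks go to the spa site, while the bare place name does not.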

The connections between different independent entities and their reference queries are therefore fairly easily understood. But one question still remains: How can Google perceive the relative authority and trust earned by different entities and rank them accordingly? The old-school reliance on a ‘link-based economy’ is fundamentally undermined by aggressive link-building and social media spamming, both of which skew results. Plus, not all buzz around an entity necessarily generates back-links. The end result is a situation where spammy sites can earn unfair ranking advantages while honest sites underperform in search.

Semantics, Entities and Brand Mentions

To close the loopholes in the link-based search economy, Google’s recent patent appears to discuss a more semantic approach. Their last algorithm overhaul, the introduction of Hummingbird, was an important milestone in semantic search technology. We can be sure that semantics will play a key role in Google’s future developments.

So what does it mean to be semantic? Well, semantics is defined as:

“The branch of linguistics and logic concerned with meaning”

A semantic search engine is therefore one which understands context and meaning in search queries. Semantic technology uses a concept called a ‘Triple’ to express logical connections. A Triple contains the three pieces of information required to form a cohesive logical statement, much like a simple sentence: the entity (sound familiar?), usually called the subject, the predicate and the object.

“Attain Design creates amazing websites”. In that sentence, ‘Attain Design’ is the entity, ‘amazing websites’ is the object and ‘creates’ is the predicate showing the relationship between the two. Google’s semantic algorithm picks up on all of that and saves the Triple in a database called a Triplestore. Any of those three bits of information can then be cross-referenced to help Google determine the context of search queries.
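A Triplestore can be pictured with a few lines of Python. This is a toy in-memory sketch, assuming nothing about how Google actually stores its data; the Triple and TripleStore names and the second example statement are invented for illustration.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    """One statement: subject (the entity), predicate, object."""
    subject: str
    predicate: str
    obj: str


class TripleStore:
    """A toy in-memory store that can be queried by any of the three parts."""

    def __init__(self) -> None:
        self._triples: set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.add(Triple(subject, predicate, obj))

    def match(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> list[Triple]:
        """Return triples matching every part that was given (None = any)."""
        return sorted(
            t
            for t in self._triples
            if (subject is None or t.subject == subject)
            and (predicate is None or t.predicate == predicate)
            and (obj is None or t.obj == obj)
        )


if __name__ == "__main__":
    store = TripleStore()
    store.add("Attain Design", "creates", "amazing websites")
    store.add("Example Bakery", "creates", "wedding cakes")  # made-up entity

    print(store.match(subject="Attain Design"))
    print(store.match(predicate="creates"))
```

Because any of the three parts can be left open, the same store can answer “what does this entity do?” as easily as “who creates things?”, which is the cross-referencing described above.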

Using this semantic technique, combined with the definition of separate entities and reference queries as previously described, Google is now able to return results pages that more closely resemble answers to questions than simple lists of keyword-matched web pages.

Brand Mentions & The Nofollow Theory

So Google is now picking up on a multitude of unofficial phrases people are using to refer to brands (or ‘entities’). Semantically speaking, ‘Attain Web Design’ and ‘Attain Design’ are seen as the same query with regard to our website, returning us as the first result. However, the results around us are different for each term. That’s entity search (aka brand search) in action.

The reason this is so interesting is that a mention of a brand name (including a ‘reference query’ name) can therefore be picked up by Google even without a back-link to definitively attribute it to the brand’s website. Implied within the patent is the notion that Google is now looking at these brand mentions as ranking factors. After all, as I mentioned before, web pages won’t always link to you just because they are talking about you.
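As a rough illustration of spotting a brand mention that has no back-link attached, here is a short Python sketch. The alias list, the attain.example host and the three labels are assumptions for the example; real entity recognition would be far more sophisticated than a whole-word text match.

```python
import re
from urllib.parse import urlparse


def count_mentions(text: str, aliases: list[str]) -> int:
    """Count case-insensitive, whole-word mentions of any alias."""
    if not aliases:
        return 0
    # Try longer aliases first so the fuller name is preferred.
    ordered = sorted(aliases, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(alias) for alias in ordered) + r")\b",
        re.IGNORECASE,
    )
    return len(pattern.findall(text))


def classify_page(text: str, links: list[str], aliases: list[str], brand_host: str) -> str:
    """Label a page as linking to the brand, only mentioning it, or neither."""
    if any(urlparse(url).hostname == brand_host for url in links):
        return "linked"
    if count_mentions(text, aliases) > 0:
        return "implied link"
    return "no reference"


if __name__ == "__main__":
    aliases = ["Attain Design", "Attain Web Design"]
    brand_host = "attain.example"  # placeholder host, not a real address

    print(classify_page("We hired Attain Web Design last year.", [], aliases, brand_host))
    print(classify_page("Great agency.", ["https://attain.example/work"], aliases, brand_host))
```

The first page never links to the brand but would still be counted as a reference under this scheme, which is the whole idea behind a mention acting as a signal.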

This reminds me of a contentious subject that has been argued to death in the SEO industry over the last few years: the ‘nofollow’ link. Basically, when a person creates a link to website X on their own website (let’s call it website A), they can mark it as ‘nofollow’ to tell search engines that website A is not vouching for website X, so the link should not pass ranking credit to X. It is also commonly described as telling the ‘search bots’ (which ‘crawl’ the site) not to get distracted by following that link, but rather to skip over it and keep crawling the rest of site A as if the link wasn’t there.

For example:

<a href="" rel="nofollow">anchor_text</a>
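To see how a crawler might tell the two kinds of link apart, here is a minimal Python sketch using the standard library's html.parser. The LinkCollector class and the sample URLs are illustrative assumptions, not a description of how any real search bot works.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect link targets, split by whether rel contains 'nofollow'."""

    def __init__(self) -> None:
        super().__init__()
        self.followed: list[str] = []
        self.nofollow: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return  # skip anchors with a missing or empty href
        # rel is a space-separated list of tokens, compared case-insensitively.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollow.append(href)
        else:
            self.followed.append(href)


if __name__ == "__main__":
    page = (
        '<a href="https://x.example/" rel="nofollow">X</a> '
        '<a href="https://y.example/">Y</a> '
        '<a href="https://z.example/" rel="noopener NoFollow">Z</a>'
    )
    collector = LinkCollector()
    collector.feed(page)
    print("followed:", collector.followed)
    print("nofollow:", collector.nofollow)
```

Note that rel can hold several values at once, so the check looks for the nofollow token rather than comparing the whole attribute.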

While I don’t usually get bogged down with the relative importance of individual tags and HTML elements as ranking factors (opting instead to just market websites properly, generating SEO signals as a by-product), I have to say I’ve been fascinated with the experiments people have done to see exactly how valuable a ‘nofollow’ link actually is. The conclusion I’ve come to is that they add nothing from a keyword-specific perspective, but they do count towards a general kind of overall authority. It’s just logical that any link to your site, ‘nofollow’ or not, is still an indication that somebody feels your content is valuable, even if it’s not relevant.

The idea that someone can help you in search results just by mentioning your brand name without linking to you is just an extension of that same theory. The beauty is that it takes the pressure off businesses to go on crazed link-building crusades and allows them to simply market themselves with little or no SEO agenda, but still get the SEO benefits.

But there is a very important caveat here. The mantra for every area of SEO is: “all things in moderation”. Too many nofollow links won’t look great. That would tell Google that nobody wanted to be associated with you even if they do have to link to you. The same presumably goes for brand mentions in that if people talk about you, some should surely link to you too.


The term that resonated with me the most in the patent was ‘implied links’. Not actual links, but ‘implied’. Since the first blogs I wrote here, I’ve been banging the drum for honest, transparent, valuable marketing aimed at the audience rather than the search engines. Everything discussed here just backs that up; except this time there is enough evidence to elevate that sentiment from theory to fact.

Assuming you’re already using honest, transparent techniques to target valuable content at relevant market segments, your strategy doesn’t need to change. However, your perspective should. The key to success moving forward will be understanding the goals of semantic technology and the concept of the ‘entity’ in search. Only when you look through the same lens as Google can you expect to see the web in the same way.

Posted by Paul

Paul is our Head of Marketing and strategy specialist. He has worked on award-winning campaigns for household-name brands, SMEs and public sector organisations since 2003.

