In a recent episode of Google’s Search Off the Record podcast, Allan Scott from the “Dups” team explained how Google decides which URL to consider as the main one when there are duplicate pages.
He revealed that Google looks at about 40 different signals to pick the main URL from a group of similar pages.
Around 40 Signals For Canonical URL Selection
Duplicate content is a common problem for search engines because many websites have multiple pages with the same or similar content.
To solve this, Google uses a process called canonicalization. This process allows Google to pick one URL as the main version to index and show in search results.
Google has previously pointed to signals such as rel="canonical" tags, sitemaps, and 301 redirects for canonicalization. However, the total number of signals involved is larger than you might expect.
Scott revealed during the podcast:
“I’m not sure what the exact number is right now because it goes up and down, but I suspect it’s somewhere in the neighborhood of 40.”
Some of the known signals mentioned include:
- rel="canonical" tags
- 301 redirects
- HTTPS vs. HTTP
- Sitemaps
- Internal linking
- URL length
The weight of each signal varies, and some signals, like rel="canonical" tags, can influence both clustering (grouping duplicate pages together) and canonicalization (choosing the main URL from that group).
Balancing Signals
With so many signals at play, Scott acknowledged the challenges in determining the canonical URL when signals conflict.
He stated:
“If your signals conflict with each other, what’s going to happen is the system will start falling back on lesser signals.”
This means that while strong signals like rel="canonical" tags and 301 redirects are crucial, weaker signals come into play when the strong ones are unclear or contradictory.
As a result, canonicalization is a balancing act: Google weighs the available signals to settle on the most appropriate URL.
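To make the fallback behavior concrete, here is a toy sketch of weighted voting between candidate URLs. It is only an illustration: the signal names, the weights, and the `pick_canonical` function are all assumptions, since Google has not published how its signals are actually weighted or combined.

```python
from collections import defaultdict

# Hypothetical weights, chosen only for illustration.
# Google has not disclosed the real values.
SIGNAL_WEIGHTS = {
    "rel_canonical": 5.0,
    "redirect_301": 5.0,
    "sitemap": 2.0,
    "https": 1.0,
    "internal_links": 1.0,
    "shorter_url": 0.5,
}

def pick_canonical(votes):
    """Return the URL with the highest total weight.

    votes is a list of (signal_name, url_the_signal_favors) pairs.
    Returns None when there are no votes.
    """
    scores = defaultdict(float)
    for signal, url in votes:
        scores[url] += SIGNAL_WEIGHTS.get(signal, 0.0)
    if not scores:
        return None
    # Highest score wins; ties go to the shorter URL, then alphabetical order.
    return min(scores, key=lambda u: (-scores[u], len(u), u))

votes = [
    ("rel_canonical", "https://example.com/shoes"),
    ("redirect_301", "https://example.com/shoes/"),  # conflicts with the tag
    ("sitemap", "https://example.com/shoes/"),
    ("internal_links", "https://example.com/shoes/"),
]
print(pick_canonical(votes))  # https://example.com/shoes/
```

In this example the two strongest signals disagree and cancel each other out, so the sitemap and internal links end up deciding the result.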
Best Practices For Canonicalization
Clear signals help Google identify the preferred canonical URL.
Best practices include:
- Use rel="canonical" tags correctly.
- Implement 301 redirects for permanently moved content.
- Ensure HTTPS versions of pages are accessible and linked.
- Submit sitemaps with preferred canonical URLs.
- Keep internal linking consistent.
These signals help Google find the correct canonical URLs, improving your site’s crawling, indexing, and search visibility.
Mistakes To Avoid
Here are a few common mistakes to watch out for.
1. Incorrect or conflicting canonical tags
- Pointing to non-existent or 404 pages
- Multiple canonical tags with different URLs on one page
- Pointing to a different domain unintentionally
Fix: Double-check canonical tags, use only one per page, and use absolute URLs.
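A quick audit script can catch two of these problems before they ship. The sketch below uses Python's standard `html.parser` module; the `CanonicalCollector` class and `audit_canonical_tags` function are hypothetical names, and the check only looks at the HTML it is given (it does not fetch the target to see whether it returns a 404).

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class CanonicalCollector(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_tokens:
            self.hrefs.append(attr.get("href") or "")

def audit_canonical_tags(html):
    """Return a list of human-readable problems found in the markup."""
    collector = CanonicalCollector()
    collector.feed(html)
    problems = []
    if len(set(collector.hrefs)) > 1:
        problems.append("multiple canonical tags with different URLs")
    for href in collector.hrefs:
        parts = urlparse(href)
        if not (parts.scheme in ("http", "https") and parts.netloc):
            problems.append(f"canonical is not an absolute URL: {href!r}")
    return problems

page = """<head>
<link rel="canonical" href="https://example.com/a">
<link rel="canonical" href="/b">
</head>"""
for problem in audit_canonical_tags(page):
    print(problem)
```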
2. Canonical chains or loops
Page A points to Page B as canonical, but Page B points on to another page (a chain) or back to Page A (a loop).
Fix: Ensure canonical URLs always point to the final, preferred version of the page.
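If you already have a crawl export that maps each URL to the canonical it declares, chains and loops are easy to detect by following the declarations. This is a minimal sketch under that assumption; `classify_canonical` and the `canonical_of` mapping are hypothetical names for illustration.

```python
def classify_canonical(url, canonical_of):
    """Follow canonical declarations starting from url.

    canonical_of maps a URL to the canonical URL it declares. A URL that
    is missing from the mapping, or that points to itself, ends the walk.
    Returns (status, path) where status is "ok", "chain", or "loop".
    """
    path = [url]
    while True:
        current = path[-1]
        target = canonical_of.get(current, current)
        if target == current:   # self-referencing or undeclared: walk ends
            break
        if target in path:      # revisiting a URL means a loop
            return "loop", path + [target]
        path.append(target)
    hops = len(path) - 1
    return ("chain" if hops > 1 else "ok"), path

canonical_of = {
    "/a": "/b", "/b": "/c", "/c": "/c",  # /a reaches /c in two hops: a chain
    "/x": "/y", "/y": "/x",              # /x and /y point at each other: a loop
    "/p": "/c",                          # one hop to the final URL: fine
}
for start in ("/a", "/x", "/p"):
    print(start, classify_canonical(start, canonical_of))
```

Any URL reported as "chain" can be fixed by pointing it directly at the last entry in its path.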
3. Using noindex and canonical tags together
Combining the two sends mixed signals to search engines: noindex asks for the page to be kept out of the index, while a canonical tag asks for its signals to be consolidated into another URL.
Fix: Use canonical tags for consolidation and noindex for exclusion.
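The combination can be flagged automatically. The sketch below, with the hypothetical names `RobotsAndCanonical` and `has_mixed_signals`, reports pages whose markup contains both a robots noindex directive and a canonical tag. It inspects only the HTML, so a noindex sent through an HTTP header would not be caught.

```python
from html.parser import HTMLParser

class RobotsAndCanonical(HTMLParser):
    """Records whether a page declares noindex and/or a canonical URL."""

    def __init__(self):
        super().__init__()
        self.noindex = False
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = (attr.get("content") or "").lower()
            directives = [d.strip() for d in content.split(",")]
            if "noindex" in directives:
                self.noindex = True
        elif tag == "link" and "canonical" in (attr.get("rel") or "").lower().split():
            self.canonical = attr.get("href")

def has_mixed_signals(html):
    """True when the markup contains both noindex and a canonical tag."""
    parser = RobotsAndCanonical()
    parser.feed(html)
    return parser.noindex and parser.canonical is not None

page = (
    '<meta name="robots" content="noindex, follow">'
    '<link rel="canonical" href="https://example.com/a">'
)
print(has_mixed_signals(page))  # True
```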
4. Canonicalizing to redirect or noindex pages
Pointing canonicals to redirected or noindex pages confuses search engines.
Fix: Canonical URLs should be 200 status and indexable.
5. Ignoring case sensitivity
Inconsistent URL casing can cause duplicate content issues.
Fix: Keep URL and canonical tag casing consistent.
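One way to spot casing problems is to group a list of known URLs by their lowercased form and report any group with more than one spelling. This is a rough sketch; `find_case_variants` is a hypothetical helper, and the URL list would come from your own crawl or logs.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit

def find_case_variants(urls):
    """Return groups of URLs that differ only in path or query casing."""
    groups = defaultdict(set)
    for url in urls:
        parts = urlsplit(url)
        # Scheme and host are case-insensitive, so normalize them first.
        # Path and query are case-sensitive and are kept as written.
        normalized = urlunsplit(
            (parts.scheme.lower(), parts.netloc.lower(), parts.path, parts.query, "")
        )
        groups[normalized.lower()].add(normalized)
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)

urls = [
    "https://example.com/Shoes",
    "https://example.com/shoes",
    "https://EXAMPLE.com/shoes",  # same as the line above once the host is lowercased
    "https://example.com/hats",
]
print(find_case_variants(urls))
```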
6. Overlooking pagination and parameters
Paginated content and parameter-heavy URLs can cause duplication if mishandled.
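Fix: Give each paginated page its own self-referencing canonical, or point the series to a “View All” page if one exists, and keep parameters consistent. Avoid canonicalizing every page to the first page, since Google's pagination guidance advises against it.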
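Keeping parameters consistent usually means emitting them in one fixed order and leaving out those that do not change the content. Here is a minimal sketch using Python's `urllib.parse`; the `IGNORED_PARAMS` set and `normalize_params` function are assumptions, and which parameters are safe to drop depends entirely on the site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of parameters that do not change page content.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid"}

def normalize_params(url):
    """Drop ignored parameters and sort the rest into a stable order."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in IGNORED_PARAMS
    ]
    kept.sort()
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

print(normalize_params("https://example.com/shoes?size=9&utm_source=mail&color=red"))
# https://example.com/shoes?color=red&size=9
```

The normalized form is a reasonable candidate for the URL used in canonical tags, sitemaps, and internal links, so that all three agree.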
Key Takeaways
It’s unlikely that the complete list of roughly 40 signals used to determine canonical URLs will be made public.
However, this was still an insightful discussion worth highlighting.
Here are the key takeaways:
- Google uses approximately 40 different signals to determine canonical URLs, with rel="canonical" tags and 301 redirects being among the strongest indicators
- When signals conflict, Google falls back on secondary signals to make its determination
- Clear, consistent implementation of canonicalization signals (tags, redirects, sitemaps, internal linking) is crucial
- Common mistakes like canonical chains, mixed signals, or incorrect implementations can confuse search engines
The full discussion is available in the Search Off the Record episode.