Compare two drafts and estimate duplication risk instantly.
Combines word-level and phrase-level overlap into a single percentage. Under 50% is generally safe for SEO; 50-80% suggests significant overlap that needs attention; above 80% indicates near-duplicate content that can trigger Google's duplicate content filters.
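Calculates the percentage of shared unique words between both texts using Jaccard similarity: the number of distinct words that appear in both texts, divided by the number of distinct words that appear in either. This catches topical overlap even when sentence structure differs, which is useful for detecting whether two pages compete for the same keywords.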
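A minimal sketch of word-level Jaccard similarity. The function names and the tokenization rule (lowercase, then keep runs of ASCII letters, digits, and apostrophes) are assumptions for illustration; the tool's own tokenizer is not documented here.

```python
import re


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase, then keep runs of ASCII letters, digits, and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def jaccard(a: set, b: set) -> float:
    # |A ∩ B| / |A ∪ B|; two empty sets are treated as 0.0 (nothing to compare).
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def word_overlap(text_a: str, text_b: str) -> float:
    """Percentage of distinct words shared by both texts (Jaccard on word sets)."""
    return 100 * jaccard(set(tokenize(text_a)), set(tokenize(text_b)))


print(word_overlap("Red cotton shirt", "blue cotton shirt"))  # 50.0
```

Both texts are reduced to sets before comparison, so a word repeated ten times counts the same as a word used once.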
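Analyzes shared 3-word phrases (shingles) between texts. This is more precise than word overlap because it detects copied or barely-rewritten passages. Because substituting a single word breaks every 3-word phrase that contains it, phrase overlap is usually lower than word overlap. High word overlap with low phrase overlap suggests the pages share a topic and vocabulary but phrase it differently, for example after rewriting or heavy word substitution; high phrase overlap points to copied passages.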
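A minimal sketch of 3-word shingling with Jaccard similarity on the shingle sets. The names, the tokenization rule, and the choice to return 0.0 when neither text has at least three words are assumptions, not documented tool behavior.

```python
import re


def shingles(text: str, n: int = 3) -> set[tuple[str, ...]]:
    # Assumption: lowercase and split into runs of ASCII letters, digits, apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    # Every run of n consecutive words; texts shorter than n yield no shingles.
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def phrase_overlap(text_a: str, text_b: str, n: int = 3) -> float:
    """Percentage of shared n-word shingles (Jaccard on shingle sets)."""
    a, b = shingles(text_a, n), shingles(text_b, n)
    union = a | b
    return 100 * len(a & b) / len(union) if union else 0.0


# One substituted word breaks every 3-word phrase that contains it:
print(phrase_overlap("the quick brown fox jumps", "the quick red fox jumps"))  # 0.0
```

The same pair of sentences shares 4 of 6 distinct words (about 66.7% word overlap), which shows how much more sensitive phrase overlap is to small edits.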
Google's algorithms actively filter duplicate content from search results. When multiple pages have substantially similar content, Google chooses one version to index and suppresses the others. This means duplicate pages waste crawl budget, dilute link equity, and create keyword cannibalization — where your own pages compete against each other instead of against competitors.
Multiple pages on the same site that target similar keywords with nearly identical content cause keyword cannibalization. Google cannot reliably tell which page to rank, so both pages perform worse than a single consolidated page would.
E-commerce sites often have product pages that differ only by color or size, with descriptions that are 90%+ identical. Google may index only one variant and ignore the rest. Write unique descriptions highlighting what makes each variant different.
Republishing content from other sources (or having your content republished) without proper canonical tags creates duplicate content across domains. The original source may lose ranking credit to the copy.
Large blocks of identical text (legal disclaimers, company descriptions, location pages with only the city name changed) dilute the unique content ratio. Keep boilerplate under 20% of total page content.
The tool uses Jaccard similarity on two levels: individual unique words and 3-word phrases (shingles). The final score averages both metrics. Word overlap catches topical similarity while phrase overlap detects copied passages.
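A sketch of the combined score as a simple mean of the two Jaccard values, assuming equal weighting (the text says the score "averages both metrics") and the same hypothetical lowercase, alphanumeric tokenization. The function name is illustrative.

```python
import re


def _jaccard(a: set, b: set) -> float:
    # |A ∩ B| / |A ∪ B|, with two empty sets treated as 0.0.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_score(text_a: str, text_b: str, n: int = 3) -> float:
    """Average of word-level and n-word-shingle Jaccard similarity, in percent."""
    # Assumption: lowercase, split into runs of ASCII letters, digits, apostrophes.
    wa = re.findall(r"[a-z0-9']+", text_a.lower())
    wb = re.findall(r"[a-z0-9']+", text_b.lower())
    word_sim = _jaccard(set(wa), set(wb))
    sa = {tuple(wa[i:i + n]) for i in range(len(wa) - n + 1)}
    sb = {tuple(wb[i:i + n]) for i in range(len(wb) - n + 1)}
    phrase_sim = _jaccard(sa, sb)
    # Equal weighting: a simple mean of the two metrics.
    return 100 * (word_sim + phrase_sim) / 2


score = similarity_score("the quick brown fox jumps", "the quick red fox jumps")
print(round(score, 1))  # 33.3
```

Here word similarity is 4/6 and phrase similarity is 0, so the mean lands at one third. Averaging means a high value on one metric cannot push the final score above 50% on its own.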
If similarity exceeds 50%, consider: (1) Consolidating the pages into one comprehensive page with a 301 redirect, (2) Rewriting one page with a different angle, unique data, or distinct examples, (3) Adding a canonical tag pointing to the preferred version.
Google does not issue manual penalties for unintentional duplicate content, but it does filter duplicates from search results. Only one version gets indexed, and it may not be the one you prefer. Deliberate manipulation through scraped or spun content can trigger manual actions.
Under 50% overlap is generally safe and expected for content in the same niche. Between 50% and 80% suggests the pages may be competing with each other. Above 80% means the content is essentially duplicate and one version will likely be suppressed by Google.
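A small helper that maps a score onto these three bands. The label strings are hypothetical, and the boundary handling is an assumption drawn from the wording: "under 50%" is safe and "above 80%" is near-duplicate, so exactly 50 and 80 fall in the middle band.

```python
def risk_band(score: float) -> str:
    """Map a 0-100 similarity score to the under-50 / 50-80 / above-80 bands."""
    # Exactly 50 and exactly 80 are treated as the middle band.
    if score < 50:
        return "generally safe"
    if score <= 80:
        return "may be competing"
    return "near-duplicate"


for s in (35.0, 50.0, 80.0, 92.5):
    print(s, risk_band(s))
```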
Keyword cannibalization occurs when multiple pages on your site target the same keyword with similar content. Instead of one strong page ranking well, Google splits ranking signals between them. The result is multiple pages ranking poorly instead of one page ranking strongly.
This tool compares two specific texts you provide, calculating mathematical similarity. Plagiarism checkers search the entire web for matching content. Use this tool for internal content audits and pre-publication checks; use plagiarism checkers for external duplicate detection.
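For an internal audit across more than two pages, one approach is to score every pair and flag those above 50%, the point at which the document recommends action. This sketch assumes the same averaged word and 3-word-shingle Jaccard score; the page URLs, texts, and function names are hypothetical.

```python
import re
from itertools import combinations


def _score(text_a: str, text_b: str, n: int = 3) -> float:
    # Average of word-set and n-word-shingle Jaccard similarity, in percent.
    # Assumption: lowercase, split into runs of ASCII letters, digits, apostrophes.
    def jac(a: set, b: set) -> float:
        return len(a & b) / len(a | b) if a | b else 0.0

    wa = re.findall(r"[a-z0-9']+", text_a.lower())
    wb = re.findall(r"[a-z0-9']+", text_b.lower())
    sa = {tuple(wa[i:i + n]) for i in range(len(wa) - n + 1)}
    sb = {tuple(wb[i:i + n]) for i in range(len(wb) - n + 1)}
    return 100 * (jac(set(wa), set(wb)) + jac(sa, sb)) / 2


def audit(pages: dict[str, str], threshold: float = 50.0) -> list[tuple[str, str, float]]:
    """Return every pair of pages scoring above `threshold`, most similar first."""
    flagged = []
    for (url_a, text_a), (url_b, text_b) in combinations(pages.items(), 2):
        score = _score(text_a, text_b)
        if score > threshold:
            flagged.append((url_a, url_b, round(score, 1)))
    return sorted(flagged, key=lambda row: row[2], reverse=True)


pages = {
    "/shirts/red": "Soft cotton shirt in red with a relaxed fit.",
    "/shirts/blue": "Soft cotton shirt in blue with a relaxed fit.",
    "/about": "We are a small family business founded in 1998.",
}
print(audit(pages))  # [('/shirts/red', '/shirts/blue', 60.0)]
```

Pairwise comparison grows quadratically with the number of pages, which is fine for a few hundred pages but would need a faster technique (such as MinHash) for a large site.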
Yes. If you have legitimately similar pages (e.g., printer-friendly versions, sorted product listings), use rel="canonical" to point to the preferred version. This tells Google which URL to index and consolidates ranking signals to one page.
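Each duplicate variant carries a `<link rel="canonical">` element in its `<head>` pointing at the preferred URL. A small sketch that builds that element; the function name and the example.com URLs are hypothetical.

```python
from html import escape


def canonical_link(preferred_url: str) -> str:
    """Build the rel="canonical" <link> element for a page's <head>."""
    # escape(..., quote=True) turns &, <, >, and quotes into HTML entities so the
    # URL is safe inside the href attribute.
    return f'<link rel="canonical" href="{escape(preferred_url, quote=True)}">'


# Hypothetical URLs: a printer-friendly and a sorted listing both point at the main page.
for duplicate in ("/shirts?print=1", "/shirts?sort=price"):
    print(duplicate, "->", canonical_link("https://example.com/shirts"))
```

Both variants emit the same element, so ranking signals from either URL are consolidated onto the one preferred page.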
Aim for at least 60-70% unique content on every page. If more than 30-40% of a page is boilerplate (navigation, footers, sidebars, repeated disclaimers), the unique content ratio may be too low for Google to consider the page valuable.
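A rough way to estimate the unique content ratio is to count a page's words, strip out known boilerplate blocks, and count again. This sketch assumes the boilerplate appears verbatim in the page text and that a word is a run of ASCII letters, digits, or apostrophes; the names and sample text are hypothetical.

```python
import re


def _word_count(text: str) -> int:
    # Assumption: a "word" is a run of ASCII letters, digits, or apostrophes.
    return len(re.findall(r"[A-Za-z0-9']+", text))


def unique_content_ratio(page_text: str, boilerplate_blocks: list[str]) -> float:
    """Percentage of a page's words that are not inside known boilerplate blocks."""
    total = _word_count(page_text)
    if total == 0:
        return 0.0
    remaining = page_text
    for block in boilerplate_blocks:
        # Assumption: each boilerplate block appears verbatim in the page text.
        remaining = remaining.replace(block, " ")
    return 100 * _word_count(remaining) / total


page = "Breathable linen shirt for summer heat. Free shipping over 50."
ratio = unique_content_ratio(page, ["Free shipping over 50."])
print(ratio)  # 60.0
```

Six of the ten words are unique, so this page sits right at the lower edge of the 60-70% target.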