Webpage Similarity Scan with Rick

[!tldr] This article introduces a new “Web Checker” feature, which allows you to quickly identify if a website has been copied or has similar content. The tool compares the provided webpage with known URLs and presents the top 3 most relevant results and a quick AI report.

Intro

Introducing the new ‘Webpage Similarity Scanner’. This feature allows you to determine if a website has been copied.

You can scan the majority of websites and identify similar sites or copies in an instant. Rick swiftly compares the provided webpage with all known URLs and presents the top three most relevant results, the time since each was first detected, a short AI report, and the hyperlinks found on the page.

Keep in mind that the tool only helps you identify similar websites; you should always verify the results yourself. The scoring algorithm is still experimental and may change at a later stage.

[!bug] This feature is currently in beta and considered experimental.

Usage

This feature is available with the following command on each platform:

:telegram: /web https://...
:discord: .web https://...

Not all domains are supported. Websites like Twitter are banned from the scanner, as they add no value to the dataset.
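As a rough illustration of the usage above, here is a minimal Python sketch that builds the command text for each platform and rejects an unsupported domain before sending. The helper name `build_command` and the `UNSUPPORTED` set are hypothetical; this article names only Twitter, and the real list of banned domains is not published here.

```python
from urllib.parse import urlparse

# Command prefixes as listed in the usage lines above.
PREFIXES = {"telegram": "/web", "discord": ".web"}

# Hypothetical blocklist: only Twitter is named in this article.
UNSUPPORTED = {"twitter.com"}


def build_command(platform: str, url: str) -> str:
    """Return the chat command for a platform, or raise ValueError."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if not host:
        raise ValueError("URL must include a scheme and host, e.g. https://...")
    # Match the banned domain itself and any of its subdomains.
    if host in UNSUPPORTED or any(host.endswith("." + d) for d in UNSUPPORTED):
        raise ValueError(f"{host} is not supported by the scanner")
    return f"{PREFIXES[platform]} {url}"


print(build_command("telegram", "https://example.com"))
print(build_command("discord", "https://example.com"))
```

The check runs locally and only mirrors the rule described above; Rick itself decides which domains it accepts.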

Scoring

Rick will present you with the 3 websites that have the highest scores according to our algorithm. This does not necessarily mean the website content is copied. As a general guideline, longer pages tend to have higher scores.

:brain: There is a slight learning curve, as the dataset is constantly evolving, causing the scores to adapt accordingly. The more you utilize this command, the more comfortable you will become interpreting the scores.

Flags

To help you during this decision-making process, Rick may raise several flags:

  • :rotating_light: = high likelihood of copied text
  • :dna: = possibly a similar template/website builder
  • :robot: = AI match, high likelihood of very similar or exact content

If one or more flags are raised, expect a significantly higher score. Please note that the scoring algorithm and flags are experimental and are continuously being refined as the dataset evolves.

FAQ

Is there a simple way to tell if a website is copied?
Yes, (almost) exact copies will likely show up with 2 or more flags. Without these flags it’s unlikely to be a full copy, but certain parts might still be copied, which contributes to a higher score.

Examples

The returned sites all have a similar score: probably not a copy

The top result's score is significantly higher than the rest of the results: very likely to be a copy. Additionally, some flags may be raised.

Sometimes sites are not exactly the same but have small differences. This usually comes with slightly higher scores on every result.
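The rules of thumb from the FAQ and the examples can be sketched as a small Python helper. This is only an illustration of how one might read the results, not Rick's actual scoring logic. The function name `interpret`, the `gap_ratio` threshold of 1.5, and the sample scores are all assumptions, since the article does not define what counts as "significantly higher" or what scale the scores use.

```python
def interpret(scores, flag_count=0, gap_ratio=1.5):
    """Apply the article's rules of thumb to the top results.

    scores: the scores of the returned results (normally three).
    flag_count: number of flags raised on the top result.
    gap_ratio: assumed meaning of "significantly higher" (hypothetical).
    """
    if not scores:
        raise ValueError("at least one score is required")
    ranked = sorted(scores, reverse=True)
    top, rest = ranked[0], ranked[1:]
    # FAQ: (almost) exact copies will likely show 2 or more flags.
    if flag_count >= 2:
        return "likely (almost) exact copy"
    # Example: top score stands well above the rest of the results.
    if rest and top > 0 and top >= gap_ratio * max(rest):
        return "very likely a copy"
    # Example: all returned sites have a similar score.
    return "probably not a copy"


print(interpret([40, 38, 37]))
print(interpret([90, 30, 25]))
print(interpret([50, 48, 47], flag_count=2))
```

Because the dataset and scores keep evolving, any fixed threshold like `gap_ratio` would need regular adjustment, and the outcome should still be verified by hand.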