@noscrape/noscrape

Protect your content from scraping.




Project Goal

This project aims to help you prevent anyone from scraping your content.




Concept

The key idea is to take any TrueType font as a basis, shuffle its glyphs (Unicode code points), strip everything inside the font that would make it possible to calculate the original code points, and generate a new obfuscation font from the result. Given strings or objects are then translated using the new, shuffled code points.
On the client side, users can read everything normally as long as the obfuscated values are rendered with the newly generated font. For any scraper, the text should be nothing but confusion.
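The text-side half of this idea can be sketched in a few lines. This is a minimal illustration, not noscrape's actual implementation: the helper names (`shuffle`, `buildMapping`, `translate`) are hypothetical, the Private Use Area is assumed as the target range, and the font-side step (re-assigning each original glyph to its new code point and stripping everything that reveals the original mapping) is not shown.

```javascript
// Minimal sketch of the glyph-shuffling idea; NOT noscrape's actual code.
// Assumptions: the target range is the Private Use Area (U+E000-U+F8FF),
// and Math.random() is good enough for an illustration.
const PUA_START = 0xe000;
const PUA_END = 0xf8ff;

// Fisher-Yates shuffle on a copy of the input array.
function shuffle(items, random = Math.random) {
  const a = items.slice();
  for (let i = a.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [a[i], a[j]] = [a[j], a[i]];
  }
  return a;
}

// One-to-one map: original character -> randomly chosen PUA character.
function buildMapping(chars, random = Math.random) {
  const size = PUA_END - PUA_START + 1;
  if (chars.length > size) throw new RangeError("not enough code points in range");
  const pool = shuffle(Array.from({ length: size }, (_, i) => PUA_START + i), random);
  const mapping = new Map();
  chars.forEach((ch, i) => mapping.set(ch, String.fromCodePoint(pool[i])));
  return mapping;
}

// Recursively translate every string found in a string, array or plain object.
// Characters without a mapping are passed through unchanged.
function translate(value, mapping) {
  if (typeof value === "string") {
    return Array.from(value, (ch) => mapping.get(ch) ?? ch).join("");
  }
  if (Array.isArray(value)) return value.map((v) => translate(v, mapping));
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value).map(([k, v]) => [k, translate(v, mapping)])
    );
  }
  return value;
}

// Usage: collect the distinct characters, map them, translate the object.
const object = { title: "noscrape", text: "obfuscation" };
const chars = [...new Set(Object.values(object).join(""))];
const mapping = buildMapping(chars);
const encoded = translate(object, mapping);
console.log(encoded); // unreadable PUA characters, different on every run
```

Only a font whose glyphs sit at the shuffled code points turns `encoded` back into readable text; the mapping itself never has to leave the server.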

What we cannot remove are the glyph paths. At the moment, the paths are obfuscated by shifting them randomly by a small amount (see the `strength` option), which makes them hard to calculate back, but not impossible, and they may still be "guessable" by a machine-learning algorithm.
It would be great if someone came up with a better solution or helped improve this 😅
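The README does not spell out how the random shift is computed, so the sketch below is only one possible reading of it. It assumes glyph outlines are lists of commands shaped like opentype.js path commands (`{ type, x, y, x1, y1, x2, y2 }`), and the scale (up to 1% of the em square per unit of `strength`) is an invented value for illustration; `jitterPath` is a hypothetical name. Each point is moved independently and rounded back to integer font units.

```javascript
// Minimal sketch of randomly shifting glyph outline points; NOT noscrape's code.
// Assumptions: commands are shaped like opentype.js path commands,
// coordinates are in font units, and the maximum shift of 1% of the em
// square per unit of strength is an invented value.
const COORD_PAIRS = [["x", "y"], ["x1", "y1"], ["x2", "y2"]];

function jitterPath(commands, strength = 1, unitsPerEm = 1000, random = Math.random) {
  const maxShift = 0.01 * unitsPerEm * strength;
  // Uniform offset in [-maxShift, maxShift).
  const offset = () => (random() * 2 - 1) * maxShift;
  return commands.map((cmd) => {
    const out = { ...cmd }; // copy so the input outline is not mutated
    for (const [xKey, yKey] of COORD_PAIRS) {
      if (typeof out[xKey] === "number") out[xKey] = Math.round(out[xKey] + offset());
      if (typeof out[yKey] === "number") out[yKey] = Math.round(out[yKey] + offset());
    }
    return out;
  });
}

// Usage: a square outline, shifted with the default strength.
const square = [
  { type: "M", x: 100, y: 0 },
  { type: "L", x: 600, y: 0 },
  { type: "L", x: 600, y: 500 },
  { type: "L", x: 100, y: 500 },
  { type: "Z" },
];
const shifted = jitterPath(square, 1);
console.log(shifted);
```

Because every point moves on its own, larger multipliers visibly distort the outline, which is consistent with the warning about high `strength` values in the Options section.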




IMPORTANT NOTE

Bots cannot process obfuscated text, which can lead to unpredictable results in analytics, search indexing, etc.
So please be careful about using this technique on content that matters for indexed pages!

The whole obfuscation process takes time (around 50-60 ms on my machine 😉).
This should not be a problem for prerendered pages. For API requests, consider moving the obfuscation logic into a cron-job-like task and reusing its results for multiple requests instead of recalculating everything on every request.
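One way to follow this advice is sketched below. Instead of a real cron job it uses a simple time-based cache (a swapped-in technique, not something noscrape provides): results are stored per key and recomputed only after a configurable TTL. `createObfuscationCache` is a hypothetical helper, and `obfuscateFn` stands in for the `obfuscate(object, fontPath)` call from the example.

```javascript
// Minimal sketch of reusing obfuscation results; NOT part of noscrape's API.
// `obfuscateFn` stands in for obfuscate(object, fontPath) from the README;
// results are cached per key and recomputed after `ttlMs` milliseconds.
function createObfuscationCache(obfuscateFn, ttlMs = 60 * 60 * 1000, now = Date.now) {
  const cache = new Map(); // key -> { result, createdAt }
  return function getObfuscated(key, object, fontPath) {
    const hit = cache.get(key);
    if (hit && now() - hit.createdAt < ttlMs) return hit.result; // still fresh
    const result = obfuscateFn(object, fontPath); // the expensive step
    cache.set(key, { result, createdAt: now() });
    return result;
  };
}

// Usage (hypothetical):
// const getObfuscated = createObfuscationCache(obfuscate, 10 * 60 * 1000);
// const { font, value } = getObfuscated("home", object, "path/to/your/font.ttf");
```

If `obfuscate` returns a Promise in your setup, the Promise itself is cached and callers can `await` it; note that a rejected Promise would then stay cached until it expires.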


Example

```js
// server-side obfuscation
// (assumes obfuscate() has been imported from @noscrape/noscrape)
const object = { title: "noscrape", text: "obfuscation" }
const { font, value } = obfuscate(object, 'path/to/your/font.ttf')
```

⬇⬇⬇⬇ provide data ⬇⬇⬇⬇

```js
// font is provided as a Buffer
const b64 = font.toString('base64')
```

```html
<!-- client-side visualization -->
<style>
    @font-face {
        font-family: 'noscrape-obfuscated';
        src: url('data:font/ttf;base64,${b64}') format('truetype');
    }
</style>

...

<div style="font-family: noscrape-obfuscated">
    <div>{ value.title }</div>
    <div>{ value.text }</div>
</div>
```
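If the page is rendered on the server as a plain string, the two halves of the example can be glued together as sketched here. `renderObfuscated` and `escapeHtml` are hypothetical helpers; they assume `font` is a Node.js Buffer and `value` is a flat object of strings, as in the example, and they use the registered `font/ttf` MIME type with a `format('truetype')` hint.

```javascript
// Minimal sketch of server-side rendering for the example; NOT part of noscrape.
// Assumptions: `font` is a Node.js Buffer and `value` is a flat object of
// strings, as returned by obfuscate() in the README example.
const ESCAPES = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };

// Escape HTML special characters before inserting values into markup.
function escapeHtml(text) {
  return String(text).replace(/[&<>"']/g, (c) => ESCAPES[c]);
}

function renderObfuscated(font, value, family = "noscrape-obfuscated") {
  const b64 = font.toString("base64");
  const css =
    `@font-face { font-family: '${family}'; ` +
    `src: url('data:font/ttf;base64,${b64}') format('truetype'); }`;
  const rows = Object.values(value)
    .map((text) => `  <div>${escapeHtml(text)}</div>`)
    .join("\n");
  return `<style>${css}</style>\n<div style="font-family: '${family}'">\n${rows}\n</div>`;
}
```

Embedding the font as a data URL keeps each page self-contained, at the cost of sending the font again with every response.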



Options


strength

 * obfuscation strength multiplier (default: 1)
 * values below 0.1 make no sense (paths can simply be calculated back)
 * values above 10 make no sense (the glyphs look like 💩)


characterRange

Character range used for the obfuscated characters

  • PRIVATE_USE_AREA (default)
  • LATIN
  • GREEK
  • CYRILLIC
  • HIRAGANA
  • KATAKANA
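One practical difference between these ranges is how many code points each one offers. The sketch below uses the standard Unicode block bounds; whether noscrape uses exactly these bounds is an assumption, and LATIN is left out because its span isn't stated here. `rangeCapacity` and `fitsInRange` are hypothetical helpers that check whether every distinct character of a text can get its own code point.

```javascript
// Sketch: capacity of each character range, using the standard Unicode block
// bounds. Whether noscrape uses exactly these bounds is an assumption, and
// LATIN is omitted because its span is not stated in this README.
const RANGES = {
  PRIVATE_USE_AREA: [0xe000, 0xf8ff], // Private Use Area
  GREEK: [0x0370, 0x03ff], // Greek and Coptic
  CYRILLIC: [0x0400, 0x04ff],
  HIRAGANA: [0x3040, 0x309f],
  KATAKANA: [0x30a0, 0x30ff],
};

// Number of code points available in a range.
function rangeCapacity(name) {
  const [start, end] = RANGES[name];
  return end - start + 1;
}

// True if every distinct character of `text` can get its own code point.
function fitsInRange(text, name) {
  return new Set(Array.from(text)).size <= rangeCapacity(name);
}

for (const name of Object.keys(RANGES)) {
  console.log(name, rangeCapacity(name));
}
```

Under the one-code-point-per-character assumption, each kana block (96 code points) fits far fewer distinct characters than the Private Use Area (6400).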



Contributions

Contributions, issues and feature requests are very welcome. If you are using this package and fixed a bug for yourself, please consider submitting a PR!




License

MIT @ Bernhard Schönberger