cross-posted from: https://beehaw.org/post/21152300
- YouTube video (first 7 minutes): https://youtu.be/rIR3PpQ82yE
- SkipVids (same video, but without ads): https://skipvids.com/?v=rIR3PpQ82yE
- New site: https://www.thisweekinvideogames.com/ (at the time of posting the website loads extremely slowly; they might be getting hit with a lot of visitors)
The first 7 minutes of the video explain it. It's kind of self-advertisement, but I think this is important. One of my favorite gaming YouTube channels, “Skill Up”, launched a new website for gaming articles. The goal is to publish articles with no AI, no advertisements, no sponsored articles, and no SEO-optimized content, in order to keep the quality high. I think this is really important and a good step.
As someone who works for a paywalled website, that’s hardly a deterrent. If the site is important enough, they will pay for accounts and crawl until the server melts.
Is there any true way to block it? Does the crawler literally use the same access (port 443) as us to scrape content? If so, the only other thing I can think of is to block all known IPs that AI crap originates from, but that sounds daunting, and it would be impossible to catch everything.
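For what it's worth, crawlers do come in over ordinary HTTPS on port 443, the same as any browser, so the port itself can't tell them apart. A rough sketch of the IP blocklist idea, using Python's standard `ipaddress` module, might look like the following. The `BLOCKED_NETWORKS` list and the `is_blocked` helper are made up for illustration, and the ranges are reserved documentation ranges, not real crawler addresses; a real list would need constant maintenance.

```python
import ipaddress

# Hypothetical blocklist. These are reserved documentation ranges (RFC 5737),
# used here as placeholders, NOT actual crawler IP ranges.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip falls inside any blocked network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in BLOCKED_NETWORKS)


if __name__ == "__main__":
    for ip in ("192.0.2.15", "203.0.113.7"):
        print(ip, "blocked" if is_blocked(ip) else "allowed")
```

This only catches traffic from ranges someone has already identified, which is exactly why it feels daunting: anything coming from an unlisted address passes straight through.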
There’s no foolproof way. Even if you somehow block every crawling automation, there’s still puppeteering, where the bot drives a real browser and behaves just like a normal user.