• Panda@lemmy.today
    arrow-up
    7
    ·
    4 hours ago

    I’ve seen this pop up on websites a lot lately. Usually it takes a few seconds to load the website, but there have been occasions where it seemed to hang, stuck on that screen for minutes, and I ended up closing the browser tab because the website just wouldn’t load.

    Is this a (known) issue or is it intended to be like this?

    • lime!@feddit.nu
      English
      arrow-up
      7
      ·
      3 hours ago

      anubis is basically a bitcoin miner with the difficulty turned way down (and obviously not resulting in any coins), so the time it takes to solve is inherently random. if it takes minutes it does seem like something is wrong though. maybe a network error?

      • isolatedscotch@discuss.tchncs.de
        arrow-up
        1
        ·
        edit-2
        1 hour ago

        adding to this, some sites set the difficulty way higher than others: nerdvpn’s invidious and redlib instances take about 5 seconds and some ~20k hashes, while privacyredirect’s instances are almost instant with fewer than 50 hashes each time
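
        A rough sketch of why the hash counts vary so much, assuming the difficulty is a number of leading zero hex digits (that mapping is an assumption here, and the function names are made up for illustration). Each hash then succeeds with probability 16^-d, so the number of attempts follows a geometric distribution:

```python
import math


def expected_hashes(zero_hex_digits: int) -> int:
    """Mean number of attempts when each hash succeeds with probability 16**-d."""
    return 16 ** zero_hex_digits


def attempts_for_quantile(zero_hex_digits: int, quantile: float) -> int:
    """Smallest n such that P(solved within n attempts) >= quantile."""
    p = 16.0 ** -zero_hex_digits
    # Geometric distribution: P(solved within n) = 1 - (1 - p)**n
    return math.ceil(math.log(1.0 - quantile) / math.log1p(-p))


if __name__ == "__main__":
    for d in (1, 2, 3, 4):
        print(
            f"d={d}: mean={expected_hashes(d)}, "
            f"median={attempts_for_quantile(d, 0.5)}, "
            f"99th percentile={attempts_for_quantile(d, 0.99)}"
        )
```

        Under that assumption, every extra digit multiplies the average work by 16, and an unlucky run can take several times the average even when nothing is broken.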

  • refalo@programming.dev
    arrow-up
    10
    ·
    edit-2
    8 hours ago

    I don’t understand how/why this got so popular out of nowhere… the same solution has already existed for years in the form of haproxy-protection and a couple others… but nobody seems to care about those.

    • Flipper@feddit.org
      arrow-up
      27
      ·
      8 hours ago

      Probably because the creator had a blog post that got shared around at a point in time where this exact problem was resonating with users.

      It’s not always about being first but about marketing.

  • unexposedhazard@discuss.tchncs.de
    arrow-up
    73
    ·
    edit-2
    14 hours ago

    Non paywalled link https://archive.is/VcoE1

    It basically boils down to making the browser do some CPU-heavy calculations before allowing access. This is no problem for a single user, but for a bot farm it would increase the amount of compute power they need 100x or more.
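
    A minimal Python sketch of the idea, assuming a SHA-256 challenge where the client must find a nonce whose hash starts with a given number of zero hex digits. The function names and the challenge format are hypothetical and not taken from Anubis itself; the point is only that solving takes many hashes while checking takes one.

```python
import hashlib
from itertools import count


def solve(challenge: str, difficulty: int) -> int:
    """Brute-force the smallest nonce whose digest starts with `difficulty` zero hex digits."""
    target = "0" * difficulty
    for nonce in count():
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server-side check: a single hash, no matter how long solving took."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


if __name__ == "__main__":
    nonce = solve("example-challenge", 3)
    print(nonce, verify("example-challenge", nonce, 3))
```

    A visitor pays this cost once per site and then keeps a token, while a scraper that discards its session state pays it on every new session, which is where the asymmetry comes from.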

  • Jankatarch@lemmy.world
    arrow-up
    24
    arrow-down
    2
    ·
    12 hours ago

    Every time I see Anubis I get happy because I know the website has some quality information.

  • grysbok@lemmy.sdf.org
    English
    arrow-up
    26
    ·
    14 hours ago

    My archive’s server uses Anubis and after initial configuration it’s been pain-free. Also, I’m no longer getting multiple automated emails a day about how the server’s timing out. It’s great.

    We went from about 3000 unique “pinky swear I’m not a bot” visitors per (iirc) half a day to 20 such visitors. Twenty is much more in line with expectations.

    • deadcade@lemmy.deadca.de
      arrow-up
      8
      ·
      11 hours ago

      “Yes”, for any bits the user sees. The frontend UI can be behind Anubis without issues. The API, including both user and federation, cannot. We expect “bots” to use an API, so you can’t put human verification in front of it. These “bots” also include applications that aren’t aware of Anubis, or are unable to pass it, like all third-party Lemmy apps.

      That does stop almost all generic AI scraping, though it does not prevent targeted abuse.

    • seang96@spgrn.com
      arrow-up
      4
      ·
      15 hours ago

      As long as it’s not configured improperly. When the Forgejo devs added it, it broke downloading images with Kubernetes for a moment. Basically, you need to make sure the user agent header used for federation is allowed.
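
      A hedged sketch of what “allowed” could mean in practice: a check in front of the app that skips the challenge for machine-facing paths and for known non-browser user agents. The path prefixes, user agent strings, and names below are hypothetical placeholders, not Anubis configuration.

```python
from dataclasses import dataclass

# Hypothetical values for illustration only; a real deployment would take
# these from the application's own documentation.
UNCHALLENGED_PATH_PREFIXES = ("/api/", "/.well-known/")
ALLOWED_USER_AGENT_SUBSTRINGS = ("Lemmy/", "containerd/")


@dataclass(frozen=True)
class Request:
    path: str
    user_agent: str


def needs_challenge(request: Request) -> bool:
    """Return True if the request should be shown the proof-of-work page."""
    if request.path.startswith(UNCHALLENGED_PATH_PREFIXES):
        return False  # machine-facing endpoints: clients cannot run a browser challenge
    if any(s in request.user_agent for s in ALLOWED_USER_AGENT_SUBSTRINGS):
        return False  # known non-browser clients, e.g. federation peers
    return True
```

      User agents are trivially spoofed, so an allow-list like this keeps legitimate clients working but does nothing against a scraper that deliberately imitates them.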

  • fuzzy_tinker@lemmy.world
    arrow-up
    70
    ·
    18 hours ago

    This is fantastic and I appreciate that it scales well on the server side.

    AI scraping is a scourge, and I would love to know the collective amount of power wasted due to the necessity of countermeasures like this, and add it to the total wasted by AI.

      • adr1an@programming.dev
        arrow-up
        1
        ·
        29 minutes ago

        That’s awful, it means I would get my photo id stolen hundreds of times per day, or there’s also thisfacedoesntexists… and it won’t work. For many reasons. Not all websites require an account. And even those that do, when they ask for “personal verification” (like dating apps), have a hard time implementing just that. Most “serious” cases use human review of the photo and a video that has your face and you move in and out of an oval shape…

  • medem@lemmy.wtf
    arrow-up
    21
    arrow-down
    3
    ·
    17 hours ago

    <Stupidquestion>

    What advantage does this software provide over simply banning bots via robots.txt?

    </Stupidquestion>

    • irotsoma@lemmy.blahaj.zone
      arrow-up
      16
      ·
      13 hours ago

      TL;DR: You should have both due to the explicit breaking of the robots.txt contract by AI companies.

      AI generally doesn’t obey robots.txt. That file just notifies scrapers what they shouldn’t scrape, but it relies on the good faith of the scrapers. Many AI companies have explicitly chosen not to comply with robots.txt, thus breaking the contract, so this is a system that causes those scrapers that are not willing to comply to get stuck in a black hole of junk and waste their time. This is a countermeasure, but not a solution. It’s just way less complex than other options that simply block these connections but then leave you getting pounded with retries. This way the scraper bot gets stuck for a while and doesn’t waste as many of your resources blocking it over and over again.

    • kcweller@feddit.nl
      arrow-up
      67
      ·
      17 hours ago

      Robots.txt expects that the client is respecting the rules, for instance, marking that they are a scraper.

      AI scrapers don’t respect this trust, and thus robots.txt is meaningless.
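
      To make the “trust” part concrete, here is a small sketch using Python’s standard `urllib.robotparser`. The robots.txt content, the bot name `ExampleAIBot`, and the URL are made up for illustration. Note that the check runs inside the crawler’s own code:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that asks one crawler to stay out entirely.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def polite_fetch_allowed(user_agent: str, url: str) -> bool:
    """The question a well-behaved crawler asks itself before requesting a URL."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.org/repo/commit/abc"
    print(polite_fetch_allowed("ExampleAIBot", url))  # False: asked to stay out
    print(polite_fetch_allowed("SomeBrowser", url))   # True: falls under the * rule
```

      Nothing on the server enforces the result. A scraper that never calls `can_fetch`, or that sends a different user agent, gets the page all the same.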

    • medem@lemmy.wtf
      arrow-up
      36
      ·
      16 hours ago

      Well, now that y’all put it that way, I think it was pretty naive of me to think that these companies, whose business model is basically theft, would honour a lousy robots.txt file…

    • thingsiplay@beehaw.org
      arrow-up
      10
      ·
      14 hours ago

      The difference is:

      • robots.txt is a promise without a door
      • Anubis is a physically closed door that opens up after some time

    • Mwa@thelemmy.club
      English
      arrow-up
      8
      ·
      17 hours ago

      The problem is AI doesn’t follow robots.txt, so Cloudflare and Anubis developed solutions.

  • Kazumara@discuss.tchncs.de
    arrow-up
    9
    ·
    15 hours ago

    Just recently there was a guy on the NANOG list ranting that Anubis is the wrong approach and that people should just cache properly; then their servers would handle thousands of users and the bots wouldn’t matter. Anyone who puts git online has no one to blame but themselves, e-commerce should just be made cacheable, etc. Seemed a bit idealistic, a bit detached from the current reality.

    Ah found it, here

    • deadcade@lemmy.deadca.de
      arrow-up
      6
      ·
      11 hours ago

      Someone making an argument like that clearly does not understand the situation. Just 4 years ago, a robots.txt was enough to keep most bots away, and hosting personal git on the web required very few resources. With AI companies actively profiting off stealing everything, a robots.txt doesn’t mean anything. Now, even a relatively small git web host takes an insane amount of resources. I’d know - I host a Forgejo instance. Caching doesn’t matter, because diffs between two random commits are likely unique. Rate limiting doesn’t matter either: they will use different IP (ranges) and user agents. It would also heavily impact actual users “because the site is busy”.

      A proof-of-work solution like Anubis is the best we have currently. The least possible impact to end users, while keeping most (if not all) AI scrapers off the site.
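
      A rough way to see why caching helps so little here, assuming a web UI that can render a diff between any two commits (the function name is made up for illustration): the number of distinct diff pages grows with the square of the commit count.

```python
import math


def distinct_diff_pages(commits: int) -> int:
    """Number of unordered commit pairs, i.e. distinct 'compare A with B' pages."""
    return math.comb(commits, 2)


if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        print(f"{n} commits -> {distinct_diff_pages(n)} possible diffs")
```

      A crawler that follows links blindly can keep requesting pages nobody has asked for before, so each one is a cache miss that has to be rendered from scratch.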

      • interdimensionalmeme@lemmy.ml
        arrow-up
        1
        ·
        3 hours ago

        This would not be a problem if one bot scraped once, and the result was then mirrored to all on Big Tech’s dime (cloudflare, tailscale) but since they are all competing now, they think their edge is going to be their own more better scraper setup and they won’t share.

        Maybe there should just be a web-to-torrent bridge so the data is pushed out once by the server and the swarm does the heavy lifting as a cache.

        • deadcade@lemmy.deadca.de
          arrow-up
          1
          ·
          2 hours ago

          No, it’d still be a problem; every diff between commits is expensive to render to web, even if “only one company” is scraping it, “only one time”. Many of these applications are designed for humans, not scrapers.

  • not_amm@lemmy.ml
    arrow-up
    7
    ·
    17 hours ago

    I had seen that prompt, but never searched about it. I found it a little annoying, mostly because I didn’t know what it was for, but now I won’t mind. I hope more solutions are developed :D

  • fox2263@lemmy.world
    English
    arrow-up
    6
    arrow-down
    35
    ·
    17 hours ago

    I’d like to use Anubis but the strange hentai character as a mascot is not too professional

    • sleepydragn1@lemmy.world
      arrow-up
      38
      arrow-down
      2
      ·
      edit-2
      16 hours ago

      I actually really like the developer’s rationale for why they use an anime character as the mascot.

      The whole blog post is worth reading, but the TL;DR is this:

      Of course, nothing is stopping you from forking the software to replace the art assets. Instead of doing that, I would rather you support the project and purchase a license for the commercial variant of Anubis named BotStopper. Doing this will make sure that the project is sustainable and that I don’t burn myself out to a crisp in the process of keeping small internet websites open to the public.

      At some level, I use the presence of the Anubis mascot as a “shopping cart test”. If you either pay me for the unbranded version or leave the character intact, I’m going to take any bug reports more seriously. It’s a positive sign that you are willing to invest in the project’s success and help make sure that people developing vital infrastructure are not neglected.

      • CosmicTurtle0@lemmy.dbzer0.com
        English
        arrow-up
        13
        ·
        15 hours ago

        This is a great compromise honestly. More OSS devs need to be paid for their work and if an anime character helps do that, I’m all for it.

    • TimeSquirrel@kbin.melroy.org
      arrow-up
      33
      arrow-down
      3
      ·
      16 hours ago

      Honestly, good. Getting sick of the “professional” world being so goddamn stiff and boring. Push back against sanitized corporate aesthetics.

    • Captain Beyond@linkage.ds8.zone
      arrow-up
      18
      arrow-down
      1
      ·
      15 hours ago

      hentai character

      anime != hentai

      I smile whenever I encounter the Anubis character in the wild. She’s holding up the free software internet on her shoulders after all.

    • TomAwezome@lemmy.world
      arrow-up
      14
      arrow-down
      1
      ·
      17 hours ago

      It’s just image files; you can remove them or replace them with something more corporate. The author does state they’d prefer you didn’t change the pictures, but the license doesn’t require adhering to their personal request. I know of at least 2 sites I’ve visited previously that had Anubis running with a generic checkmark or X in place of the mascot.