Nepenthes tarpit

I recently discovered Nepenthes, a tarpit for LLM web crawlers. It’s essentially a reverse Slowloris that spits out Markov-chain nonsense. A nice way to catch unwanted guests and feed them what they are searching for.

I have had my fair share of trouble with those nasty LLM crawlers, so I set up my own tarpit. The robots.txt file disallows everything, so unless you ignore that big STOP sign, you have nothing to fear.
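
For illustration, a robots.txt that disallows everything for every crawler can be as small as two lines. The snippet below is a minimal sketch that writes such a file into the current directory; where the file has to live depends on your web server, so the location is an assumption.

```shell
# Sketch: write a robots.txt that asks every crawler to stay out of the whole site.
# The output path (current directory) is an assumption; use your web root instead.
cat > robots.txt <<'EOF'
User-agent: *
Disallow: /
EOF
```

Well-behaved crawlers read this file first and never follow the link into the tarpit.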

Setup

Follow the instructions from https://zadzmo.org/code/nepenthes/. I use a container.

  1. Configure docker/config.yaml to your liking. The default values are fine.
  2. Build the container using the docker/Containerfile.
  3. Train the Markov chain. To do so, you need a big text file. Wikipedia is a good resource; I used Homer’s Iliad from Project Gutenberg. Everyone should use a different file so that the output differs from one tarpit to the next. You can also use multiple files.
     curl -X POST -d @iliad.txt -H 'Content-type: text/plain' http://localhost:8893/train
  4. Visit http://localhost:8893
  5. Grab a drink and enjoy
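
Training with several files could be sketched as a small shell function that repeats the curl call from step 3 once per file. This is only a sketch under assumptions: the function name `train_all` and the `CURL` and `BASE_URL` variables are made up for this example, and it assumes that repeated POSTs to /train add to the chain rather than replace it. Only the port, the endpoint and the curl options come from the command above.

```shell
# Sketch only: train_all, CURL and BASE_URL are hypothetical names for this example.
CURL="${CURL:-curl}"
BASE_URL="${BASE_URL:-http://localhost:8893}"

train_all() {
    # POST each given text file to the training endpoint, one request per file.
    # Assumption: each request extends the existing Markov chain.
    for f in "$@"; do
        "$CURL" -X POST -d @"$f" -H 'Content-type: text/plain' "$BASE_URL/train" || return 1
    done
}
```

Call it with whatever texts you have downloaded, for example `train_all iliad.txt another-book.txt`, where the second file name is just a placeholder.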

Final note

And now the link is in circulation. I wonder how long it will take to catch the first baddies.