I recently discovered Nepenthes, a tarpit for LLM web crawlers. It’s essentially a reverse Slowloris that spits out Markov-chain nonsense: a nice way to catch unwanted guests and feed them what they are searching for.
I have had my fair share of trouble with those nasty LLM crawlers, so I set up my own tarpit. The robots.txt file disallows everything, so as long as you respect the big STOP sign you have nothing to fear.
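To illustrate what "disallows everything" means for a well-behaved crawler, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the bot name, and the example.org URL are assumptions for illustration, not copied from my actual setup.

```python
from urllib.robotparser import RobotFileParser

# Assumed contents of a disallow-everything robots.txt (illustrative).
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a crawler that honours robots.txt may fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ExampleBot" and the URL are hypothetical placeholders.
    print(is_allowed("ExampleBot", "https://example.org/tarpit/"))
```

A crawler that performs this check never requests the tarpit URL at all; only the ones that skip it end up in the tarpit.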
Setup
Follow the instructions from https://zadzmo.org/code/nepenthes/. I use a container.
- Configure docker/config.yaml to your liking. The default values are fine.
- Build the container using the docker/Containerfile.
- Train the Markov chain. To do so, you need a big text file. Wikipedia is a good resource; I used Homer’s Iliad from Project Gutenberg. Everyone should use a different file, so that the output differs from one tarpit to the next. You can also train on multiple files.
curl -X POST -d @iliad.txt -H 'Content-type: text/plain' http://localhost:8893/train
- http://localhost:8893
- Grab a drink and enjoy.
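If you want to train on several files, the curl call from the training step can be repeated once per file. Here is a rough Python sketch of that loop. The endpoint and the Content-type header are taken from the curl command above; the function names `build_train_request` and `train` are hypothetical. I also assume that the endpoint accepts repeated POSTs.

```python
import sys
import urllib.request
from pathlib import Path

# Endpoint taken from the curl command in the training step.
TRAIN_URL = "http://localhost:8893/train"


def build_train_request(corpus: Path, url: str = TRAIN_URL) -> urllib.request.Request:
    """Build a POST request carrying one corpus file as text/plain."""
    return urllib.request.Request(
        url,
        # The file is sent unchanged; note that curl's -d strips line breaks.
        data=corpus.read_bytes(),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


def train(paths, url: str = TRAIN_URL) -> None:
    """POST every corpus file to the training endpoint, one request per file."""
    for path in paths:
        request = build_train_request(Path(path), url)
        with urllib.request.urlopen(request) as response:
            print(f"{path}: HTTP {response.status}")


if __name__ == "__main__":
    # Usage (hypothetical script name): python train.py iliad.txt odyssey.txt
    train(sys.argv[1:])
```

Building the request is separate from sending it, so you can inspect what would be posted before pointing it at a running container.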
Final note
And now the link is in circulation. I wonder how long it will take to catch the first baddies.