Rimu@piefed.social to Fediverse@lemmy.world • FediDB has stopped crawling until they get robots.txt support
Maybe the definition of the term “crawler” has changed, but crawling used to mean downloading a web page, parsing its links, and then downloading all of those linked pages, parsing those, and so on until the whole site had been downloaded. If links to other sites were found in that corpus, the same process repeated for those. Obviously this could cause heavy load, hence robots.txt.
FediDB isn’t doing anything like that, so I’m a bit bemused by this whole thing.
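For anyone unfamiliar, a crawler in that traditional sense looks roughly like this. This is a minimal Python sketch using only the standard library; the start URL, page cap, and same-site restriction are illustrative, not anyone’s actual implementation:

```python
from urllib import request, robotparser
from urllib.parse import urljoin, urlparse
from html.parser import HTMLParser

class LinkParser(HTMLParser):
    """Collect href values from <a> tags on a page."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url, max_pages=100):
    # Honor the site's robots.txt before fetching anything else.
    robots = robotparser.RobotFileParser()
    robots.set_url(urljoin(start_url, "/robots.txt"))
    robots.read()

    seen, queue = set(), [start_url]
    while queue and len(seen) < max_pages:
        url = queue.pop(0)
        if url in seen or not robots.can_fetch("*", url):
            continue
        seen.add(url)
        try:
            html = request.urlopen(url, timeout=10).read().decode(errors="replace")
        except OSError:
            continue
        # Parse the page for links and queue them: this recursion is
        # what makes it "crawling" and what robots.txt exists to limit.
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)
            if urlparse(absolute).netloc == urlparse(start_url).netloc:
                queue.append(absolute)
    return seen
```

The defining feature is the loop: every fetched page feeds more URLs into the queue, which is where the “heavy load” concern comes from.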
Let’s see about that.
Wikipedia lists http://www.robotstxt.org/ as the official homepage of robots.txt and the “Robots Exclusion Protocol”. In the FAQ at http://www.robotstxt.org/faq.html, the first entry is “What is a WWW robot?” (http://www.robotstxt.org/faq/what.html). It says:

“A robot is a program that automatically traverses the Web’s hypertextual structure by retrieving a document, and recursively retrieving all documents that are referenced.”
That’s not FediDB. That’s not even nodeinfo.
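For contrast, a stats service that reads NodeInfo only needs two fixed, well-known requests per instance, with no link-following at all. A sketch of the NodeInfo discovery flow; lemmy.world is just an example host, and real code would pick the preferred schema version from the links array rather than taking the first entry:

```python
import json
from urllib import request

def fetch_nodeinfo(host):
    """Fetch a server's NodeInfo: two fixed URLs, no recursion."""
    # The well-known discovery document lists available schema versions.
    with request.urlopen(f"https://{host}/.well-known/nodeinfo", timeout=10) as resp:
        discovery = json.load(resp)
    # Follow the advertised href to the actual NodeInfo document.
    with request.urlopen(discovery["links"][0]["href"], timeout=10) as resp:
        return json.load(resp)

info = fetch_nodeinfo("lemmy.world")  # example host from this thread
print(info["software"], info.get("usage", {}).get("users"))
```

Nothing in that flow “traverses the Web’s hypertextual structure”, which is the whole point of the disagreement here.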