forked from nevali/crawl
Open
Description
If Anansi is asked to crawl a URI which returns a 303 redirect to a representation of the resource, the redirect is sometimes ignored. On other occasions, the redirect is queued and the resource is fetched. I couldn't discern any pattern to this behaviour.
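To rule out the server as the source of the inconsistency, it may help to confirm outside Anansi that the seed URI answers with a 303 and a `Location` header on every request. The following is a minimal sketch, not part of Anansi: `see_other_target` is a hypothetical helper that reads raw HTTP response headers on standard input and prints the redirect target only when the status is 303.

```shell
# Hypothetical helper (not part of Anansi): read raw HTTP response headers
# on stdin and print the Location value if, and only if, the status is 303.
# Exits non-zero when the response is not a 303 or has no Location header.
see_other_target() {
  awk '
    { sub(/\r$/, "") }                          # strip the CR of CRLF line endings
    NR == 1 { if ($2 != "303") exit 1; next }   # status line, e.g. "HTTP/1.1 303 See Other"
    tolower($1) == "location:" { print $2; found = 1; exit }
    END { exit !found }
  '
}
```

Assuming `curl` is available, it could be fed with `curl -sI http://dbpedia.org/resource/Dracula | see_other_target`. If this prints a target on every run while Anansi still skips the resource intermittently, the problem is more likely on the crawler side.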
I observed this behaviour in the Acropolis docker stack. It can be reproduced as follows:
- Clone and run the Acropolis docker stack, so you have a running Anansi. See https://github.com/bbcarchdev/acropolis for instructions. The short version is `docker-compose build` inside a clone of the Anansi project.
- Seed Anansi with a non-information resource URI which redirects to an information resource URI, e.g. http://dbpedia.org/resource/Dracula. The command for doing this with the docker stack is:

      docker exec acropolis_anansi_1 /usr/bin/crawler-add -f http://dbpedia.org/resource/Dracula

  You may need to repeat this command several times, as Anansi will sometimes follow the redirect correctly. Repeat until Anansi doesn't follow the redirect, effectively ignoring the resource you seeded it with. There is no information in the log about what has happened: the resource URI appears to be getting queued but is never processed.
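Because the failure is intermittent, repeating the seeding command by hand gets tedious. A small wrapper along these lines might help; `repeat_cmd` is a hypothetical helper, not something shipped with Anansi or Acropolis, and it only re-runs the command. It cannot tell whether the redirect was followed, so the log and queue still have to be checked by hand.

```shell
# Hypothetical helper: run a command a fixed number of times,
# stopping early if the command itself fails.
repeat_cmd() {
  count=$1
  shift
  i=1
  while [ "$i" -le "$count" ]; do
    echo "attempt $i of $count" >&2   # progress goes to stderr
    "$@" || return 1
    i=$((i + 1))
  done
}

# Usage with the seeding command from this report (needs the running docker stack):
# repeat_cmd 10 docker exec acropolis_anansi_1 /usr/bin/crawler-add -f http://dbpedia.org/resource/Dracula
```

The count of 10 is arbitrary; nothing in this report indicates how many attempts are needed before the redirect is ignored.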