
Anansi sometimes fails to follow 303 redirects #68

@townxelliot

If Anansi is asked to crawl a URI which returns a 303 redirect to a representation of the resource, the redirect is sometimes ignored. On other occasions, the redirect is queued and the resource is fetched. I couldn't discern any pattern to this behaviour.

I noted this behaviour in the Acropolis docker stack. It can be reproduced as follows:

  1. Clone and run the Acropolis docker stack, so you have a running Anansi. See https://github.com/bbcarchdev/acropolis for instructions. The short version is docker-compose build inside a clone of the Anansi project.
  2. Seed Anansi with a non-information resource URI which redirects to an information resource URI, e.g. http://dbpedia.org/resource/Dracula. The command for doing this with the docker stack is:
    docker exec acropolis_anansi_1 /usr/bin/crawler-add -f http://dbpedia.org/resource/Dracula
    
    You may need to repeat this command several times, as Anansi will sometimes follow the redirect correctly. Repeat until Anansi doesn't follow the redirect, effectively ignoring the resource you seeded it with. There is no information in the log about what has happened: the resource URI appears to be getting queued but is never processed.
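Since step 2 may need several attempts, the repetition can be scripted. The following is a minimal sketch, not part of Anansi or Acropolis: the function names `seed_repeatedly` and `show_redirect` and the `SEED_CMD` and `URI` variables are hypothetical helpers introduced here for illustration. The container name and `crawler-add` invocation are the ones from step 2. The script cannot tell whether Anansi followed the redirect (the log gives no indication), so that still has to be checked by hand after each attempt.

```shell
#!/bin/sh
# Hypothetical helper script; names below are illustrative assumptions.

# Seeding command from step 2. Override for a dry run, e.g. SEED_CMD=echo.
SEED_CMD="${SEED_CMD:-docker exec acropolis_anansi_1 /usr/bin/crawler-add -f}"
# Non-information resource URI expected to answer with a 303 redirect.
URI="${URI:-http://dbpedia.org/resource/Dracula}"

# Print the status code and redirect target the server returns for $URI,
# without following the redirect (curl only follows when given -L).
show_redirect() {
    curl -s -o /dev/null -w '%{http_code} %{redirect_url}\n' "$URI"
}

# Run the seeding command the given number of times.
seed_repeatedly() {
    attempts="$1"
    i=1
    while [ "$i" -le "$attempts" ]; do
        echo "attempt $i: seeding $URI"
        # SEED_CMD is deliberately unquoted so it splits into command + args.
        $SEED_CMD "$URI" || echo "attempt $i: seed command failed" >&2
        i=$((i + 1))
    done
}
```

For example, after sourcing the script, `show_redirect` confirms what the server itself returns for the URI, and `seed_repeatedly 10` issues the seeding command ten times against the running stack.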
