Discrepancy Between Database/Code Logic and Paper Description During Reproduction #30

@huangwei021230

Hi there! I’m trying to reproduce the AutoSurvey project (project link) and have encountered some issues. I’d appreciate your help in clarifying a few points.

According to the paper:

For subsection drafting, the models generate specific sections using the outline and 60 papers retrieved based on the subsection descriptions, focusing on the main body of each paper (up to the first 1,500 tokens).

My understanding is that the subsection drafting logic should utilize the main body of each paper (up to 1,500 tokens), not just the abstract. However, in the current implementation, I noticed the following inconsistencies:

  1. Database File: The provided database.zip only contains the abstract portion of the papers, not the main body.
  2. Code Logic:
    • In Database.py, the method get_paper_from_ids(self, ids, max_len=1500) is defined but does not appear to be called anywhere in the codebase.
    • In writer.py, the subsection generation logic only uses the abstract portion of the papers, not the main body.
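
One way to double-check the "never called" observation is to scan the source for call sites rather than relying on text search. The snippet below is a minimal sketch using Python's standard `ast` module. The `sample` source and the method name `hypothetical_abstract_lookup` are made up for illustration and are not taken from the AutoSurvey repository.

```python
import ast


def find_calls(source: str, func_name: str) -> list[int]:
    """Return the line numbers where func_name is called.

    Matches both bare calls (func_name(...)) and attribute calls
    (obj.func_name(...)). Definitions are not counted as calls.
    """
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            target = node.func
            if isinstance(target, ast.Name):
                name = target.id
            else:
                # ast.Attribute exposes .attr; other node types yield None.
                name = getattr(target, "attr", None)
            if name == func_name:
                hits.append(node.lineno)
    return sorted(hits)


# Illustrative source only: a method that is defined but never called.
sample = '''
class Database:
    def get_paper_from_ids(self, ids, max_len=1500):
        return ids

def write(db, ids):
    return db.hypothetical_abstract_lookup(ids)
'''

print(find_calls(sample, "get_paper_from_ids"))
```

To run this over a checkout, one could read each `.py` file and pass its text to `find_calls`. An empty result for every file would support the observation, although calls made dynamically (for example through `getattr`) would not be detected.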

Questions:

  • Does this mean the current implementation only supports subsection generation based on abstracts, and the paper’s described logic (using the main body) is not yet implemented?
  • Or am I missing some details (e.g., additional data files or configurations)?

Expectations:

  • If the current implementation only supports abstract-based generation, are there plans to implement the main body-based generation logic as described in the paper?
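
To make the expectation concrete, here is a minimal sketch of what main-body-based context building might look like. It rests on several assumptions: each paper is a dict with a hypothetical `body` field, whitespace-separated words stand in for tokens (the actual tokenizer used in the paper is not known to me), and `truncate_body` and `build_context` are illustrative names, not functions from the repository.

```python
def truncate_body(text: str, max_len: int = 1500) -> str:
    """Keep only the first max_len tokens of text.

    Assumption: tokens are approximated by whitespace-separated words.
    A real implementation would use the model's own tokenizer.
    """
    tokens = text.split()
    return " ".join(tokens[:max_len])


def build_context(papers: list[dict], max_len: int = 1500) -> list[str]:
    """Return one truncated main-body string per retrieved paper.

    Assumption: each paper dict has a "body" key holding the full text.
    """
    return [truncate_body(paper["body"], max_len) for paper in papers]


# Illustrative data only.
papers = [
    {"title": "Paper A", "body": "word " * 2000},
    {"title": "Paper B", "body": "a short body"},
]
context = build_context(papers)
print([len(c.split()) for c in context])
```

With a database that only stores abstracts, there is no `body` field to truncate, which is why the question above about additional data files matters.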

Thank you for your help! If you need further information, please feel free to reach out.
