How to use dataset

Hi,

Thanks for providing the dataset as a download. I downloaded it from the location mentioned in https://github.com/EdinburghNLP/XSum/issues/12#issuecomment-558241165
However, the format of the dataset appears to differ from the files you get if you download the data yourself.

See this [gist](https://gist.github.com/pyfisch/5be8f6411fb49c222c61bd8364f05049): the first file, `12092740.data`, I downloaded myself from archive.org, while the second file was part of the downloaded dataset.

As you can see, the file I downloaded myself contains the attributes `[XSUM]URL[XSUM]`, `[XSUM]INTRODUCTION[XSUM]` and `[XSUM]RESTBODY[XSUM]`, whereas the file from the dataset has `[SN]URL[SN]`, `[SN]TITLE[SN]`, `[SN]FIRST-SENTENCE[SN]` and `[SN]RESTBODY[SN]`.

My problem is that if I follow the tutorial at https://github.com/EdinburghNLP/XSum/tree/master/XSum-Dataset, the scripts don't work with the unmodified dataset files.

Which changes do I need to make to the scripts?
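
Alternatively, instead of changing the scripts, I could imagine converting the dataset files to the format the scripts expect. Below is a minimal sketch of that idea, not a tested fix. It assumes that each attribute sits on its own marker line and that its content runs until the next marker line, that `[SN]FIRST-SENTENCE[SN]` corresponds to `[XSUM]INTRODUCTION[XSUM]`, and that `[SN]TITLE[SN]` can be dropped. The function names `parse_sections` and `sn_to_xsum` are my own and do not come from the repository.

```python
import re

# Assumption: a marker line looks like "[SN]URL[SN]" or "[XSUM]URL[XSUM]"
# and stands alone on its line; everything up to the next marker is content.
MARKER = re.compile(r"^\[(SN|XSUM)\](.+?)\[\1\]\s*$")


def parse_sections(text):
    """Split the text of a .data file into {attribute name: content}."""
    sections = {}
    current = None
    for line in text.splitlines():
        match = MARKER.match(line)
        if match:
            current = match.group(2)
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(lines).strip() for name, lines in sections.items()}


def sn_to_xsum(text):
    """Re-emit an [SN]-tagged file with [XSUM] tags (hypothetical mapping)."""
    sections = parse_sections(text)
    # Assumption: FIRST-SENTENCE maps to INTRODUCTION and TITLE is dropped.
    renamed = {
        "URL": sections.get("URL", ""),
        "INTRODUCTION": sections.get("FIRST-SENTENCE", ""),
        "RESTBODY": sections.get("RESTBODY", ""),
    }
    blocks = [f"[XSUM]{name}[XSUM]\n{body}" for name, body in renamed.items()]
    return "\n\n".join(blocks) + "\n"
```

Each dataset file would be read as text, passed through `sn_to_xsum`, and written back out before running the tutorial scripts. Whether the scripts accept the result depends on the assumptions above being right.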

Best,
Pyfisch

