Hi,
thanks for providing the dataset as a download. I downloaded the dataset from the location mentioned in #12 (comment).
However, the format of the files in the dataset appears to differ from the files you get if you download the data yourself.
See this gist: I downloaded the first file, 12092740.data, myself from archive.org, while the second file was part of the downloaded dataset.
As you can see, the file I downloaded myself contains the attributes [XSUM]URL[XSUM], [XSUM]INTRODUCTION[XSUM] and [XSUM]RESTBODY[XSUM], but the file from the dataset has [SN]URL[SN], [SN]TITLE[SN], [SN]FIRST-SENTENCE[SN] and [SN]RESTBODY[SN].
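To make the difference concrete, here is a minimal sketch of my own (not part of the XSum scripts) that reads either layout into a dictionary. It assumes each field starts with a marker such as `[SN]URL[SN]` or `[XSUM]URL[XSUM]` and runs until the next marker; `parse_fields` is a hypothetical helper name.

```python
import re

# Matches a marker like [SN]URL[SN] or [XSUM]INTRODUCTION[XSUM];
# the \1 backreference requires the same prefix on both sides of the field name.
MARKER = re.compile(r"\[(SN|XSUM)\]([A-Z-]+)\[\1\]")


def parse_fields(text):
    """Split a file's text into {field_name: content}.

    Assumption: each field begins with a marker and ends where the next marker
    begins (or at the end of the text).
    """
    fields = {}
    matches = list(MARKER.finditer(text))
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        fields[m.group(2)] = text[m.end():end].strip()
    return fields


# Hypothetical sample in the dataset's [SN] layout.
sample = (
    "[SN]URL[SN]\nhttp://example.com\n"
    "[SN]TITLE[SN]\nA title\n"
    "[SN]FIRST-SENTENCE[SN]\nOne sentence.\n"
    "[SN]RESTBODY[SN]\nBody text."
)
print(sorted(parse_fields(sample)))
```

With this, the two layouts differ only in their field names (TITLE and FIRST-SENTENCE versus INTRODUCTION), which is why I suspect the scripts need some mapping between them.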
My problem is that if I follow the tutorial at https://github.com/EdinburghNLP/XSum/tree/master/XSum-Dataset, the scripts don't work with the unmodified dataset files.
Which changes do I need to make to the scripts?
Best,
Pyfisch