Skip to content

sodestream/metadata_seg

Repository files navigation

metadata-seg

Code to add segmentation of message text for messages from the IETF mailarchive.

To generate segmentations install the ietfdata library available at this link.

Then checkout this repository and run:

./preprocess-seg-to-csv.sh

This may take some time (possibly a few hours). Then run:

`python preprocess-seg-to-db.py

This will write the results to the your mongodb mailarchive2 instance. You can now access the segmentations using this code:

ml_name = "100all"
ml = archive.mailing_list(ml_name)
if ml:
   threads_dict = ml.threads(this_list_only = True)
   for thread_root_msg in threads_dict.values():
       for msg in iterate_over_thread(thread_root_msg):                            
           # segmentations 
           for segment in get_segmentation(msg):
               print("Segment %d is quoting segment %d" % (segment.id, segment.antecedent))
               print()
               print("Content of segment:")
               print(segment.content)
               print()
               print("Type of segment:")
               print(segment.type)

About

Code to add segmentation of message text for messages from the IETF mailarchive.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors