Florilegium

A Repo on Ingestion & Analysis of the Special Index of Plants and Chemicals mined via Public Resource media.org

In 2021, The Florilegium: A Special Index to Plants was generated and made publicly available by Public Resource.

BACKGROUND

A florilegium in ancient times was a chapbook, a commonplace book you carried with you. When you read a manuscript, you would jot down quotes and observations. The name means "to gather flowers." You can learn more about the name of this collection from Wikipedia.

This collection consists of a scan of journal articles, looking for plant names from two lists:

The first is a list of 1,828 names from EssOilDB, generated by the National Institute of Plant Genome Research (NIPGR), New Delhi, India.

The other is a list of 8,361 plants from the University of Trans-Disciplinary Health Sciences and Technology (TDU).

In addition, the data contains search results for 3,443 chemical names that were found in our earlier work (EssOilDB).

THE DATA

This data is preliminary and consists of 6,150,600 instances of a plant name found in a journal article for the TDU list and 1,468,218 for the NIPGR list. In addition, the data contains 30,683,797 hits for the 3,443 chemical names.

The corpus of 107,233,728 journal articles was first put through text extraction, followed by the generation of n-gram tables (single terms, bigrams, trigrams and so on, up to 5 terms in length, from each document's text). These n-gram tables were then searched for terms of interest (e.g. plants and chemicals), and the output was split into 16 slices.
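The n-gram step described above can be illustrated with a minimal Python sketch. The tokenisation rule (lowercased alphabetic words), the function names `ngrams` and `count_terms`, and the sample sentence are all assumptions made for illustration; the actual pipeline may tokenise and store its tables differently.

```python
import re

def ngrams(text, max_n=5):
    """Yield (n, ngram) for every run of 1..max_n consecutive tokens."""
    # Assumption: tokens are lowercased alphabetic words.
    tokens = re.findall(r"[a-z]+", text.lower())
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield n, " ".join(tokens[i:i + n])

def count_terms(text, terms, max_n=5):
    """Count how often each term of interest appears among the n-grams."""
    wanted = {t.lower() for t in terms}
    counts = {}
    for _, gram in ngrams(text, max_n):
        if gram in wanted:
            counts[gram] = counts.get(gram, 0) + 1
    return counts

# Made-up sample text and term list, for illustration only.
sample = "Ocimum sanctum oil contains eugenol; Ocimum sanctum is also called tulsi."
hits = count_terms(sample, ["Ocimum sanctum", "eugenol", "Mentha piperita"])
print(hits)
```

Because multi-word names such as binomials are matched as whole n-grams, a two-word plant name is only counted when both words appear next to each other in the text.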

The current data upload consists of all 16 slices using the TDU list of plants.

The data is presented as a human-readable report and in TSV format for loading into spreadsheets or databases.

The ingestion will be done by NIPGR Florilegium interns.

The files are coded with the letter being searched (a-z), then with the slice being searched (0-f). So, a human-readable file (print master report) might have the name pmr_b4_2021-11-22.txt, indicating that the file was generated on November 22, 2021 and consists of plant names starting with the letter b on slice 4.
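As a sketch of how the naming scheme could be decoded during ingestion, the following Python snippet parses a report file name into its parts. The pattern is inferred from the single example name above; the helper `parse_report_name` is hypothetical, and the prefix or extension of other file types (such as the TSV files) may differ.

```python
import re
from datetime import date

# Pattern inferred from the one example name pmr_b4_2021-11-22.txt.
# Assumption: letter a-z, one hexadecimal slice digit 0-f, then an ISO date.
NAME_RE = re.compile(r"^pmr_([a-z])([0-9a-f])_(\d{4})-(\d{2})-(\d{2})\.txt$")

def parse_report_name(filename):
    """Split a print master report name into letter, slice number and date."""
    match = NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unrecognised report name: {filename!r}")
    letter, slice_hex, year, month, day = match.groups()
    return {
        "letter": letter,
        "slice": int(slice_hex, 16),  # slices 0-f map to 0..15
        "date": date(int(year), int(month), int(day)),
    }

info = parse_report_name("pmr_b4_2021-11-22.txt")
print(info)
```

Reading the slice as a hexadecimal digit gives the 16 slices mentioned earlier, numbered 0 to 15.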

GOALS

General goal: Mine the corpus to end up with a related table connecting:

Plant names

	1. Mentioned in the corpus?
		In how many papers?
			How many times per paper?
	2. Mentioned alone (with no other plants in the same document)?
	3. Commonly mentioned together?
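The plant-name questions above could be answered from the hit data along the following lines. This is a sketch only: the record layout `(article_id, plant_name, occurrences)` and the function `plant_statistics` are hypothetical, and the sample rows are invented rather than taken from the data.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical hit records: (article_id, plant_name, occurrences_in_article).
hits = [
    ("doc1", "ocimum sanctum", 3),
    ("doc1", "azadirachta indica", 1),
    ("doc2", "ocimum sanctum", 2),
    ("doc3", "azadirachta indica", 4),
    ("doc3", "ocimum sanctum", 1),
    ("doc3", "curcuma longa", 2),
]

def plant_statistics(hits):
    """Return per-paper counts, paper counts, 'alone' counts and pair counts."""
    per_doc = defaultdict(Counter)      # article -> plant -> times mentioned
    for article_id, plant, count in hits:
        per_doc[article_id][plant] += count

    papers = Counter()                  # plant -> number of papers mentioning it
    alone = Counter()                   # plant -> papers where it is the only plant
    pairs = Counter()                   # (plant_a, plant_b) -> papers with both
    for plants in per_doc.values():
        papers.update(plants.keys())
        if len(plants) == 1:
            alone.update(plants.keys())
        pairs.update(combinations(sorted(plants), 2))
    return per_doc, papers, alone, pairs

per_doc, papers, alone, pairs = plant_statistics(hits)
print(papers.most_common())
print(pairs.most_common(1))
```

Sorting the plant names before pairing means each pair is stored once, regardless of the order in which the names were found.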

Chemical Constituents

	1. Mentioned in the corpus?
		In how many papers?
			How many times per paper?
	2. What plants do they co-occur with?
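For the chemical questions above, co-occurrence with plants could be tabulated per article as in this sketch. The two input dictionaries, the function `chemical_plant_cooccurrence`, and the sample names are assumptions for illustration; co-occurrence here simply means that both names were found in the same article.

```python
from collections import defaultdict

# Hypothetical inputs: the plants and chemicals found in each article.
plants_by_article = {
    "doc1": {"ocimum sanctum"},
    "doc2": {"ocimum sanctum", "mentha piperita"},
    "doc3": {"mentha piperita"},
}
chemicals_by_article = {
    "doc1": {"eugenol"},
    "doc2": {"eugenol", "menthol"},
    "doc3": {"menthol"},
}

def chemical_plant_cooccurrence(plants_by_article, chemicals_by_article):
    """Map each chemical to {plant: number of articles mentioning both}."""
    table = defaultdict(lambda: defaultdict(int))
    for article_id, chemicals in chemicals_by_article.items():
        for chemical in chemicals:
            for plant in plants_by_article.get(article_id, ()):
                table[chemical][plant] += 1
    return {chem: dict(plants) for chem, plants in table.items()}

cooc = chemical_plant_cooccurrence(plants_by_article, chemicals_by_article)
print(cooc["eugenol"])
```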

Activities

1. Biological
2. Uses

	Agriculture, Medicine, Industrial, etc

Identifiers

1. Article IDs
2. Keys to the data will be the DOIs and MD5 hashes
3. Need to add Wikidata IDs wherever possible

PEOPLE INVOLVED

Project Owner: Gita Yadav Responsible for: Project Vision and Criteria
Project Advisor: Peter Murray-Rust Responsible for: Strategy
Program Manager: Manny Faria Arruda Responsibilities: Planning, Coordination, and Tracking
Intern A Responsibilities:
- Timely, complete and accurate logging of activities, methods, trial and error results, etc.
Intern B Responsibilities:
- Timely, complete and accurate logging of activities, methods, trial and error results, etc.

CHALLENGES

The current data does not have TF (term frequency) or IDF (inverse document frequency) metrics, but these will be added later to help distinguish weak from strong correlations.
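As an illustration of what such a metric could look like, here is a sketch of one common TF-IDF formulation: term frequency as the share of a document's terms, and inverse document frequency as the natural log of total documents over documents containing the term. The project has not stated which variant it will use, so the formula, the function `tf_idf`, and the numbers below are assumptions.

```python
import math

def tf_idf(term_count, doc_length, n_docs, docs_with_term):
    """One common TF-IDF variant: (count / doc length) * ln(N / df)."""
    tf = term_count / doc_length
    idf = math.log(n_docs / docs_with_term)
    return tf * idf

# Made-up numbers: a name found 3 times in a 100-term document,
# and present in 10 out of 1,000 documents.
score = tf_idf(term_count=3, doc_length=100, n_docs=1000, docs_with_term=10)
print(round(score, 4))
```

Under this variant a name that appears in every document scores zero, so very common terms contribute little to any correlation.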

TO LEARN MORE

To learn more, watch <a href="https://archive.org/details/multicasting?and%5B%5D=subject%3A%22TDM%20Today%22">The TDM Today Show</a>!

You may also be interested in The General Index and the Special Index to Species.

This data only represents the searches on the n-gram tables. If you wish to search the complete texts of all 57 million papers used in the corpus, please search OpenAlex directly.
