150 changes: 150 additions & 0 deletions test_run_ami-search-cooccur20190120
Documentation of a test run of ami-search-cooccur over the Ocimum sanctum test dataset.
Introduction
ContentMine is an open-source software suite for text and data mining (content mining), especially of scientific journals. It is intended to support literature-survey-based research with high throughput and accuracy. This document is a tutorial for the ContentMine tools getpapers, norma and ami.
Getpapers - It fetches scientific papers (full-text PDF or XML) along with metadata and supplementary information. It performs the first step of content mining: acquiring scientific papers for reading or bibliometrics.
Norma - A tool for processing the output of getpapers into normalized, tagged XHTML or scholarly HTML, which is then used as input for the ami tools/plugins.
AMI tools/plugins - These tools or plugins perform the mining and analysis. They search and index structured documents on a high-throughput basis.

Installation

The tools run on both Windows and Linux. Installation steps for each platform are given separately below.

Getpapers installation on Windows.
step 1: Installing nvm-windows.
Go to the downloads page - https://github.com/coreybutler/nvm-windows/releases and download nvm-setup.zip for the latest version. Unzip the downloaded file and run the included installer.

step 2: Run the following commands in the Windows command prompt.
> nvm install 7
> nvm use 7

step 3: Installing a node tool - getpapers.
> npm install --global getpapers

step 4: Add the directory containing the installed getpapers tool to the PATH environment variable.
For example, on my laptop it is installed in the following directory.
C:\Users\hadoop_pc\AppData\Roaming\npm

Getpapers installation on Linux.

step 1: Installing nvm
> curl -o- https://raw.githubusercontent.com/creationix/nvm/v0.30.1/install.sh | bash
Make sure curl is installed (sudo apt-get install curl).
or
> wget -qO- https://raw.githubusercontent.com/creationix/nvm/v0.30.1/install.sh | bash

step 2: Installing node
Type the following commands in your terminal.
> nvm install 7
> nvm use 7
> nvm alias default 7

step 3: Installing a node tool - getpapers.

> npm install --global getpapers

AMI tools installation on Windows.

step 1: Make your own installation area (a directory to contain the package).
> mkdir AMI

step 2: Get the ami software package into the directory.
Download link - https://github.com/petermr/ami-jars

step 3: Set an environment variable to access the bin directory containing the ami plugins/tools.
To set an environment variable in Windows 8:
- From the desktop, right-click the Computer icon.
- Choose System from the context menu.
- Click the Advanced system settings link.
- Click Environment Variables, then click New under User variables.
- Set the variable name (environment variable name) and value (absolute path of the bin directory).


AMI tools installation on Linux.

step 1: Make your own installation area (directory containing the package).
> mkdir AMI

step 2: Get the ami software package (clone ami repository into your area or directory).
> git clone https://github.com/petermr/ami-jars.git

step 3: Set an environment variable to access the ami plugins (tools).
Go to the ../ami-jar/ami20190115/bin/ folder. It contains all the ami tools.
Add the absolute path of this bin directory to the PATH environment variable.

> export PATH=$PATH:<absolute_path>/ami-jar/ami20190115/bin/

Check that the bin directory is now on the path.
> echo $PATH

step 4: Check the installation.
> ami-pdf
If the path was exported correctly, this prints the tool's help information.
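
The same check can be made for several tools at once. The following is a minimal sketch, not part of the ContentMine suite: check_tools is a hypothetical helper name, and the tool list is simply the set of commands used in this tutorial. It relies only on the standard shell built-in command -v.

```shell
# Report which of the given commands can be found on the current PATH.
# check_tools is a hypothetical helper, not a ContentMine command.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  # Return non-zero when at least one tool was not found.
  return "$missing"
}

# Tool names taken from this tutorial; adjust the list to your installation.
check_tools getpapers norma ami-search-cooccur || echo "Fix PATH before continuing."
```

A "missing" line usually means the export PATH step was skipped or was made in a different terminal session.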

Steps for running getpapers and the ami plugins/tools.
step 1: Generate a CProject.
> getpapers -q <query_name> -o <project_folder>

<query_name> - the search query (generally a scientific name; such names are included as dictionaries in the software suite).
<project_folder> - name of the project folder (CProject). It contains the downloaded papers in PDF or XML format.

step 2: Add a scholarly.html file to each folder of the CProject.
> norma --project <project_folder> -i fulltext.xml -o scholarly.html --transform nlm2html

step 3: Run the ami plugins/tools.
> ami-search-cooccur --project <project_folder>/ <plugin_options>
<plugin_options> - space-separated search options, e.g. country species gene plantparts drugs monoterpene.
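
The three steps can be chained in a small wrapper. This is only a sketch under stated assumptions: run and mine are hypothetical helper names, AMI_RUN is a made-up switch, the -x flag (download XML) is taken from the test run later in this document, and the ami-search-cooccur call uses the form shown in that test run (project folder without --project). By default the commands are only printed, so the wrapper can be tried before anything is downloaded.

```shell
# Print each command; execute it only when AMI_RUN=1 is set in the environment.
# Note: the printed line is for reading only and does not preserve quoting.
run() {
  echo "+ $*"
  if [ "${AMI_RUN:-0}" = "1" ]; then
    "$@"
  fi
}

# Chain the three steps for one query and one project folder.
# Arguments after the first two are the search options for ami-search-cooccur.
mine() {
  query=$1
  project=$2
  shift 2
  run getpapers -q "$query" -o "$project" -x
  run norma --project "$project" -i fulltext.xml -o scholarly.html --transform nlm2html
  run ami-search-cooccur "$project"/ "$@"
}

# Dry run: shows the three commands without executing them.
mine "Ocimum sanctum" osanctum20190121 country species
```

Setting AMI_RUN=1 before calling mine executes the commands instead of only printing them.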

Test run over the Ocimum sanctum dataset.

Here we run six getpapers queries for the Ocimum sanctum dataset. Each query downloads at most 100 papers. The number of downloaded papers can be changed with the -k option on the command line.

> getpapers -q "Ocimum sanctum" -o osanctum20190121 -x -p -k 100
> getpapers -q "ocimum" -o ocimum20190121 -x -p -k 100
> getpapers -q "sanctum" -o sanctum20190121 -x -p -k 100
> getpapers -q "ocimum AND sanctum" -o ocimumandsanctum20190121 -x -p -k 100
> getpapers -q "ocimum sanctum" -o ocimum_sanctum20190121 -x -p -k 100
> getpapers -q "((Ocimum sanctum) OR (Ocimum tenuiflorum) OR (thulasi) OR (tulasi) OR (tulsi) OR (holy basil))" -o ocimumsanctumadvancedsearch20190121 -x -p -k 100
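
Because the six commands differ only in the query and the output folder, they can also be generated from a small table. The sketch below is an illustration, not part of getpapers: gen_getpapers is a hypothetical helper, and the "query|folder" table format is an assumption made here. It only prints command lines; nothing is downloaded unless the output is piped to sh.

```shell
# Emit one getpapers command line per "query|output folder" pair read from stdin.
# The first argument is the value for -k (maximum number of papers per query).
gen_getpapers() {
  limit=$1
  while IFS='|' read -r query outdir; do
    # Skip empty lines in the table.
    [ -n "$query" ] || continue
    printf 'getpapers -q "%s" -o %s -x -p -k %s\n' "$query" "$outdir" "$limit"
  done
}

# Queries and folders copied from the test run above.
gen_getpapers 100 <<'EOF'
Ocimum sanctum|osanctum20190121
ocimum|ocimum20190121
sanctum|sanctum20190121
ocimum AND sanctum|ocimumandsanctum20190121
ocimum sanctum|ocimum_sanctum20190121
((Ocimum sanctum) OR (Ocimum tenuiflorum) OR (thulasi) OR (tulasi) OR (tulsi) OR (holy basil))|ocimumsanctumadvancedsearch20190121
EOF
```

Changing the single argument (100) changes the -k value for all six queries at once.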


Normalize the downloaded papers to produce scholarly HTML files.

> norma --project osanctum20190121 -i fulltext.xml -o scholarly.html --transform nlm2html
> norma --project ocimum20190121 -i fulltext.xml -o scholarly.html --transform nlm2html
> norma --project sanctum20190121 -i fulltext.xml -o scholarly.html --transform nlm2html
> norma --project ocimumandsanctum20190121 -i fulltext.xml -o scholarly.html --transform nlm2html
> norma --project ocimum_sanctum20190121 -i fulltext.xml -o scholarly.html --transform nlm2html
> norma --project ocimumsanctumadvancedsearch20190121 -i fulltext.xml -o scholarly.html --transform nlm2html
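
After norma has run, it can be useful to confirm how many papers actually received a scholarly.html file. The sketch below assumes the usual CProject layout, in which each paper sits in its own sub-folder holding fulltext.xml; that layout and the helper name norma_coverage are assumptions of this example, not something stated by the tools' output here.

```shell
# Count sub-folders holding fulltext.xml, and those that also hold scholarly.html.
# Assumes one sub-folder per paper inside the project folder.
norma_coverage() {
  project=$1
  total=0
  done_count=0
  for tree in "$project"/*/; do
    # Only folders with a downloaded XML full text can be normalized.
    [ -f "${tree}fulltext.xml" ] || continue
    total=$((total + 1))
    if [ -f "${tree}scholarly.html" ]; then
      done_count=$((done_count + 1))
    fi
  done
  echo "$project: $done_count of $total papers normalized"
}

# Report on each project folder from the test run that exists locally.
for p in osanctum20190121 ocimum20190121 sanctum20190121 \
         ocimumandsanctum20190121 ocimum_sanctum20190121 \
         ocimumsanctumadvancedsearch20190121; do
  if [ -d "$p" ]; then
    norma_coverage "$p"
  fi
done
```

A count lower than the total suggests that norma failed on some papers or that fulltext.xml was not in the expected format.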

Run the ami plugin to produce the search results.

> ami-search-cooccur osanctum20190121/ country species drugs gene plantparts monoterpene
> ami-search-cooccur ocimum20190121/ country species drugs gene plantparts monoterpene
> ami-search-cooccur sanctum20190121/ country species drugs gene plantparts monoterpene
> ami-search-cooccur ocimum_sanctum20190121/ country species drugs gene plantparts monoterpene
> ami-search-cooccur ocimumandsanctum20190121/ country species drugs gene plantparts monoterpene
> ami-search-cooccur ocimumsanctumadvancedsearch20190121/ country species drugs gene plantparts monoterpene

Results