adnan15110/AricleReferenceGraph


Project Goal:

I need a Python script that starts crawling information from the following page: http://dl.acm.org/citation.cfm?id=2488205&CFID=875547729&CFTOKEN=93609255&preflayout=flat

In particular the sections "References" and "Cited By".

The idea is to start with a document (page / scientific paper) and build up a network graph connecting this paper to its references and to the papers that cite it. This process should then be repeated for all discovered papers up to a crawl depth of 6.
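The depth-limited crawl described above can be sketched as a breadth-first traversal that grows a NetworkX `DiGraph`. Here `fetch_paper` is a hypothetical stand-in for the real Scrapy request/parse cycle (not part of the original spec); it is assumed to return a dict with a `title` plus `references` and `cited_by` lists of paper ids:

```python
from collections import deque

import networkx as nx

MAX_DEPTH = 6  # crawl depth required by the spec


def build_citation_graph(start_id, fetch_paper, max_depth=MAX_DEPTH):
    """Breadth-first crawl starting from one paper id.

    fetch_paper(paper_id) is a hypothetical callback standing in for the
    Scrapy scraping step; it must return a dict with keys 'title',
    'references' and 'cited_by' (lists of paper ids).
    """
    graph = nx.DiGraph()
    queue = deque([(start_id, 0)])
    seen = {start_id}
    while queue:
        paper_id, depth = queue.popleft()
        record = fetch_paper(paper_id)
        graph.add_node(paper_id, title=record.get("title", ""))
        if depth >= max_depth:
            continue  # reached the crawl depth: keep the node, stop expanding
        # Edge direction convention here: citing paper -> cited paper.
        for ref in record.get("references", []):
            graph.add_edge(paper_id, ref)
            if ref not in seen:
                seen.add(ref)
                queue.append((ref, depth + 1))
        for citer in record.get("cited_by", []):
            graph.add_edge(citer, paper_id)
            if citer not in seen:
                seen.add(citer)
                queue.append((citer, depth + 1))
    return graph
```

The `seen` set prevents re-queuing papers that appear both as a reference and as a citer, so each paper is fetched at most once.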

You can see a picture in which I tried to visualize the crawled documents.

The crawled information for each element must include at least the name, title, date, journal, and abstract.

Example starting point would be http://dl.acm.org/citation.cfm?id=2488205&CFID=875547729&CFTOKEN=93609255&preflayout=flat

If, for example, an element in the references section has no link, only text, its information should still be scraped, and the crawl ends at that point.
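Distinguishing linked from text-only reference entries might look like the following. The markup is an assumption (each reference as an `<li>`, linked entries containing an `<a href="citation.cfm?id=...">` tag); the real ACM page structure may differ, so the patterns would need adjusting:

```python
import re

# Assumed markup: one <li> per reference; linked references contain an
# <a href="citation.cfm?id=..."> anchor, text-only references do not.
LI_RE = re.compile(r"<li>(.*?)</li>", re.S)
LINK_RE = re.compile(r'<a href="citation\.cfm\?id=(\d+)"[^>]*>(.*?)</a>', re.S)


def parse_references(html):
    """Return a list of (paper_id_or_None, plain_text) tuples.

    Entries with paper_id None are leaves: their text is still recorded,
    but the crawl does not continue from them.
    """
    entries = []
    for li in LI_RE.findall(html):
        match = LINK_RE.search(li)
        if match:
            entries.append((match.group(1), match.group(2).strip()))
        else:
            text = re.sub(r"<[^>]+>", "", li).strip()  # strip any stray tags
            entries.append((None, text))
    return entries
```

In the crawl, entries with a `None` id would be added to the graph as nodes but never queued for further fetching.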

The graph should be built using the NetworkX library for Python. For the scraping, the Scrapy library should be used.

A visualization of the graph is not necessary. Only the crawling part and the network build part should be developed.

Libraries Used:

1. Scrapy
2. NetworkX

About

Build the "References" and "Cited By" graph.
