ComLake/comlake.Crawler

Crawler

Website crawler and data analysis in Java

Description

When defining business objectives, collecting the data needed to meet them, and analysing that data correctly, a data lake is an effective solution: it can handle virtually any type of data at very large scale and store all of an organization's raw data for future work. As part of building a data lake management system, this data acquisition and integration tool acts as a website crawler that finds and regularly ingests data through APIs and an automated testing framework. The application then converts the extracted data to a standard format and sends it to the web dashboard interface and DataLakeAPI, supplying resources for DataLakeCore and the other parts of DataLakeServices.
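
As a rough illustration of the crawl step described above, the sketch below pulls absolute HTTP(S) links out of an HTML string using only the Java standard library. The class name `LinkExtractor`, the regular expression, and the example URLs are assumptions made for illustration and are not taken from this repository; the actual tool may drive a browser through an automation framework instead.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of this repository.
class LinkExtractor {

    // Naive pattern for href="..." or href='...'; a real crawler should use an HTML parser.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    // Returns the distinct http/https links found in html, resolved against baseUrl,
    // in order of first appearance and with any #fragment removed.
    public static List<String> extractLinks(String baseUrl, String html) {
        URI base = URI.create(baseUrl);
        Set<String> links = new LinkedHashSet<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String raw = m.group(1).trim();
            int hash = raw.indexOf('#');
            if (hash >= 0) {
                raw = raw.substring(0, hash); // drop the fragment part
            }
            if (raw.isEmpty()) {
                continue; // pure in-page anchor such as href="#top"
            }
            try {
                URI resolved = base.resolve(raw);
                String scheme = resolved.getScheme();
                if ("http".equals(scheme) || "https".equals(scheme)) {
                    links.add(resolved.toString());
                }
            } catch (IllegalArgumentException e) {
                // Malformed href (for example, one containing spaces): skip it.
            }
        }
        return new ArrayList<>(links);
    }

    public static void main(String[] args) {
        String html = "<a href=\"/data/a.csv\">A</a> <a href='https://example.org/b.png'>B</a>";
        System.out.println(extractLinks("https://example.com/index.html", html));
    }
}
```

A `LinkedHashSet` is used so that duplicate links are dropped while the order of discovery is kept, which makes repeated crawls easier to compare.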

Subject Aim

Development of Data Acquisition and Integration Tool for Data Lake

Features and details:

  • Crawling data from websites
  • Analysing data and setting metadata for the web dashboard
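
The metadata feature could be sketched roughly as follows. The field names (`source`, `fileName`, `contentType`, `sizeBytes`, `crawledAt`), the class name `MetadataBuilder`, and the extension table are hypothetical; this README does not describe the schema that the dashboard or DataLakeAPI actually expects.

```java
import java.net.URI;
import java.time.Instant;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration; not part of this repository.
class MetadataBuilder {

    // Small, assumed extension-to-MIME table; extend as needed.
    private static final Map<String, String> TYPES = new HashMap<>();
    static {
        TYPES.put("csv", "text/csv");
        TYPES.put("json", "application/json");
        TYPES.put("png", "image/png");
        TYPES.put("jpg", "image/jpeg");
        TYPES.put("dcm", "application/dicom");
    }

    // Returns the last path segment of the URL, ignoring query and fragment.
    static String fileNameOf(String url) {
        String path = URI.create(url).getPath();
        if (path == null || path.isEmpty()) {
            return "";
        }
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Builds a flat, ordered metadata record for one crawled item.
    public static Map<String, Object> describe(String url, long sizeBytes, Instant crawledAt) {
        String name = fileNameOf(url);
        int dot = name.lastIndexOf('.');
        String ext = dot >= 0 ? name.substring(dot + 1).toLowerCase(Locale.ROOT) : "";

        Map<String, Object> meta = new LinkedHashMap<>();
        meta.put("source", url);
        meta.put("fileName", name);
        // Unknown extensions fall back to a generic binary type.
        meta.put("contentType", TYPES.getOrDefault(ext, "application/octet-stream"));
        meta.put("sizeBytes", sizeBytes);
        meta.put("crawledAt", crawledAt.toString()); // ISO-8601 timestamp in UTC
        return meta;
    }

    public static void main(String[] args) {
        System.out.println(describe("https://example.com/scans/chest.dcm", 2048L, Instant.now()));
    }
}
```

Keeping the record as a flat map makes it straightforward to serialize to JSON later, whichever library the rest of the system uses.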

Target users:

For everyone

Expected outcomes:

At least 500 GB of:

  • CT and X-ray scans of the human body
  • Environmental data: weather, ...
  • Remote sensing photos

Download and set up the JDK and Chrome

Running

  • Run tests

Methods and techniques used

  • Java - main programming language
  • UML, Object-oriented Analysis and Design
  • HTML

Contributing

This is just a small part of the larger ComLake project.

Authors

  • Nguyen An Thiet - USTHBI8-174 - JeffKi11er - <Fresher Java, Android>

Milestones
