1 change: 1 addition & 0 deletions .gitignore
@@ -0,0 +1 @@
*.pyc
139 changes: 139 additions & 0 deletions FlavorADX_Surya.md
@@ -0,0 +1,139 @@
# This is a mere flavor of ADX


Assume our data directory is conveniently mounted at the root, as `/data`.

## Crawler

The crawler can either run from your favorite shell as a standalone program or run in the background as a daemon, patiently waiting and checking for new files to index and analyze.

To run the daemon on our data directory:

    adxd /data --parser=Parse_Filterbank.py

    adxd /data --frequency * * * 59

    adxd /data --db=localhost 34001

The `parser` argument is nothing but a class definition, which is described below.

The `frequency` argument works like a crontab schedule: it decides how often the daemon checks for new files.

The `db` argument provides the IP and port number at which the database is hosted. It is ignored if the user opts for astropy tables instead.

To run the crawler from your shell:
`$ adx /data --parser=Parse_Filterbank.py`

As mentioned above, `parser` will be discussed here.


## Parser
There will be a main parser class which the user doesn't touch. All the information is added to the parser class through an ADX `ParserType` instance. For instance, consider this example:

    ParserType fil
Creates a `ParserType` instance.

    fil.AddExtensionRule('fil')
Method of `ParserType` which adds a rule for the extension.

    fil.AddFilenameRegexRule('[JB]+[0-9]{4}[+-][0-9]{2,4}_[0-9]{5}_')
Method which adds a regex rule that filenames must pass before further processing. This regex is the source J name or B name followed by the MJD.
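
As a quick check of the pattern above, here is a minimal sketch that runs it against a few filenames. The filenames are made up for illustration, and the use of `re.search` (match anywhere in the name) is an assumption; ADX may anchor the pattern differently.

```python
import re

# The pattern from the rule above: J/B source name, then a 5-digit MJD,
# each followed by an underscore.
FILENAME_RULE = re.compile(r'[JB]+[0-9]{4}[+-][0-9]{2,4}_[0-9]{5}_')


def passes_regex_rule(filename):
    """Return True if the filename contains a source name followed by an MJD."""
    return FILENAME_RULE.search(filename) is not None


# Hypothetical filenames, for illustration only.
examples = [
    'J0534+2200_58000_kur.fil',
    'B1937+21_57000_raw.fil',
    'notes_58000_.fil',
]
results = {name: passes_regex_rule(name) for name in examples}
```

The first two names pass the rule; the third has an MJD-like number but no source name, so it is rejected.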

    fil.AddFilenameRule('!kur')
Method which adds a filename rule.
A rule starting with `!` implies logical NOT.

    fil.AddFilenameRule('kur')
Method which adds a filename rule; this one means that only kurtosis files are accepted.
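
One way the two filename rules above could be evaluated is sketched below. It assumes a rule is a plain substring test on the filename, which the text does not actually specify; `filename_rule_matches` is a hypothetical helper, not part of ADX.

```python
def filename_rule_matches(rule, filename):
    """Apply one filename rule; a leading '!' negates the substring test.

    Assumption: rules are plain substrings, not globs or regexes.
    """
    if rule.startswith('!'):
        return rule[1:] not in filename
    return rule in filename


# Hypothetical filenames, for illustration only.
accepts_kurtosis = filename_rule_matches('kur', 'J0534+2200_58000_kur.fil')
rejects_kurtosis = filename_rule_matches('!kur', 'J0534+2200_58000_kur.fil')
accepts_raw = filename_rule_matches('!kur', 'J0534+2200_58000_raw.fil')
```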

    fil.AddSignature('^BBX')
Method which checks for a signature in the file. N.B. there is no way to do this without opening the file.
`^` means the beginning of the file. `$` means the end of the file. This is similar to regex matching.
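
A signature check of this kind might look like the sketch below. It assumes the signature is a literal byte string with `^` or `$` as the only special characters; `has_signature` is a hypothetical helper and the demo file is generated on the fly.

```python
import os
import tempfile


def has_signature(path, signature):
    """Check a literal signature at the start ('^...') or end ('...$') of a file."""
    with open(path, 'rb') as handle:
        if signature.startswith('^'):
            magic = signature[1:].encode('ascii')
            return handle.read(len(magic)) == magic
        if signature.endswith('$'):
            magic = signature[:-1].encode('ascii')
            handle.seek(0, os.SEEK_END)
            size = handle.tell()
            if size < len(magic):
                return False
            # Read only the last len(magic) bytes.
            handle.seek(size - len(magic))
            return handle.read() == magic
    raise ValueError("signature must start with '^' or end with '$'")


# Demo on a throwaway file whose contents are made up for illustration.
with tempfile.NamedTemporaryFile(suffix='.fil', delete=False) as tmp:
    tmp.write(b'BBX' + b'\x00' * 16 + b'END')
    demo_path = tmp.name

starts_ok = has_signature(demo_path, '^BBX')
ends_ok = has_signature(demo_path, 'END$')
wrong = has_signature(demo_path, '^XYZ')
os.remove(demo_path)
```

Only the first or last few bytes are read, so the cost of the check does not grow with file size.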

    fil.Reader(MyFilReader)
Binds the class interface to Filterbank files with the parser. If this option is provided and there are metrics in *this* `ParserType` instance which take the class interface as an argument, the reader class (interface class) is instantiated once and that single instance is passed as the argument to one (or more) such metrics. *This is further elucidated in an example.*

    fil.AddFloat('mean')
Adds a data field (which is to be tabulated) with column name 'mean'. Since the mean is a fairly common statistic, `Parser` uses its own definition of the mean computation; the user doesn't have to provide an implementation.

    fil.AddFloat('smean', GiveSMean)
Similar to the above, but the column name is 'smean', which stands for special mean, and `GiveSMean` is a user-provided callable class or function.
If the user has provided an interface class using the `Reader` method, `GiveSMean` should take that class as an argument.

    fil.AddString('polarisation', GivePol)
This method adds a string data field with name 'polarisation'. As above, `GivePol` is a callable class or function which can take either the filename or the `Reader` class as its argument. Matching each function to the right kind of argument is the user's responsibility.

    fil.AddDateTime('ctime')
There will be some fields which ADX captures by default. Among them are the `ctime, atime, mtime` of files, which prove helpful in keeping track of files.
....

There will/can be many more options, which are left out for brevity's sake. Now, here comes the first act of magic:

    Parser.AddParserType(fil)
This method takes everything it needs from `fil` and merges it into the `Parser` class.
The `ParserType` class may be heavy and over-engineered, but ONLY the parts that are actually relevant (as decided by the `Add*` methods the user calls) are injected into the `Parser` class.
You can add multiple `ParserType`s to the same `Parser` class, and the `Parser` class knows what to do with each of them.

This approach not only gives the user a set of tools but also constrains the structure of the code, and thereby increases regularity, which helps us developers write smooth code. Instead of telling the user "look, here is a method `def __call__(self)`; define it to your liking and make it return a dictionary of key-value pairs", we are telling the user "you know what you want? OK, call the methods we provide in `ParserType` to get it. We made sure that whatever you want, the methods of the `ParserType` class can get it for you. Once you're done, pass that class to the main `Parser` and chill."

This approach also ensures that every file is opened only once, since every metric is computed from the same file handle.
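
The "opened only once" property can be illustrated with a toy stand-in. Everything here (`CountingReader`, `SketchParserType`, `give_mean`, `give_max`, the `parse` method) is hypothetical and only mimics the shape of the API described above; unlike the real `AddFloat`, this sketch always requires a user function.

```python
class CountingReader(object):
    """Hypothetical reader that counts how many times it is constructed."""
    opened = 0

    def __init__(self, filepath):
        CountingReader.opened += 1
        self.filepath = filepath
        self.data = [1.0, 2.0, 3.0, 6.0]  # stand-in for the file contents


class SketchParserType(object):
    """Toy stand-in for ParserType: collects (column, function) pairs."""

    def __init__(self):
        self.reader = None
        self.fields = []

    def Reader(self, reader_class):
        self.reader = reader_class

    def AddFloat(self, name, func):
        self.fields.append((name, func))

    def parse(self, filepath):
        # One reader per file; every metric receives the same instance.
        handle = self.reader(filepath)
        return {name: float(func(handle)) for name, func in self.fields}


def give_mean(reader):
    return sum(reader.data) / len(reader.data)


def give_max(reader):
    return max(reader.data)


pt = SketchParserType()
pt.Reader(CountingReader)
pt.AddFloat('mean', give_mean)
pt.AddFloat('max', give_max)
row = pt.parse('/data/example.fil')  # the path is never actually opened here
```

Two metrics are computed, yet the reader is constructed a single time, which is the behaviour the design aims for.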

### ParserType example

Let us crawl through a directory containing pulsar integrated profiles which have the .prof extension and whose first line (and only the first line) is a header with

`# MJD, Fraction-of-day, Number of periods in integration, period, DM, num-bins, polarisation, observatory-code`

followed by num-bins numbers, which are the actual data.
This is how it would look:

    ParserType prof
    prof.AddExtensionRule('prof')
    prof.Reader(MyProfReader)
    prof.AddFloat('SN',GiveSN)
    prof.AddFloat('DM',GiveDM)
    prof.AddFloat('MJD',GiveMJD)
    prof.AddString('PSR', 10, GiveJName)
    Parser.AddParserType(prof)

The functions used above are defined in the same place where `ParserType` is defined and they all have the following signature:

    class MyProfReader(object):
        def __init__(self, filepath):
            # initialization
            self.dm = ...
            self.data = ...

    def GiveDM(x):
        # return the DM held by the reader
        return x.dm

    def GiveSN(x):
        # SN computation using x.data
        pass
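
A fuller, runnable version of the reader might look like the sketch below. It rests on several assumptions that the text does not confirm: the header fields are comma-separated, the data follow as one number per line, the MJD is the sum of the first two header fields, and the S/N formula is only a placeholder. The demo file contents are made up.

```python
import os
import tempfile


class MyProfReader(object):
    """Reads a .prof file: one '#' header line, then one sample per line."""

    def __init__(self, filepath):
        with open(filepath) as handle:
            header = handle.readline().lstrip('#').split(',')
            self.data = [float(line) for line in handle if line.strip()]
        fields = [item.strip() for item in header]
        # Assumption: MJD (integer day) plus fraction of day.
        self.mjd = float(fields[0]) + float(fields[1])
        self.dm = float(fields[4])
        self.nbins = int(fields[5])


def GiveDM(x):
    return x.dm


def GiveMJD(x):
    return x.mjd


def GiveSN(x):
    # Placeholder S/N: peak above the median, in units of the standard deviation.
    ordered = sorted(x.data)
    median = ordered[len(ordered) // 2]
    mean = sum(x.data) / len(x.data)
    std = (sum((v - mean) ** 2 for v in x.data) / len(x.data)) ** 0.5
    return (max(x.data) - median) / std if std > 0 else 0.0


# Demo on a throwaway profile with invented values.
with tempfile.NamedTemporaryFile('w', suffix='.prof', delete=False) as tmp:
    tmp.write('# 58000, 0.25, 100, 0.0333, 56.7, 8, I, 7\n')
    tmp.write('\n'.join(['0', '0', '1', '9', '1', '0', '0', '0']) + '\n')
    demo_path = tmp.name

reader = MyProfReader(demo_path)
os.remove(demo_path)
```

Each `Give*` function receives the same `reader` instance, so the file is parsed only in `__init__`.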

## Crawler

Coming back to the crawler: our `Parser` class, loaded with all the rules while still hidden from the user, can safely interact with the `Crawler` class in a pre-determined fashion, just the way we developers want.

    Parser.GetExtensions()
This method returns the extensions which were added to the `Parser` class.

Second act of magic:
The `Parser` class, into which all the `ParserType`s are injected, creates a regex rule for each `ParserType` class. This rule is generated from the filename, pathname and extension rules, and is pretty robust. It is this rule which is passed on to the `Crawler` at the start of crawling and used to figure out what to do with each file. On a successful match, `Parser` internally calls and computes all the metrics the user asked for (the user is not calling a function; s/he is merely specifying what s/he wants).
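
How such a combined rule might be built is sketched below, under the assumption that only an extension rule and an optional filename regex are involved (pathname rules are left out). `build_rule_regex`, `rules` and `dispatch` are hypothetical names, not ADX API.

```python
import re


def build_rule_regex(extension, name_pattern=None):
    """Combine an extension rule and an optional filename regex into one pattern."""
    if name_pattern is None:
        middle = '.*'
    else:
        middle = '.*(?:' + name_pattern + ').*'
    return re.compile('^' + middle + r'\.' + re.escape(extension) + '$')


# One compiled rule per parser type, keyed by a hypothetical type name.
rules = {
    'fil': build_rule_regex('fil', r'[JB]+[0-9]{4}[+-][0-9]{2,4}_[0-9]{5}_'),
    'prof': build_rule_regex('prof'),
}


def dispatch(filename):
    """Return the name of the first parser type whose rule matches, else None."""
    for name, rule in sorted(rules.items()):
        if rule.match(filename):
            return name
    return None
```

With this shape the crawler needs only one regex test per parser type to decide whether a file is worth opening at all.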

## Logger
Third and final act of magic:

The `Parser` class again comes to the rescue here and tells us the schema of each table. `Crawler` and `Logger` can talk to each other internally.
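
To make the schema idea concrete, here is a minimal sketch in which the `Add*` calls themselves record the column names and types. `SchemaSketch` and its `schema` method are hypothetical; the real `Parser` may represent schemas differently.

```python
import datetime


class SchemaSketch(object):
    """Toy record of Add* calls, from which a table schema can be read off."""

    def __init__(self, name):
        self.name = name
        self.columns = []  # (column name, Python type), in insertion order

    def AddFloat(self, column):
        self.columns.append((column, float))

    def AddString(self, column):
        self.columns.append((column, str))

    def AddDateTime(self, column):
        self.columns.append((column, datetime.datetime))

    def schema(self):
        # One table per parser type, named after it.
        return {self.name: list(self.columns)}


prof = SchemaSketch('prof')
prof.AddFloat('SN')
prof.AddFloat('DM')
prof.AddString('PSR')
prof.AddDateTime('ctime')
table_schema = prof.schema()
```

Because the schema falls out of the same calls that define the metrics, the logger never has to be told about columns separately.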

## Final comments

We can really brainstorm and add contrived `ParserType` methods such as `AddOnFile`, which computes a statistic (not recorded by Logger) and performs logic based on that statistic.

    ParserType prof
    prof.AddOnFile(AlertIfNoDetection)

In the running example, `AddOnFile` binds the function or callable class `AlertIfNoDetection`, which takes `MyProfReader` as its argument and runs some statistical test to check for a detection; if it finds no detection, it shouts.
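
A callable of that kind could be as small as the sketch below. The statistical test (peak height in units of the standard deviation) and the default threshold of 5 are assumptions made for illustration, and `FakeProfReader` is a hypothetical stand-in that only provides `.data`.

```python
class FakeProfReader(object):
    """Hypothetical stand-in for MyProfReader; only `.data` is needed here."""

    def __init__(self, data):
        self.data = data


def AlertIfNoDetection(reader, threshold=5.0):
    """Return True (and shout) when the peak is below `threshold` sigma."""
    mean = sum(reader.data) / len(reader.data)
    std = (sum((v - mean) ** 2 for v in reader.data) / len(reader.data)) ** 0.5
    significance = (max(reader.data) - mean) / std if std > 0 else 0.0
    if significance < threshold:
        print('ALERT: no detection (peak at %.1f sigma)' % significance)
        return True
    return False


# Invented profiles: one with a clear pulse, one flat, one noise-like.
pulse_alert = AlertIfNoDetection(FakeProfReader([0.0] * 63 + [10.0]))
flat_alert = AlertIfNoDetection(FakeProfReader([1.0] * 64))
noise_alert = AlertIfNoDetection(FakeProfReader([0.0, 1.0] * 32))
```

Nothing is returned to the logger here; the function acts purely through its side effect, which matches the "not recorded by Logger" description.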

Not only that: this approach hides the `Crawler` and `Logger` classes from the user. The only exception is `Parser`, in which only one method is exposed, the `AddParserType` method. The definition of `Parser` will happen in the main body, and `adx --parser=MyParser.py` uses it without defining it.

The true power (in my opinion) comes from creating tools which the user can just call and use to their liking, while the rest is taken care of by ADX.
4 changes: 4 additions & 0 deletions TODO
@@ -0,0 +1,4 @@
-


- PEP8
64 changes: 64 additions & 0 deletions adx/Adx.py
@@ -0,0 +1,64 @@
'''
ADX main class definition
'''
# ADX stuff
from parser import Parser
from crawler import Crawler
# other stuff
import multiprocessing as mp

__all__ = ['Adx']

class Adx(object):
    def __init__(self, cdir, pars, logg,
                 daemon=True,
                 verbose=0,
                 numthreads=1,
                 debug=False):
        '''
        Arguments
        ---------

        cdir : str, or list of str
            Directory or list of directories to crawl
        pars : instance of Parser or list of ParserTypes
        logg : any instance of logging

        '''
        # crawl setup
        self.crawler = Crawler(cdir)
        # parse setup
        # isinstance takes the object first, then the class
        if isinstance(pars, Parser):
            self.parsers = pars
        elif isinstance(pars, list):
            self.parsers = Parser()
            for pt in pars:
                self.parsers.AddParserType(pt)
        # logger setup
        self.logger = logg
        # misc options
        self.daemon = daemon
        self.debug = debug
        self.verbose = 3 if debug else verbose
        # max number of threads is number of CPUs
        self.numthreads = min(numthreads, mp.cpu_count())

    def __step(self, currdir, curr):
        # moving mountains
        # one rock at a time
        # to ensure grouped and make use of InsertMany
        rdict = self.parsers.parseAction(curr)
        for k, v in rdict.items():
            if len(v) == 0:
                continue
            self.logger.InsertMany(k, v)

    def __setup(self):
        # this isn't necessary anymore
        # self.logger.getSchema( self.parser.putSchema() )
        pass

    def walk(self):
        for pdir, curr in self.crawler:
            self.__step(pdir, curr)

File renamed without changes.
127 changes: 127 additions & 0 deletions adx/adx
@@ -0,0 +1,127 @@
#!/usr/bin/env python2.7

_VERSION_ = "0.0.1"
_PERSISTANCE_FILE_NAME_ = "/tmp/adx_persistant_logger"
_PERSISTANCE_KEY_ = 'ala'

import argparse
import subprocess as sp
import sys
import logging

# log setup
logging.basicConfig(format='[%(levelname)s] %(message)s')
mylog = logging.getLogger()



def parseargs():
    adxargparser = argparse.ArgumentParser(prog="ADX", description="Command line interface to ADX", epilog="ADX v"+_VERSION_)
    subs = adxargparser.add_subparsers(help="Commands", dest='cmd')
    ## logout
    lgroup = subs.add_parser("logout", help="Closes everything and logs out.")
    ## schema
    sgroup = subs.add_parser("schema", help="Prints schema.")
    ## refresh
    rgroup = subs.add_parser("refresh", help="Refreshes database.")
    ## update
    ugroup = subs.add_parser("update", help="Updates database.")
    ## crawl
    crawlgroup = subs.add_parser("crawl", help="Crawl action.")
    addarg = crawlgroup.add_argument
    # short and long option strings must be passed as separate arguments
    addarg("-d", "--dir", help="Directories", action='store', nargs='*', dest='DIRS')
    addarg("-p", "--parse", help="ParserType files", action='store', nargs='*', dest='PTS')
    ## connect
    cgroup = subs.add_parser("connect", help="Connect help")
    addarg = cgroup.add_argument
    addarg("-n", "--name", help="Name of the ADX/Project", default='adx', dest='NAME')
    # NOTE: with default=True this flag is always set
    addarg("--one-session", action='store_true', default=True, dest='persist', help="Flag to make the connection persistent.")
    ## db options
    cgroup_subs = cgroup.add_subparsers(help="Interfaces", dest='interface')
    dbcgroup = cgroup_subs.add_parser("mongodb", help="MongoDB interface")
    addarg = dbcgroup.add_argument
    addarg("--connect", help='IP to connect to database.', default='127.0.0.1:27017', metavar='X.X.X.X:P')
    ## tab options
    tabcgroup = cgroup_subs.add_parser("table", help="Astropy Table interface")
    addarg = tabcgroup.add_argument
    addarg("--tabpath", help="Table path.", default='.')
    ## query
    qgroup = subs.add_parser('query', help="Query help")
    addarg = qgroup.add_argument
    addarg("--par", help="Parameter to query")
    addarg("--cond", help="Condition")
    addarg("--exec", help="Execute after finding")
    addarg("--out", help="Output filepaths")
    addarg("--explain", help="Explain query")
    addarg("--absolute", action='store_true', help="Return absolute paths")
    addarg("parsertype", help="The parserType", nargs=1)
    addarg("query", help="JSON-like query\nYou will need to quote it.")
    ####
    return adxargparser.parse_args()

def main():
    ### this function returns
    logger = None
    persister = None
    ###
    opts = parseargs()
    if opts.cmd == 'schema':
        print "requested schema"
    elif opts.cmd == 'connect':
        print "requested connect"
        if opts.interface == 'mongodb':
            print "requested mongodb"
            print "Connect at ", opts.connect
            print "DBname is ", opts.NAME
            import mongodbio
            logger = mongodbio.dbio(name = opts.connect, dbname = opts.NAME)
            print type(logger)
        elif opts.interface == 'table':
            print 'requested tabio'
            print "tabpath", opts.tabpath
            import tabio
            logger = tabio.tabio()
        if opts.persist:
            print "Asked for persistance"
            # same filename
            from adxshelver import Shelver
            persister = Shelver(_PERSISTANCE_FILE_NAME_)
            persister.save(_PERSISTANCE_KEY_, logger)
            # XXX ala is for now a hack.
            # you need a persistent name too
    elif opts.cmd == 'crawl':
        # from imp import load_source
        # XXX this is deprecated!!!
        import importlib
        print "requested crawl"
        print "DIRS:", opts.DIRS
        print "PT files", opts.PTS
        pts = [importlib.import_module(p) for p in opts.PTS]
        print pts
    elif opts.cmd == 'refresh':
        print "requested refresh"
    elif opts.cmd == 'update':
        print "requested update"
    elif opts.cmd == 'query':
        pt = opts.parsertype[0]
        print "queried parsertype", pt
        import ast
        query = ast.literal_eval(opts.query)
        print "requested query", opts.query
        from adxshelver import Shelver
        persister = Shelver(_PERSISTANCE_FILE_NAME_)
        logger = persister.get(_PERSISTANCE_KEY_)
        logger.Query(pt, query)
    elif opts.cmd == 'logout':
        print "requested logout"
        from adxshelver import Shelver
        persister = Shelver(_PERSISTANCE_FILE_NAME_)
        persister.close()
    ### graceful termination
    logger and logger.close()
    persister and persister.close()
    print "Exiting main"
    return opts

if __name__ == '__main__':
    opts = main()
Binary file added adx/adx_persistance_logger
Binary file not shown.
9 changes: 9 additions & 0 deletions adx/adxceptions.py
@@ -0,0 +1,9 @@
'''
Class definitions of all the exceptions passed around
by ADX
'''
class ADXception(Exception):
    pass

class ADXLogImportError(ADXception):
    pass
31 changes: 31 additions & 0 deletions adx/adxshelver.py
@@ -0,0 +1,31 @@
'''
Shelving class
'''
import shelve as sh

class Shelver():
    '''
    Manages the shelving actions
    '''
    def __init__(self, filename):
        self.shelf = sh.open(filename)
        self.filename = filename

    def close(self):
        self.shelf.close()
        # delete file

    def list(self):
        return self.shelf.keys()

    def save(self, k, v):
        self.shelf[k] = v

    def get(self, k):
        return self.shelf[k]

    def savefilename(self, path, session):
        # write the shelf filename to <path><session>
        with open(path + session, 'w') as f:
            f.write(self.filename)
