All of the code currently lives in a single package, uk.ac.susx.mlcl.byblo, but some of it is not required for default functionality. I propose splitting it into two packages: one for core code and another for extras. This will make no difference to end users, but it should tidy up development and make it easier for people to contribute. The new packages could be named:
uk.ac.susx.mlcl.byblo.core
uk.ac.susx.mlcl.byblo.extra
Areas that can be separated include:
- Similarity measure classes in
uk.ac.susx.mlcl.byblo.measures should be moved to uk.ac.susx.mlcl.byblo.extra.measures. The exception is the default measure (e.g. uk.ac.susx.mlcl.byblo.measures.Jaccard), which would be moved to uk.ac.susx.mlcl.byblo.Jaccard.
- Feature re-weighting functions (which don't currently exist) could be implemented in
uk.ac.susx.mlcl.byblo.extra.weightings. A default re-weighting could be implemented in uk.ac.susx.mlcl.byblo, e.g. binary features. This all depends on first extracting the re-weighting code.
- Additional filters could also be provided in extra. Again, this would require some refactoring to generalise the filtering routines.
All extra code could conform to a pluggable API, and probably implement a bean-like structure so that Byblo can be configured at run-time. For example, some similarity measures have hyper-parameters which should be listable and settable from the command line. The software would therefore need to be able to inspect all available measures at run-time, discover what hyper-parameters are available, and create command line options on the fly (now supported by JCommander).
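As a rough illustration of the bean idea, the sketch below uses the standard java.beans.Introspector to list the settable properties of a measure and to set one from a string, as a command line parser would need to. ExampleMeasure, its alpha and beta hyper-parameters, and the MeasureBeans helper are all hypothetical names invented for this example; none of them exist in Byblo, and the JCommander wiring is left out.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.Map;
import java.util.TreeMap;

public class MeasureBeans {

    /** Hypothetical measure; alpha and beta are made-up hyper-parameters. */
    public static class ExampleMeasure {
        private double alpha = 0.5;
        private double beta = 0.5;

        public double getAlpha() { return alpha; }
        public void setAlpha(double alpha) { this.alpha = alpha; }
        public double getBeta() { return beta; }
        public void setBeta(double beta) { this.beta = beta; }
    }

    /** Lists every readable and writable bean property of a class, by name. */
    public static Map<String, Class<?>> listHyperParameters(Class<?> type) {
        Map<String, Class<?>> params = new TreeMap<>();
        for (PropertyDescriptor pd : descriptors(type)) {
            if (pd.getReadMethod() != null && pd.getWriteMethod() != null) {
                params.put(pd.getName(), pd.getPropertyType());
            }
        }
        return params;
    }

    /** Sets one property from its string form, as a command line parser might. */
    public static void setHyperParameter(Object bean, String name, String value) {
        for (PropertyDescriptor pd : descriptors(bean.getClass())) {
            if (pd.getName().equals(name) && pd.getWriteMethod() != null) {
                try {
                    pd.getWriteMethod().invoke(bean, convert(value, pd.getPropertyType()));
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("Cannot set " + name, e);
                }
                return;
            }
        }
        throw new IllegalArgumentException("Unknown hyper-parameter: " + name);
    }

    private static PropertyDescriptor[] descriptors(Class<?> type) {
        try {
            // Stop at Object so that getClass() is not reported as a property.
            return Introspector.getBeanInfo(type, Object.class).getPropertyDescriptors();
        } catch (IntrospectionException e) {
            throw new IllegalStateException("Cannot inspect " + type, e);
        }
    }

    /** Converts a command line string to the property type (only a few types in this sketch). */
    private static Object convert(String value, Class<?> target) {
        if (target == double.class || target == Double.class) return Double.valueOf(value);
        if (target == int.class || target == Integer.class) return Integer.valueOf(value);
        if (target == String.class) return value;
        throw new IllegalArgumentException("Unsupported type: " + target);
    }

    public static void main(String[] args) {
        ExampleMeasure measure = new ExampleMeasure();
        System.out.println(listHyperParameters(ExampleMeasure.class));
        setHyperParameter(measure, "alpha", "0.25");
        System.out.println(measure.getAlpha());
    }
}
```

With something along these lines, each measure in extra would only need getters and setters for its hyper-parameters. The core could then generate one command line option per discovered property without knowing the measure classes in advance.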