TML has moved to and the code to
- Document indexing and selection using Apache's Lucene
- Fast vector space model (VSM) generation with several local and global term weights (term-document matrix)
- Dimensionality reduction using SVD or NMF for LSA and related techniques.
- Metadata annotators (Penn Treebank grammar parsing).
- Operations: Document distances, topic clustering, keyword extraction, and many more!
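To make the term-document matrix and document-distance features above concrete, here is a minimal sketch in plain Python. It does not use TML or its API: the function names `build_term_doc_matrix` and `cosine_distance` are hypothetical, the local weight is assumed to be raw term frequency, the global weight is assumed to be inverse document frequency, and the three sample documents are made up for illustration.

```python
import math
from collections import Counter


def build_term_doc_matrix(docs):
    """Return (terms, matrix); matrix[i][j] is the weight of term i in doc j.

    Assumed weighting (not necessarily what TML uses):
    local weight = raw term frequency, global weight = log(N / df).
    """
    tokenized = [doc.lower().split() for doc in docs]
    terms = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)

    # Document frequency: number of documents containing each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {t: math.log(n_docs / df[t]) for t in terms}

    counts = [Counter(toks) for toks in tokenized]
    matrix = [[counts[j][t] * idf[t] for j in range(n_docs)] for t in terms]
    return terms, matrix


def cosine_distance(matrix, a, b):
    """Cosine distance (1 - cosine similarity) between document columns a and b."""
    col_a = [row[a] for row in matrix]
    col_b = [row[b] for row in matrix]
    dot = sum(x * y for x, y in zip(col_a, col_b))
    norm = math.sqrt(sum(x * x for x in col_a)) * math.sqrt(sum(y * y for y in col_b))
    return 1.0 - dot / norm if norm else 1.0


# Hypothetical sample corpus.
docs = [
    "text mining with lucene",
    "text mining and clustering",
    "cooking pasta with tomato",
]
terms, matrix = build_term_doc_matrix(docs)
d01 = cosine_distance(matrix, 0, 1)  # two text-mining documents
d02 = cosine_distance(matrix, 0, 2)  # text mining vs. cooking
print(d01 < d02)
```

In this toy corpus the two text-mining documents end up closer to each other than to the cooking document. A dimensionality reduction step such as SVD would operate on the same matrix before distances are computed.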
User Reviews
It seems to be good, but there are some errors that prevent the program from loading the library correctly (the Abstract Annotator constructor receives parameters, but PennTreeAnnotator's does not).
Very good library for text mining.