Projects
Specific correspondence topic models (SCTM)
Correspondence between a news article and a comment can be specific in nature, i.e. the comment may relate to only a very small part of the article, and that part need not be contiguous. Similar relationships arise between a paper and its bibliography, an image and its tags, etc. We call such relationships specific correspondence and propose SCTM to model them. more details.
Classification of text documents without any labelled data
Generating labelled training data is expensive and sometimes nearly impossible given the present explosion of text, whether blogs, comments on news, software code or websites. Yet classification is a basic step in many situations, and the user is expected to have an idea of the categories she wants the documents classified into. We propose that the user provide a few descriptive words for each category; this can lead to excellent classification accuracy, very close to that of supervised methods such as SVM, which use labelled training data. more details.
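To make the idea of "a few descriptive words per category" concrete, here is a minimal sketch. It is not the method proposed in this project: it simply counts how many seed words of each category occur in a document and picks the category with the highest count. The category names, the seed words, and the helper names `tokenize` and `classify` are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical seed words; a real user would supply her own categories.
SEED_WORDS = {
    "sports": {"football", "match", "team", "goal"},
    "politics": {"election", "minister", "parliament", "vote"},
}


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def classify(text, seed_words=SEED_WORDS):
    """Return the category whose seed words occur most often, or None.

    Assumption: plain seed-word counting stands in for the actual model,
    which is not described on this page.
    """
    counts = Counter(tokenize(text))
    scores = {
        category: sum(counts[word] for word in words)
        for category, words in seed_words.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

Calling `classify("The team scored a late goal in the match")` selects `"sports"` because three of its seed words appear; a document containing no seed word at all yields `None`. A counting baseline like this needs no labelled documents, which is the setting described above, but it makes no claim to match supervised accuracy.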
Multi-lingual hierarchical topic models
A hierarchy of topics is a useful representation of a corpus: topics near the root are general, and topics farther from the root are more specific. For example, sports will sit at a higher level of the tree than football. The nested Chinese restaurant process (nCRP) is a well-known way to model such a hierarchy in the mono-lingual scenario. I am working on extending nCRP to learn the hierarchy in the multi-lingual scenario, where each node in language 1 will have a corresponding node in language 2. more details.
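For readers unfamiliar with nCRP, the sketch below shows how the standard mono-lingual prior assigns each document a root-to-leaf path: at every node the document follows an existing child with probability proportional to the number of earlier documents that took it, or opens a new child with probability proportional to a concentration parameter. This illustrates only the prior draw, not inference and not the multi-lingual extension. The names `Node`, `sample_path`, `depth` and `gamma` are assumptions made for this illustration.

```python
import random


class Node:
    """A topic node; `count` is the number of documents routed through it."""

    def __init__(self):
        self.count = 0
        self.children = []


def sample_path(root, depth, gamma, rng):
    """Draw one root-to-leaf path of length `depth` from the nCRP prior.

    Assumption: `gamma` > 0 is the CRP concentration parameter, shared by
    every node in the tree.
    """
    node = root
    node.count += 1
    path = [node]
    for _ in range(depth - 1):
        # Existing children are weighted by popularity; the last weight
        # is the chance of creating a brand-new child.
        weights = [child.count for child in node.children] + [gamma]
        index = rng.choices(range(len(weights)), weights=weights)[0]
        if index == len(node.children):
            node.children.append(Node())
        node = node.children[index]
        node.count += 1
        path.append(node)
    return path
```

A usage example: with `rng = random.Random(0)` and `root = Node()`, calling `sample_path(root, 3, 1.0, rng)` repeatedly grows a three-level tree in which popular branches attract more documents. Smaller values of `gamma` give fewer, larger branches.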