Extend a taxonomy using labeled documents. In this repo, we extend the three-level NSFC discipline taxonomy with keywords from NSFC projects.
- Python 3
For raw text, you can extract keywords using the same approach as HierRec. This repo provides an already-processed file, so you only need to:
- download data/nsfc_kws_filt.jl, an intermediate file produced by HierRec.
- run main.py to get the result.
We use pointwise mutual information (PMI) to measure the relatedness between a word and a discipline. The softmax of a word's PMI scores across disciplines is then used to represent the discipline distribution of that word.
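
As a rough illustration of the PMI-then-softmax idea, here is a minimal sketch. It is not the code in main.py: the function names `pmi_table` and `softmax`, the `(discipline, keywords)` input shape, and the toy discipline codes are all assumptions made for this example. Probabilities are estimated from raw keyword counts, and disciplines a word never co-occurs with are simply left out (their PMI would be negative infinity, so their softmax weight would be 0).

```python
import math
from collections import Counter


def pmi_table(docs):
    """Compute PMI for every (word, discipline) pair that co-occurs.

    docs: iterable of (discipline, keywords) pairs (hypothetical input shape).
    Returns {word: {discipline: pmi}}.
    """
    pair_counts = Counter()
    word_counts = Counter()
    disc_counts = Counter()
    total = 0
    for discipline, keywords in docs:
        for word in keywords:
            pair_counts[(word, discipline)] += 1
            word_counts[word] += 1
            disc_counts[discipline] += 1
            total += 1

    table = {}
    for (word, discipline), n_wd in pair_counts.items():
        # PMI(w, d) = log( p(w, d) / (p(w) * p(d)) ), with counts as estimates
        pmi = math.log(n_wd * total / (word_counts[word] * disc_counts[discipline]))
        table.setdefault(word, {})[discipline] = pmi
    return table


def softmax(scores):
    """Turn {discipline: score} into a probability distribution."""
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - m) for k, v in scores.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}


# Toy data; the discipline codes here are placeholders, not real NSFC labels.
docs = [
    ("A01", ["graph", "algebra"]),
    ("A01", ["algebra"]),
    ("F02", ["graph", "network"]),
]
table = pmi_table(docs)
dist = {word: softmax(scores) for word, scores in table.items()}
print(dist["graph"])
```

In this toy data, "graph" appears once under each discipline, but the softmax still leans toward the discipline with fewer keywords overall, because PMI normalizes by how frequent each discipline is.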