Emil Rijcken
Nov 14, 2022

--

Thank you for your question, Elkamel. You can get the initial topic distribution for documents as follows.

Suppose you have a corpus (a list of lists of strings, i.e. one list of tokens per document). You then initialize a fuzzy model, say FLSA_W, with a number of topics, say 10.

`flsaw = FLSA_W(corpus, 10)`

Then, you train your model with:

`pwgt, ptgd = flsaw.get_matrices()`

Here, `ptgd` is the topic distribution per document.

If you later have more documents that you want the distribution for, you can use the `get_topic_embedding()` method. You feed the trained model a new document or set of documents, and it returns the topic distribution for them. I recommend experimenting with the hyperparameters (`method`, `topn` and `perc`), as the best settings depend on your data and are not known in advance.
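To tie the steps together, here is a minimal sketch of a helper that trains a model and returns both distributions. The function name `topic_distributions` and its parameters are hypothetical and only for illustration. The sketch also assumes that `get_topic_embedding()` accepts the hyperparameters as keyword arguments, so please check this against the version of the library you have installed.

```python
def topic_distributions(model_cls, corpus, num_topics, new_docs=None, **embedding_kwargs):
    """Train a fuzzy topic model and return topic distributions.

    model_cls is expected to behave like FLSA_W: it is constructed as
    model_cls(corpus, num_topics) and offers get_matrices() and
    get_topic_embedding(), as described in the reply above.
    """
    model = model_cls(corpus, num_topics)

    # get_matrices() trains the model; the second return value is the
    # topic distribution per training document.
    _pwgt, ptgd = model.get_matrices()

    new_ptgd = None
    if new_docs is not None:
        # Assumption: hyperparameters such as method, topn and perc are
        # passed as keyword arguments; verify against your installed version.
        new_ptgd = model.get_topic_embedding(new_docs, **embedding_kwargs)

    return ptgd, new_ptgd
```

With the library installed, a call might look like `topic_distributions(FLSA_W, corpus, 10, new_docs, topn=20)`, where the value 20 is chosen purely for illustration. Passing the model class as an argument also makes it easy to try another fuzzy model with the same interface.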

Written by Emil Rijcken

PhD candidate in Natural Language Processing
