Topic Modelling

Topic modelling is a statistical learning approach in natural language processing that discovers latent, or hidden, themes (referred to here as “topics”) in a large collection of texts, or corpus.
Topic modelling is one of the most popular techniques for text categorization and textual analysis. We employ a particular form of topic modelling, Latent Dirichlet Allocation (LDA), to analyse Anglo-Saxon and 11th- to 13th-century property transfer documents. In this model, the number of latent topics to be discovered is specified a priori.
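To illustrate what it means to fix the number of topics in advance, the following is a minimal sketch of LDA fitted by collapsed Gibbs sampling, one common inference method. It is not the implementation used for the documents described here; the function names, the hyperparameters alpha and beta, and the toy documents are all assumptions made for illustration.

```python
import random
from typing import List


def fit_lda(docs: List[List[str]], num_topics: int, iterations: int = 200,
            alpha: float = 0.1, beta: float = 0.01, seed: int = 0):
    """Sketch of LDA via collapsed Gibbs sampling.

    num_topics is chosen by the caller before fitting (a priori).
    alpha and beta are assumed symmetric Dirichlet hyperparameters.
    Returns (vocab, topic_word_counts, doc_topic_counts).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]                 # tokens per (doc, topic)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]   # tokens per (topic, word)
    topic_total = [0] * num_topics                               # tokens per topic
    assignments = []

    # Give every token a random initial topic and record the counts.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalised conditional probability of each topic.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                # Resample the topic and restore the counts.
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    return vocab, topic_word, doc_topic


def top_words(vocab, topic_word, n=3):
    """Return the n highest-count words for each topic."""
    return [
        [vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
        for row in topic_word
    ]


# Hypothetical toy documents, not drawn from any real collection.
docs = [
    "land grant meadow land acre".split(),
    "witness seal charter witness".split(),
    "meadow acre land grant".split(),
    "seal charter witness seal".split(),
]
vocab, topic_word, doc_topic = fit_lda(docs, num_topics=2, iterations=100)
for k, words in enumerate(top_words(vocab, topic_word)):
    print(f"topic {k}: {words}")
```

The only structural decision the caller makes is num_topics; the sampler then decides which words and documents belong to which topic. On a real corpus one would more likely use an established library than hand-written sampling.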
The terms in the vocabulary are not necessarily individual words of the kind one would find in a dictionary. They can also be n-grams (sequences of n consecutive words).
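As a rough illustration of how n-grams can enter the vocabulary alongside single words, here is a small sketch. The helper names ngrams and build_vocabulary, the choice to join n-grams with a space, and the example phrase are assumptions for illustration only.

```python
from typing import List, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return every run of n consecutive tokens, in document order."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_vocabulary(tokens: List[str], max_n: int = 2) -> List[str]:
    """Collect all 1-grams up to max_n-grams as space-joined terms."""
    terms = []
    for n in range(1, max_n + 1):
        terms.extend(" ".join(gram) for gram in ngrams(tokens, n))
    return terms


# Hypothetical example phrase, not taken from any real document.
tokens = "know that i have granted".split()
print(ngrams(tokens, 2))
print(build_vocabulary(tokens, max_n=2))
```

Each term produced this way, whether a single word or a longer n-gram, can be treated as one vocabulary item by the topic model.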
For the DEEDS English collection, we have implemented an LDA model for the Anglo-Saxon and 11th- to 13th-century property transfer documents, guided by experimental analysis and by the interpretability of the results.