Bayesian Analysis of Dynamic Linear Topic Models
Chris Glynn, Duke, Statistics
In dynamic text analysis, the proportion of a document characterized by a semantic topic may depend on the time trend of that topic’s overall prevalence and on covariates of the document itself. We extend the Dynamic Topic Model of Blei and Lafferty (2006) by explicitly modeling document-level topic proportions with covariates and dynamic structure that includes time trend and periodicity. A Markov chain Monte Carlo (MCMC) algorithm that uses Polya-Gamma data augmentation is developed for posterior inference. Conditional independencies in the model and in the sampler are made explicit, and our MCMC algorithm is parallelized where possible to allow for inference in large corpora. To address computational bottlenecks associated with Polya-Gamma sampling, we appeal to the Central Limit Theorem to develop a Gaussian approximation to the Polya-Gamma random variable. This approximation is fast and reliable for parameter values relevant in the text-mining domain. Our model and inference algorithm are validated with multiple simulation examples, and we consider the application of modeling trends in PubMed abstracts.
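To make the Gaussian approximation concrete, here is a minimal sketch of one way such an approximation could look; it is not necessarily the construction used in the talk. It relies on two standard facts: for integer b, a PG(b, c) variable is a sum of b independent PG(1, c) variables, so the Central Limit Theorem applies when b is large (as with word counts), and the mean and variance of PG(b, c) are available in closed form. The function names pg_moments and pg_gaussian_draw are hypothetical, chosen only for this illustration.

```python
import math
import random


def pg_moments(b, c):
    """Return the exact mean and variance of a Polya-Gamma PG(b, c) variable.

    Closed forms: mean = b/(2c) * tanh(c/2),
                  var  = b/(4c^3) * (sinh(c) - c) / cosh(c/2)^2.
    """
    c = abs(c)  # the PG(b, c) distribution depends on c only through |c|
    if c < 1e-3:
        # Limits as c -> 0; this avoids cancellation in sinh(c) - c.
        return b / 4.0, b / 24.0
    mean = b / (2.0 * c) * math.tanh(c / 2.0)
    # Note: math.sinh overflows for |c| above roughly 710; not handled here.
    var = b / (4.0 * c ** 3) * (math.sinh(c) - c) / math.cosh(c / 2.0) ** 2
    return mean, var


def pg_gaussian_draw(b, c, rng=random):
    """Draw from a moment-matched Gaussian approximation to PG(b, c).

    Assumption of this sketch: b is large enough for the CLT to be accurate.
    A Gaussian can in principle return a negative value, whereas a true
    Polya-Gamma draw is always positive; for large b this is very unlikely.
    """
    mean, var = pg_moments(b, c)
    return rng.gauss(mean, math.sqrt(var))


if __name__ == "__main__":
    random.seed(1)
    # Hypothetical values: b plays the role of a document's word count,
    # c the role of a linear predictor.
    print(pg_moments(200, 1.0))
    print(pg_gaussian_draw(200, 1.0))
```

Each draw costs a single Gaussian sample regardless of b, which is where a speed-up over exact Polya-Gamma sampling would come from when b is large.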
November 2, 2015 | 12:30 p.m. - 2:00 p.m. | 230E Gross Hall