David Banks, Duke University - Department of Statistical Science
Wikipedia, political blogs, and computational advertising are all settings whose dynamics can be studied with two kinds of data: the text of webpages and the network connectivity structure between pages. In principle, each kind of information can inform a joint analysis. For example, latent Dirichlet allocation can identify topics in text, and the extent to which a particular node participates in a topic may serve as a covariate in forecasting the formation of edges. Reciprocally, one may use connectivity patterns to sharpen inference on topic memberships. This talk describes several forays into this area and highlights some of the emerging challenges in joining the recent field of network modeling with text mining.
February 11, 2014 | 12:30 - 2:00 | 230E Gross Hall