Alongside an explosion in research and development related to large language models, there has been a concomitant rise in the creation of pretraining datasets: massive collections of text, typically scraped from the web. Drawing on the field of archival studies, we analyze pretraining datasets as informal archives, that is, heterogeneous collections of diverse material that mediate access to knowledge. We use this framework to identify impacts of pretraining data creation and use beyond directly shaping model behavior and reveal how choices about what is included in pretraining data necessarily involve subjective decisions about values.
Phys Rev E Stat Nonlin Soft Matter Phys
July 2014
Bipartite networks are a common type of network data in which there are two types of vertices, and only vertices of different types can be connected. While bipartite networks exhibit community structure like their unipartite counterparts, existing approaches to bipartite community detection have drawbacks, including implicit parameter choices, loss of information through one-mode projections, and lack of interpretability. Here we solve the community detection problem for bipartite networks by formulating a bipartite stochastic block model, which explicitly includes vertex type information and may be trivially extended to k-partite networks.
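As a rough illustration of the bipartite constraint described above, the following minimal sketch samples a network in which edges can only join a type-A vertex to a type-B vertex, with the edge probability set by the pair of groups involved. The function name `sample_bipartite_sbm`, the Bernoulli edge model, and all parameter values are assumptions made for illustration; they are not taken from the article, whose exact formulation may differ.

```python
import random


def sample_bipartite_sbm(groups_a, groups_b, omega, seed=0):
    """Sample edges from a simple Bernoulli bipartite block model (illustrative).

    groups_a[i] is the group label of type-A vertex i.
    groups_b[j] is the group label of type-B vertex j.
    omega[r][s] is the probability of an edge between a type-A vertex
    in group r and a type-B vertex in group s.
    """
    rng = random.Random(seed)
    edges = []
    # Only A-B pairs are ever considered, so same-type edges cannot occur.
    for i, r in enumerate(groups_a):
        for j, s in enumerate(groups_b):
            if rng.random() < omega[r][s]:
                edges.append((("A", i), ("B", j)))
    return edges


# Hypothetical example: two groups per vertex type, dense within matched groups.
groups_a = [0, 0, 0, 1, 1, 1]
groups_b = [0, 0, 1, 1]
omega = [[0.9, 0.05],
         [0.05, 0.9]]
edges = sample_bipartite_sbm(groups_a, groups_b, omega)
```

Community detection runs this process in reverse: given the observed edges and the vertex types, it infers the group labels that make the network most probable under the model.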
Most food webs use taxonomic or trophic species as building blocks, thereby collapsing variability in feeding linkages that occurs during the growth and development of individuals. This issue is particularly relevant to integrating parasites into food webs because parasites often undergo extreme ontogenetic niche shifts. Here, we used three versions of a freshwater pond food web with varying levels of node resolution (from taxonomic species to life stages) to examine how complex life cycles and parasites alter web properties, the perceived trophic position of organisms, and the fit of a probabilistic niche model.