The text-critical practice of grouping witnesses into families or texttypes often faces two obstacles: the methodological question of how exactly to isolate the groups, given the chicken-and-egg relationship between “good” group readings and “good” group manuscripts, and contamination in the manuscript tradition. I introduce non-negative matrix factorization (NMF) as a simple, automated, and efficient solution to both problems. Within minutes, NMF can cluster hundreds of manuscripts and readings simultaneously, producing an output that details potential contamination according to an easy-to-interpret mixture model. I apply this method to Wasserman’s extensive collation of the Epistle of Jude, showing that the resulting clusters correspond to human-identified textual families and that their characteristic readings correctly divide witnesses into their groups. Due to its demonstrated accuracy, versatility, and speed, NMF could replace prior state-of-the-art classification methods and find fruitful application in a number of text-critical settings.

jude_nmf_rank_13_primary.xlsx (209 kB)
Basis and mixture matrices for primary manuscripts.

jude_nmf_rank_13_secondary.xlsx (9 kB)
Post-factorization mixture matrix for secondary (lacunose) manuscripts.
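
For readers unfamiliar with the terms "basis matrix" and "mixture matrix", the following is a minimal sketch of the general NMF idea on toy data, not the code or data behind the files above. It assumes a binary collation matrix with one row per reading and one column per witness, and it uses the standard multiplicative update rules of Lee and Seung for the squared-error objective, which may differ from the algorithm used for the Jude collation. The witness labels, the toy matrix, and the helper names `nmf` and `mixture_shares` are all hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def nmf(x, rank, iterations=2000, seed=0, eps=1e-9):
    """Approximate x (readings x witnesses) as w @ h with non-negative factors.

    w is the basis matrix (readings x rank): how strongly each reading
    characterizes each cluster. h is the mixture matrix (rank x witnesses):
    how much each witness draws on each cluster.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(x), len(x[0])
    w = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_rows)]
    h = [[rng.uniform(0.1, 1.0) for _ in range(n_cols)] for _ in range(rank)]
    for _ in range(iterations):
        # Update h: h <- h * (w^T x) / (w^T w h)
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_cols)]
             for k in range(rank)]
        # Update w: w <- w * (x h^T) / (w h h^T)
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n_rows)]
    return w, h


def mixture_shares(w, h):
    """Return, per witness, the share of each cluster (each list sums to 1).

    Each cluster's weight is first rescaled by the column sum of w, so the
    shares do not depend on the arbitrary scaling between w and h.
    """
    rank = len(h)
    col_sums = [sum(row[k] for row in w) for k in range(rank)]
    shares = []
    for j in range(len(h[0])):
        weights = [h[k][j] * col_sums[k] for k in range(rank)]
        total = sum(weights) or 1.0
        shares.append([v / total for v in weights])
    return shares


# Toy data (hypothetical): rows are readings, columns are witnesses A-E.
# A and B share the first three readings, C and D share the last three,
# and E attests readings from both groups, imitating a contaminated witness.
witnesses = ["A", "B", "C", "D", "E"]
x = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]

w, h = nmf(x, rank=2)
shares = mixture_shares(w, h)
for name, share in zip(witnesses, shares):
    print(name, [round(v, 2) for v in share])
```

In this sketch the columns of `w` play the role of cluster profiles over readings, and each entry of `shares` reads as a mixture: a witness whose weight falls almost entirely on one cluster belongs to that group, while a witness with weight spread over several clusters is a candidate for contamination. The rank (2 here) is a choice made for the toy example only.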