A bag but is language nothing of words

From Mondothèque

Revision as of 10:50, 3 December 2015 by FS (talk | contribs)

In text indexing and other machine reading applications the term "bag of words" is frequently used to underscore how processing algorithms often represent text using a data structure (word histograms or weighted vectors) where the original order of the words in sentence form is stripped away. While "bag of words" might well serve as a cautionary reminder to programmers of the essential violence perpetrated to a text and a call to critically question the efficacy of methods based on subsequent transformations, the expression's use seems in practice more like a badge of pride or a schoolyard taunt that would go: Hey language: you're nothin' but a big BAG-OF-WORDS.
Michael Murtaugh