A Book of the Web
Historically, we have treated texts as discrete units distinguished by their material properties, such as cover, binding and script. These characteristics establish each as a book, a magazine, a diary, sheet music and so on. One book differs from another, books differ from magazines, printed matter differs from handwritten manuscripts. Each volume is a self-contained whole, further distinguished by descriptors such as title, author, date, publisher, and classification codes that allow it to be located and referred to. The demarcation of a publication as a container of text works as a frame or boundary which organises the way it can be located and read. Researching a particular subject matter, the reader is carried along by the classification schemes under which volumes are organised, by references inside texts pointing to yet other volumes, and by tables of contents and subject indexes that are appended to texts, pointing to places within that volume.
So while their material properties separate texts into distinct objects, bibliographic information provides each object with a unique identifier, a unique address in the world of print culture. Such identifiable objects are further replicated and distributed across containers that we call libraries, where they can be accessed.
The online environment, however, intervenes in this condition. It establishes shortcuts. Through a search engine, digital texts can be searched for any text sequence, regardless of their distinct materiality and bibliographic specificity. This changes the way they function as a library, and the way its main object, the book, should be rethought.
(1) Rather than operating as distinct entities, multiple texts are simultaneously accessible through full-text search as if they were one long text, with its portions spread across the web, and including texts that had not been considered candidates for library collections.
(2) The unique identifier at hand for these text portions is not the bibliographic information, but the URL.
(3) The text is as long as web-crawlers of a given search engine are set to reach, refashioning the library into a storage of indexed data.
These are some of the lines along which online texts appear to produce difference. The first contrasts the distinct printed publication to the machine-readable text, the second the bibliographic information to the URL, and the third the library to the search engine.
SVP: Access is no longer about "this institution has this, that institution has something else"; all these institutions can be reached through the same interface. You can search across all these collections, and that is again a piece of the original dream of Otlet and Vander Haeghen, the idea of a world library. For every book there is a user; the library just has to go and find them. What I find intriguing is that all books have become one book because they are searchable on the same level; that is incredibly exciting. It is a different way of reading that even Otlet could not have imagined. They would go crazy if they knew about this.
Even though this is hardly news after almost two decades of Google Search's rule, little seems to have changed with respect to the forms and genres of writing. Loyal to standard forms of publishing, most writing still adheres to the principle of coherence, based on units such as book chapters, journal papers, newspaper articles, etc., that are designed to be read from beginning to end.
Still, the scope of textual forms appearing in search results, and thus of the corpus of texts into which they are brought, has radically diversified: it may include discussion board comments, product reviews, private e-mails, weather information, spam, etc., the type of content that used to be omitted from library collections. Rather than being published in a traditional sense, all these texts are produced onto digital networks by mere typing, copying or OCR-ing, or are generated by machines and by sensors tracking movement, temperature, etc.
Even though portions of these texts may come with human or non-human authors attached, authors have relatively little control over the discourses their writing gets embedded in. This is also where the ambiguity of copyright manifests itself. Crawling bots pre-read the internet with all its attached devices according to the agenda of their maintainers, and the decisions about which indexed texts are served in search results, how, and to whom, lie in the code of a library.
Libraries in this sense are not restricted to digitised versions of physical public or private libraries as we know them from history. Commercial search engines, intelligence agencies, and virtually all forms of online text collections can be thought of as libraries.
Acquisition policies figure here on the same level as crawling bots, dragnet/surveillance algorithms, and arbitrary motivations of users, all of which actuate the selection and embedding of texts into structures that regulate their retrievability and, through access control, produce certain kinds of communities or groups of readers. The author's intentions of partaking in this or that discourse are confronted by the discourse-conditioning operations of retrieval algorithms. Hence, Google structures discourse through its Google Search differently from how the Internet Archive does with its Wayback Machine, and from how the GCHQ does it with its dragnet programme. They are all libraries, each containing a single 'book' whose pages are URLs with timestamps and geostamps in the form of IP addresses. Google, GCHQ, JStor, Elsevier – each maintains its own searchable corpus of texts.
Corporate journal repositories exploit publicly funded research by renting it only to libraries which can afford it; intelligence agencies are set to extract texts from any moving target, basically any networked device, apparently in the public interest and away from the public eye; publicly funded libraries are being prevented by outdated copyright laws and bureaucracy from providing digitised content online; search engines create a sense of giving access to the entire public record online, while only a few know what is excluded and how search results are ordered.
Digitising texts and posting them online are interventions in the procedures that make search possible. Operating online collections of texts is as much about organising texts within libraries as it is about placing them within books of the web.
Originally written 15-16 June 2015 in Prague, Brno and Vienna for a talk given at the Technopolitics seminar in Vienna on 16 June 2015. Revised 29 December 2015 in Bergen.