Chapter 4                                                                                           


Electronic Infrastructure for Representing Script Acts


Type faces—like people’s faces—have distinctive features indicating aspects of character.

            Marshall Lee, Bookmaking (1965)


Humanity, technology,
is never merely good or bad---or worse:
     authentic or unnatural,
but somewhere in the greys our habits spread
      as the brain's best stab at rainbow.

            W. N. Herbert, “Get Complex,” from Cabaret McGonagall (1996)


This chapter is divided into two parts, reflecting the difficulties I have had in finding a balance between desire and fulfillment, between theory and practice.  The first part maps out a conceptual space for electronic representations of literary texts; the second reviews a chaos of practical problems and specific cases that have yet to be resolved.


Part I—A Conceptual Space of Electronic Knowledge Sites


There was a time when all scholars, textual and literary alike, desired one thing in a text: that it accurately represent what the author wished it to contain.  The paradigm was God as author and sacred writ as text.  Texts true to their author’s intention contained truth, and getting the text right was worth every effort.  Textual scholarship in this model was devoted to two complementary but opposite propositions: that the text must be preserved from change, protected from the predations of time and careless or malicious handling, and that the text must be changed to restore the pristine purity it had lost through neglect and time.  Correctness and control were the watchwords of this type of work.


The paradigm scholarly edition for such a view of work and text was a critically edited (emended) text, designed to reflect the true text, accompanied by an apparatus that showed the differing readings of authoritative source texts and a variorum of previous editorial or scribal conjectures and commentary.


Then God died, followed closely by the author.  What had seemed like a cooperative enterprise between textual and literary critics to get the author’s words right in order to get the author’s meaning right fell first into a division of labor and then into a division of goals.  Literary critics found that the difficulties and impossibilities of recovering an author’s meaning were happily replaced by textual appropriations, reader responses, and the study of what texts could, rather than did, mean.  Textual critics, though appearing to fight a rearguard action, discovered that texts were more than simply correct or erroneous. Textual shape was in flux, affected by authorial revision and by the acts of editors meeting new needs: new target audiences, censorship, and the tastes of new times.  Tracing the history of these textual changes and their various cultural implications became an activity parallel to that of literary critics pursuing new ways to (mis)read texts.


In this new atmosphere the old paradigm scholarly edition would not do.  The new paradigm has not yet been designed, though limited prototypes abound. Several questions about the new paradigm must be asked and answered:

What is the goal of a scholarly edition? 

How can it be constructed? 

How should it be published? 

Who will use the scholarly edition?

How will it be used?  And perhaps most important for textual critics,

Who can or who should be in control? 

Who will pay for the scholarship, construction, and dissemination?


These questions have no correct or permanently viable answers.  Because of the multiple points of view and the multiple uses to which texts are now put, no text is per se more important than any other text for all purposes; and, therefore, there is no text that can be agreed upon as everyone’s goal text for an edition.  But it does not follow that any text will do for any purpose.  A user who wants to know what hymns Emily Dickinson knew is not going to find the answer in the hymnal used today in Congregational Churches.  Modern paperback editions of Shakespeare are unlikely to give an idea of what eighteenth-century readers understood the plays to be.  The goal of the scholarly edition will depend on the uses to which it will be put.  Some texts are inappropriate for some uses. 

From that point of view, the goal of a scholarly edition of any work should be to provide access to specific texts—not to the universal text.  And of course the construction and presentation of each scholarly edition should specify which text is which.  From that it follows that readers should be able to select from the texts the specific one that is appropriate for the kinds of questions that will be asked of it.  Can all texts, in forms designed for appropriate uses, be provided in one electronic archive in a way that will not confuse users?

The answers to the questions How can it be constructed? and How should it be published? are increasingly clear: scholarly editions should be constructed and published electronically.  The print alternatives must either be content with a single text—either falsely presented as a universally usable text or honestly presented as just one of several possible texts and inadequate for some critical purposes—or expand to multi-volume print editions of each work.  The multi-volume option works fairly well for those authors whose stature commands the resources in funds and intellect and dedication to sustain multi-volume publication.  Electronic editions, one must admit to begin with, require all the same research and dedication required by major print editions—and they may be even more demanding because the medium offers the space and the means to give users more.  But electronic editions are now the only practical medium for major projects, for two main reasons: first, such projects are open-ended and can be added to and manipulated after their original editors have retired; and, second, only this medium actually gives users the practical power to select the text or texts most appropriate for their own work and interests. 

The electronic solution has the added potential to give end users tools enabling them to take possession of the electronic edition and to enrich and personalize it—even more so than they now do with their dog-eared, underscored, and interleaved print books.  This idea, that users might customize editions according to their own views, has long raised a bugaboo about electronic editions—one that is actually borne out in projects that content themselves with the archival model.  It is that editors duck their responsibility to give users a “properly edited” text by assuming, with no basis in history whatever, that there exist users who wish to do all their own collating and emending and checking of the facts.  Such readers exist, if at all, in market-negligible numbers.  But this vision of an unmediated archive of texts does not fulfill the goal of creating editions that “users can appropriate, enrich, and personalize.”  The tools I imagine here are not the basic tools for analyzing and editing documents to create a scholarly edition; that work is to be done by the editors.  What readers should be able to do is second-guess the editor, make local notes and even changes, create links, extract quotations, and trace themes using electronic tools associated with the edition.  They should not find that turning down page corners, underlining, and making marginal notes in a cheap paperback is easier than doing comparable things with the electronic edition.


Whereas in the earlier paradigm editorial control was paramount, in the new model edition control should be passed along with the edition to its users.  The main reason for this is that, whereas there may have been a time when the editor served the main interests of the user by providing a text that approximated a general view of what the text should be—a time when the words “standard text” and “established text” had general currency and meaning—it is now the case that users have differing specialized needs.  This condition is not affected by the fact that many literary critics have no interest in the authenticity or condition of the texts they use, or by the fact that some literary critics are in principle opposed to the notion of the integrity of texts.  It is, nonetheless, the case that for many sorts of literary inquiry and commentary, which text is used makes a difference.  The publication of James L. W. West’s edition of Theodore Dreiser’s Sister Carrie created a furor in some critical quarters because, by eliminating the effects of Dreiser’s friend Arthur Henry and of the publisher’s editors, West “created” a Sister Carrie radically unlike the Sister Carrie that had been known for eighty years.  Any attempt to understand the original reviews of the novel would be confounded if they were studied in relation to West’s new “established” text, because the reviewers had not read that book.[1]  It is not surprising that furors arise only when radically different views of the work are at stake.  See Warren and Taylor’s collection of essays on what constitutes Shakespeare’s King Lear in The Division of the Kingdoms, or see the controversy surrounding the publication of Binder and Parker’s manuscript edition of Stephen Crane’s The Red Badge of Courage.[2]  This does not mean that smaller, unpublicized textual histories and textual differences do not matter.  
Many examples of small but significant variants were revealed in the series of reviews of classroom editions fostered by Joseph Katz in Proof.[3] 


The point is that for critics who care which text they are using as a basis for the arguments they are making, a scholarly edition that offers them access to the right text for the task is preferable to a “standard” text that eliminates the elements of greatest interest simply because the editor did not anticipate such a user or because the editor disapproves of that form of inquiry.


Much of what follows is offered as analysis of the difficulties of, and potential answers to, the question How can a scholarly edition be constructed? The emphasis, however, will be on why a full view of script act theory makes these forms of representation necessary and useful, rather than on technical or practical advice about hardware or software.


It is widely asserted that electronic technologies have changed the nature of textuality.  The function of Chapter 3 has been to draw one portrait of the nature of written textuality.  One could conclude that textuality’s nature was constrained during the Gutenberg era, indeed, since the first commitment of text to paper.  Manuscript and print texts both “speak” primarily linearly and singularly.  Efforts to have these forms speak simultaneously, in chorus, radially, or by indexed random access have worked marvelously well in print for the committed few willing to learn the coding, turn the pages, and hold the book with fingers in multiple places at once.  For the many who are unwilling to make that kind of commitment, the thrills of the single linear text suffice.  And it is still an open question whether that will continue to be the case, though the advent of DVD movies with editors’ and directors’ introductions, commentaries, alternative cuts, and outtakes suggests that, given sufficient ease and intuitive access, not only scholars but general readers would find multiple forms of works and information about their “making” to be of interest.  It can be questioned whether textuality, in the constrained form of print, has been allowed to reveal its nature fully.


It can still be argued that texts were not constrained by print technology but, instead, were designed specifically for print technology.  This argument might hold that while electronic media have provided novelists and poets in the computer age with new visions about how and what to write, it would be inappropriate to drag texts written with print design in mind—indeed, written with no notion of any alternative “condition of being” other than print—into an electronic environment with some notion of releasing them from the constraints of print.  Such acts might better be termed “adaptations” rather than “editions” or even “electronic representations of print literature.”  But I believe that argument puts the opportunities and conditions of electronic editions too simply and starkly.  In what follows I distinguish between the historical condition of print texts—which are “enshrined” in the notion of the textual archive (actual or electronic)—on the one hand, and, on the other, the use of tools to investigate texts both as processes of composition and production and as instances of historical script actions.  What is being “electronified” in an electronic scholarly edition is not the texts but the access to texts and textual scholarship.  The potential effects are profoundly textual, both in the sense of changing readers’ relationships to the text and changing their interpretations and uses of texts.


The reading strategies now taught in schools and universities, and the literary theories that explain and justify every conceivable appropriation and twist of text, may have seemed necessary as compensation for the ambiguities and uncertainties of textuality imposed by its print form and the consequent clumsiness of attempts at choreographed and harmonic arrangements in print.  Scrolls by their form emphasize the linearity of works, enabling compact packaging but very clumsy movement from one part of the work to another.  (Imagine a scroll with cross-references or endnotes!)  The codex (a book with leaves, as opposed to a scroll) maintained the compact packaging and linearity but added “random access” to the extent that readers could keep fingers positioned strategically at various openings for quick reference.[4]  If metaphor can help clarify differences in how textuality could fulfill its “nature,” one could say of the codex that it provided texts with an architectural habitation with very limited openness.[5]  Its varied fonts, its footnotes, running titles and side notes, its appendixes and indexes, its illustrations, tables, charts, and maps, and more recently, its attached recordings, videos, fiches, and CDs all showed a remarkably inventive openness to organization, packaging, and readerly navigation.  And yet, in the end, in the print world every book had a closing date, a production schedule, a publication date, and then the making process ended.  Every part remained fixed and immovable relative to every other part.  The codex was flexible and extendible but only in the limited ways captured in the metaphor of architecture—once built, it could be added to or renovated, but not easily.  Either action required a new production process, from the ground up, to enable structural change.  Readers with both the original and the revised print edition could see and use first one form and then another and then the first again.  
But the normal impulse of readers would be to see one as the replacement of the other—as though a house had been torn down and rebuilt. Architecture is, then, perhaps the wrong metaphor in which to encapsulate the concept of textuality.  Perhaps architecture is too small a vision. 


We could try “infrastructure” with its evocation of roads, streets, alleys, bridges, sewage systems, electrical grids, traffic lights, wall plugs, and appliances, each contributing in a flexible way, inviting by its openness the invention of new concepts of organization and new instruments for the enhancement of human action.  It appears that electronic environments could aspire to work as an infrastructure for textuality—a concept that allows for multiple notions of what constitutes a text and what sorts of approaches to it should or could be taken and what instruments could be devised to enhance human actions in relation to texts.  If texts are to the mind and spirit what food, water, clean air, and the means to remove waste are to the body, nourishing, cleansing, beguiling, and enhancing human action, then texts must have many means of being brought to us and of being used.  Dickinson’s “there is no frigate like a book” might be paraphrased for electronic texts somehow.  But how?  It remains to be seen if an electronic architecture or infrastructure for written texts can be conceived and then devised that will alter the conditions of textual habitation and make texts stand forth in what will appear in practice to be a new nature of textuality.


The images of architecture and infrastructure both suggest human planning, strategies, and goals, together with human development of the means for achieving them.  It has been suggested that textuality might find a better metaphor in the coral reef.[6]  A sense of natural development, symbiotic relations, and mutually dependent developments in a hugely complex natural interaction under the control of no one in particular and eventuating in breathtaking beauty may be an attractive alternative vision.  But I cannot go there.  Texts are human inventions constituted by humanly devised sign systems and mechanical means of production and distribution.  Their conventions are of human invention and agreement.  Humans ruin rather than build coral reefs.  It is true that language grows and changes in spite of French Academies and Webster’s dictionaries, but insofar as humans create texts of great complexity and dexterity through the conscious manipulation of the conventions of writing, it seems necessary to provide conscious ways to enhance one’s ability to comprehend the functions, meanings, purposes, and even intentions of their creation and manufacture.  Coral cannot be prevented from forming on human structures placed in coral-friendly environments.  Nor can misuse of tools—using a screwdriver for a hammer or a cooker to heat a house—be prevented.  Unintended consequences and unintended uses are inevitable in all human action.  But if we are to explore the textual potentialities of the electronic environment, we cannot leave it all to chance.  Just look at the texts already proliferating like invasive zebra mussels on the Internet, clogging the exchange of reliable information.  
In a coral reef it might be difficult to distinguish between a Project Gutenberg, a Rossetti Archive, and Chaucer’s The General Prologue on CD-ROM.[7]  Texts on screens look remarkably alike, despite profound differences in quality, and search engines tend to throw them up in lists prioritized by elements other than textual acumen or reliability.


The purpose of this chapter is, first, to imagine the difference that the enriched and more dexterous medium of electronic editions will bring to text presentations and, more important, to receptions of literary works; and, second, to suggest a space and a shape for developing electronic editions that will serve not only as archives but as knowledge sites that would enable the kind of reading imagined.  The space and shape I will try to describe is one in which textual archives serve as a base for scholarly editions, which work in tandem with every other sort of literary scholarship to create knowledge sites of current and developing scholarship.  Such sites can also serve as pedagogical tools in an environment where each user can choose an entryway, select a congenial set of enabling contextual materials, and emerge with a personalized interactive form of the work (rather like a well-marked and dog-eared book), always able to plug back in for more information or different perspectives. 


In spite of the advances already made in the medium of electronic texts, I do not believe we have fully understood or exploited the capabilities of electronic texts.  I think our slow adaptation to the medium arises in part from the narrow concept of textuality to which we have been habituated in print culture and in part from a too easy satisfaction with the initial efforts to transport print to marginally improved electronic forms.  Attempts to create single comprehensive edition-presentation software may also have slowed progress by investing effort in closed systems not designed for expansion or adaptation beyond the purposes of the particular project at hand.  In any case, the result has been many promising but limited or dead-end projects.


What developers of electronic scholarly editions to date have in common is the absence of a full array of interactive and compatible tools for mounting full-scale electronic scholarly editions.  Because most of what we have learned about creating electronic editions comes from the work of individual scholars or small teams working in isolation on specific scholarly projects, the pieces of the puzzle are scattered and frequently incompatible.  Each project is built on a particular platform (Macintosh, Windows, Sun, etc.), using particular text formats (word processors, typesetting or formatting programs, HTML, SGML, XML, etc.), to archive texts with a range of particular characteristics (hundreds of scribal manuscripts or just one authorial manuscript; a few printed sources or multiple authorial manuscripts; fair copies or heavily revised manuscripts or palimpsests, etc.), in order to produce editions conceived in particular ways (as databases for philological studies, as archives of manuscripts, as repositories of the “most authentic” or “most important” documents, as critically edited texts), with or without illustrative materials (paintings, drawings, sculpture, architecture, maps, charts), designed to show textual fluidity or textual stability.  It is not surprising that each project has made choices for software or choices for arrangement or choices for access that depend both on the nature of the materials that are being edited and on the nature of the scholarly interests of the editors or the audience they perceive.  It is a complex situation that has been and is being addressed but for which no generally accepted solution has yet emerged.  There is great hope that greater compatibility will be achieved with TEI (the Text Encoding Initiative) and XML (Extensible Markup Language), encoding standards for data files and markup that allow multiple tools to access the same data. 
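What it means for “multiple tools” to “access the same data” can be illustrated in a few lines. The following is a minimal Python sketch, not drawn from any actual edition. The markup is a simplified, hypothetical fragment loosely modelled on the TEI critical-apparatus elements app, lem, and rdg; the TEI namespace is omitted, and the witness sigla #A and #B, the sample sentence, and the function names reading_text and variant_list are invented for illustration. Separate routines read the one marked-up file: one reconstructs the reading text of any chosen witness, another tabulates the variants, and a third uses the small subset of XPath that Python’s ElementTree supports.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a simplified apparatus loosely modelled on TEI
# <app>/<lem>/<rdg>. Namespace omitted; witnesses #A and #B are invented.
SAMPLE = """<p>The <app><lem wit="#A">quick</lem><rdg wit="#B">swift</rdg></app> fox
<app><lem wit="#A">jumped</lem><rdg wit="#B">leapt</rdg></app> over the dog.</p>"""

root = ET.fromstring(SAMPLE)


def reading_text(root, witness):
    """Tool 1: rebuild the running text of a single witness."""
    parts = [root.text or ""]
    for app in root.findall("app"):
        # Keep the first reading (lem or rdg) attributed to this witness.
        for reading in app:
            if witness in reading.get("wit", "").split():
                parts.append(reading.text or "")
                break
        parts.append(app.tail or "")  # text between this <app> and the next
    return " ".join("".join(parts).split())  # normalize whitespace


def variant_list(root):
    """Tool 2: tabulate each point of variation as {witness: reading}."""
    return [{r.get("wit"): r.text for r in app} for app in root.findall("app")]


# Tool 3: ElementTree's limited XPath support selects every reading of witness B.
b_readings = [r.text for r in root.findall(".//rdg[@wit='#B']")]

print(reading_text(root, "#A"))  # The quick fox jumped over the dog.
print(reading_text(root, "#B"))  # The swift fox leapt over the dog.
print(variant_list(root))
print(b_readings)                # ['swift', 'leapt']
```

None of these routines knows about the others; a concordance, a search tool, or a display program could be written against the same file without altering it, which is the kind of interchangeability the encoding standards are meant to secure.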


The chief characteristic of this account of the current state of things is that each developing scholarly project is tied fairly closely to a particular set of tools and markup protocols.  One scholar’s data is not easily accessed by another scholar’s tools.  This is so in part because texts and scholarship are often just as proprietary as the software used with them.  Copyrights are relevant to the problem.  Just imagine a new electronic edition of James Joyce’s Ulysses with an archive of files representing every extant stage of the manuscript from first drafts through marked proofs and revisions in later printings.[8]  Imagine the archive to be fully linked so that variants can be accessed.  Imagine it copyrighted and sold.  And then imagine that another scholar/IT technician develops software that can take the archive and crawl through it in such a way as to show, at any speed the user wants, the process of writing for any given passage, so that the user can watch it grow and change.  Give the user VCR controls for rewind and pause.  And provide a window for commentary.  Then ask: how can that new piece of software be used for Portrait of the Artist, Beckett’s Stirring Still, Cary’s The Horse’s Mouth, or Stoppard’s Dirty Linen, assuming there were archived files of these works marked up and in a condition to be enhanced by added markup?
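The core of the imagined replay tool can be sketched briefly. This is a minimal Python sketch under loud assumptions: that each archived stage of a passage has already been transcribed as plain text and put in chronological order, and that word-level comparison with the standard library’s difflib is an acceptable stand-in for the far richer collation a real genetic edition would need. The stages and the function name replay are hypothetical, invented for illustration.

```python
import difflib

# Hypothetical stages of one passage, oldest first (invented for illustration).
STAGES = [
    "He walked to the shore.",
    "He walked slowly to the shore.",
    "He wandered slowly to the strand.",
]


def replay(stages):
    """Yield (step number, word-level changes) for each revision step."""
    for i in range(1, len(stages)):
        before, after = stages[i - 1].split(), stages[i].split()
        matcher = difflib.SequenceMatcher(a=before, b=after)
        # Keep only the opcodes that record a change: insert, delete, replace.
        changes = [
            (op, " ".join(before[a0:a1]), " ".join(after[b0:b1]))
            for op, a0, a1, b0, b1 in matcher.get_opcodes()
            if op != "equal"
        ]
        yield i, changes


for step, changes in replay(STAGES):
    print(step, changes)
# 1 [('insert', '', 'slowly')]
# 2 [('replace', 'walked', 'wandered'), ('replace', 'shore.', 'strand.')]
```

Pausing or rewinding then amounts to moving an index back and forth over the list of steps that replay yields. Because the function asks only for an ordered list of transcribed stages, it would apply unchanged to any work whose drafts had been archived that way, which is the portability the question above turns on.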


The world of electronic scholarly editions may be working towards it but has not yet achieved a condition in which scholarship is invested modularly into the development of marked archives, marked commentary and annotation, and marked analysis of text variation and genesis, in such a way that the results of scholarship could be employed modularly with a variety of tools for display of static texts, for display of dynamic texts, for selection of texts, for manipulation of texts, for accessing commentary and annotations, or for personalizing editions for a variety of critical, historical, linguistic, or philological uses. 


In most cases the electronic editions now on offer do not serve as models for the construction of new editions of works other than those whose basic characteristics are like those of the project already undertaken.  Thus, stand-alone electronic editions of Beowulf or King Lear and of works by Samuel Beckett and Marcus Clarke have developed not only the files of text and scholarship associated with major scholarly editions but also non-interchangeable electronic tools, created or aggregated for their use.  The net result is an individualization of each project, both in its materials and in its modes of storage and retrieval.  Even collaborative projects and centers of electronic editing have produced limited and limiting results.  The editions surrounding Peter Robinson’s amazing work on Chaucer tend to be works with similar textual histories—Dante and the New Testament, not Shakespeare, Joyce, or Thackeray.  Likewise, the projects produced at the Institute for Advanced Technology in the Humanities (IATH) at the University of Virginia tend to be works for which illustrative material is of high importance (Blake and Rossetti) and where the concepts of archiving, imaging, and commentary are more valued than that of critical editing.  This is not to say that these projects are less good than they could have been.  Without them we would have a hard time imagining improvements.


These are early days, though the enthusiasm of those involved in the more elaborate prototype editions vents itself in statements like, "I think one can do an awful lot with XML and XSL, and I think what we lack in the infrastructure right now is a good, free XML search engine that would support Xpath and Xquery.  If we had that, I actually don't think there would be a whole lot to complain about."[9]  Well, we don't have that (or didn’t when I wrote this), and we do not have several other important things—or we have them in isolated and incompatible platform-dependent forms.  What we have now will not serve for very long technologically and does not meet, and never has met, the demands of scholarship.  If one were to put together the extraordinarily dexterous and beautiful presentations of electronic editions being done at IATH[10] with Peter Robinson's extraordinarily complex combination of text collation and beauty of presentation for the Chaucer and other medieval projects at De Montfort University,[11] and Paul Eggert's and Phil Berrie's collation and conversion tools and authentication processes at the Australian Defence Force Academy,[12] and Eric Lochard’s ARCANE authoring project involving an extensive array of charting, mapping, time-lining, and other types of verbal and graphic annotation and a panoply of output capabilities,[13] and the comprehensive organization of materials and access planned by the HyperNietzsche project[14]—in short, if one had comprehensive scholarly compilations of the documents of a knowledge area, beauty of presentation, imaging, collation on the fly, constant self-check for authenticity, writer's tools for annotational linking, multiple forms of output (to screen, to print, to XML, to Word, to TeX, to PDF, to others), sound, motion, decent speed, decent holding capacity, a friendly user interface, quick navigation to any point (fewer than three clicks), and scholarly quality—and if one had these capabilities in authoring mode, augmenter's mode, and reader's mode, in a suite of programs with similar interfaces, all workable on multiple platforms so that they were not too difficult to learn or to port from one set of equipment to another, and so that the tools developed for one archive could be easily adapted for use with another archive—then we would have something to crow about.  We would also have something to write permanent how-to manuals about.  Instead, what we have are hundreds of experiments, some of which do a very good job of surveying the ground and mapping improvements, as for example De Smedt’s and Vanhoutte’s Dutch electronic edition of Stijn Streuvels' De teleurgang van den Waterhoek.[15]


Because the means in both software and hardware are still in a rapidly developing infancy, technical problems have dominated discussions of how to produce scholarly electronic editions.  When an editorial project is defined primarily as textual scholarship in the hands of literary scholars who are amateurs in technology but who want electronic presentation and distribution, complicated textual issues often find only tentative technical solutions.  Conversely, when a new editorial project is defined primarily as electronic rather than textual and is placed in the hands of technicians who are amateurs in literary and textual scholarship, the tendency has been to produce beautiful and eloquent technical demonstrations of relatively obvious, simple, or flawed notions of textual issues.  Casual observers will invariably be much more impressed by the technical genius of the latter than by the textual complexity and nuance of the former, because casual observers do not actually use scholarship; they only look at it.  The merits of a knowledge site are not to be measured by the reactions of tourists. 


A full-scale electronic scholarly edition should allow the user to answer quickly and easily questions about the work that might affect how it is used. 

   A. The Documents

1. What are the important historical documentary forms of this work?

2. Can I choose a specific historical document as my reading text?

3. Can I choose a critically edited form of the work as my reading text?

4. Can I see photographic images of any of these forms of the text?

5. As I read any text can I pause at any time to see what the other forms of the text say or look like at that point? I.e., are the differences mapped and linked?

6.  As I read any text, can I be alerted to the existence of major variant forms, or of all variant forms?

7.  Can I alter any given reading text to represent my own emended version of it?

8.  Can I read descriptions of the provenance of each document?

9.  Can I access the editor’s informed opinion about the relative merits or salient features of each documentary version?

   B. The Methodology

10.  Can I read the editor’s rationale for choosing a historical text as the basis for an edited version and can I find an explanation of the principles for the editor’s emendations?  Are all emendations noted in some way?

11.  Is there an account of the composition, revision, and publication of the work?

12.  Is there an argument presented for the consequences of choosing one reading text over another?

13.  When variants are being shown, is there editorial commentary available about them?

14.  Are ancillary documents such as illustrations, contextual works, letters, personal documents, or news items available either in explanatory annotations or in full text form?

15.  How was accuracy in transcription assured?

   C. The Contexts

16.  Are there bibliographies, letters, biographies, and histories relevant to the composition or the subject of this work, or guides to the author’s reading? 

17.  Are there guides to existing interpretive works—from original reviews to recent scholarship and criticism?

18.  Are there adaptations in print, film, or other media, abridgments, or censored versions that might be of interest?

   D. The Uses

19.  Is there a tutorial showing the full capabilities of the electronic edition?  A guide for beginners?

20.  Are there ways I can do the electronic equivalent of dog-earing, underlining, making marginal notes, cross-referencing, and logging quotations for future use?  Can I write an essay in the site with links to its parts as full-text documentation and sourcing?

21.  What other things can I do with this edition?


Because there is no overarching goal, theory, or analysis of what electronic editions can be, scholarly editors working on electronic editions have yet to develop a sense that they are contributing to a system of editions sharing a communal goal; nor, with the exception of TEI and perhaps XML, has a widely accepted sense of “industry standards” developed that would support interchangeable modules for edition design and construction.[16]  Consequently, advice about particular software and hardware dates rapidly.  And nobody knows all the answers.


It Takes a Village

Creating an electronic edition is not a one-person operation; it requires skills rarely if ever found in any one person.  Scholarly editors are first and foremost textual critics.  They are also bibliographers and they know how to conduct literary and historical research.  But they are usually not also librarians, typesetters, printers, publishers, book designers, programmers, web-masters, or systems analysts.  In the days of print editions, some editors undertook some of those production roles, and in the computer age, some editors try to program and design interfaces.  In both book design and electronic presentation, the textual scholarship of such editors often visibly outdistances their amateur attempts at technical beauty and dexterity.  Yet, in many cases, textual critics, whose business it is to study the composition, revision, publication, and transmission of texts, have had to adopt these other roles just to get the fruits of their textual labor produced at all or produced with scholarly quality control.  It may even seem to some that it is the textual critic's duty, in the electronic age, to become an expert in electronic matters, perhaps for the same reason some editors became type compositors—they do what they have to do in the absence of the support that would provide them with the necessary team.  On the other hand, some very adept programmers and Internet technologists have undertaken editions, often with results in which the beauty of professional design surpassed the amateur textual scholarship invested.  Such persons need a team as well.  The division of expertise has led to the present situation—one in which the technological answers are limited to the needs of a particular scholarly project or to those of very similar projects in a single field.


As the chart below of a possible knowledge site (as opposed to an archive or a scholarly edition) shows, maintaining and continuing such projects will require a community with a life beyond the lives of the projects’ originators.  I believe this will happen: just as communities have arisen to support libraries, scholarly journals, and specialized research institutes that outlast their founders, so will communities arise around knowledge sites.  If a search engine like Google is a model for access to information—a model that truly seems like a coral reef in which every sort of life, low life as well as high, is tolerated—then the knowledge site, as a collaborative effort outliving its originators, can grow and develop through changes in intellectual focuses, insights, and fads, and accommodate new knowledge in configurations that may augment or correct, rather than replace, the work that went before.[17]


Although, strictly speaking, scholarly editing focuses on the study of the composition, revision, publication, and transmission of texts, it behooves textual critics to be knowledgeable about computer technology, because knowing the means of achieving the aims of the final presentation and its functions will affect every decision made from the beginning of research to the final enhancement or final abandonment of the project.  Equally, textual scholarship requires the services of Internet technologists.  And both types of expert need the input of those who have thought about how readers assimilate complex textuality.  This is not a case of simplifying, dumbing down, or compacting complex textual situations; it is a case of providing access to textual complexity as a highway rather than as an obstacle course. Clarity, not simplicity.


Every textual scholar who has ever started a sentence with the words, “The goal of scholarly editing is . . .” has been accused of narrowness or waywardness regardless of how generic and bland the following statement may be.  Nevertheless, whatever else it may also include, whether a book, a CD, a web site, recorded texts, or some combination of these or a new idea yet to be conceived, the goals of a scholarly electronic edition or knowledge site could or should include the presentation of a text or texts of a work, edited according to principles and methods explained by the editor in accord with the editor’s understanding of the work’s modes of existence.  One could try for a more straightforward account—the truth and nothing but the truth about scholarly editing: the presentation of the texts, their variants, their origins, the production processes, and their reception, together with commentary about these textual matters.  However, the straightforward statement begs too many questions and seems not to acknowledge that the evidence of textuality—the extant historical documents—cannot be handled, transcribed, or presented in objective or neutral ways.  Each editor, knowingly or naively, after having identified and analyzed every extant form deemed by that editor to be relevant, defines relevance and proceeds to transcribe, edit, and annotate according to a particular "orientation to text." I described these orientations in detail in Scholarly Editing in the Computer Age (1996) as bibliographical, documentary, authorial, sociological, and aesthetic.  Although editors may appeal to arguments from a mixture of these orientations, no single act of editing can conform to more than one at a time.  Presentation of texts fulfilling the demands of one orientation distorts the record for those trying to access the texts from a different orientation.
These differences may in some cases be trivial, but in others they are quite important.[18]   Even editors intending to mount all the relevant texts for a work on an electronic site must analyze those texts and provide an explanation of the relations between them.  That is not possible without comparing the texts word for word, letter for letter, punctuation mark for punctuation mark, and comparing and analyzing the iconographic differences in the paper, type fonts, and page and cover designs.  Eventually editors will deal with works that never existed in any but electronic forms, and their concerns may be different from those addressed here.  The concept of a knowledge site developed here may provide ways to accommodate the entire range of orientations to text as well as the whole range of extant texts.


Electronic scholarly editions either already can, or soon promise to be able to, offer both editors and edition users considerably more than was possible in print editions.  Print editions were almost always faced with limitations imposed by the economics of publishing and by a split desire to serve a general reading public that wanted a simple but sound text and a small tribe of scholars who needed the whole textual record. Print editions never actually managed to be all things to all people.  The knowledge site imagined here, constructed modularly and contributed to by “a village” of scholars, could never get itself printed as an integral whole, though most of its parts have been or could be printed in smaller units.  It now seems logical, when undertaking a scholarly edition, to plan to produce it as an electronic knowledge site with a variety of tools for accessing its materials and taking advantage of its incorporated scholarship.  If there are to be print scholarly editions also, they should probably be thought of as offshoots from the electronic edition, targeted to specific audiences or for specific uses such as reading or teaching as opposed to prolonged and detailed study. Although historically the print edition precedes the electronic one, even at this early stage of electronic text development it is becoming backward to think of creating a print scholarly edition and then retooling it as an electronic edition.


There are several reasons for this about-face in the procedures of editorial scholarship.  The primary one is that computer-assisted scholarly editing has already computerized every aspect of transcription, collation, revision, and record-keeping.  The production of print editions by manual means is now virtually unknown.  It is inconceivable that anyone would produce a scholarly edition using the equipment and procedures standard in the 1960s.  Now, although the production of print editions from electronic data will probably never cease, it seems much more sensible to aim from the beginning of research at the larger possibilities of electronic publication for the full-scale scholarly work.  It would be backward to aim now at a print scholarly edition, because at almost every stage of preparing one, editors make compromises and decide what to leave out, decisions that do not have to be made for electronic editions.  Instead of compromising or eliding material, the editor of an electronic edition has to decide about navigation—at what level and by what means will esoteric bits of information be accessible?


If one thinks of print editions as offshoots from a major electronic knowledge-site project, one can think of them as targeted to specific audiences or markets, based on the best and fullest knowledge of textual matters but trimmed and shaped for specific users, particularly casual or student users, who deserve the best access to a work for their purposes.


But because we already have many valuable print scholarly editions and many editions in progress that were designed for print first, it is useful to think of the problems of conversion and even of using electronic products as supplements to already printed editions. For projects already begun as print editions, the process necessarily still proceeds from print to electronic form. Soon, however, that stage will be over.


Industry Standards and Modular Structures

As mentioned above, the only generally agreed-upon industry standard for electronic scholarly editions to date is the TEI standard markup system.  The World Wide Web and XML provide a standard meeting place for editors, technicians, and edition users, and access to texts, scholarship, and tools for enhancing the use of texts.  These qualify as standard in my definition simply because they apply across platforms and are used by many types of software. How these meetings will take place, how access will be achieved, and how tools will be configured and deployed are all questions still being explored and answered only tentatively.


Though highly touted and potentially very serviceable, the XML standard (like HTML and SGML before it) has a serious downside: it does not allow what are called overlapping hierarchies, that is, two or more ways of structuring and looking at the same work at once.  For example, if one divides a work into title, chapters, paragraphs, and sentences, one cannot also divide it according to its material makeup (sheets, gatherings, leaves, and pages), because a paragraph may begin on one page and end on another, and XML requires elements to nest: anything opened inside an element must be closed before that element is closed.  If a paragraph opens on one page, it must be closed before the page is closed, and then reopened on the next page so that it can continue. This would not be a problem if everyone would just agree that an essay or a chapter consists of paragraphs and that their arrangement on pages is irrelevant—but we don’t. Imaginative people have developed more or less clumsy ways around this limitation, but what is needed is a language and markup system that allows overlapping hierarchies.  The purpose of this section, however, is not to berate the current system but to imagine a technological environment and structure for presenting complex textuality in logical, clear, and user-friendly forms.  Perhaps once it is imagined it can be built.
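
To make the overlapping-hierarchy problem concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It assumes a simplified, namespace-free TEI-like vocabulary: <p> and the <pb/> page-break milestone are TEI element names, while <page> and all the helper functions are hypothetical, invented for illustration. The first string tries to mark up pages and paragraphs as two full hierarchies and is rejected by the parser; the second keeps the paragraph hierarchy and records page breaks as empty milestone elements, from which a page view can be reconstructed.

```python
import xml.etree.ElementTree as ET

# A paragraph that runs from page 1 onto page 2, marked up with a
# hypothetical <page> element alongside <p>. The two hierarchies
# overlap, so this is NOT well-formed XML.
OVERLAPPING = (
    "<text>"
    "<page n='1'><p>The paragraph begins here </page>"
    "<page n='2'>and ends here.</p></page>"
    "</text>"
)

# Workaround: keep the paragraph hierarchy and record page breaks as
# empty "milestone" elements (TEI uses <pb/> for this).
MILESTONE = (
    "<text>"
    "<pb n='1'/><p>The paragraph begins here "
    "<pb n='2'/>and ends here.</p>"
    "</text>"
)


def is_well_formed(xml_string):
    """Return True if the string parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_string)
        return True
    except ET.ParseError:
        return False


def paragraph_view(xml_string):
    """The logical view: the full text of each <p>, ignoring page breaks."""
    root = ET.fromstring(xml_string)
    return ["".join(p.itertext()) for p in root.iter("p")]


def page_view(xml_string):
    """The material view: text grouped by the page it falls on."""
    pages = {}
    current = None

    def add(text):
        if text and current is not None:
            pages[current] = pages.get(current, "") + text

    def walk(elem):
        nonlocal current
        if elem.tag == "pb":
            current = elem.get("n")  # a milestone starts a new page
        else:
            add(elem.text)  # text just after the start tag
        for child in elem:
            walk(child)
        add(elem.tail)  # text just after the end tag

    walk(ET.fromstring(xml_string))
    return pages


print(is_well_formed(OVERLAPPING))  # False
print(paragraph_view(MILESTONE))    # ['The paragraph begins here and ends here.']
print(page_view(MILESTONE))         # {'1': 'The paragraph begins here ', '2': 'and ends here.'}
```

The milestone encoding is one of the workarounds alluded to above: the paragraph view falls directly out of the tree, but the page view has to be recovered by walking the whole document in order, which is why such solutions feel clumsy.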


The disadvantage of industry standards, generally speaking, is that as research and development take place, regardless of the field, situations will arise in which one will want to do something that was not foreseen when the standards were set and that is not allowed by those standards.  The advantage of standards, if they are flexible and versatile enough, is that they make it possible to share services and interchange parts without affecting the functionality of the whole or of other parts.  A modular approach to the functions of an electronic edition / archive / knowledge site may help us achieve the flexibility and compatibility we want.  An outline of the editorial and reader functions and the types of materials and sets of information that affect either the editing or the reading process is set out here as an indication of the areas for which software is needed.  Much of the software already exists—that is, the ability of computers to handle the target tasks has already been demonstrated.  But many of these solutions were developed in such a way that the basic materials of the edition / archive / knowledge site could not be accessed, manipulated, added to, or commented on without changing from a PC to a Macintosh or Unix-based platform, or without first converting the text from XML or Word to TeX or Quark or something else in order to run the software.  Already the solutions thus developed, and new solutions to problems of access and manipulation of data, are being transformed as data (texts and commentary) have migrated to XML-encoded form and the tools have been altered to deal with such data on multiple platforms.


It may be worth repeating, before launching this overview, that I do not imagine any one reader will wish or be able to use or attend to all these parts at any one time.  The point is to provide a place where different readers can satisfy differing demands at different times from the same set of basic materials[19] using an ever-developing suite of electronic tools.


It is also important to remember that this outline attempts to cover all forms that literary works take, and that any given literary work may lack some of the materials, or its treatment may emphasize some parts over others.  And some projects will begin with what their directors think most important and leave other parts to be developed by future scholars.  The structure being imagined is one that is open and extendible in all directions.[20]


Insufficient input has been brought to bear from studies of textuality and of how people either do or can read.  It is as if we need a new profession to complement the professions of textual criticism and of electronic programming.  It should be the profession of textual reception, exploring not only how people read and study texts but how they could study texts.  Such a field of inquiry would develop a design for text presentation driven by how to create user-friendly access to all the materials and levels of signification inherent in textuality.  Perhaps a department of computer humanities or humanities computing could house such a profession.  My main point here, however, is not to imagine the mass development of readers, or even one reader who would be interested in all the parts of an edition or knowledge site, but instead to imagine a heterogeneous readership wanting a variety of different things, which can be accessed from a single but complex knowledge site providing access to a range of specific texts of a work and the tools to use them variously.  The electronic scholarly “knowledge site” must be capable of handling every reader even though no single reader will handle all the capabilities of the knowledge site.


Materials, Structures and Capabilities

            The chart below maps in the left column the range of materials and tools and relationships that a knowledge site needs to be capable of providing, while on the right it maps the questions or actions readers might wish to undertake, as presented in the boxed questions, above.

I. Textual Foundations

Basic Data

    Material Evidence

transcriptions of documentary data: ms and print texts

digitized images of same

Readers should be able to read each extant document in isolation and in full, either as a transcription or as a digitized image or both

    Inferred Data

transcriptions of critically developed data, edited texts

digitized image of designed pages for new text

New critical editions of the text, not necessarily just one, should be available, both in a firmly formatted form (like a book; as a .pdf, for example) and as a searchable transcription.

Internal Data Links

     Collations: linking points of variance

     Emendations for critically edited texts

     Additional Material Facts (hyphenation, fonts, formats)

Readers should have access to variant forms—both image and transcript, regardless of which text they are currently reading.  Facts about the documentary texts should be available.

Bibliographical Analysis

     Physical descriptions of manuscripts

     Bibliographical description of printed editions, printings, and states

     Descriptions and histories of design, format, handwriting, typography, etc.

Readers can obtain information about the material production and manufacture of the physical objects that are the manuscripts, proofs, and books containing the text.

Textual Analysis

     Descriptions of revision sites

     Explanations of convergence and divergence in texts

     Provenance and textual histories

     Identification of textual agency (who did what, where, when, how)

     Genetic analysis: composition, revision, production, manipulation, censorship, appropriation, etc.

Provides information about the composition and revision of the work at every stage of its development, appropriation, or adaptation.  Identifies, to the extent possible, the agent of change and the time and place of change and any contextual information that would suggest motives for change.

II. Contexts and Progressions

Contextual Data (individualized for each stage of textual existence)

     Historical Introductions

     Biographical (for author, editors, composition and publication)

     Explanatory Annotations

     Verbal Analysis: style, grammar, word choices, genre, etc.

     Social, economic, political, intellectual milieu

     Links to full text archives of letters, diaries, ancillary materials

Provide as much access as possible to the “things that went without saying” but that affect the uptake of the text.  Without this material, readers tend to make up or assume things that may not be relevant to the script act in hand.


     Links to sources, analogues, influences, coincidences, etc.

Provide a guide to those works against which or in connection with which the present work was written

Linguistic Analysis


     Use of italics for titles, ships, emphasis, foreign words, etc.

     Use of quote marks, ditto


     Syntax structures

Linguistic and stylistic analysis provide explanations for unfamiliar usages.

III. Interpretive Interactions

Reception History

     Reviews and criticism

     Literary Analysis

          Narrative structure


          Ideologies of gender, race, region, religion, politics, etc.

     Cultural Analyses

The history of the work’s reception can give context to any reader’s own reaction to the work.





     Radio adaptations


     Other appropriations

Provides the history of transmutations of the work—in description at least if not full text.  Capability for audio and motion pictures needs to be available.

IV. User Enhancements

New Markup

Reader/scholars can introduce new analysis markup to the texts

Variant Texts

Readers can emend and create new versions by mixing historical variants or introducing new emendations

New Explanatory Notes, Commentary

Readers can add information to the system

Personal Note Space

Readers can make notes and import quotations of text, audio and still or motion images.


The challenge is to house the materials; to provide the interfaces and links that create a navigable web; to provide access in such a way that growth and development of the knowledge site is encouragingly easy; and to provide tools that allow individuals to personalize their own access to the work.



Part II—Practical Problems


How Will It Be Financed?

It was not my intention to analyze the costs or suggest a way to finance this approach.  Rather, I wanted to analyze how script acts work so as to identify more broadly and deeply the desires and needs of readers and students of literature and to imagine the environments made possible by electronic representations in which such “thick engagements with texts” could take place.  However, if the costs were so prohibitive and the mechanisms for support of the system so inadequate as to make the whole enterprise an exercise in science fiction, it would scarcely be worth our attention.  What follows may or may not prove useful as a way to think about financing.


It appears to me that the financial considerations have several components that can be taken up separately but which in the end must be seen as coordinated or capable of being coordinated. 

The first category involves IT development of software, coordination of software, maintenance of knowledge-site computer files, and the never-ending need to migrate the system and its contents to new and better technology and to drop off discontinued technology.  Dissemination methodology belongs here, though other aspects belong in category four.  This category has both personnel and infrastructure equipment expenses.

A second category involves the scholarly development of the materials of each knowledge site, with its extensive bibliographical, textual, interpretive and interweaving tasks. This category has personnel, travel, photographic, and personal equipment expenses.

A third category involves review, refereeing, or gateway tasks that separate the wheat from the chaff and ensure the quality of knowledge-site knowledge.  This category involves personnel and communication expenses.

A fourth category involves permission fees, copyrights, and royalties.  In order for a knowledge site to make unique primary materials and copyrighted materials available for access, libraries, publishers, authors, and anyone with a vested interest in materials and the power to withhold that material must be addressed pragmatically—which probably means fees rather than just appeals to goodwill toward the intellectual community.  Dissemination by some financially feasible scheme belongs here, though its technical logistics belong in category one.


            Two points are worth making at the outset.  The first is that knowledge in the print world has found ways to finance the analogous categories of expenses: the world of publishing has invested and recovered enormous sums in printing and graphics equipment; the world of scholarship has supported large and small research projects that include most if not all the kinds of investigation needed for knowledge-site creation; the world of academe, in conjunction with publishers, has created wide-ranging networks of referees and gateways to uphold the quality of scholarship; and the worlds of libraries, archives, book and manuscript collecting, and copyright have all learned to live with the financial arrangements that make their existence possible.  Finally, in addition to the commercial aspects of this vast network of print-knowledge development and maintenance, there is the world of governmental, institutional, and private funding that is constantly adding financial support to the world of knowledge.  The first point, then, is that these worlds will all continue to play crucial roles as scholarly work, re-represented under the rubric “knowledge site,” becomes electronic instead of print.  These “worlds” support or are supported by knowledge, knowledge generation, and knowledge dissemination.  The fact of print, like the fact of electronic representation, has to do with medium, not substance.  What looks like the sale of a physical book is in fact the sale of text with intellectual value.  It may be true that some publishers can sell physical books with false or inferior textual value, but that fact is not relevant to the world of knowledge—except perhaps as an irritation.  The quality of real knowledge is the bread and butter of real publishing.  Real publishers, real libraries, and real scholars are committed first to the quality of knowledge.  But they must be financially sustainable.


            The second point is that for the development and maintenance of electronic knowledge sites with democratic (i.e., affordable to most people) access, a pricing system different from that current in the book and database world probably needs to be devised.  Rather than the sale and purchase of a book at a given price, rather than the periodic payment of a subscription fee, and rather than the payment of a one-time or periodic license fee for receiving materials or for access gained to otherwise closed data-bases, a different approach is needed.


Academic institutions and funding agencies as well as the small world of scholarly editors have all failed as yet to come up with a full-scale solution to the complex problem of funding, training, development, maintenance, and distribution of large scholarly projects.  Of course there are spectacular exceptions: the Cambridge Bibliography of English Literature, the Dictionary of National Biography, the Oxford English Dictionary.  There are also spectacular success stories in the building of collections of materials—hundreds of collection successes in world-famous libraries.  But from the democratic student/scholar’s point of view these wonders are accessible in a limited number of places on earth, and for them the lack of funds to acquire collections or to travel to collections restricts access to written and printed knowledge.


  The infrastructure and social system that would provide and maintain the required personnel and the long-term support needed for real progress in electronic knowledge sites have not coalesced.  What is needed is for the community of scholarship that has developed around printed knowledge over hundreds of years to join in the development of electronic knowledge sites.  Just as small villages grow together into great cities, knowledge sites can merge at their boundaries and interact.  It is a project for all scholars in document-based disciplines working together—as they always have—in conjunction with the existing support systems found in funding agencies, academic publishing, and library systems.  But the primary focus needs to shift from the publication, dissemination, and maintenance of books to the construction of electronic knowledge sites.  For this to work a new player is needed—major world-class browsers, searchers, and linking systems capable of unlimited growth, lightning speed, endless maintenance, and world-wide distribution or access—for profit.  It must be for profit, just as book publication is for profit; for, if it fails to maintain itself, it will fail the world of knowledge and scholarship.


All our textual production skills for five hundred years have been devoted to print media, and much of what has so far been done in electronic form consists of porting from print to electronic equivalents.  The exploration of what can be done has been driven by photography, the movie industry, and librarianship. The exploration of how it can generate self-sustaining revenue has been driven by the history and practice of book production and sale. To me, the most likely development for revenue is not material sales or subscriptions but user fees.  Licenses or the sale of CDs or database access would reflect neither the value received nor the use generated.  But since accounting systems for tracking hits and charging “subscribers” are now well-developed technologies, pennies or half-pence per hit, generated the world over, would enable libraries to provide their patrons with access to a much more comprehensive and useful electronic repository of knowledge than any single library, no matter how big, could afford to purchase and house.  And royalties to contributing publishers, libraries, and scholars would also be tracked and paid based on use, rather than on purchase or initial one-time fees.  Users would always pay for access to sites in their most developed and updated form and would not be stuck with last year’s purchase.  And because contributions to the knowledge sites would have to be vetted by the world of scholarship, the materials in the sites would, for the most part, be more reliable than what could be found on the Internet at large.  Very large libraries with extensive holdings and large numbers of users would, under this system, both pay a great deal in use fees and receive a great deal in royalty payments from the world-wide access to their own unique materials.  Such payments would continue for as long as they continued to own the originals.  Publishers and authors would stand to earn use fees for as long as their copyrights were valid.
Small use-fees from all over the world would very likely exceed the income now generated by sale of books to a limited number of libraries.
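
The use-fee scheme described above is, mechanically, simple bookkeeping over an access log. The following Python sketch assumes hypothetical rates (one penny per hit, half of which passes to the owner of the item consulted) and hypothetical party names; it shows only how a large library could both pay use fees and earn royalties, with a net position that depends on how often its own unique materials are consulted. It says nothing about what realistic rates would be.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical rates, for illustration only.
FEE_PER_HIT = Decimal("0.01")   # charged to the user's library for each access
ROYALTY_SHARE = Decimal("0.5")  # fraction of each fee passed to the item's owner


def settle(hits, fee=FEE_PER_HIT, share=ROYALTY_SHARE):
    """Tally use fees and royalties from a log of accesses.

    hits: iterable of (paying_library, owner) pairs, one per access, where
    owner is the library, publisher, or author holding rights in the item.
    """
    charges = defaultdict(Decimal)    # Decimal() starts each ledger at 0
    royalties = defaultdict(Decimal)
    for library, owner in hits:
        charges[library] += fee
        royalties[owner] += fee * share
    return dict(charges), dict(royalties)


def net_positions(charges, royalties):
    """Royalties received minus fees paid, for every party in either ledger."""
    parties = set(charges) | set(royalties)
    return {
        p: royalties.get(p, Decimal(0)) - charges.get(p, Decimal(0))
        for p in parties
    }


# A hypothetical access log: (who paid for the access, who owns the item).
log = [
    ("Small College", "Big Library"),
    ("Small College", "Big Library"),
    ("Big Library", "Press X"),
    ("Overseas University", "Big Library"),
]
charges, royalties = settle(log)
print(net_positions(charges, royalties)["Big Library"])  # 0.005
```

In this toy log the large library pays for one access but is paid for three, so its net position is positive; a small college that only consumes pays in proportion to its actual use.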


Other scenarios have been tried or suggested; many are now in place.  My point is not that I have found the best or even a feasible structure, but rather that it appears possible to create a complex, comprehensive, world-wide, electronic representation of knowledge sites that are financially self-sustaining, and, thus, that can be developed, maintained, and function for many years—perhaps as many as Gutenberg’s 500 years and counting.


Some Language and Software Solutions

Despite its shortcomings, TEI-conformant XML appears to be the best language and markup for transcriptions and other text materials.  Its primary shortcomings have been identified and revisions have been promised.  (Markup, for those to whom the concept might be unfamiliar, consists of a system of tags or marks associated with sections or parts or items in a text file.  If a text file at its simplest consists of a stream of letters, spaces, and punctuation, markup provides identifiers so that various sorts of software can do a variety of things with the texts: identifying fonts (italics, etc.), formats (headings, indentations, footnotes, links, etc.), features (phonological, morphological, lexical, etc.), and a whole variety of associated items (variant texts, annotations, instructions, etc.).  Markup can be rudimentary or rich; it can be solely bibliographical or linguistic or historical; it can be a mixture of these.  Different software accessing the same marked-up files might focus attention on some tags and ignore, or simply be unable to “see,” other tags.)
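
The point that different software can read the same marked-up file in different ways can be sketched in a few lines of Python. The sample sentence, the helper names, and the choice of TEI-style tags (<title>, <emph>, and <name type='ship'>, with the TEI namespace omitted) are assumptions for illustration only; each function stands in for a different program that attends to some tags and ignores the rest.

```python
import xml.etree.ElementTree as ET

# One marked-up sentence, using TEI-style element names (namespace omitted).
SAMPLE = (
    "<p>He sailed on the <name type='ship'>Dolphin</name> with a copy of "
    "<title>Evelina</title>, which he <emph>loved</emph>.</p>"
)


def plain_text(xml_string):
    """A program that 'sees' no tags at all: just the stream of characters."""
    return "".join(ET.fromstring(xml_string).itertext())


def titles(xml_string):
    """A bibliographical program that attends only to <title> elements."""
    return [t.text for t in ET.fromstring(xml_string).iter("title")]


def italicized(xml_string):
    """A typesetting program: titles, emphasis, and ship names print in
    italics (shown here as *asterisks*); every other tag is ignored."""

    def is_italic(elem):
        return elem.tag in {"title", "emph"} or (
            elem.tag == "name" and elem.get("type") == "ship"
        )

    def render(elem):
        parts = [elem.text or ""]
        for child in elem:
            parts.append(render(child))
            parts.append(child.tail or "")  # text following the child's end tag
        inner = "".join(parts)
        return f"*{inner}*" if is_italic(elem) else inner

    return render(ET.fromstring(xml_string))


print(plain_text(SAMPLE))  # He sailed on the Dolphin with a copy of Evelina, which he loved.
print(titles(SAMPLE))      # ['Evelina']
print(italicized(SAMPLE))  # He sailed on the *Dolphin* with a copy of *Evelina*, which he *loved*.
```

None of these functions alters the file; each simply selects which tags matter for its purpose, which is what lets a richly marked-up text serve several kinds of reader at once.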


Imaging, for the present time, has to be described in terms of its goals because the options are too many.  What is wanted is high enough resolution to make the image at least as readable as the original; tests have shown that some electronic images are more readable than the originals.  Color is wanted that will be represented with fidelity on different computer screens.  Reproductions of reproductions may have to be considered, but folk wisdom and technical knowledge suggest that images made from originals would be better.  Regardless of the solution, temporary though it may be, users have the right to know what was used as the basis of the image (an original or a reproduction) and what process was used that might have altered the appearance of the object on display.  No one, it can be assumed, will be so naïve as to mistake even a high-resolution reproduction for the real thing.  When they have seen a virtually real reproduction of the Rosetta Stone, they will not say they have seen the thing itself.


Software to collate texts has existed since the early 1970s, the best known and most versatile for scholarly editing being CASE, MacCASE, and COLLATE.  The latter two also provide mechanisms for creating links among variant texts.


ANASTASIA is, to my knowledge, the most versatile presentation software yet developed for scholarly editions. It gives access to images and transcripts of documents, links between variant documents, full textual apparatus, introductions, and explanatory notes. Less well developed at this point is JITM (Just In Time Markup). It incorporates on-the-fly text collation and a text authentication mechanism, and it gives readers the capability to add enhancement markup. JITM is modular and provides a flexibility of approach that gives readers control over the materials, but its potential is not fully realized, and its user interface still (in 2004) leaves much to be desired. Numerous projects in the process of development employ XML with newly designed interfaces (what one sees on the screen and how one selects from menus and links) to incorporate experimental ways to present scholarly editions.[21]


A consortium of scholars interested in the works of Friedrich Nietzsche has developed a suite of programs called HyperNietzsche, in which to house, link, and make basic texts and scholarship available for free. Its markup system is a variant of SGML with elements of TEI conformance, and its networking system is currently tied to a concept of “open source” that requires copyright to be abandoned by all in exchange for copyLEFT, to ensure free access for all users. The software developed and the concept of how knowledge can be “constructed” from primary materials through multiple kinds of cultural and scholarly added value are in line with the principles developed in this book. Whether or not the participants can make the system work for free and endure through time remains to be seen. Its health depends on grant funding and the goodwill of the participants. As this project grows to serve the development of knowledge sites other than Nietzsche, its name will become HYPER, HyperResearch, and HyperLearning.


Éric-Olivier Lochard’s ARCANE is a comprehensive, yet closed, system developed primarily for historical editions. It gives access to individual documents; provides for user enhancement through added commentary; is far more creative in its use of charts, maps, and chronological progressions; and anticipates multiple forms of output: to screen, to paper, and to files in various formats (.tex, .pdf, .doc, etc.). It has no means of identifying variant texts.


These do not add up to the solution that is needed, nor, indeed, do I believe that any comprehensive software solution is to be desired. These programs are among the most promising approaches because they are based on visions for scholarly uses, and they demonstrate some of the ways electronic editions can do more, and do it more conveniently, than print editions could. Although I do not know how it can be done, or even whether it can ever be done, it seems to me important to let individual projects develop, for some time yet to come, according to the nature of their materials and the approaches to knowledge that they find valuable, before any attempt is made to invent the cookie-cutter to which all projects must conform. There seems to be hope in the idea that what is needed is a front-end interface that will allow users to access multiple knowledge sites in a way that helps them past the problems inherent in the fact that each project uses a different markup language or structures its content files in different ways.


New and Legacy Projects

In this transition time, when electronic forms are challenging the reign of the "print book", scholarly editors divide into two different groups defined by the problems they face in developing electronic editions. One group, seasoned editors or inheritors of the legacy research materials of such editors, will already have many files of relevant texts in forms not yet ready for an electronic site, not yet properly marked for posting, and perhaps not fully proofread and corrected. The other group, editors with new projects, faced with research materials wholly in print or manuscript form, need to develop computer-readable files and find analysis tools and file-manipulation tools appropriate for mounting an electronic site. Eventually, perhaps, the latter may be the only kind of scholarly editor, but I address first the problems faced by editors who already have computer text files developed for print or archival purposes. There is a surprising amount of carry-over value from the procedures developed for such projects for use with new projects. And many new editors will find that, regardless of how many versions they intend eventually to post in full text, and particularly for long prose fiction works, there are reasons to create, during the research phase, a preliminary archive of text files to enhance collation and quality control. Such files, like the legacy files of older editions, will then need later to be converted and marked for electronic site presentation.


Neither group has the luxury yet of a set of tools that will render the process of electronic scholarly editing unproblematic. The first task of scholarly editing is always a bibliographical project: finding original materials, a global search for unique as well as multiple copies. The full extent of intra-edition variation within and among multiple printings of any edition (copies produced from the same setting of type) must be determined, a task for which computers are practically useless but which is aided by Hinman Collators, Lindstrand Comparators, and other optical devices such as the ones developed by Randall McLeod.[22] For detecting inter-edition variation, computers are very helpful; essential, really. But relevant texts must first be rendered as computer files by typing and/or scanning (more probably “and” than “or,” because scanning, though cheaper, is still more error-prone than the work of competent typists, and because most scholarly editors want image files as well as text files for display). Text files must be proofread to ensure that they accurately reflect the source texts, either by sight collation or by computer comparison using products like COLLATE or PC-CASE or MAC-CASE or some other text-comparison program designed to produce variants in a form easily converted to a presentation format, revealing and analyzing the relations among the variant forms of the work.[23]
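The kind of computer comparison described here can be sketched in a few lines of Python. This is a minimal word-level collation built on the standard library's `difflib`, not the algorithm used by COLLATE or CASE; the function name `collate` and its output format (base-text word position, base reading, variant reading) are assumptions chosen for illustration. Run against two independently keyed transcriptions of the same document, the same comparison would flag likely typing errors for proofreading.

```python
import difflib

def collate(base, witness):
    """Word-level collation of two versions of a text.

    Returns (position, base reading, witness reading) for every variant,
    where position is the 1-based number of the first affected word in the
    base text (for an addition, the word it precedes, or one past the end).
    """
    a, b = base.split(), witness.split()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    variants = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # 'replace', 'delete', or 'insert'
            variants.append((i1 + 1, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return variants

for variant in collate("The Colonel came home to London",
                       "The Colonel returned home to London town"):
    print(variant)
# (3, 'came', 'returned')
# (7, '', 'town')
```

A real collation program must also cope with punctuation, line-end hyphenation, and many witnesses at once; the sketch shows only the core alignment step.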


A survey of truly sophisticated experimental electronic editions (excluding amateur productions such as those found in Project Gutenberg and Chadwyck-Healey’s poetry projects) reveals that most of them provide at least one unique capability not found in the others. And because each is tied to a particular type of software or hardware, and because of the general limitations that still prevent fully fledged, full-function editions, at the moment, no matter what course one takes, scholarly principles must be compromised, and some need or desire to provide some facet or other of the work will be sacrificed. In this sense, the situation for electronic editions resembles the limitations of print editions. One can still hope that in the future this will be less so, but one cannot help musing over the hype that has proclaimed electronic editing the panacea rescuing editors from the straitjacket of print editions. I think it is necessary and important to sound this practical and discouraging note because of the inflated claims of some enthusiasts for electronic editions. Yes, they are better. No, they are not good enough. And one reason is that a full vision of what is wanted has not been articulated either clearly or effectively. Perhaps not enough people yet want it; but in 1975 few people knew they wanted a desktop computer, and in 1991 who wanted a DVD player?


To some extent the composition and production materials that have survived for any given written work identify and delimit the editorial treatment most appropriate in handling the work and developing the electronic edition, but editors have a great many choices to make as well, and they will do so more thoughtfully and effectively if they have explored their options well.  Editing is not a straightforward task, even in the hands of the most ignorant or unselfconscious or single-minded of editors.  Works like Jerome McGann's A Critique of Modern Textual Criticism (1983), and The Textual Condition (1991) or like my own Scholarly Editing in the Computer Age (1996) and Resisting Texts (1997) explore the effects on editing resulting from a variety of assumptions about what a written work is and how the editor is to construct it and how the reader is to interact with it.  David Greetham's Theories of the Text does not give any practical advice about how to edit texts, but it explores, so it seems, every conceivable implication and failing inherent in the ways that have been used to edit.  Anyone embarking on an editorial project in English from the early modern period to the present without a working knowledge of these works of scholarship or the tradition of essays on editing in Studies in Bibliography and TEXT: An Annual of Textual Scholarship may well be an editor but more than likely is not a scholarly one.


These "prerequisites," so to speak, are all implied or stated in the Guidelines for Electronic Scholarly Editions put out by the Modern Language Association's Committee on Scholarly Editions (see n48). Undergirding those guidelines also is the belief that, for the complexity of presentation demanded by full-scale scholarly editions and for the long-range portability and survival of editorial work, editors should adopt the standards and procedures embodied in TEI (Text Encoding Initiative) for preparing SGML, XML, or comparable file markup. Editors starting from scratch can choose tools that already have these standards in view when they begin the tasks of rendering into electronic files the bibliographic forms that will eventually occupy the edition's electronic site. I cannot overstress the importance for new editors of exploring the whole range of problems and tools needed, from the gathering of original editions, through their analysis by collation and annotation, to the final presentation on an electronic site. Failure to survey in detail every step of the process in advance will lead to grief over the production of files that lack some key component or that must go through some extra step of file conversion. Time spent planning the steps of the research and the processes for mounting the electronic site will be time saved from wasted efforts and from the wasteful use of tools that produce incompatible results. Editors with legacy files from projects not originally designed for electronic presentation now have to face the problems created by the fact that they did not foresee an electronic site as the end product of their work. Their problems, as we shall see, are more complex than those of mere file conversion to TEI conformant XML.


A Division of Labor

Every scholarly editor and every publisher of scholarly editions, whether in book form or electronic, has a different experience base from which to assess or plan the steps in the process and to determine who does what. Some very elaborate and impressive projects have been accomplished primarily by one person who was editor, designer, programmer, and desktop publisher. Other projects involve teams of editors, expert programmers, webmasters, design specialists, and publishing houses that do a range of production tasks, from copy-editing, file conversion, typesetting, and book or CD manufacturing to publishing, distributing, advertising, and marketing the end products.


This range of tasks suggests that any one editor’s or publisher’s practical experience is limited and that advice from any one source is similarly limited. Editors and publishers who have experience with many scholarly editions may well start with an aspect of scholarly editing frequently treated as a taboo subject: the money. It is not just the money to do the research, to travel to archives, to transcribe documents, to create image files, to proofread, to mark up files, to compare texts, to compile all the data, and then to prepare an edition, or an archive/edition, or a knowledge site; it is also the business of vetting the results, having third, fourth, or fifth eyes to check for accuracy, coherence, and usability. And then there are production expenses, including the publisher’s overhead. Even as a one-man desktop publication, it were folly to think that a scholarly edition could ever break even; it is first of all a labor of love and then of grants and subsidies.


And it is not just the money. Think for a moment about the support structures, the infrastructure, the institutions that support print scholarship: universities, individual departments, computing centers, internal and external funding, publishing houses, refereeing systems, marketing systems, and, last but not least, libraries, where the products of the print industry are maintained for decades and centuries. None of that was developed with electronic publishing in mind. The people, the institutions, and the shared notion of the continued value of electronic editions are just developing. And by comparison with books, which, once printed, are relatively stable when left unattended on a shelf, electronic editions are both subject to continual upgrading and subject to certain degradation through neglect.


But one should not be discouraged. The existence of hundreds of scholarly editions, both completed and in progress, suggests that there is enough love and sometimes enough grants and subsidies to get the job done. And yet I do not know of any project that has not compromised in some way between the ideals of the project and the practical necessities governed by money. To some extent this book is an exploration of ideals rather than practicalities: an effort to see what could or should be done, rather than advice about what to do now or what is done now. But one can never forget the pressures to compromise, among which finance stands perhaps as paramount.


A Case Study Editorial Problem

As an example of the problems faced by an editor in the modern period of English literature, I offer my own edition of William Thackeray’s works. The purpose is to give practical life to the abstractions of the foregoing description of aims and problems in creating an electronic knowledge site. It also reveals my own limitations as I try to deal with a topic and an opportunity of whose importance I am convinced and whose complexity is such that neither I nor anyone I have yet met or read has sufficient breadth of knowledge to deal with it adequately. It can be skipped without losing the theoretical structure of my arguments.


In summary, the Thackeray Edition Project in its aim to produce a print scholarly edition already had:

1. Working electronic files of the manuscripts and of every other relevant historical document. Some of these files were fully proofed and updated; others still contain transcription errors, though those errors are documented in the working papers.

2. Collation files showing variants among the historical texts.

3. Electronic historical collations of all authoritative texts.

4. An electronic file of the newly edited critical text.

5. An electronic list of emendations in the new critical text.

6. Electronic files of historical and textual introductions.

In the case of the Thackeray edition, and of a number of projects that used some version of CASE, all these are ASCII files, marked either by mnemonic codes or by typesetting codes.


In short, from the point of view of one wishing to create an electronic archive, what we have is a rich mess. Only two or three text files for each volume in the edition were of "export quality": the diplomatic transcription of the manuscripts, the transcription of the historical document chosen as copy-text, and the text of the critical edition. Files for other texts exist but are not fully corrected.


The producer of an electronic edition/archive faces a very different set of demands from those faced by print editors. These demands can be categorized by the stages of work required for anyone wanting to port the Thackeray edition files, or similar legacy files, into a WEB environment conforming to the demands of XML/SGML and TEI. And this task is not so simple as converting files from one form to another, which might be done automatically with a conversion program. To do so would produce XML files with the same limitations and errors as the original files.


How to construct an edition

The first category of problems is the one facing all editors who long for a comprehensive answer to the question: How should an electronic scholarly edition/archive be constructed? There is no compelling answer. There is no browser with built-in capacities to handle variant texts and variant images, with a good user interface and proper deployment and presentation of ancillary materials such as annotations, links to off-site resources, and links to moving pictures and audio, that also maintains for the user a clear sense of where one is in a tree of knowledge offering the fruits of an expanded/expanding notion of textuality. And to my knowledge only one relevant software package, JITM, incorporates a repeated self-check to verify that updates have not inadvertently corrupted the text. And only two, that I know of, offer readers any significant role in interacting with the edition/archive, either to enhance it or to personalize it by shaping it to the needs of the user’s own research and projects. There is not yet a good answer to the question: Whose software should one use? There isn’t even a standard answer to the question: What file structure or tree structure should be used for files of basic texts so that the variety of next-wave text software will know where to find the textual grist for its mills?


TEI conformant XML (or formerly, SGML) anticipates the development of adequate browsers and forefends the obsolescence of one's work by maintaining basic file markup that is easily exchanged between platforms and software.[24]  Such sweeping assurances are, however, small comfort to WWW novices who see a formidable new user-unfriendly system to be understood and mastered in order to make possible the shift from legacy forms to SGML or XML-TEI.  It is important to acknowledge that scholarly editors deep in the intricacies of textual criticism of an author’s works find TEI and XML an irritating distraction and that IT experts glorying in the capabilities of electronic wonders usually do not take the time to understand the demands of textual critics and the intricacies of editorial theory.  Enervating to those who have some understanding of both fields is the fact that fully functional SGML browsers were first promised some ten years ago, and now there are none. The brave new world seems always to fall on its face just at the moment of fulfillment, usually due to the obsolescence of some temporary feature in the prototype editions on offer.


And yet, if one is not to grow old and die while waiting for the new standards to sort themselves out and for the development of an integrated series of fully-capable editorial tools, one must embrace what we have, which is TEI and a score of stand-alone and frequently incompatible tools from which to kluge together the electronic editions currently within our reach.  But our goal should be a score of stand-alone and fully compatible tools able to be used with a growing number of knowledge sites built around individual literary works.  No one is likely to produce a comprehensive software solution, but together we can form a community of interchangeable modules in a flexible, expandable structure of software and edition constructions.


How to translate legacy files

The second category of problems for porting CASE-conformant legacy files to TEI-conformant WEB files is the conversion process, which replaces the mnemonic codes and typesetting codes of their original designs with the TEI codes of a new design. That is not a one-to-one conversion process, as is no doubt already clear to most editors. The codes needed for CASE or other collation processes and for typesetting in the print world were conceived of as relating to the way texts look, i.e., formatting and fonts. The codes now needed for the WEB world need to be conceived as relating to the ways in which texts function, i.e., semantic significance, and to how they are structured. Where originally, for example, coding indicated 12 points of indentation following 2 points of extra vertical space, coding now needs to distinguish the beginning of an extract from the beginning of a letter or a poem or a list or some other form of text that functions in a particular way, is structured in a particular way, and which, by the way, usually follows vertical space and is indented. Differentiation of function is required where before attention focused on appearance. Whereas a single italics code was used for all italicized words, it is now desirable to indicate whether the italicized passage functions as a book title, a foreign phrase, a ship’s name, emphasis, or something else. The distinction is not an easy one to draw because the coding of appearance in context has been so integral to our ability to read that appearance and function often seem to be the same thing. The purpose of differentiating similar things is, as always in scholarly analysis, to let us see things more clearly, not to render them completely understood.
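The one-to-many character of this conversion can be illustrated with a small Python sketch. It assumes, hypothetically, that the legacy file marks italics TeX-style as `{\it ...}` and that the editor has supplied a table of decisions (here called `decisions`, a hypothetical name) stating what each italicized string does. The TEI elements chosen (`<title>`, `<foreign>`, `<name type="ship">`, `<emph>`) are one plausible encoding, not a prescribed one. Anything the editor has not classified falls back to `<hi rend="italic">`, which records only appearance and so marks the passage for later attention.

```python
import re

# Hypothetical legacy convention: italics coded TeX-style as {\it ...}.
ITALIC = re.compile(r"\{\\it (.*?)\}")

# One plausible TEI element for each function an editor might distinguish.
TEI_FOR_FUNCTION = {
    "title": "<title>{}</title>",
    "foreign": "<foreign>{}</foreign>",
    "ship": '<name type="ship">{}</name>',
    "emphasis": "<emph>{}</emph>",
}
FALLBACK = '<hi rend="italic">{}</hi>'  # appearance only: still needs a decision

def convert_italics(line, decisions):
    """Replace each appearance code with a function code.

    `decisions` maps an italicized string to a function label supplied by
    the editor. Assumes the text contains no XML-special characters.
    """
    def replace(match):
        words = match.group(1)
        template = TEI_FOR_FUNCTION.get(decisions.get(words), FALLBACK)
        return template.format(words)
    return ITALIC.sub(replace, line)

legacy = r"He read {\it The Times} aboard the {\it Ariadne}, {\it comme il faut}, and {\it never} again."
decisions = {"The Times": "title", "Ariadne": "ship", "comme il faut": "foreign"}
print(convert_italics(legacy, decisions))
```

The mechanical part is trivial; the `decisions` table is where the editorial work lies, and no search-and-replace can supply it.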


In some ways, such conversions are trivial. There is usually a marker in the original file where a marker is needed. But the marker is the wrong one, or it is used in two or three places that must now be distinguished from one another, or it exists distinguishably at the beginning of the passage to which it applies but only generically at the end, making a simple search and replace impossible.
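The last case, a distinctive opening marker paired with a generic closing one, defeats global search and replace because the replacement for each closing marker depends on which passage it closes. A converter that keeps a stack of open elements can resolve this. The Python sketch below assumes hypothetical mnemonic codes (`%ex` for an extract, `%lt` for a letter, `%po` for a poem, and a generic `%end`) and illustrative TEI targets; it refuses to guess when a marker is unmatched, since silent repair would be one more source of corruption.

```python
import re

# Hypothetical legacy mnemonic codes: distinct openers, one generic closer.
OPENERS = {
    "%ex": ("quote", "<quote>"),              # extract
    "%lt": ("div", '<div type="letter">'),    # letter
    "%po": ("lg", "<lg>"),                    # poem (line group)
}
CLOSER = "%end"
CODE = re.compile(r"%(?:ex|lt|po|end)")

def convert_markers(text):
    """Turn opener/generic-closer pairs into matched TEI start and end tags."""
    out, open_elements, pos = [], [], 0
    for match in CODE.finditer(text):
        out.append(text[pos:match.start()])
        code = match.group()
        if code == CLOSER:
            if not open_elements:
                raise ValueError(f"unmatched {CLOSER} at offset {match.start()}")
            out.append(f"</{open_elements.pop()}>")  # close the innermost open element
        else:
            name, start_tag = OPENERS[code]
            open_elements.append(name)
            out.append(start_tag)
        pos = match.end()
    out.append(text[pos:])
    if open_elements:
        raise ValueError(f"unclosed element(s): {open_elements}")
    return "".join(out)

print(convert_markers("%ltDear Pen, %exquoted words%end said he.%end"))
# <div type="letter">Dear Pen, <quote>quoted words</quote> said he.</div>
```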


Quality upgrades

The third category of problems for porting legacy files or new working files to the WEB is the need to impose quality control on every file. Files that were working files and did not have to be finally corrected now become presentation files. Thus, for the Thackeray edition, the files that were marked mnemonically and not updated after the final computer collation now have to be proofread and updated again to ensure that they accurately reflect the historical documents they represent. Their flaws are minor and are noted in printouts, but in the existing files errors remain as little unmarked landmines. The files that were finally and fully proofread no longer have the mnemonic codes but now have typesetting codes. Any program written to convert the coding of one set of files will have to be rewritten to apply to the other sets of files.


The question is: Is all that worth doing? Or at least, that was the question for me. I suppose each person answers it somewhat differently, but for me the answer was "No, not until the way forward is more clearly mapped and not until the conversion process becomes routine." Of course, neither of those things happens on its own. So I pursued two separate efforts, each, one could say, ideologically opposed to the other.


Two Electronic Solutions

The first was to hire a computer-literate research assistant and try to guide him through the process of converting into XML the legacy file used in typesetting The Newcomes, a file coded for typesetting using TeX. My sense of the formidableness of the conversion process derives from my layman's relation to TEI-XML, though I am quite comfortable with TeX. The fundamental idea of this effort was that the XML product should not only "contain" all the information already contained in the print scholarly edition but that additional enrichment markup should be added as opportunity arose. Furthermore, the form of ported information should be adapted to the strengths of the new medium. Thus, for example, if the print edition had textual revisions reported as footnotes, the XML version would have them as links. And if the print edition failed to report the bibliographical features of the work's historical source texts (because the expense outweighed the perceived benefits), that was no reason why such information could not be coded and added to the XML version.


The resulting file, as the experience of others has also shown, grew in size and messiness and diminished in verifiable accuracy.  Of course we had been told that such would be the result, but for us it was a training course.  It is important for any builder of an electronic text file to be aware of the potential not only for messiness but for serious integrity failure in this process.  Because each enhancement of the presentation file is done on the same file, every act of enhancement has the potential to produce inadvertent change.  Some form of verification is therefore required at each step of the way.  Keeping back-up copies and logs of changes and running machine collations (see n32) are among the ways that occurred to us, though there are probably far more sophisticated ways of which I am unaware.  The dual problems of converting TeX to XML by hand and maintaining textual integrity led me to conclude this process was hopelessly flawed.  Perhaps my experience will save others from learning the hard way.
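One way to carry out the verification described here is to strip all markup from the enriched file after each round of work and compare the remaining characters with the verified text. The Python sketch below does this with the standard library's ElementTree and `difflib`; the function names are hypothetical, and the check assumes that enrichment adds tags only, never text (added notes, for instance, would have to be excluded before comparison).

```python
import difflib
import xml.etree.ElementTree as ET

def text_of(xml_string):
    """Return the character content of an XML document with all markup removed."""
    return "".join(ET.fromstring(xml_string).itertext())

def integrity_report(baseline_text, enriched_xml):
    """List every place where the enriched file's text departs from the baseline.

    Each entry is (kind of change, baseline characters, current characters);
    an empty list means the enrichment left the text itself untouched.
    """
    current = text_of(enriched_xml)
    matcher = difflib.SequenceMatcher(None, baseline_text, current, autojunk=False)
    return [
        (tag, baseline_text[i1:i2], current[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]

baseline = "Colonel Newcome was a gentleman."
print(integrity_report(baseline, "<p><persName>Colonel Newcome</persName> was a gentleman.</p>"))
# []
print(integrity_report(baseline, "<p><persName>Colonel Newcome</persName> was a gentlman.</p>"))
# [('delete', 'e', '')]
```

Run after every editing session, a check of this kind turns an inadvertent change from a hidden landmine into a reported difference.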


The second effort is the more instructive one, though it too remains unsatisfactory. It involves the Just In Time Markup (JITM) system, developed by Phill Berrie at the Australian Defence Force Academy branch of the University of New South Wales.[25] Its ideology is quite different from the single, growing, messy, enriched XML file approach. Before proceeding I should acknowledge that the ways in which JITM frames the problems of electronic scholarly editing are not currently widely accepted and that its methods of constructing solutions are also not widely accepted. However, my experience of it convinces me that its opponents have much to learn from it. As in most fields, the competition for shelf space leads enthusiasts to exaggerate the shortcomings of approaches ideologically opposed to their own. I present what follows as a possible temporary solution that has great appeal to me as a textual critic because it focuses on the integrity of the editorial work, not because it offers beauty or dexterity of presentation at this point in its development. To me the important matter is not the solution but the demonstration of scholarly care for details that matter.


Although JITM is still in developmental stages (and what comprehensive system is not?), its design addresses a full range of scholarly interests, from the textual to the explanatory and the illustrative. In JITM the primary concern is for the "text itself" and the ways in which text is transcribed. But it is not by design committed to any particular preconception about texts that would limit it, say, to the preservation of a particular source document's rendition of the words and punctuation, the fonts such as italics and boldface, and the formatting such as block indentations and letter salutations and closings. Often scholars interested in the historicity or authenticity of texts are concerned with the provenance and variance of texts, their sources, and their composition, revision, transcription, and production histories. JITM can, of course, accommodate such concerns, but it can also be used by scholars whose primary interests lie in linguistics or in historical or thematic concerns. Its facilities accommodate explanatory annotations of obscure or dated materials that will help modern readers to understand the conventions and contexts of the time when the texts were created, revised, or reproduced, and it facilitates additional uses that can be made of literary and historical texts for the study of linguistics, of typographic or design history, or of the relation between verbal and visual materials.


JITM helps scholars address these concerns for research into texts, and it provides an environment for Internet presentation of the texts and the scholarship by providing

a. a durable home for text files and for scholarly enhancements;

b. a system for verifying continued accuracy of texts through multiple uses and multiple enhancements;

c. a file structure and coding system designed to enable migration of texts to future systems and migration of scholarly added-value from current systems to future systems;

d. a series of two-step interactive conversion tools enabling migration of legacy text files and legacy text enhancements from older projects into the JITM environment;

e. a text-relational tool (i.e., a collation system) that allows instant identification of variation among versions of texts; and

f. text annotation systems to house all kinds of scholarly enhancements and analysis of texts, from textual and explanatory to linguistic and other analytical studies.


For the Thackeray project, had items "b" and "d" alone been in this list, it would have made JITM an attractive electronic tool.  Item "b" offers a systematic approach to the problem of text integrity and item "d" provides a systematic way to make the extensive electronic archive of research texts and print-edition files available, at least for basic quasi-representation in Internet form.  The rest of the functions of JITM make it one of the world's most versatile and forward-looking electronic text-handling systems.


The way JITM handles text files is to divide our interests in texts into categories so that relevant information is kept separately and provided on demand.  A functional, base, ASCII text file with only the most fundamental textual "content" (i.e., letters, punctuation, and spaces) is extracted and used as the basis upon which a variety of representations of the work can be built on demand or just in time.  This base text is also represented by a mathematical formula invoked after each use of the base text to ensure that no changes have been made inadvertently.  Upon its importation into JITM for the first time, SGML or XML markup is extracted and placed in parallel files or overlay files.  Each markup file represents a category of interest in the work and can be selected separately or in tandem with other categories by a user who desires to see the text as rendered by and for those interests. Thus, a person interested in the first edition of the work would select a historical and representational set of markup to create a perspective of the work representing the first edition.  Someone else might be more interested in seeing that text as analyzed by a linguist and will select appropriate markup files to enhance that kind of interest.  A user might select perspectives representing manuscripts, other historical editions, a critical edition, and might branch out from any text to annotations of various kinds.
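The principle of a protected base text with separately stored markup (often called stand-off markup) can be sketched briefly. What follows is not JITM's code or file format, since JITM runs in HyperCard; it is a minimal Python illustration under stated assumptions: SHA-256 stands in for JITM's check formula, each overlay is a hypothetical list of (start, end, element) character spans over the base text, element names are bare (no attributes), and the selected overlays are assumed to nest properly. A perspective is generated only after the base text passes its check.

```python
import hashlib

def fingerprint(base_text):
    """Stand-in for JITM's check formula (assumed here to be SHA-256)."""
    return hashlib.sha256(base_text.encode("utf-8")).hexdigest()

def build_perspective(base_text, expected_hash, *overlays):
    """Merge the selected stand-off overlays into one tagged rendering.

    Each overlay is a list of (start, end, element) spans over base_text.
    The spans are assumed to nest properly across the chosen overlays.
    """
    if fingerprint(base_text) != expected_hash:
        raise ValueError("base text has changed since it was authenticated")
    events = []
    for overlay in overlays:
        for start, end, tag in overlay:
            length = end - start
            # At one offset, end tags (0) sort before start tags (1); longer
            # spans open first and close last, so the output nests correctly.
            events.append((start, 1, -length, f"<{tag}>"))
            events.append((end, 0, length, f"</{tag}>"))
    events.sort()
    out, pos = [], 0
    for offset, _, _, markup in events:
        out.append(base_text[pos:offset])
        out.append(markup)
        pos = offset
    out.append(base_text[pos:])
    return "".join(out)

BASE = "Colonel Newcome came home."
CHECK = fingerprint(BASE)          # stored when the base text is authenticated
paragraphs = [(0, 26, "p")]        # a structural overlay
names = [(0, 15, "persName")]      # an analytical overlay
print(build_perspective(BASE, CHECK, paragraphs))
# <p>Colonel Newcome came home.</p>
print(build_perspective(BASE, CHECK, paragraphs, names))
# <p><persName>Colonel Newcome</persName> came home.</p>
```

Because the markup never touches the base file, adding or deselecting an overlay cannot alter the text, and any accidental change to the base text is caught by the fingerprint before a perspective is produced.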


To import legacy files into JITM, the conversion tools used to substitute TEI conformant SGML or XML codes must be used first, so that the added value already contained in the legacy files can be saved automatically for reuse.  But once lodged in the JITM system, new added value markup can be provided at any time without fear of inadvertently destroying the integrity of previous work.  These added markups can be selected or deselected at will by the user choosing the perspectives to be generated from the stored and accumulating data.  A prototype edition using JITM is introduced at  and can be viewed by way of the link for His Natural Life.  Though still under construction, the prototype edition provides a sense of how this form of presentation gives flexibility to users.


JITM is not without faults: its tools are still in developmental stages and are neither user-friendly nor elegant. A users' manual and help files were not yet ready in 2004, and the interface is "clunky."  But JITM has the potential to address all these problems.  More seriously, it currently runs only on a Macintosh (Apple) computer, within the HyperCard software environment.  This means that JITM is fully functional as a tool only in the Mac world, though its products are available to viewers anywhere on the Internet.  Furthermore, perspectives generated by JITM are savable, portable, and browsable HTML, SGML, or XML files that can be displayed in any web browser.


An unexpected benefit of JITM's divided-file structure is that it provides a preliminary way to deal with conflicting overlapping structures, currently a fatal weakness in SGML and XML implementations.  Where the selection of multiple markup files would produce a perspective containing conflicting overlapping structures, the user is informed of the conflict and can view first one perspective and then the other, because XML cannot represent both at once. Either form can be saved for separate use.
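
The conflict check described above can be illustrated with a short sketch, assuming that each markup file is a list of stand-off spans (tag, start, end) over a shared base text. Two spans conflict when they cross, that is, when each begins inside the other without either containing the other. A selection free of such crossings can be merged into a single inline-tagged perspective. The layer names, tags, offsets, and the perspective function are hypothetical illustrations, not JITM's actual interface.

```python
def crosses(a, b):
    """True when two spans partially overlap, i.e. neither nests in the other."""
    _, s1, e1 = a
    _, s2, e2 = b
    return s1 < s2 < e1 < e2 or s2 < s1 < e2 < e1


def perspective(base, layers, selected):
    """Merge the selected stand-off layers into one inline-tagged text.

    Returns (text, []) on success, or (None, conflicts) when the chosen
    layers contain crossing spans that a single XML tree cannot hold.
    Every span is assumed to be non-empty (start < end).
    """
    spans = [span for name in selected for span in layers[name]]
    conflicts = [(a, b) for i, a in enumerate(spans)
                 for b in spans[i + 1:] if crosses(a, b)]
    if conflicts:
        return None, conflicts
    events = []
    for i, (tag, start, end) in enumerate(spans):
        # At a shared offset, closing tags sort before opening tags;
        # inner elements close first and outer elements open first.
        events.append((end, 0, -start, -i, f"</{tag}>"))
        events.append((start, 1, -end, i, f"<{tag}>"))
    events.sort()
    out, last = [], 0
    for offset, *_, text in events:
        out.append(base[last:offset])
        out.append(text)
        last = offset
    out.append(base[last:])
    return "".join(out), []


base = "While the present century was in its teens."
layers = {  # hypothetical markup files over the same base text
    "structure": [("p", 0, 43)],
    "emphasis": [("hi", 37, 42)],
    "lines": [("line", 0, 29), ("line", 30, 43)],
    "syntax": [("phrase", 26, 42)],
}
print(perspective(base, layers, ["structure", "emphasis"])[0])
# <p>While the present century was in its <hi>teens</hi>.</p>
text, clashes = perspective(base, layers, ["lines", "syntax"])
print(text, len(clashes))  # None 2
```

In this sketch the phrase "was in its teens" straddles a line break, so the bibliographical and linguistic layers cannot share one hierarchy. Each layer can still be rendered on its own, which matches the choice offered to the user.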


As an editor with 30 years of experience in scholarly editing and computer use, however, I will not invest heavily in final presentation forms of an electronic edition/archive until the problems inherent in current capabilities have been better addressed.  SGML/XML needs revamping or replacement to allow multiple overlapping structures. Likewise, the ways transcribers describe textual elements and deploy them (either as structural hierarchies or as one-off entities) in order to skirt this SGML limitation need to be carefully thought through, because kluged (i.e., clever, ersatz) solutions developed within the limitations of SGML as "temporary fixes" are likely to haunt us once a markup system that recognizes and supports overlapping structures becomes functional.


The Example of William Thackeray’s Works

The editorial problems posed by William Thackeray's prose works, and the strategies for dealing with them (collecting original documents; collating; emending; constructing apparatuses to show textual histories and identify those responsible for them; deploying these materials both at the foot of text pages and in appendices; and verifying the accuracy of the work), have been the business of a major editorial project begun in 1976, which has resulted, to date, in seven volumes published between 1989 and 1998.  Of a projected 23-volume print edition, only eleven will be published as books, unless someone sees a need for a book form of the "volumes" in the remainder of the edition.  The editorial policies, the arguments supporting them, the changes in both policy and method of deployment, and the processes for achieving comprehensive coverage of materials and accuracy in research and edition preparation are amply discussed in the textual introductions to the published volumes and are not rehearsed here.[26] 


The Thackeray edition, like dozens of other scholarly editorial projects involving prose fiction, has accumulated millions of bytes of text files representing manuscripts and historical editions as well as new critical scholarly editions.  Our situation is, I believe, like that of other editors of print scholarly editions who have such files: we face three serious tasks if we decide to create electronic Web-based or CD-based editions/archives of our projects.  The first is the significant enterprise of learning how to convert legacy text and apparatus files to TEI-conformant SGML or XML, deciding what software to employ in that process, and working through the arguments about how to house the resulting text files so that they will be beautifully and usefully presentable and yet remain dexterously portable to future electronic environments. The second, related task is making and then implementing decisions about which functions and structures within the text are to be specially marked.  This is a problem for all editors, new and old, because typists, scanners, and print editions were never in the habit of analyzing, nor capable of displaying, the differences in function between one type of italics and another, or between one type of quotation mark and another; there was no opportunity to express such differences until SGML/XML and TEI markup made it possible.  Nor did they pay any attention to the overlap of bibliographical, organizational, and semantic structures, which never conflicted until the limitations of SGML/XML made them do so. The third task is verifying the accuracy of legacy text files that, in the world of print editions, dropped from interest and were left "unmaintained" once they had served their purposes in the research for the print scholarly edition.


William Makepeace Thackeray began publishing occasional pieces in London and Paris in the 1830s, came to fame with the publication of Vanity Fair in 1847-48, and died in 1863, at age 52, the author of eight major novels.  His works fill 24 substantial volumes.[27]  His manuscripts, with some notable exceptions, are incomplete and tend to be scattered in a number of libraries in England and America.[28]  They indicate that Thackeray, like many journalists, relied upon compositors to impose conventional punctuation and capitalization, but that he was a careful penman whose spelling required little or no checking.  The manuscripts also show that, although most were written under the pressure of deadlines and are not heavily overwritten, Thackeray revised his work not only by adding and cutting passages to fit the prescribed lengths of serial installments but also by revising sentences for sense, cadence, echoes of similar words, and a myriad of minor stylistic effects.


Thackeray was an artist as well as a writer, and much of his major fiction contains the work of both his pen and his pencil.  For Vanity Fair, The History of Pendennis, and other major novels, Thackeray drew vignettes for chapter initials and illustrations of the novels' action to be dropped into the text.  Both of these types of drawings were produced as woodcuts.  He also drew full-page illustrations on steel plates, which were printed on a different paper stock and inserted in each installment.  The results are unlike many illustrated novels, because one could argue that the "work of art" (in so far as novels are works of art) consists of both text and illustration.[29]


My textual work on Thackeray is best represented by the four volumes of my edition published by Garland and, to date, the three additional volumes published by the University of Michigan Press. I have also written a history of the edition, tracing the changes in editorial theory, policy, and practice that attended that work from 1969 to 1995, a period in which five different publishers contracted for the edition and during which I wrote Scholarly Editing in the Computer Age, primarily to clarify the changes that had to be made in the Thackeray edition to bring its policies and procedures up to date.[30]  I also wrote Pegasus in Harness: Victorian Publishing and W. M. Thackeray to trace the financial and textual relations between Thackeray and his publishers.


I will not repeat any of those histories here, beginning instead with a brief description of how electronic equipment and programs were used for that edition, as a prelude to the plans now under way to construct an electronic edition/archive for the works of Thackeray.  The tale I tell may be especially useful to editors of print scholarly editions who used the computer extensively for research and for producing camera-ready copy.  But the principle of matching the methods, technologies, and data gathering at the beginning of a project to a clear view of what the end product on an electronic site will demand is important to all editors.  The accumulated archive of legacy text files, useful temporarily in a project meant for print, might serve as the foundation for new electronic editions/archives, even though the files may have insufficient and relatively primitive coding already embedded and some may never have been finally proofread and corrected.


Some Practical Software Problems

The scholarly edition of Thackeray's works is not unlike many editions undertaken from the 1960s through the 1980s in that it tried to adopt emerging electronic technology as rapidly as possible.  The development of CASE (Computer Assisted Scholarly Editing) programs made it both possible and desirable to produce electronic files for each of the historical documents deemed authorial or potentially authorial.


I will skip over our experiences with punch cards, with printers restricted to upper-case letters, and with graphic plotters adapted as printers of upper- and lower-case letters.  The relevant fact is that we created machine-readable text files that have managed to migrate from punched cards, through 9-track reel-to-reel tapes, to 8in, 5.25in, and 3.5in floppy disks, and finally to the hard disks and CDs of today.  Perhaps the luckiest accident of those early days was that our campus was dedicated to machines that had already chosen ASCII, rather than EBCDIC, as the basic encoding for verbal texts, because ASCII is still the basic language of TEI/SGML/XML.[31]


The development of CASE was also important to the electronic future of the Thackeray edition because its collation routines and its handling of diplomatic transcriptions of the manuscripts made it desirable to create computer transcriptions of each historical form of the work.  CASE, supported by NEH and Mississippi State University, was adopted by a number of NEH-funded editions in the USA and by others in Australia. Over time the programs were converted from their original home in the PL/I language on UNIVAC computers to IBM, DEC10, and PRIME mainframes, and then to PCs and Macs.  Each conversion sloughed off some capabilities and developed new ones.  But the important thing to note is that, for each project, CASE made it desirable to produce computer-readable ASCII files of each historical form of the work deemed to be of potential interest.


The capabilities and methods of CASE have been described in print elsewhere.[32]  For our purposes, it is sufficient to know that, for most projects, Computer Assisted Scholarly Editing rested on three important concepts. First, the computer mechanisms for discovering, recording, and storing variant forms of the work, and for discovering and listing the variants among those forms, encouraged editors to create computer-readable transcripts of each potential source text; thus, CASE users already have electronic files for each different authoritative form of the work.  Second, the process consisted of a progression of steps in which the output of one set of routines became, after verification and correction, the input to more advanced routines, culminating in the files used for typesetting the new edition's text and apparatus.  And third, the process from beginning to end incorporated verification and correction procedures that tended to render the final product more accurate than had ever been possible with older systems relying only on repeated proofreading.
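
CASE's collation routines are not reproduced here. As a rough modern analogue of the first concept, assuming two separately made transcriptions of the same passage, Python's standard difflib can align the two token sequences and list the places where a witness departs from the copy-text. Punctuation is tokenized separately because compositors, not Thackeray, often supplied it. The collate function and the sample readings are invented for illustration, and difflib's alignment is a swapped-in technique rather than CASE's own algorithm.

```python
import difflib
import re


def tokens(text):
    """Split into words and punctuation marks, so that variants in
    pointing are reported alongside variants in wording."""
    return re.findall(r"\w+|[^\w\s]", text)


def collate(copy_text, witness):
    """Return (token index, kind, copy-text reading, witness reading)
    for every place where the witness differs from the copy-text."""
    a, b = tokens(copy_text), tokens(witness)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (i1, op, " ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]


# Invented sample readings: an unpointed manuscript and a printed text.
manuscript = "Thackeray was a careful penman whose spelling required little checking"
print_text = "Thackeray was a careful penman, whose spelling required no checking."
for variant in collate(manuscript, print_text):
    print(variant)
# (5, 'insert', '', ',')
# (8, 'replace', 'little', 'no')
# (10, 'insert', '', '.')
```

The comparison is only as good as its inputs: each witness has to be transcribed independently (as note 23 insists) for the machine to find variants that sight collation missed.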


However, from the point of view of electronic archive/edition builders, there was a serious flaw in the beauty of that three-part concept.  The flaw was that only one text, the one destined to drive the typesetting machines for the new edition, was treated as the center of attention and fully maintained.  All other computer texts were considered useful only up to some point in the process, after which they were set aside in a repository of stored files.  Those mothballed files can now be seen as a legacy that might be refurbished, saving much of the labor that would be required to start over from scratch.  In retrospect, it is unfortunate that some files in the scholarly process were deemed no longer useful and therefore not maintained or updated regularly; when one contemplates the electronic edition/archive of the future (a future which is already here), one sees the need for electronic files of each historically important form of the work.


[1] Pizer??

[2] Warren and Taylor, The Division of the Kingdoms; Bender and Parker, eds., Red Badge of Courage in The Norton Anthology of American Literature, ??; also separately (New York: W. W. Norton, 1999).

[3] Proof, vols. 2-4, 1970??

[4] Anyone who has tried to read on microfilm a book with endnotes will immediately sense the advantage of the codex over the scroll.

[5] I began thinking about electronic editions in terms of architecture, as the title of this chapter still attests, but I was persuaded of the limitations of “architecture” as a metaphor and of the usefulness of “infrastructure” and “coral reef” by Peter Robinson, Willard McCarty, and through them by Michael Sperberg-McQueen, each of whom has a more intimate acquaintance with the world of computers than I.

[6] See Michael Sperberg-McQueen’s comments in his ‘Trip report’ on the ‘Text Analysis Software Planning Meeting’ held at Princeton, 17-19 May 1996, at (accessed 19 December 2003). 

[7] Project Gutenberg, Rossetti Archive, Chaucer.

[8] This is unlikely to happen, given the current attitude of the Joyce estate, which has used copyright recently to halt at least two efforts to start such editions.

[9] This is a direct quotation from an email sent to me by a person who shall remain nameless and should not be held responsible for what was probably an unguarded and not just ill-considered statement. Unfortunately, it is not unlike frequent expressions of enthusiasm for electronic media. Since then others have said that what we need is XSL-FO and XSL Formatting.  The list will continue to grow, as will our desires for, and capabilities for, new ways to access and display texts.

[10] Institute for Advanced Technology in the Humanities:



[13] See Éric-Olivier Lochard and Dominique Taurisson, “‘The World According to Arcane’: An Operating Instrumental Paradigm for Scholarly Editions,” in Perspectives of Scholarly Editing / Perspektiven der Textedition, ed. Bodo Plachta and H. T. M. van Vliet (Berlin: Weidler Buchverlag, 2002), pp. 151-62.  There are dozens more projects that I do not mean to neglect here.  I selected these because I am sufficiently familiar with them to know that each offers a desirable capability that is absent from each of the others.  My surmise is that most truly sophisticated experiments in electronic editing provide some unique capability not available elsewhere. 

[14] Accessed 18 November 2004.

[15] See Edward Vanhoutte, “A Linkemic Approach to Textual Variation: Theory and Practice of the Electronic-Critical Editions of Stijn Streuvels' De teleurgang van den Waterhoek,” Human IT 1 (2000).

[16] Significant efforts to create a sense of community in this regard have been developing within and in connection with the Modern Language Association of America’s Committee on Scholarly Editions. See “Guidelines for Scholarly Editions”   and

See also Report from an Editors’ Review of the Modern Language Association’s Committee on Scholarly Editions’ Draft Guidelines for Electronic Scholarly Editions and John Unsworth, "Reconsidering and revising the MLA Committee on Scholarly Editions' Guidelines for Scholarly Editions" .  URLs accessed September 7, 2004.

[17] The coral reef image is taken from Michael Sperberg-McQueen, who has used it frequently, including in “New TA software: Some Characteristics, and a Proposed Architecture”: “We are not building a building; blueprints will get us nowhere. We are trying to cultivate a coral reef; all we can do is try to give the polyps something to attach themselves to, and watch over their growth.” ( accessed June 26, 2004).  It is picked up and elaborated by Peter Robinson in “Where We Are With Electronic Scholarly Editions, And Where We Want To Be”: “As yet, we are not even agreed what path to follow towards this goal: should we try to create a single architecture, which all must use? Or, should we fashion something like a tool set, an infrastructure which may be used to make what editions we please? Or do we need something yet more anarchic: what Michael Sperberg-McQueen describes as a ‘coral reef of cooperating programs’, where scattered individuals and projects working ‘in a chaotic environment of experimentation and communication’ yet manage to produce materials which work seamlessly together. Unlikely as it sounds, and untidy as it may seem to those used to ordered programs of software and data development, with the neat schedules of work-packages so admired by grant agencies, this last may be our best hope. This model has certainly worked in the software world, where open source software developed in the last years under these conditions drives large sections of the community.” ( accessed June 26, 2004.)

[18] It would take too much space to rehearse the intricacies or significance of different orientations here.  See Chapter two, “Forms,” of Scholarly Editing in the Computer Age, 3rd ed. (Ann Arbor: U Michigan P, 1996). 

[19] For a fuller argumentation of this point, see my Scholarly Editing in the Computer Age, 3rd ed. (Ann Arbor: U Michigan P, 1996), esp. the chapter on Orientations.

[20] It is not a criticism of this overview to point out that it provides space for materials or information that one or another critical approach finds unnecessary or even objectionable.  It would be a weakness of the system if a category of material or analysis that someone wished to have was impossible to provide.

[21] See Works Cited, below, particularly for the Nietzsche Project, Alexandra Brown-Rau’s King Lear prototype, Dirk Van Hulle’s Samuel Beckett project (as yet only demonstrated as a prototype); Marcel De Smedt & Edward Vanhoutte. Stijn Streuvels, De Teleurgang van den Waterhoek. Elektronisch-kritische editie/electronic-critical edition. Amsterdam: Amsterdam University Press/KANTL, 2000. ISBN: 90-5356-441-1 (CD-Rom); Kevin Kiernan’s Beowulf and Boethius projects.

[22] Optical collation is conducted with multiple copies of what appear to be identical books, the products of a single setting of type or, possibly, a new setting that apes a previous edition line for line.  The discovery of stop-press corrections and variants between printings is best conducted by optical collation. The machines mentioned allow one to see two books at once in such a way as to highlight variation.  Thus, a full page can be collated and all differences found in about two or three seconds.

[23] Some researchers, acting before thinking, have thought it clever to type or scan one text and then save the time and effort of typing or scanning additional exemplars that require collation by using the first text and emending it to reflect the differences they see in other texts.  That procedure renders the computer useless as a collation device because all differences will depend on sight collation first.  Each material text must be transcribed separately in order for machine collation to discover variants that sight collation misses.

[24] The most accessible presentation of this concept that I have read is Michael Sperberg-McQueen’s “Textual Criticism and TEI” at

[25] See and linked web sites.

[26] Vanity Fair (1989), Henry Esmond (1989), Pendennis (1991), Yellowplush and Gahagan (1991), Newcomes (1996), Barry Lyndon (1998), Catherine (1998)—the first four published by Garland; the latter three by U Michigan P.

[27] There is no comprehensive bibliography. My checklist of Thackeray's books in the CBELL3 is supplemented by Edgar Harden's A Checklist of Contributions by William Makepeace Thackeray to Newspapers, Periodicals, Books, and Serial Part Issues,  1828-1864, No. 68 ELS Monograph Series. (Victoria, B.C.: English Literary Studies, University of Victoria, 1996).

[28] See the Census of Thackeray Manuscripts in Costerus n.s. II (1974), supplemented by various listings in The Thackeray Newsletter.

[29] The major discussions of Thackeray's illustrations are in Nicholas Pickwoad, "Commentary on Illustrations," in Vanity Fair, ed. Peter Shillingsburg (New York: Garland, 1989) and in The History of Pendennis, ed. Peter Shillingsburg (New York: Garland, 1991); J. R. Harvey, Victorian Novelists and Their Illustrators (New York: New York UP, 1971); Patricia Runk Sweeney, "Thackeray's Best Illustrator," Costerus n.s. 2 (1974), 84-111; and Anthony Burton, "Thackeray's Collaborations with Cruikshank, Doyle, and Walker," Costerus n.s. 2 (1974), 141-84.

[30]  For an account of how editorial policies and funding developed for the edition see "Editing Thackeray: A History," Studies in the Novel  27 (Fall), pp. 363-74; reprinted in Textual Criticism and the Common Reader ed. by Alexander Pettit (Athens: U Georgia P, 1999).

[31] CASE, originally developed by Susan Follett as a master's degree project, was revamped by Russell Kegley, who added a total of nine routines for handling text files to enhance research and production of print scholarly editions on a UNIVAC mainframe.  Boyd Nations ported the programs to PC-CASE and Phill Berrie adapted them to MacCASE.  I do not know the names of the many other programmers involved in developing other versions for PRIME, DEC, and IBM mainframes.

[32] Miriam Shillingsburg, "Computer Assistance to Scholarly Editing," Bulletin of Research in the Humanities 81 (Winter 1978), 448-63; and Peter Shillingsburg, "The Computer as Research Assistant in Scholarly Editing," Literary Research Newsletter, 5 (1980), pp. 31‑45.