Simplex-Wide Recombinant Text

Recombinant text is a medium of collaborative design and composition, with an underpinning in evolutionary genetics. It has a distributed, peer-to-peer communication pattern that distinguishes it from the more common, centralized approaches to collaborative design. Instead of a central copy of the shared design or composition, a recombinant text has multiple copies, one per author. Instead of being pushed to the center, contributions are pulled among authors, from peer to peer. As a result, the text diverges into multiple variations that co-exist side by side. Collectively, it has spatial diversity. In statistical terms, it has a population.

Within the population, pieces of text are swapped and recombined in discrete units, like genes. The resulting pool of variant pieces serves the author as an up-to-the-minute variorum, a continually refreshed palette of compositional ideas from which the next generation of the text is recomposed. Over time and successive generations, the collective effort converges on optimal paths in the composition space, and the text evolves.

This page describes the simplex-wide approach to recombinant text. Simplex-wide is based on single gene transfers and population-wide communications. It may be outmoded by the more recent paired-regions approach. But the two approaches share a similar infrastructure, and the description below is largely relevant to both.


Introduction

Suppose you are working on a text, revising it, drafting and redrafting, when you come to a stop at a particularly difficult passage. Something is wrong with it, but exactly how to correct it is unclear. Imagine at this point that you could look out through your copy of the text and into the population, where you would see an array of variations — alternatives of the same passage drafted by other authors. If one appeared to be an improvement, you could select it, adopt it as a replacement, and continue working...

This pattern of communication would differ from the centralized approach of common groupware or collaborative media systems. Instead of multiple authors pushing contributions to a single, authoritative copy of the text, they would communicate by pulling contributions from each other.

                               O  <---- O
     \       /
      \     /                    \
       \   /                      \
                                   \
------>  O  <------       O  -------\------>  O
                                     \
       /   \                \         \     /
      /     \                \         \   /
     /       \
                               O         O
                
Figure 1. The centralized pattern of communication common to most groupware or collaborative media (left), and the distributed pattern of recombinant text (right).

As a consequence, a recombinant text would exist in multiple variations. Collectively, it would have spatial diversity. In statistical terms, it would have a population.

A similar pattern occurs in nature, where innovations that arise in the genes of isolated individuals are diffused piece-wise and selectively throughout the group. Adopting the terms and theory of biology, we therefore classify recombinant text as a type of evolutionary genetic system. Table 1 places it in context.

Table 1. Evolutionary genetic systems and their refinements.

  system or refinement                         genetic     selectand   innovator  selector
                                               material
  natural populations                          nucleotide  individual  nature     nature
    artificial selection                       nucleotide  individual  nature     man
      genetic engineering [1]                  nucleotide  individual  man        man
      recombinant text                         data        gene        man        man
      human-based evolutionary algorithms [2]  data        individual  man        man
    interactive evolutionary algorithms [3]    data        individual  computer   man
  evolutionary algorithms                      data        individual  computer   computer

An obvious pattern in the table is the division between organic systems based on polynucleotides (top), and computer systems based on data sequences (bottom). Another is the vertical symmetry above and below the middle row.

Looking to the far right of the table, the selector is the agent that decides fitness in the system. It decides which variations will reproduce and contribute to the next generation. In natural populations and in evolutionary algorithms, the decisions are automatic, whereas in refinements of these systems, and in recombinant text, the decisions are made by people.

The innovator is the agent of genetic change. The innovator mutates and recombines the genetic material, to produce the variations on which the selector operates. In most organic and computer-based systems (top and bottom), innovation is automatic, operating without human intervention. In recombinant text, the innovators are people; all variations are manually created.

The selectand is the unit selected. In most organic and computer-based systems (top and bottom), the selectand is an individual member of the population. For example, in artificial selection, a breeding individual is selected from a cohort of domestic animals or plants. Similarly, in interactive evolutionary algorithms, a whole artifact is selected from among comparable artifacts. In all of these other systems, whether organic or algorithmic, the selectand is a whole individual.[4] By contrast, in recombinant text, the selectand is a gene: a particular gene is selected from the larger pool of variants and recombined into the F1 (parent) genotype, in order to generate the F2 child.

Compared on lines of human agency, recombinant text is similar to both of the innovatory refinements: genetic engineering and human-based evolutionary algorithms. In all three systems, the innovators and selectors are people. But recombinant text extends human control of the evolutionary process further, to the direct selection of genes and recombinant sequences, throughout the population. In fact, it extends so far that the detailed construction of every genotype of the population is under direct and exclusive human control.

The remainder of this page introduces the essential structures and methods of simplex-wide recombinant text (originally described as ‘evolutionary phenogenetic engineering’; Allan, 2001). Details below are drawn from prototypes that were developed for project textbender.[5]

Genetic Code

A genetic code is a set of rules for encoding and decoding texts. To illustrate, consider a literary text, The Metamorphoses by Ovid. The original Latin begins:

In nova fert animus mutatas dicere formas
corpora; di, coeptis (nam vos mutastis et illas)
adspirate meis primaque ab origine mundi
ad mea perpetuum deducite tempora carmen!
· · ·

Assuming a genetic code based on Extensible HyperText Markup Language (XHTML), it might be encoded as:

<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE html PUBLIC
        "-//textbender//DTD XHTML recombinant//EN"
        "xhtml-recombinant.dtd">
<html xmlns='http://www.w3.org/1999/xhtml' xml:lang='la'>
<body t:locus='metamorphosen'>
<div class='book' t:locus='metamorphosen.1'>
<div class='canto' t:locus='metamorphosen.1.1c'>
        <div t:locus='metamorphosen.1.1'>
                In nova fert animus mutatas dicere formas
                </div>
        <div t:locus='metamorphosen.1.2'>
                corpora; di, coeptis (nam vos mutastis et illas)
                </div>
        <div t:locus='metamorphosen.1.3'>
                adspirate meis primaque ab origine mundi
                </div>
        <div t:locus='metamorphosen.1.4'>
                ad mea perpetuum deducite tempora carmen!
                </div>
        · · ·

The encoded form of a text, as shown above, is known as a genotype. It is composed mainly of genes. A gene is a datum encoding a part of the text. In the example above, most of the genes are <div> elements. They encode the separate books, cantos, and lines of the poem. The encoding is hierarchical. At top, a single <body> gene encodes the whole.
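To make the hierarchy concrete, here is a minimal Python sketch that walks a trimmed copy of the genotype above and lists each gene by its locus, indented by depth. It assumes that a gene is any element carrying a locus attribute. The excerpt does not show where the `t:` prefix is declared, so the sketch binds it to a placeholder namespace URI (`urn:example:textbender`, an assumption) and omits the DOCTYPE; the helper name `genes` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace for the 't:' prefix (an assumption; the real
# declaration is not shown in the excerpt).
T_NS = "urn:example:textbender"
LOCUS = "{" + T_NS + "}locus"

GENOTYPE = """<html xmlns='http://www.w3.org/1999/xhtml'
      xmlns:t='urn:example:textbender' xml:lang='la'>
<body t:locus='metamorphosen'>
 <div class='book' t:locus='metamorphosen.1'>
  <div class='canto' t:locus='metamorphosen.1.1c'>
   <div t:locus='metamorphosen.1.1'>In nova fert animus mutatas dicere formas</div>
   <div t:locus='metamorphosen.1.2'>corpora; di, coeptis (nam vos mutastis et illas)</div>
  </div>
 </div>
</body>
</html>"""

def genes(element, depth=0):
    """Yield (depth, locus) for each gene, in document order.

    Elements without a locus (such as <html>) are not genes: they are
    traversed, but do not add a level to the gene hierarchy.
    """
    locus = element.get(LOCUS)
    if locus is not None:
        yield depth, locus
        depth += 1
    for child in element:
        yield from genes(child, depth)

for depth, locus in genes(ET.fromstring(GENOTYPE)):
    print("  " * depth + locus)
```

The printed outline mirrors the nesting of book, canto and line: the single body gene at top, and the line genes deepest.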

The variable part of a gene is termed the sequence. Genes with variant sequences are termed alleles of the gene. For example, the genotype below has the same genes as the original, but different alleles.

<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE html PUBLIC
        "-//textbender//DTD XHTML recombinant//EN"
        "xhtml-recombinant.dtd">
<html xmlns='http://www.w3.org/1999/xhtml' xml:lang='en'>
<body t:locus='metamorphosen'>
<div class='book' t:locus='metamorphosen.1'>
<div class='canto' t:locus='metamorphosen.1.1c'>
        <div t:locus='metamorphosen.1.1'>
                My mind inclines [me] to speak of forms changed into new
                </div>
        <div t:locus='metamorphosen.1.2'>
                bodies; ye gods, on my undertakings (for you also changed them)
                </div>
        <div t:locus='metamorphosen.1.3'>
                breathe [propitious] and from the first origin of the world
                </div>
        <div t:locus='metamorphosen.1.4'>
                to my own times bring down an uninterrupted song!
                </div>
        · · ·

While the original genotype encoded ancient Latin, the genotype above encodes a literal translation (Giles, c.1900) into modern English. Comparing just the alleles of a single gene:

<div t:locus='metamorphosen.1.1'>
        In nova fert animus mutatas dicere formas
        </div>
<div t:locus='metamorphosen.1.1'>
        My mind inclines [me] to speak of forms changed into new
        </div>

Alleles of a gene share a common locus. The locus identifies the gene. In this case, the gene is metamorphosen.1.1, which corresponds to the first line of Ovid's Latin text.
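Because the locus identifies the gene, two genotypes can be compared gene by gene by looking up the same locus in each. The Python sketch below does this for the Latin and English alleles of metamorphosen.1.1. It binds the `t:` prefix to a placeholder namespace URI (an assumption), trims each genotype to a single line, and uses a hypothetical helper, `sequence_at`, that handles only leaf genes whose sequence is plain text.

```python
import xml.etree.ElementTree as ET

LOCUS = "{urn:example:textbender}locus"  # placeholder namespace for 't:'

LATIN = """<body xmlns:t='urn:example:textbender' t:locus='metamorphosen'>
  <div t:locus='metamorphosen.1.1'>In nova fert animus mutatas dicere formas</div>
</body>"""

ENGLISH = """<body xmlns:t='urn:example:textbender' t:locus='metamorphosen'>
  <div t:locus='metamorphosen.1.1'>My mind inclines [me] to speak of forms changed into new</div>
</body>"""

def sequence_at(genotype_xml, locus):
    """Return the whitespace-normalized text of the leaf gene at `locus`, or None."""
    for element in ET.fromstring(genotype_xml).iter():
        if element.get(LOCUS) == locus:
            return " ".join((element.text or "").split())
    return None

latin = sequence_at(LATIN, "metamorphosen.1.1")
english = sequence_at(ENGLISH, "metamorphosen.1.1")

# Same locus, different sequences: the two are alleles of one gene.
print(latin)
print(english)
print("alleles:", latin is not None and english is not None and latin != english)
```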

Finally, a genotype may also be decoded. The rules for this, too, are part of the genetic code. The technical term for decoding is expression. Expression generates or regenerates a text from the genotype. The result is a particular, tangible instance of the text, which is termed the phenotype. Its detailed characteristics will depend both upon the genotype, and upon the particular environment in which it is expressed: phenotype = genotype + environment.

From the genotype above, any number of phenotypes may be expressed, all more-or-less similar according to the environment. The environment would include the particular renderer used — the Web browser, printer, or other device — plus the formatting instructions of the style sheet. For example, one particular phenotype might look like:

My mind inclines [me] to speak of forms changed into new
bodies; ye gods, on my undertakings (for you also changed them)
breathe [propitious] and from the first origin of the world
to my own times bring down an uninterrupted song!
· · ·
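The relation phenotype = genotype + environment can be sketched as a function of two arguments. In the Python sketch below, the environment is reduced to one hypothetical setting, the separator placed between lines of verse; a real environment (renderer plus style sheet) would carry far more. The expression rule assumed here, emitting the text of each leaf gene in document order, is an illustration only, and the `t:` namespace URI is a placeholder.

```python
import xml.etree.ElementTree as ET

LOCUS = "{urn:example:textbender}locus"  # placeholder namespace for 't:'

GENOTYPE = """<body xmlns:t='urn:example:textbender' t:locus='metamorphosen'>
  <div t:locus='metamorphosen.1.1'>My mind inclines [me] to speak of forms changed into new</div>
  <div t:locus='metamorphosen.1.2'>bodies; ye gods, on my undertakings (for you also changed them)</div>
</body>"""

def express(genotype_xml, environment):
    """Decode the genotype: leaf genes in document order, laid out per the environment."""
    leaves = [
        " ".join(element.text.split())
        for element in ET.fromstring(genotype_xml).iter()
        if element.get(LOCUS) is not None and len(element) == 0 and element.text
    ]
    return environment["separator"].join(leaves)

# Two hypothetical environments yield two phenotypes from one genotype.
verse = {"separator": "\n"}
run_in = {"separator": " / "}
print(express(GENOTYPE, verse))
print(express(GENOTYPE, run_in))
```

The genotype is the same in both calls; only the environment differs, and so does the phenotype.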

Population

A population is a community of individuals that exchange genetic information. An individual is an instance of a text, normally associated with a single author.

For example, here is a dictionary at an early stage of its formation. Its associated author is lexicographer A:

A
evolution /i:və'lu:ʃ(ə)n/ gradual change.

Individuals are genetically encoded, and each has a particular genotype. Here is the dictionary's genotype:

<lexicon locus='a0'>
        <word locus='a1'>
                <head locus='a2'>evolution</head>
                <pronunciation locus='a3'>i:və'lu:ʃ(ə)n</pronunciation>
                <meaning locus='a4'>
                        gradual change
                        </meaning>
                </word>
        </lexicon>

The gene at top, lexicon-a0, contains word-a1, which contains head-a2, pronunciation-a3 and meaning-a4. For purposes of illustration, we name the locus of each gene after its creator, A.
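The dictionary's own genetic code can be illustrated the same way. The Python sketch below expresses a <word> gene as one dictionary line. Its layout rules (pronunciation between slashes, a single meaning ending in a full stop, several meanings numbered, an example in parentheses) are inferred from the phenotypes shown on this page; they are assumptions, not a specification, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

def normalize(text):
    return " ".join((text or "").split())

def express_word(word):
    """Express a <word> gene as a single dictionary line (assumed layout rules)."""
    head = normalize(word.find("head").text)
    pronunciation = normalize(word.find("pronunciation").text)
    meanings = []
    for meaning in word.findall("meaning"):
        text = normalize(meaning.text)
        example = meaning.find("example")
        if example is not None:
            text += f" ({normalize(example.text)})"
        meanings.append(text)
    if len(meanings) == 1:
        body = meanings[0] + "."
    else:
        body = " ".join(f"{n} {text}." for n, text in enumerate(meanings, start=1))
    return f"{head} /{pronunciation}/ {body}"

def express(genotype_xml):
    """Express every <word> gene of a lexicon genotype, one line each."""
    return "\n".join(express_word(w) for w in ET.fromstring(genotype_xml).iter("word"))

GENOTYPE_A = """<lexicon locus='a0'>
        <word locus='a1'>
                <head locus='a2'>evolution</head>
                <pronunciation locus='a3'>i:və'lu:ʃ(ə)n</pronunciation>
                <meaning locus='a4'>
                        gradual change
                        </meaning>
                </word>
        </lexicon>"""

print(express(GENOTYPE_A))  # evolution /i:və'lu:ʃ(ə)n/ gradual change.
```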

Other lexicographers, B and C, make copies of this dictionary. Now there is a population of three individuals:

A
evolution /i:və'lu:ʃ(ə)n/ gradual change.
B
evolution /i:və'lu:ʃ(ə)n/ gradual change.
C
evolution /i:və'lu:ʃ(ə)n/ gradual change.

Populations normally grow like this, one individual at a time. The exchange of information that initially defines them as individuals of the same population is wholesale cloning — B and C have taken whole copies of A's genotype. Subsequent exchanges are more precise.
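Wholesale cloning is simple to model. In the Python sketch below, a population is assumed to be a mapping from each author to that author's genotype, and cloning is a deep copy, so that later changes in one individual do not leak into another. The names `population` and `clone` are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

GENOTYPE_A = ET.fromstring("""<lexicon locus='a0'>
  <word locus='a1'>
    <head locus='a2'>evolution</head>
    <pronunciation locus='a3'>i:və'lu:ʃ(ə)n</pronunciation>
    <meaning locus='a4'>gradual change</meaning>
  </word>
</lexicon>""")

# One individual per author; each holds its own genotype.
population = {"A": GENOTYPE_A}

def clone(population, source, newcomer):
    """Join the population by taking a whole, independent copy of source's genotype."""
    population[newcomer] = copy.deepcopy(population[source])

clone(population, "A", "B")
clone(population, "A", "C")

# The copies are independent: editing B's genotype leaves A's untouched.
population["B"].find(".//meaning").text = "gradual progressive change"
print(sorted(population))                        # ['A', 'B', 'C']
print(population["A"].find(".//meaning").text)   # gradual change
```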

Gene Pool

A gene pool is the collection of alleles across a population. It is fed by cycles of mutation.

B
evolution /i:və'lu:ʃ(ə)n/ gradual progressive change.

Here, B has modified the definition, and the genotype is modified accordingly:

<lexicon locus='a0'>
        <word locus='a1'>
                <head locus='a2'>evolution</head>
                <pronunciation locus='a3'>i:və'lu:ʃ(ə)n</pronunciation>
                <meaning locus='a4' mu='b'>
                        gradual progressive change
                        </meaning>
                </word>
        </lexicon>

This modification affects the sequence of meaning-a4, and creates a new allele of the gene. In this simplified genetic code, the 'mu' attribute records both mutation history (gene ancestry, also known as gene genealogy) and authorship. The resulting allele is designated meaning-a4-b, which means "allele created by B's mutation of meaning-a4". This allele enters the gene pool, so now there are two alleles of the gene.
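A leaf mutation and its allele designation might be sketched in Python as follows. The sketch assumes, as in the simplified genetic code above, that `mu` holds only the author of the latest mutation; a fuller code would need to record the chain of mutations to capture the whole gene ancestry. The helpers `mutate` and `allele_name` are hypothetical.

```python
import xml.etree.ElementTree as ET

def mutate(gene, new_sequence, author):
    """Alter a leaf gene's sequence and record the mutating author in 'mu'."""
    gene.text = new_sequence
    gene.set("mu", author)

def allele_name(gene):
    """Designate an allele as tag-locus, plus -author if it has been mutated."""
    name = f"{gene.tag}-{gene.get('locus')}"
    mu = gene.get("mu")
    return f"{name}-{mu}" if mu else name

meaning = ET.fromstring("<meaning locus='a4'>gradual change</meaning>")
print(allele_name(meaning))  # meaning-a4
mutate(meaning, "gradual progressive change", "b")
print(allele_name(meaning))  # meaning-a4-b
print(ET.tostring(meaning, encoding="unicode"))
```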

Meanwhile, C has also modified the definition:

C
evolution /i:və'lu:ʃ(ə)n/ 1 gradual development. 2 a process of development and origin of species from previous forms. 3 the progression of events etc. in due course (the evolution of the plot).

<lexicon locus='a0'>
        <word locus='a1' mu='c'>
                <head locus='a2'>evolution</head>
                <pronunciation locus='a3'>i:və'lu:ʃ(ə)n</pronunciation>
                <meaning locus='a4' mu='c'>
                        gradual development
                        </meaning>
                <meaning locus='c0'>
                        a process of development and origin of species from previous forms
                        </meaning>
                <meaning locus='c1'>
                        the progression of events etc. in due course
                        <example locus='c2'>the evolution of the plot</example>
                        </meaning>
                </word>
        </lexicon>

C has created two new alleles (meaning-a4-c and word-a1-c), and has also added three new genes to the genome (meaning-c0, meaning-c1, and the nested example-c2). The mutation that creates allele meaning-a4-c is similar to B's, in that it modifies a sequence of characters, whereas the mutation that creates word-a1-c modifies a sequence of genes.

A gene that contains other genes is termed a parent gene. In the hierarchy of the genotype, adding, deleting or rearranging genes alters the sequence of the parent gene that contains them. The result is a new allele of the parent gene. So C's addition of meaning-c0 and meaning-c1 has created the new allele word-a1-c.
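Since a parent gene's sequence is its ordered list of child genes, a parent allele can be summarized by the loci of those children, ignoring their content. The Python sketch below computes that summary for the <word> genes of A, B and C; the helper name `child_sequence` is hypothetical.

```python
import xml.etree.ElementTree as ET

def child_sequence(parent):
    """A parent gene's sequence: the ordered loci of its child genes, content ignored."""
    return tuple(f"{child.tag}-{child.get('locus')}" for child in parent)

WORD_A = ET.fromstring("""<word locus='a1'>
  <head locus='a2'>evolution</head>
  <pronunciation locus='a3'>i:və'lu:ʃ(ə)n</pronunciation>
  <meaning locus='a4'>gradual change</meaning>
</word>""")

# B changed only the content of meaning-a4, which is a separate (child) gene.
WORD_B = ET.fromstring("""<word locus='a1'>
  <head locus='a2'>evolution</head>
  <pronunciation locus='a3'>i:və'lu:ʃ(ə)n</pronunciation>
  <meaning locus='a4' mu='b'>gradual progressive change</meaning>
</word>""")

# C added two child genes, so C's word gene has a new sequence.
WORD_C = ET.fromstring("""<word locus='a1' mu='c'>
  <head locus='a2'>evolution</head>
  <pronunciation locus='a3'>i:və'lu:ʃ(ə)n</pronunciation>
  <meaning locus='a4' mu='c'>gradual development</meaning>
  <meaning locus='c0'>a process of development and origin of species from previous forms</meaning>
  <meaning locus='c1'>the progression of events etc. in due course
    <example locus='c2'>the evolution of the plot</example></meaning>
</word>""")

print(child_sequence(WORD_A) == child_sequence(WORD_B))  # True: same allele of word-a1
print(child_sequence(WORD_A) == child_sequence(WORD_C))  # False: word-a1-c differs
```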

Recombination

Recombination allows the author to improve a text by incorporating sequences from other individuals. Recombination targets a precise part of the text, and the corresponding gene that expresses it.

For example, suppose A chooses that part of the dictionary expressed by meaning-a4:

A
evolution /i:və'lu:ʃ(ə)n/ gradual change.

<lexicon locus='a0'>
        <word locus='a1'>
                <head locus='a2'>evolution</head>
                <pronunciation locus='a3'>i:və'lu:ʃ(ə)n</pronunciation>
                <meaning locus='a4'>
                        gradual change
                        </meaning>
                </word>
        </lexicon>

Alleles of the target gene are sought in the gene pool. In this example, three are found:

<meaning locus='a4'>
        gradual change
        </meaning>
<meaning locus='a4' mu='b'>
        gradual progressive change
        </meaning>
<meaning locus='a4' mu='c'>
        gradual development
        </meaning>

These alleles are expressed and made visible to the author, as variants of the chosen part.

gradual change.
gradual progressive change.
gradual development.
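Searching the gene pool for alleles of a target gene can be sketched in Python as below. The sketch assumes the three genotypes are already at hand in a dictionary, trimmed to the genes that matter; in the distributed setting described here they would first be pulled from the other authors, a step it leaves out. Alleles are deduplicated by mutation tag and sequence, and `alleles_of` is a hypothetical helper, not a textbender API.

```python
import xml.etree.ElementTree as ET

def lexicon(meaning_xml):
    """Build a trimmed one-word genotype around the given <meaning> gene."""
    return ET.fromstring(
        "<lexicon locus='a0'><word locus='a1'>"
        "<head locus='a2'>evolution</head>"
        f"{meaning_xml}</word></lexicon>"
    )

population = {
    "A": lexicon("<meaning locus='a4'>gradual change</meaning>"),
    "B": lexicon("<meaning locus='a4' mu='b'>gradual progressive change</meaning>"),
    "C": lexicon("<meaning locus='a4' mu='c'>gradual development</meaning>"),
}

def alleles_of(population, locus):
    """Collect the distinct alleles of the gene at `locus` across the population."""
    pool = {}
    for genotype in population.values():
        for gene in genotype.iter():
            if gene.get("locus") == locus:
                key = (gene.get("mu"), " ".join((gene.text or "").split()))
                pool.setdefault(key, gene)
    return list(pool.values())

# Express each allele as a variant for the author to choose from.
for allele in alleles_of(population, "a4"):
    print(allele.text + ".")
```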

If one of the variants appears to be an improvement, the author may select it. Its corresponding allele is then replicated into the genotype, replacing the old allele. For example, selecting the middle variant, A's genotype and dictionary would become:

<lexicon locus='a0'>
        <word locus='a1'>
                <head locus='a2'>evolution</head>
                <pronunciation locus='a3'>i:və'lu:ʃ(ə)n</pronunciation>
                <meaning locus='a4' mu='b'>
                        gradual progressive change
                        </meaning>
                </word>
        </lexicon>

evolution /i:və'lu:ʃ(ə)n/ gradual progressive change.

Recombination, then, replaces an allele within a genotype by a different allele of the same gene.
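For a leaf gene, the replacement can be sketched in Python as below: find the old allele's position under its parent and put a copy of the selected allele there. The copy matters, since the selected allele belongs to another individual's genotype. ElementTree keeps no parent pointers, so the sketch searches for the parent; `recombine_leaf` is a hypothetical helper, and the genotype is trimmed.

```python
import copy
import xml.etree.ElementTree as ET

GENOTYPE_A = ET.fromstring("""<lexicon locus='a0'><word locus='a1'>
  <head locus='a2'>evolution</head>
  <meaning locus='a4'>gradual change</meaning>
</word></lexicon>""")

SELECTED = ET.fromstring("<meaning locus='a4' mu='b'>gradual progressive change</meaning>")

def recombine_leaf(genotype, selected):
    """Replace the allele at selected's locus with a copy of selected (leaf genes only)."""
    if len(selected):
        raise ValueError("not a leaf allele; parent genes need parent recombination")
    locus = selected.get("locus")
    for parent in genotype.iter():
        for index, child in enumerate(parent):
            if child.get("locus") == locus:
                replacement = copy.deepcopy(selected)
                replacement.tail = child.tail  # keep the surrounding layout
                parent[index] = replacement
                return
    raise KeyError(f"no gene at locus {locus!r}")

recombine_leaf(GENOTYPE_A, SELECTED)
print(ET.tostring(GENOTYPE_A, encoding="unicode"))
```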

Where the old allele is a leaf gene (has no child genes of its own), replacement is wholesale, as shown above. However, if the old allele itself contains child genes, then replacement must preserve their content. In the case of parent recombination, only the sequence of child genes (which genes, and in what order) is transferred. Thus the transfer may reorder, delete or add child genes as whole units, but it may not alter their content. So, if the selected (new) allele has a sequence of child genes in a different order, then the child genes of the target are reordered to match. Or, if it has a reduced sequence of child genes, then child genes are deleted from the target. Or, if an expanded sequence, then child genes are added.
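Parent recombination is the subtler case, and a minimal Python sketch may help. It assumes child genes are matched by locus: each child named in the selected allele is kept from the target if the target already has it (content untouched), or else copied in, with its content, from the source; target children absent from the selected allele are dropped. The helper name `recombine_parent` is hypothetical, the genotypes are trimmed, and this is one plausible reading of the rule above rather than the textbender implementation.

```python
import copy
import xml.etree.ElementTree as ET

def recombine_parent(target, selected):
    """Reorder, delete or add target's child genes to match selected's child sequence.

    Children the target already has are kept as they are; only missing
    children are copied (with their content) from the selected allele.
    """
    existing = {child.get("locus"): child for child in target}
    children = []
    for child in selected:
        locus = child.get("locus")
        if locus in existing:
            children.append(existing[locus])       # content preserved
        else:
            children.append(copy.deepcopy(child))  # new gene, with source content
    for child in list(target):
        target.remove(child)
    target.extend(children)
    # The target now carries the selected allele's sequence, so it takes its mutation tag.
    if selected.get("mu") is None:
        target.attrib.pop("mu", None)
    else:
        target.set("mu", selected.get("mu"))

WORD_A = ET.fromstring("""<word locus='a1'>
  <head locus='a2'>evolution</head>
  <meaning locus='a4' mu='b'>gradual progressive change</meaning>
</word>""")
WORD_C = ET.fromstring("""<word locus='a1' mu='c'>
  <head locus='a2'>evolution</head>
  <meaning locus='a4' mu='c'>gradual development</meaning>
  <meaning locus='c0'>a process of development and origin of species from previous forms</meaning>
</word>""")

recombine_parent(WORD_A, WORD_C)
for child in WORD_A:
    print(child.get("locus"), "|", child.text)
```

A's meaning-a4 keeps its content ('gradual progressive change'), while meaning-c0 arrives from C, matching the outcome described below.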

For example, if A were to target the dictionary's entire entry for 'evolution', then the gene pool holds two alleles of the corresponding gene:

<word locus='a1'>
        <head locus='a2'/>
        <pronunciation locus='a3'/>
        <meaning locus='a4'/>
        </word>
<word locus='a1' mu='c'>
        <head locus='a2'/>
        <pronunciation locus='a3'/>
        <meaning locus='a4'/>
        <meaning locus='c0'/>
        <meaning locus='c1'/>
        </word>

This abbreviated representation leaves out the content of the child genes. Although this is artificial, because parent alleles do not appear like this in genotypes, it is clearer, because it shows exactly what the parent allele encodes: a particular sequence of child genes, regardless of their content.

To express a parent allele, however, it is simpler to add content to its child genes. Adding content of the source genotypes (A and C respectively), for example, here are the two variants for A to choose between:

evolution /i:və'lu:ʃ(ə)n/ gradual progressive change.
evolution /i:və'lu:ʃ(ə)n/ 1 gradual development. 2 a process of development and origin of species from previous forms. 3 the progression of events etc. in due course (the evolution of the plot).

If A selects the expanded definition, then allele word-a1-c is replicated into the genotype, replacing the old word-a1. The resulting genotype and phenotype are:

<lexicon locus='a0'>
        <word locus='a1' mu='c'>
                <head locus='a2'>evolution</head>
                <pronunciation locus='a3'>i:və'lu:ʃ(ə)n</pronunciation>
                <meaning locus='a4' mu='b'>
                        gradual progressive change
                        </meaning>
                <meaning locus='c0'>
                        a process of development and origin of species from previous forms
                        </meaning>
                <meaning locus='c1'>
                        the progression of events etc. in due course
                        <example locus='c2'>the evolution of the plot</example>
                        </meaning>
                </word>
        </lexicon>

evolution /i:və'lu:ʃ(ə)n/ 1 gradual progressive change. 2 a process of development and origin of species from previous forms. 3 the progression of events etc. in due course (the evolution of the plot).

Note that A's existing meaning, 'gradual progressive change', was not altered. Two new child genes (ultimately from genotype C) were replicated into genotype A, but the content of the existing child genes was preserved.

Altogether, at this point in time, here is the population:

A
evolution /i:və'lu:ʃ(ə)n/ 1 gradual progressive change. 2 a process of development and origin of species from previous forms. 3 the progression of events etc. in due course (the evolution of the plot).
B
evolution /i:və'lu:ʃ(ə)n/ gradual progressive change.
C
evolution /i:və'lu:ʃ(ə)n/ 1 gradual development. 2 a process of development and origin of species from previous forms. 3 the progression of events etc. in due course (the evolution of the plot).

Of course, the lexicographers would also extend the dictionary with new words, and other lexicographers might join in if they found the project interesting. As long as contributions continued and new mutations fed the gene pool, each cycle of mutation or recombination would improve the collective population by a ratchet effect: since an author adopts a variant only when it appears to be an improvement, gains are retained and accumulate. Over time, the dictionary would evolve.

Notes

1.

Although genetic engineering is not an evolutionary system in itself (only an isolated technique), it is applied to systems that are, at least in potential, evolutionary; namely populations in the laboratory. So it may be considered a technique in refinement of an evolutionary system. Its evolutionary application is more explicit in population genetic engineering (Burt, 2003; and Burt & Trivers 2005, pp. 223-).

2.

Human-based evolutionary algorithms are also called ‘human-based evolutionary computation’ (Kosorukoff, 2001). See http://en.wikipedia.org/wiki/Human-based_evolutionary_computation.

3.

Interactive evolutionary algorithms are also called ‘interactive evolutionary computation’, and ‘collaborative evolutionary algorithms’. See http://en.wikipedia.org/wiki/Interactive_evolutionary_computation. See also Bentley & Corne (2002).

4.

Neither genetic engineering nor human-based evolutionary algorithms introduces a selection technique of its own. They are refinements of innovation, not selection. Where selection occurs in these refined systems (whether organic or algorithmic) the selectand remains the whole individual.

5.

Allan (2006). For an online overview of the prototypes, see http://zelea.com/project/textbender/d/_/simplex-wide-obsolete/.

References

Allan, Michael. (2001). Method and system for evolutionary phenogenetic engineering. Canadian patent application 2,340,792 (withdrawn). Filing record: http://patents1.ic.gc.ca/details?patent_number=2340792.
Allan, Michael. (2006). Project textbender, release 0.1.0. SourceForge.net. http://sourceforge.net/project/showfiles.php?group_id=134813&package_id=148018. Later release online: http://zelea.com/project/textbender/.
Bentley, Peter J. and David W. Corne. (2002). An introduction to creative evolutionary systems. In Peter J. Bentley and David W. Corne eds., Creative Evolutionary Systems. Academic Press, San Diego.
Burt, Austin. (2003). Site-specific selfish genes as tools for the control and genetic engineering of natural populations. Proc. Biol. Sci. 270: 921-928.
Burt, Austin and Robert Trivers. (2005). Genes in Conflict: The Biology of Selfish Genetic Elements. Harvard University Press.
Giles. (c.1900). The Metamorphoses of Ovid. Cornish and Sons, London. As reproduced at http://www.textkit.com/learn/ID/154/author_id/71/.
Kosorukoff, Alex. (2001). Human-based genetic algorithm. In IEEE International Conference on Systems, Man, and Cybernetics, 2001.

Glossary

Biological terms are defined here as they apply to recombinant text.

allele         a variant of a gene, varying with respect to its sequence
child gene     a gene that is an immediate child element of another (parent) gene
express        generate or regenerate a text, or part of a text, by decoding it from a genotype or a gene [cf. in nature, expression + ontogenesis]
gene           1. a datum encoding part of a text, at a particular locus
               2. a locus, especially in the context of comparison, as in 'instances of the same gene'
gene ancestry  a family tree of sequences related by mutation
gene pool      the alleles of a population
genetic code   rules for encoding and decoding genotypes and genes
genome         the genes of a population
genotype       the complex of alleles that encodes a whole, individual text
individual     an instance of a text, especially within a population
locus          1. the identifier of a gene, common to each of its alleles
               2. the logical location of the gene in genome space, where recombination occurs
mutate         alter the sequence of a gene
parent gene    a structural composite gene that contains one or more child genes
phenotype      the observable characteristics of a text resulting from the interaction of its genotype with the environment (phenotype = genotype + environment)
population     a community of individuals that exchange genetic information, e.g. by replication or recombination
recombine      replace an instance of one allele in a genotype with an instance of a different allele of the same gene
sequence       the part of a gene that may mutate, to create alleles of the gene
species        the overall population of a text

Portions of this document were contributed by Michael Allan in 2007 to Wikipedia articles on human-based genetic algorithms.

Copyright 2005-2007, Michael Allan. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Textbender Software"), to deal in the Textbender Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicence, and/or sell copies of the Textbender Software, and to permit persons to whom the Textbender Software is furnished to do so, subject to the following conditions: The preceding copyright notice and this permission notice shall be included in all copies or substantial portions of the Textbender Software. THE TEXTBENDER SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE TEXTBENDER SOFTWARE OR THE USE OR OTHER DEALINGS IN THE TEXTBENDER SOFTWARE.