
Not About Goldfish

In many languages – and remember, mathematics is a synthetic language which can be made sufficiently complex to become unreadable by most persons – one can construct a class of hierarchical elements.  It doesn’t take much sophistication to do so.

We can go about sorting the set-based description of goldfish and just look at it as though we are unfamiliar with the terms and details (which many of us are):

{Cyprinidae – Psilorhynchidae} ∈{Cyprinoidea} ∈{Cypriniformes} ∈{Ostariophysi} ∈{Teleostei} ∈{Neopterygii} ∈{Actinopteri} ∈{Actinopterygii} ∈{Osteichthyes} ∈{Gnathostomata} ∈{Vertebrata} ∈{Chordata Craniata} ∈{Chordata} ∈{Animalia} ∈{Eukaryota}

First off are the operators – the ‘glue’ of the language sequence.  There are three of them, each repeated about fifteen times – “∈”, “{” and “}”.  Regardless of what they “mean,” they are highly redundant.  They perform an operational process – two of them delimit a set, and one of them relates each set to the next.  The operation is simple – “is an element of.”
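The redundancy is easy to check by counting.  Here is a minimal sketch in Python, assuming the chain above is typed in as a plain string and that “∈”, “{” and “}” are the whole operator alphabet (the names CHAIN, OPERATORS and count_operators are made up for this illustration):

```python
from collections import Counter

# The goldfish chain from the text, copied as one plain string.
CHAIN = (
    "{Cyprinidae – Psilorhynchidae} ∈{Cyprinoidea} ∈{Cypriniformes} "
    "∈{Ostariophysi} ∈{Teleostei} ∈{Neopterygii} ∈{Actinopteri} "
    "∈{Actinopterygii} ∈{Osteichthyes} ∈{Gnathostomata} ∈{Vertebrata} "
    "∈{Chordata Craniata} ∈{Chordata} ∈{Animalia} ∈{Eukaryota}"
)

# Assumption: these three symbols are the only operators in the chain.
OPERATORS = "∈{}"


def count_operators(text):
    """Count how often each operator symbol occurs in the text."""
    return Counter(ch for ch in text if ch in OPERATORS)


operator_counts = count_operators(CHAIN)
for symbol, n in operator_counts.most_common():
    print(symbol, n)
```

Each brace turns up fifteen times and “∈” fourteen times, while every set name appears exactly once.  Without knowing what anything means, the three most frequent symbols already stand out as the glue.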

Recall, there is just one classification here, and it is of one species, the goldfish.  If one sets about characterizing all sorts of things, such as “the Fish of North America,” or even “the Fish of California,” one will notice that the broadest category – say, the Gnathostomata – will become highly repeated, as it is a category to which (almost) all fishes belong.  The category Actinopterygii includes most of what the non-expert thinks of as fish.  There will be freshwater and saltwater fish – they will fall out into the various categories down a bit further.  Those categories which comprise a high percentage of all fish will be seen frequently.  Other terms, by contrast, will be seen only rarely throughout the text.  Osteichthyes will predominate; Chondrichthyes (such as the sharks) will be seen in salt water.

In a compendium, there will be a hierarchy of usages depending on class.
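To make that hierarchy of usage concrete, here is a rough sketch that tallies how often each category name would turn up across a tiny three-entry compendium.  The lineages are heavily abbreviated (most intermediate ranks are left out), and the choice of species, along with the names LINEAGES and category_counts, is purely illustrative:

```python
from collections import Counter

# Abbreviated lineages, narrowest category first.
# Assumption: a three-entry toy compendium; most ranks are omitted.
LINEAGES = {
    "goldfish": [
        "Cyprinidae", "Cypriniformes", "Actinopterygii",
        "Osteichthyes", "Gnathostomata",
    ],
    "rainbow trout": [
        "Salmonidae", "Salmoniformes", "Actinopterygii",
        "Osteichthyes", "Gnathostomata",
    ],
    "great white shark": [
        "Lamnidae", "Lamniformes", "Chondrichthyes", "Gnathostomata",
    ],
}


def category_counts(lineages):
    """Count how many entries mention each category name."""
    counts = Counter()
    for ranks in lineages.values():
        counts.update(ranks)
    return counts


counts = category_counts(LINEAGES)
for name, n in counts.most_common():
    print(n, name)
```

Even with three entries the ordering shows: Gnathostomata is mentioned by all of them, Osteichthyes and Actinopterygii by two, and the family names by one each.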

Now, how might this be applied to analyze the Voynich Manuscript?  (Remember the Voynich Manuscript?)

The elements need not be classified into a strict taxonomy, but they are classifiable into logically common sets, depending on the opinion of the author.  English garden flowers, for instance, can be described as to their tolerance of dry spells, their height, their color, et cetera.  But the general rules of classification should prevail:  there is a simple common element to use as a discriminator, such as {deciduous trees}∈{trees}, {evergreens}∈{trees}, which are mutually exclusive subclasses depending on leaf fall.  However, every description of evergreens will have some reference to their being trees, same as for deciduous trees.

The broader the set, up to the complete universe described in the compendium, the more common its term becomes.

Now, how would one use the techniques of Shannon analysis to tell us something about these new expectations?

Language-independent logical ordering

Several assumptions can be made about the underlying structure of the text:

  • Small arguments are extant, i.e. a sentence does not rely on other sentences to grammatically complete it.
  • Definitions and identities are not expressed redundantly.
  • The writer is to some degree writing a compendium, associating characteristics with individual types of things, and classifying them into higher orders.  This assumption is language-independent.

Independent of what language is used, one can parse the information below in terms of its type and frequency:

The house cat is a type of feline that lives in the home.  The house cat pursues other animals, such as mice, for its nourishment.  The house cat has fur.  The house cat is not the same as the dog, although they can share the human household comfortably.

An individual element “house cat” exists that is an element of the greater set “feline.”

The language of expression goes like this:

  1. A particular.
  2. A logical operator.
  3. A set to which the housecat belongs.

house cat{∃}{feline}

For proper taxonomy, house cats are NOT elements of any other set at such a level:  House cats are NOT members of any nonfeline set.

house cat{∄}{¬}{feline}  Particular, operator, operator, set.

housecat{¬} dog   Particular, operator, particular.
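The three statements above can be tagged mechanically.  This is only a sketch, assuming the notation used here: operators and sets sit inside braces, particulars sit outside them, and the operator symbols are limited to the handful listed in the code.  The function name tag_statement and the OPERATORS set are invented for the example:

```python
import re

# Assumption: these are the only operator symbols in use.
OPERATORS = {"∃", "∄", "¬", "="}


def tag_statement(statement):
    """Return the sequence of roles (particular, operator, set) in a statement."""
    roles = []
    # Each match is either a braced group or a run of unbraced text.
    for braced, bare in re.findall(r"\{([^}]*)\}|([^{}]+)", statement):
        if bare.strip():
            roles.append("particular")   # unbraced text names a particular
        elif braced in OPERATORS:
            roles.append("operator")     # braced operator symbol
        elif braced:
            roles.append("set")          # any other braced name is a set
    return roles


for s in ["house cat{∃}{feline}", "house cat{∄}{¬}{feline}", "housecat{¬} dog"]:
    print(s, "->", ", ".join(tag_statement(s)))
```

The point is that the role sequence comes out of position and bracketing alone; the tagger never needs to know what a cat is.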

Expected occurrences in a compendium.

Clearly the set of logical operators should prevail in a compendium or a didactic text, the purpose of which involves the association of nouns with other nouns in a certain way.  The very simplest logical operators, ∃ and =, should predominate throughout the text.  The operators should have no sharp boundaries delineating their use – their “bandwidth” should be that of the text.

Words which characterize general properties of particulars and sets should appear throughout the compendium without boundary, e.g. “green,” “heavy.”  The bandwidth is the same, but the frequency of use is lower.

Words which delineate sets of things, e.g. “plant” will be used more often where plants are concerned, and hardly at all elsewhere; and then usually for negation:

“The cat sits on the mantel.  However, the cat is not a plant.”

Finally come the particulars – words which are very frequently used in one region, one paragraph, and hardly at all elsewhere.

“The house cat often sits on the mantel.  However, the house cat is not a plant.”

If one does something very geeky, e.g. plots a Fourier transform against the word use, one should expect the following data:

  1. Logical operators should have a predominant component of low-frequency noise – they are present throughout, but are not used in regular intervals, so the distribution will be erratic, from a mathematical point of view.  The bandwidth should be that of the text (related to the Nyquist frequency.)
  2. Essential descriptors should have less amplitude (i.e. oftenness of use) but should otherwise appear like logicals (e.g. “the star is blue,” “the flower is blue”).
  3. Set descriptors should be similar to essential descriptors, except they are bounded (or generally bounded) within a region of text.
  4. Particulars have a tight section of text in which they are frequently used, and a tight distribution.
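A full Fourier analysis is more than a short example can carry, so the sketch below swaps in a much cruder stand-in: for each word it records how often the word is used and what fraction of the text lies between its first and last use.  That is not a spectrum, just a quick way to see “bandwidth” and “amplitude” side by side.  The sample sentence and the name usage_profile are made up for the illustration:

```python
def usage_profile(tokens):
    """Map each word to (count, span).

    span is the fraction of the text lying between the word's first and
    last occurrence, a rough proxy for how widely the word is spread.
    """
    positions = {}
    for i, tok in enumerate(tokens):
        positions.setdefault(tok, []).append(i)
    n = len(tokens)
    return {
        word: (len(pos), (pos[-1] - pos[0] + 1) / n)
        for word, pos in positions.items()
    }


# Hypothetical toy text: two short "entries", one per particular.
TEXT = "cat is feline cat is furry cat is grey oak is tree oak is tall oak is grey"
profile = usage_profile(TEXT.split())

for word, (count, span) in sorted(profile.items(), key=lambda kv: -kv[1][1]):
    print(f"{word:7s} count={count} span={span:.2f}")
```

In the toy text the operator “is” is both frequent and spread across nearly the whole text, the descriptor “grey” is spread out but used less, and the particulars “cat” and “oak” are each confined to their own stretch.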

Using the numbers to name sets that contain the elements as above, one should see clusters of statements:

4₁,1₁,3₁.  4₁,1₂,2₁.  4₁,1₁,4₂

4₁ is an element in the set of particulars, {4} – it is a certain particular.

1₁ is an operator in the set {1}.

3₁ is a “taxonomic category” or some such thing – an element of discrete, non-overlapping sets such as quadruped, fur-bearing, animal.

4₁,1₂,2₁ should be seen often – it is a discussion of the essentials of this particular, and is expected in a descriptive text (e.g. “brown” for a birding manual.)

4₁,1₁,4₂ differentiates one element of {4} from another.
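Looking for such clusters can be sketched in a few lines.  The example assumes the words have already been assigned to the four classes, which is of course the hard part with an unreadable text.  The word list, the sample statements and the names WORD_CLASS and class_pattern are all hypothetical, and the subscripts that tell individual members of a class apart are dropped for simplicity:

```python
from collections import Counter

# Hypothetical class assignments:
# 1 = operator, 2 = essential descriptor, 3 = set descriptor, 4 = particular.
WORD_CLASS = {
    "is": 1, "has": 1, "not": 1,
    "fur": 2, "brown": 2,
    "feline": 3, "canine": 3,
    "housecat": 4, "dog": 4,
}

# Hypothetical three-word statements in the spirit of the house cat text.
STATEMENTS = [
    "housecat is feline",
    "housecat has fur",
    "housecat not dog",
    "dog is canine",
    "dog has fur",
]


def class_pattern(statement):
    """Replace every word in a statement by its class number."""
    return tuple(WORD_CLASS[word] for word in statement.split())


pattern_counts = Counter(class_pattern(s) for s in STATEMENTS)
for pattern, n in pattern_counts.most_common():
    print(pattern, n)
```

Here the patterns (4, 1, 3) and (4, 1, 2) each show up twice and (4, 1, 4) once, which is the kind of clustering one would hope to spot.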

So What?

One can take an unreadable text, and posit certain elements, clusters of types of words, that occur in that text.  This skill is very useful in drawing meaning from the text.

I sent off some of this stuff to one of the authors who write about it – Amancio DR, Altmann EG, Rybski D, Oliveira ON Jr, Costa LdF (2013) Probing the Statistical Properties of Unknown Texts: Application to the Voynich Manuscript. PLoS ONE 8(7): e67310. doi:10.1371/journal.pone.0067310

They already have classified the operators – they just don’t know it yet.  And if they look for clusters, they might get some useful information.

I think I’ll hold all the wiggy stuff until Fridays.  I don’t think people are all that fascinated with it, and that way, there’s nothing on the plate until the weekend.