The natural increase of entropy in a closed system has its counterpart in the decrease of information. That information may be dissipated but not gained is the information-theoretic interpretation of the second law of thermodynamics, in which entropy becomes the opposite of information. A message may spontaneously lose order during communication, as happens on bad telephone lines where various kinds of noise are present. Words in the conversation are lost and have to be reconstructed from the significant information in the context. From this point of view, a decrease in information is synonymous with an increase in entropy.
The fact that information may be lost through entropy but never gained is also seen in translation between two languages. A translation never carries exactly the same meaning as the original. The translator always has to compromise between phrases that are more or less appropriate; in either case, some of the author’s meaning is lost. Other sources of entropy are information input overload (see p. 128) and the importation of noise in the form of banalities (e.g. nonsense entertainment and background sound effects). Together these impair the organization and structure of a given message and thus culminate in a loss of meaning.
Earlier, entropy was described as the degree of disorder in physical systems. Transferred to information theory, the concept expresses the relation between a phenomenon and our information about it. The higher the entropy, the more information is needed to describe the phenomenon. Consequently, what is arranged and structured needs less information to describe and contains less entropy. A situation with many alternatives gives high entropy, while a decrease in the number of alternatives gives lower entropy. Lower entropy also results when information is added to a system, as was mentioned in the section on information physics. The quantitative measure of entropy, interpreted statistically, therefore corresponds to the quantitative measure of uncertainty as defined in information theory.
Information entropy has its own special interpretation and is defined as the degree of unexpectedness in a message. The more unexpected the words or phrases, the higher the entropy. It may be calculated as the binary logarithm of the number of alternatives in a given repertoire. A repertoire of 16 alternatives therefore gives a maximum entropy of 4 bits. Maximum entropy presupposes that all alternatives are equally probable and independent of each other. Minimum entropy exists when only one possibility is expected to be chosen. When uncertainty, variety or entropy decreases, it is thus reasonable to speak of a corresponding increase in information.
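The arithmetic in this paragraph can be checked with a minimal sketch. The function names are illustrative only, and the general formula used for unequal probabilities is the standard Shannon definition, which is brought in here as an assumption since the text itself only states the equiprobable case.

```python
import math

def max_entropy_bits(n_alternatives):
    """Entropy in bits when all n alternatives are equally probable."""
    return math.log2(n_alternatives)

def entropy_bits(probabilities):
    """Shannon entropy, sum of p * log2(1/p); zero-probability terms are skipped."""
    return sum(p * math.log2(1 / p) for p in probabilities if p > 0)

print(max_entropy_bits(16))         # 4.0, the repertoire of 16 alternatives
print(entropy_bits([1 / 16] * 16))  # 4.0, same value from the general formula
print(entropy_bits([1.0]))          # 0.0, only one possibility: minimum entropy
```

Any departure from equal probabilities, for example `entropy_bits([0.5, 0.25, 0.25])`, gives a value below the maximum for that number of alternatives.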
It is possible to calculate the relative entropy of a given language empirically. An attempt to do this for the English language would need to begin with a study of its construction. Existing combinations of letters and their probabilities can be evaluated by using a dictionary as a starting point. If the word INFORMATION is used, we may state that it has been chosen from a repertoire of 26 letters in 11 successive choices. With equiprobable letters, every choice represents an entropy of roughly 5 bits (the binary logarithm of 26 is about 4.7) and the whole word yields 11 x 5 = 55 bits of entropy. The real entropy is, however, lower and is calculated according to the successive choices presented in Figure 4.9.
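As a rough check of the maximum figure, the following sketch computes the entropy of 11 free choices from 26 letters. Reading the 5 bits per letter as the binary logarithm of 26 rounded up to a whole bit is an assumption made here; the variable names are illustrative.

```python
import math

WORD = "INFORMATION"
ALPHABET_SIZE = 26

bits_per_letter = math.log2(ALPHABET_SIZE)            # about 4.70 bits
exact_max = len(WORD) * bits_per_letter               # unrounded maximum
rounded_max = len(WORD) * math.ceil(bits_per_letter)  # 5 whole bits per letter

print(f"{exact_max:.1f}")  # 51.7
print(rounded_max)         # 55
```

The rounded value of 55 bits is the one the text uses as the maximum entropy of the word.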
The choice of the first letter is completely free and gives an entropy of 5 bits. The second choice is less free and must be made from among the 18 letters in column two; the English language does not permit any other combinations. This gives an entropy of approximately 4 bits, as does the third choice. From the fourth choice onwards the possibilities decrease rapidly and the freedom of choice of the information source is severely restricted. The last seven letters are calculated in a similar manner but add very little to the information content. This is why the abbreviation INFO is so often used instead of the full word.
Figure 4.9 Calculation of entropy in the word INFORMATION.
The final calculation gives an actual entropy of 22 bits. The quotient between actual and maximum entropy, 22/55, gives 0.4, which is the value of the relative entropy in this case. Such a value may be interpreted as meaning that the choice of the information source obeys 40 per cent free will and 60 per cent compulsion imposed by the structure of the system. The compelled component of an average message is what could be guessed owing to accepted statistical regularities inherent in our use of the alphabet, and is called syntactic redundancy. It reflects the lack of randomness in our choice of signs or messages and denotes what initially seems to be superfluous, as we already know the structure of the system (the language). Redundancy gives a richer presentation structure, as the message is conveyed in several parallel ways. One consequence is that the mass of text increases; conversely, short messages are more prone to errors of interpretation.
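The quotient can be written out as a small sketch. It simply takes the 22 and 55 bits stated in the text as given, since the per-letter values of Figure 4.9 are not reproduced here; treating redundancy as one minus the relative entropy follows the 40/60 split described above, and the function names are illustrative.

```python
def relative_entropy(actual_bits, max_bits):
    """Share of the maximum entropy that the source actually uses."""
    return actual_bits / max_bits

def redundancy(actual_bits, max_bits):
    """Share of the message dictated by the structure of the language."""
    return 1 - relative_entropy(actual_bits, max_bits)

rel = relative_entropy(22, 55)
red = redundancy(22, 55)
print(f"{rel:.0%}")  # 40%
print(f"{red:.0%}")  # 60%
```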
Redundancy, the opposite of information entropy, is nevertheless both necessary and desirable in human language and is one of its most typical qualities. Empirical investigations have shown that colloquial language has a redundancy of 50 per cent, while more technical language has less. A BBC newscast may transmit the following: ‘The President of the United States, Bill Clinton, has today announced…’. For the majority of listeners the message is just as intelligible if the first part is omitted. There is, however, a substantial technical risk of interference; a hooting car may disturb the listener. The redundant phrase ‘The President of the United States’ is therefore wholly functional in the given situation. Redundancy is thus potential information, available to us if necessary.
Normally we have no problem interpreting AB-ND-NCE -F IN-ORMA-I-N as ‘abundance of information’ despite more than 25 per cent of the letters being missing. Our language may therefore be regarded as inefficient at first sight, but this inefficiency turns out to be highly necessary if language is to provide an inherent reliability. A language is always a compromise between basically inconsistent demands: precision and security on the one hand, flexibility and efficiency on the other. Our vocabulary is adapted to our everyday needs and we cannot have words for every special object or event. When everyday need demands important distinctions, they do, however, exist; the Laplanders have eight different words for snow in their language while the British only have snow and sleet. In comparison with the decimal system, it is obvious that the alphabet contains a large amount of superfluous information.
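The share of missing letters in the garbled phrase can be counted directly. This is only an illustrative sketch: it assumes that each hyphen stands for exactly one missing letter and that spaces are not counted as letters.

```python
def missing_fraction(garbled, gap="-"):
    """Fraction of letter positions replaced by the gap marker (spaces ignored)."""
    symbols = [ch for ch in garbled if not ch.isspace()]
    return symbols.count(gap) / len(symbols)

phrase = "AB-ND-NCE -F IN-ORMA-I-N"
print(f"{missing_fraction(phrase):.0%}")  # 27%
```

Six of the 22 letter positions are blank, which agrees with the figure of more than 25 per cent.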
Semantic redundancy is most easily described by the use of synonyms and paraphrases in natural language. The more extra names for the same thing, the greater the probability of making everything clear and avoiding misunderstanding.
Pragmatic redundancy is defined as the percentage of letters, words, etc. that can be removed from the message received without changing the receiver’s response. To illustrate, compare the number 1027 with the words one thousand and twenty-seven. Remove one element from each and draw the conclusion. Total pragmatic redundancy in a message exists if a response intended by the sender has already occurred and is not repeated by the receiver.
A communication system must of course be designed according to the entropy of its information source. We must therefore recognize the differences between human communication and machine communication. In communication between machines it is possible to reduce the redundancy in order to enhance speed and efficiency. Technically, a complete elimination of redundancy in machine communication is only possible in a noiseless channel. Otherwise every individual error in a message would change it into a different message.
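The last point can be illustrated with a minimal sketch. The three-fold repetition code with majority voting used here is not taken from the text; it is assumed only as the simplest possible way of adding redundancy, and all function names are hypothetical.

```python
def encode(bits, repeat=3):
    """Add redundancy by sending every bit `repeat` times."""
    return [b for b in bits for _ in range(repeat)]

def decode(received, repeat=3):
    """Recover each bit by majority vote over its group of copies."""
    groups = [received[i:i + repeat] for i in range(0, len(received), repeat)]
    return [1 if 2 * sum(group) > len(group) else 0 for group in groups]

def flip(bits, position):
    """Simulate channel noise by inverting a single bit."""
    noisy = list(bits)
    noisy[position] ^= 1
    return noisy

message = [1, 0, 1, 1]

# Without redundancy, one error silently turns the message into another one.
print(flip(message, 1))                   # [1, 1, 1, 1]

# With redundancy, the same kind of error is outvoted and the message survives.
print(decode(flip(encode(message), 4)))   # [1, 0, 1, 1]
```

The price of this reliability is that the transmitted message is three times as long, which is the trade-off between efficiency and security discussed above.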
Source: Skyttner, Lars (2006), General Systems Theory: Problems, Perspectives, Practice, 2nd edition, World Scientific.