Reimplementation of
"The Evolution of Phenotypic Correlations and Developmental Memory"


Introduction

This report is based on the article "The Evolution of Phenotypic Correlations and Developmental Memory" by Richard A. Watson et al. This work is referred to throughout the remainder of the report as "the original article."

Experiment 2 and experiment 3 from the original article are re-implemented using the details of the evolutionary algorithm provided, and the results are compared with those in the article.

Overview

The referenced article describes how evolved developmental processes can reproduce previously selected high-fitness phenotypic traits through evolved gene regulation. We further learn how this reproduction is robust to differences in embryonic phenotypes and recalls targets with high accuracy even under varying levels of corruption of the embryonic phenotype.

This recall, or "memory", of a developmental process is experimentally observed to be similar to the results obtained from Hebbian learning algorithms.

One main difference between this model and the models used in earlier studies is the non-linearity of the genotype-phenotype mapping. When a linearly mapped developmental process is used, the evolved result tends to average out the phenotypic traits selected in the past, even though this average may itself be of low fitness. The non-linear model of developmental interactions is key to holding and recalling the distinct high-fitness phenotypes that were selected in the past.

The article also discusses how this biased reproduction of traits can hold memories of sub-patterns or combinations of traits, just as a learning algorithm learns patterns in data in order to generalise. This is demonstrated by evolving phenotypes selected from a pattern set made up of a "loop" or a "stalk" in each quadrant. The evolved developmental interaction matrix resulting from this process produces a large variety of phenotypes that share the general characteristics of the previously selected high-fitness phenotypes. This generalisation further shows a correspondence between evolutionary algorithms and well-understood learning algorithms.

Experiment 2

In experiment 2, we use two selection target phenotypes, each made up of N = 8 numerical parts (bits), each either positive or negative.

The two patterns used in the article and in our reimplementation of the experiment are S1 = ++---+-+ and S2 = +-+-+---. In the implementation, the negative and positive bits were represented as -1 and +1 respectively.

The fitness function was the vector dot product of the developed phenotype and the target phenotype. The target phenotype alternated between S1 and S2 every 2000 generations, and the algorithm was run for 8 × 10^5 generations.
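As a concrete illustration, here is a minimal sketch of the fitness evaluation and the alternation of the selective environment, assuming numpy; the function and constant names are our own:

    import numpy as np

    EPOCH = 2000                    # generations per selective environment
    TOTAL_GENERATIONS = 8 * 10**5   # total length of the run

    def fitness(p_adult, target):
        # Fitness is the vector dot product of developed and target phenotypes.
        return np.dot(p_adult, target)

    def current_target(generation, s1, s2):
        # The target alternates between S1 and S2 every 2000 generations.
        return s1 if (generation // EPOCH) % 2 == 0 else s2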

As in the original article, development was modelled with the function P(t+1) = P(t) + τ1 σ(B P(t)) − τ2 P(t), where σ(x) = tanh(x) is the non-linear function, with rate constant τ1 = 1 and decay rate τ2 = 0.2.
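A minimal sketch of this developmental map follows; the number of developmental steps is not stated in this section, so reusing the T = 10 from our metrics section is an assumption:

    def develop(g, b, t1=1.0, t2=0.2, steps=10):
        # Iterate P(t+1) = P(t) + t1*tanh(B P(t)) - t2*P(t), starting from
        # the embryonic phenotype P(0) = G.
        p = np.asarray(g, dtype=float).copy()
        for _ in range(steps):
            p = p + t1 * np.tanh(b @ p) - t2 * p
        return p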

We used the Strong Selection Weak Mutation (SSWM) regime, so the mean genotype (the direct-effect vector G and the interaction matrix B) was evolved, which is equivalent to running a hill climber with a high mutation rate on G (μ1 = 1: one random element mutated by a value chosen uniformly over the range -1 to +1) and a low mutation rate on B (μ2 = μ1/15). The values of G are capped at -1 and +1.
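One evolutionary step under this regime might look as follows; only the rate of B mutations (μ2 = μ1/15) is fixed above, so the B mutation magnitude below is an assumption:

    MU_B = 1.0 / 15.0   # mu2 = mu1/15, per-generation chance of a B mutation

    def hill_climb_step(g, b, target, rng):
        g2, b2 = g.copy(), b.copy()
        # mu1 = 1: exactly one random element of G is perturbed each generation.
        i = rng.integers(len(g2))
        g2[i] = np.clip(g2[i] + rng.uniform(-1.0, 1.0), -1.0, 1.0)  # cap at +/-1
        if rng.random() < MU_B:
            # Low-rate mutation on B; the step size here is an assumption.
            j, k = rng.integers(b2.shape[0]), rng.integers(b2.shape[1])
            b2[j, k] += rng.uniform(-1.0, 1.0)
        # Hill climber: keep the mutant only if fitness does not decrease.
        if fitness(develop(g2, b2), target) >= fitness(develop(g, b), target):
            return g2, b2
        return g, b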

Similarly, for the same setup, we use Hebbian learning to create the B matrix directly. Following Hebb's rule, roughly "neurons that fire together, wire together", we adjust the weights of the matrix: where the signs of the target phenotype and the developed phenotype match at a particular locus, we increase the weight corresponding to their regulation coefficient, and where the signs are opposite, we decrease it. The change in weights is governed by a learning constant (we used c = 0.6). It does not take many iterations for the signs to stabilise, as the modifying function is linear; presenting each target as a learning pattern once was sufficient to obtain the matrix shown in Figure 3.
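A minimal sketch of this training, assuming the common outer-product form of Hebb's rule in which the weight b[i, j] is strengthened when loci i and j of the training pattern agree in sign and weakened when they disagree:

    def hebbian_B(patterns, c=0.6):
        # One pass over each target pattern: the update adds c * s_i * s_j to
        # b[i, j], so matching signs increase the weight and opposite signs
        # decrease it.
        n = len(patterns[0])
        b = np.zeros((n, n))
        for s in patterns:
            b += c * np.outer(s, s)
        return b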

Experiment 3

The two target phenotypes used in experiment 3 were of size N = 1764, represented as matrices of size 42 × 42. Here S1 and S2 represent portraits of Charles Darwin and Donald Hebb respectively.

The vector part of the genotype (G) is evolved by a hill climber that reaches a local optimum given the current gene regulation matrix (B), which is a sparse matrix in which each gene is regulated by at most 10 other genes chosen at random. We use a matrix mask to choose and update the elements of B. This mask has 11 ones in every column: one on the diagonal element and ten others chosen at random.
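A sketch of how such a mask can be built (the names are ours):

    def make_mask(n, k=10, rng=None):
        # Each column gets 11 ones: one on the diagonal and k = 10 at random
        # off-diagonal rows, so every gene is regulated by at most 10 others.
        rng = rng or np.random.default_rng()
        mask = np.zeros((n, n), dtype=bool)
        for col in range(n):
            mask[col, col] = True
            rows = rng.choice([r for r in range(n) if r != col],
                              size=k, replace=False)
            mask[rows, col] = True
        return mask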

The gene regulation coefficient matrix (B) is evolved by visiting all the connections in random order without replacement and adding a mutation to each. Mutations on B are drawn from a uniform distribution with mean q and standard deviation σ = 0.01q. Two mutated B matrices are created at every evolutionary step: one with positive (q = 0.02) and one with negative (q = −0.02) mutation values. The best of the three individuals, the original and the two mutants, is selected as the new B at every step.
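One such step might be sketched as follows; we read "uniform distribution with mean q and standard deviation 0.01q" literally, so the half-width below is derived from that reading and is an assumption:

    def mutate_B(b, mask, q, rng):
        # Perturb every allowed connection once with noise of mean q and
        # standard deviation 0.01*|q| (uniform half-width = 0.01*|q|*sqrt(3)).
        half = 0.01 * abs(q) * np.sqrt(3.0)
        noise = rng.uniform(q - half, q + half, size=b.shape)
        return b + noise * mask

    def evolve_B_step(g, b, mask, target, rng):
        # Best-of-three selection: the original B, a +q mutant, a -q mutant.
        candidates = [b,
                      mutate_B(b, mask, 0.02, rng),
                      mutate_B(b, mask, -0.02, rng)]
        return max(candidates, key=lambda bb: fitness(develop(g, bb), target))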

The target phenotype was switched between S1 and S2 at every such evolutionary step, a total of 40 times. The fitness function, as in the previous experiment, was the vector dot product.

Discussion

As mentioned in the original article, the G vector component produced by the evolutionary algorithm has little influence on P, the phenotype produced at the end of the developmental cycle. The adult phenotype P depends strongly on the gene regulation interaction matrix B, to the point where even a partial genotype (a few pixels) reliably develops into the target phenotype.

The re-implementation results for both experiment 2 and experiment 3 closely match those in the original article. All the subfigures for experiment 2 are reproduced in [fig:fig2group].

For experiment 3, the required figures (7 and 9), along with figure 8 (developed from a partial G), were produced and observed to match the results from the original article, shown in Figure 10.

One important observation from different trials of experiment 3 was that the quality of the results was sensitive to the target phenotypes chosen. When target phenotype S2 was translated horizontally by 2 positions (2 pixels in the image) and all other parameters were kept the same, the reproduction of the phenotypes was poor, with a lot of noise. In this case there was a strong bias towards one of the target environments: when development was started from a random G, the result was usually S1, and in the rare cases where S2 developed, it carried considerable noise.

[Figure: experiment 2 reimplementation results. (B) Evolved B matrix. (C) Hebbian B matrix. (E) Random adult phenotypes.]

[Figure: experiment 3 reimplementation results. (D) Random G (top row) and resulting adult phenotypes (bottom row). (E) Partial G (top row) and resulting adult phenotypes (bottom row). (F) G varying systematically from S1 to S2 (top row) and resulting adult phenotypes (bottom row). Original article results shown for comparison.]

Extension

As an extension to the work seen so far, we will try to understand the capacity of this developmental process.

Hypothesis

Since the developmental interaction matrix resembles a Hebbian-learned matrix, it should have similar properties. A Hopfield network (which is functionally equivalent to the gene developmental interaction matrix) is known to have a capacity that is a linear function of the size of the network, roughly 0.15N, where N is the size of the vector.

Experiment

If the hypothesis is true, we should obtain 1 to 2 memories for N = 8 and 2 to 3 memories for N = 16. We perform experiments resembling experiment 2 from the earlier section to produce data that we can compare.

For the memories, or target phenotypes, we randomly generate M vectors of a given length N. During generation we check that they are all distinct: no two target phenotypes are equal to, or the complement of, each other.
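A sketch of this rejection-sampling generation (the names are ours):

    def random_patterns(m, n, rng):
        # Draw M distinct +/-1 patterns of length N, rejecting any candidate
        # that equals, or is the complement of, a pattern already drawn.
        patterns = []
        while len(patterns) < m:
            s = rng.choice([-1, 1], size=n)
            if all(not np.array_equal(s, p) and not np.array_equal(s, -p)
                   for p in patterns):
                patterns.append(s)
        return patterns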

We then evolve, starting from a G vector of length N and a B matrix of size N × N. The total number of evolutionary steps is M × 4 × 10^5, with a new target phenotype from the pattern list introduced every 2000 steps, ensuring each pattern gets the same amount of evolutionary time to be selected for.
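The schedule might be sketched as follows, reusing the hill-climbing step sketched for experiment 2; the initialisation of G and B is an assumption:

    def run_extension(patterns, n, rng):
        m = len(patterns)
        total_steps = m * 4 * 10**5          # M x 4e5 evolutionary steps
        g = rng.uniform(-1.0, 1.0, size=n)   # initialisation assumed
        b = np.zeros((n, n))
        for step in range(total_steps):
            # A new pattern is presented every 2000 steps, cycling through
            # the list so each pattern receives equal selection time.
            target = patterns[(step // 2000) % m]
            g, b = hill_climb_step(g, b, target, rng)
        return g, b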

Metrics

Once we have evolved a B matrix as before for a particular value of N, we randomly generate 100 sample embryonic genotypes of length N, with each element set at random to either +1 or -1. Each such G vector is then developed for T = 10 developmental steps and matched against the selection target phenotypes.

We use a metric called "memory recall", equal to the number of samples that developed exactly into one of the target phenotypes given for selection. Since there are 100 samples, this number can also be read as a percentage: the accuracy of developing a phenotype from the target pattern set.
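A sketch of this metric, counting (per the results section) a match to a target or to its complement:

    def memory_recall(b, patterns, n, rng, samples=100, t=10):
        # Develop 100 random +/-1 embryonic genotypes for T = 10 steps and
        # count how many land exactly on a target pattern or its complement;
        # with 100 samples the count doubles as a percentage.
        hits = 0
        for _ in range(samples):
            g = rng.choice([-1.0, 1.0], size=n)
            p = np.sign(develop(g, b, steps=t))
            if any(np.array_equal(p, s) or np.array_equal(p, -s)
                   for s in patterns):
                hits += 1
        return hits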

The reported results were averaged over three runs with different random seeds.

Results

The result data of the experiment are shown in Table 1. Here N is the length of the phenotype, M is the number of distinct high-fitness environments selected for, and memory recall, as discussed above, indicates the percentage of adult phenotypes that exactly resemble one of the selected phenotypes or their complement.

Using Hopfield’s observation, we expect the memory capacity to be roughly 0.15N. A good correlation with the expected results can be seen in Table 2. While there was unexpectedly good performance at N = 10, more experiments would be needed to reach a firm conclusion.

As discussed in the reimplementation results for experiment 3, the capacity to hold memories seems to be heavily influenced by the memories themselves: the set of target phenotypes affects both the quality of the developed phenotypes and the memory recall.

Hence, there seems to be evidence that M and N are linearly related for the gene regulation networks produced by this evolutionary algorithm, as they are in Hopfield networks. This result is intuitive, as the two networks are functionally similar. More experiments are needed to study how the nature of the memories, or selective environments, influences this capacity of the network.

Table 1: Extension experiment results

  N    M    Memory recall (%)
  8    1    100
  8    2    100
  8    3     98
  8    4     65
 10    2    100
 10    3     91
 10    4     97
 10    5     96
 10    6     85
 12    2    100
 12    3     95
 12    4     20
 12    5     53
 12    6     10
 14    2     99
 14    3     63
 14    4     58
 16    2     91
 16    3     96
 16    4      2
 16    5      0
Table 2: Comparison of results with the hypothesis

  N    Expected memories (0.15N)    Memories held
  8    1.2                          2 to 3
 10    1.5                          1 to 4
 12    1.8                          2
 14    2.1                          2
 16    2.4                          2 to 3

Discussion

Across the small set of experiments performed, the trend suggests that the hypothesis holds better as N grows larger (here, N > 10).

While the results seem to agree at these values, it is intuitive that some sets of patterns will produce higher memory recall than others, all other parameters being kept the same. One reason could be that for some groups of patterns the loci disagree strongly on the weights of the B matrix; such weights would converge slowly, or not at all, at least for producing the conflicting patterns.

Another reason could be the generalisation behaviour of the evolved B matrix, which produces certain common blocks semi-independently of the remaining phenotype.

An underlying assumption here is that the gene interaction matrix is fully connected, as in a Hopfield network. As the connections become sparse, memory recall should decrease, and the network will no longer reproduce the exact phenotypes, as seen in the results for experiment 3.

In the biological context, memory recall need not be a full 100%. Since selection removes low-fitness phenotypes, a high memory recall achieved with fewer gene regulation interactions would be beneficial compared to a perfect memory of the past achieved with many interactions. Moreover, a small part of the population diverging from "overfitting" on previously selected phenotypes could act like mutation, opening a small probability of producing a high-fitness phenotype that was never selected in the past, as seen in generalisation.

Conclusion

From the results and discussion, we can conclude that there does seem to be some correlation between the expected memory capacity and the observed values.

There are several caveats to this experiment. Firstly, only whole memories are counted, and an exact-match metric like the one we used may not give the complete picture of how much partial memory is held. Secondly, a recall that is good apart from a little noise is counted as a failure by our method. Lastly, many more iterations would need to be performed to obtain convincing results.


Appendix

Experiment 2

Experiment 3