R.H. Fletcher, M.Sc Hons(Statistics), July 1995.

There are two major factors that influence an individual’s measured performance:
genotype and environment. These two factors are sometimes loosely called ‘breeding
and feeding’. The effects of these two factors are not immediately distinguishable in
an individual animal’s performance, but it is of primary concern to the animal
breeder to attempt to separate these two influences, as it is only the genetic component
or ‘breeding’ that is passed on directly to future generations. Breeding values
are an attempt to estimate the genetic component of the actual measured performance of an
animal, and hence are more correctly known as Estimated Breeding Values or EBVs.

In order to attempt to discern the genetic component of an animal’s performance,
we need to understand a little about the nature and distribution of the variation of
performance figures within a flock.

**Some statistical terms:**

**1. Normal distribution:**

This is the distribution most widely used to describe how often different values of a
measurement occur. Variables that follow a **Normal distribution** are most likely to take
values close to an average or typical value called the **mean**, and values become
progressively less likely the further they lie from (above or below) the mean. The frequency
graph is a bell-shaped curve like the one shown below for body weights:

The dotted vertical lines are drawn at plus and minus one **standard deviation**
(15kg) from the mean of 35kg. Approximately two thirds of values will lie within this range,
i.e. from 20kg to 50kg in this example.

Suppose that we did not know that the mean was 35kg, and we were attempting to *estimate*
the mean by weighing a random sample of say 25 animals. These 25 animals might have an
average weight of 33kg with a *standard deviation* of 12kg. Because of the small
sample size, the best we might be able to say about the average of a very large number of
such animals is that the mean should lie within say 2.4kg of the observed average of 33kg.
This figure of 2.4kg is called the **standard error** of the estimate of the
mean. It is calculated as the *standard deviation* divided by the square root of the
number of animals measured:

12kg / square root(25) = 12 / 5 = 2.4kg

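If we were to take a large number of such samples of 25 animals, we would expect the
sample average to fall within one standard error of the true mean in approximately two
thirds of the samples. From this one sample, our best guess is therefore that the true mean
lies between 30.6kg and 35.4kg (33kg plus or minus 2.4kg).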

Because of the square root involved in the formula, increasing the sample size
four-fold to 100 animals would halve the **standard error.** The **standard
errors** of EBVs behave in a similar manner.
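The arithmetic above can be checked with a short Python sketch. The function name `standard_error` is chosen here for illustration only; the figures (a standard deviation of 12kg and samples of 25 and 100 animals) are the ones used in the text.

```python
import math

def standard_error(std_dev, n):
    """Standard error of a sample mean: standard deviation / sqrt(sample size)."""
    return std_dev / math.sqrt(n)

# Figures from the text: a sample standard deviation of 12kg.
se_25 = standard_error(12.0, 25)    # 12 / 5
se_100 = standard_error(12.0, 100)  # 12 / 10

print(f"n = 25:  standard error = {se_25:.1f}kg")
print(f"n = 100: standard error = {se_100:.1f}kg")
```

Four times as many animals gives half the standard error, so halving it again would need 400 animals.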

Thus a **standard deviation** is a measure of the variability or spread of
the data, whereas a **standard error** is a measure of the likely variability
(or accuracy) of an **estimate** of a figure such as a population mean or an
individual’s breeding value.

A **standard deviation** is more or less the same, no matter how much data is
used to calculate it, whereas the **standard error** of an estimate gets smaller
(i.e. the accuracy of the estimate is improved) as more information is used to calculate the
estimate.
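The contrast between the two measures can be seen in a small simulation. This is only a sketch: it assumes body weights are Normally distributed with the mean (35kg) and standard deviation (15kg) of the example above, and the sample sizes and the helper name `summarise` are arbitrary choices made for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

TRUE_MEAN = 35.0  # kg, from the body-weight example
TRUE_SD = 15.0    # kg, from the body-weight example

def summarise(n):
    """Weigh n simulated animals; return (sample sd, standard error of the mean)."""
    weights = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(n)]
    sd = statistics.stdev(weights)
    return sd, sd / n ** 0.5

# Repeat the exercise with progressively larger samples.
results = {n: summarise(n) for n in (25, 100, 400, 1600)}
for n, (sd, se) in results.items():
    print(f"n = {n:4d}: sd = {sd:5.1f}kg, standard error = {se:4.2f}kg")
```

In a run of this kind the standard deviation should stay in the region of 15kg at every sample size, while the standard error shrinks roughly by half each time the sample is made four times larger.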

**2. Breeding Values:**

These are defined as a figure for each individual, which represents how much better or
worse the average of an infinite number of progeny of this animal should be. Thus, since
no individual animal can have this many progeny, it is a figure that can *never* be
known with complete accuracy. However, the idea of a breeding value is still a very useful
concept, and using **estimated breeding values** (EBVs) to *help* in
deciding which animals to keep for breeding purposes can increase the annual rate of
genetic (permanent) improvement in a flock’s performance. In the final analysis, however,
the breeder must also take other things into consideration, such as faults, when selecting
animals to be the parents of the next generation.

**3. Estimating Breeding Values:**

Now, we have to resort to a mathematical *model* describing how the performance
figure for an individual animal arises:

P = g + e,

or, in words, what we observe, the animal’s *phenotype*, is the sum of
its *genotype* and the *environmental* influences it has faced in its life
to date ("breeding" plus "feeding").

We can further refine this model by splitting the environmental and genetic components
as follows:

i) g = (g_{s} + g_{d}) / 2 + g_{i}

or an individual’s genetic component is the average of that of his sire and his
dam (g_{s} + g_{d}) / 2, plus a portion that derives from the random
assortment of genes at conception (this is the bit that can make brothers and sisters in a
large family quite different, even though they share a family similarity.)

ii) e = e_{k} + e_{u}

where e_{k} is the sum of known environmental influences such as birth and
rearing ranks, sex, birthdate and age of dam,

and e_{u} are other (unknown) environmental influences such as feed
availability, parasite control etc, etc.

We can see from i) that half sibs (animals with the same sire) can provide some
information about the likely value of g_{s}, while full sibs will help in
estimating the g_{d} part.
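To see why a group of half sibs carries information about g_{s}, formula i) can be simulated. This is a rough sketch under stated assumptions, not a description of how EBVs are actually computed: genetic values are expressed as deviations from the flock average, dams are picked at random, the sire value `g_sire` and the spread `SD_G` are made-up numbers, and the random-assortment term is given half of the total genetic variance, as stated later in this article.

```python
import random
import statistics

random.seed(2)  # fixed seed so the run is repeatable

SD_G = 1.0                 # assumed sd of genetic values in the flock
SD_GI = 0.5 ** 0.5 * SD_G  # assumed sd of g_i (half of the genetic variance)
g_sire = 1.5               # hypothetical sire, 1.5 units above the flock average

def progeny_genetic_value(g_s):
    """Formula i): g = (g_s + g_d) / 2 + g_i, with a randomly chosen dam."""
    g_d = random.gauss(0.0, SD_G)   # dam's genetic value
    g_i = random.gauss(0.0, SD_GI)  # random assortment of genes at conception
    return (g_s + g_d) / 2 + g_i

# Average genetic value of progeny groups of increasing size.
means = {}
for n in (5, 50, 5000):
    means[n] = statistics.mean(progeny_genetic_value(g_sire) for _ in range(n))
    print(f"{n:5d} progeny: average g = {means[n]:.2f} (g_sire / 2 = {g_sire / 2:.2f})")
```

With only a handful of progeny the average can be well away from half the sire's value, but it settles towards it as the group grows, because the dam and random-assortment parts average out.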

Similarly, in ii), we can use the average difference between twins and singles, for
example, to help account for the effects of birth and/or rearing ranks.
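The adjustment for a known environmental effect can be sketched in the same way. Everything numeric here is hypothetical: the twin penalty of 4kg, the spreads of the genetic and unknown environmental parts, and the flock size are invented purely to illustrate the idea of estimating e_{k} from group averages and removing it.

```python
import random
import statistics

random.seed(3)  # fixed seed so the run is repeatable

BASE_WEIGHT = 35.0  # kg, flock average for singles (from the earlier example)
TWIN_EFFECT = -4.0  # kg, hypothetical known environmental effect of being a twin

def simulate_animal():
    """Return (is_twin, weight), with weight built up as base + g + e_k + e_u."""
    is_twin = random.random() < 0.5
    g = random.gauss(0.0, 2.0)              # genetic part (assumed spread)
    e_k = TWIN_EFFECT if is_twin else 0.0   # known environmental part
    e_u = random.gauss(0.0, 4.0)            # unknown environmental part
    return is_twin, BASE_WEIGHT + g + e_k + e_u

animals = [simulate_animal() for _ in range(2000)]
twins = [w for is_twin, w in animals if is_twin]
singles = [w for is_twin, w in animals if not is_twin]

# Estimate the twin effect as the difference between the two group averages.
estimated_effect = statistics.mean(twins) - statistics.mean(singles)

# Remove the estimated effect so twins and singles are compared on equal terms.
adjusted = [w - estimated_effect if is_twin else w for is_twin, w in animals]
adjusted_twins = [w for (is_twin, _), w in zip(animals, adjusted) if is_twin]

print(f"estimated twin effect: {estimated_effect:.1f}kg")
```

After the adjustment the twins and singles have the same average, so the remaining differences between animals reflect only the genetic and unknown environmental parts.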

**Comments:**

Using all the information available to us, we can get a fairly good estimate of g_{s}
(the breeding value of the animal’s sire), and a less good estimate of g_{d} for the dam
because she has fewer progeny. Even under truly ideal conditions, however, we can only
guess at the g_{i} part of an animal’s genotype and the e_{u} part of its
phenotype. This is why EBVs have quite large **standard errors.**

In fact, using formula i) above, we can prove that, in a randomly mating
population (where genetic variation is maintained), the genetic variation due to the
random assortment of genes at conception is exactly one-half of the total genetic
variation in the population: