People's Democracy (Weekly Organ of the Communist Party of India (Marxist))
Vol. XXXVI No. 46
November 18, 2012
History, Genetics and Statistics
Prabir Purkayastha
THE
study of the past has never been an easy exercise. Earlier, we
had two sources of data: textual and archaeological. All this had to be
fitted into patterns for a coherent and consistent view of the past; this was
the core enterprise of history. Of course, this still leaves open the question
of how to look at these patterns, that is, the framework of what constitutes
history. Is it an account of various rulers and dates, or an account of the
people, how they lived and what they did? Is it a study of what caused changes
in society? All this brought to the fore that history is not simply a
value-free exercise of exploring our past, but also reflects how you view
society today and, importantly, what kind of society you want to build.
DISCOMFORT WITH OBSCURE & INCOMPREHENSIBLE TOOLS
In this contested territory of history, new tools have now made their
entry. Increasingly, genetics is being harnessed to analyse human populations;
ancient and not so ancient migration patterns are sought to be teased out. This
brings in the second set of tools required to analyse human populations and
their genetic data: statistical tools. For historians who have spent a lifetime
understanding history on the basis of archaeological, textual and historical
linguistic evidence, an influx of these tools, sometimes presenting a picture
at variance with the archaeological and other evidence, leads them to discard
the tools altogether. Some of the most distinguished historians in

It is important here to understand what the tools can do and what they cannot do.
It is also important for those who do not understand these tools to recognise
that the tools by themselves cannot conclusively provide a narrative of
the past. Any such set of tools can give multiple possibilities. Which one
represents the past needs additional corroborative evidence, and it is only
with such evidence that we can come to some tentative conclusions.

As we are aware, we all carry within us a genetic code. There are four chemical
compounds, called bases --- Adenine, Cytosine, Guanine, and Thymine ---
generally referred to by the letters A, C, G and T. Each of these bases can
line up with another in pairs (A can only pair with T and C only with G) to
form a string of base pairs, or gene sequences, constituting the genetic code.
There are over three billion such letters in the human genetic code. Though
humans are 99.9 per cent identical to each other, in a genetic code of three
billion letters even a tenth of a per cent of difference translates into three
million changes in spelling.
It is these differences in the genetic code that we look at in mapping the
human population.

If we look at the differences between the genetic sequences of a group of
people, or of different peoples, we can see what the differences are, and if
we know the rate of change in the DNA sequences per generation and the number
of years per generation, we can then trace back to when they had common
ancestors.

The DNA variations that we inherit are of three types. One is through the DNA
through the DNA
sequences that are inherited as two copies, one from each
parent. These are
called autosomal DNA sequences. A second DNA inheritance
is through the DNA
sequences in the Y chromosome, which are inherited from
father to son and
represent a record of purely paternal inheritance. The
third type of DNA inheritance
that can be traced is in the DNA sequences of the
mitochondria, which carry
their own independent DNA sequences and are inherited only
from the mother.
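The dating argument made earlier (count the differences between two sequences, divide by the rate of change per generation, then multiply by the years per generation) can be sketched as a small calculation. This is only a rough illustration under a constant-rate assumption; the function name and every number below are hypothetical and do not come from any study.

```python
def years_to_common_ancestor(differences, sites, rate_per_site_per_generation,
                             years_per_generation):
    """Rough time back to the common ancestor of two DNA sequences.

    Assumes a constant rate of change, and that both lineages have been
    accumulating changes independently since the split (hence the factor 2).
    """
    changes_per_generation = 2 * rate_per_site_per_generation * sites
    generations = differences / changes_per_generation
    return generations * years_per_generation


# Illustrative inputs only; none of these values is taken from real data.
estimate = years_to_common_ancestor(
    differences=20,                      # sites at which the two sequences differ
    sites=10_000,                        # length of the stretch being compared
    rate_per_site_per_generation=1e-6,   # hypothetical rate of change
    years_per_generation=25,             # hypothetical generation length
)
print(f"roughly {estimate:,.0f} years")
```

The answer scales directly with the assumed rate and generation length, so an error in either carries straight through to the date.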
Therefore, with such genetic studies we can also analyse
the differences between
the paternal and the maternal population.

CRUCIAL QUESTIONS
The population studies have addressed the following questions:
1) Did agriculture spread through cultural transmission, with the hunter
gatherers taking to agriculture from the agriculturists, or through
demographic expansion of the agriculturists?
2) What are the migrations that took place in the past?
These have been addressed not only for
One such question is: When did the human population come out of
One of the major questions has been how agriculture spread. Was it demographic
expansion (demic expansion), with agriculturists extending agriculture and
expanding their numbers, or did the hunter gatherers take up agriculture as
they came in contact with the agriculturists?

I am not going to suggest here that the debate is settled in favour of demic
diffusion models, even though there is increasing evidence to support them. I
will focus instead on what the tools are and why such tools may not be able to
distinguish between a scenario in which people migrated and one in which only
the genes migrated.
One of the methods used in such studies is to identify the sites in the
genetic code that differ between people, what are called single nucleotide
polymorphisms (SNPs). If we plot these SNPs on a geographical map and look at
variations across populations and space, we will see that the variations are
larger in one direction than in others. Finding such axes of variation is
called Principal Component Analysis. The direction of largest variation can
then be thought of as a migration path: a set of people migrating along
this direction. On a map of

SUPPORTING EVIDENCE NEEDED FOR DEFINITIVE CONCLUSIONS
Cavalli-Sforza and his colleagues have postulated that this is what happened
and that the genetic variation along the major axis of such variations is a
record of this demic diffusion. The problem here is: even if we assume that
there is only a small amount of gene flow between local populations, how
different would the population genetic map be from the one produced by people
migrating? In other words, is it possible that a certain amount of local gene
flow combined with cultural transmission of agriculture would still provide
population genetic maps very similar to those we would get for a migration of
agriculturists?

It is here that the statistical tools must be used with a great degree of caution.
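The question posed above can be made concrete with a toy calculation. The sketch below builds two artificial sets of SNP frequencies along a chain of 15 populations, one from a crude "people migrate" model and one from a crude "neighbours exchange genes" model, and finds the first principal axis of each by power iteration. All function names, parameters and numbers are invented for illustration; this is not the procedure used by Cavalli-Sforza or any other study, and it leaves out sampling noise and genetic drift entirely.

```python
import math
import random


def first_axis_scores(freqs, iterations=100):
    """Score of each population on the first principal axis (up to scale and sign).

    freqs[k][j] is the frequency of SNP j in population k. Power iteration is
    run on the population-by-population Gram matrix of the centred data.
    """
    n, m = len(freqs), len(freqs[0])
    means = [sum(row[j] for row in freqs) / n for j in range(m)]
    centred = [[row[j] - means[j] for j in range(m)] for row in freqs]
    gram = [[sum(a * b for a, b in zip(r1, r2)) for r2 in centred] for r1 in centred]
    v = [float(k + 1) for k in range(n)]  # arbitrary non-constant starting vector
    for _ in range(iterations):
        w = [sum(g * x for g, x in zip(row, v)) for row in gram]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


def migration_weights(n_pops, incoming_share=0.85):
    """Toy demic diffusion: each step along the route keeps a fixed share of incoming ancestry."""
    return [incoming_share ** k for k in range(n_pops)]


def gene_flow_weights(n_pops, exchange=0.2, generations=100):
    """Toy local gene flow: nobody moves far, but neighbours swap a share of genes each generation."""
    w = [1.0] + [0.0] * (n_pops - 1)
    for _ in range(generations):
        w = [(1 - exchange) * w[k]
             + exchange / 2 * (w[max(k - 1, 0)] + w[min(k + 1, n_pops - 1)])
             for k in range(n_pops)]
    return w


def frequencies(weights, source, local):
    """Blend source and local allele frequencies according to each population's weight."""
    return [[loc + (src - loc) * w for src, loc in zip(source, local)] for w in weights]


rng = random.Random(0)  # fixed seed so the toy data are reproducible
N_POPS, N_SNPS = 15, 200
source = [rng.uniform(0.1, 0.9) for _ in range(N_SNPS)]
local = [rng.uniform(0.1, 0.9) for _ in range(N_SNPS)]
positions = list(range(N_POPS))

for name, weights in [("migration", migration_weights(N_POPS)),
                      ("local gene flow", gene_flow_weights(N_POPS))]:
    scores = first_axis_scores(frequencies(weights, source, local))
    print(name, round(abs(pearson(positions, scores)), 2))
```

In this toy setting both runs give a first axis that lines up closely with position along the chain, so the axis by itself does not say which process produced it. That is a property of the toy model, not a finding about real populations.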
There are two sets of tools that are used. One is to take the current set of
data that we have and try to find a statistical model that would best fit
the data. The other is to simulate different sets of ancient populations,
introduce some kind of mixing, and then work out which of these combinations
and mixings approximates what we see on the ground. With computers, obviously
much of this is done by algorithms and modelling tools, and the researcher may
not have a good feel for what is happening in these number crunching exercises.
Those familiar with such tools know that we can get a good model out of our
data, but such models are not unique. Varying certain parameters, using a
different algorithm, etc, may get us a different model. In such a scenario, it
is imperative that supporting evidence be used in order to come to any
definitive conclusions. Without it, such models may be artefacts of our
calculation methods and not real.

However, one definite conclusion can be arrived at when we look at the genetic data.
Farming spread from the Fertile Crescent to Europe and South Asia and then
further to East Asia and

When we come to South Asia, we again find that there is a significant
difference between north

STATISTICAL MODELS IN HISTORICAL LINGUISTICS
Such statistical models are not restricted to human population
studies; they are
also being used in historical linguistics. In linguistics,
changes in vocabulary can be used, in much the same way as genetic drift, to
map out possible dates when language families split. A recent study (Mapping
the Origins and Expansion of the Indo-European Language Family, by Remco
Bouckaert and others, Science, August 24, 2012) has done such an analysis of
languages to try to map out when language groups split and whether we can work
out a map of language migration from such an exercise. This exercise shows the
probable origin of the Indo-European group of languages to be Anatolia, with
the Indo-Iranian group breaking off from the larger Indo-European family
around 4,000-6,000 years ago.

If we take all this evidence together, along with the larger archaeological
evidence, it is clear that the Indo-European language family in the form of
Vedic Sanskrit did not enter

Romila Thapar had postulated that the speakers of Vedic Sanskrit were not
large in number and that the spread of the language was due to elite
domination. A set of such speakers came, had the use of iron, used horses and
chariots, and were able to establish themselves at the top of the existing
hierarchy in

Again, there is genetic evidence that caste groups in

Finally,
those who are living in the past and would like to postulate that there has
been no invasion of Vedic Sanskrit speakers from outside have to contend
with the even more daunting task of proving that Indo-European speakers
originated in India, and of finding genetic and historical linguistic evidence
for that. Apart from archaeological evidence, there is no such evidence, and
nit-picking at the detailed picture being built by archaeology, historical
linguistics, written texts, genetics and statistical tools will not get them
their Aryan homeland in