This file describes the popsim software, a simple population
simulator. Popsim simulates a population of single-chromosome diploid
individuals that grows at a constant rate under a simple
sibling-avoiding mating model, and outputs the distribution of the
initial (founder) alleles at a given number of marker loci in the
final population or in a sample thereof.

It was used to perform one of the experiments described in
the article

  Elina Salmela, Olli Taskinen, Jouni K. Seppänen, Pertti Sistonen,
  Mark J. Daly, Päivi Lahermo, Marja-Liisa Savontaus, and Juha Kere.
  Subpopulation difference scanning: a strategy for exclusion mapping
  of susceptibility genes. Journal of Medical Genetics. Published
  Online First: 27 January 2006. doi:10.1136/jmg.2005.038414

The software is licensed under the GNU General Public License; see the
file COPYING. If you make use of the software when composing a
scientific publication, you should cite this article; this is not a
requirement of the license but one of ethical scientific conduct. You
can find the article (sadly, it is not open-access but requires your
institution to be a subscriber) or check the publication information
by following the link http://dx.doi.org/10.1136/jmg.2005.038414

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.


COMPILING
=========

To compile the software, first make sure that the CFLAGS in the
Makefile are appropriate for your system, then type 

  make deps && make

If the compilation fails, make sure that the GNU Scientific Library
(GSL) is properly installed.

RUNNING
=======

To run the software, first create a configuration file (e.g. my.conf)
as described below and then give it as the only argument to the
program:

  ./popsim my.conf

The program simulates a population of single-chromosome individuals. 
At each generation, some number of non-siblings mate to produce some
number of offspring; the number of offspring is decided by a fixed
growth percentage. At the end of the simulation, a histogram is output
describing how the alleles of the initial ("founder") population are
distributed in the final population. A number of markers are spread
uniformly along the chromosome, and the output consists of one
line for each marker. Each line consists of 2n whitespace-separated
numbers, where n is the size of the initial population. Each set of
two numbers corresponds to the two parallel alleles of one founder
individual, and the numbers record how many individuals in the final
population carry that allele.
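
The output format described above can be read back with a few lines
of code. The following Python sketch is not part of popsim; it assumes
that the counts are integers and that the two counts belonging to one
founder are adjacent on the line. The function name read_histogram and
the example numbers are made up for illustration.

```python
def read_histogram(lines):
    """Parse popsim-style output: one line per marker, 2n counts per line.

    Returns a list with one entry per marker; each entry is a list of
    (count_a, count_b) pairs, one pair per founder individual.
    """
    markers = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        counts = [int(x) for x in fields]  # assumption: integer counts
        if len(counts) % 2 != 0:
            raise ValueError("expected an even number of counts per line")
        # Assumption: consecutive counts belong to the same founder.
        pairs = [(counts[i], counts[i + 1]) for i in range(0, len(counts), 2)]
        markers.append(pairs)
    return markers


# Hypothetical output for 2 markers and 3 founders (made-up numbers).
example = ["4 0 1 3 2 2", "0 5 3 3 1 0"]
for m, founders in enumerate(read_histogram(example)):
    lost = sum(1 for pair in founders for c in pair if c == 0)
    print(f"marker {m}: {lost} of {2 * len(founders)} founder alleles lost")
```

To read a real output file, pass an open file object instead of the
example list, e.g. read_histogram(open('foo.out')).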

CONFIGURATION FILE SYNTAX
=========================

A sample configuration file is given as sample.conf. The configuration
file must include a line like

  15 markers

that specifies the number of markers to track; the markers are
distributed uniformly across the chromosome. A larger number of
markers gives more precise results but requires more memory and
processing time. The file must also include a line specifying the
population growth. Possible specifications include:

  population from 100 to 100000 in 50 generations
  population from 100 by 5 % per generation for 5 generations

The first line specifies constant exponential growth from the first
value to the second one in a fixed number of generations; the growth
percentage is computed from this information. The second specification
means to start from the given population size, use the given growth
percentage for every generation and run for the given number of
generations. A third possible specification is

  a generation has 22 years
  population from 100 by 5 % per year for 5 generations

in which the growth percentage is per year, not per generation. In
this case the length of a generation in years must also be specified
(the first line above), along with the number of generations.
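
The arithmetic behind these specifications can be sketched in Python
as follows. This is an illustration, not popsim's own code: the
function names are hypothetical, the per-year conversion assumes that
yearly growth compounds over the length of a generation, and how
popsim rounds population sizes to whole individuals is not covered.

```python
def rate_from_endpoints(start, end, generations):
    """Per-generation growth rate (as a fraction) for constant
    exponential growth from `start` to `end` individuals."""
    return (end / start) ** (1.0 / generations) - 1.0


def per_generation_rate(yearly_percent, years_per_generation):
    """Convert a per-year growth percentage into a per-generation
    growth rate (as a fraction), assuming yearly compounding."""
    return (1.0 + yearly_percent / 100.0) ** years_per_generation - 1.0


# "population from 100 to 100000 in 50 generations"
r = rate_from_endpoints(100, 100000, 50)
print(f"{100 * r:.2f} % per generation")

# "a generation has 22 years" + "... by 5 % per year ..."
g = per_generation_rate(5, 22)
print(f"{100 * g:.1f} % per generation")
```

For the first specification this gives a growth of about 14.8 % per
generation, since 1000^(1/50) is approximately 1.148.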

Finally, the configuration file should include some number of output
specifications. Possible output specifications include

  output full histogram to 'foo.out'
  output sampled(100) histogram to 'foo-100.out'

In the first case, the output histogram is based on all individuals in
the final population. In the second case, a random sample of the given
number of non-siblings is taken as the basis of the histogram.

In addition, the seed for the pseudo-random number generator can be
set with a line such as

  rng seed 12345678

This allows for replication of experiments without storing the
(possibly large) results.
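
One way to use the seed is to generate one configuration per
replicate and keep only the configuration files. The Python sketch
below builds such configurations from the line types described in
this section. The function make_config, its default values and the
file names are hypothetical, and the order of the lines is an
assumption; check sample.conf for the layout actually used.

```python
def make_config(seed, markers=15, start=100, end=100000,
                generations=50, out_file="foo.out"):
    """Return the text of a popsim configuration with a fixed seed."""
    lines = [
        f"{markers} markers",
        f"population from {start} to {end} in {generations} generations",
        f"output full histogram to '{out_file}'",
        f"rng seed {seed}",
    ]
    return "\n".join(lines) + "\n"


# Build one configuration per replicate; each differs only in its
# seed and output file name.
configs = {
    f"run-{seed}.conf": make_config(seed, out_file=f"run-{seed}.out")
    for seed in (1, 2, 3)
}
for name, text in configs.items():
    print(f"# {name}")
    print(text)
```

Each generated text can be saved under its file name and passed to
the program as usual, e.g. ./popsim run-1.conf.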

Note: the words "a the has from to in for per by" have no
significance in the configuration syntax; they may be included for
readability or left out, as the user prefers.
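
Read literally, the note means that two lines differing only in these
words carry the same information. The Python sketch below illustrates
that reading; it is not popsim's actual parser, and the terse form
shown has not been checked against the program.

```python
# The words listed in the note above.
FILLER = {"a", "the", "has", "from", "to", "in", "for", "per", "by"}


def significant_tokens(line):
    """Return the whitespace-separated words of a configuration line
    with the filler words removed (illustrative only)."""
    return [word for word in line.split() if word not in FILLER]


verbose = "population from 100 to 100000 in 50 generations"
terse = "population 100 100000 50 generations"

print(significant_tokens(verbose))
print(significant_tokens(verbose) == significant_tokens(terse))
```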

