LINKAGE
The LINKAGE program was developed by the Laboratory of Statistical Genetics at The Rockefeller University, which provides a User Guide that includes a full description of the input file formats. In developing this package of genetic analysis programs, it was decided not to invent a new set of file formats for input and output but to use an existing standard format. Although LINKAGE format is somewhat cumbersome and has some redundancy, it is nonetheless a rich format that is established and well documented. Moreover, many groups that analyze genetic data will have programs to generate LINKAGE format files automatically, and the Mega2 genetic data translation program has LINKAGE as one of its standard formats.
There is a good deal of information in the input files that our programs will ignore, and there are several features that have not yet been implemented. The input routines for the complete package have now been rewritten to check the data more thoroughly and to insert sensible default values for missing or misspecified inputs where appropriate. CheckFormat is a stand-alone program that simply reads in, checks, and outputs LINKAGE data files; it can be used to debug the input.
The following notes give an indication of which features of the LINKAGE format are currently used and implemented, and which are ignored. We assume that the reader is reasonably familiar with the general format.
- For example LINKAGE parameter and pedigree files see the case study on Error checking.
- The format used is the old-fashioned "post-makeped" format, not the more concise "pre-makeped" format. You can translate from pre-makeped to post-makeped format using CheckFormat as follows:
- % java CheckFormat pre.par pre.ped post.par post.ped -pre
- Pre-makeped format differs from post-makeped format only in the pedigree input file. On each line of input, the pedigree structure is specified only as an offspring, father, mother triplet, with no fields for the other three relationship pointers. There is also no proband status indicator. Other than that there is no difference.
- Only the affection status, quantitative variables, and numbered allele locus formats are implemented. The binary factors locus format is not currently implemented.
- For quantitative variables, a missing or unobserved value is coded as 0.0. This is not a particularly good convention, as it does not allow a zero observation to be specified. More precisely, the program treats any observation with absolute value less than 0.0000001 as missing. So if a true observed value is smaller than this in absolute value, it has to be increased or decreased slightly. This should not affect the computations too drastically, unless of course the trait has mean zero and very small variance, in which case some shifting or rescaling is necessary.
- If a multivariate quantitative trait has any element unobserved for a particular individual, then the whole observation for that locus for that individual is ignored.
- The only information actually used from the parameter input file is the number of loci, the locus-by-locus parameter information, and the distances between the loci.
- The order of the loci in the data is assumed to be the order in which they appear in the file. Thus any information on the third line of the input file is ignored. The other information is read and copied but not used.
- The distances between loci need to be recombination fractions. If a distance greater than 0.5 is read, the program assumes that the distances are specified in centiMorgans and converts them to recombination fractions using the Kosambi transformation. A warning that this is being done is printed. Note that this is not a good way to code your data: you should use recombination fractions rather than rely on an unreliable inference based on the values themselves.
- All programs in this package allow "half observed" genotypes when using numbered allele loci. For example 0 3 would mean that one allele is a 3 but the other is unobserved. This is mostly used to fix up detected genotype errors.
- Some programs in the package allow only 1 parent to be specified for an individual, while others require that either 0 or 2 parents are specified. A warning is printed if only 1 parent is specified.
- The first child, father's next child and mother's next child pointers in the pedigree file are ignored and can safely be replaced by zeros, or any other string not including white space.
- If an affection status locus has only 1 liability class, the phenotype for an individual is specified by a single digit representing the affection status only.
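Two of the numeric conventions in the notes above, the missing-value threshold for quantitative variables and the centiMorgan fallback for inter-locus distances, can be illustrated with a short sketch. This is not the package's actual input code: the class name `LinkageInputSketch` and all method names are hypothetical, and the sketch assumes that once any distance exceeds 0.5, every distance in the list is treated as centiMorgans. The Kosambi map function used is the standard one, theta = 0.5 * tanh(2d) with d in Morgans.

```java
/**
 * Illustrative sketch only. Class and method names are hypothetical and are
 * not taken from the package; they restate the conventions in the notes above.
 */
class LinkageInputSketch {

    /** Threshold quoted in the notes: |x| below this is read as "unobserved". */
    static final double MISSING_EPSILON = 0.0000001;

    /** True if a quantitative value would be treated as missing. */
    static boolean isMissing(double value) {
        return Math.abs(value) < MISSING_EPSILON;
    }

    /** A multivariate observation is ignored as a whole if any element is missing. */
    static boolean isObservationUsable(double[] values) {
        for (double v : values) {
            if (isMissing(v)) {
                return false;
            }
        }
        return true;
    }

    /** Standard Kosambi map function: centiMorgans to recombination fraction. */
    static double kosambiToRecombinationFraction(double centiMorgans) {
        double morgans = centiMorgans / 100.0;
        return 0.5 * Math.tanh(2.0 * morgans);
    }

    /**
     * Returns the distances as recombination fractions.
     * Assumption: if ANY distance exceeds 0.5, ALL are taken to be centiMorgans.
     */
    static double[] toRecombinationFractions(double[] distances) {
        boolean looksLikeCentiMorgans = false;
        for (double d : distances) {
            if (d > 0.5) {
                looksLikeCentiMorgans = true;
            }
        }
        if (!looksLikeCentiMorgans) {
            return distances.clone();
        }
        System.err.println("Warning: distance > 0.5 read; treating distances as centiMorgans.");
        double[] result = new double[distances.length];
        for (int i = 0; i < distances.length; i++) {
            result[i] = kosambiToRecombinationFraction(distances[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(isMissing(0.0));
        System.out.println(isObservationUsable(new double[] {1.2, 0.0}));
        System.out.println(kosambiToRecombinationFraction(10.0));
    }
}
```

The sketch also shows why the notes recommend supplying recombination fractions directly: a set of genuinely centiMorgan distances that all happen to be 0.5 or less would pass through unconverted.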