Welcome to GENEHUNTER-MODSCORE version 2.0.1

* What is new in GENEHUNTER-MODSCORE 2.0.1

A bug has been fixed that occurred with version 2.0 if the analyzed genetic positions reached beyond the end of a chromosome as defined in the genetic map file, or if only one marker was used without a genetic map file.

* What is new in GENEHUNTER-MODSCORE 2.0

GENEHUNTER-MODSCORE 2.0 introduces the use of sex-specific recombination fractions. Along with this, we have devised an intuitive and consistent method to choose the combinations of male and female recombination fractions between the disease locus and its flanking markers at which the LOD, MOD, or NPL scores are calculated.

Linkage scores are determined along a polygon that is defined by the positions of all markers in the map used, some of which are genotyped and others not. The coordinates of both the genotyped and the ungenotyped markers, which can be taken from a publicly available map, serve to define the ratio of female to male genetic distances. Because the positions of ungenotyped markers are used as well, the sex ratio of genetic distances may vary between segments even if they lie within the same interval between two genotyped markers. With this method, the positions at which the scores are evaluated follow a biologically meaningful path. Moreover, since female and male positions are not varied independently of one another, no additional parameter is introduced, and there is no need to raise the LOD-score threshold from 3 to 3.5.

The use of sex-specific recombination fractions, together with this method of constructing the set of disease-locus positions at which the linkage scores are determined, provides a consistent framework for making appropriate use of the genetic coordinates of all markers. This results in better genetic modeling and, at the same time, greatly simplifies the practical handling of the data.
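To make the polygon construction more concrete, the following C sketch places a putative disease locus on the polygon through the map markers. It is not taken from the program's source: it assumes that a locus is specified by its sex-averaged position (the mean of its female and male positions) and that female and male positions are interpolated linearly within each segment between consecutive map markers, so that every segment keeps its own female-to-male distance ratio. The names MapMarker and polygon_position are hypothetical. Turning the resulting distances into recombination fractions would additionally require a map function, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical map entry: female and male positions (cM) of one marker,
   genotyped or not, as read from a genetic map file. */
typedef struct {
    double female_cm;
    double male_cm;
} MapMarker;

/* Sex-averaged position: the mean of the female and male positions. */
static double sex_averaged_cm(const MapMarker *m)
{
    return 0.5 * (m->female_cm + m->male_cm);
}

/* Place a putative disease locus, given by its sex-averaged position
   x_cm, on the polygon through the (female, male) positions of the map
   markers, which must be sorted by position. Within each segment between
   consecutive markers, female and male positions are interpolated
   linearly, so each segment keeps its own female/male distance ratio.
   Returns 0 on success and -1 if x_cm lies outside the map. */
int polygon_position(const MapMarker *map, size_t n, double x_cm,
                     double *female_cm, double *male_cm)
{
    assert(map != NULL && n >= 2);
    for (size_t i = 0; i + 1 < n; i++) {
        double a = sex_averaged_cm(&map[i]);
        double b = sex_averaged_cm(&map[i + 1]);
        if (x_cm < a || x_cm > b)
            continue;
        /* Fraction of the way through this segment (0 if it has length 0). */
        double t = (b > a) ? (x_cm - a) / (b - a) : 0.0;
        *female_cm = map[i].female_cm + t * (map[i + 1].female_cm - map[i].female_cm);
        *male_cm = map[i].male_cm + t * (map[i + 1].male_cm - map[i].male_cm);
        return 0;
    }
    return -1;
}
```

For a hypothetical map with markers at (female, male) = (0, 0), (10, 6), and (30, 10) cM, a sex-averaged position of 14 cM falls into the second segment and yields female and male positions of 20 and 8 cM. The female-to-male distance ratio is 5 in that segment but only 10/6 in the first one.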
This method is implemented together with the optional functionality to automatically read in the marker coordinates from a publicly available genetic map. The current version can read data from the deCODE (1), Duffy (2), Marshfield (3), Nievergelt-Schork (4), or Rutgers (5) map. Together with GENEHUNTER-MODSCORE, we provide four text files (duffy.txt, marshfield.txt, nievergelt.txt, and rutgers.txt) that contain the genetic coordinates of the corresponding maps (with kind permission of David Duffy, Karl Broman, Nicholas Schork, and Tara Matise).

For the deCODE map, we cannot provide the corresponding map file with the sex-specific genetic coordinates. However, users can easily create such a file (to be named decode.txt) from the MS-Excel file available at http://www.nature.com/ng/journal/v31/n3/extref/ng917-S13.xls by deleting the first 27 rows as well as the fourth column (phys.loc) of the spreadsheet, replacing all occurrences of " " (space) with "" (empty string), and saving the result as plain text delimited by tabs or spaces. If Excel represents the decimal point as a comma (","), as with the German locale setting, you also need to open the resulting file decode.txt with a simple text editor and replace all occurrences of "," (comma) with "." (point).

The preferred map is specified with the 'read map' command; see the help texts for the 'read map' and 'load markers' commands for details. With this functionality, the user no longer needs to handle recombination fractions manually. It is only necessary to specify the marker names in the usual way in the linkage marker file and to indicate which map to use.

A sample input file for a MOD-score analysis using a predefined genetic map (sample_use_map.in) is provided with the program. The sample run can be executed by typing 'run sample_use_map.in' at the GHM prompt, or by calling 'ghm < sample_use_map.in' from the command shell.
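The manual steps for preparing decode.txt could also be scripted. The following C sketch assumes that the spreadsheet has first been exported unchanged as tab-delimited text; it then skips the first 27 rows, drops the fourth (phys.loc) column, removes spaces, and replaces decimal commas with points. The function names and the fixed line-buffer size are illustrative assumptions, and the comma replacement is only appropriate if commas occur solely as decimal separators, as with the German setting mentioned above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DECODE_HEADER_ROWS 27  /* rows to skip at the top of the sheet */
#define DECODE_PHYS_COL 4      /* 1-based index of the phys.loc column */
#define DECODE_LINE_MAX 4096   /* assumed maximum line length */

/* Convert one tab-delimited row: drop the phys.loc field, delete spaces
   and line-end characters, and turn decimal commas into points.
   out must have room for at least strlen(in) + 1 characters. */
void convert_decode_line(const char *in, char *out)
{
    int field = 1;   /* 1-based index of the current input field */
    size_t j = 0;
    assert(in != NULL && out != NULL);
    for (const char *p = in; *p != '\0'; p++) {
        if (*p == '\t') {
            field++;
            /* Emit a separator only in front of a field that is kept. */
            if (field != DECODE_PHYS_COL)
                out[j++] = '\t';
            continue;
        }
        if (field == DECODE_PHYS_COL || *p == ' ' || *p == '\r' || *p == '\n')
            continue;
        out[j++] = (*p == ',') ? '.' : *p;
    }
    out[j] = '\0';
}

/* Copy all rows after the header from in to out, converting each one.
   Empty rows are dropped. Returns 0 on success, -1 on a read error. */
int convert_decode_file(FILE *in, FILE *out)
{
    char line[DECODE_LINE_MAX];
    char converted[DECODE_LINE_MAX];
    long row = 0;
    while (fgets(line, sizeof line, in) != NULL) {
        if (++row <= DECODE_HEADER_ROWS)
            continue;
        convert_decode_line(line, converted);
        if (converted[0] != '\0')
            fprintf(out, "%s\n", converted);
    }
    return ferror(in) ? -1 : 0;
}
```

A small driver would open the exported file and decode.txt with fopen and pass both streams to convert_decode_file.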
Please note that an analysis with sex-specific recombination fractions can also be performed with the affected-sib-pair and QTL analysis capabilities that were introduced with the original GENEHUNTER version 2. In this case, the output only includes the sex-averaged genetic positions, i.e., the mean of the male and female positions. The corresponding male and female genetic positions can be obtained from the output of the 'scan pedigrees' command.

Because GENEHUNTER-MODSCORE is based on the original GENEHUNTER version 2.1 release 6, it can only handle autosomal or pseudoautosomal loci. With the option to use sex-specific recombination fractions, the pseudoautosomal region, for which the genetic maps of males and females differ strongly, can now be adequately analyzed with GENEHUNTER-MODSCORE.

(1) A. Kong, D.F. Gudbjartsson, J. Sainz, G.M. Jonsdottir, S.A. Gudjonsson, B. Richardsson, B. Sigurdardottir, J. Barnard, B. Hallbeck, G. Masson, A. Shlien, S.T. Palsson, M.L. Frigge, T.E. Thorgeirsson, J.R. Gulcher, and K. Stefansson. "A high-resolution recombination map of the human genome". Nature Genetics 31:241-247 (2002).
(2) D. Duffy. http://www2.qimr.edu.au/davidD/Duffy_unifiedmap2005.html (2005).
(3) K.W. Broman, J.C. Murray, V.C. Sheffield, R.L. White, and J.L. Weber. "Comprehensive human genetic maps: individual and sex-specific variation in recombination". American Journal of Human Genetics 63:861-869 (1998).
(4) C.M. Nievergelt, D.W. Smith, J.B. Kohlenberg, and N.J. Schork. "Large-scale integration of human genetic and physical maps". Genome Research 14:1199-1205 (2004).
(5) X. Kong, K. Murphy, T. Raj, C. He, P.S. White, and T.C. Matise. "A Combined Linkage-Physical Map of the Human Genome". American Journal of Human Genetics 75:1143-1148 (2004).
* What is new in GENEHUNTER-MODSCORE 1.1

With GENEHUNTER-MODSCORE 1.1, the restriction of the disease allele frequency in a MOD-score analysis ('allfreq restriction' command) has been changed so that the user can specify the upper bound ('highest allfreq' command). The default is 'highest allfreq 0.5'. It is also possible to perform a MOD-score analysis without any restriction on the disease allele frequency by specifying 'allfreq restriction off'. Please also see the help texts for the 'allfreq restriction' and 'highest allfreq' commands.

In addition, it is now possible to switch 'imprinting' from 'off' to 'on' (but, for technical reasons, not vice versa) within the same run of the program. This allows researchers to perform a MOD-score analysis without imprinting, directly followed by a second MOD-score round that takes imprinting into account, without having to restart the program.

* What is GENEHUNTER-MODSCORE

GENEHUNTER-MODSCORE is a further extension of GENEHUNTER-IMPRINTING. The program is based on the original GENEHUNTER version 2.1 release 6. GENEHUNTER-MODSCORE allows for a MOD-score analysis, in which parametric LOD scores are maximized over the parameters of the trait model, i.e., the penetrances and the disease allele frequency. In this way, the disease-model parameter space is explored efficiently, and researchers do not have to rely on a single trait model when performing a parametric linkage analysis. This can be of great help for genetically complex traits, for which the disease model parameters are usually unknown prior to the analysis.

Please note that, because of the additional maximization, MOD scores are inflated compared to LOD scores calculated under a single trait model. Therefore, significance criteria for LOD scores cannot be applied to MOD scores without correction. For details regarding this issue, please see the references (Strauch et al. 2000; 2005) given below, as well as the Discussion section of the article by Strauch et al., "How to model a complex trait. 1. General considerations and suggestions", Human Heredity 55:202-210 (2003).

The core of GENEHUNTER-MODSCORE is a highly optimized engine for the calculation of the disease-locus likelihood. The same techniques were used here as for the optimization of the program GENEHUNTER-TWOLOCUS (Dietter et al., "Efficient two-trait-locus linkage analysis through program optimization and parallelization: application to hypercholesterolemia", European Journal of Human Genetics 12:542-550, 2004). For GENEHUNTER-MODSCORE, the optimizations have led to a speed-up by a factor of almost 6. This is already beneficial in a standard LOD-score analysis, but it is essential for a maximization over models, which is much more demanding.

To perform a MOD-score calculation, the user needs to activate the storage of inheritance-vector probabilities with the 'modcalc' command, which has two options, 'global' and 'single'. This command must be executed before 'scan pedigrees'. After 'scan pedigrees', it should be followed by the 'modscore' command, which performs the actual maximization over trait models.

With 'modcalc global', the maximum LOD score over all assumed disease-locus positions along the marker map is determined for each trait model, and these maxima are then maximized over trait models. With 'modcalc single', the 'modscore' command performs a separate maximization over trait models for each assumed disease-locus position. This yields a MOD score for every genetic position, together with the penetrances and disease allele frequency of the best-fitting trait model. These parameters can be regarded as an estimate of the genetic effect at a particular locus.
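The difference between 'modcalc global' and 'modcalc single' can be illustrated with a table of LOD scores lod[k][p] for a finite set of candidate trait models k and disease-locus positions p. This C sketch is only an illustration under stated assumptions: the precomputed table stands in for the program's actual maximization, whose numerical method is not described here; the reported 'global' curve is assumed to be the LOD curve under the single best model; and the bound on the disease allele frequency mimics 'allfreq restriction' together with 'highest allfreq'. All type and function names are hypothetical.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Hypothetical candidate trait model (non-imprinting). */
typedef struct {
    double pen[3];   /* penetrances f0, f1, f2 */
    double allfreq;  /* disease allele frequency */
} TraitModel;

/* Mimics 'allfreq restriction': if restrict_on is nonzero, only models
   with allfreq <= highest (cf. 'highest allfreq', default 0.5) count. */
static int model_allowed(const TraitModel *m, int restrict_on, double highest)
{
    return !restrict_on || m->allfreq <= highest;
}

/* 'global' style: take each model's maximum LOD over all positions,
   pick the model with the largest maximum, and copy that model's LOD
   curve to curve[0..n_pos-1]. lod is row-major: lod[k * n_pos + p].
   Returns the best model index, or n_models if no model is allowed. */
size_t mod_global(const double *lod, const TraitModel *models,
                  size_t n_models, size_t n_pos, int restrict_on,
                  double highest, double *curve)
{
    size_t best_k = n_models;
    double best = -DBL_MAX;
    assert(lod != NULL && models != NULL && curve != NULL);
    for (size_t k = 0; k < n_models; k++) {
        if (!model_allowed(&models[k], restrict_on, highest))
            continue;
        double max_over_pos = -DBL_MAX;   /* maximum LOD of model k */
        for (size_t p = 0; p < n_pos; p++)
            if (lod[k * n_pos + p] > max_over_pos)
                max_over_pos = lod[k * n_pos + p];
        if (max_over_pos > best) {        /* maximize over models */
            best = max_over_pos;
            best_k = k;
        }
    }
    if (best_k < n_models)
        for (size_t p = 0; p < n_pos; p++)
            curve[p] = lod[best_k * n_pos + p];
    return best_k;
}

/* 'single' style: maximize over models separately at every position.
   mod[p] receives the MOD score, best_model[p] the index of the
   best-fitting model (n_models if no model is allowed). */
void mod_single(const double *lod, const TraitModel *models,
                size_t n_models, size_t n_pos, int restrict_on,
                double highest, double *mod, size_t *best_model)
{
    assert(lod != NULL && models != NULL && mod != NULL && best_model != NULL);
    for (size_t p = 0; p < n_pos; p++) {
        mod[p] = -DBL_MAX;
        best_model[p] = n_models;
        for (size_t k = 0; k < n_models; k++) {
            if (!model_allowed(&models[k], restrict_on, highest))
                continue;
            if (lod[k * n_pos + p] > mod[p]) {
                mod[p] = lod[k * n_pos + p];
                best_model[p] = k;
            }
        }
    }
}
```

When two loci favor different models, the 'global' curve, taken under the one best model, can stay low at the weaker locus, whereas mod_single reports each position's own best model. This mirrors the situation of two disease genes on the same chromosome with markedly different trait-model parameters.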
Since a separate round of maximization is needed for each genetic position, computation-time demands are higher with 'modcalc single' than with 'modcalc global'. However, if two disease genes with markedly different trait-model parameters are located at separate loci on the same chromosome, only the locus with the stronger signal will be identified with the 'global' option, whereas both are likely to be found with 'modcalc single'.

For details, please see the on-line help, e.g. by typing 'help modcalc' or 'help modscore' at the prompt, or refer to the PDF or PostScript version of the on-line help (files ghm.pdf and ghm.ps, respectively).

A Perl script, GH_modview, is provided with GENEHUNTER-MODSCORE. It creates a Gnuplot graph of the MOD score that shows the contributions of the individual families.

Further modifications include the following:

- With earlier GENEHUNTER versions, non-genotyped individuals without children were always discarded. However, in a LOD or MOD-score analysis, individuals without marker genotypes but with an available trait phenotype help to reconstruct their parents' trait-locus genotypes. They therefore contribute to the LOD or MOD score, in some cases even substantially. For this reason, GENEHUNTER-MODSCORE includes such persons by default ('include untyped on'). If these individuals should nevertheless be excluded from the analysis, e.g. to save computation time or for compatibility with older versions, 'include untyped' needs to be turned 'off'.

- By default, the pedigree filename is used as the title when scores are plotted against the genetic position in a PostScript graph. With GENEHUNTER-MODSCORE, a different title can be specified with the 'title' command.

The imprinting functionality included in GENEHUNTER-MODSCORE has been adapted from GENEHUNTER-IMPRINTING and is described in the following section.
* What is GENEHUNTER-IMPRINTING

GENEHUNTER-IMPRINTING is a modification of the GENEHUNTER software package (version 2.1). It allows the user to perform parametric (LOD or MOD-score) analysis of traits caused by imprinted genes, that is, of traits showing a parent-of-origin effect. By specifying two heterozygote penetrance parameters, paternal and maternal origin of the disease allele can be treated differently with respect to the probability of expressing the trait. Hence, an imprinting disease model includes four penetrances instead of three. For an analysis with a four-penetrance imprinting model, the command 'imprinting on' needs to be entered at the beginning of a GENEHUNTER-IMPRINTING session. Otherwise, LOD scores are calculated under a standard three-penetrance model, in the same way as with GENEHUNTER. The imprinting extension does not affect NPL-score calculation or the other types of analysis newly available with GENEHUNTER version 2 (affected-sib-pair, QTL, and TDT analyses). For details, please see the on-line help, e.g. by typing 'help load markers' or 'help imprinting' at the prompt.

For some of the additional capabilities available since the original GENEHUNTER version 2 (affected-sib-pair and QTL analyses), identity-by-descent probabilities need to be computed and stored for all pairs of relatives in a pedigree during the execution of 'scan pedigrees'. This can be very time-consuming and can therefore offset the computational advantages of version 2.1 over previous versions. Hence, if only LOD, MOD, and NPL scores (with or without imprinting) are to be calculated, this i.b.d. calculation should be left turned off ('compute sharing off'). This is the default setting of the imprinting version, unlike the original v2.1, for which 'compute sharing on' is the default. This is the only point where, for good reasons, GENEHUNTER-IMPRINTING v2.1 is not compatible with the original GENEHUNTER v2.1.
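The four-penetrance imprinting model can be sketched as a penetrance lookup keyed by the parental origin of each allele. In this C sketch, the genotype encoding and all names are hypothetical; the point is only that the two heterozygotes (disease allele inherited from the father versus from the mother) have separate penetrances, and that a standard three-penetrance model is the special case in which both heterozygote penetrances are equal.

```c
#include <assert.h>

/* Hypothetical ordered genotype at the trait locus: which of the two
   parentally inherited alleles is the disease allele. */
typedef struct {
    int paternal_d;  /* 1 if the paternal allele is the disease allele */
    int maternal_d;  /* 1 if the maternal allele is the disease allele */
} OrderedGenotype;

/* Four-penetrance imprinting model: the heterozygotes are distinguished
   by the parental origin of the disease allele. */
typedef struct {
    double f_nn;  /* no disease allele */
    double f_dn;  /* disease allele inherited from the father */
    double f_nd;  /* disease allele inherited from the mother */
    double f_dd;  /* two disease alleles */
} ImprintingModel;

/* Probability of expressing the trait given the ordered genotype. */
double penetrance(const ImprintingModel *m, OrderedGenotype g)
{
    assert(m != 0);
    if (g.paternal_d && g.maternal_d)
        return m->f_dd;
    if (g.paternal_d)
        return m->f_dn;
    if (g.maternal_d)
        return m->f_nd;
    return m->f_nn;
}

/* A standard three-penetrance model corresponds to equal heterozygote
   penetrances, i.e. no parent-of-origin effect. */
ImprintingModel from_three_penetrances(double f0, double f1, double f2)
{
    ImprintingModel m = { f0, f1, f1, f2 };
    return m;
}
```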
Note that this version only allows for the analysis of autosomal loci. To perform an analysis of the X chromosome with imprinting, please use the xghi executable of GENEHUNTER-IMPRINTING version 1.3.

Further modifications include the following:

- Family and individual IDs do not need to be integers, but may contain other characters as well.

- A new NPL scoring function, 'hom', has been implemented. It is similar to the 'all' scoring function, but uses both alleles of each affected individual simultaneously instead of choosing one of the two alleles. Type 'help score' at the prompt for details.

- If NPL analysis is turned off ('analysis lod'), the calculation is truly omitted, not just the score report. This speeds up the calculation of LOD scores.

- An array (post_pscore/p_score in newcombo.c) that was allocated for every family but never used has been removed. This allows for the analysis of samples that contain a larger number of pedigrees.

* What is GENEHUNTER 2

GENEHUNTER 2 is an extension of the GENEHUNTER software that provides the researcher with a much wider range of linkage and disequilibrium analyses. The backbone of the system is the same as in GENEHUNTER: the very rapid extraction of complete multipoint inheritance information from pedigrees of moderate size. This information is then used in the exact computation of multipoint LOD scores and non-parametric linkage statistics, and now also in a wide range of sib-pair analyses (as in MAPMAKER/SIBS) and a new variance-components analysis. In addition, several TDT analyses are available for detecting association/disequilibrium as well as linkage. As before, quick calculations involving dozens of markers, even in pedigrees with inbreeding and marriage loops, are possible with GENEHUNTER 2.
Additionally, the multipoint inheritance information allows the reconstruction of maximum-likelihood haplotypes for all individuals in the pedigree, as well as information-content mapping, which measures the fraction of the total inheritance information extracted from the marker data. All of these calculations are performed in the same user-friendly environment as in GENEHUNTER.

* How to Obtain GENEHUNTER-MODSCORE

The gzipped GENEHUNTER-MODSCORE tar file for Unix, ghm-2.0.1.tar.gz, as well as the MS-Windows zip file, ghm-2.0.1.zip, can be downloaded from the following website:

    http://www.staff.uni-marburg.de/~strauchk/software.html

* How to Install GENEHUNTER-MODSCORE

To extract the GENEHUNTER-MODSCORE files from the gzipped Unix tar file, create a subdirectory for the program, move the ghm-2.0.1.tar.gz file into it, and enter the commands:

    gunzip ghm-2.0.1.tar.gz
    tar xvpf ghm-2.0.1.tar

The directory will now contain the following files:

    Makefile - Unix Makefile for compilation of GENEHUNTER-MODSCORE
    Makefile.aix - special Makefile for IBM/AIX machines
    ghm.help - on-line help file - do not edit!
    ghm.pdf - PDF version of the on-line help for printing
    ghm.ps - PostScript version of the on-line help for printing
    linkloci.dat - sample marker data file
    linkloci.dat.sxp - as above, with sex-specific recombination fractions
    linkloci.imp - sample marker data file with an imprinting trait model
    linkloci.imp.sxp - as above, with sex-specific recombination fractions
    linkped.pre - sample pedigree data file
    sample.in - sample GENEHUNTER-MODSCORE input file
    sample_use_map.in - as above, using a genetic map file
    INSTALL.ghm - this installation document
    COPYRIGHT.ghm - copyright and licensing agreement for GENEHUNTER-MODSCORE and GENEHUNTER
    gh_modview.pl - Perl script to create a Gnuplot graph of the MOD score, showing the contributions of the individual families
    gh_modview.bat - batch file to run gh_modview.pl under MS-Windows
    duffy.txt - genetic coordinates of the Duffy map
    marshfield.txt - genetic coordinates of the Marshfield map
    nievergelt.txt - genetic coordinates of the Nievergelt-Schork map
    rutgers.txt - genetic coordinates of the Rutgers map

along with the subdirectories src and ansilib, which contain the source code of the program. If the extraction was successful, you can now delete the ghm-2.0.1.tar archive.

For Linux (Intel 32-bit), a precompiled executable ghm_linux is provided, which can be renamed with "mv ghm_linux ghm". No compilation is necessary.

When using MS-Windows, simply extract the files from the ghm-2.0.1.zip archive. An executable ghm.exe is provided, and no compilation is necessary.

* Compiling GENEHUNTER-MODSCORE

For other systems, you need to compile the code to create an appropriate executable.
One or two simple changes must be made to the Makefile:

1) Line 6 (SYS= -D_SYS_OSF) must be replaced with one of the following:

    SYS= -D_SYS_SUNOS - for compilation on Sun machines
    SYS= -D_SYS_OSF - for compilation on DEC/Alpha machines, as well as on i386 machines running Linux
    SYS= -D_SYS_OSX - for compilation on Mac OS X
    SYS= -D_SYS_ULTRIX - for compilation on Ultrix systems
    SYS= -D_SYS_HPUX - for compilation on HP workstations
    SYS= -D_SYS_AIX - for compilation on IBM/AIX RISC systems
    SYS= -D_SYS_SOLARIS - for compilation on Solaris machines

   If your system is not listed above, you can probably still use one of these definitions; for example, -D_SYS_OSF adheres to the ANSI standard and can be used with almost any ANSI compiler.

   For IBM/AIX RISC systems, a special Makefile is provided. It can be activated by entering the following commands:

    mv Makefile Makefile.sav
    mv Makefile.aix Makefile

2) Line 27 of the Makefile (CC = gcc) determines the C compiler to be called and should be changed if another C compiler is used on your system (for example, CC = cc).

After the Makefile has been edited for your system, the Unix command "make" should create the executable "ghm" for your system. If compilation is unsuccessful, first contact your system administrator for assistance, as he or she will be familiar with porting code to your particular system. If further assistance is required, feel free to contact us.

* Getting Ready to Run

It will often be useful to store your data in a subdirectory separate from the GENEHUNTER-MODSCORE program (e.g. for reasons of disk space, or to make the program available to multiple users). In such cases, it can be helpful to include the directory containing the GENEHUNTER-MODSCORE program in your Unix path. In addition, to use the on-line help facility and the provided genetic maps, you will need to tell the program where to find the related files.
For instance, if the subdirectory you created for the program is named /home/GHM, you will want to change your .cshrc file as follows:

1. Change the line that reads something like

    set path=(. /bin /usr/local /usr/bin /etc)

   to include /home/GHM:

    set path=(. /bin /usr/local /usr/bin /etc /home/GHM)

2. Add the following line to the end of the file:

    setenv GHM_DIR /home/GHM

When using MS-Windows, select "Start" > "Settings" > "Control Panel" and double-click "System", then select the "Advanced" tab and click the "Environment Variables" button (at the bottom). Edit the PATH environment variable to include the GENEHUNTER-MODSCORE subdirectory, and add a new variable GHM_DIR that points to the same subdirectory.

Now GENEHUNTER-MODSCORE can be called from anywhere on your system by simply typing the executable name ghm. (If you wish to hardwire GHM_DIR into the code, replace the body of get_code_directory in ansilib/syscode.c with the line

    nstrcpy(buf,"",PATH_LENGTH); return(TRUE);

and recompile.)

With MS-Windows, you may need to add a file gnuplot.bat (which calls wgnuplot.exe) to the Gnuplot\bin subdirectory so that the "gnuplot" command executed by GH_modview works properly. This subdirectory should also be included in the PATH environment variable.

* Using GENEHUNTER-MODSCORE

To run the program, just type the name of the executable, and the shell will start up. For a quick start with the available commands, enter 'help' at the prompt.

* How to cite or get more information on GENEHUNTER-MODSCORE

K. Strauch. "Parametric linkage analysis with automatic optimization of the disease model parameters". American Journal of Human Genetics 73(Suppl1):A2624 (2003).

K. Strauch, R. Fuerst, F. Rueschendorf, C. Windemuth, J. Dietter, A. Flaquer, M.P. Baur, and T.F. Wienker. "Linkage analysis of alcohol dependence using MOD scores". BMC Genetics 6(Suppl1):S162 (2005).

J. Dietter, M. Mattheisen, R. Fuerst, F. Rueschendorf, T.F. Wienker, and K. Strauch. "Linkage analysis using sex-specific recombination fractions with GENEHUNTER-MODSCORE". Bioinformatics, in press (2006/2007).

* How to cite or get more information on the imprinting analysis option

K. Strauch, R. Fimmers, T. Kurz, K.A. Deichmann, T.F. Wienker, and M.P. Baur. "Parametric and Nonparametric Multipoint Linkage Analysis with Imprinting and Two-Locus-Trait Models: Application to Mite Sensitization". American Journal of Human Genetics 66:1945-1957 (June 2000).

* How to cite or get more information on GENEHUNTER 2

L. Kruglyak, M.J. Daly, M.P. Reeve-Daly, and E.S. Lander. "Parametric and Nonparametric Linkage Analysis: A Unified Multipoint Approach". American Journal of Human Genetics 58:1347-1363 (June 1996).

L. Kruglyak and E.S. Lander. "Faster Multipoint Linkage Analysis Using Fourier Transforms". Journal of Computational Biology 5:1-7 (1998).