Bioinformatics

HAPMAP & HAPLOVIEW

ASSIGNMENT # 2

 Question

 

1. What is the name of the Haploview format used in this analysis?

 Answer: HAPMAP format

 

 2. Please show us the marker and individual quality control of the genotype data used in the analysis.

Answer: To show the markers, we first have to import our query file; we then get the table shown below:

 

 and the following window, in which these parameters can be set:

 a. HW (Hardy-Weinberg) p-value cutoff

 b. Minimum genotype %

 c. Maximum number of Mendelian errors

 d. Minimum minor allele frequency

 

Now, in this window we can filter the markers by these thresholds. Clicking on 'RESCORE MARKERS' refilters the markers using the new values, and the thresholds can be reset to their defaults by pressing the reset button. Markers can also be selected and deselected manually by clicking on the 'Rating' check box.

In my case, they are all checked. 

NOTE: ANY MARKER THAT FAILS ONE OR MORE OF THE TESTS IS HIGHLIGHTED IN RED.
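To make the marker checks more concrete, here is a minimal Python sketch of three of the four filters (genotype %, minor allele frequency, and Hardy-Weinberg p-value) for a single biallelic marker. It is not Haploview's own code: the function name `marker_qc`, the "NN" coding for a missing genotype, and the threshold values are all assumptions made for illustration, and the HW test is a simple chi-square approximation. The Mendelian error check is left out because it needs pedigree information.

```python
import math

def marker_qc(genotypes, hw_cutoff=0.001, min_geno_pct=75.0, min_maf=0.001):
    """Check one biallelic marker against three QC thresholds.

    genotypes: list of two-letter strings such as "AG"; "NN" is assumed
    to mean a missing call. All threshold defaults are illustrative.
    """
    called = [g for g in genotypes if "N" not in g]
    geno_pct = 100.0 * len(called) / len(genotypes)

    # Count each allele among the called genotypes.
    counts = {}
    for g in called:
        for allele in g:
            counts[allele] = counts.get(allele, 0) + 1

    if len(counts) < 2:
        # Monomorphic (or no calls): no minor allele, HW test not informative.
        maf, hw_p = 0.0, 1.0
    else:
        minor = min(counts, key=counts.get)
        n = len(called)
        q = counts[minor] / (2 * n)   # minor allele frequency
        p = 1.0 - q
        n_het = sum(1 for g in called if g[0] != g[1])
        n_minor_hom = sum(1 for g in called if g == minor * 2)
        observed = [n - n_het - n_minor_hom, n_het, n_minor_hom]
        expected = [n * p * p, 2 * n * p * q, n * q * q]
        chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
        maf = q
        # Upper tail of a chi-square distribution with 1 degree of freedom.
        hw_p = math.erfc(math.sqrt(chi2 / 2.0))

    passed = geno_pct >= min_geno_pct and maf >= min_maf and hw_p >= hw_cutoff
    return {"geno_pct": geno_pct, "maf": maf, "hw_p": hw_p, "passed": passed}
```

For example, `marker_qc(["AA"] * 25 + ["AG"] * 50 + ["GG"] * 25)` describes a marker in perfect Hardy-Weinberg proportions and passes, while a marker with no heterozygotes at all fails the HW test.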

 

 

 

 

As shown in the picture above, we first open the Haploview program from the program files on our computer.

 

Then we follow these steps:

 

Picture 2. How to import our query file for the LD plot and other information.

 

A) Click on the "HapMap Format" button and then import the data file from where you saved it on the computer. In my case, I first copied all the query information into the "Notepad" program and then saved it on my desktop.

B) Then press the "OK" button at the bottom of the same window, as you can see in the picture above.

C) The following window then appears under the "CHECK MARKER" tab.

D) I have marked all the tabs and places in the first picture of this assignment. The first big window, containing the table under "CHECK MARKER", is the answer to the first part of question 2 of this second assignment.

E) For the second part of the second question, click on the "ADVANCE VIEW" button at the top middle of the window showing the table.

F) A new window, WINDOW 3, then appears, as shown in the first figure of this second assignment.

G) Now click on the upper button of this new small window, i.e. the button labelled no. 4.

H) We then get a new window, the "SUMMARY WINDOW", labelled 'WINDOW NO. 5'.

I) This newly opened window contains "FAMILY ID", "INDIVIDUAL ID", and "PERCENTAGE GENOTYPED".

THIS IS THE ANSWER TO THE SECOND PART OF THE SECOND QUESTION OF THE SECOND ASSIGNMENT.

 

We can also export the information in this window to a file in the format we want, and then view it easily in "MICROSOFT EXCEL".
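The individual summary can also be reproduced outside the program. The Python sketch below reads a HapMap-style genotype table and reports, for each individual, the percentage of markers with a called genotype, then writes the result as CSV text that Excel can open. The file layout is an assumption: one header line, 11 annotation columns before the sample columns, and "NN" for a missing genotype. The function names and the sample IDs in the usage note are hypothetical. Family IDs are not included, since this assumed layout does not carry them.

```python
import csv
import io

# Assumed number of annotation columns (rs#, alleles, chrom, pos, ...)
# that come before the per-individual genotype columns.
N_FIXED_COLUMNS = 11

def percent_genotyped(hapmap_text):
    """Return {individual_id: percent of markers with a called genotype}."""
    lines = [ln for ln in hapmap_text.splitlines() if ln.strip()]
    samples = lines[0].split()[N_FIXED_COLUMNS:]
    called = {s: 0 for s in samples}
    n_markers = 0
    for line in lines[1:]:
        fields = line.split()
        n_markers += 1
        for sample, genotype in zip(samples, fields[N_FIXED_COLUMNS:]):
            if "N" not in genotype:      # "NN" is assumed to mean missing
                called[sample] += 1
    return {s: 100.0 * called[s] / n_markers for s in samples}

def to_csv(summary):
    """Format the summary as CSV text, one row per individual."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["individual_id", "percent_genotyped"])
    for sample, pct in summary.items():
        writer.writerow([sample, f"{pct:.1f}"])
    return buffer.getvalue()
```

With a two-marker table for the made-up individuals IND1 (genotypes AA and CT) and IND2 (NN and CC), `percent_genotyped` gives 100% for IND1 and 50% for IND2.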

 

3. Please show us the LD map, then explain what you get from the LD map.

Answer:

  • LD (linkage disequilibrium): for a pair of SNP alleles, it is a measure of the deviation from random association (random association is what is expected at linkage equilibrium). It is measured by D', r², and LOD.


Before we look at what we get from the LD plot created from the provided data, it is necessary to know what LINKAGE DISEQUILIBRIUM is.

By a simple definition, in population genetics linkage disequilibrium is the non-random association of alleles at two or more loci, not necessarily on the same chromosome. It is not the same as linkage, which describes the association of two or more loci on a chromosome with limited recombination between them. Linkage disequilibrium describes a situation in which some combinations of alleles or genetic markers occur more or less frequently in a population than would be expected from a random formation of haplotypes from alleles based on their frequencies. Non-random associations between polymorphisms at different loci are measured by the degree of linkage disequilibrium (LD).

An example is the prevalence of two rare diseases in Finland: there, compared to elsewhere in Europe, cystic fibrosis is less prevalent but congenital chloride diarrhea is more prevalent. Both diseases are due to mutations on chromosome 7, in adjacent genes.

The level of linkage disequilibrium is influenced by a number of factors including genetic linkage, the rate of recombination, the rate of mutation, genetic drift, non-random mating, and population structure. For example, some organisms (such as bacteria) may show linkage disequilibrium because they reproduce asexually and there is no recombination to break down the linkage disequilibrium.
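As a worked illustration of the definition, the Python sketch below computes D, D' and r² for two biallelic loci from the haplotype and allele frequencies, using the standard textbook formulas. The function name `ld_measures` and its argument names are assumptions for this example; LOD is not computed here because it needs the underlying genotype counts.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, D', r^2) for allele A at locus 1 and allele B at locus 2.

    p_ab: frequency of the haplotype carrying both A and B
    p_a, p_b: allele frequencies of A and B (each strictly between 0 and 1)
    """
    # D: observed haplotype frequency minus the frequency expected
    # if the two alleles were associated at random.
    d = p_ab - p_a * p_b

    # D' rescales |D| by the largest value it could take given the allele
    # frequencies, so that it always lies between 0 and 1.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # r^2: squared correlation between the two loci.
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2
```

For instance, `ld_measures(0.25, 0.5, 0.5)` gives D = 0 (random association), while `ld_measures(0.1, 0.1, 0.5)` gives a D' of about 1 but an r² of only about 0.11, which shows why the two measures can look very different on the same pair of markers.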


LD PLOT: under the default block definition (confidence intervals, Gabriel et al. 2002), a block is created if 95% of the informative (i.e. non-inconclusive) comparisons are "strong LD". By default this method ignores markers with MAF < 0.05 (MAF = minor allele frequency).

The MAF cutoff and the confidence bound cutoffs can be edited by choosing "Customize Block Definitions" in the "ANALYSIS" menu.
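The block rule can be sketched in a few lines of Python. Each pairwise comparison is described by the lower and upper confidence bound on D'; it is called "strong LD", "strong recombination" or "inconclusive", and a block is accepted if at least 95% of the informative calls are "strong LD". The cutoffs used below (0.70, 0.98 and 0.90) are the values commonly quoted for this method and should be treated as assumptions, as should the function names. The sketch only tests one candidate set of comparisons; it does not search for block boundaries.

```python
def classify_pair(lower, upper, strong_lower=0.70, strong_upper=0.98,
                  recomb_upper=0.90):
    """Classify one marker pair from its confidence bounds on D'."""
    if lower >= strong_lower and upper >= strong_upper:
        return "strong LD"
    if upper < recomb_upper:
        return "strong recombination"
    return "inconclusive"

def is_block(bounds, min_fraction=0.95):
    """bounds: list of (lower, upper) confidence bounds, one per marker pair."""
    calls = [classify_pair(lo, hi) for lo, hi in bounds]
    # Only informative (non-inconclusive) comparisons count.
    informative = [c for c in calls if c != "inconclusive"]
    if not informative:
        return False
    strong = sum(1 for c in informative if c == "strong LD")
    return strong / len(informative) >= min_fraction
```

With 20 strong-LD pairs and 1 strong-recombination pair the fraction is 20/21, about 0.952, so the set qualifies as a block; with 18 and 2 it is 0.90 and does not.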

 

 NOTE: We get the picture below when we click on the LD PLOT tab in the same program after importing our query file.

Picture 3. Picture showing first two blocks of LD plot.

 

 

 Picture 4. Picture showing third block of LD plot.

 

 From the LD map, we get the following things:

1. Different color schemes, based on the confidence interval, r square, D' and the four-gamete rule. They can be interpreted in the following way:

 

FOR R SQUARE

 When r square=0 → WHITE

 When 0<r square<1 → SHADES OF GREY

 When r square = 1 → Black

 

FOR D' AND LOD COLOR SCHEME AND INTERPRETATION

 

 

              Low D'      High D'

Low LOD       WHITE       SHADE OF PINK

High LOD      WHITE       BLACK
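The two colour schemes above can be written as small lookup functions. This Python sketch is only an illustration of the rules as listed: the linear grey scale for r square, the cutoffs that separate "low" from "high" D' and LOD, and the function names are all assumptions, not the program's actual settings.

```python
def r2_color(r2):
    """RGB grey level for the r-square scheme: white at 0, black at 1."""
    level = round(255 * (1.0 - r2))   # assumed linear shading in between
    return (level, level, level)

def dprime_lod_color(d_prime, lod, high_d_prime=1.0, high_lod=2.0):
    """Colour name from the D'/LOD table; both cutoffs are hypothetical."""
    if d_prime < high_d_prime:
        return "white"                # low D' is white at any LOD
    return "black" if lod >= high_lod else "shade of pink"
```

For example, `r2_color(0)` is pure white `(255, 255, 255)`, `r2_color(1)` is pure black `(0, 0, 0)`, and a pair with high D' but low LOD comes out as a shade of pink.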

 Besides this, we can also get the following information in the following ways:

 1.  Right clicking on a marker number (or the equivalent space in the zoomed out views) shows the marker name, minor allele frequency and any additional notes specified in the info file. This can be especially helpful in the zoomed out views, which do not display marker names. The last such piece of popup information clicked will be shown at the top of the LD plot. This reminder can be dismissed by left clicking anywhere on the LD plot.

 2. Right clicking on a pairwise LD comparison will show a more detailed summary of the LD between the two markers in question. This information is also shown at the top of the screen as described above and can be dismissed by left clicking anywhere on the LD plot.

From the LD plot we got three blocks and their haplotypes. The three blocks are 18 kb, 80 kb, and 48 kb long, respectively. We also get the rs number and MAF (MINOR ALLELE FREQUENCY) of each marker.

 

4. How many haplotype blocks are in this region of Chromosome X? Then explain how to interpret them.

 Answer: There are three haplotype blocks in my case in this region of Chromosome X. 

For this process, we first have to import our query file; we then get the following window:

 

Picture 1

 

After this we click on the tab named 'HAPLOTYPES', and the following window appears:

Picture 2

 

and the following parameters lower down in the same window:

 Picture 3

 

Now, these blocks are editable, i.e. we can edit them depending on the parameters of our interest.

 As we can see above, I have got the haplotype blocks in letters. We can also view them as numbers or as colored squares in the following way, for a clearer understanding of our data. With the number option, each of A, C, G and T is assigned its own number.

 and,

 

Picture 4

 We can view these haplotypes by selecting the BLOCKS and then clicking on the Haplotypes tab or selecting "Haplotypes" from the Display menu. The haplotype display shows each haplotype in a block with its population frequency and its connections from one block to the next. In the crossing areas, a value of multiallelic D' is shown. This represents the level of recombination between the two blocks. Note that the value of multiallelic D' is computed for only the haplotypes (alleles) CURRENTLY DISPLAYED. This usually does not have a strong effect, as the rare haplotypes contribute only slightly to the overall value.

 

The display can be edited using the controls at the bottom of the screen to display only the more common haplotypes or to adjust the connecting lines. By default, alleles are displayed using A, C, G, T along with the special symbol 'X', which represents a fairly rare situation in which only one allele is unambiguously observed in phased data. The 'X' represents the allele of unknown identity. The display can also be changed to show the alleles numerically from 1-4, with 8 being the equivalent of 'X', or as blue and red boxes, with blue being the major allele and red the minor.
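The haplotype display can be imitated with a short Python sketch: it counts how often each haplotype occurs in a block, hides the rare ones, and recodes the letters either as numbers or as major/minor alleles. The mapping A=1, C=2, G=3, T=4 is an assumption (the text above only fixes 8 for 'X'), and the function names, the 1% display threshold and the 'B'/'R' letters standing for blue and red boxes are made up for this example.

```python
from collections import Counter

# Assumed numeric coding; only X=8 is stated in the text above.
NUMERIC_CODE = {"A": "1", "C": "2", "G": "3", "T": "4", "X": "8"}

def haplotype_frequencies(haplotypes, min_frequency=0.01):
    """Frequency of each haplotype string, most common first,
    keeping only those at or above min_frequency."""
    counts = Counter(haplotypes)
    total = len(haplotypes)
    return {
        hap: n / total
        for hap, n in counts.most_common()
        if n / total >= min_frequency
    }

def to_numeric(haplotype):
    """Recode a haplotype such as "ACG" as digits, e.g. "123"."""
    return "".join(NUMERIC_CODE[allele] for allele in haplotype)

def to_major_minor(haplotypes):
    """Recode each position as 'B' (major allele) or 'R' (minor allele)."""
    majors = [Counter(column).most_common(1)[0][0] for column in zip(*haplotypes)]
    return [
        "".join("B" if allele == major else "R"
                for allele, major in zip(hap, majors))
        for hap in haplotypes
    ]
```

With ten chromosomes carrying "ACG" six times, "ATG" three times and "GTA" once, the frequencies are 0.6, 0.3 and 0.1, and "ATG" is shown as "BRB" in the major/minor view because only its middle allele is the minor one.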

 

FOR TAGGING SNPs

 After we get Picture 4, we can tag the SNPs in the following way:

 

 We will then get the following window:

 

 In the above picture, the rectangular boxes surrounding the down-arrow signs mark the tagged SNPs.

 

5.  Could you find out the tagging SNPs in each haplotype block, then explain what the tagging SNPs are?

Answer: Yes, I can find the tagging SNPs in the haplotype block, which is shown below:

 

A tag SNP is a representative single nucleotide polymorphism (SNP) in a region of the genome with high linkage disequilibrium (the non-random association of alleles at two or more loci). It is possible to identify genetic variation without genotyping every SNP in a chromosomal region. Tag SNPs are useful in whole-genome SNP association studies in which hundreds of thousands of SNPs across the entire genome are genotyped. For this reason, the International HapMap Project hopes to use tag SNPs to discover genes responsible for certain disorders.

 

When SNPs are in LD with each other and form haplotypes, there is redundant information contained within the haplotype. By knowing the marker at one locus you can make a prediction about the marker that will occur at the linked loci nearby. The accuracy with which you can make this prediction depends on the strength of linkage disequilibrium between the loci and on the allele frequencies.
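A small worked example makes the prediction idea concrete. In the Python sketch below, each two-letter haplotype holds the allele at the tag locus followed by the allele at a linked locus; for every tag allele we guess its most frequent partner and add up how often that guess is right. The function name and the frequencies used in the usage note are invented for illustration.

```python
def prediction_accuracy(hap_freqs):
    """Fraction of chromosomes whose linked allele is predicted correctly.

    hap_freqs: {two-letter haplotype: frequency}; the first letter is the
    allele at the tag locus, the second the allele at the linked locus.
    """
    best = {}
    for hap, freq in hap_freqs.items():
        tag_allele = hap[0]
        # Keep the frequency of the most common partner of each tag allele.
        best[tag_allele] = max(best.get(tag_allele, 0.0), freq)
    return sum(best.values())
```

With complete LD, e.g. `{"AC": 0.6, "GT": 0.4}`, every prediction is correct (accuracy 1.0). With random association, e.g. four haplotypes at 0.25 each, the tag allele tells us nothing and the accuracy drops to 0.5.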

 

Block based tagging

Block based tagging requires that haplotype "blocks" first be inferred. In the majority of cases, when you are investigating association within a candidate gene you are likely to start off with a large number of potential SNPs to choose from, and using various measures of linkage disequilibrium and inferred haplotypes it is possible to define 'haplotype blocks' of markers that are in strong LD with each other, but not with those in other blocks. The exact definition of a haplotype block is open to interpretation, and there are a number of different methods for choosing your haplotype blocks (e.g. Gabriel et al. 2002).

 

 

ALTERNATIVELY, WE CAN USE THE TAGGER, since the above method does not tag all the SNPs because it only uses the default settings. With the Tagger we can manually select the individual SNPs.

USING THE TAGGER, WE CAN DO THE TAGGING IN THE FOLLOWING WAY.

 

1. First we have to note the marker numbers of the first block. In my case, they are 8 and 9.

 So, I first unchecked all the boxes under the Tagger configuration and then checked those two numbers under the 'capture this allele type?' column.

Then we run the Tagger and get the following window:

 In the same way, we note the marker numbers of the other blocks, select the alleles of each block, and run the Tagger again.

We get the following results for the second and third blocks:

 

 

and,

 So, from the above picture we can see that the Tagger reports 1 SNP in 1 test, capturing 2 of 2 alleles, i.e. we selected two alleles and one tag SNP captures both.

 

In the same way we get the results for the second block and third block:

 

 and,

 for the THIRD BLOCK

 

 

 

So, in the third block one tag SNP captures the 6 selected alleles.
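The idea behind these results, one tag SNP standing in for several correlated alleles, can be sketched as a simple greedy selection in Python. This is a simplified illustration and not the Tagger's own algorithm: the function name, the input format, and the r² threshold of 0.8 are assumptions, and the marker names in the usage note are placeholders.

```python
def greedy_tags(snps, r2, threshold=0.8):
    """Pick tag SNPs so that every SNP is either a tag or has
    r^2 >= threshold with at least one tag.

    snps: list of SNP names
    r2: {(snp1, snp2): pairwise r^2}; pairs not listed count as 0
    """
    def captures(tag, snp):
        if tag == snp:
            return True
        return r2.get((tag, snp), r2.get((snp, tag), 0.0)) >= threshold

    uncaptured = set(snps)
    tags = []
    while uncaptured:
        # Choose the SNP that captures the most still-uncaptured SNPs.
        best = max(snps, key=lambda s: sum(captures(s, t) for t in uncaptured))
        tags.append(best)
        uncaptured -= {t for t in uncaptured if captures(best, t)}
    return tags
```

For two markers in complete LD, `greedy_tags(["snp8", "snp9"], {("snp8", "snp9"): 1.0})` returns a single tag that captures 2 of 2 alleles, which is the same kind of outcome as reported for the first block.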

 

SO THIS COMPLETES THE ANSWERS TO THE FIVE QUESTIONS OF THE SECOND ASSIGNMENT.

 

_____________THANK YOU____________
