perl - How can I plot p-values for SNPs that are spread across thousands of scaffolds on a single continuous axis?


I have p-values from association mapping for SNPs scattered across thousands of scaffolds in a non-model organism. I would like to plot the p-value of each SNP on a Manhattan-style plot. I don't care about the order of the scaffolds, but I want to retain the relative order and spacing of the SNP positions on their respective scaffolds. I just want to visualize how many genomic regions are associated with the phenotype.

My data looks like this:

    scaffold    position
    1           8967
    1           8986
    1           9002
    1           9025
    1           9064
    2           60995
    2           61091
    2           61642
    2           61898
    2           61921
    2           62034
    2           62133
    2           62202
    2           62219
    2           62220
    3           731894
    3           731907
    3           731962
    3           731999
    3           732000
    3           732050
    3           732076
    3           732097

I would like to write Perl code that creates a third column which retains the distance between SNPs on the same scaffold, while spacing consecutive scaffolds apart by an arbitrary number (100 in the following example):

    scaffold    position    continuous_axis
    1           8967        8967
    1           8986        8986
    1           9002        9002
    1           9025        9025
    1           9064        9064
    2           60995       9164
    2           61091       9260
    2           61642       9811
    2           61898       10067
    2           61921       10090
    2           62034       10203
    2           62133       10302
    2           62202       10371
    2           62219       10388
    2           62220       10389
    3           731894      10489
    3           731907      10502
    3           731962      10557
    3           731999      10594
    3           732000      10595
    3           732050      10645
    3           732076      10671
    3           732097      10692
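One way to state the rule that the third column follows is as a recurrence. Here s_i, p_i and x_i denote the scaffold, position and continuous-axis value of the i-th SNP (rows in file order), and d is the arbitrary scaffold spacing (100 in the example). This notation is introduced only for illustration; the last two lines check it against rows 6 and 7 of the table.

```latex
x_1 = p_1, \qquad
x_i =
\begin{cases}
  x_{i-1} + (p_i - p_{i-1}) & \text{if } s_i = s_{i-1},\\
  x_{i-1} + d               & \text{if } s_i \neq s_{i-1},
\end{cases}
\qquad i \ge 2.

% Worked check against the table, with d = 100:
x_6 = x_5 + d = 9064 + 100 = 9164, \qquad
x_7 = x_6 + (p_7 - p_6) = 9164 + (61091 - 60995) = 9260.
```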

Thanks to anyone who might have a strategy.

Something like the following should work:

    #!/usr/bin/env perl

    use strict;
    use warnings;

    use constant SCAFFOLD_SPACING => 100;

    my ($last_scaffold, $last_position, $continuous_axis, $found_data);

    my $input = './input';
    open my $fh, '<', $input
        or die "Unable to open '$input' for reading: $!";

    print join( "\t", qw( scaffold position continuous_axis ) ) . "\n"; # output header
    while (<$fh>) {
        next unless m|\d|; # skip non-data lines (header, blank lines)

        my ($scaffold, $position) = split ' '; # split on whitespace, ignoring leading blanks

        unless ($found_data++) {
            # initialize
            $last_scaffold   = $scaffold; # set first data value
            $last_position   = $position; # set first data value
            $continuous_axis = $position; # start continuous axis @ first position
        }

        # compare scaffold IDs as strings so names like "scaffold_12" also work
        if ($scaffold eq $last_scaffold) {
            $continuous_axis += $position - $last_position; # same scaffold: keep real spacing
        } else {
            $continuous_axis += SCAFFOLD_SPACING;           # new scaffold: arbitrary gap
        }
        print join( "\t", $scaffold, $position, $continuous_axis ) . "\n";

        # update
        $last_scaffold = $scaffold;
        $last_position = $position;
    }
    close $fh;
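If the real input also carries a p-value column and uses non-numeric scaffold IDs (both assumptions; the sample data in the question shows only numeric IDs and no p-values), the same logic can be wrapped in a reusable subroutine. The sketch below uses a hypothetical `add_continuous_axis` helper that takes rows already grouped by scaffold and sorted by position, compares scaffold IDs as strings, and also computes -log10(p), the value conventionally plotted on the y axis of a Manhattan plot. The p-values in `@example` are made up for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Gap between the last SNP of one scaffold and the first SNP of the next
# (arbitrary, as in the question).
use constant SCAFFOLD_SPACING => 100;

# Hypothetical helper. Input: rows of [scaffold, position, p_value], grouped by
# scaffold and sorted by position within each scaffold (assumed, not checked).
# Output: rows of [scaffold, position, continuous_axis, minus_log10_p].
sub add_continuous_axis {
    my @rows = @_;
    my @out;
    my ($last_scaffold, $last_position, $axis);
    for my $row (@rows) {
        my ($scaffold, $position, $p) = @$row;
        if (!defined $axis) {
            $axis = $position;                     # first SNP: start at its position
        } elsif ($scaffold ne $last_scaffold) {    # string compare handles "scaffold_12"
            $axis += SCAFFOLD_SPACING;             # new scaffold: fixed gap
        } else {
            $axis += $position - $last_position;   # same scaffold: keep real spacing
        }
        # -log10(p) for the y axis; Perl's log() dies on 0, so p must be > 0.
        my $y = -( log($p) / log(10) );
        push @out, [ $scaffold, $position, $axis, $y ];
        ($last_scaffold, $last_position) = ($scaffold, $position);
    }
    return @out;
}

# Illustrative rows: positions taken from the question, p-values invented.
my @example = (
    [ 1, 8967,  0.2  ],
    [ 1, 9064,  1e-6 ],
    [ 2, 60995, 0.01 ],
    [ 2, 61091, 0.5  ],
);

for my $r (add_continuous_axis(@example)) {
    printf "%s\t%d\t%d\t%.3f\n", @$r;   # scaffold, position, axis, -log10(p)
}
```

The third and fourth printed columns are the x and y values for the plot, so the output can be handed to any plotting tool. Keeping the transformation in a subroutine, rather than inline in the read loop, makes it easy to check against the expected column from the question.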
