Cannot find and count text with a hyphen in the name using awk


The awk script below seems to return an incorrect count. The ids (from input) that have a - in the name are reported as not found, even though they are in the file being searched. I am not sure what is wrong in the command. Thank you :).

input

sept12
sept5-gp1bb
sept9
hla-drb1
hla-drb5

file

chr16 4837470 4837656 sept12
chr16 4837536 4837656 sept12
chr22 19711038 19711157 sept5-gp1bb
chr22 19711038 19711157 sept5-gp1bb
chr22 19711366 19711997 sept5-gp1bb
chr22 19711367 19711997 sept5-gp1bb
chr22 19711367 19711997 sept5-gp1bb
chr17 75398130 75398795 sept9
chr17 75471590 75471995 sept9
chr17 75478215 75478427 sept9
chr6 32487136 32487438 hla-drb1
chr6 32489671 32489961 hla-drb1
chr6 32551875 32552165 hla-drb5

current output

2 ids found
sept5-gp1bb missing
hla-drb1 missing
hla-drb5 missing

desired output

5 ids found

awk (missing.awk)

BEGIN { FS="[[:space:]]+|-" }
NR == FNR { seen[$0]; next }
$4 in seen { found[$4]; delete seen[$4] }
END {
    print length(found) " ids found"
    for (i in seen) print i " missing"
}

awk -f missing.awk input file > out

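The FS set in the BEGIN block splits on - as well as on whitespace, so for a line ending in sept5-gp1bb the fourth field is only sept5, which never matches the full id read from input. The default field separator already splits on whitespace, so the BEGIN block can simply be dropped. Try this: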

awk '
    NR==FNR { lookup[$0]++; next }
    ($4 in lookup) { seen[$4]++ }
    END {
      print length(seen)" ids found";
      for (id in seen) delete lookup[id];
      for (id in lookup) print id " missing"
    }' input file
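
To see the difference on a small sample, the sketch below first prints the fourth field of one line under both field separators, then runs the lookup on a shortened copy of the data. It is only an illustration under a few assumptions: the temporary directory and the variable names (tmp, split_field, whole_field, result) are made up for the example, the ids are stored by $1 rather than $0 so that a stray trailing space in input would not break the match, and the count is taken with a loop instead of length(array) in case an older awk does not support the latter.

```shell
# Scratch directory for the sample files (illustrative, not from the post).
tmp=$(mktemp -d)

# One id per line, as in the "input" file from the question.
printf '%s\n' sept12 sept5-gp1bb sept9 hla-drb1 hla-drb5 > "$tmp/input"

# A shortened copy of "file": one row per id is enough for the demo.
cat > "$tmp/file" <<'EOF'
chr16 4837470 4837656 sept12
chr22 19711038 19711157 sept5-gp1bb
chr17 75398130 75398795 sept9
chr6 32487136 32487438 hla-drb1
chr6 32551875 32552165 hla-drb5
EOF

# With "-" in FS the id is cut in two, so $4 holds only the first half.
split_field=$(sed -n '2p' "$tmp/file" | awk 'BEGIN { FS = "[[:space:]]+|-" } { print $4 }')

# With the default FS the hyphenated id stays in one field.
whole_field=$(sed -n '2p' "$tmp/file" | awk '{ print $4 }')

echo "FS with hyphen: $split_field"
echo "default FS:     $whole_field"

# Lookup with the default FS: first file fills lookup, second file marks hits.
result=$(awk '
    NR == FNR { lookup[$1]; next }     # first file: remember each id
    $4 in lookup { seen[$4] }          # second file: record ids that occur
    END {
        n = 0
        for (id in seen) { n++; delete lookup[id] }   # drop found ids
        print n " ids found"
        for (id in lookup) print id " missing"        # whatever is left
    }' "$tmp/input" "$tmp/file")

echo "$result"
rm -rf "$tmp"
```

On this sample every id is matched, so nothing is listed as missing. If the ids in your real file can contain other characters that you also use as separators, the same check on $4 is a quick way to confirm how awk is splitting the line.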
