Cannot find and count text with a hyphen in the name using awk
The awk script below seems to return an incorrect count. The ids in input that contain a - in the name are not found, even though they are present in the file being searched. I'm not sure what is wrong in the command. Thank you :).
input
sept12
sept5-gp1bb
sept9
hla-drb1
hla-drb5
file
chr16 4837470 4837656 sept12
chr16 4837536 4837656 sept12
chr22 19711038 19711157 sept5-gp1bb
chr22 19711038 19711157 sept5-gp1bb
chr22 19711366 19711997 sept5-gp1bb
chr22 19711367 19711997 sept5-gp1bb
chr22 19711367 19711997 sept5-gp1bb
chr17 75398130 75398795 sept9
chr17 75471590 75471995 sept9
chr17 75478215 75478427 sept9
chr6 32487136 32487438 hla-drb1
chr6 32489671 32489961 hla-drb1
chr6 32551875 32552165 hla-drb5
current output
2 ids found
sept5-gp1bb missing
hla-drb1 missing
hla-drb5 missing
desired output
5 ids found
awk (missing.awk)
BEGIN { FS="[[:space:]]+|-" }
NR == FNR { seen[$0]; next }
$4 in seen { found[$4]; delete seen[$4] }
END {
    print length(found) " ids found"
    for (i in seen) print i " missing"
}

awk -f missing.awk input file > out
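To see where the hyphenated names go, one can print the fourth field of a single record from file, first with the FS from missing.awk and then with awk's default FS. This is a small sketch for a POSIX shell; the variable names are illustrative only.

```shell
#!/bin/sh
# Print field 4 of one record from "file" under two FS settings.
line='chr22 19711038 19711157 sept5-gp1bb'

# FS from missing.awk: a run of whitespace OR a single hyphen ends a field.
with_hyphen=$(printf '%s\n' "$line" | awk 'BEGIN { FS="[[:space:]]+|-" } { print $4 }')

# Default FS: fields are separated by runs of blanks only.
default_fs=$(printf '%s\n' "$line" | awk '{ print $4 }')

echo "FS with hyphen: $with_hyphen"   # sept5
echo "default FS:     $default_fs"    # sept5-gp1bb
```

With the custom FS the record splits into five fields, so $4 is "sept5" and the lookup against the ids from input fails.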
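The problem is FS="[[:space:]]+|-" in the BEGIN block: it makes a hyphen a field separator as well as whitespace, so on a line such as "chr22 19711038 19711157 sept5-gp1bb" the fourth field is "sept5", which never matches an id from input. awk's default FS already splits on runs of blanks, so leave FS alone. Try this: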
awk '
NR == FNR { lookup[$0]++; next }
($4 in lookup) { seen[$4]++ }
END {
    print length(seen) " ids found"
    for (id in seen) delete lookup[id]
    for (id in lookup) print id " missing"
}' input file
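Below is a self-contained sketch that recreates input and file in a temporary directory and runs a corrected missing.awk. It assumes a POSIX shell with mktemp. The actual fix is dropping the custom FS; it also swaps length(found) for a plain counter, since length() on an array is an extension that not every awk version accepts. The id xyz-1 is made up and appended only to exercise the "missing" branch.

```shell
#!/bin/sh
# Sketch: recreate the question's data and run a corrected missing.awk.
dir=$(mktemp -d)

# "input": one id per line.
printf '%s\n' sept12 sept5-gp1bb sept9 hla-drb1 hla-drb5 > "$dir/input"

# "file": one record per line, four whitespace-separated columns.
cat > "$dir/file" <<'EOF'
chr16 4837470 4837656 sept12
chr16 4837536 4837656 sept12
chr22 19711038 19711157 sept5-gp1bb
chr22 19711038 19711157 sept5-gp1bb
chr22 19711366 19711997 sept5-gp1bb
chr22 19711367 19711997 sept5-gp1bb
chr22 19711367 19711997 sept5-gp1bb
chr17 75398130 75398795 sept9
chr17 75471590 75471995 sept9
chr17 75478215 75478427 sept9
chr6 32487136 32487438 hla-drb1
chr6 32489671 32489961 hla-drb1
chr6 32551875 32552165 hla-drb5
EOF

# Corrected script: default FS (blanks only), so $4 keeps its hyphen.
cat > "$dir/missing.awk" <<'EOF'
NR == FNR { seen[$0]; next }            # first file: remember every id
($4 in seen) { n++; delete seen[$4] }   # count an id once, then forget it
END {
    print (n + 0) " ids found"
    for (i in seen) print i " missing"  # ids never matched in "file"
}
EOF

result=$(awk -f "$dir/missing.awk" "$dir/input" "$dir/file")
printf '%s\n' "$result"                 # 5 ids found

# Hypothetical id xyz-1 (not in the question) to exercise the "missing" branch.
printf '%s\n' xyz-1 >> "$dir/input"
result2=$(awk -f "$dir/missing.awk" "$dir/input" "$dir/file")
printf '%s\n' "$result2"                # 5 ids found, then: xyz-1 missing

rm -r "$dir"
```

Deleting each id from seen on its first match is what keeps repeated records (the five sept5-gp1bb lines, for example) from being counted more than once.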