windows - Why would redirection work where piping fails?
In theory, these two command lines should be equivalent:
1
type tmp.txt | test.exe
2
test.exe < tmp.txt
I have a process involving #1 that, for many years, worked fine. At some point within the last year, we started to compile the program with a newer version of Visual Studio, and it now fails due to malformed input (see below). #2, however, succeeds (no exception, and we see the expected output). Why would #2 succeed where #1 fails?
I've been able to reduce test.exe to the program below. Our input file has exactly one tab per line and uniformly uses CR/LF line endings, so this program should never write to stderr:
    #include <cstring>   // std::strchr
    #include <iostream>
    #include <string>

    int __cdecl main(int argc, char** argv)
    {
        std::istream* pis = &std::cin;
        std::string line;
        int lines = 0;

        while (!(pis->eof()))
        {
            if (!std::getline(*pis, line))
            {
                break;
            }

            // Count the tabs on this line; well-formed input has exactly one.
            const char* pline = line.c_str();
            int tabs = 0;
            while (pline)
            {
                pline = std::strchr(pline, '\t');
                if (pline)
                {
                    // move past tab
                    pline++;
                    tabs++;
                }
            }

            // More than one tab means two input lines were merged.
            if (tabs > 1)
            {
                std::cerr << "We lost a linebreak after " << lines << " lines.\n";
                lines = -1;  // becomes 0 after the increment below
            }
            lines++;
        }
        return 0;
    }
When run via #1, I get the following output, with the same numbers every time (in each case, it's because getline has returned two concatenated lines with no intervening linebreak); when run via #2, there's (correctly) no output:
    We lost a linebreak after 8977 lines.
    We lost a linebreak after 1468 lines.
    We lost a linebreak after 20985 lines.
    We lost a linebreak after 6982 lines.
    We lost a linebreak after 1150 lines.
    We lost a linebreak after 276 lines.
    We lost a linebreak after 12076 lines.
    We lost a linebreak after 2072 lines.
    We lost a linebreak after 4576 lines.
    We lost a linebreak after 401 lines.
    We lost a linebreak after 6428 lines.
    We lost a linebreak after 7228 lines.
    We lost a linebreak after 931 lines.
    We lost a linebreak after 1240 lines.
    We lost a linebreak after 2432 lines.
    We lost a linebreak after 553 lines.
    We lost a linebreak after 6550 lines.
    We lost a linebreak after 1591 lines.
    We lost a linebreak after 55 lines.
    We lost a linebreak after 2428 lines.
    We lost a linebreak after 1475 lines.
    We lost a linebreak after 3866 lines.
    We lost a linebreak after 3000 lines.
This turns out to be a known issue:
The bug is in fact in the lower-level _read function, which the stdio library functions (including both fread and fgets) use to read from a file descriptor.
The bug in _read is as follows: if…
- you are reading from a text mode pipe,
- you call _read to read N bytes,
- _read successfully reads N bytes, and
- the last byte read is a carriage return (CR) character,
then the _read function will complete the read successfully but will return N-1 instead of N. The CR or LF character at the end of the result buffer is not counted in the return value.
In the specific issue reported in this bug, fread calls _read to fill the stream buffer. _read reports that it filled N-1 bytes of the buffer, and the final CR or LF character is lost.
The bug is fundamentally timing-sensitive because whether _read can successfully read N bytes from the pipe depends on how much data has been written to the pipe. Changing the buffer size or changing when the buffer is flushed may reduce the likelihood of the problem, but it won't necessarily work around the problem in 100% of cases.
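To make the failure mode concrete, here is a toy model of the behaviour described above. It is not the CRT's actual code and is not part of the quoted report: the function name simulate_text_reads, the fixed chunk size n (standing in for "however many bytes happened to be available in the pipe"), and the buggy switch are all assumptions made for illustration. It only shows how dropping the last translated character of a full read that ended on a CR can glue two lines together.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical toy model (NOT the real _read): reads `raw` in chunks of up
// to `n` bytes, applies CRLF -> LF translation, and, when `buggy` is true,
// under-reports by one character any full chunk whose last raw byte was CR.
std::string simulate_text_reads(const std::string& raw, std::size_t n, bool buggy)
{
    std::string out;
    std::size_t pos = 0;
    while (n > 0 && pos < raw.size())
    {
        const std::size_t len = std::min(n, raw.size() - pos);
        const std::string chunk = raw.substr(pos, len);
        pos += len;

        // Translate CRLF pairs that lie entirely inside this chunk.
        std::string translated;
        for (std::size_t i = 0; i < chunk.size(); ++i)
        {
            if (chunk[i] == '\r' && i + 1 < chunk.size() && chunk[i + 1] == '\n')
            {
                continue;  // drop the CR, the LF is appended next iteration
            }
            translated.push_back(chunk[i]);
        }

        // A CR in the last position needs a peek at the next byte.
        const bool ends_with_cr = chunk.back() == '\r';
        if (ends_with_cr && pos < raw.size() && raw[pos] == '\n')
        {
            ++pos;                     // consume the LF
            translated.back() = '\n';  // the CRLF pair becomes a single LF
        }

        // Assumed bug model: the final character is not counted, so the
        // caller never sees it.
        if (buggy && len == n && ends_with_cr)
        {
            translated.pop_back();
        }
        out += translated;
    }
    return out;
}
```

With the input "a\tb\r\nlonger\tx\r\n" and n = 4, the first chunk ends exactly on the CR. The non-buggy path yields two lines with one tab each, while the buggy path yields a single line containing two tabs, which is the condition the test program in the question reports.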
There are several possible workarounds:
- use a binary pipe and do the text mode CRLF => LF translation manually on the reader side. This is not particularly difficult (scan the buffer for CRLF pairs; replace them with a single LF).
- call ReadFile with _osfhnd(fh), bypassing the CRT's I/O library on the reader side entirely (though this will also require manual text mode translation, since the OS won't do text mode translation for you)
We have fixed this bug for the next update to the Universal CRT. Note that the Universal CRT is an operating system component and is serviced independently from the Visual C++ libraries. The next update to the Universal CRT will be around the same timeframe as the Windows 10 Anniversary Update this summer.