Merge ~freddie-akeroyd/epics-base:msi_close_file into ~epics-core/epics-base/+git/epics-base:3.15

Proposed by Freddie Akeroyd
Status: Merged
Merge reported by: mdavidsaver
Merged at revision: f537096b8f405392fe54208bcaf4f78c93911b0b
Proposed branch: ~freddie-akeroyd/epics-base:msi_close_file
Merge into: ~epics-core/epics-base/+git/epics-base:3.15
Diff against target: 12 lines (+1/-0)
1 file modified
src/ioc/dbtemplate/msi.cpp (+1/-0)
Reviewer Review Type Date Requested Status
Andrew Johnson Approve
Martin Konrad (community) Approve
Review via email: mp+369763@code.launchpad.net

Commit message

Fix for truncation of MSI output on WIN32 as described in LP: #1835525

Description of the change

On WIN32 the reopened stdout file stream needs to be closed on program exit otherwise the output file is occasionally truncated.

To post a comment you must log in.
Revision history for this message
mdavidsaver (mdavidsaver) wrote :

This does match the MS documentation, though I'm not clear what is going on here. Does freopen() really return a different non-NULL pointer? Does "fprint(stdout, ..." still work afterwards?

https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/freopen-wfreopen?view=vs-2019

Despite my confusion, I don't think this would have any adverse effects if "outFP==stdout". The bug I can imagine would be somehow closing stdout twice on exit, which probably wouldn't be noticed.

Revision history for this message
Martin Konrad (info-martin-konrad) wrote :

Looks good to me. A quick test didn't reveal any problems on Linux.

@Michael: I looked at https://en.cppreference.com/w/c/io/freopen and http://www.cplusplus.com/reference/cstdio/freopen/ and I'm still a little confused. I don't think this change introduces any _new_ issues, though.

review: Approve
Revision history for this message
Freddie Akeroyd (freddie-akeroyd) wrote :

As you indicate, the standard docs for fp = freopen(stdout) have flcose(stdout) whereas the MS documentation has fclose(fp). On my Windows machine it looked like (fp == stdout) anyway but i don't know if that is guaranteed to always be the case, however the linux docs for freopen(stdout) say it should return stdout (or NULL on error) so hopefully fclose(fp) is thus safe everywhere.

@mdavidsaver I also found the MS example a little strange, it did stream=freopen(stderr) and then wrote to stream rather than stderr, which seemed to defeat the point of using freopen() rather than fopen()

Revision history for this message
mdavidsaver (mdavidsaver) wrote :

I wonder if a final "fflush(stdout)" would work as well. I certainly know that the flushing behavior on windows differs from *NIX.

Revision history for this message
Freddie Akeroyd (freddie-akeroyd) wrote :

I think a final fflush could work too, i can check that if it would be preferred over closing stdout. All the freopen() examples seemed to use fclose(), but the internet is a bit divided over whether you should always fclose(stdout) at end of main as there could be an atexit() handler that wrote to stdout for example. I suppose fclose(stdout) could be registered via atexit as the last exit handler, but that may be overkill for the msi program

Revision history for this message
Andrew Johnson (anj) wrote :

However you decide the fix should be written, this resolves a problem that I had to work around in the src\ioc\dbtemplate\test\msi.plt test script, which detects an empty output file and retries running msi up to 5 times. Those retry loops can be removed once this fix has been merged (I can do that though, you don't have to do that here) -- thanks Freddie!

review: Approve
Revision history for this message
Freddie Akeroyd (freddie-akeroyd) wrote :

Hi Andrew, I went back to try using fflush myself and now the machine that failed regularly will not fail at all when reverted! I had come across this issue intermittently before and a re-run usually fixed it, I briefly had this one machine doing it regularly but not anymore :-( Going with fflush(outFP) over fclose(outFP) is fine with me if we have a reproducible test case that it fixes. An fclose() will close the underlying windows handle at that point as well as do a flush, the online windows example seems to do this but i don't know if it is important. Given that we know there are no atexit handles or further things with stdout that msi does it should be safe to use fclose() and that would certainly cover everything on windows.

Revision history for this message
Freddie Akeroyd (freddie-akeroyd) wrote :

I guess we have only covered the case of the -o option here and there is also stdout redirected to a file via the shell. Do we need to fclose / fflush stdout in all circumstances i.e.

    if (outFP != NULL) {
        fclose(outFP);
    } else {
        fclose(stdout);
    }

Revision history for this message
mdavidsaver (mdavidsaver) wrote :

To me, the root issue here is the design of using the global stdout instead of a local (or file scope) FILE* which might point to stdout or a file.

What bothers me is closing 'stdout' when it might later be used. A bug in error handling code is doubly bad.

For this reason I would much prefer fflush(stdout), which I think can reasonably be used in all cases.

Actually I'd prefer changing MSI to not use the 'stdout' global, but that is more than I would ask to fix an issue like this one.

ce4f214... by Freddie Akeroyd

Flush stdout rather than fclose

ea0a500... by Freddie Akeroyd

Only remove file on error if not -D

Revision history for this message
Freddie Akeroyd (freddie-akeroyd) wrote :

Hi Michael, i've changed the original PR to just fflush but I agree that not reopening stdout is a much better approach. That doesn't look to be too hard and I've put something on a separate branch

https://git.launchpad.net/~freddie-akeroyd/epics-base/commit/?id=b4ba2f19baa082ea2733359d22125ad78c6ff0d8

I chose not to make outFP default to stdout due to the dual use of outFile with out_D, so outFP is only non NULL if a real file is opened.

I appreciate this has got larger than the original ticket, so if you prefer we can go with fflush for now and i can put the above branch into a separate ticket.

f537096... by Freddie Akeroyd

Revert "Only remove file on error if not -D"

This reverts commit ea0a5006500139dc5594a9e3d4b2345f4a874e2e.

Revision history for this message
mdavidsaver (mdavidsaver) wrote :

> That doesn't look to be too hard ...

I had a similar thought no long after I hit send. I didn't think about out_D though, so the result may not handle that case. (this is change can be disposable anyway)

https://github.com/mdavidsaver/epics-base/commit/d5e595111b1d35f492489ac19c5ca86e6a5c09d7

I also reverted the test re-try that Andrew mentioned, which is 29c069db3dcb78e77c03bb18153eec21456585c4

I'm waiting to see if appveyor CI will turn up any failures (of the msi test). 8 hours to go...

https://ci.appveyor.com/project/mdavidsaver/epics-base/builds/25992994

Revision history for this message
mdavidsaver (mdavidsaver) wrote :

> I appreciate this has got larger than the original ticket ...

If you can confirm that a fflush() by itself fixes the observed truncation, then there is no reason to wait. I can handle the cleanup (squashing) myself.

Revision history for this message
Freddie Akeroyd (freddie-akeroyd) wrote :

Unfortunately the machine I tested on is no longer showing bad behaviour (it was always transient when i had it, but i had a reproducible failures for a while which fclose fixed). So i think fflush would work, but i'm afraid i can't confirm it with an actual test.

With regard to using fopen, if opt_D is specified then outFile is not opened for output but appears to have its name written to stdout, so fclose(outFP) wold then be fclose(stdout) but also abortExit() would do an
unlink(outFile) which didn't feel right, but i wondered if this was the desired behaviour by how it was usually called. The unlink(outFile) on an error is present in the existing code, I pushed and then reverted a test on opt_D as I though I should check if that was intentional with somebody first.

Revision history for this message
Andrew Johnson (anj) wrote :

Freddie, was your machine running Windows 7 or Windows 10? I have just disabled the retries in the unit test code and can't seem to trigger the failures any more (without adding the fflush()), although it may be that running the test program by hand doesn't do it. I probably saw them when I was using Windows 7 though, now I'm running Windows 10 and that might also have fixed the problem.

The opt_D flag is used by the EPICS build system to generate a make dependency rule for the output file. The expected behaviour is supposed to match GCC's -M flag and output the rule to stdout, using the -o argument to provide the name of the target filename to use in the rule. This code was a recent addition, and the author (most likely me) might not have considered the effect of an error that aborts the program. For it to call unlink(outFile) isn't a problem though, it ensures that GNUmake will have to run msi without the -D to (re)generate the output file, which *should* result in the same error and the build stopping.

I'm thinking we should add the flush(stdout), remove the retries from the test program and just wait to see whether we see any more failures.

Revision history for this message
Freddie Akeroyd (freddie-akeroyd) wrote :

Hi Andrew, the machine that was for a while recently repeatably showing the failure was Window 10, previously we have had it appearing randomly on Windows 7 but then it would go away next build or on a manual rerun of msi. I was able to confirm an fclose(stdout) fixed it when it was misbehaving, I suspect that an fflush() is the key part of the fclose() operation so going with fflush(stdout) until we see further problems seems a reasonable approach. I'm not a great fan of freopen() and liked the idea of removing it as Michael had done, but keeping freopen() and using fflush() does make "msi -o outfile" and "msi > outfile" behave the same internally and so could be better for now.

Thanks for clarifying the opt_D behaviour, that looks the right thing to do. It was the fact the program was removing a file that it had not itself generated that made me cautious, but it is a makefile target and if it can't generate the dependencies I guess that means the reliability of any existing target file is in question and so should be removed as is done.

Revision history for this message
mdavidsaver (mdavidsaver) wrote :

Committed as 771ad6a44264385d9282b9ca04435973e5266183. I've also reverted the test retry with b89494a840e62953471c7cce7b134849cf750bdd.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1diff --git a/src/ioc/dbtemplate/msi.cpp b/src/ioc/dbtemplate/msi.cpp
2index 49d8611..7062341 100644
3--- a/src/ioc/dbtemplate/msi.cpp
4+++ b/src/ioc/dbtemplate/msi.cpp
5@@ -215,6 +215,7 @@ int main(int argc,char **argv)
6 if (opt_D) {
7 printf("\n");
8 }
9+ fflush(stdout);
10 free(templateName);
11 return opt_V & 2;
12 }

Subscribers

People subscribed via source and target branches