Merge lp:~michihenning/unity-api/daemon-strong-exception-guarantee into lp:unity-api

Proposed by Michi Henning
Status: Merged
Approved by: Jussi Pakkanen
Approved revision: 47
Merged at revision: 46
Proposed branch: lp:~michihenning/unity-api/daemon-strong-exception-guarantee
Merge into: lp:unity-api
Diff against target: 427 lines (+149/-77)
4 files modified
CMakeLists.txt (+6/-1)
src/unity/util/internal/DaemonImpl.cpp (+71/-45)
test/gtest/unity/util/Daemon/Daemon_test.cpp (+57/-26)
test/gtest/unity/util/FileIO/FileIO_test.cpp (+15/-5)
To merge this branch: bzr merge lp:~michihenning/unity-api/daemon-strong-exception-guarantee
Reviewer Review Type Date Requested Status
Jussi Pakkanen (community) Approve
PS Jenkins bot (community) continuous-integration Approve
Review via email: mp+164286@code.launchpad.net

Commit message

Made daemonize_me() more robust when called at the file descriptor limit and
when memory is low. Provided strong exception guarantee for a few more corner
cases. Fixed coverage test for this to not report nonsense. (Previous version
had race conditions; depending on timing, coverage report was either OK or
reported covered things as uncovered.) Improved test to make sure SIGHUP is
correctly restored. Fixed build errors in release mode due to unused return values.

Description of the change

Made daemonize_me() more robust when called at the file descriptor limit and
when memory is low. Provided strong exception guarantee for a few more corner
cases. Fixed coverage test for this to not report nonsense. (Previous version
had race conditions; depending on timing, coverage report was either OK or
reported covered things as uncovered.) Improved test to make sure SIGHUP is
correctly restored. Fixed build errors in release mode due to unused return values.

To post a comment you must log in.
Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :
review: Approve (continuous-integration)
45. By Michi Henning

Made daemonize_me() more robust when called at the file descriptor limit and
when memory is low. Provided strong exception guarantee for a few more corner
cases. Fixed coverage test for this to not report nonsense. (Previous version
had race conditions; depending on timing, coverage report was either OK or
reported covered things as uncovered.) Improved test to make sure SIGHUP is
correctly restored. Fixed build errors in release mode due to unused return values.

Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :
review: Approve (continuous-integration)
Revision history for this message
Jussi Pakkanen (jpakkane) wrote :

If we really want to be thorough, then the file description closing function needs some work. If the system runs out of memory and throws a bad_alloc, then the cleanup procedure should not do any more allocations. This one does:

- call to close (unavoidable, really)
- creation of temporary for swap

If bad_alloc is thrown when the fd vector is empty, the cleanup part releases no memory and thus goes into an eternal loop (probably).

The only time Linux memory allocation ever fails is when trying to alloc more memory in a single allocation than the computer has (or something like that). In this case when we have a few billion elements in the fd vector. The question then becomes is this an issue that we ever need to care about?

Also, in a test there is this:

 int rc __attribute__((unused)) = system("rm -fr empty; >empty");

Shell code in programs is of the devil. Libc should be used instead. Especially since the return value is ignored.

review: Needs Fixing
46. By Michi Henning

Replaced use of system() to create test files and directories with libc file I/O calls.

47. By Michi Henning

Removed recovery if vector throws bad_alloc. If the process is sailing this close to the wind,
it'll probably die in the next millisecond or so anyway, even if we recover here. The added
complexity isn't worth it.

Revision history for this message
Michi Henning (michihenning) wrote :
Download full text (3.2 KiB)

> If we really want to be thorough, then the file description closing function
> needs some work. If the system runs out of memory and throws a bad_alloc, then
> the cleanup procedure should not do any more allocations. This one does:
>
> - call to close (unavoidable, really)
> - creation of temporary for swap

The call to close does not allocate memory, so that's not a problem. The temporary vector is on the stack. If that allocation fails because the stack is full, we won't get bad_alloc. Instead, the kernel will kill the process. The default constructor of vector does not allocate memory either. The initial capacity is zero. (This is not guaranteed by the standard, but I have never seen an implementation that behaves otherwise. Doing otherwise would be monumentally stupid because it would penalize the default construtor with an (expensive) heap allocation.) So, as a recovery strategy, the swap idiom is safe and will work.

> If bad_alloc is thrown when the fd vector is empty, the cleanup part releases
> no memory and thus goes into an eternal loop (probably).

Good catch! This one could be easily fixed by counting the number of descriptors that were closed so far, and re-throwing if we haven't made progress since the last time we tried.

> The only time Linux memory allocation ever fails is when trying to alloc more
> memory in a single allocation than the computer has (or something like that).

It's determined by the virtual address space size: setrlimit(RLIMIT_AS, ...), which affects brk() (which is what doles out memory for malloc). That limit can be quite bit lower than the default.

> In this case when we have a few billion elements in the fd vector.

There could be far fewer, such as only a few a hundred.

> The question then becomes is this an issue that we ever need to care about?

That's the real question. If the process is sailing so close to the wind that it can't store a list of its file descriptors anymore, it'll probably go belly-up within the next few milliseconds anyway, even if we recover at this point.

So, there are two ways to fix this. I can add the test for making progress, which will prevent an infinite loop, or I can simply delete the recovery attempt. Pragmatically, I've done the latter.

> Also, in a test there is this:
>
> int rc __attribute__((unused)) = system("rm -fr empty; >empty");
>
> Shell code in programs is of the devil. Libc should be used instead.
> Especially since the return value is ignored.

Not sure why that's the devil. There is only one thing that command can do, which is what's needed for the test. The unused return value issue comes about when compiling in release mode where we use -Werror.

It's interesting that there is no warning for this when compiling in debug mode. Why is this? I would expect the warning in both release and debug mode, seeing that -Werror just says "stop on warnings", but it shouldn't add warnings of its own.

I've replaced the calls to system() regardless. In general though, there are times when using a shell script is easier than any other option if all I need is to set up a few files for testing. Why disallow this? The situation would be no better if I were to mov...

Read more...

Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :
review: Approve (continuous-integration)
Revision history for this message
Jussi Pakkanen (jpakkane) wrote :

Using embedded shell code to do basic file manipulation is basically the same thing as, though slightly less evil than, this:

http://thedailywtf.com/Articles/The-Int-Divide.aspx

Keeping shell scripts in separate files is better in that it gives the functionality name such as setup_test_files.sh and makes it visible in the source tree. Shellcode is also not portable as it won't work on non-Unix platforms (which is probably not a major issue in this particular piece of code).

review: Approve
Revision history for this message
Michi Henning (michihenning) wrote :

> Using embedded shell code to do basic file manipulation is basically the same
> thing as, though slightly less evil than, this:
>
> http://thedailywtf.com/Articles/The-Int-Divide.aspx

I did enjoy reading this anecdote :-) I'm not sure it's a fair comparison though.

> Keeping shell scripts in separate files is better in that it gives the
> functionality name such as setup_test_files.sh and makes it visible in the
> source tree. Shellcode is also not portable as it won't work on non-Unix
> platforms (which is probably not a major issue in this particular piece of
> code)

The shell restriction really isn't an issue for us. I don't think we'll be porting stuff to Windows any time soon.

I like the idea of separate setup scripts, sure. But for many tests, I have to do little things with files and directories. Create a file with a certain content, and second file with some matching content, make a small directory hierarchy, run a test, and tear them down again.

The shell is really good at doing this sort of thing, whereas using C++ streams, libc, or whatever typically sucks in comparison. When you look at the diff for the FileIO test, the embedded system() call was simpler and easier to follow than the equivalent stuff written with libc even though the setup was utterly trivial.

Anyway, I agree with the concern in general: there is a point where embedded calls to system() will become unmaintainable. It's just that I'd rather not institute a blanket ban.

Revision history for this message
Jussi Pakkanen (jpakkane) wrote :

I'm totally cool with using the shell, but there needs to be a very good reason for it. Usually because it provides functionality that is prohibitively expensive to replicate in C/C++. Examples include complex globbing, running shell pipelines gotten from (trusted) third party components and so on.

However "it is simpler for me" is not a good reason because that leads to "it is simpler for me to just recode this thing in Java/Vala/ObjC/Perl/Scheme/Rust/Whatever".

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'CMakeLists.txt'
2--- CMakeLists.txt 2013-05-17 00:13:21 +0000
3+++ CMakeLists.txt 2013-05-20 22:54:25 +0000
4@@ -22,13 +22,18 @@
5 # * Switch build type to coverage (use ccmake or cmake-gui)
6 # * Invoke make, make test, make coverage (or ninja if you use that backend)
7 # * Find html report in subdir coveragereport
8-# * Find xml report feasible for jenkins in coverage.xml
9+# * Find xml report suitable for jenkins in coverage.xml
10 #####################################################################
11 if(cmake_build_type_lower MATCHES coverage)
12 set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} --coverage" )
13 set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} --coverage" )
14 set(CMAKE_MODULE_LINKER_FLAGS "${CMAKE_MODULE_LINKER_FLAGS} --coverage" )
15 set(CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} --coverage" )
16+
17+ # This allows us to skip the file descriptor closing test in Daemon_test
18+ # when coverage is enabled. (Closing file descriptors
19+ # in the test messes with coverage reporting.)
20+ add_definitions(-DCOVERAGE_ENABLED)
21 endif()
22
23 # Make sure we have all the needed symbols
24
25=== modified file 'src/unity/util/internal/DaemonImpl.cpp'
26--- src/unity/util/internal/DaemonImpl.cpp 2013-05-13 11:39:49 +0000
27+++ src/unity/util/internal/DaemonImpl.cpp 2013-05-20 22:54:25 +0000
28@@ -48,6 +48,11 @@
29 {
30 }
31
32+// LCOV_EXCL_START
33+
34+// This is tested, but only when coverage is disabled, because closing
35+// file descriptors interferes with writing the coverage results.
36+
37 void
38 DaemonImpl::
39 close_fds() noexcept
40@@ -55,6 +60,8 @@
41 close_fds_ = true;
42 }
43
44+// LCOV_EXCL_STOP
45+
46 void
47 DaemonImpl::
48 reset_signals() noexcept
49@@ -87,9 +94,23 @@
50 {
51 // Let's start by changing the working directory because that is the most likely thing to fail. If it does
52 // fail, we have not modified any other properties of the calling process.
53+ // We save the current working dir in case we need to restore it if a fork fails.
54+
55+ internal::ResourcePtr<int, std::function<void(int)>> old_working_dir(
56+ [](int fd)
57+ {
58+ if (fd != -1)
59+ {
60+ int rc __attribute__((unused))
61+ = fchdir(fd);
62+ close(fd);
63+ }
64+ }
65+ );
66
67 if (!working_directory_.empty())
68 {
69+ old_working_dir.reset(open(".", 0)); // Doesn't matter if this fails
70 if (chdir(working_directory_.c_str()) == -1)
71 {
72 ostringstream msg;
73@@ -104,15 +125,19 @@
74 {
75 case -1:
76 {
77+ // Strong exception guarantee: if working dir was changed, the old_working_dir
78+ // destructor will try to restore it. This will work if at least one spare
79+ // descriptor was avalable to start with. (Otherwise, old_working_dir won't
80+ // be set and and we won't restore the previous working directory.)
81 throw SyscallException("fork() failed", errno); // LCOV_EXCL_LINE
82 }
83 case 0:
84 {
85- break; // Child process
86+ break; // Child process
87 }
88 default:
89 {
90- exit(EXIT_SUCCESS); // Parent process, we are done.
91+ exit(EXIT_SUCCESS); // Parent process, we are done.
92 }
93 }
94
95@@ -124,9 +149,10 @@
96
97 // Set the umask if the caller asked for that. No error checking here because umask() cannot fail.
98
99+ mode_t old_umask = 0;
100 if (set_umask_)
101 {
102- umask(umask_);
103+ old_umask = umask(umask_);
104 }
105
106 // We are about to fork a second time, to prevent the process from re-acquiring a control terminal if it
107@@ -147,17 +173,31 @@
108 case -1:
109 {
110 // LCOV_EXCL_START
111+ if (set_umask_)
112+ {
113+ umask(old_umask); // Strong exception guarantee
114+ }
115 sigaction(SIGHUP, &old_action, nullptr); // Strong exception guarantee
116- throw SyscallException("fork() failed", errno); // Better than trying to muddle on despite the problem.
117+
118+ // Strong exception guarantee: if working dir was changed, the old_working_dir
119+ // destructor will try to restore it. This will work if at least one spare
120+ // descriptor was avalable to start with. (Otherwise, old_working_dir won't
121+ // be set and and we won't restore the previous working directory.)
122+ throw SyscallException("fork() failed", errno);
123 // LCOV_EXCL_STOP
124 }
125 case 0:
126 {
127- break; // Child process
128+ if (old_working_dir.has_resource())
129+ {
130+ close(old_working_dir.get()); // Reclaim file descriptor straight away
131+ old_working_dir.release(); // Don't restore previous working dir once we are done
132+ }
133+ break; // Child process
134 }
135 default:
136 {
137- exit(EXIT_SUCCESS); // Parent process, we are done.
138+ exit(EXIT_SUCCESS); // Parent process, we are done.
139 }
140 }
141
142@@ -201,67 +241,53 @@
143 DaemonImpl::
144 close_open_files() noexcept
145 {
146+ // We close the standard file descriptors first. This allows opendir() to work if we need to close
147+ // other files and are at the descriptor limit already.
148+
149+ close(0);
150+ close(1);
151+ close(2);
152+
153+ // LCOV_EXCL_START // Closing file descriptors interferes with coverage reporting
154 if (close_fds_)
155 {
156 // Close all open files. We use /proc to figure out what files are open.
157- // That's more efficient than calling close() potentially tens of thousands of times, once for each possible
158- // descriptor up to the process limit. We close the standard file descriptors as well because we need to
159- // re-open those later anyway.
160+ // That's more efficient than calling close() potentially tens of thousands of times,
161+ // once for each possible descriptor up to the process limit.
162
163 char const* proc_self_fd = "/proc/self/fd";
164
165- internal::ResourcePtr<DIR*, decltype(&closedir)> dir(opendir(proc_self_fd), closedir);
166- if (dir.get() == nullptr)
167+ DIR* dirp;
168+ if ((dirp = opendir(proc_self_fd)) == nullptr)
169 {
170- return; // This should never happen but, for diligence, we check anyway. // LCOV_EXCL_LINE
171+ return; // This should never happen but, for diligence, we handle it anyway.
172 }
173+ internal::ResourcePtr<DIR*, decltype(&closedir)> dir(dirp, closedir);
174
175 vector<int> descriptors; // We collect the file descriptors to close here
176
177- // We use the re-entrant version of readdir().
178-
179- size_t len = offsetof(struct dirent, d_name) + pathconf(proc_self_fd, _PC_NAME_MAX) + 1;
180- unique_ptr<struct dirent, decltype(&free)> entry(reinterpret_cast<struct dirent*>(malloc(len)), free);
181-
182 struct dirent* result_p;
183-
184- readdir_r(dir.get(), entry.get(), &result_p); // Read first entry
185-
186- while (result_p)
187+ while ((result_p = readdir(dir.get())) != nullptr)
188 {
189- if (result_p->d_name[0] != '.') // Ignore . and ..
190+ // Try to treat the file name as a number. If it doesn't look like a number, we are looking at "." or ".." or,
191+ // otherwise, something is seriously wrong because /proc/self/fd is supposed to contain only open file
192+ // descriptor numbers. Rather than giving up in that case, we keep going, closing as many file descriptors as we can.
193+ size_t pos;
194+ int fd = std::stoi(result_p->d_name, &pos);
195+ if (result_p->d_name[pos] == '\0') // The file name did parse as a number
196 {
197- // Try to treat the file name as a number. If it doesn't look like a number, something is seriously
198- // wrong because /proc/self/fd is supposed to contain only open file descriptor numbers. Rather than
199- // giving up in that case, we keep going, closing as many file descriptors as we can.
200-
201- size_t pos;
202- int fd = std::stoi(result_p->d_name, &pos);
203- if (result_p->d_name[pos] == '\0') // The file name did parse as a number
204- {
205- // We can't call close() here because that would modify the directory while we are iterating
206- // over it, which has undefined behavior.
207- descriptors.push_back(fd);
208- }
209+ // We can't call close() here because that would modify the directory while we are iterating
210+ // over it, which has undefined behavior.
211+ descriptors.push_back(fd);
212 }
213- readdir_r(dir.get(), entry.get(), &result_p); // Read next entry
214 }
215
216- // Close all open descriptors.
217-
218 for (auto fd : descriptors)
219 {
220 close(fd);
221 }
222 }
223- else
224- {
225- // Caller asked for file descriptors to be left alone, so we only close the standard three.
226-
227- close(0);
228- close(1);
229- close(2);
230- }
231+ // LCOV_EXCL_STOP
232 }
233
234 } // namespace internal
235
236=== modified file 'test/gtest/unity/util/Daemon/Daemon_test.cpp'
237--- test/gtest/unity/util/Daemon/Daemon_test.cpp 2013-03-15 11:25:22 +0000
238+++ test/gtest/unity/util/Daemon/Daemon_test.cpp 2013-05-20 22:54:25 +0000
239@@ -33,8 +33,8 @@
240 // as far as gtest is concerned, everything worked just fine.
241 //
242 // To deal with this, we create a file Daemon_test.out in the directory the test runs in. If the file is
243-// empty after the test, we know that the test succeeded. Otherwise, it contains any failure messages
244-// that were detected.
245+// empty after the test, we know that the test succeeded. Otherwise, it contains messages
246+// for any failure that were detected.
247 //
248
249 char const* error_file = "Daemon_test.out";
250@@ -173,22 +173,13 @@
251 error(__FILE__, __LINE__, "test file closed, should be open");
252 }
253
254- // Daemonize again, asking for open files to be closed. The file we opened earlier should be closed.
255-
256- d->close_fds();
257- d->daemonize_me();
258- check_std_descriptors();
259-
260- if (is_open(fd))
261- {
262- error(__FILE__, __LINE__, "test file open, should be closed");
263- }
264+ close(fd);
265 }
266
267 // Dummy signal handler
268
269 void
270-usr2_handler(int)
271+hup_handler(int)
272 {
273 }
274
275@@ -198,7 +189,7 @@
276
277 Daemon::UPtr d = Daemon::create();
278
279- // Set SIGUSR1 to be ignored and set SIGUSR2 to be caught.
280+ // Set SIGUSR1 to be ignored and set SIGHUP to be caught.
281
282 struct sigaction usr1_action;
283 memset(&usr1_action, 0, sizeof(usr1_action)); // To stop valgrind complaints
284@@ -209,12 +200,12 @@
285 abort();
286 }
287
288- struct sigaction usr2_action;
289- memset(&usr2_action, 0, sizeof(usr1_action)); // To stop valgrind complaints
290- usr2_action.sa_handler = usr2_handler;
291- if (sigaction(SIGUSR2, &usr2_action, nullptr) == -1)
292+ struct sigaction hup_action;
293+ memset(&hup_action, 0, sizeof(hup_action)); // To stop valgrind complaints
294+ hup_action.sa_handler = hup_handler;
295+ if (sigaction(SIGHUP, &hup_action, nullptr) == -1)
296 {
297- error(__FILE__, __LINE__, "cannot catch SIGUSR2");
298+ error(__FILE__, __LINE__, "cannot catch SIGHUP");
299 abort();
300 }
301
302@@ -232,14 +223,14 @@
303 error(__FILE__, __LINE__, "SIGUSR1 should have been left alone, but wasn't");
304 }
305
306- if (sigaction(SIGUSR2, &usr2_action, &prev_action) == -1)
307+ if (sigaction(SIGHUP, &hup_action, &prev_action) == -1)
308 {
309- error(__FILE__, __LINE__, "cannot restore SIGUSR2");
310+ error(__FILE__, __LINE__, "cannot restore SIGHUP");
311 abort();
312 }
313- if (prev_action.sa_handler != usr2_action.sa_handler)
314+ if (prev_action.sa_handler != hup_action.sa_handler)
315 {
316- error(__FILE__, __LINE__, "SIGUSR2 should have been left alone, but wasn't");
317+ error(__FILE__, __LINE__, "SIGHUP should have been left alone, but wasn't");
318 }
319
320 // Daemonize again, resetting signals, so we can check that they are at the defaults.
321@@ -257,14 +248,14 @@
322 error(__FILE__, __LINE__, "SIGUSR1 should have been reset, but wasn't");
323 }
324
325- if (sigaction(SIGUSR2, &usr2_action, &prev_action) == -1)
326+ if (sigaction(SIGHUP, &hup_action, &prev_action) == -1)
327 {
328- error(__FILE__, __LINE__, "cannot set SIGUSR2");
329+ error(__FILE__, __LINE__, "cannot set SIGHUP");
330 abort();
331 }
332 if (prev_action.sa_handler != SIG_DFL)
333 {
334- error(__FILE__, __LINE__, "SIGUSR2 should have been reset, but wasn't");
335+ error(__FILE__, __LINE__, "SIGHUP should have been reset, but wasn't");
336 }
337 }
338
339@@ -351,3 +342,43 @@
340 error(__FILE__, __LINE__, "re-acquired control terminal but should not have been able to");
341 }
342 }
343+
344+// Test that file descriptors are closed.
345+// We test this only when coverage is disabled because
346+// closing descriptors interferes with coverage reporting.
347+
348+#if !defined(COVERAGE_ENABLED)
349+
350+TEST(Daemon, file_close)
351+{
352+ Daemon::UPtr d = Daemon::create();
353+
354+ // Open a file so we can check that the file is closed after daemonizing.
355+
356+ int fd = open(".", O_RDONLY);
357+ if (fd == -1)
358+ {
359+ abort();
360+ }
361+
362+ int fd2 = open(".", O_RDONLY);
363+ if (fd2 == -1)
364+ {
365+ abort();
366+ }
367+
368+ d->close_fds();
369+ d->daemonize_me();
370+ check_std_descriptors();
371+
372+ if (is_open(fd))
373+ {
374+ error(__FILE__, __LINE__, "fd open, should be closed");
375+ }
376+ if (is_open(fd2))
377+ {
378+ error(__FILE__, __LINE__, "fd2 open, should be closed");
379+ }
380+}
381+
382+#endif
383
384=== modified file 'test/gtest/unity/util/FileIO/FileIO_test.cpp'
385--- test/gtest/unity/util/FileIO/FileIO_test.cpp 2013-05-13 12:37:12 +0000
386+++ test/gtest/unity/util/FileIO/FileIO_test.cpp 2013-05-20 22:54:25 +0000
387@@ -29,8 +29,13 @@
388
389 TEST(FileIO, basic)
390 {
391- int i __attribute__((unused))
392- = system("rm -f testfile; echo \"some chars\" >testfile");
393+ FILE* f;
394+
395+ remove("testfile");
396+ f = fopen("testfile", "w");
397+ EXPECT_NE(f, nullptr);
398+ fputs("some chars\n", f);
399+ fclose(f);
400
401 string s = read_text_file("testfile");
402 EXPECT_EQ("some chars\n", s);
403@@ -39,7 +44,11 @@
404 string contents("some chars\n");
405 EXPECT_EQ(vector<uint8_t>(contents.begin(), contents.end()), v);
406
407- system("rm -fr empty; >empty");
408+ remove("empty");
409+ f = fopen("empty", "w");
410+ EXPECT_NE(f, nullptr);
411+ fclose(f);
412+
413 s = read_text_file("empty");
414 EXPECT_TRUE(s.empty());
415 }
416@@ -59,8 +68,9 @@
417
418 try
419 {
420- int i __attribute__((unused))
421- = system("rm -fr testdir; mkdir testdir");
422+ remove("testdir");
423+ int rc = mkdir("testdir", 0777);
424+ EXPECT_NE(-1, rc);
425 read_text_file("testdir");
426 FAIL();
427 }

Subscribers

People subscribed via source and target branches

to all changes: