gozk

Code review comment for lp:~rogpeppe/gozk/factor-out-service

factor-out-service
Merge into trunk

Revision history for this message

Roger Peppe (rogpeppe) wrote on 2011-10-05:

On 5 October 2011 16:32, Gustavo Niemeyer <email address hidden> wrote:
> Review: Disapprove
>
> This is an interesting idea, but at the same time it's full of problems and
> not really necessary right now. I suggest we delay it until a future point
> when we actually need that abstraction, as we'll be in a better position
> to generalize it. At that point, we can even preserve for the internal
> implementation of the ZooKeeper server itself, without many problems.

ok, i'll do that if you don't mind.

the thing that was mostly bothering me was the mingling of the code
from Server and Service, and that the complexities of Service were
obscuring the essential simplicity of Server.

BTW many of the comments below are applicable even without
the factored-out service package.

> Here is some feedback, regardless, which I suggest you don't work on
> right now besides for [1] which should be integrated.
>
> [1]
>
> === added file 'reattach_test.go'
> --- reattach_test.go 1970-01-01 00:00:00 +0000
> +++ reattach_test.go 2011-10-05 14:37:11 +0000
>
> This should be in a separate merge proposal as it's important and unrelated
> to the strawman changes proposed.

oh, i thought it was in the earlier merge.

i've added to the update-server-interface, if that's ok.

> +type Interface interface {
> + // Directory gives the path to the directory containing
> + // the server's configuration and data files. It is assumed that
> + // this exists and can be modified.
> + Directory() string
>
> s/Directory/Dir/

actually maybe RunDir

> [3]
>
> +
> + // CheckAvailability should return an error if resources
> + // required by the server are not available; for instance
> + // if the server requires a network port which is not free.
> + CheckAvailability() os.Error
>
> We don't need this. The Command function can return an error if
> it knows it cannot start.

perhaps. i preferred to keep the availability checking separate
so that calling Command didn't imply that the service was
going to be started immediately.

> [4]
>
> + l, err := net.Listen("tcp", fmt.Sprintf("localhost:%d", port))
> + if err != nil {
> + return fmt.Errorf("cannot listen on port %v: %v", port, err)
> + }
> + l.Close()
>
> This doesn't really guarantee that when the service itself is started,
> the port will be available, so we can as well not do it.

it doesn't guarantee it, but it does mean that we can return
an early error rather than relying on the server to print
the error somewhere at some unspecified later time.
i found it very useful when testing.

i wonder if the Start should print stdout and stderr from
the server for the first half second or so after it's been started.
that's a possible solution, but makes things slower.

> [5]
>
> +// Process returns a reference to the identity
> +// of the last running server in the Service's directory.
> +// If the server was shut down, it returns (nil, nil).
>
> This shouldn't be in a generic service API. What if there
> are multiple processes handled by the service?

my plan was to change things so the server is started in
its own process group or session or whatever the appropriate
thing is these days.

> [6]
>
> +// If the server was shut down, it returns (nil, nil).
>
> If we're returning a nil *Process, we necessarily need err != nil.

changed in update-server-interface.

> [7]
>
> +// It stores the process ID of the server in the file "pid.txt"
> +// inside the server's directory. Stdout and stderr of the server
> +// are similarly written to "log.txt".
>
> That's not nice. Who owns the content of this directory, the
> interface implementation, or the Service? We can't write
> arbitrary content into a directory of different ownership.

i don't think that's a problem because the implementer of
the Service interface can always create another directory
inside the server's directory if need be which can be entirely
owned by the service. The names used are documented, so
the two can coexist happily if they like.

> Also, a generic interface would need to do better in terms
> of logging. We could have a file somewhere named output.txt, maybe,
> and let logging with the underlying implementation.

i'm not sure quite what you mean by that.
how would "output.txt" differ from "log.txt" above?

> [8]
>
> + if err := p.Kill(); err != nil && !strings.Contains(err.String(), "no such process") {
> + return fmt.Errorf("cannot kill server process: %v", err)
> + }
>
> Different services are stopped in different ways. Some prefer a SIGTERM,
> before a SIGKILL, for instance. And what happens if there are multiple
> processes?

yes, this is something that would might be good added to the
Service interface. if there was a generally applicable way of shutting
down a server (e.g. with SIGTERM) then i'd prefer that.
we wouldn't need this package to apply to *all* services - it
useful even if covers a reasonably large subset.

> + if err != nil && strings.Contains(err.String(), "no child processes") {
> + // If we can't wait for the server, it's possible that it was running
> + // but not as a child of this process, so give it a little while
> + // to exit. TODO poll with kill(no signal)?
> + time.Sleep(0.5e9)
> + }
>
> Comparing string errors like this is really bad. There should be
> an error code that we can check.

i'm of two minds about this. yes, there is an error code we can
check (ECHILD) but the check is not very nice:

if e, ok := err.(os.SyscallError); ok && e.Errno == os.ECHILD || err
== os.ECHILD {
}

it would be nice if os documented the type of its error returns.

> Then, yeah, a sleep isn't a nice
> approach.

unfortunately a sleep is the only way to do the wait in this case AFAIK.
the todo should be done though.

> [10]
>
> if s.zkServer != nil {
> - s.zkServer.Destroy()
> + s.zkServer.Stop() // TODO Destroy
> }
>
> This must be done.

yes, but i needed to see what was going on when the tests were failing.
i think gocheck should probably have an option to not delete
temp directories when the tests fail.

> [11]
>
> Where are the tests for the service?

none yet, because i suspected you might not like the idea.
seems i was right, sigh.

On 5 October 2011 16:32, Gustavo Niemeyer <gustavo@niemeyer.net> wrote:
> Review: Disapprove
>
> This is an interesting idea, but at the same time it's full of problems and
> not really necessary right now.  I suggest we delay it until a future point
> when we actually need that abstraction, as we'll be in a better position
> to generalize it. At that point, we can even preserve for the internal
> implementation of the ZooKeeper server itself, without many problems.

ok, i'll do that if you don't mind.

the thing that was mostly bothering me was the mingling of the code
from Server and Service, and that the complexities of Service were
obscuring the essential simplicity of Server.

BTW many of the comments below are applicable even without
the factored-out service package.

> Here is some feedback, regardless, which I suggest you don't work on
> right now besides for [1] which should be integrated.
>
> [1]
>
> === added file 'reattach_test.go'
> --- reattach_test.go    1970-01-01 00:00:00 +0000
> +++ reattach_test.go    2011-10-05 14:37:11 +0000
>
> This should be in a separate merge proposal as it's important and unrelated
> to the strawman changes proposed.

oh, i thought it was in the earlier merge.

i've added to the update-server-interface, if that's ok.

> +type Interface interface {
> +       // Directory gives the path to the directory containing
> +       // the server's configuration and data files. It is assumed that
> +       // this exists and can be modified.
> +       Directory() string
>
> s/Directory/Dir/

actually maybe RunDir

> [3]
>
> +
> +       // CheckAvailability should return an error if resources
> +       // required by the server are not available; for instance
> +       // if the server requires a network port which is not free.
> +       CheckAvailability() os.Error
>
> We don't need this.  The Command function can return an error if
> it knows it cannot start.

perhaps. i preferred to keep the availability checking separate
so that calling Command didn't imply that the service was
going to be started immediately.

> [4]
>
> +       l, err := net.Listen("tcp", fmt.Sprintf("localhost:%d", port))
> +       if err != nil {
> +               return fmt.Errorf("cannot listen on port %v: %v", port, err)
> +       }
> +       l.Close()
>
> This doesn't really guarantee that when the service itself is started,
> the port will be available, so we can as well not do it.

i wonder if the Start should print stdout and stderr from
the server for the first half second or so after it's been started.
that's a possible solution, but makes things slower.

> [5]
>
> +// Process returns a reference to the identity
> +// of the last running server in the Service's directory.
> +// If the server was shut down, it returns (nil, nil).
>
> This shouldn't be in a generic service API.  What if there
> are multiple processes handled by the service?

my plan was to change things so the server is started in
its own process group or session or whatever the appropriate
thing is these days.

> [6]
>
> +// If the server was shut down, it returns (nil, nil).
>
> If we're returning a nil *Process, we necessarily need err != nil.

changed in update-server-interface.

> [7]
>
> +// It stores the process ID of the server in the file "pid.txt"
> +// inside the server's directory. Stdout and stderr of the server
> +// are similarly written to "log.txt".
>
> That's not nice.  Who owns the content of this directory, the
> interface implementation, or the Service?  We can't write
> arbitrary content into a directory of different ownership.

> Also, a generic interface would need to do better in terms
> of logging. We could have a file somewhere named output.txt, maybe,
> and let logging with the underlying implementation.

i'm not sure quite what you mean by that.
how would "output.txt" differ from "log.txt" above?

> [8]
>
> +       if err := p.Kill(); err != nil && !strings.Contains(err.String(), "no such process") {
> +               return fmt.Errorf("cannot kill server process: %v", err)
> +       }
>
> Different services are stopped in different ways. Some prefer a SIGTERM,
> before a SIGKILL, for instance.  And what happens if there are multiple
> processes?

> +       if err != nil && strings.Contains(err.String(), "no child processes") {
> +               // If we can't wait for the server, it's possible that it was running
> +               // but not as a child of this process, so give it a little while
> +               // to exit. TODO poll with kill(no signal)?
> +               time.Sleep(0.5e9)
> +       }
>
> Comparing string errors like this is really bad. There should be
> an error code that we can check.

i'm of two minds about this. yes, there is an error code we can
check (ECHILD) but the check is not very nice:

if e, ok := err.(os.SyscallError); ok && e.Errno == os.ECHILD || err
== os.ECHILD {
}

it would be nice if os documented the type of its error returns.

> Then, yeah, a sleep isn't a nice
> approach.

unfortunately a sleep is the only way to do the wait in this case AFAIK.
the todo should be done though.

> [10]
>
>        if s.zkServer != nil {
> -               s.zkServer.Destroy()
> +               s.zkServer.Stop()               // TODO Destroy
>        }
>
> This must be done.

yes, but i needed to see what was going on when the tests were failing.
i think gocheck should probably have an option to not delete
temp directories when the tests fail.

> [11]
>
> Where are the tests for the service?

none yet, because i suspected you might not like the idea.
seems i was right, sigh.

« Back to merge proposal