Code review comment for lp:~themue/juju-core/057-tailer

Revision history for this message
Roger Peppe (rogpeppe) wrote :

Getting there! A few more thoughts and suggestions below.

https://codereview.appspot.com/36540043/diff/40001/utils/tailer/tailer.go
File utils/tailer/tailer.go (right):

https://codereview.appspot.com/36540043/diff/40001/utils/tailer/tailer.go#newcode41
utils/tailer/tailer.go:41: // Writer. The reading beginns the specified
number of matching lines
s/beginns/begins/

https://codereview.appspot.com/36540043/diff/40001/utils/tailer/tailer.go#newcode43
utils/tailer/tailer.go:43: func NewStandardTailer(readSeeker
io.ReadSeeker, writer io.Writer, lines int, filter TailerFilterFunc)
*Tailer {
Given that this function is going to be the one that everyone calls, I
think I'd name it NewTailer, but do we actually need to expose buffer
size and poll interval publicly except to the tailer tests?

https://codereview.appspot.com/36540043/diff/40001/utils/tailer/tailer.go#newcode126
utils/tailer/tailer.go:126: readBuffer := make([]byte, t.bufsize)
This seems rather complex to me - I don't fully trust myself to vet the
logic. Also, I don't think we need to have a separate line buffer and
read buffer - we can use a single buffer and read directly into it. That
way we can avoid allocating a new buffer every time round the loop too,
although it is necessary to copy something from the start to the end.

I started trying to explain a better possiblity, but got carried away...
Here's the kind of thing I mean. Still somewhat complex, but I've tried
to keep the invariants simple. It needs testing (I've left in
one deliberate mistake that I hope to see a test for). I suggest
some internal tests that test this function in isolation with
quite a few different inputs.

// seekLastLines sets the read position of the ReadSeeker to the
// wanted number of filtered lines before the end.
func (t *Tailer) seekLastLines() error {
 offset, err := t.readSeeker.Seek(0, os.SEEK_END)
 if err != nil {
  return err
 }
 seekPos := int64(0)
 found := 0
 buf := make([]byte, minRead)
SeekLoop:
 for offset > 0 {
  // buf contains the data left over from the
  // previous iteration.
  space := cap(buf) - len(buf)
  if space < minRead {
   // grow buffer
   newBuf := make([]byte, len(buf), cap(buf)*2)
   copy(newBuf, buf)
   buf = newBuf
   space = cap(buf) - len(buf)

  }
  if int64(space) > offset {
   // Use exactly the right amount of space if there's
   // only a small amount remaining.
   space = int(offset)
  }
  // copy data remaining from last time to the end of the buffer,
  // so we can read into the right place.
  copy(buf[space:cap(buf)], buf)
  buf = buf[0 : len(buf)+space]

  offset -= int64(space)
  _, err := t.readSeeker.Seek(offset, os.SEEK_SET)
  if err != nil {
   return err
  }
  _, err = io.ReadFull(t.readSeeker, buf[0:space])
  if err != nil {
   return err
  }
  // Find the end of the last line in the buffer.
  // This will discard any unterminated line at the end
  // of the file.
  end := bytes.LastIndex(buf, delim)
  if end == -1 {
   // No end of line found - discard incomplete
   // line and continue looking. If this happens
   // at the beginning of the file, we don't care
   // because we're going to stop anyway.
   buf = buf[:0]
   continue
  }
  end++
  for {
   start := bytes.LastIndex(buf[0:end-1], delim)
   if start == -1 {
    break
   }
   start++
   if t.isValid(buf[start:end]) {
    found++
    if found >= t.lines {
     seekPos = offset + int64(start)
     break SeekLoop
    }
   }
   end = start - 1
  }
  // Leave the last line in buf, as we don't know whether
  // it's complete or not.
  buf = buf[0:end]
 }
 // Final positioning.
 t.readSeeker.Seek(seekPos, os.SEEK_SET)
 return nil
}

https://codereview.appspot.com/36540043/diff/40001/utils/tailer/tailer.go#newcode198
utils/tailer/tailer.go:198: // Reached beginnig of data.
s/beginnig/beginning/

https://codereview.appspot.com/36540043/diff/40001/utils/tailer/tailer.go#newcode212
utils/tailer/tailer.go:212: buffer, err := t.reader.ReadBytes(delimiter)
ReadBytes can't return ErrBufferFull.

I think you want to use ReadSlice.
Also, if the first call succeeds, we should just return the slice that
we got from ReadSlice - that way in the usual case that the line is
shorter than the bufio buffer we can avoid any allocation.

https://codereview.appspot.com/36540043/diff/40001/utils/tailer/tailer.go#newcode226
utils/tailer/tailer.go:226: if len(line) == 0 || line[len(line)-1] ==
delimiter {
This can't happen. (from the docs: "returns err != nil if and only if
the returned data does not end in delim")

https://codereview.appspot.com/36540043/diff/40001/utils/tailer/tailer.go#newcode234
utils/tailer/tailer.go:234: t.readSeeker.Seek(offset, os.SEEK_END)
This is racy. We can't seek from the end because the end might be
constantly changing. I think we need to keep track of where we're
reading in the file and seek back to the absolute offset of the start of
the partial line.

https://codereview.appspot.com/36540043/diff/40001/utils/tailer/tailer.go#newcode247
utils/tailer/tailer.go:247: return false
Why?

https://codereview.appspot.com/36540043/diff/40001/utils/tailer/tailer.go#newcode252
utils/tailer/tailer.go:252: return t.filter(line)
Consider trimming \r?\n from the end of the line before calling filter?

https://codereview.appspot.com/36540043/diff/40001/utils/tailer/tailer_test.go
File utils/tailer/tailer_test.go (right):

https://codereview.appspot.com/36540043/diff/40001/utils/tailer/tailer_test.go#newcode4
utils/tailer/tailer_test.go:4: package tailer_test
I'd like to see some unit tests of seekLastLines in here. There's a lot
of logic in there that is untested here (with respect to long lines,
line boundaries, read errors etc)

https://codereview.appspot.com/36540043/

« Back to merge proposal