https://codereview.appspot.com/36540043/diff/40001/utils/tailer/tailer.go#newcode43
utils/tailer/tailer.go:43: func NewStandardTailer(readSeeker
io.ReadSeeker, writer io.Writer, lines int, filter TailerFilterFunc)
*Tailer {
Given that this function is going to be the one that everyone calls, I
think I'd name it NewTailer, but do we actually need to expose buffer
size and poll interval publicly except to the tailer tests?
https://codereview.appspot.com/36540043/diff/40001/utils/tailer/tailer.go#newcode126
utils/tailer/tailer.go:126: readBuffer := make([]byte, t.bufsize)
This seems rather complex to me - I don't fully trust myself to vet the
logic. Also, I don't think we need to have a separate line buffer and
read buffer - we can use a single buffer and read directly into it. That
way we can avoid allocating a new buffer every time round the loop too,
although it is necessary to copy something from the start to the end.
I started trying to explain a better possiblity, but got carried away...
Here's the kind of thing I mean. Still somewhat complex, but I've tried
to keep the invariants simple. It needs testing (I've left in
one deliberate mistake that I hope to see a test for). I suggest
some internal tests that test this function in isolation with
quite a few different inputs.
// seekLastLines sets the read position of the ReadSeeker to the
// wanted number of filtered lines before the end.
func (t *Tailer) seekLastLines() error {
offset, err := t.readSeeker.Seek(0, os.SEEK_END)
if err != nil {
return err
}
seekPos := int64(0)
found := 0
buf := make([]byte, minRead)
SeekLoop:
for offset > 0 {
// buf contains the data left over from the
// previous iteration.
space := cap(buf) - len(buf)
if space < minRead {
// grow buffer
newBuf := make([]byte, len(buf), cap(buf)*2)
copy(newBuf, buf)
buf = newBuf
space = cap(buf) - len(buf)
}
if int64(space) > offset {
// Use exactly the right amount of space if there's
// only a small amount remaining.
space = int(offset)
}
// copy data remaining from last time to the end of the buffer,
// so we can read into the right place.
copy(buf[space:cap(buf)], buf)
buf = buf[0 : len(buf)+space]
offset -= int64(space)
_, err := t.readSeeker.Seek(offset, os.SEEK_SET)
if err != nil {
return err
}
_, err = io.ReadFull(t.readSeeker, buf[0:space])
if err != nil {
return err
}
// Find the end of the last line in the buffer.
// This will discard any unterminated line at the end
// of the file.
end := bytes.LastIndex(buf, delim)
if end == -1 {
// No end of line found - discard incomplete
// line and continue looking. If this happens
// at the beginning of the file, we don't care
// because we're going to stop anyway.
buf = buf[:0]
continue
}
end++
for {
start := bytes.LastIndex(buf[0:end-1], delim)
if start == -1 {
break
}
start++
if t.isValid(buf[start:end]) {
found++
if found >= t.lines {
seekPos = offset + int64(start)
break SeekLoop
}
}
end = start - 1
}
// Leave the last line in buf, as we don't know whether
// it's complete or not.
buf = buf[0:end]
}
// Final positioning.
t.readSeeker.Seek(seekPos, os.SEEK_SET)
return nil
}
I think you want to use ReadSlice.
Also, if the first call succeeds, we should just return the slice that
we got from ReadSlice - that way in the usual case that the line is
shorter than the bufio buffer we can avoid any allocation.
https://codereview.appspot.com/36540043/diff/40001/utils/tailer/tailer.go#newcode234
utils/tailer/tailer.go:234: t.readSeeker.Seek(offset, os.SEEK_END)
This is racy. We can't seek from the end because the end might be
constantly changing. I think we need to keep track of where we're
reading in the file and seek back to the absolute offset of the start of
the partial line.
Getting there! A few more thoughts and suggestions below.
https:/ /codereview. appspot. com/36540043/ diff/40001/ utils/tailer/ tailer. go tailer. go (right):
File utils/tailer/
https:/ /codereview. appspot. com/36540043/ diff/40001/ utils/tailer/ tailer. go#newcode41 tailer. go:41: // Writer. The reading beginns the specified
utils/tailer/
number of matching lines
s/beginns/begins/
https:/ /codereview. appspot. com/36540043/ diff/40001/ utils/tailer/ tailer. go#newcode43 tailer. go:43: func NewStandardTail er(readSeeker
utils/tailer/
io.ReadSeeker, writer io.Writer, lines int, filter TailerFilterFunc)
*Tailer {
Given that this function is going to be the one that everyone calls, I
think I'd name it NewTailer, but do we actually need to expose buffer
size and poll interval publicly except to the tailer tests?
https:/ /codereview. appspot. com/36540043/ diff/40001/ utils/tailer/ tailer. go#newcode126 tailer. go:126: readBuffer := make([]byte, t.bufsize)
utils/tailer/
This seems rather complex to me - I don't fully trust myself to vet the
logic. Also, I don't think we need to have a separate line buffer and
read buffer - we can use a single buffer and read directly into it. That
way we can avoid allocating a new buffer every time round the loop too,
although it is necessary to copy something from the start to the end.
I started trying to explain a better possiblity, but got carried away...
Here's the kind of thing I mean. Still somewhat complex, but I've tried
to keep the invariants simple. It needs testing (I've left in
one deliberate mistake that I hope to see a test for). I suggest
some internal tests that test this function in isolation with
quite a few different inputs.
// seekLastLines sets the read position of the ReadSeeker to the Seek(0, os.SEEK_END)
// wanted number of filtered lines before the end.
func (t *Tailer) seekLastLines() error {
offset, err := t.readSeeker.
if err != nil {
return err
}
seekPos := int64(0)
found := 0
buf := make([]byte, minRead)
SeekLoop:
for offset > 0 {
// buf contains the data left over from the
// previous iteration.
space := cap(buf) - len(buf)
if space < minRead {
// grow buffer
newBuf := make([]byte, len(buf), cap(buf)*2)
copy(newBuf, buf)
buf = newBuf
space = cap(buf) - len(buf)
} buf[space: cap(buf) ], buf)
if int64(space) > offset {
// Use exactly the right amount of space if there's
// only a small amount remaining.
space = int(offset)
}
// copy data remaining from last time to the end of the buffer,
// so we can read into the right place.
copy(
buf = buf[0 : len(buf)+space]
offset -= int64(space) Seek(offset, os.SEEK_SET) t.readSeeker, buf[0:space]) (buf, delim) (buf[0: end-1], delim) buf[start: end]) { Seek(seekPos, os.SEEK_SET)
_, err := t.readSeeker.
if err != nil {
return err
}
_, err = io.ReadFull(
if err != nil {
return err
}
// Find the end of the last line in the buffer.
// This will discard any unterminated line at the end
// of the file.
end := bytes.LastIndex
if end == -1 {
// No end of line found - discard incomplete
// line and continue looking. If this happens
// at the beginning of the file, we don't care
// because we're going to stop anyway.
buf = buf[:0]
continue
}
end++
for {
start := bytes.LastIndex
if start == -1 {
break
}
start++
if t.isValid(
found++
if found >= t.lines {
seekPos = offset + int64(start)
break SeekLoop
}
}
end = start - 1
}
// Leave the last line in buf, as we don't know whether
// it's complete or not.
buf = buf[0:end]
}
// Final positioning.
t.readSeeker.
return nil
}
https:/ /codereview. appspot. com/36540043/ diff/40001/ utils/tailer/ tailer. go#newcode198 tailer. go:198: // Reached beginnig of data. beginning/
utils/tailer/
s/beginnig/
https:/ /codereview. appspot. com/36540043/ diff/40001/ utils/tailer/ tailer. go#newcode212 tailer. go:212: buffer, err := t.reader. ReadBytes( delimiter)
utils/tailer/
ReadBytes can't return ErrBufferFull.
I think you want to use ReadSlice.
Also, if the first call succeeds, we should just return the slice that
we got from ReadSlice - that way in the usual case that the line is
shorter than the bufio buffer we can avoid any allocation.
https:/ /codereview. appspot. com/36540043/ diff/40001/ utils/tailer/ tailer. go#newcode226 tailer. go:226: if len(line) == 0 || line[len(line)-1] ==
utils/tailer/
delimiter {
This can't happen. (from the docs: "returns err != nil if and only if
the returned data does not end in delim")
https:/ /codereview. appspot. com/36540043/ diff/40001/ utils/tailer/ tailer. go#newcode234 tailer. go:234: t.readSeeker. Seek(offset, os.SEEK_END)
utils/tailer/
This is racy. We can't seek from the end because the end might be
constantly changing. I think we need to keep track of where we're
reading in the file and seek back to the absolute offset of the start of
the partial line.
https:/ /codereview. appspot. com/36540043/ diff/40001/ utils/tailer/ tailer. go#newcode247 tailer. go:247: return false
utils/tailer/
Why?
https:/ /codereview. appspot. com/36540043/ diff/40001/ utils/tailer/ tailer. go#newcode252 tailer. go:252: return t.filter(line)
utils/tailer/
Consider trimming \r?\n from the end of the line before calling filter?
https:/ /codereview. appspot. com/36540043/ diff/40001/ utils/tailer/ tailer_ test.go tailer_ test.go (right):
File utils/tailer/
https:/ /codereview. appspot. com/36540043/ diff/40001/ utils/tailer/ tailer_ test.go# newcode4 tailer_ test.go: 4: package tailer_test
utils/tailer/
I'd like to see some unit tests of seekLastLines in here. There's a lot
of logic in there that is untested here (with respect to long lines,
line boundaries, read errors etc)
https:/ /codereview. appspot. com/36540043/