searchkit:enhancement/faster-search-since-constraint-ed-extras

Last commit made on 2023-09-19
Get this branch:
git clone -b enhancement/faster-search-since-constraint-ed-extras https://git.launchpad.net/searchkit

Branch information

Name:
enhancement/faster-search-since-constraint-ed-extras
Repository:
lp:searchkit

Recent commits

8d440cc... by Edward Hope-Morley

Some fixups

 * logs
 * some code style
 * ensure the first and last lines are checked before
   the full bisect

12a6c36... by Mustafa Kemal Gilor

searchkit/constraints: rewrite of binary search algorithm

Implemented a new binary search algorithm that no longer needs
filemarkers or knowing the lines beforehand, which reduces the
time spent applying a SearchConstraintSearchSince to a file,
especially if the file is large in size.

Removed the following classes which are no longer necessary:

- SkipRange
- SkipRangeOverlapException
- BinarySearchState
- FileMarkers (and respective unit tests)
- SeekInfo

Removed `test_logs_since_junk_not_allow_unverifiable` test case since
we're no longer parsing all lines in the file.

Removed the following functions from BinarySeekSearchBase:

- _seek_and_validate
- _check_line
- _seek_next

Introduced the following new classes:

- LogFileDateSinceOffsetSeeker (the main binary search class)
- DateSearchFailedAtOffset (exception type)
- NoLogsFoundSince (exception type)
- NoDateFoundInLogs (exception type)

Signed-off-by: Mustafa Kemal Gilor <email address hidden>
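
The idea described in the commit above, bisecting on byte offsets instead of indexing every line first, can be sketched roughly as follows. This is a minimal illustration and not the code from LogFileDateSinceOffsetSeeker: the function names are hypothetical, and it assumes every line starts with a fixed-width "YYYY-MM-DD HH:MM:SS" timestamp and that lines are in chronological order. Only the first-line fast path is shown; the last-line check is omitted.

```python
import io
from datetime import datetime

TS_FORMAT = "%Y-%m-%d %H:%M:%S"
TS_LEN = len("2023-09-19 12:00:00")  # assumed fixed-width timestamp prefix


def line_start_at_or_after(f, offset):
    """Return the offset of the first line that starts at or after offset."""
    if offset == 0:
        return 0
    f.seek(offset - 1)
    f.readline()  # consume up to and including the next newline
    return f.tell()


def read_timestamp(f, offset):
    """Parse the timestamp of the line starting at offset (hypothetical helper)."""
    f.seek(offset)
    line = f.readline()
    return datetime.strptime(line[:TS_LEN].decode("ascii"), TS_FORMAT)


def find_since_offset(f, since):
    """Return the offset of the first line with timestamp >= since, else None."""
    f.seek(0, io.SEEK_END)
    size = f.tell()
    if size == 0:
        return None
    # Fast path: if the first line already qualifies, no bisect is needed.
    if read_timestamp(f, 0) >= since:
        return 0

    def at_or_past(offset):
        start = line_start_at_or_after(f, offset)
        # Running off the end of the file counts as "past" the target.
        return start >= size or read_timestamp(f, start) >= since

    lo, hi = 0, size
    while lo < hi:
        mid = (lo + hi) // 2
        if at_or_past(mid):
            hi = mid
        else:
            lo = mid + 1
    start = line_start_at_or_after(f, lo)
    return None if start >= size else start


sample = io.BytesIO(
    b"2023-09-19 10:00:00 a\n"
    b"2023-09-19 11:00:00 b\n"
    b"2023-09-19 12:00:00 c\n"
)
offset = find_since_offset(sample, datetime(2023, 9, 19, 10, 30))
```

Each probe reads at most two lines, so the number of lines parsed grows with the logarithm of the file size rather than with the number of lines in the file.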

b9e5674... by Edward Hope-Morley

Support alternate unicode decode error handlers

Instead of only supporting "strict" mode and silently skipping
files that raise a UnicodeDecodeError, we now raise the error
and add a new "decode_errors" kwarg to FileSearcher that supports
setting alternate handlers such as backslashreplace, ignore etc.

Also fixes the unit test logger.
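
For reference, the handler names mentioned above are Python's standard codec error handlers. The sketch below uses only bytes.decode from the standard library to show how they differ; decode_line is a hypothetical helper and does not show how FileSearcher uses its "decode_errors" kwarg internally.

```python
# 0xe9 followed by a space is not a valid UTF-8 sequence.
raw = b"2023-09-19 job caf\xe9 done\n"


def decode_line(data, errors="strict"):
    """Decode one line of bytes with the given codec error handler."""
    return data.decode("utf-8", errors=errors)


try:
    decode_line(raw)  # "strict" raises on the invalid byte
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

ignored = decode_line(raw, errors="ignore")            # drops the bad byte
escaped = decode_line(raw, errors="backslashreplace")  # keeps it as \xe9 text
```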

94a264a... by Edward Hope-Morley

Configure logging handler if none exists

ab21741... by Edward Hope-Morley

Set logger name
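
A common way to do what the two commits above describe is to use a named logger and attach a handler only when none is present. This is a sketch under assumptions: the logger name "searchkit_example" and the ensure_handler helper are hypothetical and not taken from the repository.

```python
import logging


def ensure_handler(logger):
    """Attach a stream handler only if the logger has none of its own."""
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


log = ensure_handler(logging.getLogger("searchkit_example"))
# Calling it again is a no-op, so messages are not emitted twice.
ensure_handler(log)
```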

cede539... by Edward Hope-Morley

Remove dependency on _gdbm

This import is not part of the stdlib, so we now use dbm, which is.
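
As an illustration of the stdlib module named above, dbm.open picks whichever backend is available on the system, so the caller does not import a specific implementation. The roundtrip helper below is hypothetical and does not reflect how searchkit stores its data.

```python
import dbm
import os
import tempfile


def roundtrip(path):
    """Write one key with the generic dbm interface and read it back."""
    with dbm.open(path, "c") as db:  # "c": create the database if missing
        db[b"key"] = b"value"
    with dbm.open(path, "r") as db:
        return db[b"key"]


with tempfile.TemporaryDirectory() as tmpdir:
    stored = roundtrip(os.path.join(tmpdir, "cache"))
```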

bcfd6e3... by Edward Hope-Morley

Improve full search result debug message

Adds search def tag to message to help identify inefficient
search expressions.

c4a0b3b... by Edward Hope-Morley

Fixes case where search paths overlap

If the paths used to register searches overlap once
expanded, they will cause the same file to be searched
concurrently which breaks the MPCache and is also
superfluous. This patch fixes that problem and applies
some minor optimisations to the way we extract the
datetime from the start of each line to apply constraints.

Also fixes support for applying constraints to files
containing unicode characters by ensuring that we escape
rather than decode those characters.
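
The overlap problem described above can be avoided by de-duplicating paths after expansion. The sketch below is an assumption about the general approach and not the patch itself; expand_unique is a hypothetical helper built on glob and os.path from the standard library.

```python
import glob
import os
import tempfile


def expand_unique(patterns):
    """Expand glob patterns and return each matching file exactly once."""
    seen = set()
    files = []
    for pattern in patterns:
        for path in sorted(glob.glob(pattern)):
            real = os.path.realpath(path)  # collapse symlinks and aliases
            if os.path.isfile(real) and real not in seen:
                seen.add(real)
                files.append(real)
    return files


with tempfile.TemporaryDirectory() as tmpdir:
    for name in ("a.log", "b.log"):
        with open(os.path.join(tmpdir, name), "w") as f:
            f.write("2023-09-19 10:00:00 example\n")
    # Both patterns match a.log, but it is returned only once.
    unique = expand_unique(
        [os.path.join(tmpdir, "*.log"), os.path.join(tmpdir, "a.log")]
    )
```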

c75c1f4... by Edward Hope-Morley

Merge pull request #8 from nicolasbock/setup.cfg

Add setup.cfg

86ae817... by Nicolas Bock

Add setup.cfg

Signed-off-by: Nicolas Bock <email address hidden>