searchkit:ipc-improvements

Last commit made on 2024-05-11
Get this branch:
git clone -b ipc-improvements https://git.launchpad.net/searchkit

Branch information

Name:
ipc-improvements
Repository:
lp:searchkit

Recent commits

2e83b4c... by Edward Hope-Morley

Optimise data transfer between principal and worker procs

SearchResultMinimal is used to transfer results between
the worker processes and the main collector. As a result,
it must be as small as possible in order to keep transfers
fast and the memory footprint low. Several unnecessary
variables were being stored and duplicated in this object;
they have now been removed, making transfers faster and
reducing memory use. Transfers are also batched so as to
reduce interruptions to searches.

Also removes unnecessary use of multiprocessing.Queue for
the single-threaded use case.
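
The idea of a compact result object sent in batches can be sketched as
follows. This is only an illustration, not the searchkit implementation:
MinimalResult, batched and send_results are hypothetical names, and the
fields chosen are assumptions. The "put" callable stands in for whatever
transport is used, e.g. a queue's put() with several workers or a plain
list's extend() when everything runs in one process.

```python
from typing import Callable, Iterable, Iterator, List, Tuple


class MinimalResult:
    """Hypothetical compact result holding only what a collector needs."""

    # __slots__ avoids a per-instance __dict__, keeping each object small.
    __slots__ = ("linenumber", "source_id", "groups")

    def __init__(self, linenumber: int, source_id: int,
                 groups: Tuple[str, ...]):
        self.linenumber = linenumber
        self.source_id = source_id
        self.groups = groups


def batched(results: Iterable[MinimalResult],
            batch_size: int) -> Iterator[List[MinimalResult]]:
    """Yield results in lists of at most batch_size items."""
    batch: List[MinimalResult] = []
    for result in results:
        batch.append(result)
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:
        # Flush whatever is left over at the end.
        yield batch


def send_results(results: Iterable[MinimalResult],
                 put: Callable[[List[MinimalResult]], None],
                 batch_size: int = 100) -> None:
    """Hand results to put() one batch at a time instead of one by one."""
    for batch in batched(results, batch_size):
        put(batch)
```

With a single process no queue is needed: passing collected.extend as
"put" appends each batch straight onto a local list.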

f2c5882... by Edward Hope-Morley

Remove dup ms from logging format

e9bab7a... by Edward Hope-Morley

Remove pylint disables

Re-enables:
 * unidiomatic-typecheck
 * unsubscriptable-object
 * consider-using-with
 * fixme
 * no-value-for-parameter
 * attribute-defined-outside-init
 * arguments-differ
 * consider-using-dict-items
 * unused-private-member

f02f7c1... by Edward Hope-Morley

Fix single line constraint

When applying constraints to a single line, if we were
unable to apply one we used to assume this meant the line
was valid. The effect was that the searcher would treat
all subsequent lines as valid for that searchdef, which
led to false positives. We now treat this scenario as a
pass *only* if it is part of a set of constraints being
applied to a searchdef where *all* others have passed.
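
The rule described above can be sketched as a small function over
three-valued outcomes. This is an illustration only: line_is_valid is a
hypothetical name, and using None for "constraint could not be applied"
is an assumption. The commit message does not say what happens when no
constraint at all could be verified; this sketch assumes such a line is
not treated as valid.

```python
from typing import Iterable, Optional


def line_is_valid(outcomes: Iterable[Optional[bool]]) -> bool:
    """
    Combine per-constraint outcomes for one line.

    True  = constraint passed
    False = constraint failed
    None  = constraint could not be applied to this line
    """
    outcomes = list(outcomes)
    if not outcomes:
        # Nothing to apply, so nothing can reject the line.
        return True
    if any(outcome is False for outcome in outcomes):
        # A definite failure always rejects the line.
        return False
    # An unverifiable constraint only counts as a pass when at least one
    # other constraint passed (assumption, see the note above).
    return any(outcome is True for outcome in outcomes)
```

For example, line_is_valid([True, None]) is accepted, while
line_is_valid([None]) and line_is_valid([False, None]) are rejected.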

97b4ef4... by Edward Hope-Morley

Update tox and pylint

Fix lint errors as necessary.

79f44f3... by Edward Hope-Morley

Remove deprecated args and MPCacheSharded

6ae75b9... by Edward Hope-Morley

Clean up the new binary search code and add extras

 * reduces size of log messages to essential information
 * cleans up code style consistency and docstring
 * ensures first and last lines are checked before the
   full bisect
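
Checking the first and last entries before bisecting can be sketched on
an in-memory, sorted list of timestamps. This is not the searchkit code:
first_index_since is a hypothetical name and the real implementation
works on file offsets rather than a list.

```python
import bisect
from datetime import datetime
from typing import List, Optional


def first_index_since(timestamps: List[datetime],
                      since: datetime) -> Optional[int]:
    """
    Return the index of the first timestamp >= since, or None if there is
    no such entry. The list must be sorted in ascending order.
    """
    if not timestamps:
        return None
    if timestamps[0] >= since:
        # The first entry already qualifies, so everything does.
        return 0
    if timestamps[-1] < since:
        # Even the last entry is too old, so nothing qualifies.
        return None
    # Only now pay for the full bisect.
    return bisect.bisect_left(timestamps, since)
```

The two boundary checks settle the common "everything matches" and
"nothing matches" cases with two comparisons.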

d917a81... by Edward Hope-Morley

Merge pull request #10 from mustafakemalgilor/enhancement/faster-search-since-constraint

searchkit/constraints: rewrite of binary search algorithm

12a6c36... by Mustafa Kemal Gilor

searchkit/constraints: rewrite of binary search algorithm

Implemented a new binary search algorithm that no longer needs
filemarkers or knowing the lines beforehand, which reduces the
time spent applying a SearchConstraintSearchSince to a file,
especially for large files.
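
A binary search over byte offsets, with no index of line positions built
beforehand, might look roughly like the sketch below. It is not the
LogFileDateSinceOffsetSeeker implementation: the function names are
hypothetical, and it assumes every line starts with a
"YYYY-MM-DD HH:MM:SS" timestamp and that lines are in ascending date
order. Lines without a date are not handled here.

```python
import os
from datetime import datetime
from typing import BinaryIO, Optional


def _line_start_at_or_after(fd: BinaryIO, offset: int) -> int:
    """Return the offset of the first line starting at or after offset."""
    if offset == 0:
        fd.seek(0)
        return 0
    # Read from the previous byte: if it is a newline we stay at offset,
    # otherwise we skip the remainder of the line we landed in.
    fd.seek(offset - 1)
    fd.readline()
    return fd.tell()


def find_offset_since(fd: BinaryIO, since: datetime) -> Optional[int]:
    """Return the byte offset of the first line dated >= since, or None."""
    fd.seek(0, os.SEEK_END)
    size = fd.tell()
    low, high = 0, size
    while low < high:
        mid = (low + high) // 2
        _line_start_at_or_after(fd, mid)
        line = fd.readline()
        if not line:
            # No line starts at or after mid; look earlier.
            high = mid
            continue
        date = datetime.strptime(line[:19].decode("ascii"),
                                 "%Y-%m-%d %H:%M:%S")
        if date >= since:
            high = mid
        else:
            low = mid + 1
    start = _line_start_at_or_after(fd, low)
    return start if start < size else None
```

Each probe reads at most two lines, so the number of lines parsed grows
with the logarithm of the file size rather than with the line count.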

Removed the following classes which are no longer necessary:

- SkipRange
- SkipRangeOverlapException
- BinarySearchState
- FileMarkers (and respective unit tests)
- SeekInfo

Removed `test_logs_since_junk_not_allow_unverifiable` test case since
we're no longer parsing all lines in the file.

Removed the following functions from BinarySeekSearchBase:

- _seek_and_validate
- _check_line
- _seek_next

Introduced the following new classes:

- LogFileDateSinceOffsetSeeker (the main binary search class)
- DateSearchFailedAtOffset (exception type)
- NoLogsFoundSince (exception type)
- NoDateFoundInLogs (exception type)

Signed-off-by: Mustafa Kemal Gilor <email address hidden>

b9e5674... by Edward Hope-Morley

Support alternate unicode decode error handlers

Instead of only supporting "strict" mode and silently skipping
files that raise a UnicodeDecodeError, we now raise the error
and add a new "decode_errors" kwarg to FileSearcher that supports
setting alternate handlers such as backslashreplace, ignore etc.

Also fixes the unit test logger.
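
The effect of the different handlers can be seen with Python's built-in
bytes.decode(), independently of searchkit. The decode_line helper below
is hypothetical and only illustrates the standard handler names; it does
not show how FileSearcher uses its "decode_errors" kwarg internally.

```python
def decode_line(raw: bytes, decode_errors: str = "strict") -> str:
    """Decode raw as UTF-8 using the given standard error handler."""
    return raw.decode("utf-8", errors=decode_errors)


# 0xE9 is "é" in Latin-1 but is not valid UTF-8 in this position.
RAW_LINE = b"caf\xe9 ok\n"

try:
    decode_line(RAW_LINE)  # "strict" raises on the bad byte
except UnicodeDecodeError as exc:
    print(f"strict: {exc.reason}")

print(repr(decode_line(RAW_LINE, "ignore")))            # bad byte dropped
print(repr(decode_line(RAW_LINE, "replace")))           # U+FFFD inserted
print(repr(decode_line(RAW_LINE, "backslashreplace")))  # kept as \xe9 text
```

"strict" is the safest default when a decode failure should be visible;
the other handlers trade fidelity for the ability to keep searching.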