Merge lp:~jpakkane/libcolumbus/hud-rework into lp:libcolumbus
- hud-rework
- Merge into trunk
Status: Merged
Approved by: Pete Woods
Approved revision: 485
Merged at revision: 461
Proposed branch: lp:~jpakkane/libcolumbus/hud-rework
Merge into: lp:libcolumbus
Diff against target: 468 lines (+291/-19), 9 files modified
  CMakeLists.txt (+1/-1), debian/changelog (+6/-0), include/Matcher.hh (+11/-0), include/WordStore.hh (+1/-0), src/MatchResults.cc (+4/-1), src/Matcher.cc (+71/-12), src/WordStore.cc (+8/-4), test/MatcherTest.cc (+171/-1), test/TrieTest.cc (+18/-0)
To merge this branch: bzr merge lp:~jpakkane/libcolumbus/hud-rework
Related bugs: none listed
Reviewer | Review Type | Date Requested | Status
---|---|---|---
Pete Woods (community) | | | Approve
PS Jenkins bot (community) | continuous-integration | | Approve

Review via email: mp+208799@code.launchpad.net
Commit message
Add new online search mode to get better performance in HUD.
Description of the change
Add new online search mode to get better performance in HUD.
483. By Jussi Pakkanen — Bumped version number.
Pete Woods (pete-woods) wrote:
PS Jenkins bot (ps-jenkins) wrote:
PASSED: Continuous integration, rev:483
http://
Executed test runs:
SUCCESS: http://
SUCCESS: http://
SUCCESS: http://
Click here to trigger a rebuild:
http://
484. By Jussi Pakkanen — Some test cleanup.
Jussi Pakkanen (jpakkane) wrote:
Fixed.
PS Jenkins bot (ps-jenkins) wrote:
FAILED: Continuous integration, rev:484
http://
Executed test runs:
FAILURE: http://
FAILURE: http://
FAILURE: http://
Click here to trigger a rebuild:
http://
485. By Jussi Pakkanen — Blah.
PS Jenkins bot (ps-jenkins) wrote:
PASSED: Continuous integration, rev:485
http://
Executed test runs:
SUCCESS: http://
SUCCESS: http://
SUCCESS: http://
Click here to trigger a rebuild:
http://
Pete Woods (pete-woods) wrote:
Looks good to me! Tried with the dependent branch of HUD (lp:~pete-woods/hud/tweak-search-parameters).
Preview Diff
1 | === modified file 'CMakeLists.txt' |
2 | --- CMakeLists.txt 2013-08-09 19:35:42 +0000 |
3 | +++ CMakeLists.txt 2014-02-28 14:46:51 +0000 |
4 | @@ -38,7 +38,7 @@ |
5 | set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fvisibility=hidden") |
6 | |
7 | set(SO_VERSION_MAJOR "1") |
8 | -set(SO_VERSION_MINOR "0") |
9 | +set(SO_VERSION_MINOR "1") |
10 | set(SO_VERSION_PATCH "0") |
11 | |
12 | set(SO_VERSION "${SO_VERSION_MAJOR}.${SO_VERSION_MINOR}.${SO_VERSION_PATCH}") |
13 | |
14 | === modified file 'debian/changelog' |
15 | --- debian/changelog 2014-01-20 19:43:49 +0000 |
16 | +++ debian/changelog 2014-02-28 14:46:51 +0000 |
17 | @@ -1,3 +1,9 @@ |
18 | +libcolumbus (1.1.0-0ubuntu1) UNRELEASED; urgency=medium |
19 | + |
20 | + * New online search mode |
21 | + |
22 | + -- Jussi Pakkanen <jussi.pakkanen@ubuntu.com> Fri, 28 Feb 2014 15:44:25 +0200 |
23 | + |
24 | libcolumbus (1.0.0+14.04.20140120-0ubuntu1) trusty; urgency=low |
25 | |
26 | * New rebuild forced |
27 | |
28 | === modified file 'include/Matcher.hh' |
29 | --- include/Matcher.hh 2013-08-07 12:20:52 +0000 |
30 | +++ include/Matcher.hh 2014-02-28 14:46:51 +0000 |
31 | @@ -60,6 +60,17 @@ |
32 | void index(const Corpus &c); |
33 | ErrorValues& getErrorValues(); |
34 | IndexWeights& getIndexWeights(); |
35 | + /* |
36 | + * This function is optimized for online matches, that is, queries |
37 | + * that are live updated during typing. It uses slightly different |
38 | + * search heuristics to ensure results that "feel good" to humans. |
39 | + * |
40 | + * The second argument is the field that should be the primary focus. |
41 | + * Usually it means having the text that will be shown to the user. |
42 | + * As an example, in the HUD, this field would contain the command |
43 | + * (and nothing else) that will be executed. |
44 | + */ |
45 | + MatchResults onlineMatch(const WordList &query, const Word &primaryIndex); |
46 | }; |
47 | |
48 | COL_NAMESPACE_END |
49 | |
50 | === modified file 'include/WordStore.hh' |
51 | --- include/WordStore.hh 2013-06-14 12:26:10 +0000 |
52 | +++ include/WordStore.hh 2014-02-28 14:46:51 +0000 |
53 | @@ -49,6 +49,7 @@ |
54 | |
55 | |
56 | WordID getID(const Word &w); |
57 | + bool hasWord(const Word &w) const; |
58 | Word getWord(const WordID id) const; |
59 | bool hasWord(const WordID id) const; |
60 | }; |
61 | |
62 | === modified file 'src/MatchResults.cc' |
63 | --- src/MatchResults.cc 2013-06-20 12:25:11 +0000 |
64 | +++ src/MatchResults.cc 2014-02-28 14:46:51 +0000 |
65 | @@ -96,7 +96,10 @@ |
66 | if(p->sorted) |
67 | return; |
68 | MatchResults *me = const_cast<MatchResults*>(this); |
69 | - sort(me->p->results.rbegin(), me->p->results.rend()); |
70 | + stable_sort(me->p->results.rbegin(), me->p->results.rend(), |
71 | + [](const pair<double, DocumentID> &a, const pair<double, DocumentID> &b) -> bool{ |
72 | + return a.first < b.first; |
73 | + }); |
74 | me->p->sorted = true; |
75 | } |
76 | |
77 | |
78 | === modified file 'src/Matcher.cc' |
79 | --- src/Matcher.cc 2013-08-07 12:20:52 +0000 |
80 | +++ src/Matcher.cc 2014-02-28 14:46:51 +0000 |
81 | @@ -36,6 +36,7 @@ |
82 | #include <stdexcept> |
83 | #include <map> |
84 | #include <vector> |
85 | +#include <algorithm> |
86 | |
87 | #ifdef HAS_SPARSE_HASH |
88 | #include <google/sparse_hash_map> |
89 | @@ -98,6 +99,7 @@ |
90 | IndexWeights weights; |
91 | MatcherStatistics stats; |
92 | WordStore store; |
93 | + map<pair<DocumentID, WordID>, size_t> originalSizes; // Lengths of original documents. |
94 | }; |
95 | |
96 | void ReverseIndex::add(const WordID wordID, const WordID indexID, const DocumentID id) { |
97 | @@ -230,15 +232,6 @@ |
98 | } |
99 | } |
100 | |
101 | -static void expandQuery(const WordList &query, WordList &expandedQuery) { |
102 | - for(size_t i=0; i<query.size(); i++) |
103 | - expandedQuery.addWord(query[i]); |
104 | - |
105 | - for(size_t i=0; i<query.size()-1; i++) { |
106 | - expandedQuery.addWord(query[i].join(query[i+1])); |
107 | - } |
108 | -} |
109 | - |
110 | static bool subtermsMatch(MatcherPrivate *p, const ResultFilter &filter, size_t term, DocumentID id) { |
111 | for(size_t subTerm=0; subTerm < filter.numSubTerms(term); subTerm++) { |
112 | const Word &filterName = filter.getField(term, subTerm); |
113 | @@ -286,6 +279,10 @@ |
114 | const Word &fieldName = textNames[ti]; |
115 | const WordID fieldID = p->store.getID(fieldName); |
116 | const WordList &text = d.getText(fieldName); |
117 | + pair<DocumentID, WordID> lengths; |
118 | + lengths.first = d.getID(); |
119 | + lengths.second = fieldID; |
120 | + p->originalSizes[lengths] = text.size(); |
121 | for(size_t wi=0; wi<text.size(); wi++) { |
122 | const Word &word = text[wi]; |
123 | const WordID wordID = p->store.getID(word); |
124 | @@ -336,16 +333,14 @@ |
125 | const int maxIterations = 1; |
126 | const int increment = LevenshteinIndex::getDefaultError(); |
127 | const size_t minMatches = 10; |
128 | - WordList expandedQuery; |
129 | MatchResults allMatches; |
130 | |
131 | if(query.size() == 0) |
132 | return matchedDocuments; |
133 | - expandQuery(query, expandedQuery); |
134 | // Try to search with ever growing error until we find enough matches. |
135 | for(int i=0; i<maxIterations; i++) { |
136 | MatchResults matches; |
137 | - matchWithRelevancy(expandedQuery, params, i*increment, matches); |
138 | + matchWithRelevancy(query, params, i*increment, matches); |
139 | if(matches.size() >= minMatches || i == maxIterations-1) { |
140 | allMatches.addResults(matches); |
141 | break; |
142 | @@ -392,5 +387,69 @@ |
143 | return p->weights; |
144 | } |
145 | |
146 | +static map<DocumentID, size_t> countExacts(MatcherPrivate *p, const WordList &query, const WordID indexID) { |
147 | + map<DocumentID, size_t> matchCounts; |
148 | + for(size_t i=0; i<query.size(); i++) { |
149 | + const Word &w = query[i]; |
150 | + if(w.length() == 0 || !p->store.hasWord(w)) { |
151 | + continue; |
152 | + } |
153 | + WordID curWord = p->store.getID(w); |
154 | + vector<DocumentID> exacts; |
155 | + p->reverseIndex.findDocuments(curWord, indexID, exacts); |
156 | + for(const auto &i : exacts) { |
157 | + matchCounts[i]++; // Default is zero initialisation. |
158 | + } |
159 | + } |
160 | + return matchCounts; |
161 | +} |
162 | + |
163 | +struct DocCount { |
164 | + DocumentID id; |
165 | + size_t matches; |
166 | +}; |
167 | + |
168 | +MatchResults Matcher::onlineMatch(const WordList &query, const Word &primaryIndex) { |
169 | + MatchResults results; |
170 | + set<DocumentID> exactMatched; |
171 | + map<DocumentID, double> accumulator; |
172 | + if(!p->store.hasWord(primaryIndex)) { |
173 | + string msg("Index named "); |
174 | + msg += primaryIndex.asUtf8(); |
175 | + msg += " is not known"; |
176 | + throw invalid_argument(msg); |
177 | + } |
178 | + WordID indexID = p->store.getID(primaryIndex); |
179 | + // How many times each document matched with zero error. |
180 | + vector<DocCount> stats; |
181 | + for(const auto &i : countExacts(p, query, indexID)) { |
182 | + DocCount c; |
183 | + pair<DocumentID, WordID> key; |
184 | + exactMatched.insert(i.first); |
185 | + key.first = i.first; |
186 | + key.second = indexID; |
187 | + c.id = i.first; |
188 | + c.matches = i.second; |
189 | + stats.push_back(c); |
190 | + } |
191 | + for(const auto &i: stats) { |
192 | + accumulator[i.id] = 2*i.matches; |
193 | + if(i.matches == query.size() |
194 | + && i.matches == p->originalSizes[make_pair(i.id, indexID)]) { // Perfect match. |
195 | + accumulator[i.id] += 100; |
196 | + } |
197 | + } |
198 | + // Merge in fuzzy matches. |
199 | + MatchResults fuzzyResults = match(query); |
200 | + for(size_t i = 0; i<fuzzyResults.size(); i++) { |
201 | + DocumentID docid = fuzzyResults.getDocumentID(i); |
202 | + accumulator[docid] += fuzzyResults.getRelevancy(i); |
203 | + } |
204 | + for(const auto &i : accumulator) { |
205 | + results.addResult(i.first, i.second); |
206 | + } |
207 | + return results; |
208 | +} |
209 | + |
210 | COL_NAMESPACE_END |
211 | |
212 | |
213 | === modified file 'src/WordStore.cc' |
214 | --- src/WordStore.cc 2013-01-31 10:23:45 +0000 |
215 | +++ src/WordStore.cc 2014-02-28 14:46:51 +0000 |
216 | @@ -53,15 +53,19 @@ |
217 | } |
218 | |
219 | WordID WordStore::getID(const Word &w) { |
220 | - TrieOffset node = p->words.findWord(w); |
221 | - if(node) |
222 | - return p->words.getWordID(node); |
223 | - node = p->words.insertWord(w, p->wordIndex.size()); |
224 | + if(p->words.hasWord(w)) { |
225 | + return p->words.getWordID(p->words.findWord(w)); |
226 | + } |
227 | + TrieOffset node = p->words.insertWord(w, p->wordIndex.size()); |
228 | p->wordIndex.push_back(node); |
229 | WordID result = p->wordIndex.size()-1; |
230 | return result; |
231 | } |
232 | |
233 | +bool WordStore::hasWord(const Word &w) const { |
234 | + return p->words.hasWord(w); |
235 | +} |
236 | + |
237 | Word WordStore::getWord(const WordID id) const { |
238 | if(!hasWord(id)) { |
239 | throw out_of_range("Tried to access non-existing WordID in WordStore."); |
240 | |
241 | === modified file 'test/MatcherTest.cc' |
242 | --- test/MatcherTest.cc 2013-06-20 12:10:40 +0000 |
243 | +++ test/MatcherTest.cc 2014-02-28 14:46:51 +0000 |
244 | @@ -23,6 +23,7 @@ |
245 | #include "WordList.hh" |
246 | #include "Document.hh" |
247 | #include "MatchResults.hh" |
248 | +#include "ColumbusHelpers.hh" |
249 | #include <cassert> |
250 | |
251 | using namespace Columbus; |
252 | @@ -123,7 +124,169 @@ |
253 | c.addDocument(d2); |
254 | m.index(c); |
255 | |
256 | - matches = m.match("Sara Michell Geller"); |
257 | + matches = m.match("Sari Michell Geller"); |
258 | + assert(matches.getDocumentID(0) == correct); |
259 | +} |
260 | + |
261 | +void testSentence() { |
262 | + Corpus c; |
263 | + DocumentID correct = 1; |
264 | + DocumentID wrong = 0; |
265 | + Document d1(correct); |
266 | + Document d2(wrong); |
267 | + Word fieldName("name"); |
268 | + Word secondName("context"); |
269 | + Matcher m; |
270 | + MatchResults matches; |
271 | + |
272 | + d1.addText(fieldName, "Fit Canvas to Layers"); |
273 | + d1.addText(secondName, "View Zoom (100%)"); |
274 | + d2.addText(fieldName, "Fit image in Window"); |
275 | + d2.addText(secondName, "Image"); |
276 | + |
277 | + c.addDocument(d1); |
278 | + c.addDocument(d2); |
279 | + |
280 | + m.index(c); |
281 | + matches = m.match("fit canvas to layers"); |
282 | + assert(matches.getDocumentID(0) == correct); |
283 | +} |
284 | + |
285 | +void testExactOrder() { |
286 | + Corpus c; |
287 | + DocumentID correct = 1; |
288 | + DocumentID wrong = 0; |
289 | + DocumentID moreWrong = 100; |
290 | + Document d1(correct); |
291 | + Document d2(wrong); |
292 | + Document d3(moreWrong); |
293 | + Word fieldName("name"); |
294 | + Word secondName("context"); |
295 | + Matcher m; |
296 | + MatchResults matches; |
297 | + WordList q = splitToWords("fit canvas to layers"); |
298 | + d1.addText(fieldName, "Fit Canvas to Layers"); |
299 | + d1.addText(secondName, "View Zoom (100%)"); |
300 | + d2.addText(fieldName, "Fit image in Window"); |
301 | + d2.addText(secondName, "Image"); |
302 | + d3.addText(fieldName, "Not matching."); |
303 | + d3.addText(secondName, "fit canvas to layers"); |
304 | + c.addDocument(d1); |
305 | + c.addDocument(d2); |
306 | + c.addDocument(d3); |
307 | + |
308 | + m.index(c); |
309 | + matches = m.onlineMatch(q, fieldName); |
310 | + assert(matches.size() >= 1); |
311 | + assert(matches.getDocumentID(0) == correct); |
312 | +} |
313 | + |
314 | +void testSmallestMatch() { |
315 | + Corpus c; |
316 | + DocumentID correct = 1; |
317 | + DocumentID wrong = 0; |
318 | + Document d1(correct); |
319 | + Document d2(wrong); |
320 | + Word fieldName("name"); |
321 | + Word field2("dummy"); |
322 | + Matcher m; |
323 | + MatchResults matches; |
324 | + WordList q = splitToWords("save"); |
325 | + d1.addText(fieldName, "save"); |
326 | + d1.addText(field2, "lots of text to ensure statistics of this field are ignored"); |
327 | + d2.addText(fieldName, "save as"); |
328 | + c.addDocument(d1); |
329 | + c.addDocument(d2); |
330 | + |
331 | + m.index(c); |
332 | + matches = m.onlineMatch(q, fieldName); |
333 | + assert(matches.size() == 2); |
334 | + assert(matches.getDocumentID(0) == correct); |
335 | +} |
336 | + |
337 | +void noCommonMatch() { |
338 | + Corpus c; |
339 | + DocumentID correct = 1; |
340 | + Document d1(correct); |
341 | + Word fieldName("name"); |
342 | + Word field2("dummy"); |
343 | + Matcher m; |
344 | + MatchResults matches; |
345 | + WordList q = splitToWords("fit canvas to selection"); |
346 | + d1.addText(fieldName, "Preparing your Images for the Web"); |
347 | + d1.addText(fieldName, "Help user manual"); |
348 | + c.addDocument(d1); |
349 | + |
350 | + m.index(c); |
351 | + matches = m.onlineMatch(q, fieldName); |
352 | + assert(matches.size() == 0); |
353 | +} |
354 | + |
355 | +void emptyMatch() { |
356 | + Corpus c; |
357 | + DocumentID correct = 1; |
358 | + Document d1(correct); |
359 | + Word fieldName("name"); |
360 | + Word field2("dummy"); |
361 | + Matcher m; |
362 | + MatchResults matches; |
363 | + WordList q; |
364 | + d1.addText(fieldName, "Preparing your Images for the Web"); |
365 | + d1.addText(fieldName, "Help user manual"); |
366 | + c.addDocument(d1); |
367 | + |
368 | + m.index(c); |
369 | + matches = m.onlineMatch(q, fieldName); |
370 | + assert(matches.size() == 0); |
371 | +} |
372 | + |
373 | +void testMatchCount() { |
374 | + Corpus c; |
375 | + DocumentID correct = 1; |
376 | + DocumentID wrong = 0; |
377 | + Document d1(correct); |
378 | + Document d2(wrong); |
379 | + Word fieldName("name"); |
380 | + Word secondName("context"); |
381 | + Matcher m; |
382 | + MatchResults matches; |
383 | + WordList q = splitToWords("fit canvas to selection"); |
384 | + d1.addText(fieldName, "Fit Canvas to Layers"); |
385 | + d1.addText(secondName, "View Zoom (100%)"); |
386 | + d2.addText(fieldName, "Selection editor"); |
387 | + d2.addText(secondName, "Windows dockable dialogs"); |
388 | + c.addDocument(d1); |
389 | + c.addDocument(d2); |
390 | + |
391 | + m.index(c); |
392 | + matches = m.onlineMatch(q, fieldName); |
393 | + assert(matches.size() == 2); |
394 | + assert(matches.getDocumentID(0) == correct); |
395 | +} |
396 | + |
397 | +void testPerfect() { |
398 | + Corpus c; |
399 | + DocumentID correct = 0; |
400 | + Document d1(1); |
401 | + Document d2(correct); |
402 | + Document d3(2); |
403 | + Document d4(3); |
404 | + Word fieldName("name"); |
405 | + Matcher m; |
406 | + MatchResults matches; |
407 | + WordList q = splitToWords("save"); |
408 | + d1.addText(fieldName, "Save as"); |
409 | + d2.addText(fieldName, "Save"); |
410 | + d3.addText(fieldName, "Save yourself"); |
411 | + d4.addText(fieldName, "Save the whales"); |
412 | + c.addDocument(d1); |
413 | + c.addDocument(d2); |
414 | + c.addDocument(d3); |
415 | + c.addDocument(d4); |
416 | + |
417 | + m.index(c); |
418 | + matches = m.onlineMatch(q, fieldName); |
419 | + assert(matches.size() >= 1); |
420 | assert(matches.getDocumentID(0) == correct); |
421 | } |
422 | |
423 | @@ -132,6 +295,13 @@ |
424 | testMatcher(); |
425 | testRelevancy(); |
426 | testMultiWord(); |
427 | + testSentence(); |
428 | + testExactOrder(); |
429 | + testSmallestMatch(); |
430 | + noCommonMatch(); |
431 | + emptyMatch(); |
432 | + testMatchCount(); |
433 | + testPerfect(); |
434 | } catch(const std::exception &e) { |
435 | fprintf(stderr, "Fail: %s\n", e.what()); |
436 | return 666; |
437 | |
438 | === modified file 'test/TrieTest.cc' |
439 | --- test/TrieTest.cc 2013-04-03 13:50:54 +0000 |
440 | +++ test/TrieTest.cc 2014-02-28 14:46:51 +0000 |
441 | @@ -46,10 +46,28 @@ |
442 | assert(result == w2); |
443 | } |
444 | |
445 | +void testHas() { |
446 | + Trie t; |
447 | + Word w1("abc"); |
448 | + Word w2("abd"); |
449 | + Word w3("a"); |
450 | + Word w4("x"); |
451 | + Word result; |
452 | + |
453 | + WordID i1 = 1; |
454 | + |
455 | + assert(t.numWords() == 0); |
456 | + t.insertWord(w1, i1); |
457 | + assert(t.hasWord(w1)); |
458 | + assert(!t.hasWord(w2)); |
459 | + assert(!t.hasWord(w3)); |
460 | + assert(!t.hasWord(w4)); |
461 | +} |
462 | |
463 | int main(int /*argc*/, char **/*argv*/) { |
464 | // Move basic tests from levtrietest here. |
465 | testWordBuilding(); |
466 | + testHas(); |
467 | return 0; |
468 | } |
469 |
Requests:
* Increment the debian version number
* Add some more "save foo" noise (before and after "save") to the save test