Thursday, January 29, 2015

Lucene Jira Bugs


https://issues.apache.org/jira/browse/LUCENE-2606
Compares two strings with a collator, also checking whether the strings
   * are impacted by JDK bugs. May not avoid all JDK bugs in tests.
   * See https://bugs.openjdk.java.net/browse/JDK-8071862
java.text.Collator
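For background, a minimal standalone sketch of what locale-sensitive comparison with java.text.Collator does (the locale and strength here are chosen for illustration, not taken from the Lucene test code):

```java
import java.text.Collator;
import java.util.Locale;

public class CollatorCompareDemo {
    public static void main(String[] args) {
        // A collator performs locale-sensitive comparison, unlike
        // String.compareTo, which compares raw UTF-16 code units.
        Collator collator = Collator.getInstance(Locale.FRENCH);
        collator.setStrength(Collator.PRIMARY); // ignore case and accents

        // At PRIMARY strength, "cote" and "Côté" compare as equal.
        System.out.println(collator.compare("cote", "Côté")); // 0
        // Plain String comparison sees them as different.
        System.out.println("cote".compareTo("Côté") == 0);    // false
    }
}
```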

To learn more
https://issues.apache.org/jira/browse/LUCENE-6192
Long overflow in LuceneXXSkipWriter can corrupt skip data
// Be careful when casting from long to int: the difference can overflow an int.
skipBuffer.writeVInt((int) (curDocPointer - lastSkipDocPointer[level]));
was changed to:
skipBuffer.writeVLong(curDocPointer - lastSkipDocPointer[level]);
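The fix replaces writeVInt with writeVLong because the file-pointer delta can exceed Integer.MAX_VALUE, and the narrowing cast silently truncates it. A standalone sketch of the failure mode (the values here are illustrative):

```java
public class NarrowingOverflowDemo {
    public static void main(String[] args) {
        // A file-pointer delta that exceeds Integer.MAX_VALUE (~2.1 GB).
        long curDocPointer = 3L * 1024 * 1024 * 1024; // 3 GB
        long lastSkipDocPointer = 0L;

        long delta = curDocPointer - lastSkipDocPointer;
        int narrowed = (int) delta; // silently drops the high 32 bits

        System.out.println(delta);    // 3221225472
        System.out.println(narrowed); // -1073741824: corrupted value
    }
}
```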


Wednesday, January 28, 2015

Learn Lucene-Solr Bugs

https://issues.apache.org/jira/browse/SOLR-6954
Considering changing SolrClient#shutdown to SolrClient#close.
SolrClient implements the Closeable interface
==> so we can use try-with-resources
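A minimal sketch of the pattern this enables, using a hypothetical MinimalClient as a stand-in for SolrClient (the real class lives in SolrJ):

```java
import java.io.Closeable;

public class TryWithResourcesDemo {
    // Hypothetical stand-in for SolrClient, just to show the mechanics.
    static class MinimalClient implements Closeable {
        boolean closed = false;
        String ping() { return "OK"; }
        @Override public void close() { closed = true; }
    }

    public static void main(String[] args) {
        MinimalClient escaped;
        // Because the client implements Closeable, try-with-resources
        // calls close() automatically, even if the body throws.
        try (MinimalClient client = new MinimalClient()) {
            escaped = client;
            System.out.println(client.ping()); // OK
        }
        System.out.println(escaped.closed); // true
    }
}
```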

SOLR-7024: improve java detection and error message
 if [ -z "$JAVA" ]; then
    echo >&2 "The currently defined JAVA_HOME ($JAVA_HOME) refers"
    echo >&2 "to a location where Java could not be found.  Aborting."
    echo >&2 "Either fix the JAVA_HOME variable or remove it from the"
    echo >&2 "environment so that the system PATH will be searched."
    exit 1
  fi

 $JAVA -version >/dev/null 2>&1 || {
  echo >&2 "Java not found, or an error was encountered when running java."
  echo >&2 "A working Java 8 is required to run Solr!"
  echo >&2 "Please install Java 8 or fix JAVA_HOME before running this script."
  echo >&2 "Command that we tried: '${JAVA} -version'"
  echo >&2 "Active Path:"
  echo >&2 "${PATH}"
  exit 1
}

https://issues.apache.org/jira/browse/SOLR-6449
Add first class support for Real Time Get in Solrj
   public SolrDocumentList getById(Collection<String> ids, SolrParams params) throws SolrServerException {
    if (ids == null || ids.isEmpty()) {
      throw new IllegalArgumentException("Must provide an identifier of a document to retrieve.");
    }

    ModifiableSolrParams reqParams = new ModifiableSolrParams(params);
    if (StringUtils.isEmpty(reqParams.get(CommonParams.QT))) {
      reqParams.set(CommonParams.QT, "/get");
    }
    reqParams.set("ids", ids.toArray(new String[ids.size()]));

    return query(reqParams).getResults();
  }

https://issues.apache.org/jira/browse/SOLR-7013
Unclear error message with solr script when lacking jar executable
hasJar=$(which jar 2>/dev/null)
hasUnzip=$(which unzip 2>/dev/null)

if [ -n "${hasJar}" ]; then
  unzipCommand="$hasJar xf"
else
  if [ -n "${hasUnzip}" ]; then
    unzipCommand="$hasUnzip"
  else
    echo -e "This script requires extracting a WAR file with either the jar or unzip utility, please install these utilities or contact your administrator for assistance."
    exit 1
  fi
fi

https://issues.apache.org/jira/browse/SOLR-6521
 final Object lock = locks.get(Math.abs(Hash.murmurhash3_x86_32(collection, 0, collection.length(), 0) % locks.size()));
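This is lock striping: hash the collection name into a fixed pool of locks so operations on the same collection serialize without one global lock. A simplified sketch, using String.hashCode and Math.floorMod in place of Solr's murmurhash3/Math.abs combination:

```java
import java.util.ArrayList;
import java.util.List;

public class LockStripingDemo {
    // A fixed pool of lock objects; Solr's version hashes with murmurhash3.
    private static final List<Object> LOCKS = new ArrayList<>();
    static {
        for (int i = 0; i < 8; i++) LOCKS.add(new Object());
    }

    // Map a collection name to one of the stripes. floorMod avoids the
    // negative-index trap of (hash % size) when the hash is negative.
    static Object lockFor(String collection) {
        return LOCKS.get(Math.floorMod(collection.hashCode(), LOCKS.size()));
    }

    public static void main(String[] args) {
        // The same name always maps to the same lock, so work on one
        // collection serializes while other collections proceed in parallel.
        System.out.println(lockFor("products") == lockFor("products")); // true
        synchronized (lockFor("products")) {
            // critical section for "products"
        }
    }
}
```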

Add Long/FixedBitSet and replace usage of OpenBitSet
https://issues.apache.org/jira/browse/LUCENE-5440
http://lucene.markmail.org/thread/35gw3amo53dsqsqj
==> when there is a lot of data and computation, squeeze out every unneeded operation.
So FixedBitSet (FBS) is clearly faster than OpenBitSet (OBS), perhaps unless you use fastSet/fastGet, since FBS doesn't need to do bounds checking.
Also, FBS can grow: a convenient copy constructor lets you expand or shrink the set.
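FixedBitSet is backed by a long[] with 64 bits per word, so "growing" it means copying the words into a larger array. A conceptual sketch of that copy-to-grow idea in plain Java (not the Lucene API itself):

```java
import java.util.Arrays;

public class GrowableBitsDemo {
    public static void main(String[] args) {
        long[] bits = new long[2];        // capacity: 128 bits
        bits[70 >> 6] |= 1L << (70 & 63); // set bit 70 (word 1, bit 6)

        // "Copy constructor"-style growth to 256 bits of capacity:
        // the old words are copied and the new tail is zeroed.
        long[] grown = Arrays.copyOf(bits, 4);

        boolean stillSet = (grown[70 >> 6] & (1L << (70 & 63))) != 0;
        System.out.println(stillSet); // true
    }
}
```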

SOLR-7050: realtime get should internally load only fields specified in fl [Performance]
https://issues.apache.org/jira/browse/SOLR-7050
== Only load needed fields when call search.doc
StoredDocument luceneDocument = searcher.doc(docid);
changed to:
       StoredDocument luceneDocument = searcher.doc(docid, rsp.getReturnFields().getLuceneFieldNames());

SOLR-6845: Suggester tests start new cores instead of reloading
https://issues.apache.org/jira/browse/SOLR-6845
LOG.info("reload(" + name + ")");
and, in init():
else if (getStoreFile().exists()) {
  if (LOG.isDebugEnabled()) {
    LOG.debug("attempt reload of the stored lookup from file " + getStoreFile());
  }
}

SOLR-6909: Extract atomic update handling logic into AtomicUpdateDocumentMerger
Allow pluggable atomic update merging logic
util method: int docid = searcher.getFirstMatch(new Term(idField.getName(), idBytes));

SOLR-6931: We should do a limited retry when using HttpClient.
// always call setUseRetry, whether it is in config or not
 HttpClientUtil.setUseRetry(httpClient, config.getBool(HttpClientUtil.PROP_USE_RETRY, true));

SOLR-6932: All HttpClient ConnectionManagers and SolrJ clients should always be shutdown in tests and regular code.
change HttpClient to CloseableHttpClient
all of these types of things should be made Closeable for 5.0 (rather than using shutdown), including SolrJ clients.

SOLR-6324: Set finite default timeouts for select and update.
Currently HttpShardHandlerFactory and UpdateShardHandler default to infinite timeouts for socket connection and read. This can lead to undesirable behaviour: for example, if a machine crashes, searches in progress will wait forever for a result to come back, tying up threads that only get terminated at shutdown.
clientParams.set(HttpClientUtil.PROP_CONNECTION_TIMEOUT, connectionTimeout);
this.defaultClient = HttpClientUtil.createClient(clientParams);
set socketTimeout and connTimeout for shardHandlerFactory in solr.xml
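A hedged example of what the corresponding solr.xml fragment might look like (the timeout values here are illustrative, in milliseconds):

```xml
<shardHandlerFactory name="shardHandlerFactory" class="HttpShardHandlerFactory">
  <int name="socketTimeout">600000</int>
  <int name="connTimeout">60000</int>
</shardHandlerFactory>
```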

SOLR-6643: Fix error reporting & logging of low level JVM Errors that occur when loading/reloading a SolrCore
Great example about how to reproduce the problem and add test cases.
CoreContainerCoreInitFailuresTest.testJavaLangErrorFromHandlerOnStartup

SOLR-4839: Upgrade to Jetty 9
set persistTempDirectory to true
Jetty 9 has builtin support for disabling protocols (POODLE)
excludeProtocols: SSLv3


SOLR-6950: Ensure TransactionLogs are closed with test ObjectReleaseTracker.
assert ObjectReleaseTracker.track(this);
assert ObjectReleaseTracker.release(this);
// integration test in assert mode
// use ObjectReleaseTracker to make sure resource is closed and released
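A minimal sketch of the tracker idea (not the real ObjectReleaseTracker internals, just the track/release contract): record each resource when it is opened, drop it on release, and treat anything left over at test end as a leak. Returning true lets the calls live inside assert, so they cost nothing in production and are active under -ea in tests:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ReleaseTrackerDemo {
    static final Map<Object, String> OPEN = new ConcurrentHashMap<>();

    static boolean track(Object o) {
        OPEN.put(o, "opened");   // the real tracker records a stack trace here
        return true;             // true so it can be wrapped in an assert
    }

    static boolean release(Object o) {
        OPEN.remove(o);
        return true;
    }

    public static void main(String[] args) {
        Object txLog = new Object();
        // No-ops with assertions disabled; enforced in tests under -ea.
        assert track(txLog);
        assert release(txLog);
        System.out.println(OPEN.isEmpty()); // true: nothing leaked
    }
}
```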

Windows Bat
SOLR-6928: solr.cmd stop works only in english
change
  For /f "tokens=5" %%j in ('netstat -aon ^| find /i "listening" ^| find ":%SOLR_PORT%"') do (
to
  For /f "tokens=5" %%j in ('netstat -aon ^| find "TCP " ^| find ":%SOLR_PORT%"') do (
A related fix: the find command should look for ":8983 " (with a trailing space after the port number) to avoid matching other ports. For example, the following stop command would match two lines of netstat output, since :1234 is also a prefix of :12345:
solr start -p 1234
solr start -p 12345
solr stop -p 1234

SOLR-7016: Fix bin\solr.cmd to work in a directory with spaces in the name.
Quote paths that may contain spaces, e.g. "%SOLR_TIP%\bin"
 START /B "Solr-%SOLR_PORT%" /D "%SOLR_SERVER_DIR%" "%JAVA%" -server -Xss256k %SOLR_JAVA_MEM% %START_OPTS% -Dlog4j.configuration="%LOG4J_CONFIG%" -DSTOP.PORT=!STOP_PORT! -DSTOP.KEY=%STOP_KEY% ^

SOLR-7024: bin/solr: Improve java detection and error messages
SOLR-7013: use unzip if jar is not available (merged from r1653943)
https://issues.apache.org/jira/browse/SOLR-6787

Minors
https://issues.apache.org/jira/browse/SOLR-7059
Using paramset with multi-valued keys leads to a 500
The fix changes the backing map in MapSolrParams from Map<String,String> to Map<String,Object>.

https://issues.apache.org/jira/browse/SOLR-7046
NullPointerException when group.function uses query() function
Map context = ValueSource.newContext(searcher);
This declaration created a new local variable that shadowed the context field, so the field stayed null even though it gets passed on to another function later, causing the NPE.
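The bug pattern generalizes: declaring a new local variable instead of assigning to the field leaves the field null. A standalone illustration (the names here are hypothetical, not from the Solr code):

```java
import java.util.HashMap;
import java.util.Map;

public class ShadowingDemo {
    static Map<String, Object> context; // the field is never assigned

    static void prepare() {
        // Bug: "Map<String, Object> context = ..." declares a NEW local
        // variable instead of assigning to the field of the same name.
        Map<String, Object> context = new HashMap<>();
        context.put("searcher", "ok"); // only the local copy is populated
    }

    public static void main(String[] args) {
        prepare();
        // The field is still null; any later use dereferencing it would NPE.
        System.out.println(context); // null
    }
}
```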

Not fixed yet
https://issues.apache.org/jira/browse/SOLR-6640



Don't understand
regression in /update/extract ? ref guide examples of fmap & xpath don't seem to be working
https://issues.apache.org/jira/browse/SOLR-6856


Tuesday, January 20, 2015

Solr Custom Score



Using CustomScoreQuery For Custom Solr/Lucene Scoring
When you implement your own Lucene query, you’re taking control of two things:

Matching – what documents should be included in the search results
Scoring – what score should be assigned to a document (and therefore what order documents appear in)



  1. You’ve utilized Solr’s extensive set of query parsers & features including function queries, joins, etc. None of this solved your problem
  2. You’ve exhausted the ecosystem of plugins that extend on the capabilities in (1). That didn’t work.
  3. You’ve implemented your own query parser plugin that takes user input and generates existing Lucene queries to do this work. This still didn’t solve your problem.
  4. You’ve thought carefully about your analyzers – massaging your data so that at index time and query time, text lines up exactly as it should to optimize the behavior of existing search scoring. This still didn’t get what you wanted.
  5. You’ve implemented your own custom Similarity that modifies how Lucene calculates the traditional relevancy statistics – query norms, term frequency, etc.
  6. You’ve tried to use Lucene’s CustomScoreQuery to wrap an existing Query and alter each document’s score via a callback. This still wasn’t low-level enough; you needed even more control.
http://dev.fernandobrito.com/2012/10/building-your-own-lucene-scorer/
Query query = parser.parse("searching something");

CustomScoreQuery customQuery = new MyOwnScoreQuery(query);

ScoreDoc[] hits = searcher.search(customQuery, numHits).scoreDocs;