If you have ever Googled something, you have already created Boolean search strings. If you use Boolean search without realizing it, you can learn a few Boolean operators that will drastically improve your current sourcing efforts. The truth is that very few recruiters write their own Boolean strings, and even fewer have mastered them. In a space where everyone is looking for a competitive advantage to land the best talent, this could be yours. All you need to do is learn, simplify, and practice. We held a webinar with Batman (Mike Cohen) to show you how to do this; you can watch it here.

In Lucene, the AND operator takes precedence over the OR operator. So, with the query jakarta OR apache AND website, you are effectively doing this: jakarta OR (apache AND website)

You can verify this for yourself by parsing your query string and seeing how it converts AND and OR to the "required" and "optional" operators.

And the NOT operator takes precedence over the AND operator, since we are discussing precedence. (Unfortunately I have never seen any official documentation which provides a citation for these precedence rules; I am relying on empirical observations instead. If the documentation for this does exist, that would be great to see.)

But you need to be very careful when dealing with Lucene's so-called "boolean" operators, as they do not behave the way you may expect based on their collective name ("boolean").

One key thing to understand is that Lucene boolean operators are not really "boolean" in the sense that you may think, based on Boolean algebra, where everything evaluates to TRUE or FALSE and where you use parentheses to help avoid ambiguity (or where you need to know what rules a programming language may be applying). Lucene boolean operators serve a subtly different purpose. They are not purely concerned with TRUE/FALSE inclusion/exclusion, but also with how to score results, so that the more relevant results have higher scores than less relevant results.

The Lucene query jakarta OR apache AND website is equivalent to the following: jakarta +apache +website

This means the document's field must contain apache and website, but may also include jakarta (for a higher relevance score).

You can see this for yourself by taking your original query string and parsing it: Query query = parser.parse(queryString); and then printing the resulting string representation of the query.

The + operator is the "required" operator: it requires that the term after the "+" symbol exist somewhere in the field. The lack of a + operator means the default of "may", as in "may contain", meaning the term is optional: it does not need to be present if there is some other clause in the query which does match a document. The use of AND forces the terms on either side of the AND to be required.

You can encounter some potentially surprising situations. Consider this: foo AND bar OR baz AND bat

This parses to the following: +foo +bar +baz +bat

This is because the AND operators are transformed to + operators for every term, rendering the OR redundant. It's the same result as if you had written this: foo AND bar AND baz AND bat

But not the same as this: (foo AND bar) OR (baz AND bat)

Which is parsed to this, where the parentheses are retained: (+foo +bar) (+baz +bat)

Use parentheses to explicitly make your intentions clear when using AND and OR, and also NOT.

Regarding NOT, since we mentioned it: it takes precedence over AND. So, when baz is negated with NOT and the other terms are joined with AND, a document field must contain foo, bar and bat, and must not contain baz.

I don't know, but I think Lucene originally did not include AND, OR and NOT, but instead used + (must include), - (must not include) and "nothing" (may include). The so-called boolean operators AND, OR, NOT were added later on, as a kind of "syntactic sugar" for these original operators, introduced for people who were more familiar with AND, OR and NOT from other contexts.

The thread "Getting a Better Understanding of Lucene's Search Operators" covers this. A summary of that thread is included in this answer about the NOT operator.
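The way AND, OR and NOT turn into required (+), prohibited (-) and optional clauses can be sketched in plain Java. This is only a simplified model of the behaviour described in this post, not Lucene's own parser code: the names `OperatorSketch` and `rewrite` are hypothetical, it assumes OR is the default operator, and it handles only flat queries of whitespace-separated terms (no parentheses, fields or quoted phrases). The third query in `main` is an illustrative example added here, not one taken from the post.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model (an assumption, not Lucene code) of how a flat query
// using AND / OR / NOT maps onto +required, -prohibited and optional terms.
class OperatorSketch {
    enum Occur { SHOULD, MUST, MUST_NOT }

    static String rewrite(String query) {
        List<String> terms = new ArrayList<>();
        List<Occur> occurs = new ArrayList<>();
        boolean pendingAnd = false; // an AND was seen before the next term
        boolean pendingNot = false; // a NOT was seen before the next term

        for (String token : query.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            switch (token) {
                case "AND":
                    pendingAnd = true;
                    break;
                case "OR":
                    pendingAnd = false;
                    break;
                case "NOT":
                    pendingNot = true;
                    break;
                default:
                    // AND makes the term on its left required too,
                    // unless that term is already prohibited.
                    if (pendingAnd && !occurs.isEmpty()) {
                        int last = occurs.size() - 1;
                        if (occurs.get(last) != Occur.MUST_NOT) {
                            occurs.set(last, Occur.MUST);
                        }
                    }
                    Occur occur = pendingNot ? Occur.MUST_NOT
                            : pendingAnd ? Occur.MUST
                            : Occur.SHOULD;
                    terms.add(token);
                    occurs.add(occur);
                    pendingAnd = false;
                    pendingNot = false;
            }
        }

        // Render each term with its prefix: "+" required, "-" prohibited, none optional.
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < terms.size(); i++) {
            if (i > 0) {
                out.append(' ');
            }
            if (occurs.get(i) == Occur.MUST) {
                out.append('+');
            } else if (occurs.get(i) == Occur.MUST_NOT) {
                out.append('-');
            }
            out.append(terms.get(i));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(rewrite("jakarta OR apache AND website"));   // jakarta +apache +website
        System.out.println(rewrite("foo AND bar OR baz AND bat"));      // +foo +bar +baz +bat
        System.out.println(rewrite("foo AND bar AND NOT baz AND bat")); // +foo +bar -baz +bat
    }
}
```

The sketch shows why the OR in foo AND bar OR baz AND bat has no visible effect: each AND marks the terms on both of its sides as required, so no optional term is left. To check what a real Lucene installation does, parse the query string with its query parser and print the result, as described above.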