BrazilianAnalyzer produces wrong results

Submitted by depaula on Mon, 2007-10-29 12:54. Troubleshooting

I'm using BrazilianAnalyzer as the Global Analyzer and as the Column Analyzer.

A search for processo de evaporação (without quotes) returns no results, but "processo de evaporação" returns 8 results. "Processo de evaporação" means "evaporation process", which in Brazilian Portuguese is written as "process of evaporation", with the preposition de (= of).

Well, BrazilianAnalyzer removes the preposition "de" and stores only the root of each word:

BrazilianAnalyzer ->[process][evapor]

But this wouldn't be a problem if the same Analyzer were used when the search query is parsed.

I guess the query searcher is using another Analyzer, such as "number" or "lowercase". Can someone help?

How can I debug which Analyzer the query searcher uses?

Here is another query, "tese de doutorado" (without and with quotes):

tese de doutorado
=====================================================
DEBUG 10-29 19:49:51,346|search index: especialistas
INFO 10-29 19:49:51,346|templateName:espec01 templateFile:/templates/especialistas/espec01/main.vm
DEBUG 10-29 19:49:51,346|Got config: 0
DEBUG 10-29 19:49:51,346|Got searcher: 0
DEBUG 10-29 19:49:51,356|Advanced query: null
DEBUG 10-29 19:49:51,356|Start parse query: tese de doutorado
DEBUG 10-29 19:49:51,356|parsed query: tese de doutorado
DEBUG 10-29 19:49:51,356|translated query: +(codpes:tese^3.0 nompes:tes nomarecnh:tes dscesp:tes palchaesp:tes) +(codpes:de^3.0) +(codpes:doutorado^3.0 nompes:doutor nomarecnh:doutor dscesp:doutor palchaesp:doutor)
DEBUG 10-29 19:49:51,366|Start Searching: 20
DEBUG 10-29 19:49:51,366|Got docs from disk: 20
INFO 10-29 19:49:51,366|Found 0 MATCHING with "tese de doutorado" in 0 milliseconds
=================================================

"tese de doutorado"
=================================================
DEBUG 10-29 19:51:58,799|search index: especialistas
INFO 10-29 19:51:58,809|templateName:espec01 templateFile:/templates/especialistas/espec01/main.vm
DEBUG 10-29 19:51:58,839|Got config: 40
DEBUG 10-29 19:51:58,860|Got searcher: 61
DEBUG 10-29 19:51:58,870|Advanced query: null
DEBUG 10-29 19:51:58,880|Start parse query: "tese de doutorado"
DEBUG 10-29 19:51:58,900|parsed query: "tese de doutorado"
DEBUG 10-29 19:51:58,920|translated query: +(codpes:tese de doutorado^3.0 nompes:"tes doutor" nomarecnh:"tes doutor" dscesp:"tes doutor" palchaesp:"tes doutor")
DEBUG 10-29 19:51:58,960|Start Searching: 161
DEBUG 10-29 19:51:59,130|Got docs from disk: 331
INFO 10-29 19:51:59,150|Found 361 MATCHING with ""tese de doutorado"" in 20 milliseconds
===========================================================

Thanks in advance,
Silvio

Submitted by will on Mon, 2007-10-29 16:56.

The translated query is the final query that's sent to Lucene.

Judging from the parsed query, the field "codpes" looks somewhat different from the other fields. Does it use a different analyzer? If so, you can choose not to include it as "searchable".
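
One way to spot such a field is to split the "translated query" line from the log into its required clauses and look for clauses that mention only one field. The following is a minimal sketch in Python, not part of DBSight or Lucene. The function names are made up for illustration, and the regular expressions only handle the simple unquoted form shown in the first log above (no phrase queries).

```python
import re

# The "translated query" line copied from the first log in this thread.
TRANSLATED = (
    "+(codpes:tese^3.0 nompes:tes nomarecnh:tes dscesp:tes palchaesp:tes) "
    "+(codpes:de^3.0) "
    "+(codpes:doutorado^3.0 nompes:doutor nomarecnh:doutor "
    "dscesp:doutor palchaesp:doutor)"
)


def required_clauses(translated_query):
    """Split a translated query into its required '+( ... )' clauses.

    Each clause is returned as a list of (field, term) pairs.
    Boost suffixes such as ^3.0 are ignored.
    """
    clauses = []
    for group in re.findall(r"\+\(([^)]*)\)", translated_query):
        pairs = re.findall(r"(\w+):([^\s^]+)", group)
        clauses.append(pairs)
    return clauses


def single_field_clauses(translated_query):
    """Return the required clauses that can only be satisfied by one field."""
    return [
        clause
        for clause in required_clauses(translated_query)
        if len({field for field, _ in clause}) == 1
    ]


if __name__ == "__main__":
    print(single_field_clauses(TRANSLATED))
```

For the query above, the only single-field clause is the one for codpes:de. Every other field's analyzer dropped "de", so that required clause can match only if codpes itself contains the literal value "de".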

Submitted by depaula on Tue, 2007-10-30 07:02.

Thanks Will!

The field 'codpes' is the primary key (numeric) and is set as 'keyword' by default. If this field isn't included as "searchable", everything works fine.

I noticed that this isn't a bug but general behavior. It happens whenever an Analyzer reduces words to their roots and rejects some words, the search setting is "and", and there are keyword fields that don't contain the rejected term.
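
The effect can be reproduced with a small model. This is a minimal sketch in Python, not DBSight or Lucene code. The stop-word set and the root table are assumptions limited to the words seen in this thread (roots copied from the log), the sample document is invented, and the two analyzer functions are toy stand-ins for a stemming analyzer and a keyword field.

```python
STOP_WORDS = {"de"}  # assumption: only the stop word seen in this thread
ROOTS = {"tese": "tes", "doutorado": "doutor"}  # roots copied from the log


def text_analyzer(text):
    """Toy stand-in for a stemming analyzer: lowercase, drop stop words, map to roots."""
    words = text.lower().split()
    return [ROOTS.get(word, word) for word in words if word not in STOP_WORDS]


def keyword_analyzer(text):
    """Keyword-style field: the whole value is kept as one unmodified token."""
    return [text]


def search(docs, analyzers, query):
    """AND over the query words; each word may match in any searchable field."""
    clauses = []
    for word in query.split():
        alternatives = [
            (field, token)
            for field, analyze in analyzers.items()
            for token in analyze(word)
        ]
        if alternatives:  # a word dropped by every analyzer yields no clause
            clauses.append(alternatives)
    hits = []
    for doc in docs:
        indexed = {
            field: set(analyze(doc[field])) for field, analyze in analyzers.items()
        }
        if all(
            any(token in indexed[field] for field, token in clause)
            for clause in clauses
        ):
            hits.append(doc["codpes"])
    return hits


# Hypothetical sample document, invented for this illustration.
DOCS = [{"codpes": "1001", "dscesp": "Tese de doutorado"}]

if __name__ == "__main__":
    with_key = {"codpes": keyword_analyzer, "dscesp": text_analyzer}
    without_key = {"dscesp": text_analyzer}
    print(search(DOCS, with_key, "tese de doutorado"))
    print(search(DOCS, without_key, "tese de doutorado"))
```

With codpes searchable, the word "de" survives only in the keyword field, so the required clause for it can never match and the search returns nothing. Without codpes, "de" is dropped by every analyzer, no clause is generated for it, and the document is found.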

Thank you!
Silvio