Home » Server Options » Text & interMedia » Text search to eliminate "?" using Contains Clause (Oracle 11.2.0.1 , Win XP)
Text search to eliminate "?" using Contains Clause [message #566326] Wed, 12 September 2012 14:35 Go to next message
rajivn786
Messages: 161
Registered: January 2010
Senior Member
Hi,

I have a clob field in a table..I need to eliminate text which has a "?" followed by text..

The clob has the following text..(truncated to fit here)

Ex : "? 459.2N Inferior Vena Cava Obstruction
Cont lovenox but suspect IVC clot is exacerbating LE edema" -- Text

Select * from  t
where   contains(report_text,'near((clot,  inferior vena cava), 10)') > 0
and not(contains(report_text,'?clot') > 0) 



The above doesn't have "?clot" in the text...
So we should get the above text if we use the above query..
but I couldn't figure it out why the above record is not getting returned..
Re: Text search to eliminate "?" using Contains Clause [message #566329 is a reply to message #566326] Wed, 12 September 2012 15:48 Go to previous messageGo to next message
Barbara Boehmer
Messages: 9077
Registered: November 2002
Location: California, USA
Senior Member
The ? is not tokenized and indexed by default. In order to index it, you have to create a lexer that sets it as a printjoin, then use that lexer in your index creation. Also, the ? means fuzzy to Oracle Text. In order to treat ? as regular text, you have to escape it by either placing a \ in front of it or enclosing the entire token between { and }. Please see the demonstration below.

-- table and test data:
SCOTT@orcl_11gR2> create table t (report_text  clob)
  2  /

Table created.

SCOTT@orcl_11gR2> insert into t values (
  2  '? 459.2N Inferior Vena Cava Obstruction
  3  Cont lovenox but suspect IVC clot is exacerbating LE edema')
  4  /

1 row created.

SCOTT@orcl_11gR2> insert into t values ('?clot')
  2  /

1 row created.

SCOTT@orcl_11gR2> insert into t values ('other data')
  2  /

1 row created.

SCOTT@orcl_11gR2> select * from t
  2  /

REPORT_TEXT
--------------------------------------------------------------------------------
? 459.2N Inferior Vena Cava Obstruction
Cont lovenox but suspect IVC clot is exacerbating LE edema

?clot
other data

3 rows selected.


-- lexer, index, and what is tokenized and indexed:
SCOTT@orcl_11gR2> begin
  2    ctx_ddl.create_preference ('test_lex', 'basic_lexer');
  3    ctx_ddl.set_attribute ('test_lex', 'printjoins', '?');
  4  end;
  5  /

PL/SQL procedure successfully completed.

SCOTT@orcl_11gR2> create index t_idx on t (report_text)
  2  indextype is ctxsys.context
  3  parameters ('lexer test_lex')
  4  /

Index created.

SCOTT@orcl_11gR2> select token_text from dr$t_idx$i
  2  /

TOKEN_TEXT
----------------------------------------------------------------
459.2N
?CLOT
CAVA
CLOT
CONT
DATA
EDEMA
EXACERBATING
INFERIOR
IVC
LE
LOVENOX
OBSTRUCTION
OTHER
SUSPECT
VENA

16 rows selected.


-- sample queries:
SCOTT@orcl_11gR2> Select * from  t
  2  where   contains (report_text, 'near ((clot,  inferior vena cava), 10)') > 0
  3  /

REPORT_TEXT
--------------------------------------------------------------------------------
? 459.2N Inferior Vena Cava Obstruction
Cont lovenox but suspect IVC clot is exacerbating LE edema


1 row selected.

SCOTT@orcl_11gR2> Select * from  t
  2  where not contains (report_text, '\?clot') > 0
  3  /

REPORT_TEXT
--------------------------------------------------------------------------------
? 459.2N Inferior Vena Cava Obstruction
Cont lovenox but suspect IVC clot is exacerbating LE edema

other data

2 rows selected.

SCOTT@orcl_11gR2> Select * from  t
  2  where not contains (report_text, '{?clot}') > 0
  3  /

REPORT_TEXT
--------------------------------------------------------------------------------
? 459.2N Inferior Vena Cava Obstruction
Cont lovenox but suspect IVC clot is exacerbating LE edema

other data

2 rows selected.

SCOTT@orcl_11gR2> Select * from  t
  2  where   contains (report_text, 'near ((clot,  inferior vena cava), 10)') > 0
  3  and not contains (report_text, '\?clot') > 0
  4  /

REPORT_TEXT
--------------------------------------------------------------------------------
? 459.2N Inferior Vena Cava Obstruction
Cont lovenox but suspect IVC clot is exacerbating LE edema


1 row selected.

SCOTT@orcl_11gR2> Select * from  t
  2  where   contains (report_text, 'near ((clot,  inferior vena cava), 10)') > 0
  3  and not contains (report_text, '{?clot}') > 0
  4  /

REPORT_TEXT
--------------------------------------------------------------------------------
? 459.2N Inferior Vena Cava Obstruction
Cont lovenox but suspect IVC clot is exacerbating LE edema


1 row selected.

Re: Text search to eliminate "?" using Contains Clause [message #566330 is a reply to message #566329] Wed, 12 September 2012 15:55 Go to previous messageGo to next message
rajivn786
Messages: 161
Registered: January 2010
Senior Member
Barbara,

Thanks for the excellent explanation...I already had an index on report_text with stop list...

   create /*+ parallel(t 32) */  index rep_idx on t(report_text) indextype
            is ctxsys.context parameters ('stoplist ctxsys.empty_stoplist')
        parallel 32;


So how do I add the lexer to the above>>

Re: Text search to eliminate "?" using Contains Clause [message #566331 is a reply to message #566330] Wed, 12 September 2012 16:03 Go to previous messageGo to next message
Barbara Boehmer
Messages: 9077
Registered: November 2002
Location: California, USA
Senior Member
begin
  ctx_ddl.create_preference ('test_lex', 'basic_lexer');
  ctx_ddl.set_attribute ('test_lex', 'printjoins', '?');
end;
/
create /*+ parallel(t 32) */ index rep_idx on t (report_text)
indextype is ctxsys.context
parameters 
  ('lexer     test_lex
    stoplist  ctxsys.empty_stoplist')
parallel 32;

Re: Text search to eliminate "?" using Contains Clause [message #566332 is a reply to message #566331] Wed, 12 September 2012 16:15 Go to previous messageGo to next message
Barbara Boehmer
Messages: 9077
Registered: November 2002
Location: California, USA
Senior Member
Or, if you don't want to drop and recreate the whole index, you can alter and rebuild, which might be faster:

begin
  ctx_ddl.create_preference ('test_lex', 'basic_lexer');
  ctx_ddl.set_attribute ('test_lex', 'printjoins', '?');
end;
/
alter index rep_idx rebuild parameters ('replace lexer test_lex')
/

Re: Text search to eliminate "?" using Contains Clause [message #566333 is a reply to message #566331] Wed, 12 September 2012 16:17 Go to previous message
rajivn786
Messages: 161
Registered: January 2010
Senior Member
Thanks Barbara...As always u come up with the correct results...
Previous Topic: ORACLE Text Search -DEFINESCORE with COMPLETION
Next Topic: PDF Document search & Indexing NOT WORKING
Goto Forum:
  


Current Time: Thu Mar 28 04:56:07 CDT 2024