Home » Server Options » Text & interMedia » 19c Orace Text Index(CTXSYS.CONTEXT ) - String search anomaly (Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production)
19c Orace Text Index(CTXSYS.CONTEXT ) - String search anomaly [message #679399] Thu, 27 February 2020 09:32 Go to next message
sabarijsnv
Messages: 1
Registered: February 2020
Junior Member
We have a table with a blob column that stores the blob of various files like PDF, TIF, WORD, etc and the blob field has Oracle text index INDEXTYPE IS CTXSYS.CONTEXT for us to be able to search for text within the blob’s.
We are seeing that the search query using the CONTAINS is giving unexpected results.

Below are the steps we followed:
1) Create the table with the CTXSYS.CONTEXT index on the blob field
2) Populate the table with one blob (PDF) which is OCR’d and has words like ‘executed’, ‘correspondence’, ‘purchase’, ‘original’, ‘marathon’ BUT doesn’t have the word ‘w2’
3) Execute ctx_ddl.sync_index to sync the index.
4) Run the search query.

Why would the search results be weird? The same works as expected in Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production.
I have compared all the stopwords between 19 vs 12.1.0.2 and couldn't any anything different.

Thanks,

Below are the scripts.
======================================================================
-- TABLE SCRIPT

CREATE TABLE LAGN_IMG.OTXT_ORC_IMAGES_TGT_TEST4
(
ISN NUMBER NOT NULL,
DOCUMENT_BLOB BLOB,
DOCUMENT_BLOB_FORMAT VARCHAR2(30 BYTE)
)
LOB (DOCUMENT_BLOB) STORE AS BASICFILE (
TABLESPACE LAG_IMAGES
DISABLE STORAGE IN ROW
CHUNK 8192
RETENTION
NOCACHE
LOGGING
STORAGE (
INITIAL 64K
NEXT 1M
MINEXTENTS 1
MAXEXTENTS UNLIMITED
PCTINCREASE 0
BUFFER_POOL DEFAULT
))
TABLESPACE LAG_IMAGES
PCTUSED 0
PCTFREE 10
INITRANS 1
MAXTRANS 255
STORAGE (
INITIAL 64K
NEXT 1M
MINEXTENTS 1
MAXEXTENTS UNLIMITED
PCTINCREASE 0
BUFFER_POOL DEFAULT
)
LOGGING
NOCOMPRESS
NOCACHE
MONITORING;


CREATE INDEX LAGN_IMG.DOC_IMG_BLOB_TGT_TEST4 ON LAGN_IMG.OTXT_ORC_IMAGES_TGT_TEST4
(DOCUMENT_BLOB)
INDEXTYPE IS CTXSYS.CONTEXT
PARAMETERS(' DATASTORE CTXSYS.DEFAULT_DATASTORE
FORMAT COLUMN DOCUMENT_BLOB_FORMAT
NOPOPULATE ');

CREATE UNIQUE INDEX LAGN_IMG.PK_OTXT_ORC_IMAGES_TGT_TEST4 ON LAGN_IMG.OTXT_ORC_IMAGES_TGT_TEST4
(ISN)
LOGGING
TABLESPACE LAG_DATA
PCTFREE 10
INITRANS 2
MAXTRANS 255
STORAGE (
INITIAL 64K
NEXT 1M
MINEXTENTS 1
MAXEXTENTS UNLIMITED
PCTINCREASE 0
BUFFER_POOL DEFAULT
);

ALTER TABLE LAGN_IMG.OTXT_ORC_IMAGES_TGT_TEST4 ADD (
CONSTRAINT PK_OTXT_ORC_IMAGES_TGT_TEST4
PRIMARY KEY
(ISN)
USING INDEX LAGN_IMG.PK_OTXT_ORC_IMAGES_TGT_TEST4
ENABLE VALIDATE);

======================================================================

-- INDEX OPTIMIZE
BEGIN
ctx_ddl.sync_index ('DOC_IMG_BLOB_TGT_TEST4'
,'500M'
,NULL
,4);
END;
/
======================================================================

- SEARCH QUERY

SELECT COUNT (1)
FROM OTXT_ORC_IMAGES_TGT_TEST4 i;

COUNT(1)
----------
1
1 row selected.

- The following query returns 1 but the file doesn’t have the word ‘w2’
SELECT COUNT (1)
FROM OTXT_ORC_IMAGES_TGT_TEST4 i
WHERE 1 = 1 AND isn = 29035 AND CONTAINS (i.document_blob, 'w2') > 0;

COUNT(1)
----------
1
1 row selected.


- The following query returns 0 but the file has all the search words
SELECT COUNT (1)
FROM OTXT_ORC_IMAGES_TGT_TEST4 i
WHERE 1 = 1
AND isn = 29035
AND CONTAINS (
i.document_blob
,'marathon or executed or purchase or original or correspondence') >
0;

COUNT(1)
----------
0
1 row selected.

Re: 19c Orace Text Index(CTXSYS.CONTEXT ) - String search anomaly [message #679400 is a reply to message #679399] Thu, 27 February 2020 09:52 Go to previous message
Michel Cadot
Messages: 67227
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

Welcome to the forum.
Please read the OraFAQ Forum Guide and How to use [code] tags and make your code easier to read.
Make sure that lines of code do not exceed 100 characters when you format.
Indent the code, use code tags and align the columns in result.

Never post schema or tablespace name or any segment storage parameter: we have not the same in our place.

Previous Topic: pixel (rgb) values extraction function
Next Topic: ctxhx process and Oracle Text indexes
Goto Forum:
  


Current Time: Sat Jul 04 19:54:25 CDT 2020