Home » Server Options » Text & interMedia » Multi-lingual searches (Oracle 11gR2)
Multi-lingual searches [message #563330] Mon, 13 August 2012 00:48 Go to next message
srimansamjo
Messages: 2
Registered: August 2012
Location: Sydney, Australia
Junior Member
Hey Guys, We are building a multi-lingual application that need to cater to English, German and Arabic character sets. While input and rendering of the data (number/currency and date formats ) is not an issue, query requirement has me stumped!

I am sure Oracle Text and the use of lexers might have a solution to the problem I have.

Requirement - Say I have a column that stores the last name of an individual. This last name, depending on the geography where it is entered from, could contain Latin, German or Arabic character sets. This is fine. So, for example, Smith could be entered in all three character sets. Now the query function on this column should return data entered in all languages regardless of the current language and keyboard mapping chosen by the user.

I have seen the use of thesaurus package ctx_thes and the world lexer etc. Is it possible to do this?

I was thinking about having another column, for every such column that needs to have language-independent query requirement, shadowing the user-enterable column. Upon user entry somehow translate the entry into a generic language that would be used then for the query function. Not sure there is any smart language translators available - especially proper nouns.

Thanks in anticipation

Regards

Sridhar
Re: Multi-lingual searches [message #563351 is a reply to message #563330] Mon, 13 August 2012 02:34 Go to previous messageGo to next message
Barbara Boehmer
Messages: 9077
Registered: November 2002
Location: California, USA
Senior Member
Can you provide some sample data, what you would search for, and what you want that search to return based on the sample data?
Re: Multi-lingual searches [message #563453 is a reply to message #563351] Mon, 13 August 2012 21:09 Go to previous messageGo to next message
srimansamjo
Messages: 2
Registered: August 2012
Location: Sydney, Australia
Junior Member
Hey Barbara, Here is an example

insert into test( id, lastname )
values ( 1, 'Muller' );

insert into test( id, lastname )
values ( 2, 'Müller' );

insert into test( id, lastname )
values ( 3, '<Muller in arabic however its written>' );

Sure the table has many other entries. Now, the requirement is that when a user, with an English keyboard mapping, queries the table test for Muller, all 3 records should be returned. If mapped to the German keyboard, a query on Müller should return all 3 records and likewise for Arabic keyboard mapping.

With the thesaurus it may be possible but we will need to have the lookups done for all possible proper nouns and key words which may not be possible.

The other option I was wondering was if I could have a 'hidden' column alongside the 'lastname' and populate this automagically using some translation tools into one unified characterset, if available. If this were possible, I could route the query operation to the 'hidden' column regardless of what key board mapping and NLS is used by the user session.

Regards

Sridhar
Re: Multi-lingual searches [message #563457 is a reply to message #563453] Mon, 13 August 2012 22:11 Go to previous message
Barbara Boehmer
Messages: 9077
Registered: November 2002
Location: California, USA
Senior Member
You can use a lexer with base_letter conversion to automatically convert any characters with diacritical marks, such as ü to their base letter, such as u, for indexing and searching, as shown below. The rest of your requirements are unclear; I would require additional sample data.

SCOTT@orcl_11gR2> create table test
  2    (id	  number,
  3  	lastname  varchar2(30))
  4  /

Table created.

SCOTT@orcl_11gR2> insert into test( id, lastname )
  2  values ( 1, 'Muller' )
  3  /

1 row created.

SCOTT@orcl_11gR2> insert into test( id, lastname )
  2  values ( 2, 'Müller' )
  3  /

1 row created.

SCOTT@orcl_11gR2> insert into test( id, lastname )
  2  values ( 3, 'other' )
  3  /

1 row created.

SCOTT@orcl_11gR2> commit
  2  /

Commit complete.

SCOTT@orcl_11gR2> select * from test
  2  /

        ID LASTNAME
---------- ------------------------------
         1 Muller
         2 Müller
         3 other

3 rows selected.

SCOTT@orcl_11gR2> begin
  2    ctx_ddl.create_preference ('test_lex', 'basic_lexer');
  3    ctx_ddl.set_attribute ('test_lex', 'base_letter', 'yes');
  4    ctx_ddl.set_attribute ('test_lex', 'base_letter_type', 'generic');
  5  end;
  6  /

PL/SQL procedure successfully completed.

SCOTT@orcl_11gR2> create index test_idx
  2  on test (lastname)
  3  indextype is ctxsys.context
  4  parameters ('lexer  test_lex')
  5  /

Index created.

SCOTT@orcl_11gR2> select token_text from dr$test_idx$i
  2  /

TOKEN_TEXT
----------------------------------------------------------------
MULLER
OTHER

2 rows selected.

SCOTT@orcl_11gR2> select * from test where contains (lastname, 'Muller') > 0
  2  /

        ID LASTNAME
---------- ------------------------------
         1 Muller
         2 Müller

2 rows selected.

SCOTT@orcl_11gR2> select * from test where contains (lastname, 'Müller') > 0
  2  /

        ID LASTNAME
---------- ------------------------------
         1 Muller
         2 Müller

2 rows selected.



Previous Topic: (INSERT/UPDATE/MERGE/DELETE) Slow performance on table after creating CTXSYS.CONTEXT text index on o
Next Topic: ORACLE Text Search -DEFINESCORE with COMPLETION
Goto Forum:
  


Current Time: Thu Mar 28 05:27:37 CDT 2024