PDF to HTML convert using ctxsys.auto_filter different result db 11.2 and 12.1 (Database 11.2.0.1.0 / 12.1.0.1.0)
[message #650141] Sun, 17 April 2016 04:25
bwelter
Messages: 4
Registered: January 2012
Location: Netherlands
Junior Member
Converting the same PDF document gives different results in Oracle 11.2 and 12.1.
I am using plaintext => false to get HTML output.

Code:
declare
  l_blob blob; -- holds the PDF
  l_clob clob; -- result of the conversion
begin
  -- load the BLOB with the PDF:
  ...
  -- create the policy, using AUTO_FILTER as the filter:
  ctx_ddl.create_policy('test_policy', 'ctxsys.auto_filter');
  ......
  -- convert the PDF; plaintext => false requests HTML instead of plain text:
  ctx_doc.policy_filter(policy_name => 'test_policy', document => l_blob, restab => l_clob, plaintext => false);
  -- turn CR into LF, then mark the end and beginning of every line:
  l_clob := replace(trim(l_clob), chr(13), chr(10));
  l_clob := replace(l_clob, chr(10), chr(32) || '<<EOL>>' || chr(10) || '<<BOL>>');
  ....
end;

In the Oracle 12 database, l_clob contains:
<<BOL>><div class="c" style="top:592px;left:218px;font-size:9px;font-family:Arial, sans-serif;" <<EOL>>
<<BOL>>>TRANSFORMER SINGLE PHASE, PR AC440V SEC AC220/5,</div> <<EOL>>
<<BOL>><div class="c" style="top:592px;left:38px;font-size:9px;font-family:Arial, sans-serif;" <<EOL>>


In the Oracle 11 database, the same PDF gives the following result in l_clob:
<<BOL>> <<EOL>>
<<BOL>><p><font size="1" face="Arial">TRANSFORMER SINGLE PHASE, PR AC440V SEC AC220/5,</font></p> <<EOL>>
<<BOL>> <<EOL>>

I specifically need this part of the converted PDF content, i.e. the x-y position, which only the 12.1 output contains:
..top:592px;left:218px..
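Once the filter returns positioned <div> output (as in the 12.1 result above), the position and text can be pulled out with REGEXP_COUNT and REGEXP_SUBSTR, both available since 11g. This is a minimal sketch, not a fix for the 11.2 behaviour: it assumes every div carries "top:Npx;left:Npx" in that order, as in the 12.1 sample, and it works on a hard-coded copy of that sample instead of a real policy_filter result. The 11.2 output contains no positions, so nothing can be recovered from it this way.

```sql
-- Sketch: extract (top, left, text) from AUTO_FILTER HTML that uses positioned <div> elements.
-- Assumption: each div has "top:Npx;left:Npx" in that order, as in the 12.1 output.
set serveroutput on  -- SQL*Plus / SQL Developer command to show dbms_output
declare
  -- sample copied from the 12.1 result; in practice use the l_clob filled by ctx_doc.policy_filter
  l_html clob :=
    '<div class="c" style="top:592px;left:218px;font-size:9px;font-family:Arial, sans-serif;"' || chr(10) ||
    '>TRANSFORMER SINGLE PHASE, PR AC440V SEC AC220/5,</div>' || chr(10);
  -- subexpressions: 1 = top, 2 = left, 3 = text between the end of the opening tag and </div>
  c_pattern constant varchar2(100) := 'top:([0-9]+)px;left:([0-9]+)px[^>]*>([^<]*)</div>';
begin
  -- visit every positioned div; the 6th argument of regexp_substr selects the subexpression
  for i in 1 .. regexp_count(l_html, c_pattern) loop
    dbms_output.put_line(
      'top='   || regexp_substr(l_html, c_pattern, 1, i, null, 1) ||
      ' left=' || regexp_substr(l_html, c_pattern, 1, i, null, 2) ||
      ' text=' || regexp_substr(l_html, c_pattern, 1, i, null, 3));
  end loop;
end;
/
```

For the sample line this prints top=592 left=218 followed by the div text. Matching on the style string keeps the sketch independent of the <<BOL>>/<<EOL>> markers, since [^>]* also spans the line break inside the opening tag.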

Could this be controlled by a setting?
What is the solution?

NB: I am aware that not all PDF documents contain nicely formatted text and x-y positions. For my current purpose, however, this is a good solution.

[Updated on: Sun, 17 April 2016 04:27]
