extproc OCI LOB Question [message #475196] Mon, 13 September 2010 12:04
cikic
Messages: 12
Registered: September 2006
Location: Austria
Junior Member
Hello!

I have never done anything in C and need some help. I need to bind libtidy (tidy.sourceforge.net) to Oracle, and I was able to compile a working example using char* and VARCHAR2. In the real case, however, I need to use a CLOB, and I have no idea how to do that, since I could not find any helpful documentation on the web.

Can you help me out? (See the comments in the code.)
//test3.dll

#include "tidy.h" 
#include "buffio.h" 
#include <stdio.h> 
#include <errno.h> 
#include <oci.h> 
#include <ociextp.h>

void parseTidy( 
   OCIExtProcContext *ctx 
  ,OCILobLocator *clobinput 
  ,int *rc 
  ,OCILobLocator **cloboutxml 
  ,OCILobLocator **clobouterr 
  /* ... all options ... */ 
) 
{ 
  // Need help with this one:
  //char *input = OCILobRead "<title>Foo</title><p>Foo!"; 

  char *input = "<title>Foo</title><p>Foo!"; // just to test
  TidyBuffer output = {0}; 
  TidyBuffer errbuf = {0}; 
  Bool ok; 

  TidyDoc tdoc = tidyCreate();                     // Initialize "document" 

  /* set options */ 
  ok = tidyOptSetBool( tdoc, TidyXhtmlOut, yes );  // Convert to XHTML 
  

  *rc = tidySetErrorBuffer( tdoc, &errbuf );        // Capture diagnostics 
  if ( *rc >= 0 ) 
    *rc = tidyParseString( tdoc, input );           // Parse the input 
  if ( *rc >= 0 ) 
    *rc = tidyCleanAndRepair( tdoc );               // Tidy it up! 
  if ( *rc >= 0 ) 
    *rc = tidyRunDiagnostics( tdoc );               // Kvetch 
  if ( *rc > 1 )                                    // If error, force output. 
    *rc = ( tidyOptSetBool(tdoc, TidyForceOutput, yes) ? *rc : -1 ); 
  if ( *rc >= 0 ) 
    *rc = tidySaveBuffer( tdoc, &output );          // Pretty Print 

  // And need help with this one:
  //clobouterr = errbuf.bp 
  //cloboutxml = output.bp 

  tidyBufFree( &output ); 
  tidyBufFree( &errbuf ); 
  tidyRelease( tdoc ); 
}


SQL*Plus:
create library TEST3_LIB is 'C:\app\XPMUser\product\11.2.0\dbhome_1\BIN\test3.dll';
/

set serveroutput on

declare 
  o_xml clob := ' ';
  o_err clob := ' ';
  rc pls_integer := -1;

  procedure test( 
     i_xml IN CLOB   
    ,o_rc  OUT PLS_INTEGER 
    ,o_xml IN OUT CLOB 
    ,o_err IN OUT CLOB 
  )   
  AS LANGUAGE C 
  NAME "parseTidy" 
  LIBRARY test3_lib 
  WITH CONTEXT 
  PARAMETERS ( 
     CONTEXT 
    ,i_xml
    ,o_rc BY REFERENCE 
    ,o_xml BY REFERENCE 
    ,o_err BY REFERENCE 
  ); 
begin 
  test('<title>Foo</title><p>Foo!',rc,o_xml,o_err);
  dbms_output.put_line(rc);
  rollback;
end; 
/


Thank you very much!
Christian
Re: extproc OCI LOB Question [message #475197 is a reply to message #475196] Mon, 13 September 2010 12:08
Michel Cadot
Messages: 68624
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
Maybe you should mail to tidy author.

But why do you use extproc? What is your actual problem?

Regards
Michel
Re: extproc OCI LOB Question [message #475198 is a reply to message #475197] Mon, 13 September 2010 12:24
cikic
On the tidy side everything is fine. I simply have no clue how to read from or write to a CLOB in C.

How do I do this? Non-working pseudocode:

char *input = OCILobRead(clobinput); 


And how do I write to a CLOB:
OCILobWrite(*clobouterr, errbuf.bp); // errbuf.bp is of type byte*


Thanks
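
The two pseudocode lines above come down to buffer handling around the OCI calls: tidyParseString wants one NUL-terminated char*, while a CLOB is usually fetched piece by piece. Here is a minimal sketch of that read loop, assuming a hypothetical callback type chunk_reader. It is not the code from this thread and has not been run against Oracle; the names chunk_reader, read_all, mem_source and mem_read are made up for illustration. In a real extproc the callback would wrap OCILobRead, using the service and error handles obtained from OCIExtProcGetEnv on the ctx argument. Note that OCI LOB offsets start at 1, and that for CLOBs amounts may be counted in characters rather than bytes depending on the character set, so check the OCI documentation before relying on the arithmetic.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical reader (assumption, not an OCI type): copies up to cap bytes
   starting at the given 0-based offset into buf and returns the number of
   bytes copied, 0 at end of data, or a negative value on error.
   In an extproc this would wrap OCILobRead (whose offsets are 1-based). */
typedef long (*chunk_reader)(void *src, size_t offset, char *buf, size_t cap);

/* Collect everything the reader delivers into a malloc'ed, NUL-terminated
   string. Returns NULL on error; the caller frees the result. */
static char *read_all(chunk_reader rd, void *src, size_t chunk, size_t *out_len)
{
    size_t len = 0, cap = chunk + 1;
    char *data = malloc(cap);
    if (data == NULL) return NULL;

    for (;;) {
        long n;
        /* Keep room for one more chunk plus the terminator. */
        if (cap - len < chunk + 1) {
            char *tmp;
            cap *= 2;
            tmp = realloc(data, cap);
            if (tmp == NULL) { free(data); return NULL; }
            data = tmp;
        }
        n = rd(src, len, data + len, chunk);
        if (n < 0) { free(data); return NULL; }
        if (n == 0) break;              /* end of data */
        len += (size_t)n;
    }
    data[len] = '\0';
    if (out_len != NULL) *out_len = len;
    return data;
}

/* In-memory stand-in for the input CLOB, only so the loop can be tried
   without a database. */
struct mem_source { const char *data; size_t len; };

static long mem_read(void *src, size_t offset, char *buf, size_t cap)
{
    const struct mem_source *m = src;
    size_t left = offset < m->len ? m->len - offset : 0;
    size_t n = left < cap ? left : cap;
    memcpy(buf, m->data + offset, n);
    return (long)n;
}
```

With the in-memory stand-in, calling read_all(mem_read, &src, 8, &len) on a mem_source holding the test string from the first post yields a 25-byte NUL-terminated copy, which could then be handed to tidyParseString.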
Re: extproc OCI LOB Question [message #475199 is a reply to message #475198] Mon, 13 September 2010 12:39
Michel Cadot
Once again:
Quote:
But why do you use extproc? What is your actual problem?

If I show you that you don't need such complex and unsafe stuff, does that not answer the question?

Regards
Michel
Re: extproc OCI LOB Question [message #475200 is a reply to message #475199] Mon, 13 September 2010 12:52
cikic
As long as you do not suggest the Java solution, JTidy, anything is welcome ;) Two reasons: first, I want to stay compatible with the XE edition, and therefore I have to use extproc. Second, I already tried to loadjava jtidy.jar (not knowing that XE does not support Java), and it took me a whole day to sort out the dependencies, so this cannot be a "better" solution anyway :)

Thanks
Chris

PS: I need to parse HTML content via XDB, therefore I need to clean html to correct xml
Re: extproc OCI LOB Question [message #475201 is a reply to message #475200] Mon, 13 September 2010 12:54
cikic
and Quote:
What is your actual problem?


My knowledge of OCI is not good enough to read from and write to a CLOB, and therefore I am begging for help :)
Re: extproc OCI LOB Question [message #475202 is a reply to message #475200] Mon, 13 September 2010 12:59
Michel Cadot
Quote:
therefore I need to clean html to correct xml

It may be possible in PL/SQL, depending on the HTML.
See an example here:
http://www.orafaq.com/forum/m/467784/102589/?#msg_467784

Regards
Michel
Re: extproc OCI LOB Question [message #475207 is a reply to message #475202] Mon, 13 September 2010 13:25
cikic
Wow, I did not expect you to do this in plain vanilla PL/SQL. Having read the code you posted, I am not sure it can handle malformed HTML/XML such as "<title>Foo</title><p>Foo!" (end tag missing).

A quick test shows that it cannot:

set def off
set serveroutput on

declare 
  l_page clob := '<title>Foo</title><p>Foo!';

  procedure normalize
    --Normalize the l_page content to be a simple "TABLE" XML page
  is
  begin
    -- Replace any contiguous space string by a single space
    l_page := regexp_replace(l_page, '[[:space:]]+', ' ');
    -- Replace "&nbsp;" string by a space (Oracle seems to not like "&nbsp;")
    l_page := replace(l_page, '&nbsp;', ' ');
    -- Remove IMG element (any character case)
    l_page := regexp_replace(l_page, '<IMG[^>]+>', '', 1, 0, 'i');
    -- Remove all attributes in tags
    l_page := regexp_replace(l_page, '[[:alpha:]]+=[^>]+', '');
    -- Remove <SUP> parts (references to footnote in page) (any character case)
    l_page := regexp_replace(l_page, '<SUP *>[[:digit:]]+</SUP *>', '', 1, 0, 'i');
    -- Remove <A> tags (any character case)
    l_page := regexp_replace(l_page, '<[/]{0,1}A *>', '', 1, 0, 'i');
    -- Remove <DIV> tags (any character case)
    l_page := regexp_replace(l_page, '<[/]{0,1}DIV *>', '', 1, 0, 'i');
    -- Put remaining tags in upper case as Oracle XML query is case sensitive
    l_page := replace(l_page, '</table>', '</TABLE>');
    l_page := replace(l_page, '<tbody>', '<TBODY>');
    l_page := replace(l_page, '</tbody>', '</TBODY>');
    l_page := regexp_replace(l_page, 'td *>', 'TD>');
    l_page := regexp_replace(l_page, 'tr *>', 'TR>');
  end;
begin
  normalize;
  dbms_output.put_line(l_page);
end;
/


OK, no surprise here, since the code just does some regexp replaces.
It is a good approach, but it does not cover my case: I receive malformed XML (HTML), and by the way I cannot strip tags, because I need them for web scraping. Is there a built-in Oracle tool to "tidy" malformed XML? Rewriting libtidy in PL/SQL seems like too much work to me ...

Many thanks; your link didn't solve my actual problem, but it helped me a lot with other issues! Do you have any more ideas on how to clean malformed XML?
Chris



Re: extproc OCI LOB Question [message #475208 is a reply to message #475207] Mon, 13 September 2010 13:35
Michel Cadot
This code was specific to Littlefoot's case.
I don't know tidy, but depending on your case I think you can do it in PL/SQL without too much effort.
Of course, if you want to gobble ANY HTML and transform it to XML it could be much harder; HTML and XML are very different and have different purposes.

Regards
Michel

[Updated on: Mon, 13 September 2010 13:37]


Re: extproc OCI LOB Question [message #475213 is a reply to message #475208] Mon, 13 September 2010 14:13
cikic
I want to do some web-scraping-like tasks (re-read dependent websites and get every link, or something like that) with dynamic (PL/)SQL. So I do not know at compile time which website the parser will get or what the user wants to parse ...

... so I think it is back to libtidy: it works with any HTML and is actively maintained (e.g. for HTML 5) ... even if I were able to clean HTML in PL/SQL with less effort, I would have to maintain the code myself ...

You don't know how to handle a CLOB in C? :))

Chris
Re: extproc OCI LOB Question [message #475218 is a reply to message #475213] Mon, 13 September 2010 14:24
Michel Cadot
This is explained in
Oracle C++ Call Interface Programmer's Guide

Regards
Michel
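
For the other half of the question asked earlier in this thread, copying errbuf.bp and output.bp into the output CLOBs, the loop is the mirror image of a piecewise read. Below is a minimal sketch under the same caveats: chunk_writer, write_all, mem_sink and mem_write are hypothetical names, nothing here has been run against Oracle, and in a real extproc the callback would wrap OCILobWrite with 1-based offsets. Since the test block initialises both CLOBs to ' ', an output CLOB that already holds more data than is written would presumably also need truncating, for example with OCILobTrim.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical writer (assumption, not an OCI type): stores n bytes from buf
   at the given 0-based offset and returns 0 on success, non-zero on error.
   In an extproc this would wrap OCILobWrite (whose offsets are 1-based). */
typedef int (*chunk_writer)(void *dst, size_t offset,
                            const unsigned char *buf, size_t n);

/* Write size bytes from data in pieces of at most chunk bytes.
   Returns 0 on success, -1 on the first failure. */
static int write_all(chunk_writer wr, void *dst,
                     const unsigned char *data, size_t size, size_t chunk)
{
    size_t off = 0;
    if (chunk == 0) return -1;
    while (off < size) {
        size_t n = size - off < chunk ? size - off : chunk;
        if (wr(dst, off, data + off, n) != 0) return -1;
        off += n;
    }
    return 0;
}

/* In-memory stand-in for an output CLOB, only so the loop can be tried
   without a database. */
struct mem_sink { unsigned char data[256]; size_t len; };

static int mem_write(void *dst, size_t offset,
                     const unsigned char *buf, size_t n)
{
    struct mem_sink *s = dst;
    /* Reject writes that would run past the fixed-size stand-in buffer. */
    if (offset > sizeof s->data || n > sizeof s->data - offset) return -1;
    memcpy(s->data + offset, buf, n);
    if (offset + n > s->len) s->len = offset + n;
    return 0;
}
```

In parseTidy this would be called once with errbuf.bp and errbuf.size and once with output.bp and output.size, before the tidyBufFree calls release those buffers.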
Re: extproc OCI LOB Question [message #475499 is a reply to message #475196] Wed, 15 September 2010 05:40
cikic
I finally got the library working. If anyone is interested in the code, I have submitted all of it as a patch to libtidy. I hope they can benefit from it :)

https://sourceforge.net/tracker/?func=detail&aid=3066732&group_id=27659&atid=390965

Cheers
Chris
Re: extproc OCI LOB Question [message #475502 is a reply to message #475499] Wed, 15 September 2010 05:49
Michel Cadot
Thanks for the feedback and sharing the code.

Regards
Michel