RE: Remove CDATA Blocks via Smooks

RE: Remove CDATA Blocks via Smooks

orbeonX

I was able to set up a class that extends DefaultSAXElementSerializer and removes the CDATA markers from the text, but I cannot see the change in my serialized bean.

 

Here is my Smooks config. I set up a resource-config that uses the new serializer, create my Java bean mappings, and have a final visitor that routes the bean to a message queue. In my serializer I have a println that shows the CDATA being removed from the file, but the XML message on the queue afterwards still has the CDATA. Any thoughts?

 

<smooks-resource-list
    xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd"
    xmlns:jb="http://www.milyn.org/xsd/smooks/javabean-1.2.xsd"
    xmlns:core="http://www.milyn.org/xsd/smooks/smooks-core-1.3.xsd"
    xmlns:g="http://www.milyn.org/xsd/smooks/groovy-1.1.xsd">

    <core:filterSettings type="SAX" defaultSerialization="false" />

    <resource-config selector="programs/program/title,programs/program/description,programs/program/id">
        <resource>com.tms.relay.etl.visitor.NoCdataSAXElementSerializer</resource>
    </resource-config>

    <!-- Create the Mappable Program -->
    <jb:bean beanId="mappableProgram" class="com.tms.relay.loader.model.RelayProgramVO" createOnElement="programs/program">
        <jb:value property="remoteProgramId" data="programs/program/id" />
        <jb:wiring property="descriptions" beanIdRef="descriptions" />
        <jb:wiring property="titles" beanIdRef="titles" />
    </jb:bean>

    <jb:bean beanId="titles" class="java.util.ArrayList" createOnElement="programs/program">
        <jb:wiring beanIdRef="title" />
    </jb:bean>

    <jb:bean beanId="title" class="com.tms.relay.loader.model.DescriptionVO" createOnElement="programs/program/title">
        <jb:value property="text" data="programs/program/title" />
    </jb:bean>

    <jb:bean beanId="descriptions" class="java.util.ArrayList" createOnElement="programs/program">
        <jb:wiring beanIdRef="description" />
    </jb:bean>

    <jb:bean beanId="description" class="com.tms.relay.loader.model.DescriptionVO" createOnElement="programs/program/description">
        <jb:value property="text" data="programs/program/description" />
        <jb:value property="language" data="junk/path" default="'English'" />
        <jb:value property="sequence" data="junk/path" default="1" />
    </jb:bean>

    <!-- Route mappableProgram to optimizer queue -->
    <resource-config selector="programs/program">
        <resource>com.tms.relay.etl.visitor.OptimizerVisitor</resource>
        <param name="loadableType">MAPPING</param>
        <param name="mappingSchemeId">284</param>
        <param name="importToken">unit_vod</param>
        <param name="jmsPropertiesFile">file:/relay/etl/conf/activemq.properties</param>
        <param name="beanIdToRoute">mappableProgram</param>
    </resource-config>

</smooks-resource-list>

 

 

Test XML File Contents:

<programs>
    <program>
        <id>1</id>
        <title>title 1 - no cdata in file</title>
        <description>desc 1 - no cdata in file</description>
    </program>
    <program>
        <id>2</id>
        <title><![CDATA[title 2 - cdata in file]]></title>
        <description><![CDATA[desc 2 - cdata in file]]></description>
    </program>
    <program>
        <id>3</id>
        <title>title 3 - no cdata in file</title>
        <description>desc 3 - no cdata in file</description>
    </program>
</programs>

 

 

Code:

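import java.io.IOException;

import org.milyn.SmooksException;
import org.milyn.cdr.annotation.ConfigParam;
import org.milyn.container.ExecutionContext;
import org.milyn.delivery.Filter;
import org.milyn.delivery.sax.DefaultSAXElementSerializer;
import org.milyn.delivery.sax.SAXElement;
import org.milyn.delivery.sax.SAXText;
import org.milyn.delivery.sax.SAXVisitor;

public class NoCdataSAXElementSerializer extends DefaultSAXElementSerializer {

    private SAXVisitor writerOwner = this;
    private boolean rewriteEntities = true;

    public void setWriterOwner(SAXVisitor writerOwner) {
        this.writerOwner = writerOwner;
    }

    @ConfigParam(name = Filter.ENTITIES_REWRITE, defaultVal = "true")
    public void setRewriteEntities(boolean rewriteEntities) {
        this.rewriteEntities = rewriteEntities;
    }

    @Override
    public void onChildText(SAXElement element, SAXText text, ExecutionContext executionContext) throws SmooksException, IOException {
        System.out.println("IN MY NEW VISITOR - SHOULD REMOVE CDATA");
        writeStartElement(element);
        if (element.isWriterOwner(writerOwner)) {
            System.out.println("NEW VISITOR - ABOUT TO REMOVE CDATA");
            // Delegate to the wrapper that writes CDATA content without the CDATA markers.
            NoCdataSAXText saxText = new NoCdataSAXText(text);
            saxText.toWriter(element.getWriter(writerOwner), rewriteEntities);
        }
    }
}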

------

import java.io.IOException;
import java.io.Writer;

import org.milyn.delivery.sax.SAXText;
import org.milyn.delivery.sax.TextType;
import org.milyn.xml.HTMLEntityLookup;
import org.milyn.xml.XmlUtil;

public class NoCdataSAXText {

    private final SAXText saxText;

    public NoCdataSAXText(SAXText inSaxText) {
        saxText = inSaxText;
    }

    /**
     * Write the text to the supplied writer.
     * <p/>
     * It wraps the text based on its {@link SAXText#getType() type},
     * except that CDATA content is written without the CDATA markers.
     *
     * @param writer The writer.
     * @param encodeSpecialChars Encode special XML characters.
     * @throws IOException Write exception.
     */
    public void toWriter(Writer writer, boolean encodeSpecialChars) throws IOException {
        if (writer != null) {
            TextType type = saxText.getType();
            if (type == TextType.TEXT) {
                if (encodeSpecialChars) {
                    XmlUtil.encodeTextValue(saxText.getCharacters(), saxText.getOffset(), saxText.getLength(), writer);
                } else {
                    writer.write(saxText.getCharacters(), saxText.getOffset(), saxText.getLength());
                }
            } else if (type == TextType.COMMENT) {
                writer.write("<!--");
                writer.write(saxText.getCharacters(), saxText.getOffset(), saxText.getLength());
                writer.write("-->");
            } else if (type == TextType.CDATA) {
                // The "<![CDATA[" and "]]>" markers are deliberately not written.
                // Log only this event's slice of the buffer, not the whole character array.
                System.out.println("REMOVING SAX - final text is: "
                        + new String(saxText.getCharacters(), saxText.getOffset(), saxText.getLength()));
                if (encodeSpecialChars) {
                    // Once unwrapped, '<' and '&' must be escaped or the output is no longer well-formed.
                    XmlUtil.encodeTextValue(saxText.getCharacters(), saxText.getOffset(), saxText.getLength(), writer);
                } else {
                    writer.write(saxText.getCharacters(), saxText.getOffset(), saxText.getLength());
                }
            } else if (type == TextType.ENTITY) {
                writer.write("&");
                writer.write(HTMLEntityLookup.getEntityRef(saxText.getCharacters()[0]));
                writer.write(';');
            }
        }
    }
}

 

 

-----Original Message-----

From: Tom Fennelly [hidden email]

Sent: Thursday, March 15, 2012 5:32 AM

To: [hidden email]

Subject: Re: [milyn-user] Remove CDATA Blocks via Smooks

 

Hi Patrick.

 

By default, Smooks serializes all events in the SAX event stream, including CDATA etc.  It does this using a default element Visitor impl.  To override this, you need to implement a Visitor that overrides this default (or just extend the default serializer and modify how the text events are serialized).

 

I'll assume you're using SAX filtering (as opposed to DOM filtering - SAX is the default now since v1.5).  Take a look at the DefaultSAXElementSerializer class (http://goo.gl/619t7).  You could override this class and modify the behavior of the onChildText method. 

The SAXText arg to this method holds the text for the text event.  The event type (CDATA etc.) is held in there in an enum.  You can then configure your extended DefaultSAXElementSerializer in your config.
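
A minimal sketch of that type-based dispatch, kept free of Smooks types so it runs on the JDK alone. `UnwrapCdataSketch`, its `TextType` enum, `writeText` and `serialize` are hypothetical stand-ins for the enum held in SAXText and for an overridden onChildText; they are not the Smooks API. The point it illustrates is that once the CDATA markers are dropped, the content has to be escaped like ordinary text, otherwise a `<` or `&` inside the former CDATA section would make the output malformed.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

final class UnwrapCdataSketch {

    /** Hypothetical stand-in for the text-type enum carried by a SAX text event. */
    enum TextType { TEXT, COMMENT, CDATA }

    /** Writes one text event; CDATA content is written as escaped plain text, without markers. */
    static void writeText(TextType type, String text, Writer writer) throws IOException {
        switch (type) {
            case COMMENT:
                writer.write("<!--" + text + "-->");
                break;
            case TEXT:
            case CDATA:
                // No "<![CDATA[" / "]]>" wrapper here, so the content must be escaped.
                writer.write(escape(text));
                break;
        }
    }

    /** Escapes the characters that are significant in XML character data ('&' first). */
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Convenience wrapper that returns the serialized form as a string. */
    static String serialize(TextType type, String text) {
        StringWriter out = new StringWriter();
        try {
            writeText(type, text, out);
        } catch (IOException e) {
            throw new IllegalStateException(e); // a StringWriter does not fail in practice
        }
        return out.toString();
    }
}
```

In a real serializer the same decision would be made on the type held in the SAXText argument, and the result written to the element's writer.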

 

Hope that helps.

 

T.

 

On 14/03/2012 19:07, Patrick Van Schaick wrote:

> I am trying to transform an XML document that has reoccurring CDATA

> blocks. The CDATA blocks can be in the value of any tag. Is there any

> way to remove CDATA blocks so that transformation won’t include it?

> Example:

> <start>

> <tag><![CDATA[String context goes here]]></tag>

> <tag>String context goes here</tag>

> <parent>

> <child><![CDATA[String context goes here]]></child>

> </parent>

> </start>

> Thank you in advance.

> -Patrick


RE: Remove CDATA Blocks via Smooks

leechj
I know this is a few years too late, but I recently had the same problem. Here is how I solved it. I would appreciate a cleaner answer if there is one, however.

I had to separate out the parsing. Putting the resource-config in the same file as the xml-to-java information was producing the behavior you described. So I did something like this:


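        // Pass 1: run the serializer-only config and capture the XML with the CDATA unwrapped.
        Smooks smooks = new Smooks("smooks/xml-removecdata-xml.xml");
        StringResult xmlResult = new StringResult();
        smooks.filterSource(new StreamSource(byteArrayInputStream), xmlResult);
        smooks.close();

        // Pass 2: feed the cleaned XML to the separate XML-to-Java binding config.
        // An explicit charset avoids depending on the platform default encoding.
        byteArrayInputStream = new ByteArrayInputStream(xmlResult.getResult().getBytes(StandardCharsets.UTF_8));
        smooks = new Smooks("smooks/xml-to-java.xml");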


The xml-removecdata-xml.xml file contains only the resource-config for the serializer:

<smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd"
                      xmlns:core="http://www.milyn.org/xsd/smooks/smooks-core-1.4.xsd">

    <core:filterSettings type="SAX" defaultSerialization="true" terminateOnException="true"
                     readerPoolSize="3" closeSource="true" closeResult="true" rewriteEntities="true"  />

    <resource-config selector="field1, field2, etc">
        <resource>com.mycompany.CdataSAXElementSerializer</resource>
    </resource-config>

</smooks-resource-list>
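
To check that the first pass really produced CDATA-free XML before handing it to the second config, the intermediate string can be inspected with the JDK SAX parser. This is only a sketch under assumptions: `CdataCounter` and `countCdataSections` are made-up names, and it assumes the first-pass output is available as a string (for example from `xmlResult.getResult()`). It relies on the standard SAX lexical-handler property, which reports the start of each CDATA section.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

final class CdataCounter {

    /** Counts the CDATA sections in an XML string (hypothetical helper, JDK only). */
    static int countCdataSections(String xml) {
        final int[] count = {0};
        // DefaultHandler2 implements LexicalHandler, which is notified of CDATA boundaries.
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override
            public void startCDATA() {
                count[0]++;
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return count[0];
    }
}
```

If the serializer was applied to every element that carried CDATA, `CdataCounter.countCdataSections(xmlResult.getResult())` would be expected to return 0.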

RE: Remove CDATA Blocks via Smooks

Tom Fennelly
Remove CDATA blocks from where? Remove them completely or just unwrap
them? Would a custom SAX parser help?
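
If unwrapping is all that is needed, one option is to do it before Smooks sees the document. The following is a sketch using only the JDK, and it swaps in a DOM parse rather than a custom SAX parser: with coalescing enabled, the parser turns CDATA sections into ordinary text nodes, and an identity transform then writes them back out escaped. `CdataCoalescer` and `unwrapCdata` are hypothetical names. Note that this loads the whole document into memory, so it may not suit very large inputs.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class CdataCoalescer {

    /** Returns the XML with every CDATA section replaced by equivalent escaped text. */
    static String unwrapCdata(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Coalescing converts CDATA nodes to text nodes and merges them with adjacent text.
            factory.setCoalescing(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // Identity transform: serializes the DOM back to a string.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not unwrap CDATA", e);
        }
    }
}
```

The returned string could then be passed to Smooks as a StreamSource, so the binding config never encounters a CDATA event.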

On 30/12/2014 16:28, leechj wrote:

> I know this a few years too late, but I recently had this same problem. Here
> is how I solved it. Would appreciate a cleaner answer if there is one
> however.


---------------------------------------------------------------------
To unsubscribe from this list, please visit:

    http://xircles.codehaus.org/manage_email