Ignore whitespace for now, annotations and other changes for a large schema by charlesmoore99 · Pull Request #48 · ekrich/exip

charlesmoore99 · 2026-02-25T22:00:23Z

I don't write a lot of PRs so please be gentle.

I need to use exipg to generate a schema-aware EXI processor. The XSD is large (~7500 complex types and elements, plus a bunch of abstract base complex types, split across two files). I used Nagasena to generate .xsd.exi files for the two XSDs and fed them into exipg -static -schema=one.xsd.exi,two.xsd.exi exi_proc.c 2>&1

I ran into a couple problems with exipg.

First, crashes. Some of the buffers were too small for the XSD filenames on the command line, and some of the internal buffers were too small for some of the rather verbose type names.
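The overflow described here is the usual fixed-size buffer copy problem. As a minimal sketch of a guarded copy (the helper name `copy_bounded` is hypothetical and is not taken from exip or from this PR), the copy length can be clamped to the destination capacity before copying:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of exip: copy src into dst, truncating to
 * fit. Returns the number of characters copied, excluding the terminator. */
static size_t copy_bounded(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL);
    if (dst_size == 0)
        return 0;                 /* no room even for the terminator */

    size_t len = strlen(src);
    if (len > dst_size - 1)
        len = dst_size - 1;       /* guard: clamp to buffer capacity */

    memcpy(dst, src, len);
    dst[len] = '\0';              /* always NUL-terminate */
    return len;
}
```

Truncating silently is a design choice; a real tool might prefer to report an error when a filename or type name does not fit.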

Second, the two XSDs were loaded up with whitespace and annotation tags which exipg wasn't able to process.

I inflated the buffer sizes and added code to skip annotation and whiteSpace tags. It works for my bloated XSDs, and I thought I should PR this before I got pulled away to a new task, so that someone else might find value in it... or so that someone can tell me there was a simpler way to do this.

Here's what this PR does

Increases buffer sizes to handle large filenames and element names
Moves the argIndex increment inside a while loop so that it is not incremented when an annotation is skipped. (Skipping the annotation does not allocate memory for it in the dyn array; on cleanup that dyn array gets freed, which results in unallocated memory being freed, which causes a crash and truncates the output.c file.)
Adds a string length guard to limit string length to buffer length on a string copy
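The index fix in the list above can be illustrated with a small sketch. This is not the exipg code: the `Child` type and `collect_children` function are hypothetical stand-ins, and the only point is that the output index advances solely when a slot is actually filled, so every counted slot is safe to clean up later.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a parsed schema child; not an exip type. */
typedef struct {
    const char *tag;   /* e.g. "element", "annotation" */
} Child;

/* Stores a pointer to every non-annotation child in out[] and returns how
 * many slots were filled. Slots [0, count) are all valid afterwards. */
static size_t collect_children(const Child *in, size_t n, const Child **out)
{
    assert(in != NULL && out != NULL);
    size_t in_idx = 0;
    size_t out_idx = 0;
    while (in_idx < n) {
        if (strcmp(in[in_idx].tag, "annotation") == 0) {
            in_idx++;          /* skip: nothing stored, out_idx untouched */
            continue;
        }
        out[out_idx] = &in[in_idx];
        out_idx++;             /* advance only after a slot is filled */
        in_idx++;
    }
    return out_idx;
}
```

If the output index were incremented for skipped entries as well, the count would include slots that were never initialized, which is the kind of mismatch that leads to freeing memory that was never allocated.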

ekrich · 2026-02-26T23:42:16Z

Hey, no worries at all. I adopted this repo because we are trying to use it at work, so I am not an EXI expert, but I am learning.

A few questions.

Do you need to use C for your project or can you use Java or Scala?
Did you take a look at https://ekrich.github.io/exip/exip-user-guide.pdf, the last section for Schema Information?
How did you find out about this repo?

If you can use Java then using EXIficient would be a good idea since EXIP is not production quality or fully functional. More info here: https://exificient.github.io/java/ Either way I would use the EXIficient GUI project to transform your schema into EXI for EXIP.

Forgive me if you already know this, but when EXIP talks about out-of-band options like the -opts argument, this means there is no EXI header (literally $EXI), so you must know the encoding options and supply them for decoding. When you use the EXI header and options, they are specified in the header, and thus the decoder knows what to do.
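As a rough illustration, assuming the marker meant here is the optional EXI cookie from the EXI format specification (the four ASCII bytes `$EXI` at the very start of the stream), a check for it could look like the sketch below. The function name `has_exi_cookie` is hypothetical and is not an EXIP API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not an EXIP API: true if the buffer starts with the
 * optional EXI cookie, the four ASCII bytes '$' 'E' 'X' 'I'. */
static bool has_exi_cookie(const unsigned char *buf, size_t len)
{
    static const unsigned char cookie[4] = { '$', 'E', 'X', 'I' };
    assert(buf != NULL);
    return len >= sizeof cookie && memcmp(buf, cookie, sizeof cookie) == 0;
}
```

The cookie alone does not say whether options are embedded; in the EXI format a separate presence bit in the header signals that, so a stream without embedded options still needs them supplied out of band.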

So using the EXIficient GUI you can encode your XSD. If you use the EXI header, then you won't have to supply -opts when you use exipg. Select Configure Advanced EncodingOptions and select the first two options for the EXI cookie and EXI options. If you don't select anything else (all defaults) you should be good. You shouldn't have any comments or other things you don't need, and then maybe things will work as you expect.

I am wondering whether, if you try this, you may not need to change EXIP. I am not opposed to making changes, but I would need to be really confident about them.

Edit: where did you get Nagasena?

ekrich · 2026-02-27T19:01:53Z

I think I misunderstood a few things. The annotations/documentation in a schema are not the same as the documentation you can put in an XML instance. I was confusing encoding a schema to EXI with encoding a document.

charlesmoore99 · 2026-03-02T20:46:44Z

Thanks for taking the time to respond.

To answer your questions:

If it were just me, I'd probably use Java, but the shop uses mostly C++ and Python. Python is too slow for this particular use case, and I don't really want to add Java to the tech stack.
I have. It has been helpful.
I started out using exip-0.5.4 from SourceForge and made the whitespace and annotation changes against that code base. Then I found your project. I compared the code bases, and they looked equivalent enough that the whitespace and annotation changes would work, so I forked, updated to C99, made the changes, and PR'd them. I've tested them against a number of our doc types. To find your project, I think I googled exi 0.5.5 on a whim. I want to say that the first time I looked for EXI tools it didn't show up, but now it does every time.

I think I got Nagasena at openexi.sourceforge.net.

I have a quick and dirty Java program that uses the nagasena and nagasena-rt jar files to convert the XSD to an .xsd.exi (with the preserve prefixes option bit set and my namespace hard-coded). This is for a project with an onerously large XSD that is updated every couple of weeks. There's going to be a build pipeline for building the .xsd.exi.

Using that .xsd.exi I have been able to both encode documents to EXI and decode them back to XML. I've also tested with the EXIficient GUI and with Nagasena to check interoperability. So far I'm able to decode messages that exip encoded, but I haven't tried the other way around yet.

Note: I ran into what I think is a bug in Nagasena: it decoded the documents to the wrong qname. Nagasena was setting the qname to the xsi:type. Overriding the decoder's SAX XMLFilterImpl to use prefix:localname instead fixed it, but I'm left wondering whether I'm doing something wrong with the encoding or whether it's a Nagasena bug.

ekrich

Thank you for submitting the changes. If we could get a small XSD as a reproducer, so I could add a test and see for myself, that would be really helpful.

In a code base like this, some changes can have a big impact, so I need to be very conservative. Also, my experience and expertise in this area are limited.

ekrich · 2026-03-20T16:34:48Z