Ignore whitespace for now, annotations and other changes for a large schema#48
Ignore whitespace for now, annotations and other changes for a large schema#48charlesmoore99 wants to merge 2 commits intoekrich:mainfrom
Conversation
increases buffer sizes to handle large filenames and elementnames Moves argIndex inside a while loop so that it is not increments when an annotation is skipped. Added a string length guard to limit string length to buffer length
|
Hey no worries at all. I adopted this repo as we are trying to use it at work so I am not a super expert EXI but learning. A few questions.
If you can use Java then using EXIficient would be a good idea since EXIP is not production quality or fully functional. More info here: https://exificient.github.io/java/ Either way I would use the EXIficient GUI project to transform your schema into EXI for EXIP. Forgive me if you already know but when EXIP talks about out-of-band options like the So using the EXIficient GUI you can encode your XSD - if you use the EXI header then when you use I am wondering if you try this, you may not need to change EXIP. I am not opposed to making changes but I would need to be really confident about the changes. Edit: where did you get the Nasagena? |
|
I think I misunderstood a few things. The annotations/documents are not the same as the documentation you can put in and XML instance - I was confusing schema encoding to EXI vs document encoding. |
|
Thanks for taking the time to respond. to answer your questions:
I think I got the nagasena at openexi.sourceforge.net. I have a quick and dirty java program that uses the nagasena and nagasenst-rt jar files to convert the XSD to and .xsd.exi (with the preserve prefixes option bit set and my namespace hard coded). This is for a project with a frequently updated onerously large xsd that is updated every couple of weeks. There's going to be a build pipeline for building the .xsd.exi Using that .xsd.exi I have been able to both encode documents to exi and decode them back to xml. I've also tested with the Exificient GUI, and with nagasena to check interoperability. So far I'm able to decode messages exip enecded, but havent tried the other way around yet. Note: I ran into (what I think is a bug) in Nagasena in that it decoded the documents to the wrong qname. Nagasena was setting the qname to the xst:type Overriding the decoders SAX XMLFilterImpl to use prefix:localname instead fixed it. but I'm left wondering if I'm doing something wrong with the encoding or if its a nagasena bug. |
ekrich
left a comment
There was a problem hiding this comment.
Thank-you for submitting the changes. If we could get a small XSD as a reproducer so I could add a test and see for myself that would be really helpful.
In a code base like this some changes can make a big impact so I need to be very conservative. Also, my experience and expertise in this area is limited.
| { | ||
| TRY(getComplexTypeProtoGrammar(ctx, &treeTEntry->entry->child, &pg)); | ||
| } | ||
| else if(treeTEntry->entry->child.entry->element == ELEMENT_ANNOTATION) |
There was a problem hiding this comment.
This change seems ok. The schema doesn't have documentation tags in side the annotation or does skipping here avoid that?
| } | ||
| else | ||
| { | ||
| destroyDynArray(&partGrammarTbl.dynArray); |
There was a problem hiding this comment.
If I understand this correctly, this was a memory leak and this is the fix?
| @@ -1775,17 +1781,20 @@ static errorCode getRestrictionSimpleProtoGrammar(BuildContext* ctx, QualifiedTr | |||
| // TODO: needs to be implemented. It is also needed for the XML Schema grammars | |||
| // COMMENT #SCHEMA#: ignore for now | |||
| DEBUG_MSG(INFO, DEBUG_GRAMMAR_GEN, ("\n>Type facet pattern is not implemented: at %s, line %d.", __FILE__, __LINE__)); | |||
There was a problem hiding this comment.
Does this debug message show up normally or d you have to set DEBUG mode?
| { | ||
| SET_TYPE_FACET(newSimpleType.content, TYPE_FACET_WHITE_SPACE); | ||
| return EXIP_NOT_IMPLEMENTED_YET; | ||
| // skip whiteSpace |
There was a problem hiding this comment.
Here we need to leave those two lines. Since we are not implementing it now we should probably have a debug message like above on R1783.
| } | ||
| enumEntry = enumEntry->next; | ||
| enumIter++; | ||
| } |
There was a problem hiding this comment.
The change above seems fine but the diff would be less if we kept the same order. Put the ANNOTATION in the else if.
| if(strstr(argv[argIndex], "-schema") != NULL) | ||
| { | ||
| char *xsdList = argv[argIndex] + 7; | ||
| char *xsdList = argv[argIndex] + 8; |
There was a problem hiding this comment.
Need some help on the reason for this change.
| FILE *schemaFile; | ||
| BinaryBuffer buffer[MAX_XSD_FILES_COUNT]; // up to 10 XSD files | ||
| char schemaFileName[50]; | ||
| char schemaFileName[2048]; |
There was a problem hiding this comment.
Here we should probably use C stdio.h FILENAME_MAX.
|
|
||
| // Maximum number of characters in a variable name buffer | ||
| #define VAR_BUFFER_MAX_LENGTH 200 | ||
| #define VAR_BUFFER_MAX_LENGTH 2048 |
There was a problem hiding this comment.
Wondering why this value was chosen.
| char elemGrammar[20]; | ||
| char typeGrammar[20]; | ||
| char elemGrammar[256]; | ||
| char typeGrammar[256]; |
There was a problem hiding this comment.
This one is also a very large buffer change and other places in the code based use 20. It would be really helpful to have a small subset of your schema or equivalent as a reproducer so I could see the problems myself and add it to the tests.
| size_t cpyLen = str->length < (VAR_BUFFER_MAX_LENGTH-1) ? str->length : (VAR_BUFFER_MAX_LENGTH-1); | ||
| strncpy(displayStr, str->str, cpyLen); | ||
| displayStr[cpyLen] = '\0'; | ||
| } |
I don't write a lot of PRs so please be gentle.
I need to use exipg to generate a schema aware exi processor. the xsd is large. (~7500 complex type and elements and a bunch of abstract base complex types split across two files). I uses nagasina to generate .xsd.exi files for the two xsds and fed them into exipg -static -schema=one.xsd.exi,two.xsd.exi exi_proc.c &2>1
I ran into a couple problems with exipg.
First, crashes. some of the buffers were too small for the xsd filenames on the command line, and some of the internal buffers were too small for some of the rather verbose type names .
Second, the two XSDs were loaded up with whitespace and annotation tags which exipg wasn't able to process.
I inflated the buffer sizes and added code to skip annotation and whiteSpace tags. It works for my bloated xsds and thought I should PR this before I got pulled away to a new task so someone else might be able to find value in it.... Or so that someone can tell me there was a simpler way to do this.
Here's what this PR does