Skip to content

analysis/cdecl: custom parser for a large enough subset of C's declaration syntax.#1

Draft
eddyb wants to merge 3 commits intoFriz64:rewrite-xmlfrom
LykenSol:cdecl
Draft

analysis/cdecl: custom parser for a large enough subset of C's declaration syntax.#1
eddyb wants to merge 3 commits intoFriz64:rewrite-xmlfrom
LykenSol:cdecl

Conversation

@eddyb
Copy link
Copy Markdown

@eddyb eddyb commented Jan 28, 2024

For some background, see ash-rs#745 (comment).

Example dump output (from second commit): https://gist.github.com/eddyb/5ae0247468892ea7dd2a136ed3afa869.

I haven't spent much time trying to tweak the exact CType AST etc. - I assume that ash would want to also account for differences between C and Rust and other such things that I'm not familiar with.

So the goal here was to be lossless: all the information is included, in an easily-traversable form.

Copy link
Copy Markdown
Owner

@Friz64 Friz64 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some very minor comments, but overall this looks fantastic. Thank you very much!

Comment thread analysis/src/cdecl.rs
};

// FIXME(eddyb) deduplicate qualifier parsing somehow.
let mut const_qualif = match left {
Copy link
Copy Markdown
Owner

@Friz64 Friz64 Feb 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having this be an Option<()> kind of feels unnecessary to me. It could just as well be a bool.

Here's a patch, to demonstrate what I'm talking about (click to expand)
diff --git a/analysis/src/cdecl.rs b/analysis/src/cdecl.rs
index 105ef89..dcf4522 100644
--- a/analysis/src/cdecl.rs
+++ b/analysis/src/cdecl.rs
@@ -1,4 +1,4 @@
-use std::num::NonZeroU8;
+use std::{mem, num::NonZeroU8};
 
 /// Identifier-category-aware minimal tokenization of a subset of C syntax,
 /// sufficient for parsing the C declarations used in `vk.xml`.
@@ -155,15 +155,6 @@ impl<'a> CDecl<'a> {
     ) -> Result<CDecl<'a>, CDeclParseErrorKind<'a>> {
         use CDeclParseErrorKind as ErrorKind;
 
-        trait InsertIfNone<T> {
-            fn insert_if_none(&mut self, value: T) -> Option<&mut T>;
-        }
-        impl<T> InsertIfNone<T> for Option<T> {
-            fn insert_if_none(&mut self, value: T) -> Option<&mut T> {
-                self.is_none().then(|| self.insert(value))
-            }
-        }
-
         let (mut left, decl_name, mut right) = {
             let mut decl_names =
                 tokens
@@ -214,9 +205,9 @@ impl<'a> CDecl<'a> {
         let mut const_qualif = match left {
             [CTok::Kw("const"), rest @ ..] => {
                 left = rest;
-                Some(())
+                true
             }
-            _ => None,
+            _ => false,
         };
 
         let mut ty = CType::Base(match left {
@@ -262,14 +253,16 @@ impl<'a> CDecl<'a> {
             while let Some((&leftmost, after_leftmost)) = left.split_first() {
                 match leftmost {
                     CTok::Kw("const") => {
-                        const_qualif
-                            .insert_if_none(())
-                            .ok_or(ErrorKind::Multiple("const"))?;
+                        if const_qualif {
+                            return Err(ErrorKind::Multiple("const"));
+                        }
+
+                        const_qualif = true;
                     }
                     CTok::Punct('*') => {
                         ty = CType::Ptr {
                             implicit_for_decay: false,
-                            is_const: const_qualif.take().is_some(),
+                            is_const: mem::take(&mut const_qualif),
                             pointee: Box::new(ty),
                         };
                     }
@@ -336,7 +329,7 @@ impl<'a> CDecl<'a> {
                         };
                     }
                     [CTok::Punct('('), params @ .., CTok::Punct(')')] => {
-                        if const_qualif.is_some() {
+                        if const_qualif {
                             return Err(ErrorKind::Unused("const"));
                         }
 
@@ -375,12 +368,12 @@ impl<'a> CDecl<'a> {
         if let (CDeclMode::FuncParam, CType::Array { .. }) = (mode, &ty) {
             ty = CType::Ptr {
                 implicit_for_decay: true,
-                is_const: const_qualif.take().is_some(),
+                is_const: mem::take(&mut const_qualif),
                 pointee: Box::new(ty),
             };
         }
 
-        if const_qualif.is_some() {
+        if const_qualif {
             return Err(ErrorKind::Unused("const"));
         }

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had more uses for insert_if_none and there's a lot more things in the C syntax that work similarly, but are not used in here, I'll have to take a second look.

Booleans aren't great for stuff like this for a few different reasons, but I'm almost certain insert_if_none had more call sites before I refactored all of this to be more general and work for the function pointer typedefs, so I should clean it up.

Comment thread analysis/src/cdecl.rs Outdated
@eddyb
Copy link
Copy Markdown
Author

eddyb commented Mar 4, 2024

One thing I should really do, besides addressing the comments, is to go over the terminology used in comments/variable name, since I have mentally validated my approach against the kind of C grammar I used to rely on, and should try to actually use standard nomenclature instead of my L/R stuff.

One subtle thing is that L/R really are prefix/suffix to the declaration, because e.g. T *p, x; implies both *p : T and x : T separately, i.e. equivalent to T *p; and T x;.

The confounding factor is that T* is valid syntax on its own only in e.g. C++ templates (foo<T*>) or some C features (sizeof(T*)/typeof(T*) etc.), but in those positions T[n] is also valid, so they really act like the full declarator syntax for a single variable but omitting the name being declared, and that syntax isn't relevant here.

@Friz64 Friz64 force-pushed the rewrite-xml branch 3 times, most recently from 278849a to d610c44 Compare March 16, 2024 07:08
Comment thread analysis/src/xml.rs Outdated
@Friz64
Copy link
Copy Markdown
Owner

Friz64 commented Mar 16, 2024

I have a rebased and "improved" version of this branch up at https://github.com/Friz64/ash/tree/rewrite-xml-cdecl.

@eddyb eddyb marked this pull request as draft May 25, 2024 05:41
@eddyb
Copy link
Copy Markdown
Author

eddyb commented May 25, 2024

Working on this again, marking as draft to avoid accidental merging.

Copy link
Copy Markdown
Owner

@Friz64 Friz64 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eddyb I've rebased the first commit (as part of ash-rs#867): 6e89a89

:)

Comment thread analysis/src/cdecl.rs
}

#[derive(Debug)]
pub struct UnsupportedCTok<'a>(&'a str);
Copy link
Copy Markdown
Owner

@Friz64 Friz64 Jul 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should implement Error.

impl<'a> fmt::Display for UnsupportedCTok<'a> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "unsupported C token {:?}", self.0)
    }
}

impl<'a> Error for UnsupportedCTok<'a> {}

@Friz64 Friz64 force-pushed the rewrite-xml branch 2 times, most recently from 16901ad to 9fe1ce2 Compare December 4, 2025 18:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants