RFC: Derive Struct and Typed instances for structs using GHC.Generics #539

@RyanGlScott

One of the more tedious aspects of using structs in Copilot is the need to manually define Struct and Typed instances for each Haskell data type that represents a struct. Each instance requires an amount of boilerplate code that grows linearly with the number of fields in the data type. (For an example of this, see this struct-related example in the copilot repo.)

I think the experience of using structs would be greatly improved if GHC could automate as much of the process of defining these instances as possible. Luckily, such a thing is possible using GHC.Generics, and I propose that we make it possible to leverage GHC.Generics to derive Struct and Typed instances. (This proposal was inspired by the similar work in #516, although this proposal differs from that work slightly.)

Recap: the Struct and Typed classes

To illustrate what this proposal will do, let's use the Volts data type from the copilot repo as a running example:

-- | Definition for `Volts`.
data Volts = Volts
  { numVolts :: Field "numVolts" Word16
  , flag     :: Field "flag"     Bool
  }

This is a simple struct-related data type with two fields, numVolts and flag. To use Volts as a struct, we need to give it an instance of the Struct class, which looks like this:

-- | `Struct` instance for `Volts`.
instance Struct Volts where
  typeName _ = "volts"
  toValues volts = [ Value Word16 (numVolts volts)
                   , Value Bool   (flag volts)
                   ]

Struct has three methods:

  • typeName: The name of the struct to use in the Copilot-generated code (usually C).
  • toValues: Converts all of the Fields to Values.
  • updateField: Dispatches on a particular Field and updates the Value contained inside. (Note that updateField isn't explicitly implemented in the example above, but you can see an example of how one would do it here.)

The toValues and updateField methods are extremely mechanical to implement, as their implementations are determined entirely by the type and names of each Field. These are prime candidates for automating via GHC.Generics.
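To make "mechanical" concrete, here is a minimal, dependency-free sketch of the kind of traversal GHC.Generics enables. It does not use Copilot's Field or Value types. The plain-record Volts, the GFields class, and fieldsOf are hypothetical names chosen for illustration, and the result is just a list of (field name, shown value) pairs rather than real Values.

```haskell
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE TypeOperators #-}

module Main (main) where

import Data.Word (Word16)
import GHC.Generics

-- | Hypothetical stand-in for a struct type. It uses plain record fields
-- instead of Copilot's 'Field' so that the sketch has no dependencies.
data Volts = Volts
  { numVolts :: Word16
  , flag     :: Bool
  } deriving Generic

-- | Walks a generic representation and collects one
-- (field name, shown value) pair per record field.
class GFields f where
  gfields :: f p -> [(String, String)]

-- Datatype metadata carries no fields of its own: look inside.
instance GFields f => GFields (D1 c f) where
  gfields (M1 x) = gfields x

-- Likewise for constructor metadata.
instance GFields f => GFields (C1 c f) where
  gfields (M1 x) = gfields x

-- A product of fields: collect the left part, then the right part.
instance (GFields f, GFields g) => GFields (f :*: g) where
  gfields (x :*: y) = gfields x ++ gfields y

-- A single record field: its selector name and its shown value.
instance (Selector s, Show a) => GFields (S1 s (K1 i a)) where
  gfields m@(M1 (K1 x)) = [(selName m, show x)]

-- | Entry point: convert to the generic representation and traverse it.
fieldsOf :: (Generic a, GFields (Rep a)) => a -> [(String, String)]
fieldsOf = gfields . from

main :: IO ()
main = print (fieldsOf (Volts 42 True))
-- prints [("numVolts","42"),("flag","True")]
```

Nothing in GFields mentions Volts: the same four instances cover any single-constructor record, which is why the per-field boilerplate can disappear.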

The typeName method is less mechanical, as it requires the programmer to choose how the struct will be named in the Copilot-generated code. In the example above, the programmer chooses to use the all-lowercase name volts instead of the CamelCase name Volts. As such, it's unclear if it is possible (or even desirable) to automate the generation of the typeName method. (We'll return to this later.)

Volts also requires a Typed instance, which looks like this:

-- | `Typed` instance for `Volts`.
instance Typed Volts where
  typeOf = Struct (Volts (Field 0) (Field False))

Using GHC.Generics

I propose to design things in such a way that Struct and Typed are (mostly) automated using GHC.Generics. There are a number of ways that we could go about this, but regardless of which way we pick, we will need to give Volts an instance of the Generic class. This can be done using the DeriveGeneric language extension:

{-# LANGUAGE DeriveGeneric #-}

import GHC.Generics (Generic)

data Volts = Volts
  { numVolts :: Field "numVolts" Word16
  , flag     :: Field "flag"     Bool
  } deriving Generic

Option 1: Explicitly using generic defaults

This option is the least invasive, as it would not require changing anything about the current definitions of the Struct or Typed classes. The idea is that in copilot-core, we would define "default" versions of toValues, updateField, and typeOf with roughly these type signatures:

defaultToValues :: (Generic a, ...) => a -> [Value a]
defaultUpdateField :: (Generic a, ...) => a -> Value t -> a
defaultTypeOf :: (Typed a, Struct a, Generic a, ...) => Type a

And then when implementing Struct and Typed instances for a data type with a Generic instance, one can simply define them like so:

data Volts = Volts
  { numVolts :: Field "numVolts" Word16
  , flag     :: Field "flag"     Bool
  } deriving Generic

instance Struct Volts where
  typeName _ = "volts"
  toValues = defaultToValues
  updateField = defaultUpdateField
 
instance Typed Volts where
  typeOf = defaultTypeOf

There is still some boilerplate required in defining the instance, but only a constant amount. (Note that we still require the user to implement typeName manually.)
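To give a feel for what a Generic-based helper in the spirit of defaultUpdateField might do internally, here is a small sketch under simplifying assumptions. It is not the proposed copilot-core implementation: GUpdate and updateByName are hypothetical names, fields are plain record fields rather than Copilot Fields, and the field is selected by a String name with a Typeable cast instead of by a Copilot Value.

```haskell
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE TypeOperators #-}

module Main (main) where

import Data.Typeable (Typeable, cast)
import Data.Word (Word16)
import GHC.Generics

-- | Hypothetical stand-in for a struct type (plain fields, no Copilot).
data Volts = Volts
  { numVolts :: Word16
  , flag     :: Bool
  } deriving (Generic, Show, Eq)

-- | Replaces the field with the given name, provided the new value has
-- the field's type. All other fields are left untouched.
class GUpdate f where
  gupdate :: Typeable b => String -> b -> f p -> f p

-- Metadata wrappers: recurse into the wrapped representation.
instance GUpdate f => GUpdate (D1 c f) where
  gupdate name v (M1 x) = M1 (gupdate name v x)

instance GUpdate f => GUpdate (C1 c f) where
  gupdate name v (M1 x) = M1 (gupdate name v x)

-- Products: try the update on both sides.
instance (GUpdate f, GUpdate g) => GUpdate (f :*: g) where
  gupdate name v (x :*: y) = gupdate name v x :*: gupdate name v y

-- A single field: update only if both the name and the type match.
instance (Selector s, Typeable a) => GUpdate (S1 s (K1 i a)) where
  gupdate name v m@(M1 _)
    | selName m == name, Just v' <- cast v = M1 (K1 v')
    | otherwise                            = m

-- | Entry point: round-trip through the generic representation.
updateByName :: (Generic a, GUpdate (Rep a), Typeable b)
             => String -> b -> a -> a
updateByName name v = to . gupdate name v . from

main :: IO ()
main = do
  let v0 = Volts 42 True
  print (updateByName "numVolts" (7 :: Word16) v0)
  print (updateByName "flag" False v0)
  -- Wrong type for the named field: the struct is returned unchanged.
  print (updateByName "flag" (7 :: Word16) v0)
```

In this sketch a mismatched name or type silently leaves the struct unchanged. A real default would have to decide whether that case should instead be an error.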

Option 2a: Implicitly using generic defaults

We can reduce the amount of boilerplate required if we are willing to change the definitions of the Struct and Typed classes a bit. Currently, they're defined as:

-- | The value of that is a product or struct, defined as a constructor with
-- several fields.
class Struct a where
  -- | Returns the name of struct in the target language.
  typeName :: a -> String
  -- | Transforms all the struct's fields into a list of values.
  toValues :: a -> [Value a]
  updateField :: a -> Value t -> a
  updateField = error "Field updates not supported for this type."

And:

-- | A typed expression, from which we can obtain the two type representations
-- used by Copilot: 'Type' and 'SimpleType'.
class (Show a, Typeable a) => Typed a where
  typeOf :: Type a
  simpleType :: Type a -> SimpleType
  simpleType _ = SStruct

Note that some methods (updateField and simpleType) already have default implementations, which are used whenever an instance does not define them explicitly. As an alternative to option (1), we can change the default implementations so that they use the Generic-based defaults instead. That is:

class Struct a where
  typeName :: a -> String

  toValues :: a -> [Value a]
  default toValues :: (Generic a, ...) => a -> [Value a]
  toValues = defaultToValues

  updateField :: a -> Value t -> a
  default updateField :: (Generic a, ...) => a -> Value t -> a
  updateField = defaultUpdateField

class (Show a, Typeable a) => Typed a where
  typeOf     :: Type a
  default typeOf :: (Struct a, Generic a, ...) => Type a
  typeOf = defaultTypeOf

  simpleType :: Type a -> SimpleType
  simpleType _ = SStruct

With these defaults, you can now define Struct and Typed instances like so:

data Volts = Volts
  { numVolts :: Field "numVolts" Word16
  , flag     :: Field "flag"     Bool
  } deriving Generic

instance Struct Volts where
  typeName _ = "volts"
instance Typed Volts

Note that we have changed the default implementation for updateField to require a Generic constraint, so it is no longer possible to omit an updateField implementation unless the data type has a Generic instance. (More on this later.)
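The mechanism behind option 2a is GHC's DefaultSignatures extension. The sketch below shows it on a toy class instead of the real Struct class: HasFieldNames, GNames, and Amps are hypothetical names invented for this example. A type with a Generic instance gets fieldNames for free, while a type without one must still write it by hand.

```haskell
{-# LANGUAGE DefaultSignatures #-}
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE TypeOperators #-}

module Main (main) where

import Data.Word (Word16)
import GHC.Generics

-- | Toy class standing in for 'Struct' (hypothetical, not Copilot's).
class HasFieldNames a where
  -- | No default, mirroring 'typeName': the programmer must choose it.
  structName :: a -> String

  -- | Has a Generic-based default, mirroring the proposed 'toValues'.
  fieldNames :: a -> [String]
  default fieldNames :: (Generic a, GNames (Rep a)) => a -> [String]
  fieldNames = gnames . from

-- | Collects record selector names from a generic representation.
class GNames f where
  gnames :: f p -> [String]

instance GNames f => GNames (D1 c f) where
  gnames (M1 x) = gnames x

instance GNames f => GNames (C1 c f) where
  gnames (M1 x) = gnames x

instance (GNames f, GNames g) => GNames (f :*: g) where
  gnames (x :*: y) = gnames x ++ gnames y

instance Selector s => GNames (S1 s f) where
  gnames m = [selName m]

-- A type with a Generic instance only needs to supply structName.
data Volts = Volts
  { numVolts :: Word16
  , flag     :: Bool
  } deriving Generic

instance HasFieldNames Volts where
  structName _ = "volts"

-- A type without a Generic instance cannot use the default. Omitting
-- fieldNames here would be rejected at compile time.
newtype Amps = Amps Double

instance HasFieldNames Amps where
  structName _ = "amps"
  fieldNames _ = ["value"]

main :: IO ()
main = do
  print (structName (Volts 0 False), fieldNames (Volts 0 False))
  print (structName (Amps 1.5), fieldNames (Amps 1.5))
```

The Amps case is the same situation an existing Copilot struct would be in after the change if it has no Generic instance and relies on a default.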

Option 2b: Using deriving-related GHC extensions

Advanced Haskellers will recognize that we don't need to write out the Struct and Typed instances in a standalone fashion. Instead, we can derive them just as we derive the Generic instance. To do so, we will need to make use of some additional GHC language extensions that augment the deriving keyword with extra powers:

{-# LANGUAGE DerivingStrategies #-}
{-# LANGUAGE DeriveAnyClass #-}
{-# LANGUAGE DerivingVia #-}

data Volts = Volts
  { numVolts :: Field "numVolts" Word16
  , flag     :: Field "flag"     Bool
  } deriving stock Generic
    deriving anyclass Typed
    deriving Struct via (GenericStruct "volts" Volts)

Now there are no instance declarations anymore: just deriving!

How does this all work? Let's go through this bit by bit:

  • The DerivingStrategies language extension allows specifying which strategy to use with each use of the deriving keyword. For instance, the stock strategy is the usual one, which GHC uses for the classes it has built-in deriving support for (those mentioned in the Haskell Report, plus others such as Generic).

    (Note that we could just as well leave off the stock strategy and write deriving Generic instead of deriving stock Generic—they're equivalent ways of writing the same thing. I need to use other deriving strategies elsewhere in this program, however, so I decided to be explicit here about which deriving strategy is in use.)

  • The DeriveAnyClass and DerivingVia language extensions allow using the anyclass and via deriving strategies, respectively.

  • Writing deriving anyclass Typed derives a Typed instance as though you had written a separate instance Typed Volts declaration (using default implementations for all methods). Again, we could technically leave off the anyclass strategy here, but I decided to be explicit.

  • deriving Struct via (GenericStruct "volts" Volts) is the most interesting part. In order for this to work, copilot needs to offer a GenericStruct newtype:

    newtype GenericStruct (typeName :: Symbol) a = GenericStruct a

    This newtype should also come equipped with a Struct instance that leverages Generic-based defaults:

    instance (KnownSymbol typeName, Generic a, ...) => Struct (GenericStruct typeName a) where
      typeName _ = symbolVal (Proxy @typeName)
      toValues (GenericStruct x) = coerce $ defaultToValues x
      updateField (GenericStruct x) v = GenericStruct (defaultUpdateField x v)

    Now, one can use DerivingVia to derive a Struct instance that reuses the existing Struct instance for GenericStruct. This works because Volts and GenericStruct s Volts have the same underlying representation. The s in GenericStruct s specifies how typeName should be implemented, and this is the only place where the programmer has to make a choice.

Note that this approach (option 2b) is fully compatible with option 2a above, as both options are different syntaxes for accomplishing the same thing. As such, advanced Haskellers can use this approach if they want, but if the use of deriving-related GHC extensions is too much, one can always fall back to the (comparatively less advanced) approach used in option 2a. I'll collectively refer to both option 2a and 2b as "option 2".
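The DerivingVia trick of carrying the struct name in a type-level Symbol can be tried in isolation. The following is a minimal sketch on a toy class: Named and WithName are hypothetical stand-ins for Struct and GenericStruct, and only the name-carrying part is modelled, not the Generic-based methods.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE DerivingVia #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE ScopedTypeVariables #-}

module Main (main) where

import Data.Proxy (Proxy (..))
import GHC.TypeLits (KnownSymbol, Symbol, symbolVal)

-- | Toy class standing in for the 'typeName' part of 'Struct'.
class Named a where
  name :: a -> String

-- | Hypothetical analogue of 'GenericStruct': a newtype whose first
-- parameter records, at the type level, the name to report.
newtype WithName (n :: Symbol) a = WithName a

-- | The instance reads the name back from the type-level string.
instance KnownSymbol n => Named (WithName n a) where
  name _ = symbolVal (Proxy :: Proxy n)

-- | The derived 'Named' instance reuses the one for 'WithName', which is
-- sound because 'Volts' and @WithName "volts" Volts@ share a representation.
data Volts = Volts Int Bool
  deriving Named via (WithName "volts" Volts)

main :: IO ()
main = putStrLn (name (Volts 42 True))
-- prints volts
```

The string literal in the via clause is the only place where the name is chosen, which matches the role of the Symbol argument to GenericStruct described above.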

Option 1 or 2?

As noted above, option 1 does not require any changes to the defaults in the Struct and Typed classes, while option 2 does. Option 2 is therefore a backwards-incompatible API change, as there would be existing Struct/Typed instances that would no longer compile unless the user added a Generic instance to their struct types.

Personally, I am in favor of option 2, even with the need for an API change. Given how simple it is to derive a Generic instance, migrating existing code to the new defaults should be very straightforward. As such, I think this is an acceptable price to pay.
