One of the more tedious aspects of using structs in Copilot is the need to manually define Struct and Typed instances for each Haskell data type that represents a struct. Each of the instances requires an amount of boilerplate code that grows linearly with the number of fields in the data type. (For an example of this, see this struct-related example in the copilot repo.)
I think the experience of using structs would be greatly improved if GHC could automate as much of the process of defining these instances as possible. Luckily, such a thing is possible using GHC.Generics, and I propose that we make it possible to leverage GHC.Generics to derive Struct and Typed instances. (This proposal was inspired by the similar work in #516, although this proposal differs from that work slightly.)
Recap: the Struct and Typed classes
To illustrate what this proposal will do, let's use the Volts data type from the copilot repo as a running example:
(copilot/copilot/examples/Structs.hs, lines 14 to 18 in 068c06d)
-- | Definition for `Volts`.
data Volts = Volts
  { numVolts :: Field "numVolts" Word16
  , flag     :: Field "flag"     Bool
  }
This is a simple struct-related data type with two fields, numVolts and flag. To be usable as a struct in Copilot, Volts needs an instance of the Struct class, which looks like this:
(copilot/copilot/examples/Structs.hs, lines 20 to 25 in 068c06d)
-- | `Struct` instance for `Volts`.
instance Struct Volts where
  typeName _ = "volts"
  toValues volts = [ Value Word16 (numVolts volts)
                   , Value Bool   (flag volts)
                   ]
Struct has three methods:
typeName: The name of the struct to use in the Copilot-generated code (usually C).
toValues: Converts all of the Fields to Values.
updateField: Dispatches on a particular Field and updates the Value contained inside. (Note that updateField isn't explicitly implemented in the example above, but you can see an example of how one would do it here.)
The toValues and updateField methods are extremely mechanical to implement, as their implementations are determined entirely by the type and names of each Field. These are prime candidates for automating via GHC.Generics.
The typeName method is less mechanical, as it requires a choice on the behalf of the programmer to determine how the struct will be named in the Copilot-generated code. In the example above, the programmer chooses to use the all-lowercase name volts instead of the CamelCase name Volts. As such, it's unclear if it is possible (or even desirable) to automate the generation of the typeName method. (We'll return to this later.)
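On the narrower question of whether automation is possible at all, the following is a minimal sketch, not part of the proposal itself. It reads the data type's name from its GHC.Generics metadata and lowercases it. The names GTypeName and lowercaseTypeName are hypothetical, and Volts is simplified here to plain fields so that the file is self-contained. Whether such a naming convention is desirable is a separate question.

```haskell
{-# LANGUAGE DeriveGeneric     #-}
{-# LANGUAGE FlexibleContexts  #-}
{-# LANGUAGE FlexibleInstances #-}

import Data.Char    (toLower)
import Data.Word    (Word16)
import GHC.Generics

-- Hypothetical helper class: extracts the data type's name from the
-- outermost (datatype-level) metadata node of a generic representation.
class GTypeName f where
  gTypeName :: f p -> String

instance Datatype d => GTypeName (M1 D d f) where
  gTypeName = datatypeName

-- Hypothetical default: the Haskell type name, converted to lowercase.
lowercaseTypeName :: (Generic a, GTypeName (Rep a)) => a -> String
lowercaseTypeName = map toLower . gTypeName . from

-- Simplified stand-in for the running example (plain fields, no Field wrapper).
data Volts = Volts
  { numVolts :: Word16
  , flag     :: Bool
  } deriving Generic
```

Loaded into GHCi, lowercaseTypeName (Volts 0 False) should evaluate to "volts", matching the name chosen by hand in the example above.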
Volts also requires a Typed instance, which looks like this:
(copilot/copilot/examples/Structs.hs, lines 27 to 29 in 068c06d)
-- | `Volts` instance for `Typed`.
instance Typed Volts where
  typeOf = Struct (Volts (Field 0) (Field False))
Using GHC.Generics
I propose to design things in such a way that Struct and Typed are (mostly) automated using GHC.Generics. There are a number of ways that we could go about this, but regardless of which way we pick, we will need to give Volts an instance of the Generic class. This can be done using the DeriveGeneric language extension:
{-# LANGUAGE DeriveGeneric #-}
import GHC.Generics (Generic)
data Volts = Volts
  { numVolts :: Field "numVolts" Word16
  , flag :: Field "flag" Bool
  } deriving Generic
Option 1: Explicitly using generic defaults
This option is the least invasive, as it would not require changing anything about the current definitions of the Struct or Typed classes. The idea is that in copilot-core, we would define "default" versions of toValues, updateField, and typeOf with roughly these type signatures:
defaultToValues :: (Generic a, ...) => a -> [Value a]
defaultUpdateField :: (Generic a, ...) => a -> Value t -> a
defaultTypeOf :: (Typed a, Struct a, Generic a, ...) => Type a
And then when implementing Struct and Typed instances for a data type with a Generic instance, one can simply define them like so:
data Volts = Volts
  { numVolts :: Field "numVolts" Word16
  , flag :: Field "flag" Bool
  } deriving Generic

instance Struct Volts where
  typeName _ = "volts"
  toValues = defaultToValues
  updateField = defaultUpdateField

instance Typed Volts where
  typeOf = defaultTypeOf
There is still some boilerplate required in defining the instance, but only a constant amount. (Note that we still require the user to implement typeName manually.)
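To give a feel for what a Generic-based default might look like internally, here is a minimal, self-contained sketch of a defaultToValues-style function. It does not use the real copilot-core types: Field is a simplified stand-in, SimpleValue is a hypothetical replacement for Value that records only a field name and a rendered value, and the GToValues class is likewise an assumption made for illustration. The real implementation would need to produce proper Value entries and carry additional constraints (the "..." in the signatures above).

```haskell
{-# LANGUAGE DataKinds           #-}
{-# LANGUAGE DeriveGeneric       #-}
{-# LANGUAGE FlexibleContexts    #-}
{-# LANGUAGE FlexibleInstances   #-}
{-# LANGUAGE KindSignatures      #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeOperators       #-}

import Data.Proxy   (Proxy (..))
import Data.Word    (Word16)
import GHC.Generics
import GHC.TypeLits (KnownSymbol, Symbol, symbolVal)

-- Simplified stand-in for Copilot's Field type (not the real definition).
newtype Field (s :: Symbol) t = Field t

-- Hypothetical stand-in for Value: a field name plus its rendered contents.
data SimpleValue = SimpleValue String String
  deriving (Eq, Show)

-- Walks a generic representation, collecting one SimpleValue per field.
class GToValues f where
  gToValues :: f p -> [SimpleValue]

-- Metadata wrappers (datatype, constructor, selector) are passed through.
instance GToValues f => GToValues (M1 i c f) where
  gToValues (M1 x) = gToValues x

-- Products: concatenate both halves, preserving declaration order.
instance (GToValues f, GToValues g) => GToValues (f :*: g) where
  gToValues (x :*: y) = gToValues x ++ gToValues y

-- A single field: the name comes from the type-level Symbol.
instance (KnownSymbol s, Show t) => GToValues (K1 i (Field s t)) where
  gToValues (K1 (Field v)) =
    [SimpleValue (symbolVal (Proxy :: Proxy s)) (show v)]

-- The generic default: convert to the representation, then walk it.
defaultToValues :: (Generic a, GToValues (Rep a)) => a -> [SimpleValue]
defaultToValues = gToValues . from

data Volts = Volts
  { numVolts :: Field "numVolts" Word16
  , flag     :: Field "flag" Bool
  } deriving Generic
```

In GHCi, defaultToValues (Volts (Field 5) (Field True)) should evaluate to [SimpleValue "numVolts" "5", SimpleValue "flag" "True"]. Note that no instance covers sum types, so under this sketch a data type with more than one constructor would be rejected at compile time.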
Option 2a: Implicitly using generic defaults
We can reduce the amount of boilerplate required if we are willing to change the definitions of the Struct and Typed classes a bit. Currently, they're defined as:
(copilot/copilot-core/src/Copilot/Core/Type.hs, lines 54 to 64 in 068c06d)
-- | The value of that is a product or struct, defined as a constructor with
-- several fields.
class Struct a where
  -- | Returns the name of struct in the target language.
  typeName :: a -> String

  -- | Transforms all the struct's fields into a list of values.
  toValues :: a -> [Value a]

  updateField :: a -> Value t -> a
  updateField = error "Field updates not supported for this type."
And:
(copilot/copilot-core/src/Copilot/Core/Type.hs, lines 192 to 197 in 068c06d)
-- | A typed expression, from which we can obtain the two type representations
-- used by Copilot: 'Type' and 'SimpleType'.
class (Show a, Typeable a) => Typed a where
  typeOf     :: Type a
  simpleType :: Type a -> SimpleType
  simpleType _ = SStruct
Note that some methods (updateField and simpleType) already have default implementations if you define instances without giving them explicit implementations. As an alternative to option (1), we can change the default implementations so that they use the Generic-based defaults instead. That is:
class Struct a where
  typeName :: a -> String

  toValues :: a -> [Value a]
  default toValues :: (Generic a, ...) => a -> [Value a]
  toValues = defaultToValues

  updateField :: a -> Value t -> a
  default updateField :: (Generic a, ...) => a -> Value t -> a
  updateField = defaultUpdateField

class (Show a, Typeable a) => Typed a where
  typeOf :: Type a
  default typeOf :: (Struct a, Generic a, ...) => Type a
  typeOf = defaultTypeOf

  simpleType :: Type a -> SimpleType
  simpleType _ = SStruct
With these defaults, you can now define Struct and Typed instances like so:
data Volts = Volts
  { numVolts :: Field "numVolts" Word16
  , flag :: Field "flag" Bool
  } deriving Generic

instance Struct Volts where
  typeName _ = "volts"

instance Typed Volts
Note that we have changed the default implementation for updateField to require a Generic constraint, so it is no longer possible to omit an updateField implementation unless the data type has a Generic instance. (More on this later.)
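The mechanism behind these default signatures is GHC's DefaultSignatures extension. Below is a minimal, self-contained sketch of how it behaves, using a toy class rather than the real Struct class. StructLike, fieldNames, GFieldNames, and the Amps type are all hypothetical names introduced for illustration, and Field is a simplified stand-in. The sketch shows both sides of the trade-off: a type with a Generic instance gets the method for free, while a type without one must still implement it by hand.

```haskell
{-# LANGUAGE DataKinds           #-}
{-# LANGUAGE DefaultSignatures   #-}
{-# LANGUAGE DeriveGeneric       #-}
{-# LANGUAGE FlexibleContexts    #-}
{-# LANGUAGE FlexibleInstances   #-}
{-# LANGUAGE KindSignatures      #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeOperators       #-}

import Data.Proxy   (Proxy (..))
import Data.Word    (Word16)
import GHC.Generics
import GHC.TypeLits (KnownSymbol, Symbol, symbolVal)

-- Simplified stand-in for Copilot's Field type (not the real definition).
newtype Field (s :: Symbol) t = Field t

-- Generic traversal that collects the type-level name of every field.
class GFieldNames f where
  gFieldNames :: f p -> [String]

instance GFieldNames f => GFieldNames (M1 i c f) where
  gFieldNames (M1 x) = gFieldNames x

instance (GFieldNames f, GFieldNames g) => GFieldNames (f :*: g) where
  gFieldNames (x :*: y) = gFieldNames x ++ gFieldNames y

instance KnownSymbol s => GFieldNames (K1 i (Field s t)) where
  gFieldNames _ = [symbolVal (Proxy :: Proxy s)]

-- Toy stand-in for Struct: typeName is always written by hand, while
-- fieldNames has a default that is only available for Generic types.
class StructLike a where
  typeName :: a -> String

  fieldNames :: a -> [String]
  default fieldNames :: (Generic a, GFieldNames (Rep a)) => a -> [String]
  fieldNames = gFieldNames . from

-- With a Generic instance, only typeName has to be supplied.
data Volts = Volts
  { numVolts :: Field "numVolts" Word16
  , flag     :: Field "flag" Bool
  } deriving Generic

instance StructLike Volts where
  typeName _ = "volts"

-- Without a Generic instance, the default is unavailable, so fieldNames
-- must be written explicitly (omitting it would be a compile-time error).
data Amps = Amps (Field "numAmps" Word16)

instance StructLike Amps where
  typeName   _ = "amps"
  fieldNames _ = ["numAmps"]
```

In this sketch, fieldNames (Volts (Field 0) (Field False)) should evaluate to ["numVolts", "flag"]. Removing the fieldNames line from the Amps instance would make GHC reject that instance because Amps lacks a Generic instance.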
Option 2b: Using deriving-related GHC extensions
Advanced Haskellers will recognize that we don't need to write out the Struct and Typed instances in a standalone fashion. Instead, we can derive them just as we derive the Generic instance. To do so, we will need to make use of some additional GHC language extensions that augment the deriving keyword with extra powers:
{-# LANGUAGE DerivingStrategies #-}
{-# LANGUAGE DeriveAnyClass #-}
{-# LANGUAGE DerivingVia #-}
data Volts = Volts
  { numVolts :: Field "numVolts" Word16
  , flag :: Field "flag" Bool
  } deriving stock Generic
    deriving anyclass Typed
    deriving Struct via (GenericStruct "volts" Volts)
Now there are no instance declarations anymore: just deriving!
How does this all work? Let's go through this bit by bit:
- The DerivingStrategies language extension allows specifying which strategy to use with each use of the deriving keyword. For instance, the stock strategy is the usual one that GHC applies to the derivable classes mentioned in the Haskell Report, as well as to other classes with built-in deriving support, such as Generic.
  (Note that we could just as well leave off the stock strategy and write deriving Generic instead of deriving stock Generic; they're equivalent ways of writing the same thing. I need to use other deriving strategies elsewhere in this program, however, so I decided to be explicit here about which deriving strategy is in use.)
- The DeriveAnyClass and DerivingVia language extensions allow using the anyclass and via deriving strategies, respectively.
- Writing deriving anyclass Typed derives a Typed instance as though you had written a separate instance Typed Volts declaration (using default implementations for all methods). Again, we could technically leave off the anyclass strategy here, but I decided to be explicit.
- deriving Struct via (GenericStruct "volts" Volts) is the most interesting part. In order for this to work, copilot needs to offer a GenericStruct newtype:
newtype GenericStruct (typeName :: Symbol) a = GenericStruct a
This newtype should also come equipped with a Struct instance that leverages Generic-based defaults:
instance (KnownSymbol typeName, Generic a, ...) => Struct (GenericStruct typeName a) where
  typeName _ = symbolVal (Proxy @typeName)
  toValues (GenericStruct x) = coerce $ defaultToValues x
  updateField (GenericStruct x) v = GenericStruct (defaultUpdateField x v)
Now, one can use DerivingVia to derive a Struct instance that reuses the existing Struct instance for GenericStruct. This works because Volts and GenericStruct s Volts have the same underlying representation. The s in GenericStruct s specifies how typeName should be implemented, and this is the only place where the programmer has to make a choice.
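To make the DerivingVia mechanism concrete in isolation, here is a minimal, self-contained sketch. HasTypeName and Named are hypothetical stand-ins for Struct and GenericStruct, reduced to the typeName method alone, and Volts is simplified to plain fields. The point is only to show how a type-level string on a newtype can drive a derived instance.

```haskell
{-# LANGUAGE DataKinds           #-}
{-# LANGUAGE DerivingVia         #-}
{-# LANGUAGE KindSignatures      #-}
{-# LANGUAGE ScopedTypeVariables #-}

import Data.Proxy   (Proxy (..))
import Data.Word    (Word16)
import GHC.TypeLits (KnownSymbol, Symbol, symbolVal)

-- Toy stand-in for the Struct class, with only the typeName method.
class HasTypeName a where
  typeName :: a -> String

-- Hypothetical analogue of GenericStruct: a newtype whose first parameter
-- is a type-level string holding the desired name.
newtype Named (name :: Symbol) a = Named a

-- The instance for the newtype reads the name back from the type level.
instance KnownSymbol name => HasTypeName (Named name a) where
  typeName _ = symbolVal (Proxy :: Proxy name)

-- Volts reuses that instance. This is accepted because Volts and
-- Named "volts" Volts share the same runtime representation.
data Volts = Volts Word16 Bool
  deriving HasTypeName via (Named "volts" Volts)
```

With this in scope, typeName (Volts 5 True) should evaluate to "volts". Changing the string in the via clause is the only edit needed to pick a different name.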
Note that this approach (option 2b) is fully compatible with option 2a above, as both options are different syntaxes for accomplishing the same thing. As such, advanced Haskellers can use this approach if they want, but if the use of deriving-related GHC extensions is too much, one can always fall back to the (comparatively less advanced) approach used in option 2a. I'll collectively refer to both option 2a and 2b as "option 2".
Option 1 or 2?
As noted above, option 1 does not require any changes to the defaults in the Struct and Typed classes, while option 2 does. As such, option 2 requires a backwards-incompatible API change, as there would be existing Struct/Typed instances that would no longer compile unless the user added a Generic instance to their struct types.
Personally, I am in favor of option 2, even with the need for an API change. Given how simple it is to derive a Generic instance, migrating existing code to the new defaults should be very straightforward. As such, I think this is an acceptable price to pay.