First of all, you should go to the gobject-introspection website and read the
page on how to write bindable API. What I’m going to write here is going to
build upon what’s already documented, or will update the best practices, so if
you maintain a GObject/C library, or you’re writing one, you must be familiar
with the basics of gobject-introspection. It’s 2023: it’s already too bad we’re
still writing C libraries, we should at the very least be responsible about it.
A specific note for people maintaining an existing GObject/C library with an
API designed before the mainstream establishment of gobject-introspection
(basically, anything written prior to 2011): you should really consider writing
all new types and entry points with gobject-introspection in mind, and you
should also consider phasing out older API and replacing it piecemeal with a
bindable one. You should have done this 10 years ago, and I can already hear
the objections, but: too bad. Just because you made an effort 10 years ago, it
doesn’t mean things are frozen in time and that you don’t get to fix them.
Maintenance means constantly tending to your code, and that doubly applies if
you’re exposing an API to other people.
Let’s take the “how to write bindable API” recommendations, and elaborate on
them a bit.
Structures with custom memory management
The recommendation is to use GBoxed as a way to specify a copy and a free
function, in order to clearly define the memory management semantics of a type.
The important caveat is that boxed types are necessary for:

- opaque types that can only be heap allocated
- using a type as a GObject property
- using a type as an argument or return value for a GObject signal

You don’t need a boxed type for the following cases:

- your type is an argument or return value for a method, function, or
  virtual function
- your type can be placed on the stack, or can be allocated with
  malloc()/free()

Additionally, starting with gobject-introspection 1.76, you can specify the
copy and free function of a type without necessarily registering a boxed type,
which leaves boxed types for the things they were created for: signals and
properties.
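As a minimal sketch of that last point, here is a plain data type with its own
copy and free functions and no GType registration. `SomePoint` and its
functions are hypothetical names, the code uses plain `malloc()`/`free()`
instead of GLib allocators to stay self-contained, and the `(copy-func)` and
`(free-func)` annotations in the comment are written as I understand the 1.76
syntax, so check them against the annotations reference before relying on them.

```c
#include <assert.h>
#include <stdlib.h>

/**
 * SomePoint: (copy-func some_point_copy) (free-func some_point_free)
 * @x: the X coordinate
 * @y: the Y coordinate
 *
 * A plain old data type with explicit memory management functions,
 * but no boxed type registration (hypothetical example type).
 */
typedef struct {
  double x;
  double y;
} SomePoint;

/* Allocates a new point on the heap; the caller owns the result */
SomePoint *
some_point_new (double x,
                double y)
{
  SomePoint *point = malloc (sizeof (SomePoint));

  assert (point != NULL);
  point->x = x;
  point->y = y;

  return point;
}

/* Returns a newly allocated copy of @point; the caller owns the copy */
SomePoint *
some_point_copy (const SomePoint *point)
{
  assert (point != NULL);

  return some_point_new (point->x, point->y);
}

/* Releases a point created by some_point_new() or some_point_copy() */
void
some_point_free (SomePoint *point)
{
  free (point);
}
```

If the type later needs to travel through a property or a signal, registering
it as a boxed type on top of the same two functions is still possible.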
Addendum: object types
Boxed types should only ever be used for plain old data types; if you need
inheritance, then the strong recommendation is to use GObject. You can use
GTypeInstance, but only if you know what you’re doing; for more information on
that, see my old blog post about typed instances.
Functionality only accessible through a C macro
This ought to be fairly uncontroversial. C pre-processor symbols don’t exist at
the ABI level, and gobject-introspection is a mechanism to describe a C ABI.
Never, ever expose API only through C macros; those are for C developers. C
macros can be used to create convenience wrappers, but remember that anything
they call must be public API, and that other people will need to re-implement
the convenience wrappers themselves, so don’t overdo it. C developers deserve
some convenience, but not at the expense of everyone else.
Addendum: inline functions
Static inline functions are also not part of the ABI of a library, because they
cannot be used with dlsym(); you can provide inlined functions for performance
reasons, but remember to always provide their non-inlined equivalent.
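A small sketch of what that pairing can look like, with hypothetical names
(`SomeRect`, `some_rect_get_area()`). The inline variant would live in the
public header for C callers; the exported function is the one that actually
ends up as a symbol in the shared library, and is the one bindings can reach.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
  int width;
  int height;
} SomeRect;

/* Fast path for C callers only: a static inline function is compiled
 * into each caller and never becomes a symbol of the library */
static inline int
some_rect_get_area_inline (const SomeRect *rect)
{
  return rect->width * rect->height;
}

/* Exported, non-inlined equivalent: this is what dlsym(), and thus
 * every language binding, can find */
int
some_rect_get_area (const SomeRect *rect)
{
  assert (rect != NULL);

  return some_rect_get_area_inline (rect);
}
```

Implementing the exported function in terms of the inline one keeps the two
from drifting apart.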
Direct C structure access for objects
Again, another fairly uncontroversial rule. You shouldn’t be putting any public
fields into an instance structure, as it makes your API harder to future-proof,
and direct access cannot do things like change notification, or memoization.
Always provide accessor functions.
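To make the change notification argument concrete, here is a rough sketch of an
opaque type whose only entry points are accessors. Everything in it is
hypothetical (`SomeCounter`, the `changed` callback), and a real GObject would
use properties and `notify` signals rather than a hand-rolled callback; the
point is only that the setter can do work that a direct field write cannot.

```c
#include <assert.h>
#include <stdlib.h>

/* Only this typedef would be in the public header */
typedef struct _SomeCounter SomeCounter;

typedef void (* SomeCounterChanged) (SomeCounter *counter,
                                     void        *data);

/* The structure definition stays private to the C file */
struct _SomeCounter {
  int value;
  SomeCounterChanged changed;
  void *changed_data;
};

SomeCounter *
some_counter_new (SomeCounterChanged changed,
                  void              *data)
{
  SomeCounter *counter = calloc (1, sizeof (SomeCounter));

  assert (counter != NULL);
  counter->changed = changed;
  counter->changed_data = data;

  return counter;
}

int
some_counter_get_value (const SomeCounter *counter)
{
  assert (counter != NULL);

  return counter->value;
}

/* The setter skips no-op writes and notifies on real changes */
void
some_counter_set_value (SomeCounter *counter,
                        int          value)
{
  assert (counter != NULL);

  if (counter->value == value)
    return;

  counter->value = value;

  if (counter->changed != NULL)
    counter->changed (counter, counter->changed_data);
}

void
some_counter_free (SomeCounter *counter)
{
  free (counter);
}

/* Example handler: counts notifications into the int behind @data */
static void
count_changes (SomeCounter *counter,
               void        *data)
{
  (void) counter;
  *(int *) data += 1;
}
```

Setting the same value twice with `count_changes` installed triggers a single
notification, which is exactly the kind of behaviour you cannot add later if
callers poke at the field directly.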
va_list
Variadic argument functions are mainly a C convenience. Yes, some languages can
support them, but it’s a bad idea to have this kind of API exposed as the only
way to do things.
Any variadic argument function should have two additional variants:

- a vector based version, using C arrays (zero terminated, or with an
  explicit length)
- a va_list version, to be used when creating wrappers with variadic
  arguments themselves

The va_list variant is kind of optional, since not many people go around
writing variadic argument C wrappers these days, but at the end of the day you
might be going to write an internal function that takes a va_list anyway, so
it’s not particularly strange to expose it as part of your public API.
The vector-based variant, on the other hand, is fundamental.
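The three variants can be sketched as follows. The function names are made up
for the example, and the body (summing string lengths) is only there to have
something to compute; what matters is that the variadic function is a thin
wrapper, and that the vector version does not depend on it.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Vector-based variant: @strv is a NULL-terminated array of strings.
 * This is the entry point language bindings can use. */
size_t
some_strings_total_lengthv (const char *strv[])
{
  size_t total = 0;

  assert (strv != NULL);

  for (size_t i = 0; strv[i] != NULL; i++)
    total += strlen (strv[i]);

  return total;
}

/* va_list variant: for C code that wraps the variadic function */
size_t
some_strings_total_length_valist (const char *first,
                                  va_list     args)
{
  size_t total = 0;

  for (const char *s = first; s != NULL; s = va_arg (args, const char *))
    total += strlen (s);

  return total;
}

/* Variadic convenience for C callers; the list must end with NULL */
size_t
some_strings_total_length (const char *first,
                           ...)
{
  va_list args;
  size_t total;

  va_start (args, first);
  total = some_strings_total_length_valist (first, args);
  va_end (args);

  return total;
}
```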
Incidentally, if you’re using variadic arguments as a way to collect
similarly typed values, e.g.:

    // void
    // some_object_method (SomeObject *self,
    //                     ...) G_GNUC_NULL_TERMINATED
    some_object_method (obj, "foo", "bar", "baz", NULL);

there’s very little difference to using a vector and C99’s compound literals:

    // void
    // some_object_method (SomeObject *self,
    //                     const char *args[])
    some_object_method (obj, (const char *[]) {
                          "foo",
                          "bar",
                          "baz",
                          NULL,
                        });

Except that now the compiler will be able to do some basic type checking and
scream at you if you’re doing something egregiously bad.
Compound literals and designated initialisers also help when dealing with
key/value pairs:

    typedef struct {
      int column;
      union {
        const char *v_str;
        int v_int;
      } value;
    } ColumnValue;

    enum {
      COLUMN_NAME,
      COLUMN_AGE,
      N_COLUMNS
    };

    // void
    // some_object_method (SomeObject *self,
    //                     size_t n_columns,
    //                     const ColumnValue values[])
    some_object_method (obj, 2,
      (ColumnValue []) {
        { .column = COLUMN_NAME, .value = { .v_str = "Emmanuele" } },
        { .column = COLUMN_AGE, .value = { .v_int = 42 } },
      });

So you should seriously reconsider the number of variadic argument
convenience functions you expose.
Multiple out parameters
Using a structured type with an out direction is a good recommendation as a way
to both limit the number of out arguments and provide some future-proofing for
your API. It’s easy to expand an opaque pointer type with accessors, whereas
adding more out arguments requires an ABI break.
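Here is one way that can look, as a sketch with hypothetical names
(`SomeStats`, `some_samples_get_stats()`). Instead of one out argument per
result, the function hands back a single opaque structure, and each result is
read through an accessor. Adding a mean or a median later means adding a field
and an accessor, without touching the existing signature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Opaque to callers: only the typedef would be in the public header */
typedef struct _SomeStats SomeStats;

struct _SomeStats {
  double min;
  double max;
};

/**
 * some_samples_get_stats:
 * @samples: (array length=n_samples): the samples
 * @n_samples: the number of samples
 * @stats: (out) (transfer full): return location for the results
 *
 * Returns: true if @stats was set; on failure @stats is set to NULL
 */
bool
some_samples_get_stats (const double *samples,
                        size_t        n_samples,
                        SomeStats   **stats)
{
  assert (stats != NULL);

  *stats = NULL;

  if (samples == NULL || n_samples == 0)
    return false;

  SomeStats *res = malloc (sizeof (SomeStats));
  if (res == NULL)
    return false;

  res->min = res->max = samples[0];
  for (size_t i = 1; i < n_samples; i++)
    {
      if (samples[i] < res->min)
        res->min = samples[i];
      if (samples[i] > res->max)
        res->max = samples[i];
    }

  *stats = res;

  return true;
}

double
some_stats_get_min (const SomeStats *stats)
{
  assert (stats != NULL);
  return stats->min;
}

double
some_stats_get_max (const SomeStats *stats)
{
  assert (stats != NULL);
  return stats->max;
}

void
some_stats_free (SomeStats *stats)
{
  free (stats);
}
```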
Addendum: inout arguments
Don’t use in-out arguments. Just don’t.
Pass an in argument to the callable for its input, and take an out argument or
a return value for the output.
Memory management and ownership of inout arguments is incredibly hard to
capture with static annotations; it mainly works for scalar values, so:

    void
    some_object_update_matrix (SomeObject *self,
                               double     *xx,
                               double     *yy,
                               double     *xy,
                               double     *yx)

can work with xx, yy, xy, yx as inout arguments, because there’s no ownership
transfer; but as soon as you start throwing things in like pointers to
structures, or vectors of strings, you open yourself to questions like:

- who allocates the argument when it goes in?
- who is responsible for freeing the argument when it comes out?
- what happens if the function frees the argument in the in direction and
  then re-allocates the out?
- what happens if the function uses a different allocator than the one used
  by the caller?
- what happens if the function has to allocate more memory?
- what happens if the function modifies the argument and frees memory?

Even if gobject-introspection nailed down the rules, they could not be
enforced, or validated, and could lead to leaks or, worse, crashes.
So, once again: don’t use inout arguments. If your API already exposes inout
arguments, especially for non-scalar types, consider deprecating them and
adding new entry points.
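As an illustration of the replacement shape, assume an old entry point that
rewrote a string in place through a `char **` inout argument. A sketch of the
new entry point, under a made-up name, takes the input as a constant `in`
argument and returns a freshly allocated result, so every question in the list
above has exactly one answer. It uses plain `malloc()`/`free()`; a GLib-based
library would use its own allocator and say so in the documentation.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/**
 * some_text_to_upper:
 * @text: (transfer none): the input; it is never modified or freed
 *
 * Converts @text to upper case.
 *
 * Returns: (transfer full): a newly allocated string; the caller
 *   releases it with free()
 */
char *
some_text_to_upper (const char *text)
{
  assert (text != NULL);

  size_t len = strlen (text);
  char *res = malloc (len + 1);
  assert (res != NULL);

  /* The cast to unsigned char avoids undefined behaviour in
   * toupper() for characters with a negative value */
  for (size_t i = 0; i < len; i++)
    res[i] = (char) toupper ((unsigned char) text[i]);

  res[len] = '\0';

  return res;
}
```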
Addendum: GValue
Sadly, GValue is one of the most notable cases of inout abuse. The oldest parts
of the GNOME stack use GValue in a way that requires inout annotations because
they expect the caller to:

- initialise a GValue with the desired type
- pass the address of the value
- let the function fill the value

The caller is then left with calling g_value_unset() in order to free the
resources associated with a GValue. This means that you’re passing an
initialised value to a callable, the callable will do something to it (which
may or may not even entail re-allocating the value) and then you’re going to
get it back at the same address.
It would be a lot easier if the API left the job of initialising the GValue to
the callee; then functions could annotate the GValue argument with out and
caller-allocates=1. This would leave the ownership to the caller, and remove a
whole lot of uncertainty.
Various new (comparatively speaking) APIs allow the caller to pass an
uninitialised GValue, and will leave initialisation to the callee, which is how
it should be, but this kind of change isn’t always possible in a backward
compatible way.
Arrays
You can use three types of C arrays in your API:

- zero-terminated arrays, which are the easiest to use, especially for
  pointers and strings
- fixed-size arrays
- arrays with length arguments
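A sketch of the three shapes side by side, with the annotation each one would
carry in its documentation comment. The functions are hypothetical and do
trivial work; the annotations (`zero-terminated`, `fixed-size`, `length`) are
the standard array options.

```c
#include <assert.h>
#include <stddef.h>

/**
 * some_names_count:
 * @names: (array zero-terminated=1): a NULL-terminated array of strings
 *
 * Returns: the number of strings before the terminator
 */
size_t
some_names_count (const char *names[])
{
  size_t n = 0;

  assert (names != NULL);

  while (names[n] != NULL)
    n++;

  return n;
}

/**
 * some_matrix_trace:
 * @m: (array fixed-size=4): a 2x2 matrix, in row-major order
 *
 * Returns: the sum of the diagonal
 */
double
some_matrix_trace (const double m[4])
{
  assert (m != NULL);

  return m[0] + m[3];
}

/**
 * some_values_sum:
 * @values: (array length=n_values): the values to add up
 * @n_values: the number of elements in @values
 *
 * Returns: the sum of all the elements
 */
int
some_values_sum (const int *values,
                 size_t     n_values)
{
  int sum = 0;

  assert (values != NULL || n_values == 0);

  for (size_t i = 0; i < n_values; i++)
    sum += values[i];

  return sum;
}
```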
Addendum: strings and byte arrays
A const char* argument for C strings with a length argument is not an array:

    /**
     * some_object_load_data:
     * @self: ...
     * @str: the data to load
     * @len: length of @str in bytes, or -1
     *
     * ...
     */
    void
    some_object_load_data (SomeObject *self,
                           const char *str,
                           ssize_t     len)

Never annotate the str argument with array length=len. Ideally, this kind of
function should not exist in the first place. You should always use const char*
for NUL-terminated strings, possibly UTF-8 encoded; if you allow embedded NUL
characters then use a byte array:

    /**
     * some_object_load_data:
     * @self: ...
     * @data: (array length=len) (element-type uint8): the data to load
     * @len: the length of the data in bytes
     *
     * ...
     */
    void
    some_object_load_data (SomeObject          *self,
                           const unsigned char *data,
                           size_t               len)

Instead of unsigned char you can also use uint8_t, just to drive the point
home.
Yes, it’s slightly nicer to have a single entry point for strings and byte
arrays, but that’s just a C convenience: decent languages will have a proper
string type, which always comes with a length; and string types are not
binary data.
Addendum: GArray, GPtrArray, GByteArray
Whatever you do, however low you feel on the day, whatever particular tragedy
befell your family at some point, please: never use GLib array types in your
API. Nothing good will ever come of it, and you’ll just spend your days
regretting this choice.
Yes: gobject-introspection transparently converts between GLib array types and
C types, to the point of allowing you to annotate the contents of the array.
The problem is that that information is static, and only exists at the
introspection level. There’s nothing that prevents you from putting other
random data into a GPtrArray, as long as it’s pointer-sized. There’s nothing
that prevents a version of a library from saying that you own the data inside a
GArray, and have the next version assign a clear function to the array to avoid
leaking it all over the place on error conditions, or when using g_autoptr.
Adding support for GLib array types in the introspection was a
well-intentioned mistake that worked in very specific cases: for instance, in
a library that is private to an application. Any well-behaved, well-designed
general purpose library should not expose this kind of API to its consumers.
You should use GArray, GPtrArray, and GByteArray internally; they are good
types, and remove a lot of the pain of dealing with C arrays. Those types
should never be exposed at the API boundary: always convert them to C arrays,
or wrap them into your own data types, with proper argument validation and
ownership rules.
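A rough sketch of that boundary, kept in plain C so it stands on its own: the
hand-rolled growable array inside the hypothetical `SomePlaylist` plays the
role a GPtrArray would have in a real GLib-based library. The storage never
leaves the type; callers get a C array with a length and a documented
ownership rule, and `assert()` stands in for proper argument validation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct _SomePlaylist SomePlaylist;

struct _SomePlaylist {
  char **titles;     /* internal growable storage, never exposed */
  size_t n_titles;
  size_t capacity;
};

SomePlaylist *
some_playlist_new (void)
{
  SomePlaylist *self = calloc (1, sizeof (SomePlaylist));

  assert (self != NULL);

  return self;
}

/* Copies @title; the playlist owns the copy */
void
some_playlist_add_title (SomePlaylist *self,
                         const char   *title)
{
  assert (self != NULL && title != NULL);

  if (self->n_titles == self->capacity)
    {
      size_t new_capacity = self->capacity == 0 ? 4 : self->capacity * 2;
      char **titles = realloc (self->titles, new_capacity * sizeof (char *));

      assert (titles != NULL);
      self->titles = titles;
      self->capacity = new_capacity;
    }

  size_t len = strlen (title) + 1;
  char *copy = malloc (len);
  assert (copy != NULL);
  memcpy (copy, title, len);

  self->titles[self->n_titles++] = copy;
}

/**
 * some_playlist_get_titles:
 * @self: a playlist
 * @n_titles: (out): return location for the number of titles
 *
 * Returns: (array length=n_titles) (transfer container): the titles;
 *   the strings are owned by the playlist, the array is released by
 *   the caller with free(). The array is also NULL-terminated.
 */
const char **
some_playlist_get_titles (SomePlaylist *self,
                          size_t       *n_titles)
{
  assert (self != NULL && n_titles != NULL);

  const char **res = malloc ((self->n_titles + 1) * sizeof (char *));
  assert (res != NULL);

  for (size_t i = 0; i < self->n_titles; i++)
    res[i] = self->titles[i];
  res[self->n_titles] = NULL;

  *n_titles = self->n_titles;

  return res;
}

void
some_playlist_free (SomePlaylist *self)
{
  if (self == NULL)
    return;

  for (size_t i = 0; i < self->n_titles; i++)
    free (self->titles[i]);

  free (self->titles);
  free (self);
}
```

Swapping the internal storage for something else later does not change a
single public signature, which is the whole point.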
Addendum: GHashTable
What’s worse than a type that contains data with unclear ownership rules
decided at run time? A type that contains twice the amount of data with
unclear ownership rules decided at run time.
Just like the GLib array types, hash tables should be used but never directly
exposed to consumers of an API.
Addendum: GList, GSList, GQueue
See above, re: pain and misery. On top of that, linked lists are a terrible
data type that people should rarely, if ever, use in the first place.
Callbacks
Your callbacks should always be in the form of a simple callable with a
data argument:

    typedef void (* SomeCallback) (SomeObject *obj,
                                   gpointer    data);

Any function that takes a callback should also take a “user data” argument
that will be passed as is to the callback:

    // scope: call; the callback data is valid until the
    // function returns
    void
    some_object_do_stuff_immediately (SomeObject   *self,
                                      SomeCallback  callback,
                                      gpointer      data);

    // scope: notify; the callback data is valid until the
    // notify function gets called
    void
    some_object_do_stuff_with_a_delay (SomeObject     *self,
                                       SomeCallback    callback,
                                       gpointer        data,
                                       GDestroyNotify  notify);

    // scope: async; the callback data is valid until the async
    // callback is called
    void
    some_object_do_stuff_but_async (SomeObject          *self,
                                    GCancellable        *cancellable,
                                    GAsyncReadyCallback  callback,
                                    gpointer             data);

    // not pictured here: scope forever; the data is valid for
    // the entirety of the process lifetime

If your function takes more than one callback argument, you should make sure
that it also takes a different user data for each callback, and that the
lifetimes of the callbacks are well defined. The alternative is to use GClosure
instead of a simple C function pointer, but that comes at a cost of GValue
marshalling, so the recommendation is to stick with one callback per function.
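To show what the “notify” scope asks of the implementation, here is a
self-contained sketch in plain C. `SomeDestroyNotify` stands in for
GDestroyNotify, `void *` for gpointer, and `some_object_set_handler()`,
`some_object_emit()` and the `Example*` helpers are all invented for the
example. The object keeps the callback, its data, and the notify function
together, and calls the notify function exactly once, when the data is no
longer needed.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct _SomeObject SomeObject;

typedef void (* SomeCallback) (SomeObject *obj,
                               void       *data);
typedef void (* SomeDestroyNotify) (void *data);

struct _SomeObject {
  SomeCallback callback;
  void *data;
  SomeDestroyNotify notify;
};

SomeObject *
some_object_new (void)
{
  SomeObject *self = calloc (1, sizeof (SomeObject));

  assert (self != NULL);

  return self;
}

/**
 * some_object_set_handler:
 * @self: an object
 * @callback: (scope notified) (closure data) (destroy notify): the callback
 * @data: the data passed as is to @callback
 * @notify: called when @data is no longer needed
 */
void
some_object_set_handler (SomeObject        *self,
                         SomeCallback       callback,
                         void              *data,
                         SomeDestroyNotify  notify)
{
  assert (self != NULL);

  /* Release the data of the handler being replaced, if any */
  if (self->notify != NULL)
    self->notify (self->data);

  self->callback = callback;
  self->data = data;
  self->notify = notify;
}

/* Invokes the handler, passing the user data untouched */
void
some_object_emit (SomeObject *self)
{
  assert (self != NULL);

  if (self->callback != NULL)
    self->callback (self, self->data);
}

void
some_object_free (SomeObject *self)
{
  if (self == NULL)
    return;

  /* Clearing the handler runs the notify function one last time */
  some_object_set_handler (self, NULL, NULL, NULL);
  free (self);
}

/* Example user data and handlers, only here to exercise the API;
 * a real notify function would typically free the data instead */
typedef struct {
  int n_calls;
  int released;
} ExampleState;

static void
example_on_stuff (SomeObject *obj,
                  void       *data)
{
  (void) obj;
  ((ExampleState *) data)->n_calls += 1;
}

static void
example_release (void *data)
{
  ((ExampleState *) data)->released = 1;
}
```

A language binding would pass its own closure as the data and a function that
drops its reference as the notify, which is why each callback needs its own
data and notify pair.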
Addendum: the closure annotation
It seems that many people are unclear about the closure annotation.
Whenever you’re describing a function that takes a callback, you should always
annotate the callback argument with the argument that contains the user data
using the (closure argument) annotation, e.g.

    /**
     * some_object_do_stuff_immediately:
     * @self: ...
     * @callback: (scope call) (closure data): the callback
     * @data: the data to be passed to the @callback
     *
     * ...
     */

You should not annotate the data argument with a unary (closure).
The unary (closure) is meant to be used when annotating the callback type:

    /**
     * SomeCallback:
     * @self: ...
     * @data: (closure): ...
     *
     * ...
     */
    typedef void (* SomeCallback) (SomeObject *self,
                                   gpointer    data);

Yes, it’s confusing, I know.
Sadly, the introspection parser isn’t very clear about this, but in the future
it will emit a warning if it finds a unary closure on anything that isn’t a
callback type.
Ideally, you don’t really need to annotate anything when you call your argument
user_data, but it does not hurt to be explicit.
A cleaned up version of this blog post will go up on the gobject-introspection
website, and we should really have a proper set of best API design practices on
the Developer Documentation website by now; nevertheless, I do hope people will
actually follow these recommendations at some point, and that they will be
prepared for new recommendations in the future. Only dead and unmaintained
projects don’t change, after all, and I expect the GNOME stack to last a bit
longer than the 25 years it already spans today.