Porting Example

HPy supports incrementally porting an existing C extension from the original Python C API to the HPy API, with the extension compiling and running at each step along the way.

Here we walk through porting a small C extension that implements a Point type with some simple methods (a norm and a dot product). The Point type is minimal, but it does contain C attributes (the x and y values of the point) as well as an attribute, obj, that holds a Python object. During the port, obj must be converted from a PyObject * to an HPyField.
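Before any Python API enters the picture, the data a Point holds and the arithmetic behind its two methods can be sketched in plain C. This is only a model: DemoPoint, demo_point_dot and demo_point_norm_squared are hypothetical names that do not appear in the example files, and the obj member is an opaque pointer standing in for the reference to a Python object.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical plain-C model of the data a Point carries. */
typedef struct {
    double x;
    double y;
    void *obj;   /* stand-in for the reference to a Python object */
} DemoPoint;

/* Dot product of two points: x1*x2 + y1*y2. */
double demo_point_dot(const DemoPoint *a, const DemoPoint *b)
{
    assert(a != NULL && b != NULL);
    return a->x * b->x + a->y * b->y;
}

/* Squared distance from the origin; the norm is its square root. */
double demo_point_norm_squared(const DemoPoint *p)
{
    return demo_point_dot(p, p);
}
```

For the point (3, 4) the squared norm is 25, so the norm is 5.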

There is a separate C file illustrating each step of the incremental port:

  • step_00_c_api.c: The original C API version that we are going to port.

  • step_01_hpy_legacy.c: A possible first step where all methods still receive PyObject * arguments and may still cast them to PyPointObject * if they are instances of Point.

  • step_02_hpy_legacy.c: Shows how to transition some methods to HPy methods that receive HPy handles as arguments while still supporting legacy methods that receive PyObject * arguments.

  • step_03_hpy_final.c: The completed port to HPy where all methods receive HPy handles and PyObject_HEAD has been removed.

Take a moment to read through step_00_c_api.c. Then, once you’re ready, keep reading.

Each section below corresponds to one of the three porting steps above.

Note

The steps used here are just one possible approach to porting a module; none of them is required.

Step 01: Converting the module to a (legacy) HPy module

First for the easy bit – let’s include hpy.h:

#include <hpy.h>

We’d like to differentiate between references to PyPointObject that have been ported to HPy and those that haven’t, so let’s rename it to PointObject and alias PyPointObject to PointObject. We’ll keep PyPointObject for the instances that haven’t been ported yet (the legacy ones) and use PointObject where we have ported the references:

typedef struct {
    // PyObject_HEAD is required while legacy_slots are still used
    // but can (and should) be removed once the port to HPy is completed.
    PyObject_HEAD
    double x;
    double y;
    PyObject *obj;
} PointObject;
typedef PointObject PyPointObject;

For this step, all references will be to PyPointObject – we’ll only start porting references in the next step.
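The effect of the alias can be sketched without any Python headers. In this hypothetical model (DemoHead stands in for PyObject_HEAD, and all Demo-prefixed names are invented for illustration), both type names refer to one and the same struct, so unported code keeps compiling under the old name while ported code uses the new one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for PyObject_HEAD. */
typedef struct { long refcnt; void *type; } DemoHead;

typedef struct {
    DemoHead head;
    double x;
    double y;
} DemoPointObject;

/* Alias: both names now denote exactly the same type. */
typedef DemoPointObject DemoPyPointObject;

/* Unported code keeps using the old name... */
double demo_legacy_get_x(const DemoPyPointObject *p)
{
    assert(p != NULL);
    return p->x;
}

/* ...while ported code uses the new one, on the very same objects. */
double demo_ported_get_x(const DemoPointObject *p)
{
    assert(p != NULL);
    return p->x;
}
```

Because the alias introduces no new type, no casts are needed when a pointer moves between legacy and ported functions.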

Let’s also call HPyType_LEGACY_HELPERS to define some helper functions for use with the PointObject struct:

HPyType_LEGACY_HELPERS(PointObject)

Again, we won’t use these helpers in this step – we’re just setting things up for later.

Now for the big steps.

We need to replace PyType_Spec for the Point type with the equivalent HPyType_Spec:

// HPy type methods and slots (no methods or slots have been ported yet)
static HPyDef *point_defines[] = {
    NULL
};

static HPyType_Spec Point_Type_spec = {
    .name = "point_hpy_legacy_1.Point",
    .basicsize = sizeof(PointObject),
    .itemsize = 0,
    .flags = HPy_TPFLAGS_DEFAULT,
    .builtin_shape = SHAPE(PointObject),
    .legacy_slots = Point_legacy_slots,
    .defines = point_defines,
};

// HPy supports only multiphase module initialization, so we must migrate the
// single phase initialization by extracting the code that populates the module
// object with attributes into a separate 'exec' slot. The module is not
// created manually by calling an API such as PyModule_Create; instead the
// runtime creates the module for us from the specification in HPyModuleDef,
// and we can provide additional slots to populate the module before its
// initialization is finalized.
HPyDef_SLOT(module_exec, HPy_mod_exec)
static int module_exec_impl(HPyContext *ctx, HPy mod)
{
    HPy point_type = HPyType_FromSpec(ctx, &Point_Type_spec, NULL);
    if (HPy_IsNull(point_type))
        return -1;
    // The module holds its own reference once the attribute is set, so the
    // local handle is closed here and any error from setting it is returned.
    int result = HPy_SetAttr_s(ctx, mod, "Point", point_type);
    HPy_Close(ctx, point_type);
    return result;
}

Initially the list of ported methods in point_defines is empty, and all of the methods are still in Point_slots, which we have renamed to Point_legacy_slots for clarity.
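The NULL entry is what marks the end of point_defines, which is why an array holding only NULL reads as an empty list. A minimal sketch of that convention follows, assuming a hypothetical DemoDef struct and demo_count_defines helper; HPy's own HPyDef carries more information than a name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for a definition entry. */
typedef struct { const char *name; } DemoDef;

/* Count entries up to, but not including, the terminating NULL. */
size_t demo_count_defines(DemoDef *const *defines)
{
    size_t n = 0;
    assert(defines != NULL);
    while (defines[n] != NULL)
        n++;
    return n;
}

DemoDef demo_init = { "init" };
DemoDef demo_norm = { "norm" };

/* Step 01: nothing ported yet, so only the terminator is present. */
DemoDef *demo_defines_step_01[] = { NULL };

/* Later steps add pointers to ported definitions before the terminator. */
DemoDef *demo_defines_step_02[] = { &demo_init, &demo_norm, NULL };
```

Forgetting the terminator would let a walker like this one read past the end of the array, so the NULL has to stay in place even when every other entry is removed.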

SHAPE(PointObject) is a macro that retrieves the shape of PointObject as defined by the HPyType_LEGACY_HELPERS macro. It evaluates to HPyType_BuiltinShape_Legacy until we replace the legacy macro with HPyType_HELPERS. Any type with legacy_slots or that still includes PyObject_HEAD in its struct should have .builtin_shape set to HPyType_BuiltinShape_Legacy.

Similarly we replace PyModuleDef with HPyModuleDef:

// Legacy module methods (the "dot" method is still a PyCFunction)
static PyMethodDef PointModuleLegacyMethods[] = {
    {"dot", (PyCFunction)dot, METH_VARARGS, "Dot product."},
    {NULL, NULL, 0, NULL}
};

// HPy module methods: no regular methods have been ported yet,
// but we add the module execute slot
static HPyDef *module_defines[] = {
    &module_exec,
    NULL
};

static HPyModuleDef moduledef = {
    // .name = "step_01_hpy_legacy",
    // ^-- .name is not needed for multiphase module initialization,
    // it is always taken from the ModuleSpec
    .doc = "Point module (Step 1; All legacy methods)",
    .size = 0,
    .legacy_methods = PointModuleLegacyMethods,
    .defines = module_defines,
};

Like the type, the list of ported methods in module_defines is initially almost empty: all the regular methods are still in PointModuleMethods, which has been renamed to PointModuleLegacyMethods. However, because HPy supports only multiphase module initialization, we must convert our module initialization code to an “exec” slot on the module and add that slot to module_defines.

Now all that is left is to replace the module initialization function with one that uses HPy_MODINIT. The first argument is the name of the extension, i.e., what was XXX in PyInit_XXX, and the second argument is the HPyModuleDef.

HPy_MODINIT(step_01_hpy_legacy, moduledef)
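How a macro can turn an extension name into an entry point is easy to sketch with token pasting. The names below (DEMO_MODINIT, the DemoInit_ prefix, DemoModuleDef, demo_load_doc) are hypothetical and only model the idea; the symbols that HPy_MODINIT really emits are not shown here and may depend on the HPy version and build mode.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, minimal module definition. */
typedef struct { const char *doc; } DemoModuleDef;

/* Paste the extension name onto a fixed prefix to form the entry point,
   much as PyInit_XXX is formed from the extension name XXX. */
#define DEMO_MODINIT(ext_name, mod_def)                \
    const DemoModuleDef *DemoInit_##ext_name(void)     \
    {                                                  \
        return &(mod_def);                             \
    }

DemoModuleDef demo_moduledef = { "Point module (model)" };

/* Expands to a function named DemoInit_step_01_demo. */
DEMO_MODINIT(step_01_demo, demo_moduledef)

/* Loader side: call an entry point and read the definition it returns. */
const char *demo_load_doc(const DemoModuleDef *(*init)(void))
{
    const DemoModuleDef *def;
    assert(init != NULL);
    def = init();
    assert(def != NULL);
    return def->doc;
}
```

In this model the entry point hands back only a definition; it does not build a module object itself.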

And we’re done!

Instead of PyInit_XXX, we now have an “exec” slot on the module. We implement it with a C function that takes an HPyContext *ctx and an HPy mod as arguments. The ctx must be forwarded as the first argument to calls to HPy API functions. The mod argument is a handle for the module object. The runtime creates the module for us from the provided HPyModuleDef, so there is no need to call an API such as PyModule_Create explicitly.

Inside the exec slot, PyType_FromSpec is replaced by HPyType_FromSpec.

HPy_SetAttr_s is used to add the Point class to the module. HPy requires no special PyModule_AddObject method.

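HPyDef_SLOT(module_exec, HPy_mod_exec)
static int module_exec_impl(HPyContext *ctx, HPy mod)
{
    HPy point_type = HPyType_FromSpec(ctx, &Point_Type_spec, NULL);
    if (HPy_IsNull(point_type))
        return -1;
    // The module holds its own reference once the attribute is set, so the
    // local handle is closed here and any error from setting it is returned.
    int result = HPy_SetAttr_s(ctx, mod, "Point", point_type);
    HPy_Close(ctx, point_type);
    return result;
}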
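The division of labour between runtime and extension can be modelled in plain C. In this sketch every name (DemoModule, DemoModuleDef, demo_module_exec, demo_runtime_create) is hypothetical. It only mirrors the order of events in multiphase initialization: the runtime creates the module object from the definition, then runs the exec slot to populate it, and treats a nonzero return from the slot as a failure.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_ATTRS 4

/* Toy module object: just a short list of attribute names. */
typedef struct {
    const char *names[DEMO_MAX_ATTRS];
    size_t count;
} DemoModule;

typedef struct {
    const char *doc;
    int (*exec)(DemoModule *mod);  /* returns 0 on success, -1 on error */
} DemoModuleDef;

/* Extension side: populate a module that already exists. */
int demo_module_exec(DemoModule *mod)
{
    assert(mod != NULL);
    if (mod->count >= DEMO_MAX_ATTRS)
        return -1;
    mod->names[mod->count++] = "Point";
    return 0;
}

/* Runtime side: create the module, then run the exec slot.
   Returns 0 on success, -1 if the exec slot reported an error. */
int demo_runtime_create(const DemoModuleDef *def, DemoModule *out)
{
    assert(def != NULL && out != NULL);
    out->count = 0;
    if (def->exec != NULL && def->exec(out) != 0)
        return -1;
    return 0;
}

DemoModuleDef demo_def = { "Point module (model)", demo_module_exec };
```

The extension code never allocates the module in this model, which is the same reason the HPy version has no counterpart to PyModule_Create.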

Step 02: Transition some methods to HPy

In the previous step we put in place the type and module definitions required to create an HPy extension module. In this step we will port some individual methods.

Let us start by migrating Point_traverse. First we need to change PyObject *obj in the PointObject struct to HPyField obj:

typedef struct {
    // PyObject_HEAD is required while legacy methods still access
    // PointObject and should be removed once the port to HPy is completed.
    PyObject_HEAD
    double x;
    double y;
    // HPy handles are short-lived to support all GC strategies.
    // For that reason, PyObject * members in C structs are replaced by HPyField.
    HPyField obj;
} PointObject;

HPy handles can only be short-lived – i.e. local variables, arguments to functions or return values. HPyField is the way to store long-lived references to Python objects. For more information, please refer to the documentation of HPyField.

Now we can update Point_traverse:

HPyDef_SLOT(Point_traverse, HPy_tp_traverse)
int Point_traverse_impl(void *self, HPyFunc_visitproc visit, void *arg)
{
    HPy_VISIT(&((PointObject*)self)->obj);
    return 0;
}

In the first line we used the HPyDef_SLOT macro to define a small structure that describes the slot being implemented. The first argument, Point_traverse, is the name to assign the structure to. By convention, the HPyDef_SLOT macro expects a function called Point_traverse_impl implementing the slot. The second argument, HPy_tp_traverse, specifies the kind of slot.

This is a change from how slots are defined in the old C API. In the old API, the kind of slot is only specified much lower down in Point_legacy_slots. In HPy the implementation and kind are defined in one place using a syntax reminiscent of Python decorators.
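The way a single macro call can tie a slot kind to its implementation is sketched below. DEMO_DEF_SLOT, DemoDef and DemoSlotKind are hypothetical stand-ins, not HPy's real definitions. The sketch reproduces only the naming convention, in which the macro declares a function called name_impl and defines a structure called name that points at it.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { DEMO_tp_traverse, DEMO_tp_init } DemoSlotKind;

typedef struct {
    DemoSlotKind kind;        /* which slot this definition fills */
    int (*impl)(void *self);  /* function implementing the slot */
} DemoDef;

/* Declare SYM_impl, then define a DemoDef named SYM that refers to it.
   The function body follows the macro call, decorator-style. */
#define DEMO_DEF_SLOT(SYM, KIND)           \
    int SYM##_impl(void *self);            \
    DemoDef SYM = { KIND, SYM##_impl };

DEMO_DEF_SLOT(Demo_traverse, DEMO_tp_traverse)
int Demo_traverse_impl(void *self)
{
    assert(self != NULL);
    return 0;
}
```

Because the macro has already declared Demo_traverse_impl, a misspelled function name after the macro call leaves the declared function undefined and shows up as a link error rather than going unnoticed.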

The implementation of traverse is now a bit simpler than in the old C API. We no longer need to visit Py_TYPE(self) and need only HPy_VISIT self->obj. HPy ensures that the interpreter knows that the type of the instance is still referenced.

Only struct members of type HPyField can be visited with HPy_VISIT, which is why we needed to convert obj to an HPyField before we implemented the HPy traverse.

Next we must update Point_init to store the value of obj as an HPyField:

HPyDef_SLOT(Point_init, HPy_tp_init)
int Point_init_impl(HPyContext *ctx, HPy self, const HPy *args,
        HPy_ssize_t nargs, HPy kw)
{
    static const char *kwlist[] = {"x", "y", "obj", NULL};
    PointObject *p = PointObject_AsStruct(ctx, self);
    p->x = 0.0;
    p->y = 0.0;
    HPy obj = HPy_NULL;
    HPyTracker ht;
    if (!HPyArg_ParseKeywordsDict(ctx, &ht, args, nargs, kw, "|ddO", kwlist,
                                  &p->x, &p->y, &obj))
        return -1;
    if (HPy_IsNull(obj))
        obj = ctx->h_None;
    /* INCREF not needed because HPyArg_ParseKeywordsDict does not steal a
       reference */
    HPyField_Store(ctx, self, &p->obj, obj);
    HPyTracker_Close(ctx, ht);
    return 0;
}

There are a few new HPy constructs used here:

  • The kind of the slot passed to HPyDef_SLOT is HPy_tp_init.

  • PointObject_AsStruct is defined by HPyType_LEGACY_HELPERS and returns a pointer to the PointObject struct. Because we still include PyObject_HEAD at the start of the struct this is still a valid PyObject *, but once we finish the port the struct will no longer contain PyObject_HEAD and this will just be an ordinary C struct with no memory overhead!

  • We use HPyTracker when parsing the arguments with HPyArg_ParseKeywordsDict. The HPyTracker keeps track of open handles so that they can be closed easily at the end with HPyTracker_Close.

  • HPyArg_ParseKeywordsDict is the equivalent of PyArg_ParseTupleAndKeywords. Note that the HPy version does not steal a reference like the Python version.

  • HPyField_Store is used to store a reference to obj in the struct. The arguments are the context (ctx), a handle to the object that owns the reference (self), the address of the HPyField (&p->obj), and the handle to the object (obj).

Note

An HPyTracker is not strictly needed for HPyArg_ParseKeywordsDict in Point_init. The arguments x and y are C doubles (so there are no handles to close), and the handle stored in obj was passed in to Point_init as an argument and so should not be closed.

We showed the tracker here to demonstrate its use. You can read more about argument parsing in the API docs.

If a tracker is needed and one is not provided, HPyArg_ParseKeywordsDict will return an error.
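The bookkeeping idea behind a tracker can be modelled in a few lines of plain C. This is a toy with hypothetical names (DemoTracker, demo_tracker_init, demo_tracker_add, demo_tracker_close), in which a "handle" is just an integer flag that is either open (1) or closed (0). It is not how HPyTracker is implemented.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_TRACKER_CAPACITY 8

/* Toy tracker: remembers which open flags it must clear later. */
typedef struct {
    int *open_flags[DEMO_TRACKER_CAPACITY];
    size_t count;
} DemoTracker;

void demo_tracker_init(DemoTracker *t)
{
    assert(t != NULL);
    t->count = 0;
}

/* Remember a handle; returns 0 on success, -1 if the tracker is full. */
int demo_tracker_add(DemoTracker *t, int *open_flag)
{
    assert(t != NULL && open_flag != NULL);
    if (t->count >= DEMO_TRACKER_CAPACITY)
        return -1;
    t->open_flags[t->count++] = open_flag;
    return 0;
}

/* Close everything that was added, in one call. */
void demo_tracker_close(DemoTracker *t)
{
    size_t i;
    assert(t != NULL);
    for (i = 0; i < t->count; i++)
        *t->open_flags[i] = 0;
    t->count = 0;
}
```

Handles that arrive as function arguments are simply never added to the tracker in this model, so closing the tracker leaves them open.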

The last update we need to make for the change to HPyField is to migrate Point_obj_get which retrieves obj from the stored HPyField:

HPyDef_GET(Point_obj, "obj", .doc="Associated object.")
HPy Point_obj_get(HPyContext *ctx, HPy self, void* closure)
{
    PointObject *p = PointObject_AsStruct(ctx, self);
    return HPyField_Load(ctx, self, p->obj);
}

Above we have used PointObject_AsStruct again, and then HPyField_Load to retrieve the value of obj from the HPyField.

We’ve now finished all of the changes required by the introduction of HPyField. We could stop here, but let’s migrate one ordinary method, Point_norm, to round off this stage of the port:

HPyDef_METH(Point_norm, "norm", HPyFunc_NOARGS, .doc="Distance from origin.")
HPy Point_norm_impl(HPyContext *ctx, HPy self)
{
    PointObject *p = PointObject_AsStruct(ctx, self);
    double norm;
    norm = sqrt(p->x * p->x + p->y * p->y);
    return HPyFloat_FromDouble(ctx, norm);
}

To define a method we use HPyDef_METH instead of HPyDef_SLOT. HPyDef_METH creates a small structure defining the method. The first argument is the name to assign to the structure (Point_norm). The second is the Python name of the method (norm). The third specifies the method signature (HPyFunc_NOARGS – i.e. no additional arguments in this case). The last provides the docstring. The macro then expects a function named Point_norm_impl implementing the method.
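The unusual-looking .doc="..." argument becomes less mysterious once one sees how a variadic macro can forward extra arguments into a designated initializer. The sketch below is a hypothetical model (DEMO_DEF_METH, DemoMeth, DemoSignature and demo_call are invented names, and the implementation signature is simplified to two doubles); it is not HPy's actual macro.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { DEMO_NOARGS, DEMO_VARARGS } DemoSignature;

typedef struct {
    const char *name;                    /* name visible to callers */
    DemoSignature signature;             /* calling convention */
    double (*impl)(double x, double y);  /* simplified implementation type */
    const char *doc;                     /* optional docstring */
} DemoMeth;

/* Declare SYM_impl and define a DemoMeth named SYM. Anything after the
   third argument is pasted into the initializer, so `.doc = "..."` sets
   the doc member directly. */
#define DEMO_DEF_METH(SYM, NAME, SIG, ...)                 \
    double SYM##_impl(double x, double y);                 \
    DemoMeth SYM = { .name = NAME, .signature = SIG,       \
                     .impl = SYM##_impl, __VA_ARGS__ };

DEMO_DEF_METH(Demo_norm_sq, "norm_sq", DEMO_NOARGS, .doc = "Squared distance from origin.")
double Demo_norm_sq_impl(double x, double y)
{
    return x * x + y * y;
}

/* Caller side: invoke a method through its definition structure. */
double demo_call(const DemoMeth *m, double x, double y)
{
    assert(m != NULL && m->impl != NULL);
    return m->impl(x, y);
}
```

Keeping the docstring as a named trailing argument lets a macro like this grow further optional fields without changing existing call sites.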

The rest of the implementation remains similar, except that we use HPyFloat_FromDouble to create a handle to a Python float containing the result (i.e. the distance of the point from the origin).

Now we are done and just have to remove the old implementations from Point_legacy_slots and add the new definitions to point_defines:

static HPyDef *point_defines[] = {
    &Point_init,
    &Point_norm,
    &Point_obj,
    &Point_traverse,
    NULL
};

Step 03: Complete the port to HPy

In this step we’ll complete the port. We’ll no longer include Python.h, we’ll remove PyObject_HEAD from the PointObject struct, and we’ll port the remaining methods.

First, let’s remove the include of Python.h:

// #include <Python.h>  // disallow use of the old C API

Then we remove PyObject_HEAD from the struct:

typedef struct {
    // PyObject_HEAD is no longer available in PointObject. In CPython,
    // of course, it still exists but is inaccessible from HPy_AsStruct. In
    // other Python implementations (e.g. PyPy) it might no longer exist at
    // all.
    double x;
    double y;
    HPyField obj;
} PointObject;

We also remove the typedef that aliases PyPointObject to PointObject:

// typedef PointObject PyPointObject;

Now any code that has not been ported should result in a compilation error.

We must also change the type helpers from HPyType_LEGACY_HELPERS to HPyType_HELPERS so that PointObject_AsStruct knows that PyObject_HEAD has been removed:

HPyType_HELPERS(PointObject)
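What dropping the object header means for the struct that the extension sees can be sketched in plain C. DemoHead is a hypothetical stand-in for whatever PyObject_HEAD expands to, and the other Demo-prefixed names are likewise invented. The sketch says nothing about how an interpreter stores its own header elsewhere; it only compares the two layouts of the extension's struct.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the object header that PyObject_HEAD adds. */
typedef struct { long refcnt; void *type; } DemoHead;

/* Layout while legacy code still needs the header. */
typedef struct {
    DemoHead head;
    double x;
    double y;
} DemoLegacyPoint;

/* Layout once the port is complete: only the extension's own data. */
typedef struct {
    double x;
    double y;
} DemoPurePoint;

/* Number of bytes the header puts in front of the first data member. */
size_t demo_header_overhead(void)
{
    assert(offsetof(DemoLegacyPoint, x) >= sizeof(DemoHead));
    return offsetof(DemoLegacyPoint, x) - offsetof(DemoPurePoint, x);
}
```

A helper that converts an object into a struct pointer has to know which of the two layouts is in use, which is why the helper macro must change together with the struct.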

There is one more method to port: dot, a module method that computes the dot product of two points:

HPyDef_METH(dot, "dot", HPyFunc_VARARGS, .doc="Dot product.")
HPy dot_impl(HPyContext *ctx, HPy self, const HPy *args, size_t nargs)
{
    HPy point1, point2;
    if (!HPyArg_Parse(ctx, NULL, args, nargs, "OO", &point1, &point2))
        return HPy_NULL;
    PointObject *p1 = PointObject_AsStruct(ctx, point1);
    PointObject *p2 = PointObject_AsStruct(ctx, point2);
    double dp;
    dp = p1->x * p2->x + p1->y * p2->y;
    return HPyFloat_FromDouble(ctx, dp);
}

The changes are similar to those used in porting the norm method, except:

  • We use HPyArg_Parse instead of HPyArg_ParseKeywordsDict.

  • We opted not to use an HPyTracker by passing NULL as the pointer to the tracker when calling HPyArg_Parse. There is no reason not to use a tracker here, but the handles to the two points are passed in as arguments to dot_impl and thus there is no need to close them (and they should not be closed).

We use PointObject_AsStruct and HPyFloat_FromDouble as before.

Now that we have ported everything we can remove PointMethods, Point_legacy_slots and PointModuleLegacyMethods. The resulting type definition is much cleaner:

static HPyDef *point_defines[] = {
    &Point_init,
    &Point_norm,
    &Point_obj,
    &Point_traverse,
    NULL
};

static HPyType_Spec Point_Type_spec = {
    .name = "point_hpy_final.Point",
    .doc = "Point (Step 03)",
    .basicsize = sizeof(PointObject),
    .itemsize = 0,
    .flags = HPy_TPFLAGS_DEFAULT,
    .defines = point_defines
};

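HPyDef_SLOT(module_exec, HPy_mod_exec)
static int module_exec_impl(HPyContext *ctx, HPy mod)
{
    HPy point_type = HPyType_FromSpec(ctx, &Point_Type_spec, NULL);
    if (HPy_IsNull(point_type))
        return -1;
    // The module holds its own reference once the attribute is set, so the
    // local handle is closed here and any error from setting it is returned.
    int result = HPy_SetAttr_s(ctx, mod, "Point", point_type);
    HPy_Close(ctx, point_type);
    return result;
}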

and the module definition is simpler too:

static HPyDef *module_defines[] = {
    &module_exec,
    &dot,
    NULL
};

static HPyModuleDef moduledef = {
    .doc = "Point module (Step 3; Porting complete)",
    .size = 0,
    .defines = module_defines,
};

Now that the port is complete, when we compile our extension in HPy universal mode, we obtain a built extension that depends only on the HPy ABI and not on the CPython ABI at all!