Experimenting with NumPy internals

Registering a user-defined type that can be specified in the dtype argument

First, I tried to create a user-defined type. The steps are summarized below:

  • Create a PyTypeObject for the user-defined type as a subclass of PyGenericArrType_Type (numpy.generic).
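    /* import_array() must already have been called: NumPy C-API macros
       such as PyGenericArrType_Type read the table it loads. */
    PyTypeObject *PyMyArrType_Type;
    PyObject *args;
    
    /* Equivalent to the Python expression
       type("myarray.myarray", (numpy.generic,), {}) */
    args = Py_BuildValue("(s(O){})", "myarray.myarray",
                         (PyObject *)&PyGenericArrType_Type);
    if (!args)
        return;
    PyMyArrType_Type = (PyTypeObject *)PyType_Type.tp_new(
        &PyType_Type, args, NULL);
    Py_DECREF(args);  /* release args whether or not tp_new succeeded */
    if (!PyMyArrType_Type)
        return;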
    
  • Create a PyArray_Descr object whose typeobj field points to the type object created in the previous step.
        PyArray_Descr *PyMyArray_Descr;
    
        PyMyArray_Descr = PyObject_New(PyArray_Descr, &PyArrayDescr_Type);
        if (!PyMyArray_Descr)
            return;
        PyMyArray_Descr->typeobj = PyMyArrType_Type;  /* descriptor keeps this reference */
        PyMyArray_Descr->kind = 'f';                  /* array-interface kind: floating point */
        PyMyArray_Descr->type = PyArray_DOUBLELTR;
        PyMyArray_Descr->byteorder = '=';             /* native byte order */
        PyMyArray_Descr->hasobject = 0;               /* NPY_USE_GETITEM not set */
        PyMyArray_Descr->type_num = PyArray_USERDEF;  /* replaced by the number assigned at registration */
        PyMyArray_Descr->elsize = sizeof(double);
        PyMyArray_Descr->alignment = sizeof(double);
        PyMyArray_Descr->subarray = NULL;
        PyMyArray_Descr->fields = NULL;
        PyMyArray_Descr->names = NULL;
        PyMyArray_Descr->f = &_PyMyArr_ArrFuncs;     /* needs getitem, setitem, copyswap, copyswapn */
    
  • Register the descriptor with PyArray_RegisterDataType(), which returns the new type number on success and -1 on failure.
        if (PyArray_RegisterDataType(PyMyArray_Descr) < 0)
            return;
    
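  • Finally, add the type object created in the first step to the module.
        PyObject *m = Py_InitModule("myarray", myarray_methods);
        if (!m)
            return;
        {
            /* borrowed reference; PyDict_SetItemString() adds its own
               reference to the type, exposing it as myarray.myarray */
            PyObject *d = PyModule_GetDict(m);
            PyDict_SetItemString(d, "myarray", (PyObject *)PyMyArrType_Type);
        }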
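The descriptor points at _PyMyArr_ArrFuncs, whose definition is not shown in these notes. To my knowledge, PyArray_RegisterDataType() rejects a table whose getitem, setitem, copyswap or copyswapn slot is NULL, and the usual approach is to clear the table with PyArray_InitArrFuncs() before filling it in. The two byte-copying slots for a double-sized element can be sketched in plain C without the NumPy headers. This is only a sketch under assumptions: the function names and my_intp (a stand-in for npy_intp) are hypothetical, and the signatures merely mirror PyArray_CopySwapFunc and PyArray_CopySwapNFunc.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for NumPy's npy_intp. */
typedef ptrdiff_t my_intp;

/* Reverse the 8 bytes of one double in place. */
static void byteswap8(unsigned char *p)
{
    for (int i = 0; i < 4; i++) {
        unsigned char t = p[i];
        p[i] = p[7 - i];
        p[7 - i] = t;
    }
}

/* Shaped like PyArray_CopySwapFunc: copy one element from src to dst
   (skipped when src is NULL), then byte-swap dst if swap is nonzero.
   arr would be the owning array; a fixed-size type does not need it. */
static void mydouble_copyswap(void *dst, void *src, int swap, void *arr)
{
    (void)arr;
    assert(sizeof(double) == 8);
    if (src != NULL)
        memcpy(dst, src, sizeof(double));
    if (swap)
        byteswap8((unsigned char *)dst);
}

/* Shaped like PyArray_CopySwapNFunc: the same for n elements,
   stepping through both buffers by byte strides. */
static void mydouble_copyswapn(void *dst, my_intp dstride, void *src,
                               my_intp sstride, my_intp n, int swap,
                               void *arr)
{
    char *d = dst;
    char *s = src;
    for (my_intp i = 0; i < n; i++)
        mydouble_copyswap(d + i * dstride,
                          s != NULL ? s + i * sstride : NULL, swap, arr);
}
```

In the real table these would be cast to the NumPy function-pointer types and stored in the copyswap and copyswapn slots; getitem and setitem need the Python C API and are left out of the sketch.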
    

Arrays of the new user-defined type can then be instantiated as follows:

import myarray
import numpy

# a 10-element array whose dtype is the registered user type
a = numpy.ndarray(10, myarray.myarray)

Element retrieval sequence

Next, I looked through the NumPy multiarray code to see how the value of an element is retrieved. The call sequence is as follows:

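  1. PyEval_*()
    The interpreter evaluates a subscript expression such as a[i].
  2. array_subscript_nice()
    Reached through the array type's PyMappingMethods (the mp_subscript slot).
  3. PyArray_GetPtr()
    Calculates the pointer to the requested element from the data pointer, the indices and the strides.
  4. PyArray_Scalar()
    Converts the element at that pointer into a Python object.
  5. PyArray_Descr.f->getitem() (a PyArray_GetItemFunc)
    Called only when the NPY_USE_GETITEM bit is set in PyArray_Descr.hasobject, i.e. (hasobject & NPY_USE_GETITEM) != 0; otherwise PyArray_Scalar() copies the element's bytes into a new scalar object with f->copyswap().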
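Step 3 is where the indices disappear. As I read it, PyArray_GetPtr() starts at the array's data pointer and adds index * stride (in bytes) for each dimension, and only the resulting pointer travels further down the chain. The standalone C sketch below reproduces that arithmetic on a hypothetical toy_array struct (not NumPy's PyArrayObject) and pairs it with a getitem-style reader that, in this simplified model, is handed nothing but the element pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for NumPy's npy_intp. */
typedef ptrdiff_t my_intp;

/* Hypothetical toy array: only the fields the pointer computation uses. */
struct toy_array {
    char *data;             /* start of the element buffer */
    int nd;                 /* number of dimensions */
    const my_intp *strides; /* bytes to step per index, one per dimension */
};

/* PyArray_GetPtr()-style step: data + sum(ind[i] * strides[i]).
   After this call the indices are gone; only a byte address remains. */
static void *toy_get_ptr(const struct toy_array *a, const my_intp *ind)
{
    assert(a != NULL && ind != NULL);
    char *p = a->data;
    for (int i = 0; i < a->nd; i++)
        p += ind[i] * a->strides[i];
    return p;
}

/* getitem-style reader: it sees the element pointer, not the indices. */
static double toy_getitem(const void *ptr)
{
    return *(const double *)ptr;
}
```

For a 2x3 C-contiguous array of 8-byte doubles the strides are {24, 8} bytes, so index (1, 2) lands 1*24 + 2*8 = 40 bytes in, on the sixth element.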

Conclusion

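NumPy does not let us define user-defined array types whose data has a non-standard memory layout. PyArray_ArrFuncs does define an accessor interface, but its functions receive a pointer to the element rather than the indices that identify it, and that pointer has already been computed from the data pointer and byte strides by PyArray_GetPtr(). Every element of a user-defined type must therefore sit at a byte offset that the strides can express.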
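To make the limitation concrete, consider a bit-packed boolean layout (my own illustrative choice; nothing in the experiment above used it). Eight elements share one byte, so element i has no byte address of its own and cannot be reached as data + i * stride for any integer byte stride. An accessor that receives the buffer start and the element index handles this easily, as the sketch below shows with the hypothetical functions packed_getitem and packed_setitem. PyArray_ArrFuncs has no slot of that shape: its getitem is handed the element pointer (plus the owning array), never the index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for NumPy's npy_intp. */
typedef ptrdiff_t my_intp;

/* Hypothetical index-based reader for bit-packed booleans:
   element i lives in bit (i % 8) of byte (i / 8). */
static int packed_getitem(const uint8_t *base, my_intp i)
{
    assert(i >= 0);
    return (base[i / 8] >> (i % 8)) & 1;
}

/* Hypothetical index-based writer for the same layout. */
static void packed_setitem(uint8_t *base, my_intp i, int value)
{
    assert(i >= 0);
    uint8_t mask = (uint8_t)(1u << (i % 8));
    if (value)
        base[i / 8] |= mask;
    else
        base[i / 8] &= (uint8_t)~mask;
}
```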