Building xsNumPy

Akshay Mestry

National Louis University

Mar 01, 2025

Learning doesn’t stop at using libraries; it starts when you build your own, one line at a time.

So it all kicked off in mid-November of 2024. Like usual, I was knee-deep in a Machine Learning project, tinkering with NumPy every single day. I was slicing and dicing arrays, multiplying matrices, and running complex mathematical operations, all with just a line or two of code. It was working a treat, like pure magic. But, just like any magic trick, I had no idea whatsoever how in the world it actually worked!

I remember starting with a simple bit of code, nothing fancy.

import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
c = np.dot(a, b)

The result popped out instantly, no errors, no fuss. But this time, instead of just taking the output at face value, I asked myself, how the hell did numpy.dot() know how to multiply those matrices? And hang on, if I was doing matrix multiplication, why the heck was I using numpy.dot() when I could’ve done something like a @ b, which is more explicit?

What’s going on here? If numpy.dot() does matrix multiplication, then what in the blazes does numpy.matmul or a @ b do? Why are there two different ways to do the same thing? Is NumPy broken, or am I just missing something dead obvious?
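For the record, the answer I eventually dug up: for 2-D arrays, numpy.dot() and a @ b (numpy.matmul) give identical results, and they only part ways on stacked, higher-dimensional inputs, where matmul broadcasts the leading "batch" dimensions while dot pairs up axes and produces a bigger result. A quick check (assuming NumPy is installed):

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# For 2-D operands the two are interchangeable.
assert np.array_equal(np.dot(a, b), a @ b)

# For stacked (3-D) operands they differ: matmul treats the leading
# axis as a batch dimension, while dot sums over the last axis of x
# and the second-to-last axis of y, yielding a larger array.
x = np.ones((2, 2, 3))
y = np.ones((2, 3, 4))
print((x @ y).shape)       # (2, 2, 4)
print(np.dot(x, y).shape)  # (2, 2, 2, 4)
```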

I couldn’t let this slide. That’s when I decided to dig into these questions and thought of building xsNumPy.

Reasons behind xsNumPy

Now that I was motivated to learn the nitty-gritty implementation details of NumPy, I was ready to build my own scrappy version of it. I mean, I didn’t set out to create something to rival NumPy. Because let’s be real, NumPy is a bloody powerhouse, built over decades of work, backed by incredible mad lads in math and science, and packed with tons of optimisations. I can’t possibly compete with that!

Nonetheless, I realised something quite important. Just because it’s dense and complicated doesn’t mean I can’t try to understand it. I realised I’d been treating NumPy like a black box. I was chucking numbers at it, calling its functions and underlying APIs without truly understanding how they worked! I was gutted and felt a massive gap in my understanding. Sure, I was smashing out Machine Learning models, fiddling with arrays, tensors, and pulling off all sorts of matrix wizardry. But every time I used something like numpy.linalg.inv() or numpy.einsum(), there was this nagging feeling: I was just trusting the library to do its thing, without knowing why it worked.

This realisation hit me so hard, I challenged myself, “Could I build a dinky version of NumPy from scratch?” Again, not to replace it, that’d be barking mad, but to learn from it. If I really wanted to ace teaching this stuff to my students, I had to go deeper.

We need to go deeper meme from Inception

There were a few other reasons for writing xsNumPy besides my lack of understanding about the NumPy internals. I essentially wanted to break free from the “Oh, Neural Networks are like black box” rubbish. When I’m teaching Machine Learning and Neural Networks, I often compare these scientific computing libraries to a car. You can go places, sure, but what happens when something breaks? What do you do then? So to get around this situation, I thought of actually learning it by building.

xsNumPy isn’t just for me. It’s for anyone and everyone who’s ever stared at a piece of Machine Learning code and asked, “How in God’s name does this bloody thing work?”

Building Process

So with the “whys” explained, on to the “hows”. I was ready to build my scrappy little version of NumPy, but I didn’t know where to start. So, like any sensible person, I did what we all do when we’re lost: I started poking and prodding at various NumPy functions and methods, trying to suss out what made them tick. It didn’t take long to twig that most of NumPy’s APIs lean heavily on one core construct, the numpy.array() function. But here’s the kicker, it’s just a cheeky little wrapper for the mighty numpy.ndarray. Aha! That’s where I decided to start: implementing my primary xsnumpy.ndarray data structure.

Now, I’ll be straight with you, it all seemed dead simple in my head at first. I mean, what’s an array, really? A bunch of numbers neatly arranged in some orientation, like rows and columns, right?

Wrong.

The deeper I dug, the more worms came wriggling out. Memory allocation, shape (size) calculations, strides, and optimising how the data’s stored, it was like opening Pandora’s box. Turns out, building even a barebones version of numpy.ndarray is a bit of a faff. Still, after a few weeks of head-scratching, I managed to cobble together a working, albeit minimal, version using ctypes.

class ndarray:
    """Simplified implementation of a multi-dimensional array.

    An array object represents a multidimensional, homogeneous
    collection or list of fixed-size items. An associated data-type
    property describes the format of each element in the array.

    :param shape: The desired shape of the array. Can be an int for
        1D arrays or a sequence of ints for multidimensional arrays.
    :param dtype: The desired data type of the array, defaults to
        `None` if not specified.
    :param buffer: Object used to fill the array with data, defaults to
        `None`.
    :param offset: Offset of array data in buffer, defaults to `0`.
    :param strides: Strides of data in memory, defaults to `None`.
    :param order: The memory layout of the array, defaults to `None`.
    :raises RuntimeError: If an unsupported order is specified.
    :raises ValueError: If invalid strides or offsets are provided.
    """

    def __init__(
        self,
        shape: _ShapeLike | int,
        dtype: None | DTypeLike | _BaseDType = None,
        buffer: None | t.Any = None,
        offset: t.SupportsIndex = 0,
        strides: None | _ShapeLike = None,
        order: None | _OrderKACF = None,
    ) -> None:
        """Initialise an `ndarray` object from the provided shape."""
        if order is not None:
            raise RuntimeError(
                f"{type(self).__qualname__} supports only C-order arrays;"
                " 'order' must be None"
            )
        if not isinstance(shape, Iterable):
            shape = (shape,)
        self._shape = tuple(int(dim) for dim in shape)
        if dtype is None:
            dtype = float64
        elif isinstance(dtype, type):
            dtype = globals()[
                f"{dtype.__name__}{'32' if dtype != builtins.bool else ''}"
            ]
        else:
            dtype = globals()[dtype]
        self._dtype = dtype
        self._itemsize = int(_convert_dtype(dtype, "short")[-1])
        self._offset = int(offset)
        if buffer is None:
            self._base = None
            if self._offset != 0:
                raise ValueError("Offset must be 0 when buffer is None")
            if strides is not None:
                raise ValueError("Buffer is None; strides must be None")
            self._strides = calc_strides(self._shape, self.itemsize)
        else:
            if isinstance(buffer, ndarray) and buffer.base is not None:
                buffer = buffer.base
            self._base = buffer
            if isinstance(buffer, ndarray):
                buffer = buffer.data
            if self._offset < 0:
                raise ValueError("Offset must be non-negative")
            if strides is None:
                strides = calc_strides(self._shape, self.itemsize)
            elif not (
                isinstance(strides, tuple)
                and all(isinstance(stride, int) for stride in strides)
                and len(strides) == len(self._shape)
            ):
                raise ValueError("Invalid strides provided")
            self._strides = tuple(strides)
        buffersize = self._strides[0] * self._shape[0] // self._itemsize
        buffersize += self._offset
        Buffer = _convert_dtype(dtype, "ctypes") * buffersize
        if buffer is None:
            if not isinstance(Buffer, str):
                self._data = Buffer()
        elif isinstance(buffer, ctypes.Array):
            self._data = Buffer.from_address(ctypes.addressof(buffer))
        else:
            self._data = Buffer.from_buffer(buffer)

Note

This isn’t the full-fat version of the implementation. I’ve skimmed over a lot of the gory details for brevity. If you want to get into the weeds, check out the full xsnumpy.ndarray class on GitHub.

Deconstructing ndarray

Alright, let me break this down in a way that makes sense. First up, the shape of the array. I started by checking if the shape was an instance of collections.abc.Iterable. Basically, if it was a tuple or a list. If it wasn’t, I wrapped it in a tuple, making sure the shape always looked like a tuple. Then, I converted the shape into a tuple of integers, because let’s face it, you can’t have non-integer dimensions knocking about in an array.

    if not isinstance(shape, Iterable):
        shape = (shape,)
    self._shape = tuple(int(dim) for dim in shape)

Next up, the dtype (data type). If you didn’t provide a dtype, the constructor would default it to float64. If a type (such as int or float) is provided, it dynamically retrieves the appropriate data type from the global namespace using globals(). This nifty trick meant I could dynamically fetch whatever data type you fancied.

Once resolved, the data type was assigned to self._dtype.

    if dtype is None:
        dtype = float64
    elif isinstance(dtype, type):
        dtype = globals()[
            f"{dtype.__name__}{'32' if dtype != builtins.bool else ''}"
        ]
    else:
        dtype = globals()[dtype]
    self._dtype = dtype

Now, the size of each element in the array. I wrote a neat little function called _convert_dtype. Its job? To fetch the size of the given data type in its shortest format. This is super important for calculating memory layout and strides.

    self._itemsize = int(_convert_dtype(dtype, "short")[-1])

Right, on to the buffer. If no buffer was provided, the array was initialised without an external memory buffer. In this case:

  • The offset must be zero

  • Strides must also be None

The constructor would then calculate the strides, which, put simply, are just the number of bytes between consecutive elements in memory.

    if buffer is None:
        self._base = None
        if self._offset != 0:
            raise ValueError("Offset must be 0 when buffer is None")
        if strides is not None:
            raise ValueError("Buffer is None; strides must be None")
        self._strides = calc_strides(self._shape, self.itemsize)
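The calc_strides helper itself isn’t shown in this post, but for C-ordered (row-major) arrays the idea fits in a few lines. Here’s my own reconstruction of what such a helper looks like, a sketch rather than the actual xsNumPy code:

```python
def calc_strides(shape, itemsize):
    """Compute C-order (row-major) strides in bytes for a given shape."""
    strides = []
    stride = itemsize
    # Walk the shape right-to-left: the last axis is contiguous, and
    # each earlier axis must step over everything to its right.
    for dim in reversed(shape):
        strides.append(stride)
        stride *= dim
    return tuple(reversed(strides))

# A 3x4 array of 8-byte floats: moving one row skips 4 * 8 = 32 bytes,
# moving one column skips 8 bytes.
print(calc_strides((3, 4), 8))  # → (32, 8)
```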

But what if a buffer was provided?

Well, then it got a bit trickier. The constructor checked if the buffer was another xsnumpy.ndarray. If it was, it nabbed the base buffer. The buffer was assigned to self._base, and the strides were either given directly or calculated. Before moving on, the constructor did a bit of housekeeping:

  • Offset (starting point in the memory) had to be non-negative

  • Strides had to be a tuple of integers matching the shape’s dimensions; otherwise, the whole thing would fall apart

    else:
        if isinstance(buffer, ndarray) and buffer.base is not None:
            buffer = buffer.base
        self._base = buffer
        if isinstance(buffer, ndarray):
            buffer = buffer.data
        if self._offset < 0:
            raise ValueError("Offset must be non-negative")
        if strides is None:
            strides = calc_strides(self._shape, self.itemsize)
        elif not (
            isinstance(strides, tuple)
            and all(isinstance(stride, int) for stride in strides)
            and len(strides) == len(self._shape)
        ):
            raise ValueError("Invalid strides provided")
        self._strides = tuple(strides)

Finally, calculating the total buffer size. This was worked out using the strides, shape, and item size. The buffer itself was a type derived from the data type (dtype) and its size. Depending on whether a buffer was passed or not, the constructor handled it like so:

  • If no buffer is provided, a new buffer is created

  • If the buffer is a ctypes.Array, the address of the buffer is used to initialise the data. Basically, we use its address like a map

  • If it’s any other type of buffer, the buffer is used directly

Phew 😮‍💨 that was a fair bit, wasn’t it?

But now you can see how all the pieces fit together. From handling shapes and data types to calculating strides and buffers. It’s all a bit mad when you first dive in, but once you get the hang of it, it starts clicking into place.

The “easy peasy” stuff

Like I said before, I wanted to build a tiny version of NumPy. It was my clear and straightforward goal. Start small, build arrays, and then add the fancy operations like matrix multiplication, broadcasting, and so on. What took me by surprise was how challenging the things I’d written off as “easy peasy” turned out to be, things like writing a repr() or overriding the built-in methods.

I remember talking to myself one morning, “let’s start with something dead easy, perhaps just display the array.” That couldn’t be hard, right? All I needed to do was print the contents of my array in a readable format, the way NumPy does. Little did I know I was shooting myself in the foot. At its core, a repr() is just an object’s internal data representation. I started with something like this…

def __repr__(self) -> str:
    return f"array({self._data}, dtype={self.dtype.__str__()})"

Sure, it worked for a scalar. But what about vectors? With some adjustments, I got it working for 1D arrays. Feeling chuffed, I tried a 2D array. Suddenly, it printed everything as a flat list. I realised that I hadn’t accounted for rows and columns in my initial implementation. No problem, I updated the code slightly and, after some initial struggles, got it working… just about!

Then the 3D arrays… and it broke again.

That’s when it hit me, this wasn’t just about formatting strings. I needed a proper solution that would work with any number of dimensions. A few days later, I found myself deep into recursive logic and multi-dimensional indexing, all for what I believed was an “easy peasy” print function. Now the problem wasn’t just getting this thing to work but making sure it worked consistently across all possible array shapes. What I thought would take an hour or two dragged on for days.

But finally, I cracked it!
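Boiled right down, the eventual solution is a recursive formatter: treat the innermost axis as a flat run of elements, and every outer axis as a list of sub-arrays formatted one dimension down. This is a simplified sketch of that idea, not the real __repr__:

```python
def format_array(data, shape):
    """Recursively format nested data of a given shape, NumPy-style."""
    if not shape:  # zero-dimensional: just a scalar
        return str(data)
    if len(shape) == 1:  # innermost axis: a flat run of elements
        return "[" + ", ".join(str(x) for x in data) + "]"
    # Outer axes: format each sub-array one dimension down, then join
    # them with a newline so the rows line up underneath each other.
    inner = ",\n ".join(format_array(sub, shape[1:]) for sub in data)
    return "[" + inner + "]"

print(format_array([[1, 2], [3, 4]], (2, 2)))
# [[1, 2],
#  [3, 4]]
```

The same function handles 1D, 2D, and deeper arrays because the recursion bottoms out at the last axis, which is exactly the consistency the hand-rolled versions kept missing.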

Note

See __repr__ for complete implementation details.

Just when I thought the hard part was done and dusted, I moved on to array indexing, which is perhaps one of the biggest superpowers of NumPy. At first, I assumed this would be easy too, and it worked… partly.

def __getitem__(self, index) -> t.Any:
    row, column = index
    flat = row * self.shape[1] + column
    return self.data[flat]

When I tried a slice like array[:, 1], it broke. When I tried with higher-dimensional arrays, it fell apart! With each new test case, it became pretty obvious that there were some significant flaws in my logic. I wasn’t just building some way to access data, I was constructing a flexible system needed to mirror NumPy’s powerful, intuitive indexing.
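The general fix is to stop hard-coding two dimensions and compute a flat offset from the strides instead, which works for any number of axes. Roughly, as a sketch rather than the actual __getitem__:

```python
def flat_offset(index, strides, itemsize):
    """Map a multi-dimensional index to a flat element position using
    byte strides: sum each index times its axis stride, then divide
    by the element size (sketch of general N-D indexing)."""
    return sum(i * s for i, s in zip(index, strides)) // itemsize

# A 2x3 C-order array of 8-byte floats has strides (24, 8), so
# element [1, 2] lives at flat position 1*3 + 2 == 5.
print(flat_offset((1, 2), (24, 8), 8))  # → 5
```

Slices then become a matter of producing a range of such offsets per axis, which is where views (rather than copies) come from.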

Deep sigh meme

After days of trial and error, I finally realised, these so-called “easy peasy” methods were actually sly little gateways into NumPy’s deeper design philosophies:

  • Consistency. Whether you’re tinkering with 1D, 2D, or N-D arrays, the operations should behave like clockwork, no surprises, Sherlock!

  • Efficiency. Slices and views shouldn’t faff about copying data willy-nilly, they ought to create references, keeping things lean and mean.

  • Extensibility. Indexing had to be nimble enough to handle both the simple stuff (array[1, 2]) and the proper head-scratchers (array[1:3, ...]).

What kicked off as a laid-back attempt to rework repr() and other important methods ended up being a right masterclass in designing for generality. I wasn’t just sorting out the easy bits; I had to step back and think like a “library designer”, anticipating edge cases and making sure the whole thing didn’t crumble the moment someone tried something a tad clever. Writing about xsNumPy now, a couple of months later, this struggle taught me something profound: what seems super duper simple on the surface often hides massive complexity underneath.

And that’s exactly why building xsNumPy has been so powerful for my learning.

Illusion of simplicity

Well, after wrestling with the “simple” things, I naively thought the hardest and, in all honesty, the most boring part of xsNumPy was behind me. I was chuffed and more excited than ever for the “fun” stuff: element-wise arithmetic, broadcasting, and other random functions. What I didn’t realise was that my journey was about to get even more mental. If implementing the xsnumpy.ndarray class was untangling a knot, matrix operations felt like trying to weave my own thread from scratch. Not sure if that makes sense.

But the point was, it was hard!

If you’ve read this far, you might’ve noticed a trend in my thought process. I assume things to be quite simple, which they bloody aren’t, and I start small. This was no different. I started simple, at least that’s what I thought. Basic arithmetic operations like addition, subtraction, and scalar multiplication seemed relatively straightforward. I figured I could just iterate through my flattened data and perform operations element-wise. And it worked… for the first few test cases.

def __add__(self, other: ndarray | int | builtins.float) -> ndarray:
    """Perform element-wise addition of the ndarray with a scalar or
    another ndarray.

    This method supports addition with scalars (int or float) and
    other ndarrays of the same shape. The resulting array is of the
    same shape and dtype as the input.

    :param other: The operand for addition. Can be a scalar or an
        ndarray of the same shape.
    :return: A new ndarray containing the result of the element-wise
        addition.
    :raises TypeError: If `other` is neither a scalar nor an
        ndarray.
    :raises ValueError: If `other` is an ndarray but its shape
        doesn't match `self.shape`.
    """
    arr = ndarray(self.shape, self.dtype)
    if isinstance(other, (int, builtins.float)):
        arr[:] = [x + other for x in self._data]
    elif isinstance(other, ndarray):
        if self.shape != other.shape:
            raise ValueError(
                "Operands couldn't broadcast together with shapes "
                f"{self.shape} {other.shape}"
            )
        arr[:] = [x + y for x, y in zip(self.flat, other.flat)]
    else:
        raise TypeError(
            f"Unsupported operand type(s) for +: {type(self).__name__!r} "
            f"and {type(other).__name__!r}"
        )
    return arr

But, as always, the system collapsed almost immediately for higher-dimensional vectors. What if I added a scalar to a matrix? Or a (3,) array to a (3, 3) matrix? Could I add floats to ints? I mean, this lot works in normal maths, right? Each new “simple” operation posed a challenge in itself. I realised I wasn’t just adding or multiplying numbers but recreating NumPy’s broadcasting rules.
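NumPy’s broadcasting rule itself turns out to be compact: align the two shapes from their trailing axes, and each pair of dimensions must either match or one of them must be 1. A minimal sketch of the shape-resolution step (my own illustration, not xsNumPy’s code):

```python
def broadcast_shape(shape1, shape2):
    """Resolve the result shape of broadcasting two shapes together,
    following NumPy's rule: align from the right, and each dimension
    pair must match or contain a 1 (a missing dimension counts as 1)."""
    result = []
    for i in range(1, max(len(shape1), len(shape2)) + 1):
        a = shape1[-i] if i <= len(shape1) else 1
        b = shape2[-i] if i <= len(shape2) else 1
        if a != b and a != 1 and b != 1:
            raise ValueError(
                f"shapes {shape1} and {shape2} are not broadcastable"
            )
        result.append(max(a, b))
    return tuple(reversed(result))

print(broadcast_shape((3,), (3, 3)))    # → (3, 3)
print(broadcast_shape((3, 1), (1, 4)))  # → (3, 4)
```

Once the output shape is known, the smaller operand is "stretched" along its size-1 axes, which is why adding a (3,) array to a (3, 3) matrix just works.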

Trust me, lads, nothing compared to the chaos caused by matrix multiplication. Whilst coding the initial draft of __matmul__, I remember discussing this with my mate, Sameer. I thought it’d be just a matter of looping through rows and columns, summing products element-wise. Classic high school maths, if you ask me. And it worked as well… until I tried it with higher-dimensional arrays. This is where I realised that matrix multiplication isn’t just about rows and columns but about correctly handling batch dimensions for higher-order tensors. I found myself diving into NumPy’s documentation, reading about General Matrix Multiplication (GEMM) routines and how broadcasting affects the output shapes.
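For reference, the 2-D case I started from really is just a triple loop; what the real __matmul__ adds on top is the batch-dimension handling. A sketch over plain nested lists:

```python
def matmul2d(a, b):
    """Classic triple-loop matrix product for 2-D nested lists."""
    n, k, m = len(a), len(b), len(b[0])
    assert len(a[0]) == k, "inner dimensions must agree"
    out = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            # Dot product of row i of a with column j of b.
            out[i][j] = sum(a[i][p] * b[p][j] for p in range(k))
    return out

print(matmul2d([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# → [[19, 22], [43, 50]]
```

For higher-order tensors, the leading axes are broadcast and this same 2-D product is applied to each matrix in the batch, which is exactly where my first draft fell over.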

Note

You can check out the complete implementation of arithmetic operations on GitHub.


More than just code

This happened during the winter break. I didn’t have to attend uni and was working full-time on this project. After days of debugging, I realised that all of my vector operations weren’t about “getting the math right”, but they were about thinking like NumPy:

  • Shape manipulation. How do I infer the correct output shape?

  • Broadcasting. How can I extend the smaller arrays to fit the larger ones?

  • Efficiency. How can I minimise unnecessary data duplication?

At this stage, I wasn’t just rebuilding some scrappy numerical computing doppelgänger but rather a flexible and extensible system that could handle both the intuitive use cases and the weird edge cases. As I started thinking more along the lines of NumPy’s developers, I began coming up with broader and more general solutions. I realised that for knotty problems, xsNumPy was slow… perhaps painfully slow. But it was mine. Unlike NumPy, which runs like The Flash and which I can’t bloody see or understand, I understood every line of code. And with each iteration, every commit I made, I explored even more ways to optimise it, reducing redundant calculations and improving “pseudo-cache” locality.

Every bug, every unexpected result, and every small achievement taught me something new about NumPy and how it might be doing its magic behind the scenes. As time went by, xsNumPy became more than a project or a scrappy experiment. It became a mindset. It taught me to stop treating libraries as mysterious tools and start seeing them as collections of smartly packed algorithms and data structures waiting to be explored. Now, after countless late nights and endless debugging sessions, I finally reached a point where xsNumPy wasn’t just a dinky implementation; it had proper shape, form, and, most importantly, it worked! What kicked off as a way to demystify NumPy had grown into something far bigger: a project that taught me more than I could’ve ever imagined about numerical computing.

So, what can xsNumPy actually do?

When I first started adding array creation methods to xsNumPy, I thought, how hard could it be? Just slap together a few initialisers, right? But, as always, reality gave me a proper wake-up call. It wasn’t just about making arrays appear; it was about ensuring they worked seamlessly with the whole system: shapes, data types, and all.

  • array()

    The xsnumpy.array() function is the bread and butter of xsNumPy, the most flexible way to create arrays from Python lists or tuples.

    >>> import xsnumpy as xp
    >>>
    >>> xp.array([1, 2, 3])
    array([1, 2, 3])
    >>> xp.array([[1, 2, 3], [4, 5, 6]])
    array([[1, 2, 3],
           [4, 5, 6]])
    >>> xp.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
    array([[[1, 2],
            [3, 4]],
    
           [[5, 6],
            [7, 8]]])
    >>> xp.array([1, 2, 3.0])
    array([1. , 2. , 3. ])
    >>> xp.array([1, 0, 2], dtype=xp.bool)
    array([True, False, True])
    
  • zeros() and ones()

    I added xsnumpy.zeros() and xsnumpy.ones() as the go-to methods for initialising arrays filled with, well, zeros and ones. Simple, yet essential.

    >>> xp.zeros(3)
    array([0. , 0. , 0. ])
    >>> xp.zeros([2, 2])
    array([[0. , 0. ],
           [0. , 0. ]])
    >>> xp.ones([3, 2], dtype=xp.int32)
    array([[1, 1],
           [1, 1],
           [1, 1]])
    
  • full()

    For custom initialisation, xsnumpy.full() lets you fill an array with any value you want.

    >>> xp.full(2, 3, fill_value=3.14)
    array([[3.14, 3.14, 3.14],
           [3.14, 3.14, 3.14]])
    

    Here, I had to be mindful about handling scalars vs arrays, ensuring the fill_value was broadcastable when needed.

  • arange()

    Inspired by Python’s range, xsnumpy.arange() generates arrays with evenly spaced values.

    >>> xp.arange(3)
    array([0, 1, 2])
    >>> xp.arange(3.0)
    array([0. , 1. , 2. ])
    >>> xp.arange(3, 7)
    array([3, 4, 5, 6])
    >>> xp.arange(3, 7, 2)
    array([3, 5])
    >>> xp.arange(0, 5, 0.5)
    array([0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5])
    

    The tricky part here? Making sure it worked with both integers and floats without rounding errors creeping in.
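The usual trick, which I’m fairly sure NumPy relies on as well, is to fix the element count up front with a ceiling division and then multiply rather than repeatedly add, so float error doesn’t accumulate step by step. A sketch of that approach (my illustration, with a hypothetical name, not xsNumPy’s actual function):

```python
import math

def frange(start, stop, step):
    """Generate evenly spaced values like numpy.arange.

    The element count is ceil((stop - start) / step), and each value
    is computed as start + i * step; multiplying instead of repeatedly
    adding keeps floating-point error from compounding.
    """
    n = max(0, math.ceil((stop - start) / step))
    return [start + i * step for i in range(n)]

print(frange(3, 7, 2))         # → [3, 5]
print(len(frange(0, 5, 0.5)))  # → 10
```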

See also

Check out the complete list of array creation methods which are supported by xsNumPy on GitHub.

Once I had array creation sorted, I quickly realised that the real meat of xsNumPy lay in the operations, the arithmetic, element-wise manipulations, and the fundamental maths that give NumPy its power. It wasn’t just about adding two numbers or multiplying matrices; it was about making these operations flexible, intuitive, and most of all, consistent with how NumPy does it.

In xsNumPy, I implemented a range of arithmetic operations, carefully adhering to NumPy’s rules for broadcasting and type coercion.

  • Basic arithmetic

    You can perform element-wise addition, subtraction, multiplication, and division directly using xsNumPy arrays. Just like NumPy, these operations are broadcasted, so you can mix scalars, vectors, and matrices freely.

    >>> import xsnumpy as xp
    >>>
    >>> a = xp.array([[1, 0], [0, 1]])
    >>> b = xp.array([[4, 1], [2, 2]])
    >>>
    >>> a + b
    array([[5, 1],
           [2, 3]])
    >>> a - b
    array([[-3, -1],
           [-2, -1]])
    >>> a * b
    array([[4, 0],
           [0, 2]])
    >>> a / b
    array([[0.25, 0.  ],
           [0.  ,  0.5]])
    >>> a // b
    array([[0, 0],
           [0, 0]])
    >>> a ** b
    array([[1, 0],
           [0, 1]])
    >>> a % b
    array([[1, 0],
           [0, 1]])
    >>> a @ b
    array([[4, 1],
           [2, 2]])
    >>> a < b
    array([[True, True],
           [True, True]])
    >>> a >= b
    array([[False, False],
           [False, False]])
    

    The challenge here wasn’t the simple cases, it was ensuring that these operations worked for higher-dimensional arrays, and correctly handled broadcasting.

  • Broadcasting and arithmetic

    I had to dive deep into the logic of broadcasting. If you’ve ever wondered why adding a (3, 1) array to a (3, 3) matrix just works in NumPy, it’s all thanks to broadcasting rules. Implementing those rules was tricky, matching shapes, stretching smaller arrays, and making sure the output shape followed NumPy’s exact logic.

    >>> matrix = xp.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
    >>> column_vector = xp.array([[1], [2], [3]])
    >>> matrix + column_vector
    array([[ 2,  4,  6],
           [ 5,  7,  9],
           [ 8, 10, 12]])
    
  • Linear algebraic helper functions

    To mirror NumPy’s API, I also implemented explicit arithmetic functions. These are useful when you want to be very clear about the operation being performed or when you need more control over the parameters.

    >>> xp.dot(3, 4)
    12
    >>> a = xp.array([[1, 0], [0, 1]])
    >>> b = xp.array([[4, 1], [2, 2]])
    >>> xp.dot(a, b)
    array([[4, 1],
           [2, 2]])
    >>> xp.add(a, b)
    array([[5. , 1. ],
           [2. , 3. ]])
    >>> xp.divide(a, b)
    array([[0.25, 0.  ],
           [0.  ,  0.5]])
    >>> xp.power(3, 4)
    81
    
  • Scalar operations

    You’re not just limited to array-to-array operations, scalars work too, just as you’d expect.

    >>> xp.array([3, 4]) + 10
    array([13, 14])
    

See also

Check out more examples of the arithmetic operations supported by xsNumPy on GitHub.

Once I had nailed down array creation and operations, the next beast to tackle was shape manipulation. If there’s one thing I learned quickly, it’s that reshaping arrays isn’t just a matter of rearranging elements, it’s about understanding how data is stored and accessed under the hood.

In xsNumPy, I wanted to mirror NumPy’s intuitive and flexible shape manipulation methods, while also reinforcing my grasp of concepts like views, strides, and contiguous arrays.

Tip

Read more about NumPy internals here.

  • reshape()

    The xsnumpy.ndarray.reshape() method allows you to change an array’s shape without altering its data. The key was ensuring the total number of elements remained consistent, a simple yet crucial check.

    >>> import xsnumpy as xp
    >>>
    >>> a = xp.array([1, 2, 3, 4, 5, 6])
    >>> a.reshape((2, 3))
    array([[1, 2, 3],
           [4, 5, 6]])
    >>> a.reshape((2, 4))
    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    ...
    ValueError: New shape is incompatible with the current size
    

    The tricky bit was handling corner cases, reshaping empty arrays, adding singleton dimensions, and ensuring reshaped arrays remain views (not copies) where possible.

  • transpose()

    Transposing is more than just flipping rows and columns; for higher-dimensional arrays, it’s about permuting the axes. The xsnumpy.ndarray.transpose() method does just that.

    >>> a = xp.array([[1, 2, 3], [4, 5, 6]])
    >>> a.transpose()
    array([[1, 4],
           [2, 5],
           [3, 6]])
    
  • flatten()

    The xsnumpy.ndarray.flatten() method returns a copy. Implementing this pushed me to understand memory alignment and stride tricks.

    >>> a = xp.array([[1, 2, 3], [4, 5, 6]])
    >>> a.flatten()
    array([1, 2, 3, 4, 5, 6])
    

These methods taught me the importance of shape manipulation, it’s not just about rearranging numbers but respecting how arrays interact with memory and computation. Each feature made me peel back yet another layer of NumPy’s magic, reinforcing my understanding while building xsNumPy piece by piece.
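One stride trick worth spelling out: a transpose doesn’t have to move any data at all. Reversing the shape and the strides reinterprets the very same buffer with the axes swapped, which is why transposes can be cheap views. A sketch of the idea, not xsNumPy’s actual implementation:

```python
def transpose_strides(shape, strides):
    """A transpose needn't copy data: reversing shape and strides
    reinterprets the same buffer with the axes permuted."""
    return tuple(reversed(shape)), tuple(reversed(strides))

# A 2x3 C-order float64 array has strides (24, 8); its transpose is
# a 3x2 view over the same memory with strides (8, 24), so element
# [i, j] of the view reads byte offset i*8 + j*24 of the buffer.
print(transpose_strides((2, 3), (24, 8)))  # → ((3, 2), (8, 24))
```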

Indexing and slicing were, without a doubt, among the most head-scratching features to implement in xsNumPy. What seemed like a simple task of grabbing an element or a subset of an array turned into a proper rabbit hole of possibilities: single-element access, slice objects, fancy indexing, boolean masks, the lot.

  • Basic indexing

    At its core, basic indexing in xsNumPy works similarly to NumPy, using zero-based indices to access elements. You can fetch single elements or entire subarrays.

    >>> import xsnumpy as xp
    >>>
    >>> a = xp.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
    >>> a[0, 1]
    2
    >>> a[1, 2]
    6
    

    You can also use negative indices to count from the end of an array.

    >>> a = xp.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
    >>> a[-1, -2]
    8
    
  • Slicing

    Slicing allows you to extract subarrays using a start:stop:step format. Just like NumPy, xsNumPy supports all the classic slicing mechanics.

    >>> a = xp.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
    >>> a[1:2]
    array([[4, 5, 6]])
    >>> a[:2]
    array([[1, 2, 3],
           [4, 5, 6]])
    >>> a[::2]
    array([[1, 2, 3],
           [7, 8, 9]])
    >>> a[:2, 1:]
    array([[2, 3],
           [5, 6]])
    >>> a[::2, ::2]
    array([[1, 3],
           [7, 9]])
    
  • Boolean masking

    This was an added surprise. Honestly, I didn’t engineer this one, but because xsNumPy now works so generally, it supports features like Boolean masking for free. Boolean masking lets you select elements based on a condition.

    >>> a[a % 2 == 0]
    array([2, 4, 6, 8])
    

Implementing indexing and slicing wasn’t just about grabbing elements, it was about ensuring the shapes stayed correct, broadcasting rules were respected, and that corner cases (like empty slices or out-of-bounds indices) didn’t cause the whole system to collapse. It took a lot of late nights and a fair bit of trial and error to make sure xsNumPy worked as closely as possible to NumPy.

See also

Indexing and slicing were implemented by overriding the standard __getitem__ and __setitem__ protocols. To see the complete implementation and other complementary methods, visit here.

After wrangling with array creation, operations, shape manipulation, and indexing, I found myself standing at the gates of reductions, those neat little methods that take an array and distil it down to a single value or a smaller array. Sounds straightforward, right? Well, not quite.

Reductions in xsNumPy were a real eye-opener. They forced me to think deeply about axes and empty arrays, all while ensuring my logic matched the intuitive elegance of NumPy.

  • sum()

    The xsnumpy.sum() method computed the sum of elements along a given axis. The tricky part? Handling multi-dimensional arrays.

    >>> import xsnumpy as xp
    >>>
    >>> a = xp.array([[1, 2, 3], [4, 5, 6]])
    >>> a.sum()
    21
    >>> a.sum(axis=0)
    array([5, 7, 9])
    
  • min() and max()

    Finding minimum and maximum values sounds simple, but reducing along the axes with proper shape handling kept me busy for a while.

    >>> a = xp.array([[1, 2, 3], [4, 5, 6]])
    >>> a.min()
    1
    >>> a.max(axis=1)
    array([3, 6])
    
  • mean()

    Calculating the mean was more than just summing and dividing, I needed to ensure type consistency and careful shape adjustments.

    >>> a = xp.array([[1, 2, 3], [4, 5, 6]])
    >>> a.mean()
    3.5
    >>> a.mean(axis=1)
    array([2. , 5. ])
    
  • prod()

    The xsnumpy.prod() (product) method computed the product of elements along a given axis. Multiplying elements together was the final boss of reductions. As simple as it may sound, I had to think through overflow errors and correct data types.

    >>> a = xp.array([[1, 2, 3], [4, 5, 6]])
    >>> a.prod()
    720
    >>> a.prod(axis=0)
    array([ 4, 10, 18])
    
  • any() and all()

    Logical reductions were their own beast. The xsnumpy.all() method checks if all elements are True, while xsnumpy.any() checks if at least one is.

    >>> b = xp.array([[True, False, True], [True, True, False]])
    >>> b.all()
    False
    >>> b.any(axis=1)
    array([True, True])
    

Building reductions in xsNumPy pushed me to think harder about how arrays collapse along dimensions and how NumPy seamlessly handles type promotion and shape consistency. It’s not just about computing a value, it’s about ensuring the result fits neatly into the broader array ecosystem.

With reductions wrapped up, xsNumPy finally started to feel like a real numerical computing library. Every sum, min, and mean wasn’t just a calculation, it was a carefully crafted operation built on a solid foundation.
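The core idea behind all of these reductions, collapsing an axis, is easiest to see in the 2-D case: reducing along axis 0 combines values down each column, while axis 1 combines values across each row. A pure-Python sketch (my illustration, not xsNumPy’s code):

```python
def sum_axis0(matrix):
    """Collapse axis 0 of a 2-D nested list: column-wise sums."""
    return [sum(col) for col in zip(*matrix)]

def sum_axis1(matrix):
    """Collapse axis 1 of a 2-D nested list: row-wise sums."""
    return [sum(row) for row in matrix]

m = [[1, 2, 3], [4, 5, 6]]
print(sum_axis0(m))  # → [5, 7, 9]
print(sum_axis1(m))  # → [6, 15]
```

Either way, one dimension disappears from the result, which is exactly why a.sum(axis=0) on a (2, 3) array gives back a (3,) array.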

Concluding xsNumPy

Now, I won’t pull the wool over your eyes, xsNumPy isn’t a blazing-fast, industrial-strength library, nor was it ever meant to be. But every line of code carries the weight of a battle fought, a bug squashed, a concept unravelled, a little victory earned. It’s a project born out of pure curiosity and a stubborn desire to lift the bonnet on a tool I use daily. More than just its features, xsNumPy reflects a mindset, the belief that the best way to learn is by rolling up your sleeves, building something from scratch, breaking it, then putting it back together, piece by piece.

Victory shall be mine meme from Family Guy

This experience taught me to stop seeing libraries as mystical black boxes and start recognising them for what they are: thoughtfully engineered collections of algorithms and data structures. And for me, that’s the real win of demystifying complex libraries one line at a time!