Building xsNumPy
Learning really begins when you build your tools… one line at a time
It all started around mid-November 2024. I was working on something where I had to use NumPy. I was indexing arrays, multiplying matrices, and NumPy did NumPy things, like it always does. It just made all the computations look simple. However, beneath this simplicity, a few questions began to gnaw at me.
So, I found myself experimenting with some simple code, nothing fancy.
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
np.dot(a, b)
The result popped out instantly. But this time, instead of accepting the answer, I asked myself a bunch of questions. Like why was I even using np.dot when I could’ve used np.matmul, which feels more appropriate? And if np.dot indeed performs matrix multiplication, then what’s the deal with a @ b (the @ operator)? Why are there three different ways to perform matrix multiplication, and if so, which one’s the best?
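(As it turns out, you can start answering that with NumPy itself. The quick check below is just an aside, not part of xsNumPy: for plain 2D arrays the three spellings agree, and the differences only show up with scalars and higher-dimensional inputs.)

import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# For 2D arrays, np.dot, np.matmul, and @ all produce the same result.
print(np.array_equal(np.dot(a, b), a @ b))     # True
print(np.array_equal(np.matmul(a, b), a @ b))  # True

# One visible difference: np.dot happily takes a scalar (it just scales),
# whereas np.matmul and @ refuse scalar operands.
print(np.dot(a, 2))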
Building with a purpose
Once I was motivated enough to learn the implementation details of NumPy, I was ready to build my own scrappy version. I didn’t set out to create something that’ll rival NumPy; I mean, NumPy’s a bloody powerhouse, built over decades of work by incredible minds in math and science, plus tons of optimisations. I couldn’t possibly compete with that. I just wanted to break free from treating these libraries as black boxes and truly understand the whys and hows.
This realisation hit me so hard that I challenged myself. Could I build a dinky version of NumPy from scratch? Because if I was ever going to teach these concepts one day, I had to go deeper…

It felt a wee bit like baking a cake from scratch. Every ingredient, every step, had to be understood and chosen deliberately. If I was going to learn this properly, I needed discipline and some rules to follow.
Rules of engagement
No LLMs or AI assistance. Every line of code and every solution had to come from my own understanding and experimentation.
Pure Python only. No external dependencies, just the standard library.
Clean, statically typed, and well-documented code that mirrored NumPy’s public APIs, aiming to be a drop-in replacement where sensible.
Crafting my first array
After setting rules for myself, I started experimenting with NumPy’s core APIs,
trying to understand their functionality. It quickly became evident that most
of NumPy’s APIs heavily rely on a single core construct, the
np.array
function. It’s worth noting that this function
is a cheeky little wrapper for the np.ndarray
class.
That’s where I decided to start, implementing my own ndarray
data
structure.
Quick analogy
If you’re new to arrays, think of them as egg cartons: each slot holds an egg, and the shape of the carton tells you how many eggs you’ve got. How far your hand has to move to get from one slot to the next gives you the strides; the type of eggs is the dtype; the carton itself is the buffer.
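To make the analogy concrete, here’s how those pieces show up on a real NumPy array (plain NumPy, nothing xsNumPy-specific):

import numpy as np

carton = np.array([[1, 2], [3, 4]], dtype=np.int64)
print(carton.shape)    # (2, 2)  -> how the eggs are arranged
print(carton.strides)  # (16, 8) -> bytes to step to the next row / next column
print(carton.dtype)    # int64   -> what kind of egg sits in each slot
print(carton.data)     # <memory at ...> -> the carton itself, the raw buffer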
I had a basic understanding of an array. I always thought of it as a collection of numbers neatly organised in rows and columns. But, as I looked deeper and deeper, I discovered a whole lot of concepts, including memory allocation, shape calculations, strides, and various optimisation techniques for data storage. It felt like opening Pandora’s box!!
And I wasn’t ready…
After a few days of head-scratching, I managed to create a basic, albeit minimal, working version using Python’s built-in ctypes module. It wasn’t pretty, but it worked.
import ctypes
from collections.abc import Iterable

# calc_strides and _convert_dtype are small helpers from the full
# implementation; they compute row-major strides and map a dtype to its
# itemsize / ctypes type respectively.
class ndarray:

    def __init__(
        self, shape, dtype=None, buffer=None, offset=0, strides=None
    ):
        if not isinstance(shape, Iterable):
            shape = (shape,)
        self._shape = tuple(int(dim) for dim in shape)
        if dtype is not None:
            # fetch the dtype object from the module's global namespace
            dtype = globals()[dtype]
        self._dtype = dtype
        self._itemsize = int(_convert_dtype(dtype, "short")[-1])
        self._offset = int(offset)
        if buffer is None:
            self._base = None
            if self._offset != 0:
                raise ValueError("Offset must be 0 when buffer is None")
            if strides is not None:
                raise ValueError("Buffer is None; strides must be None")
            self._strides = calc_strides(self._shape, self._itemsize)
        else:
            if isinstance(buffer, ndarray) and buffer.base is not None:
                buffer = buffer.base
            self._base = buffer
            if isinstance(buffer, ndarray):
                buffer = buffer.data
            if self._offset < 0:
                raise ValueError("Offset must be non-negative")
            if strides is None:
                strides = calc_strides(self._shape, self._itemsize)
            elif not (
                isinstance(strides, tuple)
                and all(isinstance(stride, int) for stride in strides)
                and len(strides) == len(self._shape)
            ):
                raise ValueError("Invalid strides provided")
            self._strides = tuple(strides)
        buffersize = self._strides[0] * self._shape[0] // self._itemsize
        buffersize += self._offset
        Buffer = _convert_dtype(dtype, "ctypes") * buffersize
        if buffer is None:
            if not isinstance(Buffer, str):
                self._data = Buffer()
        elif isinstance(buffer, ctypes.Array):
            self._data = Buffer.from_address(ctypes.addressof(buffer))
        else:
            self._data = Buffer.from_buffer(buffer)
Attention
I’ve intentionally removed a lot of details to keep things simple. Check out the complete implementation of ndarray on GitHub.
Making sense of shapes
I started by checking if the provided shape could be iterated over. If it couldn’t, I wrapped it in a tuple. Then, I converted the shape into a tuple of integers, because you can’t have non-integer dimensions in an array.
if not isinstance(shape, Iterable):
    shape = (shape,)
self._shape = tuple(int(dim) for dim in shape)
Next up, the dtype (short for data type). If you didn’t provide one, the constructor simply left it as None. If a float or an int was provided, it dynamically retrieved the appropriate data type from the global namespace using globals(). This nifty trick meant I could dynamically fetch whatever data type you fancied.
if dtype is not None:
    dtype = globals()[dtype]
self._dtype = dtype
Right, on to the buffer. If no buffer was provided, the array was initialised without an external memory buffer. In this case the offset must be zero and strides must be None. The constructor would then calculate the strides, which, put simply, are the number of bytes you have to step through in memory to get from one element to the next along each axis.
if buffer is None:
    self._base = None
    if self._offset != 0:
        raise ValueError("Offset must be 0 when buffer is None")
    if strides is not None:
        raise ValueError("Buffer is None; strides must be None")
    self._strides = calc_strides(self._shape, self._itemsize)
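For a sense of what a helper like calc_strides works out, here’s a minimal sketch of row-major (C-order) stride calculation. This is my own paraphrase of the idea, not the exact helper from the repo:

def calc_strides(shape, itemsize):
    # The last axis moves one item at a time; each earlier axis has to jump
    # over everything the later axes cover.
    strides = []
    stride = itemsize
    for dim in reversed(shape):
        strides.append(stride)
        stride *= dim
    return tuple(reversed(strides))

print(calc_strides((2, 3), 8))  # (24, 8) for a (2, 3) array of 8-byte floats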
But what if a buffer was provided?
Well, then it got a bit trickier. It used the base buffer and the strides were either given directly or calculated.
else:
    if isinstance(buffer, ndarray) and buffer.base is not None:
        buffer = buffer.base
    self._base = buffer
    if isinstance(buffer, ndarray):
        buffer = buffer.data
    if strides is None:
        strides = calc_strides(self._shape, self._itemsize)
    self._strides = tuple(strides)
Finally, calculating the total buffer size. This was worked out using the strides, shape, and item size. The buffer itself was a ctypes array type derived from the data type and that size. Depending on whether a buffer was passed or not, the constructor handled it accordingly, either creating a new buffer or using the existing one.
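To make that arithmetic concrete, here’s a small worked example using the same expressions as the constructor above (the numbers assume a (2, 3) array of 8-byte floats with no offset):

shape, itemsize, offset = (2, 3), 8, 0
strides = (24, 8)                               # row-major strides in bytes
buffersize = strides[0] * shape[0] // itemsize  # 24 * 2 // 8 = 6 elements
buffersize += offset                            # still 6 with no offset
print(buffersize)
# The ctypes buffer type would then be roughly ctypes.c_double * 6,
# i.e. room for every element of the array.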
Phew… that was a lot of work, wasn’t it?
Illusion of simplicity
After all that hard work, I thought of giving myself a break. I remembered telling myself, “Let’s start with something dead easy… perhaps just display the array.” I thought, “That couldn’t be hard, right? All I’ve to do is print the content of my array in a readable format, just like NumPy does.”
Little did I know, I was shooting myself in the foot. At its core, __repr__ is the official string representation of an object, the thing you see when you inspect it.
I started with something simple, and it worked for scalars and 1D arrays.
def __repr__(self):
    return f"array({self._data}, dtype={str(self.dtype)})"
Feeling quite pleased and a bit cocky, I tried a 2D array, but it unexpectedly printed everything as a flat list. I realised I hadn’t accounted for the rows and columns. No problem, I updated the code and it worked!
def __repr__(self):
    if self.ndim == 1:
        return f"array({self._data}, dtype={str(self.dtype)})"
    elif self.ndim > 1:
        rows = ",\n ".join(
            [f"[{', '.join(map(str, row))}]" for row in self._data]
        )
        return f"array([{rows}], dtype={str(self.dtype)})"
Then the 3D arrays… and it broke again.
That’s when it hit me, this wasn’t just about formatting strings. I needed a general solution that would work with any number of dimensions. A few days later, I found myself deep into recursive logic and multi-dimensional indexing, all for what I believed was an easy print function.
What started as a chilled attempt to rework __repr__ turned out to be a masterclass in designing for generality. This struggle taught me something profound… what seemingly appears simple on the surface often hides massive complexity underneath.
And so, I realised, printing a NumPy array from scratch was a rabbit hole!!
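For the curious, the eventual approach boils down to recursing over the dimensions. Here’s a stripped-down sketch of the idea (not the actual xsNumPy code, which also handles dtype, spacing, and indentation):

def _format(data, shape):
    # Scalars are printed directly, 1D runs become a bracketed row, and
    # anything higher-dimensional formats each sub-block recursively.
    if len(shape) == 0:
        return str(data)
    if len(shape) == 1:
        return "[" + ", ".join(str(item) for item in data) + "]"
    inner = ",\n ".join(_format(block, shape[1:]) for block in data)
    return "[" + inner + "]"

print(_format([[1, 2], [3, 4]], (2, 2)))
# [[1, 2],
#  [3, 4]]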
See also
Complete implementation of __repr__ with helper functions on GitHub.
More than meets the eye
After wrestling with the simple things, I naively believed the hardest part was behind me. I was excited for the fun stuff, like element-wise arithmetic, broadcasting, and other random functions. However, I didn’t realise my journey was about to get even more challenging.
Basic arithmetic operations like addition, subtraction, and scalar multiplication seemed straightforward. I figured I could just iterate through my flattened data and perform operations element-wise. And it worked… for the first few test cases. But, as always, the system collapsed almost immediately for higher-dimensional vectors.
def __add__(self, other):
    arr = ndarray(self.shape, self.dtype)
    if isinstance(other, (int, float)):
        arr[:] = [x + other for x in self._data]
    elif isinstance(other, ndarray):
        if self.shape != other.shape:
            raise ValueError(
                "Operands couldn't broadcast together with shapes "
                f"{self.shape} {other.shape}"
            )
        arr[:] = [x + y for x, y in zip(self.flat, other.flat)]
    else:
        raise TypeError(
            f"Unsupported operand type(s) for +: {type(self).__name__!r} "
            f"and {type(other).__name__!r}"
        )
    return arr
What if I added a scalar to a matrix, or a (3,) array to a (3, 3) matrix? Could I add a float to an int? Each new simple question posed a challenge in itself. I realised I wasn’t just adding or multiplying numbers, but learning and recreating NumPy’s broadcasting rules.
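Those rules eventually reduce to a shape-matching routine. Here’s a minimal sketch of the idea, paraphrasing NumPy’s broadcasting rules rather than quoting the exact xsNumPy code:

def broadcast_shape(shape1, shape2):
    # Walk both shapes from the trailing axis backwards, padding the shorter
    # one with 1s; a 1 stretches to match, while mismatched sizes are an error.
    ndim = max(len(shape1), len(shape2))
    padded1 = (1,) * (ndim - len(shape1)) + tuple(shape1)
    padded2 = (1,) * (ndim - len(shape2)) + tuple(shape2)
    result = []
    for dim1, dim2 in zip(padded1, padded2):
        if dim1 == dim2 or dim1 == 1 or dim2 == 1:
            result.append(max(dim1, dim2))
        else:
            raise ValueError(
                "Operands couldn't broadcast together with shapes "
                f"{shape1} {shape2}"
            )
    return tuple(result)

print(broadcast_shape((3,), (3, 3)))  # (3, 3)
print(broadcast_shape((3, 1), (3,)))  # (3, 3)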

Matrix multiplication was another beast entirely. I thought it would just be a matter of looping through rows and columns and summing the products, classic high school mathematics, if you ask me. And it worked as well… until I tried it with higher-dimensional arrays. This is where I realised that matrix multiplication isn’t just about rows and columns, but about correctly handling batch dimensions for higher-order tensors. I found myself diving into NVIDIA’s documentation, reading about general matrix multiplication (GEMM) routines and how broadcasting affects the output shapes.
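In spirit, the core of the problem looks something like the sketch below: a naive three-loop multiply plus a batch loop. It’s nowhere near an optimised GEMM, and it isn’t the xsNumPy implementation, just the shape of the idea:

def matmul_2d(a, b):
    # Classic three-loop matrix multiplication on nested lists.
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                out[i][j] += a[i][k] * b[k][j]
    return out

def batched_matmul(a, b):
    # For 3D inputs, the leading axis is a batch: multiply slice by slice.
    return [matmul_2d(x, y) for x, y in zip(a, b)]

print(matmul_2d([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]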
See also
Complete implementation of arithmetic operations on GitHub.
Small victories, big lessons
Then came December and my winter break. With no uni to attend, I was fully committed to this project. After days of debugging, I realised that my vector operations weren’t just about getting the math right.
They were about thinking like NumPy:
How can I infer the correct output shape?
How can I broadcast arrays?
How can I minimise unnecessary data duplication?
At this stage, I wasn’t just rebuilding a scrappy numerical computing doppelganger like I had initially thought. I was creating a flexible and extensible system that could handle both intuitive and weird edge cases. With each iteration, every commit I made, I explored even more ways to optimise it, reducing redundant calculations.
Every bug, every unexpected result, and every small achievement taught me something new about NumPy. I started speculating about the magic behind the scenes. As time went by, xsNumPy became more than just a project and a scrappy experiment. It became a mindset, a belief that the best way to learn is by rolling up your sleeves, breaking it, and then putting it back together, piece by piece.
What can xsNumPy do?
xsNumPy started off as a learning exercise and has since grown into a small but reliable companion. It was not about speed but about clarity. Here’s a brief tour, without the scaffolding, to show what it already does well.
xsNumPy provides familiar ways to create arrays. These creation routines are consistent, predictable, and designed to slot neatly into later operations.
array()
Like NumPy, the array function is the bread and butter of xsNumPy as well. It’s the most flexible way to create arrays from Python lists or tuples, with sensible dtype inference and the option to set one explicitly.
>>> import xsnumpy as xp
>>> xp.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
>>> xp.array([1, 0, 2], dtype=xp.bool)
array([True, False, True])
zeros(), ones(), and full()
xsNumPy supports the zeros, ones, and full functions for repeatable initialisation of arrays filled with zeros, ones, and any fill_value respectively.
>>> xp.zeros(3)
array([0. , 0. , 0. ])
>>> xp.ones([3, 2], dtype=xp.int32)
array([[1, 1], [1, 1], [1, 1]])
>>> xp.full(2, 3, fill_value=3.14159)
array([[3.14159, 3.14159, 3.14159], [3.14159, 3.14159, 3.14159]])
arange()
Inspired by Python’s range, arange generates arrays with evenly spaced values.
>>> xp.arange(0, 5, 0.5)
array([0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5])
See also
Check out all of the array creation methods supported by xsNumPy on GitHub.
xsNumPy provides a range of arithmetic operations, carefully adhering to NumPy’s rules for broadcasting and type coercion. The emphasis is on correctness and clear behaviour across dimensions.
Element-wise arithmetic
xsNumPy supports element-wise addition, subtraction, multiplication, and division, along with other basic arithmetic.
>>> a = xp.array([[1, 0], [0, 1]])
>>> b = xp.array([[4, 1], [2, 2]])
>>> a + b
array([[5, 1], [2, 3]])
Broadcasting arithmetic
xsNumPy matches shapes, stretches smaller arrays, and makes sure the output shape follows NumPy’s exact logic. Just like NumPy, these operations are broadcast.
>>> matrix = xp.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
>>> vector = xp.array([1, 2, 3])
>>> matrix + vector
array([[ 2, 4, 6], [ 5, 7, 9], [ 8, 10, 12]])
Linear algebraic helper functions
To mirror NumPy’s API, xsNumPy supports explicit arithmetic functions. These are useful when you want to be very clear about the operation being performed or when you need more control over the parameters.
>>> a = xp.array([[1, 0], [0, 1]])
>>> b = xp.array([[4, 1], [2, 2]])
>>> xp.dot(a, b)
array([[4, 1], [2, 2]])
Scalar operations
xsNumPy supports scalar operations as well so you’re not just limited to array-to-array operations.
>>> xp.array([3, 4]) + 10
array([13, 14])
See also
Check out examples of the arithmetic operations supported by xsNumPy on GitHub.
xsNumPy provides essential shape manipulation APIs that are predictable and memory-aware. The emphasis is on clarity of intent and avoiding unnecessary data duplication. Think of this as learning to fold and unfold the same fabric without tearing it.
Tip
Read more about NumPy internals here.
.reshape()
The reshape method changes the view of the data when possible, preserving the total element count.
>>> a = xp.array([1, 2, 3, 4, 5, 6])
>>> a.reshape((2, 3))
array([[1, 2, 3], [4, 5, 6]])
.transpose()
Transposing is more than just flipping rows and columns; for higher-dimensional arrays, it’s about permuting the axes. The transpose method does just that.
>>> a = xp.array([[1, 2, 3], [4, 5, 6]])
>>> a.transpose()
array([[1, 4], [2, 5], [3, 6]])
.flatten()
The flatten method returns a tidy 1D copy.
>>> a = xp.array([[1, 2, 3], [4, 5, 6]])
>>> a.flatten()
array([1, 2, 3, 4, 5, 6])
Indexing is expressive and disciplined in xsNumPy, just like NumPy. The goal is to provide intuitive access to elements and subarrays while maintaining clarity about the underlying data structure.
Attention
Indexing and slicing were implemented by overriding the standard __getitem__ and __setitem__ protocols.
Basic indexing
At its core, basic indexing in xsNumPy works similarly to NumPy, using zero-based indices to access elements. You can fetch single elements or entire subarrays. You can also use negative indices to count from the end of an array.
>>> a = xp.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
>>> a[1, 2]
6
>>> a[-1, -2]
8
Slicing
Slicing allows you to extract subarrays using the a[start:stop:step] format. Just like NumPy, xsNumPy supports almost all the classic slicing mechanics.
>>> a = xp.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
>>> a[::2]
array([[1, 2, 3], [7, 8, 9]])
>>> a[:2, 1:]
array([[2, 3], [5, 6]])
Boolean masking
Boolean masking lets you select elements based on a condition.
>>> a = xp.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
>>> a[a % 2 == 0]
array([2, 4, 6, 8])
See also
Check out the complete implementation here on GitHub.
Reductions condense information carefully, preserving the essence of the data. xsNumPy provides a few key reduction operations that are predictable and consistent.
.sum()
The sum method computes the sum of elements along a given axis.
>>> a = xp.array([[1, 2, 3], [4, 5, 6]])
>>> a.sum()
21
>>> a.sum(axis=0)
array([5, 7, 9])
.prod()
The prod (product) method computes the product of elements along a given axis.
>>> a = xp.array([[1, 2, 3], [4, 5, 6]])
>>> a.prod()
720
>>> a.prod(axis=0)
array([ 4, 10, 18])
.any() and .all()
The all method checks if all elements are True, while any checks if at least one is.
>>> b = xp.array([[True, False, True], [True, True, False]])
>>> b.all()
False
>>> b.any(axis=1)
array([True, True])
From notes to community
Now, fast forward to March 2025, and this project felt more like a conversation than code. I shared my story at ChiPy in a talk titled “xsNumPy: Curiosity to Code”, walking through the decisions, the missteps, and the insights that stayed with me.
The presentation covered the technical challenges, mathematical discoveries, and most importantly, the mindset shift from viewing libraries as opaque entities to understanding them as collections of elegant algorithms waiting to be explored.
Looking back, moving forward
xsNumPy didn’t aim for performance; that wasn’t the plan anyway. It aimed for understanding. It taught me to replace awe with attention, trusting libraries while still learning and understanding their core concepts with care. Most importantly, it reminded me that doing something yourself is perhaps the best teaching and learning experience.
I intend to keep refining the library in small, respectful steps whenever I get time. However, the larger work is already done. I re-learnt the essentials by making them, and that learning will travel with me far beyond this code.