Introduction

Python is widely used for scientific computing, data analysis, and machine learning tasks due to its ease of use and rich ecosystem. NumPy, a popular library for numerical computations in Python, provides a powerful array object and a variety of functions to operate on these arrays. However, performance can sometimes be a bottleneck when working with large datasets or complex operations.

Cython, a superset of the Python language, allows you to write Python code with optional C-like syntax and static types, which can then be compiled to C and executed as a native extension. This can lead to significant performance improvements compared to pure Python code.

In this blog post, we will demonstrate how to use Cython with NumPy to optimize matrix operations, and compare the performance of pure Python and Cython-accelerated implementations.

Prerequisites

To follow along with the examples in this post, you need a working C compiler and setuptools (the .pyx file is compiled to a native module), plus Cython and NumPy, which can be installed with pip:

pip install cython numpy

Example: Element-wise Matrix Multiplication

Let’s start by implementing element-wise matrix multiplication using Cython with NumPy.

Step 1: Create a Cython file

Create a Cython file called matrix_mult_cython.pyx with the following code:

# matrix_mult_cython.pyx
import numpy as np
cimport numpy as cnp

cpdef cnp.ndarray[double, ndim=2] elementwise_multiply(cnp.ndarray[double, ndim=2] A, cnp.ndarray[double, ndim=2] B):
    # Typed dimensions and indices let Cython generate plain C loops
    cdef int nrows = A.shape[0]
    cdef int ncols = A.shape[1]
    # Output array with the same shape as A, filled element by element below
    cdef cnp.ndarray[double, ndim=2] result = np.zeros((nrows, ncols), dtype=np.float64)

    cdef int i, j
    for i in range(nrows):
        for j in range(ncols):
            result[i, j] = A[i, j] * B[i, j]

    return result

Note: In Cython, the cimport statement is used to import C-level declarations from other Cython modules or libraries. It is similar to the regular Python import statement, but it specifically deals with importing C-level functions, types, and other constructs that are not part of the Python runtime.

In the provided example, the line cimport numpy as cnp imports the Cython definitions for the NumPy library. This gives Cython compile-time access to the ndarray type, so typed declarations such as cnp.ndarray[double, ndim=2] let element access compile to direct memory reads and writes instead of Python-level indexing calls. The cnp alias is used in the same way as np for regular Python NumPy imports, but it refers to the C-level NumPy constructs.

Step 2: Compile the Cython module

Create a setup.py file to build the Cython module:

# setup.py
from setuptools import setup, Extension
from Cython.Build import cythonize
import numpy as np

ext_modules = [
    Extension("matrix_mult_cython", ["matrix_mult_cython.pyx"],
              include_dirs=[np.get_include()])  # Add the NumPy header files
]

setup(
    name='Matrix Multiplication Cython Example',
    ext_modules=cythonize(ext_modules),
    zip_safe=False,
)

Compile the Cython module by running the following command in your terminal:

python setup.py build_ext --inplace

Note: The zip_safe parameter is an option indicating whether the package can be safely installed and run from a zip archive without being extracted to the file system.

When zip_safe is set to True, it means that the package can be installed and run directly from a zip archive without any issues. However, when set to False, it indicates that the package needs to be extracted to the file system before being executed.

In the case of Cython-compiled extensions, it is generally recommended to set zip_safe=False. This is because Cython generates compiled C extensions (shared libraries or DLLs) that need to be accessed by the operating system’s dynamic loader, which often cannot read files from a zip archive.
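
After the build, it can help to confirm that a compiled file actually landed next to setup.py. The sketch below is one possible check, not part of the original workflow; find_built_extension is a hypothetical helper name. It relies on importlib.machinery.EXTENSION_SUFFIXES, which lists the filename suffixes the interpreter accepts for extension modules on the current platform.

```python
import importlib.machinery
from pathlib import Path


def find_built_extension(module_name, directory="."):
    """Return paths of compiled extension files for module_name in directory.

    Hypothetical helper: it only looks for files named module_name plus
    one of the interpreter's extension suffixes (for example ".so" or
    ".pyd", depending on the platform).
    """
    folder = Path(directory)
    matches = []
    for suffix in importlib.machinery.EXTENSION_SUFFIXES:
        candidate = folder / f"{module_name}{suffix}"
        if candidate.is_file():
            matches.append(candidate)
    return matches


if __name__ == "__main__":
    built = find_built_extension("matrix_mult_cython")
    if built:
        print("Found compiled extension:", ", ".join(str(p) for p in built))
    else:
        print("No compiled extension found; re-run the build_ext command.")
```

Because the extension is an ordinary file on disk, this also illustrates why zip_safe=False is the usual choice: the dynamic loader needs a real path to open.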

Step 3: Use the Cython module in your Python code

Now, you can use the compiled Cython module in your Python code:

# main.py
import numpy as np
from matrix_mult_cython import elementwise_multiply

def main():
    A = np.random.rand(1000, 1000)
    B = np.random.rand(1000, 1000)

    result = elementwise_multiply(A, B)
    print(result)

if __name__ == "__main__":
    main()

Run your Python code:

python main.py
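
Printing a 1000 x 1000 array does not tell you whether the values are right. A minimal sketch of a correctness check, assuming NumPy's built-in element-wise product A * B as the reference, is shown below. matches_numpy is a hypothetical helper, and np.multiply stands in for the compiled function so the snippet runs without the extension; in practice you would pass elementwise_multiply from matrix_mult_cython.

```python
import numpy as np


def matches_numpy(multiply_func, shape=(50, 40), seed=0):
    """Check multiply_func(A, B) against NumPy's element-wise product."""
    rng = np.random.default_rng(seed)
    A = rng.random(shape)  # float64 values in [0, 1)
    B = rng.random(shape)
    expected = A * B
    actual = multiply_func(A, B)
    return actual.shape == expected.shape and bool(np.allclose(actual, expected))


if __name__ == "__main__":
    # np.multiply is a stand-in here; swap in elementwise_multiply
    # once matrix_mult_cython has been built.
    print("Results match NumPy:", matches_numpy(np.multiply))
```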

Performance Comparison

To compare the performance before and after using Cython, we can implement the same element-wise multiplication with explicit Python loops over NumPy arrays and then measure the execution time of both the pure Python and Cython-accelerated implementations.

Pure Python Implementation

Add a pure Python implementation of element-wise multiplication in main.py:

def elementwise_multiply_python(A, B):
    assert A.shape == B.shape, "Both matrices must have the same shape."

    nrows, ncols = A.shape
    result = np.zeros((nrows, ncols), dtype=np.float64)

    for i in range(nrows):
        for j in range(ncols):
            result[i, j] = A[i, j] * B[i, j]

    return result
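
The assert above catches mismatched shapes, but the Cython function from Step 1 has no equivalent check, and its double, ndim=2 buffer declarations only accept two-dimensional float64 arrays. One possible way to guard both implementations is to normalize the inputs before calling either one. prepare_inputs below is a hypothetical helper, not part of the original code.

```python
import numpy as np


def prepare_inputs(A, B):
    """Return A and B as 2-D float64 arrays of matching shape.

    Raises ValueError rather than relying on assert, which is skipped
    when Python runs with the -O flag.
    """
    A = np.asarray(A, dtype=np.float64)
    B = np.asarray(B, dtype=np.float64)
    if A.ndim != 2 or B.ndim != 2:
        raise ValueError("Both inputs must be two-dimensional.")
    if A.shape != B.shape:
        raise ValueError(f"Shape mismatch: {A.shape} vs {B.shape}.")
    return A, B


if __name__ == "__main__":
    # Integer nested lists are converted to float64 arrays.
    A, B = prepare_inputs([[1, 2], [3, 4]], [[5, 6], [7, 8]])
    print(A.dtype, A.shape)  # float64 (2, 2)
```

Calling prepare_inputs first means integer arrays or nested lists are converted up front instead of being rejected by the typed signature.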

Timing the Implementations

Add timing code to main.py to measure the execution time of both the pure Python and Cython-accelerated implementations:

import time

def main():
    A = np.random.rand(1000, 1000)
    B = np.random.rand(1000, 1000)

    # Time the pure Python implementation
    # (perf_counter is a monotonic, high-resolution clock meant for timing)
    start_python = time.perf_counter()
    result_python = elementwise_multiply_python(A, B)
    end_python = time.perf_counter()
    elapsed_python = end_python - start_python

    # Time the Cython-accelerated implementation
    start_cython = time.perf_counter()
    result_cython = elementwise_multiply(A, B)
    end_cython = time.perf_counter()
    elapsed_cython = end_cython - start_cython

    print(f"Pure Python: {elapsed_python:.5f} seconds")
    print(f"Cython-accelerated: {elapsed_cython:.5f} seconds")

if __name__ == "__main__":
    main()
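
A single timed run can be noisy. A minimal sketch of a steadier measurement, assuming the standard-library timeit module and a hypothetical helper best_time, repeats each call several times and keeps the fastest run. The two functions timed here are stand-ins so the snippet runs on its own; substitute elementwise_multiply_python and elementwise_multiply to reproduce the comparison.

```python
import timeit

import numpy as np


def best_time(func, *args, repeat=5):
    """Return the fastest of `repeat` single-call timings, in seconds."""
    timer = timeit.Timer(lambda: func(*args))
    return min(timer.repeat(repeat=repeat, number=1))


def loop_multiply(A, B):
    """Stand-in for the pure Python implementation (explicit loops)."""
    result = np.zeros(A.shape, dtype=np.float64)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            result[i, j] = A[i, j] * B[i, j]
    return result


if __name__ == "__main__":
    A = np.random.rand(200, 200)
    B = np.random.rand(200, 200)
    slow = best_time(loop_multiply, A, B)
    fast = best_time(np.multiply, A, B)  # stand-in for the compiled function
    print(f"Loop version:     {slow:.5f} seconds")
    print(f"Stand-in version: {fast:.5f} seconds")
    print(f"Speedup:          {slow / fast:.1f}x")
```

Taking the minimum rather than the mean is a common choice for micro-benchmarks, since slower runs usually reflect interference from other processes rather than the code itself.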

Run the Python code:

python main.py

You should see the execution time of both implementations. The Cython version is typically much faster: the typed loop indices and buffer-typed arrays let Cython compile the nested loops to C, avoiding the per-element Python object overhead that the pure Python loops pay. The exact speedup depends on your machine, compiler, and array size. This is the result on my machine:

Conclusion

In this blog post, we demonstrated how to use Cython with NumPy to optimize matrix operations, specifically element-wise matrix multiplication. We also compared the performance of pure Python and Cython-accelerated implementations, showing that Cython can provide significant performance improvements.

Cython is a powerful tool for optimizing Python code that relies on numerical computations, especially when used in conjunction with libraries like NumPy. By taking advantage of C-level array access and efficient looping, you can achieve substantial speedups in your scientific computing, data analysis, and machine learning tasks.