NumPy Part I

NumPy is one of the most important libraries for data science and machine learning.

what is numpy?

NumPy is a Python library for working with arrays. It provides an array object that is much faster than traditional Python lists for numerical work.

why use numpy arrays over lists?

If Python already has lists, why do we need arrays? Unlike lists, NumPy arrays are stored in one contiguous block of memory, and every element has the same type. This makes working with large amounts of data fast and efficient.
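A small sketch of what this means in practice: arithmetic can be applied to a whole array at once, while a list needs an explicit loop. The variable names below are made up for illustration, and the snippet assumes numpy is already installed (installation is covered in the next section).

```python
import numpy as np

prices_list = [10, 20, 30, 40]
prices_arr = np.array(prices_list)

# with a list we have to visit every element ourselves
doubled_list = [p * 2 for p in prices_list]

# with a numpy array the multiplication is applied to every element at once
doubled_arr = prices_arr * 2

print(doubled_list)                        # prints [20, 40, 60, 80]
print(doubled_arr)                         # prints [20 40 60 80]
print(prices_arr.flags['C_CONTIGUOUS'])    # prints True, the data sits in one contiguous block
```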

how to install numpy?

first install pip

for windows

pip is the package manager for Python. if you don't have it installed, download get-pip.py from here. then navigate to the folder where you downloaded the file, open a command prompt there and type the following

python get-pip.py

for linux

type the following command

sudo apt install python3-pip

now install numpy

type the following command in your command prompt

pip install numpy

get started with numpy

import numpy
arr = numpy.array([1,2,3,4,5,6])          #here arr is a numpy array object
print(arr)

numpy arrays

1-D array

these are the most common and basic arrays

import numpy as np
arr = np.array([1,2,3,4,5])            #here arr is a 1-D array containing elements 1,2,3,4,5
print(arr)

2-D array

an array that contains 1-D arrays as its elements is known as a 2-D array. 2-D arrays have rows and columns and are usually used to represent matrices.

import numpy as np
arr = np.array([[1,2,3],[4,5,6]])      #here arr is a 2-D array 
print(arr)

numpy ndim

numpy provides the ndim attribute, which gives the number of dimensions of an array

import numpy as np
a = np.array(50)
b = np.array([1,2,3,4])
c = np.array([[1,2,3,4],[5,6,7,8]])
print(a.ndim)                             #prints 0 because it has 0 dimensions
print(b.ndim)                             #prints 1 because it has 1 dimension
print(c.ndim)                              #prints 2 because it has 2 dimensions
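The same idea extends past two dimensions. As a small illustrative sketch (the array d is not part of the example above), nesting 2-D arrays inside one more pair of brackets gives a 3-D array:

```python
import numpy as np

# a 3-D array: 2 blocks, each block is a matrix with 2 rows and 3 columns
d = np.array([[[1, 2, 3], [4, 5, 6]],
              [[7, 8, 9], [10, 11, 12]]])

print(d.ndim)      # prints 3 because it has 3 dimensions
```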

Accessing Array elements

you can access an array element by referencing its index number

indexes start at 0, meaning that the first element has index 0, the second element has index 1 and so on.

accessing 1-D array elements

import numpy as np
arr = np.array([1,2,3,4])
print(arr[2])                        #here it prints 3 as the element at index 2 is 3

accessing 2-D array elements

import numpy as np
arr = np.array([[1,2,3],[4,5,6]])

in the above example, to get the element with the value 5 we would use the following indexing:

print(arr[1,1])                        #prints 5

the first index is the row number and the second index is the column number
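Here is a short sketch with a few more row and column lookups on the same kind of array. The negative indexes are an addition to the text above: they count from the end, so -1 means the last row or the last column.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

print(arr[0, 2])     # prints 3, row 0 and column 2
print(arr[1, -1])    # prints 6, row 1 and the last column
print(arr[-1, 0])    # prints 4, the last row and column 0
```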

we can also use slicing to access a range of elements. a slice [start:end] includes the start index and excludes the end index

import numpy as np
arr = np.array([1,2,3,4,5,6])
print(arr[0:4])                                 #prints [1 2 3 4]

the same can be done with 2-D arrays

import numpy as np
arr = np.array([[1,2,3],[4,5,6]])
print(arr[1,0:2])                            #prints [4 5]
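Slices can be used on both indexes at once. As a small sketch, a bare colon selects everything along that dimension, which is a handy way to pull out a whole column:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

column = arr[:, 1]          # every row, column 1
block = arr[0:2, 1:3]       # rows 0 and 1, columns 1 and 2

print(column)               # prints [2 5]
print(block.tolist())       # prints [[2, 3], [5, 6]]
```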

Array Shape

numpy arrays have an attribute named shape that returns a tuple with one entry per dimension. each entry is the number of elements along that dimension

for a 2-D array:

first index = number of rows

second index = number of columns (elements in each row)

import numpy as np
arr = np.array([[1,2,3,4],[5,6,7,8]])
print(arr.shape)                                    #prints (2, 4)

here (2, 4) means the array has 2 rows, each with 4 elements. the number of dimensions is the length of the tuple, which is 2
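To see how shape relates to the number of dimensions, here is a small sketch with arrays of 1, 2 and 3 dimensions. The arrays a, b and c are only for illustration, and np.zeros is used just as a quick way to build a 3-D array.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
c = np.zeros((2, 3, 4))       # 2 blocks, each with 3 rows and 4 columns

print(a.shape)                # prints (4,)
print(b.shape)                # prints (2, 4)
print(c.shape)                # prints (2, 3, 4)

# the length of the shape tuple is the number of dimensions
print(len(c.shape) == c.ndim) # prints True
```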

Array Reshape

reshaping means to change the shape of the array

numpy provides a reshape method for this

Reshape 1-D array to 2-D array

import numpy as np
arr = np.array([1,2,3,4,5,6,7,8,9,10])
new_arr = arr.reshape(2,5)               
print(new_arr)                                    #prints [[ 1  2  3  4  5] [ 6  7  8  9 10]], one row per line

here, it converts a 1-D array with 10 elements into a 2-D array with 2 rows and 5 elements in each row

Important Note - while reshaping, the original array and the reshaped array must have the same number of elements

We can reshape a 1-D array with 8 elements into a 2-D array with 2 rows of 4 elements, but we cannot reshape it into a 2-D array with 3 rows of 3 elements, as that would require 3x3 = 9 elements.
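The 8-element case can be sketched like this. The last line goes a little beyond the text above: passing -1 for one dimension asks numpy to work out that dimension from the number of elements.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])

ok = arr.reshape(2, 4)            # works, 2 * 4 = 8 elements
print(ok.shape)                   # prints (2, 4)

try:
    arr.reshape(3, 3)             # fails, 3 * 3 = 9 elements are needed
except ValueError as err:
    print("reshape failed:", err)

auto = arr.reshape(4, -1)         # numpy computes the missing dimension
print(auto.shape)                 # prints (4, 2)
```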

array join

Joining means putting contents of two or more arrays into a single array

for this, numpy provides the concatenate() function, which takes the arrays to join as a single tuple or list

import numpy as np
arr1 = np.array([1,2,3])
arr2 = np.array([4,5,6])
final_arr = np.concatenate((arr1,arr2))
print(final_arr)                                        #prints [1 2 3 4 5 6]
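concatenate() also works on 2-D arrays, where the axis argument decides the direction of the join. A minimal sketch, with made-up arrays a and b:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

rows_joined = np.concatenate((a, b), axis=0)   # b is placed below a
cols_joined = np.concatenate((a, b), axis=1)   # b is placed to the right of a

print(rows_joined.tolist())    # prints [[1, 2], [3, 4], [5, 6], [7, 8]]
print(cols_joined.tolist())    # prints [[1, 2, 5, 6], [3, 4, 7, 8]]
```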

using Numpy stack to join

Stacking is similar to concatenation; the only difference is that stacking is done along a new axis.

We can stack two 1-D arrays along the second axis (axis=1), which places them side by side as the columns of a 2-D array

import numpy as np
arr1 = np.array([1,2,3])
arr2 = np.array([4,5,6])
final_arr = np.stack((arr1,arr2),axis=1)
print(final_arr)                                  #prints [[1 4] [2 5] [3 6]], one row per line

if no value is passed to axis, it defaults to 0 i.e. axis=0, which puts the arrays one over the other as rows
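To compare the two axis values side by side, here is a small sketch using the same two arrays:

```python
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

as_rows = np.stack((arr1, arr2))            # axis=0 is the default
as_cols = np.stack((arr1, arr2), axis=1)

print(as_rows.tolist())    # prints [[1, 2, 3], [4, 5, 6]]
print(as_rows.shape)       # prints (2, 3)
print(as_cols.tolist())    # prints [[1, 4], [2, 5], [3, 6]]
print(as_cols.shape)       # prints (3, 2)
```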

array splitting

Splitting is the reverse operation of Joining

numpy provides array_split() for splitting arrays

array_split(array,number of splits)

import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

newarr = np.array_split(arr, 3)

print(newarr)                                  #prints [array([1, 2]), array([3, 4]), array([5, 6])]

the same can be done with 2-D arrays, where the split happens along the rows by default
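Two cases are worth a quick sketch: a split that does not divide evenly, and a split of a 2-D array. The array names here are only for illustration.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

# 6 elements cannot be divided evenly into 4 parts,
# so array_split makes the first parts one element longer
parts = np.array_split(arr, 4)
print([p.tolist() for p in parts])      # prints [[1, 2], [3, 4], [5], [6]]

# splitting a 2-D array divides its rows
grid = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
halves = np.array_split(grid, 2)
print(halves[0].tolist())               # prints [[1, 2], [3, 4]]
print(halves[1].tolist())               # prints [[5, 6], [7, 8]]
```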

Searching arrays

using where method

import numpy as np
arr = np.array([1,2,3,2,4,5,6,2,9])
x = np.where(arr == 2)
print(x)                                        #prints (array([1, 3, 7]),)

in the above example, where() searches for elements with the value 2 and returns a tuple containing an array of their indexes
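The condition does not have to be an equality check. As a sketch, the snippet below finds the indexes of the even values and then uses those indexes to fetch the values themselves:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])

# where() returns a tuple, so [0] picks out the array of indexes
even_idx = np.where(arr % 2 == 0)[0]

print(even_idx)            # prints [1 3 5 7]
print(arr[even_idx])       # prints [2 4 6 8]
```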

numpy searchsorted

searchsorted() performs a binary search in the array, and returns the index where the specified value would be inserted to maintain the sorted order.

import numpy as np

arr = np.array([6, 7, 8, 9])

x = np.searchsorted(arr, 7)

print(x)                                  #prints 1

Note - searchsorted() assumes that the array is already sorted

search from the right side

By default the leftmost index is returned, but we can pass side='right' to return the rightmost index instead.

import numpy as np

arr = np.array([6, 7, 8, 9])

x = np.searchsorted(arr, 7, side='right')

print(x)                                       #prints 2
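The difference between the two sides is easier to see when the value appears more than once. A small sketch, using an illustrative array with a repeated 7, which also shows that several values can be searched in one call:

```python
import numpy as np

arr = np.array([6, 7, 7, 8])

left = np.searchsorted(arr, 7)                  # before the first 7
right = np.searchsorted(arr, 7, side='right')   # after the last 7
many = np.searchsorted(arr, [5, 8, 10])         # one insert position per value

print(left)      # prints 1
print(right)     # prints 3
print(many)      # prints [0 3 4]
```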

Sorting arrays

Sorting means putting elements in an ordered sequence.

import numpy as np

arr = np.array([3, 2, 0, 1])

print(np.sort(arr))                                #prints [0 1 2 3]

if sort() is used on a 2-D array, each row is sorted separately
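Here is a short sketch of sorting a 2-D array. It also shows two details not spelled out above: np.sort() returns a sorted copy and leaves the original untouched, and passing axis=0 sorts each column instead of each row.

```python
import numpy as np

arr = np.array([[3, 2, 4], [5, 0, 1]])

by_row = np.sort(arr)            # default: sort inside each row
by_col = np.sort(arr, axis=0)    # sort inside each column

print(by_row.tolist())    # prints [[2, 3, 4], [0, 1, 5]]
print(by_col.tolist())    # prints [[3, 0, 1], [5, 2, 4]]
print(arr.tolist())       # prints [[3, 2, 4], [5, 0, 1]], the original is unchanged
```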

Filtering arrays

Getting some elements out of an existing array and creating a new array out of them is called filtering.

In NumPy, you filter an array using a boolean index list.

import numpy as np

arr = np.array([41, 42, 43, 44])

x = [True, False, True, False]

newarr = arr[x]

print(newarr)          #prints [41 43]

If the value at an index is True, that element is included in the filtered array; if the value at that index is False, that element is excluded.
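In practice the boolean list is rarely typed by hand. A common pattern, sketched here, is to build it from a condition on the array itself:

```python
import numpy as np

arr = np.array([41, 42, 43, 44])

# comparing an array with a value gives one True or False per element
mask = arr > 42
newarr = arr[mask]

print(mask)        # prints [False False  True  True]
print(newarr)      # prints [43 44]
```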

array copy vs array view

The main difference between a copy and a view of an array is that the copy is a new array with its own data, and the view is just another way of looking at the original array's data.

Copy

import numpy as np

arr = np.array([1, 2, 3, 4, 5])
x = arr.copy()
arr[0] = 42

print(arr)                  #prints [42  2  3  4  5]
print(x)                     #prints [1 2 3 4 5]

View

import numpy as np

arr = np.array([1, 2, 3, 4, 5])
x = arr.view()
arr[0] = 42

print(arr)                      #prints [42  2  3  4  5]
print(x)                         #prints [42  2  3  4  5]

The copy owns the data and any changes made to the copy will not affect original array, and any changes made to the original array will not affect the copy.

The view does not own the data and any changes made to the view will affect the original array, and any changes made to the original array will affect the view.
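The examples above change the original array. As a sketch of the other direction, the snippet below changes the view and the original follows. It also shows something the text does not mention: a slice of an array is a view too.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

v = arr.view()
v[1] = 99                 # change made through the view
print(arr.tolist())       # prints [1, 99, 3, 4, 5]

s = arr[0:2]              # slicing returns a view, not a copy
s[0] = -1
print(arr.tolist())       # prints [-1, 99, 3, 4, 5]
```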

Check if Array Owns its Data

copies own their data, and views do not, but how can we check this?

Every numpy array has the attribute base that returns None if the array owns the data.

Otherwise, the base attribute refers to the original object.

import numpy as np

arr = np.array([1, 2, 3, 4, 5])

x = arr.copy()
y = arr.view()

print(x.base)                   #prints None
print(y.base)                    #prints [1 2 3 4 5]

I hope you found this tutorial interesting. this was just part 1

many more interesting concepts are yet to come in part 2

if you are stuck on a problem or have any questions, feel free to ping me on Twitter