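Today's post is a lighter one: basic NumPy operations. I have split the material into three parts, and this is the first.

NumPy is a very handy package for data mining and analysis, and this post collects some of its basic, simple operations. Many of them are worth keeping as a reference, although I find a few of them simpler and friendlier to do with Pandas. Treat everything below as reference material only.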

Code

import numpy as np

1. Create a 1-D array

np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

2. Create a 3x3 boolean array

np.full((3, 3), True, dtype=bool)

array([[ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True]], dtype=bool)

np.ones((3, 3), dtype=bool)

array([[ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True]], dtype=bool)

3. Extract elements of a 1-D array by condition

arr = np.arange(10)
arr[arr % 2 == 1]

array([1, 3, 5, 7, 9])

4. Modify elements of a 1-D array by condition

arr = np.arange(10)
arr[arr % 2 == 1] = -1
arr

array([ 0, -1,  2, -1,  4, -1,  6, -1,  8, -1])

5. Replace values of a 1-D array by condition while leaving the original array unchanged

arr = np.arange(10)
out = np.where(arr % 2 == 1, -1, arr)
print(arr)
print(out)

[0 1 2 3 4 5 6 7 8 9]
[ 0 -1  2 -1  4 -1  6 -1  8 -1]

6. Reshape an array

arr = np.arange(10)
arr.reshape(2, 5)

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
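The next item writes reshape(2, -1) instead of spelling out both dimensions. As a small sketch of what that -1 does (the variable names here are my own, not from the original exercises): NumPy infers the one missing dimension from the array's size, and raises an error when the size does not divide evenly.

```python
import numpy as np

arr = np.arange(10)

# -1 asks NumPy to infer that dimension from the total number of elements.
two_rows = arr.reshape(2, -1)    # 10 elements / 2 rows -> 5 columns
two_cols = arr.reshape(-1, 2)    # 10 elements / 2 columns -> 5 rows
print(two_rows.shape, two_cols.shape)

# 10 elements cannot be split into 3 equal rows, so this fails.
try:
    arr.reshape(3, -1)
    reshape_failed = False
except ValueError:
    reshape_failed = True
print(reshape_failed)
```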

7. Stack arrays vertically

a = np.arange(10).reshape(2, -1)
b = np.repeat(1, 10).reshape(2, -1)
np.vstack([a, b])

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1]])

8. Stack arrays horizontally

np.hstack([a, b])

array([[0, 1, 2, 3, 4, 1, 1, 1, 1, 1],
       [5, 6, 7, 8, 9, 1, 1, 1, 1, 1]])

9. Generate a sequence with repeated values

a = np.arange(1, 4)
print(np.repeat(a, 3))

[1 1 1 2 2 2 3 3 3]

10. Get the values common to two arrays

a = np.array([1,2,3,2,3,4,3,4,5,6])
b = np.array([7,2,10,2,7,4,9,4,9,8])
np.intersect1d(a, b)

array([2, 4])

11. Remove from one array the values that also exist in another array

a = np.array([1,2,3,4,5])
b = np.array([5,6,7,8,9])
np.setdiff1d(a, b)

array([1, 2, 3, 4])
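One thing the example above does not show is that np.setdiff1d returns sorted, unique values. If the original order and any duplicates need to survive, a boolean mask built with np.isin is one option. The arrays below are made up for illustration.

```python
import numpy as np

# Hypothetical input with duplicates and no particular order.
a = np.array([3, 1, 3, 5, 2])
b = np.array([5, 6, 7])

# setdiff1d sorts the result and drops duplicates.
sorted_unique = np.setdiff1d(a, b)

# Masking with isin keeps the original order and the repeated 3.
kept_in_place = a[~np.isin(a, b)]

print(sorted_unique)
print(kept_in_place)
```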

12. Get the indices at which two arrays hold the same value

a = np.array([1,2,3,2,3,4,3,4,5,6])
b = np.array([7,2,10,2,7,4,9,4,9,8])
np.where(a == b)

(array([1, 3, 5, 7], dtype=int64),)

13. Extract the values within a given range from an array

a = np.arange(15)
index = np.where((a >= 5) & (a <= 10))
print(a[index])
print(a[(a>=5) & (a<=10)])

[ 5  6  7  8  9 10]
[ 5  6  7  8  9 10]

14. Apply a Python function to arrays

def maxx(x, y):
    if x >= y:
        return x
    else:
        return y

pair_max = np.vectorize(maxx)
a = np.array([1,2,3,2,3,4,3,4,5,6])
b = np.array([7,2,10,2,7,4,9,4,9,8])
pair_max(a, b)

array([ 7,  2, 10,  2,  7,  4,  9,  4,  9,  8])
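np.vectorize is a convenience wrapper that still calls the Python function once per element, so it is not a speed-up. For this particular function NumPy already ships an element-wise equivalent, np.maximum, which the following sketch compares against the same two arrays.

```python
import numpy as np

a = np.array([1, 2, 3, 2, 3, 4, 3, 4, 5, 6])
b = np.array([7, 2, 10, 2, 7, 4, 9, 4, 9, 8])

# Built-in element-wise maximum; no Python-level loop over the elements.
builtin_max = np.maximum(a, b)

# The same result obtained through np.vectorize, for comparison.
wrapped_max = np.vectorize(lambda x, y: x if x >= y else y)(a, b)

print(builtin_max)
print(np.array_equal(builtin_max, wrapped_max))
```

np.vectorize remains useful when no built-in function matches the logic you need.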

15. Swap the rows of a 2-D array

arr = np.arange(9).reshape(3, 3)
arr[[2, 0, 1], :]

array([[6, 7, 8],
       [0, 1, 2],
       [3, 4, 5]])

16. Swap the columns of a 2-D array

arr = np.arange(9).reshape(3, 3)
arr[:, [2, 0, 1]]

array([[2, 0, 1],
       [5, 3, 4],
       [8, 6, 7]])

17. Reverse the rows of a 2-D array

arr[::-1, :]

array([[6, 7, 8],
       [3, 4, 5],
       [0, 1, 2]])

18. Reverse the columns of a 2-D array

arr[:, ::-1]

array([[2, 1, 0],
       [5, 4, 3],
       [8, 7, 6]])

19. Create a 2-D array of random floats between 5 and 10

rand_arr = np.random.uniform(5, 10, size=(3, 3))
rand_arr

array([[ 8.03246322,  7.64526394,  5.26370662],
       [ 8.35671523,  6.89082064,  5.35361041],
       [ 8.8293029 ,  9.78622608,  6.76954183]])
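The numbers above change on every run. If reproducible output matters, one option is NumPy's newer Generator interface, created with np.random.default_rng and a seed. The seed value 42 below is an arbitrary choice of mine.

```python
import numpy as np

# A seeded generator produces the same sequence on every run.
rng = np.random.default_rng(seed=42)
rand_arr = rng.uniform(5, 10, size=(3, 3))

# A second generator with the same seed yields identical values.
rand_again = np.random.default_rng(seed=42).uniform(5, 10, size=(3, 3))

print(rand_arr)
print(np.array_equal(rand_arr, rand_again))
```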

20. Print a float array with a precision of 3

rand_arr = np.random.random((3, 3))
np.set_printoptions(precision=3)
rand_arr

array([[ 0.279,  0.357,  0.487],
       [ 0.565,  0.208,  0.345],
       [ 0.705,  0.813,  0.984]])
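Note that np.set_printoptions changes the setting globally, so every array printed afterwards is affected. As a sketch of a more local alternative, assuming NumPy 1.15 or later, the np.printoptions context manager applies the precision only inside the with block. The sample values are made up.

```python
import numpy as np

values = np.array([0.123456, 0.5, 0.987654])
precision_before = np.get_printoptions()['precision']

# The precision applies only while the with block is active.
with np.printoptions(precision=3):
    short_text = str(values)

# Outside the block the previous setting is back in force.
precision_after = np.get_printoptions()['precision']
full_text = str(values)

print(short_text)
print(full_text)
```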

21. Load numeric and text data from an external source

url = 'http://aima.cs.berkeley.edu/data/iris.csv'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')

iris[:3]

array([[b'5.1', b'3.5', b'1.4', b'0.2', b'setosa'],
       [b'4.9', b'3.0', b'1.4', b'0.2', b'setosa'],
       [b'4.7', b'3.2', b'1.3', b'0.2', b'setosa']], dtype=object)

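22. Extract a specific column from a 1-D array of tuples

url = 'http://aima.cs.berkeley.edu/data/iris.csv'
iris_1d = np.genfromtxt(url, delimiter=',', dtype=None)

# row[2] is the petallength column; row[4] would give the species text instead
petal_length = np.array([row[2] for row in iris_1d])
petal_length[:10]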

array([ 1.4,  1.4,  1.3,  1.5,  1.4,  1.7,  1.4,  1.5,  1.4,  1.5])
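Indexing each row by position works, but it is easy to pick the wrong column. A possible alternative is to pass the column names to np.genfromtxt so that columns can be selected by name. The sketch below is offline: it feeds the first three rows shown in item 21 through io.StringIO instead of downloading the file, and passing encoding='utf-8' (so that text comes back as str rather than bytes) is my own addition.

```python
import io
import numpy as np

# First three rows of the iris data, inlined so no download is needed.
sample = io.StringIO(
    "5.1,3.5,1.4,0.2,setosa\n"
    "4.9,3.0,1.4,0.2,setosa\n"
    "4.7,3.2,1.3,0.2,setosa\n"
)
names = ['sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species']

# dtype=None lets genfromtxt pick a type per column; names labels the fields.
iris_named = np.genfromtxt(sample, delimiter=',', dtype=None,
                           names=names, encoding='utf-8')

petal_length = iris_named['petallength']   # float column, selected by name
species = iris_named['species']            # text column, selected by name

print(petal_length)
print(species)
```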

23. Convert a 1-D array of tuples into a 2-D array

iris_1d = np.genfromtxt(url, delimiter=',', dtype=None)
iris_2d = np.array([row.tolist()[:4] for row in iris_1d])
iris_2d[:10]

array([[ 5.1,  3.5,  1.4,  0.2],
       [ 4.9,  3. ,  1.4,  0.2],
       [ 4.7,  3.2,  1.3,  0.2],
       [ 4.6,  3.1,  1.5,  0.2],
       [ 5. ,  3.6,  1.4,  0.2],
       [ 5.4,  3.9,  1.7,  0.4],
       [ 4.6,  3.4,  1.4,  0.3],
       [ 5. ,  3.4,  1.5,  0.2],
       [ 4.4,  2.9,  1.4,  0.2],
       [ 4.9,  3.1,  1.5,  0.1]])

24. Compute the mean, median, and standard deviation of an array

iris_1d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0])
mean, med, std = np.mean(iris_1d), np.median(iris_1d), np.std(iris_1d)
print('mean: {}, median: {}, std: {}'.format(mean, med, std))

mean: 5.843333333333334, median: 5.8, std: 0.8253012917851409

25. Find the percentiles of an array

np.percentile(iris_1d, q=[5, 95])

array([ 4.6  ,  7.255])

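26. Find the positions of missing values in an array

# First insert some NaN values at random positions
iris_2d[np.random.randint(150, size=20), np.random.randint(4, size=20)] = np.nan
# Positions of the missing values in the first column
np.where(np.isnan(iris_2d[:, 0]))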

(array([ 15,  44,  48,  65,  78,  94, 135]),)

27. Filter an array by a condition (here: keep only the rows without missing values)

filter_ = np.array([~np.any(np.isnan(row)) for row in iris_2d])
iris_2d[filter_][:5]

array([[ 5.1,  3.5,  1.4,  0.2],
       [ 4.9,  3. ,  1.4,  0.2],
       [ 4.7,  3.2,  1.3,  0.2],
       [ 4.6,  3.1,  1.5,  0.2],
       [ 5. ,  3.6,  1.4,  0.2]])
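The list comprehension above checks one row at a time in Python. The same mask can be built in a single vectorized step by reducing along axis 1. This is a sketch on a small made-up array rather than the iris data, so it runs without a download.

```python
import numpy as np

# Hypothetical 4x4 sample with NaN in the second and fourth rows.
data = np.array([[5.1, 3.5, 1.4, 0.2],
                 [4.9, np.nan, 1.4, 0.2],
                 [4.7, 3.2, 1.3, 0.2],
                 [np.nan, 3.1, 1.5, 0.2]])

# True for every row that contains no NaN at all.
complete_rows = ~np.isnan(data).any(axis=1)
clean = data[complete_rows]

print(complete_rows)
print(clean)
```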

28. Compute the correlation coefficient between two columns

iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0, 1, 2, 3])
np.corrcoef(iris_2d[:, 0], iris_2d[:, 1])[0, 1]

-0.10936924995064938

29. Fill the missing values in an array with 0

# First insert some NaN values at random positions
iris_2d[np.random.randint(150, size=20), np.random.randint(4, size=20)] = np.nan
iris_2d[np.isnan(iris_2d)] = 0
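The assignment above overwrites iris_2d in place. When the original array should stay untouched, np.nan_to_num returns a filled copy instead. The sketch below uses a tiny made-up array and assumes NumPy 1.17 or later for the nan keyword. Be aware that nan_to_num also replaces infinities with very large finite numbers by default.

```python
import numpy as np

# Hypothetical 2x2 sample containing two missing values.
data = np.array([[5.1, np.nan],
                 [np.nan, 3.0]])

# Returns a new array; data itself still holds its NaN values.
filled = np.nan_to_num(data, nan=0.0)

print(filled)
print(np.isnan(data).sum())
```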

30. Count the unique values in an array

np.unique(iris_1d, return_counts=True)

(array([ 4.3,  4.4,  4.5,  4.6,  4.7,  4.8,  4.9,  5. ,  5.1,  5.2,  5.3,
         5.4,  5.5,  5.6,  5.7,  5.8,  5.9,  6. ,  6.1,  6.2,  6.3,  6.4,
         6.5,  6.6,  6.7,  6.8,  6.9,  7. ,  7.1,  7.2,  7.3,  7.4,  7.6,
         7.7,  7.9]),
 array([ 1,  3,  1,  4,  2,  5,  6, 10,  9,  4,  1,  6,  7,  6,  8,  7,  3,
         6,  6,  4,  9,  7,  5,  2,  8,  3,  4,  1,  1,  3,  1,  1,  1,  4,
         1]))

enjoy it!

Reference: https://www.machinelearningplus.com/101-numpy-exercises-python/