Data Visualization with seaborn

  seaborn is a Python visualization library built on matplotlib, and it works more naturally with pandas DataFrame objects than matplotlib does on its own.

Visualizing with seaborn

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt  # provides the plt name used in later cells

# Render figures inline in the notebook (replaces the discouraged %pylab inline)
%matplotlib inline
import seaborn as sns
sns.set_style('darkgrid')

First, read in a dataset with pandas.

# Column names for the UCI auto-mpg file, which has no header row
names = [
    'mpg',
    'cylinders',
    'displacement',
    'horsepower',
    'weight',
    'acceleration',
    'model_year',
    'origin',
    'car_name',
]
df = pd.read_csv("http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data", sep=r'\s+', names=names)
# The first word of the car name is the manufacturer
df['maker'] = df.car_name.map(lambda x: x.split()[0])
# Replace the numeric origin codes with region names
df.origin = df.origin.map({1: 'America', 2: 'Europe', 3: 'Asia'})
# Missing horsepower values are recorded as '?'; turn them into NaN and drop those rows
df = df.replace('?', np.nan).dropna()
df['horsepower'] = df.horsepower.astype(float)
df.head()
    mpg  cylinders  displacement  horsepower  weight  acceleration  model_year   origin                   car_name      maker
0  18.0          8         307.0       130.0  3504.0          12.0          70  America  chevrolet chevelle malibu  chevrolet
1  15.0          8         350.0       165.0  3693.0          11.5          70  America          buick skylark 320      buick
2  18.0          8         318.0       150.0  3436.0          11.0          70  America         plymouth satellite   plymouth
3  16.0          8         304.0       150.0  3433.0          12.0          70  America              amc rebel sst        amc
4  17.0          8         302.0       140.0  3449.0          10.5          70  America                ford torino       ford
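
The '?' placeholders can also be handled while the file is being parsed. The following is a minimal sketch of that alternative, not what the post itself does: it assumes the `na_values` option of `pd.read_csv` and uses three illustrative rows laid out like the real file, so it runs without a network connection.

```python
import io

import pandas as pd

names = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight',
         'acceleration', 'model_year', 'origin', 'car_name']

# Illustrative rows laid out like auto-mpg.data (whitespace-separated,
# quoted car name); the second row has '?' in the horsepower column.
sample = io.StringIO(
    '18.0 8 307.0 130.0 3504. 12.0 70 1 "chevrolet chevelle malibu"\n'
    '25.0 4 98.00 ?     2046. 19.0 71 1 "ford pinto"\n'
    '24.0 4 113.0 95.00 2372. 15.0 70 3 "toyota corona mark ii"\n'
)

# na_values='?' turns the placeholder into NaN during parsing, so
# horsepower is read as float and no later astype(float) is needed.
clean = pd.read_csv(sample, sep=r'\s+', names=names, na_values='?').dropna()
print(clean[['horsepower', 'car_name']])
```

With this approach the column never passes through a string dtype, which is why the explicit float conversion becomes unnecessary.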

General plotting: factorplot and FacetGrid

Plot based on two variables

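# Plot the relationship between model_year and mpg.
# factorplot was renamed catplot in seaborn 0.9; kind="point" keeps factorplot's default point plot.
sns.catplot(data=df, x="model_year", y="mpg", kind="point")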
<seaborn.axisgrid.FacetGrid at 0x141170f90>


A third variable can be used to split the plot into separate panels

# catplot is the current name for factorplot (seaborn 0.9+)
sns.catplot(data=df, x="model_year", y="mpg", col="origin", kind="point")
<seaborn.axisgrid.FacetGrid at 0x1411827d0>

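
Each call above prints a `seaborn.axisgrid.FacetGrid` object, which means the return value can be kept and customised afterwards. A minimal sketch, assuming seaborn 0.9 or later (where `factorplot` is named `catplot`) and a small made-up DataFrame `toy` standing in for the real data:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Hypothetical toy data standing in for the auto-mpg columns used above.
toy = pd.DataFrame({
    "model_year": [70, 70, 71, 71, 70, 70, 71, 71],
    "mpg":        [18.0, 16.0, 19.0, 21.0, 24.0, 26.0, 27.0, 29.0],
    "origin":     ["America"] * 4 + ["Asia"] * 4,
})

# One panel per origin; kind="point" draws the point plot factorplot used by default.
g = sns.catplot(data=toy, x="model_year", y="mpg", col="origin", kind="point")

# The returned FacetGrid exposes the underlying axes and a few labelling helpers.
g.set_axis_labels("Model year", "Miles per gallon")
g.set_titles("{col_name}")
print(type(g).__name__, g.axes.shape)
```

Keeping the grid in a variable like this is also what the `FacetGrid` examples further down rely on.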

You can switch from a line plot to a bar chart

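# Number of cars per cylinder count; with only x given, catplot (formerly factorplot) needs kind="count"
sns.catplot(data=df, x="cylinders", col="origin", kind="count")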
<seaborn.axisgrid.FacetGrid at 0x13eff0810>


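# Distribution of mpg for each origin (histplot replaces the deprecated distplot)
g = sns.FacetGrid(df, col="origin")
g.map(sns.histplot, "mpg", stat="density", kde=True)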
<seaborn.axisgrid.FacetGrid at 0x13e964b90>


Scatter plot

g = sns.FacetGrid(df, col="origin")
g.map(plt.scatter, "horsepower", "mpg")
<seaborn.axisgrid.FacetGrid at 0x138cd2f90>


Plot and fit a regression at the same time

g = sns.FacetGrid(df, col="origin")
g.map(sns.regplot, "horsepower", "mpg")
plt.xlim(0, 250)
plt.ylim(0, 60)
(0, 60)

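
`regplot` draws the fitted line but returns only the matplotlib axes, not the coefficients. If the numbers are needed, one option is to fit each group separately. This is a rough sketch using `np.polyfit` on hypothetical toy data (the `toy` frame and its values are made up for illustration); it is not how seaborn computes its fit internally.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two origins, each lying exactly on a known line.
hp = np.array([60.0, 90.0, 120.0, 150.0])
toy = pd.DataFrame({
    "origin": ["America"] * 4 + ["Asia"] * 4,
    "horsepower": np.tile(hp, 2),
    "mpg": np.concatenate([40.0 - 0.15 * hp, 45.0 - 0.20 * hp]),
})

# Degree-1 least-squares fit per group; polyfit returns (slope, intercept).
fits = {
    origin: np.polyfit(group["horsepower"], group["mpg"], deg=1)
    for origin, group in toy.groupby("origin")
}
for origin, (slope, intercept) in fits.items():
    print(f"{origin}: mpg = {slope:.2f} * horsepower + {intercept:.2f}")
```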

KDE contour plot

# Coarse weight class: weight divided by 2000 and truncated to an integer
df['tons'] = (df.weight/2000).astype(int)
g = sns.FacetGrid(df, col="origin", row="tons")
g.map(sns.kdeplot, "horsepower", "mpg")
plt.xlim(0, 250)
plt.ylim(0, 60)
(0, 60)

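
The `tons` column used for the rows of the grid is just a truncated division. A small sketch of how the binning behaves at the edges, using made-up weights and assuming they are in pounds:

```python
import pandas as pd

# Hypothetical weights in pounds, chosen to sit on both sides of the bin edges.
weight = pd.Series([1999.0, 2000.0, 3504.0, 3999.0, 4000.0])

# Dividing by 2000 and casting to int truncates toward zero,
# so bin k covers weights from 2000*k up to (but not including) 2000*(k+1).
tons = (weight / 2000).astype(int)
print(tons.tolist())  # [0, 1, 1, 1, 2]
```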

Facet the plot along two variables

g = sns.FacetGrid(df, col="origin", row="tons")
g.map(plt.hist, "mpg", bins=np.linspace(0, 50, 11))
<seaborn.axisgrid.FacetGrid at 0x131e0f610>

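
Passing the same `bins` array to every panel keeps the histograms comparable across facets. As a quick illustration of what `np.linspace(0, 50, 11)` produces, here is a sketch with a handful of made-up mpg values:

```python
import numpy as np

# Eleven evenly spaced edges from 0 to 50 define ten bins of width 5.
edges = np.linspace(0, 50, 11)
print(edges)

# Hypothetical mpg values, counted into those bins.
mpg = np.array([18.0, 15.0, 18.0, 16.0, 17.0, 24.0, 31.5])
counts, _ = np.histogram(mpg, bins=edges)
print(counts.tolist())  # [0, 0, 0, 5, 1, 0, 1, 0, 0, 0]
```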

pairplot and PairGrid

Pairwise plots across multiple variables

g = sns.pairplot(df[["mpg", "horsepower", "weight", "origin"]], hue="origin", diag_kind="hist")
# Rotate the x tick labels so they do not overlap
for ax in g.axes.flat:
    plt.setp(ax.get_xticklabels(), rotation=45)


Add regression fits to the pairwise plots

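g = sns.PairGrid(df[["mpg", "horsepower", "weight", "origin"]], hue="origin")
g.map_upper(sns.regplot)     # regression fits above the diagonal
g.map_lower(sns.residplot)   # residuals of a linear fit below the diagonal
g.map_diag(plt.hist)         # histograms on the diagonal
# Rotate the x tick labels so they do not overlap
for ax in g.axes.flat:
    plt.setp(ax.get_xticklabels(), rotation=45)
g.add_legend()
g.set(alpha=0.5)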
<seaborn.axisgrid.PairGrid at 0x13eac5790>


jointplot and JointGrid

Joint plot (KDE contours)

# x and y must be passed as keywords in current seaborn releases
sns.jointplot(data=df, x="mpg", y="horsepower", kind="kde")
<seaborn.axisgrid.JointGrid at 0x1393a5d10>


Joint plot (with regression)

# x and y must be passed as keywords in current seaborn releases
sns.jointplot(data=df, x="horsepower", y="mpg", kind="reg")
<seaborn.axisgrid.JointGrid at 0x141b640d0>


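g = sns.JointGrid(x="horsepower", y="mpg", data=df)
g.plot_joint(sns.regplot, order=2)  # second-order polynomial fit
# distplot is deprecated; histplot with a KDE overlay is the modern counterpart
g.plot_marginals(sns.histplot, kde=True)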
<seaborn.axisgrid.JointGrid at 0x141cd8690>

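
The `order=2` argument asks `regplot` for a second-order polynomial fit; according to the seaborn documentation, `numpy.polyfit` is used when `order` is greater than 1. A minimal sketch of the same kind of fit done by hand, on made-up data that lies exactly on a known quadratic:

```python
import numpy as np

# Hypothetical data lying exactly on a known quadratic.
horsepower = np.array([50.0, 80.0, 110.0, 140.0, 170.0, 200.0])
mpg = 0.001 * horsepower**2 - 0.4 * horsepower + 55.0

# Degree-2 least-squares fit; coefficients come back highest power first.
a, b, c = np.polyfit(horsepower, mpg, deg=2)
curve = np.poly1d([a, b, c])
print(f"mpg = {a:.4f} * hp^2 + {b:.3f} * hp + {c:.1f}")
print(curve(100.0))
```

On real data the recovered coefficients will of course not be exact, but the call pattern is the same.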

Enjoy!
