simpler_eda package
Submodules
simpler_eda.categorical_eda module
simpler_eda.categorical_eda.categorical_eda(data, xval, plot_type='histogram', color=None, title=None, font_size=10, color_scheme='tableau20', plot_height=150, plot_width=200, opacity=1, facet_factor=None, facet_col=None)
This function takes a data frame object and one categorical feature and produces a histogram that visualizes the distribution of that feature. Users can instead plot a density graph of the feature by setting plot_type. The function also offers customization of color, plot title, font size, color scheme, plot size, and other common settings.
- Parameters
data (pandas.core.frame.DataFrame) – Input dataframe object.
xval (str) – Variable used to represent the x-axis.
plot_type (str, optional) – Variable used to specify the plot type. Options are “histogram” (the default) and “density”.
color (str, optional) – Variable used to set the color of the marks in the plot object.
title (str, optional) – Variable used to set the title of the plot.
font_size (int, optional) – Variable used to set the size of the axis labels and title.
color_scheme (str, optional) – Variable used to set the color scheme of the plot.
plot_height (int, optional) – Variable used to specify the plot height.
plot_width (int, optional) – Variable used to specify the plot width.
opacity (float, optional) – Variable used to specify the fill opacity of the density plot.
facet_factor (str, optional) – Variable used to specify the facet factor, i.e. the column whose values split the plot into separate panels.
facet_col (int, optional) – Variable used to specify the number of facet columns.
- Returns
A histogram or density chart object based on user specifications.
- Return type
altair
Examples
>>> import altair as alt
>>> import numpy as np
>>> import pandas as pd
>>> from simpler_eda.categorical_eda import categorical_eda
>>> from vega_datasets import data
>>> cars = data.cars()
>>> categorical_eda(data=cars, xval="Origin", color="Horsepower",
...                 title="Histogram of Origin in Different Levels of Horsepower",
...                 plot_height=100, plot_width=200)
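For a categorical feature, the histogram is essentially a bar chart of row counts per category, and facet_factor and facet_col split those counts into a grid of panels. The stdlib sketch below reproduces that bookkeeping on a few toy rows (not real vega_datasets values) so the encoding is easy to check. category_counts and facet_grid_shape are hypothetical helpers, not part of simpler_eda, and the assumption that facet_col wraps panels into rows of that many columns mirrors Altair's facet columns option rather than anything documented here.

```python
from collections import Counter
from math import ceil


def category_counts(rows, xval, facet_factor=None):
    """Count rows per category of `xval`, optionally separately per facet value.

    Hypothetical helper: it mirrors what a categorical histogram displays,
    not simpler_eda's actual implementation.
    """
    if facet_factor is None:
        return dict(Counter(row[xval] for row in rows))
    counts = {}
    for row in rows:
        # One Counter per facet value, i.e. one panel per distinct value.
        panel = counts.setdefault(row[facet_factor], Counter())
        panel[row[xval]] += 1
    return {facet: dict(c) for facet, c in counts.items()}


def facet_grid_shape(n_facets, facet_col):
    """(rows, columns) of a facet grid that wraps after `facet_col` panels (assumed)."""
    if facet_col < 1:
        raise ValueError("facet_col must be at least 1")
    return ceil(n_facets / facet_col), min(n_facets, facet_col)


# Toy rows for illustration only.
rows = [
    {"Origin": "USA", "Cylinders": 8},
    {"Origin": "USA", "Cylinders": 4},
    {"Origin": "Japan", "Cylinders": 4},
    {"Origin": "Europe", "Cylinders": 4},
    {"Origin": "USA", "Cylinders": 6},
]
print(category_counts(rows, "Origin"))
print(category_counts(rows, "Origin", facet_factor="Cylinders"))
print(facet_grid_shape(3, 2))
```

With three distinct Cylinders values and facet_col=2, the assumed layout is two rows of two columns, with the last slot left empty.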
simpler_eda.corr_map module
simpler_eda.corr_map.corr_map(data, features, corr_method='pearson', color_scheme='blueorange', plot_width=450, plot_height=450, title='Correlation Map')
Plot a correlation map for a given dataframe object and a list of numerical features. Users can configure the method used to calculate correlation, the color scheme, the plot width and height, and the plot title.
- Parameters
data (pandas.core.frame.DataFrame) – The input dataframe object.
features (list) – A 1D list of the names (str) of the numerical features to include in the correlation map. It must contain at least 2 features.
corr_method (str, optional) – The method used to calculate the correlation between features. The default is “pearson”; the other supported methods are “kendall” and “spearman”.
color_scheme (str, optional) – The diverging color scheme used for the map. The default is “blueorange”; other diverging schemes include “redgrey”, “purpleorange”, etc. A full list of color schemes can be found at https://vega.github.io/vega/docs/schemes/
plot_width (int, optional) – The width of the plot.
plot_height (int, optional) – The height of the plot.
title (str, optional) – The title of the correlation map.
- Returns
The correlation map as an Altair chart.
- Return type
altair
Examples
>>> import pandas as pd
>>> import altair as alt
>>> import numpy as np
>>> from simpler_eda.corr_map import corr_map
>>> from vega_datasets import data
>>> df = data.cars()
>>> corr_map(df, ["Horsepower", "Displacement", "Cylinders", "Acceleration"])
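corr_method decides how each pairwise coefficient in the map is computed. The accepted names match those of pandas' DataFrame.corr(method=...), so corr_map plausibly delegates to it, though that is an assumption. The stdlib sketch below shows why “pearson” and “spearman” can disagree: Spearman is Pearson applied to ranks, so a monotonic but non-linear relationship scores exactly 1 under Spearman and below 1 under Pearson. The helper names are illustrative only, and Kendall's tau is left out for brevity.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but not linear
print(round(pearson(x, y), 4), round(spearman(x, y), 4))
```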
simpler_eda.numerical_eda module
simpler_eda.numerical_eda.numerical_eda(data, xval, yval, color, plot_type='scatter', title=None, font_size=10, color_scheme='tableau20', plot_width=400, plot_height=300, x_transform=False, y_transform=False)
This function takes a data frame object and two numeric columns and produces either a scatter or a line plot to visualize the relationship between the two numerical features. Users can optionally change the default plot type, title, font size, color scheme, and plot size, and toggle a log transformation for the x- and y-axis.
- Parameters
data (pandas.core.frame.DataFrame) – Input dataframe object.
xval (str) – Variable used to represent the x-axis.
yval (str) – Variable used to represent the y-axis.
color (str) – Variable in the dataframe used to group the data points into different colors.
plot_type (str, optional) – Type of plot used to represent the relationship between xval and yval. Options are “scatter” (the default) and “line”.
title (str, optional) – Variable used to set the title of the plot.
font_size (int, optional) – Variable used to set the size of the axis labels and title.
color_scheme (str, optional) – The color scheme used for the plot. Other color schemes include “accent”, “category10”, “category20”, “category20b”, “dark2”, etc. A full list of color schemes can be found at https://vega.github.io/vega/docs/schemes/
plot_width (int, optional) – The width of the plot.
plot_height (int, optional) – The height of the plot.
x_transform (bool, optional) – Determines whether a log transformation occurs on the x-axis.
y_transform (bool, optional) – Determines whether a log transformation occurs on the y-axis.
- Returns
Scatter plot or Line plot of user-specified variables.
- Return type
altair
Examples
>>> import altair as alt
>>> import pandas as pd
>>> import numpy as np
>>> from simpler_eda.numerical_eda import numerical_eda
>>> from vega_datasets import data
>>> numerical_eda(data.cars(), xval="Horsepower", yval="Acceleration",
...               plot_type="line", color="Origin",
...               title="Horsepower vs Acceleration", font_size=10)
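x_transform and y_transform toggle a log transformation of an axis. On a log axis, equal ratios between values map to equal distances, and zero or negative values cannot be placed at all. The stdlib sketch below, assuming a base-10 log scale (Vega's default base), computes the positions such an axis would use. log_positions is a hypothetical helper; how numerical_eda itself handles zeros or negatives is not documented here. In Altair, a log axis is typically requested with alt.Scale(type="log").

```python
from math import log10


def log_positions(values):
    """Positions of `values` on a base-10 log axis (hypothetical helper).

    A log scale is undefined for zero or negative values, so they are rejected.
    """
    if any(v <= 0 for v in values):
        raise ValueError("log scale requires strictly positive values")
    return [log10(v) for v in values]


# Each value is double the previous one, so the points end up evenly spaced.
positions = log_positions([50, 100, 200, 400])
gaps = [b - a for a, b in zip(positions, positions[1:])]
print(gaps)
```

Every gap equals log10(2), which is why data spanning several orders of magnitude spreads out more evenly once the transformation is enabled.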