simpler_eda package

Submodules

simpler_eda.categorical_eda module

simpler_eda.categorical_eda.categorical_eda(data, xval, plot_type='histogram', color=None, title=None, font_size=10, color_scheme='tableau20', plot_height=150, plot_width=200, opacity=1, facet_factor=None, facet_col=None)[source]

This function takes a data frame object and one categorical feature and produces a histogram that visualizes the distribution of the feature. Users can instead plot a density graph of the feature by specifying it in plot_type. The function also offers customization of color, plot title, font size, color scheme, plot size, and other common configurations.

Parameters
  • data (pandas.core.frame.DataFrame) – Input dataframe object.

  • xval (str) – Variable used to represent the x-axis.

  • plot_type (str, optional) – Variable used to specify plot type. Options include “histogram” and “density”. When “density” is selected, the variable yval becomes obsolete.

  • color (str, optional) – Variable used to set the color of the marks in the plot object.

  • title (str, optional) – Variable used to set the title of the plot.

  • font_size (int, optional) – Variable used to set the size of the axis labels and title.

  • color_scheme (str, optional) – Variable used to set the color scheme of the plot.

  • plot_height (int, optional) – Variable used to specify the plot height.

  • plot_width (int, optional) – Variable used to specify the plot width.

  • opacity (float, optional) – Variable used to specify the fill opacity of the density plot.

  • facet_factor (str, optional) – Variable used to specify the facet factor.

  • facet_col (int, optional) – Variable used to specify the number of facet columns.

Returns

A histogram or density chart object based on user specifications.

Return type

altair

Examples

>>> import altair as alt
>>> import numpy as np
>>> import pandas as pd
>>> from simpler_eda.categorical_eda import categorical_eda
>>> from vega_datasets import data
>>> cars = data.cars()
>>> categorical_eda(data = cars,
...                 xval = "Origin",
...                 color = "Horsepower",
...                 title = "Histogram of Origin in Different Levels of Horsepower",
...                 plot_height = 100,
...                 plot_width = 200
...                 )
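
For intuition about what the histogram displays, the tallies behind a categorical histogram can be sketched in plain Python. This is only an illustration under stated assumptions, not the package's implementation: the helper name category_counts and the sample rows are hypothetical.

```python
from collections import Counter

def category_counts(rows, xval):
    """Count how often each level of the categorical column `xval` occurs.

    Hypothetical helper for illustration; it is not part of simpler_eda.
    """
    return Counter(row[xval] for row in rows)

# Made-up sample rows standing in for a data frame.
rows = [
    {"Origin": "USA"},
    {"Origin": "Europe"},
    {"Origin": "USA"},
    {"Origin": "Japan"},
]

counts = category_counts(rows, "Origin")
for level, n in counts.most_common():
    print(level, n)
```

Each bar in a histogram of a categorical feature corresponds to one of these per-level counts.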

simpler_eda.corr_map module

simpler_eda.corr_map.corr_map(data, features, corr_method='pearson', color_scheme='blueorange', plot_width=450, plot_height=450, title='Correlation Map')[source]

Plot a correlation map from the given dataframe object and a list of numerical features. Users can set several arguments that control the correlation plot, including the method used to calculate correlation, the color scheme, the plot width and height, and the plot title.

Parameters
  • data (pandas.core.frame.DataFrame) – The input dataframe object.

  • features (list) – A 1D list of numerical feature names (str) to plot in the correlation map. It should contain at least 2 features.

  • corr_method (str, optional) – The method used to calculate correlation between features. The default is “pearson”; the two other supported methods are “kendall” and “spearman”.

  • color_scheme (str, optional) – The color scheme of the plot. The default is “blueorange”; other diverging color schemes include “redgrey”, “purpleorange”, etc. Further color schemes are listed at https://vega.github.io/vega/docs/schemes/

  • plot_width (int, optional) – The width of the plot.

  • plot_height (int, optional) – The height of the plot.

  • title (str, optional) – The title of the correlation map.

Returns

The altair correlation map plot

Return type

altair

Examples

>>> import pandas as pd
>>> import altair as alt
>>> import numpy as np
>>> from simpler_eda.corr_map import corr_map
>>> from vega_datasets import data
>>> df = data.cars()
>>> corr_map(df,
...          ["Horsepower", "Displacement", "Cylinders", "Acceleration"])
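
To illustrate how the corr_method options differ, the following sketch computes Pearson and Spearman correlation in plain Python. It is a minimal illustration, not the package's implementation: the function names pearson, ranks, and spearman and the sample data are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation: covariance over the product of the spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def ranks(values):
    """Assign 1-based ranks, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]  # monotone but nonlinear in x
print(round(pearson(x, y), 3), round(spearman(x, y), 3))
```

Because y increases monotonically with x, the rank-based Spearman value reaches 1 while the Pearson value stays below 1, which is why the choice of method can change the look of a correlation map.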

simpler_eda.numerical_eda module

simpler_eda.numerical_eda.numerical_eda(data, xval, yval, color, plot_type='scatter', title=None, font_size=10, color_scheme='tableau20', plot_width=400, plot_height=300, x_transform=False, y_transform=False)[source]

This function takes a data frame object and two numeric columns and produces either a scatter or line plot to visualize the relationship between the two numerical features. Users can optionally change the default arguments for plot type, color, title, text size, and color scheme, and toggle log transformation for the x and y axes.

Parameters
  • data (pandas.core.frame.DataFrame) – Input dataframe object.

  • xval (str) – Variable used to represent the x-axis.

  • yval (str) – Variable used to represent the y-axis.

  • color (str) – Variable used to group the data points in different colors based on a variable in the dataframe.

  • plot_type (str, optional) – Variable used to represent the graphical relationship between xval and yval; options are “scatter” or “line”.

  • title (str, optional) – Variable used to set the title of the plot.

  • font_size (int, optional) – Variable used to set the size of the axis labels and title.

  • color_scheme (str, optional) – The color scheme used for the plot. Other color schemes can be “accent”, “category10”, “category20”, “category20b”, “dark2”, etc. Other proper color scheme reference can be found in https://vega.github.io/vega/docs/schemes/

  • plot_width (int, optional) – The width of the plot.

  • plot_height (int, optional) – The height of the plot.

  • x_transform (bool, optional) – Determines whether a log transformation occurs on the x-axis.

  • y_transform (bool, optional) – Determines whether a log transformation occurs on the y-axis.

Returns

Scatter plot or Line plot of user-specified variables.

Return type

altair

Examples

>>> import altair as alt
>>> import pandas as pd
>>> import numpy as np
>>> from simpler_eda.numerical_eda import numerical_eda
>>> from vega_datasets import data
>>> numerical_eda(data.cars(),
...               xval = "Horsepower",
...               yval = "Acceleration",
...               plot_type = "line",
...               color = "Origin",
...               title = "Horsepower vs Acceleration",
...               font_size = 10)
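
As a rough illustration of what x_transform and y_transform change, the sketch below shows where values land on a logarithmic axis. It assumes a base-10 log scale, which the documentation above does not state, and the helper name log_positions is hypothetical.

```python
import math

def log_positions(values):
    """Return log10 of each value, i.e. its position on a base-10 log axis.

    Hypothetical helper for illustration; it is not part of simpler_eda.
    """
    return [math.log10(v) for v in values]

values = [1, 10, 100, 1000]
positions = log_positions(values)

# Each tenfold increase moves the same distance along the log axis.
gaps = [b - a for a, b in zip(positions, positions[1:])]
print(gaps)
```

A log axis is mainly useful when a feature spans several orders of magnitude, since it spreads out small values that a linear axis would compress. Values must be strictly positive for the logarithm to be defined.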

simpler_eda.simpler_eda module

Module contents