Modules

Modules#

This section provides an overview of the core module SheetBuddy and its available methods.

SheetBuddy Module#

SheetBuddy module for performing various EDA (Exploratory Data Analysis) operations on datasets.

class sheetbuddy.SheetBuddy(file_path_or_url)

Bases: object

A class to handle various EDA (Exploratory Data Analysis) operations on datasets.

Attributes:#

file_path_or_urlstr

The file path or URL to the dataset.

add_correlation_heatmap(sheet, data, start_row)

Add a correlation heatmap to the Excel sheet.

Parameters:#

sheetopenpyxl.worksheet.worksheet.Worksheet

The Excel sheet to add the heatmap to.

dataDataFrame

The data for the heatmap.

start_rowint

The row to start placing the heatmap.

add_dataset_info(workbook)

Add a Dataset Info sheet to the workbook.

Parameters:#

workbookopenpyxl.Workbook

The workbook to add the Dataset Info sheet to.

add_histogram(sheet, data, column, start_row, color)

Add a histogram chart to the Excel sheet.

Parameters:#

sheetopenpyxl.worksheet.worksheet.Worksheet

The Excel sheet to add the chart to.

dataDataFrame

The data for the histogram.

columnstr

The column to plot in the histogram.

start_rowint

The row to start placing the chart.

colorstr

The color to use for the histogram bars.

add_outliers_plot(sheet, data, column, start_row)

Add a boxplot to visualize outliers in the Excel sheet.

Parameters:#

sheetopenpyxl.worksheet.worksheet.Worksheet

The Excel sheet to add the chart to.

dataDataFrame

The data for the boxplot.

columnstr

The column to plot in the boxplot.

start_rowint

The row to start placing the chart.

add_text_heading(sheet, text, start_row)

Add a text heading to the Excel sheet.

Parameters:#

sheetopenpyxl.worksheet.worksheet.Worksheet

The Excel sheet to add the text heading to.

textstr

The text for the heading.

start_rowint

The row to start placing the heading.

detect_outliers(method='z-score', threshold=3)

Detect outliers in the dataset.

Parameters:#

methodstr

The method to use for outlier detection (‘z-score’ or ‘iqr’).

thresholdfloat

The threshold value for detecting outliers.

Returns:#

DataFrame

A DataFrame indicating the presence of outliers in the dataset.

generate_eda_report(output_file_name)

Load data and generate the EDA report.

Parameters:#

output_file_namestr

The name of the output Excel file.

generate_report(file_name)

Generate an EDA report and save it to an Excel file.

Parameters:#

file_namestr

The name of the output Excel file.

get_basic_math()

Retrieve basic mathematical calculations for numerical columns.

Returns:#

DataFrame

A DataFrame containing mean, median, mode, and range for each numerical column.

get_categorical_frequencies()

Retrieve the frequency of unique values for categorical columns.

Returns:#

DataFrame

A DataFrame containing the frequency of unique values for each categorical column.

get_column_info()

Retrieve information about the columns in the dataset.

Returns:#

DataFrame

A DataFrame containing the column names, data types, descriptions, and whether they are categorical or numerical.

get_most_frequent_values()

Retrieve the most frequent value for each column.

Returns:#

Series

A series containing the most frequent value for each column.

get_null_percentage()

Retrieve the percentage of null values for each column.

Returns:#

Series

A series containing the percentage of null values for each column.

get_null_values()

Retrieve the count of null values for each column.

Returns:#

Series

A series containing the count of null values for each column.

get_shape()

Retrieve the shape (dimensions) of the dataset.

Returns:#

tuple

The shape of the dataset as a tuple (rows, columns).

get_standard_deviation()

Retrieve the standard deviation for numerical columns.

Returns:#

Series

A series containing the standard deviation for each numerical column.

get_summary_statistics()

Retrieve summary statistics of the dataset.

Returns:#

DataFrame

The summary statistics of the dataset.

get_unique_values()

Retrieve the count of unique values for each column.

Returns:#

Series

A series containing the count of unique values for each column.

read_api()

Read data from an API into a DataFrame.

Returns:#

DataFrame

The dataset read from the API.

read_csv()

Read a CSV file into a DataFrame.

Returns:#

DataFrame

The dataset read from the CSV file.

read_data()

Determine the file type and read the data accordingly.

Returns:#

DataFrame

The dataset read from the file or URL.

read_json()

Read JSON data from a URL or file into a DataFrame.

Returns:#

DataFrame

The dataset read from the JSON URL or file.

style_excel_sheet(sheet)

Style an Excel sheet with header and row formatting.

Parameters:#

sheetopenpyxl.worksheet.worksheet.Worksheet

The Excel sheet to style.

🔹 Key Features: - Data Loading: Read data from CSV, JSON, Excel, and API sources. - Data Summarization: Retrieve essential statistics and dataset insights. - EDA Report Generation: Automatically create detailed exploratory data analysis (EDA) reports. - Outlier Detection: Identify anomalies and outliers in datasets. - Visualization Support: Generate correlation heatmaps and histograms. - Data Cleaning Helpers: Get missing values, unique values, and more.

For usage examples, refer to the Usage Guide.