Modules#
This section provides an overview of the core module SheetBuddy and its available methods.
SheetBuddy Module#
SheetBuddy module for performing various EDA (Exploratory Data Analysis) operations on datasets.
- class sheetbuddy.SheetBuddy(file_path_or_url)
Bases:
objectA class to handle various EDA (Exploratory Data Analysis) operations on datasets.
Attributes:#
- file_path_or_urlstr
The file path or URL to the dataset.
- add_correlation_heatmap(sheet, data, start_row)
Add a correlation heatmap to the Excel sheet.
Parameters:#
- sheetopenpyxl.worksheet.worksheet.Worksheet
The Excel sheet to add the heatmap to.
- dataDataFrame
The data for the heatmap.
- start_rowint
The row to start placing the heatmap.
- add_dataset_info(workbook)
Add a Dataset Info sheet to the workbook.
Parameters:#
- workbookopenpyxl.Workbook
The workbook to add the Dataset Info sheet to.
- add_histogram(sheet, data, column, start_row, color)
Add a histogram chart to the Excel sheet.
Parameters:#
- sheetopenpyxl.worksheet.worksheet.Worksheet
The Excel sheet to add the chart to.
- dataDataFrame
The data for the histogram.
- columnstr
The column to plot in the histogram.
- start_rowint
The row to start placing the chart.
- colorstr
The color to use for the histogram bars.
- add_outliers_plot(sheet, data, column, start_row)
Add a boxplot to visualize outliers in the Excel sheet.
Parameters:#
- sheetopenpyxl.worksheet.worksheet.Worksheet
The Excel sheet to add the chart to.
- dataDataFrame
The data for the boxplot.
- columnstr
The column to plot in the boxplot.
- start_rowint
The row to start placing the chart.
- add_text_heading(sheet, text, start_row)
Add a text heading to the Excel sheet.
Parameters:#
- sheetopenpyxl.worksheet.worksheet.Worksheet
The Excel sheet to add the text heading to.
- textstr
The text for the heading.
- start_rowint
The row to start placing the heading.
- detect_outliers(method='z-score', threshold=3)
Detect outliers in the dataset.
Parameters:#
- methodstr
The method to use for outlier detection (‘z-score’ or ‘iqr’).
- thresholdfloat
The threshold value for detecting outliers.
Returns:#
- DataFrame
A DataFrame indicating the presence of outliers in the dataset.
- generate_eda_report(output_file_name)
Load data and generate the EDA report.
Parameters:#
- output_file_namestr
The name of the output Excel file.
- generate_report(file_name)
Generate an EDA report and save it to an Excel file.
Parameters:#
- file_namestr
The name of the output Excel file.
- get_basic_math()
Retrieve basic mathematical calculations for numerical columns.
Returns:#
- DataFrame
A DataFrame containing mean, median, mode, and range for each numerical column.
- get_categorical_frequencies()
Retrieve the frequency of unique values for categorical columns.
Returns:#
- DataFrame
A DataFrame containing the frequency of unique values for each categorical column.
- get_column_info()
Retrieve information about the columns in the dataset.
Returns:#
- DataFrame
A DataFrame containing the column names, data types, descriptions, and whether they are categorical or numerical.
- get_most_frequent_values()
Retrieve the most frequent value for each column.
Returns:#
- Series
A series containing the most frequent value for each column.
- get_null_percentage()
Retrieve the percentage of null values for each column.
Returns:#
- Series
A series containing the percentage of null values for each column.
- get_null_values()
Retrieve the count of null values for each column.
Returns:#
- Series
A series containing the count of null values for each column.
- get_shape()
Retrieve the shape (dimensions) of the dataset.
Returns:#
- tuple
The shape of the dataset as a tuple (rows, columns).
- get_standard_deviation()
Retrieve the standard deviation for numerical columns.
Returns:#
- Series
A series containing the standard deviation for each numerical column.
- get_summary_statistics()
Retrieve summary statistics of the dataset.
Returns:#
- DataFrame
The summary statistics of the dataset.
- get_unique_values()
Retrieve the count of unique values for each column.
Returns:#
- Series
A series containing the count of unique values for each column.
- read_api()
Read data from an API into a DataFrame.
Returns:#
- DataFrame
The dataset read from the API.
- read_csv()
Read a CSV file into a DataFrame.
Returns:#
- DataFrame
The dataset read from the CSV file.
- read_data()
Determine the file type and read the data accordingly.
Returns:#
- DataFrame
The dataset read from the file or URL.
- read_json()
Read JSON data from a URL or file into a DataFrame.
Returns:#
- DataFrame
The dataset read from the JSON URL or file.
- style_excel_sheet(sheet)
Style an Excel sheet with header and row formatting.
Parameters:#
- sheetopenpyxl.worksheet.worksheet.Worksheet
The Excel sheet to style.
🔹 Key Features: - Data Loading: Read data from CSV, JSON, Excel, and API sources. - Data Summarization: Retrieve essential statistics and dataset insights. - EDA Report Generation: Automatically create detailed exploratory data analysis (EDA) reports. - Outlier Detection: Identify anomalies and outliers in datasets. - Visualization Support: Generate correlation heatmaps and histograms. - Data Cleaning Helpers: Get missing values, unique values, and more.
For usage examples, refer to the Usage Guide.