[SC 16935] Fix IQROutliersBarPlot failure on boolean numeric #526
Nice fix overall. Two things worth addressing before merge:
PR Summary
This pull request refines the logic for outlier detection within the IQR methods by improving the handling of boolean and binary features. Previously, the code excluded binary features based only on the number of unique values; now, it explicitly checks for boolean data types using pandas' is_bool_dtype, ensuring that both boolean and binary columns are omitted from the outlier calculations. Key changes include:
These enhancements ensure that the outlier detection functionality will correctly process datasets containing boolean features without runtime errors, preserving the integrity of the analysis.

Test Suggestions
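One way to pin down the root cause in a regression check is sketched below. It is an illustration only, not code from this PR: the variable names are hypothetical, and it exercises pandas and NumPy directly rather than `IQROutliersBarPlot` itself. It shows that pandas reports a boolean column as numeric and that NumPy rejects the `-` operator on boolean arrays, which is the error message quoted in the description.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_bool_dtype, is_numeric_dtype

# Hypothetical boolean feature column.
flags = pd.Series([True, False, True, True])

# pandas reports bool as a numeric dtype, so a list of "numeric" columns
# can legitimately contain boolean features.
bool_is_numeric = is_numeric_dtype(flags)
bool_is_bool = is_bool_dtype(flags)

# NumPy does not support `-` on boolean arrays; arithmetic that subtracts
# boolean values raises a TypeError mentioning "numpy boolean subtract".
try:
    np.array([True, False]) - np.array([False, True])
    subtract_error = None
except TypeError as exc:
    subtract_error = str(exc)
```

Because `is_numeric_dtype` alone cannot tell the two cases apart, an explicit `is_bool_dtype` check is the piece that separates boolean columns from ordinary numeric ones.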
Pull Request Description
What and why?
`IQROutliersBarPlot` failed with `TypeError: numpy boolean subtract` when a dataset included boolean columns in `feature_columns_numeric` (pandas treats `bool` as numeric).

Before: Boolean/binary columns could still be processed when building `outlier_counts_by_feature`, causing `quantile()` to fail on boolean data.

After: Boolean and binary features are excluded from IQR outlier calculations using `is_bool_dtype()` and `nunique() > 2`, for both plots and raw data output.

How to test
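As a rough illustration of the behavior to check (not the PR's actual diff or test), the filter described above can be sketched as follows. The helper names `select_iqr_features` and `count_iqr_outliers`, the sample data, and the 1.5 threshold are assumptions made for this example; only `is_bool_dtype()`, `nunique() > 2`, and `outlier_counts_by_feature` come from the description.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype


def select_iqr_features(df: pd.DataFrame, columns: list[str]) -> list[str]:
    """Keep columns that are neither boolean nor binary (hypothetical helper)."""
    return [
        col
        for col in columns
        if not is_bool_dtype(df[col]) and df[col].nunique() > 2
    ]


def count_iqr_outliers(series: pd.Series, threshold: float = 1.5) -> int:
    """Count values outside [Q1 - t * IQR, Q3 + t * IQR] (hypothetical helper)."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - threshold * iqr, q3 + threshold * iqr
    return int(((series < lower) | (series > upper)).sum())


# Made-up dataset: one ordinary numeric column, one boolean, one 0/1 binary.
df = pd.DataFrame(
    {
        "age": [21, 22, 23, 24, 25, 26, 27, 95],
        "is_active": [True, False, True, True, False, True, False, True],
        "flag": [0, 1, 0, 1, 1, 0, 1, 0],
    }
)

# Boolean and binary columns are dropped before any quantile is computed.
features = select_iqr_features(df, list(df.columns))
outlier_counts_by_feature = {col: count_iqr_outliers(df[col]) for col in features}
```

In this sketch only `age` survives the filter, so `quantile()` is never called on the boolean `is_active` column or the binary `flag` column.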
Or manually:
What needs special review?
Dependencies, breaking changes, and deployment notes
`IQROutliersBarPlot` output.

Release notes
bug
Fixed a failure in `IQROutliersBarPlot` when datasets contain boolean feature columns. The test now skips boolean and binary features instead of raising a `numpy boolean subtract` error during IQR calculations.

Checklist