obliquetree.Classifier

class obliquetree.Classifier(use_oblique=True, max_depth=-1, min_samples_leaf=1, min_samples_split=2, min_impurity_decrease=0.0, ccp_alpha=0.0, categories=None, random_state=None, n_pair=2, top_k=None, gamma=1.0, max_iter=100, relative_change=0.001, linear_leaf=False, leaf_l2=1e-06, leaf_max_iter=100)
__init__(use_oblique=True, max_depth=-1, min_samples_leaf=1, min_samples_split=2, min_impurity_decrease=0.0, ccp_alpha=0.0, categories=None, random_state=None, n_pair=2, top_k=None, gamma=1.0, max_iter=100, relative_change=0.001, linear_leaf=False, leaf_l2=1e-06, leaf_max_iter=100)

A decision tree classifier supporting both traditional axis-aligned and oblique splits.

This classifier extends traditional classification trees by supporting oblique splits (linear combinations of features) alongside conventional axis-aligned splits. Oblique splits allow more flexible decision boundaries between classes while retaining the interpretability of a decision tree.
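To make the distinction concrete, the sketch below contrasts the two kinds of split test on a toy dataset whose classes are divided by the diagonal x0 + x1 = 1. It is plain Python and does not call obliquetree; the helper names and the dict-of-weights representation are illustrative assumptions, not the library's internal data structures.

```python
def axis_aligned_goes_left(x, feature, threshold):
    """Axis-aligned test: compare a single feature with a threshold."""
    return x[feature] <= threshold


def oblique_goes_left(x, weights, threshold):
    """Oblique test: compare a weighted sum of features with a threshold.

    `weights` maps feature index -> coefficient (illustrative representation).
    """
    return sum(w * x[j] for j, w in weights.items()) <= threshold


def separates(sides, labels):
    """True if each side of the split contains exactly one class."""
    left = {lab for side, lab in zip(sides, labels) if side}
    right = {lab for side, lab in zip(sides, labels) if not side}
    return len(left) == 1 and len(right) == 1


# Toy data: class 1 exactly when x0 + x1 > 1.
points = [(0.1, 0.2), (0.8, 0.1), (0.1, 0.8), (0.3, 0.9), (0.9, 0.3), (0.9, 0.9)]
labels = [0, 0, 0, 1, 1, 1]

# Try every axis-aligned split whose threshold is an observed feature value.
axis_ok = any(
    separates([axis_aligned_goes_left(p, f, q[f]) for p in points], labels)
    for f in (0, 1)
    for q in points
)

# One oblique split on x0 + x1 <= 1 combines two features.
oblique_ok = separates(
    [oblique_goes_left(p, {0: 1.0, 1: 1.0}, 1.0) for p in points], labels
)

print(axis_ok, oblique_ok)  # False True
```

No single-feature threshold separates these six points, whereas one linear combination of two features does. That is the situation oblique splits are intended for.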

Parameters:
  • use_oblique (bool, default True) –

    • If True, enables oblique splits using linear combinations of features.

    • If False, uses traditional axis-aligned splits only.

  • max_depth (int, default -1) –

    Maximum depth of the tree. Controls model complexity and prevents overfitting.

    • If -1: Expands until leaves are pure or contain fewer than min_samples_split samples.

    • If int > 0: Limits the tree to the specified depth.

  • min_samples_leaf (int, default 1) – Minimum number of samples required at leaf nodes.

  • min_samples_split (int, default 2) – Minimum number of samples required to split an internal node.

  • min_impurity_decrease (float, default 0.0) – Minimum required decrease in impurity to create a split.

  • ccp_alpha (float, default 0.0) – Complexity parameter for Minimal Cost-Complexity Pruning.

  • categories (List[int], default None) – Indices of categorical features in the dataset.

  • random_state (int, default None) –

    Seed for random number generation in oblique splits.

    • Only used when use_oblique=True.

  • n_pair (int, default 2) –

    Number of features to combine in oblique splits.

    • Only used when use_oblique=True.

  • top_k (int or None, default None) –

    Number of numeric features kept after cheap oblique feature screening.

    • If None, an internal heuristic is used.

    • Only used when use_oblique=True.

  • gamma (float, default 1.0) –

    Separation strength parameter for oblique splits.

    • Only used when use_oblique=True.

  • max_iter (int, default 100) –

    Maximum iterations for L-BFGS optimization in oblique splits.

    • Only used when use_oblique=True.

  • relative_change (float, default 0.001) –

    Early stopping threshold for L-BFGS optimization.

    • Only used when use_oblique=True.

  • linear_leaf (bool, default False) –

    If True, replace the constant leaf with a small parametric model fit on the leaf samples:

    • Binary classification → weighted logistic regression (IRLS, 25 iters); predict returns sigmoid(intercept + coef · x) as P(class=1).

    • Multiclass (n_classes > 2) → multinomial softmax regression with K-1 reference-class parametrization (Newton-Raphson on the full Hessian, 50 iters); predict returns the softmax over per-class logits.

    Only numeric features participate in the leaf coefficients (categorical features in categories are excluded; the tree splits already capture their structure). See Notes on BaseTree for the full mechanism, fallback rules, and iteration policy.

  • leaf_l2 (float, default 1e-6) – L2 (ridge) penalty on the leaf coefficients. For classification, a small internal floor of 1e-10 is always applied (the K-class softmax Hessian is rank-deficient without it), so leaf_l2=0.0 behaves as 1e-10 for classifiers. A larger value (e.g. 0.1 to 1.0) often improves probability calibration without harming accuracy.

  • leaf_max_iter (int, default 100) – Maximum IRLS / Newton iterations per leaf. Iteration stops early when the largest absolute parameter step falls below 1e-6; with well-conditioned data convergence is typically reached in 5-15 iterations. Increase (e.g. to 500) for harder leaves where the default cap might leave the iterate short of the optimum.
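The leaf-model parameters above can be pictured with a small, self-contained sketch of a binary leaf: Newton (IRLS-style) updates on an L2-penalized logistic loss, stopped once the largest absolute parameter step drops below a tolerance, plus a K-1 reference-class softmax that turns multiclass logits into probabilities. This is plain Python under stated assumptions, not the library's implementation: the function names, the single-feature setup, the choice to penalize the intercept, and the placement of the reference class last are all illustrative.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_binary_leaf(xs, ys, leaf_l2=1e-6, leaf_max_iter=100, tol=1e-6):
    """Newton updates for P(y=1 | x) = sigmoid(intercept + coef * x).

    Returns (intercept, coef, iterations_used, converged).
    """
    l2 = max(leaf_l2, 1e-10)  # tiny floor, mirroring the documented behavior
    b, c = 0.0, 0.0
    for it in range(1, leaf_max_iter + 1):
        g_b, g_c = l2 * b, l2 * c        # gradient of the ridge term
        h_bb, h_bc, h_cc = l2, 0.0, l2   # Hessian of the ridge term
        for x, y in zip(xs, ys):
            p = sigmoid(b + c * x)
            s = p * (1.0 - p)
            g_b += p - y
            g_c += (p - y) * x
            h_bb += s
            h_bc += s * x
            h_cc += s * x * x
        # Solve the 2x2 system H * step = g in closed form.
        det = h_bb * h_cc - h_bc * h_bc
        step_b = (h_cc * g_b - h_bc * g_c) / det
        step_c = (h_bb * g_c - h_bc * g_b) / det
        b -= step_b
        c -= step_c
        if max(abs(step_b), abs(step_c)) < tol:  # early-stopping rule
            return b, c, it, True
    return b, c, leaf_max_iter, False


def softmax_with_reference(logits_k_minus_1):
    """K-1 parametrization: the reference class has its logit fixed at 0."""
    logits = list(logits_k_minus_1) + [0.0]  # reference class last (assumption)
    m = max(logits)                          # subtract the max for stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Overlapping classes, so the penalized optimum is finite.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 1, 0, 1, 1]

b, c, n_iter, converged = fit_binary_leaf(xs, ys)
b2, c2, n_iter2, converged2 = fit_binary_leaf(xs, ys, leaf_max_iter=2)
probs = softmax_with_reference([0.0, 0.0])
```

On this toy leaf the default cap is far from binding, while a cap of 2 returns before the step-size criterion is met. That is the situation in which raising leaf_max_iter helps.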

apply(X)

Return the index of the leaf that each sample ends up in.

Parameters:

X (array-like of shape (n_samples, n_features)) – The input samples.

Returns:

X_leaves – For each sample x in X, the index of the leaf that x ends up in. Nodes are numbered using pre-order (depth-first) traversal.

Return type:

numpy.ndarray of shape (n_samples,)
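Pre-order numbering means a node receives its index before its children, with the whole left subtree numbered before the right one. The following plain-Python sketch illustrates that scheme on a hand-built axis-aligned tree; the nested-dict layout, the helper names, and the choice of 0 as the root index are assumptions made for illustration and are not taken from obliquetree.

```python
def number_preorder(node, next_id=0):
    """Assign ids in pre-order: node first, then left subtree, then right subtree."""
    node["id"] = next_id
    next_id += 1
    if "left" in node:  # internal node
        next_id = number_preorder(node["left"], next_id)
        next_id = number_preorder(node["right"], next_id)
    return next_id  # number of nodes numbered so far


def leaf_index(tree, x):
    """Route one sample from the root to a leaf and return that leaf's id."""
    node = tree
    while "left" in node:
        go_left = x[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["id"]


# Root splits on feature 0; its left child splits on feature 1.
tree = {
    "feature": 0, "threshold": 0.5,
    "left": {"feature": 1, "threshold": 0.5, "left": {}, "right": {}},
    "right": {},
}
n_nodes = number_preorder(tree)

X = [(0.2, 0.1), (0.2, 0.9), (0.9, 0.0)]
X_leaves = [leaf_index(tree, x) for x in X]
print(n_nodes, X_leaves)  # 5 [2, 3, 4]
```

Because internal nodes consume indices too, leaf indices under this scheme are generally not contiguous and do not start at 0.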

fit(X, y, sample_weight=None)

Build a decision tree classifier from the training set (X, y).

Parameters:
  • X (array-like of shape (n_samples, n_features)) – The training input samples.

  • y (array-like of shape (n_samples,)) – Target values (class labels).

  • sample_weight (array-like of shape (n_samples,), default None) – Sample weights.

Returns:

self – Fitted estimator.

Return type:

Classifier
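A minimal usage sketch, using only the constructor arguments and the fit signature documented above. The synthetic data helper and the function name fit_both are hypothetical, and converting the inputs to NumPy arrays is an assumption about what counts as array-like here; the snippet has not been checked against a specific obliquetree release.

```python
import random


def make_diagonal_data(n=200, seed=0):
    """Two features in [0, 1); class 1 exactly when x0 + x1 > 1."""
    rng = random.Random(seed)
    X = [[rng.random(), rng.random()] for _ in range(n)]
    y = [int(x0 + x1 > 1.0) for x0, x1 in X]
    return X, y


def fit_both(X, y):
    """Fit one oblique and one axis-aligned tree on the same data."""
    # Imported lazily so the data helper above works without these packages.
    import numpy as np
    from obliquetree import Classifier

    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # fit returns the fitted estimator, so construction and fitting can be chained.
    oblique = Classifier(use_oblique=True, max_depth=2, random_state=0).fit(X, y)
    axis_aligned = Classifier(use_oblique=False, max_depth=2).fit(X, y)
    return oblique, axis_aligned


X, y = make_diagonal_data()
```

Calling fit_both(X, y) requires obliquetree and NumPy to be installed. The two returned estimators can then be compared through predict, predict_proba, or apply.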

predict(X)

Predict class labels for X.

Parameters:

X (array-like of shape (n_samples, n_features)) – The input samples to predict.

Returns:

y – The predicted class labels.

Return type:

numpy.ndarray of shape (n_samples,)

predict_proba(X)

Predict class probabilities for X.

Parameters:

X (array-like of shape (n_samples, n_features)) – The input samples.

Returns:

proba – The class probabilities of the input samples.

Return type:

numpy.ndarray of shape (n_samples, n_classes)
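The usual relationship between class probabilities and predicted labels can be sketched as follows. Two points are assumptions rather than statements from this reference: that predict selects the class with the highest probability, and that the columns of the probability array follow a fixed class order. The labels and the helper name are hypothetical.

```python
def labels_from_proba(proba, classes):
    """Pick, for each row, the class whose probability is largest."""
    labels = []
    for row in proba:
        best = max(range(len(row)), key=lambda k: row[k])
        labels.append(classes[best])
    return labels


classes = ["cat", "dog", "fox"]  # hypothetical labels, one per column
proba = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]

# Each row is a probability distribution over the classes.
rows_sum_to_one = all(abs(sum(row) - 1.0) < 1e-9 for row in proba)
predicted = labels_from_proba(proba, classes)
print(predicted)  # ['cat', 'fox']
```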