Boyce index code #100

Open · wants to merge 4 commits into main
Conversation

@PC-FSU commented Oct 3, 2024

PR Description: Add Continuous Boyce Index Calculation and Test Cases

Summary:

In this pull request, I have added functionality to calculate the continuous Boyce index as described in Hirzel et al. (2006). This method provides a reliable way to evaluate habitat suitability models, specifically for presence-only data. Along with the implementation, I have also added test cases to ensure the correctness and robustness of the new function.

Key Updates:

  1. Boyce Index Calculation:

    • Implemented a function to compute the continuous Boyce index based on intervals of habitat suitability values.
    • This includes handling both sliding-window and fixed-bin approaches, as well as ensuring compatibility with NumPy, Pandas, and GeoPandas data structures.
    • Added logic to handle edge cases like empty arrays, NaN values, and invalid inputs (e.g., non-1D arrays).
  2. Test Cases:

    • Created test cases to validate the behavior of the Boyce index calculation under various scenarios (e.g., different bin sizes, presence of NaN values).
    • Ensured that the function calculates the correct predicted-to-expected (P/E) ratio and returns accurate Spearman correlation coefficients.
  3. Notebook Update:

    • Updated the notebook WorkingWithGeospatialData.ipynb to include a detailed example demonstrating how to use the continuous Boyce index function.

Testing:

The test cases ensure that the continuous Boyce index function works as expected, covering:

  • Different binning strategies.
  • Varying habitat suitability ranges.
  • Handling of NaN values in input data.

This PR enhances the project by providing a robust and well-tested method to evaluate habitat suitability models using presence-only data, with clear examples in the updated notebook.
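
For readers unfamiliar with the metric: Hirzel et al. (2006, Eq. 4) partition the habitat suitability range into intervals, compute the predicted-to-expected ratio F_i = P_i / E_i for each interval (P_i from predictions at presence points, E_i from predictions at background points, per the corrected interpretation later in this thread), and take the Spearman rank correlation between F_i and the interval midpoints. A minimal sketch of that computation, with hypothetical names that are not this PR's API:

import numpy as np
from scipy.stats import spearmanr

def boyce_sketch(pred_presence, pred_background, nbins=10):
    """Illustrative continuous Boyce index: Spearman correlation
    between the P/E ratio and the interval midpoints."""
    lo, hi = pred_background.min(), pred_background.max()
    edges = np.linspace(lo, hi, num=nbins + 1)
    mids, f_ratio = [], []
    for a, b in zip(edges[:-1], edges[1:]):
        # P_i: share of presence predictions falling in [a, b]
        p = np.mean((pred_presence >= a) & (pred_presence <= b))
        # E_i: share of background predictions falling in [a, b]
        e = np.mean((pred_background >= a) & (pred_background <= b))
        if e > 0:  # skip intervals with no background support
            mids.append((a + b) / 2)
            f_ratio.append(p / e)
    corr, _ = spearmanr(f_ratio, mids)
    return corr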

@earth-chris (Owner) left a comment:

Thank you for submitting this PR, @PC-FSU. Including this metric is a nice addition to the package. I have a few comments and requests.

  • Thank you very much for the detailed docstring and jupyter examples.
  • Thank you very very much for including tests.
  • Please apply proper code formatting. To do this, review the contributing guidelines and set up a dev environment with pre-commit installed. Then run pre-commit run --all to apply formatting.
  • To simplify the code, and to better align with other sklearn metrics, I'd recommend breaking the boyce_index function into smaller components: one to compute the intervals, one for the P/E ratio, one for plotting, and one for returning the correlation coefficient. As a user, I'd expect a single value returned from the boyce_index function (one possible decomposition is sketched after this comment).
  • I haven't had much time to evaluate the scientific merit of the code yet, so I'll likely provide another review once the key software comments I've made are addressed. But what strikes me at the moment is that:
    • The relationships between nclass, window and res are not super clear to me, and the results appear very sensitive to these parameters.
    • It is very easy to return nan values, despite the many nan checks that are applied. This makes me think that something is not sufficiently robust.

Thanks for your submission, and I'll look forward to your updates.
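
For concreteness, the decomposition the reviewer describes might look like the following stubs (a sketch only; all names are hypothetical):

def _compute_intervals(ypred, nbins=10):
    """Build the suitability intervals (bin edges)."""
    ...

def _pe_ratio(yobs, ypred, intervals):
    """Compute the predicted-to-expected (P/E) ratio per interval."""
    ...

def plot_pe_ratio(interval_mids, f_ratio):
    """Optional diagnostic plot, kept out of the metric itself."""
    ...

def boyce_index(yobs, ypred, nbins=10) -> float:
    """Compose the pieces and return a single value: the Spearman coefficient."""
    ...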

# implement Boyce index as described in https://www.whoi.edu/cms/files/hirzel_etal_2006_53457.pdf (Eq. 4)


def boycei(interval, obs, fit):
earth-chris (Owner):

I might prefer renaming these functions boyce_index and continuous_boyce_index to differentiate them (boycei and boyce_index are easily confused).

PC-FSU (Author):

Done as per suggestion.

Args:
    interval (tuple or list): Two elements representing the lower and upper bounds of the interval.
    obs (numpy.ndarray): Observed suitability values (i.e., predictions at presence points).
    fit (numpy.ndarray): Suitability values (e.g., from a raster), i.e., predictions at presence + background points.
earth-chris (Owner):

To better align with the sklearn API design, prefer renaming the variables and ordering them as boyce_index(yobs, ypred, interval).

PC-FSU (Author):

Done as per suggestion.

return fi


def boyce_index(fit, obs, nclass=0, window="default", res=100, PEplot=False):
earth-chris (Owner):

Please rename and reorder fit, obs as yobs, ypred.

PC-FSU (Author):

Done as per suggestion.



# Remove NaNs from fit
fit = fit[~np.isnan(fit)]
earth-chris (Owner):

Remove the duplicated code.

PC-FSU (Author):

Done as per suggestion.

Comment on lines 135 to 136
print(vec_mov)
print(intervals)
earth-chris (Owner):

Remove the debug print statements.

PC-FSU (Author):

Removed.

    vec_mov = np.linspace(mini, maxi, num=nclass + 1)
    intervals = np.column_stack((vec_mov[:-1], vec_mov[1:]))
else:
    raise ValueError("Invalid nclass value.")
earth-chris (Owner):

A comment or two in this section would be useful to elucidate the different methods for computing the intervals.

PC-FSU (Author):

I refactored the code: the intervals are now calculated in a new function, and it turned out that res was not required at all. I also updated the arguments to more logical names.
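
For illustration, the refactored interval construction could be commented along these lines (a hypothetical sketch with assumed names, not the PR's exact code; it mirrors the nbins/bin_size behavior described later in this thread):

import numpy as np

def make_intervals(lo, hi, nbins=10, bin_size=None):
    if isinstance(nbins, (list, np.ndarray)):
        # Custom thresholds: the caller supplies the class edges directly.
        edges = np.asarray(nbins, dtype=float)
    else:
        if bin_size is not None:
            # Fixed-width bins: derive the bin count from the requested width.
            nbins = int(np.ceil((hi - lo) / bin_size))
        # Equal-width bins spanning the suitability range.
        edges = np.linspace(lo, hi, num=nbins + 1)
    # Pair consecutive edges into (lower, upper) intervals.
    return np.column_stack((edges[:-1], edges[1:]))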

corr, _ = spearmanr(f_valid, intervals_mid)


if PEplot:
earth-chris (Owner):

I would prefer to keep plotting as a separate function, which makes for cleaner code.

PC-FSU (Author):

Done



# Remove NaNs
valid = ~np.isnan(f)
earth-chris (Owner):

There seems to be a lot of NaN checking. If the NaNs are removed from the initial arrays, what would lead to 'invalid' values? Are we sure this isn't covering up some other issue in the calculations?

PC-FSU (Author):

Removed the extra NaN checking.


results = {
    'F.ratio': f,
    'Spearman.cor': round(corr, 3) if not np.isnan(corr) else np.nan,
earth-chris (Owner):

Another suspicious NaN check here. Also, it seems unnecessary to round the correlation coefficient.

predicted = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0])

# Observed presence suitability scores (e.g., predictions at presence points)
observed = np.array([0.3, 0.7, 0.8, 0.9])
earth-chris (Owner):

Based on this example, it's not clear to me why users would want to pass in arrays of different lengths (where predicted and observed don't match: one is presence + background, the other is just presence).

PC-FSU (Author):

So I misinterpreted the equation. For the P/E ratio, the predicted frequency in each habitat suitability region is computed from presence-only data, while the expected frequency is calculated from the random distribution alone, i.e., at background points, and doesn't require predictions at presence points. I changed the code and docs to reflect this.
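
A quick numeric illustration of that reading (all numbers made up):

# Interval [0.5, 1.0], hypothetical counts:
P = 3 / 4    # predicted frequency: 3 of 4 presence predictions fall in the interval
E = 5 / 10   # expected frequency: 5 of 10 background predictions fall in the interval
F = P / E    # 1.5: presences land in this band 1.5x more often than expected by chance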

…e made at presence and background points, not presence and combined presence + background (Thanks for pointing that out, I misinterpreted the paper). Removed redundant NaN checks. Updated test cases, and example notebook.
@PC-FSU commented Oct 18, 2024

Any updates?

@PC-FSU requested a review from earth-chris November 18, 2024 01:43
Comment on lines +125 to +126
nbins (int | list, optional): Number of classes or a list of class thresholds. Defaults to 0.
bin_size (float | str, optional): Width of the bin. Defaults to 'default', which sets the width to 1/10th of the fit range.
earth-chris (Owner):

The interaction between these parameters isn't super clear to me. The default behavior sets nbins to 10. Setting 0 here doesn't mean that zero bins are estimated; instead it behaves as if no bins were passed at all, which depends on the 'default' parameter.

I might recommend one of the following:

  • set the defaults as nbins=10 and bin_size=None, support passing None for each, and throw an error if both are passed (since they are mutually exclusive; a sketch follows this list)
  • drop the bin_size argument and just go with nbins.
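
A sketch of the first option, slightly adapted so that "explicitly passed" is distinguishable from "defaulted" (the names and defaults here are assumptions, not the PR's code):

def continuous_boyce_index(yobs, ypred, nbins=None, bin_size=None):
    # The two binning arguments are mutually exclusive.
    if nbins is not None and bin_size is not None:
        raise ValueError("nbins and bin_size are mutually exclusive; pass only one.")
    if nbins is None and bin_size is None:
        nbins = 10  # default behavior: ten equal-width bins
    ...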

mini, maxi = range

if isinstance(bin_size, float):
    nbins = (maxi - mini) / bin_size
earth-chris (Owner):

I get the following warning that appears to be driven by floating point precision. I don't think it's a problem, just flagging it:

ela.evaluate.continuous_boyce_index(y, ypred, bin_size=0.25)
/home/cba/src/elapid/elapid/evaluate.py:59: UserWarning: bin_size has been adjusted to nearest appropriate size using ceil, as range/bin_size : 0.9999999999983606 / 0.25 is not an integer.
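
One possible way to avoid the spurious warning (a sketch, not the PR's code): treat near-integer ratios as exact before deciding to warn and adjust.

import numpy as np

rng, bin_size = 0.9999999999983606, 0.25   # values from the warning above
ratio = rng / bin_size                     # 3.9999999999934424: not an integer
if np.isclose(ratio, round(ratio)):
    nbins = int(round(ratio))              # within floating-point noise of 4: no warning needed
else:
    nbins = int(np.ceil(ratio))            # genuinely fractional: warn and round up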

Comment on lines +183 to +189
results = {
    "F.ratio": f_scores,
    "Spearman.cor": corr,
    "HS": intervals,
}

return results
earth-chris (Owner):

My personal preference is to avoid returning a dictionary and instead return the three values as a tuple, so users can write:

f_ratio, cor, hs = ela.evaluate.continuous_boyce_index(presence, background)

@earth-chris (Owner) left a comment:

Looks much better, @PC-FSU, thank you for the improvements! One more round and I think it's there. There are a few small updates I've requested in code, but I'll make one more here regarding the notebook.

The example provided uses some dummy array data, which does not provide much insight into what the index does in a real-life context. Would you please use modeled predictions from the notebook to demonstrate what it can tell you beyond what you see from the AUC results?

Comment on lines +111 to +112
yobs: Union[np.ndarray, pd.Series, gpd.GeoSeries],
ypred: Union[np.ndarray, pd.Series, gpd.GeoSeries],
earth-chris (Owner):

I think I had misunderstood the original implementation, and it looks like these are both actually predicted values, just predictions at different locations (ypred at presence and background sites).

I might prefer we rename these variables to be a bit more clear (like ypred_observed and ypred_background).
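
The suggested rename might look like this (a sketch; the annotations and default are illustrative):

import numpy as np

def continuous_boyce_index(
    ypred_observed: np.ndarray,    # predictions at presence locations
    ypred_background: np.ndarray,  # predictions at background locations
    nbins: int = 10,
) -> float:
    ...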
