
updated multiscale_entropy to allow changing the sample_length that i… #4


Open: gattia wants to merge 2 commits into base: master

Conversation


@gattia gattia commented Apr 6, 2018

Updated multiscale_entropy to allow changing the sample_length that is passed to sample_entropy. Re-used the variable sample_length to do so. Set the default to m=2, the value proposed in the original MSE paper by Costa, linked at ref[2] (ref 2 was added at line 159). Renamed the sample_length factor to scale_factor, as this is the name used for the corresponding variable (Tau) in the same Costa paper, ref[2].

I've only just started working with MSE analyses, so don't assume these changes are correct. However, I am fairly certain that they follow the proposed methods and extend the current implementation. I've used names that follow the literature convention, but changing them might break existing code... just a thought.

PS. Great code, and thanks for sharing, it has definitely helped me out!

gattia added 2 commits April 5, 2018 23:35
…s input into sample_entropy. Re-used the variable sample_length do to so. Set the default to m=2, the value proposed in the original MSE paper by Costa linked at ref[2]. Renamed the sample_length factor to be scale_factor, as this is the named used to define the coinciding variable (Tau) in the same Costa paper - ref[2].
…ction for last commit. It appears that changing sample_length to 2 breaks sample_entropy - it now outputs 2 variables instead of one. The Neurokit repo (https://github.com/neuropsychology/NeuroKit.py/blob/master/neurokit/signal/complexity.py) uses the nolds package to calculate sample_entropy
@gattia
Author

gattia commented Apr 6, 2018

After I tested the changes, the package failed. When I passed sample_length=2 to the sample_entropy() function, it returned an se of length 2. I'm not sure what is going on. I printed the se for my data and got [2.45622099 0.05539498]. The se calculated with the nolds package (https://github.com/CSchoel/nolds/blob/master/nolds/measures.py) was 0.2135971525998909. If se=sample_entropy() at line 171 is replaced with se = nolds.sampen(temp_ts,sample_length,tolerance,nolds.measures.rowwise_euclidean,debug_plot=False, plot_file=None) after importing nolds, it runs fine, and much faster. This assumes that nolds produces an accurate measure of sample entropy.

@nikdon
Owner

nikdon commented Apr 10, 2018

Hey, thank you for the PR.

Please refer to issue #1, where the same topic was discussed. In summary, there is a difference in the calculation of sample entropy for different sample lengths, or so-called embedding dimensions. Also, the function sample_entropy is ported from SampEn from PhysioNet.

In general, for each sample_length we count the distances between embedded vectors: how many are smaller than the tolerance (a) and, as I remember, the total number (b). The sample entropy is then estimated as -log(a/b).
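For reference, a minimal pure-Python sketch of the -log(a/b) estimate. It follows the commonly cited definition, in which b counts matching template pairs of length m and a counts matching pairs of length m + 1, which differs slightly from the recollection above. It is not the code in this repository or in nolds; the helper names, the Chebyshev distance, and the `<=` comparison are assumptions made for illustration.

```python
import math


def chebyshev(u, v):
    """Maximum absolute coordinate difference between two equal-length vectors."""
    return max(abs(a - b) for a, b in zip(u, v))


def count_matches(series, length, n_templates, tolerance):
    """Count template pairs (i < j) of the given length that lie within tolerance."""
    templates = [series[i:i + length] for i in range(n_templates)]
    count = 0
    for i in range(n_templates):
        for j in range(i + 1, n_templates):
            if chebyshev(templates[i], templates[j]) <= tolerance:
                count += 1
    return count


def sample_entropy_single(series, m, tolerance):
    """Sample entropy for a single embedding dimension m, as -log(a / b)."""
    # Use the same number of templates for both lengths so the counts are comparable.
    n_templates = len(series) - m
    b = count_matches(series, m, n_templates, tolerance)
    a = count_matches(series, m + 1, n_templates, tolerance)
    if a == 0 or b == 0:
        return float("inf")  # the ratio is undefined when no matches are found
    return -math.log(a / b)


# A perfectly periodic series is fully predictable, so a equals b here.
periodic = [1.0, 2.0] * 5
periodic_entropy = sample_entropy_single(periodic, 2, 0.1)
```

Implementations differ in the distance function, the strictness of the comparison, and the number of templates counted, so values from this sketch are not expected to match any particular package.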

The function sample_entropy here returns a vector in which the value at index i (starting from 0) is the sample entropy for the sample_length + 1. While it seems to give the same result for sample_length = 1, for sample_length > 1 the results are not equal. The main reason, I believe, is the way the counts are estimated. I don't have the free time to go deeper into this question, but I encourage you to do it.

And finally, if you want to take a sample_length into account for MSE, I think it's necessary to take the last value from the sample_entropy output.
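A sketch of what picking the last value could look like. `sample_entropy_vector` is a hypothetical stand-in, not this package's sample_entropy: it returns one value per embedding dimension from 1 to max_m, and that index convention, the shared template count, and the Chebyshev distance are all assumptions of the sketch.

```python
import math


def match_count(series, length, n_templates, tolerance):
    """Count template pairs of the given length whose Chebyshev distance is within tolerance."""
    t = [series[i:i + length] for i in range(n_templates)]
    return sum(
        1
        for i in range(n_templates)
        for j in range(i + 1, n_templates)
        if max(abs(a - b) for a, b in zip(t[i], t[j])) <= tolerance
    )


def sample_entropy_vector(series, max_m, tolerance):
    """Hypothetical stand-in: returns [SampEn(m=1), ..., SampEn(m=max_m)]."""
    # Assumption: every template length uses the same len(series) - max_m templates.
    n_templates = len(series) - max_m
    counts = [
        match_count(series, k, n_templates, tolerance)
        for k in range(1, max_m + 2)
    ]
    # Each entry compares matches of length m (b) with matches of length m + 1 (a).
    return [
        -math.log(a / b) if a > 0 and b > 0 else float("inf")
        for b, a in zip(counts, counts[1:])
    ]


series = [0.1, 0.9, 0.2, 0.8, 0.15, 0.85, 0.3, 0.7, 0.25, 0.75, 0.1, 0.9]
se_all = sample_entropy_vector(series, 2, 0.2)  # one value per m in 1..2
se = se_all[-1]  # scalar for the requested sample length (m = 2)
```

Indexing the last element keeps the value stored per scale a scalar, which is what the MSE loop needs regardless of how many sample lengths the underlying function reports.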

Anyway, I'll be happy to merge this PR as soon as possible with the corrections related to the picking of the sample entropy value.

P.S. Please forgive any errors in the explanation/definitions, as it was a long time ago, and what you can find here is just a quick answer with some highlights about the implementation :)

@gattia
Author

gattia commented Apr 10, 2018

Thanks for the reply.

From reading PhysioNet, it seems that the multiple outputs of sample_entropy are just the entropies obtained when sample_length is set to each value from 1 up to the maximum specified in sample_length. So the number of outputs from that function makes sense. The answers still differ from those of nolds, but as you reported in the other issue, that might be due to a different distance function being used.
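A small illustration of why the choice of distance function can change the result: the same pair of templates can count as a match under one metric and not under another at the same tolerance. Both helpers are illustrative and are not functions from nolds or this package.

```python
import math


def chebyshev(u, v):
    """Maximum absolute coordinate difference."""
    return max(abs(a - b) for a, b in zip(u, v))


def euclidean(u, v):
    """Straight-line distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


u = [0.0, 0.0]
v = [0.15, 0.15]
tolerance = 0.2

# Chebyshev looks only at the largest coordinate gap (0.15), so the pair matches.
matches_chebyshev = chebyshev(u, v) <= tolerance
# Euclidean combines both gaps (about 0.212), so the same pair does not match.
matches_euclidean = euclidean(u, v) <= tolerance
```

Because sample entropy is built from counts of such matches, a different metric changes both counts and therefore the final value.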

I think it's important to note that there is (or should be) a distinct difference between sample_length and scale_factor (I've added scale_factor to the implementation). The sample_length is the length of the sequences being compared, whereas the scale_factor is a downsampling of the data. The data is meant to be downsampled by a factor equal to every integer up to the scale_factor, and the sample entropy is calculated for each of these scales. The MSE analysis should return the sample entropy for every one of these scales (up to the specified scale_factor). An alternate output for MSE is the area under the MSE curve - this would give a single value.
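The distinction can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: coarse-graining averages non-overlapping windows, the tolerance is passed in and kept fixed across scales, the distance is Chebyshev, and all function names are hypothetical.

```python
import math


def coarse_grain(series, scale):
    """Downsample by averaging consecutive non-overlapping windows of `scale` points."""
    n_windows = len(series) // scale
    return [sum(series[i * scale:(i + 1) * scale]) / scale for i in range(n_windows)]


def sample_entropy(series, m, tolerance):
    """Scalar sample entropy for sequences of length m (Chebyshev distance assumed)."""
    n_templates = len(series) - m

    def matches(length):
        t = [series[i:i + length] for i in range(n_templates)]
        return sum(
            1
            for i in range(n_templates)
            for j in range(i + 1, n_templates)
            if max(abs(a - b) for a, b in zip(t[i], t[j])) <= tolerance
        )

    b, a = matches(m), matches(m + 1)
    return -math.log(a / b) if a > 0 and b > 0 else float("inf")


def multiscale_entropy_sketch(series, sample_length, tolerance, scale_factor):
    """One sample entropy value per scale 1..scale_factor."""
    return [
        sample_entropy(coarse_grain(series, scale), sample_length, tolerance)
        for scale in range(1, scale_factor + 1)
    ]


def area_under_curve(values):
    """Trapezoidal area with unit spacing between scales (one possible single-value summary)."""
    return sum((a + b) / 2 for a, b in zip(values, values[1:]))


signal = [math.sin(0.7 * i) + 0.3 * math.sin(2.9 * i) for i in range(60)]
mse = multiscale_entropy_sketch(signal, sample_length=2, tolerance=0.2, scale_factor=3)
```

Here sample_length stays at 2 for every scale while the series itself gets shorter, so the result has one entry per scale rather than one entry per sample length.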

Having made that distinction, I disagree that to get MSE you need to take the last output of sample_entropy. If that were the case, I don't see the reason for the for loop. You could instead calculate the sample entropy at just the given scale_factor and you would be done. The purpose of the MSE, and of the loop that you've written, is to calculate the sample entropy for different time scales (scale_factor) and to return all of them.

@nikdon
Owner

nikdon commented Apr 11, 2018

You are right about the last value. But my point was about the type of the returned output: in one case it's an array, in the other it's a single value. I think it's necessary to change the implementation of this function to match the nolds one and the underlying one from the R package. What do you think?

@gattia
Author

gattia commented Apr 11, 2018

I think you are talking about the sample entropy function, while I am talking about the multiscale entropy. For the sample entropy, it makes sense to me that it's a single value, whereas for multiscale entropy it should be an array. But this is definitely not my field, so take that with a grain of salt.


2 participants