Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add findpeaks #369

Open
wants to merge 12 commits into
base: master
Choose a base branch
from
Open

add findpeaks #369

wants to merge 12 commits into from

Conversation

tungli
Copy link

@tungli tungli commented Jun 16, 2020

Introduction

findpeaks is a function which extracts filtered local maxima based on a few parameters.
It is often needed in data processing related to analysis of 1D spectra or other noisy signal data where finding maxima (or minima) is a routine task.

This function exists in MATLAB (Signal Processing Toolbox) and Python (scipy).

Contents

This PR contains source code, tests, docs changes, a data file and two pictures, all related to the findpeaks function.

Motivation

It was recently suggested to me to by @cormullion that a repository I created a long time ago would be better placed in the DSP package.
I base this assumption on the following active issue: #303

Notes

I am certain that the code I am proposing to you for review may be further optimized (possibly through use of lazy iterators and by minimizing the number of passes through the data vector) and its functionality expanded.
Sadly, I am not an active Julia user right now nor have the time to implement the necessary changes at the moment.

Also there is another package with similar functionality:
https://github.com/halleysfifthinc/Peaks.jl

Thank you for consideration!

@cormullion
Copy link

Cheers and thanks - I hope (for your sake and mine) that this can make it into the DSP package...

@galenlynch
Copy link
Member

Thanks for the PR, I'll review it soon.

@rob-luke
Copy link
Member

Thanks @tungli for the PR.

I have always used the images.jl findlocalmaxima for this purpose. What would this function provide over this existing implementation?

I think its ok to have some duplication across packages, but we should consider this before merging. findlocalmaxima works in multiple dimensions, but this function seems specific to one dimension? What other differences are there? Images.jl is well maintained and very efficient, I think we would need a good reason to duplicate functionality.

@tungli
Copy link
Author

tungli commented Jun 23, 2020

@rob-luke I don't think findlocalmaxima offers filtering functionality.
In processing noisy data you want to filter local maxima, usually based on some global property.
I've added some pictures that illustrate what I mean.

To do something similar in 2D or higher dimensions is much harder, that's why this is restricted to 1D.

@rob-luke
Copy link
Member

I don't quite understand what you mean by filtering functionality, could you elaborate on what you mean here and be as explicit as possible. But based on your figures I think I understand what you are trying to achieve.

Here are a few example snippets I use to find maxima (or peaks) using Images.jl. I have run them on your data to match your figures

maxima = findlocalmaxima(imfilter(y, KernelFactors.gaussian((3))))
maxima = [getindex(max, 1) for max in maxima]

f = plot(x, y)
scatter!(x[maxima], y[maxima])

Screen Shot 2020-06-23 at 8 25 43 pm

or you could use Images.jl mapwindow with an additional line to implement your min_dist...

maxima = unique(mapwindow(maximum, y, 101, border="replicate"))
idxs = [findfirst(y .== max) for max in maxima]
[i[2] == 1 ? y[idxs[i[1]-1]] < y[idxs[i[1]]] ? idxs[i[1]-1] = idxs[1] : 0  : 0 for i in enumerate([2; diff(idxs)])]

f = plot(x, y)
scatter!(x[newidx], y[newidx])

Screen Shot 2020-06-23 at 9 44 09 pm

I am sure @timholy has cleaner examples of how to use mapwindow on one dimensional data, but this should illustrate that peak finding is already implemented in a robust existing package. Images.jl doesn't have the additional arguments you have like min_height, but these are just one liners as illustrated above. There have also been several other packages to produce maxima or minima filters (MinMaxFilter.jl, MaxMinFilter.jl, RollingFilters.jl, LocalFilters.jl, etc) and see nice discussion here. Further, the images.jl code works in multiple dimensions.

But I guess the problem is that people aren't managing to find the Images.jl functions or cant see how it can be used to achieve 1 dimensional filtering (see also #343). So maybe it would be handy to point people in the right direction? This could be something for the new website (https://github.com/JuliaDSP/juliadsp.github.io/issues/1#issuecomment-647051107)?

Alternatively I may have misunderstood what you are trying to achieve, so please feel free to point out where I have got mixed up. And where your proposal extends beyond Images.jl. I am not opposed to adding a peak finding function, but I think it needs to be clear how it provides additional functionality over what's capable with Images.jl.

@galenlynch
Copy link
Member

I think this is similar to Matlab's findpeaks, which allows you to filter by peak prominence and half width etc., which is hard to recreate with just filtering a signal and then finding local maxima in the filtered signal (e.g. see their section on prominence: https://uk.mathworks.com/help/signal/ref/findpeaks.html#buff2t6). Though I don't use this function very often in Matlab, my understanding is that it would be hard to recreate its peak filtering behavior with Images.jl or similar packages. Is that right @tungli ?

@rob-luke
Copy link
Member

rob-luke commented Jun 23, 2020

Ok so there is some specific behaviour that this PR implements. It would be handy to provide an example which highlights the benefits of this approach and under what situations this method is advantageous over existing implementations, this isn’t evident from the example provided in this PR.

Also, I’m just curious now, maybe I should be using this prominence approach rather than my snippets.

@tungli
Copy link
Author

tungli commented Jun 24, 2020

Thanks for sharing your approach with Images.jl @rob-luke and for the links.
You are correct - this PR can mostly be replaced by appropriate use of Images.jl with some filters.

As @galenlynch points out the prominence approach is quite powerful - it is also the reason why I implemented the code in this PR.
By tuning this one parameter, findpeaks (almost always) gives you peaks that you yourself would consider to be peaks by eye (as opposed to noise).

I think there are two questions that have to be decided on:

  1. Does the peak-finding functionality need to be in a signal processing package as well, given that it is already in Images.jl or other less-known packages?
  2. Is this the interface DSP.jl wants to provide their users (i.e. filtering through few parameters (prominence, etc..)).

My own thoughts on these are:
Given that in Matlab and Python findpeaks is found in a signal processing, my guess is people would expect to find it here (as can be extrapolated from the links in the Motivation section of PR).
As for the interface, I find this to be intuitive and easy-to-use for people processing spectra/signal. It may not be the most general or efficient approach but I think it could satisfy users with background in Matlab/Python. Since I am not really familiar with DSP.jl, you guys need to decide whether this fits in with the rest of DSP.jl.

@galenlynch
Copy link
Member

I think it's appropriate, and the interface is intuitive. Thanks for submitting the PR, I am almost finished with my review of it.

Copy link
Member

@galenlynch galenlynch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @tungli for the useful and very well documented and tested contribution, I think DSP.jl would be a great home for this function. I also agree that its presence in both SciPy and Matlab indicates that this is definitely a useful tool to analyze both signals and spectra.

While this code is already very impressive, there are some aspects of its API that I would like to change, or at least discuss, before it becomes part of DSP.jl. Since we are committing to maintaining that API for a while, I think it makes sense to invest some time in deciding what API we want. I understand from your initial comments that you may not have the time to work on this PR very much, but I think these changes would not take much time. If you still do not have the time to make these changes, then perhaps we might be able to make those change.

Please see the review comments for more details, but the main points that I think we should discuss about the API are

  • handling of plateaus
  • default values for filtering parameters
  • type signature of findpeaks
  • sorting of results
  • inclusion/exclusion of peaks that exactly match the "minimal" conditions
  • handling of empty arrays

Thanks once again for submitting your package to DSP.jl. It seems like people are already using your package quite a bit, and I for one would be happy to have it as part of DSP.jl.

src/findpeaks.jl Outdated
Comment on lines 27 to 30
min_height :: T = minimum(y),
min_prom :: T = zero(y[1]),
min_dist :: S = zero(x[1]),
threshold :: T = zero(y[1]),
Copy link
Member

@galenlynch galenlynch Jun 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Each of these lines will throw an error if y is empty
  • The type signature is pretty restrictive. For example, if y is a Vector{Float64}, you cannot set min_height, min_prom, or threshold to 0 or 1. Does it really need to be that restrictive?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for the first point, if we don't want to throw on empty data, this might be better:

function findpeaks(
                   y :: AbstractVector{T},
                   x :: AbstractVector{S} = collect(1:length(y))
                   ;
                   kwargs...
                  ) where {T <: Real, S}

    if isempty(y)
        return empty(x)
    end

    min_height = get(kwargs, :min_height, minimum(y))
    min_prom   = get(kwargs, :min_prom,   zero(y[1]))
    min_dist   = get(kwargs, :min_dist,   zero(x[1]))
    threshold  = get(kwargs, :threshold,  zero(y[1]))

    dy = diff(y)
...

"""
`findpeaks(y::Array{T},
x::Array{S}=collect(1:length(y))
;min_height::T=minimum(y), min_prom::T=minimum(y),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the default in the documentation for min_prom does not match the actual default.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I noticed just after submitting the PR. The correct version should be in the issue above.

*Keyword*:\n
`min_height` -- minimal peak height\n
`min_prom` -- minimal peak prominence\n
`min_dist` -- minimal peak distance (keeping highest peaks)\n
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And if the peaks are the same size?

*Optional*:\n
`x` -- x-data\n
*Keyword*:\n
`min_height` -- minimal peak height\n
Copy link
Member

@galenlynch galenlynch Jun 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find "minimal" ambiguous here. What happens if the value is on the threshold? It looks like peaks that are exactly the same as any of these minimal criteria are excluded from the results. The name "min_height" implies, at least to me, that it would be the minimum value of the corresponding property of the result set, and should therefore be inclusive (i.e. values that exactly match the threshold should be included in the results).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what do you suggest here?

`min_height` -- minimal peak height\n
`min_prom` -- minimal peak prominence\n
`min_dist` -- minimal peak distance (keeping highest peaks)\n
`threshold` -- minimal difference (absolute value) between
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by absolute value here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not matter whether the value is negative or positive:

Suggested change
`threshold` -- minimal difference (absolute value) between
`threshold` -- minimal absolute value of difference between

... but perhaps there is a better (single) word for it

src/findpeaks.jl Outdated
x::Array{S}=collect(1:length(y))
;min_height::T=minimum(y), min_prom::T=minimum(y),
min_dist::S=0, threshold::T=0 ) where {T<:Real,S}`\n
Returns indices of local maxima (sorted from highest peaks to lowest)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After looking over this PR, it seems like you're returning the peak indices sorted by their value because of the way you filter to satisfy min_dist. I personally think it makes more sense to return the indices in the order that they occur, which is also what Matlab does.

@@ -0,0 +1,74 @@
using DSP
Copy link
Member

@galenlynch galenlynch Jun 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Could you please add a test with empty inputs?
  • Could you please add a test where x and y have differing lengths? In that case, I think findpeaks should through an informative ArgumentError
  • Many of these tests rely on the RNG. Could you please seed the RNG inside each @testset that uses it to ensure the result of the tests aren't themselves a random variable?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I will add those, but as for the RNG: there is a seed! in runtests.jl already and other tests that do use rand do not have their own seeds. Is this necessary then?

src/findpeaks.jl Outdated
;min_height::T=minimum(y), min_prom::T=minimum(y),
min_dist::S=0, threshold::T=0 ) where {T<:Real,S}`\n
Returns indices of local maxima (sorted from highest peaks to lowest)
in 1D array of real numbers. Similar to MATLAB's findpeaks().\n
Copy link
Member

@galenlynch galenlynch Jun 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way plateaus are treated by this function differs from both Matlab or SciPy. If the input signal contains a number of adjacent indices with the same value that is flanked with lower values, then each of the indices in the "plateau" will be considered a peak. E.g. findpeaks([0, 1, 1, 1, 1, 1, 0]) returns [2, 3, 4, 5, 6]. In contrast, Matlab would return the first index of the plateau, and SciPy would return the middle index (rounded down if an even number of points). Of these behaviors, I think SciPy's makes the most sense. I think the current behavior should be changed to match SciPy's.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returning the middle index also seems to me to be more reasonable than the first index.
But doing so loses information about the fact that there is a plateau there and, depending on the nature of data, technically not a peak. Also, for non-equally spaced x, the "middle" may not be so much a "middle" anymore.

Looking at scipy's find_peaks I also noticed they have a plateau_size parameter and a corresponding output.
This would be nice to have - then returning the middle index is OK, since there is also output about the size.

If you'll agree I could add this.


"""
`findpeaks(y::Array{T},
x::Array{S}=collect(1:length(y))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strange things happen if x is, for example, all zeros. For many of the default filtering values below, I assume using the default value means "do not filter". However, in this (corner) case filtering is still done. I think using nothing as a default value for keyword parameters to indicate that no filtering should be done would make sense.

`min_dist` -- minimal peak distance (keeping highest peaks)\n
`threshold` -- minimal difference (absolute value) between
peak and neighboring points\n
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some information should be given about the behavior of findpeaks if the input array contains NaNs. Right now it seems like no peaks are returned in that case. SciPy's find_peaks has a disclaimer that the function will not behave as intended when operating on NaNs, which I think is sufficient.

src/findpeaks.jl Outdated
peaks = peaks[y[peaks] .> min_height]
yP = y[peaks]

peaks = with_distance(peaks, x, y, min_dist)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

x and y should be checked to be the same length.

@tungli tungli requested a review from galenlynch June 25, 2020 09:10
@tungli
Copy link
Author

tungli commented Jun 29, 2020

Thank you for the review @galenlynch! I really appreciate it.
Unfortunately, I am a bit busy these few weeks. With some luck, I will be able to resolve the issues this or the next weekend.
I am sorry for the delay.

@galenlynch
Copy link
Member

No worries on the delay, thanks once again for the great PR!

@tungli
Copy link
Author

tungli commented Sep 17, 2020

@galenlynch and others, I apologize for the huge delay.

I changed a few things that revolve around collecting additional information about peaks during the search.
This done similar to how scipy does it.
Now a struct is returned together with the peaks (although I still have to write tests for the struct part).

I felt this change was necessary mainly because of the plateau handling (see comment above -> returning only one point of the plateau would result in loss of information about the plateau) and it is also a nice and, more or less, free benefit to have the peak information available.

I have also included every one of @galenlynch's comments, but this one:

Strange things happen if x is, for example, all zeros. For many of the default filtering values below, I assume using the default value means "do not filter". However, in this (corner) case filtering is still done. I think using nothing as a default value for keyword parameters to indicate that no filtering should be done would make sense.

which I need to think through a bit more. It seems to me that having all zero x (or even constant x) would be a bit inconsistent with peak finding. Also, up to now I avoided Nothing with the choice of defaults in such a way to simulate that no filtering is done (although the filter is run and does not (or at least should not) change anything).

@tungli
Copy link
Author

tungli commented Sep 17, 2020

I still have doubts about the plateau handling, because the plateau filtering gets cancelled out by the threshold filtering and vice versa. This also seems to be the case in scipy.

Perhaps a better version would treat every plateau as a single peak.

For this I think the modification will need to be the following:

  • change the order of filtering and find plateaus first
  • after plateaus are found, change the internal data representation of y such that there are no consequent repeated values (an example
  • return only the PeakInfo struct (no peaks)

The last point gets rid of the where-in-the-plateau-is-the-peak? issue - every peak will be treated as a range (included in the PeakInfo)

Minimal distance combined with plateaus will also probably be more meaningful.

@tungli tungli force-pushed the master branch 2 times, most recently from 0e17569 to fc628f4 Compare September 18, 2020 07:08
@halleysfifthinc
Copy link
Contributor

As an interested party (author of Peaks.jl), I will throw in my (biased) 2-cents:

If I am correctly following the functionality being added/requested, the goal is to:

  • find local maxima of a range of min_dist
  • filter maxima by peak prominence, height, and threshold

I will note that the latest Peaks.jl version handles min_dist, plateaus (MATLAB behavior), prominence calculation and filtering, and supports the presence of NaN and missing values.

Regardless, on the subject of API bikeshedding, I think that factoring out the different features (peak finding, calculation and filtering by prominence, height, and threshold) into separate functions would present a simpler and more flexible interface (eg something along the lines of findpeaks, peakprominence, peakheight, peakthreshold).

I also think that structs being returned would unnecessarily complicate the user's handling of the returned data. In most cases, users will only need one or two of the fields in PeakInfo; in particular, plateaus will not be a concern for many sources of data. Separate functions (eg, in the case of plateaus, plateau_extents or similar) for each feature would allow greater control by the user over what is being returned.

I'm also curious how the performance compares to equivalent Peaks.jl or Images.jl functions; somewhat dated benchmarking I have previously done showed similar performance scaling between Peaks.jl and Images.jl, with a slight edge towards my package.

@tbenst
Copy link

tbenst commented Oct 4, 2020

This pull request is very interesting. Is there any way to solve the 2D case? I'm looking for the Julia version of skimage.feature.peak_local_max but it seems this may not exist yet..?

@timholy
Copy link

timholy commented Oct 4, 2020

using Images
help?> findlocalmaxima
search: findlocalmaxima findlocalminima

  findlocalmaxima(img, [region, edges]) -> Vector{CartesianIndex}

  Returns the coordinates of elements whose value is larger than all of their immediate neighbors. region is a list of dimensions to consider. edges is a boolean specifying whether to include the first
  and last elements of each dimension, or a tuple-of-Bool specifying edge behavior for each dimension separately.

@JakeZw
Copy link

JakeZw commented Mar 10, 2021

This seems like interesting functionality. I am wondering when it may be available from within DSP.jl?

@ViralBShah
Copy link
Contributor

Would it make sense to update the documentation in DSP.jl to refer to the Peaks.jl package?

@JakeZw
Copy link

JakeZw commented Feb 9, 2024

I agree that referencing Peaks.jl in DSP.jl documentation would be useful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants