-
Notifications
You must be signed in to change notification settings - Fork 110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add findpeaks #369
base: master
Are you sure you want to change the base?
add findpeaks #369
Conversation
Cheers and thanks - I hope (for your sake and mine) that this can make it into the DSP package... |
Thanks for the PR, I'll review it soon. |
Thanks @tungli for the PR. I have always used the images.jl findlocalmaxima for this purpose. What would this function provide over this existing implementation? I think its ok to have some duplication across packages, but we should consider this before merging. findlocalmaxima works in multiple dimensions, but this function seems specific to one dimension? What other differences are there? Images.jl is well maintained and very efficient, I think we would need a good reason to duplicate functionality. |
@rob-luke I don't think To do something similar in 2D or higher dimensions is much harder, that's why this is restricted to 1D. |
I don't quite understand what you mean by filtering functionality, could you elaborate on what you mean here and be as explicit as possible. But based on your figures I think I understand what you are trying to achieve. Here are a few example snippets I use to find maxima (or peaks) using Images.jl. I have run them on your data to match your figures maxima = findlocalmaxima(imfilter(y, KernelFactors.gaussian((3))))
maxima = [getindex(max, 1) for max in maxima]
f = plot(x, y)
scatter!(x[maxima], y[maxima]) or you could use Images.jl mapwindow with an additional line to implement your maxima = unique(mapwindow(maximum, y, 101, border="replicate"))
idxs = [findfirst(y .== max) for max in maxima]
[i[2] == 1 ? y[idxs[i[1]-1]] < y[idxs[i[1]]] ? idxs[i[1]-1] = idxs[1] : 0 : 0 for i in enumerate([2; diff(idxs)])]
f = plot(x, y)
scatter!(x[newidx], y[newidx]) I am sure @timholy has cleaner examples of how to use mapwindow on one dimensional data, but this should illustrate that peak finding is already implemented in a robust existing package. Images.jl doesn't have the additional arguments you have like min_height, but these are just one liners as illustrated above. There have also been several other packages to produce maxima or minima filters (MinMaxFilter.jl, MaxMinFilter.jl, RollingFilters.jl, LocalFilters.jl, etc) and see nice discussion here. Further, the images.jl code works in multiple dimensions. But I guess the problem is that people aren't managing to find the Images.jl functions or cant see how it can be used to achieve 1 dimensional filtering (see also #343). So maybe it would be handy to point people in the right direction? This could be something for the new website (https://github.com/JuliaDSP/juliadsp.github.io/issues/1#issuecomment-647051107)? Alternatively I may have misunderstood what you are trying to achieve, so please feel free to point out where I have got mixed up. And where your proposal extends beyond Images.jl. I am not opposed to adding a peak finding function, but I think it needs to be clear how it provides additional functionality over what's capable with Images.jl. |
I think this is similar to Matlab's |
Ok so there is some specific behaviour that this PR implements. It would be handy to provide an example which highlights the benefits of this approach and under what situations this method is advantageous over existing implementations, this isn’t evident from the example provided in this PR. Also, I’m just curious now, maybe I should be using this prominence approach rather than my snippets. |
Thanks for sharing your approach with Images.jl @rob-luke and for the links. As @galenlynch points out the prominence approach is quite powerful - it is also the reason why I implemented the code in this PR. I think there are two questions that have to be decided on:
My own thoughts on these are: |
I think it's appropriate, and the interface is intuitive. Thanks for submitting the PR, I am almost finished with my review of it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @tungli for the useful and very well documented and tested contribution, I think DSP.jl would be a great home for this function. I also agree that its presence in both SciPy and Matlab indicates that this is definitely a useful tool to analyze both signals and spectra.
While this code is already very impressive, there are some aspects of its API that I would like to change, or at least discuss, before it becomes part of DSP.jl. Since we are committing to maintaining that API for a while, I think it makes sense to invest some time in deciding what API we want. I understand from your initial comments that you may not have the time to work on this PR very much, but I think these changes would not take much time. If you still do not have the time to make these changes, then perhaps we might be able to make those change.
Please see the review comments for more details, but the main points that I think we should discuss about the API are
- handling of plateaus
- default values for filtering parameters
- type signature of
findpeaks
- sorting of results
- inclusion/exclusion of peaks that exactly match the "minimal" conditions
- handling of empty arrays
Thanks once again for submitting your package to DSP.jl. It seems like people are already using your package quite a bit, and I for one would be happy to have it as part of DSP.jl.
src/findpeaks.jl
Outdated
min_height :: T = minimum(y), | ||
min_prom :: T = zero(y[1]), | ||
min_dist :: S = zero(x[1]), | ||
threshold :: T = zero(y[1]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Each of these lines will throw an error if
y
is empty - The type signature is pretty restrictive. For example, if
y
is aVector{Float64}
, you cannot setmin_height
,min_prom
, orthreshold
to0
or1
. Does it really need to be that restrictive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As for the first point, if we don't want to throw on empty data, this might be better:
function findpeaks(
y :: AbstractVector{T},
x :: AbstractVector{S} = collect(1:length(y))
;
kwargs...
) where {T <: Real, S}
if isempty(y)
return empty(x)
end
min_height = get(kwargs, :min_height, minimum(y))
min_prom = get(kwargs, :min_prom, zero(y[1]))
min_dist = get(kwargs, :min_dist, zero(x[1]))
threshold = get(kwargs, :threshold, zero(y[1]))
dy = diff(y)
...
""" | ||
`findpeaks(y::Array{T}, | ||
x::Array{S}=collect(1:length(y)) | ||
;min_height::T=minimum(y), min_prom::T=minimum(y), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the default in the documentation for min_prom
does not match the actual default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I noticed just after submitting the PR. The correct version should be in the issue above.
*Keyword*:\n | ||
`min_height` -- minimal peak height\n | ||
`min_prom` -- minimal peak prominence\n | ||
`min_dist` -- minimal peak distance (keeping highest peaks)\n |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And if the peaks are the same size?
*Optional*:\n | ||
`x` -- x-data\n | ||
*Keyword*:\n | ||
`min_height` -- minimal peak height\n |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find "minimal" ambiguous here. What happens if the value is on the threshold? It looks like peaks that are exactly the same as any of these minimal criteria are excluded from the results. The name "min_height" implies, at least to me, that it would be the minimum value of the corresponding property of the result set, and should therefore be inclusive (i.e. values that exactly match the threshold should be included in the results).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So what do you suggest here?
`min_height` -- minimal peak height\n | ||
`min_prom` -- minimal peak prominence\n | ||
`min_dist` -- minimal peak distance (keeping highest peaks)\n | ||
`threshold` -- minimal difference (absolute value) between |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by absolute value here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does not matter whether the value is negative or positive:
`threshold` -- minimal difference (absolute value) between | |
`threshold` -- minimal absolute value of difference between |
... but perhaps there is a better (single) word for it
src/findpeaks.jl
Outdated
x::Array{S}=collect(1:length(y)) | ||
;min_height::T=minimum(y), min_prom::T=minimum(y), | ||
min_dist::S=0, threshold::T=0 ) where {T<:Real,S}`\n | ||
Returns indices of local maxima (sorted from highest peaks to lowest) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After looking over this PR, it seems like you're returning the peak indices sorted by their value because of the way you filter to satisfy min_dist
. I personally think it makes more sense to return the indices in the order that they occur, which is also what Matlab does.
@@ -0,0 +1,74 @@ | |||
using DSP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Could you please add a test with empty inputs?
- Could you please add a test where
x
andy
have differing lengths? In that case, I thinkfindpeaks
should through an informativeArgumentError
- Many of these tests rely on the RNG. Could you please seed the RNG inside each
@testset
that uses it to ensure the result of the tests aren't themselves a random variable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I will add those, but as for the RNG: there is a seed!
in runtests.jl already and other tests that do use rand
do not have their own seeds. Is this necessary then?
src/findpeaks.jl
Outdated
;min_height::T=minimum(y), min_prom::T=minimum(y), | ||
min_dist::S=0, threshold::T=0 ) where {T<:Real,S}`\n | ||
Returns indices of local maxima (sorted from highest peaks to lowest) | ||
in 1D array of real numbers. Similar to MATLAB's findpeaks().\n |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way plateaus are treated by this function differs from both Matlab or SciPy. If the input signal contains a number of adjacent indices with the same value that is flanked with lower values, then each of the indices in the "plateau" will be considered a peak. E.g. findpeaks([0, 1, 1, 1, 1, 1, 0])
returns [2, 3, 4, 5, 6]
. In contrast, Matlab would return the first index of the plateau, and SciPy would return the middle index (rounded down if an even number of points). Of these behaviors, I think SciPy's makes the most sense. I think the current behavior should be changed to match SciPy's.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returning the middle index also seems to me to be more reasonable than the first index.
But doing so loses information about the fact that there is a plateau there and, depending on the nature of data, technically not a peak. Also, for non-equally spaced x
, the "middle" may not be so much a "middle" anymore.
Looking at scipy's find_peaks I also noticed they have a plateau_size
parameter and a corresponding output.
This would be nice to have - then returning the middle index is OK, since there is also output about the size.
If you'll agree I could add this.
|
||
""" | ||
`findpeaks(y::Array{T}, | ||
x::Array{S}=collect(1:length(y)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strange things happen if x
is, for example, all zeros. For many of the default filtering values below, I assume using the default value means "do not filter". However, in this (corner) case filtering is still done. I think using nothing
as a default value for keyword parameters to indicate that no filtering should be done would make sense.
`min_dist` -- minimal peak distance (keeping highest peaks)\n | ||
`threshold` -- minimal difference (absolute value) between | ||
peak and neighboring points\n | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some information should be given about the behavior of findpeaks
if the input array contains NaN
s. Right now it seems like no peaks are returned in that case. SciPy's find_peaks
has a disclaimer that the function will not behave as intended when operating on NaN
s, which I think is sufficient.
src/findpeaks.jl
Outdated
peaks = peaks[y[peaks] .> min_height] | ||
yP = y[peaks] | ||
|
||
peaks = with_distance(peaks, x, y, min_dist) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
x
and y
should be checked to be the same length.
Thank you for the review @galenlynch! I really appreciate it. |
No worries on the delay, thanks once again for the great PR! |
@galenlynch and others, I apologize for the huge delay. I changed a few things that revolve around collecting additional information about peaks during the search. I felt this change was necessary mainly because of the plateau handling (see comment above -> returning only one point of the plateau would result in loss of information about the plateau) and it is also a nice and, more or less, free benefit to have the peak information available. I have also included every one of @galenlynch's comments, but this one:
which I need to think through a bit more. It seems to me that having all zero |
I still have doubts about the plateau handling, because the plateau filtering gets cancelled out by the threshold filtering and vice versa. This also seems to be the case in scipy. Perhaps a better version would treat every plateau as a single peak. For this I think the modification will need to be the following:
The last point gets rid of the where-in-the-plateau-is-the-peak? issue - every peak will be treated as a range (included in the Minimal distance combined with plateaus will also probably be more meaningful. |
0e17569
to
fc628f4
Compare
As an interested party (author of Peaks.jl), I will throw in my (biased) 2-cents: If I am correctly following the functionality being added/requested, the goal is to:
I will note that the latest Peaks.jl version handles Regardless, on the subject of API bikeshedding, I think that factoring out the different features (peak finding, calculation and filtering by prominence, height, and threshold) into separate functions would present a simpler and more flexible interface (eg something along the lines of I also think that structs being returned would unnecessarily complicate the user's handling of the returned data. In most cases, users will only need one or two of the fields in I'm also curious how the performance compares to equivalent Peaks.jl or Images.jl functions; somewhat dated benchmarking I have previously done showed similar performance scaling between Peaks.jl and Images.jl, with a slight edge towards my package. |
This pull request is very interesting. Is there any way to solve the 2D case? I'm looking for the Julia version of skimage.feature.peak_local_max but it seems this may not exist yet..? |
using Images
help?> findlocalmaxima
search: findlocalmaxima findlocalminima
findlocalmaxima(img, [region, edges]) -> Vector{CartesianIndex}
Returns the coordinates of elements whose value is larger than all of their immediate neighbors. region is a list of dimensions to consider. edges is a boolean specifying whether to include the first
and last elements of each dimension, or a tuple-of-Bool specifying edge behavior for each dimension separately. |
This seems like interesting functionality. I am wondering when it may be available from within DSP.jl? |
Would it make sense to update the documentation in DSP.jl to refer to the Peaks.jl package? |
I agree that referencing Peaks.jl in DSP.jl documentation would be useful. |
Introduction
findpeaks
is a function which extracts filtered local maxima based on a few parameters.It is often needed in data processing related to analysis of 1D spectra or other noisy signal data where finding maxima (or minima) is a routine task.
This function exists in MATLAB (Signal Processing Toolbox) and Python (scipy).
Contents
This PR contains source code, tests, docs changes, a data file and two pictures, all related to the
findpeaks
function.Motivation
It was recently suggested to me to by @cormullion that a repository I created a long time ago would be better placed in the DSP package.
I base this assumption on the following active issue: #303
Notes
I am certain that the code I am proposing to you for review may be further optimized (possibly through use of lazy iterators and by minimizing the number of passes through the data vector) and its functionality expanded.
Sadly, I am not an active Julia user right now nor have the time to implement the necessary changes at the moment.
Also there is another package with similar functionality:
https://github.com/halleysfifthinc/Peaks.jl
Thank you for consideration!