
Suggesting to add a category of Data Mining in AI #5402

Open
jundongl opened this issue Dec 5, 2022 · 8 comments

Comments

@jundongl

jundongl commented Dec 5, 2022

I am very disappointed that the KDD conference was simply removed from the "Machine Learning & Data Mining" category as a default conference.

Data mining is a large and thriving research community, and CSrankings should list the top conferences in this area.

According to Google Scholar Metrics, the top three conferences in data mining are KDD, WSDM, and ICDM:
https://scholar.google.com/citations?hl=en&view_op=search_venues&vq=data+mining&btnG=

I am suggesting adding an independent category of Data Mining in the AI areas and listing these three conferences.

@canyuchen

canyuchen commented Dec 5, 2022

Or perhaps include KDD, WSDM, and CIKM in the independent Data Mining category, according to Google Scholar Metrics:

https://scholar.google.com/citations?hl=en&view_op=search_venues&vq=data+mining&btnG=

https://scholar.google.com/citations?hl=en&view_op=search_venues&vq=ACM+International+Conference+on+Information+and+Knowledge+Management&btnG=

@acbull
Contributor

acbull commented Dec 5, 2022

I totally agree that CSrankings should not simply change "Machine Learning and Data Mining" into "Machine Learning" while still keeping KDD in this category. This is not only because the data mining community is thriving, but also because the problems the DM community focuses on are not limited to applied ML.

This debate dates back to the historical issue #238.
In that issue, the main argument for not separating "Machine Learning" and "Data Mining" was that data mining is regarded as "applied machine learning".

However, most data mining researchers would not agree with this argument, just as researchers in the Natural Language Processing community would not agree that NLP is "applied ML for text", and researchers in the Computer Vision community would not agree that CV is "applied ML for vision".

Take the KDD 2022 program agenda as an example. The sessions include:

Graphs and Networks
Causal Analysis and Explainability
Data Privacy, Ethics and Data Science for Society
Information Security
Anomaly Detection
Spatiotemporal Data
Time Series and Streaming Data
Data Cleaning, Transformation and Integration
Online Learning and Transfer Learning
Clustering, Imbalanced Data and Tensors
User Modeling, Knowledge and Ontologies, Web and Commerce
Recommendation Systems
Interdisciplinary Applications (including Biology, Climate, Physics, Medical, and Social Good)

Among them, many topics are unique to the research field of data mining, with their own problem scope and philosophy, rather than being merely applied ML. Many of the papers published in this field (including KDD, ICDM, WSDM, CIKM, RecSys, ...) are not only about ML (i.e., learning parametric models from data), but also relate to statistics, data science, data structures, data infrastructure, system design, novel findings obtained by mining data, etc.

Based on this, I highly advocate adding another category as "Data Mining", similar to the current "Natural Language Processing" and "Computer Vision", as they are three equally important sub-categories under the AI category.

Regarding which conferences should be included in a newly added "Data Mining" subcategory, I think we could base the choice on Google Scholar citation metrics at the beginning. In the long term, though, I still think it would be much more convincing to hold a poll and let the whole DM community vote for the top three venues (similar suggestions have been raised in #4683 (comment)). Within the same research subcategory, voting is reasonable, as the average acceptance rates and paper counts of the different conferences are similar.

@yzhao062

yzhao062 commented Dec 5, 2022

Agreed with the point here. As data mining researchers, we have already faced the issue that only KDD is counted in CSrankings, while WSDM and ICDM are not. I would also recommend adding a new category under AI called Data Mining, including KDD, WSDM, and ICDM. This seems a better fix than removing KDD from ML...

@lalalandlala

Oh c’mon, KDD is already a second-tier conference nowadays, so why do we need to include even worse ones like WSDM or CIKM?? They are not representative; most top schools don’t care about those at all. MIT won’t go to KDD, and neither will Berkeley. So let it be!

@jiank2

jiank2 commented Dec 5, 2022

> Oh c’mon, KDD is already a second-tier conference nowadays, so why do we need to include even worse ones like WSDM or CIKM?? They are not representative; most top schools don’t care about those at all. MIT won’t go to KDD, and neither will Berkeley. So let it be!

KDD has papers from CMU, Stanford, UIUC, UW, Cornell, Georgia Tech, UCLA, UCSD, etc. Are you saying that they are not top schools in CS? Oh btw, KDD does have papers from MIT and Berkeley :-)

@lalalandlala

> > Oh c’mon, KDD is already a second-tier conference nowadays, so why do we need to include even worse ones like WSDM or CIKM?? They are not representative; most top schools don’t care about those at all. MIT won’t go to KDD, and neither will Berkeley. So let it be!
>
> KDD has papers from CMU, Stanford, UIUC, UW, Cornell, Georgia Tech, UCLA, UCSD, etc. Are you saying that they are not top schools in CS? Oh btw, KDD does have papers from MIT and Berkeley :-)

Everyone has some papers that are not good enough, and those papers go to KDD if they are a good fit. No one from Stanford, MIT, Berkeley, or UW has averaged at least one paper per year there over the past decade. Either people in the DM area are no longer active, or those schools don’t recruit people in DM. Only people from UIUC, CMU, and Cornell are still active in this area. Half of the top institutions don’t care about what’s going on, indicating it’s already a dead area.

@kno10
Contributor

kno10 commented Dec 6, 2022

Most KDD papers are machine learning and deep learning now. When was the last frequent itemset mining paper at KDD? It's all deep neural networks now, isn't it?
So there is some truth to this decision, IMHO.
If KDD no longer differentiates itself from NeurIPS and ICML, why should CSRankings?

@lalalandlala

> Most KDD papers are machine learning and deep learning now. When was the last frequent itemset mining paper at KDD? It's all deep neural networks now, isn't it?
> So there is some truth to this decision, IMHO.
> If KDD no longer differentiates itself from NeurIPS and ICML, why should CSRankings?

NLP is all deep neural networks now, CV is all deep neural networks now, I suggest we merge both of them into ML.

BERT is from the NLP community, and ResNet is from the vision community. Who the hell cares what the data mining field is doing?


7 participants