SIGKDD Awards

2005 SIGKDD Service Award: The WEKA Team

2005 SIGKDD Service Award Winner

For their development of the freely available Weka Data Mining Software, including the accompanying book Data Mining: Practical Machine Learning Tools and Techniques (now in its second edition) and much other documentation.

The Weka team includes Ian H. Witten and Eibe Frank, and the following major contributors (in alphabetical order of last names): Remco R. Bouckaert, John G. Cleary, Sally Jo Cunningham, Andrew Donkin, Dale Fletcher, Steve Garner, Mark A. Hall, Geoffrey Holmes, Matt Humphrey, Lyn Hunt, Stuart Inglis, Ashraf M. Kibriya, Richard Kirkby, Brent Martin, Bob McQueen, Craig G. Nevill-Manning, Bernhard Pfahringer, Peter Reutemann, Gabi Schmidberger, Lloyd A. Smith, Tony C. Smith, Kai Ming Ting, Leonard E. Trigg, Yong Wang, Malcolm Ware, and Xin Xu.

The Weka team has put a tremendous amount of effort into continuously developing and maintaining the system since 1994. The development of Weka was funded by a grant from the New Zealand Government's Foundation for Research, Science and Technology.

The key features responsible for Weka's success are:

  • it provides many different algorithms for data mining and machine learning
  • it is open source and freely available
  • it is platform-independent
  • it is easy to use for people who are not data mining specialists
  • it provides flexible facilities for scripting experiments
  • it has been kept up to date, with new algorithms being added as they appear in the research literature.

The Weka Data Mining Software has been downloaded 200,000 times since it was put on SourceForge in April 2000, and is currently downloaded at a rate of 10,000/month. The Weka mailing list has over 1100 subscribers in 50 countries, including subscribers from many major companies.

There are 15 well-documented, substantial projects that incorporate, wrap, or extend Weka, and no doubt many more that have not been reported on SourceForge.

Ian H. Witten and Eibe Frank also wrote a very popular book, "Data Mining: Practical Machine Learning Tools and Techniques" (now in its second edition), that seamlessly integrates the Weka system into the teaching of data mining and machine learning. In addition, they provided excellent teaching material on the book website.

This book became one of the most popular textbooks for data mining and machine learning, and is very frequently cited in scientific publications.

Weka is a landmark system in the history of the data mining and machine learning research communities, because it is the only toolkit that has gained such widespread adoption and survived for an extended period of time (the first version of Weka was released 11 years ago). Other data mining and machine learning systems that have achieved this are individual systems, such as C4.5, not toolkits.

Since Weka is freely available for download and offers many powerful features (sometimes not found in commercial data mining software), it has become one of the most widely used data mining systems. Weka also became one of the favorite vehicles for data mining research and helped to advance it by making many powerful features available to all.

In sum, the Weka team has made an outstanding contribution to the data mining field.

Copyrights © 2016 All Rights Reserved - SIGKDD