This International Workshop combines introductory, industry-relevant, R&D, mathematical, and philosophical perspectives on Data Mining, Soft Computing, and Rough and Fuzzy Sets. The presenters' interests span practical applications, research and development, and academic aspects. This document describes the talks and the invited presenters.
Rajendra Akerkar is a Senior Scientist at West Norway Research Institute, Norway. He is also Chairman of the Technomathematics Research Foundation, India. His research and teaching assignments have taken him around the world to Germany, Japan, Spain, Holland, Norway, Austria, Canada, Vietnam, and Armenia. He received the BOYSCAST (Young Scientist) award from the Government of India, and was a recipient of the prestigious DAAD (Germany) fellowship, DAAD Visiting Professorship and UNESCO-TWAS Associateship. He serves as the editor-in-chief of International Journal of Computer Science & Applications and Journal of Hybrid Computing Research, is on the editorial board of numerous other computer science journals, and serves as a program committee member for various international conferences on data mining, semantic systems and cognitive technologies. He has authored more than 80 articles in various international journals and conferences, and authored/co-authored 9 books. His areas of interest broadly include intelligent systems, semi-structured data, data mining, and the semantic Web.
Abstract: Data mining has been a subject of considerable interest in both academia and industry. Data mining refers to a set of techniques designed to efficiently find interesting pieces of information or knowledge in large amounts of data. Association rules, for instance, are a class of patterns that tell which products tend to be purchased together. Drawing on machine learning, statistics, and operations research, this technology of knowledge discovery now represents a vital tool to assist in intelligent decision making in the highly complex business environment. This talk gives a brief introduction to the data mining process and explores how this interdisciplinary field brings together techniques from databases, statistics, machine learning, and information retrieval. The talk reviews the main data mining methods currently used, including clustering, classification, and association rule techniques. Some applications and trends will also be discussed.
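The association rules mentioned in the abstract can be illustrated with a minimal sketch. It computes the two usual rule measures, support and confidence, over a handful of market baskets. The transactions, the thresholds, and the function names below are invented for illustration and are not taken from the talk.

```python
from itertools import combinations

# Hypothetical toy transactions (invented for illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter", "jam"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) over the transactions."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def pair_rules(transactions, min_support=0.4, min_confidence=0.7):
    """Return rules lhs -> rhs between single items that meet both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        for lhs, rhs in ((a, b), (b, a)):
            s = support({lhs, rhs}, transactions)
            if s < min_support:
                continue  # the pair is too rare to be interesting
            c = confidence({lhs}, {rhs}, transactions)
            if c >= min_confidence:
                rules.append((lhs, rhs, s, c))
    return rules

for lhs, rhs, s, c in pair_rules(transactions):
    print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={c:.2f}")
```

Practical miners such as Apriori prune the search over larger itemsets; this sketch only enumerates pairs to keep the two measures visible.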
Introduction to Rough and Fuzzy sets
Pawan Lingras is a Professor in the Department of Mathematics and Computing Science at Saint Mary’s University, Halifax, Canada. His undergraduate education at IIT Bombay was followed by graduate studies at the University of Regina, Canada. He has authored more than 140 research papers in various international journals and conferences. He has also co‐authored a textbook and co‐edited a collection of research papers. His areas of interest include artificial intelligence, information retrieval, data mining, web intelligence, and intelligent transportation systems. He has served as review committee chair, program committee member, and reviewer for various international conferences on artificial intelligence and data mining.
Abstract: Conventional data mining techniques are based on crisp logic, statistics, and probabilistic theories. Sometimes, the axiomatic limitations of traditional mathematics can make it awkward to apply these techniques in practical applications. Fuzzy sets, introduced in 1965, made it possible to allow partial membership of an object in a set. This flexibility led to the development of data mining techniques that make it possible to conduct supervised and unsupervised learning from datasets, as well as to identify fuzzy patterns and predict future trends. Rough set theory was introduced in 1982, and provides a less descriptive and complementary alternative to fuzzy set theory. Rough sets allow for multi-level memberships of objects in sets. Researchers have developed rough alternatives to almost all the data mining techniques. This talk will provide a general introduction to both rough and fuzzy set theory that will be helpful for understanding the subsequent talks.
Pawan Lingras will also provide a general introduction to all the talks and presenters at the beginning of the workshop.
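As a rough illustration of the two ideas in this abstract, the sketch below computes the standard lower and upper approximations of a target set (objects with identical attribute values are treated as indiscernible) and a simple triangular fuzzy membership degree. The patient data, attribute values, and function names are hypothetical, and the triangular shape is only one common choice of membership function.

```python
from collections import defaultdict

def approximations(objects, target):
    """Rough set approximations of target.

    objects maps an object id to a tuple of attribute values; objects with
    the same tuple are indiscernible. Returns (lower, upper) as sets of ids.
    """
    classes = defaultdict(set)
    for obj, description in objects.items():
        classes[description].add(obj)
    lower, upper = set(), set()
    for block in classes.values():
        if block <= target:   # every member is in the target: certainly in
            lower |= block
        if block & target:    # some member is in the target: possibly in
            upper |= block
    return lower, upper

def triangular_membership(x, left, peak, right):
    """Degree in [0, 1] to which x belongs to a triangular fuzzy set."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

# Hypothetical patients described by (temperature, cough).
patients = {
    "p1": ("high", "yes"),
    "p2": ("high", "yes"),
    "p3": ("normal", "no"),
    "p4": ("high", "no"),
    "p5": ("normal", "no"),
}
flu = {"p1", "p3", "p4"}

lower, upper = approximations(patients, flu)
print(sorted(lower))          # ['p4']
print(sorted(upper))          # ['p1', 'p2', 'p3', 'p4', 'p5']
print(sorted(upper - lower))  # boundary region: ['p1', 'p2', 'p3', 'p5']
print(triangular_membership(22, 15, 25, 35))
```

An object in the lower approximation certainly belongs to the set, one in the boundary region (upper minus lower) only possibly belongs, and a fuzzy set instead attaches a graded degree to each object.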
Rough sets application to data warehousing: a commercial success story (Live Online)
Dominik Slezak received his PhD in Computer Science in 2002 from Warsaw University, Poland. In 2005, he co‐founded Infobright Inc., where he is currently working as chief scientist. He is also an adjunct professor at McMaster University, York University, and University of Regina, as well as at the Polish‐Japanese Institute of Information Technology. Dominik serves as an associate editor and reviewer for a number of international scientific journals, and as chair of several international scientific conferences. He has published over 50 peer‐reviewed papers in books, journals, and conference proceedings. He has delivered a number of invited talks in Canada, China, Czech Republic, Egypt, India, Japan, Korea, Poland, Russia, Singapore, UK, and US. His research interests are related mainly to rough sets, data warehousing, data mining, KDD, bioinformatics, as well as medical and multimedia data.
Abstract: The theory of rough sets provides a powerful model for representation of patterns and dependencies, applicable both in databases and data mining. Although there are numerous rough set applications to data mining and knowledge discovery, the usage of rough sets inside database engines is still quite an uncharted territory. This situation is not exceptional, however, given that even the most well-known paradigms of machine learning, soft computing, artificial intelligence, and approximate reasoning are still waiting for more recognition in database research.
Rough set-based algorithms and similar techniques can be applied to improve database performance in several ways. We focus on the idea of using available information to calculate rough approximations of data needed to resolve queries and to assist the database engine in accessing relevant data. We partition data into rough rows, each consisting of 64K original rows. We automatically label rough rows with compact information about their values on data columns, often involving multi-column and multi-table relationships. One may say that we create new information systems where objects correspond to rough rows and attributes correspond to various flavours of rough information.
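The idea of labelling rough rows with compact information can be sketched as follows. The sketch assumes, purely for illustration, that the rough information for a numeric column is just the minimum and maximum of each rough row; the abstract does not say which statistics the actual product stores, and all names below are hypothetical.

```python
ROUGH_ROW_SIZE = 64 * 1024  # rows per rough row, as stated in the abstract

def build_rough_rows(column, size=ROUGH_ROW_SIZE):
    """Split a column into consecutive packs and record (min, max, pack)."""
    packs = [column[i:i + size] for i in range(0, len(column), size)]
    return [(min(p), max(p), p) for p in packs]

def count_greater_than(rough_rows, threshold):
    """Count values > threshold, reading only packs that min/max cannot decide.

    Returns (count, number_of_packs_scanned).
    """
    total, scanned = 0, 0
    for lo, hi, pack in rough_rows:
        if hi <= threshold:
            continue                # irrelevant: no row can qualify
        if lo > threshold:
            total += len(pack)      # fully relevant: every row qualifies
        else:
            scanned += 1            # suspect: the rows must be inspected
            total += sum(v > threshold for v in pack)
    return total, scanned

# Small demonstration with packs of 10 rows instead of 64K.
column = list(range(100))
rough_rows = build_rough_rows(column, size=10)
count, scanned = count_greater_than(rough_rows, 54)
print(count, scanned)  # 45 1
```

In rough set terms, the fully relevant packs play the role of a lower approximation of the answer, the suspect packs form the boundary region, and only the boundary requires access to the underlying data.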
In this talk, we show how the above ideas guided us toward implementing a fully functional data warehouse product, with interfaces provided via integration with MySQL and internals based on the newest database trends. Thanks to compact, flexible rough information, we became especially competitive in the field of analytical data warehouses, where users want to query terabytes of data in a complex, dynamically changing way. Recently, we announced at www.infobright.org the open source edition of our data warehouse, ready for free usage and further extensions. In the talk, we illustrate the best scenarios for applying our software to various aspects of data processing. We also discuss the most promising directions for further improvement of our technology, with special attention to ideas based on the theory of rough sets and corresponding techniques.
Applications of rough and fuzzy hybridizations to bioinformatics and biomedicine
Sushmita Mitra is a Professor at the Machine Intelligence Unit, Indian Statistical Institute, Kolkata. Dr. Mitra received the National Talent Search Scholarship (1978-1983) from NCERT, India, the IEEE TNN Outstanding Paper Award in 1994 for her pioneering work in neuro-fuzzy computing, and the CIMPA-INRIA-UNESCO Fellowship in 1996. She is the author of three books and more than 75 research publications in refereed international journals, and has been associated with the editing of books and journals. She is listed as one of the top 100 Women Scientists in Lilavati's Daughters: The Women Scientists of India, published by the Indian Academy of Sciences in 2008. She has served as Program Chair, Tutorial Chair, Plenary Speaker, and member of the programme committees of many international conferences. Her current research interests include data mining, pattern recognition, soft computing, image processing, and bioinformatics.
Abstract: In this talk we cover some of the hybridizations of rough sets with neural networks, fuzzy sets and genetic algorithms, in the broader framework of soft computing. Applications are presented for knowledge encoding, rule extraction, dimensionality reduction, biclustering, and segmentation. Results demonstrate the suitability of the methodologies for feature selection with improved recognition, in diverse domains such as microarray gene expressions for bioinformatics and face recognition. Segmentation of CT scan images of the infarcted regions of the brain also exhibits superior results.
Rough sets applications to biological and agricultural applications
Sonajharia Minz is a Professor of Computer & Systems Sciences at the Jawaharlal Nehru University in New Delhi. Dr. Minz has been working on topics related to rough set theory since 2002, having guided 2 PhD and 8 M.Tech projects. Her research focuses on issues relating to granular computing for data mining, along with the application of rough set theory with machine learning techniques. Dr. Minz has published widely on the application of rough sets in bioinformatics.
Perspectives of granular computing: past and future research directions (Live Online)
Yiyu Yao is a Professor of computer science in the Department of Computer Science, University of Regina, Regina, Saskatchewan, Canada. His research interests include information retrieval, rough sets, interval sets, granular computing, Web intelligence, data mining and fuzzy sets. He has published over 200 journal and conference papers. He is an area editor of International Journal of Approximate Reasoning, and a member of the editorial boards of the Web Intelligence and Agent Systems journal, Transactions on Rough Sets, Journal of Intelligent Information Systems, Journal of Chongqing University of Posts and Telecommunication, The International Journal of Cognitive Informatics & Natural Intelligence (IJCiNi), and International Journal of Software Science and Computational Intelligence (IJSSCI). He has served and is serving as a program co-chair of several international conferences. He is a member of ACM and IEEE.
Abstract: Granular computing has emerged as a new multidisciplinary study and has received much attention in recent years. A conceptual framework is presented by extracting shared commonalities from many fields. The framework stresses multiple views and multiple levels of understanding in each view. It is argued that granular computing is more about a philosophical way of thinking and a practical methodology of problem solving. By effectively using levels of granularity, granular computing provides a systematic, natural way to analyze, understand, represent, and solve real world problems. With granular computing, one aims at structured thinking at the philosophical level, and structured problem solving at the practical level.
Rough Clustering and Its Dynamic Extension
Georg Peters is a Professor in the Department of Computer Sciences and Mathematics at University of Applied Sciences - Muenchen, Munich, Germany. He received diploma degrees (equivalent to master's degrees) in electrical engineering, industrial engineering and business administration from RWTH Aachen University. He also obtained a PhD in the field of intelligent data analysis from the same university. He has published more than 40 papers in the fields of information systems and soft computing. Currently, his interests include applications of soft computing concepts, in particular rough sets.
Abstract: Since its introduction by Lingras, rough clustering has gained increasing attention. As in original rough set theory, rough clustering uses the concept of two approximations, a lower and an upper one, to define a cluster. In recent years it has been successfully applied to several real-life applications. Recently, a dynamic version of rough clustering was suggested which adapts to changing data structures. The presentation gives an overview of rough clustering approaches and discusses areas of application. Then dynamic rough clustering is introduced.
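To make the two approximations of a cluster concrete, here is a minimal sketch of a single assignment step in the spirit of rough k-means. It assumes one common formulation, in which a point goes into the lower approximation of its nearest cluster unless another centroid is almost as close (within a distance threshold), in which case it goes only into the upper approximations of all such clusters. The threshold rule, names, and data are assumptions for illustration; the talk may use a different variant, and centroid updates are omitted.

```python
import math

def rough_assign(points, centroids, threshold):
    """Assign points to lower/upper approximations of clusters.

    Returns (lower, upper): two lists holding one set of point indices
    per cluster.
    """
    k = len(centroids)
    lower = [set() for _ in range(k)]
    upper = [set() for _ in range(k)]
    for idx, p in enumerate(points):
        dists = [math.dist(p, c) for c in centroids]
        nearest = min(range(k), key=dists.__getitem__)
        # Other clusters whose centroid is nearly as close as the nearest one.
        close = [j for j in range(k)
                 if j != nearest and dists[j] - dists[nearest] <= threshold]
        if close:
            # Ambiguous point: it sits in the boundary of several clusters.
            for j in [nearest, *close]:
                upper[j].add(idx)
        else:
            # Clear-cut point: a certain member of exactly one cluster.
            lower[nearest].add(idx)
            upper[nearest].add(idx)
    return lower, upper

points = [(0, 0), (1, 0), (5, 0), (9, 0), (10, 0)]
centroids = [(0.5, 0), (9.5, 0)]
lower, upper = rough_assign(points, centroids, threshold=1.0)
print(lower)  # [{0, 1}, {3, 4}]
print(upper)  # [{0, 1, 2}, {2, 3, 4}]
```

The point midway between the two centroids belongs to both upper approximations and to neither lower approximation, which is exactly the overlap that a crisp clustering cannot express.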
An Evaluation of Result Merging Models in Metasearch
Dr. Vijay Raghavan is the Distinguished Professor of Computer Science at the Center for Advanced Computer Studies and a co-director of the Laboratory for Internet Computing. His research interests are in data mining, information retrieval, machine learning and Internet computing. He has published over 170 peer-reviewed research papers, many of which appear in top-level journals and proceedings, that cumulatively accord him an h-index of 21, based on citations. He has served as major advisor for 20 doctoral students and has garnered $8 million in external funding. Dr. Raghavan brings substantial technical expertise, interdisciplinary collaboration experience, and management skills to his projects. His service work at the university includes coordinating the Louis Stokes-Alliance for Minority Participation (LS-AMP) program. From 1997 to 2003, he worked closely with the USGS National Wetlands Research Center and with the Department of Energy's Office of Science and Technical Information on a digital library with data mining capabilities incorporated. He chaired the IEEE International Conference on Data Mining in 2005 and received the ICDM 2005 Outstanding Service Award. He is a member of the Advisory Committee of the NSF Computer and Information Science and Engineering directorate. Dr. Raghavan was honored as the Grand Marshal for the Fall-2008 Graduate School Commencement Exercises at UL Lafayette.
Abstract: Search engines queried by a metasearch engine return results in the form of a ranked list of documents. The key issue is to combine these lists to achieve the best performance. In our work, we apply fuzzy aggregation operators to result merging. Our work is an extension of Yager's fuzzy Ordered Weighted Average (OWA) operator based result merging model proposed by Diaz et al. We propose three extensions to the OWA model for metasearch. These are the Importance Guided OWA (IGOWA), the algebraic t-norm OWA, and the algebraic t-norm IGOWA models. While the first two are based on Yager's extension of the OWA operator, the third is a combination of the first two.
The first model (IGOWA) allows weights to be applied to search engine result lists. The second model (t-norm OWA) allows for alternative t-norm functions to be used in aggregation. The third t-norm IGOWA model allows for both. In our work, for the second and third models we use the algebraic (product) t-norm. We compare and contrast our models and also compare them with existing models such as the OWA model for metasearch proposed by Diaz et al. and the Borda-Fuse model proposed by Aslam and Montague.
Two of our models, the algebraic t-norm IGOWA model and the IGOWA model, require search engine weights. Thus we develop a new scheme for obtaining search engine weights. We apply our scheme to the above models and observe that using our weighting scheme results in improved result merging.
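For readers new to the OWA operator, the sketch below shows the plain operator and a very simple way of using it to merge ranked lists. Only the operator itself (weights applied to the values after sorting them in descending order) follows the standard definition. The positional scoring, the zero score for documents an engine did not return, and all names are assumptions made for illustration; this is not the IGOWA or t-norm model described in the abstract.

```python
def owa(values, weights):
    """Ordered weighted average: weights apply to values sorted descending."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))

def positional_scores(ranked_list, universe):
    """Assumed scoring: top document gets len(universe) points, the next one
    point fewer, and so on; documents missing from the list get 0."""
    n = len(universe)
    scores = {doc: 0.0 for doc in universe}
    for position, doc in enumerate(ranked_list):
        scores[doc] = float(n - position)
    return scores

def merge(result_lists, weights):
    """Fuse ranked lists by applying OWA to each document's per-engine scores."""
    universe = sorted({doc for lst in result_lists for doc in lst})
    per_engine = [positional_scores(lst, universe) for lst in result_lists]
    fused = {doc: owa([s[doc] for s in per_engine], weights) for doc in universe}
    # Highest fused score first; ties broken alphabetically.
    return sorted(universe, key=lambda d: (-fused[d], d))

# Hypothetical result lists from three search engines.
lists = [["a", "b", "c"], ["b", "a", "d"], ["b", "c", "a"]]
print(merge(lists, [0.5, 0.3, 0.2]))  # ['b', 'a', 'c', 'd']
```

Note that OWA weights attach to positions in the sorted score vector rather than to particular engines: weights of (1, 0, 0) reduce the operator to the maximum, (0, 0, 1) to the minimum, and equal weights to the arithmetic mean.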
Yager, R. R. On ordered weighted averaging aggregation operators in multi-criteria decision making. Fuzzy Sets and Systems, 10, 2 (July 1983), 243-260.
Diaz, E. D., De, A., and Raghavan, V. V. A comprehensive OWA-based framework for result merging in metasearch. In Rough Sets, Fuzzy Sets, Data Mining, and Granular-Soft Computing (RSFDGrC '05) (Regina, Canada, August 31 - September 3, 2005). Springer-Verlag, Heidelberg, DE, 193-201.
Aslam, J., and Montague, M. Models for metasearch. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '01) (New Orleans, LA, USA, September 1-6, 2001). ACM Press, New York, NY, 2001, 276-284.
Brief presentations from interested participants in Rough sets, Fuzzy sets, and Soft computing
Participants are encouraged to submit an extended abstract and a copy of their ten-minute presentation to firstname.lastname@example.org for review by June 30, 2009.
The session will be led by Dr. Ashok Deshpande, who is an adjunct professor of Bioinformatics at University of Pune and College of Engineering Pune (COEP). He has been involved in fuzzy logic and its application to a variety of systems since the early 1980s. He is co-chair of Berkeley Initiatives in Soft Computing in Bioinformatics. His group is trying to develop a fuzzy-logic-based infusion pump for anaesthesia control at the Bio Medical Engineering department at COEP.
© PUCSD. Pune University, Computer Science Department, Ganeshkhind Road, Pune 411 007.