COMAD 2017 | CODS 2017 | |
Mar 8, 2017 (COMAD Exclusive) : IC & SR Auditorium | ||
08:45 AM - 09:00 AM | Welcome and Inaugural Remarks | X |
09:00 AM - 10:00 AM | Opening Keynote: Ramesh Bhashyam (Teradata) | X |
Challenges for an Analytical Ecosystem
Speaker: Ramesh Bhashyam
Data generation has evolved from transactional data to first behavioral data and then onto observational or sensor data. The first step in this evolution was from transactional to web log and machine generated data. Social media data increased with more devices such as smart phones that enable human interactions. Sensor data marks the next phase of this evolution. Bio: Ramesh Bhashyam has been with Teradata Corporation for over 20 years. His interest areas include query optimization and parallel execution. He was named a Teradata Fellow in 2010. Prior to Teradata, he worked for about 6 years at Inference Corporation, an AI company in Los Angeles. Ramesh has a bachelor's in electrical engineering and a master's in computer science. Ramesh has several patents to his credit. |
10:00 AM - 11:00 AM | COMAD Premier Papers | X |
11:00 AM - 11:30 AM | Coffee Break | X |
11:30 AM - 12:30 PM | COMAD Research track - Session 1 | X |
12:30 PM - 02:00 PM | Lunch | X |
02:00 PM - 04:00 PM | COMAD Tutorial 1: Anirban Dasgupta (IIT Gandhinagar): Estimating Network Properties via Sampling | X |
Estimating Network Properties via Sampling
Speaker: Anirban Dasgupta Abstract: Large networks are ubiquitous in various fields and knowledge of the values of different network properties is often important in making various scientific and business decisions. Such value estimates can also provide key insights about the current "status" and "health" of the network, as well as about possible generative processes. However, in various cases, the network is either implicit and has to be inferred, or is accessible only indirectly via queries made or experiments done. Examples include social networks formed by relations among people, various chemical interaction networks, or networks that are inaccessible due to privacy concerns. Often the large size of the network itself might be a bottleneck to arbitrary access patterns. In such settings, sampling the network judiciously and using the sample to infer the desired network property is often an effective technique. Such techniques have been much studied from both the data mining and theory perspectives. In this tutorial, we will survey a number of network sampling strategies for different property estimation tasks, e.g., sizes of different subpopulations, average degree, various motif counts and other structural properties. Such sampling often has to be implemented via various random walks and crawling techniques. We will discuss some of these methods, their practical significance, and approaches to proving theoretical guarantees, and outline some of the open questions in the area. The tutorial will be mostly self-contained and accessible to anyone with a knowledge of the basics of graph theory, probability and linear algebra. Bio: Anirban Dasgupta is currently an Associate Professor of Computer Science & Engineering at IIT Gandhinagar. Prior to this, he was a Senior Scientist at Yahoo! Labs Sunnyvale. Anirban works on algorithmic problems for massive data sets, large-scale machine learning, analysis of large social networks and randomized algorithms in general.
He did his undergraduate studies at IIT Kharagpur and doctoral studies at Cornell University. He has also received the Google Faculty Research Award (2015), the Cisco University grant (2016), and the ICDT Best Newcomer Award (2016). |
04:00 PM - 04:30 PM | Coffee Break | X |
04:30 PM - 06:00 PM | COMAD Tutorial 2: Partha Pratim Talukdar (IISc): Large-scale Knowledge Harvesting | X |
Large-scale Knowledge Harvesting
Abstract: Knowledge harvesting from Web-scale text datasets has emerged as an important and active research area over the last decade or so, resulting in the automatic construction of large knowledge bases (KBs) consisting of millions of entities and relationships among them. This has the potential to revolutionize Artificial Intelligence and intelligent decision making by removing the knowledge bottleneck which has plagued systems in these areas all along. Knowledge harvesting has also seen prominent commercial adoptions in the form of the Google Knowledge Graph and the IBM Watson system. In spite of this early success, several challenging research questions spanning Machine Learning, Natural Language Processing, Crowdsourcing, Knowledge Representation, Data Management, Systems, and Large Data Analytics are wide open in this area of Web-scale knowledge harvesting. This tutorial will give an overview of relevant foundational and recent literature on this topic, with the goal of preparing the participant for further research in this exciting and emerging area. Bio: Partha Talukdar is an Assistant Professor in the Department of Computational and Data Sciences (CDS) at the Indian Institute of Science (IISc), Bangalore. Before that, he was a Postdoctoral Fellow in the Machine Learning Department at Carnegie Mellon University, working with Tom Mitchell on the NELL project. Partha received his PhD (2010) in CIS from the University of Pennsylvania, working under the supervision of Fernando Pereira, Zack Ives, and Mark Liberman. Partha is broadly interested in Machine Learning, Natural Language Processing, and Cognitive Neuroscience, with particular interest in large-scale learning and inference. Partha is a recipient of the IBM Faculty Award, Google's Focused Research Award, and the Accenture Open Innovation Award. He is a co-author of a book on Graph-based Semi-Supervised Learning published by Morgan & Claypool Publishers. |
Mar 9, 2017 (COMAD - CODS Shared) : IC & SR Auditorium | ||
08:45 AM - 09:00 AM | Welcome and Inaugural Remarks | |
09:00 AM - 10:00 AM | Invited Talk: Lise Getoor (UCSC) [slides] | |
Big Graph Data Science: Making Useful Inferences from Graph Data
Abstract: Graph data (e.g., communication data, financial transaction networks, organization hierarchies, etc.) is ubiquitous. While this observational data is useful, it is usually noisy, often only partially observed, and only hints at the actual underlying social, scientific or technological structures that gave rise to the interactions. One of the challenges in big data analytics lies in being able to reason collectively about this kind of extremely large, heterogeneous, incomplete, and noisy interlinked data. In this talk, I will describe some common inference patterns needed for graph data, including collective classification (predicting missing labels for nodes), link prediction (predicting edges), and entity resolution (determining when two nodes refer to the same underlying entity). I will describe some key capabilities required to solve these problems, and finally I will describe probabilistic soft logic (PSL), a highly scalable open-source probabilistic programming language being developed within my group to solve these challenges. Bio: Lise Getoor is a professor in the Computer Science Department at the University of California, Santa Cruz. Her research areas include machine learning, data integration and reasoning under uncertainty, with an emphasis on graph and network data. She has over 200 publications and extensive experience with machine learning and probabilistic modeling methods for graph and network data. She is a Fellow of the Association for Artificial Intelligence, an elected board member of the International Machine Learning Society, serves on the board of the Computing Research Association (CRA), and was co-chair for ICML 2011. She is a recipient of an NSF Career Award and eleven best paper and best student paper awards. In 2014, she was recognized by KDD Nuggets as one of the emerging research leaders in data mining and data science based on citation and impact.
She received her PhD from Stanford University in 2001, her MS from UC Berkeley, and her BS from UC Santa Barbara, and was a professor in the Computer Science Department at the University of Maryland, College Park from 2001-2013. |
10:00 AM - 11:00 AM | COMAD-CODS Premier Papers | |
11:00 AM - 11:30 AM | Coffee Break | |
11:30 AM - 12:30 PM | COMAD Research track - Session 2 (IC & SR Hall 3) | CODS Oral Papers I (IC & SR Auditorium) |
12:30 PM - 02:00 PM | Lunch | |
02:00 PM - 03:30 PM | COMAD/CODS Tutorial - Vineet Chaoji, Rajeev Rastogi and Gourav Roy (Amazon India) - Machine Learning in the Real World (IC & SR Hall 3) | CODS Oral Papers II (IC & SR Auditorium) |
COMAD Research track - Session 3 (IC & SR Auditorium) | ||
Machine Learning in the Real World
Abstract: Machine Learning (ML) has become a mature technology that is being applied to a wide range of business problems such as web search, online advertising, product recommendations, object recognition, and so on. As a result, it has become imperative for researchers and practitioners to have a fundamental understanding of ML concepts and practical knowledge of end-to-end modeling. This tutorial takes a hands-on approach to introducing the audience to machine learning. The first part of the tutorial gives a broad overview and discusses some of the key concepts within machine learning. The second part of the tutorial takes the audience through the end-to-end modeling pipeline for a real-world income prediction problem. The tutorial includes some hands-on exercises. If you want to follow along, you will need a laptop with at least 2 GB of RAM and the Firefox or Google Chrome browser installed. Note that your laptop must be capable of connecting to the internet via Wi-Fi or your mobile data connection. We will be using Docker containers, so no other specific software needs to be installed on laptops.
Bio: Vineet Chaoji is an Applied Science Manager within the Core Machine Learning team at Amazon where he leads projects related to econometric models of customer behavior, customer targeting and malware detection. Prior to joining Amazon, he was a Scientist at Yahoo! Labs in Bangalore where his research focused on online advertising and social networks. Vineet obtained a PhD in Computer Science from Rensselaer Polytechnic Institute. He has published at top-tier data mining and database conferences and journals. Vineet has also served on the program committees of leading data and web mining conferences. |
03:30 PM - 05:30 PM | Posters & Demo over Coffee (IC & SR Hall 4, Dining Hall) | |
07:00 PM - 09:00 PM | Banquet (Westin, Velachery) | |
Mar 10, 2017 (COMAD-CODS Shared) : IC & SR Auditorium | ||
09:00 AM - 10:00 AM | Invited Talk: Srini Parthasarathy (OSU) [slides] | |
Stochastic Flow Clustering: Consolidation and Renewed Bearing
Since its introduction in the late nineties, the idea of Markov Clustering, a graph clustering approach based on the principle of simulating stochastic flows (random walks), has seen wide use, particularly in the area of bioinformatics. In this talk I will review this basic idea and then describe several enhancements to this approach that in turn improve the quality (via regularization, and the accommodation of overlapped clustering) and speed (via sparsification, and a multi-level mechanism) of such stochastic flow algorithms so that they can be deployed on large scale problems. Results on real world interaction networks demonstrate both the efficacy and efficiency of the approach. Time permitting, I will discuss some ongoing efforts on leveraging these ideas in the setting of remote sensing and flood mapping for emergency response. Bio: Srinivasan Parthasarathy is a Professor of Computer Science and Engineering and the director of the data mining research laboratory at Ohio State. His research interests span databases, data mining and high performance computing. He is among a handful of researchers nationwide to have won both the Department of Energy and National Science Foundation Career awards. He and his students have won multiple best paper awards or "best of" nominations from leading forums in the field including: SIAM Data Mining, ACM SIGKDD, VLDB, ISMB, WWW, ICDM, and ACM Bioinformatics. He chairs the SIAM data mining conference steering committee and serves on the action board of ACM TKDD and ACM DMKD, leading journals in the field. Since 2012 he has also helped lead the creation of OSU's first-of-a-kind nationwide (US) undergraduate major in data analytics and serves as one of its founding directors. |
10:00 AM - 11:00 AM | COMAD Premier papers (IC & SR Hall 3) | CODS Premier Papers (IC & SR Auditorium) |
11:00 AM - 11:30 AM | Coffee Break | |
11:30 AM - 12:30 PM | COMAD Research track - Session 4 (IC & SR Hall 3) | CODS Data Challenge (IC & SR Auditorium) |
12:30 PM - 02:00 PM | Lunch | |
02:00 PM - 03:00 PM | COMAD-CODS Industry Track Invited Talk: "Trust, Security, and Compliance in a Cognitive Era" - Sriram Raghavan (IBM Research-India) | |
Trust, Security, and Compliance in a Cognitive Era
Bio: Sriram Raghavan is the Director for IBM Research in India and CTO for IBM in India/South Asia. In this role he is responsible for establishing and executing the technical agenda of IBM's India Research Lab (IRL), working closely with worldwide research labs and business units. Until 2015, Sriram was the senior manager of the Information & Analytics Department at IRL, where he established and drove new research directions at the intersection of large scale data management, text analytics, and distributed systems. Sriram has been with IBM since 2004 when he first joined the Almaden Research Center in San Jose, California, as a Research Staff Member and later served as the Manager for the Search and Analytics Research Group. Sriram is a member of the IBM Academy of Technology and an alumnus of the Indian Institute of Technology (Madras) and Stanford University. |
03:00 PM - 04:30 PM | COMAD-CODS Industry Track Accepted Paper Presentations | |
04:30 PM - 05:00 PM | Coffee Break | |
05:00 PM - 06:00 PM | COMAD-CODS Industry Track Panel Discussion “Taking Science to Practice” | |
Mar 11, 2017 (CODS Exclusive) : IC & SR Auditorium | ||
09:00 AM - 10:00 AM | X | Invited Talk: Ruslan Salakhutdinov (CMU) |
Recent Advances in Deep Learning: Learning Unsupervised and Multimodal Models
Building intelligent systems that are capable of extracting meaningful representations from high-dimensional data lies at the core of solving many Artificial Intelligence tasks, including visual object recognition, information retrieval, speech perception, and language understanding. Bio: Ruslan Salakhutdinov received his PhD in computer science from the University of Toronto in 2009. After spending two post-doctoral years at the Massachusetts Institute of Technology Artificial Intelligence Lab, he joined the University of Toronto as an Assistant Professor in the Departments of Statistics and Computer Science. In 2016 he joined the Machine Learning Department at Carnegie Mellon University as an Associate Professor. Ruslan's primary interests lie in deep learning, machine learning, and large-scale optimization. He is an action editor of the Journal of Machine Learning Research and served on the senior programme committee of several learning conferences including NIPS and ICML. He is an Alfred P. Sloan Research Fellow, Microsoft Research Faculty Fellow, Canada Research Chair in Statistical Machine Learning, a recipient of the Early Researcher Award, Google Faculty Award, Nvidia's Pioneers of AI award, and is a Senior Fellow of the Canadian Institute for Advanced Research. |
10:00 AM - 11:00 AM | X | Invited Talk: Soumen Chakrabarti (IIT Bombay) [slides] |
Attention Models for Entity Resolution and Search
Bio: Soumen Chakrabarti received his B.Tech in Computer Science from the Indian Institute of Technology, Kharagpur, in 1991 and his M.S. and Ph.D. in Computer Science from the University of California, Berkeley in 1992 and 1996. At Berkeley he worked on compilers and runtime systems for running scalable parallel scientific software on message passing multiprocessors. He was a Research Staff Member at IBM Almaden Research Center from 1996 to 1999, where he worked on the Clever Web search project and led the Focused Crawling project. In 1999 he joined the Department of Computer Science and Engineering at the Indian Institute of Technology, Bombay, where he was Associate Professor during 2003--2014, and has been Professor since then. In 2004 he was Visiting Associate Professor at Carnegie Mellon University. During 2014--2016 he was Visiting Scientist at Google. He has published in the WWW, SIGIR, SIGKDD, EMNLP, SIGMOD, VLDB, ICDE, SODA, STOC, SPAA and other conferences as well as Scientific American, IEEE Computer, VLDB and other journals. He won the best paper award at WWW 1999. He was coauthor on the best student paper at ECML 2008. His work on keyword search in databases got the 10-year influential paper award at ICDE 2012. He won the Bhatnagar Prize in 2014. He is a fellow of the Indian National Academy of Engineering and of the Indian Academy of Sciences. He holds eleven patents on Web-related inventions. He is also the author of one of the earliest books on Web search and mining. He has served as technical advisor to search companies and vice-chair or program committee member for WWW, SIGIR, SIGKDD, VLDB, ICDE, SODA and other conferences, and guest editor or editorial board member for Foundations and Trends in Information Retrieval, DMKD and TKDE journals. He has served as program chair for WSDM 2008 and WWW 2010.
His current research interests include integrating, searching, and mining text and graph data models, exploiting types and relations in search, and dynamic personalization in graph-based retrieval and ranking models. Abstract:
We discuss two problems: linking entity mentions in a text corpus to corresponding nodes in a knowledge graph (KG), and using this KG-corpus combination for better entity search. Coherence models for entity linking encourage all mentions in a document to resolve to entities that are related in the KB. We enhance coherence with attention, where the evidence for each candidate is based on a small set of strong supporting relations, rather than relations to all other entities in the document. The rationale is that document-wide support may simply not exist for non-salient entities, or entities not densely connected in the KB. Our system outperforms state-of-the-art systems on the CoNLL 2003, TAC KBP 2010, 2011 and 2012 tasks. Traditionally, question answering (QA) has focused on either side of the structure spectrum, using either a corpus or a KG. Corpus-only QA loses the benefit of structured KG knowledge, whereas KG-only QA "drops off the structure cliff" when KG coverage fails, or the query cannot be semantically parsed into a structured form. Only recently have corpus and KG combined forces to improve entity search. A major challenge is robust query interpretation, in the face of queries that range between syntax-rich, well-formed questions (In which band was Jimmy Page before Led Zeppelin?) and syntax-poor "telegraphic" Web queries (jimmy page band before led zeppelin). We present a system that analyzes the query using multiple convolutional networks, locates plausible candidate entities in the KG, generates a multitude of features from the convolution outputs and KG entity neighborhood, and directly ranks candidate entities rather than choosing structured KG queries. Our system gets the best accuracy for both syntax-poor and syntax-rich queries. On four public query workloads amounting to over 8,000 queries in different query formats, we see 8-30% absolute improvement in mean average precision (MAP), compared to recent systems. |
11:00 AM - 11:30 AM | X | Coffee Break |
11:30 AM - 12:30 PM | X | Graduate Research Workshop (GRW) Posters |
12:30 PM - 02:00 PM | X | Lunch |
02:00 PM - 03:00 PM | X | GRW: Visualization Seminar by Kathirmani Sukumar (Gramener) |
03:00 PM - 04:00 PM | X | GRW: Writing Seminar by Karthik Ramaswamy (IISc) |
04:00 PM - 04:30 PM | X | Coffee Break |
04:30 PM - 06:00 PM | X | GRW: Writing Seminar by Karthik Ramaswamy (IISc) |