Program

COMAD 2017 CODS 2017
Mar 8, 2017 (COMAD Exclusive) : IC & SR Auditorium
08:45 AM - 09:00 AM Welcome and Inaugural Remarks
09:00 AM - 10:00 AM
Challenges for an Analytical Ecosystem

Speaker: Ramesh Bhashyam

Abstract: Data generation has evolved from transactional data, first to behavioral data and then to observational or sensor data. The first step in this evolution was from transactional data to web logs and machine-generated data. Social media data grew as more devices, such as smartphones, enabled human interaction. Sensor data marks the next phase of this evolution.

It is not data volume alone that requires attention; the complexity of the data model matters as well. There is strongly coupled, well-structured data, such as transactional data in relational tables. There is loosely coupled, semi-structured data, such as XML and JSON. Finally, there is unstructured data from free text, voice, and social media. Each of these has a different complexity and a different data model, or no model at all.

Deriving value from such data requires combining these different sources. In the past, analysis was restricted to highly structured transactional data; today, analytic value is derived from combining the different sources. Data becomes more meaningful when context is added to the data items. For example, knowing the repair history of a part makes it possible to derive meaning from its current sensor data. More context adds more insight. We show some use cases for such analysis in our talk.

An integrated analytic ecosystem is needed to derive value from the different forms of data. The ecosystem can be divided into three categories: 1) Data ingest, where all data is acquired and stored. The metric here is cost per TB of storage. 2) Data platform, where the right platform is used to store and manage the data. Data warehouses are part of this category. Because there are different types of data, one consideration here is whether to use different file systems for different data or a single polymorphic file system that stores all types. 3) Analytics, where machine learning and other forms of deep analytics are applied to the data. Considerations such as standalone systems versus multi-genre analytics are part of this category.

There are many challenges in building and managing such an analytic ecosystem. Programming for access to cross-platform data and cross-analytic engines has several aspects, with options ranging from virtual data frames in procedural languages like R and Python to enhancements to SQL. Query optimization is no longer about optimizing for a single platform but across platforms, with more dimensions that must be optimized. Support must be provided for complex analytics, from UDFs in SQL to standalone open-source deep learning systems such as TensorFlow. Finally, acceleration technologies such as GPUs, which speed up execution, must find a place in the ecosystem. There are many other challenges; we will consider some of these in our talk.

Bio: Ramesh Bhashyam has been with Teradata Corporation for over 20 years. His interest areas include query optimization and parallel execution. He was voted a Teradata Fellow in 2010. Prior to Teradata he worked for about 6 years at Inference Corporation, an AI company in Los Angeles. Ramesh has a bachelor's in electrical engineering and a master's in computer science, and holds several patents.

10:00 AM - 11:00 AM
  • 10:00-10:30AM - Generic Keyword Search over XML Data - Manoj Agarwal, Krithi Ramamritham and Prashant Agarwal (from EDBT 2016)
  • 10:31-11:00AM - Extracting Equivalent SQL From Imperative Code in Database Applications - Venkatesh Emani, Karthik Ramachandra, S. Sudarshan and Subhro Bhattacharya (from SIGMOD 2016)
11:00 AM - 11:30 AM Coffee Break
11:30 AM - 12:30 PM
  • 11:30-12:00PM - Distributed Data Aggregation with Privacy Preservation at Endpoint - Snehkumar Shahani, Jibi Abraham and R Venkateswaran
  • 12:01-12:30PM - Answering Conjunctive Queries using Sources with Access Restrictions - Prakash Ramanan
12:30 PM - 02:00 PM Lunch
02:00 PM - 04:00 PM
Estimating Network Properties via Sampling

Speaker: Anirban Dasgupta

Abstract: Large networks are ubiquitous in many fields, and knowing the values of different network properties is often important for making scientific and business decisions. Such value estimates can also provide key insights about the current "status" and "health" of the network, as well as about possible generative processes. However, in many cases the network is either implicit and has to be inferred, or is accessible only indirectly, via queries or experiments. Examples include social networks formed by relations among people, various chemical interaction networks, and networks that are inaccessible due to privacy concerns. Often the sheer size of the network can itself be a bottleneck for arbitrary access patterns. In such settings, sampling the network judiciously and using the sample to infer the desired property is often an effective technique. Such techniques have been studied extensively from both the data mining and the theory perspectives.

In this tutorial, we will survey a number of network sampling strategies for different property estimation tasks, e.g., sizes of different subpopulations, average degree, various motif counts and other structural properties. Such sampling often has to be implemented via random walks and crawling techniques. We will discuss some of these methods, their practical significance and approaches to proving theoretical guarantees, and outline some of the open questions in the area. The tutorial will be mostly self-contained and accessible to anyone with a knowledge of the basics of graph theory, probability and linear algebra.
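As a hedged illustration of one task named in this abstract, estimating average degree with a random walk, the following is a minimal sketch and not material from the tutorial itself. It assumes an undirected, connected graph that can only be queried for a node's neighbor list; the function name `estimate_average_degree`, the dictionary-of-lists graph representation, and the `burn_in` and `seed` parameters are hypothetical choices. The idea it shows is the standard correction for degree bias: a simple random walk visits node v with long-run frequency deg(v) / (2|E|), so averaging 1/deg(v) along the walk and taking the reciprocal recovers 2|E| / |V|.

```python
import random
from typing import Dict, List


def estimate_average_degree(neighbors: Dict[int, List[int]], start: int, steps: int,
                            burn_in: int = 1_000, seed: int = 0) -> float:
    """Estimate the average degree of an undirected, connected graph from one walk.

    The walk only looks up neighbors[node], mimicking query-only access.
    In the long run it visits node v with frequency deg(v) / (2|E|), so
    high-degree nodes are over-sampled. The mean of 1/deg(v) along the walk
    therefore tends to |V| / (2|E|), and its reciprocal to 2|E| / |V|.
    """
    rng = random.Random(seed)
    node = start
    # Burn-in: let the walk move away from the arbitrary start node first.
    for _ in range(burn_in):
        node = rng.choice(neighbors[node])
    inv_degree_sum = 0.0
    for _ in range(steps):
        node = rng.choice(neighbors[node])
        inv_degree_sum += 1.0 / len(neighbors[node])
    # Harmonic-mean style re-weighting undoes the degree-biased sampling.
    return steps / inv_degree_sum


if __name__ == "__main__":
    # Toy graph: a triangle 0-1-2 with a pendant node 3 attached to node 0.
    toy = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
    # The exact average degree of this toy graph is 8 / 4 = 2.0.
    print(estimate_average_degree(toy, start=3, steps=100_000))
```

Only neighbor lookups are used, which matches the setting where the network is reachable through queries alone; on a regular graph every visited node has the same degree, so the estimate is exact.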

Bio: Anirban Dasgupta is currently an Associate Professor of Computer Science & Engineering at IIT Gandhinagar. Prior to this, he was a Senior Scientist at Yahoo! Labs Sunnyvale. Anirban works on algorithmic problems for massive data sets, large-scale machine learning, analysis of large social networks and randomized algorithms in general. He did his undergraduate studies at IIT Kharagpur and doctoral studies at Cornell University. He has also received the Google Faculty Research Award (2015), the Cisco University grant (2016), and the ICDT Best Newcomer Award (2016).

Papers / Presentations
  • 02:00-02:30 PM - ATHENA: An Ontology-Driven System for Natural Language Querying over Relational Data Stores, Diptikalyan Saha, Avrilia Floratou, Karthik Sankaranarayanan, Umar Farooq Minhas, Ashish Mittal and Fatma Ozcan (from VLDB 2016)
  • 02:31-03:00 PM - Premier Demos - Short Presentations 15 min each
  • 03:01-03:30 PM - Industry Presentation 1: Oracle Database in the Cloud, presented by Sharad Lal
  • 03:31-04:00 PM - Industry Presentation 2: Oracle Stream Analytics, presented by Krishnaprem Bhatia
04:00 PM - 04:30 PM Coffee Break
04:30 PM - 06:00 PM
Large-scale Knowledge Harvesting

Abstract: Knowledge harvesting from Web-scale text datasets has emerged as an important and active research area over the last decade or so, resulting in the automatic construction of large knowledge bases (KBs) consisting of millions of entities and relationships among them. This has the potential to revolutionize Artificial Intelligence and intelligent decision making by removing the knowledge bottleneck that has plagued systems in these areas all along. Knowledge harvesting has also seen prominent commercial adoption in the form of the Google Knowledge Graph and the IBM Watson system. In spite of this early success, several challenging research questions spanning Machine Learning, Natural Language Processing, Crowdsourcing, Knowledge Representation, Data Management, Systems, and Large Data Analytics remain wide open in this area of Web-scale knowledge harvesting. This tutorial will give an overview of relevant foundational and recent literature on this topic, with the goal of preparing the participant for further research in this exciting and emerging area.

Bio: Partha Talukdar is an Assistant Professor in the Department of Computational and Data Sciences (CDS) at the Indian Institute of Science (IISc), Bangalore. Before that, he was a Postdoctoral Fellow in the Machine Learning Department at Carnegie Mellon University, working with Tom Mitchell on the NELL project. Partha received his PhD (2010) in CIS from the University of Pennsylvania, working under the supervision of Fernando Pereira, Zack Ives, and Mark Liberman. Partha is broadly interested in Machine Learning, Natural Language Processing, and Cognitive Neuroscience, with particular interest in large-scale learning and inference. Partha is a recipient of the IBM Faculty Award, Google’s Focused Research Award, and the Accenture Open Innovation Award. He is a co-author of a book on Graph-based Semi-Supervised Learning published by Morgan & Claypool Publishers. Homepage: http://talukdar.net

Mar 9, 2017 (COMAD - CODS Shared) : IC & SR Auditorium
08:45 AM - 09:00 AM Welcome and Inaugural Remarks
09:00 AM - 10:00 AM
Big Graph Data Science: Making Useful Inferences from Graph Data

Abstract: Graph data (e.g., communication data, financial transaction networks, organization hierarchies, etc.) is ubiquitous. While this observational data is useful, it is usually noisy, often only partially observed, and only hints at the actual underlying social, scientific or technological structures that gave rise to the interactions. One of the challenges in big data analytics lies in being able to reason collectively about this kind of extremely large, heterogeneous, incomplete, and noisy interlinked data. In this talk, I will describe some common inference patterns needed for graph data, including collective classification (predicting labels for nodes), link prediction (predicting edges), and entity resolution (determining when two nodes refer to the same underlying entity). I will describe some key capabilities required to solve these problems, and finally I will describe probabilistic soft logic (PSL), a highly scalable open-source probabilistic programming language being developed within my group to solve these challenges.

Bio: Lise Getoor is a professor in the Computer Science Department at the University of California, Santa Cruz. Her research areas include machine learning, data integration and reasoning under uncertainty, with an emphasis on graph and network data. She has over 200 publications and extensive experience with machine learning and probabilistic modeling methods for graph and network data. She is a Fellow of the Association for the Advancement of Artificial Intelligence, an elected board member of the International Machine Learning Society, serves on the board of the Computing Research Association (CRA), and was co-chair for ICML 2011. She is a recipient of an NSF Career Award and eleven best paper and best student paper awards. In 2014, she was recognized by KDnuggets as one of the emerging research leaders in data mining and data science based on citation and impact. She received her PhD from Stanford University in 2001, her MS from UC Berkeley, and her BS from UC Santa Barbara, and was a professor in the Computer Science Department at the University of Maryland, College Park from 2001 to 2013.

10:00 AM - 11:00 AM
  • 10:00-10:20AM - Abir De, Isabel Valera, Niloy Ganguly, Sourangshu Bhattacharya and Manuel Gomez Rodriguez. Learning and Forecasting Opinion Dynamics in Social Networks
  • 10:21-10:40AM - Srinivas Karthik, Jayant Haritsa, Sreyash Kenkre and Vinayaka Pandit. Platform Independent Robust Query Processing
  • 10:41-11:00AM - Abhishek Laddha and Arjun Mukherjee. Extracting Aspect Specific Opinion Expression
11:00 AM - 11:30 AM Coffee Break
11:30 AM - 12:30 PM
  • 11:30-12:00 PM - Relationship Queries on Large Graphs - Puneet Agarwal, Maya Ramanath and Gautam Shroff
  • 12:01-12:30 PM - Extracting Temporal Relations from Dependency-trees Using Edge Embeddings - Gautam Singh and Nishtha Madaan
  • 11:30-11:50AM - Monidipa Das and Soumya Ghosh. Spatio-temporal Autocorrelation Analysis for Regional Land-cover Change Detection from Remote Sensing Data
  • 11:51-12:10PM - Prakhar Ojha and Partha Talukdar. KGEval: Estimating Accuracy of Automatically Constructed Knowledge Graphs
  • 12:11-12:30PM - Vanika Singhal and Angshul Majumdar. Noisy Deep Dictionary Learning
12:30 PM - 02:00 PM Lunch
02:00 PM - 03:30 PM
Machine Learning in the Real World

Abstract: Machine Learning (ML) has become a mature technology that is being applied to a wide range of business problems such as web search, online advertising, product recommendations, object recognition, and so on. As a result, it has become imperative for researchers and practitioners to have a fundamental understanding of ML concepts and practical knowledge of end-to-end modeling. This tutorial takes a hands-on approach to introducing the audience to machine learning. The first part of the tutorial gives a broad overview and discusses some of the key concepts within machine learning. The second part of the tutorial takes the audience through the end-to-end modeling pipeline for a real-world income prediction problem. The tutorial includes some hands-on exercises. If you want to follow along, you will need a laptop with at least 2 GB of RAM and the Firefox or Google Chrome browser installed. Note that your laptop must be able to connect to the internet via Wi-Fi or your mobile data connection. We will be using Docker containers, so no specific software needs to be installed on laptops.

Bios: Vineet Chaoji is an Applied Science Manager within the Core Machine Learning team at Amazon where he leads projects related to econometric models of customer behavior, customer targeting and malware detection. Prior to joining Amazon, he was a Scientist at Yahoo! Labs in Bangalore where his research focused on online advertising and social networks. Vineet obtained a PhD in Computer Science from Rensselaer Polytechnic Institute. He has published at top-tier data mining and database conferences and journals. Vineet has also served on the program committees of leading data and web mining conferences.
Rajeev Rastogi is the Director of Machine Learning at Amazon, where he directs the development of machine learning platforms and applications such as product classification, product recommendations, customer targeting, and deals ranking. Previously, he was the Vice President of Yahoo! Labs in Bangalore, where he was responsible for research programs impacting Yahoo!'s web search and online advertising products. He was named a Bell Labs Fellow in 2003 for his contributions to Lucent's networking products while he was at Bell Labs Research in Murray Hill, New Jersey. Rajeev was named an ACM Fellow in 2012 for his contributions to large-scale data analysis and management. He has published over 100 papers in top-tier international conferences and 33 papers in international journals. Rajeev has also been a prolific inventor with 57 issued US Patents. He is currently a member of the News editorial board of the CACM, and was previously an Associate editor for TKDE. He has served on over 50 program committees of the leading database and data mining conferences, and was a Program Co-chair for the Applied Data Science track of the KDD conference in 2016, the CIKM conference in 2013 and the ICDM conference in 2005.
Gourav Roy is a Senior Software Engineer in the Core Machine Learning team at Amazon, where he builds scalable machine learning platforms and applications. He is interested in streaming approximate algorithms and distributed systems. His work on streaming anomaly detection was recently accepted at the International Conference on Machine Learning. Prior to joining Amazon, he received a bachelor's degree in Computer Science from BIT Mesra.

CODS Papers
  • 02:00-02:20PM - Deepali Joshi, Nikhil Supekar, Rashi Chauhan and Manasi Patwardhan. Modeling and detecting change in user behavior through his social media posting using cluster analysis
  • 02:21-02:40PM - Protim Bhattacharjee, Shisagnee Banerjee, Manoj Gulati, Shobha Sundar Ram and Angshul Majumdar. Supervised Analysis Dictionary Learning: Application in Consumer Electronics Appliance Classification
  • 02:41-03:00PM - Deshana Desai, Harsh Nisar and Rishabh Bhardwaj. Role of Temporal Diversity in Inferring Social Ties Based on Spatio-Temporal Data
COMAD Paper
  • 03:01-03:30 PM : Topic-Wise Segmentation of Slide Decks - Monika Gupta and Vibha Sinha
03:30 PM - 05:30 PM
COMAD Demos:
  • Keyword Search on microblog Data Streams: Finding Contextual Messages in Real Time - Manoj K Agarwal, Divyam Bansal, Mridul Garg, Krithi Ramamritham (EDBT 2016)
  • GARUDA: A System for Large-Scale Mining of Statistically Significant Connected Subgraphs - Satyajit Bhadange, Akhil Arora, Arnab Bhattacharya (VLDB 2016)
  • Partial Marking for Automated Grading of SQL Queries - Bikash Chandra, Mathew Joseph, Bharath Radhakrishnan, Shreevidhya Acharya, S. Sudarshan (VLDB 2016)
  • GeoScop: A System for Visual Exploration of Geo-social Clusters - Jasper Little, Shiladitya Pande, Shivam Srivastava, Sayan Ranu
  • DBridge: Translating Imperative Code to SQL - K. Venkatesh Emani, Tejas Deshpande, Karthik Ramachandra, S. Sudarshan
CODS Posters
  1. Monidipa Das and Soumya Ghosh. Spatio-temporal Autocorrelation Analysis for Regional Land-cover Change Detection from Remote Sensing Data
  2. Prakhar Ojha and Partha Talukdar. KGEval: Estimating Accuracy of Automatically Constructed Knowledge Graphs
  3. Vanika Singhal and Angshul Majumdar. Noisy Deep Dictionary Learning
  4. Deepali Joshi, Nikhil Supekar, Rashi Chauhan and Manasi Patwardhan. Modeling and detecting change in user behavior through his social media posting using cluster analysis
  5. Protim Bhattacharjee, Shisagnee Banerjee, Manoj Gulati, Shobha Sundar Ram and Angshul Majumdar. Supervised Analysis Dictionary Learning: Application in Consumer Electronics Appliance Classification
  6. Deshana Desai, Harsh Nisar and Rishabh Bhardwaj. Role of Temporal Diversity in Inferring Social Ties Based on Spatio-Temporal Data
  7. Moumita Sinha, Harsh Jhamtani, Sanket Mehta and Balaji Vasan Srinivasan. Modelling End of Online Session from Streaming Data
  8. Durga Prasad Muni, Suman Roy, Yeung Tack Yan John John Lew Chiang, Antonie Jean-Marie Viallet and Navin Budhiraja. Recommending resolutions of ITIL services tickets using Deep Neural Network
  9. Amod Aggarwal, Dhaval Patel and Markand Oza. On Discovery of permanent land cover changes using time series segmentation approach
  10. Nanda Dulal Jana, Jaya Sil and Swagatam Das. Protein Structure Optimization in 3D AB off-lattice model using Biogeography Based Optimization with Chaotic Mutation
  11. Arpit Merchant and Navjyoti Singh. Hybrid Trust-Aware Model for Personalized Top-N Recommendation
  12. V Tejaswi, P V Bindu and P Santhi Thilagam. Target Specific Influence Maximization: An Approach To Maximize Adoption In Labeled Social Networks
  13. Ishan Sahu and Debapriyo Majumdar. Detecting Factual and Non-Factual Content in News Articles
  14. Akshit Trehan, Sumit Khurana and Amitabha Bagchi. A user activity-based measurement study characterizing and classifying Stack Exchange communities across multiple domains
  15. Rama Syamala Sreepada and Bidyut Kr. Patra. Multi-criteria Recommendations through Preference Learning
  16. Shreshtha Mundra, Manjira Sinha, Sandya Mannarswamy, Anirban Sen and Shourya Roy. Embedding Learning of Figurative Phrases for Emotion Classification in Micro-Blog Texts
07:00 PM - 09:00 PM Banquet (Westin, Velachery)
Mar 10, 2017 (COMAD-CODS Shared) : IC & SR Auditorium
09:00 AM - 10:00 AM
Stochastic Flow Clustering: Consolidation and Renewed Bearing

Since its introduction in the late nineties, Markov Clustering, a graph clustering approach based on the principle of simulating stochastic flows (random walks), has seen wide use, particularly in bioinformatics. In this talk I will review this basic idea and then describe several enhancements that improve the quality (via regularization and the accommodation of overlapping clusters) and speed (via sparsification and a multi-level mechanism) of such stochastic flow algorithms, so that they can be deployed on large-scale problems. Results on real-world interaction networks demonstrate both the efficacy and the efficiency of the approach. Time permitting, I will discuss some ongoing efforts on leveraging these ideas in the setting of remote sensing and flood mapping for emergency response.
Joint work with Peter Jacobs, Albert Liang, Venu Satuluri and Yu-Keng Shih
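The basic stochastic-flow idea reviewed in this talk can be illustrated with a minimal sketch of the plain Markov Clustering loop: expansion (a matrix power, i.e. letting flow take random-walk steps) alternates with inflation (raising entries to a power and renormalizing columns), until the flow matrix stops changing. This is a toy in pure Python under stated assumptions, not the speaker's implementation: the function names, the self-loop convention, the defaults (expansion 2, inflation 2.0) and the thresholds are illustrative choices, and none of the regularization, overlap, sparsification or multi-level enhancements described above are included.

```python
from typing import List, Set

Matrix = List[List[float]]


def normalize_columns(m: Matrix) -> Matrix:
    """Scale each column to sum to 1 (column-stochastic); all-zero columns stay 0."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] > 0 else 0.0 for j in range(n)] for i in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def markov_cluster(adj: Matrix, expansion: int = 2, inflation: float = 2.0,
                   max_iter: int = 100, tol: float = 1e-9, eps: float = 1e-6) -> List[Set[int]]:
    """Hypothetical minimal MCL on a symmetric 0/1 adjacency matrix with zero diagonal."""
    n = len(adj)
    # Add self-loops so every node keeps some flow on itself, then make columns stochastic.
    m = normalize_columns([[float(adj[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
                           for i in range(n)])
    for _ in range(max_iter):
        # Expansion: flow takes `expansion` random-walk steps.
        e = m
        for _ in range(expansion - 1):
            e = matmul(e, m)
        # Inflation: strengthen strong flows, weaken weak ones, renormalize columns.
        new = normalize_columns([[v ** inflation for v in row] for row in e])
        delta = max(abs(new[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = new
        if delta < tol:
            break
    # Each surviving (attractor) row lists the nodes whose flow ends up there.
    clusters: List[Set[int]] = []
    for i in range(n):
        members = {j for j in range(n) if m[i][j] > eps}
        if members and members not in clusters:
            clusters.append(members)
    return clusters


if __name__ == "__main__":
    # Two triangles (0-1-2 and 3-4-5) joined by the single edge 2-3.
    two_triangles = [
        [0, 1, 1, 0, 0, 0],
        [1, 0, 1, 0, 0, 0],
        [1, 1, 0, 1, 0, 0],
        [0, 0, 1, 0, 1, 1],
        [0, 0, 0, 1, 0, 1],
        [0, 0, 0, 1, 1, 0],
    ]
    print(markov_cluster(two_triangles))
```

On the two-triangle example the loop is expected to separate the triangles, because inflation keeps shrinking the flow across the bridge edge. Larger inflation values generally yield finer clusters, and the dense pure-Python matrix product is only suitable for tiny graphs.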

Bio: Srinivasan Parthasarathy is a Professor of Computer Science and Engineering and the director of the data mining research laboratory at Ohio State. His research interests span databases, data mining and high performance computing. He is among a handful of researchers nationwide to have won both the Department of Energy and National Science Foundation Career awards. He and his students have won multiple best paper awards or "best of" nominations from leading forums in the field, including SIAM Data Mining, ACM SIGKDD, VLDB, ISMB, WWW, ICDM, and ACM Bioinformatics. He chairs the SIAM data mining conference steering committee and serves on the action board of ACM TKDD and ACM DMKD, leading journals in the field. Since 2012 he has also helped lead the creation of OSU's undergraduate major in data analytics, the first of its kind nationwide in the US, and serves as one of its founding directors.

10:00 AM - 11:00 AM
  • 10:00-10:30 AM - Holistic Influence Maximization: Combining Scalability and Efficiency with Opinion-Aware Models, Sainyam Galhotra, Akhil Arora and Shourya Roy (from SIGMOD 2016)
  • 10:31-11:00 AM - An Optimal Algorithm for Heavy Hitters in Insertion Streams and Related Problems, Arnab Bhattacharyya, Palash Dey and David Woodruff (from PODS 2016).
  • 10:00-10:20AM - Lavanya Sita Tekumalla and Chiranjib Bhattacharyya. Copula-HDP-HMM: Non-parametric Modeling of Temporal Multivariate Data for I/O Efficient Bulk Cache Preloading
  • 10:21-10:40AM - Ankit Anand, Aditya Grover, Mausam and Parag Singla. Contextual Symmetries in Probabilistic Graphical Models
  • 10:41-11:00AM - Janarthanan Rajendran, Mitesh M Khapra, Sarath Chandar and Balaraman Ravindran. Bridge Correlational Neural Networks for Multilingual Multimodal Representation Learning
11:00 AM - 11:30 AM Coffee Break
11:30 AM - 12:30 PM
  • 11:30-12:00 PM - Coupling Multi-Criteria Decision Making and Ontologies for Recommending DBMS - Lahcene Brahimi, Ladjel Bellatreche and Yassine Ouhammou
  • 12:01-12:30 PM - An Effective POI Recommendation in various Cold-start Scenarios - Pramit Mazumdar, Bidyut Kr. Patra and Korra Sathya Babu
  • 11:30-11:45 AM - Overview, motivation and a summary of the responses obtained - Data Challenge Coordinator
  • 11:46-12:00 PM - Presentation by Team "Nautilus" (10min+5min QA)
  • 12:01-12:15 PM - Presentation by Team "flytxt_datasciences_iitd" (10min+5min QA)
  • 12:16-12:30 PM - Presentation by Team "VITians" (10min+5min QA)
12:30 PM - 02:00 PM Lunch
02:00 PM - 03:00 PM
Trust, Security, and Compliance in a Cognitive Era

Bio: Sriram Raghavan is the Director for IBM Research in India and CTO for IBM in India/South Asia. In this role he is responsible for establishing and executing the technical agenda of IBM's India Research Lab (IRL), working closely with worldwide research labs and business units. Until 2015, Sriram was the senior manager of the Information & Analytics Department at IRL, where he established and drove new research directions at the intersection of large scale data management, text analytics, and distributed systems. Sriram has been with IBM since 2004, when he first joined the Almaden Research Center in San Jose, California, as a Research Staff Member and later served as the Manager for the Search and Analytics Research Group. Sriram is a member of the IBM Academy of Technology and an alumnus of the Indian Institute of Technology (Madras) and Stanford University.

03:00 PM - 04:30 PM
  • 03:00-03:15PM - Akhil PM; "Aspects and Sentiment Mining Using Neighbor Graphs of Word Vectors from User Reviews" [MobMe Wireless Solutions Pvt. Ltd.]
  • 03:16-03:30PM - Shubham Atreja and Anjali Singh; "Entity Extraction on Real Estate Twitter Data" [IBM/IIT Delhi]
  • 03:31-03:45PM - Himanshu S. Bhatt, Manjira Sinha, Balaji Peddamuthu and Shourya Roy; "Transfer Learning for Cross-domain Topic Classification" [Xerox Research]
  • 03:46-04:00PM - Suman Roy, Dipanjan Dutta, Durga Prasad Muni and Adrija Bhattacharya; "Fuzzy Prediction of QoS for IT Maintenance Tickets" [Infosys/University of Calcutta]
  • 04:01-04:15PM - Karamjit Singh, Garima Gupta, Gautam Shroff and Puneet Agarwal; "Minimally-Supervised Federated Attribute Fusion" [TCS Research]
  • 04:16-04:30PM - Paridhi Jain, Sandya Mannarswamy, Preethi Raajaratnam and Shourya Roy; "Mining Medical Literature to Extract Influence Factors For Personalizing Clinical Care Pathways" [Xerox Research]
04:30 PM - 05:00 PM Coffee Break
05:00 PM - 06:00 PM COMAD-CODS Industry Track Panel Discussion “Taking Science to Practice”
Mar 11, 2017 (CODS Exclusive) : IC & SR Auditorium
09:00 AM - 10:00 AM
Recent Advances in Deep Learning: Learning Unsupervised and Multimodal Models

Abstract: Building intelligent systems that are capable of extracting meaningful representations from high-dimensional data lies at the core of solving many Artificial Intelligence tasks, including visual object recognition, information retrieval, speech perception, and language understanding.

In this talk I will first introduce a broad class of deep learning models and show that they can learn useful hierarchical representations from large volumes of high-dimensional data, with applications in information retrieval, object recognition, and speech perception. I will next introduce deep models that are capable of extracting a unified representation that fuses together multiple data modalities. In particular, I will introduce models that can generate natural language descriptions (captions) of images, as well as generate images from captions using an attention mechanism. Finally, I will discuss an approach for unsupervised learning of a generic, distributed sentence encoder, and introduce multiplicative and fine-grained gating mechanisms with applications to question answering systems and reading comprehension.

Bio: Ruslan Salakhutdinov received his PhD in computer science from the University of Toronto in 2009. After spending two post-doctoral years at the Massachusetts Institute of Technology Artificial Intelligence Lab, he joined the University of Toronto as an Assistant Professor in the Departments of Statistics and Computer Science. In 2016 he joined the Machine Learning Department at Carnegie Mellon University as an Associate Professor. Ruslan's primary interests lie in deep learning, machine learning, and large-scale optimization. He is an action editor of the Journal of Machine Learning Research and served on the senior programme committee of several learning conferences including NIPS and ICML. He is an Alfred P. Sloan Research Fellow, Microsoft Research Faculty Fellow, Canada Research Chair in Statistical Machine Learning, a recipient of the Early Researcher Award, Google Faculty Award, Nvidia's Pioneers of AI award, and is a Senior Fellow of the Canadian Institute for Advanced Research.

10:00 AM - 11:00 AM
Attention Models for Entity Resolution and Search

Bio: Soumen Chakrabarti received his B.Tech in Computer Science from the Indian Institute of Technology, Kharagpur, in 1991 and his M.S. and Ph.D. in Computer Science from the University of California, Berkeley, in 1992 and 1996, respectively. At Berkeley he worked on compilers and runtime systems for running scalable parallel scientific software on message-passing multiprocessors. He was a Research Staff Member at IBM Almaden Research Center from 1996 to 1999, where he worked on the Clever Web search project and led the Focused Crawling project. In 1999 he joined the Department of Computer Science and Engineering at the Indian Institute of Technology, Bombay, where he was an Associate Professor from 2003 to 2014 and has been a Professor since then. In 2004 he was a Visiting Associate Professor at Carnegie Mellon University. From 2014 to 2016 he was a Visiting Scientist at Google. He has published in the WWW, SIGIR, SIGKDD, EMNLP, SIGMOD, VLDB, ICDE, SODA, STOC, SPAA and other conferences, as well as in Scientific American, IEEE Computer, VLDB and other journals. He won the best paper award at WWW 1999. He was a coauthor of the best student paper at ECML 2008. His work on keyword search in databases received the 10-year influential paper award at ICDE 2012. He won the Bhatnagar Prize in 2014. He is a Fellow of the Indian National Academy of Engineering and of the Indian Academy of Sciences. He holds eleven patents on Web-related inventions. He is also the author of one of the earliest books on Web search and mining. He has served as a technical advisor to search companies, as vice-chair or program committee member for WWW, SIGIR, SIGKDD, VLDB, ICDE, SODA and other conferences, and as guest editor or editorial board member for Foundations and Trends in Information Retrieval, DMKD and TKDE journals. He has served as program chair for WSDM 2008 and WWW 2010.
His current research interests include integrating, searching, and mining text and graph data models, exploiting types and relations in search, and dynamic personalization in graph-based retrieval and ranking models.

Abstract: We discuss two problems: linking entity mentions in a text corpus to corresponding nodes in a knowledge graph (KG), and using this KG-corpus combination for better entity search. Coherence models for entity linking encourage all mentions in a document to resolve to entities that are related in the KB. We enhance coherence with attention, where the evidence for each candidate is based on a small set of strong supporting relations, rather than relations to all other entities in the document. The rationale is that document-wide support may simply not exist for non-salient entities, or for entities not densely connected in the KB. Our system outperforms state-of-the-art systems on the CoNLL 2003, TAC KBP 2010, 2011 and 2012 tasks. Traditionally, question answering (QA) has focused on either end of the structure spectrum, using either a corpus or a KG. Corpus-only QA loses the benefit of structured KG knowledge, whereas KG-only QA "drops off the structure cliff" when KG coverage fails or the query cannot be semantically parsed into a structured form. Only recently have corpus and KG combined forces to improve entity search. A major challenge is robust query interpretation, in the face of queries that range between syntax-rich, well-formed questions (In which band was Jimmy Page before Led Zeppelin?) and syntax-poor "telegraphic" Web queries (jimmy page band before led zeppelin). We present a system that analyzes the query using multiple convolutional networks, locates plausible candidate entities in the KG, generates a multitude of features from the convolution outputs and the KG entity neighborhood, and directly ranks candidate entities rather than choosing structured KG queries. Our system achieves the best accuracy for both syntax-poor and syntax-rich queries. On four public query workloads amounting to over 8,000 queries in different query formats, we see an 8-30% absolute improvement in mean average precision (MAP) compared to recent systems.
Collaborators: Amir Globerson, Fernando Pereira, Nevena Lazic, Mandar Joshi, Uma Sawant, Ganesh Ramakrishnan, Amarnag Subramaniam, Michael Ringgaard.

11:00 AM - 11:30 AM Coffee Break
11:30 AM - 12:30 PM
  1. Sujoy Chatterjee : Dependent Judgment Analysis based on Crowdsourced Opinions.
  2. Chandra Sekhar : A Novel Trust Centric Epidemic Model Based Evaluation of Malware Propagation in Twitter.
  3. Sadu Chiranjeevi : Graph Similarity Self-Join.
  4. Nihal Jain and Amit Awekar : FP-Tree Based Disjunctive Itemset Mining Algorithms.
  5. Rizul Aggarwal, Bhaskaran Raman, Deepthi Chander and Divya Bansal : Enhancing Localization Accuracy of Road Anomalies Using Crowdsourcing.
  6. Monidipa Das : Spatio-temporal Prediction of Time Series Data: An Approach Based on Spatial Bayesian Network (SpaBN).
  7. Sujata Sinha, Krishna Prasad Miyapuram and Kamalakar Karlapalem : Biclustering Text-mined Neuroimaging Data to Understand Human Brain Functions.
  8. Satheesh Kumar : Computational Approaches For Improving Protein Function Prediction.
  9. Arun Verma, Yogesh Simmhan and Nandyala Hemachandra : Scalable Online Analytics for IoT Applications using Big Data Platforms.
  10. Madhav Mantri and Abhinav Agarwalla : Winning Team Prediction using Weighted PageRank.
12:30 PM - 02:00 PM Lunch
02:00 PM - 03:00 PM GRW: Visualization Seminar by Kathirmani Sukumar (Gramener)
03:00 PM - 04:00 PM GRW: Writing Seminar by Karthik Ramaswamy (IISc)
04:00 PM - 04:30 PM Coffee Break
04:30 PM - 06:00 PM GRW: Writing Seminar by Karthik Ramaswamy (IISc)