Inderjit Dhillon - U. Texas, USA

Divide and Conquer Methods for Big Data Analytics
Inderjit Dhillon is a Professor of Computer Science and Mathematics at UT Austin, where he is the Director of the ICES Center for Big Data Analytics. His main research interests are in big data, machine learning, network analysis, linear algebra and optimization. Inderjit received his B.Tech. degree from IIT Bombay and his Ph.D. from UC Berkeley. Inderjit is an IEEE Fellow and has received several prestigious awards, including the ICES Distinguished Research Award in 2013, the SIAM Outstanding Paper Prize in 2011, the Moncrief Grand Challenge Award in 2010, the SIAM Linear Algebra Prize in 2006, the University Research Excellence Award in 2005, and the NSF CAREER Award in 2001. Inderjit has published over 100 journal and conference papers, and has served on the Editorial Boards of the Journal of Machine Learning Research, the IEEE Transactions on Pattern Analysis and Machine Intelligence, Foundations and Trends in Machine Learning, and the SIAM Journal on Matrix Analysis and Applications.
Data is being generated at a tremendous rate in modern applications as diverse as internet services, genomics, health care, energy management and social network analysis, so there is a great need for scalable methods to analyze these data sets. In this talk, I will present some new divide-and-conquer algorithms for various challenging problems in large-scale data analysis. Divide-and-conquer is a common paradigm that has been widely used in computer science and scientific computing, for example in sorting, in the scalable computation of n-body interactions via the fast multipole method, and in eigenvalue computations for symmetric matrices. However, this paradigm has not been widely employed in problems that arise in machine learning.
I will introduce some recent divide-and-conquer methods that we have developed for three representative problems: (i) classification using kernel support vector machines, (ii) dimensionality reduction for large-scale social network analysis, and (iii) structure learning of graphical models. For each of these problems, we develop specialized algorithms, in particular, tailored ways of "dividing" the problem into subproblems, solving the subproblems, and finally "conquering" them. It should be noted that the subproblem solutions yield localized models for analyzing the data; an intriguing question is whether the hierarchy of localized models can be combined to yield models that are not only easier to compute, but are also statistically more robust.

This is joint work with Cho-Jui Hsieh, Donghyuk Shin and Si Si.

Juliana Freire - New York University, USA

Exploring Big Urban Data
Juliana Freire is a Professor in the Department of Computer Science and Engineering at New York University. She also holds an appointment in the Courant Institute of Mathematical Sciences and is a faculty member at the NYU Center for Data Science. Her research interests are in large-scale data analysis, visualization, and provenance management. An important theme in Professor Freire's work is the development of data management techniques and infrastructure to address problems introduced by emerging applications. Recently, her work has focused on urban, scientific and Web data. Professor Freire is an active member of the database and Web research communities, having co-authored over 130 technical papers and holding 8 U.S. patents. She has chaired or co-chaired several workshops and conferences, and has served as a program committee member for over 60 events. She has received several awards, including an NSF CAREER award, an IBM Faculty Award, and a Google Faculty Research Award. Her research has been funded by grants from the National Science Foundation, Department of Energy, National Institutes of Health, University of Utah, NYU, Sloan Foundation, Betty Moore Foundation, Google, Amazon, Microsoft Research, Yahoo! and IBM.
Today, 50% of the world's population lives in cities, and this share will grow to 70% by 2050. Cities are thus the loci of economic activity and will continue to be the source of many of the innovations and novel approaches to the challenges of the 21st century. At the same time, cities are also the cause of looming sustainability problems and face huge challenges in, for example, transportation, resource consumption, housing affordability, and inadequate or aging infrastructure. The large volumes of urban data currently available, along with vastly increased computing power, open up new opportunities for us to better understand cities.
In fact, there are already success stories that have resulted in better operations, informed planning, improved policies and a better quality of life for citizens. However, analyzing urban data often requires a staggering amount of work, and currently most analyses are merely confirmatory. In this talk, we will present some of our ongoing work to support exploratory analyses over large spatio-temporal urban data. In particular, we will discuss new techniques and systems we have developed to increase the level of interactivity, scalability, and usability, with a view to empowering a broad range of stakeholders, from social science researchers and policy makers to urban residents, to freely explore the vast repositories of urban data that are emerging online.

Jure Leskovec - Stanford University, USA

Mining the Structure of Networks and Communities
Jure Leskovec is an assistant professor of Computer Science at Stanford University. His research focuses on mining large social and information networks. The problems he investigates are motivated by large-scale data, the Web and on-line media. This research has won several awards, including a Microsoft Research Faculty Fellowship, the Alfred P. Sloan Fellowship and numerous best paper awards. Leskovec received his bachelor's degree in computer science from the University of Ljubljana, Slovenia, his PhD in machine learning from Carnegie Mellon University, and postdoctoral training at Cornell University. You can follow him on Twitter @jure.
Networks are all around us: social networks allow information and influence to flow through society, viruses become epidemics by spreading through networks, and networks of neurons allow us to think and function. With recent technological advances and the development of online social media, we can study networks that were once essentially invisible to us. In this talk we discuss how computational perspectives and machine learning models can be developed to abstract networked phenomena such as: How will a community or a social network evolve in the future? What social circles does a person belong to? What kinds of network structures are there, and how can they be modeled?

Pedro Domingos - U. Washington, USA

Sum-Product Networks: Deep Models with Tractable Inference
Pedro Domingos is Professor of Computer Science and Engineering at the University of Washington. His research interests are in artificial intelligence, machine learning and data mining. He received a PhD in Information and Computer Science from the University of California at Irvine, and is the author or co-author of over 200 technical publications. He is a member of the editorial board of the Machine Learning journal, co-founder of the International Machine Learning Society, and past associate editor of JAIR. He was program co-chair of KDD-2003 and SRL-2009, and has served on numerous program committees. He is an AAAI Fellow, and has received several awards, including a Sloan Fellowship, an NSF CAREER Award, a Fulbright Scholarship, an IBM Faculty Award, and best paper awards at several leading conferences.
Big data makes it possible in principle to learn very rich probabilistic models, but inference in them is prohibitively expensive. Since inference is typically a subroutine of learning, in practice learning such models is very hard. Sum-product networks (SPNs) are a new model class that squares this circle by providing maximum flexibility while guaranteeing tractability. In contrast to Bayesian networks and Markov random fields, SPNs can remain tractable even in the absence of conditional independence. SPNs are defined recursively: an SPN is either a univariate distribution, a product of SPNs over disjoint variables, or a weighted sum of SPNs over the same variables. It is easy to show that the partition function, all marginals and all conditional MAP states of an SPN can be computed in time linear in its size. SPNs have most tractable distributions as special cases, including hierarchical mixture models, thin junction trees, and nonrecursive probabilistic context-free grammars. I will present generative and discriminative algorithms for learning SPN weights, and an algorithm for learning SPN structure.
SPNs have achieved impressive results in a wide variety of domains, including object recognition, image completion, collaborative filtering, and click prediction. Our algorithms can easily learn SPNs with many layers of latent variables, making them arguably the most powerful type of deep learning to date. (Joint work with Rob Gens and Hoifung Poon.)
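The recursive definition above can be made concrete with a small sketch. The Python below is a minimal illustration, not code from the talk: it assumes binary variables with Bernoulli leaves, and the names `Leaf`, `Product`, `Sum`, `scope` and `evaluate` are hypothetical. `scope` checks the two structural conditions (products over disjoint variables, sums over the same variables). `evaluate` computes a probability in one bottom-up pass in which an unobserved variable's leaf contributes 1, so the partition function is the value with no evidence and a marginal is the value with partial evidence. Memoizing by node means each node is computed once per query, which is the source of the linear-in-size cost. MAP inference is not covered here.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Tuple, Union


@dataclass(eq=False)
class Leaf:
    """Univariate Bernoulli distribution over a binary variable (an assumption)."""
    var: str
    p_true: float  # P(var = 1)


@dataclass(eq=False)
class Product:
    """Product of child SPNs; children must have disjoint scopes."""
    children: Tuple[Node, ...]


@dataclass(eq=False)
class Sum:
    """Weighted sum of child SPNs; children must share the same scope."""
    weighted_children: Tuple[Tuple[float, Node], ...]


Node = Union[Leaf, Product, Sum]


def scope(node: Node) -> FrozenSet[str]:
    """Return the node's variables, raising if the SPN conditions are violated."""
    if isinstance(node, Leaf):
        return frozenset([node.var])
    if isinstance(node, Product):
        scopes = [scope(c) for c in node.children]
        union = frozenset().union(*scopes)
        if sum(len(s) for s in scopes) != len(union):
            raise ValueError("product children must have disjoint scopes")
        return union
    scopes = [scope(c) for _, c in node.weighted_children]
    if any(s != scopes[0] for s in scopes[1:]):
        raise ValueError("sum children must have identical scopes")
    return scopes[0]


def evaluate(node: Node, evidence: Dict[str, int],
             cache: Optional[Dict[int, float]] = None) -> float:
    """Unnormalised probability of the evidence; unobserved variables are
    summed out. The cache ensures each node is computed once per query."""
    if cache is None:
        cache = {}
    key = id(node)
    if key in cache:
        return cache[key]
    if isinstance(node, Leaf):
        if node.var not in evidence:
            value = 1.0  # summing a normalised leaf over both states gives 1
        elif evidence[node.var] == 1:
            value = node.p_true
        else:
            value = 1.0 - node.p_true
    elif isinstance(node, Product):
        value = 1.0
        for child in node.children:
            value *= evaluate(child, evidence, cache)
    else:
        value = sum(w * evaluate(c, evidence, cache)
                    for w, c in node.weighted_children)
    cache[key] = value
    return value


# Hypothetical example: a two-component mixture over binary variables X1, X2.
root = Sum((
    (0.6, Product((Leaf("X1", 0.8), Leaf("X2", 0.3)))),
    (0.4, Product((Leaf("X1", 0.1), Leaf("X2", 0.9)))),
))

if __name__ == "__main__":
    z = evaluate(root, {})                        # partition function
    p_x1 = evaluate(root, {"X1": 1}) / z          # marginal P(X1 = 1)
    p_joint = evaluate(root, {"X1": 1, "X2": 1}) / z
    print(z, p_x1, p_joint / p_x1)                # last value: P(X2 = 1 | X1 = 1)
```

For this toy mixture the three printed values should be approximately 1.0, 0.52 and 0.346 (that is, 0.18 / 0.52). A shallow mixture is used only to keep the arithmetic checkable by hand; the same `evaluate` pass applies unchanged to deeper networks.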

Ravi Kannan - Microsoft Research India

Ravi Kannan is a Principal Researcher at Microsoft Research India, where he leads the algorithms research group. He also holds an adjunct faculty position in the Computer Science and Automation Department of the Indian Institute of Science. Before joining Microsoft, Kannan was the William K. Lanman Jr. Professor of Computer Science and Applied Mathematics at Yale University. He has also taught at MIT and CMU. Ravi Kannan's research interests include algorithms, theoretical computer science and discrete mathematics as well as optimization. His work has mainly focused on efficient algorithms for problems of a mathematical (often geometric) flavor that arise in computer science. He has worked on algorithms for integer programming and the geometry of numbers, random walks in n-space, randomized algorithms for linear algebra and learning algorithms for convex sets. He was awarded the Knuth Prize in 2011 for developing influential algorithmic techniques aimed at solving long-standing computational problems, the Fulkerson Prize in 1991 for his work on estimating the volume of convex sets, and the Distinguished Alumnus award of the Indian Institute of Technology, Bombay in 1999.

Stephen Muggleton - Imperial College, London, UK

Meta-Interpretive Learning and Program Induction
Stephen Muggleton FREng is Professor of Machine Learning in the Department of Computing at Imperial College. He has been awarded two Royal Academy of Engineering Research Chairs, supported in part by Microsoft (2007-2012) and Syngenta (2013-2018). His recent awards include being elected Fellow of the British Computer Society (2008), Fellow of the IET (2008), Fellow of the Royal Academy of Engineering (2010) and Fellow of the Society of Biology (2011). His work concentrates on the development of theory, implementations and applications of Machine Learning, particularly in the fields of Inductive Logic Programming (ILP) and Probabilistic ILP (PILP). This includes the development of widely applied machine learning systems such as the Progol ILP system. He has strong research collaborations with colleagues at Imperial College applying his Machine Learning algorithms to biological problems.
This talk will review work at Imperial College on the development of Meta-Interpretive Learning, a technique which supports efficient predicate invention and learning of recursive logic programs. We will illustrate how the approach has been successfully applied to the learning of regular and context-free grammars, and further extended to the learning of dyadic datalog programs. In a recent development, we have started to investigate how a generalised meta-interpreter of Stochastic Logic Programs (SLP) can be used to implement a Bayesian posterior distribution over the hypothesis space, derived using Stochastic Refinement. We show that the SLP implements a structural Bayes' prior over the hypothesis space. In this case, the posterior is updated by using the positive and negative examples to prune sub-trees from the prior. Following pruning, selection probabilities for each sub-tree are renormalised in the posterior.
We show that a) sampling hypotheses from the posterior produces a high-accuracy approximation to a Bayes' predictor and b) super-imposition of the logic programs in the posterior yields a unique ProbLog program. Ongoing work appears to indicate that robust super-imposed logic programs can in some cases be learned from a single example. We relate this to the idea of one-shot human learning, as well as to learning from sparse datasets such as NELL.
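The prune-and-renormalise update described above can be illustrated with a toy sketch. The Python below is not an SLP meta-interpreter: as a simplifying assumption, hypotheses are plain Python predicates over integers rather than logic programs, and the names `Hypothesis`, `Choice`, `prune`, `leaf_probabilities` and `predict` are hypothetical. The prior is a tree of choice points carrying selection probabilities; hypotheses inconsistent with the positive or negative examples are removed, emptied sub-trees are dropped, and the surviving selection probabilities are renormalised at each choice point. `predict` computes the posterior-weighted vote exactly, which is the quantity that sampling hypotheses from the posterior would approximate.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple, Union


@dataclass
class Hypothesis:
    """Leaf of the prior tree; `covers` is a toy stand-in for a logic program."""
    name: str
    covers: Callable[[int], bool]


@dataclass
class Choice:
    """Choice point: a list of (selection probability, sub-tree) branches."""
    branches: List[Tuple[float, Tree]]


Tree = Union[Hypothesis, Choice]


def prune(tree: Tree, positives: List[int], negatives: List[int]) -> Optional[Tree]:
    """Drop hypotheses inconsistent with the examples and renormalise the
    selection probabilities of surviving branches; None if nothing survives."""
    if isinstance(tree, Hypothesis):
        consistent = all(tree.covers(x) for x in positives) and not any(
            tree.covers(x) for x in negatives)
        return tree if consistent else None
    kept = []
    for prob, child in tree.branches:
        pruned = prune(child, positives, negatives)
        if pruned is not None:
            kept.append((prob, pruned))
    total = sum(prob for prob, _ in kept)
    if total == 0:
        return None  # every sub-tree was pruned (or had zero prior mass)
    return Choice([(prob / total, child) for prob, child in kept])


def leaf_probabilities(tree: Tree, mass: float = 1.0,
                       out: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Probability of each hypothesis: product of selection probabilities on its path."""
    if out is None:
        out = {}
    if isinstance(tree, Hypothesis):
        out[tree.name] = out.get(tree.name, 0.0) + mass
    else:
        for prob, child in tree.branches:
            leaf_probabilities(child, mass * prob, out)
    return out


def predict(tree: Tree, x: int, mass: float = 1.0) -> float:
    """Posterior-weighted probability that x is a positive instance."""
    if isinstance(tree, Hypothesis):
        return mass if tree.covers(x) else 0.0
    return sum(predict(child, x, mass * prob) for prob, child in tree.branches)


# Hypothetical prior over a handful of toy hypotheses.
prior = Choice([
    (0.5, Choice([(0.5, Hypothesis("even", lambda x: x % 2 == 0)),
                  (0.5, Hypothesis("odd", lambda x: x % 2 == 1))])),
    (0.3, Choice([(0.5, Hypothesis("positive", lambda x: x > 0)),
                  (0.25, Hypothesis("multiple_of_4", lambda x: x % 4 == 0)),
                  (0.25, Hypothesis("any", lambda x: True))])),
    (0.2, Choice([(1.0, Hypothesis("negative", lambda x: x < 0))])),
])

if __name__ == "__main__":
    posterior = prune(prior, positives=[4, 8], negatives=[3])
    print(leaf_probabilities(posterior))  # approximately even: 0.625, multiple_of_4: 0.375
    print(predict(posterior, 2))          # approximately 0.625
```

With positives 4 and 8 and negative 3, only `even` and `multiple_of_4` survive; the third sub-tree is emptied, so the root's remaining weights 0.5 and 0.3 are renormalised to 0.625 and 0.375. Renormalising locally at each choice point keeps the posterior a proper distribution without re-scoring the whole hypothesis space.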

Industrial Keynote


Shailesh Kumar, Google

Co-occurrence Analytics: A framework for finding "interesting" needles in "crazy" Haystacks!
Dr. Shailesh Kumar works in the Machine Intelligence Group at Google on various products involving Machine Learning, Information Retrieval, Data Mining, and Computer Vision. Prior to joining Google, he worked as a Principal Development Manager at Microsoft (Bing) Hyderabad, Senior Scientist at Yahoo! Labs Bangalore, and Principal Scientist at Fair Isaac Research in San Diego, USA. Dr. Kumar has over fifteen years of experience in applying and innovating machine learning, statistical pattern recognition, and data mining algorithms for hard prediction problems in a wide variety of domains, including information retrieval, web analytics, text mining, computer vision, retail data mining, risk and fraud analytics, remote sensing, and bioinformatics. He has published over 20 conference papers, journal papers, and book chapters, and holds over a dozen patents in these areas. Dr. Kumar received his PhD in Computer Engineering in 2000 (with a specialization in statistical pattern recognition and data mining) and his Masters in Computer Science in 1997 (with a specialization in artificial intelligence and machine learning), both from the University of Texas at Austin, USA. He received his B.Tech. in Computer Science and Engineering from the Institute of Technology, Banaras Hindu University in 1995.

Important Dates
Research Papers: 4th Oct 2013 [Extended]
Data Challenge Proposals: 15th Oct 2013
Paper & Proposal Decisions: 26th Dec 2013
Final Camera Ready: 5th Jan 2014
21st to 23rd March 2014
General Chairs:
Gautam Shroff TCS Research, Delhi
Lokendra Shastri Samsung R&D
Program Chairs:
Balaraman Ravindran IIT Madras
Kamalakar Karlapalem IIIT Hyd
Data Challenge Chair:
Srikanta Bedathur IIIT Delhi
Local Organising Committee:
Ashwin Srinivasan IIT Delhi
Maya Ramnath IIT Delhi
LV Subramaniam IBM Research Delhi
Hiranmay Ghosh TCS Research Delhi
Puneet Agarwal TCS Research Delhi