Keynotes

Jeffrey Ullman

MapReduce Algorithms

Jeffrey David "Jeff" Ullman is a computer scientist and professor at Stanford University. His textbooks on compilers (various editions are popularly known as the Dragon Book), theory of computation (also known as the Cinderella book), data structures, and databases are regarded as standards in their fields.

We begin with a sketch of how MapReduce works and how MapReduce algorithms differ from general parallel algorithms. While algorithm analysis usually centers on the serial or parallel running time of the algorithms that solve a given problem, in the MapReduce world, the critical issue is a tradeoff between interprocessor communication and the parallel running time. We examine a fundamental problem, in which the output depends on comparison of all pairs of inputs (the "all-pairs" problem), and show matching upper and lower bounds for the communication/time tradeoff. Finally, we consider special cases of all-pairs, where only a subset of the pairs of inputs are of interest; an example is the problem of similarity join.
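To make the communication/time tradeoff concrete, the sketch below simulates the standard "pairs of groups" all-pairs pattern in Python. It is an illustrative toy under assumed names (map_phase, reduce_phase, g), not the construction analyzed in the talk: the n inputs are split into g groups, one reducer is created per unordered pair of groups, and each input is replicated to g reducers, so raising g increases communication while shrinking the roughly (n/g)^2 comparisons each reducer must perform.

    # Minimal illustrative sketch (not the talk's exact construction) of the
    # all-pairs MapReduce pattern.  The n inputs are split into g groups and one
    # reducer is keyed by each unordered pair of groups.  Each input is
    # replicated to g reducers, so total communication is about n*g, while each
    # reducer performs roughly (n/g)^2 comparisons -- the tradeoff sketched above.
    from collections import defaultdict
    from itertools import combinations

    def map_phase(inputs, g):
        """Send each input to every reducer whose key contains its group."""
        shuffled = defaultdict(list)
        for idx, x in enumerate(inputs):
            grp = idx % g                            # this input's group
            for other in range(g):                   # replicate to g reducers
                shuffled[tuple(sorted((grp, other)))].append((grp, x))
        return shuffled

    def reduce_phase(key, values, compare):
        """Compare exactly the pairs of inputs that meet only at this reducer."""
        i, j = key
        results = []
        for (ga, a), (gb, b) in combinations(values, 2):
            if i == j or ga != gb:                   # avoid duplicate comparisons
                results.append(compare(a, b))
        return results

    if __name__ == "__main__":
        data = list(range(12))
        shuffled = map_phase(data, g=3)
        out = [r for key, vals in shuffled.items()
               for r in reduce_phase(key, vals, lambda a, b: abs(a - b))]
        print(len(out))                              # 66 = C(12, 2): each pair compared once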

Jim Hendler

Broad Data: Challenges on the emerging Web of data

James Hendler is the Director of the Institute for Data Exploration and Applications and the Tetherless World Professor of Computer, Web and Cognitive Sciences at RPI. He also serves as Chair of the Board of Directors of the UK’s charitable Web Science Trust. Hendler has authored over 350 technical papers in the areas of Semantic Web, open data, agent-based computing and high performance processing. One of the originators of the “Semantic Web,” Hendler was the recipient of a 1995 Fulbright Foundation Fellowship, is a former member of the US Air Force Science Advisory Board, and is a Fellow of the American Association for Artificial Intelligence, the British Computer Society, the IEEE and the AAAS. He is also the first computer scientist to serve on the Board of Reviewing Editors for Science. In 2010, Hendler was named one of the 20 most innovative professors in America by Playboy magazine and was selected as an “Internet Web Expert” by the US government. In 2012, he was one of the inaugural recipients of the Strata Conference “Big Data” awards for his work on large-scale open government data. In 2013, he was appointed Open Data Advisor to New York State by Governor Cuomo, and he is a columnist and associate editor of the Big Data journal.

"Big Data" usually refers to the very large datasets generated by scientists, to the many petabytes of data held by companies like Facebook and Google, and to analyzing real-time data assets like the stream of twitter messages emerging from events around the world. Key areas of interest include technologies to manage much larger datasets, technologies for the visualization and analysis of databases, cloud-based data management and datamining algorithms.

Recently, however, we have begun to see the emergence of another, and equally compelling, data challenge -- that of the "broad data" that emerges from the millions and millions of raw datasets available on the World Wide Web. For broad data, the new challenges include Web-scale data search and discovery, rapid and potentially ad hoc integration of datasets, visualization and analysis of only partially modeled datasets, and issues relating to the policies for data use, reuse and combination. In this talk, we present the broad data challenge and discuss potential starting points for solutions, including those arising from research in the Semantic Web area. We illustrate these approaches using data from a "meta-catalog" of over 1,000,000 open datasets collected from about two hundred governments around the world.

Madhav Marathe

Resilient cities and urban analytics:
The role of big data and high performance pervasive computing

Marathe is an expert in interaction-based modeling and the simulation of large, complex biological, information, social, and technical systems. As the Director of the Network Dynamics and Simulation Science Laboratory, he leads a basic and applied research program in which researchers are advancing the science and engineering of co-evolving complex networks and developing innovative computational tools based on these advances to support policy informatics. Marathe is an ACM Fellow for his contributions to high-performance computing algorithms and software environments for simulating and analyzing socio-technical systems. He has also been named an IEEE Fellow for his contributions to the development of formal models and software tools for understanding socio-technical networks.

Developing practical informatics tools and decision support environments to analyze the socio-technical systems that support our cities is complicated and scientifically challenging. Increasing urbanization across the globe, especially in developing countries, poses further challenges.

Recent quantitative changes in high performance and pervasive computing, big data, and network science have created new opportunities for collecting, integrating, analyzing and accessing information related to coupled urban socio-technical systems. Innovative information systems that leverage this new capability have already proved immensely useful.

After a brief overview, I will describe an urban analytics approach rooted in synthetic information, pervasive high performance computing and data analytics to study resilient and sustainable cities. Examples in public health epidemiology and urban transport planning and security will be used to guide the discussion. Computational challenges and directions for future research will be discussed.

Sarit Kraus

Computer Agents that Interact Proficiently with People

Sarit Kraus is a professor of computer science at Bar-Ilan University in Israel and an adjunct professor at the University of Maryland. Kraus has made highly influential contributions to numerous subfields, most notably to multiagent systems and non-monotonic reasoning. One of her important contributions is to strategic negotiation; her work in this area was among the first to integrate Game Theory with Artificial Intelligence.

Automated agents that interact proficiently with people can be useful in supporting or replacing people in complex tasks. The inclusion of people presents novel problems for the design of automated agents’ strategies. People do not adhere to the optimal, monolithic strategies that can be derived analytically. Their behavior is affected by a multitude of social and psychological factors. In this talk I will show how combining machine learning techniques for human modeling, human behavioral models, formal decision-making and game theory approaches enables agents to interact well with people. Applications include intelligent agents that help drivers reduce energy consumption, agents that support rehabilitation, employer-employee negotiation and agents that support a human operator in managing a team of low-cost robots in search and rescue tasks.

Rajeev Rastogi

Machine Learning @ Amazon

Rajeev Rastogi is the Director of Machine Learning at Amazon. Previously, he was the Vice President of Yahoo! Labs Bangalore and the founding Director of the Bell Labs Research Center in Bangalore. Rajeev is active in the fields of machine learning, databases, data mining, and networking, and has served on the program committees of several conferences in these areas. He currently serves on the editorial board of CACM and has previously been an Associate Editor for IEEE Transactions on Knowledge and Data Engineering. He has published over 125 papers and holds over 50 patents. Rajeev is an ACM Fellow and a Bell Labs Fellow. He received his B.Tech degree from IIT Bombay and a PhD in Computer Science from the University of Texas at Austin.

In this talk, I will first provide an overview of the key Machine Learning (ML) applications we are developing at Amazon. I will then describe a matrix factorization model that we have developed for making product recommendations. The salient characteristics of the model are: (1) it uses a Bayesian approach to handle data sparsity, (2) it leverages user and item features to handle the cold start problem, and (3) it introduces latent variables to handle the multiple personas associated with a user account (e.g., family members). Our experimental results with synthetic and real-life datasets show that leveraging user and item features and incorporating user personas enable our model to achieve lower RMSE and perplexity than the baselines.
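As a rough illustration of points (2) and (3), the following Python sketch (an assumed toy, not the model described in the talk) scores an item for a user who owns several persona-specific latent vectors; linear mappings from user and item features into the latent space stand in for the cold-start handling, and the Bayesian treatment of sparsity is omitted. All names and dimensions are hypothetical.

    # Toy sketch of feature-aware matrix factorization with user "personas".
    # Illustrative assumption only, not Amazon's production model: each user
    # owns P latent vectors (one per persona) and a rating is explained by the
    # persona that fits the item best; feature mappings A and B let brand-new
    # users and items still receive scores (cold start).
    import numpy as np

    rng = np.random.default_rng(0)
    n_users, n_items, k, P = 100, 50, 8, 3     # latent dimension k, P personas
    d_user, d_item = 5, 4                      # user/item feature dimensions

    U = rng.normal(0.0, 0.1, (n_users, P, k))  # per-persona user factors
    V = rng.normal(0.0, 0.1, (n_items, k))     # item factors
    A = rng.normal(0.0, 0.1, (d_user, k))      # user features -> latent space
    B = rng.normal(0.0, 0.1, (d_item, k))      # item features -> latent space

    def predict(u, i, x_u, y_i):
        """Score item i for user u: the best-matching persona wins."""
        user_latent = U[u] + x_u @ A           # shape (P, k), feature-adjusted
        item_latent = V[i] + y_i @ B           # shape (k,)
        return float((user_latent @ item_latent).max())

    # A brand-new user (near-zero latent factors) still gets a feature-driven score.
    x_new, y_seen = np.ones(d_user), np.ones(d_item)
    print(predict(0, 3, x_new, y_seen))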
Announcements

Important Dates
Early Bird Registration: March 5, 2015
Conference: March 18-21, 2015