IKDD - Workshop

Workshop on Understanding Big Data Analytics (Inaugural workshop of IKDD)

February 15th - 16th, 2013

Organizers: Gautam Shroff, TCS; B. Ravindran, IIT Madras; Kamal Karlapalem, IIIT Hyderabad; Lokendra Shastri, Infosys

Venue: Mysore Infosys Campus

1. Theme:

In recent years, with the burgeoning amount of data available in digitized form, Big Data Analytics has led to new business opportunities and has thrown up extremely challenging research problems. In step with the rest of the world, competencies in large-scale data management and analytics have been developed in India in both academia and industry. To date there have been very few academia-industry events that bring together researchers from both backgrounds to build a larger community that can benefit enormously from the synergy. There are many aspects to big data analytics, and it is not often that all of them are highlighted in the same forum. One of the goals of this workshop is to highlight the different aspects of big data in different sessions.

2. Organization:

The event will consist of 3 technical sessions, 2 invited talks, and a discussion session. In addition, there will be student poster sessions during breaks. Each technical session will typically have about an hour set aside for one or more talks, followed by a couple of hours of breakout sessions discussing sub-topics related to the main theme of the session. Ideally these discussions should continue beyond the scheduled timings. The final half-day (Day 2, Session 2) will be devoted to a wrap-up discussion session, during which all the session chairs will report on the discussions in the breakout sessions, and we will try to come up with some action points / directions for the Indian research community at large for the near future.

3. Technical Sessions

3.1 Information extraction from online textual data – Day 1, Session 1:

Chair: L. V. Subramanian, IBM-IRL

Featured Speaker: Bing Liu, University of Illinois, Chicago.

Summary: Much recent research on online data has been directed toward understanding content in order to provide more focused search results. This has led to numerous techniques for deriving structured data from unstructured text and to advances in many ancillary technologies such as handling noisy textual data, semantic representations, and information extraction from semi-structured data. This session will look at the challenges in the area, recent advances, and future applications.

3.2 Analytics on linked data – Day 1, Session 2:

Chair: Indrajit Bhattacharya, IBM-IRL

Summary: Data generated by real systems usually has an underlying relational structure; similarly, facts extracted from web-scale document collections, as well as data extracted from social networks, are often represented as 'triples' (e.g. RDF) codifying relations. In each of these cases such linked data is best interpreted as a graph. While traditional mining/ML ignored these dependencies for lack of models and computational capabilities, researchers have more recently been developing algorithms and tools for handling linked (graph) data. Moreover, this setting has given rise to newer problems (such as link prediction) and newer interpretations of older problems (clustering/community detection). This session will focus on models, applications, and paradigms that are unique to linked data.

3.3 Machine learning on large data sets – Day 2, Session 1:

Chair: B. Ravindran, IIT Madras

Featured Speaker: Srinivasan Parthasarathy, Ohio State.

Summary: The recent availability of large volumes of data has necessitated the development of new algorithms and a different approach to learning from data. This session will look at issues ranging from distributed learning algorithms suited to map-reduce and other deployments, to advances in the theoretical analysis of algorithms where execution time and memory use matter more than finding optimal solutions.

4. Plenary Talks

4.1 Zoubin Ghahramani, Cambridge University – Day 1, FN:

Tentative topic: Information Extraction

4.2 V. S. Subrahmanian, University of Maryland, College Park – Day 1, AN:

Tentative topic: Tracking, monitoring and forecasting behaviors of global networks.

5. Student Participation

Students will be invited to submit a two-page extended abstract of their work, and approximately 10-15 submissions will be shortlisted. The student participants will display posters of their work in sessions organized during breaks. In addition, each student participant will make a five-minute spotlight presentation during a relevant technical session.

Program of the workshop

Day 1: Friday, February 15th

8:45 – 9:00 Welcome and Introduction to IKDD – Gautam Shroff

9:00 – 9:50 Zoubin Ghahramani

9:50 – 10:20 Coffee Break (with student poster sessions)

10:20 – 1:00 Session 1: Information extraction from online textual data (Chair: L. V. Subramanian)

10:20-11:00 Bing Liu

11:00-11:20 Vasudeva Varma

11:20-11:40 Lipika Dey

11:40-1:00 Breakout Sessions

Entity Extraction on massive data
Using background knowledge, side information, language models
Noisy text (Uncertainty in the data)

1:00 – 2:00 Lunch

2:00 – 5:10 Session 2: Analytics on linked data (Chair: Indrajit Bhattacharya)

2:00-2:40 Soumen Chakrabarti

2:40-3:00 Srikanta Bedathur

3:00-3:20 Sumeet Agarwal

3:20 – 3:50 Coffee Break (with student poster sessions)

3:50-5:10 Breakout Sessions

Scalable inference on linked data
Applications of linked data - new challenges and solutions
Dynamic Linked Data Analysis

5:10 – 6:00 V. S. Subrahmanian

7:00 – 8:00 IKDD Executive Team Meeting

Day 2: Saturday, February 16th

8:30 – 9:00 Sponsor Speak

9:00 – 12:10 Session 3: Machine learning on large data sets (Chair: B. Ravindran)

9:00-9:40 Srinivasan Parthasarathy

9:40-10:00 P. S. Sastry

10:00-10:20 Sourangshu Bhattacharya

10:20 – 10:50 Coffee Break (with student poster sessions)

10:50-12:10 Breakout Sessions (3 out of these, to be voted on Day 1)

Non-traditional data analytics
Large scale Data organization
Scalable architectures
What problems are classified as big data?
Security, privacy issues on the cloud

12:10 – 2:30 Working Lunch

Sum-up of breakout sessions
Discussions on future directions

2:30 Conclusion