KDD Data Science in India Workshop

Program Highlights

Panel Discussion on Generative AI for India: Possibilities, Pitfalls and Dilemmas

Moderator - Debdoot Mukherjee (Meesho)

09:30 AM - 10:30 AM (IST)

An engaging panel discussion on Generative AI for India, in which experts delve into the challenges that need to be addressed and uncover opportunities for advancing the state of the art in Gen AI.

Pratyush Kumar

AI4Bharat, Microsoft Research

Preethi Jyothi

IIT Bombay

Krishnaram Kenthapadi

Fiddler AI

Keynote By Ronen Eldan

10:30 AM - 11:30 AM (IST)

An awe-inspiring keynote by Ronen Eldan, Principal Researcher in the Machine Learning Foundations group at Microsoft Research, centered on "The Power of Synthetic Datasets: From TinyStories to Phi-1".

Ronen Eldan

Microsoft Research

Abstract:
This talk presents two recent papers that show how synthetic datasets generated by large language models can enable training smaller and more efficient models for specific tasks.

The first paper introduces TinyStories, a dataset of short stories using only very simple words, generated by GPT-3.5/4. TinyStories attempts to preserve the essential elements of natural language, such as grammar, vocabulary, facts, and reasoning, while being more compact and focused than typical corpora. While language models as big as 1B parameters often struggle to produce coherent text beyond one or two sentences, we show that TinyStories can be used to train language models that are much smaller than the state-of-the-art models (below 10 million parameters), or have much simpler architectures (with only one transformer block), yet still produce fluent and consistent stories with several paragraphs that are diverse and have almost perfect grammar, and demonstrate certain reasoning capabilities.

However, most attempts to create synthetic data using LLMs end up with datasets that are very repetitive and lack the diversity needed for a model trained on them to exhibit any ability beyond memorizing these repeating patterns. The generation of TinyStories relies on the (new) idea of attaining this diversity by injecting randomness into the prompt.

A second paper, based on the same paradigm, presents Phi-1, a new large language model for code, trained using a combination of "textbook quality" data from the web and a dataset of synthetically generated textbooks and exercises. Despite having only 1.3B parameters, it achieves a pass@1 accuracy of 50.6% on HumanEval and 55.5% on MBPP, surpassing models more than 10 times its size. We discuss the implications of these results for the development, analysis and research of language models, especially for low-resource or specialized domains, and the potential of synthetic datasets to improve the performance and efficiency of LLMs.
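The idea of attaining diversity by injecting randomness into the prompt can be illustrated with a minimal sketch. The abstract does not specify how the randomness is injected, so everything below is an assumption made for illustration: the word pools, the list of story features, the prompt template, and the function names `make_prompt` and `make_prompts` are all hypothetical. The sketch only builds prompt strings; sending them to a model is left out.

```python
import random
from typing import List

# Hypothetical word pools (assumption); a real pipeline would use a far larger vocabulary.
NOUNS = ["dog", "ball", "tree", "boat", "cake"]
VERBS = ["jump", "find", "share", "hide", "build"]
ADJECTIVES = ["happy", "tiny", "brave", "sleepy", "shiny"]
# Hypothetical story features (assumption) that vary the structure of the story.
FEATURES = ["a dialogue", "a plot twist", "a moral", "a bad ending"]


def make_prompt(rng: random.Random) -> str:
    """Build one story prompt whose required words and feature are sampled at random."""
    noun = rng.choice(NOUNS)
    verb = rng.choice(VERBS)
    adjective = rng.choice(ADJECTIVES)
    feature = rng.choice(FEATURES)
    return (
        "Write a short story using only very simple words. "
        f'The story must use the noun "{noun}", the verb "{verb}" '
        f'and the adjective "{adjective}", and it should contain {feature}.'
    )


def make_prompts(n: int, seed: int = 0) -> List[str]:
    """Return n prompts; a fixed seed makes the sampled set reproducible."""
    rng = random.Random(seed)
    return [make_prompt(rng) for _ in range(n)]


if __name__ == "__main__":
    for prompt in make_prompts(3):
        print(prompt)
```

Because each prompt forces a different combination of words and features, repeated calls to the same generator are steered toward different stories rather than toward its most likely output. With these small pools there are 5 × 5 × 5 × 4 = 500 distinct prompts; larger pools grow that number multiplicatively.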

Kaleidoscope of Papers from India in Top AI Conferences

11:30 AM - 12:30 PM (IST)

A video kaleidoscope of a representative sample of relevant research papers from Indian institutions that appeared in recent editions of top AI conferences - AAAI, CVPR, ECCV, ACL, NeurIPS, WSDM, ICLR, etc.

ACL 2023

Towards Leaving No Indic Language Behind: Building Monolingual Corpora, Benchmark and Models for Indic Languages

Sumanth Doddapaneni (IIT Madras, AI4Bharat), Rahul Aralikatte (MILA - Quebec AI Institute, McGill University), Gowtham Ramesh (AI4Bharat), Shreya Goyal (AI4Bharat), Mitesh M. Khapra (IIT Madras, AI4Bharat), Anoop Kunchukuttan (Microsoft, AI4Bharat, IIT Madras), Pratyush Kumar (Microsoft, AI4Bharat, IIT Madras)
AAAI 2023

Interactive concept bottleneck models

Kushal Chauhan (Google Research India), Rishabh Tiwari (Google Research India), Jan Freyberg (Google Research), Pradeep Shenoy (Google Research India), DJ Dvijotham (Google Research)
AAAI 2023

Clustering What Matters: Optimal Approximation for Clustering with Outliers

Akanksha Agrawal (Indian Institute of Technology Madras), Tanmay Inamdar (University of Bergen), Saket Saurabh (Institute of Mathematical Sciences), Jie Xue (NYU Shanghai)
ECCV 2022

Novel Class Discovery without Forgetting

Joseph K J (IIT Hyderabad), Sujoy Paul (Google Research), Soma Biswas (Indian Institute of Science), Piyush Rai (IIT Kanpur), Kai Han (The University of Hong Kong), Vineeth N Balasubramanian (IIT Hyderabad)
CVPR 2023

Canonical Fields: Self-Supervised Learning of Pose-Canonicalized Neural Fields

Rohith Agaram (IIIT Hyderabad), Shaurya Dewan (IIIT Hyderabad), Rahul Sajnani (Brown University), Adrien Poulenard (Stanford University), Madhava Krishna (IIIT Hyderabad), Srinath Sridhar (Brown University)
Interspeech 2023

Speech Taskonomy: Which Speech Tasks are the most Predictive of fMRI Brain Activity?

Subba Reddy Oota (Inria Bordeaux, France), Veeral Agarwal (IIIT Hyderabad), Mounika Marreddy (IIIT Hyderabad), Manish Gupta (Microsoft, India), Bapi S. Raju (IIIT Hyderabad)
WSDM 2023

BLADE: Biased Neighborhood Sampling based Graph Neural Network for Directed Graphs

Srinivas Virinchi (Amazon), Anoop Saladi (Amazon)
ICLR 2023

Enhancing the Inductive Biases of Graph Neural ODE for Modeling Dynamical Systems

Suresh Bishnoi (IIT Delhi), Ravinder Bhattoo (IIT Delhi), Jayadeva (IIT Delhi), Sayan Ranu (IIT Delhi), N.M. Anoop Krishnan (IIT Delhi)
CVPR 2023

Few-Shot Referring Relationships in Videos

Yogesh Kumar (IIT Jodhpur), Anand Mishra (IIT Jodhpur)
NeurIPS 2022

Neural Estimation of Submodular Functions with Applications to Differentiable Subset Selection

Abir De (IIT Bombay), Soumen Chakrabarti (IIT Bombay)
ACL 2023

Multi-Row, Multi-Span Distant Supervision For Table+Text Question Answering

Vishwajeet Kumar (IBM Research), Saneem Chemmengath (IBM Research), Yash Gupta (IIT Bombay), Jaydeep Sen (IBM Research), Samarth Bharadwaj (IBM Research), Feifei Pan (IBM Research), Soumen Chakrabarti (IIT Bombay)

Networking Roundtables

12:30 PM - 01:15 PM (IST)

An opportunity for researchers with similar interests to connect with each other in virtual breakout rooms and discuss mutual interests.

Topics:

Knowledge Representation in the Age of LLMs [ Moderator : Indrajit Bhattacharya ]
Generative AI - Beyond Text (Images, Audio, Video, Graph, 3D modeling) [ Moderators : Debdoot Mukherjee, Vikram Gupta ]
Careers in ML - Industry and Academia [ Moderators : Anirban Dasgupta, Abinaya K ]
Interaction with Kaleidoscope Authors [ Moderators : Shreyas Shetty, Shourya Roy ]
LLMs for India [ Moderators : Mitesh Khapra, Anoop Kunchukuttan ]

TIME (IST)	TITLE
09:15 AM - 09:30 AM	Welcome Address
09:30 AM - 10:30 AM	Panel Discussion on “Generative AI for India: Possibilities, Pitfalls, and Ethical Dilemmas”
10:30 AM - 11:30 AM	Keynote By Ronen Eldan
11:30 AM - 12:30 PM	A Kaleidoscope of Papers from India in Top AI Conferences
12:30 PM - 01:15 PM	Roundtable Networking Sessions

Data Science in India: An ACM IKDD Event in Conjunction With KDD 2023

Time: 9:30 AM - 12:30 PM

Virtual Event

Sunday, 6th August, 2023

09:15 AM - 01:15 PM (IST)

Speakers and Panelists

Associate Professor & Principal Researcher

Univ. of Michigan & Microsoft Research, India

Principal Researcher

Microsoft Research, India

Professor

IIT Madras

Assistant Professor

IIT Madras

Professor

IIT Kharagpur

Assistant Professor

IIT Delhi

Professor

Indian Institute of Science

Associate Professor

Indian Institute of Science

Vice President

Tata Consultancy Services

Director, Credit Decision Science

American Express

Distinguished Scientist

Google

Artificial Intelligence Research Engineer

Facebook

Schedule

TOPIC	SPEAKER
AI & ML in the Times of COVID	Partha P. Chakrabarti
A data-driven approach for country-level COVID Management	Kamakoti V.
Misinformation and Culture: Learning from small scale qualitative data analysis	Joyojeet Pal
City-scale agent-based simulator for modelling COVID-19 spread	Rajesh Sundaresan
- Break -
The State and Fate of Linguistic Diversity and Inclusion in the NLP World

Object Detection in Scientific Plots

The Landscape of regularized Auto-Encoders for generative modelling: Introduction, challenges and new directions

Sketch-based Image Retrieval