The Data Science Faculty Seminar Series is part of the Moore Sloan Data Science Environment. Its events bring together faculty, postdocs, and PhD students within NYU’s Data Science initiative. Each event features a talk given by one of the faculty members.

Upcoming Events

9th Data Science Showcase
Tuesday, Mar. 7, 2017, 4:30 – 7 PM

Kimmel Center, Colloquium Room (5th Floor)
60 Washington Square S
New York, NY 10012

Title: Data Science for Educational Research, Industry, and Policy

Abstract: This showcase addresses the current and future roles of data science in the education sector. The showcase will consist of short presentations by a panel of four leaders from industry and academia, who will provide an overview of current initiatives in their research domains and identify core opportunities for data science to move the field forward. The panel presentations will be followed by a moderated discussion with the Moore-Sloan community.


Ryan Baker is an Associate Professor at the Graduate School of Education at the University of Pennsylvania. He was the founding president of the International Educational Data Mining Society, and currently serves as co-lead of the Big Data in Education spoke of the National Science Foundation’s Northeast Big Data Hub. His research draws on data mining, learning analytics, and human–computer interaction to study how students respond to and learn from educational software.

Saad Khan is a Managing Senior Research Scientist at Educational Testing Service, where he leads research focused on creating highly innovative forms of educational assessments and learning systems. Prior to joining ETS, he was at SRI, where he led the design and development of advanced training systems that can adapt to both changing pedagogical objectives and learners’ behavior. His technical background is in computer vision and machine learning, and his research interests include cognitive computing, deep learning, behavioral analytics, affective computing, and multimodal data fusion.

Jeff Olson is Vice President and Chief Data Officer at the College Board. He is responsible for ensuring that data drives the services, outreach, and intervention strategies provided by the College Board. Prior to joining the College Board in 2013, he led data science and market intelligence for Kaplan Test Prep, informing the development of products and services and helping shape organizational strategy. His surveys and commentary have appeared in many national publications, including The New York Times, The Washington Post, The Wall Street Journal, Los Angeles Times, USA Today, and Newsweek.

Jan L. Plass is the Paulette Goddard chair in Digital Media and Learning Sciences at NYU Steinhardt, where he co-directs the Games for Learning Institute and is the founding director of the CREATE Consortium for Research and Evaluation of Advanced Technology in Education. His research is at the intersection of learning science, cognitive sciences, and design, and seeks to enhance the design and effectiveness of visual environments. His current focus is on cognitive and emotional aspects of information design and interaction design of simulations and educational games for science education and second language acquisition.

Past Events

8th Data Science Showcase
Wednesday, Mar. 9, 2016, 4:30 – 7 PM

Kaufman Management Center, Stern School of Business, RM KMC 5-50
44 West 4th Street,
New York, NY 10012

Zaid Harchaoui, on the history of AI research and its public perception, followed by a panel discussion on the future of AI with Ernest Davis, Vasant Dhar, Yann LeCun, and Gary Marcus

7th Data Science Showcase
Monday, Oct. 26, 2015, 4:30 – 7 PM

Kimmel Center Rm 905/907
60 Washington Sq. South,
New York, NY 10012

Daniela Huppenkothen, Moore-Sloan Postdoctoral Fellow at the NYU Center for Data Science

Talk Title: Exploring the Violent Universe: Data Science for X-ray Astronomy

Abstract: X-ray astronomy, the study of the universe at very short wavelengths, helps us unravel physical laws under the most extreme conditions known: the densest stars, the brightest explosions, the strongest gravity and magnetic fields, and the hottest plasmas. The data sets that we have, taken with X-ray telescopes on board a fleet of space satellites, are abundant and diverse, and can be used to tackle some of the mysteries surrounding black holes, neutron stars, and stellar explosions. All of these types of sources are highly variable on timescales ranging from milliseconds to decades, and studying how their brightness changes with time is a prime tool to help us understand the underlying physics. In this talk, I will give a brief overview of the fundamental processes of nature we study with this kind of data, and show a few snapshots of projects on neutron stars and black holes I am working on at the CDS.

Bio: Daniela Huppenkothen is a Moore-Sloan Postdoctoral Fellow at the NYU Center for Data Science. She is primarily interested in time series methods for astronomy; so far, her work has focussed on developing methods for characterising variability in fast transient events (in particular magnetar bursts and black hole binary systems) in data from X-ray space telescopes, and on using empirical models to make inferences about the underlying physics of the system. She is also interested in machine learning and astrostatistics.

Stefan Karpinski, Research Engineer at the NYU Center for Data Science

Talk Title: Julia Solves the Four Language Problem for Data Science

Abstract: Once upon a time, I was forced to write my data science projects in a patchwork of four languages: Matlab, R, C and Ruby. Today, I only use one: Julia. In this talk I’ll explore the kinds of data science problems that Julia excels at, how it cooperates in beautiful harmony with C, C++ and Python, and where we’re going with the language in the future. There will be lots of live coding and some slides with fancy transitions.

Bio: Stefan is a Research Engineer at the NYU Center for Data Science. Previously, he’s worked as a data scientist, researcher and software engineer at Etsy, Citrix Online, and Akamai.

6th Data Science Showcase
Wednesday, September 16th

NYU Torch Club
Tap Room
18 Waverly Pl
New York, NY 10003

  • 4:00-4:30pm: Wine & refreshments
  • 4:30-5:30pm: Talk given by Prof. Vasant Dhar
  • 5:30-6:30pm: Faculty mixer continues with more wine & refreshments

Vasant Dhar, Professor and Head, Information Systems Group, Co-Director, Center for Business Analytics
Talk Title:  Should You Trust Your Money to a Robot?
Abstract:  Financial markets emanate massive amounts of data from which machines can, in principle, learn to invest with minimal initial guidance from humans. I contrast human and machine strengths and weaknesses in making investment decisions. The analysis reveals areas in the investment landscape where machines are already very active and those where machines are likely to make significant inroads in the next few years.

5th Data Science Showcase
Wednesday, April 15th

Rumi Chunara, Assistant Professor in Computer Science and Engineering and the Global Institute of Public Health
Talk Title:  Data Science for Improved Public Health
Abstract: Internet and mobile connectivity have enabled a plethora of new data that offer unprecedented opportunities in many domains. The real-time, quantitative and geo-located data generated through these crowdsourced sensors can improve population-public health surveillance, which suffers from limits due to latency, high cost, inherent contributor biases, and imprecise resolution. Simultaneously, the observational and informal nature of these data sources presents some common and new data challenges. In this talk I will discuss a few approaches towards integrating crowdsourced data into epidemiological models, including increasing specificity, and relating the data to the underlying population at risk. I will also show how the proposed new data sources and methods can help approach new questions in both infectious and chronic disease.

Neil Rabinowitz, Postdoctoral Fellow, Laboratory for Computational Vision
Talk Title:  The statistical structure of noise in the brain
Abstract:  While those in machine learning have long drawn inspiration from brains for smart solutions to real-world problems, those of us who have tested out these analogue devices (either by owning one, or by interacting with others who do) are well aware that brains have some severe limitations. At a low level, they have to compute with noisy, slow, squidgy elements. At a high level, they’re prone to some serious mistakes. In this talk, I’ll argue that the noise in the brain is structured, and that by characterizing it, we can learn about the brain’s computational challenges, successes, and failures. In particular, I will focus on data from large populations of sensory neurons during a perceptual task, whose joint activity reveals the action of latent modulatory sources. Such data provide a rich testing ground for models that are both statistically principled and scientifically interpretable.

4th Data Science Showcase
Wednesday, October 22nd

Brian McFee, NYU Data Science Fellow
Talk Title: Analyzing and Visualizing Musical Structure
Abstract: Musical structure analysis seeks to produce a high-level description of the structural components of an audio recording, such as the boundaries marking transitions between sections (e.g., verse to chorus) or labeling repeated sections. The results of structure analysis can be used to generate visualizations and aid in discovery and understanding of musical content. In this talk, I’ll give a brief overview of the structure analysis problem and existing solutions, and then demonstrate a new technique based on spectral clustering. The proposed method simultaneously exploits local consistency and global repetition of acoustic features, and provides a natural encoding of structure at multiple levels of granularity.

Brenden Lake, NYU Data Science Fellow
Talk Title: Towards Richer Models of Concept Learning in Machines
Abstract: People can learn a new concept almost perfectly from just a single example, yet machine learning algorithms typically require tens, hundreds, or thousands of examples to perform similarly. People can also use their learned concepts in richer ways than conventional machine learning systems: for action, imagination, and explanation. I will describe a model that captures some of these human learning abilities by representing concepts as simple programs, or structured procedures that generate the observed data. The model achieves human-level performance on a one-shot classification task, and “visual Turing tests” show the model’s more creative generalizations can mimic the behavior of people.

3rd Data Science Showcase
Thursday, September 18th
Professor Panos Ipeirotis, Associate Professor and George A. Kellner Faculty Fellow at the Department of Information, Operations, and Management Sciences at NYU’s Stern School of Business, will discuss Crowdsourcing.

Panos Ipeirotis’ recent research interests focus on crowdsourcing and on mining user-generated content on the Internet. He received his Ph.D. degree in Computer Science from Columbia University in 2004. He has received four “Best Paper” awards (IEEE ICDE 2005, ACM SIGMOD 2006, WWW 2011, ICIS 2012), two “Best Paper Runner Up” awards (JCDL 2002, ACM KDD 2008), and is also a recipient of a CAREER award from the National Science Foundation and of several other grants.

Presentation Slides: Panos- Crowdsourcing

2nd Data Science Showcase
Monday, May 12th
Professor Enrico Bertini, Assistant Professor of Computer Science and Engineering, NYU Poly, will give an overview of his work on visualization.

Enrico Bertini is a researcher in the areas of information visualization, human-computer interaction, and visual analytics. Interactive data visualization helps people extract relevant information out of large and complex data and, as such, it has the potential to allow for remarkable discoveries and progress. His research focuses on the development of novel techniques and applications as well as on the understanding of how people interact with them. Enrico’s talk should be of broad interest to anyone at NYU interested in data science, and will be followed up next semester with a tutorial and a reading group on visualization. Here is an article that can serve as a snapshot of his talk: Q&A with Enrico Bertini

Presentation Slides: Bertini- Visualization in Data Science

Recording of Showcase: Bertini Visualization Video

1st Data Science Showcase
Monday, March 10th
Professor Jennifer Hill, Associate Professor of Applied Statistics, Steinhardt, gave a talk on “Causal Inference and Data Science — Why they need each other”

Jennifer Hill works on the development of methods that help us answer the causal questions that are so vital to policy research and scientific development. In particular, she focuses on situations in which it is difficult or impossible to perform traditional randomized experiments, or when even seemingly pristine study designs are complicated by missing data or hierarchically structured data. Most recently, Hill has been pursuing two major strands of research. The first focuses on Bayesian nonparametric methods that allow for flexible estimation of causal models without the need for methods such as propensity score matching. The second line of work pursues strategies for exploring the impact of violations of typical assumptions in this work that require that all confounders have been measured. Hill is also the Co-Director of the Center for the Promotion of Research Involving Innovative Statistical Methodology (PRIISM) and the new Master’s Program in Applied Statistics for Social Science Research (A3SR).

Presentation Slides: Hill- Causal Inference & Data Science