  • Prior-knowledge-based Mammalian Gene Regulatory Network Inference

    During metazoan animal development, changes in chromatin states give rise to diverse patterns of gene expression that direct the differentiation of progenitor cells into various tissues and organs. We use information from TF binding motifs and chromatin accessibility data to infer cell-type specific TF occupancy, and incorporate inferred TF-target interactions as prior knowledge to learn […]

  • Discovering Hidden Variables in Noisy-Or Networks using Quartet Tests

    We give a polynomial-time algorithm for provably learning the structure and parameters of bipartite noisy-or Bayesian networks of binary variables where the top layer is completely hidden. Unsupervised learning of these models is a form of discrete factor analysis, enabling the discovery of hidden variables and their causal relationships with observed data. We obtain an […]
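    In a noisy-or network, each active hidden parent independently turns an observed child on with its own probability, and a leak term lets the child turn on even when no parent is active. The sketch below is a minimal illustration under those standard noisy-or assumptions. It computes that conditional probability and samples from a two-layer network whose top layer a learner would never observe. It does not implement the quartet-test learning algorithm, and all function and parameter names are hypothetical.

```python
import random


def noisy_or_on_prob(hidden, activation_probs, leak):
    """P(child = 1 | hidden parents) under a noisy-or conditional distribution.

    hidden:           0/1 states of the hidden parents
    activation_probs: probability that each parent, when on, turns the child on
    leak:             probability the child turns on with no active parent
    """
    prob_off = 1.0 - leak
    for h, p in zip(hidden, activation_probs):
        if h:
            # Each active parent independently fails to fire with prob. 1 - p
            prob_off *= 1.0 - p
    return 1.0 - prob_off


def sample_bipartite(priors, weights, leaks, rng):
    """Draw one sample from a two-layer (bipartite) noisy-or network.

    priors:  P(hidden_i = 1) for each hidden variable
    weights: weights[i][j] = activation probability of hidden i on observed j
    leaks:   leak probability for each observed variable
    Returns (hidden, observed); a structure learner would see only `observed`.
    """
    hidden = [1 if rng.random() < q else 0 for q in priors]
    observed = []
    for j, leak in enumerate(leaks):
        column = [row[j] for row in weights]
        p_on = noisy_or_on_prob(hidden, column, leak)
        observed.append(1 if rng.random() < p_on else 0)
    return hidden, observed
```

    Learning then amounts to recovering `priors`, `weights`, and `leaks` (and which entries of `weights` are nonzero) from many draws of `observed` alone.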

  • Massive Multi-Species Function Prediction

    The rate of new protein discovery has, in recent years, outpaced our ability to annotate and characterize new proteins and proteomes. To address this functional annotation deficit, many groups have successfully turned to computational techniques that predict protein function in order to guide experimental verification. The most prolific methods come from […]

  • Big Data for Traffic Safety Performance Evaluation

    “Vision Zero” puts traffic safety at the forefront of NYC’s agenda, with the goal of ending traffic deaths and injuries on the City’s streets [1]. Traffic safety has also been one of USDOT’s main focus areas for several decades and remains so today. This research project is one of several ongoing research […]

  • Interactive Visualization of Density Estimation Using Adaptive Bandwidths

    This work presents a novel technique for real-time density estimation using adaptive bandwidths and GPUs. Density estimation and heatmaps are among the most commonly used visualizations for geo-referenced data; they let users gain insights from the data simply and directly by visualizing the density of a […]
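    One common way to choose adaptive bandwidths is Abramson’s square-root law. A pilot estimate with a fixed bandwidth h is evaluated at each sample, and each sample’s kernel is then widened where the pilot density is low and narrowed where it is high. The sketch below assumes isotropic 2-D Gaussian kernels and plain Python loops. It is not the GPU technique described in this work, and the function names are hypothetical.

```python
import math


def gaussian_kernel_2d(dx, dy, h):
    # Isotropic 2-D Gaussian kernel with bandwidth h (integrates to 1)
    return math.exp(-(dx * dx + dy * dy) / (2.0 * h * h)) / (2.0 * math.pi * h * h)


def fixed_kde(points, x, y, h):
    # Standard kernel density estimate at (x, y) with one global bandwidth
    return sum(gaussian_kernel_2d(x - px, y - py, h) for px, py in points) / len(points)


def adaptive_bandwidths(points, h):
    # Pilot density at every sample (always > 0, since each sample's own kernel counts)
    pilot = [fixed_kde(points, px, py, h) for px, py in points]
    # Geometric mean of the pilot densities normalizes the local scaling factors
    g = math.exp(sum(math.log(f) for f in pilot) / len(pilot))
    # Square-root law: h_i = h * sqrt(g / f_i), narrower kernels where data are dense
    return [h * math.sqrt(g / f) for f in pilot]


def adaptive_kde(points, bandwidths, x, y):
    # Each sample contributes a kernel with its own local bandwidth
    total = sum(
        gaussian_kernel_2d(x - px, y - py, hi)
        for (px, py), hi in zip(points, bandwidths)
    )
    return total / len(points)
```

    Because the local bandwidths are scaled by the geometric mean of the pilot densities, their own geometric mean equals h, so h still controls overall smoothness. Evaluating `adaptive_kde` at every pixel of a map grid is the expensive step, and it is presumably the part a GPU implementation would parallelize.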

  • Provably and Practically Learning Topic Models

    Topic models provide a useful method for dimensionality reduction and exploratory data analysis in large text corpora. Most approaches to topic model learning have been based on a maximum likelihood objective. Efficient algorithms exist that attempt to approximate this objective, but they have no provable guarantees. Recently, algorithms have been introduced that provide provable bounds […]

  • Visual Inter-Comparison of Multifaceted Climate Models

    Inter-comparison and similarity analysis to gauge consensus among multiple simulation models is a critical problem for understanding climate change patterns. Climate models represent ecosystem processes that vary in time and space, such as photosynthesis and respiration, using algorithms and driving variables such as climate and land use. It is widely accepted that effective use of visualization […]

  • Visual Exploration of Big Spatio-temporal Urban Data: A Study of New York City Taxi Trips

    As increasing volumes of urban data are captured and become available, new opportunities arise for data-driven analysis that can lead to improvements in the lives of citizens through evidence-based decision making and policies. In this work, we focus on a particularly important urban data set: taxi trips. Taxis are valuable sensors and information associated with […]

  • Using Topological Analysis to Support Event-Guided Exploration in Urban Data

    The explosion in the volume of data about urban environments has opened up opportunities to inform both policy and administration and thereby help governments improve the lives of their citizens, increase the efficiency of public services, and reduce the environmental harms of development. However, cities are complex systems and exploring the data they generate is challenging. The interaction between the […]

  • Two Machine-Learning Models of Object Recognition Exhibit Key Features of Human Performance

    We have implemented two machine-learning models of object recognition by human observers. Both models capture two hallmarks of human performance: (1) spatial frequency channels and (2) effects of font complexity. One model is a Convolutional Neural Network (ConvNet), and the other is a texture statistics model followed by a simple classifier. With appropriate training and […]

  • Weakening the Effect of System Justification on Support for Drone Attacks: A Moral Intervention

    According to system justification theory, people are motivated to justify and defend the status quo, that is, the prevailing social, economic, and political systems (Jost & Banaji, 1994). In the real world, a person’s system-justifying tendencies can conflict with his or her moral orientation in judgments on issues such as targeted killings using unmanned drones. In a […]

  • Skytree: A Scalable Data Science Environment for Massive Datasets

    Skytree is the world’s most advanced machine learning software. It acts as a machine learning server that enables advanced data mining on large datasets, e.g., within one’s data processing pipeline or in a more specialized science project. Skytree’s Alex Gray also headed the FASTlab group at the Georgia Institute of Technology. The group holds several records for […]

  • Modeling and Analysis of Chronic Diseases

    Chronic diseases such as type II diabetes have become increasingly prevalent over the past few decades. If left untreated, these conditions lead to many complications that severely affect patients’ quality of life, and they also place a large economic burden on the healthcare industry and governments. The goal of this project is to model the disease trajectory, and identify risk […]

  • Swept Under the Rug

    In 2012, Donor’s Trust, a tax-exempt organization emphasizing a libertarian, free-market oriented political ideology, and the Planned Parenthood Action Fund each spent around three times as much as the largest Political Action Committee. These organizations (organized under section 501 of the US Code) spend significant sums to promote their political opinions (which run the gamut of the ideological […]

  • Collaborative Statistical Modeling

    The Nobel Prize in Physics 2013 was awarded jointly to François Englert and Peter W. Higgs “for the theoretical discovery of a mechanism that contributes to our understanding of the origin of mass of subatomic particles, and which recently was confirmed through the discovery of the predicted fundamental particle, by the ATLAS and CMS experiments at CERN’s […]

  • Testing Ignorability: Methods and Tools for Sensitivity Analysis and Causal Inference

    Because the ignorability assumption (“all confounders measured”) is hard to justify yet rarely challenged quantitatively, we are developing methods for sensitivity analyses that are both easy to use and easy to interpret. Our end goal is to change the way that research from observational studies is conducted, in that tests […]

  • VisTrails

    VisTrails is an open-source provenance management and scientific workflow system that was designed to support the scientific discovery process. VisTrails provides unique support for data analysis and visualization, a comprehensive provenance infrastructure, and a user-centered design. The system combines and substantially extends useful features of visualization and scientific workflow systems. Similar to visualization systems, VisTrails […]

  • ReproZip

    Reproducibility is a core component of the scientific process: it helps researchers around the world verify results and build on them, allowing science to move forward. In the natural sciences, long tradition requires that experiments be described in enough detail to be reproduced by researchers around the world. […]

  • Citygram-Sound

    The Citygram-Sound Project is a collaboration between NYU Steinhardt, NYU CUSP, and CalArts. The Citygram Project is a large-scale project that began in 2011. Citygram aims to deliver a real-time visualization/mapping system focusing on non-ocular energies through scale-accurate, non-intrusive, and data-driven interactive digital maps. The first iteration, Citygram One, focuses on exploring spatio-acoustic energies to reveal […]

  • EDM and LAK in Games and Simulations for Learning

    Applying educational data mining and learning analytics approaches to data collected from games and simulations for learning is an emerging practice. The CREATE Lab and the Games for Learning Institute have identified three types of data sets of interest to researchers: intensive lab data sets, extensive data sets, and intensive field data sets. Intensive […]

  • Social Media and Protest: Information and Influence in Turkey’s Gezi Park Protests

    Turkey’s Gezi Park was, and is, the site of a major protest movement with an enormous presence on social media. Social media sites such as Twitter and Facebook were used not only to explicitly coordinate protest movements and foster communication among those on the ground, but also to represent the movement worldwide and to nurture […]

  • The Persuasive Power of Data Visualization

    Data visualization has been used extensively to inform users. However, little research has been done to examine the effects of data visualization in influencing users or in making a message more persuasive. In this study, we present experimental research to fill this gap and present evidence-based analysis of persuasive visualization. We build on persuasion research […]

  • The CUSP Urban Observatory

    The Urban Observatory (UO) at NYU’s Center for Urban Science+Progress (CUSP) is a unique user facility for the large-scale observation and analysis of cities. Vantage points in NYC will support persistent and synoptic remote sensing in diverse modalities including visible, infrared, and hyperspectral imaging, as well as LIDAR and RADAR. The data acquired will be […]

  • Regularization of Neural Networks Using DropConnect

    We introduce DropConnect, a generalization of Dropout for regularizing large fully connected layers within neural networks. When training with Dropout, a randomly selected subset of activations is set to zero within each layer. DropConnect instead sets a randomly selected subset of weights within the network to zero. Each unit thus receives input from a random subset of units […]
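    The difference between the two masks shows up clearly in a single fully connected layer with a ReLU activation. This minimal sketch assumes pure-Python lists and an independent Bernoulli mask with drop probability p, and it follows the description above: Dropout masks the layer’s outputs, while DropConnect masks individual weights before the matrix-vector product. Biases, rescaling, and the paper’s inference-time procedure are omitted, and the function names are hypothetical.

```python
import random


def relu(v):
    # Elementwise rectified linear activation
    return [max(0.0, a) for a in v]


def matvec(W, x):
    # y[j] = sum_i W[j][i] * x[i]; W has one row per output unit
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def dropout_layer(x, W, p, rng):
    """Dropout: compute the layer, then zero each output activation with prob. p."""
    r = relu(matvec(W, x))
    return [a if rng.random() >= p else 0.0 for a in r]


def dropconnect_layer(x, W, p, rng):
    """DropConnect: zero each weight independently with prob. p, then compute the layer."""
    W_masked = [[w if rng.random() >= p else 0.0 for w in row] for row in W]
    return relu(matvec(W_masked, x))
```

    With `p = 0` both functions reduce to the unmasked layer. Under DropConnect, a separate mask entry is drawn for every weight, so each output unit sees its own random subset of the inputs rather than being switched off as a whole.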

  • Restoring An Image Taken Through a Window Covered with Dirt or Rain

    Photographs taken through a window are often compromised by dirt or rain present on the window surface. Common cases of this include pictures taken from inside a vehicle, or outdoor security cameras mounted inside a protective enclosure. At capture time, defocus can be used to remove the artifacts, but this relies on achieving a shallow […]

  • INFUSE: Interactive Feature Selection for Predictive Modeling of High Dimensional Data

    Predictive modeling techniques are increasingly being used by data scientists to understand the probability of predicted outcomes. However, for high-dimensional data, a critical step in predictive modeling is determining which features should be included in the models. Feature selection algorithms are often used to remove non-informative features from models. However, there are many different classes of feature selection algorithms. Deciding […]
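    Filter methods are one simple class of feature selection. They score each feature independently of any model, here by the absolute Pearson correlation between the feature and the target, and keep the top-ranked features. This is a hedged sketch of that single technique, not of INFUSE itself, and the function names are hypothetical.

```python
import math


def pearson(a, b):
    # Pearson correlation; returns 0.0 when either column is constant
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def rank_features(X, y):
    """Return feature indices ordered from most to least correlated with y.

    X: list of rows (samples); each column is one feature
    y: target values, one per row
    """
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in X]
        scores.append((abs(pearson(column, y)), j))
    return [j for _, j in sorted(scores, reverse=True)]
```

    A constant feature scores 0 and sinks to the bottom of the ranking, so keeping `rank_features(X, y)[:k]` drops it. Filter scores like this ignore interactions between features, which is one reason different classes of algorithms can disagree on which features to keep.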

  • Fast Holographic Characterization of Colloidal Particles

    Images of micrometer-scale colloidal spheres created with holographic video microscopy incorporate comprehensive information on each particle’s three-dimensional position, its size, and its complex-valued refractive index. NYU’s Holographic Characterization team uses state-of-the-art techniques in image analysis to extract all of this information for each particle in the accessible volume of a holographic microscope, for each snapshot […]

  • Medical Artificial Intelligence

    Sontag is working with Beth Israel Deaconess Medical Center to integrate all available demographic, laboratory, radiographic, and continuous hemodynamic data to more effectively monitor patients and alert clinicians to patients at risk of sepsis.

  • Genomics and the Rice Stress Response

    Among the projects of the Purugganan Laboratory at the Center for Genomics and Systems Biology is the Environmental Gene Regulatory Interaction Network (EGRIN), which aims to characterize gene regulatory networks involved in the rice stress response.

  • Computational Vision

    Simoncelli’s computational vision research aims to answer questions about how sensory systems work, and how we can use the principles of sensory systems to design better man-made systems for processing sensory signals.

  • Inference of Gene Regulatory Networks

    Bonneau and colleagues have developed the cMonkey and Inferelator algorithms, which integrate large, multi-level systems biology data sets to learn dynamic models of how cells make decisions. The Bonneau lab is developing these methods as part of larger efforts to understand the immune system and to develop more drought-tolerant rice.

  • Sampling Techniques for Rare Events

    Vanden-Eijnden and colleagues have developed theoretical tools and their numerical counterparts to identify the pathways of rare events in complex systems and estimate their rate of occurrence and associated free energy.

  • Cosmological Measurement and Exoplanet Discovery

    Hogg’s main research interests are in observational cosmology, especially approaches that use galaxies (including our own Milky Way) to infer the physical properties of the Universe. He is also interested in developing the engineering systems that make these projects possible.

  • Reproducibility in Science

    Freire is working with scientists from different domains to develop infrastructure that supports the process of sharing, testing, and re-using scientific experiments and results. She is also investigating new search and query mechanisms to help scientists explore repositories of reproducible experiments.

  • Improving Quantitative Research

    Hill co-directs the Center for the Promotion of Research Involving Innovative Statistical Methodology (PRIISM), which is dedicated to improving the caliber of research in quantitative social, educational, behavioral, allied health, and policy science.

  • Semi-Supervised Learning in Large Image Collections

    Fergus and co-authors show how to use recent results in machine learning to obtain highly efficient approximations for semi-supervised learning. Their algorithm can be used to extract and combine label information from a database of 80 million images gathered from the Internet.

  • Learning Feature Hierarchies for Object Recognition

    The goal of this series of projects is to produce category-level object recognition systems with state-of-the-art performance that can run in real time. The systems are hierarchical (multi-stage) and use “deep learning” methods (unsupervised and supervised) to train the features at all levels.

  • Biological Computing

    Examples of Shasha’s biological computing research include software for visualizing the intersections and unions of collections of multiple experiments, multiple genomes, or even multiple baseball players. He has also been involved in machine learning work to infer the function of genes, the redundancy of genes, and regulatory networks.