Harvard Data Science Initiative
Causal Seminar: Paul Rosenbaum, University of Pennsylvania
Being Realistic About Unmeasured Biases in Observational Studies
The talk is intended as an introduction to some recent technical work, but I don’t recommend reading the technical work prior to the talk; so, there is no pre-reading. Someone who is entirely new to the subject might optionally take a look at Chapter 9 (Sensitivity Analysis) and Chapter 10 (Design Sensitivity) of my book “Observation and Experiment: An Introduction to Causal Inference” (Harvard University Press, 2017); however, the talk will be a gentle introduction to new and somewhat more technical material. It looks like Harvard’s library provides online access to “Observation and Experiment”.
Observational studies of the effects caused by treatments are always subject to the concern that an ostensible treatment effect may reflect a bias in treatment assignment, rather than an effect actually caused by the treatment. The degree of legitimate concern is strongly affected by simple decisions that an investigator makes during the design and analysis of an observational study. Poor choices lead to heightened concern; that is, poor choices make a study sensitive to small unmeasured biases where better choices would correctly report insensitivity to larger biases. Indeed, perhaps surprisingly, unambiguous evidence of the presence of unmeasured bias may increase insensitivity to unmeasured bias. These issues are discussed with the aid of some theory and a simple example of an observational study.
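For orientation, the sensitivity model behind the talk’s vocabulary can be stated in one line (this is the standard formulation from Chapter 9 of “Observation and Experiment,” paraphrased here for readers rather than quoted from the talk). For two units i and j matched on observed covariates, an unmeasured bias of magnitude Γ ≥ 1 allows their odds of receiving treatment to differ by at most a factor of Γ:

    \[ \frac{1}{\Gamma} \;\le\; \frac{\pi_i\,(1-\pi_j)}{\pi_j\,(1-\pi_i)} \;\le\; \Gamma, \qquad \Gamma \ge 1, \]

where \pi_i is the probability that unit i is treated. Γ = 1 recovers a randomized experiment; a study is “insensitive to larger biases” when its qualitative conclusions survive larger values of Γ.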
Speaker:
Paul Rosenbaum, Robert G. Putzel Professor Emeritus of Statistics and Data Science, Wharton School of the University of Pennsylvania
199 views

Videos

Causal Seminar: Elizabeth Stuart, Johns Hopkins University
272 views · 14 days ago
Combining experimental and non-experimental data to examine treatment effect heterogeneity
Determining “what works for whom” is a key goal in prevention and treatment across a variety of areas, including mental health. Identifying effect moderators (factors that relate to the size of treatment effects) is crucial for delivery of treatment and prevention interventions, but doing so is incredibly d...
HDSI Industry Seminar: Foundation Medicine, Leah Comment
142 views · 14 days ago
Causal data science: personalizing treatment decisions in cancer
Speaker: Leah Comment, Director, Decision Sciences, Foundation Medicine
HDSI Industry Seminar: Ron Papka, Voya Investment Management
195 views · a month ago
Monetizing Data Science on Wall Street
In this seminar, Ron Papka will discuss:
- Use of Machine Learning, NLP and Analytics in the financial sector
- Skills needed for change management projects leveraging Data Science and AI innovations
- The business value of data and analytics on Wall Street
Speaker: Ron Papka, SVP, Head of Data Engineering and Governance, Voya Investment Management
Using Machine Learning to Unveil the Invisible Universe
478 views · a month ago
Workshop | Data Science in the Physical Sciences
Speaker: Carlos Arguelles Delgado, Assistant Professor of Physics, Harvard University; IAIFI | Using Machine Learning to Unveil the Invisible Universe
Machine Learning the Genealogy of the Milky Way
477 views · a month ago
Workshop | Data Science in the Physical Sciences
Speaker: Lina Necib, Assistant Professor of Theoretical Astrophysics, MIT
Workshop | Data Science in the Physical Sciences (Standard Format)
857 views · a month ago
Instructor: Matthew Schwartz, Professor of Physics, Department of Physics, Harvard University
Presenters:
Carlos Arguelles Delgado, Assistant Professor of Physics, Harvard University; IAIFI | Using Machine Learning to Unveil the Invisible Universe
Lina Necib, Assistant Professor of Theoretical Astrophysics, MIT | (Machine) Learning the Genealogy of the Milky Way
Tutorial | Bayesian causal inference: A critical review and tutorial (Standard Format)
10K views · a month ago
This tutorial aims to provide a survey of the Bayesian perspective of causal inference under the potential outcomes framework. We review the causal estimands, assignment mechanism, the general structure of Bayesian inference of causal effects, and sensitivity analysis. We highlight issues that are unique to Bayesian causal inference, including the role of the propensity score, the definition of...
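For readers new to the framework, the two central objects mentioned above can be written compactly in standard potential-outcomes notation (textbook notation shown for orientation, not material from the tutorial’s slides): the average treatment effect contrasts each unit’s two potential outcomes, and the assignment mechanism is commonly summarized by the propensity score,

    \[ \tau \;=\; \mathbb{E}\big[\,Y_i(1) - Y_i(0)\,\big], \qquad e(x) \;=\; \Pr\big(Z_i = 1 \mid X_i = x\big), \]

where Y_i(1) and Y_i(0) are unit i’s outcomes under treatment and control, and Z_i is the treatment indicator.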
Tutorial | LLMs in 5 Formulas (Standard Format)
13K views · 2 months ago
Slide deck: drive.google.com/file/d/1DGXbMU4cCK15nbLiI3zcuwmvClwzoEsY/view?usp=sharing
One year after the release of GPT-4, large language models (LLMs) remain the most exciting topic in AI. While much about their qualitative capabilities remains poorly understood, there are some areas where we can quantitatively measure, bound, and forecast their behavior. This tutorial will introduce the topic...
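The first of the five formulas, perplexity, has a compact standard definition worth keeping in mind while watching (shown here for orientation; the slide deck above gives the tutorial’s own notation). For a held-out sequence w_1, ..., w_N it is the exponentiated average negative log-likelihood,

    \[ \mathrm{PPL} \;=\; \exp\!\Big( -\tfrac{1}{N} \textstyle\sum_{t=1}^{N} \log p_\theta(w_t \mid w_{<t}) \Big), \]

so lower perplexity means the model assigns higher probability to text it has never seen.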
Workshop | AI: A Serious Look at Big Questions (Standard Format)
1.4K views · 2 months ago
This interdisciplinary workshop brings together leading figures in neuroscience, philosophy, and AI to address the most pressing questions on artificial intelligence. Entitled “A Serious Look at the Big Questions,” the event is structured as a series of discussions and panels that seeks to take seriously the “big questions” posed by AI, including the similarities between AI and human intelligenc...
Industry Seminar: Praveen Pankajakshan, Cropin
263 views · 2 months ago
Geospatial AI for Monitoring Food Security and Climate Resilient Agriculture
Amid rising food insecurity and climate challenges, there is a need for urgent, coordinated action. Recent reports from the United Nations have highlighted that in 2023, about 25 million people in the western Sub-Saharan regions were at high risk of food insecurity, indicating a sharp rise in h...
Workshop | AI: A Serious Look at Big Questions (360°)
715 views · 2 months ago
WATCH IN STANDARD FORMAT: czcams.com/video/AR67AvEkVGU/video.html
This interdisciplinary workshop brings together leading figures in neuroscience, philosophy, and AI to address the most pressing questions on artificial intelligence. Entitled “A Serious Look at the Big Questions,” the event is structured as a series of discussions and panels that seeks to take seriously the “big questions” posed...
Tutorial | LLMs in 5 Formulas (360°)
32K views · 2 months ago
WATCH IN STANDARD FORMAT: czcams.com/video/k9DnQPrfJQs/video.html
One year after the release of GPT-4, large language models (LLMs) remain the most exciting topic in AI. While much about their qualitative capabilities remains poorly understood, there are some areas where we can quantitatively measure, bound, and forecast their behavior. This tutorial will introduce the topic of LLMs through 5 ke...
Tutorial | Bayesian causal inference: A critical review and tutorial (360°)
874 views · 2 months ago
WATCH IN STANDARD FORMAT: czcams.com/video/7Cwl6DgL64o/video.html
This tutorial aims to provide a survey of the Bayesian perspective of causal inference under the potential outcomes framework. We review the causal estimands, assignment mechanism, the general structure of Bayesian inference of causal effects, and sensitivity analysis. We highlight issues that are unique to Bayesian causal infere...
Scenes from the 2024 HDSI Annual Conference
64K views · 2 months ago
Industry Seminar: Francesca Lazzeri, Microsoft
242 views · 4 months ago
Industry Seminar: Isabel Fulcher, Delfina
257 views · 5 months ago
HDSI Postdoc Experience: Minsuk Shin
72 views · 6 months ago
HDSI Causal Seminar: Alberto Abadie, MIT
142 views · 6 months ago
HDSI Causal Seminar: Postdoctoral Fellow Showcase
131 views · 6 months ago
HDSI Causal Seminar: Georgia Papadogeorgou, University of Florida
126 views · 6 months ago
HDSI Postdoc Experience: Max Kleiman-Weiner
70 views · 6 months ago
Industry Seminar: Slawek Kierner, Intuitive
116 views · 6 months ago
HDSI Postdoc Experience: Isabel Fulcher
155 views · 6 months ago
HDSI Annual Conference Recap
223 views · 6 months ago
HDSI Causal Seminar: Fan Li, Duke University
337 views · 7 months ago
Adopting Digital Trust: Culture, Talent, and Capabilities
242 views · a year ago
Data Ethics - Leaders and Frameworks
265 views · a year ago
Digital Trust as a Global Concept
570 views · a year ago
The Chief Trust Officer - What it means to be a technical leader
346 views · a year ago

Comments

  • @olorunfemijoshua9586

    How can I be part of this program?

  • @dskbiswas
    @dskbiswas · 7 days ago

    This video can be a terrific example of overfitting 😁😆

  • @dj...channel2549
    @dj...channel2549 · 14 days ago

    So amazing 😊

  • @shrirangmoghe3784
    @shrirangmoghe3784 · 22 days ago

    Please upload slides and perhaps sync them to the audio if you can. Can’t believe we are in the age of AI and humans are already losing it

    • @shrirangmoghe3784
      @shrirangmoghe3784 · 22 days ago

      What are we even looking at here? Are we at the edge of some black hole? Totally nauseating

  • @davidbadmus-yh9bo
    @davidbadmus-yh9bo · 27 days ago

    like this

  • @swenic
    @swenic · a month ago

    Not liking the current layout. It is not possible to make out much detail in the small image (you can see a large room with people in it, but nothing of value is discernible), and placing it side by side with the presented material makes that harder to see as well, resulting in a need for more screen real estate, or getting used to watching things at half the customary size. How about instructing presenters to leave a small box area in their presentations and pasting the room view into that?

  • @ericgibson2079
    @ericgibson2079 · a month ago

    Please support The End and Prevention of Homelessness USA 2030!

  • @JOHNSMITH-ve3rq
    @JOHNSMITH-ve3rq · a month ago

    If y’all could make sure you actually have clean audio in future that’d be great. The banging is super super distracting and brings down the quality of the whole thing.

  • @santaespinal1540
    @santaespinal1540 · a month ago

    🎯 Key Takeaways for quick navigation:
    00:12 🎓 Introduction of Sasha Rush by David Parks
    - Introduction of Sasha Rush, a professor at Cornell University and a contributor to Hugging Face, now part of Google, known for his work in AI and large language models.
    - Sasha's background in computer science, his contributions to AI research, and his commitment to open-source initiatives.
    04:28 🗣️ Overview of Large Language Models (LLMs)
    - Large language models (LLMs) are discussed in terms of their significance, complexity, and implications.
    - LLMs are described as extremely useful, expensive, important, and sometimes perceived as intimidating, but also capable of providing remarkable outputs with consistency and creativity.
    - The talk aims to provide insights into LLMs and address questions regarding their functioning, reasoning, and impact on various domains.
    07:38 📊 Dividing LLM Understanding into Five Formulas
    - Sasha introduces the concept of understanding LLMs through five formulas: perplexity, attention, GEMM, Chinchilla, and RASP.
    - Each formula represents a different aspect of LLMs, such as generation, memory, efficiency, scaling, and reasoning, providing a comprehensive view of their functionalities.
    - These formulas aim to simplify complex concepts and enable a deeper understanding of LLMs for both technical and non-technical audiences.
    11:33 🔍 Understanding Perplexity in Language Models
    - Perplexity in language models is explained, focusing on the probabilistic model of documents and the probability distribution of word sequences.
    - The Markov assumption and the representation of Theta as categorical distributions are discussed as historical aspects of language models.
    - Modern advancements in LLMs are highlighted, indicating departures from past assumptions and embracing more complex neural network architectures.
    - Language models' historical evolution is discussed, from early Markov models to modern neural network-based LLMs.
    - Changes in assumptions and approaches, such as fixed-length history and categorical representations, are contrasted with contemporary methods emphasizing neural network architectures.
    - The session concludes with reflections on the enduring relevance and evolution of language models in the field of natural language processing.
    21:04 🧮 Language Model Development Pre-2010
    - Explanation of early methods to develop language models.
    - Discussion of Shannon's paper from 1948 outlining language model development.
    - Overview of the challenges and limitations of early language models.
    22:32 📊 Quantifying Language Model Performance
    - Introduction to evaluating language model performance.
    - Explanation of the challenges in quantifying language model effectiveness due to the lack of a definitive correct answer.
    - Illustration of the evaluation process using an example sentence and the concept of word prediction difficulty.
    29:17 📏 Metric Relating to Compression and Shannon's Work
    - Introduction to a metric related to language compression inspired by Shannon's work.
    - Explanation of encoding words with binary strings based on their probabilities.
    - Discussion on how the metric, perplexity, measures the efficiency of language compression.
    38:21 📈 Evaluation and Practical Applications of Perplexity
    - Discussion on the practical evaluation of perplexity using a test set.
    - Exploration of the relationship between perplexity and model quality.
    - Overview of historical perplexity scores achieved by various language models and their implications.
    44:09 📊 Correlation between Perplexity and Task Accuracy
    - Perplexity correlates almost perfectly with translation accuracy.
    - General language perplexity correlates extremely well with task accuracy on different tasks.
    45:59 📈 Impact of Perplexity Reduction on Model Performance
    - Lower perplexity corresponds to better model performance.
    - Graphs illustrate the relationship between perplexity reduction and resource investment.
    53:17 🧠 Introduction to Memory and Attention Models
    - Explains the concept of memory and attention in neural network models.
    - Introduces the limitations of Markov models and the need for attention mechanisms.
    01:05:03 🧠 Understanding Query, Key, and Value Operations in LLMs
    - Query, key, and value operations in LLMs enable prediction of the next word based on contextual information.
    - The process involves querying a lookup table using the query, retrieving the key, and using it to predict the next word.
    01:07:03 🔄 Transitioning from Argmax to Softmax
    - Argmax selection poses challenges in neural networks due to its discontinuous nature.
    - Softmax provides a smooth alternative for selecting the best choice, enabling meaningful training.
    01:15:08 📊 Evolution of Attention Mechanisms in Language Modeling
    - The attention mechanism, popularized by the "Attention is All You Need" paper, revolutionized language modeling.
    - Transformers utilize attention mechanisms to process input sequences efficiently and capture long-term dependencies.
    01:18:05 🖥️ Efficient Computation with Attention Mechanisms
    - Attention mechanisms efficiently compute long-term context by utilizing matrix operations.
    - Matrix multiplication and softmax operations allow for effective computation of attention scores.
    01:27:10 🖥️ Transition from CPUs to GPUs for efficient computation
    - The shift from CPUs to GPUs revolutionized the efficiency of running applications like language models.
    - GPUs allowed for faster computation of complex operations like softmax, significantly altering the research landscape.
    01:29:19 🧠 Understanding GPU architecture and parallel computing
    - GPUs operate as parallel computers with multiple threads running simultaneously.
    - Threads within blocks share block memory, enabling fast data exchange and computation.
    01:34:24 🧮 Efficient matrix multiplication on GPUs
    - Matrix multiplication on GPUs involves loading data into block memory, performing computations within blocks, and minimizing reads from global memory.
    - Leveraging shared memory and parallel processing allows for efficient computation of matrix multiplication.
    01:47:04 💡 Maximizing GPU performance for ML applications
    - ML applications benefit from efficient GPU performance, measured in operations per second.
    - Optimizing data formats, such as using smaller floating-point values, enhances GPU efficiency.
    01:53:20 🚀 GPU Optimization: Why Speed Matters
    - GPU programming optimization is crucial for efficient computation.
    - Efficient GPU programming enables faster computation, which is essential for large-scale models like LLMs.
    01:53:49 🔍 Scaling in LLMs: Model Size vs. Training Data
    - The performance of LLMs depends on the interplay between model size and training data.
    - Increasing model size and training data improves the model's ability to generalize and understand complex patterns.
    01:54:42 📊 Compute Optimization Formula: The Chinchilla Formula
    - The Chinchilla formula extrapolates the expected perplexity of LLMs based on model size and training data.
    - It suggests a proportional relationship between model size and training data for optimal performance.
    02:11:35 📊 Cost Comparison and Scaling Laws
    - Understanding the costs associated with large language models (LLMs).
    - Comparing and incorporating costs related to data acquisition and model scaling.
    02:13:09 💰 Cost Considerations in Model Development
    - Addressing cost considerations in LLM development, especially focusing on compute and data.
    - Exploring scenarios where reducing specific costs benefits certain stakeholders.
    02:14:46 🔄 Data Reusability and Synthesis
    - Discussing the reusability of training data and its diminishing returns.
    - Exploring the potential of synthetic text generation and its current applications.
    02:17:29 📰 Importance of Training Data Quality
    - Highlighting the significance of high-quality training data, such as from sources like the New York Times.
    - Discussing challenges in quantifying and assessing the quality of training data.
    02:18:19 🛑 Addressing Token Exhaustion Concerns
    - Discussing concerns related to token exhaustion and its potential impact on model training.
    - Exploring strategies to address token scarcity, including alternative data sources.
    02:32:52 🧠 Exploring Formal Logic in Language Models
    - Introducing a formal logic approach to understanding language model behavior.
    - Discussing the use of logical operations to manipulate and analyze model behavior.
    02:34:57 🔄 RASP Language and Multi-Layered Models
    - Explaining the RASP language and its deterministic, logical approach to modeling.
    - Highlighting the benefits of using multiple layers in language model architectures.
    02:36:30 🔍 Potential and Limitations of Formal Logic in LLMs
    - Discussing the implications of formal logic approaches for understanding model capabilities.
    - Exploring the feasibility of implementing complex tasks using logical formulations in language models.
    Made with HARPA AI
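
    The query/key/value and softmax takeaways above (01:05:03-01:18:05) compress into a single formula, Attention(Q, K, V) = softmax(QKᵀ/√d)V. The following NumPy sketch is included for orientation only; it is not code from the tutorial, and the toy shapes are arbitrary.

        import numpy as np

        def softmax(x, axis=-1):
            # Subtract the row max before exponentiating, for numerical stability.
            x = x - x.max(axis=axis, keepdims=True)
            e = np.exp(x)
            return e / e.sum(axis=axis, keepdims=True)

        def attention(Q, K, V):
            # Q: (n, d) queries; K: (n, d) keys; V: (n, d_v) values.
            # Each query is scored against every key; softmax turns the scores
            # into weights (the smooth stand-in for argmax described above),
            # and each output row is a weighted average of the value rows.
            scores = Q @ K.T / np.sqrt(K.shape[-1])
            return softmax(scores, axis=-1) @ V

        rng = np.random.default_rng(0)
        Q, K, V = rng.normal(size=(3, 4, 8))  # toy data: 4 tokens, dimension 8
        out = attention(Q, K, V)              # shape (4, 8): one context vector per token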

  • @noninvasive_rectal_probe8990

    Lmao this talk is trash

    • @420_gunna
      @420_gunna · a month ago

      What do you think is bad about it? Haven't listened yet, but Sasha has always put out great content in the past.

  • @imaspacecreature
    @imaspacecreature · a month ago

    Wanted to hear more!

  • @ShivaprakashYaragal
    @ShivaprakashYaragal · a month ago

    This is awesome. I love these tools and the taxi data

  • @user-wr4yl7tx3w
    @user-wr4yl7tx3w · a month ago

    If Harvard wasn’t so woke. Only woke DEI types look down upon Extension students.

  • @AlgoNudger
    @AlgoNudger · a month ago

    Thanks.

  • @r2internet
    @r2internet · a month ago

    Thanks for the informative talk. Can you please share the slides?

  • @mikaackermann4072
    @mikaackermann4072 · 2 months ago

    Why 360°? How about a normal video?

  • @kalmyk
    @kalmyk · 2 months ago

    you can just imagine formulas

    • @noomade
      @noomade · a month ago

      what do you mean?

  • @jaredtweed7826
    @jaredtweed7826 · 2 months ago

    Can you upload this with the slides not in VR?

  • @RocketmanUT
    @RocketmanUT · 2 months ago

    Going to need to reupload this, the slides are distorted.

  • @pankajsinghrawat1056
    @pankajsinghrawat1056 · 2 months ago

    make normal videos please

  • @travelcatchannel8657
    @travelcatchannel8657 · 2 months ago

    Thanks very much for this presentation. It helps a lot. Could you kindly tell me which tool you used for the demo?

  • @user-wr4yl7tx3w
    @user-wr4yl7tx3w · 3 months ago

    why does every harvard presentation have such a long preamble. just skip the first 10 minutes if you don't want to sit through it.

  • @KyleXLS
    @KyleXLS · 3 months ago

    I'd love to have access to the data to play around with in ArcGIS Pro.

  • @TheNighter
    @TheNighter · 4 months ago

    Harvard should not exist.

  • @mgophern
    @mgophern · 6 months ago

    could you share slides?

  • @toddbrous_untwist
    @toddbrous_untwist · a year ago

    This was such an awesome video! Thank you HDSI for posting this.

  • @optimism90
    @optimism90 · a year ago

    Thanks for sharing the tutorial video. Could you share the R code and slides if possible, thanks!

  • @chinwevivianaliyu
    @chinwevivianaliyu · a year ago

    Would it be possible to have a recorded version for those who registered?

  • @muhammadsyukri746
    @muhammadsyukri746 · a year ago

    Thanks so much for this video. It helps me a lot.

  • @haow85
    @haow85 · 2 years ago

    Are there any open datasets for fair AI?

  • @somewheresomeone3959
    @somewheresomeone3959 · 2 years ago

    Great work and thanks for sharing! Is it just me, or are the voice and the pictures slightly out of sync?

  • @ThinkQbD
    @ThinkQbD · 2 years ago

    Thank you. Great panel discussion!