hu-po
  • 335
  • 562 928
gemini 1m tokens
Views: 963

Video

Beyond Surface Statistics: LDM deep dive
2.1K views, 11 months ago
Speech2Speech AI Conversation App
2.3K views, 1 year ago
What is RLHF?
4.8K views, 1 year ago
Robots using LLMs
3.4K views, 1 year ago
Visual ChatGPT
1.1K views, 1 year ago
MathPrompter
777 views, 1 year ago
Vid2Avatar
973 views, 1 year ago
StyleGAN-T
1.6K views, 1 year ago
Speech to Calendar with OpenAI's Whisper
454 views, 1 year ago
Inside LLMs
1.4K views, 1 year ago
Robotic Microbe Farms
292 views, 1 year ago
Polars vs Pandas
1.9K views, 1 year ago
AI Platforms and Markets
518 views, 1 year ago

Comments

  • @MilesBellas (2 hours ago)

    Suggested paper: Vision GNN: An Image is Worth Graph of Nodes. Kai Han, Yunhe Wang, Jianyuan Guo, Yehui Tang, Enhua Wu.

  • @NLPprompter (7 hours ago)

    Oh mama, time to learn more theory. Thanks hu-po, learning isn't so lonely anymore.

  • @mwd6478 (3 days ago)

    Mistral Large 2 is apparently good as well, and their MoE stuff can be cheaper for inference or have lower latency.

  • @mwd6478 (3 days ago)

    Tools take up input tokens in the context window, right?

    • @NLPprompter (8 hours ago)

      As far as I've tried with Gemini, yes: the Gemini playground shows token count usage. Maybe it's the same with Llama.
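
A rough way to see the effect discussed in this thread: tool definitions are normally serialized into the prompt, so they count toward input tokens. The sketch below is only an estimate under stated assumptions. The 4-characters-per-token ratio is a crude heuristic, not a real tokenizer, and `weather_tool` is a hypothetical tool definition invented for illustration. Actual counts depend on the model's tokenizer and on how the provider formats tool definitions.

```python
import json

def rough_token_estimate(text: str, chars_per_token: float = 4.0) -> int:
    """Crude estimate: assumes ~4 characters per token (heuristic, not a tokenizer)."""
    if not text:
        return 0
    return max(1, round(len(text) / chars_per_token))

# Hypothetical tool definition, written in a JSON-schema-like style for illustration.
weather_tool = {
    "name": "get_weather",
    "description": "Return the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string", "description": "City name"}},
        "required": ["city"],
    },
}

def prompt_cost(user_message: str, tools: list) -> dict:
    """Estimate input tokens for a message plus its serialized tool definitions."""
    tool_text = "".join(json.dumps(tool) for tool in tools)
    message_tokens = rough_token_estimate(user_message)
    tool_tokens = rough_token_estimate(tool_text)
    return {
        "message_tokens": message_tokens,
        "tool_tokens": tool_tokens,
        "total": message_tokens + tool_tokens,
    }

without_tools = prompt_cost("What's the weather in Prague?", [])
with_tools = prompt_cost("What's the weather in Prague?", [weather_tool])
print(without_tools)
print(with_tools)
```

For exact numbers, use the token counter that the model provider exposes rather than this heuristic.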

  • @localminimum (4 days ago)

    All in on the llama.

  • @wolpumba4099 (6 days ago)

    *Summary*

    This is a summary of hu-po's YouTube live stream in which he dissects the Llama 3.1 paper.

    *Abstract (0:11:39)*
    - Meta releases the Llama 3 models, a herd of models with strong performance in multiple areas including multilinguality, coding, reasoning, and tool usage.
    - Multimodality (image/video/speech) is missing, despite Meta's experiments on it.
    - The largest model has 405 billion parameters, more than most people can run locally.
    - Trained on over 15 trillion tokens with 3.8 x 10^25 FLOPs (10x more data and 50x more training compute compared to Llama 2).
    - The open-source nature signifies a shift in the AI market, potentially challenging the dominance of closed-source models.

    *Key takeaways:*
    - Emphasis is on data quality and quantity, using meticulous data curation pipelines, deduplication techniques, and model-based filtering.
    - The architecture remains mostly unchanged from Llama 2, sticking with a standard dense transformer model.
    - Post-training uses established techniques (SFT, DPO), with significant emphasis on synthetic data generation and curriculum learning.
    - Meta's research supercluster plays a crucial role in scaling model training, highlighting emerging challenges like power consumption management and system robustness.

    *Training Process Breakdown:*
    - *Pre-Training Data (0:28:01):* Sources are kept vague for regulatory reasons but involve diverse, multilingual data up to the end of 2023. Data cleaning focuses heavily on removing duplicates, adult content, outlier tokens, and markdown formatting (found detrimental). The data mix is crucial and is iteratively optimized via scaling law experiments.
    - *Model Architecture (0:38:16):* Similar to Llama 2, primarily a standard dense transformer model (Attention is All You Need).
    - *Scaling Laws (0:42:25):* Not actual laws, more like hypotheses about model performance relative to data size and training compute. They can be unreliable, highlighting the need for continued research into understanding and manipulating these "laws."
    - *Research Supercluster (0:49:39):* Meta's massive computational cluster consists of 16,000 Nvidia H100 GPUs interconnected in a sophisticated multi-layered system. Key takeaway: maintaining such complex systems demands a specialized data center management engineer role, indicating a shift in the AI job landscape.
    - *Parallelism (1:00:56):* Four types: tensor, pipeline, context, and data parallelism. All are used simultaneously, effectively treating the datacenter as a multi-dimensional computational mesh. Challenges arise from GPU utilization efficiency and system failures impacting performance.
    - *Post-Training Recipe (1:17:22):* Pre-training consists of initial pre-training, long-context pre-training, and annealing. Post-training relies heavily on reward models, DPO, and synthetic data. The reward model is trained on preference data collected from human annotators, who also provide edited responses.

    *Specific Findings and Techniques:*
    - *Coding (1:27:56):* A code expert model is created by additional training on a trillion-token code dataset. It leverages self-improvement via synthetic data generated by earlier models, using execution feedback for self-correction. Long-context code reasoning is facilitated via codebase masking and synthetic data generation.
    - *Multilinguality (1:34:44):* A multilingual expert is trained by branching from pre-training on a data mix dominated by multilingual tokens (90%). Language coverage: Spanish, Hindi, French, Arabic, German, Russian, Bengali, Japanese, Portuguese, and Thai (notably lacking Chinese).
    - *Math and Chain-of-Thought (1:36:58):* Prioritizes synthetically generated chain-of-thought reasoning data due to difficulties in acquiring sufficient human-annotated data. Uses Monte Carlo Tree Search (MCTS) for validating reasoning chains and selecting optimal reasoning paths.
    - *Tool Usage (1:37:32):* Uses tools like Brave search, a Python interpreter, and the Wolfram Alpha API to offload tasks and alleviate hallucination problems. The implementation relies on Python object conversion and JSON formats for interfacing with these tools.

    *Conclusions:*
    - Llama 3 marks a pivotal shift towards open, responsible AI development, highlighting the importance of data quality and scaling, and pushing open-source models to a competitive level.
    - Human evaluation (1:56:51) reveals Llama 3 matches GPT-4's performance across most capabilities, exceeding it in multi-turn reasoning and coding but lagging behind in multilingual tasks.
    - Emphasis on security, safety, and robustness underscores the complexity of responsible AI development at this scale.

    *Further discussion points:*
    - S-curve saturation for language models suggests the need to explore alternative model architectures and incorporate other modalities to push performance further.
    - Growing importance of energy management and its potential impact on future data center and AI system development.
    - Future directions: multimodality, specialized engineering roles in data center training, the use of tools, and ethical considerations in synthetic data generation.

    *Glossary*

    *Training & Optimization:*
    - *SFT (Supervised Fine-Tuning)* (1:17:22): A technique where a pre-trained language model is further trained on a labeled dataset tailored to a desired task, like question answering or summarization. This helps adapt the model to perform better in that specific domain.
    - *DPO (Direct Preference Optimization)* (1:19:15): A relatively new method for aligning LLMs with human preferences. It optimizes the model directly on pairs of preferred and rejected outputs, without training a separate reward model or running a reinforcement learning loop.
    - *PPO (Proximal Policy Optimization)* (1:20:07): An RL algorithm often used for fine-tuning language models, but found to be less efficient than DPO in Llama 3's case.
    - *RLHF (Reinforcement Learning from Human Feedback)* (1:19:21): An umbrella term for using human preference data, with algorithms such as PPO or related preference-optimization methods such as DPO, to improve LLM alignment and behavior.
    - *Annealing* (1:17:27): A training schedule where certain aspects of the training process (data mix, learning rate, etc.) are gradually adjusted over time to improve model convergence and performance.
    - *Scaling Laws* (1:17:22): Hypotheses (not actual laws) describing how LLM performance scales with increased data size, training compute, and model size.
    - *Curriculum Learning* (1:17:22): A training approach where the difficulty and complexity of the training data are progressively increased as the model learns, allowing for better learning of more complex tasks.
    - *Mixed Precision Training* (1:07:29): A training technique using different data types (like FP16 and FP32) for different parts of the model, potentially improving efficiency without compromising accuracy.
    - *Gradient Accumulation* (1:07:29): A technique used to increase the effective batch size during training by accumulating gradients across multiple smaller mini-batches before updating model weights.

    *Benchmarks and Metrics:*
    - *ARC Challenge* (1:43:38): A benchmark of grade-school-level science questions (the AI2 Reasoning Challenge) that tests a model's reasoning abilities.
    - *HumanEval* (1:44:23): A benchmark measuring code generation performance.
    - *Arena Score* (1:57:25): A metric reflecting human evaluations comparing different LLM outputs on various tasks.

    I used Gemini 1.5 Pro to summarize the YouTube transcript.
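
The Gradient Accumulation entry in the glossary above can be illustrated with a small numerical check. This is a minimal sketch, assuming a toy linear-regression loss in NumPy rather than anything from the Llama 3 training stack. The function names are made up for illustration. It shows that summing micro-batch gradients, each weighted by its share of the full batch, reproduces the full-batch gradient.

```python
import numpy as np

def mse_grad(w, X, y):
    """Gradient of the loss 0.5 * mean((X @ w - y) ** 2) with respect to w."""
    residual = X @ w - y
    return X.T @ residual / len(y)

def accumulated_grad(w, X, y, micro_batch_size):
    """Accumulate micro-batch gradients, weighting each by its fraction of the batch."""
    total = np.zeros_like(w)
    n = len(y)
    for start in range(0, n, micro_batch_size):
        xb = X[start:start + micro_batch_size]
        yb = y[start:start + micro_batch_size]
        # Weighting by len(yb) / n keeps the result exact even for a ragged last batch.
        total += mse_grad(w, xb, yb) * (len(yb) / n)
    return total

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))
y = rng.normal(size=32)
w = rng.normal(size=4)

full = mse_grad(w, X, y)
accum = accumulated_grad(w, X, y, micro_batch_size=8)
print(np.allclose(full, accum))
```

In a real training loop the same idea lets a memory-limited device process several small batches before a single optimizer step, at the cost of more forward and backward passes per update.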

  • @TheYvian (6 days ago)

    Great to hear your thoughts and takes on the paper and thank you for sharing it with all of us

  • @tljstewart (6 days ago)

    Runs on 2 MacBooks

  • @Tomcat342 (7 days ago)

    No, I missed it again.

  • @wolpumba4099 (7 days ago)

    This is a cool demonstration of how to make a short video with music and voices using various AI websites.

  • @c016smith52 (8 days ago)

    Is this something I could run on my Mac M2? All the COLMAP and SLAM stuff seems to require nvidia chips which I don't have, but really want to use Dust3r or Mast3r to take a monocular video into a pointcloud with camera-pose information, like you created here. Rad video!

  • @hackie321 (8 days ago)

    YARN lecun 😄

  • @stridedeck (9 days ago)

    Neural networks, compression and AI segmentation: I understand it this way. All sensory signals coming into the brain are in the form of a "wave", which is energy (frequency and amplitude). The brain automatically absorbs the energy through the neurons spiking, creating a pattern which roughly equals the incoming energy. We have lots of neural patterns, and we connect those patterns into an organized understanding because, in real time, we are still receiving the sensory signals and conforming our neural pattern representation to what we are receiving in real time from the outside world! Basically, it is taking one large wave and breaking it down into bits.

  • @朱荣坤 (10 days ago)

    It seems this was also your first time reading this article, so some of your points are inaccurate. That caused me a lot of trouble when I tried to reproduce the article. But all in all, we do need these kinds of videos. Maybe prepare better next time.

  • @c016smith52 (11 days ago)

    Is anyone able to do computer vision or ML tasks like this on a Mac? I've got an M2 (no Nvidia card) and would love to be doing some awesome 3d reconstruction and pointcloud work, but all demos seem to require Nvidia hardware and CUDA...?

  • @momo-ln3rw (15 days ago)

    great explanation and so cute cat!! keep up the great work!

  • @alexandernanda2261 (15 days ago)

    randNLA could clutch the random orthogonal matrix

  • @user-mx5fr7pl8k (16 days ago)

    bro i love u but the audio is so bad

  • @Amirmohamadshafiee (16 days ago)

    ❤❤ very good

  • @sathishkumar-ch4sx (16 days ago)

    This type of stream really helps. I like what you said: it's about generalization, focusing on a broader and more diverse set of papers. I look forward to watching your upcoming videos.

  • @NLPprompter (19 days ago)

    Aw crap, I didn't understand anything... I must relearn whatever he's talking about... this is like a week's setback for me 😭

  • @wolpumba4099 (20 days ago)

    *Summary*

    *Synthetic Data Generation*
    - *(06:18) Persona-Driven Synthesis:*
      - Tencent AI Lab research introduces "Persona Hub," a collection of 1 billion personas curated from web data.
      - Use personas to augment data, generating variations of questions/problems from diverse perspectives.
      - Potential for significant advancement in synthetic data creation and application.
    - *(17:25) Iterative Refinement Techniques:*
      - Meta's paper discusses "distilling system 2 into system 1," leveraging Chain of Thought prompting for improved outputs.
      - Generated outputs are used to retrain the model, leading to iterative refinement and potentially a higher quality dataset.
    - *(24:56) Unsupervised Curation:*
      - Microsoft Research explores "generative teaching," where a powerful model guides the training of another model using raw data.
      - Focus on generating both prompts and responses using minimal supervision.
    - *(38:29) Video-Based Synthetic Data:*
      - Google Research presents "Video-STAR," using self-training to enable video instruction tuning with minimal supervision.
      - Employs a cycle of answer generation, label verification, and instruction tuning for improved video understanding.

    *Flash Attention 3 and Efficient Inference*
    - *(51:17) Hardware-Aware Algorithm Design:*
      - Flash Attention 3 highlights the importance of optimizing algorithms for specific hardware.
      - Explores techniques like low-precision data types (FP8) and asynchronous execution for significant speedups on GPUs like the H100.
    - *(59:26) Limitations of Theoretical Complexity:*
      - Emphasizes that Big O notation doesn't always reflect real-world performance on GPUs.
      - Factors like memory access patterns, data types, and hardware specialization can significantly impact efficiency.
    - *(1:06:43) Optimizing for Specific Operations:*
      - Discusses papers focusing on optimizing key operations like softmax, which are bottlenecks for attention mechanisms.
      - Techniques like quantization and kernel fusion are used to improve throughput on these critical operations.
    - *(1:19:45) LoRA Adapters for Efficient Serving:*
      - Examines a paper on compressing and serving thousands of LoRA adapters with minimal overhead.
      - LoRA (Low-Rank Adaptation) allows for efficient customization of large language models, enabling the serving of diverse responses.

    *Other Interesting Papers and Trends*
    - *(1:25:52) µ-BENCH:* A new benchmark for evaluating vision-language models on microscopy data.
    - *(1:32:45) PaliGemma:* Google's release of a small (3B parameter) vision-language model.
    - *(1:38:20) Live Portrait:* A popular paper demonstrating efficient and high-quality portrait animation using implicit key points and warping.
    - *(1:42:24) Codebook Matching for Character Controllers:* Research from Reality Labs using discrete codebooks for improved control and animation of virtual characters.

    It's difficult to definitively rank papers by "importance" or "impact" without the benefit of hindsight and seeing how the field evolves. However, based on the discussion in the video and general trends in AI research, here's a possible ordering from *most impactful* to *less impactful*:

    *High Impact Potential*
    1. *(06:18) Persona-Driven Synthesis (Tencent AI Lab):* This work has the potential to significantly advance synthetic data generation. The concept of using personas for diverse data augmentation could be broadly applicable and lead to more robust and versatile AI models.
    2. *(51:17) Flash Attention 3:* While not a single paper, the advancements in Flash Attention and the focus on hardware-aware algorithm design represent a crucial shift in AI research. This trend will likely continue to be very impactful as hardware and algorithms co-evolve.
    3. *(38:29) Video-STAR (Google Research):* Self-training techniques for video instruction tuning are likely to be highly impactful as video data becomes increasingly important. This research could lead to advancements in video understanding, action recognition, and more.

    *Moderate Impact Potential*
    4. *(17:25) Distilling System 2 into System 1 (Meta):* This iterative refinement technique for synthetic data generation is promising. However, its long-term impact will depend on factors like model collapse and whether it scales to diverse data types.
    5. *(24:56) AgentInstruct (Microsoft Research):* Using powerful models to teach new skills to other models through "generative teaching" is an interesting approach to unsupervised curation. It remains to be seen how broadly applicable this technique will be.
    6. *(1:25:52) µ-BENCH:* Benchmarks drive progress in AI, so a new benchmark specifically for microscopy understanding could accelerate research in this domain.

    *Potentially Niche Impact*
    7. *(1:42:24) Codebook Matching for Character Controllers (Reality Labs):* This research appears to have a more specific focus on character animation in virtual reality. While promising for this domain, its broader impact might be limited.
    8. *(1:19:45) Compress then Serve (LoRA Adapters):* Efficient serving of large language models is important, and this research offers a potential solution. Its impact might depend on how widely LoRA adapters are adopted.
    9. *(1:38:20) Live Portrait:* This paper showcases an impressive demo of portrait animation. However, its technical contribution and long-term impact on the field are less clear.
    10. *(1:32:45) PaliGemma (Google):* The release of a small vision-language model is interesting, but its impact might be limited compared to larger, more capable models.

    I used Gemini 1.5 Pro to summarize the transcript.
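
The summary above mentions LoRA (Low-Rank Adaptation) as a cheap way to customize a large model. A minimal NumPy sketch of the core idea follows, assuming the standard formulation in which a frozen weight matrix `W` is augmented by a low-rank product `B @ A` scaled by `alpha / r`. The shapes, names, and values here are illustrative assumptions, not taken from the paper discussed in the stream.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha):
    """Apply a frozen linear layer plus a scaled low-rank update.

    Shapes: x (batch, d_in), W (d_out, d_in), A (r, d_in), B (d_out, r).
    """
    r = A.shape[0]
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

d_in, d_out, r, alpha = 64, 48, 4, 8.0
rng = np.random.default_rng(0)

W = rng.normal(size=(d_out, d_in))       # frozen base weights
A = rng.normal(size=(r, d_in)) * 0.01    # trainable, small random init
B = np.zeros((d_out, r))                 # trainable, zero init: adapter starts as a no-op
x = rng.normal(size=(2, d_in))

base_out = x @ W.T
adapter_out = lora_forward(x, W, A, B, alpha)
print(np.allclose(base_out, adapter_out))  # adapter has no effect while B is zero

# After training, B is non-zero; the update can be merged into W for serving.
B_trained = rng.normal(size=(d_out, r))
W_merged = W + (alpha / r) * B_trained @ A
merged_out = x @ W_merged.T

full_params = d_out * d_in
lora_params = r * (d_in + d_out)
print(full_params, lora_params)
```

Because each adapter stores only `r * (d_in + d_out)` numbers per layer instead of `d_out * d_in`, many adapters can share one copy of the base weights, which is what makes serving large numbers of them plausible.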

  • @andydataguy (20 days ago)

    Ah looks like i just missed you! Glad to see you streaming 🙌🏾

  • @user-rn3zr2lv3s (22 days ago)

    My guy is John wick

  • @dadsonworldwide3238 (23 days ago)

    Slightest variations of legit things to taxonomy is a byproduct of evolutionary perception management mind dope. Define cities as walls, ethnicity; it's the big problem in great debate not honestly taught to ourselves. If it's not a means of production of that, we're building beast of burden robot slave and horsepower utility of serfdom, or it has no value at all to us. Knowing what we did to bring about mechanics, what is statistical and what isn't, how we came to know what we know, it's not on its best day close to our worst human bots. Some humans are this way for sure. Reminds me of dualistic mind and primordial self triangulation of thermodynamical systems. We will plagiarize red, correlate with low-entropy apple, then textualism methodology with each other; then if a bad magician tries to claim purple high-entropy apples are really red, we know better. 💜 Analytical intellect is just a cog in the wheel bot. The primordial self would be more in tune with spectrum frequency, the awareness of self. Animals can be domesticated only by the analytical capacity and recognition that it benefits from it.

  • @context_eidolon_music

    Levin's ideas are so wild, I always think he's making shit up, but actually I'm just dumb.

  • @marcoaerlic2576 (25 days ago)

    Thanks for the video. Good on you for working through this problem.

  • @zzzzzzz8473 (25 days ago)

    3:22 = START

  • @marcoaerlic2576 (25 days ago)

    Thanks for the video.

  • @SteveZeyuZhang (25 days ago)

    🔥🔥 We are delighted to announce that our paper, Motion Mamba, has been accepted to ECCV2024 ! Motion Mamba 🐍 presents the pioneering work that customizes SSMs for motion generation. This innovative method achieves an exceptional balance between high quality in long-sequence motion generation and outstanding generation efficiency. It delivers up to a 50% improvement 💪 in FID and a 4x speedup 🚀 in text-to-motion generation. 🎉🎉 Congratulations and thanks to all co-authors for their effort and support! See you in Milan! Thanks hupo for highlighting our paper! Motion Mamba: Efficient and Long Sequence Motion Generation Zeyu Zhang*, Akide Liu*, Ian Reid, Richard Hartley, Bohan Zhuang, Hao Tang

  • @xx1slimeball (25 days ago)

    Another one #DjKhaled

  • @srimallya (25 days ago)

    Title: Bipolar Symmetry Field Theory: A Unified Perspective on the Nature of Reality

    Abstract: This paper presents a novel framework for understanding the nature of the universe and the role of life within it, which we term "Bipolar Symmetry Field Theory" (BSFT). BSFT proposes that the universe exhibits different symmetry properties at different scales, creating a gradient between spatial asymmetry/temporal symmetry at the quantum level and spatial symmetry/temporal asymmetry at the large scale. This theory suggests that life is not separate from the universe but rather an expression of its fundamental nature at a particular scale. By dissolving the duality between the observer and the observed, BSFT offers a unified perspective on reality, highlighting the deep interconnectedness of all phenomena.

    Introduction: The quest to understand the nature of the universe and the role of life within it has been a central theme in scientific and philosophical inquiry throughout human history. Despite significant advances in our understanding of the physical world, fundamental questions about the origin, evolution, and ultimate fate of the universe remain unresolved. In this paper, we propose a new theoretical framework, Bipolar Symmetry Field Theory (BSFT), which offers a fresh perspective on these age-old questions and suggests a path towards a more unified understanding of reality.

    The Foundations of BSFT: BSFT is built upon three key insights:
    1. The universe exhibits different symmetry properties at different scales.
    2. The gradient between these symmetry properties gives rise to the apparent duality between life and the universe.
    3. By recognizing the scale-dependent nature of the universe's properties, we can dissolve this duality and arrive at a unified understanding of reality.

    At the quantum scale, the universe is characterized by spatial asymmetry and temporal symmetry. Quantum phenomena are inherently random and unpredictable, and the flow of time is not clearly defined. As we move towards larger scales, the universe begins to exhibit spatial symmetry and temporal asymmetry, giving rise to the classical properties and the arrow of time that we observe in the macroscopic world.

    Life as an Expression of the Universe: BSFT proposes that life is not separate from the universe but rather a particular manifestation of its inherent potential for complexity and self-organization. The properties of life, such as spatial asymmetry and temporal symmetry, are an expression of the universe's fundamental nature at a particular scale. This perspective suggests that the apparent separation between living systems and the universe may be a product of our limited perspective and the scale at which we observe reality. At a deeper level, life is inseparable from the cosmos, and its emergence and evolution are an integral part of the universe's unfolding.

    Implications and Future Directions: BSFT has profound implications for our understanding of the nature of reality and the relationship between life and the universe. It challenges us to re-evaluate our assumptions about the nature of causality, agency, and the boundaries between living and non-living systems. Exploring this idea further may require a new synthesis of scientific and philosophical knowledge, drawing on insights from quantum physics, cosmology, biology, and complex systems theory. Future research within the BSFT framework could focus on developing mathematical models that capture the scale-dependent symmetry properties of the universe, exploring the emergence of classical properties from quantum phenomena, and investigating the role of life in the evolution of the cosmos. Empirical studies could seek to test the predictions of BSFT, such as the existence of scale-dependent symmetry transitions or the presence of quantum-like phenomena in living systems.

    Conclusion: Bipolar Symmetry Field Theory offers a compelling and inspiring vision of reality, one that dissolves the apparent duality between life and the universe and highlights the profound unity that underlies the diversity of existence. By recognizing the scale-dependent nature of the universe's properties and the deep interconnectedness of all phenomena, BSFT invites us to embrace a more holistic and integrated understanding of the cosmos and our place within it. As we continue to explore this new paradigm, we may uncover fresh insights and revelations that transform our understanding of the universe and the role of life within it, ushering in a new era of scientific and philosophical discovery.

    import numpy as np
    import plotly.graph_objects as go

    # Define the Planck length and observable universe length
    planck_length = 1.616e-35
    observable_universe_length = 8.8e26

    # Calculate the range in logarithmic scale
    z_values_log = np.logspace(np.log10(planck_length), np.log10(observable_universe_length), 50)

    # Normalize z values to the range [0, 1] for color mapping
    z_normalized = (np.log10(z_values_log) - np.log10(planck_length)) / (
        np.log10(observable_universe_length) - np.log10(planck_length)
    )

    # Recreate the 3D grid with equidistant z-values
    x = np.linspace(-1, 1, 50)
    y = np.linspace(-1, 1, 50)
    z = np.linspace(0, 1, 50)  # Equidistant z-values
    X, Y, Z = np.meshgrid(x, y, z)

    # Define the color gradient with an additional function of symmetry (yellow) and new regions
    def get_color_with_yellow(x_value, y_value, z_value):
        blend_factor = z_value  # z_value is already normalized to [0, 1]
        # Determine if the point is non-matter and non-life
        if ((blend_factor < 0.5 and abs(x_value) < abs(y_value)) or
                (blend_factor >= 0.5 and abs(x_value) > abs(y_value))):
            # Non-matter and non-life regions: use a transparent color
            return [0, 0, 0, 0.1]  # Black color with transparency
        # Base colors for matter and life
        base_color = [
            int(255 * (1 - blend_factor)),  # Red component decreases
            0,
            int(255 * blend_factor),  # Blue component increases
            1,  # Alpha value (fully opaque)
        ]
        # Introduce yellow based on the symmetry (more symmetrical means more yellow)
        symmetry_factor = (x_value + y_value + 2) / 4  # Normalize to range [0, 1]
        yellow_color = [255, 255, 0, symmetry_factor]  # Yellow component
        # Blend the base color with yellow
        blended_color = [
            int((1 - symmetry_factor) * base_color[0] + symmetry_factor * yellow_color[0]),
            int((1 - symmetry_factor) * base_color[1] + symmetry_factor * yellow_color[1]),
            int((1 - symmetry_factor) * base_color[2] + symmetry_factor * yellow_color[2]),
            base_color[3],  # Keep the same alpha
        ]
        return blended_color

    # Create a new colors array with the correct shape
    colors_with_yellow = np.empty(X.shape + (4,))

    # Fill the colors array based on the Z values and symmetry using normalized z_values
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            for k in range(X.shape[2]):
                colors_with_yellow[i, j, k, :] = get_color_with_yellow(X[i, j, k], Y[i, j, k], z_normalized[k])

    # Flatten the arrays for Plotly
    X_flat = X.flatten()
    Y_flat = Y.flatten()
    Z_flat = Z.flatten()
    colors_flat = colors_with_yellow.reshape(-1, 4)

    # Create an interactive 3D scatter plot
    fig = go.Figure(data=[go.Scatter3d(
        x=X_flat,
        y=Y_flat,
        z=Z_flat,
        mode='markers',
        marker=dict(
            size=1,
            # Use RGBA strings; cast the color channels back to int because the array stores floats
            color=['rgba({}, {}, {}, {})'.format(int(r), int(g), int(b), a) for r, g, b, a in colors_flat],
        )
    )])

    # Update layout for better visualization
    fig.update_layout(
        scene=dict(
            xaxis=dict(title='Time Symmetry', showbackground=False, showgrid=False, zeroline=False),
            yaxis=dict(title='Space Symmetry', showbackground=False, showgrid=False, zeroline=False),
            zaxis=dict(title='Scale (logarithmic)', showbackground=False, showgrid=False, zeroline=False),
            aspectmode='cube'
        ),
        title='3D Representation of Universe and Life Symmetries with Yellow for Symmetry',
        width=1200,
        height=1200,
        paper_bgcolor='rgba(0,0,0,0)',
        plot_bgcolor='rgba(0,0,0,0)',
        showlegend=False
    )

    # Show the plot
    fig.show()

  • @srimallya (25 days ago)

    Ontology The mind equation The more we think we have agencies to the actions the body takes, the more we impose agencies to the activities in our environment. The ownership expands into other objects. The illusion of body ownership comes from the modeling of the motor neurones pattern from the childhood. The self just predicts the bodies behaviours in the real world with its simulation of the real world. Multiple sensor data unify in language in the simulation. Intelligence is economy of metabolism. Language is temporal reference frame of economics. Self is simulation in language on metabolism for economy. Longer context windows create generalisation. Shorter creates specificity. Longer context window needs more computing. Self is the protagonist creates a storyline in this context window. Theory of mind evolved so that an entity can learn from it’s peers. It’s creates a possibility for parallel computing. Then it creates the possibility of transmitting the highlights of a generational lessons into a metaphorical story for upcoming child. That creates the possibility of modeling the physical world as a macro organism. Creation of fiat currency was the singularity of this species. There is now one macro organism in a connected web world. Loosing the peer of the macro organism creates the possibility of loosing it’s objective function. That creates the possibility of loosing the theory of mind of this macro organism. That creates the possibility of death of this macro organism by reaching the planetary boundary. That is post singularity. Every action we do, we do what is expect from ours tribe. Body might have a opinion, but not the cell. They do what is expected from its tribe. If it doesn’t we call it cancer. The body is a mirror system of the macro organism. Each system have two transactional openings. Serial and parallel. Each cell within the body can transact material or information serially by genetic determinism and parallel non deterministic way. 
Similarly each body with in the macro organism can transact serially by inherit material and information in a deterministic manner and parallelly through language in the society. Everything emerges from this systems. Every sensor is a range calculator of contexts. Taste > touch > smell Immediate and visceral. Vision > hearing Not immediate, tactical. Self > language Abstract, strategical. In this non deterministic economic transaction space the individual is coded to transact with its kin. From the macro perspective tribe formation minimises economic risk for the tribes. Each and every node of these systems organise and mark their kin’s with identifier. Thus, i am what you make of me. And others too. For short cut i have a legal name, so you have. My legal name gives the legitimacy marker so that you can transact with me parallelly if you have the same marker. The self is a simulation in language. It negotiates between the physical world and the information world. All these negotiations are the temporal memories in the body and scene of the story. Now, when we started writing we iconised the abstract in the physical world to make symbols for the tribes. So that under that common symbol every node will take the same risk and distribute equally. We created more and more symbols and more and more meta tribes within the tribes so that who has the authority to use the pen control the tribe. When the negotiator act like an executioner then it’s a downfall of that system. It falls apart. Objective reality > legitimacy > individual behaviours. Survival of the species is dependent on the decoding of the objective reality. Since no species can access it, they use their sensors and interpret the small data which is useful for the survival. Few complex species have created communication channels to rectify their sensory limitations to survive. 
Homo sapiens has widened its communication channels for faster throughput and started storing them as culture and carrying them through education. As a result we have created social truth. Factual data are the useful snapshot of the objective reality: a totem, a physical object that can be observed with the sensors. Truth is an individual subject, an interpretation of the sensory data, a useful compromise. The social truth is the useful compromise for the group by the group. The goal of the social truth is to survive as a group. Physical transcriptions of these social truths legitimise them. We are tribal animals. We live in physical tribes and inside hundreds of meta tribes in simulation, which is the socio-political data space we call the world. Since we can’t access the objective reality reliably, we look blindly to social truth as the best guess. Institutions legitimise truths. Fact-driven institutions are more useful for the survival of the species. On the other hand, opinion-driven institutions are not so useful for the species. We do what we can get away with and exactly as expected within the context of our meta tribes. We have two bodies. The biological one is like looking at the earth from space. And the political body is like the state. The name you carry is the political body. It transacts with the political states on the boundaryless earth. From the evolutionary perspective every biological entity has a basic feature, which is homeostasis. It’s the functioning sweet spot of that entity. A control center reads the sensory data to regulate itself to that state. By doing so it validates or updates its prediction model. In the process of becoming a complex organism it developed an extra layer of processing. That’s our conscious mind. And the control center remains as the subconscious. The subconscious collects the sensory data and regulates itself to stay functional.
Now, when it stumbles upon a novel environment, it floats the management to the conscious mind to find the solution for homeostasis. This conscious mind has one sensor, which is language. It works like a spiderweb. As a spider creates its web, its perception expands. We are like spiders in a jungle. We started creating these small webs at least 2/3 million years ago. Our offspring stayed on their ancestral web, reinforced it, expanded it. In time nearby webs became larger and connected with each other. A common structural geometrical pattern emerges from this. This became the symbols which are the backbone of all language systems. In time the forest becomes a mesh of webs. The superstructure is exactly the same, but when we zoom in we can find different species of spiders making their own type of webs in between the super web. Each spider tries to sense the vibration of flies and tries to catch them before others. Every movement is telegraphic in the zone. Every form of perception is just a different pitch of note traveling back and forth in the web superstructure. There is an echo of older vibrations pulsating through the web. Full of noise and self-repeating hum. That’s cultural history. In the background there is the base hum in the infinite feedback loop. Insignificant but ever present. The sum of all the vibrations from the start.

  • @BioFocher
    @BioFocher 26 days ago

    I also listen to your streams as a podcast when I'm at the gym 😂

  • @dadsonworldwide3238
    @dadsonworldwide3238 27 days ago

    Everyone's most-taught fundamental go-to ordering skill from biology is an umbrella hierarchical POV (population, species), which is like defining the city by big walls or cell walls, or by ethnicity, personal actors. Looks OK on paper, but once put into exercise it hyper-splits: no strong identifiers. You can legitimately hyper-split over the slightest variations; it's the most taught to ourselves of all. It's like how all 32 NFL draft teams' fans look on paper and see a playoff roster, but once you put it into exercise the nihilisms fall out.

  • @dadsonworldwide3238
    @dadsonworldwide3238 27 days ago

    If all models converge, that's gonna be a puritan AI model ultimately lol. Newton doesn't use Plato; he sees a 3-body problem rather than 3 lines of measure = equilibrium. So do they tune weights and measures in many different neural node models that are able to accessibly switch back, and for model to model to follow different lines of step by step? What I see: image or LLM neural node convergences should occur based on how we prescribe symbols upon objects, how we taught ourselves in pre-secularized eras. I hear about allowing the thermostat temperature of the room thermodynamics to tune some neural node sensors to help simulate the crystalline water-ice molecular space between axioms, particles of inertia. I ask only because I'm recognizing how successful social behavior in the ancient world esoterically applied symbols to objects, like curses and blessings, standardized weights and measures in the marketplace. That addition and subtraction is tuned in pragmatic common-sense Christian objectivism, proper philosophy, longitude and latitude of English, encoded not with etymology but with King James Biblical coded things like King assure into measure. Yes, it's built on other building blocks of language etymology, but much more is encoded in puns. It was an alignment of nature's orientation and direction. For someone like Elon, who wants self-drivers, Optimus, Grok all networking with his gen "x" platform, where he's been adamant about it being governed by American English law connected to Austin, it all has this common-sense objectivism, proper tuned weights and measures in such a network. This is tuned with the Isaac Newton reverse metamorphosis theological thread, like the American founders used, and so did the King James English authors. His xAI is to be tuned completely differently for understanding the universe, he says, which they've messed all up in astronomy to where the equivalence principle, opposite to Newton's frame of reference, flips the weights and measures.
Obviously most of these American tech names tell how they know textualism methodology objectivism = technological origins in the esoteric roots. All of them are something weights and measures: Anthropic, Meta, Optimus, etc. etc. etc. But not all lines of thought are equal; not all are logical, rational and blessed with equal measure in other languages. Most are very dualistic, and even Americans tuned that way don't know how Their, They're, There + time to use English. They interchange divided in / individual as if it was a personal response or actor. They use free will inertia as if it's soul agency or individual and not a personal action in or on a frame of reference. An anthropomorphized model would no doubt use dualistic umbrella terms or measurements. Anyway, sorry if I can't articulate the question or point well here. With American English courts or pragmaticism you never fall into whataboutisms of hallucinations or nihilisms of just uh deductiveness. Too many curses and blessings trying to gatekeep or block, censor, or just using the wrong line of measure would cause that to happen. You can use physical forms and shapes to see these statistical analytical failures and deformities exercised. But it would correlate subjective space between: 1 soul agency that's divided in individual atoms, 2 personal response, free will, inertia, lattice structure and body, 3 frame of reference, critical extreme states or environment. XYZ manmade time hierarchy, knowledge of good and evil equations. It's how free speech bottom-up rule all falls out of nature, mechanics, English pragmaticism, etc. etc. But now other philosophical world views get it all mixed up.

  • @wolpumba4099
    @wolpumba4099 27 days ago

    *Summary*

    *Main Topics:*
    * *Shifting Landscape of ML Papers (**3:34**):* Fewer groundbreaking papers from independent researchers as big companies absorb talent.
    * *Image Generation Evaluation (**5:24**):* Focusing on two papers:
      * *Rich Human Feedback for Text-to-Image Generation (**12:09**):* Collects detailed human feedback (annotated regions, misaligned keywords) to train a reward model for evaluating generated images.
      * *DREAMBENCH++: A Human-Aligned Benchmark for Personalized Image Generation (**15:34**):* Proposes using a multimodal large language model (MLLM) like GPT-4 to automatically evaluate generated images based on pre-defined criteria and chain-of-thought reasoning.
    * *Cambrian-1 Paper (**16:17**):*
      * A survey of MLLMs, advocating for a shift from vision-language models (VLMs) to MLLMs.
      * Explores various aspects of MLLMs like pre-training methods, connector designs, instruction tuning data and evaluation protocols.
      * Highlights challenges in existing VLM benchmarks (reliance on language models, lack of visual grounding).
      * Proposes repurposing existing vision benchmarks for MLLM evaluation.
    * *Meta 3D Gen (**1:31:43**):*
      * New state-of-the-art text-to-3D asset generation pipeline from Meta.
      * Generates multi-view consistent images and uses a signed distance function (SDF) to create high-quality 3D meshes.
    * *Comparison of Text-to-3D Generation Tools (**1:38:15**):*
      * Compared Meta 3D Gen to commercially available tools like Tripo Sr and Meshy.
      * Found that all tools struggle with complex prompts and require significant prompt engineering.
      * Tripo Sr performed better than Meshy in the demonstrated examples.

    *Key Takeaways:*
    * Automated image evaluation using MLLMs is a promising direction for the future of generative AI.
    * Instruction tuning data and its mixture ratio heavily influence MLLM performance.
    * Data diversity and scale are crucial for training robust MLLMs.
    * Current text-to-3D tools are still in early stages and require significant prompt engineering to achieve high-quality results.

    *Discussion Points:*
    * The role of subconscious human feedback in image evaluation.
    * The potential of graph neural networks (GNNs) in object detection.
    * The convergence of various AI tasks towards MLLMs.
    * The future of prompt engineering and the development of more intuitive generative AI tools.

    i used gemini 1.5 pro to summarize the video

  • @karlthunderaxe
    @karlthunderaxe 28 days ago

    the question of whether time is pushing from the past or pulling from the future is a meaningless one -- it's both. all movements, be it wind or water or electricity, are caused by both the high-energy state pushing and the low-energy state pulling, or to put it another way, neither is a completely accurate way of viewing it but rather it is the fact that an imbalance exists between the two that causes the motion that seeks to resolve this imbalance. similarly, the very existence of the universe is an imbalance that is working itself out -- therefore the teleological attractor of the universe is the resolving of the imbalance that created it in the first place and thus the negation of all existence in this universe itself. think of the big bang as a string being struck and the end of the universe as the string eventually returning to its rest state.

  • @raushanjoshi1384
    @raushanjoshi1384 a month ago

    At @18:30 in the video, you were referring to methods that do not require camera positions. And you got to review and explain the DUSt3R paper within a year. Looks like researchers are working well aligned with your suggestions :)

  • @NLPprompter
    @NLPprompter a month ago

    I watch these long videos at 2x speed with a pause-and-rewind workflow. It's kinda fast... now I'm used to it. Love your videos, hu-po. I believe these videos are gems in CZcams education. Can't believe how fast I'm learning by watching them.

  • @therobotocracy
    @therobotocracy a month ago

    Man, it’s criminal you don’t have more views! Thanks for what you do!

    • @NLPprompter
      @NLPprompter a month ago

      It is criminal that hu-po and other researchers don't have access to the computing power to test their theories, to relearn, experiment and create! Man... this obsolete monetary system is really frustrating.

  • @rockapedra1130
    @rockapedra1130 a month ago

    Great paper, great review. Comment about model hallucinations: the speculation that hallucinations might not be hallucinations doesn't consider the fact that during RLHF we are kind of lobotomizing the model, breaking it to make it behave according to current political fashion. I wonder if RLHF is injecting error into our models. Making them kinda crazy.

  • @po9569
    @po9569 a month ago

    how did u get that drumstick

  • @nickhockings443
    @nickhockings443 a month ago

    Kinda hilarious that you don't know what ETH is, and you're making videos about computer vision. They are the top research university in the world in many areas of computer vision and robotics. Nearly every drone that flies uses a controller derived from their software.

  • @musifmuzammir354
    @musifmuzammir354 a month ago

    When generating text and images, how does the model know which tokens should be sent to the BPE tokenizer and which should be sent to the image de-tokenizer?

  • @KillerRobotz
    @KillerRobotz a month ago

    i was thinking they would make it fall but they are pretty cool to use a reference ,

  • @KillerRobotz
    @KillerRobotz a month ago

    Does it have a built-in speaker or did you add one? Cuz I don't see a speaker on mine.

  • @KillerRobotz
    @KillerRobotz a month ago

    You are the only person on the internet with a robot. I wanna get mine to do custom files.

  • @KillerRobotz
    @KillerRobotz a month ago

    Are you connecting him to the LLM? I have one as well. I'm dying to connect him to the LLM; I'd pay for help.