Value Semantics: Safety, Independence, Projection, & Future of Programming - Dave Abrahams CppCon 22

CppCon

15,706 views

Added May 5, 2024
cppcon.org/
---
C++ Value Semantics: Safety, Independence, Projection, and the Future of Programming - Dave Abrahams - CppCon 2022
github.com/CppCon/CppCon2022
Support for first-class user-defined value types may be among C++'s greatest strengths, one that most recent language designs have sadly failed to emulate. That said, although value types are everywhere in C++, we don't have a commonly accepted definition of “value semantics”, and we tend to use the phrase with only an intuitive idea of what it means. This talk offers a deeper understanding of value semantics, defining it in a way that in turn reveals surprising truths about programming in general. We'll expose the value semantics that underlies our mental model even when we're “forced” to use pointers or references, and discuss how a future C++ might close that expressivity gap, improving safety, performance, and programmer confidence. We'll conclude with some guidelines you can use today to improve your programs, and propose the next must-see session for value semantics lovers.
This presentation lays groundwork for another talk, “Val wants to be your friend.” If you're interested in that talk, you'll want to see this one first.
---
Dave Abrahams
Dave Abrahams is a founding contributor of the Boost C++ Libraries project and the founder of the first annual C++ conference, BoostCon/C++Now. He is a contributor to the C++ standard, and was a principal designer of the Swift programming language. He recently spent seven years at Apple, culminating in the creation of the declarative SwiftUI framework, worked at Google on the Swift for TensorFlow project and, briefly, on the Carbon language, and is now a principal scientist at Adobe's Software Technology Lab.
---
Videos Filmed & Edited by Bash Films: www.BashFilms.com
YouTube Channel Managed by Digital Medium Ltd events.digital-medium.co.uk
#cppcon #programming #cpp
Science & Technology

Comments • 49

@SamWhitlock 1 year ago ⁺¹⁴
This is gonna go down as one of the greatest C++ talks of all time!
@DanielLidstrom 4 months ago ⁺¹
The final take is to switch to Rust?
@youknowwho5900 1 year ago ⁺¹⁵
So happy to see Dave back. A very timely, charmingly intelligent and even entertaining presentation of quite an important programming paradigm shift. Thank you.
@josseldenthuis498 1 year ago ⁺⁸
This is a great talk. It made me think about how you can apply this to large distributed applications. And I believe content-addressable data structures are a good example of how that might work. It allows the "parts" to be stored in different locations, or even machines, while the "whole" retains its value-like nature. For example, it's what git uses to store files and commit histories.
The problem is that pointers, references, indices, iterators, filenames, database indices etc. are all representations of a location. They are great for quick and easy access to (the contents of) a value. And they're necessary for mutability. But that makes them susceptible to all the problems mentioned in the talk. Even the type-safe ID mentioned at 44:19 does not prevent surprise mutations. It's like a Linux file descriptor, which is an int, behaves like a value, and provides some protection by preventing direct access to in-kernel data structures. But it does nothing to prevent another process from modifying the file behind your back.
If instead you represent a relationship to a "part" as a (type-safe) hash of the related value, you do get all the guarantees of value semantics. If anyone changes the value, the hash will change. But your original relationship (or hash) still refers to the original unchanged value. It's basically an enforced form of copy-on-write.
You can implement this with a simple associative container (or database), which only allows insertion and const lookup. Of course you still have to handle lifetime management. But that can be done easily with reference counting, since circular references are impossible with value semantics. All this should give you the benefits of value semantics, without unnecessary copying, at the cost of associative container lookup when accessing the contents.
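The insert-only store described above can be sketched in C++ roughly as follows. This is a minimal illustration under stated assumptions, not the commenter's code: `ContentId`, `ContentIdHasher`, and `ContentStore` are hypothetical names, values are limited to `std::string`, `std::hash` stands in for a collision-resistant content hash, and the reference-counted lifetime management the comment mentions is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical type-safe key: wraps the hash of a stored value.
struct ContentId {
    std::size_t hash;
    bool operator==(const ContentId& other) const { return hash == other.hash; }
};

struct ContentIdHasher {
    std::size_t operator()(const ContentId& id) const { return id.hash; }
};

// Insert-only store: a value is never modified once it has been inserted.
class ContentStore {
public:
    // Stores `value` if it is not already present and returns its content id.
    ContentId insert(const std::string& value) {
        ContentId id{std::hash<std::string>{}(value)};
        auto it = values_.try_emplace(id, value).first;
        // Assumption of this sketch: no collisions. std::hash does not
        // guarantee that; a real system would use a cryptographic hash.
        assert(it->second == value);
        return it->first;
    }

    // Const lookup only: callers can read stored values but never mutate them.
    const std::string* find(ContentId id) const {
        auto it = values_.find(id);
        return it == values_.end() ? nullptr : &it->second;
    }

private:
    std::unordered_map<ContentId, std::string, ContentIdHasher> values_;
};
```

Because a "changed" value hashes to a different id, a holder of the old id keeps seeing the old contents, which is the copy-on-write behavior the comment describes.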
@MSK_MKT 1 year ago ⁺¹
Value semantics is native to parallel applications since there is no shared state... Many of the software packages in scientific programming suffer from tons of references and no one can make them parallel because that would require everything to be rewritten from the ground up.
> content-addressable data structures
Actually I am currently writing something that works on this principle. This approach allows type-specific dispatch and extremely easy parallelization/vectorization.
@AbrahamsDave 1 year ago ⁺⁶
Thanks for the kind words!
> The problem is that pointers, references, indices, iterators, filenames, database indices etc. are all representations of a location… but that makes them susceptible to all the problems mentioned in the talk.
Actually, no, array (or more generally, Collection) indices are not susceptible to the problems mentioned in the talk, because they don't confer access to an element's value *by themselves*. To read or write, you also need access to the thing you're indexing _into_, and if the Law of Exclusivity (LoE) is upheld, it causes no problems for local reasoning. You're accessing a part of a whole, just as though it was a member (the part) of a struct (the whole).
Filesystems and databases and threadsafe data structures are naturally-shared resources and thus are subject to logical races, which makes them even _harder_ to program than shared thread-local state. That's a reality, and as far as I know we only have ad-hoc ways to identify the patterns that work. For example, I can break down an algorithm that builds a set (possibly trying to add the same element multiple times) into parts that add elements to a shared set, and I can prove those are semantically equivalent. That use pattern works, but not if the algorithm also deletes set elements. I'm dying to know how to systematically distinguish the patterns that work from the ones that don't.
> If instead you represent a relationship to a "part" as a (type-safe) hash of the related value, you do get all the guarantees of value semantics. If anyone changes the value, the hash will change. But your original relationship (or hash) still refers to the original unchanged value. It's basically an enforced form of copy-on-write.
I guess I don't understand. Whole-part relationships are the ones that are trivial to maintain and reason about: you just use composition. If you're actually talking about extrinsic (non whole-part) relationships… A hash is an interesting way to answer the question of whether a given value is on one end of a relationship, but it can be expensive, is probabilistic, and can't be used to locate the value itself unless you also store the values in an associative container. Indices work just fine for this purpose if the LoE is upheld.
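The point about indices can be illustrated with a small C++ sketch. The `Employee` type and both functions are hypothetical examples, not taken from the talk; note also that C++ does not enforce the Law of Exclusivity, so the sketch only shows the shape of the idea: an index is inert until it is combined with access to the collection it indexes into.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct Employee {
    std::string name;
    std::size_t manager;  // extrinsic relationship stored as an index, not a pointer
};

// Reading through an index requires read access to the whole collection.
const std::string& manager_name(const std::vector<Employee>& staff, std::size_t who) {
    assert(who < staff.size());
    assert(staff[who].manager < staff.size());
    return staff[staff[who].manager].name;
}

// Writing requires mutable access to the whole collection, so the signature
// tells the caller exactly which value may change.
void rename_employee(std::vector<Employee>& staff, std::size_t who, std::string new_name) {
    assert(who < staff.size());
    staff[who].name = std::move(new_name);
}
```

Copying the vector yields an independent whole: the same index applied to the copy reaches the copy's element, never the original's.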
@mohammadmahdifarnia5358 1 year ago
37:10 swifty way to write C++. I love it
@fdwr 1 year ago ⁺⁶
37:20 Dave, regarding "in" and "inout", have you talked with Herb Sutter? You'll want to be sure both of your intentions for the keywords align.
@AbrahamsDave 1 year ago ⁺⁶
Yes, I've spoken with Herb. There's a lot of overlap with what he's proposing; he arrives in a similar place by building a system of "owner" objects. His version of `in` doesn't quite align with mine IIUC; for some reason he is disallowing multiple simultaneous `in`s of the same value even though his `inout` (with his lifetime profile rules applied) is already exclusive, so all dangers of aliasing are eliminated. But I'm not trying to push these features into C++, so if these keywords end up in C++, they'll likely be the ones Herb is specifying.
@kormisha 1 year ago ⁺¹
Great talk, thank you! What’s the name of the other talk from Dimitry (?) that Dave mentioned?
@AbrahamsDave 1 year ago
"Val wants to be your friend": czcams.com/video/ws-Z8xKbP4w/video.html
@raghavmehta1232 1 year ago
Really interesting talk. Does anyone have a link to the “Val can be your friend” talk that he talks about in the end?
@AbrahamsDave 1 year ago
Apparently that will be out in 10 days, or so I hear…
@AbrahamsDave 1 year ago ⁺²
Et voilà: czcams.com/video/ws-Z8xKbP4w/video.html
@DanielLidstrom 4 months ago
How about you change offset into a pure function and instead return a new value?
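A sketch of what that suggestion could look like in C++, assuming a function that applies an offset twice. `Vec2`, `offset_twice`, and `offset_twice_pure` are hypothetical stand-ins and may not match the code shown in the talk.

```cpp
#include <cassert>

struct Vec2 {  // hypothetical 2D value type
    int x;
    int y;
};

// Mutating version: if `delta` aliases `p`, the second pair of additions
// reads the already-modified value.
void offset_twice(Vec2& p, const Vec2& delta) {
    p.x += delta.x;
    p.y += delta.y;
    p.x += delta.x;
    p.y += delta.y;
}

// Pure version: arguments are independent copies and the result is a new
// value, so no argument can alias the thing being computed.
Vec2 offset_twice_pure(Vec2 p, Vec2 delta) {
    return Vec2{p.x + 2 * delta.x, p.y + 2 * delta.y};
}
```

With `a = {1, 2}`, calling `offset_twice(a, a)` leaves `a` as `{4, 8}` rather than `{3, 6}`, while `offset_twice_pure(a, a)` returns `{3, 6}` and leaves `a` untouched. The price of the pure form is that the caller must assign the result back, and large values may be copied.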
@pedromiguelareias 1 year ago
You C++ experts are basically reinventing Fortran 95/03/08.
@tomekczajka 7 months ago ⁺¹
If you use IDs of objects stored in an array instead of pointers, aren't you basically implementing your own memory allocator and your own pointers inside your arrays, thus reimplementing something like a safe language with reference semantics? It seems like it doesn't solve the problems with reference semantics but rather just moves the problem into this simulated environment.
@tomekczajka 7 months ago
He answers a question about this towards the end by saying that IDs differ from pointers in that IDs don't give you direct access, you need to access the data separately through the array. But that seems irrelevant: all the practical problems he mentions, such as "spooky action at a distance", still reappear in this model. What's behind the ID can change even when you're the only one holding the ID.
@ChristianBrugger 7 months ago ⁺²
The ID is the only thing you own. And that cannot change through spooky action, as you are the only one having access to it.
The data is not yours, and you have no access to it. It might change, but how could you even tell?
Now regarding the relationship: making sure that it points to the right data needs to be managed by the component that owns you and the array.
Now this manager-component is also the only one who has access to the array and you, so it can fully maintain that invariant. E.g. there cannot be another component accessing the array or you without its knowledge.
@tomekczajka 6 months ago ⁺¹
@ChristianBrugger "how could you even tell"
Not sure what you mean by "you" here. Some function is using those IDs to access the global array of objects, and that function can tell that the object has changed since the last time it was accessing it. "Spooky action". Note that this is *not* about multithreading, it's about different parts of the code accessing the same data. I don't really see the difference between this and what the lecturer was talking about in regards to managed languages that allow sharing.
"Now this manager-component is also the only one who has access to the array and you, so it can fully maintain that invariant."
Well, but the whole program could be written in that manager-component. I could say the same exact thing about languages with sharing -- the part of the program that is messing with the graph of objects can maintain invariants. The whole point is not that it's impossible, the point is that it is error-prone, and I don't see how putting the code in the manager helps.
You could say that you would split the program into several such manager-components. True, but by the same token you can also split a Java program into separate modules that don't share data with each other.
@4otko999 10 months ago
46:22 what took over boost graph library?
@AbrahamsDave 6 months ago
Andrew Lumsdaine’s more recent work IIUC
@4otko999 6 months ago
@AbrahamsDave this is still very cryptic. Does this work have a name?
edit: nvm, probably found it (NWGraph)
@Swedishnbkongu 1 year ago
Is the Val talk not online yet?
@AbrahamsDave 1 year ago ⁺¹
Video coming soon apparently, but slides are here: github.com/CppCon/CppCon2022/blob/main/Presentations/Val-at-CppCon-2022.pdf
@AbrahamsDave 1 year ago ⁺²
Et voilà: czcams.com/video/ws-Z8xKbP4w/video.html
@Swedishnbkongu 1 year ago
Thank you!
@ABaumstumpf 1 year ago ⁺³
We have pointers and references cause we need them.
some small examples are fine, but in the code-base we are working on for example we have some data-structures that we need in multiple independent processes and they all need to be able to read and write to that data, seeing changes other processes made. We have network-communication where multiple threads are listening for different messages but those messages, once received, must be handled in strict sequential order.
Having dynamic memory without pointers would also be quite... tricky.
Or if i want to just notify another thread - in Thread A i did some calculations and now i need a way to pass Thread B the information that the object has been manipulated and it can now start working on it.
And i am a bit sceptical about "done right it is faster". When/how?
As an example lets say we have a program dealing with transactions. So we have a company where one department has an account for office-supply expenses. To make code that deals with that more readable of course i would first declare a reference to that specific account - or how else would i deal with that? Should i copy that account? Then how do i update the original account when i am done?
The suggested "inout" would be just another way of writing "&", albeit i must say way more readable. This is one thing that some other languages, even Oracle PL/SQL, have gotten right (that language also has named parameters that even allow you to reorder them). It would also be good if the C++ committee gave us something like C has had for over 20 years - "restrict". Of course it would be nice to have it be a bit more meaningful, but still a solution for aliasing of function-parameters has been around so long.
@youknowwho5900 1 year ago ⁺²
You seem to reject the idea outright citing "what about this and what about that" (or so it appears). It's unwise. Life is not black&white and no one's taking ptrs and refs away from you. And the concept's been around for quite some time (Dave "merely" communicated it to an even wider audience). And yes, deploying value-based semantics is not a quick late code change here and there. It affects and has to flow from the initial design to implementation. An ability to deploy the concept provides considerable benefits wrt overall throughput (in MT env) and code maintenance. In MT env. you have to ensure unique data ownership and/or access and other languages are addressing the issue as well (see Rust). In C++ you do that via mutexes/serialization... or the value-based design. Nothing is ever free and you pay the price one way or the other. But now you can pick a better one... and, if your program does anything serious, then the ability to parallelize some complex algorithm might outweigh the price of copying. As for "restrict", then (if I remember correctly) it's pretty much useless as it's only a hint to the optimizer (can be wrong here as haven't been using C since 93).
@masondeross 1 year ago
In your example, you would pass the account as a constant reference, use that to provide relevant data while constructing the new account, return the new object by value using RVO to elide the copy. That translates to you building a new object in place at the call site using the old account as the input data; nothing is moved (let alone account objects being copied), and only as many fields as need to be copied are copied while the rest is built in place using your new data. But it is done in a function somewhere for reasoning purposes, even though behind the curtain it is all done in place thanks to compiler magic. Building the new object in place is usually faster than modifying every individual field, because for fields like strings it would be a new object either way and for fields like ints a new int is just as fast as changing an int etc.
Those fields being "copied" can be pointers to immutable data still, if you are thinking it is excessive copying for some obnoxiously large part of the account data. The key is that you keep everything that can be immutable... immutable. And take advantage of the modern hardware and compiler ways of using value semantics with often no overhead depending on whether you are using the right abstraction for the job.
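The approach described in this reply might look like the following C++ sketch. `Account` and `with_expense` are hypothetical names chosen for illustration, and the field set is invented. Since C++17 the returned prvalue initializes its destination directly, so no temporary `Account` is copied or moved on return; the `owner` string is still copied from the old account.

```cpp
#include <cassert>
#include <string>

struct Account {  // hypothetical account value type
    std::string owner;
    long balance_cents;
};

// Reads the old account through a const reference and returns a new value.
// The caller's original object is never modified by this function.
Account with_expense(const Account& old_account, long cost_cents) {
    assert(cost_cents >= 0);
    return Account{old_account.owner, old_account.balance_cents - cost_cents};
}
```

To "update the original", the owner of the account assigns the result back, for example `supplies = with_expense(supplies, 2500);`. Any other copy of the account taken earlier keeps its old value.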
@masondeross 1 year ago
@youknowwho5900 I don't know about restrict, but I know using the keyword register was a suggestion/hint/prayer and not an actual way of using registers.
@AbrahamsDave 1 year ago
> we have some data-structures that we need in multiple independent processes and they all need to be able to read and write to that data,
Absolutely, there are real use-cases where that's important. But if you give me an arbitrary such system with multiple readers/writers on shared data, I don't know how to reason about its behavior. With the right set of guardrails and patterns, one can make a system that can be reasoned about, but as I mentioned to @josseldenthuis498, in general I don't even know how to identify which guardrails and patterns work. To maintain local reasoning, use of shared mutable state always ends up being tightly controlled. The rest of your code can use value semantics.
> And i am a bit sceptical about "done right it is faster". When/how?
For example
1. graph algorithms run faster on graphs represented via adjacency lists encoded as arrays of arrays of integers than they do on graphs built by separately allocating vertices and representing edges as pointers.
2. with a true guarantee of exclusivity, a program doesn't need to access main memory to reload values that have a local guarantee of immutability.
3. SwiftFusion implemented using value semantics demonstrated a 10x speedup over the OO-oriented GTSAM (written in C++)
> The suggested "inout" would be just another way of writing "&"
No, `inout` comes with additional guarantees. `restrict` is weak by comparison: it just says "optimize as though the pointee is independent” but it doesn't do anything to statically guarantee independence, the way `inout` does.
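The representation named in point 1 above, a graph stored as an array of arrays of integers, can be sketched in C++ as follows. `Graph` and `bfs_distances` are illustrative names and breadth-first search is just one example algorithm; the sketch shows the data layout only and makes no performance measurement.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Vertex ids are indices into the outer vector; each inner vector lists the
// ids of that vertex's out-neighbors. No vertex is allocated separately.
using Graph = std::vector<std::vector<std::size_t>>;

// Breadth-first search returning the hop count from `source` to every vertex.
// Unreachable vertices get -1.
std::vector<int> bfs_distances(const Graph& g, std::size_t source) {
    assert(source < g.size());
    std::vector<int> dist(g.size(), -1);
    std::queue<std::size_t> frontier;
    dist[source] = 0;
    frontier.push(source);
    while (!frontier.empty()) {
        std::size_t v = frontier.front();
        frontier.pop();
        for (std::size_t w : g[v]) {
            if (dist[w] == -1) {  // first visit
                dist[w] = dist[v] + 1;
                frontier.push(w);
            }
        }
    }
    return dist;
}
```

A `Graph` built this way is an ordinary value: copying it yields an independent graph, and an edge is meaningful only together with the graph it indexes into.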
@AbrahamsDave 1 year ago
@masondeross wrote: "the key is that you keep everything that can be immutable... immutable" Making everything immutable is one path to value semantics, but it can be incredibly expensive. You don't have to give up mutation to get value semantics. My talk is promoting mutable value semantics: www.jot.fm/issues/issue_2022_02/article2.pdf
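A rough C++ approximation of mutation combined with value semantics, under stated assumptions: `Ledger`, `record`, and `total` are hypothetical, and a plain reference parameter stands in for `inout`. Unlike the `inout` described in this thread, `&` gives no static guarantee of exclusivity, so this only shows the programming model.

```cpp
#include <cassert>
#include <vector>

struct Ledger {
    // Owned by value: copying a Ledger copies its entries.
    std::vector<long> entries_cents;
};

// In-place mutation through a reference parameter (a stand-in for `inout`).
void record(Ledger& ledger, long amount_cents) {
    ledger.entries_cents.push_back(amount_cents);
}

// Read-only access (a stand-in for `in`).
long total(const Ledger& ledger) {
    long sum = 0;
    for (long entry : ledger.entries_cents) sum += entry;
    return sum;
}
```

After `Ledger b = a;`, recording into `b` never changes `total(a)`: the two ledgers are independent even though each is mutated in place, with no immutability or defensive cloning at the call sites.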
@thelatestartosrs 6 months ago
47:20 hmm
@lexer_ 1 year ago
I get the point of why value semantics get promoted (again) more and more in recent times but I am very confused about how this talk presents the application of these in practice.
On one hand, garbage collection as a solution is bad because it has a lot of overhead. Pure solutions like haskell are bad because they are hard to use. Same goes for rust borrow checking. So we should instead incur a similar performance overhead by abstracting away our pointers and references as values?
I totally understand the point that this is a much more maintainable, scalable, and safe approach to coding. But I don't understand why it matters where you pay the performance cost for it. You can pay it in C# or the JVM, you can pay it in mental overhead with rust, or you can pay it in C++ by incurring additional layers of indirections on things there were pointers and references previously as well as additional copying and bulkier types, or you can just use swift... I guess?
So is the point essentially don't use c++ for new code and fix existing code with value semantics? Why would you choose a language like C++ only to use it like haskell?
I am obviously passing over a lot of nuance here but at the end of the day, without getting caught in the weeds, to me, this seems like a very incoherent solution to the real problem. I completely agree on the problem. But this talk doesn't address any of the crucial problems with the suggested approach. It just kind of acknowledges them but then just passes over them without actually addressing any of it.
@AbrahamsDave 1 year ago ⁺⁴
> On one hand, garbage collection as a solution is bad because it has a lot of overhead.
It also doesn't solve the thread safety problem.
> So we should instead incur a similar performance overhead by abstracting away our pointers and references as values?
That question sort of starts from a presumption that pointers and references are fundamental to your problem, and are part of the most efficient way to represent it. (Note that `in` and `inout` are _implemented_ as pointers of course, but they are slightly more efficient due to the LoE). That is not necessarily the case. But even when pointers _are_ part of the most efficient representation of your program, you can encapsulate them in a type with value semantics and recover a safe/understandable programming model.
> I don't understand why it matters where you pay the (performance) cost [parentheses mine; you talk about non-performance costs too]
This approach to coding is very difficult in C# or Java; both languages are hostile to mutable value semantics and copy references to mutable state liberally. To get independence, you either need to make everything immutable or clone objects defensively. About C++, value semantics doesn't introduce layers of indirection; if anything it tends to remove them, turning indirection into value composition.
Stepping back, as your own list demonstrates, every approach to programming has costs, and the costs for achieving value semantics in different languages and idioms are not equivalent. Engineering is about trade-offs. For me, the difference between paying a performance penalty and satisfying a borrow checker is very significant, but if all these things really are equal to you, I guess I can only add that a language built around the idea of independent mutable values can make programming this way more performant and more accessible. That's one of the premises behind Val.
> Why would you choose a language like C++ only to use it like haskell?
I can't answer that, and it's not what I'm proposing. Haskell doesn't allow mutation. Mutable value semantics is a very different thing.
> But this talk doesn't address any of the crucial problems with the suggested approach.
I disagree. The `in` and `inout` features would allow us to rule out problematic aliasing in new code. But if you're saying C++ and its existing code is a mess and adding a couple features isn't going to fix that, I certainly agree; that's part of the reason I'm not going to be writing a formal proposal for them. For a more comprehensive approach to the problem (which also won't fix the mess that is C++ 😘), see Dimi's talk on Val, and please treat this one as laying the groundwork for that.
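The remark earlier in this reply about encapsulating pointers in a type with value semantics can be sketched in C++ like this. `Document` is a hypothetical example type: its storage sits behind a `std::unique_ptr`, but copies are deep, so no two `Document` objects ever share mutable state. Move operations are deliberately not declared, so moves fall back to copies and the pointer is never null.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

class Document {
public:
    explicit Document(std::string text)
        : text_(std::make_unique<std::string>(std::move(text))) {}

    // Deep copy: the new Document gets its own heap-allocated string.
    Document(const Document& other)
        : text_(std::make_unique<std::string>(*other.text_)) {}

    // Copy assignment overwrites our own string; the pointers stay distinct.
    Document& operator=(const Document& other) {
        *text_ = *other.text_;
        return *this;
    }

    const std::string& text() const { return *text_; }
    void append(const std::string& more) { *text_ += more; }

private:
    std::unique_ptr<std::string> text_;  // implementation detail, never shared
};
```

Clients see an ordinary mutable value: appending to a copy never affects the original, and the pointer never escapes the class.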
@lexer_ 1 year ago
@AbrahamsDave These are all good points, and I acknowledged that I passed over a lot of the nuance you point out here.
I think my main problem is that we already have the tools to represent these semantics if not as explicitly at least in terms of the machine code a compiler would actually generate from this. In the best case you get equal performance but as the sad example of unique_ptr demonstrated 0-cost is almost never 0-cost in practice because of a whole lot of complex considerations around calling conventions and exceptions.
But what really bugs me is how it is presented. Maybe I am just too sensitive but you present it like if we just had those simple features all our problems would go away and you never really talk about the problems this approach would bring with it.
So it doesn't sound like this is an approach with tradeoffs, it sounds more like a sales pitch on a shopping channel.
I guess you could take this more as a feedback on your presentation technique and less as a factual critique of the material you presented if that wasn't at all your intention.
@AbrahamsDave 1 year ago
@lexer_ We have the tools to represent pretty much _any_ semantics in terms of the machine code a compiler would actually generate. If your program is correct and performant, it doesn't matter what language features you wrote it with. The problem is that reaching correct and performant code is hard. Mutable value semantics is about principles that define away large classes of (but not all) bugs and performance problems.
I'm sorry if it sounded to you like I said there are no downsides to this approach, but I did point out the two I know of (at czcams.com/video/QthAU-t3PQ4/video.html): it may be unfamiliar, and you have to give up the convenience of direct access from one object to other objects that are not a logical part of it. If there are other downsides, I'm honestly not aware of them, and I would be grateful if you'd describe them.
@phenixwutao 1 year ago ⁺¹
but where to find a C++ job?
@MuharremGorkem 1 year ago ⁺¹
Value semantics is a temporary solution until someone unifies quantum mechanics and general relativity.
@masondeross 1 year ago
That will be for the AI that is replacing us writing software in the future to worry about, assuming the sun hasn't burned out by that time or we've moved beyond the solar system.
@yaroslavpanych2067 1 year ago
I listened until he claimed to be related to boost.... skip skip skip
@etherstrip 1 year ago ⁺¹¹
Your loss. What he talks about is really interesting, and entirely unrelated to boost.
@01MeuCanal 1 year ago
So when will we see those "in and inout" operators as language standard?
@AbrahamsDave 1 year ago ⁺²
As soon as you write formal proposals and shepherd them through the standardization process ;-)
@01MeuCanal 1 year ago
@AbrahamsDave Do you think nobody in the talk will do that?
@AbrahamsDave 1 year ago
@01MeuCanal Yep, that's what I think.
@01MeuCanal 1 year ago
@AbrahamsDave O.o
