From NUL to DEL: Why 7-Bit ASCII Is Actually Really Clever

  • Published 10. 09. 2024

Comments • 377

  • @ke9tv
    @ke9tv Před měsícem +93

    Once upon a time, I worked in the office next door to Bob Bemer, the editor of the first ASCII standard. Which, by the way, also specified EBCDIC. IBM was the only manufacturer that embraced EBCDIC rather than ASCII because EBCDIC was more punched-card friendly, and IBM virtually owned the market in 80-column card equipment.
    Single newline came from the B programming language. Multics used LF.
    X-ON and X-OFF are misidentified in your table. They're DC1 and DC3 respectively.
    ETB was the standard 'file mark' that separated multiple files on a magnetic tape. EM 'end of medium' was a mark that meant, 'this file spans multiple reels, time to switch to the next reel.'
    NAK - negative acknowledgment - is the ^U that you use to cancel the stuff you're typing at the command line.

    • @allangibson8494
      @allangibson8494 Před měsícem +2

      IBM literally invented the punched data card via the Hollerith Company in 1889…
      Punched cards controlling machines, however, date from 1798.

    • @lbgstzockt8493
      @lbgstzockt8493 Před 29 dny +1

      @@allangibson8494 It's crazy that modern IDEs still have markings at 80 characters. That technology was so far ahead of its time.

    • @allangibson8494
      @allangibson8494 Před 29 dny +2

      @@lbgstzockt8493 80 characters was what the U.S. census determined was adequate to store the population information as a line item…

    • @thezipcreator
      @thezipcreator Před 26 dny +1

      I didn't even know ^U existed, I've always just done ^C (and shells are smart so they know to catch that and not just terminate)

    • @TheEvertw
      @TheEvertw Před 22 dny +1

      @@thezipcreator Shells terminate with ^D, not ^C. ^D is End of Transmission (i.e. connection). ^C is passed on from the shell to the program that is running at the time.

  • @jasonclark1149
    @jasonclark1149 Před měsícem +87

    My browser decided to buffer at the perfect moment, in the Morse Code section. "The code for A is ·−, but if you leave a gap, ..." and then it just started spinning. It was a VERY long gap 😂

  • @fritzp9916
    @fritzp9916 Před 26 dny +17

    Great video. Though I think backspace deserves a mention too. On paper terminals, you can't delete a character you've already written, so all that backspace did was go back one space, allowing you to print over the previous character. This was useful for making text bold - as you mentioned when discussing carriage return - but also for creating combined characters. Want to type "café"? Just type "cafe", hit backspace, and type an apostrophe. The fonts used for paper terminals were carefully designed to make this look good. Likewise, o with " on top was a "good enough" approximation of ö. Some ASCII characters were included specifically for this reason: the tilde, the acute/backtick, the caret. But most importantly, the underscore. The only reason it was included was to underline words to highlight them.
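
    A minimal C sketch of that overstrike trick (it assumes an output device that really overprints, such as a teleprinter or daisy-wheel printer; modern screen terminals just erase on backspace):

    #include <stdio.h>

    int main(void) {
        /* BS (0x08) moves the carriage back one position, so the next
           character prints on top of the previous one. */
        printf("cafe\b'\n");          /* overstrike the e with an apostrophe */
        printf("_\bw_\bo_\br_\bd\n"); /* underline "word" with underscores   */
        return 0;
    }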

    • @billwall267
      @billwall267 Před 21 dnem

      Important context!

    • @gcewing
      @gcewing Před 19 dny

      There was a phase of my life when I was using a 5-bit teleprinter as an I/O device for my homebrew 8-bit system. It unfortunately didn't have any backspace ability, which was very annoying when I wanted to print zeroes with slashes through them. I ended up doing a CR and then going over the whole line again to fill in the slashes.

  • @Huntracony
    @Huntracony Před měsícem +84

    I always wondered how terminal progress bars and such worked! This also explains why these often kinda break when there's an error or warning during the progress bar. Thanks, this was entertaining _and_ useful.

    • @exciting-burp
      @exciting-burp Před měsícem +11

      On modern systems, since roughly the '80s (support was added in Windows 11, which previously used a *very* different method), it's all done using VT100 and its successors. Here you'll find ways to encode things like "move to row X column Y" and "set color to red" (there are hundreds of commands). The trade-off is that commands are no longer single bytes. This was used for the first digital displays, especially for dumb terminals.
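
      A minimal sketch of the kind of sequences being described; these are standard ANSI/VT100 escape codes, though how completely any given terminal honours them varies:

      #include <stdio.h>

      int main(void) {
          /* ESC (0x1B) followed by '[' introduces a control sequence (CSI). */
          printf("\x1b[2J");                   /* clear the screen             */
          printf("\x1b[5;10H");                /* move cursor to row 5, col 10 */
          printf("\x1b[31mred text\x1b[0m\n"); /* set red, print, reset        */
          return 0;
      }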

    • @declanmoore
      @declanmoore Před měsícem +2

      Windows has even added support for these newer control codes to their console host. Before that, and what still works, is to send commands to the console host driver (condrv.sys) via device IO controls.

  • @backpackvacuum9520
    @backpackvacuum9520 Před měsícem +49

    As an American I will proudly ignore all further episodes as I now have everything I need. /s 😂

    • @_f355
      @_f355 Před měsícem +9

      so you don't need that emoji over there, right? :)

    • @nickwallette6201
      @nickwallette6201 Před 28 dny

      It's kind of funny, but that decision became a self-fulfilling prophecy. Because it wasn't a given that you would have consistently-mapped upper ASCII characters to represent even the most common international letters, it got to be fairly commonplace to see letters with accent marks dropped back to their un-accented variants.
      Granted, I'm a native English speaker, and so 26 letters ought to be enough for anybody. ;-) But, it didn't seem to have much of an effect on the intelligibility of words that used those letters. I recall seeing discussions on this where, specifically, Spanish-language and German speakers shrugged it off as, "eh... we knew what it meant." And, again, as a native English speaker, I have rarely considered the word "jalapeno" spelled with anything other than a plain 'n', and yet I recognize it easily enough in either form.
      On a related note, I got a crash-course in the peculiarities of various languages when I started writing a driver for FAT filesystems. Plain FAT (as in, pre-LFN) is case-insensitive, and meant to only consider letters in the low-ASCII range A to Z. All lowercase alpha chars are converted to uppercase with that toggling of bit 5. But, when LFN support was added, well... now we're dealing with Unicode characters in UTF-16 form, and ... _technically_ ... we should be case-folding everything (I think?) to uppercase to store the 8-dot-3 compatibility entry, and when searching for or comparing filenames in either 8.3 or LFN form.
      I say "I think" because the official FAT LFN spec is a bit quiet on what to do about chars with ordinals above 127, probably for the most obvious reason: It's kind of a pain to handle those. You have languages that case-fold differently depending on context, and when you're converting from Unicode to local code-pages, that character might not even exist. While (IIRC) the common US English DOS code page has upper- and lower-case variants of all the accented characters >127, not all of the code pages do, making it impossible to represent properly uppercased versions of any given filename entered with lowercase chars. And, of course, if you change code pages (by either changing the local code page, or moving a file to a system with a different code page), the filename might "change" completely, resulting in lowercase chars in the filename, potentially making it inaccessible by normal means, or causing match collisions with files that get created with uppercasing applied. I think some (or maybe most?) implementations just continue cas-folding lower-ASCII chars, and letting everything else slide.
      All of this because the original implementations were designed in a relatively simple (cultural) language with straightforward rules, and when -- or if -- the thought occurred to anyone about how to handle other languages, they just shrugged and thought, "eh... that's a problem for future developers."
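
      A minimal sketch of the bit-5 trick mentioned above, and of why it is only safe for plain ASCII letters (anything outside A-Z/a-z needs real Unicode case folding):

      #include <stdio.h>

      /* Clearing bit 5 (0x20) maps 'a'..'z' onto 'A'..'Z'; setting it goes the
         other way.  Outside that range the trick means nothing. */
      static char ascii_upper(char c) {
          return (c >= 'a' && c <= 'z') ? (char)(c & ~0x20) : c;
      }

      int main(void) {
          printf("%c %c %c\n", ascii_upper('q'), ascii_upper('Q'), ascii_upper('!')); /* Q Q ! */
          return 0;
      }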

  • @ShenLong991
    @ShenLong991 Před měsícem +41

    The thing with the numbers is even prettier if you look closely at the bits with hexadecimal in mind.
    0x30 to 0x39 are d'0' to d'9'.
    So if you are doing embedded programming and plan your decimals accordingly, you can see what each digit is without having to dissect the bits.

    • @JoQeZzZ
      @JoQeZzZ Před měsícem +1

      This is cool, although it's an inevitable side effect of the "& 0b1111" trick: to get from string to int using only an AND, the digits have to be LSB-aligned, and because there are 10 digits they need 4 bits.

    • @timseguine2
      @timseguine2 Před měsícem +3

      This is a side effect of it being backwards compatible with BCD. If you wanted to you could actually do arithmetic directly in string form because of that.

    • @gcewing
      @gcewing Před 19 dny

      And I still have burned into my brain that there are 7 character codes between '9' and 'A', from all the hexadecimal to binary conversion routines I wrote in machine language.
      I've sometimes fancied that if I were to design an improved character encoding I would make the first 36 codes be all the digits followed by all the uppercase letters. It just makes sense. To a programmer, at least.
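
      A minimal C sketch of those conversions for plain ASCII input; the 7-code-point gap between '9' and 'A' is the "+ 9" below:

      #include <stdio.h>

      /* '0'..'9' are 0x30..0x39, so masking with 0x0F yields the value.
         'A'..'F' start 7 code points after '9' (0x41 vs 0x3A), and
         'A' & 0x0F is 1, so the letters need an extra 9. */
      static int hex_digit(char c) {
          if (c >= '0' && c <= '9') return c & 0x0F;
          if (c >= 'A' && c <= 'F') return (c & 0x0F) + 9;
          if (c >= 'a' && c <= 'f') return (c & 0x0F) + 9;
          return -1; /* not a hex digit */
      }

      int main(void) {
          printf("%d %d %d\n", hex_digit('7'), hex_digit('C'), hex_digit('f')); /* 7 12 15 */
          return 0;
      }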

  • @timseguine2
    @timseguine2 Před měsícem +30

    It took me longer than I care to admit to figure out that ASCII is also BCD with extra "tag" nibbles in between. You can read numbers off easily in a hex dump just by ignoring the extra 3s everywhere. Well and if you get used to it, you can also read letters pretty easily from the hex dump but that feels more like using one of those old cereal box decoder rings.

  • @peterlinddk
    @peterlinddk Před měsícem +29

    A lot of the "skipped" codes, like ACK, NAK, and SYN, were used in a lot of early communication protocols, like XMODEM and the likes. And for some reason I don't understand, DC1 and DC3 were used for XON and XOFF, which I think we all remember from the old modem days. I don't know why SO and SI are called X-On and X-Off in some ASCII tables ... maybe some other protocols used those?
    Ah, the days of RS-232 ASCII-based protocols!

    • @timseguine2
      @timseguine2 Před měsícem +4

      STX (start text) and ETX (end text) are used sometimes for framing purposes.

    • @ke9tv
      @ke9tv Před měsícem +14

      DC1 and DC3 turned on and off the paper tape reader on Teletypes. DC2 and DC4 turned on and off the paper tape punch. When you were sending a paper tape down the line, if you were threatening to overrun a buffer, the other end would send DC3 to say, 'hold on there, tiger', and DC1 again when it was ready to slurp down more. ^S and ^Q still work that way on most Unix terminal emulators.

    • @darrennew8211
      @darrennew8211 Před 29 dny +1

      Shift in and shift out controlled what you might think of as the typeface.

  • @edgeeffect
    @edgeeffect Před měsícem +15

    In your chart, you've got X-On and X-Off as 14,15 SO,SI ("control-N" / "control-O") but, in any system I remember, XON and XOFF are DC1/DC3, 17,19 ("control-Q" / "control-S") ... and that takes me back to writing printer handshaking diagnostics for the repair centre at work and saying, "Oh, that's why some of the old 8-bit machines had control-S to pause scrolling".
    My old manual typewriter didn't even have an exclamation mark... because you could make one out of single-quote, backspace, full stop.

    • @fburton8
      @fburton8 Před měsícem +2

      Control-O was commonly used to throw away the rest of the current terminal output.

  • @ShadowKestrel
    @ShadowKestrel Před měsícem +22

    ^D isn't dead and gone. In most CLI/TUI contexts it's a semi-standard way to close out into the parent shell, and it still works well in cases where ^C is taken (e.g. in the Python shell, where ^C raises KeyboardInterrupt).

    • @RoamingAdhocrat
      @RoamingAdhocrat Před měsícem

      Incidentally, if you're using the Python shell more than very, very occasionally... install ptpython. It's an infinitely nicer Python shell.

    • @pidgeonpidgeon
      @pidgeonpidgeon Před měsícem +3

      Except on Windows, where it's usually Ctrl+Z.

    • @darrennew8211
      @darrennew8211 Před 29 dny +8

      What ^D does is it sends any buffered data, including whatever you've already typed, but not the ^D. If you type it with nothing buffered, then it sends zero bytes. And Unix treats a read of zero length as an end-of-file.
      ^Z is the character that CP/M decided to put inline to mark the end of a text file, because all files in CP/M were a multiple of 128 characters long. You never saw a file that was like 74 bytes long, so if you had a 74-byte text string in a file, you tacked ^Z on as byte 75.

    • @RoamingAdhocrat
      @RoamingAdhocrat Před 29 dny +1

      @@darrennew8211 I don't know why YouTube sent me a notification about your comment but I'm glad it did.

    • @nickwallette6201
      @nickwallette6201 Před 28 dny +1

      Huh. While I knew the conventions of the above, I did not know the reasons why. This has been educational.

  • @karlfimm
    @karlfimm Před měsícem +20

    I still remember hitting my first EBCDIC files (about 1985) and being amazed that the A-Z characters were scattered around in what looked like random order.

    • @timseguine2
      @timseguine2 Před měsícem +13

      If I remember correctly, EBCDIC was designed to be backward compatible with IBM's punchcard systems, which were still relevant at the time. I think there were considerations for efficient electromechanical sorting and also for not producing too many consecutive holes in the card which could clog the reader or the hole punch machine.
      Back when it was invented, IBM was almost superstitious about punchcards because they were a huge reason for their financial success, and continuing financial success.
      In hindsight they don't seem so important of course.

    • @peterholzer4481
      @peterholzer4481 Před měsícem +4

      @@timseguine2 Right. The punchcards didn't use a binary encoding of the digits 0-9. Instead they had 1 row for each digit. So it made sense to use only the digits 0-9 in the lower nibble for the letters, too. There is a picture of a punchcard in the Wikipedia article about EBCDIC. It looks quite neat and not random at all.

    • @darrennew8211
      @darrennew8211 Před 29 dny

      They're lined up properly if you ignore the right holes on the punch card, rather than ignoring the right bits in a byte.

    • @pjl22222
      @pjl22222 Před 25 dny +1

      EBCDIC was just a newer, fancier version of BCDIC, binary coded decimal interchange code, which itself was more a group of similar but different encodings. BCDIC was a 6 bit encoding where the numbers 0-9 were encoded as the values 0-9 and everything else was distributed basically randomly. The letters (uppercase only) were divided into three groups which were backwards, S-Z was encoded with smaller numbers than J-R which were smaller than A-I. EBCDIC is an 8-bit encoding (although many code points were left undefined) which didn't fix the noncontiguous problem but it did fix the order of the letter groups.

  • @jasmijnwellner6226
    @jasmijnwellner6226 Před měsícem +15

    ASCII 27 (ESC, generally written in source code as \x1b, \033 or \e) is still used a lot in terminal applications for things more complex than the basic control characters can do, including changing the colour of the text or background!

    • @ke9tv
      @ke9tv Před měsícem +6

      There was a whole ANSI standard that came later for what the various escape codes were supposed to do. (Nobody implemented the whole thing, and no two vendors implemented the same parts.)

    • @jovetj
      @jovetj Před měsícem

      Don't forget *^[*
      😉
      ESC is a pretty important character. Not as important as 0x0A or 0x0D, though.

    • @BradHouser
      @BradHouser Před 25 dny +2

      VT100 and later ANSI escape sequences made BBS pages colorful and graphical (boxes, symbols, etc.). DEC added ReGIS graphics to the escape sequences, and graphic primitives could be drawn on the screen, enabling interactive graphics terminals, all using 7-bit ASCII.

  • @davidh.4944
    @davidh.4944 Před 23 dny +4

    I've always liked how caret notation makes clever use of the ascii scheme. If you ever hit backspace in a terminal and see ^H^H^H or cat -A a text file written in windows notepad and see a bunch of ^Ms (or see the programmers use them in comments here), it's because the display has taken the non-printing character, flipped one bit, and is presenting it as its corresponding alphabetic block character. So NUL (00000000) becomes ^@ (01000000), TAB (00001001) becomes ^I (01001001), etc.
    It also works in reverse to enter these characters, as the Control-C bit in the video explained. Very clever.
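
    A minimal sketch of that bit flip, following the usual caret-notation convention (XOR with 0x40 toggles the relevant bit):

    #include <stdio.h>

    /* Print a byte the way 'cat -A' or a terminal does: control characters
       (0x00-0x1F) and DEL (0x7F) become '^' plus the character with that one
       bit flipped; printable characters pass through. */
    static void put_caret(unsigned char c) {
        if (c < 0x20 || c == 0x7F)
            printf("^%c", c ^ 0x40); /* 0x00 -> '@', 0x09 -> 'I', 0x7F -> '?' */
        else
            putchar(c);
    }

    int main(void) {
        const unsigned char sample[] = "a\tb\r";
        for (const unsigned char *p = sample; *p; p++) put_caret(*p);
        putchar('\n'); /* prints: a^Ib^M */
        return 0;
    }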

  • @Vennotius
    @Vennotius Před měsícem +9

    I enjoyed this one very much. I still remember discovering an ASCII table in one of my father's handbooks when I was a kid. This video took me back.

  • @pitan9445
    @pitan9445 Před měsícem +9

    First time viewing your channel - this was excellent.
    Before HTML was a thing, I worked for an organisation selling structured news (sports results &c)
    We used record separators (RS, ascii 30) and file separators (FS ascii 28) to split up our rows and fields.
    It took me a long time to realise we were redefining the acronyms.

    • @ke9tv
      @ke9tv Před měsícem +7

      RS was right to separate records. The fields should have been separated with US, unit separator. GS and FS were higher level.

  • @lennartbenschop656
    @lennartbenschop656 Před 29 dny +4

    They even did take care to support foreign western languages to some degree. ASCII includes the grave accent `, the circumflex accent ^ and the tilde ~, and you could backspace and print one over a letter (on a real teletype, not on a video screen). The single-quote/apostrophe character 0x27 ' did triple duty as apostrophe, single quote and acute accent, and in some old fonts it looks like a mirror image of the grave accent. The double quote character " could be used as an umlaut/diaeresis in a pinch. The double-quote and single-quote characters were also common on typewriters, and these did not have separate opening and closing quotes. The underscore character was meant to be overprinted on other text as well, just doing a CR without LF.

    • @greggoog7559
      @greggoog7559 Před 28 dny

      You can do it on a video screen too. It's called "Compose" and you just press the Compose key (whichever key you've assigned for that purpose) and then for example 'a' and '^'.

    • @lennartbenschop656
      @lennartbenschop656 Před 28 dny

      @@greggoog7559 That has nothing to do with ASCII as such. Compose combinations are substituted with codepoints for accented letters (formerly in your favorite 8-bit code page, today in Unicode). I was talking about old printers that only had 7-bit ASCII and could print a letter, then backspace then the accent.

  • @mhzellers
    @mhzellers Před měsícem +8

    If you have ever punched a Hollerith card, EBCDIC makes a certain amount of sense.

  • @LordPhobos6502
    @LordPhobos6502 Před měsícem +8

    Looking forward to next week's video!
    Reading ascii codes in decimal hurts my poor lil brain though, I was taught early on in hexadecimal, and it always made more sense to me that way :)

    • @Lord-Sméagol
      @Lord-Sméagol Před 27 dny

      I learned BASIC at school using an ASR-33 TeleType dialling in to an HP 2000F, saving my programs to paper tape.
      Sometimes, classmates would want to know which program was on their paper tape that they forgot to write the name on.
      This was easy enough if the terminal wasn't being used, but I could read the holes and tell them :)

  • @ReneKnuvers74rk
    @ReneKnuvers74rk Před měsícem +6

    13:14 I'm pretty sure it wasn't the creators of ASCII who threw all the hyphens and quotes onto a couple of piles - it was the teletype makers, who from around 1900 to the 1960s had no 1, only an i without a dot, a separate dot that doubled as a single quote, and no separate characters for o and 0. That meant that ASCII adding back these additional characters would force mechanical changes to the devices that were supposed to use the new standard. Since computers need a distinction between a letter and a number, the 1/i and 0/O issue had to be solved, but the start and end quotes have no functional meaning in a computer.

    • @darrennew8211
      @darrennew8211 Před 29 dny +2

      Not just teletype. That was pretty common on typewriters too.

  • @AdrianDerBitschubser
    @AdrianDerBitschubser Před 21 dnem +3

    11:50 The rest contains one really important character: ESC, the Escape character. It is used with ANSI escape codes to generate all the wonderful color and other formatting in terminals even to this day. Maybe that is worth a video.

  • @kevinmcnamee6006
    @kevinmcnamee6006 Před měsícem +2

    Excellent video. It certainly brought back memories. My first job as a programmer (1975) was working on code that allowed IBM mainframes to communicate with ASCII terminals. This involved translating ASCII to EBCDIC and of course worrying about how all the control characters worked, like CR, LF, TAB, NULL, etc. On the old Teletype 33 terminals you even had to worry about how long it would take for the carriage to return to the left margin after printing a long line, and insert enough NULLs to allow it time to happen before the next printable character arrived. We referred to them as dumb-ASCII terminals. One thing that made things more tricky was that the guy who wrote the specs for the communications controller on the IBM mainframe got the bit order reversed, so the low order bit from the IBM system was sent as the high order bit on the wire. Another difference was sort order. In ASCII, digits sort first, followed by upper case letters, and then lower case. In EBCDIC, lower case letters sort first, followed by upper case, and then digits.

    • @nuk1964
      @nuk1964 Před 28 dny

      One of the frustrations that I remember from the early 1980s was the occasional mangling of data when going between the EBCDIC and ASCII worlds. Alphabetics and digits were OK, as was most of the punctuation. Mangled were things such as horizontal tabs, circumflex, backslash, curly braces and square brackets (apparently some versions of EBCDIC had these and some did not, and in those that did they sometimes appeared in different locations). E-mail and general text would generally pass through OK (or if it was "mangled" in translation, it was still understandable). What was not so great was when you tried to transfer some source code in languages like C or Pascal.
      I learned quickly NOT to use TAB characters for indentation (due to the inconsistent translation -- sometimes it translated directly into a single space, other times it would get "expanded" to a sequence of spaces, but inconsistently -- if you were lucky it expanded to the right number of spaces to preserve the indentation, but more often than not it didn't). This helped to preserve the indentation of code -- allowing for easier recovery when the curly braces got lost (and you had a better chance of guessing correctly where those missing curly braces belonged).
      The loss of curly braces, square brackets and backslashes would render C source code unusable -- but a "somewhat obscure" feature, trigraphs, became quite useful in this case. The downside is they make your code *really ugly*.
      For Pascal code, I found some of the alternates used in Pascal/VS on the IBM useful -- such as the "(." and ".)" aliases for the square brackets, and the "->" alias for the caret.
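
      For reference, a minimal sketch of that trigraph trick. The mappings below are the standard C ones, but trigraphs were removed in C23, and gcc only honours them with -trigraphs (or a strict older -std):

      ??=include <stdio.h>              /* ??= stands for #              */

      int main(void)
      ??<                               /* ??< and ??> stand for { and } */
          int a??(3??) = ??<1, 2, 3??>; /* ??( and ??) stand for [ and ] */
          printf("%d??/n", a??(2??));   /* ??/ stands for backslash      */
          return 0;
      ??>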

    • @nuk1964
      @nuk1964 Před 28 dny

      My first encounter with double-byte character set was on the Control Data mainframe -- where a double-byte system was used to get beyond the limitation of 6-bit bytes.
      It was also on the Control Data systems that I finally understood why Pascal had an eoln() function (rather than looking at the character value and checking for carriage return or linefeed) -- end-of-line there was a very specific pattern (iirc it was something like a word-aligned sequence of contiguous zero bytes -- there were 10 6-bit bytes in a 60-bit word).

  • @TimSavage-drummer
    @TimSavage-drummer Před měsícem +9

    EOT (Ctrl+D) is still used in Unix/Linux to end a terminal session. I also find it odd that 28-31 aren't used more; they are perfect for use in CSV-like files to avoid needing to do escaping etc.

    • @lupinzar
      @lupinzar Před měsícem +6

      The utility of CSV is that you can edit it in pretty much any text editor in a pinch and it still remains (fairly) human readable. Once you introduce control codes that won't be visible at all in some editors and require special settings in others, you might as well develop a binary format that is more efficient. That said, if you can't influence the design of a data format and need an extra set of delimiters they are useful, but probably not best practice.

    • @darrennew8211
      @darrennew8211 Před 29 dny +3

      Control D doesn't end a terminal session. It flushes the keyboard buffer without adding anything to it. If you're at the start of a line, then you flush zero bytes. A read from a file of zero bytes indicates an end of file in Unix. So the terminal reads zero bytes, thinks its input is closed, and exits.
      Write a program that sits in a loop reading the stdin and writing what it gets without any buffering. Then type "ABC" and hit ^D, and you'll see instead of exiting it just prints ABC.
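
      A minimal C sketch of that experiment, using raw POSIX read()/write() so nothing is buffered by stdio:

      #include <unistd.h>

      /* Echo stdin to stdout.  Type "ABC" then ^D: the pending "ABC" arrives
         immediately and the loop keeps running.  Press ^D on an empty line and
         read() returns 0, which Unix treats as end of file, so the loop ends. */
      int main(void) {
          char buf[256];
          ssize_t n;
          while ((n = read(STDIN_FILENO, buf, sizeof buf)) > 0)
              write(STDOUT_FILENO, buf, (size_t)n);
          return 0;
      }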

  • @OrigamiMarie
    @OrigamiMarie Před měsícem +12

    Ctrl-d is still used a little with Bash. If you want to quit a user session fast (and can't be bothered with "exit"), ctrl-d will end it.

    • @edgeeffect
      @edgeeffect Před měsícem +4

      Reminds me of the MCP in Tron with his "End of Line".
      Ctrl-d can be used anywhere you want to end a file like `cat - >my_file.txt` - type a line, type another line, ctrl-d

    • @rogerramjet8395
      @rogerramjet8395 Před měsícem +9

      And CTRL-L to "clear" the screen. (Maps to "Form Feed" … which shifted the paper to the start of the next - blank - page).

    • @pidgeonpidgeon
      @pidgeonpidgeon Před měsícem +4

      Ctrl-D is used a lot on Linux in general. Any time you use a pipe it takes one process's stdout and connects it to another's stdin, and the convention to say that the stdout is empty is to send Ctrl-D.

    • @0LoneTech
      @0LoneTech Před měsícem +3

      @@pidgeonpidgeon No, ctrl-d for end of transmission is in the terminal (tty) layer. Between processes end of file is indicated by closing the connection, see shutdown or close system calls. The terminal in cooked mode also permits using ctrl-d to input an unterminated line without ending the file, similar to fflush, or actually transmitting EOT with ctrl-v ctrl-d. More details in e.g. stty(1); try "stty -a".

    • @cigmorfil4101
      @cigmorfil4101 Před 26 dny

      It's more than bash.
      *nix uses ^D to mean EOF. Any program reading from STDIN getting an EOF would exit as it can no longer read any input; eg:
      $ cat > hello
      World
      ^D
      $ cat hello
      World
      $
      Thus, when you put an EOF (as the first character) to bash, it gets an EOF and exits, as do sh, csh, tsh, etc.

  • @BradHouser
    @BradHouser Před 8 dny

    Fun Fact: Some of us remember the key-strokes Ctrl-S and Ctrl-Q. They are the ASCII codes to stop and resume display output. They use the codes for Device Control 1 (ASCII 11 Hex) and Device Control 3 (13 Hex) to tell the sending device to stop sending data.

  • @McDuffington
    @McDuffington Před měsícem +2

    One of my favorite subjects! Looking forward to the follow up parts!

  • @bread8070
    @bread8070 Před měsícem +3

    One more thing, following on from how upper and lower case are separated by a single bit: look at the number keys on a keyboard and the symbols on them. Starting from 1 you’ll notice the codes for the numbers and symbols are also separated by a single bit. It goes a bit wrong about half way along, but on old keyboards (pre IBM PC) this usually works for the whole set. Now look at the keys for the non alphabetic symbols in those two alphabetic ‘blocks’. You’ll find the symbol in the low case block is on the same key as the equivalent symbol in the upper case block.
    Thus, the symbols and numbers on most keys differ only by a single bit. Why? Because taking a keyboard scan code and converting it to ASCII requires a bunch of code and a look up table. Old computers were very slow and had very little memory. So old keyboards generated ASCII codes in hardware, to be returned to the processor. Arranging the keys so the symbols on them were one bit apart made the hardware much simpler.
    To be fair, the ASCII codes were probably derived from existing typewriter layouts. So it's actually the ASCII code ordering that was chosen to match the keyboard layout, rather than the layout being designed to match the ASCII. But that just makes the ASCII design even smarter.
    (And I suspect the same is true for teletypes and the symbol pairings on the hammers - which were probably inherited from typewriters anyway).
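
    A minimal sketch of that single-bit pairing. It assumes an old bit-paired layout (like the ASR-33's, where Shift-2 gave '"'); as noted above, modern layouts break the pattern part-way along:

    #include <stdio.h>

    int main(void) {
        /* On bit-paired keyboards the shifted symbol differed from the digit by
           a single bit (0x10): '1'/'!', '2'/'"', '3'/'#', ... so the keyboard
           could produce ASCII for the shifted key by just changing one bit. */
        for (char digit = '1'; digit <= '9'; digit++)
            printf("%c -> %c\n", digit, digit ^ 0x10);
        return 0;
    }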

    • @ke9tv
      @ke9tv Před měsícem +1

      There was also a design for conversion between EBCDIC and ASCII that required only a handful of transistors. The two standards were developed together. (IBM 026 and 029 card code preceded EBCDIC.)

  • @amarqueze
    @amarqueze Před měsícem +3

    Very nice video. I have worked with computers since the 80s, and never thought about ASCII. Now I know how a Python progress bar is built, and other clever ideas. Well done Dylan!

  • @VoyVivika
    @VoyVivika Před 24 dny +1

    Clicked on this video only to discover it's by the guy who made the Rockstar programming language, lmao wasn't expecting that. Loved the video btw!

  • @bishaladhikari9499
    @bishaladhikari9499 Před měsícem +4

    Loved every second of it

  • @ib9rt
    @ib9rt Před měsícem +1

    When I was first introduced to computers in 1977, I used an ASR-33 Teletype complete with paper tape punch/reader. The ASR-33 only had uppercase letters, so it was with a sense of wonder I discovered that some more advanced terminals could also do lowercase! And everyone wrote the obligatory program that scanned through codes 0 to 127 and printed them out to see what they would do. Sending a string of ^G characters to an ASR-33 produced a sound never equaled by later devices, especially since they never seemed to insert a gap between the beeps.

  • @agranero6
    @agranero6 Před 29 dny +2

    It is mostly forgotten that we have SOH, STX, ETX, EOT, ENQ, ACK, SYN, ETB, FS, GS, RS, US, and particularly EM: end of medium. This was primarily designed for data transmission, like Baudot, and not for use on the computers themselves (like memory and files), as the very name states: "for Information Interchange".
    It is interesting to analyze those systems by their purpose (a teleology if you want): Morse made the most-used characters shorter (he went to a printing press and looked at the type cases; the compartments for the most common letters were bigger - yes, this is why we say uppercase and lowercase); Baudot was first designed to minimize the wear on the mechanical parts of the telegraph (not the modern Baudot); and in ASCII, well, we see hints of a protocol attached to a machine in the codes mentioned above and in DC1, DC2, DC3 and DC4.
    I always wonder whether that part of the standard was really used this way or simply ignored. Yeah, a teleprinter used many of them, but certainly not FS, GS, RS and US: they are for sending files, not just for use inside files. You do not need FS inside a file (except maybe a file like a TAR), but you do need it on a data stream that carries several files, like a paper tape or a magnetic tape.

  • @lostcarpark
    @lostcarpark Před 26 dny +2

    You skipped over 16-31 very fast. I think the Escape character at least deserves a mention!
    You mention Morse code, but there were several other digital codes that predate even computers. Baudot was developed in France in the 1870s for telegraph machines as a 5-bit digital code. The early consoles used a piano-like keyboard and required operators to press keys together to make chords, so the code was designed to be easy for operators, with the more common letters in single-bit positions, and even the numbers weren't contiguous. This was later adapted into Murray code in the early 20th century, with the development of teletype terminals and teleprinters that let operators use a QWERTY-style keyboard. As they were mechanical, the code was designed to minimise wear on the machinery. Finally, fully electronic machines started appearing in the 1930s, leading to the development of ITA2 (which at least put the numbers back in a contiguous block).
    Having been developed for one purpose and evolved and tweaked for others, the code was quite messy, so we can probably be grateful that the designers of ASCII decided to go with a clean-sheet design. There probably is a universe in which they decided to take Baudot/ITA2 and extend it into a 7-bit code. ASCII effectively has four 5-bit "pages". I could imagine taking the "letter" and "figure" modes of ITA2 as two of those pages, then adding lower-case and control codes as the other two. Then your video would be explaining why the ASCII code letters weren't in alphabetical order.

  • @ChannelSho
    @ChannelSho Před 18 dny

    Another neat thing about the way the digits are organized in ASCII is if you convert it to hex, you just look at the lower half and you'll get the number.
    Also I like how the alphabet characters start with bit 0 set to 1, because it makes more sense to have A = 1 rather than A = 0.

  • @foo0815
    @foo0815 Před měsícem +2

    Thanks for the DEL story!

  • @dj196301
    @dj196301 Před měsícem +1

    Subscribed! No dumb-ass stock footage, no tangent shots, just an entertaining and informative chap talking about cool stuff. Looking forward to "Why UTF-8 is Actually Very Clever" - unless you've done one and I just haven't seen it.
    Thank you.

    • @DylanBeattie
      @DylanBeattie  Před měsícem

      @@dj196301 thank you! UTF-8 is coming in a few weeks. Got some other stuff to talk about first :)

  • @clasqm
    @clasqm Před měsícem +7

    ASCII 27 still maps to the Escape key.

    • @briansepolen4917
      @briansepolen4917 Před měsícem +2

      One great thing about these blocks described is that one can see that like using Ctrl-C for ASCII 3 (ETX), one can also use Ctrl-[ (ESC) instead of lifting hands off the home row for Escape. Great for increasing TUI speed and efficiency.

    • @nurmr
      @nurmr Před 28 dny

      Yep, ESC is essential for CSI (and SGR in particular), so without it there would be no ANSI terminal colors!

  • @andythebritton
    @andythebritton Před měsícem +1

    This seems to be an abridged version (or possibly the first episode) of Dylan's 'No such thing as plain text' talk, which is well worth a watch.

  • @lennartbenschop656
    @lennartbenschop656 Před 28 dny

    Between Morse code and ASCII there was also ITA2 (sometimes incorrectly called Baudot code), a five-bit code for mechanical teletypes. It used control codes (letters and figures shifts) to switch between letters and digits/punctuations. ASCII still has SO/SI control codes to make it possible to temporarily switch to a different character set. ITA2 has a Null character, CR and LF and even Bell and "Who are you" (similar to the ENQ control code in ASCII).

  • @OhhCrapGuy
    @OhhCrapGuy Před měsícem +1

    I've actually used 0x1F instead of commas when I needed to save something with the sheer simplicity of a CSV file while not having to figure out the logic of how to handle data with commas or quotes in them.
    Works great. You know, since that's what it's for, haha
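
    A minimal C sketch of that approach, with US (0x1F) between fields and RS (0x1E) between records; the sample data below is made up:

    #include <stdio.h>

    /* No quoting or escaping rules needed: the payload can freely contain
       commas, quotes and even newlines, because the separators are bytes that
       never appear in ordinary text. */
    int main(void) {
        const char *data = "Beattie, Dylan\x1f" "Rockstar\x1e"
                           "Bemer, Bob\x1f" "ASCII\x1e";
        for (const char *p = data; *p; p++) {
            if (*p == '\x1f')      printf(" | "); /* field boundary  */
            else if (*p == '\x1e') printf("\n");  /* record boundary */
            else                   putchar(*p);
        }
        return 0;
    }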

  • @DragoniteSpam
    @DragoniteSpam Před měsícem +4

    Lol I didn't expect that little shower thought to turn into a whole video, good fun!

  • @BradHouser
    @BradHouser Před 25 dny

    My first programming was over a dialup teletype at 110 Baud or 10 characters per second. I was in high school in the '70s and dial up time share systems running BASIC cost $6.00 per hour, so connect time was precious. You wrote your program offline on paper, then entered it on the teletype, punching it on tape as you typed, and if you made a mistake, the DEL key was like digital White out. Of course, it did not speed up data transmission. Once you had it all typed onto paper tape, you dialed the number with a Touch-Tone keypad, logged in and then played the paper tape back to upload your program. Then you ran it, you could also renumber, and list it back and re-punch it for later. When I told my mom I needed money to learn BASIC programming, she asked what I did on the computer. I told her games. I love her: she didn't complain. I became an Electrical Engineer/Computer Science guy.

    • @BradHouser
      @BradHouser Před 25 dny

      One of my friend's dad had a 300 baud terminal/printer, and we used to dialup GE's free modem line and just print out stuff in order to watch it work.

  • @Dominik-K
    @Dominik-K Před měsícem

    Thanks a bunch for this video. I've known most of these things already, but in my programming career knowing those fundamental bit layouts and tricks had been so valuable to writing efficient and understandable code

  • @niczoom
    @niczoom Před měsícem

    Great video and very well explained! The point about why certain commands are still in use today, and their origins, was very interesting. I learned something new - thanks for sharing!

  • @cfhay
    @cfhay Před 24 dny

    EOT (End of Transmission) is Ctrl+D and can still be used today. Ctrl+D in Linux (and other similar systems) will flush the current buffer. If this buffer is empty, it results in a zero-byte read. A zero-byte read means end of file / end of input in most contexts. For example, using it at a shell prompt will cause the shell to exit with exit code zero. If that was a login shell, it causes a logout. I use it every day.
    Also, ESC is widely used to decorate Linux console output (colors etc).

  • @MattJoyce01
    @MattJoyce01 Před měsícem

    Some of this I knew, but I didn't realise the deliberate design elements. Good job.

  • @JamieBainbridge
    @JamieBainbridge Před 29 dny +1

    Ctrl+d is still commonly used on Linux. It's the way to logout of a shell, and also the way to get out of a REPL like Python.

  • @KX36
    @KX36 Před měsícem +2

    I got distracted at 3:50 and reimplemented morse code as a canonical Huffman code. By hand, in Excel, for fun. 😅
    Each character is 3-9 bits long but it's a binary prefix code so no need for gaps in transmission.

  • @Squossifrage
    @Squossifrage Před 8 dny

    4:51 While eight-bit bytes were already common when work on ASCII began in the early 1960s, they did not become ubiquitous until the mid-to-late 1970s.

  • @RaceriEmil
    @RaceriEmil Před měsícem

    Thanks. That was very informative and insightful. I like your delivery and the small jabs/jokes you put in. I am looking forward to your next video!

  • @aDifferentJT
    @aDifferentJT Před měsícem +2

    Ctrl-D in the terminal is great, it will exit most REPLs or shells

  • @OranCollins
    @OranCollins Před měsícem

    I've always loved your talk on ascii.
    Love seeing more stuff from your brain!
    keep it commin!

  • @aaronbredon2948
    @aaronbredon2948 Před 17 dny

    ASCII being 7 bit covered most of the generic characters including accented characters via overprinting.
    If the inventors had wanted to include all possible characters across the world, they would have needed at least 2 bytes per character to be able to handle Chinese and Japanese ideographs.
    Leaving the remaining 128 values of a byte unspecified allowed different countries to add country specific characters. In the IBM PC world these were implemented as “code pages”, and were a bit of a problem when talking between countries.
    Unicode eventually resolved this communication problem, but it requires 32 bits or 4 bytes to encode the over 140,000 characters, and there are visually identical Unicode characters that are logically different, which makes it easier for scammers to fake internet addresses.
    And something as large as Unicode wasn’t practical in the early days of computing, when every single byte saved was significant.
    EBCDIC had the advantage that numbers were readable on hex crash dump printouts, but numbers and letters shared the same character codes (C1 represented either A or positive signed 1, depending on what the data type was.)

  • @__christopher__
    @__christopher__ Před 27 dny

    Control character 4 (EOT), that is, Ctrl-D, still lives on in terminal emulators of Unix-derived system like Linux as the end-of-file character (although technically it's the flush-input-buffer character, but returning an empty input is interpreted as end of file on Unix-derived systems, therefore it effectively acts as end-of-file for terminal input and also is commonly referred to as such; the difference can be seen if you try to use it on a non-empty line).

  • @andrewjameswelch
    @andrewjameswelch Před 27 dny

    Great vid, thanks. A follow up vid could be a similar explainer about how utf-8 uses multiple bytes and what happens when that is read using a single byte encoding.

  • @sheridenboord7853
    @sheridenboord7853 Před 25 dny

    Great talk, thanks. I always suspected something was up with DEL because of how it sat in the ASCII table; it didn't look right - a control character all by itself, as if it was an afterthought. So a program reading from a stream would just ignore DEL characters.

  • @xdcountry
    @xdcountry Před měsícem

    That was great. Excellent tour through the origins. Just incredible.

  • @rabidbigdog
    @rabidbigdog Před měsícem

    Good lawd, this was awesome. Kinda hilarious how everyone else tried to ensure IBM was out there in the wind.

  • @flamewingsonic
    @flamewingsonic Před dnem

    You missed one very important use of characters in the control block: character 27 (ESC) is used by terminal emulators as part of the "control sequence introducer" ("CSI") to do things such as changing foreground/background color, setting bold/italics/underline, etc. Although this is more prevalent in the UNIX world, even DOS (and the Windows command prompt) had a device driver (ANSI.SYS) supporting these ANSI escape codes.

  • @darrennew8211
    @darrennew8211 Před 29 dny

    Fun facts: The ASCII underscore character was originally a left-pointing arrow, which is why Smalltalk (from around 1976) uses "_" as the assignment operator, and why Pascal (designed to work with EBCDIC also) uses ":=" instead, to look like an arrow as close as you can get on punched cards.
    EBCDIC has the same sort of bitwise feature for letters that the upper/lower trick in ASCII uses, except it's designed for punched cards. So with a card 12 rows high, the letters are in "contiguous" numbers if you ignore the proper holes on the card rather than ignoring the proper bits in the byte.

    • @__christopher__
      @__christopher__ Před 27 dny +1

      Actually, Pascal had several digraphs to be used when certain characters were not available. For example, Pascal comments were written in curly braces {like this}, but in case curly braces were not available, you could also use parentheses with asterisks (*like this*). Now the only character used by Pascal that was not available in ASCII was the left arrow, whose digraph replacement was :=, which is why that one became commonly known as the Pascal assignment operator.

    • @cigmorfil4101
      @cigmorfil4101 Před 26 dny

      ​@@__christopher__
      Were '<' and '=' not available?
      "<=" for assignment, eg:
      RA -> VARLOC
      means the contents of the A register are stored in the location pointed to by VARLOC (effectively a variable).

    • @__christopher__
      @__christopher__ Před 26 dny

      @@cigmorfil4101 that's already the less-equal operator.

    • @cigmorfil4101
      @cigmorfil4101 Před 26 dny

      @@__christopher__
      Interesting how all the BASICs I've used overload the '=' operator to mean both "assign" and "compare equal" - the meaning based on context.
      How about "<-"? (That looks more like an arrow than ":=".)

    • @__christopher__
      @__christopher__ Před 25 dny

      @@cigmorfil4101 that is already a less-than followed by a unary minus operator.
      Also, := was already in use in mathematics for definitions, so it fits quite well.
      Note also that a proper assignment statement in BASIC was
      LET var = value
      A lot of BASIC interpreters (in particular Microsoft's) allowed omitting the LET though.

  • @KhalilEstell
    @KhalilEstell Před 29 dny

    Amazing video, loved it.

  • @philipoakley5498
    @philipoakley5498 Před měsícem +2

    I remember doing port-a-punch cards in EBCDIC for my first computer programmes at grammar school! 10-6-8 everyone (or was it 11-6-8;-).

  • @mag-icus
    @mag-icus Před 21 dnem

    Also, Ctrl-D is still used to mark end of stream on Unix. So it is not just Ctrl-C and Ctrl-G that have survived until this day.

  • @acasualviewer5861
    @acasualviewer5861 Před 29 dny

    In old Apple ][ word processors you'd enter control characters to teach the word processor how to work with your new printer (instead of drivers).
    Also they were used for modems. We had to type in weird characters to get the modem transmitting.

    • @cigmorfil4101
      @cigmorfil4101 Před 26 dny

      Apple ][ basic also used the VT52 arrow key ESC sequences to move the cursor around the screen ready for copying - you listed a line and then had to ESC-D-ESC-D etc to get to the start of what you wanted to copy, use the -> key to copy, and use lots of ESCs to skip over blank characters - the Apple ][ was aggressive in printing out blank spaces and word wrapping.
      To fix the excessive spaces we set the right hand edge of the window to be one less character than it needed before it word wrapped (indenting from line number) and so it would not put in the excessive end and start of line spaces. The next thing I did was to write a character output trap which ignored spaces outside quotes and showed control characters in reverse (particularly for the DOS ^D prefix character, but it also showed up any other CTRL codes) to make it easier when editing such lines.

  • @u9vata
    @u9vata Před 29 dny +1

    The ESC ASCII character often shows up in various APIs. With the old BIOS interrupts for reading the keyboard you can grab "scan" codes or ASCII codes. Most people who wrote games went for scan codes, as did much other software, but even though no proper ASCII is returned for the arrow keys, for example, the Escape key does generate the ESC character properly in the BIOS - just an example.

  • @williamlyerly3114
    @williamlyerly3114 Před měsícem

    As one who lived and died on TTY33/35 devices this was very interesting. Programmed in SLEUTH (Univac assembler) later BAL. Lived in ASCII land.

  • @unvergebeneid
    @unvergebeneid Před 21 dnem

    Thanks for bringing the GIF/JIF debate to EBCDIC ;D

    • @DylanBeattie
      @DylanBeattie  Před 21 dnem +1

      The first c in EBCDIC is pronounced like the c in "Pacific Ocean" - what's the problem? 🤣

    • @unvergebeneid
      @unvergebeneid Před 21 dnem

      @@DylanBeattie ;p luckily I have yet to see someone argue that those 256-color images are pronounced "SHIF" :D

  • @probablypablito
    @probablypablito Před měsícem

    Incredible video!

  • @chri-k
    @chri-k Před měsícem +2

    I can't believe you just called ^[ and ^D unimportant

  • @threee1298
    @threee1298 Před 27 dny

    New to the channel, this is wonderful

  • @KX36
    @KX36 Před měsícem +1

    The Device Control characters are still very important for configuring barcode scanners. How do you change a barcode scanner's settings, e.g. whether to append a CR, an LF, or nothing after scanning a barcode? You send combinations of device control characters followed by alphanumerics. The exact combinations are device specific.
    Also, just last year we migrated away from a 1980s Unix program (still a very popular program) that uses a database of literal ASCII strings, each field separated by the Record Separator character.

  • @orterves
    @orterves Před měsícem

    Good video, nice refresher of a topic I haven't really thought about directly since university - except for bloody Windows crlf when working with cross platform code

  • @stiansoiland-reyes2548

    More about the ASCII graveyard, please! For instance, RS, Record Separator, now used in the application/json-seq format to separate JSON objects, e.g. in a streaming event log that will never finish. Lots of goodies in the graveyard...

  • @mag-icus
    @mag-icus Před 21 dnem

    You missed the cleverness behind codes 33-41. These punctuation marks come in the same order as they do on the number keys of an (American) keyboard; this means that, similar to how lower case letters were converted to upper case by resetting a single bit (toggled by the shift key), the same was actually true when pressing shift + a number key.

  • @Bunny99s
    @Bunny99s Před 28 dny

    It's worth mentioning that CR LF is also still in use when it comes to HTTP. HTTP is a text / line based protocol and each line is terminated by a CR LF. So even in the Unix / Linux world you have to deal with that line ending. There's a similar issue when it comes to little vs big endian. While little endian pretty much dominates the PC world nowadays, when it comes to network protocols, most of them use big endian. That's why it's often called "network order". Big endian makes it easier to build hardware decoders, and since a huge chunk of the network world is hardware this actually makes sense. From the programming point of view I generally prefer little endian.
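
    A minimal sketch of both points: CR LF line endings in an HTTP request, and htons() putting a port number into big-endian "network order":

    #include <stdio.h>
    #include <arpa/inet.h> /* htons: host-to-network byte order */

    int main(void) {
        /* Every HTTP header line ends in CR LF ("\r\n"), even on systems whose
           own text files use bare LF; a blank CR LF line ends the headers. */
        const char *request =
            "GET / HTTP/1.1\r\n"
            "Host: example.com\r\n"
            "Connection: close\r\n"
            "\r\n";
        fputs(request, stdout);

        /* Multi-byte numbers on the wire are big endian ("network order"). */
        printf("port 80 in network order: 0x%04x\n", (unsigned)htons(80));
        return 0;
    }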

  • @BradHouser
    @BradHouser Před 25 dny

    The eighth bit was often used for parity checking.
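
    A minimal sketch of even parity on a 7-bit character; the receiving end recomputes the bit and treats a mismatch as a transmission error:

    #include <stdio.h>

    /* Set bit 7 so the total number of 1 bits in the byte is even. */
    static unsigned char add_even_parity(unsigned char c) {
        unsigned char ones = 0;
        for (int i = 0; i < 7; i++)
            ones += (c >> i) & 1;
        return (ones & 1) ? (unsigned char)(c | 0x80) : c;
    }

    int main(void) {
        /* 'A' = 0x41 already has an even bit count; 'C' = 0x43 gets bit 7 set. */
        printf("0x%02x 0x%02x\n", (unsigned)add_even_parity('A'),
                                  (unsigned)add_even_parity('C')); /* 0x41 0xc3 */
        return 0;
    }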

  • @BobFrTube
    @BobFrTube Před měsícem

    The extra bit also provided parity. CR and LF were separate because going to the next line on a teletype took two character times. Multics chose LF as the NL because CR could be considered as not doing anything. _ was originally a left arrow.

  • @gcasar
    @gcasar Před měsícem

    so happy i got this as a suggested vid

  • @Colaholiker
    @Colaholiker Před 24 dny

    I don't normally comment on the clothing style of YouTube creators - but that t-shirt rocks. 🤣

  • @notthedroidsyourelookingfo4026

    8:18: Dylan taking a stance on tabs vs. spaces 😂

  • @ABaumstumpf
    @ABaumstumpf Před měsícem +1

    I still think Windows got it right with CR LF: the symbols have different meanings and should not be abused like that. For me SOH/STX/ETX/EOT are still rather important, and it bugs me to no end when some people with no clue want to sound smart and then decide "let's use ENTER for SOH". No... just no. That statement was so bad it fails to even qualify as being wrong.
    And for programming: Tab all the way - that is exactly what it was created for and, contrary to spaces, does not force any particular style onto others. You want 4 'm' worth of indentation? Sure, go ahead, set your tabstops to be at that interval.
    On the other hand i have so often seen people try to align variable-width text in columns only to fail miserably or ragequit when they noticed that changing any settings of the font messes it up cause they tried using spaces. The tool for that has been available for 60 years in the form of tabs.

  • @Tweekism86
    @Tweekism86 Před měsícem +3

    7:12 Speak for yourself! I still use Control-D, to close terminals and exit SSH sessions, quit python or node.js and the like.
    Edit: Love the video btw, can't wait for the next one :)

    • @SirusStarTV
      @SirusStarTV Před měsícem

      On Windows the Python REPL only accepts ^Z, and the Enter key needs to be pressed for it to work.

    • @Tweekism86
      @Tweekism86 Před měsícem

      Dammit Windows, this is why we can't have nice things!

    • @0LoneTech
      @0LoneTech Před měsícem +1

      @@Tweekism86 In this case you can blame CP/M, in particular where file length in bytes was not recorded.

  • @DrCoomerHvH
    @DrCoomerHvH Před měsícem +1

    I like how you've recycled some of the points from your talks into their own little videos, especially when the video topics are directly interactive with the community or fans.

  • @dimitrioskalfakis
    @dimitrioskalfakis Před měsícem +1

    useful and well presented.

  • @AutomatedChaos
    @AutomatedChaos Před měsícem +2

    While working in IT for more than 2 decades now, it surprises me that developers try to invent character separated values (csv) for columnar data again and again while there are literally 4 ASCII characters reserved to handle these cases. But no, let's use the comma, semicolon, tab (\t), pipe, tilde or even the |~| combination as separator with all problems that can occur like escaping, quoting and in-field newlines.

    • @DylanBeattie
      @DylanBeattie  Před měsícem +5

      10 years working tech support made me realise that if regular folks can't read it on their screens and type it on their keyboards, they're not gonna use it... and, honestly, I think they're right. We wanna bring back ASCII field and record separators, we should be putting them on keyboards.

    • @jovetj
      @jovetj Před měsícem +1

      Yep. Control characters are generally un-keyable and non-displayable. Not very practical for most people.

    • @ABaumstumpf
      @ABaumstumpf Před měsícem

      @@jovetj "Control characters are generally un-keyable and non-displayable. "
      No, they were simple control-character - literal keys on the keyboard, and they are very much displayable as even MSWord can show them.

    • @cigmorfil4101
      @cigmorfil4101 Před 26 dny

      MSWord might, but Notepad doesn't (other than as an undecipherable character, with no clue as to which control code it actually is - try looking at a PDF using Notepad). CSV being a plain text format, Notepad, a plain text editor, would be _the_ tool for the job, not MSWord.

    • @ABaumstumpf
      @ABaumstumpf Před 26 dny

      @@cigmorfil4101 "Notepad, a plain text editor, would be the tool for the job, not MSWord."
      notepad is just a scratch TEXT-editor and NOT for working with csv. I mentioned word cause most programs do display them correctly.
      And notepad is just far far off from being the correct program for anything.
      for CSV you would use a program that either can actually deal with ASCII (so not notepad) or better - a program designed for handling tabular data.

  • @LethalChicken77
    @LethalChicken77 Před 22 dny

    I think what's even cleverer about it is that you can use the 8th bit for 128 more characters!

  • @enterrr
    @enterrr Před měsícem +2

    Correction to the last frame: ASCII has 128 characters, not 127 😏

    • @darrennew8211
      @darrennew8211 Před 29 dny

      I bet you could argue that DEL is not a character. :-) I saw that too, and then thought about it.

    • @enterrr
      @enterrr Před 27 dny +1

      @@darrennew8211 More likely he does not "feel" that NUL (\0) is a character in earnest. But gut feelings (or C ASCIIZ hang-ups) are irrelevant - ASCII is defined as 128 7-bit characters.

    • @darrennew8211
      @darrennew8211 Před 27 dny

      @@enterrr Granted that NUL on paper tape is arguably less of a character than DEL is. :-)

    • @enterrr
      @enterrr Před 27 dny +1

      @@darrennew8211 that's like calling 0 less of a number than 1, hehe

    • @darrennew8211
      @darrennew8211 Před 27 dny +1

      @@enterrr Not really. I mean, unless you want to say the tape comes pre-filled with NUL characters, right? :-)

  • @luserdroog
    @luserdroog Před měsícem +1

    I like this, but what about the earlier threads like Jacquard Looms? There's some fascinating stuff in the first APL books (I forget if it's in A Programming Language or Automatic Data Processing) about how to design encodings for punch cards with various numbers of holes.

  • @gurumeditationno.4251
    @gurumeditationno.4251 Před 21 dnem

    ^D is very much still in use. It will end a shell session or logout on any Unix-ish system.

  • @chrisd561
    @chrisd561 Před měsícem

    Great video!

  • @rascta
    @rascta Před měsícem +1

    Sadly lost and not mentioned here: the FS, GS, RS, and US characters (28-31). They were meant to serve as distinct bytes that would never appear in text data, and therefore could easily be used to delimit it.
    But alas, we just totally forgot they existed and instead ended up with formats like CSV, which give double meaning to commas, newlines, quotes, etc., with special escaping rules and incompatibilities between systems. We've spent generations figuring out how to handle that properly and cover all the edge cases, just because we didn't have, and didn't bother to come up with, a few symbols to represent those four characters.
    Some of those other low code points were perfect for networking too: a single byte could communicate something that now needs an entire packet.
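
    A minimal Python sketch of that idea, using US (0x1F) between fields and RS (0x1E) between records; the dump/load helpers here are made up for illustration, not an existing library:

    US = "\x1f"  # unit separator: goes between fields
    RS = "\x1e"  # record separator: goes between records

    def dump(records):
        # Join fields with US and records with RS; no quoting or escaping needed
        # as long as the field text itself never contains these two control codes.
        return RS.join(US.join(fields) for fields in records)

    def load(text):
        # Inverse of dump(): split records on RS, then fields on US.
        return [record.split(US) for record in text.split(RS)]

    rows = [["name", "note"],
            ["Bloggs, Joe", 'said "hi"\nand left']]  # commas, quotes, newlines all fine
    assert load(dump(rows)) == rows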

    • @darrennew8211
      @darrennew8211 Před 29 dny

      The number of self-taught computer programmers who reinvent the wheel because they were never taught what already works always astounds me.

    • @cigmorfil4101
      @cigmorfil4101 Před 26 dny

      Interestingly Pick uses characters 252-254 as markers in dynamic arrays (and filed items) between the "elements":
      FE - 254 - Attribute mark
      FD - 253 - Value mark
      FC - 252 - Sub Value mark
      The whole dynamic array is a string with the elements separated by the marks. If an element that doesn't exist is assigned to, Pick adds enough of the relevant marks to create it; if one is read, it returns null.
      This means you get to access things like:
      Data = ''
      Data<1> = 'attr 1'
      Data<2,1,3> = 'at 2, v 1, sv 3'
      Data<2,3> = 'at 2, v 3'
      Data<4,2> = 'at 4, v 2'
      CopyData = Data
      Element2 = Data<2>
      The strings Data and CopyData contain:
      attr 1[am][sm][sm]at 2, v 1, sv 3[vm][vm]at 2, v 3[am][am][vm]at 4, v 2
      And Element2 contains
      [sm][sm]at 2, v 1, sv 3[vm][vm]at 2, v 3
      Where [am] is char(254), [vm] is char(253) and [sm] is char(252)
      Pick is a multi-value DBMS/OS with all fields of variable length and type (though since the whole item is stored as a string, they're effectively all strings, converted to the relevant type at the time of use).
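
      For anyone without a Pick system handy, a rough Python sketch of the same mechanism; replace_element() is an invented helper that only mimics what a Pick Data<a,v,s> assignment does, it is not Pick syntax:

      AM, VM, SM = "\xfe", "\xfd", "\xfc"  # attribute, value, sub-value marks (254, 253, 252)

      def replace_element(dyn, a, v, s, value):
          # Set attribute a, value v, sub-value s (all 1-based) in a dynamic-array
          # string, padding with marks so the element exists - like Data<a,v,s> = value.
          attrs = dyn.split(AM) if dyn else [""]
          attrs += [""] * (a - len(attrs))
          vals = attrs[a - 1].split(VM) if attrs[a - 1] else [""]
          vals += [""] * (v - len(vals))
          subs = vals[v - 1].split(SM) if vals[v - 1] else [""]
          subs += [""] * (s - len(subs))
          subs[s - 1] = value
          vals[v - 1] = SM.join(subs)
          attrs[a - 1] = VM.join(vals)
          return AM.join(attrs)

      data = ""
      data = replace_element(data, 1, 1, 1, "attr 1")
      data = replace_element(data, 2, 1, 3, "at 2, v 1, sv 3")
      data = replace_element(data, 2, 3, 1, "at 2, v 3")
      data = replace_element(data, 4, 2, 1, "at 4, v 2")
      element2 = data.split(AM)[1]  # like Element2 = Data<2>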

    • @cigmorfil4101
      @cigmorfil4101 Před 26 dny

      The use of CSV is to _avoid_ non-printing control characters (other than a line break) so that the data is easily edited as plain text by a plain text editor.
      A plain text editor generally only understands line breaks; how control characters are displayed depends upon its programming: some may display ^c, some may display a '?' regardless of the character, some may let the display driver decide what to do (hence the smiley faces, musical notes, etc., that the original IBM PCs displayed for control characters).
      As there was no consensus on how to handle control codes, CSV avoided them and stuck to plain text, using commas (hence the name: _Comma_ Separated Values), requiring some sort of escape for commas - enclose a field containing commas within quotes - and a mechanism to handle the quoting character within fields.
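
      As a concrete example of those escaping rules, here is a short sketch with Python's standard csv module, whose default dialect quotes a field only when it contains a comma, quote or line break, and doubles any embedded quote characters:

      import csv, io

      buf = io.StringIO()
      writer = csv.writer(buf)  # default dialect: quote a field only when needed
      writer.writerow(["plain", "has, a comma", 'has "quotes"'])
      print(buf.getvalue(), end="")
      # -> plain,"has, a comma","has ""quotes"""

      rows = list(csv.reader(io.StringIO(buf.getvalue())))
      assert rows == [["plain", "has, a comma", 'has "quotes"']]  # round-trips cleanly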

    • @darrennew8211
      @darrennew8211 Před 25 dny

      @@cigmorfil4101 I always found this argument bizarre. ASCII was invented well before any "plain text editor" was, so saying "we changed this because plain text editors couldn't handle ASCII" sounds like working around the problems in tools rather than just fixing the tools.
      There was also an image format called NetPBM which was great, and one of the options was to represent all the bytes with decimal digits. Like, you could read it with BASIC even. Red would literally be "255 0 0" with nothing other than ASCII digits and spaces.

    • @darrennew8211
      @darrennew8211 Před 25 dny

      @@cigmorfil4101 Wow. It has been *ages* since I heard from anyone else who ever used Pick. :-) Blast from the past there.

  • @TheEvertw
    @TheEvertw Před 22 dny

    "The next two have fallen out of use"
    You skip over Ctrl-D (End Of Transmission). It is still used A LOT. Most UNIX (and Linux) tools accept Ctrl-D to end a stream, file or connection. It is the recommended way of closing e.g. an interactive Python session, an SSH session, etc.

  • @jensschroder8214
    @jensschroder8214 Před měsícem

    The Baudot code is older than ASCII: a 5-bit code for teleprinters. But it has the disadvantage that it cannot accommodate all 26 letters plus 10 digits at once.
    That's why there are two shift codes and two different characters per code.
    The 7-bit ASCII code accommodates the Latin alphabet, but lacks the special characters used by French, Spanish, German and other languages.
    Therefore 8-bit extensions of ASCII with code pages were used.
    Other languages still cannot be represented.
    Unicode is used today. The first 128 characters correspond to 7-bit ASCII.
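
    That last point is easy to verify: the first 128 Unicode code points carry the same numbers as 7-bit ASCII, so pure-ASCII text encodes to identical bytes either way. A quick Python check:

    # Every 7-bit ASCII code point is the same code point in Unicode,
    # so ASCII text is byte-for-byte valid UTF-8.
    for code in range(128):
        assert chr(code).encode("utf-8") == bytes([code])
    assert "Hello, world!".encode("ascii") == "Hello, world!".encode("utf-8")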

    • @cigmorfil4101
      @cigmorfil4101 Před 26 dny

      Though before 8-bit characters were common - when the 8th bit was used by serial devices as a parity check, leaving only 7 bits for characters - devices such as printers had different regions programmed into them. A region could be selected via a code sequence and substituted the locale's characters for standard ones, e.g.
      a printer set to UK would substitute '£' for '#', so that if you sent "#5,899.99" to it, it printed "£5,899.99".
      Working with an Apple ][ with a printer set to UK, listings would include things like PR£3 instead of PR#3.

  • @dfs-comedy
    @dfs-comedy Před měsícem +3

    Ctrl-D is still "End-of-File" in UNIX tty land.

    • @darrennew8211
      @darrennew8211 Před 29 dny

      Technically not. It's "send the buffered input without sending the Ctrl-D". If there's no buffered input, the program gets a read length of zero, which it treats as end-of-file. But if you type something first and hit Ctrl-D, it just sends what you typed.
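
      A small Python sketch that makes the distinction visible when run in a terminal; the behaviour comes from the POSIX tty line discipline, not from Python itself:

      import os, sys

      # In canonical (line) mode the tty hands data to read() when you press
      # Enter or Ctrl-D. Ctrl-D on an empty line yields a zero-length read.
      while True:
          chunk = os.read(0, 1024)
          if chunk == b"":
              print("zero-length read: treated as end-of-file", file=sys.stderr)
              break
          print(f"got {len(chunk)} bytes: {chunk!r}", file=sys.stderr)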

  • @dlwiii3
    @dlwiii3 Před měsícem

    I still work with DB2 databases which use EBCDIC encoding!

  • @robfielding8566
    @robfielding8566 Před 27 dny

    Oh yeah, I can really see the beauty of ASCII when trying to do something similar to Morse... Braille.
    Braille is a total mess, because they insist on using 6 bits (i.e. literally 6 dots). That is just barely not enough, because you need multi-character sequences for most punctuation, separate capitalization characters, etc. They have the opposite of Unicode: a dozen 6-dot codes to learn for different contexts. It has an ambiguous grammar. It's so complicated that the English standard is basically to cargo-cult a C program (liblouis) into places where it should not be. They use Emscripten, because nobody can just implement the standard in JavaScript, etc.
    Computer Braille, by contrast, uses 8 dots, but 7 dots in practice, to map exactly to ASCII. It is so far superior if you are a computer programmer trying to make the tools work correctly on an existing computer. There is even a way to encode all 8 dots of Braille so that you can encode binaries you can read completely unambiguously. When using 7 bits for output, the 8th bit comes in super handy for input (arrow keys, etc.).
    It's too bad that single and double quote don't have actual open/close variants, or a standalone apostrophe. Braille has open/close quotes, but trying to use them makes things worse, because the translation to ASCII is then ambiguous.

  • @TheEvertw
    @TheEvertw Před 22 dny

    The ESC code is still used a LOT by all sorts of programs across platforms. You shouldn't have skipped over that one.
    And you might have mentioned ^S and ^Q, which are the flow-control characters. If you ever press ^S in e.g. a UNIX shell, it will hang until you press ^Q. Which is unfortunate, as many programs have re-purposed ^S as a shortcut for "Save".
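
    If the ^S freeze bites you, XON/XOFF flow control can be turned off for the current terminal. A minimal sketch using Python's standard termios module, assuming a Unix-like system (same effect as running "stty -ixon"):

    import sys, termios

    # Clear the IXON flag so ^S (DC3/XOFF) no longer pauses output
    # and ^Q (DC1/XON) is freed up for other uses.
    fd = sys.stdin.fileno()
    attrs = termios.tcgetattr(fd)
    attrs[0] &= ~termios.IXON  # attrs[0] is the input-mode flags (iflag)
    termios.tcsetattr(fd, termios.TCSANOW, attrs)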

  • @pedro1492
    @pedro1492 Před měsícem +1

    the best newline character sequence is linefeed, then carriage return, because it is backwards compatible with mechanical typewriters

    • @CTSFanSam
      @CTSFanSam Před měsícem +2

      Not on a real ASR33 teletype. PDP-11s used CR, LF, NUL. It took time to move the head from the far right back to the far left. If you printed "blablabla....", CR, LF, "new stuff etc", the first letter of the next line would print while the head was still returning to position one. So get the CR out first, then the line feed and a NUL, so the head could finish returning to position one.

    • @HenryLoenwind
      @HenryLoenwind Před měsícem +2

      @@CTSFanSam You missed the joke.
      On a mechanical typewriter, the lever you grab first transports the paper, and then, at the end of its travel, you pull the carriage. Just like the handle of a car door, where the travel of the handle opens the door lock and then you pull the whole door with the same handle at its end point.

    • @0LoneTech
      @0LoneTech Před měsícem

      @@HenryLoenwind The levers I've used all pushed the carriage first (the literal carriage return), then fed a line once the carriage stopped. I suppose your order may occur if the carriage is hard to move.

    • @darrennew8211
      @darrennew8211 Před 29 dny

      @@0LoneTech The levers weren't even on a standard side of the typewriter when ASCII was being developed. :-)

    • @HenryLoenwind
      @HenryLoenwind Před 29 dny

      @@0LoneTech Are you sure? I've never used a typewriter where the lever was that stiff. Those carriages are heavy, and you have no leverage. It also makes it a bit awkward to use as you now have no way of LF without first CR and instead have to turn the knobs counting clicks to match the line spacing. The other way around, you can easily CR without LF by pushing the carriage at any other point you can touch it.

  • @jovetj
    @jovetj Před měsícem

    Excellent video!

  • @CalvinsWorldNews
    @CalvinsWorldNews Před 29 dny

    I despise the EU political project, but I learned a LOT about computing from the Euro symbol and having to work on its introduction. It must have been weird to show up as the (e.g. Saudi) delegate to one of the conferences in the 90s and have to talk to nerds about how other writing goes right to left, and how the character set is continuous.

  • @rollinwithunclepete824
    @rollinwithunclepete824 Před měsícem

    Very interesting! Thank you