The registers of V9958 are a mess.

Andy Hu

2,403 views

Added 9 March 2024
This is a follow-up to my previous video on the V9958. Inefficient register access is another big problem with the V9958. The issue started with the TMS9918, which has a strange way of accessing its registers. Yamaha tried to improve register access speed, but in practice the improvement is insignificant.
GitHub repository for my Z80 project:
github.com/Andy18650/HEC-Mode...
Join our discord server!
/ discord
Support my project on Patreon:
/ andy18650
Science & Technology

Comments • 43

@static-san 2 months ago ⁺²
The 9938 and 9958 definitely had register limitations from being descended from the 9918/9929. You mentioned this, but it was probably an important feature in the MSX2.
It is known that TI's engineers fought for an 8-bit port interface when they designed the 9918. The 9900's equivalent of the Z80's ports was single bit communication and that was going to be far too slow for video access. The 8-bit port was a big reason the 9918 was so popular in many other systems than TI's own 99/4a.
Providing an alternative and better laid out register interface would've made the register handling so much more complicated on the silicon. A halfway solution might've been to shadow some functionality in higher register numbers. Even an extra pin to be able to directly select the port properly would've been an improvement. I think a proper redo of the layout (with a backwards compatibility mode) might've happened a few generations after the 9958, though, if it had kept being developed. They might've also come up with a way for the CPU to access the video RAM directly by then, too. So much potential....
@andyhu9542 2 months ago ⁺¹
I agree. The V9978 may be the solution, but there is so little information about the chip that its register layout is unknown.
@kirill_bykov 2 months ago
Hello. I fully agree with what you say at 6:38. Is there any work in progress on those proposals? Why only ¼, ½, 1 and 2 scaling? Why not fractions given as two 8-bit or 16-bit integers, or at least 16-bit fixed point?
@Calphool222 2 months ago
The chaos with the data and mask bit to tell the chip you're talking about a register goes all the way back to the TMS9918. When Yamaha decided to build a backward compatible (ish) chipset, they inherited a *lot* of compromises that trace their existence to either constrained design choices from the late 1970s, or from literal arbitrary choices that the Texas Instruments engineers made (engineering was a little more "free spirited" in the 1970s than today). The half-baked composite-in line on the TMS9918 is a great example.... it allows you to chain multiple TMS9918s if they share a clock, but it's practically useless otherwise, which is *really* unfortunate, because it would have been amazing to have text overlay on video in consumer hardware all the way back into the mid to late 1970s, but alas, they only half implemented it.
Some of the decisions the TI engineers made were unfortunate, but they were what they were, and everything built on that architecture inherited those "features." The auto-increment thing was present in the TMS9918 as well, but was used for reading and writing to VRAM only. So Yamaha was borrowing an idea that was already present in the original architecture.
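A rough sketch of the two-byte control-port sequence being discussed may help here. It is a simplified Python model, not real hardware code: the class and function names are invented for illustration, and it ignores details such as the latch being reset by a status read. The bit layout follows the TMS9918 convention: the first byte carries the data, and the second byte either has bit 7 set to select a register, or carries the high address bits with bit 6 choosing VRAM write mode.

```python
class ControlPortModel:
    """Simplified model of a TMS9918-style control port (illustrative names)."""

    def __init__(self):
        self.registers = [0] * 8       # TMS9918 write-only registers 0-7
        self.vram = bytearray(0x4000)  # 16 KB of VRAM
        self.vram_address = 0          # 14-bit address pointer
        self.vram_write_mode = False
        self._latch = None             # holds the first byte of a pair

    def write_control(self, value):
        value &= 0xFF
        if self._latch is None:
            # First byte: just remember it.
            self._latch = value
            return
        first, self._latch = self._latch, None
        if value & 0x80:
            # Second byte with bit 7 set: "first" is data for register N.
            self.registers[value & 0x07] = first
        else:
            # Second byte with bit 7 clear: set up the VRAM address pointer.
            self.vram_address = ((value & 0x3F) << 8) | first
            self.vram_write_mode = bool(value & 0x40)

    def write_data(self, value):
        # Data-port write: store, then auto-increment the address pointer.
        self.vram[self.vram_address] = value & 0xFF
        self.vram_address = (self.vram_address + 1) & 0x3FFF


def set_register(port, reg, value):
    """Two control-port writes are needed for a single register update."""
    port.write_control(value)
    port.write_control(0x80 | reg)


def set_vram_write_address(port, address):
    port.write_control(address & 0xFF)
    port.write_control(0x40 | ((address >> 8) & 0x3F))
```

Every register update costs two port writes, while sequential VRAM writes cost one each after the address is set up, which is the auto-increment behaviour mentioned above.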
@andyhu9542 2 months ago
I think there is a way to synchronize external video to the TMS9918 by pulling the reset pin to 9V or something crazy like that. It's hard to achieve with only a 5V supply, but the feature is there.
@Calphool222 2 months ago
@@andyhu9542
Sort of, but since nothing is keeping the clocks in sync, if you feed composite from some source other than a shared clock, one of the images will "roll" because sync will slowly separate. As I said, they only *half* implemented a solution. They needed a way to achieve genlock, or to buffer the incoming video, which they didn't implement.
@andyhu9542 2 months ago
@@Calphool222 The TMS documentation doesn't go into whether this is H sync or V sync. However, I don't think the video would roll, as one can input one sync pulse via the reset pin every field. The video may appear skewed, but not rolling. The bigger issue is that the input signal must be non-interlaced, but all TV signals are interlaced. Therefore, half of the input screen content would appear in completely the wrong places.
@Calphool222 2 months ago
@@andyhu9542
It was decades ago, but I tried to get it to work. It might have been tearing rather than rolling. I remember using a sync separator chip, and I remember messing with the reset line, but ultimately it just didn't work, and there was no real way to make it work. It was half baked.
@BGBTech 2 months ago ⁺¹
Kind of reminds me:
I had designed a graphics hardware interface for my project (a soft processor on an FPGA).
Initial idea was inspired partly by retro-systems, so I did it like:
Well, you have 80x25 cells, each 32 bits (initially with 8K of VRAM).
So, you could have options:
8x8 pixel character cell (via a font), some attributes, and a 512 color RGB space (RGB333);
4x4 bitmap cell, selecting between two colors from a 64-color space (RGB222).
Initially hard-coded for 640x400 output (line doubled from 640x200).
Ended up adding more base resolutions via flags, along with using flags to specify the relative size of pixel cells:
So, for 640x400 base mode: 40x25, 80x25, 80x50, ...
And, made cells bigger (64/128/256 bits); and expanded them to support more formats.
One subset of this includes modes similar to S3TC style texture compression.
Another does 8x8 pixels with 1 bit/pixel selecting one of 2 colors (128 bits per cell, 2 bpp overall).
Eventually added non-raster bitmapped modes, which interpreted each cell as a block of pixels.
Then, later, added a few raster modes by supporting a mode where each pixel cell was only 1 pixel tall (say, one can do a 320x200 linear hi-color mode by effectively setting it to an 80x200 mode with 64-bit cells within the bitmap sub-mode).
I don't generally go over to high resolutions though, partly because memory bandwidth limits resolution. Theoretically, it now supports bigger RAM-backed VRAM, but due to memory timing and bandwidth, it can't maintain good video output for modes that need more than around 8-10 MB/sec (such as 320x200 hi-color or 640x400 16-color; or 800x600 4-color if feeling adventurous; or use the color-cell modes...).
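The bandwidth figures above can be sanity-checked with some quick arithmetic. This sketch assumes a 60 Hz refresh and that each stored pixel is fetched exactly once per frame; neither is stated in the comment, and the function name is made up for illustration.

```python
def scanout_bytes_per_second(width, height, bits_per_pixel, refresh_hz=60):
    """Bytes the display logic must fetch per second to scan out one mode."""
    return width * height * bits_per_pixel * refresh_hz // 8


# Modes named in the comment: (width, height, bits per pixel).
modes = {
    "320x200 hi-color": (320, 200, 16),
    "640x400 16-color": (640, 400, 4),
    "800x600 4-color": (800, 600, 2),
}

for name, (w, h, bpp) in modes.items():
    rate = scanout_bytes_per_second(w, h, bpp)
    print(f"{name}: {rate / 1e6:.2f} MB/s")
```

Under those assumptions all three modes land between 7 and 8 MB/s, which is consistent with them sitting right at the quoted limit.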
But, the design is pretty terrible looking, vs had I just gone straight for a raster mode; and used some more straightforward way of specifying resolution.
Also theoretically it has two separate scrolling mechanisms, but I don't use either of them...
The design of the audio hardware is also a little wonky. ...
Hardware design is fun though...
@ArneChristianRosenfeldt 2 months ago
That is why a good software library designer thinks about interfaces from the start. Or ISA of a CPU. Or UX : Mac thinks about UX first. C64 : 8 bit machine? Let’s use 8 sprites ( per line and screen ). One char code and color attribute per 8x8 tile. NES : 256px resolution to fit sprite positions into 8 bit. Sprites and background are made of 8x8 tiles. Shared palette between background and sprites so that you can always park sprites or use a walking animation in the background.
I feel like a lot of systems limit the sprite data too much. Artists should be free to use any width of tiles and any height. Flicker will remind them when they are out of bounds. Same for playfield pitch.
@andyhu9542 2 months ago
I don't want to be too critical, but I don't understand several of your design decisions here. I assume that you are using the FPGA with an 8-bit CPU, so if you aren't, please ignore 80% of what I'm about to say. 80x25 is usually only good for high-res text modes. Using it for graphics will make the cells appear very thin. Also, 2-color or 4-color cells are sometimes limiting when it comes to artistic expression. Also, the CPU may have a hard time moving large amounts of data in VRAM.
@BGBTech 2 months ago
@@andyhu9542 I am running a 64-bit soft-processor on the FPGA, albeit at 50 MHz, with a custom RISC/LIW style ISA (with a custom OS of sorts as well). Performance is roughly between 486 and Pentium-1 territory as far as I can tell (can run Quake on the thing, but framerates are a bit weak; Doom at present is around 25-30 fps).
As for 80x25, yes, it is a good mode for text. Yes, for color-cell graphics, it has non-square pixels. With the early design, 80x25 would have done 320x100, and 80x50 does 320x200.
With 64-bit cells, and a DXT1 style encoding, it could do OK 320x200 graphics in 32K of VRAM.
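The VRAM sizes quoted in this thread follow directly from the cell grid. The sketch below just redoes that arithmetic; the helper names are hypothetical, and the 4x4 pixel footprint per cell is taken from the earlier description of the bitmap cell.

```python
def cell_vram_bytes(columns, rows, bits_per_cell):
    """VRAM needed for a grid of fixed-size cells."""
    return columns * rows * bits_per_cell // 8


def pixel_resolution(columns, rows, cell_width=4, cell_height=4):
    """Pixel resolution when each cell covers cell_width x cell_height pixels."""
    return columns * cell_width, rows * cell_height


# Original design: 80x25 cells of 32 bits each fits in 8K of VRAM.
print(cell_vram_bytes(80, 25, 32))   # 8000 bytes
# 64-bit DXT1-style cells at 80x50 give 320x200 in 32K of VRAM.
print(cell_vram_bytes(80, 50, 64))   # 32000 bytes
print(pixel_resolution(80, 50))      # (320, 200)
```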
Typical early rendering strategy with Doom was to first render to a 128K 320x200 hi-color buffer, and then use a real-time color-cell encoder, then copying each frame into VRAM.
Later, I increased VRAM to 128K, which allows better looking graphics.
I have also still used a similar color-cell-encoder strategy for doing an experimental GUI.
Where, the idea is that the GUI first draws into a 640x400 hi-color buffer (512K), then color-cell encodes the framebuffer and updates the VRAM. Generally, each window has its own framebuffer, so the program first draws into the window framebuffers, and then the windows are drawn into the combined screen framebuffer, ...
But, performance is difficult with this (it is difficult to get the screen updates much faster than around 8 frames/second when running Doom or Quake). (Had also tried, as an experiment, running Doom and Quake at the same time in the GUI, but then overall performance drops to around 2 fps...).
Have also experimented with 640x400 indexed color (with a 256-color palette), but it is tradeoffs. It is difficult to get good looking colors with 256-color. Currently the palette is hard-coded, as trying to do an adaptive optimized palette is way too expensive for real-time use (it effectively takes several seconds at 50MHz to rebuild the lookup tables for an optimized color palette).
For 800x600 mode, currently the only viable options (due to screen-refresh memory bandwidth) are 2-bpp color-cell and 4-color modes. Both look "kinda awful" at present (though, color-cell looks less awful than 4-color mode).
Where, 4 color has a selection of fixed color palettes (including the traditional black/white/cyan/magenta), and a Bayer-pattern mode (where the palette alternates for each pixel; allowing a crude approximation of full-color graphics).
The color-cell modes can also work OK for video playback, as I am using custom video codecs based around a similar design (though, conceptually along similar lines to if one had LZ4 compressed MS-CRAM video).
Things like MPEG style codecs being a bit too computationally expensive to be practical.
Well, and BMP format images containing still images in MS-CRAM format, is also something I have been using (though, normal software does not understand this format). Well, and relatedly, also have a hacked BMP variant that adds transparent colors to 256 and 16 color bitmaps (for 16-color, the transparent color replaces the hi-intensity magenta).
But, yeah, admittedly I don't fully understand how old style GUIs worked, and based on performance, I suspect it is "not quite the same".
Can note that I am in an age-range where my childhood was mostly playing Quake 1/2 and similar on Win9x machines... having been in high-school when WinXP started to appear. So, no real first-hand experience with 8-bit era computers (but, earlier on, I do remember Windows 3.1 and similar, and game consoles like the NES, ...).
Also, technically, the rise of IBM clones started before I existed...
@BGBTech 2 months ago
@@ArneChristianRosenfeldt Possibly true. The design of some things migrated pretty far from where it started.
Where I started, and where I ended up, in terms of the ISA design, it is almost unrecognizable.
I started out pretty close to the original SuperH ISA (and it was effectively a reboot of a prior design based on throwing extensions onto the SH-4 ISA; which had turned into a horrible mess). Now, not so much, the only obvious similarities are things like the Stack Pointer being in R15, the use of an SR.T bit for conditionals, R4..R7 being used for the first 4 function arguments, ... It started out with 16/32/48 bit instructions, now 32/64/96-bit; and admittedly some parts of the encoding are a little dog-chewed, etc.
Had thrown on an alternate decoder allowing it to run RISC-V / RV64G code, but RV64G gets slightly worse performance than my own ISA design (despite the general crappiness of my C compiler if compared with GCC). Though, most of this is likely due to RISC-V "shooting itself in the foot" by not having a register-indexed addressing mode and similar.
But, yeah, graphics hardware also mutated, and the design is a mess. Mostly using it for a mixture of text and bitmapped graphics (along with Doom and Quake). The early design was mostly text, but could handle Doom semi-effectively via a real-time color-cell encoder.
Though, for full-screen Doom, have mostly gone over to a 320x200 hi-color mode. Can note that 640x400 color-cell works moderately well for GUI.
Audio is also a bit wonky, using a combination of an audio-loop buffer (4kB, A-Law, nominally 16 kHz), and a music-chip design partly influenced by both the Yamaha OPL chips and earlier PSG chips (such as the SID). It effectively resembles a 16-channel FM design, but also with square and sawtooth waves (and can use either sine or triangle waves for the modulators). Later glued on a feature where it can use 8/11/16 kHz A-Law samples (supporting one patch per FM channel), partly intended to be able to accelerate MOD and S3M playback. Alternately, it is possible to do the MOD and S3M playback in software, but this can eat a lot of CPU time on a 50MHz CPU (mostly went with S3M as the format made sense, whereas IT and XM seemed needlessly complicated).
But, the games I am testing with (mostly Doom and similar) would not make use of the hardware mixer.
Was using A-Law mostly because it can give quality closer to 16-bit PCM, at 8-bits per sample (with internal mixing mostly being done at 16-bits per sample). Mostly using 16 kHz as audio quality is considerably better than 8 or 11 kHz (while not as expensive as 22/32/44 kHz, with less gain in perceived quality).
For audio storage, 16kHz ADPCM works well IMO: Better quality than 8kHz 8-bit PCM, at the same bitrate. Also, IMHO, both faster to decode and better quality than low-bitrate MP3 (at low bitrates, MP3 tends to sound more like one is shaking a bunch of broken glass in a steel can; not a good sound).
The ADPCM approach works down to around 3 bits per sample.
For stereo, have often used a center/side transform, with the side channel encoded at around 1/4 the sample rate (say, 48 kbps center, 12 kbps side).
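As a small illustration of the center/side idea and the quoted bitrates, here is a sketch in Python. It assumes 3 bits per sample and a 16 kHz center channel, which is an inference from the numbers in this comment rather than something stated outright; the function names are invented for the example.

```python
def to_center_side(left, right):
    """Center carries the average, side carries half the difference."""
    return (left + right) / 2, (left - right) / 2


def from_center_side(center, side):
    """Inverse transform: recover left and right."""
    return center + side, center - side


def bitrate(sample_rate_hz, bits_per_sample):
    return sample_rate_hz * bits_per_sample


center_rate = 16_000
side_rate = center_rate // 4  # side channel at a quarter of the sample rate

print(bitrate(center_rate, 3))  # 48000 bits/s for the center channel
print(bitrate(side_rate, 3))    # 12000 bits/s for the side channel
```

The side channel usually carries much less energy than the center, which is why it tolerates the lower sample rate.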
Well, and similarly for RGB555 as a working format, which is "mostly good enough" (and needs half the space of 32-bit RGBA8888 or similar).
Or, for image storage, often 8-bit indexed color. Or, the oddity of repurposing a limited form of 8-bit MS-CRAM as an image format; which needs roughly 2-bits/pixel. Though, in its simplest form, this is essentially 4 bytes per 16 pixels: 16-bit pixel selector, and two color-endpoint bytes.
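The "4 bytes per 16 pixels" layout can be sketched as a tiny decoder. This is only an illustration of the general two-color block idea: the bit order within the selector and which color a set bit picks are assumptions made here, not the actual MS-CRAM conventions or the format used in the project above.

```python
def decode_block(selector, color0, color1):
    """Expand a 16-bit selector plus two palette indices into a 4x4 block.

    Assumed layout: bit (y * 4 + x) of the selector picks color1 when set
    and color0 when clear.
    """
    rows = []
    for y in range(4):
        row = []
        for x in range(4):
            bit = (selector >> (y * 4 + x)) & 1
            row.append(color1 if bit else color0)
        rows.append(row)
    return rows


BLOCK_BITS = 16 + 8 + 8           # selector plus two color-endpoint bytes
BITS_PER_PIXEL = BLOCK_BITS / 16  # 2.0, matching "roughly 2-bits/pixel"

for row in decode_block(0x8421, 5, 9):
    print(row)
```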
...
@ArneChristianRosenfeldt 2 months ago
Why is RISC-V missing an addressing mode? I thought that just like MIPS it always adds a register and the immediate value. You are free to set the immediate to zeros and reserve a register to contain zero. But then I think there is already a zero register to eat the output of .. the output of what actually? Those CPUs have no flags.
I did not know that audio data rate is a problem. So I am not good at hearing, but I thought that Music with 3 channels, and 3 channels for effects are enough. Even with 8 channel, the data rate to memory is far below video demands. I don't like analog audio. I like the DSP in SNES, Atari Falcon and Jaguar. People complain about dithering on the PSX, but forget that it looked okay on TV. My Riva128 also dithered, on a high quality CRT, but also at higher resolution. Sadly, additive shading was broken in the driver and fog would exaggerate dither. Jaguar can use a color space with luminance. It is said to make Doom look best. No colored lights (Quake2) though.
Ah so, SH2 has a single flag. Even with the powerful branch instructions, I think that ADC between integers with custom numbers of bits is more natural for some applications ( fixed point maths).
@@BGBTech
@MK-ge2mh 2 months ago ⁺¹
I enjoy your content. Always interesting. I was going to use this chip for a homebrew computer until I saw your videos. Perhaps it's not worth the effort.
@andyhu9542 2 months ago
Which CPU do you use? If it's a Z80, I would say a TMS9918A is enough. I had a lot of fun with the TMS9918A. If you don't like using 16k*1 chips, try the 9118.
@MK-ge2mh 2 months ago
@@andyhu9542 I built a computer using the HD63C09 which is an MC6809 clone with more registers, instructions, and runs faster (fewer cycles and faster clock). It's currently using a 128x64 graphical LCD display, 2 UARTs, a C64 keyboard, an SD card (I'll soon replace with Compact Flash), 128K SRAM bank-switched. It runs Microsoft Extended BASIC, FORTH, C, etc. I have FUZIX OS mostly working on it. For months, it's been sitting on my desk running a demo-loop of several programs I wrote like plasma, 3D maze (like Doom), small flight simulator, spinning 3D objects, several Amiga Boing balls bouncing around, a few typical 8-bit 1980's line-demo variations.
That processor is the best 8-bit CPU of the 1980's. When you examine the instruction-execution cycles, it gives the 68000 a run for its money for both 8- and 16-bit operations. It performs 16-bit multiplication in less than half the cycles of the 68000, and 32-bit division in a quarter of the cycles. My computer is clocked at 3 MHz (for which the CPU is rated), but I'm not exaggerating when I say that it's probably comparable to a 6 MHz 68000 on 8- and 16-bit operations. It runs perfectly fine above 5 MHz, but I would need to add more wait-states when communicating with the GLCD since it's slow.
@Programmiernutte 2 months ago
I'd say go for the V9958. It's available, it has more interesting modes than the 9918. Yes, it's slow and a bit weird, but it is the best available option apart from rolling your own.
@AK-vx4dy 2 months ago ⁺³
Wow, it is really messed up, but my first thought is that it is a victim of preserving backward compatibility, or of a conspiracy to limit the number of developers to those who were given secret documentation :D
To be fair, I don't know the times when the V9958 was designed, but in many cases in the late '70s and early '80s this was a real fight for transistors and for what was really possible with manual layout of silicon.
That's why the instruction sets of 8-bit processors are so irregular, and that's why the MC68k is such a beautiful symmetric cutie ;)
Many I/O chips have quirks like read-only and write-only registers, weird timing dependencies and write-order dependencies (which rule out using OTIR); I think the Amiga Copper was the final response to dealing with such things :D
Separate scroll registers make some sense, because they are implemented in separate parts of the chip: one changes addresses, one shifts the timing reference.
@ArneChristianRosenfeldt 2 months ago ⁺¹
SH2 and ARM TD have half the register count of the 68000 . VIC-II and TED are quite clean.
This video chip came out after 16 bit CPUs came out. So design tools should be better.
Actually, I think that a 6502 like computer could be reimplemented in a simple way with orthogonal addressing modes. You just need to accept that transistors are fast. Then use a generic circuit for the byte/word field. So a loop in the PLA . One PLA for addressing, one for data.
@AK-vx4dy 2 months ago
@@ArneChristianRosenfeldt The 6502 is its own separate, beautiful story 🙂
If it was made after 16-bit CPUs, then the conspiracy theory wins, or someone with less skill finished the project of someone who left.
@ArneChristianRosenfeldt 2 months ago
@@AK-vx4dy many people have their take on that chip. But surely it was designed before it was clear what the MOS fab would be able to do. And then came slow DRAM and shared memory. The PLA design and the buses on the 6502 are all optimized to do as much as possible in a clock cycle and at the same time run at as high a clock rate as possible. In the end the CPU was always limited by RAM. Z80 came later and built on this information. Intel was a bigger company and more sure about their fab.
Nonetheless, 6502 wants to address stuff in almost every instruction. Encoding already starts with the instruction length in bytes. So length = 1 : implicit addressing or register ( TAX TXA ). 2 : immediate8 , 3 : absolute
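The link between instruction length and addressing mode can be shown with a handful of real 6502 opcodes. The table below is a deliberately tiny subset chosen for illustration (zero-page and relative modes, which are also 2 bytes, are left out), and the helper name is made up.

```python
# opcode: (mnemonic, addressing mode, total length in bytes)
OPCODES = {
    0xAA: ("TAX", "implied", 1),
    0x8A: ("TXA", "implied", 1),
    0xA9: ("LDA", "immediate", 2),
    0xAD: ("LDA", "absolute", 3),
    0x8D: ("STA", "absolute", 3),
}


def split_instructions(code):
    """Walk a byte string and cut it into instructions using the length table."""
    pc = 0
    result = []
    while pc < len(code):
        mnemonic, mode, length = OPCODES[code[pc]]
        operand = code[pc + 1:pc + length]
        result.append((mnemonic, mode, bytes(operand)))
        pc += length
    return result


# LDA #$01 ; TAX ; STA $0200
program = bytes([0xA9, 0x01, 0xAA, 0x8D, 0x00, 0x02])
for mnemonic, mode, operand in split_instructions(program):
    print(mnemonic, mode, operand.hex())
```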
@AK-vx4dy 2 months ago
@@ArneChristianRosenfeldt I don't have a take of my own after watching the full story of its origin and what the times were like then.
Even before that, it was matched almost ideally to the memory speeds of the time. The Z80 had a faster clock but used more cycles, which in the end gave a similar memory access speed.
Both were unique and amazing, but in different directions.
I think the 6502 is amazing, taking into account that it was intended to be a cheap industrial microcontroller.
@ArneChristianRosenfeldt 2 months ago
@@AK-vx4dy I just found out that prior to 1977 the 6502 was slower and STA had a delay between output of the address and of the accumulator (why, even? The accumulator is constant... A long bus? Or is this about other writes like push or JSR?).
Z80 has a better duty cycle. Z80 and memory work in parallel, while 6502 and memory work alternating. So both are idle half of the time. Apple I used the idle time for video. Z80 is still idle after instruction fetch. You can use it for memory refresh, but you don’t need to. So Z80 is like: Fetch, Video/Refresh, execute, writeback for most instructions.
I should check the timing on individual instructions 6502, but how can immediate16+X register addressing mode be so fast? The addition works in a pipeline fashion with reading the immediate value while incrementing the PC. Then there is cycle to copy the high byte over the bus into the AH register so that it is available really early in the next cycle.
So there is only one unused memory cycle. 6502 fetches the next in the last cycle of the previous instruction and decodes it? Fetch/decode is in a single cycle. This is probably the limiting circuit for clock rate. At least we know what 6502 does with the phase where it is off the bus here. In a pipeline design which wants to utilize all transistors to the max, we won't do this. Also not in CMOS (65C02 anyone?). But for nMOS minimum latency is king. Spit out PC and let the signals race until the control lines are set for the first cycle of the instruction.
@johneygd 2 months ago
Well, this MSX2 VDP chip might be messy all the way, BUT what if we could make an interface chip to make it possible to port games from other systems to it? Sure, there will still be some manual adjustments to be made by the user, but if it makes porting games to the MSX2 faster and easier, then why not? That would be great 😁
@andyhu9542 2 months ago
The issue with porting is that almost all other systems support more colors than the MSX2. Even the NES has 25 on screen (or even per line). Porting games to the MSX2 means that the artwork has to be remade with new color limitations (16 on screen), which is sometimes the majority of the work involved in porting a game.
@mrkosmos9421 2 months ago
V9958? No thanks, I'll take a 6845 any day
@andyhu9542 2 months ago ⁺¹
Well, implementing a bitmapped graphics mode on that chip would be a pain...
@mrkosmos9421 2 months ago
@@andyhu9542 ... yeah...
