How Well Does Chat GPT Know Commander Cards?

  • Added 29 June 2024
  • #mtg #thetrinketmage #trinketmage
    Patreon:
    / thetrinketmage
    Sorry about some of the breathing noises; my noise gate didn't seem to work when recording this. Let me know what your score was!
    Channel Art by Beevuu:
    Insta: / beevuu
    Twitter: / beevuu
    All the Music is by Chillpeach:
    / @chillpeach
  • Gaming

Comments • 62

  • @mr.whistler6114 2 months ago +23

    Remember: ChatGPT will never think outside of the box. ChatGPT is the box.
    Edit: In the AI response for Forcefield, ChatGPT talks about it not being "essential for most decks within its colors". Forcefield is colorless, so only an AI would think of it the same way as Black/Blue/Green/Red or White, treating colorless as one of six plausible options for MTG deckbuilding. Furthermore, we can deduce that ChatGPT didn't understand a "colorless card" as a card devoid of colors, but as a sixth color that, under the MTG rules, can be blended into decks of every color, which is why it speaks of "its colors" in the plural.

  • @admiralatom5990 2 months ago +55

    The biggest takeaway is that ChatGPT doesn't refer to itself in its answers. Some of the people used "I" when they answered.

    • @thetrinketmage 2 months ago +7

      Didn’t notice that!

    • @TeaAddict1 2 months ago +3

      I also saw ChatGPT using "I".

    • @devan9197 2 months ago +2

      Honestly for me it was pretty obvious and that gave it away a lot

    • @natelagrassa9337 2 months ago

      Yeah AI don’t refer to themselves very often… in formal writing to drive a point home you don’t use the word “I.” I picked up on that too, lol.

  • @Jerma985_fan 2 months ago +13

    Whoa, Demonic Tutor tripped me up. I'm surprised someone gave that a B.

  • @Mwarrior1991 2 months ago +4

    Without fail, ChatGPT would repeat itself: "Demonic Tutor is an incredibly powerful card allowing you to search your library for any card"... "its ability to fetch any card greatly increases consistency..." Redundant information each time.

  • @DMZZ_DZDM 2 months ago +4

    ChatGPT will use "creative" language to fill up space and will always expand on surface-level issues while only brushing on more nuanced details that affect the broader game. Also, it's trained mostly on business emails, pamphlets and guidebooks, so it has an inherently sanitized vibe to its responses (unless asked to use a different tone).

  • @trevordumais2117 2 months ago +8

    Bros rating Mechanized Production as a D forget that treasures exist. It all goes back to Smothering Tithe.

    • @thetrinketmage 2 months ago +3

      Smothering Tithe really was the hero of this story

  • @atticussalmon9064 2 months ago +4

    The AI says "decks within its colors" or some variation of that A LOT, kind of a giveaway

  • @solarupdraft 2 months ago +7

    The Assassin's Trophy one was interesting, because it makes you consider who would be more likely to make that mistake in their writeup. It's also inconsistent, saying "any nonland permanent" in one line and "any permanent" in a later one.
    For me the Mechanized Production one came down to "which author is likely to go off on a non-magic tangent?" Also, the final sentence of the right-hand text seems to contradict the entire message preceding it, depending on how you define something "being a riot."

    • @thetrinketmage 2 months ago +1

      Yeah, those are the AI hallucinations, which cause it to be wrong

  • @KirioGameNote 2 months ago +5

    I really need to hear more on the patron’s thoughts on giving Demonic Tutor a B

    • @thetrinketmage 2 months ago +3

      I made it anonymous so unless they tell me, I also won’t know more

    • @DMZZ_DZDM 2 months ago

      I would have given it an A, but yeah, it isn't an S imo

  • @TeaAddict1 2 months ago +4

    I noticed that ChatGPT has a habit of reiterating the prompt. It always talks like it's checking off items on a checklist.

    • @thetrinketmage 2 months ago +1

      I feel like that’s how a lot of AI work looks.

    • @cinderheart2720 2 months ago +1

      I swear they didn't use to and now they always do it, in any context. It's very frustrating.

    • @violetto3219 1 month ago +1

      it's got the vibe of trying to fill space in a high school writing assignment you reeeeally don't want to do

  • @Xhosant 1 month ago

    The IKEA giveaway was that it started its (surprisingly poetic) metaphor along the lines of 'it's a doomed project', and then twisted to 'and when it works it's neat'. That sudden context switch was suspicious.
    Speaking of context, overall ChatGPT will provide too much of it explicitly, compared to human answers, which use it implicitly and with less regard for whether you have it. From needless clarifications to tying back to the assignment's phrasing, that was a pattern for ChatGPT, feeling like a grade-school essay: answering as it expected you wanted it to answer. In contrast, the humans would often use subtler slang or context cues.

  • @XiaosChannel 2 months ago +2

    16:09 that's why you either use the API or always restart a new conversation per case

    • @thetrinketmage 2 months ago

      Yea I didn’t know it was gonna do that. Made for a funny bit though

  • @aleksihakli1125 2 months ago +1

    "ChatGPT doesn't care about budget" You're telling me. I asked for some recommendations for my food token life drain deck.
    It recommended such "affordable" cards as Anointed Procession (60€), Doubling Season (36-40ish €), Teferi's Protection (48€), Parallel Lives (33€), Exquisite Blood (23€) and many, many more cards well over my budget.
    I think the cheapest card it recommended was Beast Whisperer, which I already HAVE IN MY DECK.

  • @Ent229 2 months ago +4

    Commenting before watching: I predict the LLM will have good syntax in its responses but will fail some of the semantics. Likewise, I expect its fake "reasoning" to be heavily biased towards generalities and other common responses. I predict the MtG patrons will understand the semantics. I also expect those patrons to be capable of novel reasoning, but likely to give general answers (common responses are common for a reason). As for the ranking, I would expect the LLM to have a higher mode in its answers and the patrons' answers to have a broader spread.

    • @thetrinketmage 2 months ago +3

      Novel reasoning ends up being a huge giveaway! I think you are spot on

    • @Ent229 2 months ago +1

      While watching (my guesses of the identities and tracking the rating scores). My guess for the AI in brackets. Actual AI in parentheses.
      1 [(A)] or C. Initially guessed based on accuracy. Doubled down based on generic AI answer vs novel Patron answer.
      2 [(C)] or C. Again, novel responses help guess the Patron.
      3 [(A)] or S. One answer repeated itself in a redundantly redundant explanation.
      Huh, the AI downgraded it to nonland permanent. Was that due to generalizing answers or due to not understanding the semantics of the card? Both are factors but I wonder which had a bigger cause.
      4 D or [(C)]. Initially guessed based on accuracy. Further confirmed by the novel Patron answer (silver bullet draft design). Even further confirmed by the LLM having no context for the lack of horsemanship.
      5 [(S)] or B. Initially guessed based on accuracy (unless the patron is trolling, or arguing that it is too powerful to fit in many commander decks without moving the deck away from the desired power level). Wow, the reasoning is making me reconsider. The S ranking said "any deck within its (Demonic Tutor's) colors (plural)". Why the implication of plural? There is also more redundancy in the S's reasoning. I am changing my mind.
      6 [(A)] or B. The LLM likes listing literally the same logic repeatedly. The Patron response was more novel.
      7 D or [(D)]. This one is tough. The left was more novel.
      Wow. I expected something like 55/45 odds there. Let's Go!
      8 [(S)] or B. I initially guessed based on accuracy, but the B has the novel response, so it must be the Patron. LLM wouldn't do that. And once again the LLM uses "decks within its colors" when talking about a mono-white card. Why the plural? Also, the card needs to fit within the deck's colors, not the deck within the card's colors.
      9 [(A)] or A. "Decks focused on defending against large attacks"? Also the Patron is once again the more novel answer.
      10 S or [(S)]. Redundant LLM response is redundant.

    • @Ent229 2 months ago +1

      After the 10 scores:
      Patron scores: SSABBBCCDD (5 different ranks. Somewhat biased towards B but really spread out otherwise)
      LLM's scores: SSSAAAACCD (4 different ranks. High bias towards S or A)
      Since my 10/10 accuracy was based on my reasoning of the LLM's limitations, I think it is soft evidence that my predictions about its limitations might be accurate.
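
The tallies in the comment above can be checked with a short script. This is a minimal sketch, not something from the original comment: the function name `summarize` and the dictionary keys are illustrative assumptions, and the two score strings are copied from the comment as written.

```python
from collections import Counter

# Rating strings copied from the comment above, one letter per card.
patron_scores = "SSABBBCCDD"
llm_scores = "SSSAAAACCD"


def summarize(scores: str) -> dict:
    """Count distinct ranks, the most common rank, and how many S or A grades appear."""
    counts = Counter(scores)
    mode_rank, mode_count = counts.most_common(1)[0]
    return {
        "distinct_ranks": len(counts),
        "mode_rank": mode_rank,
        "mode_count": mode_count,
        "s_or_a": counts["S"] + counts["A"],
    }


if __name__ == "__main__":
    print("Patrons:", summarize(patron_scores))
    print("LLM:", summarize(llm_scores))
```

On these strings the patrons come out at 5 distinct ranks with B as the most common grade (3 cards), and the LLM at 4 distinct ranks with A as the most common grade (4 cards) and 7 of 10 grades at S or A, which agrees with the summary in the comment.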

    • @Ent229 2 months ago

      Bonus Round? 1. [(B)] or C. The C had a novel response.
      Final thoughts: We already know ChatGPT does not try to evaluate cards, so it is not suited to evaluating cards. (Don't use a saw for a hammer's job.) Beyond its lack of motivation to judge cards, it does not understand the cards or their context enough to judge them. Additionally, we see its general answers as a clear marker of the LLM answer. It is trained to give a "reply-like" response that is a likely reply rather than a reply that is likely to be correct. Specificity and nuance are things it is trained to avoid.

    • @Ent229 2 months ago

      Your patrons' evaluations seem within the norm for commander players. They can mostly evaluate cards, and there is some subjectivity that makes the "surprising" evaluations still have merit.

  • @drunkcapybara7004 2 months ago +1

    Dang, I actually got the Mechanized Production one wrong as well. What threw me off was the mention of wasting 2-3 slots and getting "the combo", since there was no prior mention of which other slots are wasted for which combo, and these inconsistencies are a big problem with AI.
    I should have focused more on the same problem in the other text: the card being able to be "a riot" contradicts the D rating.

    • @thetrinketmage 2 months ago +1

      Yea it was such a weird response for that card

  • @nahboh1897 2 months ago +1

    I agree with the Demonic Tutor rating, not because it's a waste of space but because, as a tutor, it makes the deck too consistent, so the deck does the same thing every time, which makes it a less fun deck to play against.

    • @drunkcapybara7004 2 months ago

      Valid point, especially in casual settings and for decks with a very clear and not super varied gameplan.
      For example, my Kathril deck only really wants to fill the graveyard with keywords, and I took Entomb out of it because I would always tutor up Zetalpa, which made the deck play very monotonously (amplified by how terrible the precon is at filling its graveyard, so I took a lot of mulligans, but Entomb of course was always keepable). Now that I'm replacing a ton of cards soon, I think I might also cut Vile Entomber and Buried Alive, and rely exclusively on what I happen to mill/sacrifice.

  • @FranciscoJG 2 months ago +4

    Oooohh, surprise Snail participation :D

  • @l1ghr 2 months ago +1

    10:40 interesting option

  • @SwedeRacerDC 2 months ago

    Lord of Extinction: I was right from the grade alone
    Lightning Bolt: They had the same grade, so I guessed correctly based on the description
    Assassin's Trophy: I wasn't sure on the grade, because I don't use it in 5C decks typically, but the description was obvious to me.
    Taoist Mystic: Obvious from the grading.
    Demonic Tutor: I honestly don't love using tutors that much, but I was wrong on this one. I think it's an A, right in the middle.
    Panharmonicon: I'm correct...Chat GPT is just stupid at this point. Lol
    Mechanized Production: Same grade, so had to guess based on description. Both descriptions were wild... But I was right. I think it's a C. It's fun and can win on the spot, especially now that we have Obeka, but even with extra turns.
    Smothering Tithe: I needed the description on this one, but got it right. I still think it deserves a better grade than the human gave it.
    Ink Shield: I lost to this card. It's great. You will likely win if everyone else has been eliminated. I was right from the description.
    Tropical Island: The description helped. Right again.
    Forcefield: I was right and that's an interesting card. Of course it's on the reserved list.
    Chat GPT is fairly easy to sus out. But it's still interesting to see.

  • @leax1337 2 months ago +2

    I recently built a deck with ChatGPT as well. The cards were so random that I had to put it into a power level calculator because I didn’t understand the deck myself, and it put out a 10 for some reason. ChatGPT always tried to put Rhystic Study in the deck xD
    (It was green-black)

    • @thetrinketmage 2 months ago +1

      That’s funny maybe I’ll need to try that too

  • @Demoncoregobrrr 2 months ago +1

    rad, got recommended your work early

  • @AutumnReel4444 2 months ago +3

    Yeahhh very not hard to guess. AI ain't killin us yet

  • @BS-bv5sh 2 months ago +3

    I enjoy your content.

    • @thetrinketmage 2 months ago +1

      I’m glad! I know this one is a bit different so I’m happy you like it

  • @v3rsatile_V3 2 months ago +4

    tbh instead of running Demonic Tutor permanently, you should run it until you play it, then whatever you search for, just get another version of that effect. If you search for a boardwipe, put another in the deck. Simple really

    • @thetrinketmage 2 months ago +7

      I do like this idea, though the flexibility of a tutor I think makes it worth it!

    • @Jacob-km4yb 2 months ago

      What he said. The flexibility to get, let's say, a board wipe OR a single-target removal spell because you have a big board presence makes it way better imo

    • @v3rsatile_V3 2 months ago

      @Jacob-km4yb . . . I know

  • @anabsurdlylongnameme8948 2 months ago

    What version of chatgpt did yall use? 3.5 is terrible, 4 is great but behind a paywall. If yall used 4, did u put any additional reference info in?

  • @epi1763 2 months ago +1

    Next time, ask ChatGPT to write like a normal person or dumb it down, and feed it other people's reviews so it writes in a similar context

  • @CD-sl7ld 2 months ago +4

    I love you

  • @robertomacetti7069 2 months ago

    to be fair to ChatGPT
    it never played Commander, and freaking out over Lord of Extinction is a classic noob mistake

  • @hoffedemann5370 2 months ago +1

    "highly desirable" "in its colors" "extremely valuable" "versatility" "particularly those in XYZ strategies" are dead giveaways.
    Also Ai do be yappin' with way too eloquent words all the time

  • @JustinNovack 5 days ago

    If both answers had then been (re-)summarized by ChatGPT, it might have removed the obvious bias inherent in the verbiage and language used by ChatGPT. Clear prompt reiteration from ChatGPT and "I built a deck..." phrasing from the humans made this not much of a game.

  • @orobors 2 months ago +1

    Personally, I think Demonic Tutor is a B or even C in most casual metas. If I were to pull out a Demonic Tutor, I'd probably get focused on because my playgroup doesn't run $50 cards unless we're proxying high power or cEDH. In a lot of games, Demonic Tutor is just too focused/good to be worth slotting in, since it gets people to target you.

    • @thetrinketmage 2 months ago

      Interesting I often think of it as a charm effect. I don’t know if I’ve ever been explicitly targeted because of it

  • @Raghetiel 1 month ago

    What's funny: ChatGPT learned to talk about MtG from real people's chats. So if you're gonna blame anyone, blame Reddit)

  • @ellie6091 2 months ago

    boooooo. AI is dumb, and you shouldn't be feeding it more data.

    • @thetrinketmage 2 months ago +5

      Me making articles or videos feeds it data. Not me asking questions. Just asking questions isn’t really training it

    • @Ent229 2 months ago

      I would not be surprised if the questions are saved as more raw data to feed it later.