Part of the point here is that it's still better than the norm; better than what the majority will settle for. What is Good Enough and More Convenient will always win out - and it's better than good enough, and it's hard to get more convenient than "insert some lines of text, get song." Additionally, much of the AI music I'm referring to is in styles of yesteryear: surf rock, classic rock, 80s new wave, hair metal, big beat, grunge, etc. - far more complex than what the 2010s onward have produced. AI music has already grasped that and more.
Vtubers will be more difficult, especially as they'd combine more AI elements than just speech. However, the advancements I'm seeing in speech, video, text, and response tell me it will take only a few more years to get them working at an acceptable level. Do not take Neuro-sama (bless her metal heart) as an indication of where such AI will be in a few years, nor as the only indicator of the technology (despite the scene basically only having Vedal doing this sort of thing - I see that changing by next year).