
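I came across a good prompt, so I tried it out with ChatGPT.
Over the past couple of days, AI images on the theme of "running into a cosplayer while eating hot pot" have been circulating in QQ groups, such as this one of 遐蝶 (Castorice):
It came with the prompt (the images are generated with ChatGPT):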
A casual iPhone snap of a female cosplayer perfectly replicating the character from the reference image.
EXTREMELY STRICT identity match: same hairstyle, same twin tails, same hair color gradients, same hair accessories, same elf ears, identical outfit design, and the same overall character appearance. The face’s proportions and features must be exactly consistent with the original character’s – it must be clearly recognizable as that character, not just a pretty face with similar features.
Face: Very attractive but natural-looking; features a beauty style that combines realism with an anime-like quality. Pores are visible but slightly smoothed out. Light, natural makeup is applied. Soft blushes are worn on the cheeks and nose. The skin appears slightly oily and shiny due to the heat from eating hot pot. There’s also a slight redness from the warmth, along with minor imperfections like slight sweating and uneven texture. In short, it’s not overly perfect, but still beautifully natural.
Expression: She notices the camera, slightly turns her head toward it, and makes a small, casual gesture—like a quick peace sign or a slight smile. Her expression remains natural, not forced. Her eyes briefly meet the camera’s gaze, but it doesn’t seem like a deliberate pose. It feels like a quick, friendly response, not something done for a photo shoot.
Hair: Slightly messy due to heat and movement; a few strands sticking to the face or neck. There’s natural motion blur in the hair, and it appears slightly flattened from sitting.
Outfit: Perfectly faithful to the original design. The fabric is authentic, with realistic wrinkles. There are slight signs of wear from sitting and eating; some small details have also shifted slightly from their original positions.
Pose: Slightly turn the body toward the camera. One hand continues to hold the chopsticks or rest on the table. The other hand makes a casual gesture, such as making a peace sign or waving lightly. The body posture should be relaxed and natural. This isn’t a formal pose, but rather a quick, spontaneous reaction.
Scene (IMPORTANT): At the Haidilao hotpot restaurant, the cosplayer is sitting at a different table—either next to or diagonally across from the original table. She’s seated near a wall or in a booth-like area. The background is relatively uncluttered, with wall panels or mirrors in view. She’s still eating with her own group.
Environment: Clean dining area near the wall, with soft wall lighting. Light steam rises from the hotpot. The table is prepared with meat dishes, drinks, and sauces.
Framing (VERY IMPORTANT): It looks like the image was taken from your own database. The subject isn’t centered in the frame. The image is slightly zoomed in, and the cropping is awkward. Part of the subject’s body is cut off.
Foreground (EXTREMELY IMPORTANT): Your own table takes up most of the foreground space. The hotpot, soup, chopsticks, and plates are clearly visible. The edge of the table blocks part of the lower frame. Your arm or shoulder partially obstructs the view. Another diner also partially blocks the frame. The foreground is slightly out of focus.
Camera: Poor composition, slightly tilted image, mild motion blur, slightly inaccurate focus, visible noise, JPEG compression artifacts (typical of WeChat images). The lens also has smudges and oily blur, with a finger partially covering one of the corners of the image.
Lighting: Mixed indoor lighting (warm yellow + soft white); exposure is slightly uneven. The lighting on the walls is softer than that in the central hall. There are reflections on the skin and table surfaces, with steam dispersing the light.
Enhanced realism: slight steam drifting in front of the subject, subtle background motion blur, and minor occlusions where objects like a cup, arm, or chopsticks block part of the view.
Mood: You’re eating normally when you notice a highly skilled cosplayer at another table. You zoom in to take a photo of her. She notices and responds with a friendly gesture in a natural way. It’s a spontaneous interaction, but it doesn’t seem staged at all.
Style: Raw iPhone snapshot – no professional editing or staging involved. A natural, candid look with slight interaction between the subjects. There’s also a playful atmosphere, as if they’re aware they’re being photographed, but still remain relaxed and natural.
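I uploaded an illustration of Saber in casual clothes as the reference image. The first result:
I noticed a few problems. For example, the character this prompt generates has elf ears most of the time (and they sit on top of human ears, which looks odd). The people sitting nearby also often steal the shot (although they are there to add realism). So I asked ChatGPT to make some changes. You can also simply append the following to the end of the original prompt: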
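Give her ears the appearance of normal human ears rather than elf ears.
Keep only her in the shot; no other people should appear around her.
Place her in the center of the frame.
Slightly reduce the oily shine on her facial skin and tone down the sweating.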
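If you prefer to assemble the final text in a script before pasting it into ChatGPT, the "append to the end of the original prompt" step can be sketched in Python as below. This is only an illustration: the function name `append_amendments` is made up for this example, and `base_prompt` is a truncated placeholder standing in for the full hot pot prompt.

```python
def append_amendments(base_prompt: str, amendments: list[str]) -> str:
    """Return base_prompt followed by a blank line and one amendment per line."""
    # Drop empty entries and stray whitespace so the result stays tidy.
    cleaned = [a.strip() for a in amendments if a.strip()]
    if not cleaned:
        return base_prompt
    return base_prompt.rstrip() + "\n\n" + "\n".join(cleaned)


# Placeholder (assumption): paste the full hot pot prompt here instead.
base_prompt = "A casual iPhone snap of a female cosplayer ..."

# The four corrections listed above, one per line.
amendments = [
    "Give her ears the appearance of normal human ears rather than elf ears.",
    "Keep only her in the shot; no other people should appear around her.",
    "Place her in the center of the frame.",
    "Slightly reduce the oily shine on her facial skin and tone down the sweating.",
]

final_prompt = append_amendments(base_prompt, amendments)
print(final_prompt)
```

Keeping the corrections in a separate list leaves the original prompt untouched, so individual corrections can be dropped or reworded between attempts.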
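The result after these changes is decent:
I also had ChatGPT generate a scene of Saber feeding me cake, again in casual clothes. I used the image generated above, together with the casual outfit, directly as reference images:
The result:
I wrote the prompt myself, and it is not professional. I also found that AI image generation is still not very easy to use: many details have to be spelled out very clearly. Take the face-to-face cake-feeding scene. A human brain readily infers the spatial relationships and the character's movements, but for an AI you still have to describe everything carefully, otherwise even the character's facing direction and actions can come out wrong. So to get images that meet the requirements, look good, and do not have an obvious AI style, you need a very detailed prompt like the hot pot one. That is still too hard. On top of that, the illustrations ChatGPT generates now tend to look overly busy (the folds in clothing in particular are rendered far too prominently).
The prompt is below. Feel free to try it or improve it, and if you generate a good-looking image, share it in the comments.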
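Generate an anime-style illustration of Saber (Artoria) for me:
Art style: Cute moe style, an illustration close to anime style, with bright, vivid colors and clean, flowing lines. The illustration style is strong: lines and brushstrokes are visible, giving it a hand-drawn feel. Clean coloring. It should look like a professional anime illustration with high-quality detail and expressiveness.
Outfit: Saber wears casual clothes (home wear); refer to the image I uploaded for the clothing.
Scene: At a small indoor dining table. Imagine Saber and me sitting face to face on opposite sides of the table while she feeds me cake.
Viewpoint: I do not need to appear, because the image is from my first-person view, so only Saber needs to be shown. She sits facing me directly (which also means facing the camera), with her upper body in frame.
Camera position: Saber sits at one end of the table and I (the camera) sit at the other end. We sit opposite each other, face to face, neither of us turned sideways.
Action: Saber holds a plate with a small cake in her left hand and uses a fork in her right hand to feed a piece of cake toward my mouth. She is in a cheerful mood, looking at me with a smile. Both of us lean forward slightly, so the camera needs to be very close to Saber. Make sure the pose is plausible: Saber's right arm should reach toward the camera (me) so that it looks like she is feeding me. Saber must not look like she is eating the cake herself. The fork in Saber's hand should extend forward, with the tip of the fork closest to the camera.
The dining table is off-white with a premium feel. Some small snacks, desserts, and the like can be placed on the table as decoration.
Lighting: Soft indoor lighting, with a slight fill light on Saber's right side to bring out her facial expression and movement.
Mood: The overall atmosphere is bright and cozy.
Aspect ratio: Use 4:3 so that Saber's upper body is fully visible and not cropped.
Notes:
- In the generated image, "my" body never appears at all; only Saber appears.
- Be sure to understand the camera position accurately. Do not change the camera position.
- Add appropriate details to make the image more convincing, such as the background and the food on the table.
- Everything in the foreground must be sharp, not blurred. The fork in particular might end up blurred because it is close to the camera, but this is an illustration, so you need to keep the fork and all other foreground content sharp.
- Reduce the AI look of the generated image so that it feels closer to something genuinely hand-drawn. Keep the perspective of objects correct. Details must be plausible (for example, the cake on the fork must not look misaligned or floating). Different objects should have distinct textures; do not make everything look like the same material, and above all avoid a "smooth plastic" look.
- Do not put too many folds in the clothing. Keep the folds simple and the shadows light; do not render them too realistically or in too much detail, so that the hand-drawn feel is preserved.
- Make sure the fingers and fingernails have correct perspective and correct shapes. The fingers should be fairly fair and slender.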
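The prompt above follows a regular shape: one request line, a series of "Label: text" sections, and a list of notes that grows each time a generated image shows a new problem. A minimal Python sketch of that shape is below. The helper `build_prompt` and its parameters are invented for this illustration, and only a few of the sections and notes are reproduced.

```python
def build_prompt(request: str, sections: dict[str, str], notes: list[str]) -> str:
    """Assemble a prompt: request line, 'Label: text' sections, then a notes list."""
    lines = [request]
    # Dicts preserve insertion order, so sections appear in the order written.
    lines += [f"{label}: {text}" for label, text in sections.items()]
    if notes:
        lines.append("Notes:")
        lines += [f"- {note}" for note in notes]
    return "\n".join(lines)


request = "Generate an anime-style illustration of Saber (Artoria) for me:"
sections = {
    "Mood": "The overall atmosphere is bright and cozy.",
    "Aspect ratio": "Use 4:3 so that Saber's upper body is fully visible.",
}
notes = ["In the generated image, my body never appears; only Saber appears."]

# A problem spotted in a generated image becomes one more note,
# and the prompt is rebuilt for the next attempt.
notes.append("Keep the folds in the clothing simple, with light shadows.")
prompt = build_prompt(request, sections, notes)
print(prompt)
```

Separating the sections from the notes makes it easier to see which requirements were part of the original idea and which were added to correct a specific failure.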
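Actually, what I wrote at first was fairly simple, but the generated images kept having problems, so I had to add requirements again and again, and the prompt kept growing. So for now, AI image generation is still a hassle.
This is just like a product manager saying "add a small feature, have it done today."
You have a picture in your head, but even a real person sitting across from you cannot see it; you still have to fill in the details yourself.
Might as well wait for a brain jack in the back of the skull (?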