ramoz a month ago

FYI: For Flux, there is a lot more power in the text encoder, and you can prompt with more meaningful, comprehensive sentences - so less of the traditional concise, comma-separated phrasing we saw with Stable Diffusion.

You should do the same with your training images: caption everything you do not want the model to learn as part of "you" (what you're doing, wearing, who you're accompanied by, accessories, etc.).
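
For example, a caption for one training image might look like this (just a sketch - "TOK" stands in for whatever trigger word your trainer uses):

    A photo of TOK wearing a grey hoodie and sunglasses, sitting at a
    kitchen table holding a coffee mug, next to a golden retriever

The hoodie, sunglasses, mug and dog get explained away by the caption, so the model is left to attach only the face and identity to TOK.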

  • AuryGlenz a month ago

    You can do that, but you can also just caption with your name (or, as I do, two prompts - one just the name, the other "A photo of _name_", since that makes it easier to prompt non-photographic images). This has the bonus of not training in that particular shirt or whatever, so if you prompt for something similar it won't be identical.

    That said, I usually train with about 100 really varied images of the person, so it tends not to overlearn any one particular thing.
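
    If you go the name-only route, the caption files are trivial to generate - a rough sketch, assuming the common one-.txt-per-image sidecar convention (the folder path and trigger phrase are made up):

        import pathlib

        trigger = "A photo of _name_"   # hypothetical trigger caption
        for img in pathlib.Path("train_images").glob("*.jpg"):
            # Sidecar caption file: same filename, .txt extension
            img.with_suffix(".txt").write_text(trigger)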

isoprophlex a month ago

I did this for our beloved, dead cat... on Replicate, too. I loved the results, until at one point I suddenly got really creeped out by the thing I was doing.

  • ryandvm a month ago

    This is going to be big business, I think. I have probably sent hundreds of thousands of emails, texts, chats, etc. It would be well within the realm of possibility to train an LLM on a loved one's communications corpus and let you chat with "them" after they're gone.

    Possible? Yes. Convincing results? Probably. Good idea? I doubt it.
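
    Mechanically it's not even hard: most fine-tuning APIs just want conversation transcripts. A sketch of preparing the data, assuming OpenAI-style chat-format JSONL (the message pair shown is invented):

        import json

        # Hypothetical: exported (message, reply) pairs from a chat history
        pairs = [
            ("Hey dad, how was your day?",
             "Oh, you know me. Fixed the fence, complained the whole time."),
        ]

        with open("train.jsonl", "w") as f:
            for user_msg, reply in pairs:
                f.write(json.dumps({"messages": [
                    {"role": "user", "content": user_msg},
                    {"role": "assistant", "content": reply},
                ]}) + "\n")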

    • mipmap04 a month ago

      Oh man, I did this with my dad's voice after he died and set up a thing where I could talk with an LLM-backed assistant and have it respond in his voice and mannerisms. It was a very weird coping and grief period and I ultimately hit a point where I got really weirded out about what I was doing.

    • portaouflop a month ago

      I think that was 1:1 a Black Mirror episode

    • slig a month ago

      I remember seeing here on HN that someone did that with a group chat, and it would reply as each friend.

    • waspleg a month ago

      Literally a Black Mirror episode.

    • knicholes a month ago

      This is exactly what I'd want to do for my "smart urn."

      • TeMPOraL a month ago

        Code golf task: implement the whole pipeline above in the minimum number of (currently existing) ComfyUI nodes.

        Extra challenge: extend that to produce videos (e.g. via "live portrait" nodes/models), to implement the digital version of the magic paintings (and newspaper photos) from Harry Potter.

        EDIT:

        I'm not joking. This feels like a weekend challenge today; "live portraits" in particular run fast on a half-decent consumer GPU like my RTX 4070 Ti (the old one, not the Super), and I believe (but haven't tested yet) that even training a LoRA from a couple dozen images is reasonably doable locally too.

        In general, my experience with Stable Diffusion and ComfyUI is that, for a fully local scenario on a normal person's hardware (i.e. not someone's "totally normal" PC that happens to have eight 30xx GPUs in a cluster), the capabilities and speed are light years ahead of the LLM space.

        Just for comparison, yesterday I - like half the techies on the planet - got to run some local DeepSeek-R1. The 1.58-bit dynamic quant topped out at 0.16 tokens per second, i.e. about 6 seconds per token - roughly the time it takes an SD1.5 derivative to generate a decent-looking HD image. I could probably get them running in parallel in lock-step (SD on GPU, compute-bound; DeepSeek on CPU, RAM-bandwidth-bound) and get one image per LLM token.

        • czue a month ago

          Can you explain more about ComfyUI? I heard it could work for running inference locally, but I couldn't get it running because I don't have Nvidia GPUs. Does it only work if you have those?

          • TeMPOraL a month ago

            https://github.com/comfyanonymous/ComfyUI

            I only use it on Windows with an Nvidia GPU, but it should work on both Windows and Linux with CPU only or with Intel GPUs, as well as on Linux with AMD. Though skimming the README some more, I also see an Apple Silicon section, and one called "DirectML (AMD Cards on Windows)", so maybe AMD+Windows works too.

            As for use: you install ComfyUI from the link above, and then this:

            https://github.com/ltdrdata/ComfyUI-Manager

            to have a UI for searching and downloading custom nodes (instead of having to install them by hand), and you're good to go.
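
            Once it's running (the server listens on http://127.0.0.1:8188 by default), you can also drive it headlessly - a minimal sketch, assuming you've exported a workflow through the web UI's "Save (API Format)" option:

                import json, urllib.request

                # Workflow exported from the web UI via "Save (API Format)"
                with open("workflow_api.json") as f:
                    workflow = json.load(f)

                req = urllib.request.Request(
                    "http://127.0.0.1:8188/prompt",   # ComfyUI's default endpoint
                    data=json.dumps({"prompt": workflow}).encode(),
                    headers={"Content-Type": "application/json"},
                )
                print(urllib.request.urlopen(req).read().decode())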

      • mystified5016 a month ago

        Forget an urn, I want my digital ghost to haunt a furby.

    • numpad0 a month ago

      Is it going to be good for your sanity? I very much doubt it.

    • oskarkk a month ago

      This reminds me of paintings in Harry Potter.

petercooper a month ago

Replicate does make this particularly easy while still being somewhat developer-focused. I've used it for a few people in our group chat so we can make silly in-joke memes and stuff, and the results are quite stunning. Replicate then offers the model up over a simple API (shown in the post) if you wanted to let people generate right from the chat, etc. Replicate is worth poking around a bit more broadly, too; they have some interesting models on there (though the pricing tends not to be very competitive if you were going to do it at scale).
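
For reference, calling a trained model is only a few lines with their Python client (a sketch - the model name is a placeholder for whatever your training run produced, and it assumes REPLICATE_API_TOKEN is set in the environment):

    import replicate  # pip install replicate

    # "yourname/flux-yourname" is a placeholder for your own trained model
    output = replicate.run(
        "yourname/flux-yourname",
        input={"prompt": "A photo of TOK riding a bike through Tokyo"},
    )
    print(output)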

manishsharan a month ago

This is fantastic, but now you need to train a model to distinguish AI-generated images from actual photos. Then, of course, a model to beat the detector model, and then a model to catch the model that beats the detector model, and so on.

Thank you from people holding NVDA.

  • beng-nl a month ago

    You may have re-invented GANs :-)
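
    For the uninitiated: that's exactly the GAN setup - a generator G and a discriminator D trained against each other in a single minimax objective (Goodfellow et al., 2014):

        \min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]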

thefourthchime a month ago

I did this a while back, though it was pictures of my wife in lingerie.

- I asked Grok to generate a list of racy prompts.
- Had Replicate generate them via script. About 10-20% were very poor; I filtered those out manually.
- Replicate also has NSFW guardrails, but a simple retry or some word juggling gives you a chance to get around them.

I think I spent $10.
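
The retry part is trivially scriptable - a rough sketch with Replicate's Python client (the model name is a placeholder, and exactly how a given model signals an NSFW rejection varies, so treat the exception handling as an assumption):

    import replicate
    from replicate.exceptions import ModelError

    def generate(prompt: str, retries: int = 3):
        # NSFW-filter trips are often borderline; a plain retry (or a
        # reworded prompt) frequently slips through on a later attempt.
        for _ in range(retries):
            try:
                return replicate.run(
                    "yourname/flux-model",   # placeholder model name
                    input={"prompt": prompt},
                )
            except ModelError:
                continue
        return None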

  • Der_Einzige a month ago

    There is a parallel "underground" AI research world of stuff like this, with its hub on civit.ai instead of Hugging Face.

    Often the innovations from that world are ahead of mainstream AI research by years. You should see what coomers did for LLM sampling to get over issues with "slop" responses, just for their own pervy interests - a full several years before the mainstream crowd ever cared.

    • ok_dad a month ago

      Porn has always pushed the boundaries of media on the internet. I don't know why people are surprised! Since sex is something nearly everyone does, it makes sense that a lot of human progress has been the result of trying to integrate sex with whatever new tech is out there at the time. I am sure a hundred years ago some inventors were pushing the boundaries of motors in sex toys, and in another hundred years some other inventor will be pushing the boundaries of putting sex in holograms.

    • DrSiemer a month ago

      It's kind of annoying that some of the best models out there have a tendency to produce very not-safe-for-work results.

      Look mom, I can make some cool astrology images for you! Whoops, that's boobs. That too. And this one. Ehh, hold up, I need to add a pile of negative prompts first...

      • wongarsu a month ago

        Sketching nude humans is a huge part of how human painters learn, because, surprisingly, clothed humans are just nude humans with some fabric over them - and the fabric can make it harder to tell what's going on.

        Even if we assumed equal amounts of effort, it wouldn't be surprising if a large corpus of nude images in the training data improved model results.

        But maybe we should have better negative-prompt presets for different levels of decency.

ge96 a month ago

What I want is to be able to feed in a bunch of videos and generate an animatable (talking) 3D face from that data. In theory, I suppose you only need 3 images (front and both sides), but mapping pixels to motion (facial expressions) is the interesting part.

There wouldn't be depth data, so depth would have to be inferred from shadows.

  • timdiggerm a month ago

    Why do you want to do that?

    • ge96 a month ago

      My case is not directly nefarious - for example, taking the content of a popular old YouTuber who streamed in the early 2000s and making a model of them for personal use, like a 3D chatbot with that person's quirks.

      Edit: when I say "nefarious" I mean you could use this tech to impersonate someone (e.g. for political reasons), but my case is more the creeper type: cloning someone for personal use, e.g. Replika.

      Tangent: the Hololive VTuber industry is interesting, since they build up these characters with a unique persona/theme and then people follow that specific model. They could make themselves into an AI easily since it's a rigged 3D asset, but of course it would be boring compared to the real thing.

      • GaggiX a month ago

        >they could make themselves into an AI easily since it's a rigged 3D asset but of course it would be boring compared to the real thing

        The most popular VTuber on Twitch is an AI, tho.

        • ge96 a month ago

          You talking about Neuro-sama? I haven't kept up with it in a bit.

          I'm not sure if that's truly AI, since the Turtle drives her.

          Edit: if the source were open, I'd believe it.

          • GaggiX a month ago

            >I'm not sure if that's truly AI

            It has always been an LLM. There is no human typing at insane speed into the TTS.

            • ge96 a month ago

              I'm referring to live interception of messages, which I guess has to be done to be compliant with Twitch's terms - there is a human there.

              Edit: but yeah, the fact that so many people interact with her shows that generated content can keep people occupied.

deadbabe a month ago

I’m imagining something where an influencer trains an AI to make and post images of themselves on social media; then the influencer dies, but the AI keeps going forever.

  • ge96 a month ago

    The impact is kind of interesting: how do you know someone's legit - the person doing base jumping or whatever?

    Thanos/NFTs: where did that take you? right back to me

    Thinking of hardware with a built-in chain interface for proof.

    Oh man, dating apps too.

    That's true love though: two people meet up IRL and they're both like, wtf, who are you?

m463 a month ago

I had set up automatic1111 a while back, and I believe the web UI lets you give your image generation a starting image. It's kind of fun to get a cartoon of yourself based on a photo.

njx a month ago

Thank you for sharing. Is there any model that can be trained to convert pictures into cartoons or flat vector illustrations?

DoodahMan a month ago

Is something like this possible to do with video yet?

  • dvrp a month ago

    You can use providers like Kling, or platforms like krea.ai, to do consistent characters from one frame.

  • AuryGlenz a month ago

    Yeah, with Hunyuan Video.