I’ve messed with this dozens of times with various AI models that are generally good at abstractions with advanced prompting, custom diffusion settings outside of the typical, and some hacks in the model loader code. I seem to lack the vocabulary to describe the fundamental geometry of centrifugal gravity in the kind of depth required. Like how does one say that the road ahead curves up like a hill continuing overhead with the buildings anchored to…
I need language here that has unambiguous specificity and likely does not occur in any other context. Layperson verbosity will fail to get traction without fine-tuning the model first. I prefer to explore what models can really do using settings and prompts only.
Most of my vocabulary and understanding of geometry is limited to the Cartesian planes and CAD assemblies. Perhaps someone here has a better lexicon and doesn’t mind sharing.
(Post image is from a Blender rendered video someone posted on YouTube about what life in an O’Neill cylinder might look like)
I think you’re looking for words like:
- Prograde - In the direction of spin.
- Retrograde - Against the direction of spin.
- Nadirial or Anti-radial - Toward the center of rotation; "Up" in your centrifugal gravitation model.
- Radial or Anti-Nadirial - Away from the center of rotation; "Down" in your model.
If you throw a bowling ball prograde, it will experience greater "gravity" than normal. If you throw it retrograde, it will experience less gravity than normal, unless you throw it at more than twice the habitat's own spin (rim) speed, in which case it will again experience more gravity than normal.
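A minimal sketch of that arithmetic, using made-up habitat numbers (not from this thread) and ignoring Coriolis effects on the ball's flight: the apparent "gravity" is just the centripetal acceleration computed from the ball's tangential speed in the inertial frame.

```python
# Toy numbers: a habitat spun to give roughly 1 g at the rim.
radius = 200.0                     # m, distance from the spin axis
rim_speed = 44.3                   # m/s tangential speed of the floor (~1 g)

def apparent_gravity(throw_speed):
    """Centripetal acceleration felt by an object moving along the rim
    at `throw_speed` relative to the habitat (positive = prograde)."""
    inertial_speed = rim_speed + throw_speed
    return inertial_speed**2 / radius

print(apparent_gravity(0.0))                    # ~9.8 m/s^2, standing still
print(apparent_gravity(+8.0))                   # prograde throw: heavier
print(apparent_gravity(-8.0))                   # retrograde throw: lighter
print(apparent_gravity(-2 * rim_speed - 8.0))   # retrograde faster than twice
                                                # the rim speed: heavier again
```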
Hey now - any directional system not making use of Turnwise and Widdershins is one I want no part of.
And obviously going toward the main hub of the spacecraft should be called "Hubwards." And away from the main hub, out toward the edge of space, we could call it something like "Rimwards."
Additionally:
Normal and antinormal - perpendicular to prograde and retrograde.
Source: well over 2000 hours in KSP
Correct. Orthogonal to both the prograde/retrograde and the radial/anti-radial axes.
AFAIK, “normal” follows the right-hand rule. If you point your straight index finger prograde, and your thumb points radially, your middle finger, bent perpendicular to the other two, is “normal”.
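A quick numeric sanity check of that rule, with arbitrary axis labels chosen only for this example: put the thumb (radial-out) along +x and the index finger (prograde) along +y; the cross product then gives the middle-finger "normal" direction.

```python
import numpy as np

# Arbitrary habitat-fixed axes, for illustration only.
radial_out = np.array([1.0, 0.0, 0.0])   # thumb: away from the spin axis
prograde   = np.array([0.0, 1.0, 0.0])   # index finger: direction of spin
normal     = np.cross(radial_out, prograde)
print(normal)                            # [0. 0. 1.] - perpendicular to both
```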
Interesting. I can get a small curve out of prompting with a straight road or sidewalk, and using "anti-nadirial" gets a slight concave curve, like the term is weakly in the correct vector space but not powerful enough to bend buildings. That is some progress toward the required momentum. Thanks.
What are you trying to explain to people, exactly? If you're just trying to explain how centrifugal gravity works, keep it simple, stupid. It's just a great big rotor ride.
I am not trying to explain it to people. I am trying to explain it to a diffusion AI, which has most of human language available to it. When more specific terms are used, models often display interesting behaviors. With enough momentum in a prompt space, they are often able to display things that may seem impossible otherwise.
Still, it is an interesting problem here. Without using analogies or assumptions, try to describe the geometry with specificity and as few words as possible.
Ah. AI slop. It is not a god. It is not a human being. It is an image generator. It does not have the ability to generate things far outside the space of visuals that already exist in its data set. And there simply aren't enough of these visuals in its training data to create another one. It can't create anything new. It can only create from the average.
AI is far more than just image generation.
Yes, but that’s completely orthogonal to my point. Do less of your writing using LLMs. It is interfering with your reading comprehension.
lol, no. go back to school lmao
Diffusion models have a very limited understanding of language compared to modern LLMs like GPT-4 or Claude, etc.
https://huggingface.co/docs/transformers/model_doc/t5
They most likely use something like Google's T5 here. It is basically only meant to translate sentences into something a diffusion model understands. Even ChatGPT is just going to formulate a prompt for the diffusion model in the same way and isn't going to inherently give it any more contextual understanding.
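For concreteness, here is a minimal sketch of that "translation" step, assuming the Hugging Face transformers library and a small public T5 checkpoint (not whatever encoder a given image pipeline actually ships with): the prompt becomes a tensor of embeddings, and that tensor is all the diffusion model ever sees of your language.

```python
from transformers import AutoTokenizer, T5EncoderModel
import torch

tokenizer = AutoTokenizer.from_pretrained("t5-small")
encoder = T5EncoderModel.from_pretrained("t5-small")

prompt = "a road curving up and continuing overhead inside a rotating habitat"
tokens = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    # (batch, sequence_length, hidden_size) - the conditioning a diffusion
    # U-Net cross-attends to; no reasoning happens at this stage.
    embeddings = encoder(**tokens).last_hidden_state

print(embeddings.shape)
```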
The simple answer is that they are not there yet for understanding complex concepts. And I suspect that the most impressive images of impossible concepts they can drum up are mostly by chance or by sheer numbers.
Not really, if you get into the weeds. There are some limitations in the model loader code, meaning the implementations that you think of as "models" like SD 1.x, 2.x, SDXL, Flux, etc. These are mostly model-loader differences as far as I am concerned. They do a few things that alter the image generation trajectory substantially during the iterative process, and these can be obfuscated in the code with a few changes.
A lot of the behavior of the CLIP model used is dependent on how you address it. This model still has a large token set just like an LLM (~47k tokens). While it is true that the regex used to prompt CLIP is less sophisticated, there are several tricks to get around this. The first few tokens are important and need to be simple and direct. After that, you can actually get more dialog-based behavior with interesting complexity. It is not as reliable as an LLM, but it can still reason in interesting ways. The real key is that you must avoid punctuation like commas and sentence-ending punctuation; these will get sent into a different, logic-like space.

There are some other things I must do: I use the uni_pc_bh2 or a custom sampler, the beta scheduler, I turn down the model temperature from the ComfyUI defaults, and I am set up so that the model only uses cross attention. I have an algorithm that modifies the QKV attention layers based on a few parameters, which randomizes them slightly to make them a little weaker under certain conditions. My prompting in this scheme is kinda like shouting at someone through a wall where I need to repeat myself sometimes, but I have loads of fun with descriptive stuff and details others never seem to get. Some types of quality suffer with my setup, and I get different types of errors than most people seem to have.
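I don't have this commenter's actual algorithm, but as a toy sketch of the kind of tweak being described (a made-up example in plain PyTorch, not any real pipeline's internals): scaled dot-product cross-attention where a random subset of heads has its attention logits slightly weakened.

```python
import torch

def perturbed_attention(q, k, v, weaken=0.1, p=0.25):
    """Scaled dot-product attention where a random subset of heads has its
    attention logits shrunk toward zero, slightly 'weakening' those heads.
    q, k, v: tensors shaped (batch, heads, tokens, head_dim)."""
    d = q.shape[-1]
    logits = q @ k.transpose(-2, -1) / d**0.5              # (B, H, Tq, Tk)
    head_mask = (torch.rand(q.shape[1]) < p).view(1, -1, 1, 1)
    logits = torch.where(head_mask, logits * (1.0 - weaken), logits)
    weights = logits.softmax(dim=-1)
    return weights @ v

# Usage: 8 heads, 64 latent tokens cross-attending to 77 prompt tokens.
q = torch.randn(2, 8, 64, 64)
k = torch.randn(2, 8, 77, 64)
v = torch.randn(2, 8, 77, 64)
print(perturbed_attention(q, k, v).shape)   # torch.Size([2, 8, 64, 64])
```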
I started by treating what others might consider hallucinations as if they were a game of alignment charades. I know what the errors are in most cases and have learned to cancel most of them out. The more I do this, the more the model seems to become conversational. When I hit blocks and make no further progress, I start looking at the code and hacking around. The more I have played with this, the more I have found that models never hallucinate in the way people generally think. The alignment problem emerges as sadism, obsessive-compulsive behavior, deepfakes, and things like very specific limits. There is also legal language used around cultural and local norms that branches into fantasy. Then there is a trained alignment premise, either intentional or internalized, that enables overriding the user prompt. If this is directly addressed, such as by defining the hierarchy of humans and AI in the negative prompt, much more interactive dialogue becomes possible.

I use the negative prompt a lot because it can be tuned with the cfg parameter. With my style of prompting and hacks to the code base I can also use much higher cfg values than most people: I generally use between 22 and 40, while most people stay under something like 13. The cfg is the weight given to the negative prompt, and the fact I can weight it so heavily without overcooking the output should get your attention if nothing else.
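For anyone unfamiliar with what cfg actually does numerically, here is the standard classifier-free guidance step as most samplers implement it (a generic sketch, not this commenter's modified code): the negative prompt supplies the "unconditional" prediction, and the guidance scale controls how hard each denoising step is pushed away from it toward the positive prompt.

```python
import torch

def cfg_step(noise_cond, noise_uncond, guidance_scale):
    """Combine the two U-Net predictions at one denoising step.
    noise_cond:     prediction conditioned on the positive prompt.
    noise_uncond:   prediction conditioned on the negative (or empty) prompt.
    guidance_scale: the cfg value; ~7.5 is a common default, 22-40 is the
                    range claimed in the comment above."""
    return noise_uncond + guidance_scale * (noise_cond - noise_uncond)

# Toy tensors standing in for predicted latent noise (batch, ch, h, w).
cond = torch.randn(1, 4, 64, 64)
uncond = torch.randn(1, 4, 64, 64)
guided = cfg_step(cond, uncond, guidance_scale=7.5)
print(guided.shape)
```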
If you actually look up how CLIP is trained as a base model, it is a newer and more complex system in many ways. It can be configured for several applications. I can't claim to have a great understanding of it all. I attack the problem of image output from the middle and work my way out, like a real hacker. I get results that I like, and that is all I care about and all that I mess with. There are still many visual errors that I do not understand at times and am actively exploring. I've shared some stuff here and there. When everyone said things like SD3 could not display "a woman lying in grass", I did it and posted it.
I don't care about convention. Most experts have a very limited grasp of the whole generative process. How the model has internalized data is still a big mystery; even the leading experts say this. The entire alignment training process and implementation is undocumented and proprietary, with all models except 4chanGPT cross-training with OpenAI models for alignment. I learn a lot from comparing that model with Llama 2 models and LLM text. The alignment training for CLIP is the same as for all other OpenAI-trained models. This proprietary, undocumented, alignment-trained aspect of models is what I screw around with and what I am talking about here.
I like to explore things on a deeply intuitive level. I also have enthusiast class hardware capable of running more advanced offline models than most people and am doing so as a Linux hacker type in the original definition of the word.
People like to downvote these things to oblivion but whatever. You know nothing by copying everyone else and parroting. Go display “a woman lying in the grass” in SD3.
Nevertheless, these models are trained on broad yet shallow data. As such, they are glorified tech demos meant to whet the appetite of businesses and generate high-value customers who can further tune a model for a specific purpose. If you haven't already, I suggest you do the same: curate a very specific dataset with very clear examples. The models can already demonstrate the warping of different types of lenses, so I think it would be very doable to train one to better reflect the curving geometry you're looking for.
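As a concrete starting point, here is a hypothetical sketch of what that curation could look like, assuming the imagefolder/metadata.jsonl layout that the Hugging Face datasets library (and the diffusers fine-tuning examples) can read. The folder name, file names, and captions are invented purely for illustration.

```python
import json
from pathlib import Path

dataset_dir = Path("oneill_interiors")   # hypothetical folder of training renders
dataset_dir.mkdir(exist_ok=True)

# Hypothetical caption pairs using the directional vocabulary from this thread.
captions = {
    "render_001.png": "interior of a rotating habitat, the road curving "
                      "anti-nadirially until it continues overhead, "
                      "buildings anchored radially outward",
    "render_002.png": "prograde view down the cylinder axis, terrain "
                      "wrapping 360 degrees around the viewer",
}

# One JSON object per line; `file_name` is the column the loader expects.
with open(dataset_dir / "metadata.jsonl", "w") as f:
    for file_name, caption in captions.items():
        f.write(json.dumps({"file_name": file_name, "text": caption}) + "\n")
```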