In the dynamic landscape of digital media, the power of video content is taking center stage. As an increasingly dominant form of communication, video offers an engaging, interactive, and visually appealing way to convey information. One particular area of innovation that's garnering attention is text-to-video technology. This emerging field offers a broad spectrum of exciting use cases, from education and training to entertainment and marketing.
The concept of text-to-video is simple yet powerful: transforming written content into visually stunning videos. This has the potential to revolutionize how we consume and share information, making it more accessible, engaging, and impactful.
The current status of text-to-video technology is rapidly evolving. Cutting-edge algorithms are being developed to improve the quality and realism of the generated videos, while also making the process more intuitive and user-friendly. Open-source libraries play a crucial role in this landscape, offering a collaborative platform for innovation and development.
One such library that's making waves in this domain is deforum. This open-source project is designed to democratize the text-to-video space, providing a powerful toolkit for developers and content creators alike. Whether you're looking to generate promotional videos from product descriptions or educational content from textbooks, deforum could be the key to unlocking the potential of text-to-video technology.
neural frames offers access to the most cutting-edge advances in text-to-video, building on the foundations that deforum established. In this blog post, we'll revisit the seven most common misuses of neural frames.
neural frames offers a variety of models to choose from. While there are many, many more models out there in the wild, we hand-selected the six that we find most astonishing to date: three all-rounder models (OpenJourney, Deliberate, DreamShaper) and three specialist models (Realistic Vision, Analog Diffusion, Anything).
The specialists allow for photorealistic, analog photography, or comic/manga styles, but can ONLY depict those styles. The all-rounders are good for anything; Deliberate and DreamShaper in particular can produce mind-blowing results.
As these models are currently set up, they cannot really render convincing faces on people who are far away. Make sure to keep faces relatively large in the image, either by adding something like "portrait" or "close-up" to the prompt, or by selecting an image where the faces already look good.
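If you want to reproduce this trick outside neural frames, the same idea carries over to any Stable Diffusion checkpoint. Here is a minimal sketch using the open-source diffusers library; the model ID and prompt are illustrative, and this is not neural frames' actual backend:

```python
import torch
from diffusers import StableDiffusionPipeline

# Illustrative checkpoint; any Stable Diffusion 1.x-style model works here.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# "close-up portrait" keeps the face large in frame, which these models
# render far better than small faces on distant figures.
image = pipe(
    prompt="close-up portrait of an old fisherman, detailed face, soft light",
    num_inference_steps=30,
).images[0]
image.save("keyframe.png")
```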
The AI models know basically all the vocabulary on the internet, so when you train a custom model on yourself or some other object, you are left with the question: how do I name this object so that the model knows I am referring to the new thing I am showing it?
The solution people have come up with is to use a cryptic identifier such as "sks person" or "sks object". In neural frames, we use that phrase to refer to the subject of a custom model, and you can produce astonishing visuals of really anything you want.
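Conceptually, the rare token is just a placeholder string that the fine-tuned model has learned to bind to your subject, and everything around it controls the style. A toy sketch (the token and helper function are purely illustrative):

```python
# "sks" is a rare token the fine-tuned model has learned to associate
# with the custom subject; the rest of the prompt controls the style.
SUBJECT = "sks person"  # the identifier your custom model was trained with

def styled_prompt(style: str) -> str:
    return f"portrait of {SUBJECT}, {style}"

print(styled_prompt("analog film photo, golden hour"))
# -> portrait of sks person, analog film photo, golden hour
print(styled_prompt("comic book style, bold ink lines"))
# -> portrait of sks person, comic book style, bold ink lines
```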
The models are not trained on 16:9, so results in 16:9 or 9:16 can sometimes look a bit off: for instance, the model may duplicate objects or distort their proportions. If you run into issues like that, try the 4:3 format instead, which typically works much better. You can also add terms such as "two, double" to the negative prompt to push the model toward solo objects, though that doesn't always work.
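For those experimenting with the underlying models directly, both remedies map onto plain pipeline arguments in the open-source diffusers library. A hedged sketch with an illustrative model ID and prompts, not neural frames' actual pipeline:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="a lone lighthouse on a cliff at dusk",
    negative_prompt="two, double, duplicate",  # discourage duplicated subjects
    width=768,   # 4:3 instead of 16:9; dimensions must be multiples of 8
    height=576,
).images[0]
image.save("lighthouse.png")
```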
Practice your camera movement. With only one movement setting, such as "Chill" or "Loco", the camera often moves so far that the subject of your video drifts out of frame. Simply add another box in the timeline with the reverse settings and you are good to go!
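The arithmetic behind that tip is simple: a constant pan accumulates frame after frame, so a second box with the sign flipped cancels the drift. A toy sketch with made-up numbers (neural frames' actual presets may use different speeds and easing):

```python
# Box 1 pans right at 2 px/frame for 60 frames; box 2 reverses the pan.
drift = 0.0
for frame in range(120):
    speed = 2.0 if frame < 60 else -2.0  # second box: reverse settings
    drift += speed

print(drift)  # 0.0 -- the subject ends up back where it started
```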
Control over the camera is what separates pros from beginners on neural frames.
If you want to change direction, as in the example above, it is usually better to add another box to the timeline instead of editing the box you are currently on. Often the best approach is to shorten the current box and add another one behind it, with a prompt fade window in between. The prompt fade window interpolates between the parameters of the two boxes, producing a smooth transition between the two settings.
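As a rough mental model, you can think of the fade as a weighted blend of the two boxes' numeric settings. A minimal sketch assuming linear interpolation; the parameter names and the exact curve inside neural frames are assumptions here:

```python
def fade(box_a: dict, box_b: dict, t: float) -> dict:
    """Blend two boxes' numeric settings; t runs 0 -> 1 across the fade window."""
    return {key: (1 - t) * box_a[key] + t * box_b[key] for key in box_a}

box_a = {"zoom": 1.02, "pan_x": 2.0}   # illustrative parameter names
box_b = {"zoom": 0.98, "pan_x": -2.0}

for step in range(5):
    t = step / 4
    print(round(t, 2), fade(box_a, box_b, t))
```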
You can change the video duration by editing the last time value on the timeline (initially it reads 30). Currently, the maximum allowed duration is 5 minutes!
Happy rendering!
No VC money, just a physicist turned indiehacker in love with text-to-video. Contact our team here: help@neuralframes.com. This website is greatly inspired by Deforum, but doesn't actually use it. For inspiration on prompts, I recommend Civitai.