The number of AI tools is growing every week, and this trend is unlikely to change any time soon. If anything, we expect the use of AI algorithms and models to grow even further. The creators of almost every application we know are trying to use AI in their products in one way or another so as not to be left behind in this crazy technological race.
Today, practically every tool used for editing photos or videos can boast that it allows retouching or background removal with just one click. Almost every text editor allows for content generation based on a typed prompt. We can generate ideas, summarize articles, or write them in full by entering only a simple instruction. Tools from Microsoft or Google use AI models for data analysis, creating summaries and charts, and suggesting various solutions. Browser plugins allow for automatic email responses, or for analyzing a page for SEO or conversion purposes. It's easy to create a video in which we speak any foreign language, and no one will even realize that we don't know that language. Whether we work in business, marketing, or creative fields, as bloggers, artists, writers, or data analysts, literally all of us can speed up our work today by taking advantage of AI.
However, with so much choice, it's easy to become overwhelmed. In the end, instead of speeding up our work and being like all those super-productive people who flood us with posts on LinkedIn or X (formerly Twitter), we don't know where to start or which tool to choose. We don't know what is actually worth attention and what should be avoided. I think it's obvious that once everyone started implementing AI in the solutions they offer, the large number of great tools would be joined by others that are still a long way from actually helping us. Many tools available on the Internet are still just cool toys, and when it comes down to it, unfortunately, we wouldn't want to use their results in our business.
Sure, there are a few tools that everyone talks about, tools that are currently at the peak of their popularity, so they must be the best. To a large extent, it's hard to disagree, but even these theoretically best tools have their drawbacks and won't always suit what we specifically want to do. There's also the issue of cost, that is, how high the entry threshold is for achieving really solid results. Besides, if we dig a little deeper, we'll find a range of products that are really quite good and often let us complete a specific task quickly and for free.
For the purposes of this article, we tested several dozen different AI tools ourselves and chose those that, in our opinion, really do the job or are simply worth keeping an eye on because they are developing in an interesting direction.
We decided to split the article into two parts. Firstly, this lets us examine the recommended tools more closely and discuss what are, in our opinion, all the essential aspects. Secondly, we want to avoid boring you or taking up too much of your time, and shorter content is easier to digest. So, in the first part, which you are reading now, we present the first 5 tools. The next 5 will be in the second part. We recommend reading both, as it is in the second part that we describe the less obvious tools.
Additionally, both parts will primarily focus on tools that specialize in working with images and videos. Topics like working with text, music, or enhancements in AI for productivity will be addressed in separate posts.
We will start with 3 well-known and recognizable tools that have recently become for generating and editing images what ChatGPT is for generating text. Even though almost everyone has probably heard of them or knows how they work, we believe that a post about AI tools omitting the most important players would simply be incomplete. Setting aside the various downsides of each tool, this trio really deserves special recognition!
Midjourney
If we talk about generating images using AI, one of the first names that comes to mind is indeed Midjourney. Not without reason, as it is currently one of the best tools for creating images based on a typed prompt.
Currently, the most advanced and latest version is Midjourney V5.2 released in June 2023. However, it's worth keeping your finger on the pulse because it has just been announced that Midjourney V6 will see the light of day before Christmas 2023... we can't wait to see what the creators have prepared for us this time!
Midjourney, in addition to standard image generation, offers a bunch of interesting functionalities that diversify and improve the target results, namely:
- Additional parameters - we have at our disposal a large number of various parameters that we can add to the prompt, thereby deciding, for example, what proportions the generated image should have, what elements it should not contain, what the image quality should be, or how much we want to deviate from the prompt and rely on the tool's creativity (level of artistry and abstraction).
- Zoom Out - a feature that allows you to generate content around an existing image without changing the original. We can understand this as a literal zooming out, seeing what is beyond our frame - the tool enlarges the area/canvas of our photo and in a sense 'draws in' what is not visible. It works extremely well. I think many of us have taken a photo where a key part of the frame was accidentally cut off - now we can fix that!
- Pan - similar to Zoom Out, but this time we can ask to generate a fragment of the image only in a specific direction.
- Upscaler - the ability to enlarge a photo without losing quality.
- Vary (Region) - the ability to generate new suggestions only for a given part of the image based on our selection. We can, for example, generate a robot and then select only its head and replace it with something else.
- Video generation - in our opinion, some time is needed to refine this function; perhaps it will work much better in Midjourney V6. At this point, there are many tools that simply do it better.
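To make the additional parameters from the list above more concrete, here is a minimal sketch in Python that assembles a prompt string with a few of them. The helper `build_prompt` and its argument names are hypothetical (they are ours, not part of any Midjourney API). The flags `--ar` (aspect ratio), `--no` (elements to exclude), `--quality`, and `--stylize` are parameters appended to the end of a prompt typed into Discord; the values below are only examples, and the allowed ranges depend on the Midjourney version, so it is worth checking the current documentation.

```python
def build_prompt(description, aspect_ratio=None, exclude=None,
                 quality=None, stylize=None):
    """Assemble a Midjourney-style prompt: description first, parameters last.

    Hypothetical helper for illustration only; it just builds a string
    that could be pasted after the /imagine command in Discord.
    """
    parts = [description.strip()]
    if aspect_ratio:
        parts.append(f"--ar {aspect_ratio}")          # image proportions
    if exclude:
        parts.append("--no " + ", ".join(exclude))    # elements to leave out
    if quality is not None:
        parts.append(f"--quality {quality}")          # rendering quality
    if stylize is not None:
        parts.append(f"--stylize {stylize}")          # how "artistic" the result is
    return " ".join(parts)


prompt = build_prompt(
    "a lighthouse on a rocky coast at sunset",
    aspect_ratio="16:9",
    exclude=["people", "boats"],
    stylize=250,
)
print(prompt)
```

Keeping the description and the parameters separate like this makes it easy to try the same idea with different proportions or levels of stylization and compare the results.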
What distinguishes Midjourney from the competition is primarily realism, refined compositions, attention to detail, and high-quality generated images.
However, Midjourney also has its disadvantages, and we are not talking about the price.
- Reduced precision - in contrast to, for example, DALL-E 3 (which I'll discuss in a moment), Midjourney does not always adhere to the user's instructions and intentions. Sometimes the generated images significantly deviate from what we entered in the prompt. The tool allows for quite a bit of creative freedom.
- Problem with generating text in images - this may be a limitation for some users; instead of text we often get random shapes and smears.
- Limited availability - the tool operates through Discord, which is not a major issue, but we realize that this may discourage some people. However, according to the latest news, this will change soon. Seven days ago, Midjourney began testing an “alpha” version of its new website, which includes image creation. For now, it’s accessible only to users who have generated 10000+ images.
- Content Censorship - Midjourney implements content censorship that may be restrictive for users seeking complete creative freedom.
Cost
From $96 to $1152 per year (depending on the chosen subscription plan).
DALL-E 3
A product from OpenAI, which until recently couldn't really compare to what Midjourney offers. However, everything changed with version 3, namely DALL-E 3.
The tool is available as part of a ChatGPT Plus subscription or for free as part of Bing's Copilot service from Microsoft. The fact that we can generate images completely free of charge is a huge advantage, but DALL-E 3 also deserves recognition for several other reasons:
- High quality of generated images - it's not yet at the level of realism and attention to detail seen in Midjourney, but the results are really impressive.
- Precision - DALL-E 3 is great at interpreting user intentions and reproducing entered prompts. Here it fares much better than Midjourney, offering a high degree of accuracy in realizing the users' vision.
- Ability to generate text on images - unlike the competition, DALL-E 3 easily incorporates text into its graphic creations.
- Ease of use - unlike traditional image generators that require specific prompts or instructions, DALL-E 3 allows users to interact through conversation, making it more accessible and intuitive.
The main downside of DALL-E 3, in our opinion, remains a certain limitation in generating realistic creations. Images generated using DALL-E 3 have a specific style, and despite their excellent quality, it is often easy to notice that a given graphic was generated by AI.
In addition, our editing options are quite limited: we cannot, for example, correct a fragment of the generated photo. Instead, we need to regenerate the entire image and hope that the model goes in the right direction.
As with Midjourney, DALL-E 3 is quite a heavily censored image generator, especially when used inside ChatGPT.
Cost
$20 per month, included in ChatGPT Plus. Free in Bing's Copilot.
Adobe Firefly
Adobe probably needs no introduction and it is also easy to guess that this technological giant has not been left behind in this race.
Adobe Firefly is actually a family of generative artificial intelligence models, offered as a separate product on the website https://firefly.adobe.com or in the form of functionalities integrated into various Adobe applications.
By entering the mentioned website, we currently have the following features available:
- Text to image - a classic image generator, based on a typed prompt, similar to DALL-E 3 and Midjourney. However, what sets Adobe's approach apart is the remarkable simplicity of the tool. Initially, we get a simple text field to enter our instructions - there are no options for entering parameters, uploading files, etc. After generating images, we do receive an intuitive editing panel, where we can adjust our results. Besides numerous sliders, we get many predefined options that allow for adding specific effects, setting lighting, composition, color scheme, choosing between a more artistic approach and realism. We can also upload an image as a reference, then the algorithm will adjust subsequent results to the style of our reference.
- Generative fill - this tool allows for modifying images using brushes and selection tools. We can remove objects from an image, generate new elements in the selected area, remove the background, etc. Image editing enters a completely new dimension: in a few seconds we can, for example, remove someone who accidentally appeared in the frame, or add sunglasses that we forgot to wear. In our opinion, it is better to use this functionality directly in Adobe Photoshop, where it offers much more.
- Generative recolor - the tool allows you to automatically recolor vector images based on the entered prompt. Thanks to this, we can adapt the illustration to, for example, match a specific mood or theme.
- Text effects - allows you to generate text with a specific visual effect, e.g. you can enter "Hello, my name is Stephany" and ask for the letters to be covered with eucalyptus leaves or to look like they are embroidered. In our opinion, it's still just a cool toy and not something we want to use every day.
What is fantastic about Adobe's solutions is that AI functionalities are also available directly in their applications.
In Adobe Photoshop, we have access to the already mentioned Generative fill with some enhancements. Specifically, we can use options like expand. Similar to the zoom-out and pan functions in Midjourney, here we can also generate something beyond the canvas area, allowing us to freely 'expand' our images.
In Adobe Illustrator, the most interesting option is generating vector illustrations using a prompt. This is a revolutionary approach to graphic design. We no longer need to know how to draw to prepare a beautiful illustration for a website or a book.
Additionally, there are also AI enhancements in Adobe Premiere Pro and Adobe Express, but we won't elaborate on that here.
What also deserves attention is the issue of ethics and copyright. Adobe trains its models on licensed content from Adobe Stock and on public domain content, where the copyright has expired. This means that everything we generate is based on data sets collected in a fully ethical manner, and the complete copyright of the produced image belongs to the person who generated it.
In conclusion, Adobe offers its users a lot. After testing all the AI features, we can confidently say that the quality and results are really good, and this is complemented by ease of use and integration within Adobe's flagship applications.
Downsides? At this moment, we can point out three things that may discourage potential users, namely:
- Although the generated images are really good, they still are not at the level of Midjourney. Adobe is characterized by a certain style, and this is often visible in the generated creations. Often, the results can seem too 'candy-like'.
- In the case of Midjourney and DALL-E 3 within ChatGPT, we deal with quite strong censorship. However, it seems that Adobe Firefly leads in this regard. The censorship from Adobe is really significant and can often be a limitation. For example, attempts to generate creations using words like 'soldier' or 'bikini' are futile.
- Adobe Firefly operates on a generative credits system, providing users with a set amount of image generations and edits. This means that users have a limited number of uses. Of course, additional credits can be purchased, but in our opinion, this could ultimately be more expensive than, for example, a Midjourney subscription. If someone is already paying for a Creative Cloud subscription, they are allotted a certain number of credits.
Cost
Varies according to the country. Has a free tier.
Leonardo.Ai
Another fantastic and rapidly developing tool is Leonardo.Ai. It is essentially a web application (along with an API) created by independent creators and based on Stable Diffusion models. Similar to the previous three applications, it allows for image generation and editing.
What characterizes this application is the ability to change the model used to generate our images. At the moment, we have access to dozens of models - some of them simply differ in version (older, newer), but others are models specifically trained for generating images with certain characteristics. For example, we can choose a model tailored for generating photorealistic images or a model that generates isometric views.
At the moment, among the most important functionalities of Leonardo.Ai we can distinguish:
- Image Generation - generating images based on a textual prompt. Similar to Adobe, we have an intuitive panel with a range of additional options to customize our instructions.
- Live Canvas - an interesting feature that allows for generating an image in real time based on what we are currently drawing. Of course, we can fully edit everything, determine the level of consistency with our sketch, and so on. At the moment, it's more like a toy, but it's developing in a very good direction.
- Canvas Editor - an extensive editing environment that allows you to manipulate imported images. Removing unwanted objects from the photo, generating image fragments, adjusting the proportions, etc.
- 3D Texture Generation - generating and modifying textures based on imported OBJ files. It can help, for example, in preparing game assets.
- Motion - generating videos based on a textual prompt; the functionality is expected to be available in the near future.
The main drawback is that most of the features are locked in the free version of the tool. However, the tool itself, although heavily limited, remains available without the need to purchase a subscription. Every day, we have access to a total of 150 tokens, which we use to perform various actions in Leonardo.Ai. Not all actions consume the same number of tokens, e.g. high-resolution images cost more.
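To get a feel for how far the daily allowance goes, here is a small back-of-the-envelope sketch in Python. The figure of 150 tokens per day comes from the paragraph above; the per-action costs in `ASSUMED_COSTS` are purely hypothetical placeholders, because the real price of each action depends on the model, resolution, and settings chosen in Leonardo.Ai.

```python
DAILY_TOKENS = 150  # free daily allowance mentioned above

# Hypothetical per-action costs, for illustration only.
# Replace them with the values shown in the app for your settings.
ASSUMED_COSTS = {
    "standard_image": 2,
    "high_res_image": 8,
}


def actions_per_day(cost_per_action, daily_tokens=DAILY_TOKENS):
    """Return how many times an action fits into the daily allowance."""
    if cost_per_action <= 0:
        raise ValueError("cost_per_action must be positive")
    return daily_tokens // cost_per_action


for name, cost in ASSUMED_COSTS.items():
    print(f"{name}: up to {actions_per_day(cost)} per day")
```

The point is simply that the more expensive an action is, the faster the free allowance runs out, so it pays to experiment at lower settings and spend tokens on high-resolution output only for the final version.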
Cost
From $12 to $60 per month. Has a free tier.
HeyGen
Have you ever wondered what it would be like to speak fluently in dozens or even hundreds of foreign languages? HeyGen is a tool that won't teach you to speak these languages, but it will create the impression that you are using them perfectly.
HeyGen is currently one of the most popular and best-known tools for creating video avatars. We have quite a few interesting features at our disposal, allowing us to create professional videos based on a clip recorded by us, written text, and/or recorded audio. There are several combinations, including:
- Creating a video avatar based on one of over a hundred available avatars and your own text or audio file.
- Creating a completely custom avatar based on a provided video clip and your own text or audio file.
- Translating a provided video into over 40 available foreign languages with lip-sync technology. The final result looks very natural.
- Changing or improving your voice or generating audio for the entered text.
The tool has incredible potential in the field of marketing, promoting products, preparing offers, or training materials. From now on, you can, for example, put a video on your website where you talk about your product in Korean. The applications are enormous.
Cost
From $29 per month. The free plan is limited to 1 minute of video per month.
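Because the prices in this part are quoted for different billing periods (per year for Midjourney, per month for the others), a quick conversion makes them easier to compare. The sketch below uses only the entry-level paid prices quoted in this article; the names `ENTRY_PRICES` and `monthly_cost` are ours, Adobe Firefly is left out because its price varies by country, and the figures may well have changed since the time of writing.

```python
# Entry-level paid prices quoted in this article: (amount in USD, billing period).
ENTRY_PRICES = {
    "Midjourney": (96, "year"),
    "DALL-E 3 (ChatGPT Plus)": (20, "month"),
    "Leonardo.Ai": (12, "month"),
    "HeyGen": (29, "month"),
}


def monthly_cost(amount, period):
    """Convert a price to its monthly equivalent."""
    if period == "year":
        return amount / 12
    if period == "month":
        return float(amount)
    raise ValueError(f"unknown billing period: {period}")


# Print the tools from the cheapest to the most expensive entry-level plan.
for tool, (amount, period) in sorted(
    ENTRY_PRICES.items(), key=lambda item: monthly_cost(*item[1])
):
    print(f"{tool}: ${monthly_cost(amount, period):.2f} per month")
```

Keep in mind that this compares only the cheapest paid plans; the free tiers and the limits of each plan differ a lot between tools.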
That's all the tools in this part. To read about the remaining 5, please visit the second part.