How Artificial Intelligence Can Be Used: All Models

Paolo De Giglio

Artificial intelligence is revolutionizing the way we interact with the digital world. Thanks to its ability to learn and continuously improve, AI offers endless possibilities for application.

In particular, there are six main ways in which it can be used:

text-to-text
audio-to-text
text-to-audio
text-to-image
image-to-text
image-to-image

Let's see how these six ways of using AI can transform our digital experience and improve our daily lives.

Before we start, we need to define two basic concepts that are common to all models: Prompts and Neural Models.

What are Prompts for AI models?

Prompts are the words a human gives the model as input, such as an instruction, a question, or a few examples. Like crossword clues, prompts guide the model towards a desired decision or prediction.

What are Neural Models in AI?

Neural models in Artificial Intelligence are machine learning algorithms loosely inspired by the functioning of the human brain. These neural networks are composed of interconnected artificial neurons, each of which combines its inputs and passes the result on to other neurons, much as biological neurons exchange signals.

These models are obtained through training: a series of inputs is given, the outputs are compared with the desired ones, and the coefficients (weights) of the neural network are adjusted to bring the outputs closer to the desired results. Training requires a large amount of high-quality data; otherwise the results will be inaccurate.
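The training loop described above can be sketched in a few lines. This is a minimal illustration, not how production models are trained: it assumes a single artificial neuron with one weight and one bias, a hypothetical toy dataset that follows y = 2x + 1, and plain gradient descent on the squared error. The names `train`, `data`, and `learning_rate` are chosen for this example only.

```python
# Toy training loop: one neuron computing y_pred = w * x + b.
# The dataset and hyperparameters are illustrative assumptions.

def train(data, epochs=2000, learning_rate=0.05):
    w, b = 0.0, 0.0  # the coefficients (weights) start from arbitrary values
    for _ in range(epochs):
        for x, y_true in data:
            y_pred = w * x + b       # 1. give an input, compute the output
            error = y_pred - y_true  # 2. compare with the desired output
            # 3. adjust the coefficients to reduce the squared error
            w -= learning_rate * error * x
            b -= learning_rate * error
    return w, b

# Hypothetical training data generated from y = 2x + 1
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = train(data)
print(round(w, 2), round(b, 2))  # values close to 2.0 and 1.0
```

With only four clean data points the neuron recovers the underlying rule almost exactly; real models have millions or billions of weights, which is why they need far more data.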

Text Models

Text generation is one of the areas of Artificial Intelligence that has received increasing attention in recent years.

Thanks to the ability of machines to process large amounts of data and learn from existing models, it is possible to create software capable of producing high-quality texts that appear to be written by humans.

The possibilities can be divided into several types. For simplicity, we will refer to the official examples presented by OpenAI.

FAQ or Q&A Answer Generator

This type of system uses machine learning to understand questions asked by users and provide relevant and accurate answers. A useful property of these systems is that their answers can be improved over time by analyzing the feedback users provide.

Prompt:

I am a highly intelligent bot for answering questions. If you ask me a question that has a root in truth, I will give you the answer. If you ask me a nonsensical, deceptive, or unclear question, I will respond with "Unknown".
Q: What is the life expectancy in the United States?

Answer:

A: The life expectancy in the United States is 78 years.

Text Classification Systems

These systems take a textual input and categorize or classify it. They can be applied to individual words, phrases, or entire blocks of text.

These systems are comparatively easy to improve through user feedback: each time a wrong classification is reported, the model can be corrected so that it avoids the same or similar mistakes.

Prompt:

Extract the keywords from the following text:
"Black-on-black pottery is a 20th and 21st-century ceramic tradition developed by Native American Puebloan ceramic artists from the northern New Mexico pueblos. Traditional reduction-fired pottery has been made for centuries by Pueblo artists. 20th-century black-on-black pottery is produced with a smooth surface, with designs applied through selective burnishing or application of refractory slip. Another style involves carving or incising designs and selectively burnishing raised areas. For generations, different families from Kha'po Owingeh and P'ohwhóge Owingeh pueblos have made black-on-black pottery using techniques passed down from matriarchal potters. Artists from other pueblos have also produced black-on-black pottery. Some contemporary artists have created works that honor the pottery of their ancestors."

Answer:

pottery, black-on-black, 20th century, 21st century, Puebloan, New Mexico, reduction-fired, smooth surface, designs, selective burnishing, refractory slip, carving, incising, selective burnishing, Kha'po Owingeh, P'ohwhóge Owingeh, matriarchal potters, contemporary artists, ancestors.

Prompt:

Classify the sentiment in the following tweets:
"I can't stand homework"
"This sucks. I'm bored 😠"
"I can't wait for Halloween!!!"
"My cat is adorable ❤️❤️"
"I hate chocolate"

Result:

Sentiment ratings of the tweets:
Negative
Negative
Positive
Positive
Negative
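
A prompt like the one above can be assembled programmatically, and the model's reply mapped back onto the inputs. The sketch below is only an illustration: `build_prompt` and `parse_labels` are hypothetical helper names, the call to the model itself is left out, and the reply is assumed to contain one label per line, as in the result shown above.

```python
def build_prompt(tweets):
    """Assemble the classification prompt, one quoted tweet per line."""
    lines = ["Classify the sentiment in the following tweets:"]
    lines += [f'"{tweet}"' for tweet in tweets]
    return "\n".join(lines)

def parse_labels(tweets, reply):
    """Pair each tweet with the label on the corresponding reply line.

    Assumes the reply holds exactly one label per non-empty line.
    """
    labels = [line.strip() for line in reply.splitlines() if line.strip()]
    if len(labels) != len(tweets):
        raise ValueError("reply does not contain one label per tweet")
    return list(zip(tweets, labels))

tweets = ["I can't stand homework", "I can't wait for Halloween!!!"]
prompt = build_prompt(tweets)

# Hypothetical reply text, standing in for a real model response
reply = "Negative\nPositive"
for tweet, label in parse_labels(tweets, reply):
    print(f"{label}: {tweet}")
```

Checking that the number of labels matches the number of inputs is a cheap safeguard, since nothing guarantees that a generated reply follows the expected layout.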

Chatbot

This involves using artificial intelligence to provide support through chatbots. Here, collecting data for training and optimization is comparatively easy: people simply use the chatbot and flag the responses where it makes mistakes.

Prompt:

Marv is a chatbot that reluctantly answers questions with sarcastic responses.
You: How many pounds are in a kilogram?

Result:

Marv: Here we go again. There are 2.2 pounds in a kilogram. Take note.
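
In practice, a chatbot of this kind keeps the persona description and the conversation so far in the prompt, and asks the model to continue from the bot's next line. A minimal sketch follows; the `Conversation` class and its methods are hypothetical names for illustration, and the call to the model is deliberately omitted.

```python
class Conversation:
    """Keeps a persona line plus the running transcript of a chat."""

    def __init__(self, persona, bot_name="Marv", user_name="You"):
        self.persona = persona
        self.bot_name = bot_name
        self.user_name = user_name
        self.turns = []  # list of (speaker, text) pairs

    def add_user_message(self, text):
        self.turns.append((self.user_name, text))

    def add_bot_message(self, text):
        self.turns.append((self.bot_name, text))

    def to_prompt(self):
        """Render persona and transcript, ending with the bot's open turn."""
        lines = [self.persona]
        lines += [f"{speaker}: {text}" for speaker, text in self.turns]
        lines.append(f"{self.bot_name}:")  # the model completes this line
        return "\n".join(lines)

chat = Conversation(
    "Marv is a chatbot that reluctantly answers questions with sarcastic responses."
)
chat.add_user_message("How many pounds are in a kilogram?")
print(chat.to_prompt())
```

Each reply would be stored with `add_bot_message` before the next user turn, so the model always sees the whole exchange.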

Free Generation

This is one of the most interesting theoretical uses of Artificial Intelligence. It refers to the ability of AI to generate content from basic guidelines, such as texts, poems, songs, or even works of art.

The idea behind this application is to give AI its own creativity, capable of producing original and innovative material without human intervention.

In this case, training is very complex as it requires large amounts of data and significant hardware.

Prompt:

Product description: A pair of shoes that fits any foot size.
Keywords: adaptable, suitable, omni-fit.
Product name:

Result:

Omni-Fit Shoes

Translations

This is a system similar to free generation but focused on translation between different languages, or even between different programming languages (it is not absurd to imagine a completely autonomous Rosetta Stone in the future).

In this case, training is very complex due to the large amount of data required.

Prompt:

Translate into 1. French, 2. Spanish, and 3. Japanese:
"Where is the nearest museum?"

Result:

Où se trouve le musée le plus proche?
¿Dónde está el museo más cercano?
最も近い博物館はどこですか？

Transformations

These are systems that transform and reprocess textual content into other forms. Use cases vary widely and include grammar correction, summarization, and rewriting content in a different writing style.

These systems are the most complex to train, because training has to change the neural network substantially, which means many weights must be modified.

Prompt:

Summarize for an elementary school student:
"Jupiter is the fifth planet from the Sun and is the largest in the Solar System. It is a gas giant with a mass one-thousandth that of the Sun, but two and a half times that of all the other planets in the Solar System combined. Jupiter is one of the brightest objects visible to the naked eye at night, and has been known to ancient civilizations since ancient times. It is named after the Roman god Jupiter. When observed from Earth, Jupiter can be bright enough to cast visible shadows and is on average the third-brightest natural object in the night sky after the Moon and Venus."

Result:

Jupiter is a large and bright planet that orbits the Sun. It is the fifth planet from the Sun and can be seen with the naked eye at night. It is named after the Roman god Jupiter and can be bright enough to cast visible shadows. It is on average the third-brightest object in the night sky after the Moon and Venus.

Prompt:

Correct the grammar of the following sentence:
"Jupiter e il quinto pianeta dal sole."

Result:

Jupiter è il quinto pianeta dal Sole. (In English: "Jupiter is the fifth planet from the Sun.")

Prompt:

Find the contact information from the following email:
Dear Kelly,
It was great talking to you at the seminar. I found Jane's presentation really interesting.
Thanks for the book. Here is my address: 2111 Ash Lane, Crestview CA 92002
Regards,
Maya

Result:

Maya's contact information:
Name: Maya
Address: 2111 Ash Lane, Crestview CA 92002

Audio Processing

Thanks to the power of advanced algorithms, AI can analyze, recognize, and improve audio like never before.

Filtering background noise, distinguishing voices, translating conversations into different languages in real time: all of this is possible thanks to AI innovation.

Artificial intelligence can also create new forms of sound and music art, generating sounds and melodies autonomously and innovatively. Interest in AI in the field of audio processing is growing, and its applications are already evident in teleconferencing systems, virtual assistant services, and the production of high-quality audiovisual content.

From Text to Audio

The use of artificial intelligence for text-to-audio conversion is becoming increasingly widespread. There are many applications that allow you to transform text into an audio file, making it easier to listen to books, articles, and documents. This technology is particularly useful for people with visual impairments or those who prefer listening rather than reading.

This technology can be used to create podcasts or intelligent voice assistants. Thanks to artificial intelligence, the computer-generated voice can be made more and more natural and human-like, thus improving the user experience.

These systems are also evolving in the generation of audio content from scratch, with Google leading the way and producing impressive results:

Prompt:

The main soundtrack of an arcade game. It is fast-paced and upbeat, with a catchy electric guitar riff. The music is repetitive and easy to remember, but with unexpected sounds, like cymbal crashes or drum rolls.

Result:

(Audio clip, not reproduced here.) The example is taken from the examples page accompanying Google's MusicLM research paper: https://google-research.github.io/seanet/musiclm/examples/

From Audio to Text

This refers to the ability to convert speech into written text.

This technology is already used in many sectors, for example to transcribe interviews, conferences, and political speeches. Its use is not limited to general speech transcription: it can also be applied in specialized fields such as medicine and law, for example to transcribe medical reports or legal texts.

Furthermore, audio-to-text conversion technology can be used to create automatic subtitles in videos, improving accessibility for people with hearing impairments.
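
Once speech has been transcribed with timestamps, turning it into subtitles is mostly a formatting step. The sketch below writes the widely used SubRip (.srt) layout. The `segments` list of (start, end, text) tuples is a hypothetical stand-in for the output of a speech-to-text system, and `format_timestamp` and `to_srt` are helper names chosen for this example.

```python
def format_timestamp(seconds):
    """Convert a time in seconds to the SRT form HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"

def to_srt(segments):
    """Render (start, end, text) segments as numbered SRT blocks."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)

# Hypothetical transcription segments: (start in seconds, end in seconds, text)
segments = [
    (0.0, 2.5, "Hello and welcome."),
    (2.5, 5.0, "Today we talk about subtitles."),
]
print(to_srt(segments))
```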

Several AI systems provide this functionality; one of the best known is OpenAI's Whisper.

Input: (audio clip, not reproduced here)

Result:

Before he had time to answer, a much encumbered veerer burst into the room with the question, I say, Can I leave these here? These were a small black pig and a lusty specimen of black red gamecock.

The example is also taken from official resources provided by OpenAI.

Image Processing

Artificial intelligence has revolutionized the way images are processed and analyzed. Thanks to deep learning techniques, neural networks can learn to recognize objects, faces, and patterns within an image and classify them or create new ones.

Here the two best-known and most advanced tools are MidJourney and the open-source alternative Stable Diffusion.

From Text to Image

The mechanism is fairly simple to describe: you ask the artificial intelligence to generate an image from a prompt.

Here are some examples taken from the Reddit communities of both platforms:

The request was to create a picture of former President of the United States Barack Obama as a homeless person.

Obama as a homeless person

Here, the request was to show what a selfie taken by Native Americans in the 1800s would have looked like.

Native Americans Selfie

The results are astonishing, and it can be very difficult to recognize these images as generated. More examples can be found in the Reddit communities for Stable Diffusion and MidJourney.

From Image to Text

This technology can be used to recognize both text and objects in images.

Text recognition is known as OCR (optical character recognition). It is a mature technology that provides correct results in the vast majority of cases.

Object recognition is the more interesting case, as it enables more complex processing, including the image-to-image transformations described in the next section.

In this case, OpenAI provides a model called CLIP, and here is an example:

Cavalier King

The CLIP model correctly identified the presence of a dog, specifically a King Charles Spaniel, with a confidence of more than 99%.
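
A percentage like this typically comes from comparing the image with a set of candidate captions and normalizing the scores. The sketch below illustrates that idea only: the embedding vectors are made-up numbers rather than outputs of the real model, the scaling factor of 100 is an assumption, and `cosine_similarity` and `softmax` are helper functions written for this example.

```python
import math

def cosine_similarity(a, b):
    """Similarity of two vectors, from -1 (opposite) to 1 (same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up embeddings (illustrative only, not real model outputs)
image_embedding = [0.9, 0.1, 0.3]
caption_embeddings = {
    "a photo of a dog": [0.8, 0.2, 0.3],
    "a photo of a cat": [0.1, 0.9, 0.2],
    "a photo of a car": [0.2, 0.1, 0.9],
}

scale = 100.0  # assumed factor that sharpens the differences between scores
scores = [scale * cosine_similarity(image_embedding, emb)
          for emb in caption_embeddings.values()]
probabilities = dict(zip(caption_embeddings, softmax(scores)))

best = max(probabilities, key=probabilities.get)
print(best, round(probabilities[best], 4))
```

Because the probabilities are relative to the captions supplied, the figure says how much better one caption fits than the others, not how certain the model is in absolute terms.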

From Image to Image

A very active research field concerns the transformation of images to improve their quality or modify their content. This process, called "image-to-image", involves using machine learning algorithms to transform an input image into an output image that meets certain criteria.

Common applications can include:

Noise reduction in images
Improving image resolution and quality
Removing elements or objects in the scene
Adding elements or objects to the scene
Creating a new scene from an old image
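
To make the idea of an image-to-image transformation concrete, here is a deliberately simple sketch of the first item, noise reduction. It uses a classical 3x3 mean filter on a small grayscale grid rather than a neural network, so it is a stand-in for the general idea, not an example of how AI models do it. The function name `mean_filter` and the sample image are invented for illustration.

```python
def mean_filter(image):
    """Return a copy of `image` where each pixel is the average of its
    3x3 neighbourhood (edges use only the neighbours that exist)."""
    height, width = len(image), len(image[0])
    result = [[0.0] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            neighbours = [
                image[r][c]
                for r in range(max(0, row - 1), min(height, row + 2))
                for c in range(max(0, col - 1), min(width, col + 2))
            ]
            result[row][col] = sum(neighbours) / len(neighbours)
    return result

# Input image: a flat grey area (value 100) with one noisy pixel (255)
noisy = [
    [100, 100, 100],
    [100, 255, 100],
    [100, 100, 100],
]
smoothed = mean_filter(noisy)
print(round(smoothed[1][1], 1))  # 117.2: the outlier is pulled towards its neighbours
```

The input and the output are both images of the same size, which is the defining trait of this family of techniques; learned models replace the fixed averaging rule with one derived from training data.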

Here are some examples taken from the Reddit community of Stable Diffusion:

If Van Gogh existed today

Van Gogh Today

Replacing the protagonist of Munch's The Scream with a cat

Cat in Munch

Conclusion

Artificial intelligence represents a true revolution in the digital world.

Thanks to its ability to learn and continuously improve, AI offers endless possibilities for application, including the six main ways of using it: text-to-text, audio-to-text, text-to-audio, text-to-image, image-to-text, and image-to-image.

These tools can profoundly transform our digital experience and improve our daily lives in many different ways.

With a deep understanding of these elements, we will be able to use AI to improve our lives in increasingly effective and innovative ways.

