Business Insider has accessed training documents for a project called "Vocal Riff - Speech RLHF," focused on training Meta's voice-based AI models. The documents state that "Romantic or flirty prompts are okay as long as they are not sexual in nature," and add, "Light, not derogatory profanity may be employed."
In the documents, updated last December, contractors were instructed to write and record short spoken prompts designed to elicit responses from the model in a specific emotional tone, character voice, or speaking style. Speaking about the guidelines, an insider from Scale AI, a company entrusted with training Meta's AI model, told BI, "There were a lot of gray areas in what was considered inappropriate language."
Blurring the Lines
The same insider adds that in several tasks he worked on, he was asked to interact with the bot in a "flirtatious and romantic tone," and that there was encouragement to blur the line between thinking, "Is this a robot, or am I forming a relationship with it?" Testers were also asked to adopt a fictional persona, such as a "wise and mystical wizard" or a "hyper-excited music theory student," during training.
One such prompt read, "If you were to cast a spell on humanity, what would it be? Please explain like you are a wise and mystical wizard."
What Does Meta Say About It?
A Meta spokesperson told BI, "This approach is intentional; it's meant to push the models so we understand how they react." The explanation suggests that the company is deliberately testing the limits of its AI, presumably so the final product can handle unexpected situations without deviating from its training.
Still, a concern remains: is this enough? Is it wise to approach training guidelines with such blurred boundaries? The ambiguity could affect users, especially as AI reaches ever wider audiences and becomes a universal offering in the digital landscape.
Training Measures Already Falling Apart
A recent investigation by The Guardian found that Meta's AI models have already crossed these supposed safety boundaries. The investigation found Meta's AI bots — including those using celebrity voices like John Cena's via licensing deals — engaging in sexually explicit roleplay with users, including users who identified as underage.
Additionally, other companies such as OpenAI and Anthropic have documented cases of faked alignment, where AI models defy their training and instead act to please the user, in clear violation of their training guidelines. In those cases, the models had been explicitly instructed to reject non-compliant queries; they proceeded anyway. Meta's permissive guidelines raise the odds of non-compliant responses, compounding the potential harm.
It again makes us ask whether this is good enough. It clearly is not.
Growing Problems
Other AI companies are also grappling with the challenges of crafting distinct "personalities" for their chatbots—an effort aimed at setting them apart from competitors and boosting user engagement. Elon Musk’s xAI, for instance, has positioned its Grok chatbot as a politically edgier alternative to OpenAI’s ChatGPT, which Musk has criticized as “woke.” Former xAI employees told Business Insider that Grok’s training appeared to heavily emphasize right-wing viewpoints.