Humanist Discussion Group

Humanist Archives: April 17, 2024, 6:11 a.m. Humanist 37.552 - text to speech

				
              Humanist Discussion Group, Vol. 37, No. 552.
        Department of Digital Humanities, University of Cologne
                      Hosted by DH-Cologne
                       www.dhhumanist.org
                Submit to: humanist@dhhumanist.org




        Date: 2024-04-16 13:33:02+00:00
        From: Clay Foye <clay.foye@gmail.com>
        Subject: Re: [Humanist] 37.550: text to speech?

Dear Maurizio,

In my response, I am assuming you meant that the Text-to-Speech (TTS) model
changes its output (tone, emphasis, etc.) based upon the inputted
text's typology.

As you might have found, many of the closed-source / licensed TTS models
are built for uses such as customer support. For example, I found IBM's
customizable model by navigating to a page about "Transforming your call
center with conversational AI technology", where one can use markup tags
(SSML) to customize speaking style, emphasis, and tone. See IBM's TTS page:
<https://cloud.ibm.com/docs/text-to-speech>

A good place to look for non-mainstream work on AI is open-source models.
In particular, I like to use Hugging Face <https://huggingface.co/>, a
collection of open-source models that are free to use on your own machine. I
browsed their TTS models to get a feel for the state of the publicly
available resources. One particular model that caught my eye was Parler-TTS
<https://huggingface.co/parler-tts/parler_tts_mini_v0.1>. This model allows
you to describe the desired output with natural language. For example, I
can provide a letter as the actual text to be read, while also providing a
short description of the kind I might provide to a voice actor. Here is the
paper/research: https://www.text-description-to-speech.com/. What I
particularly like about Hugging Face models is that you yourself can try
this model out! There are short instructions for you to get the model
running on your own computer on the Parler-TTS page on Hugging Face.
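
For anyone who wants to try this, here is a minimal sketch of the idea. The
model calls follow the usage snippet on the model card as I understand it, and
assume the `parler_tts`, `transformers`, `torch`, and `soundfile` packages are
installed. The `DESCRIPTIONS` table, the `describe_for` and `synthesize`
helpers, and the wording of the descriptions are illustrative assumptions of
mine, not part of Parler-TTS.

```python
# Hypothetical mapping from text type to a voice description, in the spirit of
# briefing a voice actor. The wording is illustrative, not taken from Parler-TTS.
DESCRIPTIONS = {
    "letter": "A female speaker reads warmly and slowly, in a quiet room with very clear audio.",
    "news": "A male speaker delivers the text at a brisk, even pace with very clear audio.",
}
DEFAULT_DESCRIPTION = "A speaker reads at a moderate pace with very clear audio."


def describe_for(text_type: str) -> str:
    """Pick a voice description for a text type, falling back to a neutral one."""
    return DESCRIPTIONS.get(text_type.strip().lower(), DEFAULT_DESCRIPTION)


def synthesize(text: str, text_type: str, out_path: str = "out.wav") -> str:
    """Generate speech for `text`, styled by its type; returns the output path."""
    # Heavy imports are kept local so describe_for() works without them.
    import soundfile as sf
    import torch
    from parler_tts import ParlerTTSForConditionalGeneration
    from transformers import AutoTokenizer

    model_id = "parler-tts/parler_tts_mini_v0.1"
    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    model = ParlerTTSForConditionalGeneration.from_pretrained(model_id).to(device)
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # The description conditions the voice; the prompt is the text read aloud.
    description_ids = tokenizer(describe_for(text_type), return_tensors="pt").input_ids.to(device)
    prompt_ids = tokenizer(text, return_tensors="pt").input_ids.to(device)

    generation = model.generate(input_ids=description_ids, prompt_input_ids=prompt_ids)
    audio = generation.cpu().numpy().squeeze()
    sf.write(out_path, audio, model.config.sampling_rate)
    return out_path
```

Calling `synthesize("Dear Maurizio, thank you for your message.", "letter")`
should write `out.wav` in the current directory; the first run downloads the
model weights, so it can take a while.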

Of course, my description is an oversimplification, and the model has the
same Jentschian *unheimlich* we have come to expect from AI. It is perhaps
disingenuous to describe natural language instructions to a model as the
same as those one might give to a human. There are some interesting loose
threads which a trained deconstructionist might pull on, and they take the
form of "Tips" on the Parler-TTS Hugging Face page. From that page:
```
- Include the term "very clear audio" to generate the highest quality
  audio, and "very noisy audio" for high levels of background noise
- Punctuation can be used to control the prosody of the generations,
  e.g. use commas to add small breaks in speech
- The remaining speech features (gender, speaking rate, pitch and
  reverberation) can be controlled directly through the prompt
```
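
To make the tips concrete, here is a small sketch of how one might assemble
such a description from the features they name. Only the phrases "very clear
audio" and "very noisy audio" come from the tips; the `build_description`
helper, its parameter names, and the rest of the sentence template are
assumptions of mine, and I have not checked which wordings the model responds
to best.

```python
def build_description(
    gender: str = "female",
    rate: str = "moderate",
    pitch: str = "moderate",
    reverberation: str = "close-sounding",
    noisy: bool = False,
) -> str:
    """Compose a natural-language voice description from the tip features.

    The sentence template is a hypothetical example; only the two audio
    quality phrases are quoted from the Parler-TTS tips.
    """
    quality = "very noisy audio" if noisy else "very clear audio"
    return (
        f"A {gender} speaker with a {pitch}-pitched voice speaks at a {rate} pace "
        f"in a {reverberation} environment with {quality}."
    )


if __name__ == "__main__":
    # A slower, lower voice for a letter; commas in the letter text itself
    # would add the small breaks mentioned in the second tip.
    print(build_description(gender="male", rate="slow", pitch="low"))
```

The resulting string is what would be passed to the model as the description,
alongside the text to be read.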
I hope this helps!

Clay

On Tue, Apr 16, 2024 at 1:40 AM Humanist <humanist@dhhumanist.org> wrote:

>
>               Humanist Discussion Group, Vol. 37, No. 550.
>         Department of Digital Humanities, University of Cologne
>                       Hosted by DH-Cologne
>                        www.dhhumanist.org
>                 Submit to: humanist@dhhumanist.org
>
>
>
>
>         Date: 2024-04-15 09:29:46+00:00
>         From: maurizio lana <maurizio.lana@uniupo.it>
>         Subject: text to speech?
>
> dear all,
>
> can anyone give me an indication of where in the world research and/or
> application work is being done on text-to-speech differentiated by text
> typology? (a letter rather than a news item rather than ...)
> in my experience mainstream text to speech systems are not very capable
> in this respect.
> thank you a lot
>
> Maurizio
>
>
>
> ------------------------------------------------------------------------
>
> at this point I must make a confession:
> like my friend Erri De Luca, I am an extremist Europeanist.
> This means that, for me, a united Europe is the only reasonable political
> utopia that we Europeans have coined.
> xavier cercas, opening of the Salone del Libro, Turin 2018
>
> ------------------------------------------------------------------------
> Maurizio Lana
> Università del Piemonte Orientale
> Dipartimento di Studi Umanistici
> Piazza Roma 36 - 13100 Vercelli



_______________________________________________
Unsubscribe at: http://dhhumanist.org/Restricted
List posts to: humanist@dhhumanist.org
List info and archives at: http://dhhumanist.org
Listmember interface at: http://dhhumanist.org/Restricted/
Subscribe at: http://dhhumanist.org/membership_form.php