gstt

A Go client to call the Google Speech API for free.

The Google Speech API (full duplex version) is meant to offer a speech recognition service via the Web Speech API in the Google Chrome browser. It is different from the Google Cloud Speech-to-Text API.

Disclaimer: the Google Speech API is an internal, totally unsupported API, and it may change or disappear at any moment.

Usage

Import it as a package:

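package main

import (
	"fmt"
	"io"
	"strings"

	"github.com/giulianopz/go-gstt/pkg/client"
	"github.com/giulianopz/go-gstt/pkg/opts" // assumed import path of the package defining Options
	"github.com/giulianopz/go-gstt/pkg/transcription"
)

func main() {
	var (
		httpC   = client.New()
		in      io.Reader                            // audio input: set this to your audio source, e.g. an opened FLAC file
		options *opts.Options                        // configure transcription parameters
		out     = make(chan *transcription.Response) // receive results from channel
	)

	// Transcribe runs in its own goroutine and sends responses on out
	go httpC.Transcribe(in, out, options)

	// the range loop ends once out is closed
	for resp := range out {
		for _, result := range resp.Result {
			for _, alt := range result.Alternative {
				fmt.Printf("confidence=%f, transcript=%s\n", alt.Confidence, strings.TrimSpace(alt.Transcript))
			}
		}
	}
}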
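The example above relies on a common Go producer/consumer pattern: one goroutine sends responses on a channel while the caller ranges over it. The self-contained sketch below illustrates that pattern in isolation. The `Response`, `Result` and `Alternative` types are hypothetical stand-ins that only mirror the field names used above, and `fakeTranscribe` is a hypothetical producer; it assumes, as the range loop above requires, that the producer closes the channel when it is done.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical stand-ins mirroring the field names used above
// (Result, Alternative, Confidence, Transcript); these are NOT
// the real types from the transcription package.
type Alternative struct {
	Transcript string
	Confidence float64
}

type Result struct {
	Alternative []Alternative
}

type Response struct {
	Result []Result
}

// fakeTranscribe is a hypothetical producer: it sends one response per
// transcript on out and closes the channel when done, which is what
// lets the consumer's range loop terminate.
func fakeTranscribe(transcripts []string, out chan<- *Response) {
	defer close(out)
	for _, t := range transcripts {
		alt := Alternative{Transcript: " " + t + " ", Confidence: 0.9}
		out <- &Response{Result: []Result{{Alternative: []Alternative{alt}}}}
	}
}

// collect drains out and returns the trimmed transcripts in arrival order.
func collect(out <-chan *Response) []string {
	var got []string
	for resp := range out {
		for _, result := range resp.Result {
			for _, alt := range result.Alternative {
				got = append(got, strings.TrimSpace(alt.Transcript))
			}
		}
	}
	return got
}

func main() {
	out := make(chan *Response)
	go fakeTranscribe([]string{"hello", "hello world"}, out)
	for _, t := range collect(out) {
		fmt.Println(t)
	}
}
```

If the producer never closed the channel, the range loop would block forever after the last response, so closing it on the sending side is the key design choice.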

Use it as a command:

$ git clone https://github.com/giulianopz/go-gstt
$ cd go-gstt
$ go build -o gstt .
$ mv gstt /usr/local/bin
# or just `go install github.com/giulianopz/go-gstt@latest`, if you don't mind the binary being named go-gstt
$ gstt -h
Usage:
    gstt [OPTION]... --interim --continuous [--file FILE]

Options:
        --verbose
        --file, path of audio file to transcribe
        --key, API key to authenticate requests (default is the one built into any Chrome installation)
        --language, language of the recording transcription; use the standard web codes for your language, e.g. 'en-US' for English-US, 'ru' for Russian, etc. (see https://en.wikipedia.org/wiki/IETF_language_tag)
        --continuous, to keep the stream open and transcribing as long as there is no silence
        --interim, to send back results before it's finished, so you get a live stream of possible transcriptions as it processes the audio
        --max-alts, how many possible transcriptions you want
        --pfilter, profanity filter ('0'=off, '1'=medium, '2'=strict)
        --user-agent, user-agent for spoofing
        --sample-rate, audio sampling rate
# transcribe audio from a single FLAC file
$ gstt --interim --continuous --file $FILE
# transcribe audio from microphone input (recorded with sox, removing silence)
$ rec -c 1 --encoding signed-integer --bits 16 --rate 16000 -t flac - silence 1 0.1 1% -1 0.5 1% | gstt --interim --continuous
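Options like the ones listed in `gstt -h` map naturally onto Go's standard `flag` package, which accepts both `-name` and `--name` spellings. The sketch below shows one way to declare a subset of them; it is not taken from gstt's `main.go`, and the `config` struct, field names and default values are hypothetical choices for illustration.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// config holds a subset of the options listed in `gstt -h`.
// Field names and defaults are hypothetical, for illustration only.
type config struct {
	verbose    bool
	file       string
	language   string
	continuous bool
	interim    bool
	maxAlts    int
	pfilter    int
	sampleRate int
}

// parseArgs declares the flags on a fresh FlagSet and parses args.
func parseArgs(args []string) (config, error) {
	var c config
	fs := flag.NewFlagSet("gstt", flag.ContinueOnError)
	fs.BoolVar(&c.verbose, "verbose", false, "verbose output")
	fs.StringVar(&c.file, "file", "", "path of audio file to transcribe (stdin if empty)")
	fs.StringVar(&c.language, "language", "en-US", "IETF language tag of the recording")
	fs.BoolVar(&c.continuous, "continuous", false, "keep the stream open while there is no silence")
	fs.BoolVar(&c.interim, "interim", false, "send back interim results")
	fs.IntVar(&c.maxAlts, "max-alts", 1, "number of alternative transcriptions")
	fs.IntVar(&c.pfilter, "pfilter", 0, "profanity filter: 0=off, 1=medium, 2=strict")
	fs.IntVar(&c.sampleRate, "sample-rate", 16000, "audio sampling rate in Hz")
	if err := fs.Parse(args); err != nil {
		return config{}, err
	}
	// reject profanity filter levels outside the documented 0..2 range
	if c.pfilter < 0 || c.pfilter > 2 {
		return config{}, fmt.Errorf("invalid --pfilter %d: must be 0, 1 or 2", c.pfilter)
	}
	return c, nil
}

func main() {
	c, err := parseArgs(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("%+v\n", c)
}
```

Using a dedicated `FlagSet` with `ContinueOnError` instead of the global `flag.Parse` keeps parsing testable, since errors are returned rather than terminating the process.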

Note: the Google Speech API seems to accept only input audio with a 16 kHz sample rate and a single channel (mono). If you need to mix a stereo stream (2 channels) down to a mono stream (1 channel), please read the ffmpeg docs.
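In practice ffmpeg handles both steps (for example `-ac 1` sets the channel count and `-ar 16000` the sample rate). To show what a stereo-to-mono mix-down does, here is a minimal sketch that averages interleaved 16-bit stereo PCM samples into mono. It assumes raw, already-decoded PCM; decoding FLAC and resampling to 16 kHz are out of scope, and `downmixStereo` is a hypothetical helper, not part of gstt.

```go
package main

import "fmt"

// downmixStereo converts interleaved 16-bit stereo PCM samples
// (L0, R0, L1, R1, ...) into mono by averaging each left/right pair.
// A trailing unpaired sample, if any, is ignored.
func downmixStereo(interleaved []int16) []int16 {
	mono := make([]int16, len(interleaved)/2)
	for i := range mono {
		// widen to int32 so the sum of two int16 values cannot overflow
		sum := int32(interleaved[2*i]) + int32(interleaved[2*i+1])
		mono[i] = int16(sum / 2) // integer division truncates toward zero
	}
	return mono
}

func main() {
	stereo := []int16{100, 300, -32768, -32768, 32767, 32767, 10, -11}
	fmt.Println(downmixStereo(stereo)) // prints [200 -32768 32767 0]
}
```

Averaging (rather than summing) keeps the result within the int16 range, at the cost of lowering the level when only one channel carries signal.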

Demo

Live-caption speech by redirecting the speakers' output to the microphone input with PulseAudio Volume Control (pavucontrol):

(how-to-gif)

Credits

As far as I know, this API has been around for a long time.

Mike Pultz was possibly the first one to discover it in 2011. Subsequently, Travis Payton published a detailed report on the subject.

I wrote about it on my blog.
