Release v0.13.1 - Voice mode, Super speedy streaming, and a lot more (#255)

## Thanks for contributions

- PR [#249](https://github.com/NeuralNomadsAI/CodeNomad/pull/249)
"feat(speech): add prompt voice input" by
[@shantur](https://github.com/shantur)
- PR [#243](https://github.com/NeuralNomadsAI/CodeNomad/pull/243)
"feat(i18n): Hebrew locale + full RTL support" by
[@MusiCode1](https://github.com/MusiCode1)
- PR [#241](https://github.com/NeuralNomadsAI/CodeNomad/pull/241)
"feat(lazy loading): Implement virtual list with virtua" by
[@pixellos](https://github.com/pixellos)
- PR [#240](https://github.com/NeuralNomadsAI/CodeNomad/pull/240)
"fix(tauri): force Windows process tree shutdown" by
[@pascalandr](https://github.com/pascalandr)
- PR [#239](https://github.com/NeuralNomadsAI/CodeNomad/pull/239)
"perf(ui): split right panel and secondary viewer chunks" by
[@pascalandr](https://github.com/pascalandr)
- PR [#238](https://github.com/NeuralNomadsAI/CodeNomad/pull/238)
"perf(ui): defer locale and overlay bundles" by
[@pascalandr](https://github.com/pascalandr)
- PR [#236](https://github.com/NeuralNomadsAI/CodeNomad/pull/236)
"Suppress OS notifications for subagent (child) sessions" by
`@app/codenomadbot`
- PR [#235](https://github.com/NeuralNomadsAI/CodeNomad/pull/235)
"fix(ui): unwrap pasted placeholders in slash commands" by
`@app/codenomadbot`
- PR [#232](https://github.com/NeuralNomadsAI/CodeNomad/pull/232)
"fix(tauri): stop CLI process group on exit" by `@app/codenomadbot`
- PR [#229](https://github.com/NeuralNomadsAI/CodeNomad/pull/229)
"feat(ui): add RTL support for Hebrew/Arabic text" by
[@MusiCode1](https://github.com/MusiCode1)
- PR [#227](https://github.com/NeuralNomadsAI/CodeNomad/pull/227)
"fix(tauri): improve Windows desktop runtime behavior" by
[@pascalandr](https://github.com/pascalandr)
- PR [#226](https://github.com/NeuralNomadsAI/CodeNomad/pull/226)
"fix(tauri): restore desktop menu controls and fullscreen shortcut" by
[@pascalandr](https://github.com/pascalandr)
- PR [#225](https://github.com/NeuralNomadsAI/CodeNomad/pull/225)
"fix(tauri): restore external links in the folder picker" by
[@pascalandr](https://github.com/pascalandr)
- PR [#224](https://github.com/NeuralNomadsAI/CodeNomad/pull/224)
"fix(tauri): sync server UI bundle during prebuild" by
[@pascalandr](https://github.com/pascalandr)
- PR [#215](https://github.com/NeuralNomadsAI/CodeNomad/pull/215)
"perf(ui): lazy-load markdown and defer diff rendering" by
[@pascalandr](https://github.com/pascalandr)

## Highlights

- **Voice-first conversations**: Start prompts with voice input,
configure speech behavior from settings, and listen back to assistant
responses with message playback and conversation playback controls.
- **A complete Hebrew + RTL experience**: CodeNomad now ships with a
full Hebrew locale and much broader right-to-left support, making the
app feel natural for Hebrew users while improving Arabic text rendering
too.
- **A much faster experience in long chats**: The new virtualized
message list, deferred markdown and diff rendering, and more selective
loading for heavy UI surfaces make large sessions feel noticeably
smoother.

## What's Improved

- **More flexible speech controls**: Speech settings and playback modes
now adapt better to different browsers and platform capabilities.
- **Cleaner prompt workflow**: The prompt includes a quick clear action,
a simpler recording indicator, and a more polished mic control layout.
- **Faster startup and lighter heavy views**: Locale bundles, overlays,
right-panel viewers, picker flows, markdown, and diff surfaces all load
more lazily to reduce upfront UI work.
- **Less notification spam**: Subagent sessions no longer fire OS
notifications, so important interruptions are easier to notice.
- **Better RTL behavior across the whole interface**: Session names,
tool outputs, markdown blocks, file views, selectors, and layout
controls behave more consistently in right-to-left contexts.

## Fixes

- **More reliable Windows desktop behavior**: Process cleanup is
stronger during app shutdown, background CLI process trees are
terminated more reliably, desktop identity/metadata is aligned more
cleanly, and stray console windows are hidden during startup and exit.
- **Cleaner shutdown on macOS and Linux**: Desktop quit/close now stops
the spawned CLI process group more reliably, reducing leftover
background processes after exit.
- **Restored desktop actions**: External links in the folder picker work
again, and the desktop View/Window controls plus the fullscreen shortcut
are back.
- **More stable streaming and scrolling**: Reasoning streams stay pinned
more consistently, follow behavior is less jumpy, spacing is cleaner in
virtualized conversations, and session switching retains position more
smoothly.
- **Safer slash command pasting**: Pasted placeholders are resolved
correctly before slash commands run, so long pasted inputs behave like
normal prompts.
- **More dependable desktop packaging**: Tauri prebuild now refreshes
the server UI bundle correctly, which avoids packaged desktop builds
picking up stale UI assets.
- **Clearer speech compatibility handling**: Streaming playback
limitations are surfaced more cleanly instead of failing in a confusing
way.

### Contributors

- [@pascalandr](https://github.com/pascalandr)
- [@MusiCode1](https://github.com/MusiCode1)
- [@pixellos](https://github.com/pixellos)
This commit is contained in:
Shantur Rathore
2026-03-27 19:58:35 +00:00
committed by GitHub
151 changed files with 7706 additions and 3516 deletions

View File

@@ -207,6 +207,39 @@ export interface BinaryValidationResult {
error?: string
}
export interface SpeechSegment {
startMs: number
endMs: number
text: string
}
export interface SpeechCapabilitiesResponse {
available: boolean
configured: boolean
provider: string
supportsStt: boolean
supportsTts: boolean
supportsStreamingTts: boolean
baseUrl?: string
sttModel: string
ttsModel: string
ttsVoice: string
ttsFormats: string[]
streamingTtsFormats: string[]
}
export interface SpeechTranscriptionResponse {
text: string
language?: string
durationMs?: number
segments?: SpeechSegment[]
}
export interface SpeechSynthesisResponse {
audioBase64: string
mimeType: string
}
export type WorkspaceEventType =
| "workspace.created"
| "workspace.started"

View File

@@ -23,6 +23,7 @@ import { AuthManager, BOOTSTRAP_TOKEN_STDOUT_PREFIX, DEFAULT_AUTH_COOKIE_NAME, D
import { resolveHttpsOptions } from "./server/tls"
import { resolveNetworkAddresses } from "./server/network-addresses"
import { startDevReleaseMonitor } from "./releases/dev-release-monitor"
import { SpeechService } from "./speech/service"
const require = createRequire(import.meta.url)
@@ -313,6 +314,7 @@ async function main() {
})
const fileSystemBrowser = new FileSystemBrowser({ rootDir: options.rootDir, unrestricted: options.unrestrictedRoot })
const instanceStore = new InstanceStore(configLocation.instancesDir)
const speechService = new SpeechService(settings, logger.child({ component: "speech" }))
const instanceEventBridge = new InstanceEventBridge({
workspaceManager,
eventBus,
@@ -397,6 +399,7 @@ async function main() {
eventBus,
serverMeta,
instanceStore,
speechService,
authManager,
uiStaticDir: uiResolution.uiStaticDir ?? DEFAULT_UI_STATIC_DIR,
uiDevServerUrl: uiResolution.uiDevServerUrl,
@@ -417,6 +420,7 @@ async function main() {
eventBus,
serverMeta,
instanceStore,
speechService,
authManager,
uiStaticDir: uiResolution.uiStaticDir ?? DEFAULT_UI_STATIC_DIR,
uiDevServerUrl: undefined,

View File

@@ -21,12 +21,14 @@ import { registerStorageRoutes } from "./routes/storage"
import { registerPluginRoutes } from "./routes/plugin"
import { registerBackgroundProcessRoutes } from "./routes/background-processes"
import { registerWorktreeRoutes } from "./routes/worktrees"
import { registerSpeechRoutes } from "./routes/speech"
import { ServerMeta } from "../api-types"
import { InstanceStore } from "../storage/instance-store"
import { BackgroundProcessManager } from "../background-processes/manager"
import type { AuthManager } from "../auth/manager"
import { registerAuthRoutes } from "./routes/auth"
import { sendUnauthorized, wantsHtml } from "../auth/http-auth"
import type { SpeechService } from "../speech/service"
interface HttpServerDeps {
bindHost: string
@@ -41,6 +43,7 @@ interface HttpServerDeps {
eventBus: EventBus
serverMeta: ServerMeta
instanceStore: InstanceStore
speechService: SpeechService
authManager: AuthManager
uiStaticDir: string
uiDevServerUrl?: string
@@ -252,6 +255,7 @@ export function createHttpServer(deps: HttpServerDeps) {
eventBus: deps.eventBus,
workspaceManager: deps.workspaceManager,
})
registerSpeechRoutes(app, { speechService: deps.speechService })
registerPluginRoutes(app, { workspaceManager: deps.workspaceManager, eventBus: deps.eventBus, logger: proxyLogger })
registerBackgroundProcessRoutes(app, { backgroundProcessManager })
registerInstanceProxyRoutes(app, { workspaceManager: deps.workspaceManager, logger: proxyLogger })

View File

@@ -3,6 +3,7 @@ import { z } from "zod"
import { probeBinaryVersion } from "../../workspaces/runtime"
import type { SettingsService } from "../../settings/service"
import type { Logger } from "../../logger"
import { sanitizeConfigDoc, sanitizeConfigOwner } from "../../settings/public-config"
interface RouteDeps {
settings: SettingsService
@@ -20,10 +21,10 @@ function validateBinaryPath(binaryPath: string): { valid: boolean; version?: str
export function registerSettingsRoutes(app: FastifyInstance, deps: RouteDeps) {
// Full-document access
app.get("/api/storage/config", async () => deps.settings.getDoc("config"))
app.get("/api/storage/config", async () => sanitizeConfigDoc(deps.settings.getDoc("config")))
app.patch("/api/storage/config", async (request, reply) => {
try {
return deps.settings.mergePatchDoc("config", request.body ?? {})
return sanitizeConfigDoc(deps.settings.mergePatchDoc("config", request.body ?? {}))
} catch (error) {
reply.code(400)
return { error: error instanceof Error ? error.message : "Invalid patch" }
@@ -31,12 +32,15 @@ export function registerSettingsRoutes(app: FastifyInstance, deps: RouteDeps) {
})
app.get<{ Params: { owner: string } }>("/api/storage/config/:owner", async (request) => {
return deps.settings.getOwner("config", request.params.owner)
return sanitizeConfigOwner(request.params.owner, deps.settings.getOwner("config", request.params.owner))
})
app.patch<{ Params: { owner: string } }>("/api/storage/config/:owner", async (request, reply) => {
try {
return deps.settings.mergePatchOwner("config", request.params.owner, request.body ?? {})
return sanitizeConfigOwner(
request.params.owner,
deps.settings.mergePatchOwner("config", request.params.owner, request.body ?? {}),
)
} catch (error) {
reply.code(400)
return { error: error instanceof Error ? error.message : "Invalid patch" }

View File

@@ -0,0 +1,74 @@
import type { FastifyInstance } from "fastify"
import { z } from "zod"
import type { SpeechService } from "../../speech/service"
interface RouteDeps {
speechService: SpeechService
}
const TranscribeBodySchema = z.object({
audioBase64: z.string().min(1, "Audio payload is required"),
mimeType: z.string().min(1, "Audio MIME type is required"),
filename: z.string().optional(),
language: z.string().optional(),
prompt: z.string().optional(),
})
const SynthesizeBodySchema = z.object({
text: z.string().trim().min(1, "Text is required"),
format: z.enum(["mp3", "wav", "opus", "aac"]).optional(),
})
function getSpeechErrorStatus(error: unknown): number {
if (error instanceof z.ZodError) {
return 400
}
if (error instanceof Error && /not configured/i.test(error.message)) {
return 503
}
return 502
}
function getSpeechErrorMessage(error: unknown, fallback: string): string {
return error instanceof Error ? error.message : fallback
}
export function registerSpeechRoutes(app: FastifyInstance, deps: RouteDeps) {
app.get("/api/speech/capabilities", async () => deps.speechService.getCapabilities())
app.post("/api/speech/transcribe", async (request, reply) => {
try {
const body = TranscribeBodySchema.parse(request.body ?? {})
return await deps.speechService.transcribe(body)
} catch (error) {
request.log.error({ err: error }, "Failed to transcribe audio")
reply.code(getSpeechErrorStatus(error))
return { error: getSpeechErrorMessage(error, "Failed to transcribe audio") }
}
})
app.post("/api/speech/synthesize", async (request, reply) => {
try {
const body = SynthesizeBodySchema.parse(request.body ?? {})
return await deps.speechService.synthesize(body)
} catch (error) {
request.log.error({ err: error }, "Failed to synthesize audio")
reply.code(getSpeechErrorStatus(error))
return { error: getSpeechErrorMessage(error, "Failed to synthesize audio") }
}
})
app.post("/api/speech/synthesize/stream", async (request, reply) => {
try {
const body = SynthesizeBodySchema.parse(request.body ?? {})
const result = await deps.speechService.synthesizeStream(body)
reply.header("Content-Type", result.mimeType)
reply.header("Cache-Control", "no-store")
return reply.send(result.stream)
} catch (error) {
request.log.error({ err: error }, "Failed to stream synthesized audio")
reply.code(getSpeechErrorStatus(error))
return { error: getSpeechErrorMessage(error, "Failed to stream synthesized audio") }
}
})
}

View File

@@ -0,0 +1,40 @@
import type { SettingsDoc } from "./yaml-doc-store"
function isPlainObject(value: unknown): value is Record<string, unknown> {
return typeof value === "object" && value !== null && !Array.isArray(value)
}
function sanitizeServerOwner(value: SettingsDoc): SettingsDoc {
const next: SettingsDoc = { ...value }
const speech = isPlainObject(next.speech) ? { ...next.speech } : null
if (!speech) {
return next
}
const rawApiKey = typeof speech.apiKey === "string" ? speech.apiKey.trim() : ""
if (rawApiKey) {
delete speech.apiKey
speech.hasApiKey = true
} else if (!("hasApiKey" in speech)) {
speech.hasApiKey = false
}
next.speech = speech
return next
}
export function sanitizeConfigOwner(owner: string, value: SettingsDoc): SettingsDoc {
if (owner !== "server") {
return value
}
return sanitizeServerOwner(value)
}
export function sanitizeConfigDoc(value: SettingsDoc): SettingsDoc {
const next: SettingsDoc = { ...value }
if (isPlainObject(next.server)) {
next.server = sanitizeServerOwner(next.server)
}
return next
}

View File

@@ -4,6 +4,7 @@ import type { ConfigLocation } from "../config/location"
import { YamlDocStore, type SettingsDoc } from "./yaml-doc-store"
import { migrateSettingsLayout } from "./migrate"
import type { WorkspaceEventPayload } from "../api-types"
import { sanitizeConfigOwner } from "./public-config"
export type DocKind = "config" | "state"
@@ -45,10 +46,11 @@ export class SettingsService {
private publish(kind: DocKind, owner: string, value?: SettingsDoc) {
if (!this.eventBus) return
const type = kind === "config" ? "storage.configChanged" : "storage.stateChanged"
const nextValue = value ?? this.getOwner(kind, owner)
const payload: WorkspaceEventPayload = {
type,
owner,
value: value ?? this.getOwner(kind, owner),
value: kind === "config" ? sanitizeConfigOwner(owner, nextValue) : nextValue,
} as any
this.eventBus.publish(payload)
}

View File

@@ -0,0 +1,204 @@
import { Readable } from "node:stream"
import OpenAI from "openai"
import { toFile } from "openai/uploads"
import type { SpeechSynthesisResponse, SpeechTranscriptionResponse } from "../../api-types"
import type { Logger } from "../../logger"
import type { NormalizedSpeechSettings, SpeechSynthesisStreamResponse, SynthesizeSpeechInput, TranscribeAudioInput } from "../service"
interface OpenAICompatibleSpeechProviderOptions {
settings: NormalizedSpeechSettings
logger: Logger
}
export class OpenAICompatibleSpeechProvider {
constructor(private readonly options: OpenAICompatibleSpeechProviderOptions) {}
getCapabilities() {
const { settings } = this.options
return {
available: true,
configured: Boolean(settings.apiKey),
provider: settings.provider,
supportsStt: true,
supportsTts: true,
supportsStreamingTts: true,
baseUrl: settings.baseUrl,
sttModel: settings.sttModel,
ttsModel: settings.ttsModel,
ttsVoice: settings.ttsVoice,
ttsFormats: ["mp3", "wav", "opus", "aac"],
streamingTtsFormats: ["mp3", "wav", "opus", "aac"],
}
}
async transcribe(input: TranscribeAudioInput): Promise<SpeechTranscriptionResponse> {
const client = this.createClient()
const startedAt = Date.now()
const extension = extensionForMime(input.mimeType)
const buffer = Buffer.from(input.audioBase64, "base64")
const filename = input.filename?.trim() || `prompt-input.${extension}`
this.options.logger.info(
{
mimeType: input.mimeType,
bytes: buffer.byteLength,
language: input.language,
model: this.options.settings.sttModel,
},
"speech.transcribe",
)
const response = await this.requestTranscription(client, buffer, filename, input)
return {
text: typeof response?.text === "string" ? response.text : "",
language: typeof response?.language === "string" ? response.language : input.language,
durationMs: Number.isFinite(response?.duration) ? Math.round(Number(response.duration) * 1000) : Date.now() - startedAt,
segments: Array.isArray(response?.segments)
? response.segments
.filter((segment: any) => typeof segment?.text === "string")
.map((segment: any) => ({
startMs: Math.max(0, Math.round(Number(segment.start ?? 0) * 1000)),
endMs: Math.max(0, Math.round(Number(segment.end ?? 0) * 1000)),
text: String(segment.text),
}))
: undefined,
}
}
private async requestTranscription(
client: OpenAI,
buffer: Buffer,
filename: string,
input: TranscribeAudioInput,
): Promise<any> {
const baseRequest = {
model: this.options.settings.sttModel,
...(input.language ? { language: input.language } : {}),
...(input.prompt ? { prompt: input.prompt } : {}),
}
try {
const file = await toFile(buffer, filename, { type: input.mimeType })
return (await client.audio.transcriptions.create({
...baseRequest,
file,
response_format: "verbose_json" as any,
} as any)) as any
} catch (error) {
this.options.logger.warn({ err: error }, "speech.transcribe verbose_json failed; retrying default format")
const retryFile = await toFile(buffer, filename, { type: input.mimeType })
return (await client.audio.transcriptions.create({
...baseRequest,
file: retryFile,
} as any)) as any
}
}
async synthesize(input: SynthesizeSpeechInput): Promise<SpeechSynthesisResponse> {
const format = input.format ?? this.options.settings.ttsFormat
this.options.logger.info(
{
model: this.options.settings.ttsModel,
voice: this.options.settings.ttsVoice,
format,
},
"speech.synthesize",
)
const response = await this.requestSpeechAudio(input.text, format)
const mimeType = response.headers.get("content-type") || mimeTypeForFormat(format)
const audioBuffer = Buffer.from(await response.arrayBuffer())
return {
audioBase64: audioBuffer.toString("base64"),
mimeType,
}
}
async synthesizeStream(input: SynthesizeSpeechInput): Promise<SpeechSynthesisStreamResponse> {
const format = input.format ?? this.options.settings.ttsFormat
this.options.logger.info(
{
model: this.options.settings.ttsModel,
voice: this.options.settings.ttsVoice,
format,
},
"speech.synthesize.stream",
)
const response = await this.requestSpeechAudio(input.text, format)
if (!response.body) {
throw new Error("Speech provider did not return a stream.")
}
return {
stream: Readable.fromWeb(response.body as any),
mimeType: response.headers.get("content-type") || mimeTypeForFormat(format),
}
}
private async requestSpeechAudio(text: string, format: "mp3" | "wav" | "opus" | "aac"): Promise<Response> {
const { settings } = this.options
if (!settings.apiKey) {
throw new Error("Speech provider is not configured. Add an API key in Speech settings.")
}
const endpoint = new URL("audio/speech", ensureTrailingSlash(settings.baseUrl ?? "https://api.openai.com/v1"))
const response = await fetch(endpoint, {
method: "POST",
headers: {
Authorization: `Bearer ${settings.apiKey}`,
"Content-Type": "application/json",
},
body: JSON.stringify({
model: settings.ttsModel,
voice: settings.ttsVoice,
input: text,
response_format: format,
}),
})
if (!response.ok) {
const detail = await response.text()
throw new Error(detail || `Speech synthesis failed with ${response.status}`)
}
return response
}
private createClient(): OpenAI {
const { settings } = this.options
if (!settings.apiKey) {
throw new Error("Speech provider is not configured. Add an API key in Speech settings.")
}
return new OpenAI({
apiKey: settings.apiKey,
baseURL: settings.baseUrl,
})
}
}
function extensionForMime(mimeType: string): string {
const normalized = mimeType.toLowerCase()
if (normalized.includes("webm")) return "webm"
if (normalized.includes("ogg")) return "ogg"
if (normalized.includes("wav")) return "wav"
if (normalized.includes("mpeg") || normalized.includes("mp3")) return "mp3"
if (normalized.includes("mp4") || normalized.includes("aac")) return "m4a"
return "webm"
}
function mimeTypeForFormat(format: "mp3" | "wav" | "opus" | "aac"): string {
if (format === "wav") return "audio/wav"
if (format === "opus") return 'audio/ogg; codecs="opus"'
if (format === "aac") return "audio/aac"
return "audio/mpeg"
}
function ensureTrailingSlash(value: string): string {
return value.endsWith("/") ? value : `${value}/`
}

View File

@@ -0,0 +1,106 @@
import { z } from "zod"
import type { Readable } from "node:stream"
import type { Logger } from "../logger"
import type { SettingsService } from "../settings/service"
import type { SpeechCapabilitiesResponse, SpeechSynthesisResponse, SpeechTranscriptionResponse } from "../api-types"
import { OpenAICompatibleSpeechProvider } from "./providers/openai-compatible"
const ServerSpeechSettingsSchema = z.object({
speech: z
.object({
provider: z.string().optional(),
apiKey: z.string().optional(),
baseUrl: z.string().optional(),
sttModel: z.string().optional(),
ttsModel: z.string().optional(),
ttsVoice: z.string().optional(),
ttsFormat: z.enum(["mp3", "wav", "opus", "aac"]).optional(),
})
.optional(),
})
export interface TranscribeAudioInput {
audioBase64: string
mimeType: string
filename?: string
language?: string
prompt?: string
}
export interface SynthesizeSpeechInput {
text: string
format?: "mp3" | "wav" | "opus" | "aac"
}
export interface SpeechSynthesisStreamResponse {
stream: Readable
mimeType: string
}
export interface SpeechProvider {
getCapabilities(): SpeechCapabilitiesResponse
transcribe(input: TranscribeAudioInput): Promise<SpeechTranscriptionResponse>
synthesize(input: SynthesizeSpeechInput): Promise<SpeechSynthesisResponse>
synthesizeStream(input: SynthesizeSpeechInput): Promise<SpeechSynthesisStreamResponse>
}
export interface NormalizedSpeechSettings {
provider: string
apiKey?: string
baseUrl?: string
sttModel: string
ttsModel: string
ttsVoice: string
ttsFormat: "mp3" | "wav" | "opus" | "aac"
}
const DEFAULT_PROVIDER = "openai-compatible"
const DEFAULT_STT_MODEL = "gpt-4o-mini-transcribe"
const DEFAULT_TTS_MODEL = "gpt-4o-mini-tts"
const DEFAULT_TTS_VOICE = "alloy"
const DEFAULT_TTS_FORMAT = "mp3"
export class SpeechService {
constructor(
private readonly settings: SettingsService,
private readonly logger: Logger,
) {}
getCapabilities(): SpeechCapabilitiesResponse {
return this.createProvider().getCapabilities()
}
async transcribe(input: TranscribeAudioInput): Promise<SpeechTranscriptionResponse> {
return this.createProvider().transcribe(input)
}
async synthesize(input: SynthesizeSpeechInput): Promise<SpeechSynthesisResponse> {
return this.createProvider().synthesize(input)
}
async synthesizeStream(input: SynthesizeSpeechInput): Promise<SpeechSynthesisStreamResponse> {
return this.createProvider().synthesizeStream(input)
}
private createProvider(): SpeechProvider {
const settings = this.resolveSettings()
return new OpenAICompatibleSpeechProvider({
settings,
logger: this.logger.child({ provider: settings.provider }),
})
}
private resolveSettings(): NormalizedSpeechSettings {
const parsed = ServerSpeechSettingsSchema.parse(this.settings.getOwner("config", "server") ?? {})
const speech = parsed.speech ?? {}
return {
provider: speech.provider?.trim() || DEFAULT_PROVIDER,
apiKey: speech.apiKey?.trim() || process.env.OPENAI_API_KEY,
baseUrl: speech.baseUrl?.trim() || process.env.OPENAI_BASE_URL || undefined,
sttModel: speech.sttModel?.trim() || DEFAULT_STT_MODEL,
ttsModel: speech.ttsModel?.trim() || DEFAULT_TTS_MODEL,
ttsVoice: speech.ttsVoice?.trim() || DEFAULT_TTS_VOICE,
ttsFormat: speech.ttsFormat ?? DEFAULT_TTS_FORMAT,
}
}
}