arcosoph_ai 6 hours ago

Hi HN,

This is Nanowakeword, an open-source Python framework designed to solve a common problem in voice AI: the complex and time-consuming process of training custom, high-performance wake word models.

The core of the project is an Intelligent Configuration Engine. Instead of requiring manual hyperparameter tuning (learning rates, model architecture, etc.), the engine analyzes the user's dataset and automatically generates an optimal, data-driven training configuration. The goal is to abstract away the complexity and replace hours of manual trial-and-error with a single `--auto-config` flag.
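To make that concrete, here is a hedged sketch (plain Python, with invented heuristics; these are not Nanowakeword's actual rules) of the kind of mapping an auto-config engine performs, from dataset statistics to training settings:

```python
import statistics

def auto_config(clip_durations_s, n_positive, n_negative):
    """Illustrative only: derive training settings from dataset stats.

    Every threshold and rule below is made up for the example;
    Nanowakeword's real engine uses its own heuristics.
    """
    mean_dur = statistics.mean(clip_durations_s)
    total = n_positive + n_negative
    balance = n_positive / max(1, total)

    return {
        # Longer clips -> wider input window (hypothetical rule).
        "window_ms": int(round(mean_dur * 1000)),
        # Small datasets -> smaller model to reduce overfitting risk.
        "model_size": "small" if total < 5000 else "base",
        # Re-weight the loss when the classes are imbalanced.
        "positive_class_weight": round((1 - balance) / max(balance, 1e-6), 2),
    }

print(auto_config([1.2, 0.9, 1.1], n_positive=300, n_negative=2700))
```

The point is only that properties like mean duration and class balance are cheap to measure and map naturally onto hyperparameters a user would otherwise tune by hand.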

It works by analyzing the statistical properties of the provided audio data (duration, noise, balance) and then designing a suitable model architecture and training plan. Training uses a modern pipeline with techniques such as Cyclical Learning Rates, and outputs optimized models in ONNX and TFLite formats, ready for edge devices.
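For readers unfamiliar with the technique: a triangular Cyclical Learning Rate (Smith, 2017) ramps the learning rate linearly between a lower and upper bound instead of decaying it monotonically. A minimal sketch (the parameter values are illustrative, not Nanowakeword's defaults):

```python
def cyclical_lr(step, base_lr=1e-4, max_lr=1e-2, step_size=2000):
    """Triangular CLR: base_lr -> max_lr -> base_lr over 2*step_size
    steps, then the cycle repeats."""
    cycle = step // (2 * step_size)
    # x goes 1 -> 0 -> 1 within each cycle.
    x = abs(step / step_size - 2 * cycle - 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1 - x)
```

Periodically revisiting higher learning rates helps the optimizer escape shallow minima, which is useful on the small, noisy datasets typical of custom wake word training.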

The entire project is packaged for a simple `pip install "nanowakeword[train]"` and can be run with a clean command-line tool (`nanowakeword-train`). It's fully open-source under the Apache 2.0 license.

The project is still in its early stages, but the core engine is robust. It's developed under the Arcosoph initiative, with the vision of creating powerful, accessible open-source AI tools. We're actively seeking feedback, suggestions, and criticism from the community.

Tech Stack: PyTorch, ONNX, TensorFlow (for TFLite conversion).

  • MzHN 6 hours ago

    If I want to replace openWakeWord in the Home Assistant Voice Assistant pipeline with this, any idea how difficult it would be?

    • arcosoph_ai 5 hours ago

      That's a great question, and it's a core design goal for Nanowakeword.

      The short answer is: it should be very easy.

      Since Nanowakeword is designed to be a full framework, the plan is for it to have an inference API that is largely compatible with how openWakeWord works. My understanding of the HA pipeline is that it's a Python-based system.

      In theory, the steps would be:

      1. `pip uninstall openwakeword` and `pip install nanowakeword`.
      2. In the relevant Home Assistant Python script, change the import from `from openwakeword import ...` to something like `from nanowakeword import Model`.
      3. Instantiate the model with the path to your custom `.onnx` file.
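      To illustrate the shape of that integration: below is a runnable sketch of the scoring loop a wake word stage implements. The `score_frame` stub stands in for the real `Model` instance (whose actual API may differ from what's described above); everything else is hypothetical glue code, not Home Assistant's.

```python
def score_frame(frame):
    # Stub standing in for something like model.predict(frame),
    # which would return a wake word confidence score per frame.
    return max(frame)

def detect(frames, threshold=0.5):
    """Return the index of the first frame whose score crosses the
    threshold, or None if the wake word was never heard."""
    for i, frame in enumerate(frames):
        if score_frame(frame) >= threshold:
            return i   # wake word detected -> hand off to STT stage
    return None        # keep listening
```

      As long as the replacement model exposes a per-frame scoring call, the surrounding pipeline logic doesn't need to change.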

      The Nanowakeword `Model` class is designed to handle the necessary audio preprocessing (feature extraction) internally, so you shouldn't need to worry about manually replicating it. The `.onnx` models are already compatible.

      While I haven't tested the Home Assistant integration myself yet, building a seamless replacement path is a top priority. Your question is a great confirmation that this is what users want. If you run into any issues trying this, please open an issue on the GitHub repo, and I'll be happy to help you debug it.

      Thanks!