How Audiobook Converter Pro Finds Chapters That Don't Exist
The silence detection engine that turns markerless audio into structured audiobooks
A lot of audiobooks have no chapter markers. Not a few. A lot. Any file ripped from a physical CD before 2010, any old recording from a narrator who just hit record and read the whole thing in one pass, any download from a smaller site that never embedded chapter metadata properly. The file arrives as one continuous stream from 0:00 to 16:43:07, and every player just shows a single progress bar scrubbing through the entire thing. No chapters. No structure. Just audio.
This is a real problem if you're trying to convert the file chapter-by-chapter, because there are no chapters to work with.
Audiobook Converter Pro has a way to handle this. It's not magic, and it's not perfect, but it's considerably more careful than guessing.
The Actual Problem With Markerless Files
When you load an M4B file that has embedded chapter markers, converting it chapter-by-chapter is straightforward. The markers are timestamps. The app reads them with FFprobe, slices the audio accordingly, and each output file corresponds to one embedded chapter. The information was already there.
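For readers who want to see what "reads them with FFprobe" looks like in practice, here is a minimal sketch. It assumes ffprobe is on your PATH; the function names are made up for illustration and are not the app's internals.

```python
import json
import subprocess


def ffprobe_chapters_command(path):
    """Build an ffprobe invocation that prints embedded chapters as JSON."""
    return [
        "ffprobe", "-v", "error",
        "-print_format", "json",
        "-show_chapters",
        path,
    ]


def parse_chapters(ffprobe_json):
    """Turn ffprobe's JSON into (title, start_seconds, end_seconds) tuples."""
    data = json.loads(ffprobe_json)
    chapters = []
    for ch in data.get("chapters", []):
        # The title tag is optional; fall back to an empty string.
        title = ch.get("tags", {}).get("title", "")
        chapters.append((title, float(ch["start_time"]), float(ch["end_time"])))
    return chapters


def read_chapters(path):
    """Run ffprobe on a real file and return its chapter list."""
    result = subprocess.run(
        ffprobe_chapters_command(path),
        capture_output=True, text=True, check=True,
    )
    return parse_chapters(result.stdout)
```

A markerless file simply comes back with an empty list, which is the case the rest of this article is about.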
When there are no markers, the app has two basic choices. Convert the whole file as a single output (which defeats most of the purpose of chapter-by-chapter conversion), or attempt to figure out where the chapters probably are by analyzing the audio itself.
The second option requires the app to actually listen to the file.
How Silence Detection Works
Audiobook narration has a specific structure. Readers pause between chapters. Those pauses are noticeably longer than the pauses between sentences or paragraphs, usually several seconds of near-silence. A narrator finishing a chapter and starting the next one typically produces a silence window of three to five seconds, sometimes longer. That pattern is consistent enough to be useful.
Audiobook Converter Pro feeds the file through FFmpeg's silencedetect filter, which scans the audio and reports the start and end of every stretch where the volume drops below a given threshold and stays there for at least a minimum duration. The app parses that output into a list of silence events, each with a start time, an end time, and a duration.
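The silencedetect step can be sketched roughly like this. The filter and its noise and d options are real FFmpeg features, and the filter logs silence_start and silence_end lines to stderr. The Silence class, the function names, and the sample values are assumptions for illustration, not the app's actual code.

```python
import re
from dataclasses import dataclass


@dataclass
class Silence:
    start: float
    end: float

    @property
    def duration(self):
        return self.end - self.start


def silencedetect_command(path, threshold_db, min_silence_s):
    """Build an ffmpeg command that only analyzes audio and writes no output file."""
    return [
        "ffmpeg", "-hide_banner", "-nostats",
        "-i", path,
        "-vn",  # ignore cover art / video streams
        "-af", f"silencedetect=noise={threshold_db}dB:d={min_silence_s}",
        "-f", "null", "-",
    ]


START_RE = re.compile(r"silence_start:\s*(-?[\d.]+)")
END_RE = re.compile(r"silence_end:\s*(-?[\d.]+)")


def parse_silences(log_text):
    """Pair up silence_start / silence_end lines from ffmpeg's stderr."""
    events = []
    pending_start = None
    for line in log_text.splitlines():
        m = START_RE.search(line)
        if m:
            pending_start = float(m.group(1))
            continue
        m = END_RE.search(line)
        if m and pending_start is not None:
            events.append(Silence(pending_start, float(m.group(1))))
            pending_start = None
    return events
```

Because the parser works line by line, the same logic can run on the stream as FFmpeg produces it, instead of waiting for the whole scan to finish.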
Those events then go through a planning stage. The planner filters out anything outside the actual file duration, sorts events by position, and merges any two silence windows that fall within eight seconds of each other. The merge step matters because a narrator might cough, pause briefly, then continue the pause, which would otherwise produce two markers close together rather than one clean boundary.
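Here is one way the filter, sort, and merge steps could look. This is a sketch under two assumptions: "within eight seconds" is read as the gap between the end of one silence and the start of the next, and out-of-range events are dropped rather than clamped. The function name is hypothetical.

```python
def plan_silences(events, file_duration, merge_gap_s=8.0):
    """Filter, sort, and merge (start, end) silence windows given in seconds."""
    # Keep only well-formed windows that sit inside the file.
    in_range = [(s, e) for s, e in events if 0.0 <= s < e <= file_duration]
    in_range.sort()

    merged = []
    for start, end in in_range:
        if merged and start - merged[-1][1] <= merge_gap_s:
            # Close enough to the previous window: extend it instead of
            # creating a second boundary a few seconds later.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged
```

With this reading, a pause at 100.0 to 103.5 seconds followed by another at 106.0 to 109.0 collapses into a single window from 100.0 to 109.0.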
From the merged silence list, the planner derives chapter boundaries. Each boundary is placed one second before the silence ends, rather than at the silence's midpoint or at its very end. That one-second pre-roll means a chapter opens with a brief beat of quiet before the narrator's first word, not with several seconds of dead air and not with a clipped first syllable.
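The boundary placement is simple enough to show directly. This sketch assumes the first chapter always starts at 0:00 and that a boundary is never allowed to land before its silence begins. Both are guesses about sensible behavior, and the function name is made up.

```python
def chapter_starts(silences, pre_roll_s=1.0):
    """Return chapter start times from merged (start, end) silence windows."""
    starts = [0.0]  # assumption: the file's first chapter begins at 0:00
    for s_start, s_end in silences:
        # One second before the silence ends, but never earlier than the
        # silence itself (matters only for very short windows).
        boundary = max(s_start, s_end - pre_roll_s)
        if boundary > starts[-1]:
            starts.append(boundary)
    return starts
```

For example, merged silences at 100 to 109 seconds and 600 to 604 seconds give chapter starts at 0, 108, and 603 seconds.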
Then the planner filters again. Any chapter segment shorter than the preset's minimum chapter length (three minutes on Balanced) gets merged into the following chapter. Any chapter longer than two hours gets split at regular intervals. The split only applies on the Balanced preset; the Conservative preset trusts the detected markers entirely and leaves long chapters alone.
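A rough sketch of that second filtering pass follows. Two details are assumptions, because the article doesn't specify them: a short final chapter (which has no following chapter) is folded into the previous one, and "regular intervals" is interpreted as equal parts no longer than the two-hour limit. Function names are hypothetical.

```python
import math


def enforce_min_length(starts, file_duration, min_len_s=180.0):
    """Merge chapters shorter than min_len_s into the chapter that follows."""
    kept = [starts[0]]
    for nxt in starts[1:]:
        # nxt would close the chapter that began at kept[-1]. If that chapter
        # is too short, drop the boundary so it absorbs the next one.
        if nxt - kept[-1] < min_len_s:
            continue
        kept.append(nxt)
    # Assumption: a short trailing chapter is folded into the previous one.
    if len(kept) > 1 and file_duration - kept[-1] < min_len_s:
        kept.pop()
    return kept


def split_long(starts, file_duration, max_len_s=7200.0):
    """Split any chapter longer than max_len_s into equal parts."""
    out = []
    ends = starts[1:] + [file_duration]
    for start, end in zip(starts, ends):
        length = end - start
        parts = max(1, math.ceil(length / max_len_s))
        step = length / parts
        out.extend(start + i * step for i in range(parts))
    return out
```

A five-hour chapter under this interpretation becomes three chapters of 100 minutes each, rather than two full two-hour chapters and a one-hour remainder.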
Three Presets, Different Tradeoffs
The detection parameters matter a lot. Too sensitive and every throat clear becomes a false boundary. Too conservative and real chapter breaks get ignored.
The Balanced preset uses a relative threshold 25 dB below the file's detected noise floor, requires at least 3.3 seconds of silence, and sets a three-minute minimum chapter length. This is the default and it works well on most commercially produced audiobooks.
The Conservative preset tightens the minimum silence duration to four seconds and uses a 20 dB offset. It also enforces a five-minute minimum chapter length and never time-splits long chapters. It produces fewer, longer chapters and is more likely to miss a genuine boundary than to invent a false one. Better for files where false positives are costly.
The Sensitive preset loosens everything. It requires only 2.5 seconds of silence, uses a 30 dB offset, and accepts chapters as short as two minutes. This catches more boundaries including borderline ones, which is useful on recordings with uneven narration pacing or unusual chapter lengths. It also produces more false positives on noisy files.
You can also set your own values. The threshold, minimum silence duration, pre-roll, and minimum chapter length are all configurable.
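To make the three presets easier to compare side by side, here they are as a small data structure. The numbers come from the descriptions above; the class and field names are invented for this sketch, and the Sensitive preset's no-split setting follows from the earlier statement that only Balanced splits long chapters.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DetectionPreset:
    name: str
    threshold_offset_db: float   # offset relative to the detected noise floor
    min_silence_s: float         # shortest silence that counts as a candidate
    min_chapter_s: float         # shorter chapters get merged
    split_long_chapters: bool    # time-split chapters over two hours
    pre_roll_s: float = 1.0      # boundary sits this far before silence ends


PRESETS = {
    "balanced": DetectionPreset("Balanced", 25.0, 3.3, 180.0, True),
    "conservative": DetectionPreset("Conservative", 20.0, 4.0, 300.0, False),
    "sensitive": DetectionPreset("Sensitive", 30.0, 2.5, 120.0, False),
}

# A custom configuration is just a preset with some fields overridden.
custom = replace(PRESETS["balanced"], name="Custom", min_silence_s=3.0)
```

Starting a custom configuration from an existing preset keeps every value you didn't touch at a known-good default.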
What You See While It's Running
The app shows a live progress panel while detection runs. There's a timeline strip across the panel, and as FFmpeg reports silence events, orange markers appear in real time at the positions where silences were found. You watch the markers accumulate. The panel shows the preset name, the threshold in dB, which detection attempt the scan is on, and a count of detected silences so far.
The detection can run multiple attempts. If the first pass produces a result with fewer than two chapters or more than eighty, the app tries again with adjusted parameters. The panel updates to show which attempt is running.
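The retry loop might look something like the sketch below. The two-to-eighty chapter window is from the description above. Everything about how the parameters get adjusted is an assumption: here the minimum silence duration is shortened when too few chapters are found and lengthened when too many are, in half-second steps, for up to three attempts.

```python
def detect_with_retries(run_detection, min_silence_s, max_attempts=3,
                        min_chapters=2, max_chapters=80, step_s=0.5):
    """Call run_detection(min_silence_s) until the chapter count is plausible.

    run_detection is any callable returning a list of chapter start times.
    Returns (chapters, attempt_number).
    """
    chapters = []
    for attempt in range(1, max_attempts + 1):
        chapters = run_detection(min_silence_s)
        if min_chapters <= len(chapters) <= max_chapters:
            return chapters, attempt
        if len(chapters) < min_chapters:
            # Too few boundaries: accept shorter pauses (hypothetical rule).
            min_silence_s = max(0.5, min_silence_s - step_s)
        else:
            # Too many boundaries: demand longer pauses (hypothetical rule).
            min_silence_s += step_s
    return chapters, max_attempts
```

Passing the detector in as a callable keeps the retry policy separate from the FFmpeg plumbing, which also makes it easy to exercise with a fake detector.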
When the scan finishes, the planner scores the result. It assigns a confidence level of high, medium, or low based on the number and distribution of chapters. A high-confidence result has a plausible number of chapters with reasonable durations. Low confidence means the detection probably missed something or found too much noise.
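The exact scoring rules aren't spelled out here, so the following is purely an illustration of the idea of grading a result by chapter count and distribution. Every cutoff in it (the 0.25x to 4x band around the mean, the 90% and 70% levels) is an invented placeholder, not the app's real criteria.

```python
def score_confidence(starts, file_duration):
    """Grade a chapter plan as 'high', 'medium', or 'low' (illustrative only)."""
    ends = starts[1:] + [file_duration]
    lengths = [end - start for start, end in zip(starts, ends)]
    count = len(lengths)

    # An implausible chapter count is low confidence outright.
    if count < 2 or count > 80:
        return "low"

    # Share of chapters whose length is within a broad band around the mean.
    mean = sum(lengths) / count
    typical = sum(1 for n in lengths if 0.25 * mean <= n <= 4 * mean) / count

    if typical >= 0.9:
        return "high"
    if typical >= 0.7:
        return "medium"
    return "low"
```

The point of the shape, rather than the numbers, is that evenly sized chapters look like deliberate structure, while a plan dominated by outliers looks like noise.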
What Happens Next
After detection, you have three options: accept the inferred chapters and convert, convert the file as a single output, or skip it. That prompt is the default behavior when you run the app interactively. You pick what to do per file, based on the timeline and the confidence level.
In Watch Folder mode with Auto Convert enabled, the app applies the policy automatically. The inferWhenHighConfidence option tells the Watch Folder to convert using inferred chapters only when the confidence score comes back high, and to queue everything else for your review. This means medium- and low-confidence detections never get processed without your involvement.
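As a decision rule, the Watch Folder policy is tiny. In this sketch the return values are made-up labels, and the behavior when inferWhenHighConfidence is switched off (queue for review) is an assumption, since only the enabled case is described above.

```python
def watch_folder_action(confidence, auto_convert, infer_when_high_confidence):
    """Decide what Watch Folder does with a markerless file after detection."""
    if auto_convert and infer_when_high_confidence and confidence == "high":
        return "convert_with_inferred_chapters"
    # Medium or low confidence, or automation disabled: a person decides.
    return "queue_for_review"
```

The default branch is the cautious one, so any unexpected confidence value also ends up in the review queue rather than being converted.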
Honest Assessment
Silence detection is a heuristic, not a guarantee. It works reliably on clean commercial productions with consistent pacing and standard narration style. It works less reliably on recordings with ambient noise, variable pacing, or narrator habits that involve frequent long pauses mid-paragraph.
The app will not invent structure that isn't there. If a file has no audible silence patterns that meet the detection criteria, the result comes back empty and you're offered the single-file fallback. That's the right behavior. A false chapter boundary inside a sentence is worse than no chapter boundary at all.
What the silence detection system does is give you something to work with on files that would otherwise be a wall-to-wall conversion with no internal structure. Most of the time, on well-recorded material, it produces results that hold up.
Try It
Audiobook Converter Pro is available on the Mac App Store and the Microsoft Store. The free tier lets you convert files and see how the conversion process works. The silence-based chapter inference is part of the Pro tier, along with Watch Folder automation and batch conversion features.
If you have a collection of old rips with no chapter markers, load one into the app and run the detection. The timeline view makes it immediately obvious whether the scan found meaningful boundaries or noise. That's the fastest way to know if it's useful for your specific files.
If something behaves unexpectedly, the preferences panel shows the full detection configuration and lets you adjust every parameter. Report what you find and what the file looked like; that feedback shapes where the system gets better.
Anoop builds Audiobook Converter Pro for Mac on evenings and weekends. The silence detection engine took longer to get right than he expected, mostly because audiobook recordings are messier than they look from the outside.



