This work explores prosodic cues for phone and silent pause segmentation. We analyzed the automatic diphone segmentation and adapted it into a new segmentation based on combined prosodic features (pitch, energy, and duration). Although our work is still ongoing, we achieved promising results regarding the improvement of silence detection, of the durations of previously detected silent pauses, and of the durations of diphones at the initial and final positions of prosodic constituents or sentence-like units. This work may also serve as a more reliable basis for processing prosodic and lexical features to identify sentence-like units and, consequently, to detect punctuation marks and capitalization. The immediate implications of our results concern mainly two aspects: i) retraining the recognizer to account for the acoustic cues emphasized in this work; and ii) applying a more sophisticated and dynamic way of learning and implementing the acoustic cues underlying silence detection and diphone durations.
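
As a rough illustration of how combined prosodic cues might be used to adjust a silent pause proposed by an automatic segmentation, the sketch below assumes frame-level energy values (in dB) and pitch values (0 for unvoiced frames) sampled at a fixed frame step. A frame is treated as silent when its energy is below a threshold and it is unvoiced; the pause boundaries are then extended or shrunk by a limited number of frames to match the silent region. The function names `is_silent`, `refine_pause`, and `pause_duration_ms`, the -40 dB threshold, the 5-frame shift limit, and the 10 ms frame step are hypothetical choices for illustration, not the method or parameters used in this work.

```python
def is_silent(energy_db, f0, i, threshold_db):
    """Hypothetical silence criterion: low energy AND no pitch (unvoiced)."""
    return energy_db[i] < threshold_db and f0[i] == 0.0


def refine_pause(energy_db, f0, start, end, threshold_db=-40.0, max_shift=5):
    """Adjust a pause [start, end) (frame indices, end exclusive).

    Assumption for illustration: each boundary may move by at most
    max_shift frames, first trying to extend the pause into adjacent
    silent frames, otherwise trimming non-silent frames at its edge.
    """
    n = len(energy_db)

    # Left boundary: extend leftwards over silent frames.
    s = start
    while s > 0 and start - s < max_shift and is_silent(energy_db, f0, s - 1, threshold_db):
        s -= 1
    # If nothing was added, trim non-silent frames at the left edge.
    if s == start:
        while s < end and s - start < max_shift and not is_silent(energy_db, f0, s, threshold_db):
            s += 1

    # Right boundary: extend rightwards over silent frames.
    e = end
    while e < n and e - end < max_shift and is_silent(energy_db, f0, e, threshold_db):
        e += 1
    # If nothing was added, trim non-silent frames at the right edge.
    if e == end:
        while e > s and end - e < max_shift and not is_silent(energy_db, f0, e - 1, threshold_db):
            e -= 1

    return s, e


def pause_duration_ms(start, end, frame_ms=10):
    """Duration of a pause [start, end) given an assumed frame step in ms."""
    return (end - start) * frame_ms
```

For example, with energy `[-10, -10, -50, -50, -50, -50, -10, -10]` and pitch `[120, 120, 0, 0, 0, 0, 110, 110]`, a pause proposed as frames 3 to 5 is widened to `(2, 6)`, and one proposed too wide as frames 1 to 7 is narrowed to the same `(2, 6)`, i.e. 40 ms at a 10 ms step. Requiring both low energy and absence of pitch is one simple way of combining two prosodic cues; a trained model could replace the fixed threshold.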