The features of human speech signals and emotional states are used to estimate blood pressure (BP) with a clustering-based model. Emotion-dependent discriminative audio features are identified to distinguish individuals by their speech and to form emotional groups. The main contribution of this work is a bio-inspired enhanced grey wolf spotted hyena optimization (EWHO) technique for emotion clustering. The model derives the most informative and relevant features from the audio signal, together with the person's emotional state, to estimate BP using a multi-class support vector machine (SVM) classifier. Compared with other methods used for BP estimation, the EWHO-based clustering method achieves higher accuracy (95.59%), precision (97.08%), recall (95.16%), and F1 measure (96.20%). The proposed EWHO algorithm also performs better on clustering-quality measures, namely the silhouette score, Davies-Bouldin score, homogeneity score, completeness score, Dunn index, and Jaccard similarity score.
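Most of the clustering-quality measures listed above are available in scikit-learn's `sklearn.metrics` module (`silhouette_score`, `davies_bouldin_score`, `homogeneity_score`, `completeness_score`, `jaccard_score`), but the Dunn index is not. The following is a minimal sketch of the Dunn index, assuming Euclidean distance and the common "smallest inter-cluster distance divided by largest intra-cluster diameter" form. Other variants exist (for example, using centroid distances), and this is not the authors' implementation; the function name `dunn_index` is hypothetical.

```python
import math
from itertools import combinations


def dunn_index(points, labels):
    """Dunn index = smallest inter-cluster distance / largest intra-cluster diameter.

    Assumes Euclidean distance between points. Higher values indicate clusters
    that are compact and well separated.
    """
    # Group the points by their cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("Dunn index needs at least two clusters")

    # Largest diameter: the maximum pairwise distance inside any one cluster.
    max_diameter = 0.0
    for members in clusters.values():
        for a, b in combinations(members, 2):
            max_diameter = max(max_diameter, math.dist(a, b))

    # Smallest separation: the minimum distance between points of different clusters.
    min_separation = math.inf
    for members_a, members_b in combinations(clusters.values(), 2):
        for a in members_a:
            for b in members_b:
                min_separation = min(min_separation, math.dist(a, b))

    # If every cluster collapses to a single point, separation is unbounded.
    if max_diameter == 0.0:
        return math.inf
    return min_separation / max_diameter


# Usage: two tight, well-separated groups of 2-D feature vectors.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
labels = [0, 0, 1, 1]
score = dunn_index(points, labels)  # sqrt(41) / 1, roughly 6.40
```

The brute-force pairwise loops are quadratic in the number of samples, which is acceptable for small evaluation sets. For larger speech-feature sets, a vectorized pairwise-distance routine would be the usual choice.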