Thailand Statistician

Estimation of Regression Model with Interacted Autoregressive Integrated Moving Average (INTARIMA) Errors

Baskaran Thangarajan, Nimitha John — Sun, 28 Jun 2026 00:00:00 +0700

Regression analysis assumes that the errors are independent; when this assumption fails, the model becomes untenable.
This article introduces a new method for estimating a regression model when the errors are correlated,
specifically in the Interacted Autoregressive Integrated Moving Average (INTARIMA) structure. The
proposed method outperforms traditional methods dealing with regression models with independent
or ARIMA errors. The method is unique in the sense that it deals with the autocorrelated error with
interactions, which has not hitherto been addressed. The superiority of the new approach is established
through a simulation analysis. To validate the suitability of the suggested method for real-world
applications, we apply it to the Nelson-Plosser macroeconomic time series data on the Consumer
Price Index (CPI) and Interest Rate of the United States. The data analysis with the new method
provides better parameter estimates than the methods featuring ARIMA or independent errors. While
the magnitude of this gain in accuracy is over 40 percent compared to the technique for ARIMA
errors, it is higher still when compared to the method for independent errors. Practitioners in fields
such as economics and data analysis would benefit from using the INTARIMA model in real-world
scenarios whenever they face a situation wherein the regression model has autocorrelated error with
interactions. Thus, this article establishes the proposed method as the most effective approach for
regression models for addressing serially correlated errors with significant interaction effects.

The New Generalized 2Kth-Order Equilibrium Model: Estimation and Characterization

Nuzhat Ahad, S.P. Ahmad, J.A. Reshi — Sun, 28 Jun 2026 00:00:00 +0700

This manuscript introduces a generalization of familiar probability distributions such as the Maxwell, Rayleigh, Half-normal, Chi-square, Gamma, and Weibull distributions, together with some distributions related to the Rayleigh and Maxwell-Boltzmann distributions. It utilizes the concept of a weighted probability distribution, applying higher-order weighting to more precisely capture and represent complex real-life scenarios. The introduced model is termed the Generalized equilibrium Maxwell-Boltzmann distribution (GEMBD). Various structural properties and characterizations of the newly proposed model have been derived. The ordering properties of the proposed model are analyzed and compared with those of the base distribution. Parameters are estimated via the maximum likelihood estimation (MLE) method. A simulation study is conducted using the Anderson-Darling test statistic to assess the asymptotic normality of the MLEs. Additionally, the behavior of the bias and mean square error is observed as the sample size increases. The applications of the new distribution are illustrated by fitting it to two different real-life datasets. Finally, the fit of the GEMBD is compared with that of its sub-models using information criteria.

Estimation of Finite Population Distribution Function Using Auxiliary Information Under Stratified Random Sampling

Sun, 28 Jun 2026 00:00:00 +0700

This paper introduces a set of estimators for the distribution function of a finite population under stratified random sampling. These estimators make use of supplementary information, including the mean of the distribution function and the empirical distribution function of an auxiliary variable. To assess the performance of the proposed estimators, we employ a first-order approximation to analyze their biases and mean squared errors. A comprehensive comparative analysis, both theoretical and numerical, is carried out to contrast these new estimators with adapted distribution function estimators. The results indicate that, in terms of mean squared error and percentage relative efficiency, the proposed estimators outperform the adapted estimators.

Statistical Properties of Area-Biased Geometric Distribution

Sun, 28 Jun 2026 00:00:00 +0700

This article is devoted to a comprehensive exploration of the development of the area-biased geometric distribution and its applications. We present the probability mass function, distribution function, moment generating function, probability generating function, and moments. For parameter estimation, we suggest the method of maximum likelihood and the method of moments. Interestingly, these two methods provide the same estimator. The asymptotic normality of the estimator is investigated using the delta method, and the parameters of the asymptotic normal distribution are derived. Finally, a real-life application is provided that shows good performance of the area-biased geometric distribution.

Exploring Improved Inference of Birnbaum-Saunders Distribution Based on Modified Moment Estimation

Waqas Makhdoom, Muahammad Kashif Ali Shah, Nighat Zahra, Syed Ejaz Ahmed — Sun, 28 Jun 2026 00:00:00 +0700

The Birnbaum-Saunders distribution is a well-known lifetime distribution with shape and scale parameters. In this study, we employ improved estimation methodologies for both parameters of the Birnbaum-Saunders distribution, estimating one parameter while treating the other as known. We use modified moment estimators instead of maximum likelihood estimators of the Birnbaum-Saunders distribution while integrating uncertain prior information with the sample information. Three improved estimators are considered, and their performance is compared with that of the respective unrestricted modified moment estimator. Wald test statistics are also suggested for testing the available uncertain prior information. The asymptotic theoretical results for the estimators are also provided. The performance of the proposed estimators is evaluated on the basis of asymptotic mean square error. A simulation study of the estimators, with graphical presentation, is furnished along with a real-life data application. This study concludes that the proposed improved estimators perform well for both parameters.

The SAS Family of Distributions: Properties and Applications

Arun Kaushik, Shantanu Kumar Yadav — Sun, 28 Jun 2026 00:00:00 +0700

In this study, our aim is to introduce a new transformation based on the cumulative distribution function (CDF), known as the SAS-transformation, an acronym derived from the names of its developers: Shantanu, Arun and Shubham. The proposed transformation technique is illustrated using the Topp-Leone distribution as a baseline. We compute its various statistical properties, including the survival function, hazard rate function, moments, conditional moments, quantile function, mean deviation, order statistics, entropy, stochastic ordering and identifiability. We conduct a simulation study to assess the long-run performance of the model and of the estimators under different classical estimation methods. Finally, to demonstrate the practical applicability of the proposed model, we apply it to three real datasets and assess its performance using the Akaike information criterion (AIC), corrected Akaike information criterion (AICc), Bayesian information criterion (BIC), and log-likelihood (LL) values.

Generalized Direct Product Type Estimators for Mean of Domain Using Supplementary Variable in Missing Data

Kamlesh Kumar, Shivani Kumari — Sun, 28 Jun 2026 00:00:00 +0700

Many studies have addressed estimation of the population mean with missing data, but very few have addressed estimation of a domain mean with missing data, even though domain mean estimation under missing data is essential in practice. Bhushan et al. (2024a) were the first to propose estimators for the domain mean using imputation techniques in the case of missing data. The main aim of the present paper is to extend the work of Bhushan et al. (2024a) by proposing different imputation techniques. With this in view, we suggest generalized direct product type estimators for estimating the domain mean using information on a supplementary variable when data are missing. The bias and mean square error of the estimators are obtained. Some existing estimators are shown to be special cases of the proposed estimators, and the efficiency of the proposed estimators is compared with that of other existing estimators. Simulation studies are conducted to check the usefulness of the estimators.

Improving Process Monitoring with a New Modified EWMA Control Chart for ARMA Models: A Case Study on Palm Oil Prices in Thailand

Sun, 28 Jun 2026 00:00:00 +0700

This investigation was designed to develop explicit formulas for calculating the average run length (ARL) of an autoregressive moving average process, with a particular focus on the ARMA(p,q) process. The accuracy of these formulas was assessed by comparing them with the results of the numerical integral equation (NIE) approach based on the Gauss-Legendre quadrature rule; CPU time was also compared. Furthermore, the ARL values derived from the explicit formulas were compared across the exponentially weighted moving average (EWMA), modified EWMA (MEWMA), and new modified EWMA (NMEWMA) control charts. Performance was assessed using metrics such as the performance comparison index (PCI), average extra quadratic loss (AEQL), and relative mean index (RMI). The results indicated that the NMEWMA control chart was more effective in identifying changes than the EWMA and MEWMA control charts. The efficacy of the explicit formula technique was further evaluated by applying it to actual data on prices in Thailand for palm oil weighing over 15 kilograms. The NMEWMA control chart outperforms the others by a significant margin, as evidenced by the results of applying the ARL to this data using the explicit formulas.
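
As a rough illustration of what an ARL measures, the sketch below runs a plain one-sided EWMA chart and estimates the ARL by Monte Carlo simulation. This is not the explicit-formula or NIE method of the article, and it uses the standard EWMA rather than the MEWMA or NMEWMA variants. The AR(1) process with Gaussian noise, the smoothing constant, the control limit, and all function names are assumptions chosen only for the example.

```python
import random


def ewma_run_length(observations, lam, z0, upper_limit):
    """Return the 1-based time of the first EWMA value above upper_limit.

    The EWMA recursion is Z_t = (1 - lam) * Z_{t-1} + lam * X_t.
    Returns None when the chart never signals on the given observations.
    """
    z = z0
    for t, x in enumerate(observations, start=1):
        z = (1.0 - lam) * z + lam * x
        if z > upper_limit:
            return t
    return None


def simulate_arl(shift, lam=0.2, upper_limit=1.0, phi=0.3,
                 reps=200, max_len=5000, seed=1):
    """Monte Carlo estimate of the ARL for a mean shift of size `shift`.

    Assumed process (hypothetical): X_t = shift + D_t with
    D_t = phi * D_{t-1} + e_t and e_t ~ N(0, 1).
    Runs that never signal are truncated at max_len.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(reps):
        deviation = 0.0
        path = []
        for _ in range(max_len):
            deviation = phi * deviation + rng.gauss(0.0, 1.0)
            path.append(shift + deviation)
        run = ewma_run_length(path, lam, 0.0, upper_limit)
        total += run if run is not None else max_len
    return total / reps


if __name__ == "__main__":
    print("in-control ARL estimate:", simulate_arl(shift=0.0))
    print("out-of-control ARL estimate:", simulate_arl(shift=1.5))
```

A chart that detects changes well combines a long in-control ARL with a short out-of-control ARL, which is the basis on which the charts in the abstract are compared.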

Transformer-Based Email Classification for Workflow Automation in Small and Medium-Sized Enterprises

Takorn Prexawanprasut — Sun, 28 Jun 2026 00:00:00 +0700

Email remains a primary medium of business communication, yet small and medium-sized enterprises (SMEs) often lack the capacity to adopt enterprise-level solutions, resulting in inefficiencies in handling large volumes of unstructured messages. This study evaluates advanced natural language processing (NLP) techniques for automating email classification and integrating structured outputs into workflow management systems (WMS). A dataset of 12,500 emails collected between January 2021 and December 2024 was categorized into four operational domains—sales, shipping, billing, and transportation—and used to compare three approaches: a keyword-based rule system, classical machine learning classifiers (naïve Bayes, logistic regression, support vector machines, random forest), and transformer-based architectures (BERT, DistilBERT, XLM-R). Performance was assessed using accuracy, precision, recall, and F1-score, with statistical tests applied to confirm significance. Results show that while the rule-based baseline achieved limited performance, and classical models offered moderate improvements, transformer-based methods achieved the highest overall accuracy, with XLM-R surpassing 92%. Importantly, integration of the best-performing model into a prototype WMS demonstrated practical value by enabling real-time classification and extraction of structured information such as invoice numbers, shipment codes, and client identifiers. These findings highlight the potential of transformer-based models to deliver scalable, cost-effective workflow automation for SMEs, reducing manual workload, enhancing efficiency, and improving responsiveness in dynamic business environments.
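
The four evaluation metrics named above can be computed from true and predicted labels as in the following minimal sketch. The category names come from the abstract; the tiny label lists are invented for illustration and are not the study's data, and the function names are hypothetical.

```python
def per_class_metrics(y_true, y_pred, labels):
    """Return {label: (precision, recall, f1)} using one-vs-rest counts."""
    metrics = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


def accuracy(y_true, y_pred):
    """Share of messages whose predicted category matches the true one."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def macro_f1(metrics):
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1 for _, _, f1 in metrics.values()) / len(metrics)


# Made-up toy labels, for illustration only.
LABELS = ["sales", "shipping", "billing", "transportation"]
y_true = ["sales", "sales", "shipping", "billing", "billing", "transportation"]
y_pred = ["sales", "shipping", "shipping", "billing", "sales", "transportation"]

scores = per_class_metrics(y_true, y_pred, LABELS)

if __name__ == "__main__":
    print("accuracy:", round(accuracy(y_true, y_pred), 3))
    for label, (p, r, f) in scores.items():
        print(label, round(p, 3), round(r, 3), round(f, 3))
    print("macro F1:", round(macro_f1(scores), 3))
```

Macro averaging is used here as one reasonable choice; the abstract does not say which averaging scheme the study applied.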

Designing an Efficient Exponentially Weighted Moving Average Control Chart to Monitor the Mean of a Time-Series Process

Wilasinee Peerajit — Sun, 28 Jun 2026 00:00:00 +0700

This study covers the design of an efficient exponentially weighted moving average (EWMA) control chart for monitoring the mean of a time-series process characterized by long-memory dependence. Specifically, the study focuses on a fractionally integrated seasonal moving average process with an exogenous variable and exponential white noise. The structure of the proposed EWMA control chart is first formulated, followed by an evaluation of its performance based on the average run length (ARL) via a simulation study. An analytical expression for the ARL is derived through explicit formulas obtained by solving integral equations, while an approximated ARL is computed via a numerical integral equation approach. The existence and uniqueness of the analytical ARL solution are theoretically guaranteed using Banach’s fixed-point theorem. The simulation results reveal that both the analytical and approximated ARL methods produce comparable out-of-control ARL values, exhibiting similar accuracy in terms of detection performance. Nevertheless, the analytical approach based on explicit formulas demonstrated a clear computational advantage, achieving faster processing times while maintaining accuracy, as evidenced by a significantly lower average time-to-signal (ATS) criterion. These findings indicate that the explicit formula performs effectively for detecting shifts in process mean values. Therefore, this approach is recommended for practical applications. An illustrative example using real-world data further confirms the effectiveness and applicability of the proposed analytical ARL method in time-series process monitoring.

Investigating Lack of Trust in Quantitative Optional Randomized Response Models Using Double Responses

Sayeda Tahira Nazar, Zawar Hussain — Sun, 28 Jun 2026 00:00:00 +0700

Social desirability bias (SDB) frequently causes low response rates or, at worst, dishonest responses. The randomized response technique (RRT) is an effective surveying technique to reduce the SDB. Respondents can submit a scrambled response using RRT to get around SDB. A scrambling model is presented that takes the respondents’ lack of faith into account. We introduce an improved optional enhanced trust (OET) quantitative RRT model using double responses to estimate the mean and sensitivity of a sensitive attribute. To compare the empirical mean and variance of our suggested estimators with their corresponding theoretical values, a simulation study is done using a combined measure of privacy and model effectiveness. In comparison to the existing models, the performance evaluation of the proposed model is observed to be better. Furthermore, using the measure of privacy protection and efficiency given by Azeem (2023b), the comparison of the proposed model with the previous well-known RRT models is given in tabular and graphical forms. The proposed model is found to be more effective, more protective of privacy, and more efficient as compared to the existing models.

Optimization of Imbalanced Tuberculosis Data Classification Using Cost-Sensitive Binary Logistic Regression

Sun, 28 Jun 2026 00:00:00 +0700

Tuberculosis (TB) remains a major public health challenge in Indonesia, particularly in urban areas. This study aims to optimize the classification of TB case predictions by comparing three binary logistic regression approaches: standard binary logistic regression, cost-sensitive binary logistic regression, and SMOTE-based binary logistic regression. The dataset consists of 5,180 patient samples obtained from a health foundation. Initial analysis reveals a significant class imbalance, with TB-negative cases dominating the data, while TB-positive cases are relatively scarce. The standard binary logistic regression model demonstrates weak predictive performance for positive cases; out of 195 TB-positive cases, only 4 were correctly identified, while 191 were misclassified as negative, posing a high risk in real-world implementation.

Conversely, the cost-sensitive binary logistic regression approach assigns higher weights to the minority class to reduce bias caused by class imbalance. The class weights are determined from the inverse class frequencies of the training dataset, which consists of 3,175 negative cases and 451 positive cases. The application of this weighting scheme improves the model's ability to detect positive cases, with 76 cases correctly classified, particularly in the context of low public disclosure regarding health conditions. The SMOTE-based binary logistic regression model achieves a higher recall, detecting 82 positive cases; however, the use of synthetic data introduces potential concerns regarding predictive validity. Overall, the cost-sensitive model achieved a recall of 39%, an F1-score of 32%, and an overall accuracy of 79%, with higher AUC-ROC and AUC-PR values compared to the baseline model. Although the improvement in recall remains moderate at 39%, the cost-sensitive approach shows potential in enhancing the model’s ability to detect positive cases. Therefore, this approach may be considered as a supporting method in efforts to improve more targeted TB control strategies in Indonesia.
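
The weighting formula itself is not reproduced in this abstract. As an assumption, the sketch below uses one common inverse-frequency rule, w_j = n / (k * n_j), where n is the training sample size, k the number of classes, and n_j the size of class j; the authors' formula may differ. It also shows how the reported recall follows from the counts given above. The function name is hypothetical.

```python
def inverse_frequency_weights(class_counts):
    """Assumed rule: weight_j = n / (k * n_j). Rarer classes get larger weights."""
    total = sum(class_counts.values())
    n_classes = len(class_counts)
    return {
        label: total / (n_classes * count)
        for label, count in class_counts.items()
    }


# Training class counts as stated in the abstract.
train_counts = {"negative": 3175, "positive": 451}
weights = inverse_frequency_weights(train_counts)

# Recall of the cost-sensitive model: 76 of the 195 TB-positive cases detected.
cost_sensitive_recall = 76 / 195

if __name__ == "__main__":
    for label, weight in weights.items():
        print(f"{label}: {weight:.3f}")
    print(f"recall: {cost_sensitive_recall:.2%}")
```

Under this assumed rule, a misclassified positive case costs roughly seven times as much as a misclassified negative case during fitting, which pushes the decision boundary toward detecting more positives.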

Machine Learning Models to Forecast Rice Prices at the Milling Level According to Quality

Andika Putri Ratnasari, Kismiantini — Sun, 28 Jun 2026 00:00:00 +0700

Machine learning methods can model nonlinear time series data. Support vector regression (SVR) and double random forest (DRF) are supervised learning techniques that can be applied to such data. Since a comparative analysis of SVR and DRF for rice price forecasting has not been conducted in previous studies, this research aims to compare the performance of these two models in analyzing nonlinear rice price dynamics. The dataset consists of monthly milling-level rice prices for premium, medium, and out-of-quality categories, all of which exhibit nonlinear characteristics. Model performance was evaluated using three forecasting accuracy metrics: mean absolute percentage error (MAPE), mean absolute error (MAE), and root mean square error (RMSE). The results show that the MAPE values for all models are below 10%, indicating high predictive accuracy. Across all evaluation metrics, the SVR model consistently outperformed the DRF model. Therefore, SVR was selected as the best model to generate forecasts for premium, medium, and out-of-quality rice prices.
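
For reference, the three accuracy metrics can be written out as in the sketch below. The price values are invented for illustration and are not taken from the study; the function names are hypothetical.

```python
import math


def mae(actual, forecast):
    """Mean absolute error, in the units of the series."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean square error; penalizes large misses more than MAE does."""
    mse = sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)
    return math.sqrt(mse)


def mape(actual, forecast):
    """Mean absolute percentage error, in percent. Requires nonzero actuals."""
    ratios = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(ratios) / len(ratios)


# Made-up prices and forecasts, for illustration only.
actual_prices = [100.0, 200.0, 400.0]
forecast_prices = [110.0, 190.0, 380.0]

if __name__ == "__main__":
    print("MAE :", round(mae(actual_prices, forecast_prices), 4))
    print("RMSE:", round(rmse(actual_prices, forecast_prices), 4))
    print("MAPE:", round(mape(actual_prices, forecast_prices), 4), "%")
```

Because MAPE is scale-free, it is the metric that supports the "below 10%" statement, while MAE and RMSE remain in price units and are only comparable within a single rice category.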

Pair Copula Construction for Dependencies Structure in Cryptocurrencies Trading Volumes

Friday I. Agu — Sun, 28 Jun 2026 00:00:00 +0700

Understanding how cryptocurrency trading volumes co-move remains challenging, especially in high dimensions where bivariate models fall short. Exploring these dependencies aids risk management, diversification, and policy oversight. This study investigates monthly trading volumes for Binance Coin (BNB), Bitcoin (BTC), TRON (TRX), Ethereum (ETH), and Dogecoin (DOGE) from October 2017 to July 2021. Monthly aggregation dampens microstructure noise and short-lived bursts that distort daily data, while providing far more observations and modeling stability than annual series, which are too coarse for reliable vine estimation. After transforming margins to uniform, we model dependence using pair-copula constructions within D-vine, C-vine, and R-vine frameworks, fitting Student’s t, Clayton, Gumbel, and Frank copulas via sequential estimation. Model fit is evaluated using the log-likelihood and BIC, and dependence is summarized via Kendall’s τ. We also assess trading volume volatility using the mean and standard deviation of monthly changes. Across vine structures, the Gumbel copula consistently provides the best fit, indicating pronounced upper-tail dependence among cryptocurrencies. Conditioning reduces τ at higher vine trees, showing that conditional links capture much of the residual dependence. BTC exhibits the highest volatility risk, followed by DOGE, ETH, TRX, and BNB. These results support stress-testing for simultaneous surges, inform portfolio construction and diversification with lower-dependence pairs, and offer practical guidance for traders and regulators monitoring systemic risk.
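
To make the link between Kendall's τ, the Gumbel copula, and upper-tail dependence concrete, the sketch below handles a single pair of series. It relies on two standard identities for the bivariate Gumbel copula: τ = 1 - 1/θ and upper-tail dependence coefficient λ_U = 2 - 2^(1/θ). The rank-based transform to uniform margins and the moment-style inversion of τ are assumptions for illustration; the study fits full vine structures by sequential estimation, which this does not attempt. The data values and function names are invented.

```python
def pseudo_observations(values):
    """Rank-transform a series to (0, 1) using rank / (n + 1). Assumes no ties."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    u = [0.0] * n
    for rank, index in enumerate(order, start=1):
        u[index] = rank / (n + 1)
    return u


def kendall_tau(x, y):
    """Kendall's tau by counting concordant and discordant pairs. Assumes no ties."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


def gumbel_theta_from_tau(tau):
    """Invert tau = 1 - 1/theta for the Gumbel copula (valid for 0 <= tau < 1)."""
    return 1.0 / (1.0 - tau)


def gumbel_upper_tail_dependence(theta):
    """Upper-tail dependence coefficient of the Gumbel copula."""
    return 2.0 - 2.0 ** (1.0 / theta)


# Made-up monthly volumes for two assets, for illustration only.
volume_a = [1.0, 2.0, 3.0, 4.0, 5.0]
volume_b = [2.0, 1.0, 4.0, 3.0, 5.0]

u_a = pseudo_observations(volume_a)
u_b = pseudo_observations(volume_b)
tau = kendall_tau(u_a, u_b)
theta = gumbel_theta_from_tau(tau)

if __name__ == "__main__":
    print("tau:", tau)
    print("theta:", theta)
    print("upper-tail dependence:", round(gumbel_upper_tail_dependence(theta), 4))
```

Since τ depends only on ranks, it is unchanged by the transform to uniform margins, which is why it is a convenient summary of dependence in copula models.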

Modeling of Cancer Patients Data Using Burr-X Log Logistic

Kanak Modi — Sun, 28 Jun 2026 00:00:00 +0700

The three-parameter Burr-X Log Logistic probability distribution is introduced together with applications. The proposed distribution has unimodal and bathtub-shaped density functions and a hazard rate function with an inverted bathtub shape. We study its statistical properties to understand the nature of the proposed distribution. Plots of its density function and survival function are drawn for different combinations of parameters. The distribution of the order statistics and the corresponding moments of the proposed distribution are also considered. The Rényi entropy and probability weighted moments are calculated. The maximum likelihood estimation technique is applied to compute parameter estimates. We carry out a simulation study to compare the performance of the estimators. We apply the derived distribution to three real datasets, and the results show that it provides a better fit than some existing distributions.