GIS-based infectious disease mapping: A case study of hotspots of dengue virus in Ho Chi Minh City, Vietnam

Background: Over the years, the incidence of dengue fever (DF) and dengue hemorrhagic fever (DHF) reported in Vietnam has been on the rise. The objective of this study is therefore to investigate the use of a Geographic Information System (GIS) in infectious disease mapping. Methods: A histogram and the Getis-Ord statistic were employed to analyze the spatial distribution of DHF. More specifically, a histogram was first used to study the distribution of DHF incidence. Spatial autocorrelation analysis based on the Getis-Ord statistic was then applied to detect the spatial distribution of DHF hotspots, which were mapped with the help of a GIS. Finally, the DHF hotspots detected in August 2023 in Ho Chi Minh City are discussed. Results: A total of 3 hotspots and 1 coldspot were detected. The three hotspots of DHF incidence were identified among the densest and most intensively developed residential zoning districts in the west and south of the city: District 4 (11 cases/100,000 people), Binh Chanh (16 cases/100,000 people), and Can Gio (5 cases/100,000 people). A coldspot was detected in Tan Phu (24 cases/100,000 people), a district in the city center.


Introduction
Vector-borne diseases are the most common worldwide health hazard and represent a constant and serious risk to a large part of the world's population (1). Among these, dengue fever in particular is spreading across most of the world's tropical and arid zones. It is transmitted to humans by mosquitoes of the genus Aedes and exists in two forms: dengue fever (DF), or classic dengue, and dengue hemorrhagic fever (DHF), which may evolve into dengue shock syndrome (DSS) (2). Dengue infection results from the bite of an Aedes aegypti mosquito infected with one of the four dengue virus serotypes (3). The infection, earlier restricted to urban and semi-urban centres, can now be seen in rural areas as well (4). Therefore, hotspot analysis of DF/DHF plays an important role in preventing the spread of the disease. A Geographic Information System (GIS) is a suite of software tools for mapping and analyzing data that are georeferenced (assigned a specific location on the surface of the Earth). GIS can be used to detect geographic patterns in other data, such as disease clusters resulting from toxins, sub-optimal water access, etc.
GIS tools can map and visualize the relationships between location coordinates. Therefore, GIS has mainly been used to study natural resources (5), the environment (6), and climate change (7). In addition, many studies utilizing GIS in socioeconomic research have been reported (8,9). With its wide range of applications, GIS has also been utilized in studies of medicine (10), epidemiology (11), and health science (12). For example, Voronoi diagrams are a commonly used interpolation method in COVID-19 studies. To identify high-risk areas in states in India, a combination of GIS-based Voronoi diagrams and Bayesian probabilistic modeling was used to investigate the connection between COVID-19 cases and density (13). GIS can also be used for COVID-19 mapping by means of hotspot and spatial clustering analyses. For instance, Moran's I and Getis-Ord $G_i^*$ statistics were used to examine spatio-temporal clustering patterns and to identify sociodemographic factors associated with COVID-19 infections in Helsinki, Finland (14). Using global and local Moran's I statistics to analyze spatio-temporal COVID-19 transmission and its influencing factors, a study in China revealed that the global and local spatial correlation characteristics of the epidemic distribution were positively correlated (15). In Vietnam, a study of the spatiotemporal distribution of COVID-19 over the first seven months of the outbreak was carried out by means of the local Moran's I statistic and found a spatial cluster in Vinh Phuc province in the initial phase (16). Later, using a dataset of 10,742 locally transmitted cases collected from four COVID-19 waves in 63 prefecture-level cities and provinces in Vietnam, the local Moran's I spatial statistic and the Moran scatterplot were also successfully used to identify high-high and low-low clusters and low-high and high-low outliers of COVID-19 cases (17). Exploratory spatial data analysis and the geodetector method were employed to analyze the spatial and temporal differentiation characteristics and the influencing factors of the spread of the COVID-19 epidemic in mainland China, based on cumulative confirmed cases, average temperature, and socio-economic data (15). The global Moran's I and local indicators of spatial autocorrelation (LISA), both univariate and bivariate, were successfully applied to identify significant clustering of COVID-19 cases (18). The global Moran's I statistic and the retrospective space-time scan statistic were also successfully used to analyze spatio-temporal clusters of COVID-19 (19).
In addition, GIS has been widely employed to map COVID-19 vulnerability. For instance, in Palestine, a COVID-19 vulnerability map for the West Bank was developed by combining the Analytic Hierarchy Process, GIS, and multi-criteria decision analysis with selected criteria including population, population density, elderly population, accommodation and food service activities, school students, chronic diseases, hospital beds, health insurance, and pharmacies (20). In India, GIS was used to model COVID-19 vulnerability in West Bengal with an integrated fuzzy multi-criteria decision-making approach, namely the fuzzy analytical hierarchy process and the fuzzy technique for order preference by similarity to ideal solution (21). GIS-based analyses of vulnerability to COVID-19 have also been carried out in many other countries, such as the United States (22), Ethiopia (23), Algeria (24), and Mexico (25).
This study aims to use GIS to investigate the spatial distribution of DHF hotspots in August 2023 in Ho Chi Minh City, Vietnam. A histogram and the Getis-Ord statistic are employed to analyze the spatial distribution of DHF. More specifically, a histogram is first used to study the distribution of DHF incidence. Spatial autocorrelation analysis based on the Getis-Ord statistic is then applied to detect the spatial distribution of DHF hotspots, which are mapped with the help of a GIS. Finally, the DHF hotspots detected in August 2023 in Ho Chi Minh City are discussed.

Data used
Dengue fever and dengue hemorrhagic fever have become a major international public health concern. Many countries and areas in Asia, including Vietnam, have been experiencing unusually high levels of dengue/dengue hemorrhagic fever activity. In Vietnam, dengue hemorrhagic fever was first identified in 1963 in the Mekong Delta region of southern Vietnam. Dengue is one of 24 infectious diseases in Vietnam for which monthly reporting is mandatory at all administrative levels, which, in increasing size, are communes, districts, provinces, and regions. In this study, a dataset of DHF incidence for August 2023, obtained from the website of the HCMC Center for Disease Control (HCDC), was used to study the hotspots of DHF incidence in Ho Chi Minh City.
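The incidence figures in this paper are expressed in cases per 100,000 people. The following minimal sketch shows how such a rate could be derived, assuming it is the monthly case count divided by the district population; the source does not state the exact procedure. The function name, the district names, and all numbers are hypothetical and are not HCDC data.

```python
def incidence_per_100k(cases: int, population: int) -> float:
    """Return the number of cases per 100,000 people."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population * 100_000


# Hypothetical (cases, population) pairs, for illustration only.
example_districts = {
    "District A": (48, 200_000),
    "District B": (30, 600_000),
}

# Rates rounded to one decimal place, keyed by district name.
rates = {
    name: round(incidence_per_100k(cases, population), 1)
    for name, (cases, population) in example_districts.items()
}
print(rates)
```

Normalizing by population in this way makes districts of very different sizes comparable, which is why rates rather than raw counts are mapped.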

Histogram
A histogram is a graph that shows the frequency of numerical data using rectangles. The height of each rectangle (vertical axis) represents the frequency of a variable, that is, how often values in that range occur. The width of the rectangle (horizontal axis) represents the interval of values it covers. Histograms are good for showing the general distributional features of dataset variables: they show roughly where the peaks of the distribution are, whether the distribution is skewed or symmetric, and whether there are any outliers. A histogram is symmetric if its two sides are identical in shape and size. A skewed histogram is one with a long tail extending to either the right or the left. The former is called positively skewed and the latter negatively skewed, as shown in Figure 1.
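The notions of symmetry and skewness described above can be checked numerically. The sketch below bins a set of values with NumPy and computes the moment coefficient of skewness (positive for a right tail, negative for a left tail, near zero for a symmetric distribution). The function name, the bin count, and the sample values are assumptions made for illustration; they are not the study data.

```python
import numpy as np


def describe_distribution(values, bins=5):
    """Return histogram counts, bin edges, and the moment coefficient of skewness."""
    values = np.asarray(values, dtype=float)
    counts, edges = np.histogram(values, bins=bins)
    mean = values.mean()
    std = values.std()  # population standard deviation
    if std == 0:
        # All values are identical: no tail in either direction.
        return counts, edges, 0.0
    skewness = float(((values - mean) ** 3).mean() / std ** 3)
    return counts, edges, skewness


# Hypothetical rates (cases/100,000 people) with a long right tail.
sample_rates = [4, 5, 5, 6, 6, 7, 8, 10, 15, 30]
counts, edges, skewness = describe_distribution(sample_rates)
print(counts, edges, skewness)
```

A skewness close to zero would be consistent with a histogram whose left and right sides look alike.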

Hotspot analysis
A hotspot can be defined as an area that has a higher concentration of events than would be expected given a random distribution of events (26). More generally, a hotspot is a condition indicating some form of clustering in a spatial distribution (27). Hotspot analysis is based on the Getis-Ord $G_i^*$ statistic. It characterizes the presence of hotspots (clusters of high values) and coldspots (clusters of low values) over an entire area by looking at each feature within the context of its neighboring features (26), and can therefore separate clusters of high values from clusters of low values. For this reason, the Getis-Ord $G_i^*$ statistic was used to identify the districts with high and low DHF infection rates (17,28). In its standard form, the Getis-Ord $G_i^*$ statistic is defined as follows (29):
$$G_i^* = \frac{\sum_{j=1}^{n} w_{i,j} x_j - \bar{X} \sum_{j=1}^{n} w_{i,j}}{S \sqrt{\dfrac{n \sum_{j=1}^{n} w_{i,j}^2 - \left(\sum_{j=1}^{n} w_{i,j}\right)^2}{n-1}}}$$
with:
$$\bar{X} = \frac{\sum_{j=1}^{n} x_j}{n}$$
and:
$$S = \sqrt{\frac{\sum_{j=1}^{n} x_j^2}{n} - \bar{X}^2}$$
where $x_j$ is the attribute value of feature $j$, $w_{i,j}$ is the spatial weight between features $i$ and $j$, and $n$ is the total number of features. The $G_i^*$ statistic is a z-score, and each z-score has an associated p-value. The p-value is a probability: for the pattern analysis tools, it is the probability that the observed spatial pattern was created by some random process. When the p-value is very small, it is very unlikely that the observed spatial pattern is the result of random processes, so the null hypothesis of complete spatial randomness is rejected. Z-scores are standard deviations. If, for example, a tool returns a z-score of +2.5, the result lies 2.5 standard deviations above the mean. Both z-scores and p-values are associated with the standard normal distribution, as shown in Figure 2. Very high or very low (negative) z-scores, associated with very small p-values, are found in the tails of the normal distribution. When a pattern analysis tool yields a small p-value and either a very high or a very low z-score, it is unlikely that the observed spatial pattern reflects the theoretical random pattern represented by the null hypothesis.
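As a rough illustration of how the Getis-Ord statistic responds to clusters of high and low values, the following sketch evaluates its standard form with NumPy. It is not the GIS tool used in this study. The function name, the binary contiguity weights (each area is a neighbor of itself and of the adjacent areas), and the five rates are all assumptions made for illustration; the paper does not state which weighting scheme was applied.

```python
import numpy as np


def getis_ord_gi_star(x, w):
    """Return the Getis-Ord Gi* z-score of every feature.

    x: attribute values, shape (n,).
    w: spatial weights, shape (n, n); w[i, i] must be non-zero so that
       each feature is included in its own neighborhood (the "star" variant).
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    n = x.size
    if w.shape != (n, n):
        raise ValueError("w must be an n-by-n matrix")
    x_bar = x.mean()
    s = np.sqrt((x ** 2).mean() - x_bar ** 2)
    w_sum = w.sum(axis=1)            # sum of weights per feature
    w_sq_sum = (w ** 2).sum(axis=1)  # sum of squared weights per feature
    numerator = w @ x - x_bar * w_sum
    denominator = s * np.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
    return numerator / denominator


# Five hypothetical areas arranged in a row; neighbors are adjacent areas.
x = [20, 22, 21, 5, 4]
w = np.array([
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
])
z = getis_ord_gi_star(x, w)
print(z)
```

In this toy layout the first area sits among high values and receives a positive z-score, while the last area sits among low values and receives a negative one. Note that a feature whose neighborhood covers every area with equal weight would make the denominator zero, so such a configuration is not meaningful for this statistic.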

GIS-based mapping
A geographic information system (GIS) is a system designed to capture, store, manipulate, analyze, manage, and present all types of geographical data. The key word in this technology is geography: some portion of the data is spatial, in other words, referenced in some way to locations on the earth. GIS can be used as a tool in both problem solving and decision making, as well as for the visualization of data in a spatial environment. Geospatial data can be analyzed to determine (1) the location of features and their relationships to other features, (2) where the most and/or least of some feature exists, (3) the density of features in a given space, (4) what is happening inside an area of interest (AOI), (5) what is happening near some feature or phenomenon, and (6) how a specific area has changed over time (and in what way). One of the main advantages of GIS is that real-world features can be mapped by their spatial location to visualize the spatial relationships among them. Therefore, in this study, GIS is employed to map the spatial distribution of DHF infection rates and of the Getis-Ord statistic.

Distribution of DHF incidence
In Vietnam, dengue hemorrhagic fever was first identified in 1963 in the Mekong Delta region of southern Vietnam (31). Between 1963 and 1995, Vietnam reported 1,518,808 dengue hemorrhagic fever cases and 14,133 deaths. Data from the dengue surveillance program in southern Vietnam show epidemic peaks of increasing magnitude occurring approximately every 5 years between 1975 and 1987, with a longer gap of 11 years preceding a large epidemic of 119,429 dengue hemorrhagic fever (DHF) cases and 342 fatalities in 1998 (32). It is not known whether age-related trends in dengue incidence in Vietnam are following a similar pattern to Thailand, where a striking increase in the average age of dengue cases has been observed (33).
The distribution of DHF incidence is shown as a map and a histogram in Figure 3. Figure 3 (left) shows the spatial distribution of DHF infection rates in August 2023 in Ho Chi Minh City. High and very high DHF infection rates were mainly concentrated in urban districts in the city center, whereas very low and low DHF infection rates appeared in the southern districts and the north of the city, respectively. Figure 3 (right) shows that DHF infection rates were distributed fairly evenly on the left and right sides of the histogram. The histogram also illustrates that the DHF infection rate was mainly concentrated in the center of the histogram, in the class ranging from 14.8 to 18.1 cases/100,000 people. In August, the Getis-Ord statistic detected 3 hotspots and 1 coldspot. The three hotspots of DHF incidence were identified in the west and south of the city, in District 4 (11 cases/100,000 people), Binh Chanh (16 cases/100,000 people), and Can Gio (5 cases/100,000 people). A coldspot was detected in Tan Phu (24 cases/100,000 people), a district in the city center. Compared with the results obtained in July 2023, the hotspots of DHF incidence tended to move to the west and south of the city. In August 2023, some districts had high DHF infection rates (over 20 cases/100,000 people) but no hotspots were detected in them, including Nha Be (28 cases/100,000 people), District 1 (26 cases/100,000 people), District 7 (24 cases/100,000 people), Tan Phu (24 cases/100,000 people), and Binh Thanh (21 cases/100,000 people). The absence of detected hotspots in these districts is attributed to the local migration of districts with high infection rates.

Conclusion
The aim of this study was to investigate the use of a Geographic Information System (GIS) in infectious disease mapping. A histogram and the Getis-Ord statistic were employed to analyze the spatial distribution of DHF. More specifically, a histogram was first used to study the distribution of DHF incidence. Spatial autocorrelation analysis based on the Getis-Ord statistic was then applied to detect the spatial distribution of DHF hotspots, which were mapped with the help of a GIS. Finally, the DHF hotspots detected in August 2023 in Ho Chi Minh City were discussed. The results showed that a total of 3 hotspots and 1 coldspot were detected in August 2023. The three hotspots of DHF incidence were identified among the densest and most intensively developed residential zoning districts in the west and south of the city: District 4 (11 cases/100,000 people), Binh Chanh (16 cases/100,000 people), and Can Gio (5 cases/100,000 people). A coldspot was detected in Tan Phu (24 cases/100,000 people), a district in the city center. The results of this study support the use of GIS for infectious disease mapping of DHF incidence. These findings suggest that the spread of DHF is influenced by human activities and linked with population density.

Figure 2 The distribution of significance level (p-values) and z-score. Most statistical tests begin by identifying a null hypothesis. The null hypothesis for the pattern analysis tools (Analyzing Patterns toolset and Mapping Clusters toolset) is Complete Spatial Randomness (CSR), either of the features themselves or of the values associated with those features. The z-scores and p-values returned by the pattern analysis tools indicate whether this null hypothesis can be rejected.

Figure 4 Natural breaks map of the Getis-Ord statistic

Figure 5 Hotspot and coldspot maps of dengue hemorrhagic fever incidence. Figure 5 (right) illustrates the spatial distribution of the statistical significance (p-value) of the Getis-Ord statistic for each district in August 2023 in Ho Chi Minh City. Statistical significance is presented on four levels: not significant (p-value > 0.05) and significant at the 0.05, 0.01, and 0.001 levels. Figure 5 (right) shows that all four detected clusters (three hotspots and one coldspot) were statistically significant at the 0.05 level. No hotspots or coldspots were detected at the high (0.01) or very high (0.001) significance levels.
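The four significance classes used in the map can be reproduced from a z-score alone. The sketch below converts a z-score into a two-sided p-value under the standard normal distribution and assigns it to one of the 0.05, 0.01, and 0.001 levels. The function names and the label wording are hypothetical, and the sketch assumes a two-sided test with no correction for multiple testing; the source does not state whether the GIS software applied such a correction.

```python
import math


def two_sided_p_value(z: float) -> float:
    """Two-sided p-value of a z-score under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def classify_significance(z: float) -> str:
    """Label a Getis-Ord z-score as hotspot, coldspot, or not significant."""
    p = two_sided_p_value(z)
    if p > 0.05:
        return "not significant"
    kind = "hotspot" if z > 0 else "coldspot"
    if p <= 0.001:
        level = 0.001
    elif p <= 0.01:
        level = 0.01
    else:
        level = 0.05
    return f"{kind} (p <= {level})"


# Hypothetical z-scores, for illustration only.
for z_score in (0.5, 2.0, -2.7, 3.5):
    print(z_score, classify_significance(z_score))
```

Under these assumptions, a district is reported at the 0.05 level when its absolute z-score lies roughly between 1.96 and 2.58.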