Maxent: original English instructions

RESM 575 Spatial Analysis
Spring 2010
Lab 6: Maximum Entropy
Assigned: Monday, March 1
Due: Monday, March 8
20 points

This lab exercise was primarily written by Steven Phillips, Miro Dudik and Rob Schapire, with support from AT&T Labs-Research, Princeton University, and the Center for Biodiversity and Conservation, American Museum of Natural History. This lab exercise is based on their paper and data: Steven J. Phillips, Robert P. Anderson, Robert E. Schapire. Maximum entropy modeling of species geographic distributions. Ecological Modelling, Vol 190/3-4, pp 231-259, 2006.

My goal is to give you a basic introduction to the use of the MaxEnt program for maximum entropy modeling of species geographic distributions. The environmental data consist of climatic and elevational data for South America, together with a potential vegetation layer. The sample species the authors used will
be Bradypus variegatus, the brown-throated three-toed sloth.

NOTE on the Maxent software

The software consists of a jar file, maxent.jar, which can be used on any computer running Java version 1.4 or later. It can be downloaded, along with associated literature, from www.cs.princeton.edu/schapire/maxent. If you are using Microsoft Windows (as we assume here), you should also download the file maxent.bat, and save it in the same directory as maxent.jar. The website has a file called “readme.txt”, which contains instructions for installing the program on your computer.

The software has already been downloaded and installed on the machines in 317 Percival.

First go to the class website and download the maxent-tutorial-data.zip file. Extract it to the c:/temp folder, which will create a c:/temp/tutorial-data directory.

Find the maxent directory on the c:/ drive of your computer and simply click on the file maxent.bat. The following screen will appear:

To perform a run, you need to supply a file containing presence localities (“samples”), a directory containing environmental variables, and an output directory. In our case, the presence localities are in the file “c:\temp\tutorial-data\samples\bradypus.csv”, the environmental layers are in the directory “layers”, and the outputs are going to go in the directory “outputs”. You can enter these locations by hand, or browse for them. While browsing for the environmental variables, remember that you are looking for the directory that contains them; you don't need to browse down to the files in the directory. After entering or browsing for the files for Bradypus, the program looks like this:

The file “samples\bradypus.csv” contains the presence localities in .csv format. The first few lines are as follows:

species,longitude,latitude
bradypus_variegatus,-65.4,-10.3833
bradypus_variegatus,-65.3833,-10.3833
bradypus_variegatus,-65.1333,-16.8
bradypus_variegatus,-63.6667,-17.45
bradypus_variegatus,-63.85,-17.4

There can be multiple species in the same samples file, in which case more species would appear in the panel, along with Bradypus. Coordinate systems other than latitude and longitude can be used, as long as the samples file and environmental layers use the same coordinate system. The “x” coordinate should come before the “y” coordinate in the samples file.

The directory “layers” contains a number of ascii raster grids (in ESRI's .asc format), each of which describes an environmental variable. The grids must all have the same geographic bounds and cell size. MAKE SURE YOUR ASCII FILES HAVE THE .asc EXTENSION! One of our variables, “ecoreg”, is a categorical variable describing potential vegetation classes. You must tell the program which variables are categorical, as has been done in the picture above.

Doing a run

Simply press the “Run” button. A progress monitor describes the steps being taken. After the environmental layers are loaded and some initialization is done, progress towards training of the maxent model is shown
like this:

The “gain” starts at 0 and increases towards an asymptote during the run. Maxent is a maximum-likelihood method, and what it is generating is a probability distribution over pixels in the grid. Note that it isn't calculating “probability of occurrence”: its probabilities are typically very small values, as they must sum to 1 over the whole grid. The gain is a measure of the likelihood of the samples; for example, if the gain is 2, it means that the average sample likelihood is exp(2) ≈ 7.4 times higher than that of a random background pixel. The uniform distribution has gain 0, so you can interpret the gain as representing how much better the distribution fits the sample points than the uniform distribution does. The gain is closely related to “deviance”, as used in statistics.

The run produces a number of output files, of which the most important is an html file called “bradypus.html”. Part of this file gives pointers to the other outputs, like this:

Looking at a prediction

To see what other (more interesting) content there can be in c:\temp\tutorial-data\outputs\bradypus_variegatus.html, we will turn on a couple of options and rerun the model. Press the “Make pictures of predictions” button, then click on “Settings”, and type “25” in the “Random test percentage” entry. Lastly, press the “Run” button again. You may have to say “Replace All” for this new run. After the run completes, the file bradypus.html contains this picture:

The image uses colors to show prediction strength,
with red indicating strong prediction of suitable conditions for the species, yellow indicating weak prediction of suitable conditions, and blue indicating very unsuitable conditions. For Bradypus, we see strong prediction through most of lowland Central America, wet lowland areas of northwestern South America, the Amazon basin, Caribbean islands, and much of the Atlantic forests in south-eastern Brazil. The file pointed to is an image file (.png) that you can just click on (in Windows) or open in most image processing software. The test points are a random sample taken from the species presence localities. Test data can alternatively be provided in a separate file, by typing the name of a “Test sample file” in the Settings panel. The test sample file can have test localities for multiple species.

Statistical analysis

The “25” we entered for “random test percentage” told the program to randomly set aside 25% of the sample records for testing. This allows the program to do some simple statistical analysis. It plots (testing and training) omission against threshold, and predicted area against threshold, as well as the receiver operating characteristic (ROC) curve shown below. The area under the ROC curve (AUC) is shown here, and if test data are available, the standard error of the AUC on the test data is given later on in the web page.

A second kind of statistical analysis that is automatically done if test data are available is a test of the statistical significance of the prediction, using a binomial
test of omission. For Bradypus, this gives:

Which variables matter?

To get a sense of which variables are most important in the model, we can run a jackknife test, by selecting the “Do jackknife to measure variable importance” checkbox. When we press the “Run” button again, a number of models get created. Each variable is excluded in turn, and a model created with the remaining variables. Then a model is created using each variable in isolation. In addition, a model is created using all variables, as before. The results of the jackknife appear in the “bradypus.html” file in three bar charts, and the first of these is shown below.

We see that if Maxent uses only pre6190_l1 (average January rainfall) it achieves almost no gain, so that variable is not (by itself) a good predictor of the distribution of Bradypus. On the other hand, October rainfall (pre6190_l10) is a much better predictor. Turning to the lighter blue bars, it appears that no variable has a lot of useful information that is not already contained in the others, as omitting each one in turn did not decrease the training gain much.

The bradypus_variegatus.html file has two more jackknife plots, using test gain and AUC in place of training gain. This allows the importance of each variable to be measured both in terms of the model fit on training data, and its predictive ability on test data.

How does the prediction depend on the variables?

Now press the “Create response curves” button, deselect the jackknife option, and rerun the model. This results in the following section being added to the “bradypus_variegatus.html” file:

Each of the thumbnail images can be clicked on to get a more detailed plot. Looking at frs6190_ann, we see that the response is highest for frs6190_ann = 0, and is fairly high for values of frs6190_ann
below about 75. Beyond that point, the response drops off sharply, reaching -50 at the top of the variable's range.

So what do the values on the y-axis mean? The maxent model is an exponential model, which means that the probability assigned to a pixel is proportional to the exponential of some additive combination of the variables. The response curve above shows the contribution of frs6190_ann to the exponent. A difference of 50 in the exponent is huge, so the plot for frs6190_ann shows a very strong drop in predicted suitability for large values of the variable.

On a technical note, if we are modeling interactions between variables (by using product features) as we are for Bradypus here, then the response curve for one variable will depend on the settings of other variables. In this case, the response curves generated by the program have all other variables set to their mean on the set of presence localities.

Note also that if the environmental variables are correlated, as they are here, the response curves can be misleading. If two closely correlated variables have strong response curves that are near opposites
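
To make the exponential form described in this section concrete, here is a minimal Python sketch. It is not Maxent's own code: the function names, the four-pixel grid and the exponent values are hypothetical, and the gain computed here ignores the regularization that the program applies during training. It only illustrates that pixel probabilities are proportional to exp(exponent), that they sum to 1 over the grid, and that a difference of 50 in the exponent corresponds to a factor of exp(-50) in probability.

```python
import math

def pixel_probabilities(exponents):
    """Normalize exp(exponent) over all pixels so the values sum to 1."""
    shift = max(exponents)  # subtract the maximum for numerical stability
    weights = [math.exp(e - shift) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]

def unregularized_gain(probabilities, sample_pixels):
    """Mean log-probability of the sample pixels minus that of the uniform
    distribution (a simplification: no regularization term)."""
    mean_log = sum(math.log(probabilities[i]) for i in sample_pixels) / len(sample_pixels)
    return mean_log - math.log(1.0 / len(probabilities))

# Hypothetical exponents for four pixels: two where the frs6190_ann
# contribution is 0, two where it has dropped to -50.
exponents = [0.0, 0.0, -50.0, -50.0]
probs = pixel_probabilities(exponents)

ratio = probs[2] / probs[0]  # relative probability of a "-50" pixel, i.e. exp(-50)
print(ratio)
print(unregularized_gain(probs, [0, 1]))  # samples sit on the two favorable pixels
print(math.exp(2))  # the factor quoted in the text for a gain of 2
```

With all exponents equal, the distribution is uniform and the gain computed this way is 0, which matches the interpretation of gain given earlier in the lab.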