The Expertise Behind the Software

When choosing Locus, you can be confident that your EHS software is built and supported by the experts. Our team holds degrees and certifications in environmental engineering, mathematics, computer science, and beyond. We understand the challenges of EHS compliance and build our solutions with those in mind.

    A Visualization is Worth a Thousand Data Points

    Visualize environmental data with Locus EIM.

    You’ve probably heard the saying “A picture is worth a thousand words”. While the advice seems timeless, it actually is fairly modern and started with newspaper advertisements from the 1910s. Furthermore, it’s only since the 1970s that cognitive science has caught up and determined the truth in the saying. Basically, humans have very limited working memory, which is the “storage space” for processing data while making decisions and reasoning through problems. A good picture, though, works as “offline storage” that lets you push information out of your limited working memory and into another format for use as needed. This advantage is especially true when the picture is a useful data visualization such as a chart or map. In this case, you could say “A visualization is worth a thousand data points”.

    How limited is working memory? There is a rough consensus, known as Miller’s law, that you can only have “seven, plus or minus two” items in memory at one time. Think of a typical 10-digit phone number that you may need to memorize for a short period. It can be hard to remember all ten individual digits as one large number, as that exceeds working memory. However, you can employ a technique called “chunking” to group items together, reducing the number of items to remember. If you group the phone number into the typical ###-###-#### pattern, you only have to remember 3 chunks of 3 to 4 digits each. A good visualization not only stores information offline, reducing pressure on your brain; it also groups many data items into a much smaller number of chunks so you can process the data more efficiently.

    Let’s look at some real examples of how visualizations help by working through a typical scenario using EIM, Locus Technologies’ cloud-based application for environmental data management. Assume you manage a site where you are tracking tritium (H-3) levels in groundwater using a set of monitoring wells. You want to know where tritium has been high over the past ten years. EIM provides different visualizations for exploring your data and finding the answers you need.

    First let’s just look at an export of all the data. Using the analysis functions in EIM, you search for all tritium concentrations from monitoring wells for the past ten years. EIM sends the results to a table as shown in Figure 1.

    Tabular view of Tritium query results in Locus EIM

    Figure 1 Tabular view of Tritium query results

    The table has 717 results for multiple wells. It is very difficult to see overall patterns here, either spatially or temporally. Each of the 717 results is one item, and if you try to scroll and sort the table to see if tritium is increasing or decreasing over time, your working memory is quickly overwhelmed. This is where a good data visualization can help.

    To start, you decide to send the data to the Locus GIS+ application, using the graduated color and size options. The GIS+ takes the concentrations from the results table and plots them on a site map using the stored coordinates for each well, as shown in Figure 2. The map represents each location with a symbol that is colored and sized to reflect the actual maximum value at that location. The map legend shows you how this was done. Large red circles, for example, represent results from 4,500 to 7,000 pCi/L. As the sizes get smaller, and the colors go from red to blue, the actual result gets smaller.

    Graduated symbol and color map in Locus EIM

    Figure 2 Graduated symbol and color map of tritium concentrations
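
The graduated symbol logic described above can be sketched in a few lines of Python. This is only an illustration, not how GIS+ is implemented: the class breaks, colors, symbol sizes, sample values, and the well name MW-1 are all hypothetical. Only the top class (4,500 to 7,000 pCi/L, shown in red) comes from the legend described above.

```python
from bisect import bisect_right

# Hypothetical class breaks in pCi/L; only the top class (4,500 to 7,000)
# is taken from the legend described in the text.
BREAKS = [500, 1500, 3000, 4500]
# One (color, symbol size) pair per class, from lowest to highest.
STYLES = [("blue", 6), ("green", 10), ("yellow", 14), ("orange", 18), ("red", 22)]


def max_by_location(results):
    """Return the maximum result for each location ID."""
    maxima = {}
    for location, value in results:
        maxima[location] = max(value, maxima.get(location, value))
    return maxima


def style_for(value):
    """Pick the (color, size) pair for the class that contains value."""
    return STYLES[bisect_right(BREAKS, value)]


# Made-up (location, result) pairs standing in for query results.
results = [("MCOI-5", 4800.0), ("MCOI-5", 5200.0), ("MCOI-6", 6100.0), ("MW-1", 320.0)]
symbols = {loc: style_for(v) for loc, v in max_by_location(results).items()}
```

Each location ends up with one symbol, so hundreds of individual results collapse into a handful of colored circles, which is the chunking effect the map relies on.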

    This map is great for showing spatial patterns in the data. You can easily pick out a couple of “areas of concern” near the center of the map – one with orange and yellow circles, and another with red circles. To revisit our discussion on working memory and chunks, the map takes the 717 results and summarizes them so your brain can quickly pick out the two areas of concern.

    Let’s look more closely at the area of concern with higher results. If we zoom in on the map, we see the two red locations are wells MCOI-5 and MCOI-6 as shown in Figure 3.

    Zoomed map for one area of concern in Locus GIS+

    Figure 3 Zoomed map for one area of concern

    The map shows you where these two high concentrations of tritium are located. But what if you want to see how the concentrations vary over time? You can make a time series chart in EIM for these wells and include a desired regulatory limit, as shown in Figure 4. The green and blue lines represent the tritium concentrations over time for the two wells. The red line at top shows a regulatory action limit.

    Line chart in Locus EIM

    Figure 4 Line chart showing time series for tritium for two wells, with action limit

    The chart shows you two important things. First, and most importantly, all the tritium concentrations for both wells lie well below the regulatory action limit! Second, the concentrations have very different trends for the two wells: MCOI-6 started higher but has trended lower, while MCOI-5 started below MCOI-6 but has now surpassed it. You can confirm these general impressions by running concentration regression charts in EIM for the two locations, as shown in Figure 5. The charts show the best fit regression line and the strength of the relation.

    Regression charts in Locus EIM

    Figure 5 Concentration regression charts in EIM
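
For readers who want to see what a best-fit line and a strength-of-relation measure look like in practice, here is a minimal Python sketch. It assumes an ordinary least-squares fit with R-squared as the strength measure; the text does not say which method EIM uses, and the yearly values below are made up.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared is the squared correlation for a one-variable linear fit.
    r_squared = (sxy * sxy) / (sxx * syy) if syy else 1.0
    return slope, intercept, r_squared


# Made-up yearly values (pCi/L) for a single well, for illustration only.
years = [2011, 2012, 2013, 2014, 2015]
tritium = [6000.0, 5600.0, 5300.0, 4900.0, 4400.0]

slope, intercept, r_squared = linear_fit(years, tritium)
trend = "decreasing" if slope < 0 else "increasing or flat"
```

A negative slope with an R-squared close to 1 would support the impression of a steady downward trend. A value near 0 would suggest the apparent trend is mostly noise.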

    You can grasp these facts quickly because of how the chart works. Each series of concentrations for a well consists of multiple data items that are ‘chunked’ into one line on the chart. There are too many individual data points on this chart for your working memory, but only three lines, which can easily be manipulated in your brain. For comparison, Figure 6 shows the actual data values for the chart. The time trends shown above in the charts are not as obvious from the table.

    Data values in Locus EIM

    Figure 6 Actual data values for the chart in Figure 4

    Now, this might be counter-intuitive, but what if you wanted to put some of these values on the map? While visualizations do help understand data, sometimes it can be useful to have the data shown as well so viewers can see where the visualizations came from. The EIM Data Callouts function can do this. Figure 7 shows data callouts for the two wells. Each callout shows the maximum annual tritium result for 2010-2020. Now you have the actual tritium concentrations located spatially next to the matching wells!

    Data callouts in Locus GIS+

    Figure 7 Data Callouts in EIM GIS+
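
The summary behind each callout, a maximum result per well per year, is a simple grouping operation. The sketch below shows one way to compute it in Python. The sample dates and values are made up, and the function name is hypothetical rather than part of EIM.

```python
from collections import defaultdict
from datetime import date


def annual_maxima(results):
    """Map each well to {year: maximum result in that year}."""
    table = defaultdict(dict)
    for well, sample_date, value in results:
        year = sample_date.year
        table[well][year] = max(value, table[well].get(year, value))
    return {well: dict(sorted(years.items())) for well, years in table.items()}


# Made-up sample results (well, sample date, pCi/L) for illustration.
results = [
    ("MCOI-5", date(2018, 3, 14), 4100.0),
    ("MCOI-5", date(2018, 9, 2), 4650.0),
    ("MCOI-5", date(2019, 4, 20), 5050.0),
    ("MCOI-6", date(2018, 5, 7), 5900.0),
    ("MCOI-6", date(2019, 6, 11), 5200.0),
]

for well, by_year in annual_maxima(results).items():
    print(well, by_year)
```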

    Now that you know where your tritium might be a concern, suppose you want to see what’s going on with groundwater at your site. The EIM contouring module does that for you. There are multiple contouring options, but for this example let’s use the default options for kriging. We know from Figure 2 that the wells MCOI-5 and MCOI-6 are located in the Mortandad Canyon. Figure 8 shows the contouring map generated from EIM for the groundwater wells in that canyon, using the most recent groundwater levels. Higher groundwater values are lighter in color than lower values.

    The area of concern is marked with an arrow at upper left. The contour lines and values can help you determine how the tritium might migrate in your site. Imagine trying to picture this just using tables of groundwater readings! With the contour map, the readings turn into lines that can be chunked together for analysis: the higher levels at the upper left forming a “plateau”, the closely packed lines moving across the map to the east, and then the “saddle” area at lower right. These different line patterns carry particular meanings to engineers and scientists who interpret contour maps.

    Contour map for groundwater in Locus GIS+

    Figure 8 Contour map for groundwater levels

    The contour map completes our tour of some of the visualization tools in EIM. Because visualizations let you chunk items together, you can look at the “big picture” and not get lost in tables of data results. Your working memory stays within its capacity, your analysis of the information becomes more efficient, and you can gain new insights into your data.

    Acknowledgments: All the data in EIM used in the examples was obtained from the publicly available chemical datasets online at Intellus New Mexico.

    About the Author—Dr. Todd Pierce, Locus Technologies

    Dr. Pierce manages a team of programmers tasked with development and implementation of Locus’ EIM application, which lets users manage their environmental data in the cloud using Software-as-a-Service technology. Dr. Pierce is also directly responsible for research and development of Locus’ GIS (geographic information systems) and visualization tools for mapping analytical and subsurface data. Dr. Pierce earned his GIS Professional (GISP) certification in 2010.

    Locus Technologies receives prestigious EBJ Award for 14 consecutive years

    Environmental Business Journal (EBJ) recognized the firm for growth and innovation in the field of Information Technology

    MOUNTAIN VIEW, Calif., 10 February 2020

    Locus Technologies, leading provider of environmental management and EHS software, was awarded a 14th consecutive award from Environmental Business Journal (EBJ) for growth and innovation in the field of Information Technology.

    EBJ is a business research publication providing strategic business intelligence to the environmental industry. Locus received the 2019 EBJ Award for Information Technology by expanding their software and services.

    Among the key drivers for Locus in 2019 was the growth of key software applications for waste and sustainability, as well as the introduction of their facilities management app. Locus software also now further integrates with EPA compliance systems like CMDP, eManifest, and eGGRT. Finally, in terms of services, Locus achieved over 500 GHG verifications under the California AB32 program, being the first company to do so. They were also among the first independent bodies to become certified for the new California Low Carbon Fuel Standard verification.

    “We would like to express our gratitude for receiving the EBJ Information Technology award for another year. We look forward to providing our customers with cutting-edge software and services as we seek to improve in the areas of artificial intelligence, IoT integration, and blockchain technology,” said Wes Hawthorne, President of Locus Technologies.

    How to extend your EHS software with integrated systems

    Integration with other systems, whether on-premises or in the cloud, has become a key wishlist item for many EHS software buyers. It allows you to take advantage of other tools used by your organization (or available from third parties) to simplify processes, access information, and enhance communication, both internally and externally.

    Why Companies Replace Their EHS&S Software Systems

    A recent NAEM study explored the main reasons EHS&S professionals look to replace their current software configuration. Among the most reported issues were overall performance, customer support, and software customization. The following infographic highlights both why EHS&S professionals are seeking new software, and what criteria are most important in shopping for a new software system.

    Top 10 Enhancements to Locus Environmental Software in 2019

    Let’s look back on the most exciting new features and changes made in EIM, Locus’ environmental data management software, during 2019!

    1. Migration to AWS Cloud

    In August, Locus migrated EIM into the Amazon Web Services (AWS) cloud. EIM already had superior security, reliability, and performance in the Locus Cloud. The move to AWS improves on those metrics and allows Locus to leverage AWS-specific tools that handle big data, blockchain, machine learning, and data analytics. Furthermore, AWS is scalable, which means EIM can better handle demand during peak usage periods. The move to AWS helps ensure that EIM remains the world’s leading water quality management software.

    Infographic: 6 Benefits of EHS on AWS

    2. SSO Login

    EIM now supports Single Sign-On (SSO), allowing users to access EIM using their corporate authentication provider. SSO is a popular security mechanism for many corporations. With SSO, one single login allows access to multiple applications, which simplifies username and password management and reduces the number of potential targets for malicious hacking of user credentials. Using SSO with EIM requires a one-time configuration to allow EIM to communicate with a customer’s SSO provider.

    Locus Single Sign On (SSO)

    3. GIS+ Data Callouts

    The Locus GIS+ solution now supports creating data callouts, which are location-specific crosstab reports listing analytical, groundwater, or field readings. A user first creates a data callout template using a drag-and-drop interface in the EIM enhanced formatted reports module. The template can include rules to control data formatting (for example, action limit exceedances can be shown in red text). When the user runs the template for a specific set of locations, EIM displays the callouts in the GIS+ as a set of draggable boxes. The user can finalize the callouts in the GIS+ print view and then send the resulting map to a printer or export the map to a PDF file.

    Locus GIS+ Data Callouts

    4. EIM One

    For customers who don’t require the full EIM package, Locus now offers EIM One, which gives the ability to customize EIM functionality. Every EIM One purchase comes with EIM core features: locations and samples; analytical and field results; EDD loading; basic data views; and action limit exceedance reports. The customer can then purchase add-on packages to get just the functionality desired–for example a customer with DMR requirements may purchase the Subsurface and Regulatory Reporting packages. EIM One provides customers with a range of pricing options to get the perfect fit for their data management needs.

    EIM One Packages

    5. IoT data support

    EIM can now be configured to accept data from IoT (internet of things) streaming devices. Locus must do a one-time connection between EIM and the customer’s IoT streaming application; the customer can then use EIM to define the devices and data fields to capture. EIM can accept data from multiple devices every second. Once the data values are in EIM, they can be exported using the Expert Query tool. From there, values can be shown on the GIS+ map if desired. The GIS+ Time Slider automation feature has also been updated to handle IoT data by allowing the time slider to use hours, minutes, and seconds as the time intervals.

    Locus IoT Data

    6. CIWQS and NCDEQ exports

    EIM currently supports several dozen regulatory agency export formats. In 2019, Locus added two more exports for CIWQS (California Integrated Water Quality System Project) and the NCDEQ (North Carolina Department of Environmental Quality). Locus continues to add more formats so customers can meet their reporting requirements.

    CIWQS and NCDEQ Exports

    7. Improved Water Utility reporting

    EIM is the world’s leading water quality management software, and has been used since 1999 by many Fortune 500 companies, water utilities, and the US Government. Locus added two key reports to EIM for Water in 2019 to further support water quality reporting. The first new report returns chlorine averages, ranges, and counts. The second new report supports the US EPA’s Lead and Copper rule and includes a charting option. Locus will continue to enhance EIM for Water by releasing the 2019 updates for the Consumer Confidence Report in January 2020.

    Locus Water Utility Reporting

    8. Improved Non-Analytical Views

    Locus continues to upgrade and improve the EIM user interface and user experience. The most noticeable change in 2019 was the overhaul of the Non-analytical Views pages in EIM, which support data exports for locations, samples, field readings, groundwater levels, and subsurface information. Roughly 25 separate pages were combined into one page that supports all these data views. Users are directed through a series of filter selections that culminate in a grid of results. The new page improves usability and provides one centralized place for these data reports. Locus plans to upgrade the Analytical Views in the same way in 2020.

    Non-analytical views in Locus EIM

    9. EIM search box

    To help customers find the correct EIM menu function, Locus added a search box at the top right of EIM. The search box returns any menu items that match the user’s entered search term. In 2020, Locus will expand this search box to return matching help file documents and EDD error help, as well as searches for synonyms of menu items.

    Locus EIM Quick Search

    10. Historical data reporting in EDD loading

    The EIM EDD loader now has a new “View history” option for viewing previously loaded data for the locations and parameters in the EDD. This function lets users put data in the EDD holding table into proper historical context. Users can check for any unexpected increases in parameter concentrations as well as new maximum values for a given location and parameter.

    Historical Data in Locus EIM
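
The kind of check that historical context makes possible can be illustrated with a short Python sketch. It flags incoming results that exceed the historical maximum for the same location and parameter. This is not the EIM EDD loader's logic: the function name, row layout, and values are all hypothetical.

```python
def flag_new_maxima(history, incoming):
    """Return incoming rows whose value exceeds the historical maximum
    for the same (location, parameter) pair."""
    historical_max = {}
    for location, parameter, value in history:
        key = (location, parameter)
        historical_max[key] = max(value, historical_max.get(key, value))

    flagged = []
    for location, parameter, value in incoming:
        previous = historical_max.get((location, parameter))
        # Rows with no history are skipped: there is nothing to compare against.
        if previous is not None and value > previous:
            flagged.append((location, parameter, value, previous))
    return flagged


# Made-up rows: (location, parameter, value).
history = [("MW-1", "Benzene", 12.0), ("MW-1", "Benzene", 18.5), ("MW-2", "Benzene", 3.0)]
incoming = [("MW-1", "Benzene", 21.0), ("MW-2", "Benzene", 2.5), ("MW-3", "Benzene", 9.0)]

new_maxima = flag_new_maxima(history, incoming)
```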

      Mapping All of Space and Time

      Today is GIS Day, a day started in 1999 to showcase the many uses of geographical information systems (GIS). To celebrate the passage of another year, this blog post examines how maps and GIS show time, and how Locus GIS+ supports temporal analysis for use with EIM, Locus’s cloud-based, software-as-a-service application for environmental data management.

      Space and Time

      Since GIS was first imagined in 1962 by Roger Tomlinson at the Canada Land Inventory, GIS has been used to display and analyze spatial relationships. Every discrete object (such as a car), feature (such as an acre of land), or phenomenon (such as a temperature reading) has a three-dimensional location that can be mapped in a GIS as a point, line, or polygon. The location consists of a latitude, longitude, and elevation. Continuous phenomena or processes can also be located on a map. For example, the flow of trade between two nations can be shown by an arrow connecting the two countries with the arrow width indicating the value of the traded goods.

      However, everything also has a fourth dimension, time, as locations and attributes can change over time. Consider the examples listed above. A car’s location changes as it is driven, and its condition and value change as the car gets older. An acre of land might start covered in forest, but the land use changes over time if the land is cleared for farming, and then later if the land is paved over for a shopping area. The observed temperature at a given position changes with time due to weather and climate changes spanning multiple time scales from daily to epochal. Finally, the flow of trade between two countries changes as exports, imports, and prices alter over time.

      Maps and Time

      Traditional flat maps already collapse three dimensions into two, so it’s not surprising that such maps do not handle the extra time dimension very well. Cartographers have always been interested in showing temporal data on maps, though, and different methods can be employed to do so. Charles Minard’s famous 1861 visualization of Napoleon’s Russian campaign in 1812-1813 is an early example of “spatial temporal” visualization. It combines two visuals – a map of troop movements with a time series graph of temperature – to show the brutal losses suffered by the French army. The map shows the army movement into Russia and back, with the line width indicating the troop count. Each point on the chart is tied to a specific point on the map. The viewer can see how troop losses increased as the temperature went from zero degrees Celsius to -30 degrees. The original thick tan line has decreased to a black sliver at the end of the campaign.

      Minard's map

      Charles Minard’s map of Napoleon’s Russian campaign in 1812-1813.

      The Minard visual handles time well because the temperature chart matches single points on the map; each temperature value was taken at a specific location. Showing time changes in line or area features, such as roads or counties, is harder and is usually handled through symbology. In 1944, the US Army Corps of Engineers created a map showing historical meanders in the Mississippi River. The meanders are not discrete points but cover wide areas. Thus, past river channels are shown in different colors and hatch patterns. While the overlapping meanders are visually complex, the user can easily see the different river channels. Furthermore, the meanders are ‘stacked’ chronologically, so the older meanders seem to recede into the map’s background, similar to how they occur further back in time.

      Alluvial Valley

      Inset from Geological Investigation of the Alluvial Valley of the Lower Mississippi River.

      Another way to handle time is to simply make several maps of the same features, but showing data from different times. In other words, a temporal data set is “sliced” into data sets for specific time periods. The viewer can scan the multiple maps and make visual comparisons. For example, the Southern Research Station of the US Forest Service published a “report card” in 2011 for Forest Sustainability in western North Carolina. To show different land uses over time, small maps were generated by county for three years. Undeveloped land is colored green and developed land is tan. Putting these small maps side by side shows the viewer a powerful story of increasing development as the tan expands dramatically. The only drawback is that the viewer must mentally manipulate the maps to track a specific location.

      Buncombe County land use map

      Land Use change over time for Buncombe County, NC

      GIS and Time

      The previous map examples prove that techniques exist to successfully show time on maps. However, such techniques are not widespread. Furthermore, in the era of “big data” and the “Internet of Things”, showing time is even more important. Consider two examples. First, imagine a shipment of 100 hazardous waste containers being delivered on a truck from a manufacturing facility to a disposal site. The truck has a GPS unit which transmits its location during the drive. Once at the disposal site, each container’s active RFID tag with a GPS receiver tracks the container’s location as it proceeds through any decontamination, disposal, and decommission activities. The locations of the truck and all containers have both a spatial and a temporal component. How can you map the location of all containers over time?

      As a second example, consider mobile data collection instruments deployed near a facility to check for possible contamination in the air. Each instrument has a GPS so it can record its location when the instrument is periodically relocated. Each instrument also has various sensors that check every minute for chemical levels in the air plus wind speed and temperature. All these data points are sent back to a central data repository. How would you map chemical levels over time when both the chemical levels and the instrument locations are changing?

      In both cases, traditional flat maps would not be very useful given the large amounts of data that are involved. With the advent of GIS, though, all the power of modern computers can be leveraged. GIS has a powerful tool for showing time: animation. Animation is similar to the small “time slice” maps mentioned above, but more powerful because the slices can be shown consecutively like a movie, and many more time slices can be created. Furthermore, the viewer no longer has to mentally stack maps, and it is easier to see changes over time at specific locations.

      Locus has adopted animation in its GIS+ solution, which lets a user use a “time slider” to animate chemical concentrations over time. When a user displays EIM data on the GIS+ map, the user can decide to create “time slices” based on a selected date field. The slices can be by century, decade, year, month, week or day, and show the maximum concentration over that time period. Once the slices are created, the user can step through them manually or run them in movie mode.

      To use the time slider, the user must first construct a query using the Locus EIM application. The user can then export the query results to the GIS+ using the time slider option. As an example, consider an EIM query for all benzene concentrations sampled in a facility’s monitoring wells since 2004. Once the results are sent to the GIS+, the time slider control might look like what is shown here. The time slices are by year with the displayed slice for 3/30/2004 to 3/30/2005. The user can hit play to display the time slices one year at a time, or can manually move the slider markers to display any desired time period.

      Locus GIS+ time slider
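
The slicing idea can be sketched in Python as follows. This is an illustration under stated assumptions, not the GIS+ implementation: slices are one year long and anchored at a start date (3/30/2004, as in the example above), each slice keeps the maximum concentration per location, and the well names and benzene values are made up.

```python
from datetime import date


def slice_index(sample_date, start):
    """Number of whole years between start and sample_date."""
    before_anniversary = (sample_date.month, sample_date.day) < (start.month, start.day)
    return sample_date.year - start.year - before_anniversary


def yearly_slices(results, start):
    """Group results into yearly slices, keeping the max per location."""
    slices = {}
    for location, sample_date, value in results:
        per_location = slices.setdefault(slice_index(sample_date, start), {})
        per_location[location] = max(value, per_location.get(location, value))
    return dict(sorted(slices.items()))


# Made-up benzene results (location, sample date, µg/L).
start = date(2004, 3, 30)
results = [
    ("MW-1", date(2004, 6, 1), 8000.0),
    ("MW-1", date(2005, 3, 29), 7500.0),
    ("MW-1", date(2005, 3, 30), 3000.0),
    ("MW-2", date(2004, 12, 15), 120.0),
]

slices = yearly_slices(results, start)
```

Stepping through the keys of `slices` in order corresponds to moving the slider one year at a time. Each value is the set of per-location maxima that a map would draw for that slice.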

      Here is an example of a time slice displayed in the GIS+. The benzene results are mapped at each location with a circle symbol. The benzene concentrations are grouped into six numerical ranges that map to different circle sizes and colors; for example, the highest range is from 6,400 to 8,620 µg/L. The size and color of each circle reflect the concentration value, with higher values corresponding to larger circles and yellow, orange or red colors. Lower values are shown with smaller circles and green, blue, or purple colors. Black squares indicate locations where benzene results were below the chemical detection limit for the laboratory. Each mapped concentration is assigned to the appropriate numerical range, which in turn determines the circle size and color. This first time slice for 2004-2005 shows one very large red “hot spot” indicating the highest concentration class, two yellow spots, and several blue spots, plus a few non-detects.

      Locus GIS+ time slice

      Time slice for a year for a Locus GIS+ query

      Starting the time slider runs through the yearly time slices. As time passes in this example, hot spots come and go, with a general downward trend towards no benzene detections. In the last year, 2018-2019, there is a slight increase in concentrations. Watching the changing concentrations over time presents a clear picture of how benzene is manifesting in the groundwater wells at the site.

      GIS+ time slider in action

      While displaying time in maps has always been a challenge, the use of animation in GIS lets users get a better understanding of temporal trends in their spatial data. Locus continues to bring new analysis tools to their GIS+ system to support time data in their environmental applications.

      Time slice for a Locus GIS+ query

      Interested in Locus’ GIS solutions?

      Locus GIS+ features all of the functionality you love in EIM’s classic Google Maps GIS for environmental management—integrated with the powerful cartography, interoperability, & smart-mapping features of Esri’s ArcGIS platform!

      About the Author—Dr. Todd Pierce, Locus Technologies

      Dr. Pierce manages a team of programmers tasked with development and implementation of Locus’ EIM application, which lets users manage their environmental data in the cloud using Software-as-a-Service technology. Dr. Pierce is also directly responsible for research and development of Locus’ GIS (geographic information systems) and visualization tools for mapping analytical and subsurface data. Dr. Pierce earned his GIS Professional (GISP) certification in 2010.

      EPA to set tougher requirements for lead in water

      The Environmental Protection Agency (EPA) announced that it would impose stricter requirements on water utilities to manage lead and copper contamination in drinking water supplies. The EPA said that tackling water pollution is a core duty of the agency.

      The proposed changes, the first affecting lead levels in water since 1991, would also give utilities more time to replace lead pipes in their systems. Some environmental groups are not happy with the proposed rule because the change slows by 20 years the timeline for removing aging lead service pipes that could expose children to lead. Lead is a toxin known to harm developing brains. The rule slows down the removal of pipelines where lead levels exceed 15 μg/L to 33 years from the 13 years in the original law.

      The new rule requires water utilities to identify and remove sources of lead when a water sample at the faucet exceeds 15 micrograms per liter (μg/L). The EPA said water systems would also have to follow new, improved sampling procedures and adjust sampling sites to better target locations with higher lead levels.

      Health advocates estimate that as many as six million or more lead water lines remain underground in U.S. cities and towns. Additional sampling and monitoring can help to identify affected areas, and ensure the quality of drinking water sources.

      Predicting Water Quality with Machine Learning

      At Locus Technologies, we’re always looking for innovative ways to help water users better utilize their data. One way we can do that is with technologies such as machine learning. Machine learning is a powerful tool for analyzing environmental data, including water quality, and can form a backbone for AI systems that help manage and monitor water. When done correctly, it can even predict the quality of a water system going forward in time. Such a versatile method is a huge asset when analyzing data on the quality of water.

      To explore machine learning in water a little bit, we are going to use some groundwater data collected from Locus EIM, which can be loaded into Locus Platform with our API. Using this data, which includes various measurements on water quality, such as turbidity, we will build a model to estimate the pH of the water source from various other parameters, to an error of about 1 pH point. For the purpose of this post, we will be building the model in Python, utilizing a Jupyter Notebook environment.

      When building a machine learning model, the first thing you need to do is get to know your data a bit. In this case, our EIM water data has 16,114 separate measurements. Plus, each of these measurements has a lot of info, including the Site ID, Location ID, the Field Parameter measured, the Measurement Date and Time, the Field Measurement itself, the Measurement Units, Field Sample ID and Comments, and the Latitude and Longitude. So, we need to do some janitorial work on our data. We can get rid of some columns we don’t need and separate the field measurements based on which specific parameter they measure and the time they were taken. Now, we have a datasheet with the columns Location ID, Year, Measurement Date, Measurement Time, Casing Volume, Dissolved Oxygen, Flow, Oxidation-Reduction Potential, pH, Specific Conductance, Temperature, and Turbidity, where the last eight are the parameters which had been measured. A small section of it is below.

      Locus Machine Learning - Data
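
      The reshaping step described above can be sketched in plain Python. This is a minimal illustration rather than the code used for this post: the field names (`location_id`, `date`, `time`, `parameter`, `value`) and the sample rows are hypothetical stand-ins for an EIM export.

```python
from collections import defaultdict

# Hypothetical long-format rows: one field measurement per record.
long_rows = [
    {"location_id": "MW-01", "date": "2019-03-04", "time": "09:15", "parameter": "pH", "value": 7.1},
    {"location_id": "MW-01", "date": "2019-03-04", "time": "09:15", "parameter": "Turbidity", "value": 3.4},
    {"location_id": "MW-02", "date": "2019-03-05", "time": "10:40", "parameter": "pH", "value": 6.8},
]


def pivot_measurements(rows):
    """Group measurements by location and timestamp, one column per parameter."""
    wide = defaultdict(dict)
    for row in rows:
        key = (row["location_id"], row["date"], row["time"])
        wide[key][row["parameter"]] = row["value"]
    # Flatten into a list of records; parameters not measured are simply absent.
    return [
        {"location_id": loc, "year": int(date[:4]), "date": date, "time": time, **params}
        for (loc, date, time), params in wide.items()
    ]


wide_rows = pivot_measurements(long_rows)
```

      Each resulting record holds every parameter measured at one location and time, which is the shape a model expects: one row per sample, one column per feature.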

      Alright, now our data is better organized, and we can move over to Jupyter Notebook. But we still need to do a bit more maintenance. By looking at the specifics of our data set, we can see one major problem immediately. As shown in the picture below, the Casing Volume parameter has only 6 values. Since so much is missing, this parameter is useless for prediction, and we’ll remove it from the set.

      Locus Machine Learning - Data
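
      A check like this one can be sketched as follows. The 50% cutoff and the toy rows are assumptions for illustration; the post only says that Casing Volume had too few values to be useful.

```python
def non_missing_counts(rows, columns):
    """Count how many rows have a value (not None) for each column."""
    return {col: sum(1 for row in rows if row.get(col) is not None) for col in columns}


def drop_sparse_columns(rows, columns, min_fraction=0.5):
    """Remove columns that have values in fewer than min_fraction of the rows."""
    counts = non_missing_counts(rows, columns)
    keep = [col for col in columns if counts[col] >= min_fraction * len(rows)]
    cleaned = [{k: v for k, v in row.items() if k not in columns or k in keep} for row in rows]
    return cleaned, keep


# Toy rows: "Casing Volume" is almost always missing, as in the post.
rows = [
    {"pH": 7.0, "Temperature": 15.2, "Casing Volume": None},
    {"pH": 6.9, "Temperature": 14.8, "Casing Volume": None},
    {"pH": 7.3, "Temperature": None, "Casing Volume": 2.5},
    {"pH": 7.1, "Temperature": 15.0, "Casing Volume": None},
]
params = ["pH", "Temperature", "Casing Volume"]
cleaned, kept = drop_sparse_columns(rows, params, min_fraction=0.5)
```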

      We can check the set and see that some of our measurements have missing data. In fact, 261 of them have no data for pH. To train a model, we need data which has a result for our target, so these rows must be thrown out. Then, our dataset will have a value for pH in every row, but might still have missing values in the other columns. We can deal with these missing values in a number of ways, and it might be worth it to drop columns which are missing too much, like we did with Casing Volume. Luckily, none of our other parameters are missing that much, so for this example I filled in the empty spaces in each remaining column with the average of that column’s other measurements. However, if you do this, you need to eliminate any major outliers first, since they could skew this average.
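
      The two cleaning steps above, dropping rows without a target and filling gaps with an outlier-resistant average, might look like the sketch below. The outlier rule (distance from the median measured in median absolute deviations) and its cutoff are assumptions chosen for illustration; the post does not say how outliers were identified. Note that this version only leaves outliers out of the average and does not delete the rows that contain them.

```python
from statistics import mean, median


def drop_missing_target(rows, target="pH"):
    """Keep only rows that have a value for the prediction target."""
    return [row for row in rows if row.get(target) is not None]


def robust_mean(values, max_dev=5.0):
    """Mean of the values, ignoring points far from the median.

    A point counts as an outlier when it lies more than max_dev median
    absolute deviations (MAD) from the median. Both the rule and the
    cutoff are illustrative assumptions.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return mean(values)
    return mean(v for v in values if abs(v - med) <= max_dev * mad)


def impute_with_mean(rows, columns):
    """Fill missing values in each column with that column's robust mean."""
    fill = {}
    for col in columns:
        observed = [row[col] for row in rows if row.get(col) is not None]
        fill[col] = robust_mean(observed)
    return [
        {**row, **{col: fill[col] for col in columns if row.get(col) is None}}
        for row in rows
    ]


# Toy rows for illustration only.
rows = [
    {"pH": 7.0, "Turbidity": 2.0},
    {"pH": None, "Turbidity": 3.0},   # no target: dropped
    {"pH": 6.8, "Turbidity": None},   # feature gap: filled
    {"pH": 7.2, "Turbidity": 4.0},
    {"pH": 7.1, "Turbidity": 3.0},
    {"pH": 6.9, "Turbidity": 900.0},  # extreme reading: left out of the average
]
usable = drop_missing_target(rows)
filled = impute_with_mean(usable, ["Turbidity"])
```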

      Once your data is usable, then it is time to start building a model! You can start off by creating some helpful graphs, such as a correlation matrix, which can show the relationships between parameters.

      Locus Machine Learning - Corr
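
      For readers who want to see what sits behind a correlation matrix, here is a minimal sketch using the Pearson coefficient in plain Python. The sample values are invented, and in a notebook you would more likely use a data analysis library and a heatmap plot.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(rows, columns):
    """Nested dict of pairwise correlations: matrix[a][b]."""
    data = {col: [row[col] for row in rows] for col in columns}
    return {a: {b: pearson(data[a], data[b]) for b in columns} for a in columns}


# Toy values for illustration only.
rows = [
    {"pH": 6.8, "Temperature": 14.0, "Dissolved Oxygen": 8.1},
    {"pH": 7.0, "Temperature": 15.2, "Dissolved Oxygen": 7.6},
    {"pH": 7.3, "Temperature": 16.5, "Dissolved Oxygen": 7.0},
    {"pH": 7.1, "Temperature": 15.5, "Dissolved Oxygen": 7.4},
]
matrix = correlation_matrix(rows, ["pH", "Temperature", "Dissolved Oxygen"])
for name, row in matrix.items():
    print(name, {k: round(v, 2) for k, v in row.items()})
```

      Values near 1 or -1 indicate a strong linear relationship between two parameters, while values near 0 indicate little linear relationship.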

      For this example, we will build our model with the library Keras. Once the features and targets have been chosen, we can construct a model with code such as this:

      Locus Machine Learning - Construct

      This code will create a sequential deep learning model with 4 layers. The first three all have 64 nodes, and of them, the initial two use a rectified linear unit activation function, while the third uses a sigmoid activation function. The fourth layer has a single node and serves as the output.
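
      Since the original code appears only as an image, here is a sketch of how a model matching that description could be built. The layer sizes and activations come from the text; the optimizer, loss, metric, learning rate, and the helper names `count_parameters` and `build_model` are assumptions, not the code used for this post.

```python
# (units, activation) for each Dense layer, as described in the text.
LAYER_SPECS = [(64, "relu"), (64, "relu"), (64, "sigmoid"), (1, None)]


def count_parameters(n_features, specs=LAYER_SPECS):
    """Trainable weights and biases in a stack of fully connected layers."""
    total, inputs = 0, n_features
    for units, _ in specs:
        total += inputs * units + units  # weight matrix plus bias vector
        inputs = units
    return total


def build_model(n_features, learning_rate=0.001):
    """Build and compile the four-layer network with Keras.

    The optimizer, loss, and metric are assumptions; the post does not
    state which ones were used.
    """
    # Imported here so the helper above works without TensorFlow installed.
    from tensorflow import keras

    model = keras.Sequential([keras.Input(shape=(n_features,))])
    for units, activation in LAYER_SPECS:
        model.add(keras.layers.Dense(units, activation=activation))
    model.compile(
        optimizer=keras.optimizers.RMSprop(learning_rate=learning_rate),
        loss="mse",
        metrics=["mae"],
    )
    return model
```

      If, for example, six parameters were used as features (an assumption, since the post does not list them), `count_parameters(6)` gives the number of trainable values that `build_model(6).summary()` should also report.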

      Our model must be trained on the data, which is usually split into training and test sets. In this case, we will put 80% of the data into the training set and 20% into the test set. From the training set, 20% will be used as a validation subset. During training, the model examines the data points and their corresponding pH values and fits a relationship between them. With Keras, you can save a history of the error throughout the fit, which is useful to plot when analyzing results. We can see that for our model, the training error gradually decreases as it learns a relationship between the parameters.

      Locus Machine Learning - Construct
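
      The split described above can be sketched in plain Python. The function name, the fixed seed, and the use of rounding are assumptions for illustration.

```python
import random


def split_dataset(rows, test_fraction=0.2, validation_fraction=0.2, seed=42):
    """Shuffle rows and split them into training, validation, and test lists.

    The validation share is taken from the training portion, not from the
    full dataset, matching the 80/20 split described in the text.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)

    n_test = round(len(shuffled) * test_fraction)
    test, remainder = shuffled[:n_test], shuffled[n_test:]

    n_val = round(len(remainder) * validation_fraction)
    validation, train = remainder[:n_val], remainder[n_val:]
    return train, validation, test


train, validation, test = split_dataset(range(1000))
print(len(train), len(validation), len(test))  # 640 160 200
```

      With Keras, the validation subset can instead be requested through the `validation_split` argument of `fit`, which holds out the last portion of the training data it is given, so shuffling beforehand still matters.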

      The end result is a trained model that has been evaluated on the test set to measure its error. When we ran the code, the test set error value was 1.11. As we are predicting pH, a full point of error could be fairly large, but the precision required of any model will depend on the situation. This error could be reduced by modifying the model itself, for example by adjusting the learning rate or restructuring the layers.

      Locus Machine Learning - Error
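
      The post does not say which error metric the 1.11 figure refers to. Two common choices for a regression target like pH are sketched below with invented values: mean absolute error, which reads directly in pH units, and root mean squared error, which weights large misses more heavily.

```python
from math import sqrt


def mean_absolute_error(y_true, y_pred):
    """Average size of the prediction errors, in the target's own units."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true, y_pred):
    """Like MAE, but penalizes large misses more heavily."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Toy values for illustration only.
y_true = [6.5, 7.0, 7.5, 8.0]
y_pred = [6.0, 7.5, 7.0, 9.5]
print(mean_absolute_error(y_true, y_pred))  # 0.75
print(root_mean_squared_error(y_true, y_pred))
```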

      You can also graph the true target values against the model’s predictions, which can help show where the model can be improved. In our case, predictions for pH values in the middle of the range seem fairly accurate, but they become less reliable toward the higher values.

      Locus Machine Learning - Predict
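
      Alongside the plot, the same question (where in the pH range does the model struggle?) can be answered numerically by grouping errors by the true value. The sketch below uses invented numbers and a hypothetical helper name, `error_by_range`.

```python
from collections import defaultdict


def error_by_range(y_true, y_pred, bin_width=1.0):
    """Mean absolute error grouped by bins of the true target value."""
    errors = defaultdict(list)
    for t, p in zip(y_true, y_pred):
        bin_start = (t // bin_width) * bin_width
        errors[bin_start].append(abs(t - p))
    return {start: sum(errs) / len(errs) for start, errs in sorted(errors.items())}


# Toy values for illustration only.
y_true = [6.2, 6.8, 7.1, 7.4, 7.9, 9.3, 9.8]
y_pred = [6.4, 6.7, 7.0, 7.5, 7.6, 8.1, 8.2]
for start, mae in error_by_range(y_true, y_pred).items():
    print(f"pH bin starting at {start:g}: mean absolute error {mae:.2f}")
```

      A bin with a much larger error than the others points to a range where more training data or a different model structure may help.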

      So what do we do now that we have this model? In a sense, what is the point of machine learning? Well, one of the major strengths of this technology is the predictive capabilities it has. Say that we later acquire some data on a water source without information on the pH value. As long as the rest of the data is intact, we can predict what that value should be. Machine learning can also be incorporated into examination of things such as time series, to forecast a trend of predictions. Overall, machine learning is a very important part of data analytics and the development of powerful AI systems, and its importance will only increase in the future.

      What’s next?

      As the technology around machine learning and artificial intelligence evolves, Locus will be working to integrate these tools into our EHS software. More accurate predictions will lead to more insightful data, empowering our customers to make better business decisions.

        Infographic: 6 Benefits of EHS on AWS

        In this infographic, we have outlined a few of the ways EHS programs benefit from having an AWS-hosted solution. Locus customers recently received these benefits as a result of moving our entire infrastructure to Amazon Web Services—the world’s leading cloud. Learn more about the move to AWS.

        Infographic: 6 Benefits of EHS on AWS
