Security is a key factor for applications that deal with large amounts of data, including complex event processing. Now that IoT is more popular than ever, stories about security breaches are increasingly common, because simple internet-connected devices are often poorly secured and therefore more vulnerable to different types of attacks.
REACH uses Fog Computing, which means none of the data leaves the factory's territory, so the usual route for an external attack is closed. Of course, this doesn't mean that everything is secured: if hundreds, thousands, or even tens of thousands of employees and vendors can access all data without restrictions, the chance of a disaster is far too high. Just think about what could happen if someone deleted all the data collected over the past years, whether intentionally or not.
Most companies only think about security after the disaster has already happened, such as a leak or the destruction of private data. Most such incidents are avoidable with enough care. Fortunately, there are multiple ways to address these problems, and REACH integrates these solutions by default.
Kerberos, developed at MIT, plays a key role in authentication: it lets people (and services) access data only if they can prove their identity. The client authenticates itself at the Kerberos server and receives an encrypted, timestamped ticket-granting ticket (TGT for short). Whenever it wants to access a new service within the TGT’s lifespan, it asks for a separate ticket for that exact service. Services are accessible only with these valid tickets, which also have a limited lifespan, so they become unusable after a short period. All tickets are encrypted with AES-256, which would take an eternity to brute force even with billions of supercomputers.
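The ticket flow described above can be illustrated with a small toy model. This is only a sketch of the lifespan logic: the names `Ticket`, `issue_tgt` and `issue_service_ticket` and the lifetimes are assumptions made for illustration, and a real Kerberos deployment encrypts tickets and issues them from a key distribution center rather than from plain Python objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ticket:
    principal: str    # who the ticket was issued to
    service: str      # service it is valid for ("krbtgt" marks a TGT here)
    issued_at: float  # issue time in seconds
    lifetime: float   # validity window in seconds

    def is_valid(self, now: float) -> bool:
        # A ticket is usable only inside its validity window.
        return self.issued_at <= now < self.issued_at + self.lifetime


def issue_tgt(principal: str, now: float, lifetime: float = 8 * 3600) -> Ticket:
    # Hypothetical stand-in for the initial authentication step.
    return Ticket(principal, "krbtgt", now, lifetime)


def issue_service_ticket(tgt: Ticket, service: str, now: float,
                         lifetime: float = 300) -> Ticket:
    # A service ticket is granted only while the TGT is still valid.
    if tgt.service != "krbtgt" or not tgt.is_valid(now):
        raise PermissionError("TGT is missing or expired")
    # The service ticket must not outlive the TGT it was derived from.
    remaining = tgt.issued_at + tgt.lifetime - now
    return Ticket(tgt.principal, service, now, min(lifetime, remaining))
```

With these assumed defaults, a service ticket issued at second 100 stops being valid at second 400, and a request made after the TGT has expired is rejected.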
The next level of security is authorization, where rules specify who can do what. The Lightweight Directory Access Protocol (LDAP) is an industry standard for distributed directory access, created at the University of Michigan. Its OpenLDAP implementation is fully open source and integrates well with Kerberos, which makes the two a very good fit for securing a system. The directory holds all the information about the users and services, and tells which user has permission to access a specific resource.
However, one piece is still missing: what if the fog devices communicate with each other? They still have to send data across the local network to collaborate, and someone could sniff those packets. The solution is Transport Layer Security (TLS), a cryptographic protocol that encrypts the data on the network so that only the intended recipient can read the messages.
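As a minimal sketch of what encrypting node-to-node traffic can look like in code, the snippet below builds a client-side TLS context with Python's standard `ssl` module. The function name `make_client_context` and the choice of TLS 1.2 as the minimum version are assumptions for illustration, not a description of how REACH configures TLS.

```python
import ssl


def make_client_context(ca_file=None):
    # create_default_context turns on certificate verification and
    # hostname checking for client-side connections.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    # Assumption: refuse protocol versions older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


ctx = make_client_context()
# A socket would then be wrapped before any data is sent, e.g.
#   secure = ctx.wrap_socket(sock, server_hostname="<peer name>")
```

Passing the path of a private certificate authority file as `ca_file` would restrict trust to certificates issued inside the factory network.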
Remember: no matter how tall, spiky and strong your fence is along 95% of your territory’s perimeter, it is only as strong as its weakest part. None of the above technologies would be enough alone, but together they form an all-round security layer to protect your valued data.
The role of Industrial IoT in maintenance and manufacturing optimization
Maintenance is a task that is carried out in factories on a daily basis to keep machines healthy and the whole manufacturing process efficient. The main goal is to do maintenance before a particular machine starts producing waste or even suffers complete failure. Preventing machines from getting stuck clearly means lower operating costs and helps keep production smooth. Still, many factories have a hard time dealing with downtime due to asset failure.
Standard maintenance procedures – Preventive Maintenance (PM)
The purpose of regular care and service done by maintenance personnel is to make sure that the equipment remains productive, without any major breakdowns. For this purpose, maintenance periods are specified conservatively, usually based on data measured by the equipment manufacturer or at the beginning of operation. However, these procedures do not account for the actual condition of the machine, which results from environmental effects (such as ambient temperature and air humidity), raw material quality, load profiles and more.
To take the ever-changing operating conditions into account, condition-relevant data needs to be collected and processed. This is condition-based maintenance (CBM). In machining, it is essential to measure ambient parameters, machine vibration, sound and motor current, which together give a picture of the actual health of the machine and its tools. The availability of this data enables the step towards a more sophisticated maintenance mode: predictive maintenance.
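A condition-based check can be as simple as comparing the latest readings against limits. The sketch below assumes hypothetical signal names and limit values chosen only for illustration; real limits would come from the machine manufacturer or from operating experience.

```python
# Hypothetical limits for one machine; the values are illustrative only.
LIMITS = {
    "vibration_mm_s": 7.1,
    "motor_current_a": 12.0,
    "ambient_temp_c": 45.0,
}


def exceeded_limits(reading, limits=LIMITS):
    # Return the names of all signals whose current value is above its limit.
    return sorted(name for name, limit in limits.items()
                  if reading.get(name, 0.0) > limit)


def maintenance_due(reading, limits=LIMITS):
    # Maintenance is triggered by the machine's condition, not by the calendar.
    return bool(exceeded_limits(reading, limits))
```

For a reading with a vibration of 8.0 mm/s and all other signals within their limits, `exceeded_limits` reports only the vibration signal and `maintenance_due` returns `True`.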
Bring maintenance to the next level: Predictive maintenance (PdM)
Maintenance work based on prediction presumes that the following requirements are fulfilled. First of all, the data collected from the asset must contain information that shows the signs of an upcoming event. In other words, patterns that precisely describe each event must be identifiable in the measurement signals. If this hypothesis holds true, the next step is either to hard-code the conditions that indicate an oncoming failure, or to use a Machine Learning algorithm to identify and literally learn the particular failure mode patterns.
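To make the "hard-coded condition" option concrete, here is a minimal sketch of one possible rule: raise an alarm when the recent average of a signal drifts well above the level seen during a healthy baseline period. The function name `drift_alarm`, the window sizes and the factor `k` are assumptions for illustration; a learned model would replace this rule with patterns found in the data.

```python
from statistics import mean, stdev


def drift_alarm(signal, baseline_len=20, window=5, k=3.0):
    # Not enough data yet to compare a recent window against the baseline.
    if len(signal) < baseline_len + window:
        return False
    baseline = signal[:baseline_len]
    mu, sigma = mean(baseline), stdev(baseline)
    recent = mean(signal[-window:])
    # Alarm when the recent mean is more than k standard deviations
    # above the healthy baseline mean.
    return recent > mu + k * sigma
```

A signal that keeps alternating between 1.0 and 1.1 does not trigger the alarm, while the same baseline followed by values climbing from 1.5 to 1.9 does.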
The second requirement comes into the picture as soon as the patterns have been identified and the model is capable of predicting the unwanted event soon enough to take action. This requirement addresses the architecture of the system making the prediction: it needs to operate in real time. The reason is that in many applications damage can be predicted only shortly before the event (usually minutes or seconds ahead). Advanced IIoT systems feature real-time operation.
Last but not least, in manufacturing situations where multiple machines and robots are involved (a group of machines that together affect product quality or the operating efficiency of subsequent machines), events and data are very complex. Processing this data requires a system that has high computing performance and tools to handle the complexity.
The complex data processing and predictive capability of REACH rests on the most advanced Big Data technologies, built-in Machine Learning algorithms and the real-time Fog Computing architecture. The system is capable of learning and distinguishing between failure modes and sending alerts in case of an expected breakdown.
Although Predictive maintenance is a big step for most manufacturers, there is still a next level to go for.
State-of-the-art: Prescriptive maintenance (RxM)
Prescriptive maintenance requires even more detailed data and a checklist of what actions to take when a failure mode pattern is detected. Although technology makes implementing prescriptive systems entirely possible, only a few organizations make it to this point. This step requires very good coordination between maintenance and production departments, a fair understanding of the problem and efficient cross-department information sharing. These are the key criteria of a successful Industrial IoT implementation anyway.
Manufacturing companies that make the effort to collect data, analyze the problem, identify patterns of inadequate operation, and understand and prepare their data can reach the level of a Smart Factory in maintenance operations, too. In the presented case, the checklist is not only displayed on the REACH UI; emails and SMS messages can also be sent to maintenance personnel and other relevant stakeholders. This minimizes the time needed to prepare for the required actions.
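The step from a detected failure mode to a prescribed action can be sketched as a lookup followed by message creation. Everything in this example is hypothetical: the failure mode names, the checklist steps and the `build_notifications` helper are illustrative and do not describe the REACH implementation.

```python
# Hypothetical mapping from failure mode to a prescribed checklist.
CHECKLISTS = {
    "bearing_wear": [
        "Stop the spindle",
        "Inspect the bearing",
        "Replace the bearing if needed",
    ],
    "tool_breakage": [
        "Halt the machining program",
        "Replace the tool",
    ],
}


def build_notifications(machine_id, failure_mode, recipients):
    steps = CHECKLISTS.get(failure_mode)
    if steps is None:
        raise KeyError(f"no checklist defined for {failure_mode!r}")
    # Number the steps so the message reads as an ordered checklist.
    body = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))
    subject = f"[{machine_id}] {failure_mode}"
    # One message per recipient; delivery (email, SMS, chat) is out of scope.
    return [{"to": r, "subject": subject, "body": body} for r in recipients]
```

The same message dictionaries could then be handed to whichever channel the recipient prefers.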
Besides email and SMS alerting, REACH can also send status messages to engineers and other personnel via our chatbot called RITA. Using state-of-the-art technology needs to be fun, too!
In our earlier posts we talked about how to store, process and analyze data, but we skipped a crucial step: how to collect it. An outstanding challenge for the IoT lies in connecting sensors, devices and endpoints in a cost-effective and secure way, so that massive amounts of data can be captured, analyzed and turned into insights. The IoT gateway is the key element in this process; below we describe why.
The definition of an IoT gateway has changed over time as the market developed. Just like traditional gateways in networks do, IoT gateways function like bridges – and they bridge a lot, positioned between edge systems and our REACH solution.
IoT Gateway market
IoT gateways fulfil several roles in IoT projects. They are built on chipsets that feature low-power connectivity, and may be rugged enough for harsh conditions. Some gateways also focus on fog computing applications, in which customers need critical data locally so that machines can make split-second decisions. Based on this, IoT gateway vendors can be divided into three groups: vendors that provide only hardware (Dell), companies that focus on software and analytics (Kura, Kepware), and end-to-end providers (Eurotech).
Our IoT Gateway solution belongs to the software and analytics group. It is an OPC client that communicates with an OPC server. OPC is a software interface standard that allows secure and reliable exchange of data with industrial hardware devices.
What are IoT Gateways?
Gateways are emerging as a key element in bringing legacy and next-gen devices to the Internet of Things (IoT). Modern IoT gateways also play an increasingly important role in providing analytics, so that only the most important information and alerts are sent up to REACH to be acted upon. They integrate protocols for networking, help manage storage and analytics on the data, and facilitate secure data flow between edge devices and REACH.
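One simple way a gateway can send only the important information upstream is a deadband filter: a reading is forwarded only if it differs enough from the last forwarded value, or if it crosses an alert limit. The sketch below is an illustration under assumed names and thresholds (`filter_for_upstream`, `deadband`, `alert_above`), not the filtering logic of any particular gateway product.

```python
def filter_for_upstream(readings, deadband=0.5, alert_above=80.0):
    forwarded = []
    last_sent = None
    for value in readings:
        is_alert = value > alert_above
        # Forward the first value and any value that moved by at least
        # `deadband` since the last forwarded one.
        changed = last_sent is None or abs(value - last_sent) >= deadband
        if is_alert or changed:
            forwarded.append({"value": value, "alert": is_alert})
            last_sent = value
    return forwarded
```

In this sketch alerts are always forwarded, even when the value has not changed, so that a persistent alarm condition stays visible upstream.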
Especially in Industrial IoT, there is a growing movement towards the fog, as is the case in many technologies.
Intelligent IoT gateways
With fog computing (and the movement to the edge overall) we really enter the space of what is now known as an intelligent IoT gateway. In the initial, simpler picture, an IoT gateway sat between the sensors and devices on one hand and the cloud on the other. Now a lot of the analytics and filtering of information is increasingly done closer to the sensors, through fog nodes, for a myriad of possible reasons explained in our article on fog computing. The illustration below shows where the intelligent IoT gateway (and soon they’ll all be intelligent) sits in an IoT architecture.
(img source: https://www.postscapes.com/iot-gateways/ )
Machine learning (ML) is the process of making software algorithms learn from large amounts of data. The term was originally used by Arthur L. Samuel, who described it as the “programming of a digital computer to behave in a way which, if done by human beings or animals, would be described as involving the process of learning”. ML is an alternative way to build AI: it uses statistics to find patterns in data rather than explicitly hard-coded routines with millions of lines of code. A group of algorithms makes it possible to build applications that receive input data and predict an output depending on that input. ML can also be understood as a process in which you “show” the machine tons of data (text, pictures, sensor data) together with the required output, which is the training part, and then “show” it a new picture without the required output and ask the machine to guess the result.
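The "show examples with the required output, then ask for a guess" idea can be sketched with one of the simplest possible learners, a nearest-centroid classifier. This is an illustrative toy written with the standard library only; the function names `train` and `predict` and the sample values are assumptions, and real projects would typically use an established ML library.

```python
from collections import defaultdict
from math import dist


def train(samples, labels):
    # Group the training samples by their required output (label).
    groups = defaultdict(list)
    for x, y in zip(samples, labels):
        groups[y].append(x)
    # The "model" is one average point (centroid) per label.
    return {label: tuple(sum(col) / len(rows) for col in zip(*rows))
            for label, rows in groups.items()}


def predict(model, x):
    # Guess the label whose centroid is closest to the new sample.
    return min(model, key=lambda label: dist(model[label], x))


samples = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (4.8, 5.2)]
labels = ["healthy", "healthy", "faulty", "faulty"]
model = train(samples, labels)
```

After training, `predict(model, (1.1, 0.9))` picks the label of the nearer group without any rule for that decision having been written by hand.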
Machine learning has grown to be a very powerful tool for various problems from different areas, for example text processing – for categorizing documents or speech recognizers (chatbots) – or image processing – where we train the algorithm with hundreds of thousands of tagged pictures, to be able to recognize persons, objects, etc.
However, from our point of view, the more important use cases are those that involve factory machines and processes. Here we have to collect, sort and store many different sensor readings from different production robots to train our machine learning solutions for various purposes, such as predicting these machines' failures.
A machine learning project involves several tasks. Data preparation comes before training and includes, for example, filling in or discarding empty cells, standardizing the data and selecting the important features. Training means feeding the cleaned data to the algorithm so that it can adjust itself. Finally, we have to be able to measure and improve our solution with new data and parameters.
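Two of the preparation steps mentioned above, filling empty cells and standardizing, can be sketched for a single column of sensor values. The helper names `fill_missing` and `standardize` are illustrative, and filling with the column mean is just one common choice among several.

```python
from statistics import mean, pstdev


def fill_missing(column):
    # Replace empty cells (None) with the mean of the values that are present.
    present = [v for v in column if v is not None]
    fill = mean(present)
    return [fill if v is None else v for v in column]


def standardize(column):
    # Rescale to zero mean and unit variance so features are comparable.
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]
    return [(v - mu) / sigma for v in column]
```

Discarding rows with empty cells instead of filling them is equally valid; which option is better depends on how much data is missing and why.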
For these purposes a data pipeline should be built, which applies the same processing steps to newly arrived data as were used in the training stage. With REACH, all of these tasks (data preparation, training and building the data pipeline) can be done easily and in a user-friendly way through the UI, leading to useful solutions that reduce downtime and cost.
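The key property of such a pipeline is that it remembers what it learned from the training data and reuses exactly those values on new data. The class below is a minimal sketch of that idea; `FittedPipeline` is a hypothetical name and not part of REACH or of any library.

```python
from statistics import mean, pstdev


class FittedPipeline:
    """Remembers the statistics learned from the training column."""

    def __init__(self, training_column):
        present = [v for v in training_column if v is not None]
        self.fill_value = mean(present)
        self.mu = self.fill_value
        # Fall back to 1.0 so a constant column does not divide by zero.
        self.sigma = pstdev(present) or 1.0

    def transform(self, column):
        # New data is filled and scaled with the *training* statistics,
        # never with statistics computed from the new data itself.
        filled = [self.fill_value if v is None else v for v in column]
        return [(v - self.mu) / self.sigma for v in filled]


pipeline = FittedPipeline([10.0, None, 30.0])
new_batch = pipeline.transform([None, 30.0, 40.0])
```

Recomputing the mean and spread on every new batch would silently shift the inputs the model sees, which is why the fitted values are stored once.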
Machine Learning within REACH
We provide solutions for many of the problems that occur during the lifetime of a machine learning project, with different tools for different roles. An easy and simple graphical UI with pre-configured models helps you focus only on the data and the pattern behind it.
Of course, with this approach you can also tune the model parameters and compare the results to find out which parameters suit your application best. REACH is also well suited to developers who want to build their own solution: with an embedded Jupyter notebook, users can build any model with different technologies, such as scikit-learn, Spark MLlib or TensorFlow. These models can be deployed to a single machine or distributed across the cluster for the best performance and better scalability.
Towards the future
Machine learning is a pervasive term today, but many people use it in the wrong way, or mix it up with AI or deep learning, which are the topics of our following blog posts. With this introduction you can get a little insight into this technology and a general picture of how to build applications that improve their performance without human interaction, by analysing data and using feedback on their performance.
As we discussed in our previous blog post, a data lake and Big Data are required technologies to cover all the needs of Industry 4.0. But what about my classic data? Should I transfer it to a data lake? Do I have to redesign all my applications and processes to use the new storage layer?
The answer is definitely not. You do not need to throw out your classic databases, or the systems and processes that use them. Our approach and best practices say that a classical database engine can live in smooth symbiosis with a modern Big Data data lake. It only takes a well-designed architecture, which is not easy to create. That’s why our experts designed REACH to handle such situations with what we call a hybrid architecture, where all data goes to its proper place. Some data should be stored in the data lake, and some should go to the classical storage layer.
The question is where to combine these datasets. REACH is designed to combine these different data sources at both the process and the analytical level, so at the end of an analysis you cannot tell whether a piece of data came from the classical layer or from the Big Data data lake.
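Combining the two layers at the analytical level can be illustrated with a small stand-in: an in-memory SQLite table plays the classical relational layer, and a list of raw JSON lines plays the data lake. The table name, field names and the `combine` helper are assumptions made for this sketch and say nothing about how REACH implements the join.

```python
import json
import sqlite3


def combine(conn, raw_json_lines):
    # Master data comes from the classical (relational) layer.
    machines = dict(conn.execute("SELECT machine_id, line FROM machines"))
    combined = []
    for raw in raw_json_lines:
        # Raw events come from the data lake and are parsed on read.
        event = json.loads(raw)
        combined.append({**event, "line": machines.get(event["machine_id"])})
    return combined


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE machines (machine_id TEXT PRIMARY KEY, line TEXT)")
conn.executemany("INSERT INTO machines VALUES (?, ?)",
                 [("m1", "assembly"), ("m2", "paint")])

raw_events = ['{"machine_id": "m1", "temp": 71.5}',
              '{"machine_id": "m2", "temp": 64.0}']
rows = combine(conn, raw_events)
```

Each resulting row carries fields from both layers, and nothing in the row itself reveals which layer a field came from.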
We believe in the principle that every piece of data must go to its proper storage layer; we do not recommend using a single kind of storage for all your data. Taking this further, our experts designed REACH to handle multiple storage layers: the choice is not only between the Big Data layer and the classical layer, but also between different storage techniques inside the data lake and inside the classical storage layer. A good example is a setup where Kudu, HBase and HDFS live next to each other in the data lake, while in the classical layer relational database storage is mixed with standard file storage. That is why we cannot say that one database engine is good for everything. REACH is designed to support this multi-storage approach to get the maximum out of your data.
In the world of Big Data, traditional data warehouses are no longer sufficient to support the requirements of Industry 4.0 and to become the foundation of truly real-time solutions. In contrast to the structured storage concept of traditional data warehouses, a Data Lake keeps the original format and state of the data and provides real-time access to it. The greatest advantage of a Data Lake is that it can store a tremendous amount of data while preserving its raw format in a distributed, scalable storage system. It is therefore possible to store data coming from various sources in a way that remains adaptable to future requirements, which gives a flexibility that current data warehouses cannot provide.
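Keeping data in its raw format means that interpretation is deferred until the data is read. The sketch below uses a plain Python list as a stand-in for the distributed store; the source tags `plc_json` and `legacy_csv`, the field layout and the helper names are hypothetical.

```python
import csv
import io
import json

lake = []  # stand-in for a distributed, append-only raw store


def ingest(source, payload):
    # Store the payload exactly as it arrived; no transformation on write.
    lake.append({"source": source, "raw": payload})


def read_temperatures():
    # Interpretation happens at read time and can differ per source.
    temps = []
    for record in lake:
        if record["source"] == "plc_json":
            temps.append(float(json.loads(record["raw"])["temp"]))
        elif record["source"] == "legacy_csv":
            row = next(csv.reader(io.StringIO(record["raw"])))
            temps.append(float(row[1]))
    return temps


ingest("plc_json", '{"machine_id": "m1", "temp": 71.5}')
ingest("legacy_csv", "m2,64.0")
temperatures = read_temperatures()
```

Because the raw payloads are untouched, a future requirement (for example reading the machine identifier instead of the temperature) only needs a new reader, not a new ingestion step.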
What can a Data Lake offer?
The concept of a Data Lake enables factories to fulfil the requirements of Industry 4.0: data generated during production becomes accessible to other participants in the production line in the swiftest, smoothest way with the help of Complex Event Processing (introduced in our previous blog post), because the data is stored in its original raw format and no data transformation slows the process down. For this reason, REACH puts the Data Lake at the heart of its architecture, so that we can contribute to the competitiveness of our partners. Without modernizing the storage process, real-time analysis and automation are not possible. Companies that seek to use machine learning methods need a wide range of data sources to provide a sufficient amount of data for the algorithms. Cost is also an important element: for a Hadoop-based Data Lake that uses well-known big data techniques, storage costs are minimal compared to a standard data warehouse solution, because Hadoop consists of open source technologies. Furthermore, its hardware requirements are also lower thanks to its distributed setup, so it can be built even on commodity hardware.
Should data warehouses be replaced?
Integrability was one of the main aspects during the design of the REACH architecture, so it offers interfaces to connect to various data sources and applications. See our upcoming blog post on hybrid architectures!