CIOs: 5 Big Data Operational Changes To Make Now
Preparing Your Organization for GDPR Compliance
The threat of a $24 million fine is enough to make any organization sit up and listen to what changes it must make to adhere to new European Union laws on data protection. But, in preparing for the General Data Protection Regulation (GDPR), are U.S. companies focused too much on the “data” in their big data clusters? David Dingwall, of Fox Technologies, believes so. He says bringing these clusters into GDPR compliance depends on some fundamental technical setups. Getting the “plumbing” wrong can bypass all that expensive compliance process review work and cause your organization to fail audit reviews.
The beauty of building extra-large Linux clusters is that it’s easy. Hadoop, OpenStack, hypervisor and HPC installers enable you to build on commodity hardware and deal with node failure reasonably simply. However, a potential fine of up to €20 million (US$24 million), or 4 percent of global annual turnover if that is higher, for a GDPR violation does make you focus on how auditors will review your organization’s storage and manipulation of people-related data.
Most of the GDPR review articles you may have read in the last 12 months reinforce that privacy and encryption of people data are hugely important. Multiple layers of encryption for data at rest and in transit through your infrastructure are appropriate. However, when dealing with new big data infrastructures, crucial audit areas of concern include being clear about how the software manipulates, aggregates, anonymizes or de-anonymizes (soon to be illegal in the U.K.) people data.
There are some key lessons from the financial services marketplace, which has been using Linux-based HPC and blade clusters for data modelling and forecasting for the last 15 years, especially around the operational planning and setup that make ongoing audit cycles easier to complete.
Big Data Cluster Fundamentals: The Large Sausage Machine Without Real People
There is a temptation to build a new data-processing cluster on a standalone network to restrict data movement, with supplemental admin access on a second corporate LAN interface. Once loaded, however, a data work package for Hadoop and HPC clusters tends, like an Oracle database in the past, to execute all of its data-transforming tasks in the cluster under a single account (e.g., “hadoop”), not the submitting user ID.
An audit needs to prove not just how personal data is stored, but also how it is manipulated. This therefore includes understanding who on your staff can create, change or log in to these application-specific accounts, or worse, the operating system root account.
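As a rough starting point for that kind of review, the sketch below lists which named users belong to the groups that typically grant access to a shared service account or to root. It is illustrative only and is not taken from the article: the group names ("hadoop", "wheel"), the sample users and the helper group_members are all assumptions, and it reads only /etc/group-style text. It covers supplementary group membership alone, so a real audit would also need to examine primary groups, sudo rules, SSH keys and any central directory service.

```python
def group_members(group_text, groups_of_interest):
    """Map each group of interest to its listed members.

    group_text is text in /etc/group format: name:password:gid:member,member
    Only supplementary members appear in this file; primary group
    membership is recorded elsewhere (for example in /etc/passwd).
    """
    members = {}
    for line in group_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 4:
            continue  # skip malformed entries
        name, member_field = fields[0], fields[3]
        if name in groups_of_interest:
            members[name] = [m for m in member_field.split(",") if m]
    return members


# Hypothetical sample data; on a real node you would read /etc/group instead.
SAMPLE = """\
root:x:0:
wheel:x:10:alice,bob
hadoop:x:1001:carol
"""

report = group_members(SAMPLE, {"wheel", "hadoop"})
for group, users in sorted(report.items()):
    print(group, "->", ", ".join(users) or "(no supplementary members)")
```

Running the same function against every node in the cluster and comparing the results is one simple way to spot nodes where access to the shared account has drifted from the intended setup.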
Read full article on Corporate Compliance Insights