EXCLUSIVE INTERVIEW

Can’t We All Just Get Along? Q&A With OSA Community Dev Chair Gopi Ganapathy

The Open Solutions Alliance (OSA), a federated community of open source business and developer communities, opened a community portal in early December that is built upon technology from one of its newest member organizations, Essentia. The company develops software platforms and solutions for online communities and commerce.

EssentiaESP is a community-engagement platform designed specifically for commercial open source. It taps into the latest trends in social networking to encourage greater collaboration among companies and open source communities.

The EssentiaESP environment supports multiple project types. Some of the core projects hosted there include the Common Customer View (CCV) project; interoperability projects between OSA member companies, such as the Hyperic/Jaspersoft integration; and member projects for OSA companies including Hyperic, Ingres, Jaspersoft, Openbravo, SpikeSource and others.

“What the OSA is trying to do is ensure that all of the different companies can cooperate and integrate. And it wants to make sure that there are a set of standards that all companies can publish to,” Gopi Ganapathy, president and CEO of Essentia and new OSA Community Development Chair, told LinuxInsider.

OSA’s choice of EssentiaESP reaffirms Essentia’s leadership in community development, collaboration, commerce and open source development, he said. His company’s involvement with OSA will foster deeper interoperability efforts among OSA member companies that will be open to the worldwide community of open source developers, he explained.

Ganapathy recently addressed the second annual Malaysian Government Open Source Software Conference (MyGOSSCON) in Kuala Lumpur, Malaysia. MyGOSSCON is a conference designed to support the Accelerated Adoption (Phase II) of the Malaysian Public Sector OSS Master Plan Program.

He addressed the conference on the topic of Building Vibrant Open Source Communities, drawing on his experience from the OpenOffice.org, java.net, NetBeans and JasperForge.org communities.

LinuxInsider met with Ganapathy to discuss Essentia’s interest in the OSA and the goals of the new community portal.

LinuxInsider: What sparked your enthusiasm for working so closely with the OSA?

Gopi Ganapathy:

Open source has a couple of key drivers. One obviously is the ability to get source code software distributed fast. The second is the supporting community that actually comes together to build the software. OSA is very interested in ensuring that open source gets adopted. One of the challenges for OSA is that there really wasn’t a deep system integrator that really understands all the options and the offerings and how to put it all together.

LI: What role is Essentia playing with helping the OSA meet this challenge?

GG:

The problem for OSA is getting software adopted and building cross-product traditions that comprise a number of member companies’ products. This demonstrates how fairly complex this system is to work under a common framework. To make all these things happen you need to have a suitable open source platform or environment where companies can collaborate through the use of a mega community that is basically a community of communities.

LI: How unique is the platform Essentia developed for the OSA portal?

GG:

I think there is no other product out there that can actually handle the federation of communities and allow members of different communities to participate in the wide dissemination of knowledge and adoption of products, and to create vertical solutions or localized solutions for specific markets.

LI: Is EssentiaESP something your company already had, or is it something you created for the OSA?

GG:

We always had a history of working with open source platforms. Back in the 1980s we were involved in developing platforms for fairly large software deployments in OpenOffice. I was personally responsible for running the dream team for developing several products. Since then we have built a couple of generations of community platforms.

LI: What changes in community use are you noticing with this current platform generation?

GG:

The interest in the current generation is that several communities have moved from being very developer-centric to becoming primarily business users rather than developers. The interest is really to figure out how to experience the information that exists among the broad users and ensure that we have a low risk in terms of adopting.

LI: How else are open source communities adapting to the use of a massive collaboration portal?

GG:

We also see that there are new paradigms of social networking developing that people have become quite comfortable using. These include all the video collaborative capabilities like Skype and IM and Twitter and everything else that is going on.

LI: Have you integrated these communication trends into EssentiaESP?

GG:

We decided to basically bring together all the core software tools for open software development with social networking to provide a modern platform that can fully scale. When we say scale, we are talking about hundreds of thousands of users who can come together to work on a product.

LI: Is there any cost for members to participate in this portal, or do they have access by virtue of belonging to the OSA?

GG:

OSA underwrites the cost of writing this platform and supporting members’ use of it. OSA wants to ensure that they have a highly stable environment in which a lot of talented open source communities like SourceForge and JasperForge are all in one place. That really shows the strength of the participation that is possible among open source companies.

LI: Who actually runs the portal?

GG:

We [Essentia] not only designed the solution, we actually handle the whole operation. So the entire solution is built and managed and hosted by us. So now everything happens through EssentiaESP and all the team members are based in California and the East Coast. We also have community management with people who really understand how to build an active and viable open source community and ensure that participation happens and support the users of the platform.

EXCLUSIVE INTERVIEW

Data Observability’s Big Challenge: Build Trust at Scale

The cost of cleaning data is often beyond the comfort zone of businesses swamped with potentially dirty data. That clogs the pathways to trustworthy and compliant corporate data flow.

Few companies have the resources needed to develop tools for challenges like data observability at scale, according to Kyle Kirwan, co-founder and CEO of data observability platform Bigeye. As a result, many companies are essentially flying blind, reacting when something goes wrong rather than proactively addressing data quality.

A data trust provides a legal framework for managing shared data. It promotes collaboration through common rules for data security, privacy, and confidentiality, and enables organizations to securely connect their data sources in a shared repository of data.

Bigeye brings data engineers, analysts, scientists, and stakeholders together to build trust in data. Its platform helps companies automate monitoring and anomaly detection and create SLAs to ensure data quality and reliable pipelines.

With complete API access, a user-friendly interface, and automated yet flexible customization, data teams can monitor quality, proactively detect and resolve issues, and ensure that every user can rely on the data.

Uber Data Experience

Two early members of the data team at Uber, Kirwan and Bigeye co-founder and CTO Egor Gryaznov, set out to use what they learned building at Uber’s scale to create easier-to-deploy SaaS tools for data engineers.

Kirwan was one of Uber’s first data scientists and the first metadata product manager. Gryaznov was a staff-level engineer who managed Uber’s Vertica data warehouse and developed several internal data engineering tools and frameworks.

They realized the tools their teams were building to manage Uber’s massive data lake and thousands of internal data users were far ahead of what was available to most data engineering teams.

Automatically monitoring and detecting reliability issues within thousands of tables in data warehouses is no easy task. Companies like Instacart, Udacity, Docker, and Clubhouse use Bigeye to keep their analytics and machine learning working continually.

A Growing Field

Founding Bigeye in 2019, they recognized the growing problem enterprises face in deploying data into high-ROI use cases like operations workflows, machine learning-powered products and services, and strategic analytics and business intelligence-driven decision making.

The data observability space saw a number of entrants in 2021. Bigeye separated itself from that pack by providing users the ability to automatically assess customer data quality with more than 70 unique data quality metrics.

These metrics are trained with thousands of separate anomaly detection models to ensure data quality problems — even the hardest to detect — never make it past the data engineers.

Last year, data observability burst onto the scene with no fewer than ten data observability startups announcing significant funding rounds.

This year, data observability will become a priority for data teams as they seek to balance the demand of managing complex platforms with the need to ensure data quality and pipeline reliability, Kirwan predicted.

Solution Rundown

Bigeye’s data platform is no longer in beta. Some enterprise-grade features are still on the roadmap, like complete role-based access control. But others, like SSO and in-VPC deployments, are available today.

The app is closed source, and so are the proprietary models used for anomaly detection. Bigeye is a big fan of open-source options but decided to develop its own to achieve the performance goals it set internally.

Machine learning is used in a few key places to bring a unique blend of metrics to each table in a customer’s connected data sources. The anomaly detection models are trained on each of those metrics to detect abnormal behavior.
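Bigeye's models are proprietary, so their internals are not public. As a rough illustration of the general idea only, the Python sketch below flags an unusual value in a single metric, such as a table's daily row count, using a modified z-score built from the median and the median absolute deviation. The function name, the threshold of 3.5 and the sample numbers are all assumptions made for this example, and the technique is a generic statistical check rather than anything Bigeye has described.

```python
from statistics import median


def is_anomalous(history, latest, threshold=3.5):
    """Flag `latest` if it sits far from the median of `history`.

    Generic modified z-score check (median + median absolute deviation).
    Illustrative only; this is NOT Bigeye's proprietary model.
    """
    if len(history) < 5:
        return False  # too little history to judge either way
    center = median(history)
    mad = median(abs(x - center) for x in history)
    if mad == 0:
        # Perfectly flat history: any change at all is unexpected.
        return latest != center
    score = 0.6745 * abs(latest - center) / mad
    return score > threshold


# Hypothetical daily row counts for one table.
daily_row_counts = [1000, 1020, 990, 1010, 1005, 995, 1015]

print(is_anomalous(daily_row_counts, 1008))  # a normal day
print(is_anomalous(daily_row_counts, 120))   # a sudden drop in rows
```

A production system would presumably run a check like this for every metric on every table and tune the threshold to limit false alarms, which is where the trained models the company describes would come in.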

Three features added at the end of 2021 automatically detect and alert on data quality issues and enable data quality SLAs.

The first, Deltas, makes it easy to compare and validate multiple versions of any dataset.

Issues, the second, brings multiple alerts together into a single timeline with valuable context about related issues. This makes it simpler to document past fixes and speed up resolutions.

The third, Dashboard, provides an overall view of the health of the data, helping to identify data quality hotspots, close gaps in monitoring coverage, and quantify a team’s improvements to reliability.
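To give a sense of what comparing two versions of a dataset can involve, as the Deltas feature is said to do, here is a minimal Python sketch. It is not Bigeye's implementation. The function names, the choice of row count and per-column null rate as the comparison points, and the 5 percent tolerance are all assumptions made for illustration.

```python
def profile(rows):
    """Return the row count and per-column null fraction for a list of dicts."""
    columns = {key for row in rows for key in row}
    count = len(rows)
    null_rate = {
        col: (sum(1 for row in rows if row.get(col) is None) / count) if count else 0.0
        for col in columns
    }
    return {"row_count": count, "null_rate": null_rate}


def compare_versions(old_rows, new_rows, tolerance=0.05):
    """List differences between two dataset versions (hypothetical helper)."""
    old, new = profile(old_rows), profile(new_rows)
    findings = []
    if old["row_count"] != new["row_count"]:
        findings.append(f"row_count: {old['row_count']} -> {new['row_count']}")
    for col in sorted(set(old["null_rate"]) | set(new["null_rate"])):
        if col not in new["null_rate"]:
            findings.append(f"{col}: column missing in new version")
        elif col not in old["null_rate"]:
            findings.append(f"{col}: column added in new version")
        else:
            before, after = old["null_rate"][col], new["null_rate"][col]
            if abs(before - after) > tolerance:
                findings.append(f"{col}: null rate {before:.2f} -> {after:.2f}")
    return findings


# Made-up sample data: the new version gains a row and loses some emails.
old_version = [{"id": 1, "email": "a@x.io"}, {"id": 2, "email": "b@x.io"}]
new_version = [
    {"id": 1, "email": "a@x.io"},
    {"id": 2, "email": None},
    {"id": 3, "email": None},
]

for finding in compare_versions(old_version, new_version):
    print(finding)
```

Run against the sample data, the sketch reports the changed row count and the jump in missing email values, the kind of difference a team would want to catch before promoting a new version of a table.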

Eyeballing Data Warehouses

TechNewsWorld spoke with Kirwan to demystify some of the complexities his company’s data sniffing platform offers data scientists.

TechNewsWorld: What makes Bigeye’s approach innovative or cutting edge?

Kyle Kirwan: Data observability requires constant and complete knowledge of what is happening inside all the tables and pipelines in your data stack. It is similar to what SRE [site reliability engineering] and DevOps teams use to keep applications and infrastructure working around the clock. But it is reimagined for the world of data engineering and data science.

While data quality and data reliability have been issues for decades, data applications are now critical to how many leading businesses run, because any loss of data, outage, or degradation can quickly result in lost revenue and customers.

Without data observability, data teams must constantly react to data quality issues and have to wrangle the data as they go to use it. A better solution is identifying the issues proactively and fixing the root causes.

How does trust impact the data?

Kirwan: Often, problems are discovered by stakeholders like executives who do not trust their often-broken dashboard. Or users get confusing results from in-product machine learning models. The data engineers can better get ahead of the problems and prevent business impact if they are alerted early enough.

How is this concept different from similar-sounding technologies such as unified data management?

Kirwan: Data observability is one core function within data operations (think: data management). Many customers look for best-of-breed solutions for each of the functions within data operations. This is why technologies like Snowflake, Fivetran, Airflow, and dbt have been exploding in popularity. Each is considered an important part of “the modern data stack” rather than a one-size-fits-none solution.

Data observability, data SLAs, ETL [extract, transform, load] code version control, data pipeline testing, and other techniques should be used in tandem to keep modern data pipelines all working smoothly, just as high-performance software engineering and DevOps teams use their sister techniques.

What role do data pipeline and DataOps play with data visibility?

Kirwan: Data observability is closely related to DataOps and the emerging practice of data reliability engineering. DataOps refers to the broader set of all operational challenges that data platform owners will face. Data reliability engineering is a part of DataOps, but only a part, just as site reliability engineering is related to, but does not encompass all of, DevOps.

Data observability could have benefits to data security, as it could be used to identify unexpected changes in query volume on different tables or changes in behavior to ETL pipelines. However, data observability would not likely be a complete data security solution on its own.

What challenges does this technology face?

Kirwan: These challenges cover problems like data discovery and governance, cost tracking and management, and access controls. It also covers how to manage an ever-growing number of queries, dashboards, and ML features and models.

Reliability and uptime are certainly challenges for which many DevOps teams are responsible. But they are often also charged with other aspects like developer velocity and security considerations. Within these two areas, data observability enables data teams to know whether their data and data pipelines are error-free.

What are the challenges of implementing and maintaining data observability technology?

Kirwan: Effective data observability systems should integrate into the workflows of the data team. This enables them to focus on growing their data platforms rather than constantly reacting to data issues and putting out data fires. A poorly tuned data observability system, however, can result in a deluge of false positives.

An effective data system should also take much of the maintenance out of testing for data quality issues by automatically adapting to changes in the business. A poorly optimized data observability system, however, may fail to correct for changes in the business, or may overcorrect for them, requiring manual tuning, which can be time-consuming.

Data observability can also be taxing on the data warehouse if not optimized properly. The Bigeye teams have experience optimizing data observability at scale to ensure that the platform does not impact data warehouse performance.

Jack M. Germain has been an ECT News Network reporter since 2003. His main areas of focus are enterprise IT, Linux and open-source technologies. He is an esteemed reviewer of Linux distros and other open-source software. In addition, Jack extensively covers business technology and privacy issues, as well as developments in e-commerce and consumer electronics.
