EXCLUSIVE INTERVIEW

NYTCo Dons Software Vendor Cap: Q&A With News Services Director Christine Topalian

It’s no secret that newspaper companies are seeking new sources of revenue as readers — and the advertising dollars that typically follow them — migrate to electronic media. The New York Times Co. was among the first newspaper publishers to deliver its content to mobile devices. Now, it’s hoping to turn a profit by helping other publishers to do the same.

The New York Times Company has effectively entered the software development business, creating applications that publishers can use to deliver their content to Apple iPhones and iPads.

Several publishing companies are scheduled to adopt the Times’ Press Engine application when it becomes available later this year. These companies, some of which could be considered Times competitors, will pay a one-time license fee and monthly maintenance fees for the Times to maintain their applications. Publishers will retain all advertising and subscription revenue generated by their individual publications.

The E-Commerce Times spoke with Christine Topalian, director, news services, with the New York Times Company, about this new venture and its broader implications for the newspaper industry.

E-Commerce Times: Why is the New York Times Company, in effect, going into the software business?

Christine Topalian:

I’m part of a group called “news services” within the New York Times Company. We are a client-facing organization that serves roughly 1,500 clients globally. We provide content, which is the primary asset of the Times company, to these various media companies. Several clients came to us and asked if we would be interested in licensing the code for our own iPhone applications. That started a process of exploration through which we realized that it was a great idea and something that other publishers could benefit from.

ECT: What will the Press Engine platform offer that publishers can’t get from cutting their own deals with Apple to put their content on iPads and other devices directly?

Topalian:

Typically, publishers don’t do deals with Apple or other device manufacturers. They do deals with vendors who create applications for them. In that realm, the New York Times offers several years of experience in this space. We have an understanding of how users consume content on these devices, as well as technology and design expertise that publishers will find helpful.

ECT: Is the development of what might be considered ancillary products and services that support the distribution of content something we should expect to see from newspaper companies as a way of adjusting to a world in which ad revenue is declining?

Topalian:

It’s already happening. If you look at the mobile market today, compared with four years ago, there has been huge growth — and mobile content would have been considered an ancillary product to the website four years ago.

ECT: The Times company already has signed up a number of customers for Press Engine, including several companies that might be considered competitors. Is there an expectation going forward that newspaper companies that have competed against one another will have to enter into customer-supplier relationships, or even partnerships, to solidify their futures?

Topalian:

We’re not competing with anyone by providing them a framework to distribute content, because their content is stronger than that framework. A user who reads The Dallas Morning News or The Daily Telegraph will continue to do so. Users are coming for the content, and they are very loyal to specific brands.

ECT: Does the Times Company anticipate other publishing companies developing platforms similar to Press Engine?

Topalian:

There definitely will be a certain amount of competition. As I mentioned, technology vendors are creating these types of platforms for publishers; other organizations are doing so as well — and additional competition is to be expected. The difference between getting this type of service from the New York Times as opposed to someone else goes back to what I said about our history of expertise in this area. We have a history of creating products that distribute content.

ECT: The Press Engine business model depends to a large degree on publishers generating a certain amount of revenue through paid content. News Corp., a Times Company competitor, has a project in the works that calls for delivering content to e-readers — and that project also relies on having users pay for content. What has the Times company’s research shown in regard to users’ willingness to pay for content?

Topalian:

Let’s go back to the first part of your question, which assumes that these are going to be paid applications. Advertising revenue is also an important part of these applications, as well as the publishers’ business models. We do have ad units in the application. Publishers can serve ads directly into the applications and generate revenue in that fashion. On the second part of the question, the New York Times Company has announced plans to offer a paid model on its website in early 2011. Arthur Sulzberger Jr., the publisher of The New York Times, has said that our audiences are very loyal, and we believe they will be willing to pay for our award-winning digital content. We don’t really want to say much more about the pay model.

ECT: What about non-New York Times content that’s distributed through Press Engine? Do you have an opinion on whether users will be willing to pay for that content?

Topalian:

Every publisher has a different audience. We’re not going to make business decisions for the individual publishers. Each of them has its own strategy in this area. Several applications offer the possibility of placing ads; some offer subscription models. There are several different ways that a publisher can look at this business and a number of different factors that go into their decision making.

ECT: How important is the device that readers use to access content to the paid content model? Press Engine is expected to launch by delivering content to the iPhone and iPad, two wildly popular and very trendy devices. Would you be as confident about the product’s chances to succeed if it were slated to start off delivering content to Windows-based tablets, BlackBerries, or desktop PCs?

Topalian:

We’re responding to a market need. Our clients have been asking for iPhone and iPad applications, and that’s where we’re focusing our efforts at this time.

ECT: Does that mean you expect to go to other devices at some point down the road?

Topalian:

We’re looking into it.

More by Sidney Hill

EXCLUSIVE INTERVIEW

Data Observability’s Big Challenge: Build Trust at Scale

The cost of cleaning data is often beyond the comfort zone of businesses swamped with potentially dirty data. That clogs the pathways to trustworthy and compliant corporate data flow.

Few companies have the resources needed to develop tools for challenges like data observability at scale, according to Kyle Kirwan, co-founder and CEO of data observability platform Bigeye. As a result, many companies are essentially flying blind, reacting when something goes wrong rather than proactively addressing data quality.

A data trust provides a legal framework for managing shared data. It promotes collaboration through common rules for data security, privacy, and confidentiality, and enables organizations to securely connect their data sources in a shared repository of data.

Bigeye brings data engineers, analysts, scientists, and stakeholders together to build trust in data. Its platform helps companies automate monitoring and anomaly detection and create SLAs to ensure data quality and reliable pipelines.

With complete API access, a user-friendly interface, and automated yet flexible customization, data teams can monitor quality, proactively detect and resolve issues, and ensure that every user can rely on the data.

Uber Data Experience

Two early members of the data team at Uber, Kirwan and Bigeye Co-founder and CTO Egor Gryaznov, set out to use what they learned building at Uber’s scale to create easier-to-deploy SaaS tools for data engineers.

Kirwan was one of Uber’s first data scientists and the first metadata product manager. Gryaznov was a staff-level engineer who managed Uber’s Vertica data warehouse and developed several internal data engineering tools and frameworks.

They realized the tools their teams were building to manage Uber’s massive data lake and thousands of internal data users were far ahead of what was available to most data engineering teams.

Automatically monitoring and detecting reliability issues within thousands of tables in data warehouses is no easy task. Companies like Instacart, Udacity, Docker, and Clubhouse use Bigeye to keep their analytics and machine learning working continually.

A Growing Field

Founding Bigeye in 2019, they recognized the growing problem enterprises face in deploying data into high-ROI use cases like operations workflows, machine learning-powered products and services, and strategic analytics and business intelligence-driven decision making.

The data observability space saw a number of entrants in 2021. Bigeye separated itself from that pack by providing users the ability to automatically assess customer data quality with more than 70 unique data quality metrics.

These metrics are trained with thousands of separate anomaly detection models to ensure data quality problems — even the hardest to detect — never make it past the data engineers.

Last year, data observability burst onto the scene with no fewer than ten data observability startups announcing significant funding rounds.

This year, data observability will become a priority for data teams as they seek to balance the demand of managing complex platforms with the need to ensure data quality and pipeline reliability, Kirwan predicted.

Solution Rundown

Bigeye’s data platform is no longer in beta. Some enterprise-grade features are still on the roadmap, like complete role-based access control. But others, like SSO and in-VPC deployments, are available today.

The app is closed source, and so are the proprietary models used for anomaly detection. Bigeye is a big fan of open-source options but decided to develop its own to meet its internal performance goals.

Machine learning is used in a few key places to bring a unique blend of metrics to each table in a customer’s connected data sources. The anomaly detection models are trained on each of those metrics to detect abnormal behavior.
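As a rough illustration of metric-based anomaly detection in general, the sketch below flags a new value of one table metric when it falls far outside its recent history. It is not Bigeye’s method: the company’s models are proprietary, and this simple z-score check, the function name `is_anomalous`, and the sample row counts are all assumptions made up for the example.

```python
from statistics import mean, stdev


def is_anomalous(history, latest, threshold=3.0):
    """Return True if `latest` is more than `threshold` standard
    deviations away from the mean of `history` (a plain z-score test)."""
    if len(history) < 2:
        return False  # not enough history to estimate the spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat history: any change at all is unusual.
        return latest != mu
    return abs(latest - mu) / sigma > threshold


# Hypothetical daily row counts for a single warehouse table.
daily_row_counts = [10_120, 9_980, 10_050, 10_210, 9_940, 10_075, 10_130]

print(is_anomalous(daily_row_counts, 10_090))  # False: in the usual range
print(is_anomalous(daily_row_counts, 4_300))   # True: a sudden drop
```

A production system would likely track many metrics per table (null rates, freshness, duplicates) and use models that account for trends and seasonality; the fixed threshold here is only the simplest stand-in.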

Three features built in at the end of 2021 automatically detect and alert on data quality issues and enable data quality SLAs.

The first, Deltas, makes it easy to compare and validate multiple versions of any dataset.

Issues, the second, brings multiple alerts together into a single timeline with valuable context about related issues. This makes it simpler to document past fixes and speed up resolutions.

The third, Dashboard, provides an overall view of the health of the data, helping to identify data quality hotspots, close gaps in monitoring coverage, and quantify a team’s improvements to reliability.
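To make the idea behind comparing dataset versions, as Deltas is described above, more concrete, here is a minimal sketch of what such a comparison involves. It does not reflect how Bigeye implements the feature; the function `dataset_delta`, the `id` key column, and the sample rows are hypothetical.

```python
def dataset_delta(old_rows, new_rows, key="id"):
    """Compare two versions of a dataset (lists of dicts) by a key column.

    Returns the keys that were added, removed, or whose row contents changed.
    """
    old_by_key = {row[key]: row for row in old_rows}
    new_by_key = {row[key]: row for row in new_rows}

    added = sorted(new_by_key.keys() - old_by_key.keys())
    removed = sorted(old_by_key.keys() - new_by_key.keys())
    changed = sorted(
        k for k in old_by_key.keys() & new_by_key.keys()
        if old_by_key[k] != new_by_key[k]
    )
    return {"added": added, "removed": removed, "changed": changed}


# Two hypothetical versions of the same small table.
version_1 = [
    {"id": 1, "city": "Austin"},
    {"id": 2, "city": "Boston"},
    {"id": 3, "city": "Chicago"},
]
version_2 = [
    {"id": 1, "city": "Austin"},
    {"id": 2, "city": "Denver"},
    {"id": 4, "city": "El Paso"},
]

print(dataset_delta(version_1, version_2))
# {'added': [4], 'removed': [3], 'changed': [2]}
```

At warehouse scale the same comparison would normally be pushed down into SQL rather than done in memory, but the three categories of difference are the same.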

Eyeballing Data Warehouses

TechNewsWorld spoke with Kirwan to demystify some of the complexities his company’s data sniffing platform offers data scientists.

TechNewsWorld: What makes Bigeye’s approach innovative or cutting edge?

Kyle Kirwan: Data observability requires constant and complete knowledge of what is happening inside all the tables and pipelines in your data stack. It is similar to what SRE [site reliability engineering] and DevOps teams use to keep applications and infrastructure working around the clock. But it is reimagined for the world of data engineering and data science.

While data quality and data reliability have been an issue for decades, data applications are now critical to how many leading businesses run, because any loss of data, outage, or degradation can quickly result in lost revenue and customers.

Without data observability, data dealers must constantly react to data quality issues and have to wrangle the data as they go to use it. A better solution is identifying the issues proactively and fixing the root causes.

How does trust impact the data?

Kirwan: Often, problems are discovered by stakeholders like executives who do not trust their often-broken dashboard. Or users get confusing results from in-product machine learning models. The data engineers can better get ahead of the problems and prevent business impact if they are alerted early enough.

How is this concept different from similar-sounding technologies such as unified data management?

Kirwan: Data observability is one core function within data operations (think: data management). Many customers look for best-of-breed solutions for each of the functions within data operations. This is why technologies like Snowflake, Fivetran, Airflow, and dbt have been exploding in popularity. Each is considered an important part of “the modern data stack” rather than a one-size-fits-none solution.

Data observability, data SLAs, ETL [extract, transform, load] code version control, data pipeline testing, and other techniques should be used in tandem to keep modern data pipelines all working smoothly, just as high-performance software engineering and DevOps teams use their sister techniques.

What role do data pipeline and DataOps play with data visibility?

Kirwan: Data observability is closely related to DataOps and the emerging practice of data reliability engineering. DataOps refers to the broader set of all operational challenges that data platform owners will face. Data reliability engineering is a part of DataOps, but only a part, just as site reliability engineering is related to, but does not encompass all of, DevOps.

Data observability could have benefits to data security, as it could be used to identify unexpected changes in query volume on different tables or changes in behavior to ETL pipelines. However, data observability would not likely be a complete data security solution on its own.

What challenges does this technology face?

Kirwan: These challenges cover problems like data discovery and governance, cost tracking and management, and access controls. They also cover how to manage an ever-growing number of queries, dashboards, and ML features and models.

Reliability and uptime are certainly challenges for which many DevOps teams are responsible. But they are often also charged with other aspects like developer velocity and security considerations. Within these two areas, data observability enables data teams to know whether their data and data pipelines are error-free.

What are the challenges of implementing and maintaining data observability technology?

Kirwan: Effective data observability systems should integrate into the workflows of the data team. This enables them to focus on growing their data platforms rather than constantly reacting to data issues and putting out data fires. A poorly tuned data observability system, however, can result in a deluge of false positives.

An effective data system should also take much of the maintenance out of testing for data quality issues by automatically adapting to changes in the business. A poorly optimized data observability system, however, may fail to correct for changes in the business, or may overcorrect for them, requiring time-consuming manual tuning.

Data observability can also be taxing on the data warehouse if not optimized properly. The Bigeye teams have experience optimizing data observability at scale to ensure that the platform does not impact data warehouse performance.

Jack M. Germain has been an ECT News Network reporter since 2003. His main areas of focus are enterprise IT, Linux and open-source technologies. He is an esteemed reviewer of Linux distros and other open-source software. In addition, Jack extensively covers business technology and privacy issues, as well as developments in e-commerce and consumer electronics.

EXCLUSIVE INTERVIEW

The Business Case for Clean Data and Governance Planning

Do you know if your company’s data is clean and well managed? Why does that matter anyway?

Without a working governance plan, you might not have a company to worry about — data-wise.

Data governance is a collection of practices and processes establishing the rules, policies, and procedures that ensure data accuracy, quality, reliability, and security. It ensures the formal management of data assets within an organization.

Everyone in business understands the need to have and use clean data. But ensuring that it is clean and usable is a big challenge, according to David Kolinek, vice president of product management at Ataccama.

That challenge is even greater when business users must rely on scarce technical resources. Often, no one person oversees data governance, or that individual lacks a complete understanding of how the data will be used and how to clean it.

This is where Ataccama comes into play. The company’s mission is to provide a solution that even people without technical knowledge, such as SQL skills, can use to find the data they need, evaluate its quality, understand how to fix any issues, and determine whether that data will serve their purposes.

“With Ataccama, business users don’t need to involve IT to manage, access, and clean their data,” Kolinek told TechNewsWorld.

Keeping Users in Mind

Ataccama was founded in 2007 and basically bootstrapped.

It started as a part of Adastra, a consulting company, which is still in business today. However, Ataccama was focused on software rather than consulting. So management spun off that operation as a product company that addresses data quality issues.

Ataccama started with a basic approach — an engine that performed basic data cleansing and transformation. But this still required an expert user because of the user-provided configuration.

“So, we added a visual presentation for the steps that enable data transformation and things like cleansing. This made it a low-code platform since the users were able to do the majority of the work just by using the application user interface. But it was still a thick-client platform,” Kolinek explained.

The current version, however, is designed with a non-technical user in mind. The software includes a thin client, a focus on automation, and an easy-to-use interface.

“But what really stands out is the user experience, which is built off the seamless integration we were able to achieve with the 13th version of our engine. It delivers robust performance that’s tuned to perfection,” he offered.

Digging Deeper Into Data Management Issues

I asked Kolinek to discuss the data governance and quality issues further. Here is our conversation.

TechNewsWorld: How does Ataccama’s concept of centralizing or consolidating data management differ from other cloud systems such as Microsoft, Salesforce, AWS, and Google Cloud?

David Kolinek: We are platform agnostic and do not target one specific technology. Microsoft and AWS have their own native solutions that work well, but only within their own infrastructure. Our portfolio is wide open so it can serve all the use cases that must be covered across any infrastructure.

Further, we have data processing capabilities that not all cloud providers possess. Metadata is useful for automated processing, generating more metadata, which in turn can be used for additional analytics.

We developed both of these technologies in-house so we can provide native integration. As a result, we can deliver a superior user experience and a whole lot of automation.

How is this concept different from the notion of standardization of data?

Kolinek: Standardization is just one of many things we do. Usually, standardization can be easily automated, the same way we can automate cleansing or data enrichment. We also provide manual data correction when solving some issues, like a missing social security number.

We cannot generate the SSN, but we could come up with a date of birth from other information. So, standardization is not different. It is a subset of things that improve quality. But for us, it is not only about data standardization. It is about having good quality data so information can be properly leveraged.
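The split Kolinek describes, automated standardization for what can be fixed mechanically and manual correction for what cannot, can be sketched in a few lines. This is only an illustration of the general idea, not Ataccama’s implementation: the accepted date formats, the field names `birth_date` and `ssn`, and the `needs_manual_review` flag are all assumptions invented for the example.

```python
from datetime import datetime

# Assumed input formats for the illustration; a real system would
# configure or infer these per source.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")


def standardize_date(raw):
    """Return the date as an ISO 8601 string, or None if no format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def standardize_record(record):
    """Standardize what can be automated and flag what needs a person."""
    cleaned = dict(record)
    cleaned["birth_date"] = standardize_date(record.get("birth_date") or "")
    # A missing SSN cannot be generated, so route it to manual correction.
    cleaned["needs_manual_review"] = not record.get("ssn")
    return cleaned


# Hypothetical records with obviously fake identifiers.
records = [
    {"name": "A. Example", "birth_date": "03/14/1985", "ssn": "000-00-0000"},
    {"name": "B. Example", "birth_date": "14.03.1985", "ssn": ""},
]

for rec in records:
    print(standardize_record(rec))
```

Both sample dates normalize to `1985-03-14`, while the second record is flagged for manual review because its SSN is missing.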

How does Ataccama’s data management platform benefit users?

Kolinek: The user experience is really our biggest benefit, and the platform is ideal for handling multiple personas. Companies need to enable both business users and IT people when it comes to data management. That requires a solution for business and IT to collaborate.

Another enormous benefit of our platform is the strong synergy between data processing and metadata management it provides.

The majority of other data management vendors cover only one of these areas. We also combine machine learning with a rules-based approach to validation and standardization; again, other vendors often do not support both.

Also, because we are technology agnostic, users can connect to many different technologies from the same platform. With edge processing, for instance, you can configure something once in Ataccama ONE, and the platform will translate it for different platforms.

Does Ataccama’s platform lock in users the way proprietary software often does?

Kolinek: We developed all the core components of the platform ourselves. They are tightly integrated together. There has been a huge wave of acquisitions lately in this space, with big vendors buying smaller ones to fill in gaps. In some cases, you are not really buying and managing one platform, but many.

With Ataccama, you can purchase just one module, like data quality/standardization, and later expand to others, such as master data management (MDM). It all works together seamlessly. Just activate our modules as you need them. This makes it easy for customers to start small and expand when the time is right.

Why is a unified data platform so important in this process?

Kolinek: The biggest benefit of a unified platform is that companies are not looking for a point solution to solve just a single problem, like data standardization. It is all interconnected.

For instance, to standardize you must validate the quality of the data, and for that, you must first find and catalog it. If you have an issue, even though it may look like a discrete problem, it more than likely involves many other aspects of data management.

The beauty of a unified platform is that in most use cases, you have one solution with native integration, and you can start using other modules.

What role do AI and ML play today in data governance, data quality, and master data management? How is it changing the process?

Kolinek: Machine learning enables customers to be more proactive. Previously, you would identify and report an issue. Someone would have to investigate what went awry and see if there was something wrong with the data. Then you would create a rule for data quality to prevent a recurrence. That is all reactive and is based on something breaking down, being found, reported, and then fixed.

Again, ML lets you be proactive. You give it training data instead of rules. The platform then detects differences in patterns and identifies anomalies to alert you before you even realize there is an issue. This is not possible with a rules-based approach, and it is much easier to scale if you have a huge number of data sources. The more data you have, the better the training and its accuracy will be.

Other than cost savings, what benefits can enterprises gain through consolidating their data repositories? For instance, does it improve security, CX outcomes, etc.?

Kolinek: It does improve security and mitigates potential future leaks. For example, we had customers who were storing data that no one was using. In many cases, they did not even know the data existed! Now, they are not only unifying their technology stack, but they can also see all the stored data.

Onboarding new people onto the platform is also much easier with consolidated data. The more transparent the environment, the sooner people can use it and start gaining value.

It is not so much about saving money as it is about leveraging all your data to create a competitive advantage and generate additional revenue. It provides data scientists with the means to build things that will advance the business.

What are the steps in adopting a data management platform?

Kolinek: Begin with the initial analysis. Focus on the biggest issues the company wants to tackle and select the platform modules to address them. Defining goals is key at this stage. What KPIs do you want to target? What level of ID do you want to achieve? These are questions you need to ask.

Next, you need a champion to advance execution and identify the main stakeholders who could drive the initiative. That requires extensive communications among different stakeholders, so it is vital to have someone focused on educating others about the benefits and helping teams onboard the system. Then comes the implementation phase where you address the key issues identified in the analysis, followed by rollout.

Finally, think about the next set of issues that need to be addressed, and if needed, enable additional modules in the platform to achieve those goals. The worst thing to do is purchase a tool and provide it, but offer no service, education, or support. This will ensure that adoption will be low. Education, support, and service are very important for the adoption phase.

Jack M. Germain has been an ECT News Network reporter since 2003. His main areas of focus are enterprise IT, Linux and open-source technologies. He is an esteemed reviewer of Linux distros and other open-source software. In addition, Jack extensively covers business technology and privacy issues, as well as developments in e-commerce and consumer electronics.
