A next-generation SQL processing engine, Apache Tajo brings together the latest advances in distributed processing and query optimization, delivering traditional database performance on both routine and massive data sets.
The graduation of Tajo from the Apache Incubator Program comes at an exciting time for the Hadoop-based data warehouse solution.
“This is a fantastic milestone for our team of committers and contributors across the world, and we are honored to have our work recognized by our peers and the ASF”, said a clearly chuffed Hyunsik Choi, PMC Chair and project co-founder.
With a major release adding convenient Hive integration features, table partitioning, expanded SQL data type and function support—among many other enhancements—due early this month, Apache Tajo is gaining increasing recognition from industry watchers such as the Gartner Group, despite keeping a low profile during its incubation.
According to Choi, the Apache Tajo team is trying to assemble a feature set that positions enterprise users for the long haul.
“The successful merging of the Hadoop and the traditional SQL spaces rests heavily on the application of sophisticated database techniques to the task of crunching big data. That means absorbing both the latest distributed processing methods and the latest query optimization techniques as quickly as possible in order to give users a market edge.”
With its roots in the prestigious Database Lab at Korea University, Apache Tajo has a global pedigree which stands it in good stead as it quietly garners important users across the world. Its team of international committers is enthusiastic and focused, having taken the project through the Apache Incubator program in a single year.
Dr. Chris Mattmann, Chief Architect in the Instrument and Science Data Systems Section at the renowned NASA Jet Propulsion Laboratory at Caltech, Pasadena, and the project’s ASF Incubator Mentor, says Tajo has been a “model community” which has demonstrated its merit in impressive company in the big data analytics space.
Staff Software Engineer at LinkedIn and ASF Member, Jakob Homan, concurs, believing Tajo is not only a “cool” technology, but also “…an excellent example of a community building around a core piece of technology”.
Sponsored by South Korea’s go-to Hadoop experts, Gruter, Apache Tajo has been undergoing rigorous testing in a deployment at Korea’s largest mobile carrier, SK Telecom. The field test has seen Tajo pitted against enormous workloads, with the results impressing SK Telecom, itself now a significant contributor to the open-source project.
Keuntae Park, a Senior Developer in the Big Data Team at SK Telecom and an Apache Tajo committer, explains:
“As a mobile carrier, not only do we have mountains of data, but we also have a very demanding marketing agenda at the analytics end, so we’ve been able to submit Apache Tajo to the full rigors of petabyte-scale workloads. So far, our testing has shaved multiple days off our old Hive-based reporting process…Tajo is proving itself a serious alternative to enterprise DW on our workloads.”
The software is also being tested at other notable organizations around the world, including NASA’s Jet Propulsion Laboratory in Pasadena, where Mattmann’s Instrument and Science Data Systems Section is evaluating its query processing and storage capabilities in Radio Astronomy and Airborne Snow Observatory (ASO) projects.
Such significant support in the field points to exciting times ahead for Apache Tajo as it helps push the boundaries of information processing in the brave new world of big data.
While reaching Top-Level Project status with the Apache Software Foundation has been a great achievement for all involved, including project stalwarts such as Korea University doctoral candidate and project co-founder, Jihoon Son, and Gruter Tajo committer, Jae-hwa Jung, the Apache Tajo team barely has time for a celebratory ale.
Reports Jung, “We’re currently in the process of adding multi-tenancy and a window function, after which we’ll be continuing our longer-term work on frontier features such as JIT query compilation and a vectorized engine. So it’s a case of no rest for the wicked now we’re getting so much international take up.”
Young-kil Kwon, CEO of major project sponsor, Gruter, chimes in, “I think they deserve a few sojus and a good night on the town after all they’ve achieved over the past year. They’ve done a great job, and we couldn’t be prouder.”
The Apache Tajo team is planning to release Tajo 0.8 early this month.
Apache Tajo Select Features
Fully distributed SQL query processing on large data sets stored in HDFS
ANSI/ISO SQL:2003 standard support
Hive compatibility through HiveQL mode
Cost-based join optimization
Extensible rewrite rule engine
Tajo Resource Manager specialized for low-latency queries
Apache Tajo Further Information