Kiji Project

Modular, open-source framework for collecting, analyzing, and serving entity data in real time. Apache 2.0 licensed.


Download Bento Box 2.0.1

About the Kiji Project

What is Kiji?

Kiji is an Apache 2.0-licensed, open-source framework for storing and serving user data to enable real-time personalization as users interact across channels. Kiji supports batch model training and real-time model scoring, so the user experience adapts with each interaction.

Kiji is built on the Hadoop ecosystem, using HBase as its original underlying data store. With Kiji, developers can create a flexible and comprehensive entity-centric schema that enables a 360-degree view of each customer. Data is stored in a rich, compressed, binary Avro format, which allows the application to support complex data types. Kiji handles all aspects of serialization and deserialization, maintains schema metadata to ensure backward compatibility as an application’s schema evolves, and captures real-time application interactions.
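
As a rough illustration of the ideas in the paragraph above, the following Python sketch models an entity-centric table with timestamped cells and a simplified form of backward-compatible reads. It is not the Kiji API: `EntityTable`, `resolve`, the entity ID `user-42`, and the column name `info:email` are hypothetical names chosen for this example, and the default-filling step only loosely imitates how Avro resolves an older record against a newer reader schema.

```python
class EntityTable:
    """Toy in-memory stand-in for an entity-centric table (NOT the Kiji API)."""

    def __init__(self):
        # entity_id -> column name -> list of (timestamp, value), newest first
        self._rows = {}

    def put(self, entity_id, column, timestamp, value):
        cells = self._rows.setdefault(entity_id, {}).setdefault(column, [])
        cells.append((timestamp, value))
        cells.sort(key=lambda cell: cell[0], reverse=True)

    def history(self, entity_id, column):
        """Return every stored version of a cell, newest first."""
        return list(self._rows.get(entity_id, {}).get(column, []))

    def latest(self, entity_id, column):
        cells = self.history(entity_id, column)
        return cells[0][1] if cells else None


def resolve(record, reader_fields):
    """Read a record through a newer reader schema (simplified assumption).

    reader_fields maps each field name the reader expects to a default value.
    Fields missing from the record take the default; fields the reader does
    not know about are dropped.
    """
    return {name: record.get(name, default)
            for name, default in reader_fields.items()}


table = EntityTable()
# An early version of the application wrote only an address.
table.put("user-42", "info:email", 100, {"address": "a@example.com"})
# A later version added a "verified" field.
table.put("user-42", "info:email", 200,
          {"address": "b@example.com", "verified": True})

reader_fields = {"address": "", "verified": False}
oldest = table.history("user-42", "info:email")[-1][1]
old_view = resolve(oldest, reader_fields)
new_view = resolve(table.latest("user-42", "info:email"), reader_fields)
```

Here `old_view` gains a `verified` value from the reader's default, so code written against the newer schema can still read the older record.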

Why Kiji?

Most organizations collect and store data in distributed file systems, such as HDFS, and key-value stores, such as HBase, in order to better serve individual customers. However, these tools are low-level, difficult to use, and offer no underlying framework for integration. Kiji is the middleware needed to ingest detailed data, stream real-time data, build predictive models, and deploy those models on the fly. The various components of Kiji address the common use cases and solve the common challenges experienced by developers, engineers, analysts, and data scientists.

Kiji Components

The Kiji Project is modularized into separate components to support a wide range of uses and encourage clean separation of functionality. The Bento Box contains all Kiji modules assembled in a self-contained download. Each module can also be downloaded individually from GitHub.

Kiji Bento Box

  • KijiSchema: simplifies real-time storage and retrieval of diverse data, from primitive types to objects, time series, and event streams. KijiSchema handles the challenges with serialization, schema design and evolution, and metadata management that are common in NoSQL storage solutions.

    KijiSchema DDL Shell: provides a Data Definition Language that allows for the creation, inspection, and modification of schemas for KijiSchema.

  • KijiMapReduce: provides a powerful paradigm for applying MapReduce to both batch and real-time workloads. KijiMapReduce introduces producers, which perform record-wise analytics, and gatherers, which build predictive models by analyzing aggregate behaviors.

    KijiMapReduce Library: a library of helpful examples and useful implementations of MapReduce jobs that can be created within Kiji.

    Kiji Hive Adapter: provides HiveQL access to Kiji data through a familiar SQL shell.

    KijiExpress: provides a Scala-based scripting language for analyzing Kiji data via Scalding.

  • Kiji Model Repository: a library of machine learning tools built on top of KijiExpress.

  • KijiREST: provides an HTTP REST API for front-end developers to access Kiji data and to trigger model scoring.

    KijiScoring: provides real-time scoring of predictive models within your application.
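
To make the distinction between producers and gatherers in the KijiMapReduce entry more concrete, here is a minimal Python sketch. It is an illustration only and does not use the KijiMapReduce API: `run_producer`, `run_gatherer`, and the sample rows are hypothetical. It assumes a producer derives new values from a single entity's row and stores them alongside that row, while a gatherer emits key-value pairs from each row that are then aggregated across all entities.

```python
from collections import Counter


def run_producer(rows, produce):
    """Record-wise step: compute derived columns for each row independently."""
    for row in rows.values():
        # Assumption: derived values are written back to the same row.
        row.update(produce(row))


def run_gatherer(rows, gather):
    """Aggregate step: collect (key, value) pairs from every row and sum them."""
    totals = Counter()
    for row in rows.values():
        for key, value in gather(row):
            totals[key] += value
    return dict(totals)


def count_purchases(row):
    # Producer logic: looks at one entity only.
    return {"purchase_count": len(row["purchases"])}


def emit_items(row):
    # Gatherer logic: emits one pair per purchased item.
    for item in row["purchases"]:
        yield item, 1


# Hypothetical sample data: one row per user.
rows = {
    "user-1": {"purchases": ["book", "pen"]},
    "user-2": {"purchases": ["book"]},
}

run_producer(rows, count_purchases)
item_totals = run_gatherer(rows, emit_items)
```

After this runs, each row carries its own `purchase_count`, while `item_totals` summarizes behavior across all users, which is the kind of aggregate a model-building job would consume.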
