Where and When
- 2017.08.21 - 25
- Location: Adobe Basel, Meeting Room: Joggeli (2nd floor)
Attendees
Who | When
Marcel Reutegger | 21. - 24.
Robert Munteanu | 22. - 24.
Alex Deparvu | 21. - 24.
Michael Dürig | 21. - 23. + either 24. or 25.
Kevin Wellenzohn | 21. - 24.
Tommaso Teofili | 21. - 24.
Thomas Mueller | 21. - 24. (tentative)
Chetan Mehrotra | 21. - 25. (remote)
Valentin Olteanu | 21. - 25. (partially available)
Matt Ryan | 21. - 25. (afternoons - remote)
Tomek Rekawek | 21. - 25.
Julian Sedding | 21. + 22.
Topics/Discussions/Goals
Title | Summary | Effort | Participants
Example | Describe what the topic is about. Is it mainly a discussion, implementing a feature, or a POC? Add links to further information and details. | The expected effort. 2h, 1d, full week, etc. | Add names of participants.
Key Signing Party | To update the Oak team's WOT. | 1h | Open to anyone.
Update OSGi Annotations in Oak | Proposal to update the OSGi annotations to the latest standard as described here. Mostly removal of org.apache.felix.scr.annotations; others can be tackled if found. | 1 day | AlexD, Robert
Workload-aware Property Index | The workload-aware index aims to improve update performance and reduce the number of index conflicts. Discussion of an implementation in Oak. | 2h | Kevin Wellenzohn, Thomas Mueller (tentative)
Next phase of hybrid index to enable support for unique index and sync indexes | | 2 d | Chetan Mehrotra, Thomas Mueller (tentative)
New scalable design for async indexer | | 2 d | Chetan Mehrotra, Thomas Mueller (tentative)
Finalize and review composite node store | Make sure the current version is ready for production | 2d | Robert Munteanu, Tomek Rekawek
Implement metrics | Implement Metrics across Oak and prototype RRDTool persistence | 3d | Marcel, Thomas Mueller (tentative), Alex D
Modularization | Anything left to be done here? Robert: Possibly split out oak-core further; document-based storage is a candidate | 3d | Robert (if composite and OSGi annotations are done in time), Marcel, Thomas Mueller (tentative), Alex D
Manage Jenkins configurations in SVN | Stop relying on the Jenkins UI for configuration, but instead use a Jenkinsfile or the DSL Job plugin | 0.5d | Robert
Segment store analyser | Implement a tool to analyse the content of a segment store | 2-5d | Michael
Provide JCR node info to blob store | Prototype a way to provide JCR node information - path, node type, properties and property values - to blob store, for Composite Blob Store | 2 d | Matt Ryan
Unmanaged binary support | Deliver a prototype demo of at least one unmanaged binary support use case (see JCR Binary Usecase UC1, UC5, UC6, UC13). UC13 is proposed but could be another. | 5 d | Matt Ryan
 | Work out a way to: (1) gather data about sequences of nodes read from the DocumentStore, (2) use this data to predict what nodes will be requested soon. | 5d | Tomek Rekawek, Thomas Mueller (tentative)
Agenda Proposal
There are two long-running tracks not listed in the agenda below: Segment store analyser, led by Michael, and Blob store & Unmanaged binaries, led by Matt. Those two tracks are mostly independent of the other proposed topics and will run for the full week of the Oakathon. The General track is mostly administrative and intended for all participants.
Mon | General | Index Track | Misc 1 | Misc 2
9:00-12:30 | 9:00 Setup | 10:30 Workload-aware Property Index | |
13:30-17:00 | | Hybrid Index v2 | Update OSGi annotations | Metrics
Tue | | | |
9:00-12:30 | | 9:30 Hybrid Index v2 intro by Chetan | Composite NS | Metrics
13:30-17:00 | | Hybrid Index v2 | 14:00 Composite BlobStore & Unmanaged Binary intro by Matt |
Wed | | | |
9:00-12:30 | | Hybrid Index v2 | Composite NS | Modularize DocNS
13:30-17:00 | 13:30 Key Signing Party (1h) | Journal based async indexer | |
Thu | | | |
9:00-12:30 | | Journal based async indexer | Composite NS | Modularize DocNS
Lunch sponsored by Adobe | | | |
13:30-17:00 | 15:00- Wrap-up results | | |
Fri | | | |
9:00-12:30 | | Journal based async indexer | |
13:30-17:00 | | | |
Prep Work
Notes from the Oakathon
Oak Tooling API
See also OAK-6584.
Current situation
Current segment store related tools are implemented ad hoc, potentially relying on internal implementation details of Oak Segment Tar. This makes those tools less useful, portable, stable and broadly applicable than they should be.
Goal
Provide a common and sufficiently stable Oak Tooling API for implementing segment store related tools. The API should be independent of Oak and not available for normal production use of Oak. Specifically, it should not be possible to use it to implement production features, and production features must not rely on it. It must be possible to implement the Oak Tooling API in Oak 1.8, and it should be possible for Oak 1.6.
Typical use cases
- Query the number of nodes / properties / values in a given path satisfying some criteria
- Aggregate a certain value on queries like the above
- Calculate size of the content / size on disk
- Analyse changes. E.g. how many binaries bigger than a certain threshold were added / removed between two given revisions. What is the sum of their sizes?
- Analyse locality: measure of locality of node states. Incident plots (See https://issues.apache.org/jira/browse/OAK-5655?focusedCommentId=15865973&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15865973).
- Analyse level of deduplication (e.g. of checkpoint)
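The first two use cases (counting nodes that satisfy some criteria, and aggregating a value over them) can be illustrated with a minimal Java sketch. Everything in it is an assumption made for illustration: the `Node` class, the `count` and `sum` helpers and the sample tree are hypothetical and are not the API drafted under OAK-6584. A real implementation would read from a segment store rather than an in-memory tree.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.ToLongFunction;

public class ToolingSketch {

    /** Hypothetical read-only node view; NOT the API drafted for OAK-6584. */
    public static final class Node {
        public final String name;
        public final long binarySize; // bytes of an attached binary, 0 if none
        public final List<Node> children = new ArrayList<>();

        public Node(String name, long binarySize) {
            this.name = name;
            this.binarySize = binarySize;
        }

        public Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /** Counts the nodes in the subtree rooted at root that satisfy the criteria. */
    public static long count(Node root, Predicate<Node> criteria) {
        long n = criteria.test(root) ? 1 : 0;
        for (Node child : root.children) {
            n += count(child, criteria);
        }
        return n;
    }

    /** Sums a value over all nodes in the subtree that satisfy the criteria. */
    public static long sum(Node root, Predicate<Node> criteria, ToLongFunction<Node> value) {
        long total = criteria.test(root) ? value.applyAsLong(root) : 0;
        for (Node child : root.children) {
            total += sum(child, criteria, value);
        }
        return total;
    }

    /** Builds a small in-memory tree standing in for the content at one revision. */
    public static Node sample() {
        return new Node("content", 0)
            .add(new Node("a.jpg", 5_000_000))
            .add(new Node("docs", 0)
                .add(new Node("b.pdf", 200_000))
                .add(new Node("c.txt", 0)));
    }

    public static void main(String[] args) {
        Node root = sample();
        Predicate<Node> bigBinary = n -> n.binarySize > 1_000_000;
        System.out.println("nodes: " + count(root, n -> true));
        System.out.println("big binaries: " + count(root, bigBinary));
        System.out.println("bytes in big binaries: " + sum(root, bigBinary, n -> n.binarySize));
    }
}
```

Keeping the traversal behind a small read-only view like this is one way to keep tools away from Oak Segment Tar internals. The change analysis use case could then be expressed as the same aggregation run against two revisions.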
Validation
Reimplement Script Oak on top of the tooling API.
API draft
- Whiteboard shot of the API identified so far: IMG_20170822_163256.jpg.
- Drafting of the API takes place on GitHub for now. We'll move to the Apache SVN as soon as it is considered mature enough.
Metrics
A metrics reporter using RRD4J has been implemented in Apache Sling (SLING-7055). It writes metrics periodically into a time-series database on the local file system.
Open questions
- How expensive is it to gather metrics? What if reading a metric is too expensive because the custom metric implementation is troublesome? Should the reporter detect and blacklist such metrics? Alternatively, the reporter could automatically increase the interval for reading such metrics.
- It would be useful to also have RRD archives with the MAXIMUM consolidation function, e.g. for tracking the observation queue length.
- How should timer and histogram metrics be written to RRD? Those are not simple values, but expose already aggregated data.
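One of the options raised in the first question, automatically increasing the read interval for expensive metrics, could look roughly like the following Java sketch. It is not taken from the Sling reporter: the class name, the thresholds and the doubling policy are assumptions chosen for illustration, and the read cost is passed in by the caller so the logic stays independent of any particular metrics library.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical back-off policy for metrics that are expensive to read. */
public class AdaptiveMetricSchedule {

    private final long baseIntervalMs;  // normal reporting interval
    private final long maxIntervalMs;   // upper bound for backed-off metrics
    private final long costThresholdMs; // reads slower than this count as expensive
    private final Map<String, Long> intervals = new HashMap<>();

    public AdaptiveMetricSchedule(long baseIntervalMs, long maxIntervalMs, long costThresholdMs) {
        this.baseIntervalMs = baseIntervalMs;
        this.maxIntervalMs = maxIntervalMs;
        this.costThresholdMs = costThresholdMs;
    }

    /** Records how long the last read of a metric took and returns its next interval. */
    public long recordRead(String metric, long readCostMs) {
        long current = intervals.getOrDefault(metric, baseIntervalMs);
        long next;
        if (readCostMs > costThresholdMs) {
            // Expensive read: double the interval, up to the configured maximum.
            next = Math.min(current * 2, maxIntervalMs);
        } else {
            // Cheap read: return to the normal interval.
            next = baseIntervalMs;
        }
        intervals.put(metric, next);
        return next;
    }

    /** Returns the interval currently assigned to a metric. */
    public long intervalFor(String metric) {
        return intervals.getOrDefault(metric, baseIntervalMs);
    }

    public static void main(String[] args) {
        // Assumed values: 5 s base interval, 60 s cap, 100 ms cost threshold.
        AdaptiveMetricSchedule schedule = new AdaptiveMetricSchedule(5_000, 60_000, 100);
        System.out.println(schedule.recordRead("cheap.metric", 2));
        for (int i = 0; i < 5; i++) {
            System.out.println(schedule.recordRead("slow.metric", 250));
        }
    }
}
```

Blacklisting would be the limiting case of the same policy: once a metric reaches the maximum interval, the reporter could stop reading it altogether. Whether a variable read interval fits the fixed step of an RRD archive is left open here.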
More metrics
It may be useful to have more metrics OOTB in Oak.
- MongoDB metrics exposed by the MongoDocumentStore (storage size, free blocks, caches, etc.)
- RDB metrics exposed by the RDBDocumentStore
Other areas?
Composite Node Store
A high-level review of the composite approach was done. A number of Jira issues were opened as a result:
- OAK-6578 - Enhance the IndexStoreStrategy to return list of matching values and paths
- OAK-6579 - Define how the counter index works in a composite setup
- OAK-6580 - Ensure mounts are consistent with the node type registry
- OAK-6581 - Ensure mounts are consistent with the namespace registry
- OAK-6577 - Determine the approach for reindexing in case of CompositeNodeStore setups
- OAK-6582 - Review MBean interactions in a composite setup
Open questions
- Do we need additional ops tooling to help with diagnosing/recovering from failed mount-time checks?
Update OSGi Annotations in Oak
Work in progress, tracked under OAK-6741 - Switch to OSGi R6 annotations