Commit f8e85b35 authored by acoburn

add triplestore indexer

parent 55adc00f
......@@ -38,6 +38,7 @@ These modules listen to repository events and react accordingly.
* [`acrepo-connector-idiomatic`](acrepo-connector-idiomatic): Id Mapping Service: This maps a public ID to an (internal and typically much longer) Fedora URI
* [`acrepo-connector-idiomatic-mysql`](acrepo-connector-idiomatic-mysql): Id Mapping Service Database: This exposes a MySQL datastore for use with the Id Mapping service
* [`acrepo-connector-idiomatic-pgsql`](acrepo-connector-idiomatic-pgsql): Id Mapping Service Database: This exposes a Postgres datastore for use with the Id Mapping service
* [`acrepo-connector-triplestore`](acrepo-connector-triplestore): Triplestore Indexing Service: This indexes Fedora resources into named graphs in an external triplestore.
Other OSGi Features
-------------------
......@@ -73,6 +74,7 @@ command from its shell:
feature:install acrepo-connector-idiomatic
feature:install acrepo-connector-idiomatic-mysql
feature:install acrepo-connector-idiomatic-pgsql
feature:install acrepo-connector-triplestore
feature:install acrepo-exts-fits
feature:install acrepo-exts-image
......
Amherst College Triplestore Indexer
===================================
The Triplestore indexer will index Fedora content into an external triplestore. It differs
from the `fcrepo-indexer-triplestore` in that each resource is indexed into its own named
graph.
`edu.amherst.acdc.connector.triplestore.cfg` is the configuration file for this service.
Deploying in OSGi
-----------------
This project can be deployed in an OSGi container. For example, using
[Apache Karaf](http://karaf.apache.org) version 4.x or later, you can run the following
commands from its shell:
feature:repo-add mvn:edu.amherst.acdc/acrepo-karaf/LATEST/xml/features
feature:install fcrepo-service-activemq
feature:install acrepo-connector-triplestore
Configuration
-------------
This application can be configured by creating the following configuration
file `$KARAF_HOME/etc/edu.amherst.acdc.connector.triplestore.cfg`. The following
values are available for configuration:
The Camel URI for the incoming message stream
input.stream=broker:queue:fedora
In the event of failure, the maximum number of times a redelivery will be attempted.
error.maxRedeliveries=10
It is possible to control the representation of Fedora resources with Prefer headers
by including or excluding certain types of triples. For instance, `ldp:contains` triples
are excluded by default because, in large repositories, they may number in the hundreds
of thousands or millions, which leads to very large request/response sizes. Note that
`fedora:hasParent` functions as a logical inverse of `ldp:contains`, so in the context of
a triplestore, you can use the inverse property in SPARQL queries to much the same effect.
Alternatively, a built-in reasoner will allow you to work directly with `ldp:contains`
triples even if they haven't been explicitly added to the triplestore.
prefer.omit=http://www.w3.org/ns/ldp#PreferContainment
prefer.include=
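As a rough illustration of querying through `fedora:hasParent` instead of `ldp:contains`, the sketch below builds a SPARQL query that lists the children of a container. It assumes the Fedora 4 repository namespace for the `fedora:` prefix and uses `GRAPH ?g` because this indexer places each resource in its own named graph. The class name, method name, and the `collection` path are hypothetical and not part of this project.

```java
class ChildrenQuerySketch {

    // Assumption: the Fedora 4 namespace for repository terms such as fedora:hasParent.
    static final String FEDORA_NS = "http://fedora.info/definitions/v4/repository#";

    // Build a query that finds every resource whose parent is the given container.
    // GRAPH ?g matches across all named graphs, one per indexed resource.
    static String childrenOf(final String containerUri) {
        return "PREFIX fedora: <" + FEDORA_NS + ">\n"
             + "SELECT ?child WHERE {\n"
             + "  GRAPH ?g { ?child fedora:hasParent <" + containerUri + "> }\n"
             + "}";
    }

    public static void main(final String[] args) {
        // Hypothetical container URI under the default fcrepo.baseUrl.
        System.out.println(childrenOf("http://localhost:8080/fcrepo/linkeddata/collection"));
    }
}
```

The resulting string would be sent to the triplestore's query endpoint, which is separate from the update endpoint configured for this service.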
The Camel URI for handling reindexing events.
triplestore.reindex.stream=broker:queue:triplestore.reindex
The base URL of the triplestore being used.
triplestore.baseUrl=http://localhost:8080/fuseki/test/update
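For deletions, the router in this commit POSTs a form-encoded `update` parameter to this URL, containing a `DELETE WHERE` statement scoped to the resource's named graph. The standalone sketch below mirrors that construction so the request body can be inspected outside of Camel. The class name and the example resource URI are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class GraphDeleteSketch {

    // Remove every triple in the named graph that holds one Fedora resource.
    static String deleteAll(final String graphName) {
        return "DELETE WHERE { GRAPH <" + graphName + "> { ?s ?p ?o } }";
    }

    // Wrap a SPARQL Update command as an application/x-www-form-urlencoded body.
    static String formBody(final String command) {
        try {
            return "update=" + URLEncoder.encode(command, "UTF-8");
        } catch (final UnsupportedEncodingException ex) {
            // UTF-8 is always available on a compliant JVM.
            throw new IllegalStateException(ex);
        }
    }

    public static void main(final String[] args) {
        // Hypothetical resource URI used as the graph name.
        System.out.println(formBody(deleteAll("http://localhost:8080/fcrepo/linkeddata/a")));
    }
}
```

Because the graph name is the resource URI, a single statement is enough to clear everything that was indexed for that resource.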
The Fedora configuration.
fcrepo.baseUrl=http://localhost:8080/fcrepo/linkeddata
fcrepo.authUsername=
fcrepo.authPassword=
A comma-delimited list of URIs to filter. That is, any Fedora resource that either matches or is contained in one of
the URIs listed will not be processed by the application.
filter.containers=http://localhost:8080/fcrepo/linkeddata/test
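The matching rule can be sketched in plain Java: a resource is skipped when its URI equals a listed URI or starts with that URI followed by `/`. This is a simplified stand-in for the Camel predicates in the router, not the router code itself. The class and method names are hypothetical, and trimming whitespace around list entries is an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class ContainerFilterSketch {

    // Split the comma-delimited property value into non-empty URIs.
    // Assumption: surrounding whitespace is trimmed.
    static List<String> parse(final String property) {
        return Arrays.stream(property.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    // True when the resource equals a listed container or sits below one.
    static boolean isFiltered(final String resourceUri, final List<String> containers) {
        return containers.stream()
                .anyMatch(c -> resourceUri.equals(c) || resourceUri.startsWith(c + "/"));
    }

    public static void main(final String[] args) {
        final List<String> containers = parse("http://localhost:8080/fcrepo/linkeddata/test");
        System.out.println(isFiltered("http://localhost:8080/fcrepo/linkeddata/test", containers));
        System.out.println(isFiltered("http://localhost:8080/fcrepo/linkeddata/test/child", containers));
        System.out.println(isFiltered("http://localhost:8080/fcrepo/linkeddata/testing", containers));
    }
}
```

Appending `/` before the prefix check keeps a sibling such as `.../testing` from being filtered just because it shares a prefix with `.../test`.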
When this file is edited, any currently running routes in this service are immediately redeployed
with the new values.
More information
----------------
For more information, please visit the [Apache Camel](http://camel.apache.org) documentation.
apply plugin: 'osgi'
description = 'Triplestore Indexer'
dependencies {
compile group: 'org.apache.camel', name: 'camel-core', version: camelVersion
compile group: 'org.apache.camel', name: 'camel-http4', version: camelVersion
compile group: 'org.apache.camel', name: 'camel-blueprint', version: camelVersion
compile(group: 'org.fcrepo.camel', name: 'fcrepo-camel', version: fcrepoCamelVersion) {
exclude(module: 'slf4j-log4j12')
}
}
jar {
manifest {
description project.description
docURL project.docURL
vendor project.vendor
license project.license
instruction 'Import-Package', "org.apache.camel,org.fcrepo.camel,org.apache.camel.component.http4,${defaultOsgiImports}"
instruction 'Export-Package', "edu.amherst.acdc.connector.triplestore;version=${projectOsgiVersion}"
}
}
artifacts {
archives (file('build/cfg/main/edu.amherst.acdc.connector.triplestore.cfg')) {
classifier 'configuration'
type 'cfg'
}
}
# The Camel URI for the incoming message stream
input.stream=broker:queue:fedora
# In the event of failure, the maximum number of times a redelivery will be attempted.
error.maxRedeliveries=10
# Control prefer headers
prefer.omit=http://www.w3.org/ns/ldp#PreferContainment
prefer.include=
# The Camel URI for handling reindexing events
triplestore.reindex.stream=broker:queue:triplestore.reindex
# The base URL of the triplestore being used
triplestore.baseUrl=http://localhost:8080/fuseki/test/update
# Containers to filter
filter.containers=
# Fedora connection information
fcrepo.baseUrl=http://localhost:8080/fcrepo/linkeddata
fcrepo.authUsername=
fcrepo.authPassword=
/*
* Copyright 2016 Amherst College
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package edu.amherst.acdc.connector.triplestore;
import static java.net.URLEncoder.encode;
import static java.util.stream.Collectors.toList;
import static org.apache.camel.Exchange.CONTENT_TYPE;
import static org.apache.camel.Exchange.HTTP_METHOD;
import static org.apache.camel.builder.PredicateBuilder.in;
import static org.apache.camel.builder.PredicateBuilder.not;
import static org.apache.camel.builder.PredicateBuilder.or;
import static org.apache.camel.util.ExchangeHelper.getMandatoryHeader;
import static org.fcrepo.camel.FcrepoHeaders.FCREPO_NAMED_GRAPH;
import static org.fcrepo.camel.FcrepoHeaders.FCREPO_EVENT_TYPE;
import static org.fcrepo.camel.FcrepoHeaders.FCREPO_URI;
import static org.fcrepo.camel.processor.ProcessorUtils.tokenizePropertyPlaceholder;
import static org.slf4j.LoggerFactory.getLogger;
import java.io.IOException;
import java.io.UncheckedIOException;
import org.apache.camel.LoggingLevel;
import org.apache.camel.builder.RouteBuilder;
import org.fcrepo.camel.processor.EventProcessor;
import org.fcrepo.camel.processor.SparqlUpdateProcessor;
import org.slf4j.Logger;
/**
* A content router for handling Fedora events.
*
* @author Aaron Coburn
*/
public class TriplestoreRouter extends RouteBuilder {
private static final Logger LOGGER = getLogger(TriplestoreRouter.class);
private static final String RESOURCE_DELETION = "http://fedora.info/definitions/v4/event#ResourceDeletion";
private static final String DELETE = "https://www.w3.org/ns/activitystreams#Delete";
/**
* Configure the message route workflow.
*/
public void configure() throws Exception {
/**
* A generic error handler (specific to this RouteBuilder)
*/
onException(Exception.class)
.maximumRedeliveries("{{error.maxRedeliveries}}")
.log("Index Routing Error: ${routeId}");
/**
* route a message to the proper queue, based on whether
* it is a DELETE or UPDATE operation.
*/
from("{{input.stream}}")
.routeId("FcrepoTriplestoreRouter")
.process(new EventProcessor())
.setHeader(FCREPO_NAMED_GRAPH).header(FCREPO_URI)
.choice()
.when(or(header(FCREPO_EVENT_TYPE).contains(RESOURCE_DELETION),
header(FCREPO_EVENT_TYPE).contains(DELETE)))
.to("direct:delete.triplestore")
.when(not(header(FCREPO_URI).contains("#")))
.to("direct:index.triplestore");
/**
* Handle re-index events
*/
from("{{triplestore.reindex.stream}}")
.routeId("FcrepoTriplestoreReindex")
.setHeader(FCREPO_NAMED_GRAPH).header(FCREPO_URI)
.to("direct:index.triplestore");
/**
* Based on an item's metadata, determine if it is indexable.
*/
from("direct:index.triplestore")
.routeId("FcrepoTriplestoreIndexer")
.filter(not(in(tokenizePropertyPlaceholder(getContext(), "{{filter.containers}}", ",").stream()
.map(uri -> or(
header(FCREPO_URI).startsWith(constant(uri + "/")),
header(FCREPO_URI).isEqualTo(constant(uri))))
.collect(toList()))))
.removeHeaders("CamelHttp*")
.to("direct:update.triplestore");
/**
* Remove an item from the triplestore index.
*/
from("direct:delete.triplestore")
.routeId("FcrepoTriplestoreDeleter")
.log(LoggingLevel.INFO, LOGGER,
"Deleting Triplestore Graph ${headers[CamelFcrepoUri]}")
.setHeader(HTTP_METHOD).constant("POST")
.setHeader(CONTENT_TYPE).constant("application/x-www-form-urlencoded; charset=utf-8")
.process(e -> e.getIn().setBody(sparqlUpdate(deleteAll(getMandatoryHeader(e, FCREPO_URI, String.class)))))
.to("{{triplestore.baseUrl}}?useSystemProperties=true");
/**
* Perform the sparql update.
*/
from("direct:update.triplestore")
.routeId("FcrepoTriplestoreUpdater")
.to("fcrepo:{{fcrepo.baseUrl}}?accept=application/n-triples" +
"&preferOmit={{prefer.omit}}&preferInclude={{prefer.include}}")
.process(new SparqlUpdateProcessor())
.log(LoggingLevel.INFO, LOGGER,
"Indexing Triplestore Object ${headers[CamelFcrepoUri]}")
.to("{{triplestore.baseUrl}}?useSystemProperties=true");
}
private static String deleteAll(final String graphName) {
return "DELETE WHERE { GRAPH <" + graphName + "> { ?s ?p ?o } }";
}
private static String sparqlUpdate(final String command) {
try {
return "update=" + encode(command, "UTF-8");
} catch (final IOException ex) {
throw new UncheckedIOException(ex);
}
}
}
<?xml version="1.0" encoding="UTF-8"?>
<blueprint
xmlns="http://www.osgi.org/xmlns/blueprint/v1.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:cm="http://aries.apache.org/blueprint/xmlns/blueprint-cm/v1.1.0"
xsi:schemaLocation="
http://aries.apache.org/blueprint/xmlns/blueprint-cm/v1.1.0 http://aries.apache.org/schemas/blueprint-cm/blueprint-cm-1.1.0.xsd
http://www.osgi.org/xmlns/blueprint/v1.0.0 http://www.osgi.org/xmlns/blueprint/v1.0.0/blueprint.xsd
http://camel.apache.org/schema/blueprint http://camel.apache.org/schema/blueprint/camel-blueprint.xsd">
<cm:property-placeholder persistent-id="edu.amherst.acdc.connector.triplestore" update-strategy="reload">
<cm:default-properties>
<cm:property name="input.stream" value="broker:queue:fedora" />
<cm:property name="error.maxRedeliveries" value="10" />
<cm:property name="prefer.omit" value="http://www.w3.org/ns/ldp#PreferContainment" />
<cm:property name="prefer.include" value="" />
<cm:property name="triplestore.reindex.stream" value="broker:queue:triplestore.reindex" />
<cm:property name="triplestore.baseUrl" value="http://localhost:8080/fuseki/test/update" />
<cm:property name="filter.containers" value="" />
<cm:property name="fcrepo.baseUrl" value="http://localhost:8080/fcrepo/linkeddata"/>
<cm:property name="fcrepo.authUsername" value=""/>
<cm:property name="fcrepo.authPassword" value=""/>
</cm:default-properties>
</cm:property-placeholder>
<reference id="broker" interface="org.apache.camel.Component" filter="(osgi.jndi.service.name=fcrepo/Broker)"/>
<bean id="http" class="org.apache.camel.component.http4.HttpComponent"/>
<bean id="https" class="org.apache.camel.component.http4.HttpComponent"/>
<!-- configuration of fcrepo component -->
<bean id="fcrepo" class="org.fcrepo.camel.FcrepoComponent">
<property name="authUsername" value="${fcrepo.authUsername}"/>
<property name="authPassword" value="${fcrepo.authPassword}"/>
<property name="baseUrl" value="${fcrepo.baseUrl}"/>
</bean>
<camelContext id="AcrepoConnectorTriplestore" xmlns="http://camel.apache.org/schema/blueprint">
<package>edu.amherst.acdc.connector.triplestore</package>
</camelContext>
</blueprint>
......@@ -86,6 +86,7 @@ public class AcrepoServicesIT extends AbstractOSGiIT {
features(maven().groupId("edu.amherst.acdc").artifactId("acrepo-karaf")
.type("xml").classifier("features").versionAsInProject(),
"acrepo-connector-broadcast",
"acrepo-connector-triplestore",
"acrepo-exts-fits",
"acrepo-exts-image",
......@@ -120,6 +121,7 @@ public class AcrepoServicesIT extends AbstractOSGiIT {
assertTrue(featuresService.isInstalled(featuresService.getFeature("fcrepo-camel")));
assertTrue(featuresService.isInstalled(featuresService.getFeature("fcrepo-service-activemq")));
assertTrue(featuresService.isInstalled(featuresService.getFeature("acrepo-connector-broadcast")));
assertTrue(featuresService.isInstalled(featuresService.getFeature("acrepo-connector-triplestore")));
assertTrue(featuresService.isInstalled(featuresService.getFeature("acrepo-exts-fits")));
assertTrue(featuresService.isInstalled(featuresService.getFeature("acrepo-exts-image")));
assertTrue(featuresService.isInstalled(featuresService.getFeature("acrepo-exts-ldpath")));
......
......@@ -194,6 +194,19 @@
<bundle>mvn:com.fasterxml.jackson.core/jackson-databind/${jacksonVersion}</bundle>
</feature>
<feature name="acrepo-connector-triplestore" version="${project.version}">
<details>Installs the triplestore indexing service</details>
<feature version="${camelVersionRange}">camel</feature>
<feature version="${camelVersionRange}">camel-blueprint</feature>
<feature version="${camelVersionRange}">camel-http4</feature>
<feature version="${fcrepoCamelVersionRange}">fcrepo-camel</feature>
<bundle>mvn:edu.amherst.acdc/acrepo-connector-triplestore/${project.version}</bundle>
<configfile finalname="/etc/edu.amherst.acdc.connector.triplestore.cfg">mvn:edu.amherst.acdc/acrepo-connector-triplestore/${project.version}/cfg/configuration</configfile>
</feature>
<feature name="acrepo-connector-broadcast" version="${project.version}">
<details>Installs the message broadcasting service</details>
......
......@@ -26,7 +26,7 @@
</li>
</ul>
</main>
<footer><a href="amherst.edu">Amherst College</a> • 220 South Pleasant Street, Amherst, MA 01002 • (413) 542-2000 • <a href="amherst.edu">Amherst.edu</a></footer>
<footer><a href="https://amherst.edu">Amherst College</a> • 220 South Pleasant Street, Amherst, MA 01002 • (413) 542-2000 • <a href="https://amherst.edu">Amherst.edu</a></footer>
</body>
</html>
......@@ -3,6 +3,7 @@ include ':acrepo-connector-broadcast'
include ':acrepo-connector-idiomatic'
include ':acrepo-connector-idiomatic-mysql'
include ':acrepo-connector-idiomatic-pgsql'
include ':acrepo-connector-triplestore'
include ':acrepo-exts-fits'
include ':acrepo-exts-image'
include ':acrepo-exts-ldpath'
......@@ -19,6 +20,7 @@ project(':acrepo-connector-broadcast').projectDir = "$rootDir/acrepo-connector-b
project(':acrepo-connector-idiomatic').projectDir = "$rootDir/acrepo-connector-idiomatic" as File
project(':acrepo-connector-idiomatic-mysql').projectDir = "$rootDir/acrepo-connector-idiomatic-mysql" as File
project(':acrepo-connector-idiomatic-pgsql').projectDir = "$rootDir/acrepo-connector-idiomatic-pgsql" as File
project(':acrepo-connector-triplestore').projectDir = "$rootDir/acrepo-connector-triplestore" as File
project(':acrepo-exts-fits').projectDir = "$rootDir/acrepo-exts-fits" as File
project(':acrepo-exts-image').projectDir = "$rootDir/acrepo-exts-image" as File
project(':acrepo-exts-ldpath').projectDir = "$rootDir/acrepo-exts-ldpath" as File
......