Opened 11 years ago
Closed 9 years ago
#735 closed question (fixed)
GetCapabilities pings rasdaman for every coverage
Reported by: | Dimitar Misev | Owned by: | Alex Dumitru |
---|---|---|---|
Priority: | major | Milestone: | 9.0.x |
Component: | petascope | Version: | development |
Keywords: | getcapabilities performance | Cc: | Jelmer Oosthoek, Marcus Sen, James Passmore, an.rossi@… |
Complexity: | Medium |
Description
A GetCapabilities request seems to send a request like
select sdom(c)[0] from coll as c
for every axis of every coverage in petascope. This should be optimized (perhaps with caching), as it really slows things down (think of thousands of coverages). The simplest optimization would be to send only one request per coverage:
select sdom(c) from coll as c
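To make the difference concrete, here is a minimal Python sketch that builds the queries for both approaches and splits a single sdom result into per-axis bounds. The helper names are hypothetical, and the `[lo:hi,lo:hi]` result format with integer bounds is an assumption; this is not petascope's actual code.

```python
def per_axis_queries(collection, num_axes):
    """One rasql query per axis (the behaviour described in this ticket)."""
    return [f"select sdom(c)[{i}] from {collection} as c" for i in range(num_axes)]


def per_coverage_query(collection):
    """A single rasql query returning the whole spatial domain."""
    return f"select sdom(c) from {collection} as c"


def parse_sdom(text):
    """Split an sdom string such as "[0:255,0:210]" into (lo, hi) pairs.

    Assumes every bound is an integer; open bounds ("*") are not handled.
    """
    inner = text.strip().strip("[]")
    bounds = []
    for axis in inner.split(","):
        lo, hi = axis.split(":")
        bounds.append((int(lo), int(hi)))
    return bounds
```

With this shape, a 3-axis coverage needs one query instead of three, and the per-axis bounds are recovered on the client side from the single result.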
Attachments (2)
Change History (28)
comment:1 by , 11 years ago
comment:2 by , 11 years ago
Component: | undecided → petascope |
---|---|
Keywords: | getcapabilities performance added |
Owner: | changed from | to
comment:3 by , 11 years ago
hm, BBOXes or not seems like a different issue from a client perspective (and relevant there indeed). For this ticket, I prefer Dimitar's idea of making sure to issue only 1 sdom() per coverage.
Stupid question: we already repeat some metadata in the PS_ tables; wouldn't it make sense to store this one there, too? Maybe there are other use cases where sdom() slows things down? I'm not easily in favor of caching, but a complexity of O(n) for a GetCapabilities doesn't sound like fun.
comment:4 by , 11 years ago
Replying to dmisev:
for every axis of every coverage in petascope. This should be optimized (caching, perhaps) as it really slows it down (think about thousands of coverages).
Even for hundreds of coverages the response time is quite slow. For example, I compared GetCapabilities responses between Rasdaman 8 and 9 on the servers themselves (to cut out any networking issues) with requests like:
Rasdaman 8
===========
time curl -w %{size_download} \
  --request GET "http://localhost/petascope?service=WCS&request=GetCapabilities"
Rasdaman 9
===========
time curl -w %{size_download} \
  --request GET "http://localhost/rasdaman/ows?service=WCS&request=GetCapabilities"
We have:
Server details | Average response time | Response document (bytes) | Number of Coverages |
---|---|---|---|
Current internal service (Rasdaman 8) | less than 0.5 seconds | 20946 | 65 |
Current public service (Rasdaman 8) | about 0.5 seconds | 32812 | 131 |
New service (Rasdaman 9) | about 49 seconds | 134016 | 186 |
comment:5 by , 11 years ago
Yes, and for thousands of coverages it takes 10+ minutes (Jelmer has experience with this).
I think first step is to disable the bboxes by default, and add an option to enable them in the petascope.properties. Then we can optimize it further.
comment:6 by , 11 years ago
hm, I see a ratio of about 100x in the response times; issuing one request per axis would explain a factor of at most 4x, so that alone is not a sufficient explanation. Maybe there is some other, additional effect in the code?
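The ratio can be checked against the figures in the table from comment:4. The sketch below is only arithmetic on those approximate numbers ("about 0.5 seconds", "about 49 seconds"), so the results are rough estimates; the variable names are illustrative.

```python
# Figures copied from the table in comment:4 (approximate timings).
measurements = {
    "Rasdaman 8 (public)": {"seconds": 0.5, "coverages": 131},
    "Rasdaman 9 (new)": {"seconds": 49.0, "coverages": 186},
}


def seconds_per_coverage(m):
    """Average response time attributable to a single coverage."""
    return m["seconds"] / m["coverages"]


old = seconds_per_coverage(measurements["Rasdaman 8 (public)"])
new = seconds_per_coverage(measurements["Rasdaman 9 (new)"])

overall_ratio = 49.0 / 0.5      # ratio of total response times
per_coverage_ratio = new / old  # ratio after normalising by coverage count

print(f"overall: {overall_ratio:.0f}x, per coverage: {per_coverage_ratio:.0f}x")
```

Even after normalising by the number of coverages, the slowdown is roughly 69x, which is still far above the 4x that per-axis requests alone would account for.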
comment:7 by , 11 years ago
Cc: | added |
---|
follow-up: 9 comment:8 by , 11 years ago
Are we talking about the first GetCapabilities request?
That additionally needs to parse and cache CRS defs from SECORE.
The bottleneck anyway is usually the sdom requests, as the number of CRSs used by the service is usually not so big.
comment:9 by , 11 years ago
Replying to pcampalani:
Are we talking about the first GetCapabilities request?
That additionally needs to parse and cache CRS defs from SECORE.
The bottleneck anyway is usually the sdom requests, as the number of CRSs used by the service is usually not so big.
All GetCapabilities requests are taking this time, so some way of caching them, or even using a statically generated one, or some similar hack would be useful.
comment:10 by , 11 years ago
I made a performance comparison between the current GetCapabilities management and an optimized case where a single query to petascopedb is needed to fetch the coverage name and nothing more (DbMetadataSource.coverages()):
Request | current | optimized | covs |
---|---|---|---|
1st | ~17.000 s | ~4.000 s (25%) | 8 |
nth | ~ 0.700 s | ~0.035 s (5%) | 8 |
So there are considerable gains (I'm attaching detailed monitoring details on 400 subsequent requests right away).
I have some proposals for solving this relevant problem:
- add a parameter to disable BBOX in GetCapabilities;
- fetch the coverage type in DbMetadataSource.coverages(), in addition to the coverage name.
This would provide quick responses, but without BBOX or OWS metadata, so the solution is not really optimal I guess.
If we want to cache either the whole XML document or the Java summary objects (or directly the whole CoverageMetadata instances), we have to discuss how/when to refresh the cache, e.g.:
- define a ReloadCapabilities request, like implemented for the WMS service;
- define a lighter ReloadCoverage request;
- ?
Additionally, we could add triggers in petascopedb/RASBASE when metadata/spatial-domains change for a coverage.
I suggest we continue the discussion on the m-list now.
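As a discussion aid, the cache-with-explicit-refresh idea could look roughly like the Python sketch below. Everything in it is hypothetical (the class, the method names, the `load_summary` callback); it only illustrates the difference between a full reload and a lighter per-coverage reload, not an existing petascope API.

```python
class CapabilitiesCache:
    """Caches per-coverage summaries; all names here are hypothetical."""

    def __init__(self, load_summary):
        # load_summary(name) stands for the expensive call
        # (e.g. database and sdom lookups for one coverage).
        self._load_summary = load_summary
        self._summaries = {}

    def get(self, name):
        """Return the cached summary, loading it on first access only."""
        if name not in self._summaries:
            self._summaries[name] = self._load_summary(name)
        return self._summaries[name]

    def reload_coverage(self, name):
        """Lighter refresh: reload a single coverage."""
        self._summaries[name] = self._load_summary(name)
        return self._summaries[name]

    def reload_all(self):
        """Full refresh of every cached coverage, in the spirit of ReloadCapabilities."""
        for name in list(self._summaries):
            self._summaries[name] = self._load_summary(name)
```

A per-coverage reload keeps the cost of an update proportional to what changed, whereas a full reload pays the whole GetCapabilities cost again; database triggers would be one way to decide when either should be called.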
by , 11 years ago
Attachment: | GetCapabilities_opt_400x_3mins.png added |
---|
GetCapabilities without BBOX/covType performance profile - 400reqs/8covs
by , 11 years ago
Attachment: | GetCapabilities_400x_10mins.png added |
---|
GetCapabilities with BBOX/covType performance profile - 400reqs/8covs
comment:11 by , 11 years ago
changeset:1b57f46123827e40ba0975892b15511c6c477907 changes from one request per axis to one per coverage.
comment:12 by , 11 years ago
Cc: | added |
---|
comment:13 by , 11 years ago
I see this ticket is still open, so a thought here: databases are known to be inefficient when it comes to single tuple shipping ("navigational access"). They excel when returning sets. So, why do we have one access per axis (or per coverage now)? Why not collect all ids and send one request instead?
comment:14 by , 11 years ago
That would not be possible in rasql: you can return only a single 'column', i.e. this is not possible:
select sdom(rgb), sdom(mr2) from rgb, mr2
comment:15 by , 11 years ago
Maybe if we would allow a union of collections, something like
select sdom(c) from (rgb, mr2) as c
comment:17 by , 10 years ago
Priority: | major → blocker |
---|
Due to the serious performance issues of the capabilities document in v9, I am raising the priority of this ticket to blocker: I believe we really need to fix this for the next minor release.
I will proceed on the related topic in our m-list for discussions on how to fix this.
comment:18 by , 10 years ago
(Configurable) bbox in coverage summary is added in changeset:46aaa33.
Dimitar, let me know if I can close the ticket in a reasonable time, thanks!
comment:19 by , 10 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Perhaps it would've been better to have it disabled by default?
comment:20 by , 10 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
I have made the following timings since upgrading to v9.0.4 by running the command:
time curl -w %{size_download} --request GET "http://localhost/rasdaman/ows?service=WCS&request=GetCapabilities"
3 times and ignoring the first one if it is longer because of being the first request after a restart.
I notice 3 new settings in petascope.properties that seem relevant.
metadata_in_covsummary=true
bbox_in_covsummary=true
description_in_covsummary=true
Response time 50 seconds (1 minute 5 seconds first time after restart)
metadata_in_covsummary=true
bbox_in_covsummary=false
description_in_covsummary=true
Response time 49 seconds
metadata_in_covsummary=false
bbox_in_covsummary=false
description_in_covsummary=false
Response time 50 seconds (1 minute 5 seconds first time after restart)
So this problem doesn't seem to be fixed after all. Re-opening ticket.
comment:21 by , 10 years ago
Marcus, changeset:cdc7a85 has been applied.
Turn those 3 params to false and then you should have a faster capabilities response.
Let us know, thank you very much.
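For reference, the three switches can be read as in the following Python sketch. The parser and function names are hypothetical, and treating a missing key as false is an assumption made for this example; it is not how petascope itself loads petascope.properties.

```python
def parse_properties(text):
    """Parse simple key=value lines; '#' comments and blank lines are skipped."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def summary_flags(props):
    """Return the three coverage-summary switches as booleans.

    Assumption for this sketch: a missing key counts as false.
    """
    names = (
        "metadata_in_covsummary",
        "bbox_in_covsummary",
        "description_in_covsummary",
    )
    return {name: props.get(name, "false").lower() == "true" for name in names}


example = """
metadata_in_covsummary=false
bbox_in_covsummary=false
description_in_covsummary=false
"""
flags = summary_flags(parse_properties(example))
```

With all three flags false, the coverage summaries carry no metadata, bounding box, or description, which is the configuration suggested above for a faster response.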
comment:23 by , 10 years ago
Owner: | changed from | to
---|---|
Status: | reopened → assigned |
comment:24 by , 10 years ago
Priority: | blocker → major |
---|
The parameters are false by default now, this is not a blocker I'd say.
comment:25 by , 9 years ago
I've imported > 1000 coverages with wcst_import; the response is very quick (as the 3 parameters are false by default):
time curl -w %{size_download} --request GET "http://localhost:8080/rasdaman/ows?service=WCS&request=GetCapabilities"
452997
real 0m2.003s
user 0m0.001s
sys 0m0.024s
comment:26 by , 9 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Yes with these parameters off it's pretty fast. Let's close this ticket.
As discussed with Dimitar, we could also add a further parameter in the petascope.properties (as done for ows:Metadata, see #314) to enable/disable BBOXes in coverage summaries.