<?xml version="1.0" encoding="utf-8"?>

<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
	<channel>
		<title>Actian Community Forums - Blogs - rhann</title>
		<link>http://community.actian.com/forum/blogs/rhann/</link>
		<description><![CDATA[Actian Corporation is a leading provider of open source database management software and support services. [Toll Free] +1 (888) 446-4737]]></description>
		<language>en</language>
		<lastBuildDate>Wed, 22 Feb 2012 22:21:21 GMT</lastBuildDate>
		<generator>vBulletin</generator>
		<ttl>60</ttl>
		<image>
			<url>http://community.actian.com/forum/ingres4/misc/rss.jpg</url>
			<title>Actian Community Forums - Blogs - rhann</title>
			<link>http://community.actian.com/forum/blogs/rhann/</link>
		</image>
		<item>
			<title>Friday came early this week</title>
			<link>http://community.actian.com/forum/blogs/rhann/81-friday-came-early-week.html</link>
			<pubDate>Wed, 13 Apr 2011 14:26:57 GMT</pubDate>
			<description><![CDATA[Image: http://community.ingres.com/forum/blog_attachment.php?attachmentid=8&stc=1&d=1302704628  
[Thanks to http://atom.smasher.org/highway/]]]></description>
			<content:encoded><![CDATA[<div><img src="http://community.ingres.com/forum/blog_attachment.php?attachmentid=8&amp;stc=1&amp;d=1302704628" border="0" alt="" /><br />
<font size="1">[Thanks to <a href="http://atom.smasher.org/highway/" target="_blank">http://atom.smasher.org/highway/</a>]</font></div>


]]></content:encoded>
			<dc:creator>rhann</dc:creator>
			<dc:publisher>131</dc:publisher>
			<guid isPermaLink="true">http://community.actian.com/forum/blogs/rhann/81-friday-came-early-week.html</guid>
		</item>
		<item>
			<title>VectorWise TPC-H Results Posted</title>
			<link>http://community.actian.com/forum/blogs/rhann/79-vectorwise-tpc-h-results-posted.html</link>
			<pubDate>Tue, 15 Feb 2011 14:42:23 GMT</pubDate>
			<description><![CDATA[The 100GB VectorWise TPC-H benchmark (http://www.tpc.org/tpch/results/tpch_perf_results.asp) has been out for a few days now, and it's already had a...]]></description>
			<content:encoded><![CDATA[<div>The <a href="http://www.tpc.org/tpch/results/tpch_perf_results.asp" target="_blank">100GB VectorWise TPC-H benchmark</a> has been out for a few days now, and it's already had a bit of attention in spite of the fact that the official media blitz hasn't started yet.<br />
<br />
The TPC-H benchmark is the official, audited version of the DBT-3 benchmark I've been writing about, with the addition that it does some updates too.<br />
<br />
There are a few interesting things I notice in the results.  First and foremost, the official results are very much in line with what <a href="http://www.rationalcommerce.com/" target="_blank">we</a> were seeing using DBT-3.  I know that because one of the things we established with DBT-3 was that the elapsed running time varies pretty much linearly with the size of the database.  Double the size of the database and you double the running time.  Just a glance at the numbers shows the official test system ran about as fast as our test system would with a 100GB database.<br />
<br />
A few things are worth noting about that.  Firstly, we weren't using trick hardware.  We put a bit of thought into configuring it and we tried a few things to get good performance, but the machine cost only about £5,000 all-in.  And we got performance that would top the TPC-H league table!<br />
<br />
Another thing that is worth mentioning is that the benchmark uses absolutely rock-bottom-of-the-line standard ANSI/ISO SQL.  There is no special dialect being used to access funny benchmark-specific tweaks or even proprietary capabilities.  Third-party technology-neutral tools talking to VectorWise through JDBC or whatever are going to see the kind of performance demonstrated in the benchmark.<br />
<br />
Finally, the logical database design used in this benchmark is a more or less obvious design that would be used in conventional transaction processing.  It is not a star or a snowflake or any other &quot;dimensional&quot; transformation.  It is therefore feasible to take your production database, maybe pruning off some of the irrelevant tables, maybe obfuscating any sensitive columns, and just use it as-is.  <br />
<br />
Thanks to low-cost hardware, thoroughly standard SQL, and accessible database designs, VectorWise has virtually no barriers to entry.</div>

]]></content:encoded>
			<dc:creator>rhann</dc:creator>
			<dc:publisher>131</dc:publisher>
			<guid isPermaLink="true">http://community.actian.com/forum/blogs/rhann/79-vectorwise-tpc-h-results-posted.html</guid>
		</item>
		<item>
			<title>Unleashing VectorWise on High-Performance Hardware</title>
			<link>http://community.actian.com/forum/blogs/rhann/71-unleashing-vectorwise-high-performance-hardware.html</link>
			<pubDate>Sat, 03 Jul 2010 09:49:10 GMT</pubDate>
			<description><![CDATA[It's been about a month since I last posted anything about VectorWise. I have been rather distracted by the UK IUA conference on June 8, normal...]]></description>
			<content:encoded><![CDATA[<div>It's been about a month since I last posted anything about VectorWise. I have been rather distracted by the UK IUA conference on June 8, normal everyday work, life, but mostly by the project that I am writing about today.<br />
<br />
This project started after I (frankly) hit the wall with the machine I've been writing about in my earlier postings. It was nice and fast for what it was, but some off-line emails made it clear people were less interested in breathing new life into systems assembled from office junk (like my machine) and were much more interested in what the ultimate limits of VectorWise performance are. As a result of our early successes and the encouragement of the emails, we started a project to configure a plausible high-end machine. &quot;Plausible&quot; in this context meant several things: affordable, available, and supportable; in other words, a machine you could expect to deploy in a live business-critical role.  We think we've nailed it now, and the results of our first round of testing are very impressive.  (We are well into a second round of testing now, and I'll say more about that in a future posting.)<br />
<br />
We had intended to have our prototype package available to show at the IUA conference but we ran into a number of technical and logistical problems. Happily those have now been overcome, with generous and very rapid assistance from the VectorWise team in particular. (Sometimes they worked over weekends to deliver fixes, even putting together a special build environment that matched our system. Special thanks are due to Giel de Nijs!)<br />
<br />
First off, before I tell you about the results of our testing, let me describe the machine. The configuration we've used for the purpose of this posting had a pair of Intel Xeon 5520 Nehalem/Gainestown quad-core 2.26GHz CPUs, 32GB of RAM and 450GB of fault-tolerant SLC flash RAID 5. We're calling this machine the <i>V16 Vector Appliance™</i>. It comes in either a 6U rack-mount format or desk-side format. (BTW, the same basic platform allows up to 96GB of RAM, and up to 1.35TB of fault-tolerant SLC flash RAID 5, about enough for 3TB of conventional database storage, depending on its compressibility. Also, to meet demand we are currently using the slower L5520 CPUs because they are/were more readily available.)<br />
<br />
I have temporarily put aside my benchmark based on the Wikipedia statistics. I'll come back to that benchmark in future. For this round of testing we've adopted some of the <a href="http://osdldbt.sourceforge.net" target="_blank">DBT-3™</a> benchmark. DBT-3™ is a decision support workload. It consists of a suite of business-oriented ad hoc queries and normally includes concurrent data modifications. DBT-3™ is a fair usage implementation of the Transaction Processing Performance Council's <a href="http://www.tpc.org/tpch" target="_blank">TPC-H™</a> benchmark specification.<br />
<br />
We ran four sets of tests, each set consisting of 21 queries.  All the queries ran successfully in the sense that they completed and produced correct  answers.  We used a database with a scaling factor of 30, which results in a database sized as follows:<br />
<blockquote><font face="Lucida Console">part:  6 million rows<br />
partsupp:   24 million rows<br />
lineitem:   180 million rows<br />
orders:  45 million rows<br />
supplier:   300,000 rows<br />
customers:   4.5 million rows<br />
nation:  25 rows<br />
regions:  5 rows</font></blockquote>We knew this system was going to be fast; we'd chosen all the components to achieve the best possible performance without compromising on reliability, availability or supportability. We were not disappointed. Compared with conventional Ingres using electromechanical disk but otherwise the same hardware, <b>Ingres VectorWise on our V16 Vector Appliance was between 80 and 160 times faster</b>, depending on which query you look at. In brief, the Ingres VectorWise results were:<br />
<blockquote><font face="Lucida Console">Pricing Summary Report: 00m 08s<br />
Minimum Cost Supplier: 00m 02s<br />
Shipping Priority: 00m 08s<br />
Order Priority Checking: 00m 06s<br />
Local Supplier Volume: 00m 08s<br />
Forecasting Revenue Change: 00m 01s<br />
Volume Shipping: 00m 06s<br />
National Market Share: 00m 05s<br />
Product Type Profit Measure: 00m 22s<br />
Returned Item Reporting: 00m 14s<br />
Important Stock Identification: 00m 02s<br />
Shipping Modes and Order Priority: 00m 08s<br />
Customer Distribution: 00m 12s<br />
Promotion Effect: 00m 04s<br />
Top Supplier: 00m 04s<br />
Parts/Supplier Relationship: 00m 09s<br />
Small-Quantity-Order Revenue: 00m 04s<br />
Large Volume Customer: 00m 16s<br />
Discounted Revenue: 00m 05s<br />
Potential Part Promotion: 00m 06s<br />
Suppliers Who Kept Orders Waiting: 00m 19s</font></blockquote>All the queries ran in much less than a minute.  For comparison, some of these queries took close to half an hour using conventional Ingres.<br />
<br />
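As a rough cross-check of the timings listed above, the following minimal sketch adds up the 21 reported query times and scales the total from scale factor 30 to scale factor 100. The linear scaling with database size is an assumption made purely for illustration, not a measured result, and the function names are invented for this example.<br />
```python
# Query timings in seconds, copied from the list above (scale factor 30).
timings_sf30 = {
    "Pricing Summary Report": 8,
    "Minimum Cost Supplier": 2,
    "Shipping Priority": 8,
    "Order Priority Checking": 6,
    "Local Supplier Volume": 8,
    "Forecasting Revenue Change": 1,
    "Volume Shipping": 6,
    "National Market Share": 5,
    "Product Type Profit Measure": 22,
    "Returned Item Reporting": 14,
    "Important Stock Identification": 2,
    "Shipping Modes and Order Priority": 8,
    "Customer Distribution": 12,
    "Promotion Effect": 4,
    "Top Supplier": 4,
    "Parts/Supplier Relationship": 9,
    "Small-Quantity-Order Revenue": 4,
    "Large Volume Customer": 16,
    "Discounted Revenue": 5,
    "Potential Part Promotion": 6,
    "Suppliers Who Kept Orders Waiting": 19,
}


def total_seconds(timings):
    """Sum the elapsed seconds across all queries."""
    return sum(timings.values())


def scale_linearly(seconds, from_sf, to_sf):
    """Extrapolate elapsed time, ASSUMING it grows linearly with scale factor."""
    return seconds * to_sf / from_sf


total = total_seconds(timings_sf30)
print(f"{len(timings_sf30)} queries, {total} s in total at SF 30")
print(f"Linear estimate at SF 100: {scale_linearly(total, 30, 100):.0f} s")
```
The extrapolated figure is only as good as the linearity assumption, so treat it as a ballpark rather than a prediction.<br />
<br />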
It is worth pointing out that none of the queries required any &quot;tinkering&quot; to obtain these astonishing results.  We did no special indexing, no SQL changes, no anything.  This is just what Ingres VectorWise does, straight out of the box.  All the work was in getting the software to cooperate with the hardware, which is just debugging, and it's done now. :)<br />
<br />
You can <a href="http://www.rationalcommerce.com/uploads/V16-Benchmark-20100630.pdf" target="_blank">download the full report</a>, with details of all our testing, along with the test SQL, result sets, and comparisons with conventional Ingres and with electromechanical disks.</div>

]]></content:encoded>
			<dc:creator>rhann</dc:creator>
			<dc:publisher>131</dc:publisher>
			<guid isPermaLink="true">http://community.actian.com/forum/blogs/rhann/71-unleashing-vectorwise-high-performance-hardware.html</guid>
		</item>
		<item>
			<title>VectorWise at the UK IUA Conference</title>
			<link>http://community.actian.com/forum/blogs/rhann/69-vectorwise-uk-iua-conference.html</link>
			<pubDate>Thu, 06 May 2010 22:19:19 GMT</pubDate>
			<description>Most of my limited free time over the last two weeks has been used up with organizing the annual UK IUA conference.  The UK IUA has always been...</description>
			<content:encoded><![CDATA[<div>Most of my limited free time over the last two weeks has been used up with organizing the annual UK IUA conference.  The UK IUA has always been totally independent of Ingres Corporation (and its precursors), and it has to fund the conferences itself.  Until you've run a conference like ours you can have no idea how phenomenally expensive it can be.  We've seen conference costs of well over £35,000 in past years.<br />
<br />
To try to stage an affordable event in these tough times we have done our own event management for the last two years, entirely with volunteer organizers.  That is the reason for the lack of any VectorWise-related posts here lately.  Fortunately the big effort is finished (for a while), and now that the conference agenda is set, the sponsors are starting to ante-up, and the registration process is running, I hope to get back to work with IVW.<br />
<br />
Happily, one benefit of taking the time off from writing about VectorWise is that the IUA Conference is going to have some very strong VectorWise content.  As I write, I know that <b>John Smedley</b> is going to be doing a 45-minute workshop on Getting Started with VectorWise (which will build on <b>Ray Fan</b>'s workshop on cloud-computing, so it will incidentally be about VectorWise in the cloud).  I also know that <b>Doug Inkster</b> is doing a 45-minute presentation on how VectorWise works and how it was integrated into Ingres.  Finally, <b>Rilson Nascimento</b> is going to compare the performance of VectorWise with a few other products using TPC-H-inspired queries.  <br />
<br />
If you are interested in VectorWise and you are in the UK or can get to the UK for June 8 (and last year people came from Australia, Jordan, Denmark, Canada, and elsewhere), you should plan to attend the UK IUA Conference.  You can register at <a href="https://www.regonline.co.uk/IUA2010" target="_blank">https://www.regonline.co.uk/IUA2010</a>.  The membership fee to allow you to register is £75.  In fact,<b> if you register before May 12 you can join for just £50.</b>  A full agenda with session abstracts is available during the registration process.  (For a preview of the draft agenda, take a look at this posting on the <a href="http://community.ingres.com/forum/ingres-community/12008-register-2010-ingres-users-association-conference-post31278.html#post31278" target="_blank">Community Forum</a>.)<br />
<br />
I look forward to seeing you all there.  I am sure there is going to be more VectorWise content than I've mentioned here.</div>

]]></content:encoded>
			<dc:creator>rhann</dc:creator>
			<dc:publisher>131</dc:publisher>
			<guid isPermaLink="true">http://community.actian.com/forum/blogs/rhann/69-vectorwise-uk-iua-conference.html</guid>
		</item>
		<item>
			<title>VectorWise Concurrency</title>
			<link>http://community.actian.com/forum/blogs/rhann/67-vectorwise-concurrency.html</link>
			<pubDate>Fri, 16 Apr 2010 20:12:14 GMT</pubDate>
			<description>Since I last wrote anything I have upgraded to the 104 build of VectorWise 1.0, which includes a number of bug fixes. 
 
In the comments on a...</description>
			<content:encoded><![CDATA[<div>Since I last wrote anything I have upgraded to the 104 build of VectorWise 1.0, which includes a number of bug fixes.<br />
<br />
In the comments on a <a href="http://community.ingres.com/forum/blogs/rhann/62-baseline-results-part-1.html" target="_blank">previous post</a> Dejan Lekic asked about how VectorWise handles concurrent users.  I am now looking into that, so this is the first of what I expect will be a series of posts on the subject.  In this one I am mainly just explaining my approach to this question.<br />
<br />
All computer systems saturate at some point.  A system is saturated when its utilization approaches 100% and it becomes so busy that a small marginal increase in demand causes an intolerable increase in response time.  That is, the arrival rate of new requests exceeds the completion rate of some critical resource, resulting in a queue of unfulfilled requests building up.  Initially, response time without a queue will be fairly constant, but as the saturation point is approached response times start to increase quite quickly.  Unless the request rate slows for some (external) reason, the response time will quite suddenly become effectively infinite.  To the users it will appear that the system is hung or even crashed.  That is really important to know: response time doesn't increase nicely; one minute the system is fine, the next it is &quot;crashed&quot;.  You must plan ahead to avoid these &quot;pseudo-crashes&quot; because you won't get any useful warning; it is much like skidding on ice.  <br />
<br />
It is quite easy to discover how fast a particular query runs and it's just as easy to discover how fast a single user can run a variety of queries.  My last two-part posting did that.  It is only a little harder to discover how fast a particular query runs when there are multiple concurrent users (see, for example, the graph on page 10 of the <a href="http://info.ingres.com/g/?M9CHJHQYTZ=clicksrc:ingres" target="_blank">Ingres VectorWise Sneak Preview whitepaper</a>).<br />
<br />
Unfortunately none of this kind of easily obtained information helps with capacity planning.  To do capacity planning we need to run a real or simulated workload with a fixed number of users (and hence a fixed request rate), then repeat the trial with more users.  If you are really clued up on queuing theory you can devise a queuing model and just a few trials will be enough to estimate the parameters for the model.  From the model you can estimate when saturation will occur without really inducing it.  If you know the acceptable worst-case response time, and assuming your simulated workload is realistic, you can then predict the number of concurrent users the chosen system can safely handle.  (What to do if it turns out the system can't handle the required number of concurrent users is way beyond what I'll discuss here.)<br />
<br />
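To illustrate the kind of queuing model meant here, the sketch below uses the textbook single-server M/M/1 queue, in which mean response time is R = S / (1 - rho) for a service time S and a utilization rho. The choice of M/M/1, the 0.1 second service time, the per-user request rate and the 2 second target are all assumptions picked for illustration; they are not the model or the measurements behind this posting.<br />
```python
def mm1_response_time(service_time, arrival_rate):
    """Mean response time of an M/M/1 queue: R = S / (1 - rho)."""
    utilization = arrival_rate * service_time
    headroom = 1.0 - utilization
    if max(headroom, 0.0) == 0.0:
        # Saturated: requests arrive at least as fast as they complete,
        # so the queue (and the response time) grows without bound.
        return float("inf")
    return service_time / headroom


def max_arrival_rate(service_time, worst_case_response):
    """Largest arrival rate that keeps mean response time within the target.

    Obtained by solving S / (1 - rate * S) = R_max for rate.
    """
    return max(0.0, 1.0 / service_time - 1.0 / worst_case_response)


SERVICE_TIME = 0.1       # seconds per request (assumed)
REQUESTS_PER_USER = 0.2  # requests per second from each user (assumed)
TARGET_RESPONSE = 2.0    # acceptable worst-case response in seconds (assumed)

# Response time stays nearly flat, then climbs steeply close to saturation.
for users in (10, 30, 45, 49, 60):
    rate = users * REQUESTS_PER_USER
    print(users, "users:", round(mm1_response_time(SERVICE_TIME, rate), 2), "s")

safe_rate = max_arrival_rate(SERVICE_TIME, TARGET_RESPONSE)
print("Users supported within target:", int(safe_rate / REQUESTS_PER_USER))
```
In practice the service time would be estimated from a handful of trials rather than assumed, and a real system may not behave like a single-server queue at all.<br />
<br />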
Alternatively, if you don't want to create a reliable queuing model or you would rather depend on observation than modelling, then you have to repeat the experiment, increasing the load each time until you actually see where saturation occurs.  That is what I have done for this blog entry.<br />
<br />
My workload simulator is very quick-and-dirty.  It's a multi-threaded C program that uses embedded SQL to simulate N users (one user per thread) each sending a work package (a set of queries) to the server at random intervals but a steady average pace (to simulate some kind of &quot;think time&quot; between packages).  For the purpose of this blog posting the pace is no more than two packages of work in every ten-second interval.  The interval between the work packages of one user is random, but controlled so that they average out to no more than 12 a minute.  Each user gets to execute 100 work packages.  I am using the same very modest desktop workstation I described previously, and a 6.25M row dataset.<br />
<br />
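The general shape of such a simulator can be sketched in a few lines. What follows is not the C program described above: it is a simplified Python stand-in with one thread per user, a random think time with a fixed average, and a hypothetical <i>fake_query</i> function in place of the embedded SQL call. All names and the small demo parameters are assumptions.<br />
```python
import random
import threading
import time


def run_user(user_id, packages, mean_interval, run_package, results, lock):
    """Simulate one user: think for a random time, run a package, record its time."""
    rng = random.Random(user_id)  # per-user generator so runs are repeatable
    for _ in range(packages):
        # Uniform on [0, 2 * mean] averages out to mean_interval seconds.
        time.sleep(rng.uniform(0.0, 2.0 * mean_interval))
        started = time.perf_counter()
        run_package()
        elapsed = time.perf_counter() - started
        with lock:
            results.append(elapsed)


def simulate(users, packages, mean_interval, run_package):
    """Run all users concurrently, one thread each, and return response times."""
    results = []
    lock = threading.Lock()
    threads = [
        threading.Thread(
            target=run_user,
            args=(u, packages, mean_interval, run_package, results, lock),
        )
        for u in range(users)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


def fake_query():
    """Hypothetical stand-in for sending one work package to the database."""
    time.sleep(0.001)


if __name__ == "__main__":
    # Tiny demo values so the sketch finishes quickly.
    times = simulate(users=5, packages=3, mean_interval=0.01, run_package=fake_query)
    print(len(times), "packages run, worst response", round(max(times), 4), "s")
```
To mirror the pacing described above you would pass mean_interval=5.0 and packages=100, replace fake_query with a real database call, and repeat the run with a growing number of users.<br />
<br />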
As this is the first posting on this subject I want mainly to draw attention to the fact that there is an inflection point when the flow becomes unbalanced so that response time goes through the roof, and that it is the basis for capacity planning.  I have therefore kept this first simulation very simple by making every package of work a single identical query (QC_POINT_FACT).  QC_POINT_FACT is one of the fastest queries I've run, so it is also a &quot;best case&quot; workload.  In future postings I will enrich the job mix, I will use more complex (slower) queries, and I will also look at running additional background workload, such as database refreshes.<br />
<br />
<img src="http://community.ingres.com/forum/blog_attachment.php?attachmentid=3&amp;stc=1&amp;d=1271509013" border="0" alt="" /><br />
<br />
(This graph uses real measured results, and that step at around 60 users is real.  I have no idea what that is about.)<br />
<br />
If the acceptable worst-case response time for my business is agreed to be 2 seconds, then according to this trial, running this simple workload, my little desktop will support up to 101 users!  Which ain't bad.  For this workload.  (It will definitely not do so well with more demanding workloads, but that's another simulation for another day.)  <br />
<br />
By the way, the procedure I have described above applies equally well to most computer capacity planning problems, be it for an entire system or a single disk drive.  Timing a single task or a single user gives no clue about ultimate capacity or throughput.  You need to push it to find that inflection point; it may be lurking closer than you'd like!</div>


]]></content:encoded>
			<dc:creator>rhann</dc:creator>
			<dc:publisher>131</dc:publisher>
			<guid isPermaLink="true">http://community.actian.com/forum/blogs/rhann/67-vectorwise-concurrency.html</guid>
		</item>
		<item>
			<title>Baseline Results (Part 2)</title>
			<link>http://community.actian.com/forum/blogs/rhann/63-baseline-results-part-2.html</link>
			<pubDate>Sat, 03 Apr 2010 16:05:23 GMT</pubDate>
			<description>Continued from Part 1 (http://community.ingres.com/forum/blogs/rhann/62-baseline-results-part-1.html). 
 
*Q_MAXPAGE* 
 
Code: 
--------- 
SELECT...</description>
			<content:encoded><![CDATA[<div>Continued from <a href="http://community.ingres.com/forum/blogs/rhann/62-baseline-results-part-1.html" target="_blank">Part 1</a>.<br />
<br />
<b>Q_MAXPAGE</b><br />
<div style="margin:20px; margin-top:5px">
	<div class="smallfont" style="margin-bottom:2px">Code:</div>
	<pre class="alt2" dir="ltr" style="
		margin: 0px;
		padding: 6px;
		border: 1px inset;
		width: 640px;
		height: 130px;
		text-align: left;
		overflow: auto">SELECT FIRST 50 ps.page_id, max(ps.page_count) AS mx
FROM pagestat ps JOIN datesinfo di ON di.id=ps.date_id 
WHERE di.calmonth=2 
  AND di.calyear=2010 
  AND di.calday=18 
GROUP BY ps.page_id
ORDER BY mx DESC;</pre>
</div><b>0.1704 secs</b> @ 6.25M rows<br />
<b>0.3197 secs</b> @ 12.5M rows<br />
<b>0.6166 secs</b> @ 25M rows<br />
<b>1.2251 secs</b> @ 50M rows<br />
<br />
<b>Q_10000HITS_1</b><br />
<div style="margin:20px; margin-top:5px">
	<div class="smallfont" style="margin-bottom:2px">Code:</div>
	<pre class="alt2" dir="ltr" style="
		margin: 0px;
		padding: 6px;
		border: 1px inset;
		width: 640px;
		height: 130px;
		text-align: left;
		overflow: auto">SELECT FIRST 50 ps.page_id, max(ps.page_count) AS mx
FROM pagestat ps JOIN datesinfo di ON di.id=ps.date_id  
WHERE di.calmonth=3 
  AND di.calyear=2010 
  AND di.calday=3 
GROUP BY ps.page_id
HAVING max(ps.page_count) &gt; 10000;</pre>
</div><b>0.1748 secs</b> @ 6.25M rows<br />
<b>0.3281 secs</b> @ 12.5M rows<br />
<b>0.6367 secs</b> @ 25M rows<br />
<b>1.2785 secs</b> @ 50M rows<br />
<br />
<b>Q_10000HITS_2</b><br />
<div style="margin:20px; margin-top:5px">
	<div class="smallfont" style="margin-bottom:2px">Code:</div>
	<pre class="alt2" dir="ltr" style="
		margin: 0px;
		padding: 6px;
		border: 1px inset;
		width: 640px;
		height: 146px;
		text-align: left;
		overflow: auto">SELECT FIRST 50  pg.page, t.mx 
FROM pages pg JOIN (SELECT ps.page_id, max(ps.page_count) AS mx  
                    FROM pagestat ps JOIN datesinfo di ON di.id=ps.date_id 
                    WHERE di.calmonth=3 
                      AND di.calyear=2010 
                      AND di.calday=3   
                    GROUP BY ps.page_id  
                    HAVING max(page_count) &gt; 10000) t ON t.page_id=pg.id;</pre>
</div><b>0.2647 secs</b> @ 6.25M rows<br />
<b>0.5372 secs</b> @ 12.5M rows<br />
<b>1.0556 secs</b> @ 25M rows<br />
<b>2.0024 secs</b> @ 50M rows<br />
<br />
<b>Q_NOACCESS_SECOND_DAY</b><br />
<div style="margin:20px; margin-top:5px">
	<div class="smallfont" style="margin-bottom:2px">Code:</div>
	<pre class="alt2" dir="ltr" style="
		margin: 0px;
		padding: 6px;
		border: 1px inset;
		width: 640px;
		height: 322px;
		text-align: left;
		overflow: auto">SELECT pg.page 
FROM pages pg JOIN (SELECT t1.page_id 
                    FROM (SELECT ps1.page_id,sum(ps1.page_count) AS sm 
                          FROM pagestat ps1 JOIN datesinfo di1 ON di1.id=ps1.date_id 
                          WHERE di1.calmonth=2 
                            AND di1.calyear=2010 
                            AND di1.calday=16 
                            AND ps1.project_id=1524 
                          GROUP BY ps1.page_id 
                          HAVING sum(page_count)&gt;1000) t1  -- pages accessed more than 1000 times on Day 1
                    LEFT JOIN (SELECT ps2.page_id 
                               FROM pagestat ps2 JOIN datesinfo di2 ON di2.id=ps2.date_id
                               WHERE di2.calmonth=2 
                                 AND di2.calyear=2010 
                                 AND di2.calday=17 
                                 AND ps2.project_id=1524) t2 -- pages accessed on Day 2
                    ON t2.page_id=t1.page_id  
                    WHERE t2.page_id IS NULL) bq 
ON pg.id=bq.page_id;</pre>
</div><b>0.1018 secs</b> @ 6.25M rows<br />
<b>0.1469 secs</b> @ 12.5M rows<br />
<b>0.2398 secs</b> @ 25M rows<br />
<b>0.4059 secs</b> @ 50M rows<br />
<br />
<b>Q_PAGE_MONTHSTATS</b><br />
<div style="margin:20px; margin-top:5px">
	<div class="smallfont" style="margin-bottom:2px">Code:</div>
	<pre class="alt2" dir="ltr" style="
		margin: 0px;
		padding: 6px;
		border: 1px inset;
		width: 640px;
		height: 146px;
		text-align: left;
		overflow: auto">SELECT di.caldate, sum(ps.page_count) 
FROM pagestat ps JOIN datesinfo di ON ps.date_id = di.id
                 JOIN pages pg ON ps.page_id=pg.id
WHERE pg.page='Yugopolis' 
  AND di.calmonth=2 
  AND di.calyear=2010 
GROUP BY caldate 
ORDER BY caldate;</pre>
</div><b>0.2333 secs</b> @ 6.25M rows<br />
<b>0.4281 secs</b> @ 12.5M rows<br />
<b>0.7764 secs</b> @ 25M rows<br />
<b>1.4091 secs</b> @ 50M rows<br />
<br />
<b>Q_PROJECT_AVG</b><br />
<div style="margin:20px; margin-top:5px">
	<div class="smallfont" style="margin-bottom:2px">Code:</div>
	<pre class="alt2" dir="ltr" style="
		margin: 0px;
		padding: 6px;
		border: 1px inset;
		width: 640px;
		height: 114px;
		text-align: left;
		overflow: auto">SELECT pj.project, AVG(ps.page_count) 
FROM pagestat ps JOIN datesinfo di ON ps.date_id=di.id
                 JOIN projects pj ON ps.project_id=pj.id
WHERE calmonth=2 
  AND calyear=2010 
GROUP by project;</pre>
</div><b>0.7520 secs</b> @ 6.25M rows<br />
<b>1.6324 secs</b> @ 12.5M rows<br />
<b>2.9139 secs</b> @ 25M rows<br />
<b>5.9650 secs</b> @ 50M rows<br />
<br />
<b>Q_PAGES_PROJECT</b><br />
<div style="margin:20px; margin-top:5px">
	<div class="smallfont" style="margin-bottom:2px">Code:</div>
	<pre class="alt2" dir="ltr" style="
		margin: 0px;
		padding: 6px;
		border: 1px inset;
		width: 640px;
		height: 114px;
		text-align: left;
		overflow: auto">SELECT FIRST 100 ps.page_id, count(distinct ps.project_id) 
FROM pagestat ps JOIN datesinfo di ON di.id=ps.date_id 
WHERE di.calmonth=2 
  AND di.calyear=2010 
GROUP BY ps.page_id 
HAVING count(distinct ps.project_id) &gt; 10;</pre>
</div><b>6.0733 secs</b> @ 6.25M rows<br />
<b>13.1317 secs</b> @ 12.5M rows<br />
<b>30.4078 secs</b> @ 25M rows<br />
<b>68.2658 secs</b> @ 50M rows<br />
<br />
<b>Q_HOURS_DROP</b><br />
<div style="margin:20px; margin-top:5px">
	<div class="smallfont" style="margin-bottom:2px">Code:</div>
	<pre class="alt2" dir="ltr" style="
		margin: 0px;
		padding: 6px;
		border: 1px inset;
		width: 640px;
		height: 226px;
		text-align: left;
		overflow: auto">SELECT FIRST 100 ps1.page_id, ps1.page_count 
FROM pagestat ps1 
    JOIN datesinfo di1 ON ps1.date_id = di1.id 
    JOIN pagestat ps2 ON ps2.page_id=ps1.page_id 
    JOIN  datesinfo di2 ON ps2.date_id = di2.id 
WHERE di1.caldate='2010-02-16' 
  AND di1.dayhour=14 
  AND di2.caldate='2010-02-16' 
  AND di2.dayhour=15 
  AND ps1.project_id=1524 
  AND ps2.project_id=1524 
  AND ps1.page_count &gt; 2 * ps2.page_count 
  AND ps1.page_count&gt;=1000;</pre>
</div><b>0.0962 secs</b> @ 6.25M rows<br />
<b>0.1753 secs</b> @ 12.5M rows<br />
<b>0.3361 secs</b> @ 25M rows<br />
<b>0.7419 secs</b> @ 50M rows<br />
<br />
The next couple of queries are based on an intermediate table called <b>pagestat_daily</b> constructed as follows:<br />
<br />
<div style="margin:20px; margin-top:5px">
	<div class="smallfont" style="margin-bottom:2px">Code:</div>
	<pre class="alt2" dir="ltr" style="
		margin: 0px;
		padding: 6px;
		border: 1px inset;
		width: 640px;
		height: 258px;
		text-align: left;
		overflow: auto">CREATE TABLE pagestat_daily 
(
   page_id int NOT NULL,
   date_id_from smallint  NOT NULL,
   date_id_to smallint  NOT NULL,
   project_id smallint  NOT NULL,
   page_count int  NOT NULL 
) WITH STRUCTURE=VECTORWISE;

INSERT INTO pagestat_daily (project_id, page_id, date_id_from, date_id_to, page_count ) 
SELECT ps.project_id, ps.page_id, min(ps.date_id), max(ps.date_id), sum(ps.page_count) 
FROM pagestat ps JOIN datesinfo di ON ps.date_id=di.id 
WHERE calmonth=2 
  AND calyear=2010 
GROUP BY ps.project_id, ps.page_id, di.caldate;</pre>
</div><b>Q_DATE_RANGE</b><br />
<div style="margin:20px; margin-top:5px">
	<div class="smallfont" style="margin-bottom:2px">Code:</div>
	<pre class="alt2" dir="ltr" style="
		margin: 0px;
		padding: 6px;
		border: 1px inset;
		width: 640px;
		height: 178px;
		text-align: left;
		overflow: auto">SELECT pj.project, sum(pd.page_count) AS sm 
   FROM pagestat_daily pd 
    JOIN datesinfo dstart ON pd.date_id_from=dstart.id 
    JOIN datesinfo dend ON pd.date_id_to=dend.id 
    JOIN projects pj ON pd.project_id=pj.id
WHERE dstart.caldate&gt;='2010-02-15' 
  AND dend.caldate &lt;= '2010-02-20' 
GROUP BY pj.project 
HAVING sum(pd.page_count) &gt; 1000 
ORDER BY sm DESC;</pre>
</div><b>0.6308 secs</b> @ 6.25M rows<br />
<b>1.2273 secs</b> @ 12.5M rows<br />
<b>2.3717 secs</b> @ 25M rows<br />
<b>4.3561 secs</b> @ 50M rows<br />
<br />
<b>Q_PAGE_DATE_RANGE</b><br />
<div style="margin:20px; margin-top:5px">
	<div class="smallfont" style="margin-bottom:2px">Code:</div>
	<pre class="alt2" dir="ltr" style="
		margin: 0px;
		padding: 6px;
		border: 1px inset;
		width: 640px;
		height: 162px;
		text-align: left;
		overflow: auto">SELECT FIRST 100 pg.page, dt.sm FROM pages pg, 
   ( SELECT pd.page_id, sum(pd.page_count) AS sm 
     FROM pagestat_daily pd JOIN datesinfo dstart ON pd.date_id_from=dstart.id 
                            JOIN datesinfo dend ON pd.date_id_to=dend.id 
     WHERE dstart.caldate&gt;='2010-02-15' 
       AND dend.caldate &lt;= '2010-02-20' 
   GROUP BY pd.page_id ) dt 
WHERE pg.id=dt.page_id 
ORDER BY sm DESC;</pre>
</div><b>2.4418 secs</b> @ 6.25M rows<br />
<b>4.9174 secs</b> @ 12.5M rows<br />
<b>9.1677 secs</b> @ 25M rows<br />
<b>17.9668 secs</b> @ 50M rows<br />
<br />
<b>Q_HIT_SPIKE</b><br />
<div style="margin:20px; margin-top:5px">
	<div class="smallfont" style="margin-bottom:2px">Code:</div>
	<pre class="alt2" dir="ltr" style="
		margin: 0px;
		padding: 6px;
		border: 1px inset;
		width: 640px;
		height: 98px;
		text-align: left;
		overflow: auto">SELECT FIRST 10 ps.page_id, MAX(ps.page_count/pd.page_count) AS max_daily_spikiness
FROM pagestat ps INNER JOIN pagestat_daily pd ON ps.page_id=pd.page_id
                        AND ps.date_id BETWEEN pd.date_id_from AND pd.date_id_to
GROUP BY ps.page_id
ORDER BY max_daily_spikiness DESC;</pre>
</div><b>7.7353 secs</b> @ 6.25M rows<br />
<b>21.7507 secs</b> @ 12.5M rows<br />
<b>65.8813 secs</b> @ 25M rows<br />
<b>215.3931 secs</b> @ 50M rows<br />
<br />
<b>QC_POINT_FACT</b><br />
<div style="margin:20px; margin-top:5px">
	<div class="smallfont" style="margin-bottom:2px">Code:</div>
	<pre class="alt2" dir="ltr" style="
		margin: 0px;
		padding: 6px;
		border: 1px inset;
		width: 640px;
		height: 146px;
		text-align: left;
		overflow: auto">SELECT * 
FROM pagestat ps1 JOIN datesinfo di1 ON ps1.date_id = di1.id 
WHERE page_id=11610 
  AND project_id=1524 
  AND di1.calday=7 
  AND di1.calmonth=2 
  AND di1.calyear=2010 
  AND di1.dayhour=1;</pre>
</div><b>0.1328 secs</b> @ 6.25M rows<br />
<b>0.2115 secs</b> @ 12.5M rows<br />
<b>0.3747 secs</b> @ 25M rows<br />
<b>0.6587 secs</b> @ 50M rows<br />
<br />
<b>QC_POINT_PAGE</b><br />
<div style="margin:20px; margin-top:5px">
	<div class="smallfont" style="margin-bottom:2px">Code:</div>
	<pre class="alt2" dir="ltr" style="
		margin: 0px;
		padding: 6px;
		border: 1px inset;
		width: 640px;
		height: 34px;
		text-align: left;
		overflow: auto">SELECT * FROM pages pg WHERE pg.id=2065866;</pre>
</div><b>0.0066 secs</b> @ 6.25M rows<br />
<b>0.0068 secs</b> @ 12.5M rows<br />
<b>0.0070 secs</b> @ 25M rows<br />
<b>0.0069 secs</b> @ 50M rows<br />
<br />
Here are all the results again in a more concise format for future reference.  (Remember that each time, given in seconds, includes about 0.0017 seconds of overhead.)<br />
<br />
<div style="margin:20px; margin-top:5px">
	<div class="smallfont" style="margin-bottom:2px">Code:</div>
	<pre class="alt2" dir="ltr" style="
		margin: 0px;
		padding: 6px;
		border: 1px inset;
		width: 640px;
		height: 434px;
		text-align: left;
		overflow: auto">+------------------------+---------+---------+---------+---------+
|query                   |6.25m row|12.5m row|25m rows |50m rows |
+------------------------+---------+---------+---------+---------+
|Q_10000HITS_2           |   0.2647|   0.5372|   1.0556|   2.0024|
|Q_1000HITS_1            |   0.1748|   0.3281|   0.6367|   1.2785|
|Q_DATE_RANGE            |   0.6308|   1.2273|   2.3717|   4.3561|
|Q_DAYSTAT               |   0.3245|   0.6406|   1.2530|   2.4830|
|Q_DAYWEEKSTAT           |   0.1681|   0.3212|   0.6162|   1.2079|
|Q_HIT_SPIKE             |   7.7353|  21.7507|  65.8813| 215.3931|
|Q_HOURPROJECTSTAT       |   0.1542|   0.2881|   0.5477|   1.0783|
|Q_HOURS_DROP            |   0.0962|   0.1753|   0.3361|   0.7419|
|Q_MAXPAGE               |   0.1704|   0.3197|   0.6166|   1.2251|
|Q_NOACCESS_SECOND_DAY   |   0.1018|   0.1469|   0.2398|   0.4059|
|Q_NOPROJECTS            |   0.3367|   0.8100|   1.3274|   2.7934|
|Q_NOPROJECTS_1          |   0.1123|   0.2107|   0.3935|   0.7832|
|Q_PAGES_PROJECT         |   6.0733|  13.1317|  30.4078|  68.2658|
|Q_PAGE_DATE_RANGE       |   2.4418|   4.9174|   9.1677|  17.9668|
|Q_PAGE_MONTHSTATS       |   0.2333|   0.4281|   0.7764|   1.4091|
|Q_POINT_FACT            |   0.1328|   0.2115|   0.3747|   0.6587|
|Q_POINT_PAGE            |   0.0066|   0.0068|   0.0070|   0.0069|
|Q_PROJECT_AVG           |   0.7520|   1.6324|   2.9139|   5.9650|
|Q_PROJECT_SUM           |   0.1906|   0.5411|   0.7768|   1.7320|
|Q_TOP50                 |   1.0425|   2.1400|   3.6009|   6.4686|
|Q_TOP_PROJECTS          |   0.7331|   1.5993|   2.8464|   5.8379|
|Q_UNIQ                  |   4.0147|   9.0174|  20.7797|  47.6740|
+------------------------+---------+---------+---------+---------+</pre>
</div></div>

]]></content:encoded>
			<dc:creator>rhann</dc:creator>
			<dc:publisher>131</dc:publisher>
			<guid isPermaLink="true">http://community.actian.com/forum/blogs/rhann/63-baseline-results-part-2.html</guid>
		</item>
		<item>
			<title>Baseline Results (Part 1)</title>
			<link>http://community.actian.com/forum/blogs/rhann/62-baseline-results-part-1.html</link>
			<pubDate>Sat, 03 Apr 2010 16:04:48 GMT</pubDate>
			<description>As I wrote in my first posting, my limited hardware means I have to be interested primarily in the usability and operation of VectorWise rather than...</description>
			<content:encoded><![CDATA[<div>As I wrote in my first posting, my limited hardware means I have to be interested primarily in the usability and operation of VectorWise rather than its flat-out, edge-of-the-envelope speed.  In a couple of earlier posts I've looked at the sort of SQL it can cope with.  Almost all the queries I've used were originally intended for DBMSs other than Ingres, and apart from minor tweaks to overcome non-standard syntax they've all run without significant changes.  At this stage I am pretty confident that I could start using it for real work without having to throw away all my queries and start again from scratch.<br />
<br />
Now I am going to move on to look at how it copes with multiple concurrent users, and also the operational implications of needing to refresh the database, possibly while queries are concurrently running.  I am also going to be interested in how indexing helps, as I've not yet taken advantage of that new feature of the beta-release.<br />
<br />
I am going to need to know how these things affect performance.  For instance, how scalable is VectorWise?  How does performance change as I add concurrent users?  What is the practical limit on the number of users I can have?  What sort of guidelines should I use for capacity planning?<br />
<br />
For these kinds of questions I need to establish the baseline performance that I will use to compare with what happens during testing.<br />
<br />
Below, I show all 22 of the baseline queries, and the performance when I run them on each of four datasets.  In all these tests my procedure was as follows:<br />
<ol style="list-style-type: decimal"><li>create a new database</li>
<li>load one of my <a href="http://community.ingres.com/forum/blogs/rhann/59-upgrade-day-creating-sample-datasets.html" target="_blank">sampled</a> raw datasets</li>
<li>construct my dimension tables and fact table</li>
<li>shut down and restart Ingres/VectorWise</li>
<li>run the entire suite of 22 queries</li>
<li>repeat the entire suite five more times</li>
</ol><br />
My reason for repeating the queries is that, even though it's a single-user system, early testing showed that the quicker ones were very sensitive to other background system activity.  All the following results are therefore averaged over six runs.<br />
<br />
<b>Note:</b> I'm just using SQL scripts to capture these times, so there is potentially a lot of Ingres overhead included.  That is, the times I'm reporting are actually the time to run the query <b>plus</b> some of the time to record how long it took.  Based on the reported time for a null query I reckon the overhead is about 0.0017 seconds.  Obviously that is going to affect the very brief queries disproportionately, making them look slower than they really are.  I might come back to this in a future posting.<br />
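<br />
To see why the brief queries suffer most, here is a small Python sketch (not part of the original SQL test scripts) that subtracts the estimated 0.0017-second overhead from a reported time and shows what share of that time it represents.  The function names are made up for illustration, and it assumes the overhead is a fixed amount per query, which is only an approximation.<br />
<div style="margin:20px; margin-top:5px">
	<div class="smallfont" style="margin-bottom:2px">Code:</div>
	<pre class="alt2" dir="ltr" style="
		margin: 0px;
		padding: 6px;
		border: 1px inset;
		width: 640px;
		text-align: left;
		overflow: auto">
```python
# Rough sketch: how much of a reported time could be measurement overhead.
# OVERHEAD_SECS is the null-query time quoted in the post; the helper names
# are hypothetical and exist only for this illustration.
OVERHEAD_SECS = 0.0017


def net_time(reported_secs, overhead_secs=OVERHEAD_SECS):
    """Reported time minus the estimated fixed overhead."""
    return reported_secs - overhead_secs


def overhead_share(reported_secs, overhead_secs=OVERHEAD_SECS):
    """Fraction of the reported time that is overhead."""
    return overhead_secs / reported_secs


# Reported 6.25M-row times: the point lookup on pages from Part 2,
# then Q_NOPROJECTS_1 and Q_UNIQ from this post.
samples = [
    ("point lookup on pages", 0.0066),
    ("Q_NOPROJECTS_1", 0.1123),
    ("Q_UNIQ", 4.0147),
]
for name, secs in samples:
    print(f"{name}: net {net_time(secs):.4f}s, "
          f"overhead {overhead_share(secs):.1%}")
```
</pre>
</div>For the quickest lookup reported in Part 2 (0.0066 secs) the overhead is roughly a quarter of the reported time, while for Q_UNIQ it is well under a tenth of a percent.<br />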
<br />
To get some idea of how response time varies with the size of the dataset, I started with a fact table containing approximately 6.25 million rows and then moved on to use 12.5M rows, 25M rows and 50M rows.  (I'd previously established that my hardware runs out of breath somewhere between 50M and 100M rows.)<br />
<br />
For an ER diagram and other background on the database and queries, see <a href="http://community.ingres.com/forum/blogs/rhann/58-wikistats-queries.html" target="_blank">my earlier posting</a>.<br />
<br />
<b>Q_UNIQ</b> <br />
<div style="margin:20px; margin-top:5px">
	<div class="smallfont" style="margin-bottom:2px">Code:</div>
	<pre class="alt2" dir="ltr" style="
		margin: 0px;
		padding: 6px;
		border: 1px inset;
		width: 640px;
		height: 82px;
		text-align: left;
		overflow: auto">SELECT count(distinct ps.page_id) 
FROM pagestat ps JOIN datesinfo di 
ON  di.id=ps.date_id  
WHERE di.calmonth=2 and di.calyear=2010;</pre>
</div><b>4.0147 secs</b> @ 6.25M rows<br />
<b>9.0174 secs</b> @ 12.5M rows<br />
<b>20.7797 secs</b> @ 25M rows<br />
<b>47.6740 secs</b> @ 50M rows<br />
<br />
<b>Q_DAYSTAT</b><br />
<div style="margin:20px; margin-top:5px">
	<div class="smallfont" style="margin-bottom:2px">Code:</div>
	<pre class="alt2" dir="ltr" style="
		margin: 0px;
		padding: 6px;
		border: 1px inset;
		width: 640px;
		height: 114px;
		text-align: left;
		overflow: auto">SELECT di.caldate, sum(ps.page_count) 
FROM pagestat ps JOIN datesinfo di 
ON di.id=ps.date_id  
WHERE di.calmonth=2 and di.calyear=2010 
GROUP BY di.caldate 
ORDER BY di.caldate;</pre>
</div><b>0.3245 secs</b> @ 6.25M rows<br />
<b>0.6406 secs</b> @ 12.5M rows<br />
<b>1.2530 secs</b> @ 25M rows<br />
<b>2.4830 secs</b> @ 50M rows<br />
<br />
<b>Q_TOP_PROJECTS</b> <br />
<div style="margin:20px; margin-top:5px">
	<div class="smallfont" style="margin-bottom:2px">Code:</div>
	<pre class="alt2" dir="ltr" style="
		margin: 0px;
		padding: 6px;
		border: 1px inset;
		width: 640px;
		height: 130px;
		text-align: left;
		overflow: auto">SELECT FIRST 20 pj.project, sum(ps.page_count) AS sm 
FROM pagestat ps JOIN datesinfo di ON di.id=ps.date_id 
                 JOIN projects pj ON pj.id=ps.project_id  
WHERE di.calmonth=2 
  AND di.calyear=2010 
GROUP BY pj.project 
ORDER BY sm DESC;</pre>
</div><b>0.7331 secs</b> @ 6.25M rows<br />
<b>1.5993 secs</b> @ 12.5M rows<br />
<b>2.8464 secs</b> @ 25M rows<br />
<b>5.8379 secs</b> @ 50M rows<br />
<br />
<b>Q_PROJECT_SUM</b><br />
<div style="margin:20px; margin-top:5px">
	<div class="smallfont" style="margin-bottom:2px">Code:</div>
	<pre class="alt2" dir="ltr" style="
		margin: 0px;
		padding: 6px;
		border: 1px inset;
		width: 640px;
		height: 66px;
		text-align: left;
		overflow: auto">SELECT ps.project_id, sum(ps.page_count) AS sm
FROM pagestat ps
GROUP BY ps.project_id;</pre>
</div><b>0.1906 secs</b> @ 6.25M rows<br />
<b>0.5411 secs</b> @ 12.5M rows<br />
<b>0.7768 secs</b> @ 25M rows<br />
<b>1.7320 secs</b> @ 50M rows<br />
<br />
<b>Q_HOURPROJECTSTAT</b><br />
<div style="margin:20px; margin-top:5px">
	<div class="smallfont" style="margin-bottom:2px">Code:</div>
	<pre class="alt2" dir="ltr" style="
		margin: 0px;
		padding: 6px;
		border: 1px inset;
		width: 640px;
		height: 114px;
		text-align: left;
		overflow: auto">SELECT di.dayhour, sum(ps.page_count) sm
FROM pagestat ps JOIN datesinfo di ON di.id=date_id
                 JOIN projects pj ON pj.id=ps.project_id
WHERE di.caldate='2010-02-16'
  AND project='es'
GROUP BY di.dayhour;</pre>
</div><b>0.1542 secs</b> @ 6.25M rows<br />
<b>0.2881 secs</b> @ 12.5M rows<br />
<b>0.5477 secs</b> @ 25M rows<br />
<b>1.0783 secs</b> @ 50M rows<br />
<br />
<b>Q_NOPROJECTS</b> <br />
<div style="margin:20px; margin-top:5px">
	<div class="smallfont" style="margin-bottom:2px">Code:</div>
	<pre class="alt2" dir="ltr" style="
		margin: 0px;
		padding: 6px;
		border: 1px inset;
		width: 640px;
		height: 82px;
		text-align: left;
		overflow: auto">SELECT pj.project 
FROM projects pj LEFT OUTER JOIN pagestat ps ON ps.project_id = pj.id 
WHERE ps.project_id IS NULL 
ORDER BY project;</pre>
</div><b>0.3367 secs</b> @ 6.25M rows<br />
<b>0.8100 secs</b> @ 12.5M rows<br />
<b>1.3274 secs</b> @ 25M rows<br />
<b>2.7934 secs</b> @ 50M rows<br />
<br />
<b>Q_NOPROJECTS_1</b><br />
<div style="margin:20px; margin-top:5px">
	<div class="smallfont" style="margin-bottom:2px">Code:</div>
	<pre class="alt2" dir="ltr" style="
		margin: 0px;
		padding: 6px;
		border: 1px inset;
		width: 640px;
		height: 194px;
		text-align: left;
		overflow: auto">SELECT DISTINCT pj.project  
FROM projects pj LEFT OUTER JOIN (SELECT distinct ps.project_id AS pid 
                                  FROM pagestat ps
                                  WHERE date_id IN (SELECT di.id 
                                                   FROM datesinfo di 
                                                   WHERE di.calmonth=2 
                                                     AND di.calyear=2010 
                                                     AND di.calday=16)) t1 
ON (pj.id=t1.pid) 
WHERE t1.pid IS NULL 
ORDER BY project;</pre>
</div><b>0.1123 secs</b> @ 6.25M rows<br />
<b>0.2107 secs</b> @ 12.5M rows<br />
<b>0.3935 secs</b> @ 25M rows<br />
<b>0.7832 secs</b> @ 50M rows<br />
<br />
<b>Q_DAYWEEKSTAT</b><br />
<div style="margin:20px; margin-top:5px">
	<div class="smallfont" style="margin-bottom:2px">Code:</div>
	<pre class="alt2" dir="ltr" style="
		margin: 0px;
		padding: 6px;
		border: 1px inset;
		width: 640px;
		height: 130px;
		text-align: left;
		overflow: auto">SELECT di.dayofweek, sum(ps.page_count) AS sm 
FROM pagestat ps JOIN datesinfo di ON di.id=ps.date_id 
                 JOIN projects pj ON pj.id=ps.project_id 
WHERE di.calmonth=2 
  AND di.calyear=2010 
  AND pj.project='de' 
GROUP BY di.dayofweek;</pre>
</div><b>0.1681 secs</b> @ 6.25M rows<br />
<b>0.3212 secs</b> @ 12.5M rows<br />
<b>0.6162 secs</b> @ 25M rows<br />
<b>1.2079 secs</b> @ 50M rows<br />
<br />
<b>Q_TOP50</b><br />
<div style="margin:20px; margin-top:5px">
	<div class="smallfont" style="margin-bottom:2px">Code:</div>
	<pre class="alt2" dir="ltr" style="
		margin: 0px;
		padding: 6px;
		border: 1px inset;
		width: 640px;
		height: 178px;
		text-align: left;
		overflow: auto">SELECT FIRST 50 pg.page, sum(ps.page_count) AS sm 
FROM pagestat ps JOIN datesinfo di ON di.id=ps.date_id 
                 JOIN projects pj ON pj.id=ps.project_id 
                 JOIN pages pg ON pg.id=ps.page_id
WHERE di.calmonth=2 
  AND di.calyear=2010 
  AND di.calday=18 
  AND pj.project='de' 
GROUP BY pg.page 
ORDER BY sm DESC;</pre>
</div><b>1.0425 secs</b> @ 6.25M rows<br />
<b>2.1400 secs</b> @ 12.5M rows<br />
<b>3.6009 secs</b> @ 25M rows<br />
<b>6.4686 secs</b> @ 50M rows<br />
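<br />
One way to read the Part 1 numbers is to compare how much each query's time grows when the data grows eightfold, from 6.25M to 50M rows.  The Python sketch below is only an illustration using some of the averaged times reported above; the variable and function names are invented here, and a ratio near 8 is taken as a loose sign of roughly linear scaling.<br />
<div style="margin:20px; margin-top:5px">
	<div class="smallfont" style="margin-bottom:2px">Code:</div>
	<pre class="alt2" dir="ltr" style="
		margin: 0px;
		padding: 6px;
		border: 1px inset;
		width: 640px;
		text-align: left;
		overflow: auto">
```python
# Sketch only: compare each query's time at 50M rows with its time at
# 6.25M rows. The data volume grows 8x, so a ratio near 8 suggests
# roughly linear scaling. Times are copied from the results above.
TIMES = {
    "Q_UNIQ":        (4.0147, 9.0174, 20.7797, 47.6740),
    "Q_DAYSTAT":     (0.3245, 0.6406, 1.2530, 2.4830),
    "Q_PROJECT_SUM": (0.1906, 0.5411, 0.7768, 1.7320),
    "Q_TOP50":       (1.0425, 2.1400, 3.6009, 6.4686),
}
ROWS_MILLIONS = (6.25, 12.5, 25.0, 50.0)


def growth_ratio(times):
    """Time at the largest dataset divided by time at the smallest."""
    return times[-1] / times[0]


data_growth = ROWS_MILLIONS[-1] / ROWS_MILLIONS[0]
for name, times in TIMES.items():
    print(f"{name}: time x{growth_ratio(times):.2f} "
          f"for data x{data_growth:.0f}")
```
</pre>
</div>Ratios well above 8 (Q_UNIQ, for example) hint that a query is scaling worse than linearly on this hardware, while ratios below 8 suggest some fixed cost is being spread over more rows.<br />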
<br />
Continued in <a href="http://community.ingres.com/forum/blogs/rhann/63-baseline-results-part-2.html" target="_blank">Part 2</a>...</div>

]]></content:encoded>
			<dc:creator>rhann</dc:creator>
			<dc:publisher>131</dc:publisher>
			<guid isPermaLink="true">http://community.actian.com/forum/blogs/rhann/62-baseline-results-part-1.html</guid>
		</item>
		<item>
			<title>Flattening a query</title>
			<link>http://community.actian.com/forum/blogs/rhann/61-flattening-query.html</link>
			<pubDate>Tue, 30 Mar 2010 11:20:52 GMT</pubDate>
			<description><![CDATA[So far, out of the 20 or 30 significantly complicated SELECT statements  I've thrown at VectorWise, it has simply executed them.   The most I've had...]]></description>
			<content:encoded><![CDATA[<div>So far, VectorWise has simply executed nearly all of the 20 or 30 significantly complicated SELECT statements I've thrown at it.  The most I've had to change is non-standard syntax, such as moving &quot;LIMIT 50&quot; at the end of the query to &quot;FIRST 50&quot; at the start.<br />
<br />
Only one of the queries (originally intended for MySQL) has caused it to complain:<br />
<div style="margin:20px; margin-top:5px">
	<div class="smallfont" style="margin-bottom:2px">Code:</div>
	<pre class="alt2" dir="ltr" style="
		margin: 0px;
		padding: 6px;
		border: 1px inset;
		width: 640px;
		height: 210px;
		text-align: left;
		overflow: auto">SELECT pj.project
FROM projects pj
     LEFT OUTER JOIN (SELECT DISTINCT ps.project_id AS pid
     FROM pagestat ps
     WHERE date_id IN (SELECT di.id
                       FROM datesinfo di
                       WHERE di.calmonth=2
                       AND di.calyear=2010
                       AND di.calday=16)) t1
    ON (pj.id=t1.pid)
WHERE t1.pid IS NULL
ORDER BY project;</pre>
</div>The problem is that Ingres wasn't able to flatten this query automatically, and VectorWise can run only flattened queries.<br />
<br />
Normally I say queries should be written in the way that seems obvious to <i>you</i> so they reveal the way you thought about the problem.  I believe that makes it easier to read and understand your code, and it is up to the optimizer to find an efficient way to materialize it.  So normally I wouldn't have a problem with this query although I might have written it differently myself and I am not sure why the subquery needs to be DISTINCT.<br />
<br />
Personally I would probably have written it like this in the first place:<br />
<div style="margin:20px; margin-top:5px">
	<div class="smallfont" style="margin-bottom:2px">Code:</div>
	<pre class="alt2" dir="ltr" style="
		margin: 0px;
		padding: 6px;
		border: 1px inset;
		width: 640px;
		height: 178px;
		text-align: left;
		overflow: auto">SELECT DISTINCT pj.project 
FROM projects pj
     LEFT JOIN ( pagestat ps JOIN datesinfo di
                               ON ps.date_id = di.id
                              AND di.calmonth = 2
                              AND di.calyear = 2010
                              AND di.calday = 16 )
     ON pj.id = ps.project_id
WHERE ps.project_id IS NULL
ORDER BY pj.project;</pre>
</div>But I don't want to be messing about re-writing every query that comes along, mostly because I don't want to have to understand what it does.  I'm lazy that way. <br />
<br />
In this case, the fix was to move the puzzling DISTINCT clause from the subquery to the &quot;outer&quot; query, and after that it just worked: <br />
<div style="margin:20px; margin-top:5px">
	<div class="smallfont" style="margin-bottom:2px">Code:</div>
	<pre class="alt2" dir="ltr" style="
		margin: 0px;
		padding: 6px;
		border: 1px inset;
		width: 640px;
		height: 210px;
		text-align: left;
		overflow: auto">SELECT DISTINCT  pj.project
FROM projects pj
     LEFT OUTER JOIN (SELECT ps.project_id AS pid
     FROM pagestat ps
     WHERE date_id IN (SELECT di.id
                       FROM datesinfo di
                       WHERE di.calmonth=2
                       AND di.calyear=2010
                       AND di.calday=16)) t1
    ON (pj.id=t1.pid)
WHERE t1.pid IS NULL
ORDER BY project;</pre>
</div>On the basis that all but one of the queries worked fine as they came, and this one was probably coded the way it was because of some mental quirk on the part of its author, it seems highly likely to me that automatically generated queries stand a very good chance of just working too.   Fingers crossed.</div>

]]></content:encoded>
			<dc:creator>rhann</dc:creator>
			<dc:publisher>131</dc:publisher>
			<guid isPermaLink="true">http://community.actian.com/forum/blogs/rhann/61-flattening-query.html</guid>
		</item>
		<item>
			<title>Upgrade Done</title>
			<link>http://community.actian.com/forum/blogs/rhann/60-upgrade-done.html</link>
			<pubDate>Tue, 30 Mar 2010 07:00:08 GMT</pubDate>
			<description><![CDATA[After a number of interruptions I got the beta-release of VectorWise 1.0 installed.  It wasn't an upgrade really, so I have spent most of my time...]]></description>
			<content:encoded><![CDATA[<div>After a number of interruptions I got the beta-release of VectorWise 1.0 installed.  It wasn't an upgrade really, so I have spent most of my time loading the data.    I don't imagine that future upgrades will be so disruptive, but even if they sometimes are, the fact that the database is effectively read-only means I'll be reloading it at intervals anyway. <br />
<br />
I haven't had much chance yet to thoroughly exercise it by running queries.<br />
<br />
One thing I <i>have</i> noticed is that VectorWise 1.0 makes better use of memory.  Just eye-balling the system monitor graph while building some of my tables (which involves doing some joins), memory use is down by a quarter or maybe even a third.</div>

]]></content:encoded>
			<dc:creator>rhann</dc:creator>
			<dc:publisher>131</dc:publisher>
			<guid isPermaLink="true">http://community.actian.com/forum/blogs/rhann/60-upgrade-done.html</guid>
		</item>
		<item>
			<title>The Wikistats Queries</title>
			<link>http://community.actian.com/forum/blogs/rhann/58-wikistats-queries.html</link>
			<pubDate>Fri, 26 Mar 2010 12:30:46 GMT</pubDate>
			<description><![CDATA[Marcin asked how complex my big table is.  I responded that I've been working on a posting about it but it's not ready to publish in full.  Thinking...]]></description>
			<content:encoded><![CDATA[<div>Marcin asked how complex my big table is.  I responded that I've been working on a posting about it but it's not ready to publish in full.  Thinking about it though, there's no reason I can't talk about the database design or the queries themselves, so here goes.<br />
<br />
Because I can't take away any real customer data to use in my VectorWise testing, and because I didn't want to spend a lot of time sitting in a customer's office running potentially unstable alpha-release software, and 'cos I'd already exhausted the limited challenges of my training database, I needed a substantial publicly available database on which to run realistic queries.  A few minutes on Google turned up the Wikipedia statistics and the work being done with them by Percona.com. They have proposed some <a href="http://www.percona.com/docs/wiki/benchmark:wikistat:start#queries" target="_blank">interesting queries</a> on the data, and they've helpfully named them (though they don't always use the names consistently, e.g. Q_TOP_PROJECTS is also referred to as Q_TOP20_PROJECTS).<br />
<br />
The <a href="http://www.percona.com/docs/wiki/_media/benchmark:wikistat:wikistat.png" target="_blank">database design </a>is a simple star schema like you find in data warehousing applications.<br />
<br />
There is a rapidly growing mountain of data available from Wikipedia so I grabbed just 28 days' worth to play with.  When I loaded it into the pagestat table I ended up with over three billion rows.   Unfortunately my desktop machine has too little memory to run all the interesting queries with the full dataset, but I can create <a href="http://community.ingres.com/forum/blogs/rhann/59-upgrade-day-creating-sample-datasets.html" target="_blank">smaller subsets</a> to make a start on testing how response time grows with data volume and maybe also attempt to extrapolate what is possible in principle if I had enough RAM.  Small datasets also allowed me to fully test <a href="http://community.ingres.com/forum/blogs/rhann/56-how-full-featured-vectorwise-select.html" target="_blank">how much of the Ingres SQL dialect is usable with VectorWise</a>.<br />
<br />
Just to tantalize you a little more, here are the running times for the Q_TOP50 query on 6.25M rows (0.2487 secs), 12.5M rows (0.4051 secs), 25M rows (0.7008 secs) and 50M rows (1.2959 secs).  I'll leave the extrapolation as an exercise for the reader. :)<br />
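<br />
For anyone who wants to try that exercise, one naive approach is to fit a straight line through the four timings and read off an estimate.  The sketch below does this in plain Python; it is an assumption-laden illustration (linear growth, memory never the bottleneck), not a prediction of what VectorWise would actually do on bigger data, and the function names are hypothetical.<br />
<div style="margin:20px; margin-top:5px">
	<div class="smallfont" style="margin-bottom:2px">Code:</div>
	<pre class="alt2" dir="ltr" style="
		margin: 0px;
		padding: 6px;
		border: 1px inset;
		width: 640px;
		text-align: left;
		overflow: auto">
```python
# Naive least-squares line through the four Q_TOP50 timings quoted above.
# Assumes the trend stays linear and that memory never becomes the limit;
# both are assumptions made for this illustration.
ROWS_M = [6.25, 12.5, 25.0, 50.0]          # fact-table rows, in millions
SECS = [0.2487, 0.4051, 0.7008, 1.2959]    # reported running times


def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(rows_millions, slope, intercept):
    """Straight-line estimate of running time in seconds."""
    return intercept + slope * rows_millions


slope, intercept = fit_line(ROWS_M, SECS)
print(f"about {slope:.4f} s per million rows, plus {intercept:.4f} s fixed")
print(f"100M rows: roughly {predict(100.0, slope, intercept):.2f} s")
```
</pre>
</div>Calling predict() with a larger row count (in millions) gives the straight-line estimate; treat anything far beyond 50M rows with suspicion, since the fit knows nothing about running out of RAM.<br />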
<br />
More soon.  I promise.</div>

]]></content:encoded>
			<dc:creator>rhann</dc:creator>
			<dc:publisher>131</dc:publisher>
			<guid isPermaLink="true">http://community.actian.com/forum/blogs/rhann/58-wikistats-queries.html</guid>
		</item>
		<item>
			<title>Upgrade Day, and Creating Sample Datasets</title>
			<link>http://community.actian.com/forum/blogs/rhann/59-upgrade-day-creating-sample-datasets.html</link>
			<pubDate>Fri, 26 Mar 2010 09:24:33 GMT</pubDate>
			<description><![CDATA[I've not posted anything for a few days, and it looks like today is going to be spent unloading my database so I can install VectorWise 1.0.  As much...]]></description>
			<content:encoded><![CDATA[<div>I've not posted anything for a few days, and it looks like today is going to be spent unloading my database so I can install VectorWise 1.0.  As much as I've enjoyed playing with the 0.8 release, the extra features in 1.0 are too inviting to ignore, and besides, what would be the point of familiarizing myself with the features (and limitations) of a stale version?<br />
<br />
I'm using unloaddb in the usual way.  Most of the commonly used utilities work with VectorWise, with only a few documented exceptions.  In this case unloaddb is unloading a mix of VectorWise tables and classic Ingres tables from the same database.  I did hit a snag with the script generated by unloaddb, but I reckon it's alpha software and besides it was easily fixed.  I logged the problem anyway.<br />
<br />
To kill some time while I am waiting for the unload to finish I thought I'd show how I created subsets of my largest table to produce quick test cases.  As I have mentioned repeatedly, one of my tables is really enormous.  Although I can run some queries on it, others are infeasible owing to the lack of physical resources on the machine I'm using.  Also, no matter how fast VectorWise is, there are times when I just want to prove that a sequence of steps produces a correct result so I want nearly instant response time and a dataset I can check by other means.  I'm not going to go through a 3.3 billion row table by hand to verify VectorWise's arithmetic.<br />
<br />
I decided to take my big table and produce a series of subsets of it, each half the size of the previous one.  The idea is that I can start any testing with the smallest subset (6.25 million rows).  It is small enough to load into classic Ingres to run verification queries in a reasonable amount of time, and it gives nearly instant response times using VectorWise.  I can then use a set twice the size to see if that works within my limited resources, and if it does, I can double the size again, and so on until I find the limit.<br />
<br />
To preserve a realistic distribution of values I created the first subset by randomly choosing half the rows from the 3.3 billion row table. Every row had an equal probability of being selected.  It took a bit less than 55 minutes to create a 1.6 billion row subset like so: <div style="margin:20px; margin-top:5px">
	<div class="smallfont" style="margin-bottom:2px">Code:</div>
	<pre class="alt2" dir="ltr" style="
		margin: 0px;
		padding: 6px;
		border: 1px inset;
		width: 640px;
		height: 82px;
		text-align: left;
		overflow: auto">create table subset as
select * from  rawdata
where randomf() &lt; .5
with structure=vectorwise;</pre>
</div>I repeated the procedure on that subset to get an 800 million row subset, then again on that one to get 400 million, and so on, finishing up with a 6.25 million row subset.  <br />
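<br />
The effect of repeated halving can be mimicked on a toy dataset.  This Python sketch is not what was run against the database (that was the SQL above); it invents a small list of project codes, keeps each row with probability one half in the same spirit as randomf() &lt; .5, and compares the relative frequency of one value before and after.  The data, the seed and the helper names are all made up for the demo.<br />
<div style="margin:20px; margin-top:5px">
	<div class="smallfont" style="margin-bottom:2px">Code:</div>
	<pre class="alt2" dir="ltr" style="
		margin: 0px;
		padding: 6px;
		border: 1px inset;
		width: 640px;
		text-align: left;
		overflow: auto">
```python
import random
from collections import Counter

# Toy re-creation of the repeated "keep each row with probability 0.5"
# sampling. The rows, project codes and seed are invented for the demo;
# only the halving idea comes from the post.
rng = random.Random(42)
rawdata = [rng.choice(["en", "en", "en", "de", "es"]) for _ in range(64000)]


def halve(rows, rng):
    """Keep each row independently with probability 0.5 (a fair coin flip)."""
    return [row for row in rows if rng.getrandbits(1)]


def share(rows, value):
    """Relative frequency of one key value in a list of rows."""
    return Counter(rows)[value] / len(rows)


subset = rawdata
sizes = [len(subset)]
for _ in range(3):
    subset = halve(subset, rng)
    sizes.append(len(subset))

print("sizes:", sizes)
print("share of 'en' before:", round(share(rawdata, "en"), 3))
print("share of 'en' after: ", round(share(subset, "en"), 3))
```
</pre>
</div>Each pass should leave about half the rows, and the share of 'en' should stay close to its original value, which is the property the sampling is meant to preserve.<br />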
<br />
The subsets are subsets of my rawdata.  Each subset was used to populate the tables of its corresponding star schema.  It was easy to verify that this procedure had successfully chosen a representative sample preserving the relative frequency of key values so that the only difference was the size of the dataset.</div>

]]></content:encoded>
			<dc:creator>rhann</dc:creator>
			<dc:publisher>131</dc:publisher>
			<guid isPermaLink="true">http://community.actian.com/forum/blogs/rhann/59-upgrade-day-creating-sample-datasets.html</guid>
		</item>
		<item>
			<title>Big Tables</title>
			<link>http://community.actian.com/forum/blogs/rhann/57-big-tables.html</link>
			<pubDate>Tue, 23 Mar 2010 07:57:29 GMT</pubDate>
			<description><![CDATA[I've been droning on about the speed of VectorWise in the last few posts because that is its most eye-catching feature.   
 
Before I allow my...]]></description>
			<content:encoded><![CDATA[<div>I've been droning on about the speed of VectorWise in the last few posts because that is its most eye-catching feature.  <br />
<br />
Before I allow my attention to be drawn back to speed again, I do want to say a few words about another quality which is probably at least as important if not quite so glamorous, and that is the compactness of a VectorWise database.  <br />
<br />
VectorWise tables are potentially very much smaller than classic Ingres tables containing the same information.  VectorWise tables are compressed to overcome the limited bandwidth of the disk subsystem and even main memory.  The compression algorithm is designed to allow maximum information to be moved as fast as possible by compressing it without incurring too much computational overhead.   This makes the database pretty small but also allows it to be inflated pretty fast.   (It could be made smaller, but doing so would make it slower which would defeat the point of it.)<br />
<br />
As an example, I have a table with 3.3 billion 1094-byte rows which occupies 180GB.  The equivalent classic Ingres B-tree table using 16k pages would need about 5TB just for the data pages, never mind the overheads.  With normal Ingres DATA compression and a bit of luck you might get it down to a quarter of that, but it would still be six or seven times bigger than the VectorWise table.<br />
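<br />
The arithmetic behind those ratios can be checked with a few lines of Python.  This is a back-of-envelope sketch using only the figures quoted above; it assumes decimal units (1 GB = 10^9 bytes), which the post does not specify, and it ignores page and index overheads.<br />
<div style="margin:20px; margin-top:5px">
	<div class="smallfont" style="margin-bottom:2px">Code:</div>
	<pre class="alt2" dir="ltr" style="
		margin: 0px;
		padding: 6px;
		border: 1px inset;
		width: 640px;
		text-align: left;
		overflow: auto">
```python
# Back-of-envelope check of the sizes quoted above. Decimal units are an
# assumption made here (1 GB = 10**9 bytes, 1 TB = 10**12 bytes).
GB = 10**9
TB = 10**12

rows = 3.3e9
row_bytes = 1094
vectorwise_bytes = 180 * GB
btree_bytes = 5 * TB                 # the post's estimate for data pages
compressed_btree_bytes = btree_bytes / 4

raw_bytes = rows * row_bytes         # uncompressed row data, no page overhead

print(f"raw row data:         {raw_bytes / TB:.2f} TB")
print(f"raw vs VectorWise:    {raw_bytes / vectorwise_bytes:.1f}x")
print(f"B-tree vs VectorWise: {btree_bytes / vectorwise_bytes:.1f}x")
print(f"compressed B-tree vs VectorWise: "
      f"{compressed_btree_bytes / vectorwise_bytes:.1f}x")
```
</pre>
</div>A quarter of 5 TB is 1.25 TB, which is just under seven times the 180 GB VectorWise table, in line with the estimate above.<br />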
<br />
There are all sorts of obvious benefits from the compression even if we ignore the speed.   For a start, it means we can keep an entire data warehouse on a desktop or a laptop.  It also means we can start to think of using SSD (solid-state disk) without having to spend squillions to buy enough capacity.   SSDs dramatically reduce power requirements (needing as little as 1/30th as much as a conventional disk), which is good if you have CO2 reduction targets to meet or just power bills to slash.  You can also put them in places you wouldn't put disks.<br />
<br />
SSDs are also potentially very, very fast, offering transfer speeds orders of magnitude faster than disks, and I/O rates two orders of magnitude faster.  (Hmm.  I seem to be gnawing on the ol' speed bone again.)  <br />
<br />
Other advantages of compactness are reduced time to do backups; the ability to retain more backups; the ability to retain data you might once have reluctantly deleted, or just the ability to keep pace with the <a href="http://www.economist.com/surveys/displaystory.cfm?story_id=15557443" target="_blank">unstoppable torrent</a> of data your business is generating.<br />
<br />
Of course if you have a small VectorWise table you can always choose not to compress it.</div>

]]></content:encoded>
			<dc:creator>rhann</dc:creator>
			<dc:publisher>131</dc:publisher>
			<guid isPermaLink="true">http://community.actian.com/forum/blogs/rhann/57-big-tables.html</guid>
		</item>
		<item>
			<title>How Full-Featured is the VectorWise SELECT?</title>
			<link>http://community.actian.com/forum/blogs/rhann/56-how-full-featured-vectorwise-select.html</link>
			<pubDate>Thu, 18 Mar 2010 16:19:26 GMT</pubDate>
			<description><![CDATA[I have been tied up with other things for the last couple of days so I haven't got around to capturing the results of querying my big database yet. ...]]></description>
			<content:encoded><![CDATA[<div>I have been tied up with other things for the last couple of days so I haven't got around to capturing the results of querying my big database yet.  For now, I'm going to report some more very early experiences from a couple of weeks ago.<br />
<br />
The database I'm writing about today implements a simple star-schema for analysing Wikipedia page statistics.  The Wikipedia statistics are freely available like everything else on Wikipedia, and they are ideal for my purpose here because it's not synthetic data and the dataset is immense (growing at about 5 million rows/hour).<br />
<br />
To start with I loaded three hours' data, which is about 15 million rows.  Because the raw data is not in a form suitable for loading directly into the star schema I needed to do some preliminary processing on it (e.g. attaching surrogate keys to the rows in the dimension tables, and also working out what day of the week each date is).  VectorWise doesn't currently allow updating so I had to pre-process the data using a mix of Linux tools and classic Ingres before loading it.  (I should have used a proper ETL tool but my ETL expert wasn't available, and I was impatient.)  Once I'd loaded the pre-processed data I was able to construct my fact table using VectorWise with a <font face="Courier New">CREATE TABLE pagestat AS SELECT...WITH STRUCTURE=VECTORWISE</font> statement.  It could hardly be easier.  (Unfortunately I don't have a record of how long it took, but nothing took more than a few seconds.)<br />
<br />
Let's take a look at some queries.  One of my tables is called <b>pages</b>.  How many rows in it?<br />
<div style="margin:20px; margin-top:5px">
	<div class="smallfont" style="margin-bottom:2px">Code:</div>
	<pre class="alt2" dir="ltr" style="
		margin: 0px;
		padding: 6px;
		border: 1px inset;
		width: 640px;
		height: 226px;
		text-align: left;
		overflow: auto">-- report database size
SELECT count(*) from pages;
Fri Mar  5 15:00:41 2010
Executing . . .


+----------------------+
|col1                  |
+----------------------+
|              14994911|
+----------------------+
(1 row)
Fri Mar  5 15:00:41 2010</pre>
</div>That took much less than a second to count almost 15 million rows.  That is going to be handy in future.  I frequently want to use a row count in a more complex calculation.  Now, instead of computing it ahead of time, storing it in a session global temporary table, and writing some potentially obscure SQL every time I need it, I can just do a count.  <br />
<br />
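As a rough sketch of using a count inside a larger calculation, the query below folds the total row count into a per-project percentage by joining to a one-row derived table.  The statement is untested; it assumes only the <b>pagestat</b> columns that appear elsewhere in this post, and the names <font face="Courier New">t</font>, <font face="Courier New">total_rows</font>, <font face="Courier New">row_cnt</font> and <font face="Courier New">pct_of_rows</font> are made up for the example.<br />
<div style="margin:20px; margin-top:5px">
	<div class="smallfont" style="margin-bottom:2px">Code:</div>
	<pre class="alt2" dir="ltr" style="
		margin: 0px;
		padding: 6px;
		border: 1px inset;
		width: 640px;
		height: 146px;
		text-align: left;
		overflow: auto">-- Sketch (untested): each project's share of the rows in pagestat
SELECT ps.project_id,
       count(*) AS row_cnt,
       100.0 * count(*) / t.total_rows AS pct_of_rows
FROM pagestat ps,
     (SELECT count(*) AS total_rows FROM pagestat) t
GROUP BY ps.project_id, t.total_rows
ORDER BY pct_of_rows DESC;</pre>
</div>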
Let's try something more complex.  For each day in a specific month, how many page impressions were delivered?<br />
<div style="margin:20px; margin-top:5px">
	<div class="smallfont" style="margin-bottom:2px">Code:</div>
	<pre class="alt2" dir="ltr" style="
		margin: 0px;
		padding: 6px;
		border: 1px inset;
		width: 640px;
		height: 306px;
		text-align: left;
		overflow: auto">-- Q_DAYSTAT (year,month) 
SELECT di.caldate, sum(ps.page_count) 
FROM pagestat ps JOIN datesinfo di 
ON di.id=ps.date_id  
WHERE di.calmonth=2 and di.calyear=2010 
GROUP BY di.caldate 
ORDER BY di.caldate;
Fri Mar  5 15:00:55 2010
Executing . . .


+----------+----------------------+
|caldate   |col2                  |
+----------+----------------------+
|2010-02-16|              87899109|
+----------+----------------------+
(1 row)
Fri Mar  5 15:00:57 2010</pre>
</div>So far I have data for just three hours so there is only one date.  This was a less trivial query though, and it took about 2 seconds to query the pagestat table's ~14 million rows.<br />
<br />
I'm not going to bother reporting any more times using this little database running on a puny desktop machine. (I'll come back to the speed reports when I look at my big database.)  As I stated in my first posting about my objectives, I am more interested in functionality.  VectorWise is currently more or less read-only, but it supports nearly all the Ingres SQL SELECT syntax.  Here are some examples of queries that it successfully executes:<br />
<div style="margin:20px; margin-top:5px">
	<div class="smallfont" style="margin-bottom:2px">Code:</div>
	<pre class="alt2" dir="ltr" style="
		margin: 0px;
		padding: 6px;
		border: 1px inset;
		width: 640px;
		height: 146px;
		text-align: left;
		overflow: auto">-- Q_TOP_PROJECTS ( N,year,month )
SELECT FIRST 20 pj.project, sum(ps.page_count) AS sm 
FROM pagestat ps JOIN datesinfo di ON di.id=ps.date_id 
                 JOIN projects pj ON pj.id=ps.project_id  
WHERE di.calmonth=2 
  AND di.calyear=2010 
GROUP BY pj.project 
ORDER BY sm DESC;</pre>
</div><div style="margin:20px; margin-top:5px">
	<div class="smallfont" style="margin-bottom:2px">Code:</div>
	<pre class="alt2" dir="ltr" style="
		margin: 0px;
		padding: 6px;
		border: 1px inset;
		width: 640px;
		height: 98px;
		text-align: left;
		overflow: auto">-- Q_NOPROJECTS(Year,Month,Day) 
SELECT pj.project 
FROM projects pj LEFT OUTER JOIN pagestat ps ON ps.project_id = pj.id 
WHERE ps.project_id IS NULL 
ORDER BY project;</pre>
</div><div style="margin:20px; margin-top:5px">
	<div class="smallfont" style="margin-bottom:2px">Code:</div>
	<pre class="alt2" dir="ltr" style="
		margin: 0px;
		padding: 6px;
		border: 1px inset;
		width: 640px;
		height: 162px;
		text-align: left;
		overflow: auto">-- Q_10000HITS_2(Year,Month,Day)
SELECT FIRST 50  pg.page, t.mx 
FROM pages pg JOIN (SELECT ps.page_id, max(ps.page_count) AS mx  
                    FROM pagestat ps JOIN datesinfo di ON di.id=ps.date_id 
                    WHERE di.calmonth=2 
                      AND di.calyear=2010 
                      AND di.calday=16   
                    GROUP BY ps.page_id  
                    HAVING max(page_count) &gt; 10000) t ON t.page_id=pg.id;</pre>
</div>The SQL need not be particularly elegant:<br />
<div style="margin:20px; margin-top:5px">
	<div class="smallfont" style="margin-bottom:2px">Code:</div>
	<pre class="alt2" dir="ltr" style="
		margin: 0px;
		padding: 6px;
		border: 1px inset;
		width: 640px;
		height: 210px;
		text-align: left;
		overflow: auto">-- Q_NOPROJECTS_1(Year,Month,Day)
SELECT pj.project 
FROM projects pj LEFT OUTER JOIN (SELECT distinct ps.project_id AS pid 
      FROM pagestat ps
      WHERE date_id IN (SELECT di.id 
             FROM datesinfo di 
             WHERE di.calmonth=2 
                  AND di.calyear=2010 
                  AND di.calday=16)) t1 
ON pj.id=t1.pid 
WHERE t1.pid IS NULL 
ORDER BY project;</pre>
</div>VectorWise is not <i>entirely</i>  read-only:<br />
<div style="margin:20px; margin-top:5px">
	<div class="smallfont" style="margin-bottom:2px">Code:</div>
	<pre class="alt2" dir="ltr" style="
		margin: 0px;
		padding: 6px;
		border: 1px inset;
		width: 640px;
		height: 258px;
		text-align: left;
		overflow: auto">CREATE TABLE pagestat_daily 
(
   page_id int NOT NULL,
   date_id_from smallint  NOT NULL,
   date_id_to smallint  NOT NULL,
   project_id smallint  NOT NULL,
   page_count int  NOT NULL 
) WITH STRUCTURE=VECTORWISE;

INSERT INTO pagestat_daily (project_id, page_id, date_id_from, date_id_to, page_count ) 
SELECT ps.project_id, ps.page_id, min(ps.date_id), max(ps.date_id), sum(ps.page_count) 
FROM pagestat ps JOIN datesinfo di ON ps.date_id=di.id 
WHERE calmonth=2 
  AND calyear=2010 
GROUP BY ps.project_id, ps.page_id, di.caldate;</pre>
</div>There is one caveat that probably should be borne in mind.  If your query contains a subquery, as many of these do, VectorWise can execute it only if the subquery can be flattened.  Ingres always tries to flatten queries and is very good at it; it automatically flattened all the ones above.  But not all queries can be flattened automatically.  If you run into one of those cases—which so far I haven't—you will have to re-write it so that it can be.</div>
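<div>To make the idea of flattening concrete, here is a hand-written illustration based on the inner part of Q_NOPROJECTS_1.  Both statements should return the same set of project ids; the second simply expresses the IN subquery as a join.  This is only a sketch of the kind of rewrite a non-flattenable query might need, not a description of what the Ingres optimizer produces internally, and it has not been run.<br />
<div style="margin:20px; margin-top:5px">
	<div class="smallfont" style="margin-bottom:2px">Code:</div>
	<pre class="alt2" dir="ltr" style="
		margin: 0px;
		padding: 6px;
		border: 1px inset;
		width: 640px;
		height: 274px;
		text-align: left;
		overflow: auto">-- Subquery form, as used inside Q_NOPROJECTS_1
SELECT DISTINCT ps.project_id AS pid
FROM pagestat ps
WHERE ps.date_id IN (SELECT di.id
                     FROM datesinfo di
                     WHERE di.calmonth=2
                       AND di.calyear=2010
                       AND di.calday=16);

-- Hand-flattened equivalent (illustrative, untested): the subquery becomes a join
SELECT DISTINCT ps.project_id AS pid
FROM pagestat ps JOIN datesinfo di ON di.id=ps.date_id
WHERE di.calmonth=2
  AND di.calyear=2010
  AND di.calday=16;</pre>
</div>Because of the DISTINCT, the join form returns each project id once, just as the IN form does.</div>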

]]></content:encoded>
			<dc:creator>rhann</dc:creator>
			<dc:publisher>131</dc:publisher>
			<guid isPermaLink="true">http://community.actian.com/forum/blogs/rhann/56-how-full-featured-vectorwise-select.html</guid>
		</item>
		<item>
			<title><![CDATA[It's Not a Gold-Plated Lily (Yet)]]></title>
			<link>http://community.actian.com/forum/blogs/rhann/55-its-not-gold-plated-lily-yet.html</link>
			<pubDate>Tue, 16 Mar 2010 11:49:58 GMT</pubDate>
			<description><![CDATA[Judging from some of the very enthusiastic comments I've received, I thought I should post a quick item on what VectorWise can't do (yet). 
 
For...]]></description>
			<content:encoded><![CDATA[<div>Judging from some of the very enthusiastic comments I've received, I thought I should post a quick item on what VectorWise can't do (yet).<br />
<br />
For very good and obvious reasons the developers have had to set some priorities for the initial release of VectorWise, so it won't do everything that is possible in principle right from Day 1.  The alpha-release and the beta-release that you'll see soon are intended for analytics and BI.<br />
<br />
In practice this means that you can load data into it, you can query it with very few restrictions, and you can load more data into it.  You can also clear out tables to load new data.  But that is about it.  In particular you can't update tables, and nothing is transactional.  You won't be able to use VectorWise tables for OLTP for quite some time yet.<br />
<br />
Another notable limitation at present is that a single query cannot refer to a mix of classic and VectorWise tables, although both types of table can exist in a single database.  (There is a way to get around this limitation, but I'll save that for another day.)</div>

]]></content:encoded>
			<dc:creator>rhann</dc:creator>
			<dc:publisher>131</dc:publisher>
			<guid isPermaLink="true">http://community.actian.com/forum/blogs/rhann/55-its-not-gold-plated-lily-yet.html</guid>
		</item>
		<item>
			<title>Running Some Hello World Queries: Instant Gratification</title>
			<link>http://community.actian.com/forum/blogs/rhann/54-running-some-hello-world-queries-instant-gratification.html</link>
			<pubDate>Tue, 16 Mar 2010 10:12:06 GMT</pubDate>
			<description>As we saw, loading data into VectorWise tables is no different than loading it into conventional Ingres tables.  Almost all the same data types are...</description>
			<content:encoded><![CDATA[<div>As we saw, loading data into VectorWise tables is no different than loading it into conventional Ingres tables.  Almost all the same data types are now supported, with the exception of long &quot;VAR&quot; types, the Ingres DATE type (use ANSI dates instead), and a couple of other fairly obscure ones.  <br />
<br />
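For instance, a table definition along the following lines uses the ANSI date type instead of the Ingres DATE type.  This is a hypothetical sketch, not a table from the model shown below: the table name, the column names and the type choices are all assumptions made for illustration, and the statement is untested.<br />
<div style="margin:20px; margin-top:5px">
	<div class="smallfont" style="margin-bottom:2px">Code:</div>
	<pre class="alt2" dir="ltr" style="
		margin: 0px;
		padding: 6px;
		border: 1px inset;
		width: 640px;
		height: 114px;
		text-align: left;
		overflow: auto">-- Hypothetical table (untested sketch): ANSIDATE in place of the Ingres DATE type
CREATE TABLE order_dates
(
   ordernr int NOT NULL,
   orderdate ansidate NOT NULL
) WITH STRUCTURE=VECTORWISE;</pre>
</div>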
I finished loading my &quot;Hello World&quot; database in two or three seconds.  It took another couple of seconds to run <b>optimizedb </b>on the VectorWise tables.  (Running optimizedb is currently an obligatory manual step when using VectorWise, but that is expected to be automated in the next release.)   My classic Ingres tables took about the same time to load, give or take, plus I took the time to index them in a reasonable way.  <br />
<br />
The complete model is shown below.<br />
<img src="http://community.actian.com/forum/blogs/rhann/attachments/2d1268731373-running-some-hello-world-queries-instant-gratification-vectorwise-sm.jpg" border="0" alt="Name:  vectorwise-sm.jpg
Views: 354
Size:  19.1 KB" style="margin: 2px" /><br />
<br />
So, time to run some queries.  The first one was:<br />
<div style="margin:20px; margin-top:5px">
	<div class="smallfont" style="margin-bottom:2px">Code:</div>
	<pre class="alt2" dir="ltr" style="
		margin: 0px;
		padding: 6px;
		border: 1px inset;
		width: 640px;
		height: 146px;
		text-align: left;
		overflow: auto">-- Show filled order lines for part numbers 
-- in the range 325000000 to 330000000

SELECT *
FROM orders o, orderitems oi
WHERE o.filled = 'Y'
    AND oi.partnr BETWEEN 325000000 AND 330000000
    AND o.ordernr = oi.ordernr;</pre>
</div>Classic Ingres took 1.5371 seconds to complete this query.  Ingres with VectorWise took 0.0688 seconds, which is a little more than <b>22 times faster</b>.  Well, that was pleasing, especially considering that by this point I'd accumulated a total of about 10 minutes' experience with the product since first installing it.<br />
<br />
The next one I tried was this:<br />
<div style="margin:20px; margin-top:5px">
	<div class="smallfont" style="margin-bottom:2px">Code:</div>
	<pre class="alt2" dir="ltr" style="
		margin: 0px;
		padding: 6px;
		border: 1px inset;
		width: 640px;
		height: 162px;
		text-align: left;
		overflow: auto">-- Show profit by division and credit status

SELECT d.division,c.creditok,
     sum(oi.qty*oi.price)-sum(oi.qty*p.price) AS profit
FROM customers c JOIN orders o ON c.customernr = o.customernr
    JOIN divisions d ON o.divisionnr = d.divisionnr
    JOIN orderitems oi ON o.ordernr = oi.ordernr
    JOIN parts p ON p.partnr = oi.partnr
GROUP BY d.division,c.creditok</pre>
</div>This query, which joins all my tables with no constant restrictions, took 3.3486 seconds with classic Ingres and 0.2111 seconds with VectorWise, making VectorWise almost 16 times faster on this query.  Considering how hard we're usually willing to work to speed queries up by factors of 2 or 3, that's pretty darned good.  I now have a total of 15 minutes' experience with the product--most of it spent typing the commands.<br />
<br />
Let's try one last one:<br />
<div style="margin:20px; margin-top:5px">
	<div class="smallfont" style="margin-bottom:2px">Code:</div>
	<pre class="alt2" dir="ltr" style="
		margin: 0px;
		padding: 6px;
		border: 1px inset;
		width: 640px;
		height: 178px;
		text-align: left;
		overflow: auto">-- Show the profit on the highest priced item 
-- on each unfilled order
SELECT o.ordernr, p.description,
    oi.qty * (oi.price - p.price) as profit
FROM orders o JOIN orderitems oi ON o.ordernr = oi.ordernr
    JOIN parts p ON p.partnr = oi.partnr
WHERE o.filled = 'N'
    AND oi.price = (SELECT MAX(price)
                    FROM orderitems oi1
                    WHERE oi1.ordernr = oi.ordernr)</pre>
</div>Classic Ingres didn't do too badly with this one, taking 0.7243 seconds over it.  Ingres with added VectorWise took 0.1363 seconds; a bit more than 5 times faster.<br />
<br />
Apart from the obvious fact that VectorWise is a lot faster, so much so that clear differences are visible even with tiny toy databases, it is worth pointing out that VectorWise is lending its processing speed to Ingres.  Ingres is providing the SQL parser (and optimizer) so most of your existing Ingres queries will just work, although there are currently a few limitations that I'll discuss in more detail in future.  I will also start looking at some properly big tables next time.</div>

]]></content:encoded>
			<dc:creator>rhann</dc:creator>
			<dc:publisher>131</dc:publisher>
			<guid isPermaLink="true">http://community.actian.com/forum/blogs/rhann/54-running-some-hello-world-queries-instant-gratification.html</guid>
		</item>
	</channel>
</rss>

