Strix Devlogs

Re: Strix Devlogs

by dendiz » Tue Dec 11, 2018 6:13 am

So some more progress this week, mostly optimization of existing jobs - I needed to get things running smoothly before I can concentrate on the new features I have planned. The issues for the new features are on gitea waiting to be tackled, but accumulating too much technical debt makes it harder in the long run to keep the system running smoothly. Here's a summary of this devlog:

Bug fixes
- stoch overfiring
- fix premature exit in predictions

Optimizations
- Perf Eval runs in parallel now
- parallel scans, ohlcv
- Module run optimizations: don't run if no new data, don't sync if no data on IEX
- Skip untriggered events in perf eval
- Class property connections on dao
- Parallel correlation proc
- sql2o batching sux, implement grouped insert
- Save ohlcv to KV store for faster access
- track last scan date
- extract exchange param (job pipeline optimization)

Changes
- Turtle exit for predictions
- BBSqueeze detection change

New features
- Bollinger/ma/ohlcv in KV store, display charts for scans

First off, a couple of bug fixes. Finding bugs in this system is notoriously difficult, as testing isn't done on behaviors but on generated data. Verifying the data against a source of truth is quite time consuming and I only have a certain amount of time I can dedicate to manual testing. One of the defects that caught my attention this week was stochastic signals firing for both oversold and overbought on the same day, which is absolute nonsense. At first I thought it was a date issue because the symbols I checked were from the top gainers and had huge increases that could pull up the stochastic from oversold to overbought in a day. My estimate was that the stochastic was oversold on day T-1 and became overbought on day T. But deeper investigation and hours of debugging showed that this was not the case. The triggering code for the indicator is fairly straightforward:

Code:

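// `zone` is a placeholder name: the expression compared against the string literals was lost from the post
if ("overbought".equals(zone) && d[len - 2] < ob && ob < d[len - 1]) {
    return new ScanResult(data, def);
}
if ("oversold".equals(zone) && d[len - 2] > os && os > d[len - 1]) {
    return new ScanResult(data, def);
}
// the buggy branch: it parses as (neutral && dropped back below ob) || (rose back above os)
if ("neutral".equals(zone) && (d[len - 2] > ob && ob > d[len - 1]) ||
        (d[len - 2] < os && os < d[len - 1])) {
    return new ScanResult(data, def);
}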
There is a subtle bug in this code even though it looks quite simple. The 3rd condition has the shape a && b || c. In Java && binds tighter than ||, so it is evaluated as (a && b) || c, which means the last crossing check fires no matter which scan is being evaluated, and both the overbought and the oversold scans can trigger on the same day :roll: The correct code is a && (b || c). Yet again the great syntax of Java produces a bug that is easy to miss and will make you go blind in the process. Well, it did cost a couple of hours but at least it was an easy fix.
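To make the grouping issue concrete, here is a minimal sketch that reduces the third branch to three booleans. The class and parameter names are made up for illustration and are not taken from the Strix code.

```java
// Hypothetical reduction of the third branch of the scan trigger to plain booleans.
final class PrecedenceDemo {
    // Written without parentheses: Java groups this as (isNeutral && leftOverbought) || leftOversold.
    static boolean buggy(boolean isNeutral, boolean leftOverbought, boolean leftOversold) {
        return isNeutral && leftOverbought || leftOversold;
    }

    // Intended grouping: the scan type guards both crossing checks.
    static boolean fixed(boolean isNeutral, boolean leftOverbought, boolean leftOversold) {
        return isNeutral && (leftOverbought || leftOversold);
    }
}
```

With isNeutral set to false and leftOversold set to true, buggy(...) still returns true while fixed(...) returns false, which is exactly the case where a non-neutral scan fired by accident.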

Another bug that I introduced during the performance evaluator development came to light after I tackled some optimization tasks on that module. I was expecting way more scans to be scored than there were, which led me to investigate the scoring code, and that revealed a premature exit from the evaluation loop. A misplaced return statement instead of a continue statement was causing the loop to terminate early and not score the remaining predictions. Another easy fix at least.
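The difference between the two statements is easy to show on a toy loop. This is only a sketch: the names are hypothetical and a negative number stands in for a prediction that cannot be scored.

```java
import java.util.ArrayList;
import java.util.List;

// Toy version of a scoring loop; negative values stand in for unscorable predictions.
final class ScoreLoopDemo {
    // Bug: `return` leaves the whole method, so everything after the first skip is lost.
    static List<Integer> scoreWithReturn(int[] predictions) {
        List<Integer> scored = new ArrayList<>();
        for (int p : predictions) {
            if (p < 0) {
                return scored;
            }
            scored.add(p);
        }
        return scored;
    }

    // Fix: `continue` skips only the current prediction and keeps looping.
    static List<Integer> scoreWithContinue(int[] predictions) {
        List<Integer> scored = new ArrayList<>();
        for (int p : predictions) {
            if (p < 0) {
                continue;
            }
            scored.add(p);
        }
        return scored;
    }
}
```

For the input {1, -1, 2, 3} the first version scores one prediction and the second scores three.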

Optimizations were the meat of this week's work. Originally I thought that running the modules on a single thread with good caching would suffice in terms of performance but I was wrong. :| So I went ahead and parallelized the portions of the modules where it made sense. These were:

- Performance Evaluations
- Scans
- Correlations processing

Thanks to Java 8 parallel streams this turned out to be quite easy. I just had to make sure that critical sections in the code were atomic, and that I used classes that were thread safe. One aspect to take into consideration is preserving the locality of the cache: don't evict items from the cache only to have a second loop query the same item again. So the order of processing is important.

Code:

symbols -> combos -> dates
Processing in this order makes sure that the cache already contains the symbol data ready to be processed, so no DB query is needed. Initially I had just parallelized the dates loop but that loop doesn't contain enough data to make it worthwhile. The CPU cores were only busy at 60%, which is not ideal. So I moved the parallel streaming up 2 levels to the symbol level, which now utilizes all cores at 100%. PerfEval and Scans use the same code base so that was a bit easier than the Correlation Processor, which needed some extra attention due to memory issues. I operate under a RAM constraint (because I need to keep server costs to a minimum) so I needed to implement a specific cache for correlation calculations that just holds the last 30 closing prices and symbols.
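A rough sketch of what parallelizing at the symbol level can look like with Java 8 streams. The class, the method and the counting "work" are placeholders, not the real scan code; the point is that the outer symbol loop is the parallel one, the inner combo and date loops stay sequential per symbol, and results go into a thread-safe map.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical scan driver: parallel over symbols, sequential over combos and dates.
final class ParallelScanDemo {
    static Map<String, Integer> run(List<String> symbols, List<String> combos, List<String> dates) {
        // ConcurrentHashMap is safe to write to from several worker threads.
        Map<String, Integer> resultsPerSymbol = new ConcurrentHashMap<>();
        symbols.parallelStream().forEach(symbol -> {
            // Everything for one symbol runs on one thread, so its data is loaded once and reused.
            int evaluated = 0;
            for (String combo : combos) {
                for (String date : dates) {
                    evaluated++; // placeholder for evaluating one (symbol, combo, date) triple
                }
            }
            resultsPerSymbol.put(symbol, evaluated);
        });
        return resultsPerSymbol;
    }
}
```

Keeping a whole symbol on a single worker is what preserves the cache locality described above: no other thread needs that symbol's data while it is being processed.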

The second area of optimization was syncing data from IEX and running scans on the synced data. I currently orchestrate the module jobs via Jenkins and the pipeline only supports triggering based on a fail/success return code. This means that every time I run a sync job, a scan job will trigger on success - even if there is no new data. It will just calculate the same results over and over again. This wasn't really a problem when the scan job only took a couple of minutes to complete with 1 year's worth of data but it doesn't work with 5 years of data. So I implemented a check in the scan module that will only run the scans if the latest scan date for a symbol is earlier than the latest OHLC date. The job will still run but it will just skip the scans, so it takes only about 3 - 4 minutes to complete as opposed to 1 - 1.5 hours.

Another issue is that IEX doesn't clearly define when they will update the API with the day's stock data. Previously I was fetching the data at 7 pm and 11 pm local time and processing it even if it was old (the 7 pm fetch could be stale, but they would have the data updated by 11 pm for sure). But this means that the new scans/signals are not shown until almost 12 am, which is less than ideal. I didn't want to check every hour or two because it would trigger the whole job pipeline and it's a lot of data to download. My solution to this came after I discovered an API endpoint that lists the symbols, with a date field that shows when it was updated. Why didn't I just check a symbol for the last date? Because on any given date a symbol may not be traded. The probability of AAPL not being traded is rather low, but still I prefer a robust solution if there is one. The cost of making this API call to the symbol list is low, so now I poll every hour for new data between 4 pm - 11 pm on weekdays. There is still no way to abort the pipeline without a failure on Jenkins, but with the scan check this is now less of a problem.
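The scan-skipping check boils down to a date comparison. A minimal sketch, assuming both dates are available as LocalDate values; the class name, the method name and the null handling are my own assumptions.

```java
import java.time.LocalDate;

// Hypothetical gate in front of the scan run for one symbol.
final class ScanGate {
    // lastScan may be null for a symbol that has never been scanned.
    // latestOhlc may be null when no price data has been synced yet.
    static boolean shouldScan(LocalDate lastScan, LocalDate latestOhlc) {
        if (latestOhlc == null) {
            return false; // nothing to scan at all
        }
        // Scan only when there is price data newer than the last scan.
        return lastScan == null || lastScan.isBefore(latestOhlc);
    }
}
```

When the sync job brought no new rows, lastScan equals latestOhlc and the scans are skipped, which is what turns the hour-long run into a few minutes.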

A huge pain point was the duration of the performance evaluation. I realized this week that I was doing a lot of unnecessary processing that was causing the job to take forever. I only needed to calculate performance for the scans that had been triggered for that symbol, and I was running all the scans for that symbol. Duh! :roll: With a new filter that skips scans that are not relevant, plus parallel processing, the performance evaluation now takes 1 hour to complete, which is reasonable. Even though I won't be running this job that often it kind of became my holy grail to optimize this. Looking back at the comments on the issue on gitea, the first iteration resulted in 1K scored scans and 6K unscored (this was due to the bug I mentioned previously, which at the time I didn't know about). This was way too little so I thought I'd throw more data at it and increased the data interval from 1 year to 5 years. This increase resulted in 2.4K vs 6K. Still not good enough. After fixing the premature loop termination issue the final ratio is 5K/6K, which to me looks OK. It is possible that some scans just didn't occur frequently enough to be scored.

The DAO layer also got some love this week. I had previously implemented the DAOs in a way that each operation would open a new connection to the DB, even though this is not good practice. I didn't want to integrate a connection pooling solution as the processes are not long lived, so I just refactored the connection objects to be reused class wide. I'm using a relatively new SQL library called sql2o which has a nice plain API for DB operations, but the way they implemented batch insertions is not optimal. It just wraps the inserts in a transaction and still inserts each row individually. MySQL's grouped insert performs much better than this so I refactored batch inserts to generate a grouped insert query. This increased the performance of the inserts quite a bit, even though I didn't measure by how much.
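For illustration, generating a grouped (multi-row) insert statement can be as small as the sketch below. It only builds the SQL text with placeholders; binding the values and executing the statement are left out, and the table and column names in the usage note are made up.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical helper that builds one multi-row INSERT instead of one INSERT per row.
final class GroupedInsert {
    // table and columns must come from code, never from user input, since they are concatenated.
    static String build(String table, List<String> columns, int rowCount) {
        if (rowCount < 1 || columns.isEmpty()) {
            throw new IllegalArgumentException("need at least one row and one column");
        }
        // One "(?, ?, ...)" group per row.
        String rowPlaceholders =
                "(" + String.join(", ", Collections.nCopies(columns.size(), "?")) + ")";
        return "INSERT INTO " + table
                + " (" + String.join(", ", columns) + ") VALUES "
                + String.join(", ", Collections.nCopies(rowCount, rowPlaceholders));
    }
}
```

Calling build("ohlcv", Arrays.asList("symbol", "close"), 2) produces INSERT INTO ohlcv (symbol, close) VALUES (?, ?), (?, ?). Very large batches would still need to be split into chunks to stay under the server's maximum packet size.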

After examining the MySQL advisor in phpMyAdmin I saw that it complained about a lot of row sorting. The culprit was that each query to the OHLCV table needed a sort by date for the caching and range query to work properly. The problem here is that I already get the data sorted from the data API and lose that ordering after the insertion into the OHLCV table. I thought about getting rid of the table and querying from the JSON data directly but some of the functions in the TOP module and market overview module take advantage of this table to reduce the amount of code and offload processing to the database, so the table had to stay. I stored the API response in the K/V store, to be queried only by the ChartData component for caching. I also saw that selecting the last scan date from the scan_result table was doing full table scans, so I save those in the K/V store too. This type of optimization can be good for performance but it's important not to let these data points get out of sync. I also implemented a fallback mechanism that queries the DB if the value is not found in the K/V store. This case can occur if a symbol is newly introduced and its last scan date is not yet inserted.
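The lookup-with-fallback idea can be sketched like this. An in-memory map stands in for the K/V store and a function stands in for the DB query; both, along with the backfill step, are assumptions for the sake of the example rather than a description of the actual implementation.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical last-scan-date lookup: K/V store first, database as fallback.
final class LastScanDateLookup {
    private final Map<String, String> kv = new ConcurrentHashMap<>(); // stand-in for the K/V store
    private final Function<String, Optional<String>> dbQuery;         // stand-in for the slow DB query

    LastScanDateLookup(Function<String, Optional<String>> dbQuery) {
        this.dbQuery = dbQuery;
    }

    void put(String symbol, String date) {
        kv.put(symbol, date);
    }

    Optional<String> lastScanDate(String symbol) {
        String cached = kv.get(symbol);
        if (cached != null) {
            return Optional.of(cached);
        }
        // Not in the K/V store (for example a newly added symbol): ask the DB.
        Optional<String> fromDb = dbQuery.apply(symbol);
        // Backfill so the next lookup does not hit the DB again (my addition, not from the post).
        fromDb.ifPresent(date -> kv.put(symbol, date));
        return fromDb;
    }
}
```

The out-of-sync risk mentioned above lives in put: whatever writes a new scan date has to update the store in the same step as the table.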

The Jenkins job pipeline also needs to be separated by exchange to reduce the amount of processing. This can be achieved by extracting the exchange as a CLI parameter to the jobs. No need to run scans for IEX after BFX data is synced, just run the BFX scans :P

I decided to change the way the prediction checker scores scans. My initial implementation was to exit at +/- 2 x ATR of the symbol. This yielded around 50% average scores. Next I tried a 15 period low/high exit strategy. There is really no correct way of doing this, as strategy is very personal. My reasoning was that on an up trend the 15 period low would still keep decent gains and on a down trend it would exit quite early to cut losses. After the performance evaluation runs the average score was 0.23959762958591568 and the average error was 0.17946781069881107 at 95% confidence.
I will still run the strategies of 2xATR, 3xATR and a percentage based exit to determine their outputs. Maybe running multiple strategies, selecting the best performing one and showing that could also be a nice feature, but again it's very personal. I believe a 1:1 take profit / stop loss ratio will result in an average score of around 50%, and 2:1 will result in an average score of 33%, as it's basically based on expected value since the price changes are distributed randomly.
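The 50% and 33% figures match the textbook result for a driftless random walk: the chance of reaching a target T above the entry before a stop S below it is S / (T + S). A small sketch of that arithmetic, assuming that "score" here means the fraction of predictions that reach the profit target (the post does not define it precisely) and that prices really behave like such a walk:

```java
// Hypothetical helper for the take profit / stop loss odds under a driftless random walk.
final class ExitOdds {
    // Probability of hitting +takeProfit before -stopLoss (gambler's ruin formula).
    static double winProbability(double takeProfit, double stopLoss) {
        return stopLoss / (takeProfit + stopLoss);
    }

    // Expected profit per trade in the same units as the two distances.
    static double expectedValue(double takeProfit, double stopLoss) {
        double p = winProbability(takeProfit, stopLoss);
        return p * takeProfit - (1 - p) * stopLoss;
    }
}
```

winProbability(1, 1) gives 0.5 and winProbability(2, 1) gives one third, while expectedValue is zero for both, so under this assumption a wider target wins less often without changing the expected outcome.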

I also changed the way the Bollinger Bands squeeze scan works. It was scanning the last N periods and triggering if the last period's bands were within a certain limit of the minimum band width of the last N periods. This is kind of overcomplicated as I just want to get the periods where the bandwidth is low, so now it will trigger if the bandwidth is < 4%.
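A sketch of the simplified squeeze check, assuming the usual definition of bandwidth as (upper band - lower band) / middle band, with the bands placed k standard deviations around a simple moving average. The period and k the scan actually uses are not stated in the post, so they are parameters here.

```java
// Hypothetical squeeze check based on Bollinger bandwidth of the most recent window.
final class BandwidthSqueeze {
    // Bandwidth of the last `period` closes: (upper - lower) / middle.
    static double bandwidth(double[] closes, int period, double k) {
        if (period < 1 || closes.length < period) {
            throw new IllegalArgumentException("not enough data for the period");
        }
        int start = closes.length - period;
        double sum = 0;
        for (int i = start; i < closes.length; i++) {
            sum += closes[i];
        }
        double middle = sum / period; // simple moving average
        double squares = 0;
        for (int i = start; i < closes.length; i++) {
            squares += (closes[i] - middle) * (closes[i] - middle);
        }
        double sd = Math.sqrt(squares / period); // population standard deviation
        double upper = middle + k * sd;
        double lower = middle - k * sd;
        return (upper - lower) / middle;
    }

    static boolean isSqueeze(double[] closes, int period, double k, double threshold) {
        return bandwidth(closes, period, k) < threshold;
    }
}
```

With a threshold of 0.04 this corresponds to the "bandwidth < 4%" rule; a flat price series gives a bandwidth of 0 and triggers, a choppy one does not.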

New features
As I was sifting through the scans I realized that I was looking for a chart to see the relevant data that triggered the scan. If the scan is a Bollinger Bands squeeze I wanted to see the bands on the chart. So I added this feature. This required me to store the band data in the K/V store because that's the only storage the API will access. To be consistent I also stored the last 100 periods of OHLC data in the K/V store to generate the candlestick data for the chart. The K/V store is backed by MySQL but this may change in the future. Redis is a strong contender for K/V storage, but I'll cross that bridge when I get there. I added a new field to the scan definitions file that defines which indicators will be shown on the chart when that scan is triggered.
Here is a screenshot of this new feature in action:
Screenshot 2018-12-10 at 11.49.36 PM.png

Wow, yet another very long post for a short week. Looks like a lot has been done and development is going full steam ahead.

Re: Strix Devlogs

by dendiz » Mon Dec 03, 2018 4:08 am

A new month, a new dev log for the stuff that's been going on :P Huge change yet again for the engine, which I will get into later. First order of business is a summary of the new stuff:

  1. get rid of all but USD pairings for cryptos
  2. reduce symbol listing response size from 1.2M to 0.8M
  3. new scanner: trend start
  4. S/R zones to charts
  5. using TaLib java API
  6. Candle stick patterns scanners
  7. port back over to java w/o spring + hibernate
  8. eclim, idea + X forward, Che adventures
  9. market overview calculations in SQL
  10. store technical calculations in KV store
  11. correlation calculations in technical calculations
Good bye pairings
Some of the alt coins are worth so little that, priced in BTC, they end up taking 8-9 decimal places. This is a disaster for the display and the layout of the app in general. At first I had thought about displaying these types of currency pairs in satoshis but that was met with high resistance from my previous team members. The pair name is XXX/BTC so you cannot display it in satoshis, was the justification. Fair enough I guess. Another fix for this could be decreasing the font size if there are a lot of decimal places, but I haven't tried this. It sounds like it should work in theory but I'd have to experiment with it to be convinced that it does. So for the time being the easiest solution is to drop pairings with BTC and keep only the pairs that are traded against the USD. All of the big coins are included in this so no big loss there.

slim responses
The autocomplete component of Vuetify requires a list of the items to complete (though I'm pretty sure there should be a version that can do partial searches with AJAX) and that list in the previous version was huge, around 1.2 megabytes. The initial delay after the autocomplete trigger was around 2 seconds, which made it appear to be not responding. My primary solution to this was to trigger a request to the symbol listing endpoint in the background after the app loaded, and since requests are cached the searches would not have that initial lag. This worked out like I thought but the payload size was still more than I cared for, so I got rid of some of the fields in the response and changed the structure from a JSON object to an array, dropping the keys and therefore reducing the size even further.
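To show where the saving from dropping the keys comes from, here is a toy comparison of the two layouts. The field names "symbol" and "name" are invented for the example, and the strings are concatenated by hand without escaping purely for illustration; real code would go through a JSON library.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical symbol listing serialized two ways; each row is {symbol, name}.
final class SymbolListing {
    // Object layout: the keys are repeated for every single row.
    static String asObjects(List<String[]> rows) {
        return rows.stream()
                .map(r -> "{\"symbol\":\"" + r[0] + "\",\"name\":\"" + r[1] + "\"}")
                .collect(Collectors.joining(",", "[", "]"));
    }

    // Array layout: the position in the array carries the meaning, no keys at all.
    static String asArrays(List<String[]> rows) {
        return rows.stream()
                .map(r -> "[\"" + r[0] + "\",\"" + r[1] + "\"]")
                .collect(Collectors.joining(",", "[", "]"));
    }
}
```

The array layout saves the key text on every row, so the gain grows with the number of symbols, at the cost of the client having to know the field order.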

more scanners
I saw a scanner on STB that I wanted to incorporate as I think it's important: the new trend started scanner. It fires when the ADX crosses the 25 line. Even though ADX lags, it's still useful to know that a trend has begun.
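The trigger itself is a simple cross check on the last two ADX values. A minimal sketch, assuming "crosses" means crossing upward and that the ADX series is computed elsewhere; the names are hypothetical.

```java
// Hypothetical trend-start trigger: ADX moves from below the level to at or above it.
final class TrendStart {
    static boolean crossedAbove(double[] adx, double level) {
        int n = adx.length;
        // Needs at least two values: the previous one below the level, the latest at or above it.
        return n >= 2 && adx[n - 2] < level && adx[n - 1] >= level;
    }
}
```

For example, crossedAbove(new double[]{20, 24, 26}, 25) is true, while a series that is already above 25 on both of the last two periods does not fire again.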

more charts
Data visualizations are always cool, you can never get enough of them. This motivated me to add a chart displaying the support and resistance zones calculated using the clustering method I blogged about a couple of weeks back.
Screenshot 2018-12-02 at 6.39.44 PM.png
I plan on adding charts to each triggered scan with the relevant indicators like MA's for MA crossovers etc. But it's a low priority task right now.

New technical analysis library
TA-Lib is a mature technical analysis library that supports way more indicators than I built into talib4j. I was quite happy with the results I got when using the python wrapper so it's back in. I plan on doing a write up on the performance vs talib4j, as the code is probably transpiled from the C version and impossible to read. The Java API is god awful because of this generated code, which means the C API is just as disgusting. To limit the exposure to this filth I wrapped the API with a custom class, and I can just substitute that with talib4j any time once I have all the indicators coded in.

Even more scanners
Using Talib also gives me access to candle pattern recognition. I didn't want to write a scanner for each of the patterns so I had to resort to using the java reflection API to figure out the correct method to call from the scan definition file. So the definition file is something like this:

Code:

{
  "id": "...",
  "name": "some scanner",
  "module": "CDL2CROWS"
}
I had all the definitions generated from the documentation on the talib site with a small python script (around 60 different scans), and I look up the method from the module attribute of the definition. With the addition of all the candle patterns each run now consists of around 16K scans!
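The reflection lookup can be sketched as below. FakePatterns is a made-up stand-in with a simplified signature; the real pattern functions take different arguments, so this only shows the mechanism of going from the "module" string to a method call.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Hypothetical dispatcher: resolves a pattern method by the "module" name from a scan definition.
final class PatternDispatch {
    // Stand-in for the pattern library: one public method per pattern name (signatures invented).
    public static final class FakePatterns {
        public int CDL2CROWS(double[] close) { return -100; }
        public int CDLDOJI(double[] close) { return 0; }
    }

    // Finds the public method named `module` that takes a double[] and invokes it.
    static int run(Object patterns, String module, double[] close) {
        try {
            Method method = patterns.getClass().getMethod(module, double[].class);
            // The cast keeps the array from being spread across the varargs parameter.
            return (Integer) method.invoke(patterns, (Object) close);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("unknown module: " + module, e);
        } catch (IllegalAccessException | InvocationTargetException e) {
            throw new IllegalStateException("could not run " + module, e);
        }
    }
}
```

A definition with "module": "CDL2CROWS" then maps to run(patterns, "CDL2CROWS", closes), and a typo in the definition file surfaces as an IllegalArgumentException instead of a silently skipped scan.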

Back to the drawing board
Now that I am running 16K scans, the python code took 48 hours to complete a day's worth of scans. This of course is unacceptable as the scans need to be done quickly. It's no use displaying scans 2 days after the market closes. I poked around with multi threading in the python code but to no avail, as there is this thing called the "Global Interpreter Lock" which limits you to the total performance of a single core. Also, multi threaded code consumes too much memory - something which I will not have that much of in production. So back to a faster language: Java. But this time I didn't want all the bloat that comes with spring + hibernate. They consume way too much memory, which was one of the reasons for going with python. So this time around it's a plain old Java application with small utility libraries for database operations. Now 16K scans take 1 hour to complete without multi threaded code. I was initially thinking of keeping the python engine for other stuff like synchronization, technical calculations etc. but ended up porting all the engine code back to Java.

Have Chromebook will code
I got a new toy from the Black Friday sales events so naturally I want to use it all the time. But it's not buffed in terms of hardware so I needed to find a way of coding Java without running a full blown IDE on my Chromebook. It would probably run the IDE OK but the engine + database would put too much strain on it and it would start to crawl. So I started coding in Emacs. Plain old Emacs with no packages. It sounds crazy and it actually is quite crazy. It was nice just suspending the machine and reattaching the session to continue where I left off, but no syntax checking and no auto imports makes it a hassle. Not being able to see the parameters of a method call was the worst. So I checked out what packages people were using for coding on Emacs. The most effective one seemed to be Eclim. I had used Eclim before with vim (which is the original purpose of Eclim: Eclipse + vim) but somebody had written a wrapper around the binary for Emacs. I tried to get it running on my Debian development server but I could not get it to work. It would just not connect. So I gave up on Eclim and checked out another package called Meghanada. I found that one too complicated and didn't really find the functions it provided useful. Then I tried X forwarding with IntelliJ IDEA. This felt like home, a familiarity that was much appreciated. This lasted for a week or 2 until the inefficient X protocol drove me nuts with the stupid lagging of the UI. I looked around for some alternatives and came across xpra, which was supposed to be faster but really wasn't. And the fonts and graphics were blurry with xpra so it went out the window. I thought that maybe it was Swing that wasn't playing nicely so I tried X forwarding with VS Code. Same problems. But on a side note I really liked the Java plugin for VS Code - it's lightweight and provides all the great functionality that I was searching for. Anyways, then I remembered Eclipse Che - a web based IDE.
I had tried Che before in its early stages and wasn't really impressed with its state at the time. But now they've created a Docker image which is super easy to install and get running, and they've also added Git support, so at worst I could use Che to write and push the code, then compile/test on my development server. But Che workspaces already come with Java + Maven so I could basically do everything I wanted in the Che environment. Another bad thing with X forwarding was that once the laptop suspended, the SSH connection was lost and the applications died on me. Since I have to leave the computer to do other stuff (like burping, diaper changes, etc) this was bugging me quite a bit. Now in Che the tab remains open and I can continue where I left off without a problem.

bugs bugs bugs
Porting over code is also a great chance to check what I've actually written. I figured out major bugs in the technical calculations code which I fixed. I was doing all of these calculations in application code, which is probably slower than doing them on the DB side so I moved the things I could calculate on the DB over there.

few architectural changes
I was letting the API calculate some of the data on the symbol detail view on the fly, but now I decided to have these pre-calculated for the last trade date and stored in the key/value store. I really want to refrain from having the API do calculations, to consolidate all the logic in the engine code base. This meant moving the technical and overview tab data to the key/value store. I also wanted to merge these two requests as they had a couple of overlapping values. I also decided to store the correlation data in the key/value store as it was taking up a lot of rows in the table. I was keeping a history of the correlated items which is probably not necessary. While at it I also removed the dedicated job for correlation calculations and merged it into the technical calculations.

Wow that was quite a long post - a lot has happened in the past week.

Re: Strix Devlogs

by dendiz » Tue Nov 20, 2018 6:31 am

So I've been working on the engine in the past 2 weeks every chance I get, which is not a lot these days. I've also had to adapt the web client to some of the breaking API changes (mostly field names, and simple structure changes). With major changes in the database structure, some of the old tables being merged into the key value table, the engine code is a bit clearer now. MySQL should be able to handle the queries to the KV table with ease as it's properly indexed and the record cardinality is much lower because most of the data is stored in a JSON structure now. I implemented the top activity module, correlation and the news sync module. The correlation finder takes a long time as python is kind of slow when iterating over a lot of records and each symbol has to be checked against all other symbols to find a correlation. This made me want to switch to something faster but I resisted the urge as the correlation finder can be run maybe every other week and it's OK if it takes 12 hours to run. I also got rid of the old API code that was hogging memory thanks to spring + hibernate storing tons of classes and garbage in memory. I went with flask which is a simple micro framework for creating APIs. Currently I create a new connection to the database for each request and I need to test if this will scale under load. What I have read is that the old "connections are expensive" is now a myth with newer databases, but still the network overhead could prove this theory wrong.

In the second half of this month I adjusted the web client code to the new API responses and fixed cosmetics here and there. I can probably say I ported all the old code to the new API with maybe a couple of features missing that I will add in the following days. A major change on the client was switching from the Google Charts JS library to static images to display the candlesticks. My initial thought was that it would be good to offload the chart creation to the client to lessen the load on the server, but this turned out to have 2 disadvantages:
1. slower mobile clients take forever to render the chart (my Samsung tablet).
2. a ton of charting data is transferred to the client, which slows the page loading.
So I struggled for a day with the excellent matplotlib for python to get a nice candle chart with a volume overlay and I think it turned out quite well.
Screenshot 2018-11-19 at 10.31.13 PM.png
Before this was completed I used a chart from Finviz as a placeholder and inspiration. I also managed to squeeze in the android client build by using an excellent plugin for Vue which was quite painless to set up. I side loaded the app on my phone and tablet and they seem to work great. After loading the apps I realized that some things like pull to refresh were missing. It's essential to convert to a mobile app and try it out to get a good feel for the user experience, even though I'm actually developing the client in a browser. My plan for the upcoming days is to use the app to iron out some more user experience quirks, then I need to get into StockTwits integration and start with marketing stuff. The launch timing seems quite bad as the markets have taken a turn for the worse - or maybe people will be searching for opportunities in this turmoil and can use TechScan to seek them out?

Here is the latest look:
Screenshot 2018-11-19 at 10.26.39 PM.png

Re: Strix Devlogs

by dendiz » Fri Nov 02, 2018 9:48 pm

Well I’ve been offline for about 2 weeks due to a life changing event: My baby girl. The past 2 weeks have been adjusting to a new life style, new sleep cycle and new tasks. I’ve become a master diaper changer after changing almost 150 diapers :) I tried to set the launch date for techscan to be the 15th of October for a reason: I knew that I would not have the time for at least a couple of weeks after a new baby, so I was hoping that the rest of the team could continue with UI improvements and marketing while I was offline. Of course this did not happen, they have also been offline the whole time. I just can’t get my expectations met with these guys. Just before the arrival I was toying around with the idea of porting the engine code to python. My reasons were:
  • I like to procrastinate and experiment
  • The Java code base was getting out of hand with 30k lines of code (mostly boiler plate)
  • The memory consumption of the Java code is high
  • The application is single threaded so no point in not using python
  • TA-Lib has bindings for python
It was pretty easy porting the code, and I’m done with the major parts. Memory consumption decreased from 4G to 1G, and the code is so much more concise and there is less of it. I can work on it from my iPad with an SSH connection. I have easy low level access to the database so it’s faster and easier to do stuff in SQL. I’ve created this version of the engine on my own git server as I am not planning to share this. I think I’ll be continuing on my own from now on, as I believe the old version and the old project will be forgotten by the other members of the team in time.

Re: Strix Devlogs

by dendiz » Sat Oct 13, 2018 9:48 am

  • more on team dynamics - resolution
I have been considering the situation of the team over the past week, and I also had a talk with both my team mates about the situation.
The talks were very positive and their attitudes during these talks convinced me that they are doing their best towards launching this.
I also realized that my expectations were too high of them, but this in itself is not a bad thing.

The worst part of this whole charade was that I decided to do a project with friends. This is a bad decision. Becoming friends with colleagues
is fine, but the other way around is not - but it's too late to change that. Anyways, the gist of the talks was motivational issues caused by the lack of design,
as all humans will be deterred by an ugly site - which I believe too, but choosing between functionality, design and the number of features, we had to
give up on one. In the end I convinced them to give up on the number of features, and introduced a new design with Vuetify. This fresh look boosted
morale, that's for sure. And I went extreme with trimming the features - the first version of the site is read-only. No user input at all, including
login and user registration. It's no use registering on a site if you don't have any personal data stored anyways. This seemed quite extreme when I first
said it out loud, but now it makes so much sense. One interesting point I realized during this is that a clean look for the site has more impact on
perception and motivation than I would have considered. O. said that it felt like a hobby project after I said that we should not be focusing on design
anymore - like an ugly project would not attract any users so it doesn't feel like a real project. I kind of understand the viewpoint but as in many things
it's important to find the optimal balance.

Re: Strix Devlogs

by dendiz » Wed Oct 10, 2018 8:51 pm

  • rants about team dynamics
Working with a team on anything is a problem on its own. One of the most difficult parts,
a part that has a lot of moving parts and delicate balances, is cooperating with people.
I started working on Strix alone, then paused for some time and did something else. Then I
resurrected the project and took on two other guys (old friends) to work with me on the project.
Let's call them G. and O. for short. I like bouncing ideas off other people, and discussions lead to progress.
But the people you choose for this are of importance. It's not like I had a lot of options to choose from,
as I'm not hiring anyone and I need to be able to trust / get along with the people I work with.
I knew that O. had no experience in developing a product or marketing. I knew the capabilities of
G. were limited, or to put it another way he worked slowly, but I believed or wanted to believe that
passion could overcome these - both the inexperience and the slow progress. I believed this because
this is how it has always been for me. If I don't know how machine learning works and I want to use it
for something, I will obsess over it until I can do what I need to do, or until I'm burnt out and cannot
continue anymore. At that point it means that I've lost interest in that particular subject and I move on
to something else. If I'm not interested in something I would acknowledge this up front and not even
get into it - if I'm in, I'm all in.

Now it's been around a month and a half of development on the web client, and I can't accept that
we have not come far at all, and that during the last meeting the time frames given to complete
the rest were twice as long. It's so simple to me, it's a week's work or maybe two to get this done. I had
announced from the very beginning that the time frame to get this out is limited, and yet I still see days
where there is no activity on the code repository. I just cannot bring myself to accept this. Then there is also
the issue of prioritizing trivial things that are cosmetic, and the issue of simple one-liner things that are easily
found in the documentation taking days. Use case stories not being written in detail, no holistic thinking, no
detailed thinking, just putting down a mashup of 2 or 3 different sites as an incoherent mess of wire frames.
Testing being done only on what was supposed to be done, no extra defect issues being opened,
no methodical way of testing, no documentation for the web client, no understanding of agility and the need for it.

I think I could go on venting for a long time, but I have reached a point where I'm about to throw in the towel
on the team - not the project. The major problem is that these people are my friends so I don't want to let them down too harshly.
I guess this is one of the reasons why you don't want to form friendships with the people you work with and keep it as just colleagues.
This is not the first time I've done this, but it's the first time I'm so frustrated about it. So what I'm doing for certain is continuing
on my own. There is no need to carry dead weight. But the question is how to do the transition. I could be upfront and voice my
frustrations (which I have done once before, telling the team that the progress is slow, more on this later) or I could just stop
developing the thing on the shared repository. One might think that I'm not communicating with them about all these issues, but I have
repeatedly told them, both spoken and written, that progress was slow, testing was inadequate and the design looks unprofessional.
Trimming down everything due to slow progress on the client side left us with a plan for a "read-only" version of the site. And this
version will take 2 more months to complete? No way. I started ranting again :)
So back to the transition options. I think I'm going to be doing a mix of both. I'll wait for the deadline we had set when starting the project,
and after that has passed I'll just take the repository offline. Letting go of my team will also mean that I will be taking on the whole
financial risk, so I have to also think about optimizations on that. I can't have 5 servers running the application, I have to reduce it.
I will have only a third of the marketing budget, so I have to come up with other ways of marketing. We already had a limited budget
and needed these ways, but O. never came up with anything creative. So if I'm gonna have to come up with it anyways,
I don't need him on the team.

The only thing I'll miss, I guess, is having something to talk about and discuss on Slack - even though these discussions would
sometimes be pointless, there were times they were productive.

Re: Strix Devlogs

by dendiz » Wed Oct 10, 2018 8:50 pm


* stuff done in September 2018
* Jenkins automation
* persistence problems
* return to single thread mode
* scan refactoring and parameter change support
* scan performance improvements
* java 10 adventures
* parameter recommendation tests
* hybrid client developments

Wow, September was a month that I would only describe as "I lost my humanity and became a beast". After checking the git commit logs I see 480 new commits, touching 26410 lines for the API and Engine, which is quite a bit. There is basically too much stuff that I have done to cover in detail so I'll just have to go over the most important changes for this month.

First off I want to start with the automation tasks. I was running the BFX and IEX sync tasks and the scanner tasks via crontab on the test machine. This was OK for some time, but I wasn't getting notifications about failed jobs or any information about the task duration. So I moved all these periodic engine tasks to Jenkins. An added bonus is that it's way easier to manage the task pipeline (which task should run after which task, which can run in parallel) from Jenkins than from a bunch of ad-hoc shell scripts. I also changed the development cycle from committing code to master to committing first into a development branch, and then asking Jenkins via Hubot through Slack to test, merge and push the changes. This keeps the master branch in a stable state all the time. G. requested that any pushes to the master branch of the web client repository be deployed immediately so I had to do some GitLab trickery to get that working, as GitLab doesn't allow that for unpaid accounts. But it can be done via web hooks.

The current flow of the cron jobs looks like this:

[image: 1-m79Makp8mOHomhro.png]

Using Spring Boot is a MUST if you are developing with Java - the advantages it provides are countless, but it does come with its own quirks. I had been getting "transaction manage not found - cannot remove entity" errors on some of the data sync services. The solutions on the internet were all about adding a transactional annotation to the service method and people had accepted those answers. But I guess something changed along the way in Spring development as they did not work for me. The solution was to add the annotation to the repository interface. Sometimes something so simple can eat up a lot of precious development time, but it's satisfying in the end when you solve it. As I was fiddling around with this I also decided to save the response JSON from the providers instead of parsing them into domain entities and saving those. Now instead of having 1 M daily data points, I have 8 K key value items and I process them in memory. I was caching the domain objects anyways so this spares some extra load on the database.

During development I wanted to get the engine results as fast as possible so I was parallelizing all the operations in the engine. I noticed along the way that "parallelStream" is not the answer to every question. I ran into cases where parallelizing would really screw up the cache, and serial processing with good cache usage is way better (in terms of efficiency) than just using all the cores. I can get more bang for a core than I could when the processing was done in parallel. This decision was also in part due to the fact that I want to run the engine on the same machine as the API (yes, I am basically poor) and I can't have the engine hogging all the CPU and exhausting RAM. I also can't have the entire data set in memory (or most of it) and this tends to happen when threads are running at the same time. So with the serial processing using the cache in an optimal way, the scanning process takes 2.5 K seconds on a single core. Running in parallel took 600 seconds on 8 cores, which is up to 600 x 8 = 4.8 K core-seconds if all cores were kept busy, so the serial run is slower on the clock but does the same work with roughly half the CPU time.

A prerequisite to serializing the scanners was to improve their performance. I installed the trial version of the excellent YourKit profiler (which by the way is very expensive - otherwise I would buy it) and tracked down the bottlenecks to unnecessary object wrapping (using my own SuperList class with convenience methods for accessing elements, like getLast(N), getTail(..), etc.) and the parsing of strings to dates and vice versa. After hunting these down and refactoring the code to get rid of extra layers and work, there was a 3x increase in speed which brings the processing time to acceptable limits.

An integral part of the system is the performance evaluator for scans based on past performance. I had coded this in a hurry and it was a bit disgusting, so I refactored this into its own service. The functionality is the same but it's simplified and runs a bit faster.
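For readers who haven't seen the evaluator: devlog #4 describes it as running forward from the date a signal was generated and checking whether the price moved X ATRs in the signalled direction. Here is a minimal sketch of that idea, not the actual service. The class name, the use of closing prices, the precomputed ATR argument and the maxBars look-ahead limit are all assumptions made for illustration.

```java
final class SignalEvaluator {
    /**
     * Walks forward from the bar after signalIndex and reports whether the
     * close moved atrMultiple * atr in the signalled direction within maxBars.
     * atr is assumed to be precomputed for the signal date (hypothetical input).
     */
    static boolean confirmed(double[] closes, int signalIndex, double atr,
                             double atrMultiple, boolean bullish, int maxBars) {
        double entry = closes[signalIndex];
        double target = bullish ? entry + atrMultiple * atr
                                : entry - atrMultiple * atr;
        // Never read past the end of the available data.
        int last = Math.min(closes.length - 1, signalIndex + maxBars);
        for (int i = signalIndex + 1; i <= last; i++) {
            boolean hit = bullish ? closes[i] >= target : closes[i] <= target;
            if (hit) {
                return true;
            }
        }
        return false;
    }
}
```

A scanner's score could then be the fraction of its past signals for which confirmed(...) returns true, though how the real evaluator aggregates results isn't covered here.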

I hate the verbosity of Java so I thought I'd look into what was going on in the Java world, which I hadn't done since the release of Java 8 and the streaming API. I found out that Java 10 was released in March 2018 and finally had support for the "var" keyword, which meant less verbosity in assignments. I know it's kind of trivial but I wanted to give it a go, since Spring Boot is supposed to support Java 10. So I went ahead and updated the development and test environments to Java 10 and compiled the code. A couple of warnings of unsafe access in Spring but otherwise everything seemed OK. That is until I tried running the API. Then I got a lot of exceptions about Redis not being happy with something (which I don't remember, and could not find an easy solution for), so I reverted back to 8, which is way more stable. I have long lines of code, but at least everything works correctly :).

A great idea that I had for the engine was parameter optimization. RSI overbought is 70, but why? Why not 60? I guess the guy who invented it had success with 70 and kept it as a default and now everybody uses that. But wouldn't it be great if I could figure out the optimal number by scanning the past RSI turning points? That's what parameter optimization/recommendation is about. There are a LOT of combinations to calculate so this feature has to be selective on the symbols and the parameter values that it tries. I still have to bake this idea, but I did put down a PoC service that does this.
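To make the turning-point idea concrete, here is one possible sketch and not the PoC service itself: collect the RSI values at local peaks and suggest their median as the overbought level for that symbol. Taking the median, and only counting peaks above 50, are assumptions picked for illustration, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class RsiThresholdRecommender {
    /** RSI values at local peaks (strictly higher than both neighbours) above floor. */
    static List<Double> peaksAbove(double[] rsi, double floor) {
        List<Double> peaks = new ArrayList<>();
        for (int i = 1; i < rsi.length - 1; i++) {
            if (rsi[i] > rsi[i - 1] && rsi[i] > rsi[i + 1] && rsi[i] > floor) {
                peaks.add(rsi[i]);
            }
        }
        return peaks;
    }

    /** Median of the values, or NaN when there are none. */
    static double median(List<Double> values) {
        if (values.isEmpty()) {
            return Double.NaN;
        }
        List<Double> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        int n = sorted.size();
        return n % 2 == 1
                ? sorted.get(n / 2)
                : (sorted.get(n / 2 - 1) + sorted.get(n / 2)) / 2.0;
    }

    /** Suggested overbought level: median RSI at past peaks above 50 (assumed cut-off). */
    static double recommendOverbought(double[] rsi) {
        return median(peaksAbove(rsi, 50.0));
    }
}
```

The oversold side would mirror this with local troughs below 50. One pass over the RSI series per symbol is far cheaper than brute-forcing every threshold value through the scanners.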

I also started with the mobile client development using PhoneGap and Framework7 with Vue.js. Vue.js is pretty amazing - at last someone has come up with a good framework for JavaScript development. I used to be a Mithril person but the lack of templating and using that m() function was a bit annoying. Also the component system in Mithril is complicated and not as easy as in Vue.js. Framework7 looks OK'ish but there are some quirks with it too. I ran into a problem with the router that would break the back button randomly. It's surprisingly difficult to find answers to questions - I guess they don't have a large community so I'll probably drop F7 in favor of Vuetify.

That was a long post, but I guess this is how it's gonna be if I do it monthly. I still want to do a write up about the team dynamics, and the frustration that I'm having with that.

Re: Strix Devlogs

by dendiz » Wed Oct 10, 2018 8:46 pm

title: "devlog #5"
date: 2018-09-05

  • current status
Current status

It's been quite a while since I wrote a devlog, and quite a lot has happened in the project in between. I'll try to merge the stuff from
the commit logs and stuff I posted on mastodon to give an overview. I've been coding the scanner for most of the month. We consolidated
the scanner into categories, with some constraints on the combinator based on categories, so that you don't have 2 scanners of the same
category combined. Also you don't want bullish/bearish scanner combinations. There are 70 scanners and it's hell to have to change anything
on the scanner interface. I've partially solved this problem by extending the scanners from an abstract class, but for a feature called
parameter customization I still have to go over almost all scanners and add their default parameters. I've also encountered a lot of
performance issues with scanner performance calculations. Doing the calculation based on last year's data took quite a while when I was
using the scan results from the database, so I decided to truncate the scan results kept in the database to a month's worth of data, and
do all scanner runs on a year's worth of data online during the scanner performance calculations. Fewer DB round trips increased the performance.
There was also an issue where I ran into a sort of N+1 select problem, where the DB was queried in each iteration of the loop. I fixed this
by using an IN query to get all the stuff I needed before going into the loop.
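The IN-query fix can be sketched roughly like this: build one parameterized query for all keys up front, then group the returned rows in a map so the loop only does in-memory lookups. This is an illustration under assumptions, not the project's DAO code. The table and column names in the usage note are made up, and rows are modelled as plain string arrays to keep the example free of any database driver.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class BatchedLookup {
    /** Builds "SELECT * FROM t WHERE c IN (?, ?, ...)" with one placeholder per key. */
    static String inQuery(String table, String column, int keyCount) {
        if (keyCount < 1) {
            throw new IllegalArgumentException("need at least one key");
        }
        String placeholders = String.join(", ", Collections.nCopies(keyCount, "?"));
        return "SELECT * FROM " + table + " WHERE " + column + " IN (" + placeholders + ")";
    }

    /** Groups already-fetched rows by the value at keyIndex for lookups inside the loop. */
    static Map<String, List<String[]>> groupByKey(List<String[]> rows, int keyIndex) {
        Map<String, List<String[]>> grouped = new LinkedHashMap<>();
        for (String[] row : rows) {
            grouped.computeIfAbsent(row[keyIndex], k -> new ArrayList<>()).add(row);
        }
        return grouped;
    }
}
```

For example, inQuery("scan_results", "symbol", 3) yields a single statement with three placeholders to bind, replacing three separate selects inside the loop.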

I had started using MongoDB for raw data such as OHLCV data and scanner results, but I decided to get everything into MySQL as complex
queries on MongoDB are a pain. So I got rid of MongoDB for all modules in the project.

I have a couple of ideas for parameter recommendations for indicators based on the indicators' values. I actually wanted to brute-force
my way through some parameter combos to find the best scoring combo for that symbol and indicator but that leads to an unmanageable number
of scans. Now I will try to recommend RSI/Stoch overbought/oversold parameters based on the number that actually was used as the turning
point in the charts.

There was progress on the web client as well. I set up a nice continuous integration server that deploys all pushed commits and I also
set up my router with DynDNS and port forwarding so that the team can use the test environment. I re-purposed my workstation as a Proxmox
host (something I had already done before) and I'm using my laptop as my workstation now. This kind of sucks because it's a weak
machine. I don't really want to spend money on a new workstation right now, as I have received my EAD and I'm planning to start working
somewhere after I get this project live. So back to the web client, we have the main page almost set up, but since nobody in the team is
a designer or has any background related to design it doesn't look professional to me. This could be just a bias, I'm not sure. I put
out the idea of paying someone to design it, but it wasn't received well - probably because we can't visualize what we want and cannot
really tell the designer what he should do. The pricing page, login and registration pages exist and are functional, but not really
tested. Testing and product management are weak points in the team.

So, product management for this project is actually quite simple: I expect wire frames, use cases, and some testing from our PM.
He has no experience, but I just can't understand why somebody can't simply research all this stuff and do it. I have to step in
at almost every step, and this is slowing us down and demotivating me, as I don't want to do this - that's the point of having a
product manager.

I've set a tight deadline to go live, per Parkinson's law I want to keep everybody on their toes, but my current status on this is
yellow, that's why I'm also working on a plan B. The opportunity cost is just too high now that I have the EAD.

Re: Strix Devlogs

by dendiz » Wed Oct 10, 2018 8:43 pm

title: "devlog #4"
date: 2018-08-10

  • reboot the project
  • current status

Rebooting the project

A technical analyzer is one of those projects I keep returning to. I had already laid down a nice foundation with TaLib4J for the technical
calculations so it's usually cleaning up the glue code that is changing with each new iteration. This time around I've taken a new approach
by building a team around the project to drive me to turn it into a product. I've created a multi module spring project with the following
modules: API, Databus, Engine, corelib. The API is the gateway for clients to access analysis data and takes care of user management.
The databus module is an API for the api module to access the data. The engine is a command line application that does the actual
calculations and the corelib is the module that contains common classes shared by the modules. After initial testing it turned out that
having the Databus as a separate service comes with severe performance penalties due to JSON data conversions etc, so I integrated the
databus module into the corelib module. You want to go service oriented until you realize that there are only 2 other services that
want to consume it and then you decide to integrate it back. Keeping the engine separate made sense though as it runs as a batch processor.

Current status

As of today there are 33 unique scanners, and I also run the 2 element combinations of these scanners for a total of C(33, 2) = 528 scanner pairs.
This produces around 6 M scan results for 20 months' worth of stock EOD data. I want to increase the combination count but I'm not sure
if the server I have in mind for production deployment can handle the amount. That's on my to-try list and I will report how it goes.
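The 528 figure is just "33 choose 2". A small sketch of how the 2 element combinations could be enumerated, with scanners represented as plain names (an assumption, since the real scanner interface isn't shown here):

```java
import java.util.ArrayList;
import java.util.List;

final class ScannerPairs {
    /** Returns every unordered pair (i < j) of the given scanner names. */
    static List<String[]> pairs(List<String> scanners) {
        List<String[]> result = new ArrayList<>();
        for (int i = 0; i < scanners.size(); i++) {
            for (int j = i + 1; j < scanners.size(); j++) {
                result.add(new String[] {scanners.get(i), scanners.get(j)});
            }
        }
        return result;
    }

    /** n choose 2: the number of pairs that pairs() produces for n scanners. */
    static long pairCount(int n) {
        return (long) n * (n - 1) / 2;
    }
}
```

For 33 scanners this gives 33 * 32 / 2 = 528 pairs. Moving to 3 element combinations would add C(33, 3) = 5,456 more, which is why the server capacity question matters.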
I have most of the basic user management features completed in the API, except for the integrations to payment and transactional emails.
I'm also adding performance evaluation for the scanners by means of running a test from the date the signal was generated going forward
and checking whether the price went up/down X ATRs, confirming the signal. This calculation is affected by the amount of scan results so increasing
the combination count will have an impact on the performance calculation. I've asked a question on the math StackExchange forum about
calculating the conditional probability of 2 technical indicators but have yet to receive a satisfying answer. Being able to calculate a
reasonably (~1% error maybe?) accurate approximation for this would mean that I do not have to actually run all the combinations to
get their scores. I'm also using an error rate calculation based on Z-tables to give a confidence interval on the scanner score.
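The exact error rate formula isn't spelled out above, so the following is only a guess at its shape: a normal-approximation interval around a hit rate, using a z value read from a Z-table. It assumes the scanner score is the fraction of signals that were confirmed. The class and method names are hypothetical.

```java
final class ScoreInterval {
    /** z value for a two-sided 95% interval, as read from a standard Z-table. */
    static final double Z_95 = 1.96;

    /**
     * Normal-approximation interval for the hit rate hits / total.
     * Returns {lower, upper}, clamped to the range [0, 1].
     */
    static double[] interval(int hits, int total, double z) {
        if (total <= 0 || hits < 0 || hits > total) {
            throw new IllegalArgumentException("need 0 <= hits <= total and total > 0");
        }
        double p = (double) hits / total;
        // Standard error of a proportion, scaled by the chosen z value.
        double margin = z * Math.sqrt(p * (1.0 - p) / total);
        return new double[] {Math.max(0.0, p - margin), Math.min(1.0, p + margin)};
    }
}
```

With 60 confirmed signals out of 100, interval(60, 100, Z_95) comes out at roughly 0.50 to 0.70. The approximation gets unreliable for scanners with only a handful of signals, so a minimum sample size is probably worth enforcing.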
A nice optimization I did was to keep all the stock OHLCV data in cache and use a binary search to query data between given dates instead
of hitting the DB each time. In the earlier version I was actually keeping the raw OHLCV data in files but reading them into cache
takes longer than reading them from a DB plus using a DB also gives opportunities for querying in different ways which I need in the
Yesterday I noticed that one of the scanners used 4 conditions to check for a signal: ma5 > ma26, ma26 > ma50, ma50 > ma200, stoch < 20.
This led me to the idea that I should actually make each of these conditions a scanner on its own and brute force my way through
all of the combinations to reach the ideal scenario for each symbol by calculating the score. Maybe the best results for a stock are
when ma5 < ma26, ma26 > ma50, ... because there was a short term fall in the price for the stoch to reach a low etc.
I also implemented an optimization for the score calculator yesterday. Pre-optimization I was looping over each symbol, fetching the scan
results for each symbol and calculating the score for the scanners from those results. This was doing too many DB round trips. I changed
it to looping through each scanner combination and storing the results of the scans in a map keyed by symbol. This is 1 less loop
and fewer DB requests.
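Going back to the OHLCV cache mentioned earlier in this post, the binary search lookup between two dates could look something like the sketch below. It assumes the cached bars for a symbol are held in a list sorted by ascending date. The Bar class and its fields are illustrative, not the project's domain model.

```java
import java.time.LocalDate;
import java.util.List;

final class OhlcvCache {
    /** One end-of-day bar; only the fields needed for the example. */
    static final class Bar {
        final LocalDate date;
        final double close;

        Bar(LocalDate date, double close) {
            this.date = date;
            this.close = close;
        }
    }

    /** Index of the first bar dated on or after target (bars sorted ascending by date). */
    static int lowerBound(List<Bar> bars, LocalDate target) {
        int lo = 0;
        int hi = bars.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (bars.get(mid).date.isBefore(target)) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /** Bars with from <= date <= to, returned as a view on the cached list (no copying). */
    static List<Bar> between(List<Bar> bars, LocalDate from, LocalDate to) {
        int start = lowerBound(bars, from);
        int end = lowerBound(bars, to.plusDays(1));
        return bars.subList(start, Math.max(start, end));
    }
}
```

Each range query costs two O(log n) searches instead of a DB round trip, and dates that fall on weekends or holidays simply resolve to the next available bar.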

Re: Strix Devlogs

by dendiz » Wed Oct 10, 2018 8:40 pm

title: "devlog #3"
date: 2018-06-01

  • UI design
A new month is here and time goes by fast. There is a theory that the
perception that time goes by faster the older you get is due
to the fact that each stretch of time becomes a smaller percentage of your age.
E.g. a year when you are 5 years old is 20% of your age and is a long duration.
But when you are 40 it's only 2.5% of your age and it goes by faster.

So what's new? I've concentrated on the stock viewing page today. Every
parameter that I had hard coded during the design phase is now the
actual value it's supposed to be. So the main part of the stock view
page is complete. The Thymeleaf style does take a bit of getting used to
but I have figured out everything I needed to do.

A nasty bug I came across was that the date stored for the technicals and scans
was the date the user asked to scan. I randomly tried a date when testing
and I saw in the database that it was a weekend. So I fixed that by using
the last date in the data returned from the API.

Another related issue was that dates from the API were showing up as weekends.
This is due to the fact that the IEX API returns a string for the date
field and the JSON parser just uses the current timezone and this ends
up being a date before the date in the string. Easy fix, just set
the timezone to EST where the stock exchanges are.
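Here is a small reproduction of how a date string can slide back a day, plus one way around it. It is a sketch under assumptions: the actual JSON parser setup isn't shown, the date format is assumed to be yyyy-MM-dd, and UTC as the parsing zone is just an example. The class and method names are hypothetical.

```java
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZonedDateTime;

final class TradingDates {
    /** Zone of the US exchanges; unlike a fixed EST offset it also follows daylight saving. */
    static final ZoneId EXCHANGE_ZONE = ZoneId.of("America/New_York");

    /** Parses an ISO date string as a plain calendar date, with no zone involved. */
    static LocalDate parse(String isoDate) {
        return LocalDate.parse(isoDate);
    }

    /**
     * Reproduces the bug: the string is taken as midnight in parseZone and the
     * resulting instant is then read back as a calendar date in displayZone.
     */
    static LocalDate shifted(String isoDate, ZoneId parseZone, ZoneId displayZone) {
        ZonedDateTime midnight = LocalDate.parse(isoDate).atStartOfDay(parseZone);
        return midnight.withZoneSameInstant(displayZone).toLocalDate();
    }
}
```

For example, shifted("2018-06-04", ZoneId.of("UTC"), EXCHANGE_ZONE) lands on 2018-06-03, a Sunday, while parse("2018-06-04") stays on the Monday. Pinning everything to the exchange zone works, and keeping trading dates as LocalDate avoids the conversion altogether.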