tag:blogger.com,1999:blog-27030912216725441132024-02-06T21:03:19.337-08:00A ScratchpadBurak Bayramlihttp://www.blogger.com/profile/02849512629200782790noreply@blogger.comBlogger55125tag:blogger.com,1999:blog-2703091221672544113.post-2674333508354824982016-02-03T04:00:00.004-08:002016-02-03T04:01:37.732-08:00Dr. Shalizi's Book<div dir="ltr" style="text-align: left;" trbidi="on">
I was mentioned in the Acknowledgements section of Dr. Cosma Shalizi's excellent book <i>Advanced Data Analysis from an Elementary Point of View</i> (<a href="http://www.stat.cmu.edu/~cshalizi/ADAfaEPoV/ADAfaEPoV.pdf">link</a>). It is an honor! I had found a small mistake in a formula and informed Dr. Shalizi about it.<br />
<br /></div>
Burak Bayramlihttp://www.blogger.com/profile/02849512629200782790noreply@blogger.com0tag:blogger.com,1999:blog-2703091221672544113.post-8131078788758032712015-12-25T02:52:00.012-08:002020-08-30T12:04:39.434-07:00Some Tutorials in Turkish<div dir="ltr" style="text-align: left;" trbidi="on"><div><div><a href="https://burakbayramli.github.io/dersblog/linear">Lineer Cebir (Linear Algebra)</a></div><div><br /></div><div><a href="https://burakbayramli.github.io/dersblog/ode">Diferansiyel Denklemler (Ordinary Differential Equations)</a></div><div><br /></div><div><a href="https://burakbayramli.github.io/dersblog/calc_multi">Çok Değişkenli Calculus (Multivariable Calculus)</a></div><div><br /></div><div><a href="https://burakbayramli.github.io/dersblog/compscieng">Hesapsal Bilim (Computational Science)</a></div><div><br /></div><div><a href="https://burakbayramli.github.io/dersblog/stat">İstatistik, Yapay Öğrenim, Veri Analizi (Statistics, Machine Learning, Data Analysis)</a></div><div><br /></div><div><a href="https://burakbayramli.github.io/dersblog/tser">Zaman Serileri ve Finans (Time Series and Finance)</a></div><div><br /></div><div><a href="https://burakbayramli.github.io/dersblog/pde">Kısmi Diferansiyel Denklemler (Partial Differential Equations)</a></div><div><br /></div><div><a href="https://burakbayramli.github.io/dersblog/func_analysis">Fonksiyonel Analiz (Functional Analysis)</a></div><div><br /></div><div><a href="https://burakbayramli.github.io/dersblog/algs/index.html">Bilgisayar Bilim, Yapay Zeka (Computer Science, AI)</a></div><div><br /></div><div><a href="https://burakbayramli.github.io/dersblog/chaos">Gayri Lineer Dinamik ve Kaos (Non-Linear Dynamics and Chaos)</a></div><div><br /></div><div><a href="https://burakbayramli.github.io/dersblog/vision">Yapay Görüş (Computer Vision)</a></div><div><br /></div><div><a href="https://burakbayramli.github.io/dersblog/phy/index.html">Fizik</a></div><div><br /></div><div><a 
href="https://burakbayramli.github.io/dersblog/sk/index.html">IT, Bilişim (Informatics)</a></div><div><br /></div></div>
</div>
Burak Bayramlihttp://www.blogger.com/profile/02849512629200782790noreply@blogger.com0tag:blogger.com,1999:blog-2703091221672544113.post-43389211299121271532015-12-18T00:32:00.003-08:002016-05-13T04:48:23.611-07:00Backtesting<div dir="ltr" style="text-align: left;" trbidi="on">
For stock trading one usually needs a backtesting framework. I prefer Python, and here is a comprehensive list:<br />
<br />
<a href="http://quant.stackexchange.com/questions/8896/except-zipline-are-there-any-other-pythonic-algorithmic-trading-library-i-can-c">Link</a><br />
<br />
I heard about this list from <a href="http://qoppac.blogspot.co.uk/2015/12/pysystemtrade.html">here</a>, where the author was announcing his own backtester.<br />
<br />
I just played with <a href="http://gbeced.github.io/pyalgotrade/">pyalgotrade</a>, and it looks good.<br />
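Whatever framework you choose, the core of a backtest is the same loop: compute a signal, lag it by one bar, and accumulate returns. Here is a minimal, hedged sketch of a moving-average crossover backtest in plain numpy (illustrative only; this is not pyalgotrade's API):

```python
import numpy as np

def sma_crossover_backtest(prices, fast=5, slow=20):
    """Toy backtest: long when the fast SMA is above the slow SMA.
    Returns the strategy's cumulative return over the price series."""
    prices = np.asarray(prices, dtype=float)
    def sma(x, n):
        # rolling mean via convolution; entry j covers prices[j:j+n]
        return np.convolve(x, np.ones(n) / n, mode='valid')
    f = sma(prices, fast)[slow - fast:]   # align fast SMA with slow SMA endpoints
    s = sma(prices, slow)
    signal = (f > s).astype(float)        # 1 = long, 0 = flat
    rets = np.diff(prices[slow - 1:]) / prices[slow - 1:-1]
    strat = signal[:-1] * rets            # trade on the *previous* bar's signal
    return float(np.prod(1 + strat) - 1)
```

The one-bar lag (`signal[:-1]`) matters: using the same bar's signal on the same bar's return is look-ahead bias, a classic backtesting mistake.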
<br /></div>
Burak Bayramlihttp://www.blogger.com/profile/02849512629200782790noreply@blogger.com0tag:blogger.com,1999:blog-2703091221672544113.post-50456179527853440612015-09-21T00:48:00.002-07:002017-05-04T05:36:04.411-07:00Python Code for the Algorithmic Trading Book<div dir="ltr" style="text-align: left;" trbidi="on">
I converted some of the code for Dr. Ernie Chan's <i>Algorithmic Trading</i> book into Python. It is open-sourced <a href="https://github.com/burakbayramli/quant_at">here</a>. Dr. Chan mentions our project <a href="http://epchan.blogspot.de/2015/09/interview-with-euan-sinclair.html">here</a> (at the end of the post).<br />
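To give a flavor of the kind of code involved: a building block that recurs in mean-reversion strategies is the rolling z-score of a price series. The sketch below is my own illustration, not code taken from the repository or the book:

```python
import numpy as np

def rolling_zscore(series, lookback=20):
    """Z-score of each value against the trailing window's mean and
    standard deviation; a common mean-reversion entry signal."""
    x = np.asarray(series, dtype=float)
    out = np.full(len(x), np.nan)      # not enough history -> NaN
    for i in range(lookback, len(x)):
        w = x[i - lookback:i]
        sd = w.std()
        if sd > 0:
            out[i] = (x[i] - w.mean()) / sd
    return out
```

A typical rule would then short when the z-score is above some threshold and buy when it is below the negative threshold, betting on reversion to the window mean.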
<div>
<div>
<br /></div>
</div>
</div>
Burak Bayramlihttp://www.blogger.com/profile/02849512629200782790noreply@blogger.com0tag:blogger.com,1999:blog-2703091221672544113.post-66029938288818414442015-03-18T03:29:00.001-07:002015-03-18T03:29:53.074-07:00Data Science Done Well Looks Easy<div dir="ltr" style="text-align: left;" trbidi="on">
<a href="http://www.simplystatistics.org/2015/03/17/data-science-done-well-looks-easy-and-that-is-a-big-problem-for-data-scientists/">Link</a><br />
<br />
Data science has a ton of different definitions. For the purposes of this post I'm going to use the definition of data science we used when creating our Data Science program online. Data science is:<br /><br />Data science is the process of formulating a quantitative question that can be answered with data, collecting and cleaning the data, analyzing the data, and communicating the answer to the question to a relevant audience [..].<br /><br />A good data science project answers a real scientific or business analytics question. In almost all of these experiments the vast majority of the analyst's time is spent on getting and cleaning the data (steps 2-3) and communication and reproducibility (6-7). In most cases, if the data scientist has done her job right the statistical models don't need to be incredibly complicated to identify the important relationships the project is trying to find. In fact, if a complicated statistical model seems necessary, it often means that you don't have the right data to answer the question you really want to answer. One option is to spend a huge amount of time trying to tune a statistical model to try to answer the question but serious data scientists usually instead try to go back and get the right data.<br /><br />The result of this process is that most well executed and successful data science projects don't (a) use super complicated tools or (b) fit super complicated statistical models. The characteristics of the most successful data science projects I've evaluated or been a part of are: (a) a laser focus on solving the scientific problem, (b) careful and thoughtful consideration of whether the data is the right data and whether there are any lurking confounders or biases and (c) relatively simple statistical models applied and interpreted skeptically.<br /><br />It turns out doing those three things is actually surprisingly hard and very, very time consuming. 
It is my experience that data science projects take a solid 2-3 times as long to complete as a project in theoretical statistics. The reason is that inevitably the data are a mess and you have to clean them up, then you find out the data aren't quite what you wanted to answer the question, so you go find a new data set and clean it up, etc. After a ton of work like that, you have a nice set of data to which you fit simple statistical models and then it looks super easy to someone who either doesn't know about the data collection and cleaning process or doesn't care.<br /><br />This poses a major public relations problem for serious data scientists. When you show someone a good data science project they almost invariably think "oh that is easy" or "that is just a trivial statistical/machine learning model" and don't see all of the work that goes into solving the real problems in data science. <br />
<br /></div>
Burak Bayramlihttp://www.blogger.com/profile/02849512629200782790noreply@blogger.com0tag:blogger.com,1999:blog-2703091221672544113.post-62282576007836639532014-11-12T01:29:00.000-08:002014-11-12T01:29:00.902-08:00Movielens, Funk SVD, Numba<div dir="ltr" style="text-align: left;" trbidi="on">
The Python version of Funk SVD, coded with Numba (to execute at C speeds), can be found <a href="http://sayilarvekuramlar.blogspot.de/2014/11/movielens-funk-svd-numba.html">here</a>.<br />
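For reference, the heart of Funk SVD is a plain SGD loop over the observed ratings; below is a minimal numpy sketch of the idea (my own illustration; the linked Numba version differs in details, and in practice you would wrap the inner loop with Numba's @njit to get the C-like speed):

```python
import numpy as np

def funk_svd(ratings, n_users, n_items, k=2, lr=0.05, reg=0.02, epochs=1000):
    """Plain SGD matrix factorization (Funk SVD).
    ratings: iterable of (user, item, value) triples."""
    rng = np.random.default_rng(0)
    U = rng.normal(scale=0.1, size=(n_users, k))   # user factors
    V = rng.normal(scale=0.1, size=(n_items, k))   # item factors
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - U[u] @ V[i]
            u_old = U[u].copy()                    # update both with old values
            U[u] += lr * (err * V[i] - reg * U[u])
            V[i] += lr * (err * u_old - reg * V[i])
    return U, V
```

The prediction for user u and item i is simply the dot product `U[u] @ V[i]`; regularization (`reg`) keeps the factors from overfitting the sparse rating matrix.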
<br /></div>
Burak Bayramlihttp://www.blogger.com/profile/02849512629200782790noreply@blogger.com0tag:blogger.com,1999:blog-2703091221672544113.post-32916676344669178152014-10-16T00:44:00.000-07:002014-10-16T00:44:08.828-07:00emacs-ipython<div dir="ltr" style="text-align: left;" trbidi="on">
Here is an Emacs extension, <a href="https://github.com/burakbayramli/emacs-ipython">emacs-ipython</a>, that allows one to execute ipython code snippets from inside an Emacs LaTeX buffer and display the results (as graphics or text) directly in the same buffer. The mode was developed to avoid the ipython notebook's Web interface and ipynb files, which cannot be edited with a plain text editor. This way we get the best of both worlds: Emacs for editing TeX, and ipython for running code.<br />
<br /></div>
Burak Bayramlihttp://www.blogger.com/profile/02849512629200782790noreply@blogger.com0tag:blogger.com,1999:blog-2703091221672544113.post-55964582336195964872013-09-05T08:06:00.003-07:002013-09-05T08:06:28.920-07:00Google Paper and Data Science Tips<div dir="ltr" style="text-align: left;" trbidi="on">
An excellent <a href="http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/ja//pubs/archive/41159.pdf">paper</a> by Google, full of practical tips.<br />
<br /></div>
Burak Bayramlihttp://www.blogger.com/profile/02849512629200782790noreply@blogger.com0tag:blogger.com,1999:blog-2703091221672544113.post-29619797154198464112013-08-22T07:17:00.001-07:002013-08-22T07:17:40.959-07:00What Hackers Should Know About Machine Learning <div dir="ltr" style="text-align: left;" trbidi="on">
<a href="http://www.fastcolabs.com/3016160/what-hackers-should-know-about-machine-learning">Link </a><br />
<br />
Data analysis as an exploratory endeavor should be the first part of
anything. You should never go into a project and say “The thing that I
want to do is classification so I'm always going to run my favorite
classification algorithm.” For the first half of the book we talk about
“Here's a dataset, here's how to clean it up.” The chapters that John
Myles White wrote on means, medians, modes, and distributions are always
the things that you should do in the beginning. We want to hammer home
that it's not just input-output. Input, look around, see what's going
on, find structure in the data, then make the choice for methods. And
then maybe iterate a couple of them. It's very cyclic. It's not linear [..]<br />
<br />My thinking has evolved on presenting results. The way I think
about presenting results now is always in the browser as an interactive
thing. There's a tremendous amount of value in providing the audience
with the ability to ask second-order questions about what they are
observing rather than first-order ones. Imagine the thing you are
looking at is just a simple scatterplot and you see one outlier. So a
first-order question would be who is that outlier? If you have an
interactive thing where you can go over the dot and it tells you who
that is, and the second order question is why is that an outlier?<br />
<br /></div>
Burak Bayramlihttp://www.blogger.com/profile/02849512629200782790noreply@blogger.com0tag:blogger.com,1999:blog-2703091221672544113.post-29190886276598920342013-08-05T00:42:00.003-07:002013-08-05T00:42:47.707-07:0010 Best Practices in Operational Analytics<div dir="ltr" style="text-align: left;" trbidi="on">
Great set of <a href="http://www.slideshare.net/jamet123/10-best-practices-in-operational-analytics-6871966">slides</a> on ensembles, feature engineering, and data preparation.<br />
<br /></div>
Burak Bayramlihttp://www.blogger.com/profile/02849512629200782790noreply@blogger.com0tag:blogger.com,1999:blog-2703091221672544113.post-10434227665562986812013-07-10T04:51:00.001-07:002013-07-10T06:47:29.242-07:00Data Agnosticism: Feature Engineering Without Domain Expertise<div dir="ltr" style="text-align: left;" trbidi="on">
<br />
<br />
<iframe allowfullscreen="" frameborder="0" height="360" src="http://www.youtube.com/embed/bL4b1sGnILU?feature=player_embedded" width="400"></iframe><br />
<br />
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhmPiPgPe7E1WWio0HU_FHLY2VWKTLxnAazUoGYsMO-LkKdtE8IZdY47qGmrhwQm1RUZn7HL-osN5XrL39c80WLnuUODWZllIsoQfx4kveb5kaFxxKX4ba6TYYda15JjdN8TPvb6257wDU/s1600/Screenshot+from+2013-07-10+14:12:17.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="254" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhmPiPgPe7E1WWio0HU_FHLY2VWKTLxnAazUoGYsMO-LkKdtE8IZdY47qGmrhwQm1RUZn7HL-osN5XrL39c80WLnuUODWZllIsoQfx4kveb5kaFxxKX4ba6TYYda15JjdN8TPvb6257wDU/s320/Screenshot+from+2013-07-10+14:12:17.png" width="320" /></a> </div>
Burak Bayramlihttp://www.blogger.com/profile/02849512629200782790noreply@blogger.com0tag:blogger.com,1999:blog-2703091221672544113.post-57178138170671191822013-05-02T13:46:00.002-07:002013-05-02T13:51:26.514-07:00Getting to know your data<div dir="ltr" style="text-align: left;" trbidi="on">
<i>Witten, Data Mining, Practical Machine Learning Tools and Techniques, pg 60</i><br />
<br />
There is no substitute for getting to know your data. Simple tools that show histograms of the distribution of values of nominal attributes, and graphs of the values of numeric attributes (perhaps sorted or simply graphed against instance number), are very helpful. These graphical visualizations of the data make it easy to identify outliers, which may well represent errors in the data file—or arcane conventions for coding unusual situations, such as a missing year as 9999 or a missing weight as -1 kg, that no one has thought to tell you about. Domain experts need to be consulted to explain anomalies, missing values, the significance of integers that represent categories rather than numeric quantities, and so on. Pairwise plots of one attribute against another, or each attribute against the class value, can be extremely revealing.<br />
<br />
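A few lines of numpy implement exactly this kind of first look. The sketch below (my own illustration, not from the book) counts sentinel codes like the 9999 "missing year" or -1 kg "missing weight" Witten mentions, and flags crude outliers:

```python
import numpy as np

def quick_look(values, sentinels=(9999, -1)):
    """First-pass summary of a numeric column: size, counts of
    suspicious sentinel codes, and crude outliers (more than
    3 median absolute deviations from the median)."""
    x = np.asarray(values, dtype=float)
    report = {'n': len(x)}
    report['sentinel_counts'] = {s: int((x == s).sum()) for s in sentinels}
    med = np.median(x)
    mad = np.median(np.abs(x - med)) or 1.0   # guard against zero MAD
    report['outliers'] = x[np.abs(x - med) > 3 * mad].tolist()
    return report
```

Running this per column before any modeling is a cheap way to surface the "arcane coding conventions" that no one thought to tell you about.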
Data cleaning is a time-consuming and labor-intensive procedure but one that is absolutely necessary for successful data mining. With a large dataset, people often give up—how can they possibly check it all? Instead, you should sample a few instances and examine them carefully. You’ll be surprised at what you find. Time looking at your data is always well spent.</div>
Burak Bayramlihttp://www.blogger.com/profile/02849512629200782790noreply@blogger.com0tag:blogger.com,1999:blog-2703091221672544113.post-65974100675716912572013-03-17T04:33:00.000-07:002013-03-17T04:33:05.721-07:00Data Science<div dir="ltr" style="text-align: left;" trbidi="on">
Typical data science analyses:<br />
<br />
Recommendation engines – increase cross-sell and repeat purchases by identifying other products in which a customer or prospect is likely to be interested<br /><br />Web analytics - advanced click-stream, golden path analysis, viewer engagement, segmentation, and more. <br /><br />Cross-channel marketing attribution – move beyond the skewed input of last click analysis to accurately determine campaign impact effectiveness across all channels<br /><br />Influencer analysis – understand whose actions have impact in the network to encourage the behavior of peers for purchases, attrition, or just engagement. <br /></div>
Burak Bayramlihttp://www.blogger.com/profile/02849512629200782790noreply@blogger.com0tag:blogger.com,1999:blog-2703091221672544113.post-11771012956311282792013-03-06T03:15:00.005-08:002013-03-06T03:15:55.832-08:00Practical machine learning tricks - KDD 2011<div dir="ltr" style="text-align: left;" trbidi="on">
<a href="http://blog.david-andrzejewski.com/machine-learning/practical-machine-learning-tricks-from-the-kdd-2011-best-industry-paper/">Link</a><br />
<br />
At first glance, this might appear to be a "Hello-World" machine
learning problem straight out of a textbook or tutorial: we simply
train a Naive Bayes on a set of bad ads versus a set of good ones.
However this is apparently <strong>far</strong> from being the case - while Google
is understandably shy about hard numbers, the paper mentions several
issues which make this especially challenging and notes that this is a
business-critical problem for Google.<br />
<br />
--<br />
<br />
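For reference, the "textbook" baseline the quote refers to really does fit in a few lines; here is a from-scratch Bernoulli Naive Bayes sketch on toy documents (purely illustrative, and obviously nothing like Google's production system):

```python
import math
from collections import Counter

def train_nb(docs, labels):
    """Bernoulli Naive Bayes with Laplace smoothing.
    docs: list of token sets; labels: class label per doc."""
    vocab = set().union(*docs)
    model = {}
    for c in set(labels):
        idx = [i for i, l in enumerate(labels) if l == c]
        df = Counter(t for i in idx for t in docs[i])  # document frequency
        p = {t: (df[t] + 1) / (len(idx) + 2) for t in vocab}
        model[c] = {'prior': math.log(len(idx) / len(docs)),
                    'logp': {t: math.log(p[t]) for t in vocab},
                    'logq': {t: math.log(1 - p[t]) for t in vocab}}
    return model

def predict_nb(model, doc):
    def score(c):
        m = model[c]
        return m['prior'] + sum(
            m['logp'][t] if t in doc else m['logq'][t] for t in m['logp'])
    return max(model, key=score)
```

The gap between this toy and a deployed ad-quality system (adversarial drift, skewed classes, feature churn) is exactly what the linked paper is about.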
There are many useful suggestions in this post. </div>
Burak Bayramlihttp://www.blogger.com/profile/02849512629200782790noreply@blogger.com0tag:blogger.com,1999:blog-2703091221672544113.post-56875844015790444972012-12-18T09:36:00.000-08:002012-12-18T09:37:15.289-08:00Geometry, Machine Learning and Deep Learning<div dir="ltr" style="text-align: left;" trbidi="on">
<a href="http://geomblog.blogspot.de/2012/12/nips-ii-deep-learning-and-evolution-of.html">Link </a><br />
<br />
Over the years [..] the way we construct mental models of data has changed. And as I've argued before, understanding how we think about data, and what shape we give it, is key to the whole enterprise of finding patterns in data.<br />
<br />
The model that one always starts with is Euclidean space. Data = points, features = dimensions, and so on. And as a first approximation of a data model, it isn't terrible.<br />
<br />
There are many ways to modify this space. You can replace the ℓ2 norm by ℓ1. You can normalize the points (again with ℓ2 or ℓ1, sending you to the sphere or the simplex). You can weight the dimensions, or even do a wholesale scale-rotation.<br />
<br />
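As an aside, the normalizations mentioned above are one-liners; a quick numpy sketch of sending points to the sphere (unit ℓ2 norm) or the simplex (unit ℓ1 norm):

```python
import numpy as np

def to_sphere(X):
    """Normalize each row to unit l2 norm: points land on the sphere."""
    return X / np.linalg.norm(X, axis=1, keepdims=True)

def to_simplex(X):
    """Normalize each nonnegative row to unit l1 norm: points land on the simplex."""
    return X / np.abs(X).sum(axis=1, keepdims=True)
```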
But that's not all. Kernels take this to another level. You can encode weak nonlinearity in the data by assuming that it's flat once you lift it. In a sense, this is still an ℓ2 space, but a larger class of spaces that you can work with. The entire SVM enterprise was founded on this principle.<br />
<br />
But that's not all either. The curse of dimensionality means that it's difficult to find patterns in such high dimensional data. Arguably, "real data" is in fact NOT high dimensional, or is not generated by a process with many parameters, and so sparsity-focused methods like compressed sensing start playing a role.<br />
<br />
But it gets even more interesting. Maybe the data is low-dimensional, but doesn't actually lie in a subspace. This gets you into manifold learning and variants: the data lies on a low-dimensional curved sheet of some kind, and you need to learn<br />
on that space.<br />
<br />
While the challenge for geometry (and algorithms) is to keep up with the new data models, the challenge for data analysts is to design data models that are realistic and workable.<br />
<br />
So what does this have to do with deep learning?<br />
<br />
Deep learning networks "work" in that they appear to be able to identify interesting semantic structures in data that can be quite noisy. But to me it's not entirely clear why that is [..].<br />
<br />
A central idea of [Deep Learning] work is that deep belief networks can be trained "layer by layer", where each layer uses features identified from the previous layer.<br />
<br />
If you stare at these things long enough, you begin to see a picture not of sparse data, or low-rank data, or even manifold data. What you see is a certain hierarchical collection of subspaces, where low-dimensional spaces interact in a low dimensional way to form higher level spaces, and so on. So you might have a low-level "lip" feature described by a collection of 2-3 dimensional noisy subspaces in an image space. These "lip" features in turn combine with "eye" features and so on.<br />
<br /></div>
Burak Bayramlihttp://www.blogger.com/profile/02849512629200782790noreply@blogger.com0tag:blogger.com,1999:blog-2703091221672544113.post-793173099270526112012-11-13T23:56:00.000-08:002012-11-14T02:28:57.042-08:00The Data Science Loop<div dir="ltr" style="text-align: left;" trbidi="on">
<br />
<a href="http://seanjtaylor.com/2012/09/18/the-data-science-loop/">Link </a><br />
<br />
Ask a good question.<br />
<br />
Answer the question while economizing on resources.<br />
<br />
Communicate your results.<br />
<br />
(Sometimes) Make recommendations to engineers or managers.<br />
<br />
Asking a good question is probably the hardest thing to get right. If
you neglect this step, you'll spend days of your life working on
something that will have little impact. It's a skill that people who
focus on technical training tend to be bad at [..].<br />
<br />
The real art to asking good questions is to <i>consider your audience</i>.
Who is going to be interested in the results and why are they going to
care? I find that the best questions have punchy answers, are usually
interesting to everyone, and usually affect a potential decision. On
the last point, the key is to think about how someone within your
organization might change their strategy due to your answer.<br />
<br />
Effectively answering questions is where technical skills become
important. It's easy to get caught up in fancy algorithms and methods,
but those approaches are usually <a class="reference external" href="http://c2.com/cgi/wiki?PrematureOptimization">premature optimizations</a>.
The best answers are 1) cheap and 2) easy to explain. Give me a table
of counts or event rates over regression coefficients or the first
eigenvector of your matrix decomposition. Perhaps it's a bit modest,
but I often describe data science as "advanced applied counting." [..] <br />
<br />
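"Advanced applied counting" can be taken quite literally; here is a sketch of the kind of counts-and-rates table meant above, on made-up event data (the segments and events are hypothetical):

```python
from collections import Counter

def conversion_table(events):
    """Event counts and rates per segment: the 'table of counts'
    preferred over regression coefficients or eigenvectors."""
    n = Counter(seg for seg, _ in events)
    hits = Counter(seg for seg, ok in events if ok)
    return {seg: (hits[seg], n[seg], hits[seg] / n[seg]) for seg in n}

# hypothetical (user_segment, converted) event log
events = [('new', True), ('new', False), ('new', False),
          ('returning', True), ('returning', True), ('returning', False)]
```

The output answers "who converts, and how often?" in a form anyone in the organization can read, which is the whole point.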
Fancy, new, and complicated are usually bad qualities for a method. Take it from <a class="reference external" href="https://twitter.com/jaykreps/status/219977241839411200">Jay Kreps</a>, "read current 3-5 pubs and note the stupid simple thing they all claim to beat, implement that."<br />
<br />
The other pattern I notice here is the unreasonable effectiveness of <a class="reference external" href="http://en.wikipedia.org/wiki/How_to_Solve_It">Polya's advice</a>
for solving a math problem, particularly this aphorism: "If you can't
solve a problem, then there is an easier problem you can solve: find
it." Paraphrased for data scientists, if there is a question you can't
answer, there is an easier question you can answer (usually counting
something!). <br />
<br />
I firmly believe that data scientists should not be engineers or
managers. Engineers build things, managers make decisions, data
scientists answer questions. This is not to trivialize the role of data
scientists, who plausibly account for 2/3 of the steps in the <a class="reference external" href="http://lean.st/principles/build-measure-learn">build-measure-learn loop</a>. The answers can (and should) inform decisions that managers make and help engineers build better products, but answers <i>always</i> lead to more (and better!) questions.<br />
<br />
Don't let the data science technical jargon drive your impression of
what is actually done in the field. In my experience, it's a research
job where you have autonomy to ask and answer some really interesting
questions. The fundamental challenge is being savvy enough to pick good
questions and find concise answers using minimal resources. Then you
must convince everyone to listen to you about what you found. In many
ways it's similar to academic research, but the differences are that the
cycle is tighter and your answers will often effect changes in the
business. <br />
<br /></div>
Burak Bayramlihttp://www.blogger.com/profile/02849512629200782790noreply@blogger.com0tag:blogger.com,1999:blog-2703091221672544113.post-9099856873526216852012-08-15T14:10:00.003-07:002012-08-18T06:01:14.560-07:00Using T-Mobile USB Modem on Ubuntu in Germany<div dir="ltr" style="text-align: left;" trbidi="on">
The modem is Mobilcom Debitel.<br />
<br />
First install<br />
<br />
<span style="font-family: "Courier New",Courier,monospace;">sudo apt-get install usb-modeswitch usb-modeswitch-data wvdial</span><br />
<br />
Your /etc/wvdial.conf should contain<br />
<div style="font-family: "Courier New",Courier,monospace;">
<br /></div>
<div style="font-family: "Courier New",Courier,monospace;">
[Dialer Defaults]</div>
<div style="font-family: "Courier New",Courier,monospace;">
Phone = *99#</div>
<div style="font-family: "Courier New",Courier,monospace;">
Username = t-mobile</div>
<div style="font-family: "Courier New",Courier,monospace;">
Password = tm</div>
<div style="font-family: "Courier New",Courier,monospace;">
Stupid Mode = 1</div>
<div style="font-family: "Courier New",Courier,monospace;">
Dial Command = ATDT</div>
<div style="font-family: "Courier New",Courier,monospace;">
Modem = /dev/ttyUSB2</div>
<div style="font-family: "Courier New",Courier,monospace;">
<br /></div>
<div style="font-family: "Courier New",Courier,monospace;">
[Dialer tmo]</div>
<div style="font-family: "Courier New",Courier,monospace;">
Modem = /dev/ttyUSB2</div>
<div style="font-family: "Courier New",Courier,monospace;">
Baud = 460800</div>
<div style="font-family: "Courier New",Courier,monospace;">
Init1 = ATZ</div>
<div style="font-family: "Courier New",Courier,monospace;">
Init2 = ATQ0 V1 E1 S0=0 &C1 &D2 +FCLASS=0</div>
<div style="font-family: "Courier New",Courier,monospace;">
ISDN = 0</div>
<div style="font-family: "Courier New",Courier,monospace;">
Modem Type = Analog Modem</div>
<br />
Type<br />
<br />
<div style="font-family: "Courier New",Courier,monospace;">
usb-devices</div>
<br />
Look at the list and find out vendor and product id. They will be used for -v and -p respectively. <br />
<br />
<div style="font-family: "Courier New",Courier,monospace;">
sudo usb_modeswitch -v [VENDOR] -p [PROD ID] -M '55534243123456780000000080000606f50402527000000000000000000000'</div>
<br />
<div style="font-family: "Courier New",Courier,monospace;">
sudo modprobe option</div>
<div style="font-family: "Courier New",Courier,monospace;">
<br /></div>
<div style="font-family: "Courier New",Courier,monospace;">
echo "1c9e [PRODUCT]" | sudo tee /sys/bus/usb-serial/drivers/option1/new_id</div>
<br />
Now<br />
<div style="font-family: "Courier New",Courier,monospace;">
<br /></div>
<div style="font-family: "Courier New",Courier,monospace;">
sudo wvdial tmo</div>
<br />
A couple of times I had to run this twice; on those occasions a dialog box would open and I had to enter my (T-Mobile) PIN, and it said it "unlocked" the PIN. After that, I didn't have to do it again. <br />
<br />
Some postings on the Internet suggest editing the Ubuntu network connections, adding a connection for Mobile Broadband (a separate tab next to Wireless Network), and configuring things there. I did not need this; using the commands above seems to suffice.</div>
Burak Bayramlihttp://www.blogger.com/profile/02849512629200782790noreply@blogger.com0tag:blogger.com,1999:blog-2703091221672544113.post-14415363917081076792012-03-03T05:17:00.002-08:002012-08-12T09:59:22.830-07:00Skillicorn Data Mining Book Matlab Code, Data<div dir="ltr" style="text-align: left;" trbidi="on">
We are trying to collect all relevant data and code for Skillicorn's <span style="font-style: italic;">Understanding Complex Datasets with Matrix Decomposition</span> book. We follow the links shared in the bibliography and retrieve the relevant code and data when possible. What we have found so far is at the link below; the collection will grow as we find more.<br />
<br />
<a href="https://github.com/burakbayramli/kod/tree/master/books/skillicorn_data_mining">Link</a></div>Burak Bayramlihttp://www.blogger.com/profile/02849512629200782790noreply@blogger.com0tag:blogger.com,1999:blog-2703091221672544113.post-84054238914819334112011-12-21T14:44:00.001-08:002012-08-15T23:56:26.282-07:00Automatic PDF Form Filler<div dir="ltr" style="text-align: left;" trbidi="on">
Filling out forms is one of my least favorite activities; especially for programmers / IT people who are used to doing everything electronically, looking at that form with pen in hand somehow brings time to a crawl. The boxes are always too small, if there are mistakes, you need to reprint, and repeat the whole thing again. By hand.<br />
<br />
Here is a collection of Python scripts that will allow you fill out a PDF form automatically. First, you need to convert the PDF to a collection of jpgs, using<br />
<br />
python convert.py DOC.pdf [target dir]<br />
<br />
In [target dir] you should now see DOC-0.jpg, DOC-1.jpg, etc.<br />
<br />
Then, you need to identify box locations. For that use locs.py<br />
<br />
python locs.py [target dir]/DOC-0.jpg<br />
<br />
This brings up a UI tool; as you click on boxes, the coordinates of those boxes will be written to a [target dir]/DOC-0.jpg.loc file. Make sure you click on the boxes in a logical order; most forms number each box on the page anyway, so you can use that order. The coordinates are written to the loc file as you click, so once you are done, simply shut down locs.py.<br />
<br />
Now in [target dir], start a new file called DOC-0.jpg.fill<br />
<br />
This file will carry the values used to fill out your PDF form. Each line in this file should correspond to the line specified in DOC-0.jpg.loc. The line orders must match. You can manually tell fill.py to skip pixels in the up or down direction by using e.g.<br />
<br />
[down=40]bla bla bla<br />
<br />
You can also use up, left, right commands. If you need to change the font size, e.g. for size 20 use [font=20].<br />
<br />
Once that is done,<br />
<br />
python fill.py [target dir]/DOC-0.jpg<br />
<br />
This will use the loc file, fill file, and generate a final DOC-0.jpg-out.jpg<br />
<br />
In this file you will see stuff from fill file placed in proper coordinates.<br />
<br />
This tool uses ImageMagick, so make sure you install that first. Also, for the necessary Python libraries on Ubuntu you can use<br />
<br />
sudo apt-get install python python-tk idle python-pmw python-imaging python-imaging-tk<br />
<br />
An improvement to this code could be using a vision algorithm to automatically detect the location of each box. There is a certain visual pattern to a form -- words are in straight lines, there are big empty spaces in between, and the whole thing is usually surrounded by lines.<br />
<br />
<a href="https://github.com/burakbayramli/kod/tree/master/formfill">Download</a></div>
Burak Bayramlihttp://www.blogger.com/profile/02849512629200782790noreply@blogger.com1tag:blogger.com,1999:blog-2703091221672544113.post-52641377243368729292011-10-27T04:17:00.000-07:002011-10-27T04:23:30.604-07:00Optflow C++Here is a slimmed-down C++ main program that uses Seppo Pulkkinen's <a href="http://code.google.com/p/optflow">optflow</a> library. This library uses CImg internally.<pre><br />#include "CImg_config.h"<br />#include <CImg.h><br />#include <sstream><br />#include <string><br />#include <cstdio><br /><br />#include "DenseVectorFieldIO.h"<br />#include "DualDenseMotionExtractor.h"<br />#include "PyramidalLucasKanade.h"<br />#include "SparseVectorFieldIO.h"<br />#include "VectorFieldIllustrator.h"<br /><br />using namespace cimg_library;<br /><br />int main() {<br /><br /> CImg< unsigned char > I1("../examples/test1.png");<br /> CImg< unsigned char > I2("../examples/test2.png");<br /><br /> const int W = I1.dimx();<br /> const int H = I1.dimy();<br /> CImg< unsigned char > I1_smoothed;<br /> CImg< unsigned char > I2_smoothed;<br /> CImg< unsigned char > motionImageF(W, H, 1, 3);<br /> CImg< double > VF, VB;<br /><br /> I1_smoothed = I1.get_channel(0);<br /> I2_smoothed = I2.get_channel(0);<br /><br /> motionImageF.get_shared_channel(0) = I1_smoothed * 0.75;<br /> motionImageF.get_shared_channel(1) = I1_smoothed * 0.75;<br /> motionImageF.get_shared_channel(2) = I1_smoothed * 0.75;<br /><br /> I1_smoothed.blur(3.0, 3.0, 3.0);<br /> I2_smoothed.blur(3.0, 3.0, 3.0);<br /><br /> DenseMotionExtractor* e = new PyramidalLucasKanade(8,3,0.0025,0.0,4,true);<br /> e->compute(I1_smoothed, I2_smoothed, VF, VB);<br /> // CImg pixel access is operator(); VF[100,100] would invoke the comma operator<br /> printf("%f\n", VF(100, 100));<br /><br /> delete e;<br /><br /> return 0;<br />}<br /></pre><br />To build, drop this file under lib/, run make to create the shared library, then compile with<br /><br />export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:.<br />/usr/bin/c++ -L. -Doptflow_EXPORTS -fPIC -I. 
-Wall -O2 -frounding-math \<br />-loptflow -o main main.cppBurak Bayramlihttp://www.blogger.com/profile/02849512629200782790noreply@blogger.com0tag:blogger.com,1999:blog-2703091221672544113.post-73438417803522159492011-10-06T03:11:00.000-07:002011-10-06T03:18:02.406-07:00Mumford on Math"Mathematicians believe in this Platonic universe in that, there is a pre-existing bunch of facts which are true and you never invent anything, you are discovering".<br /><br /><iframe src="http://www.youtube.com/embed/7WrhzkBYiwM" allowfullscreen="" frameborder="0" height="315" width="420"></iframe>Burak Bayramlihttp://www.blogger.com/profile/02849512629200782790noreply@blogger.com0tag:blogger.com,1999:blog-2703091221672544113.post-47695114720536808702011-10-01T05:06:00.000-07:002017-10-28T05:22:58.941-07:00Optical Flow, Lucas Kanade in Python<div dir="ltr" style="text-align: left;" trbidi="on">
Following is the Lucas-Kanade optical flow algorithm in Python. We used it successfully on two PNG images, as well as through OpenCV to follow a point across successive frames. More details are at <a href="https://github.com/burakbayramli/classnotes/tree/master/pde/pde_lk">Github</a>.<br />
<pre>import numpy as np
import scipy.signal as si
from PIL import Image

def gauss_kern():
    # 15x15 Gaussian smoothing kernel
    h1 = 15
    h2 = 15
    x, y = np.mgrid[0:h2, 0:h1]
    x = x - h2 / 2
    y = y - h1 / 2
    sigma = 1.5
    g = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return g / g.sum()

def deriv(im1, im2):
    g = gauss_kern()
    Img_smooth = si.convolve(im1, g, mode='same')
    fx, fy = np.gradient(Img_smooth)
    # temporal derivative: difference of the two frames, box-filtered
    ft = si.convolve2d(im1, 0.25 * np.ones((2, 2))) + \
         si.convolve2d(im2, -0.25 * np.ones((2, 2)))
    fx = fx[0:fx.shape[0]-1, 0:fx.shape[1]-1]
    fy = fy[0:fy.shape[0]-1, 0:fy.shape[1]-1]
    ft = ft[0:ft.shape[0]-1, 0:ft.shape[1]-1]
    return fx, fy, ft

# --- second file: uses the deriv module above ---
import numpy as np
import numpy.linalg as lin
import deriv

def lk(im1, im2, i, j, window_size):
    fx, fy, ft = deriv.deriv(im1, im2)
    halfWindow = int(np.floor(window_size / 2))
    curFx = fx[i-halfWindow-1:i+halfWindow,
               j-halfWindow-1:j+halfWindow]
    curFy = fy[i-halfWindow-1:i+halfWindow,
               j-halfWindow-1:j+halfWindow]
    curFt = ft[i-halfWindow-1:i+halfWindow,
               j-halfWindow-1:j+halfWindow]
    curFx = curFx.T.flatten(order='F')
    curFy = curFy.T.flatten(order='F')
    curFt = -curFt.T.flatten(order='F')
    # least-squares solution of A [u v]^T = -ft via the normal equations
    A = np.vstack((curFx, curFy)).T
    U = np.dot(np.dot(lin.pinv(np.dot(A.T, A)), A.T), curFt)
    return U[0], U[1]</pre>
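As a sanity check, here is a self-contained NumPy sketch of the same least-squares step (the pinv expression in lk above is the normal-equations form of an ordinary least-squares solve). The lk_point helper and the synthetic blob are illustrative inventions for this check, not part of the original code:

```python
import numpy as np

def lk_point(im1, im2, i, j, w):
    """Least-squares flow estimate [u, v] at pixel (i, j), window half-size w."""
    fx, fy = np.gradient(im1)   # spatial derivatives (rows, cols)
    ft = im2 - im1              # temporal derivative
    sl = (slice(i - w, i + w + 1), slice(j - w, j + w + 1))
    A = np.vstack((fx[sl].ravel(), fy[sl].ravel())).T
    U, _, _, _ = np.linalg.lstsq(A, -ft[sl].ravel(), rcond=None)
    return U

# synthetic check: a Gaussian blob translated by one pixel along the rows
x, y = np.mgrid[0:64, 0:64].astype(float)
blob = lambda cx, cy: np.exp(-((x - cx)**2 + (y - cy)**2) / 50.0)
I1, I2 = blob(32, 32), blob(33, 32)
u, v = lk_point(I1, I2, 32, 32, 7)   # u should come out close to 1, v close to 0
```

The recovered u is close to, but not exactly, 1.0: Lucas-Kanade linearizes the brightness-constancy constraint, so finite displacements carry a small discretization bias.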
</div>
Burak Bayramlihttp://www.blogger.com/profile/02849512629200782790noreply@blogger.com8tag:blogger.com,1999:blog-2703091221672544113.post-45747192045815921442011-08-22T07:00:00.001-07:002011-08-29T12:24:24.829-07:00Plotting a Complex ExponentialWe rewrote one of the MIT OCW 18.03 ODE <a href="http://math.mit.edu/mathlets/mathlets/complex-exponential/">Mathlets</a> in Python. This mathlet was for plotting complex exponentials.
<br /><pre class="prettyprint">from pylab import *
from matplotlib.widgets import Slider

ax = subplot(121)
subplots_adjust(left=0.1, bottom=0.25)
l1, = plot([], [], lw=2, color='red')
axis([-1, 1, -8, 8])
title('$(a + bi)t$', color='blue')
grid()

ax = subplot(122)
subplots_adjust(left=0.1, bottom=0.25)
l2, = plot([], [], lw=2, color='red')
axis([-3, 3, -3, 3])
title('$e^{(a + bi)t}$', color='blue')
grid()

axcolor = 'lightgoldenrodyellow'
# 'facecolor' replaces the deprecated 'axisbg' keyword
axa = axes([0.15, 0.1, 0.65, 0.03], facecolor=axcolor)
axb = axes([0.15, 0.15, 0.65, 0.03], facecolor=axcolor)

slidera = Slider(axa, 'a', -1.0, 1.0, valinit=0)
sliderb = Slider(axb, 'b', -8.0, 8.0, valinit=0)

def update(val):
    a = slidera.val
    b = sliderb.val
    t = arange(-1.0, 1.0, 0.001)
    l1.set_xdata(t)
    # left panel: the line (a+bi)t in the complex plane, slope b/a
    if a != 0:
        l1.set_ydata((b/a)*t)

    t = arange(-3.0, 3.0, 0.001)
    l2.set_xdata(exp(a*t)*cos(b*t))
    l2.set_ydata(exp(a*t)*sin(b*t))
    draw()

slidera.on_changed(update)
sliderb.on_changed(update)

show()
</pre><div style="text-align: center;"><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhrHKkc6dViuLm-_2pB6aUhlFGzWzcwiLkL-b7zu8WFHVDzrXxoCJJcWLyYMV4doCnOxAP1zzwkdyH4a4tZTsntAUz44MUqxMGP37h540B06Upl1q4gZGL608D7jp75idFp3ycsoPCl29Q/s400/compexp.png" width="300px" />
<br /></div>
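The curve in the right panel is just the real and imaginary parts of the complex exponential. A quick NumPy check of Euler's formula (the sample values a=0.5, b=3 are arbitrary) confirms the parametrization the update callback feeds to the plot:

```python
import numpy as np

a, b = 0.5, 3.0
t = np.arange(-3.0, 3.0, 0.001)
z = np.exp((a + 1j * b) * t)   # the complex exponential itself
# Euler: e^{(a+bi)t} = e^{at} cos(bt) + i e^{at} sin(bt),
# exactly the x and y data drawn in the right subplot
x_part = np.exp(a * t) * np.cos(b * t)
y_part = np.exp(a * t) * np.sin(b * t)
```

So for a < 0 the curve spirals inward, for a > 0 outward, and for a = 0 it stays on the unit circle.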
<br />Burak Bayramlihttp://www.blogger.com/profile/02849512629200782790noreply@blogger.com0tag:blogger.com,1999:blog-2703091221672544113.post-34515168153212016172011-07-23T09:18:00.001-07:002012-08-16T15:43:10.536-07:00Clustering, Image Segmentation, Eigenvectors and Python<div dir="ltr" style="text-align: left;" trbidi="on">
Here is example code for eigenvector-based segmentation in Python. For more details, see the <a href="https://github.com/burakbayramli/classnotes/tree/master/app-math-tr/eigseg">code</a> here.<br />
<pre>import matplotlib.pyplot as plt
import numpy as np

Img = plt.imread("twoObj.jpg")
n = Img.shape[0]  # image assumed square, n x n
Img2 = Img.flatten(order='C')
nn = Img2.shape[0]
# affinity matrix: pixel pairs with similar intensities get weights near 1
A = np.zeros((nn,nn))
for i in range(nn):
    for j in range(nn):
        A[i,j] = np.exp(-((Img2[i]-Img2[j])**2))
V,D = np.linalg.eig(A)
V = np.real(V)
a = np.real(D[:,0])  # first eigenvector; eig returns eigenvectors as columns
print a
threshold = 0  # filter
a = np.reshape(a, (n,n))
Img[a&lt;threshold] = 255
plt.imshow(Img)
plt.show()</pre>
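To see why the leading eigenvector of the affinity matrix separates the regions, here is a tiny self-contained version of the same construction. The 4x4 toy image is our own; we use eigh since the affinity matrix is symmetric, and take the eigenvector as a column (eig and eigh both return eigenvectors column-wise):

```python
import numpy as np

# toy 4x4 "image": a bright 2x2 block on a dark background
img = np.zeros((4, 4))
img[1:3, 1:3] = 1.0
v = img.flatten()

# same affinity as above, vectorized: similar intensities -> weight near 1
A = np.exp(-(v[:, None] - v[None, :])**2)
w, E = np.linalg.eigh(A)   # symmetric matrix; eigenvalues in ascending order
lead = E[:, -1]            # eigenvector of the largest eigenvalue
labels = lead.reshape(4, 4) > lead.mean()
```

The eigenvector is constant within each intensity group, so thresholding it at its mean puts the bright block on one side and the background on the other (up to an overall sign, which eigensolvers choose arbitrarily).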
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEik1OqIn7kFHmzlS15I0bQQiMc1__9c7hyZ3614YlwWQlEP2NTa7OGxFZQYFbR0StWe-F5IMbfM08JiwBF8FF41UhXFM4YWYvf5LMj2wwEahVKTKrgz-weOHxDesfVjHbfvTjANWDo3XPg/s1600/twoObj.jpg"><img alt="" border="0" id="BLOGGER_PHOTO_ID_5632583524549326018" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEik1OqIn7kFHmzlS15I0bQQiMc1__9c7hyZ3614YlwWQlEP2NTa7OGxFZQYFbR0StWe-F5IMbfM08JiwBF8FF41UhXFM4YWYvf5LMj2wwEahVKTKrgz-weOHxDesfVjHbfvTjANWDo3XPg/s400/twoObj.jpg" style="cursor: pointer; display: block; height: 62px; margin: 0px auto 10px; text-align: center; width: 62px;" /></a><br />
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjm9xLFPupjfh_lI8YqXSCNGwl0oXy7yvLJrYpnpkSYYZ279gJf2thDP5_lR_VpsLpFa_Rx89Mc72NFwqPerFn65D6JM1qdVBX3cdS0u8Rl-OLsCvZ37aSTxJlBee1pzwInS5pb1kqH7tE/s1600/eigseg.png"><img alt="" border="0" id="BLOGGER_PHOTO_ID_5632583623387579154" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjm9xLFPupjfh_lI8YqXSCNGwl0oXy7yvLJrYpnpkSYYZ279gJf2thDP5_lR_VpsLpFa_Rx89Mc72NFwqPerFn65D6JM1qdVBX3cdS0u8Rl-OLsCvZ37aSTxJlBee1pzwInS5pb1kqH7tE/s400/eigseg.png" style="cursor: pointer; display: block; height: 80px; margin: 0px auto 10px; text-align: center; width: 107px;" /></a></div>
Burak Bayramlihttp://www.blogger.com/profile/02849512629200782790noreply@blogger.com0tag:blogger.com,1999:blog-2703091221672544113.post-26211116070204775322011-04-09T04:01:00.001-07:002017-10-28T05:56:22.880-07:00Myers-Briggs Test in Javascript / Python<div dir="ltr" style="text-align: left;" trbidi="on">
Hunch.com apparently uses this method. The Myers-Briggs Test is a psychological profile evaluation system, later expanded upon by David Keirsey. We coded the evaluation scheme in Python, with a few additions. In the original version in David Keirsey's book <span style="font-style: italic;">Please Understand Me II</span>, the answer to each question is either A or B; the Python code represents these choices as -1 and +1 and sums the appropriate array values. Recent versions of this questionnaire carry more (sometimes even five) choices. We found that an additional 'neutral' choice was an improvement, so our version carries three answers. The evaluation code substitutes -1 for A, +1 for B, and 0 for neutral.<br />
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhk113qxrNemyevyVmyset79LiduUBq9FnuLBnQrfN653ojzC5TImVOxZh04a7vjtPkk-GKfPjA7fidcH7kFx5FhRXKw4FrVoBkI_axrtRJMt8EyZ8P8NEnCG8nRYh_EAZMY6_jz9R0Ues/s1600/myersbriggs.png"><img alt="" border="0" id="BLOGGER_PHOTO_ID_5593539965521546706" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhk113qxrNemyevyVmyset79LiduUBq9FnuLBnQrfN653ojzC5TImVOxZh04a7vjtPkk-GKfPjA7fidcH7kFx5FhRXKw4FrVoBkI_axrtRJMt8EyZ8P8NEnCG8nRYh_EAZMY6_jz9R0Ues/s400/myersbriggs.png" style="cursor: pointer; display: block; height: 374px; margin: 0px auto 10px; text-align: center; width: 400px;" /></a>An example of the evaluation algorithm Keirsey uses in his book is above. We simply generate indexes that correspond to the columns seen above (answers arrive in a straight list, numbered from 1 to 70), then do the addition.<br />
<pre>import logging

def calculate_mb(choices):
    # regroup the 70 answers into the 7 scoring columns of Keirsey's sheet
    new_choices = []
    for i in range(1,8):
        new_choices.append([int(choices[j-1]) for j in range(i,71,7)])
    res = list("XXXX")
    ei = sum(new_choices[0])
    if ei &lt; 0: res[0] = 'E'
    else: res[0] = 'I'
    sn = sum(new_choices[1]) + sum(new_choices[2])
    if sn &lt; 0: res[1] = 'S'
    else: res[1] = 'N'
    tf = sum(new_choices[3]) + sum(new_choices[4])
    if tf &lt; 0: res[2] = 'T'
    else: res[2] = 'F'
    jp = sum(new_choices[5]) + sum(new_choices[6])
    if jp &lt; 0: res[3] = 'J'
    else: res[3] = 'P'
    logging.debug(choices)
    return str(''.join(res))</pre>
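The index generation in calculate_mb can be checked on its own: each of the seven scoring columns picks every seventh question, ten questions per column, covering all 70 questions exactly once.

```python
# the scoring columns: question j feeds column ((j - 1) % 7) + 1
cols = [list(range(i, 71, 7)) for i in range(1, 8)]
# e.g. column 1 (the E/I questions) is 1, 8, 15, ..., 64
```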
<br />
Another version, in HTML using JavaScript, can be found <a href="https://github.com/burakbayramli/kod/blob/master/guide/doc/mbti_en.html">here</a>.<br />
<br /></div>
Burak Bayramlihttp://www.blogger.com/profile/02849512629200782790noreply@blogger.com0